Empower your editors with ExamineX’s AI capabilities to transform your media library! Seamlessly search using auto-generated Descriptions, Tags, Categories, Locations, People, and more 🙌

ExamineX can also automatically index the contents of your Umbraco media files (e.g. PDFs, MS Word documents) without the need for additional indexes.

Requirements

This feature is specifically targeted at Umbraco media stored in Azure Blob Storage, configured with either the Umbraco.StorageProviders.AzureBlob package or the UmbracoFileSystemProviders.Azure.Media (>= 2.0.0) package, depending on your Umbraco version.

If you are using Azure to host your Umbraco website, it is recommended to use blob storage as your media provider. This provides more flexibility when scaling your solution, along with the benefits of CDN support.

With this ExamineX extension, your media document files such as PDFs and Microsoft Office docs will automatically have their contents indexed. With AI capabilities enabled, your images will also be scanned and auto-generated metadata can be produced, such as OCR extracted text, Descriptions, Tags, Categories, Locations, People, and more 🎉.

Installation

After installing and configuring ExamineX…

Install, configure and test the Umbraco.StorageProviders.AzureBlob package for your media.

Then install the ExamineX.AzureSearch.Umbraco.Media NuGet package.

Then enable the integration in your Startup.cs by adding the .AddExamineXAzureSearchForMedia() call after the .AddExamineXAzureSearch() call:

public void ConfigureServices(IServiceCollection services)
{
    services.AddUmbraco(_env, _config)
        .AddBackOffice()
        .AddWebsite()
        .AddDeliveryApi()
        .AddComposers()
        .AddExamineXAzureSearch()
        .AddExamineXAzureSearchForMedia()
        .Build();
}

Install, configure and test the Umbraco.StorageProviders.AzureBlob package for your media.

Then install the ExamineX.AzureSearch.Umbraco.BlobMedia NuGet package.

Then enable the integration in your Startup.cs by adding the .AddExamineXForBlobMedia() call after the .AddExamineXAzureSearch() call:

public void ConfigureServices(IServiceCollection services)
{
    services.AddUmbraco(_env, _config)
        .AddBackOffice()
        .AddWebsite()
        .AddDeliveryApi()
        .AddComposers()
        .AddExamineXAzureSearch()
        .AddExamineXForBlobMedia()
        .Build();
}

Install, configure and test the UmbracoFileSystemProviders.Azure.Media package for your media.

Then install the ExamineX.AzureSearch.Umbraco.BlobMedia NuGet package.

Once the package is installed, PDF files, MS Office document files, and other file types will automatically be indexed and stored in your corresponding internal/external Umbraco indexes under the field name content.

NOTE: The field name ‘content’ cannot be changed. This is a limitation of Azure Search’s field mapping. For this to work you should not have a Property Type called content.

AI Integration (v6.1+)

Once the above package is installed, you can configure the AI integration for image analysis.

Quick config

The simplest configuration, which enables this feature with default settings in your appsettings.json file, is:

{
  "ExamineX": {
    "AzureSearch": {
      "Media": {
        "EnableImageAnalysis": true
      }
    }
  }
}

Once this is enabled and the configured Azure Search indexer has run (e.g. after a media item is saved), the media items in the index will be populated with AI-generated details as per your configuration. ExamineX also automatically configures the Umbraco backoffice search to include the relevant fields so that your editors can easily find media based on the AI-generated information. For example, for this image:

[Image: 24 Days people at Codegarden]

It will generate and index this information:

[Image: AI generated info]

Now, when your editors search in the backoffice, media will be found based on what is in the image:

[Image: Back office search]

Similarly, when searching for media in a media picker, ExamineX has configured the search to work against the index so your editors can find the media they need quickly:

[Image: Media picker search]

Integrating UI

The ExamineX.AzureSearch.Umbraco.Media package installs a new Property Editor called ExamineX Media Info. It is a read-only property editor that displays the generated/indexed information for a media item, based on the fields populated by Azure AI Search.

The simplest way to integrate this is to:

  • Create a new Data Type based on the ExamineX Media Info.
  • Update your Image Media Type by adding a Property Type that uses your newly created Data Type.

ExamineX Media configuration options

Name | Description | Default value
EnableImageAnalysis | Enables image analysis. | false
ExcludedFileNameExtensions | File types to be excluded from being indexed. |
ImageAnalysisDefaultLanguage | The default language code applied to the Azure Search ImageAnalysisSkill. | "en"
ImageAnalysisFeatures | An array of the image analysis features to be enabled. The options are: "ocr", "categories", "description", "brands", "tags", "celebrities", "landmarks". See Azure Search docs for more details. | ["ocr", "categories", "description", "brands", "tags"]
AzureAiServicesKey | [Optional] Sets the Azure AI (Cognitive Services) key to use for billing. When empty, image analysis billing will be attached to the same account as the Azure Search service. See Azure Search docs for more details. |
IndexingScheduleInterval | The time interval configured for the Azure Search indexer that scans blob storage media files to re-index new changes. Whenever media is changed in Umbraco, the indexer is manually triggered so the scanning will happen in near real time. | 5 hours
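
As an illustration of the options above, the following appsettings.json sketch enables image analysis and adds the "celebrities" and "landmarks" features to the default set. It assumes the options sit under the same ExamineX:AzureSearch:Media section used in the quick config. ExcludedFileNameExtensions and IndexingScheduleInterval are left out because their value formats are not stated here.

```json
{
  "ExamineX": {
    "AzureSearch": {
      "Media": {
        "EnableImageAnalysis": true,
        "ImageAnalysisDefaultLanguage": "en",
        "ImageAnalysisFeatures": [
          "ocr",
          "categories",
          "description",
          "brands",
          "tags",
          "celebrities",
          "landmarks"
        ]
      }
    }
  }
}
```

Leaving AzureAiServicesKey unset, as in this sketch, means image analysis billing is attached to the same account as the Azure Search service.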

Index fields

Several index fields will be created/used based on the image analysis features enabled:

Field name | Description
content | The contents of document media files such as PDFs, MS Word, etc…
imageOcr | The extracted OCR text of the image
imageDescription | The AI generated description of the image
imageBrands | The detected Brand names found in the image
imageCategories | The AI generated categories of the image
imageCategoriesCelebrities | The detected Celebrity names found in the image
imageCategoriesLandmarks | The detected Landmark names found in the image
imageTags | The AI generated tags of the image

Searching

Searching this content works exactly the same way as searching any other field in Examine. For example, if you wanted to search for a term within the contents of a media file in the ExternalIndex, you could do:

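// Assumes ExamineManager is an injected IExamineManager and
// searchTerm holds the text to search for.
if (ExamineManager.TryGetIndex("ExternalIndex", out var index))
{
    // On Examine 2.0 and later, use the index.Searcher property instead of GetSearcher()
    var searcher = index.GetSearcher();

    // Query on the 'content' field for media
    var results = searcher
        .CreateQuery("media")
        .Field("content", searchTerm)
        .Execute();
}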
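
The same approach should apply to the AI-generated image fields listed under Index fields. The sketch below shows one possible way to search several of them at once with Examine's GroupedOr. The class name MediaImageSearchService and the method FindImageIds are hypothetical and not part of ExamineX. It assumes an IExamineManager is supplied through dependency injection, that image analysis is enabled so the fields are populated, and that you are on Examine 2.0 or later, where the searcher is exposed as index.Searcher (on older versions, use index.GetSearcher() instead).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Examine;

// Hypothetical helper (not part of ExamineX): searches AI-generated image fields.
public class MediaImageSearchService
{
    // Field names taken from the Index fields list in this document.
    private static readonly string[] ImageFields =
    {
        "imageDescription",
        "imageTags",
        "imageOcr"
    };

    private readonly IExamineManager _examineManager;

    public MediaImageSearchService(IExamineManager examineManager)
    {
        _examineManager = examineManager;
    }

    // Returns the ids of media items where any of the image fields matches the term.
    public IReadOnlyList<string> FindImageIds(string searchTerm)
    {
        if (string.IsNullOrWhiteSpace(searchTerm)
            || !_examineManager.TryGetIndex("ExternalIndex", out var index))
        {
            return Array.Empty<string>();
        }

        // Assumption: Examine 2.0+ (index.Searcher); older versions use index.GetSearcher().
        var results = index.Searcher
            .CreateQuery("media")
            .GroupedOr(ImageFields, searchTerm)
            .Execute();

        return results.Select(r => r.Id).ToList();
    }
}
```

Keeping the field list in one array makes it easy to match the features you have enabled, for example adding imageBrands or imageCategoriesLandmarks when those features are switched on.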