NOTE: This is not yet available in ExamineX version 3

ExamineX can automatically index the contents of your Umbraco media files without the need for additional indexes.

Requirements

This new feature specifically targets Umbraco media configured with the UmbracoFileSystemProviders.Azure.Media (>= 2.0.0) package. If you are using Azure to host your Umbraco website, it is recommended to use blob storage as your media provider. This provides more flexibility when scaling your solution, along with the benefits of CDN support.

With this ExamineX extension, your media document files such as PDFs and Microsoft Office docs will automatically have their contents indexed 🎉

Installation

Install, configure and test the UmbracoFileSystemProviders.Azure.Media package for your media.

Install, configure and test ExamineX.

Then install the ExamineX.AzureSearch.Umbraco.BlobMedia NuGet package:

PM> Install-Package ExamineX.AzureSearch.Umbraco.BlobMedia

Once the ExamineX.AzureSearch.Umbraco.BlobMedia package is installed, PDF files, MS Office document files, and other file types will automatically be indexed and stored in your corresponding internal/external Umbraco indexes with the field name content.

NOTE: The field name ‘content’ typically cannot be changed. This is a limitation of Azure Search’s field mapping. The field can be re-mapped with the Azure Search indexer field mappings, but the index then no longer has a field called ‘content’.

Searching

Searching on this content works exactly the same way as searching any other field in Examine. For example, to search for a term within the contents of a media file in the ExternalIndex, you could do:

if (ExamineManager.TryGetIndex("ExternalIndex", out var index))
{
    var searcher = index.GetSearcher();

    // Query on the 'content' field for media
    var results = searcher
        .CreateQuery("media")
        .Field("content", searchTerm)
        .Execute();
}
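
The results of the query above can be read like any other Examine search results. The following is a minimal sketch, not part of ExamineX itself: the MediaContentSearch class and PrintMatches method are hypothetical names, and it assumes an IExamineManager instance is supplied (for example through dependency injection) and that the index exposes the standard Umbraco nodeName field.

```csharp
using System;
using Examine;

// Hypothetical helper for illustration; not part of ExamineX or Examine.
public static class MediaContentSearch
{
    public static void PrintMatches(IExamineManager examineManager, string searchTerm)
    {
        // Nothing to search if the index is not registered
        if (!examineManager.TryGetIndex("ExternalIndex", out var index))
        {
            return;
        }

        // Same query as above: match the term in the 'content' field of media items
        var results = index.GetSearcher()
            .CreateQuery("media")
            .Field("content", searchTerm)
            .Execute();

        foreach (var result in results)
        {
            // 'nodeName' is assumed to be present; fall back if it is not
            result.Values.TryGetValue("nodeName", out var name);
            Console.WriteLine($"{result.Id} ({name ?? "unnamed"}): score {result.Score}");
        }
    }
}
```

Each result exposes the media item's Id, its relevance Score, and the stored field values, so the Id can be used to look up the media item in Umbraco.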