ExamineX can automatically index the contents of your Umbraco media files without the need for additional indexes. This includes indexing PDFs, Microsoft Office documents, HTML files and more.

Requirements

This feature is specifically targeted at Umbraco media when configured with either the Umbraco.StorageProviders.AzureBlob package or the UmbracoFileSystemProviders.Azure.Media (>= 2.0.0) package, depending on which one your Umbraco version uses.

If you host your Umbraco website on Azure, it is recommended to use blob storage as your media provider. This gives you more flexibility when scaling your solution, along with the benefits of CDN support.

With this ExamineX extension, your media document files such as PDFs and Microsoft Office docs will automatically have their contents indexed 🎉

Installation

After installing and configuring ExamineX…

Install, configure and test the Umbraco.StorageProviders.AzureBlob package for your media.

Then install the ExamineX.AzureSearch.Umbraco.BlobMedia NuGet package.

Then enable the integration in your Startup.cs by adding the .AddExamineXForBlobMedia() call after the .AddExamineXAzureSearch() call:

public void ConfigureServices(IServiceCollection services)
{
    services.AddUmbraco(_env, _config)
        .AddBackOffice()
        .AddWebsite()
        .AddDeliveryApi()
        .AddComposers()
        .AddExamineXAzureSearch()
        .AddExamineXForBlobMedia()
        .Build();
}

If your site uses the UmbracoFileSystemProviders.Azure.Media package instead, install, configure and test that package for your media, then install the ExamineX.AzureSearch.Umbraco.BlobMedia NuGet package.

Once the ExamineX.AzureSearch.Umbraco.BlobMedia package is installed, any PDF files, MS Office document files, and other file types will automatically be indexed and stored in your corresponding internal/external Umbraco indexes under the field name content.

NOTE: The field name ‘content’ cannot be changed. This is a limitation of Azure Search’s field mapping. For this to work you should not have a Property Type called content.

Searching

Searching this content works exactly the same way as searching any other field in Examine. For example, if you wanted to search for a term within the contents of a media file in the ExternalIndex, you could do:

if (ExamineManager.TryGetIndex("ExternalIndex", out var index))
{
    var searcher = index.GetSearcher();

    // Query on the 'content' field for media
    var results = searcher
        .CreateQuery("media")
        .Field("content", searchTerm)
        .Execute();
}
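
Because content is an ordinary field, it can be combined with other fields in a single query. The following is a minimal sketch, not part of ExamineX itself: the MediaContentSearchService class and its FindMediaIds method are hypothetical names, and it assumes an IExamineManager is available through dependency injection and that media items carry Umbraco's standard nodeName field. It matches the search term against either the media name or the extracted file contents and returns the ids of the matching items.

```csharp
using System.Collections.Generic;
using Examine;

// Hypothetical helper class, shown for illustration only.
public class MediaContentSearchService
{
    private readonly IExamineManager _examineManager;

    public MediaContentSearchService(IExamineManager examineManager)
    {
        _examineManager = examineManager;
    }

    // Returns the ids of media items whose name or file contents match the term.
    public IReadOnlyList<string> FindMediaIds(string searchTerm)
    {
        var ids = new List<string>();

        if (string.IsNullOrWhiteSpace(searchTerm) ||
            !_examineManager.TryGetIndex("ExternalIndex", out var index))
        {
            return ids;
        }

        // Depending on your Examine version, the searcher may be exposed
        // as the index.Searcher property instead of index.GetSearcher().
        var searcher = index.GetSearcher();

        // Match the term in either the media name ('nodeName', assumed to be
        // present) or the extracted file contents ('content').
        var results = searcher
            .CreateQuery("media")
            .GroupedOr(new[] { "nodeName", "content" }, searchTerm)
            .Execute();

        foreach (var result in results)
        {
            ids.Add(result.Id);
        }

        return ids;
    }
}
```

The returned ids can then be resolved to media items in whatever way your site already does, for example through the published media cache.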