Azure Search comes with many more features than the default Lucene engine in Examine and you can leverage these features with ExamineX.

Analyzers

Azure Search offers many more analyzers than Lucene does, and some of them may work better than Lucene’s depending on your requirements. For example, Microsoft’s documentation states:

The default analyzer is Standard Lucene, which works well for English, but perhaps not as well as Lucene’s English analyzer or Microsoft’s English analyzer.

Default analyzers

The default analyzers used in ExamineX are configured to be synonymous with the ones shipped by default in Umbraco:

  • InternalIndex: keyword_lowercase_asciifolding. This is a custom analyzer that ExamineX adds to the index. It is the keyword (i.e. whitespace) analyzer with both the lowercase and asciifolding filters applied.
  • ExternalIndex: standard.lucene, the Lucene standard analyzer
  • MembersIndex: keyword_lowercase_asciifolding (as above)
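
To give a feel for what the lowercase and asciifolding filters do to an indexed value, here is a rough approximation in plain C#. It is only a sketch: the class and method names are hypothetical, it is not ExamineX or Azure Search code, and the real asciifolding filter handles more characters than simple diacritic stripping does (ligatures and letters such as ß, for example).

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
public static class KeywordLowercaseAsciiFoldingSketch
{
    // Approximates "lowercase" followed by "asciifolding" on a single value.
    public static string Normalize(string value)
    {
        // Lowercase, then decompose accented characters into base letter + combining mark
        var decomposed = value.ToLowerInvariant().Normalize(NormalizationForm.FormD);

        // Drop the combining marks, which leaves the plain base letters
        var kept = decomposed.Where(c =>
            CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

        return new string(kept.ToArray()).Normalize(NormalizationForm.FormC);
    }
}
```

With this approximation, a value such as "Crème Brûlée" becomes "creme brulee", so a search for either spelling can match the same indexed term.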

It is possible to change the default analyzer used for each index by replacing the IUmbracoIndexesCreator in your own Umbraco composer.

Example:

[ComposeAfter(typeof(ExamineXComposer))]
[RuntimeLevel(MinLevel = RuntimeLevel.Run)]
public class MyComposer : IUserComposer
{
    public void Compose(Composition composition)
    {
        // Replace the default ExamineX implementation
        composition.RegisterUnique<IUmbracoIndexesCreator, MyUmbracoIndexesCreator>();
    }
}

// Override the default ExamineX implementation
public class MyUmbracoIndexesCreator : ExamineXIndexFactory
{
    public MyUmbracoIndexesCreator(
        ExamineXConfig examineXConfig, IUmbracoIndexConfig umbracoIndexConfig, IExamineXLogger logger, 
        ILicenseManager licenceManager, IRuntimeState runtimeState, UmbracoIndexesCreator defaultFactory) 
    : base(examineXConfig, umbracoIndexConfig, logger, licenceManager, runtimeState, defaultFactory)
    {
    }

    // Replace the default analyzer for the internal index with
    // Microsoft's English analyzer
    protected override IIndex CreateInternalIndex(string defaultAnalyzer) 
        => base.CreateInternalIndex(AnalyzerName.AsString.EnMicrosoft);
}

Default field types & analyzers

It is possible to change the analyzer used per field in ExamineX in almost the same way as you do in Examine. However, ExamineX has slightly different field definition types:

  • AzureSearchFieldDefinitionTypes.FullText - Default. The field will be indexed with the index’s default Analyzer without sortability. Generally this is fine for normal text searching.
  • AzureSearchFieldDefinitionTypes.FullTextSortable - Will be indexed with FullText but also enable sorting on this field for search results.
  • AzureSearchFieldDefinitionTypes.FullTextMultiValue - Will be indexed with FullText to allow multiple values per field. Sorting is not possible on fields with multiple values.
  • AzureSearchFieldDefinitionTypes.Integer - Stored as a numerical structure.
  • AzureSearchFieldDefinitionTypes.Double - Stored as a numerical structure.
  • AzureSearchFieldDefinitionTypes.Long - Stored as a numerical structure.
  • AzureSearchFieldDefinitionTypes.DateTime - Stored as a DateTime.
  • AzureSearchFieldDefinitionTypes.Raw - Will be indexed with the keyword analyzer so searching will only match with an exact value.

The safest way to add/remove/modify field value types is to create a custom ExamineXIndexFactory (as in the above example) and override the OnCreatingIndexes method to modify the FieldDefinitionCollection of an index by using any of the following methods:

  • index.FieldDefinitionCollection.TryAdd
  • index.FieldDefinitionCollection.AddOrUpdate
  • index.FieldDefinitionCollection.GetOrAdd

Example:

// Inside the OnCreatingIndexes override, where "index" is the index being customized:
// set the field definition for "productPrice" to be a Double
index.FieldDefinitionCollection.TryAdd(
    new FieldDefinition("productPrice", AzureSearchFieldDefinitionTypes.Double));

Custom field types & analyzers

If you want to define a custom field type, you can do that during index creation by passing a value to the indexValueTypesFactory parameter of the AzureSearchIndex constructor. The parameter is declared as: IReadOnlyDictionary<string, IAzureSearchFieldValueTypeFactory> indexValueTypesFactory. The dictionary Key is the name of the field type and the Value is a factory that creates the field value type for a given field name. ExamineX has a default implementation of this which includes all of the above default field types. If this parameter is specified, all of ExamineX’s default field types will still be used, but you can override a default implementation by providing a Key that matches it.

Example:

This will create a Japanese field type with a “ja.microsoft” analyzer.

protected override IIndex CreateInternalIndex(string defaultAnalyzer)
{
    var customFieldTypeFactory = new Dictionary<string, IAzureSearchFieldValueTypeFactory>
    {
        // The name of this field type will be "ja.microsoft",
        // based on the value of this constant
        [AnalyzerName.AsString.JaMicrosoft] =
        // define the factory
        new AzureSearchFieldValueTypeFactory(fieldName =>
            new AzureSearchFieldValueType(
                fieldName,
                // This example will allow multiple values 
                // per field
                AzureSearchFieldValueType.StringCollectionType,
                // Use microsoft's Japanese analyzer
                AnalyzerName.AsString.JaMicrosoft))
    };

    var index = new UmbracoAzureSearchContentIndex(
        Constants.UmbracoIndexes.InternalIndexName,
        _licenceManager,
        _logger,
        _examineXConfig,
        _runtimeState,
        _umbracoIndexConfig.GetContentValueSetValidator(),
        new UmbracoFieldDefinitionCollection(),
        defaultAnalyzer,
        // Pass in the custom field types
        customFieldTypeFactory);

    return index;
}

Because this is a factory, the same approach could be used to create a generic language field type factory if your field names include a reference to the language. For example, if your fields were suffixed with a language ISO code like: bodyText_ja-jp, bodyText_it-it, etc… you could return a specific language analyzer for the matched field name suffix.
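
As a minimal sketch of that suffix-matching idea, the helper below maps a field name suffix to an Azure Search analyzer name. The class name, the suffix-to-analyzer table and the fallback parameter are all hypothetical and not part of ExamineX; "ja.microsoft" and "it.microsoft" are Azure Search analyzer names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class LanguageAnalyzerResolver
{
    // Assumed mapping from field name suffix to Azure Search analyzer name
    private static readonly Dictionary<string, string> AnalyzersBySuffix =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["ja-jp"] = "ja.microsoft",
            ["it-it"] = "it.microsoft"
        };

    // Returns the analyzer for the suffix after the last underscore,
    // or the fallback when there is no suffix or it is not recognized.
    public static string Resolve(string fieldName, string fallbackAnalyzer)
    {
        var separator = fieldName.LastIndexOf('_');
        if (separator >= 0 && separator < fieldName.Length - 1)
        {
            var suffix = fieldName.Substring(separator + 1);
            if (AnalyzersBySuffix.TryGetValue(suffix, out var analyzer))
                return analyzer;
        }

        return fallbackAnalyzer;
    }
}
```

Inside the factory delegate shown in the example above, the result of LanguageAnalyzerResolver.Resolve(fieldName, defaultAnalyzer) could then be passed as the analyzer argument instead of a fixed analyzer name.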

To wire this up, you need to pass this parameter to the index constructor, which means you need a custom ExamineXIndexFactory (see above) that overrides the Create... method for the index you want to customize.

NOTE: In the near future, this process will be made easier and more flexible. Coming soon to version 1.2.1.

Events

All of the underlying Examine events are available in ExamineX, such as TransformingIndexValues, IndexingError and OperationComplete.

These additional events are available in ExamineX:

AzureSearchIndex.CreatingOrUpdatingIndex - Allows you to modify the definition of the index before it is created in Azure Search.

It is advised not to remove any indexes, indexers, fields, field mappings or custom analyzers created with ExamineX, otherwise unexpected errors may result.

Customizing the Azure Search index

Using the AzureSearchIndex.CreatingOrUpdatingIndex event can be quite powerful if you want to leverage more of Azure Cognitive Search than what is provided by default. For example, with this event you could create custom scoring profiles and custom analyzers.

An example of adding an event handler for CreatingOrUpdatingIndex:

if (examineManager.TryGetIndex("ExternalIndex", out var index) 
    && index is AzureSearchIndex azureIndex)
{
    azureIndex.CreatingOrUpdatingIndex += AzureIndex_CreatingOrUpdatingIndex;
}
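
The snippet above does not show where the subscription should live. One possible place, sketched here for Umbraco 8, is a component that receives IExamineManager through its constructor. The class names IndexEventsComposer and IndexEventsComponent are hypothetical, the ExamineX using directives are omitted because their namespaces are not stated in this document, and it is an assumption that a component's Initialize runs early enough for your scenario.

```csharp
using Examine;
using Umbraco.Core.Composing;
// Also add the using directives for the ExamineX types used below
// (ExamineXComposer, AzureSearchIndex, CreatingOrUpdatingIndexEventArgs).

// Hypothetical composer that registers the component after ExamineX has composed
[ComposeAfter(typeof(ExamineXComposer))]
public class IndexEventsComposer : ComponentComposer<IndexEventsComponent>, IUserComposer
{
}

// Hypothetical component that subscribes to the ExamineX event on startup
public class IndexEventsComponent : IComponent
{
    private readonly IExamineManager _examineManager;

    public IndexEventsComponent(IExamineManager examineManager)
        => _examineManager = examineManager;

    public void Initialize()
    {
        if (_examineManager.TryGetIndex("ExternalIndex", out var index)
            && index is AzureSearchIndex azureIndex)
        {
            azureIndex.CreatingOrUpdatingIndex += AzureIndex_CreatingOrUpdatingIndex;
        }
    }

    public void Terminate()
    {
    }

    private void AzureIndex_CreatingOrUpdatingIndex(object sender, CreatingOrUpdatingIndexEventArgs e)
    {
        // Customize e.AzureSearchIndexDefinition here
    }
}
```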

An example of creating a custom scoring profile:

private void AzureIndex_CreatingOrUpdatingIndex(object sender, CreatingOrUpdatingIndexEventArgs e)
{
    // NOTE: You cannot add a scoring rule for a field unless that field exists in the index definition!
    //       When ExamineX first creates the index it will only contain the fields defined in the
    //       initial field definitions. When new items are indexed and new fields are detected then
    //       the Azure Cognitive Search index is updated.

    // get the azure cognitive search definition
    var index = e.AzureSearchIndexDefinition;

    // get or create scoring profiles list (will be null for new indexes)
    index.ScoringProfiles = index.ScoringProfiles ?? new List<ScoringProfile>();

    // this example will create a scoring profile called "pages"
    const string scoringProfileName = "pages";

    // get or create a scoring profile
    var scoringProfile = index.ScoringProfiles.FirstOrDefault(x => x.Name == scoringProfileName);
    if (scoringProfile == null)
        index.ScoringProfiles.Add(scoringProfile = new ScoringProfile
        {
            Name = scoringProfileName,
            FunctionAggregation = ScoringFunctionAggregation.Sum
        });

    // add a 'boost' of 3 for the "pageTitle" field if the field exists
    if (index.Fields.Any(x => x.Name == "pageTitle"))
    {
        // ensure the object exists
        scoringProfile.TextWeights = scoringProfile.TextWeights ?? new TextWeights(new Dictionary<string, double>());
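        // use the indexer rather than Add so that re-running this handler
        // on an index update does not throw on a duplicate key
        scoringProfile.TextWeights.Weights["pageTitle"] = 3;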
    }

    // add a 'boost' for pages that have been updated within the last two days
    if (index.Fields.Any(x => x.Name == "updateDate"))
    {
        // ensure the object exists
        scoringProfile.Functions = scoringProfile.Functions ?? new List<ScoringFunction>();

        // check existing or add
        var updateDateFreshness = scoringProfile.Functions.FirstOrDefault(x => x.FieldName == "updateDate");
        if (updateDateFreshness == null)
            scoringProfile.Functions.Add(updateDateFreshness = new FreshnessScoringFunction
            {
                FieldName = "updateDate",
                Boost = 3,
                Parameters = new FreshnessScoringParameters(new TimeSpan(2, 0, 0, 0)),
                Interpolation = ScoringFunctionInterpolation.Logarithmic
            });
    }

    // Set the default scoring profile
    index.DefaultScoringProfile = scoringProfileName;
}