Azure.Search.Documents: adding custom analyzers, tokenizers and token filters

I am migrating our Azure Search SDK from Microsoft.Azure.Search (v10) to Azure.Search.Documents (v11).

Previously, in v10, we could use the C# SDK to create an index with custom analyzers, tokenizers and so on, like this:

var index = new Microsoft.Azure.Search.Models.Index(
                name: GetIndexName(),
                defaultScoringProfile: defaultScoringProfile,
                fields: AzureQuestionItemDefinition.GetQuestionItemFieldsDefinition(),
                analyzers: new[] {
                    new CustomAnalyzer
                    {
                        Name = "standardAnalyzer",
                        Tokenizer = TokenizerName.Standard,
                        TokenFilters = new[]
                        {
                            TokenFilterName.Lowercase,
                            TokenFilterName.AsciiFolding,
                            TokenFilterName.Phonetic,
                        }
                    },
                    new CustomAnalyzer
                    {
                        Name = "prefixAnalyzer",
                        Tokenizer = TokenizerName.Standard,
                        TokenFilters = new[]
                        {
                            TokenFilterName.Lowercase,
                            TokenFilterName.AsciiFolding,
                            TokenFilterName.Phonetic,
                            "edgeNgramTokenFilter"
                        }
                    },
                },
                tokenFilters: new[]
                {
                    new EdgeNGramTokenFilterV2("edgeNgramTokenFilter", minGram: 2, maxGram: 10, EdgeNGramTokenFilterSide.Front),
                },
                scoringProfiles: new[]
                {
                    new ScoringProfile(defaultScoringProfile)
                    {
                        TextWeights = new TextWeights()
                        {
                            Weights = new Dictionary<string, double>() {
                                { nameof(QuestionItem.Text), 5.0 },
                                { nameof(QuestionItem.Context), 5.0 },
                                { $"{nameof(QuestionItem.Asker)}/{nameof(QuestionItem.Asker.Name)}", 3.0 },
                                { $"{nameof(QuestionItem.Answers)}/{nameof(AnswerItem.Text)}", 2.0 },
                                { $"{nameof(QuestionItem.Answers)}/{nameof(AnswerItem.AnswererName)}", 2.0 }
                            }
                        }
                    }
                }
            );

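After moving to the new Azure.Search.Documents v11, I cannot find a way to create an index like this with the C# SDK.

I found that the collection properties on SearchIndex (Analyzers, TokenFilters, Tokenizers, ScoringProfiles, ...) are read-only: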

//
    // Summary:
    //     Represents a search index definition, which describes the fields and search behavior
    //     of an index.
    public class SearchIndex : IUtf8JsonSerializable
    {
        //
        // Summary:
        //     Initializes a new instance of the Azure.Search.Documents.Indexes.Models.SearchIndex
        //     class.
        //
        // Parameters:
        //   name:
        //     The name of the index.
        //
        // Exceptions:
        //   T:System.ArgumentException:
        //     name is an empty string.
        //
        //   T:System.ArgumentNullException:
        //     name is null.
        public SearchIndex(string name);
        //
        // Summary:
        //     Initializes a new instance of the Azure.Search.Documents.Indexes.Models.SearchIndex
        //     class.
        //
        // Parameters:
        //   name:
        //     The name of the index.
        //
        //   fields:
        //     Fields to add to the index.
        //
        // Exceptions:
        //   T:System.ArgumentException:
        //     name is an empty string.
        //
        //   T:System.ArgumentNullException:
        //     name or fields is null.
        public SearchIndex(string name, IEnumerable<SearchField> fields);

        //
        // Summary:
        //     The name of the scoring profile to use if none is specified in the query. If
        //     this property is not set and no scoring profile is specified in the query, then
        //     default scoring (tf-idf) will be used.
        public string DefaultScoringProfile { get; set; }
        //
        // Summary:
        //     Options to control Cross-Origin Resource Sharing (CORS) for the index.
        public CorsOptions CorsOptions { get; set; }
        //
        // Summary:
        //     A description of an encryption key that you create in Azure Key Vault. This key
        //     is used to provide an additional level of encryption-at-rest for your data when
        //     you want full assurance that no one, not even Microsoft, can decrypt your data
        //     in Azure Cognitive Search. Once you have encrypted your data, it will always
        //     remain encrypted. Azure Cognitive Search will ignore attempts to set this property
        //     to null. You can change this property as needed if you want to rotate your encryption
        //     key; Your data will be unaffected. Encryption with customer-managed keys is not
        //     available for free search services, and is only available for paid services created
        //     on or after January 1, 2019.
        public SearchResourceEncryptionKey EncryptionKey { get; set; }
        //
        // Summary:
        //     The type of similarity algorithm to be used when scoring and ranking the documents
        //     matching a search query. The similarity algorithm can only be defined at index
        //     creation time and cannot be modified on existing indexes. If null, the ClassicSimilarity
        //     algorithm is used.
        public SimilarityAlgorithm Similarity { get; set; }
        //
        // Summary:
        //     Gets the name of the index.
        [CodeGenMemberAttribute("name")]
        public string Name { get; }
        //
        // Summary:
        //     Gets the analyzers for the index.
        public IList<LexicalAnalyzer> Analyzers { get; }
        //
        // Summary:
        //     Gets the character filters for the index.
        public IList<CharFilter> CharFilters { get; }
        //
        // Summary:
        //     Gets or sets the fields in the index. Use Azure.Search.Documents.Indexes.FieldBuilder
        //     to define fields based on a model class, or Azure.Search.Documents.Indexes.Models.SimpleField,
        //     Azure.Search.Documents.Indexes.Models.SearchableField, and Azure.Search.Documents.Indexes.Models.ComplexField
        //     to manually define fields. Index fields have many constraints that are not validated
        //     with Azure.Search.Documents.Indexes.Models.SearchField until the index is created
        //     on the server.
        public IList<SearchField> Fields { get; set; }
        //
        // Summary:
        //     Gets the scoring profiles for the index.
        public IList<ScoringProfile> ScoringProfiles { get; }
        //
        // Summary:
        //     Gets the suggesters for the index.
        public IList<SearchSuggester> Suggesters { get; }
        //
        // Summary:
        //     Gets the token filters for the index.
        public IList<TokenFilter> TokenFilters { get; }
        //
        // Summary:
        //     Gets the tokenizers for the index.
        public IList<LexicalTokenizer> Tokenizers { get; }
        //
        // Summary:
        //     The Azure.ETag of the Azure.Search.Documents.Indexes.Models.SearchIndex.
        public ETag? ETag { get; set; }
    }

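My question is: how do I set custom Tokenizers, TokenFilters, ScoringProfiles and so on?

Collection properties are initialized by default in the new Azure .NET client libraries. You cannot assign to these properties, but you can still call Add on each of them: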

var index = new SearchIndex("myindex");
index.ScoringProfiles.Add(new ScoringProfile(...));
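
As a rough sketch of how the v10 definition from the question might be ported with this approach: the class name IndexFactory, the method Build, the profile name "default" and the field name "Text" are placeholders for illustration, not taken from the original project. It assumes the v11 EdgeNGramTokenFilter model is the counterpart of the v10 EdgeNGramTokenFilterV2, and that the Azure.Search.Documents package is referenced.

```csharp
using System.Collections.Generic;
using Azure.Search.Documents.Indexes.Models;

public static class IndexFactory
{
    // Builds the index definition client-side; nothing is sent to the service here.
    public static SearchIndex Build(string indexName)
    {
        var index = new SearchIndex(indexName)
        {
            // "default" is a placeholder scoring profile name.
            DefaultScoringProfile = "default"
        };

        // v11 takes the analyzer name and tokenizer in the constructor.
        var standardAnalyzer = new CustomAnalyzer("standardAnalyzer", LexicalTokenizerName.Standard);
        standardAnalyzer.TokenFilters.Add(TokenFilterName.Lowercase);
        standardAnalyzer.TokenFilters.Add(TokenFilterName.AsciiFolding);
        standardAnalyzer.TokenFilters.Add(TokenFilterName.Phonetic);
        index.Analyzers.Add(standardAnalyzer);

        var prefixAnalyzer = new CustomAnalyzer("prefixAnalyzer", LexicalTokenizerName.Standard);
        prefixAnalyzer.TokenFilters.Add(TokenFilterName.Lowercase);
        prefixAnalyzer.TokenFilters.Add(TokenFilterName.AsciiFolding);
        prefixAnalyzer.TokenFilters.Add(TokenFilterName.Phonetic);
        // A custom filter is referenced by name; the string converts to TokenFilterName.
        prefixAnalyzer.TokenFilters.Add("edgeNgramTokenFilter");
        index.Analyzers.Add(prefixAnalyzer);

        // Assumed v11 counterpart of the v10 EdgeNGramTokenFilterV2.
        index.TokenFilters.Add(new EdgeNGramTokenFilter("edgeNgramTokenFilter")
        {
            MinGram = 2,
            MaxGram = 10,
            Side = EdgeNGramTokenFilterSide.Front
        });

        // "Text" is a placeholder field name; use the real field paths here.
        var weights = new Dictionary<string, double> { { "Text", 5.0 } };
        index.ScoringProfiles.Add(new ScoringProfile("default")
        {
            TextWeights = new TextWeights(weights)
        });

        return index;
    }
}
```

The fields would still need to be added (through the Fields property or the two-argument constructor) before the index is passed to SearchIndexClient.CreateOrUpdateIndex.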

Personally I find this less convenient, since I like to write expression-based code, so I have passed this feedback on to the Azure SDK team.
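
One possible way to stay closer to an expression-based style is C#'s collection initializer syntax, which the compiler translates into Add calls and therefore also works on get-only collection properties. The following is only a sketch under the same assumptions as any v11 port of the question's code: InitializerStyleIndex, Build, the profile name "default" and the field name "Text" are illustrative placeholders, and the Azure.Search.Documents package is assumed to be referenced.

```csharp
using System.Collections.Generic;
using Azure.Search.Documents.Indexes.Models;

public static class InitializerStyleIndex
{
    public static SearchIndex Build(string indexName)
    {
        // "Analyzers = { ... }" does not assign the property; the compiler
        // emits index.Analyzers.Add(...) for each element instead.
        return new SearchIndex(indexName)
        {
            DefaultScoringProfile = "default", // placeholder profile name
            Analyzers =
            {
                new CustomAnalyzer("prefixAnalyzer", LexicalTokenizerName.Standard)
                {
                    TokenFilters =
                    {
                        TokenFilterName.Lowercase,
                        TokenFilterName.AsciiFolding,
                        TokenFilterName.Phonetic,
                        "edgeNgramTokenFilter" // custom filter referenced by name
                    }
                }
            },
            TokenFilters =
            {
                new EdgeNGramTokenFilter("edgeNgramTokenFilter")
                {
                    MinGram = 2,
                    MaxGram = 10,
                    Side = EdgeNGramTokenFilterSide.Front
                }
            },
            ScoringProfiles =
            {
                new ScoringProfile("default")
                {
                    // "Text" is a placeholder field name.
                    TextWeights = new TextWeights(
                        new Dictionary<string, double> { { "Text", 5.0 } })
                }
            }
        };
    }
}
```

This relies only on a language feature, not on anything specific to the SDK, so it should apply to the other get-only collections on SearchIndex (Tokenizers, CharFilters, Suggesters) as well.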