Azure Search Documents: Add Custom Analyzers, Tokenizers and TokenFilters
I am migrating the Azure Search SDK from Microsoft.Azure.Search (v10) to Azure.Search.Documents (v11).
Previously, in v10, we were able to create an index with custom analyzers, tokenizers, and so on using the C# SDK, like this:
var index = new Microsoft.Azure.Search.Models.Index(
    name: GetIndexName(),
    defaultScoringProfile: defaultScoringProfile,
    fields: AzureQuestionItemDefinition.GetQuestionItemFieldsDefinition(),
    analyzers: new[] {
        new CustomAnalyzer
        {
            Name = "standardAnalyzer",
            Tokenizer = TokenizerName.Standard,
            TokenFilters = new[]
            {
                TokenFilterName.Lowercase,
                TokenFilterName.AsciiFolding,
                TokenFilterName.Phonetic,
            }
        },
        new CustomAnalyzer
        {
            Name = "prefixAnalyzer",
            Tokenizer = TokenizerName.Standard,
            TokenFilters = new[]
            {
                TokenFilterName.Lowercase,
                TokenFilterName.AsciiFolding,
                TokenFilterName.Phonetic,
                "edgeNgramTokenFilter"
            }
        },
    },
    tokenFilters: new[]
    {
        new EdgeNGramTokenFilterV2("edgeNgramTokenFilter", minGram: 2, maxGram: 10, EdgeNGramTokenFilterSide.Front),
    },
    scoringProfiles: new[]
    {
        new ScoringProfile(defaultScoringProfile)
        {
            TextWeights = new TextWeights()
            {
                Weights = new Dictionary<string, double>() {
                    { nameof(QuestionItem.Text), 5.0 },
                    { nameof(QuestionItem.Context), 5.0 },
                    { $"{nameof(QuestionItem.Asker)}/{nameof(QuestionItem.Asker.Name)}", 3.0 },
                    { $"{nameof(QuestionItem.Answers)}/{nameof(AnswerItem.Text)}", 2.0 },
                    { $"{nameof(QuestionItem.Answers)}/{nameof(AnswerItem.AnswererName)}", 2.0 }
                }
            }
        }
    });
After migrating to the new Azure.Search.Documents v11, I cannot find a way to create an index like this with the C# SDK.
I found that the collection properties on SearchIndex are read-only:
//
// Summary:
// Represents a search index definition, which describes the fields and search behavior
// of an index.
public class SearchIndex : IUtf8JsonSerializable
{
//
// Summary:
// Initializes a new instance of the Azure.Search.Documents.Indexes.Models.SearchIndex
// class.
//
// Parameters:
// name:
// The name of the index.
//
// Exceptions:
// T:System.ArgumentException:
// name is an empty string.
//
// T:System.ArgumentNullException:
// name is null.
public SearchIndex(string name);
//
// Summary:
// Initializes a new instance of the Azure.Search.Documents.Indexes.Models.SearchIndex
// class.
//
// Parameters:
// name:
// The name of the index.
//
// fields:
// Fields to add to the index.
//
// Exceptions:
// T:System.ArgumentException:
// name is an empty string.
//
// T:System.ArgumentNullException:
// name or fields is null.
public SearchIndex(string name, IEnumerable<SearchField> fields);
//
// Summary:
// The name of the scoring profile to use if none is specified in the query. If
// this property is not set and no scoring profile is specified in the query, then
// default scoring (tf-idf) will be used.
public string DefaultScoringProfile { get; set; }
//
// Summary:
// Options to control Cross-Origin Resource Sharing (CORS) for the index.
public CorsOptions CorsOptions { get; set; }
//
// Summary:
// A description of an encryption key that you create in Azure Key Vault. This key
// is used to provide an additional level of encryption-at-rest for your data when
// you want full assurance that no one, not even Microsoft, can decrypt your data
// in Azure Cognitive Search. Once you have encrypted your data, it will always
// remain encrypted. Azure Cognitive Search will ignore attempts to set this property
// to null. You can change this property as needed if you want to rotate your encryption
// key; Your data will be unaffected. Encryption with customer-managed keys is not
// available for free search services, and is only available for paid services created
// on or after January 1, 2019.
public SearchResourceEncryptionKey EncryptionKey { get; set; }
//
// Summary:
// The type of similarity algorithm to be used when scoring and ranking the documents
// matching a search query. The similarity algorithm can only be defined at index
// creation time and cannot be modified on existing indexes. If null, the ClassicSimilarity
// algorithm is used.
public SimilarityAlgorithm Similarity { get; set; }
//
// Summary:
// Gets the name of the index.
[CodeGenMemberAttribute("name")]
public string Name { get; }
//
// Summary:
// Gets the analyzers for the index.
public IList<LexicalAnalyzer> Analyzers { get; }
//
// Summary:
// Gets the character filters for the index.
public IList<CharFilter> CharFilters { get; }
//
// Summary:
// Gets or sets the fields in the index. Use Azure.Search.Documents.Indexes.FieldBuilder
// to define fields based on a model class, or Azure.Search.Documents.Indexes.Models.SimpleField,
// Azure.Search.Documents.Indexes.Models.SearchableField, and Azure.Search.Documents.Indexes.Models.ComplexField
// to manually define fields. Index fields have many constraints that are not validated
// with Azure.Search.Documents.Indexes.Models.SearchField until the index is created
// on the server.
public IList<SearchField> Fields { get; set; }
//
// Summary:
// Gets the scoring profiles for the index.
public IList<ScoringProfile> ScoringProfiles { get; }
//
// Summary:
// Gets the suggesters for the index.
public IList<SearchSuggester> Suggesters { get; }
//
// Summary:
// Gets the token filters for the index.
public IList<TokenFilter> TokenFilters { get; }
//
// Summary:
// Gets the tokenizers for the index.
public IList<LexicalTokenizer> Tokenizers { get; }
//
// Summary:
// The Azure.ETag of the Azure.Search.Documents.Indexes.Models.SearchIndex.
public ETag? ETag { get; set; }
}
My question is: how do I set custom Tokenizers, TokenFilters, ScoringProfiles, and so on?
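Collection properties are initialized by default in the new Azure .NET client libraries. Although you cannot set these properties, you can still call Add on each of them: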
var index = new SearchIndex("myindex");
index.ScoringProfiles.Add(new ScoringProfile(...));
Personally, I find this less convenient because I like to write expression-based code, so I have passed this feedback on to the Azure SDK team.
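For readers who also prefer expression-based code, here is a minimal sketch of how the v10 definition from the question might be ported. It relies on a C# language feature rather than anything specific to the SDK: a collection initializer inside an object initializer (for example `Analyzers = { ... }`) compiles to `Add` calls on the existing collection, so it works with get-only collection properties. The class name `IndexDefinitionSketch`, the `Build` helper, the placeholder profile name, and the shortened weight list are assumptions made for illustration. The model types are used as I understand the v11 API (`CustomAnalyzer` taking a name and a `LexicalTokenizerName`, `EdgeNGramTokenFilter` with settable `MinGram`, `MaxGram`, and `Side`, `TextWeights` taking a dictionary), so check them against the package version you have installed.

```csharp
using System.Collections.Generic;
using Azure.Search.Documents.Indexes.Models;

public static class IndexDefinitionSketch
{
    // Hypothetical helper: the caller supplies the index name and the field list.
    public static SearchIndex Build(string indexName, IEnumerable<SearchField> fields)
    {
        const string profileName = "defaultScoringProfile"; // placeholder name
        const string edgeFilterName = "edgeNgramTokenFilter";

        return new SearchIndex(indexName, fields)
        {
            DefaultScoringProfile = profileName,

            // "Analyzers = { ... }" does not assign the property; it calls
            // Analyzers.Add(...) once per element on the existing list.
            Analyzers =
            {
                new CustomAnalyzer("standardAnalyzer", LexicalTokenizerName.Standard)
                {
                    TokenFilters =
                    {
                        TokenFilterName.Lowercase,
                        TokenFilterName.AsciiFolding,
                        TokenFilterName.Phonetic,
                    },
                },
                new CustomAnalyzer("prefixAnalyzer", LexicalTokenizerName.Standard)
                {
                    TokenFilters =
                    {
                        TokenFilterName.Lowercase,
                        TokenFilterName.AsciiFolding,
                        TokenFilterName.Phonetic,
                        // Reference the custom filter defined below by name.
                        new TokenFilterName(edgeFilterName),
                    },
                },
            },
            TokenFilters =
            {
                new EdgeNGramTokenFilter(edgeFilterName)
                {
                    MinGram = 2,
                    MaxGram = 10,
                    Side = EdgeNGramTokenFilterSide.Front,
                },
            },
            ScoringProfiles =
            {
                new ScoringProfile(profileName)
                {
                    // Only two of the question's weights are shown here.
                    TextWeights = new TextWeights(new Dictionary<string, double>
                    {
                        ["Text"] = 5.0,
                        ["Context"] = 5.0,
                    }),
                },
            },
        };
    }
}
```

The resulting `SearchIndex` can then be passed to `SearchIndexClient.CreateOrUpdateIndex`. The trade-off compared with explicit `Add` calls is purely stylistic: both forms populate the same pre-initialized lists.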