Java, Project Panama and how to deal with Hunspell 'suggest' result
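I am experimenting with Hunspell and how to interact with it using Java Project Panama (Build 19-panama+1-13 (2022/1/18)). I was able to get some initial testing done, such as creating a handle to Hunspell and then using it to perform a spell check. I am now trying something more elaborate: getting Hunspell to give me suggestions for a word that is not in the dictionary. This is the code I have so far: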
public class HelloHun {
    public static void main(String[] args) {
        MemoryAddress hunspellHandle = null;
        try (ResourceScope scope = ResourceScope.newConfinedScope()) {
            var allocator = SegmentAllocator.nativeAllocator(scope);
            // Point it to the US English dictionary and (so-called) affix file
            // Note #1: it is possible to add words to the dictionary if you like
            // Note #2: it is possible to have separate/individual dictionaries and affix files (e.g. per user/doc type)
            var en_US_aff = allocator.allocateUtf8String("/usr/share/hunspell/en_US.aff");
            var en_US_dic = allocator.allocateUtf8String("/usr/share/hunspell/en_US.dic");
            // Get a handle to the Hunspell shared library and load up the dictionary and affix
            hunspellHandle = Hunspell_create(en_US_aff, en_US_dic);
            // Feed it a wrong word
            var javaWord = "koing";
            // Do a simple spell check of the word
            var word = allocator.allocateUtf8String(javaWord);
            var spellingResult = Hunspell_spell(hunspellHandle, word);
            System.out.println(String.format("%s is spelled %s", javaWord, (spellingResult == 0 ? "incorrect" : "correct")));
            // Hunspell also supports giving suggestions for a word - which is what we do next
            // Note #3: by testing this `koing` word in isolation - we know that there are 4 alternatives for this word
            // Note #4: I'm still investigating how to access individual suggestions
            var suggestions = allocator.allocate(10);
            var suggestionCount = Hunspell_suggest(hunspellHandle, suggestions, word);
            System.out.println(String.format("There are %d suggestions for %s", suggestionCount, javaWord));
            // `suggestions` - according to the hunspell API - is a `pointer to an array of strings pointer`
            // we know how many `strings` pointer there are, as that is the returned value from `suggest`
            // Question: how to process `suggestions` to get individual suggestions
        } finally {
            if (hunspellHandle != null) {
                Hunspell_destroy(hunspellHandle);
            }
        }
    }
}
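What I see is that the call to Hunspell_suggest (generated by jextract) succeeds and returns a count of 4 suggestions (which I verified by running Hunspell from the command line), so no problem there.
What is more challenging for me now is how to unpack the suggestions element that comes back from this call. I have been looking at various examples, but none of them go into this level of detail (and the examples I did find seem to use outdated Panama APIs).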
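So in essence, this is my question:
How do I use the Panama JDK 19 API to unpack a structure that reportedly consists of a pointer to an array of string pointers into the corresponding collection of strings?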
Looking at the header here: https://github.com/hunspell/hunspell/blob/master/src/hunspell/hunspell.h#L80
/* suggest(suggestions, word) - search suggestions
* input: pointer to an array of strings pointer and the (bad) word
* array of strings pointer (here *slst) may not be initialized
* output: number of suggestions in string array, and suggestions in
* a newly allocated array of strings (*slts will be NULL when number
* of suggestion equals 0.)
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_suggest(Hunhandle* pHunspell,
char*** slst,
const char* word);
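slst is a classic 'out' parameter. That is, we pass a pointer to some value (in this case a char**, i.e. an array of strings), and the function sets that value for us, as a way of returning more than one result. (The other result, the number of suggestions, is the function's return value.)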
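In Panama, you use an 'out' parameter by allocating a segment with the layout of the type the parameter points to. In this case char*** is a pointer to char**, so the layout is ADDRESS. We then pass the allocated segment to the function and, after the call, retrieve the value the function wrote into that segment: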
// char***
var suggestionsRef = allocator.allocate(ValueLayout.ADDRESS); // allocate space for an address
var suggestionCount = Hunspell_suggest(hunspellHandle, suggestionsRef, word);
// char** (the value set by the function)
MemoryAddress suggestions = suggestionsRef.get(ValueLayout.ADDRESS, 0);
After that, you can iterate over the array of strings:
for (int i = 0; i < suggestionCount; i++) {
    // char* (an element in the array)
    MemoryAddress suggestion = suggestions.getAtIndex(ValueLayout.ADDRESS, i);
    // read the NUL-terminated UTF-8 string starting at that address
    String javaSuggestion = suggestion.getUtf8String(0);
}
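For readers on a newer JDK, the same out-parameter pattern can be tried without libhunspell. This is a sketch under stated assumptions, not the answer's code: it targets the finalized java.lang.foreign API (JDK 22 or later), where MemoryAddress and ResourceScope no longer exist. In that API, Arena takes the place of the scope, allocateFrom/getString replace allocateUtf8String/getUtf8String, and an address read from memory comes back as a zero-length MemorySegment that has to be resized with reinterpret before it can be dereferenced. The fakeSuggest method is a hypothetical Java stand-in for Hunspell_suggest that fills in the char*** the way the header describes, and the words it returns are made up.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;
import java.util.ArrayList;
import java.util.List;

public class SuggestOutParam {

    // HYPOTHETICAL stand-in for Hunspell_suggest(Hunhandle*, char***, const char*):
    // allocates a char* array plus one C string per word, stores the array's
    // address into *slst and returns the number of elements.
    static int fakeSuggest(SegmentAllocator nativeSide, MemorySegment slst, String[] madeUp) {
        MemorySegment array = nativeSide.allocate(ValueLayout.ADDRESS, madeUp.length);
        for (int i = 0; i < madeUp.length; i++) {
            array.setAtIndex(ValueLayout.ADDRESS, i, nativeSide.allocateFrom(madeUp[i]));
        }
        slst.set(ValueLayout.ADDRESS, 0, array); // *slst = array
        return madeUp.length;
    }

    // Caller side: the same steps as the answer, expressed with the JDK 22+ API.
    static List<String> readSuggestions(MemorySegment slst, int count) {
        // char**: the value the callee stored; resize the zero-length segment to count pointers
        MemorySegment suggestions = slst.get(ValueLayout.ADDRESS, 0)
                .reinterpret(count * ValueLayout.ADDRESS.byteSize());
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // char*: one array element, also zero-length until reinterpreted
            MemorySegment suggestion = suggestions.getAtIndex(ValueLayout.ADDRESS, i);
            // unbounded size so getString can scan up to the NUL terminator
            result.add(suggestion.reinterpret(Long.MAX_VALUE).getString(0));
        }
        return result;
    }

    static List<String> demo() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment slst = arena.allocate(ValueLayout.ADDRESS); // the char*** out parameter
            int count = fakeSuggest(arena, slst, new String[] {"going", "king", "coin"});
            // Strings are copied onto the Java heap, so they outlive the arena
            return readSuggestions(slst, count);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because reinterpret is a restricted method, the JDK prints a warning unless the program is launched with --enable-native-access=ALL-UNNAMED. With the real library, the array written into slst is allocated by Hunspell, not by the Java arena. hunspell.h also declares Hunspell_free_list(Hunhandle* pHunspell, char*** slst, int n) for releasing it once the strings have been copied into Java.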