Arabic WordNet with plural words

I'm using Arabic WordNet in C# to get the synonyms of a singular word such as "عرض", and I get synonyms like (علام٩, أمار, شدو, ضر, شؤم, بليو, etc.).
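My question is: is there a way to get the synonyms of a plural word, such as "علامات", from Arabic WordNet?
I need this because I haven't found a way to get the singular form of an Arabic plural, e.g. "علامات" => "علامة".
Thanks for any help.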

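I solved this by editing the awn.xml file and adding all the plural words I needed. For example, the word "عرض" has the plural form "أعراض". I first added synset items for "أعراض" and for the other plurals I needed ("أمراض", "استقصاءات"):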

<wordnet version="20">
<item itemid="&gt;aArad_n1AR" offset="102231120" lexfile="" name="أعراض" type="synset" headword="" POS="n" source="" gloss="" authorshipid="1" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
<item itemid="&gt;aMrad_n1AR" offset="102231121" lexfile="" name="أمراض" type="synset" headword="" POS="n" source="" gloss="" authorshipid="2" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
<item itemid="&gt;Isteqsa'at" offset="102231121" lexfile="" name="استقصاءات" type="synset" headword="" POS="n" source="" gloss="" authorshipid="3" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />

Then I added the plural synonyms to those synsets as `<word>`/`<form>` entries:

<authorship author="ali" date="20180215" score="" comment="From suggested word" covering="0" authorshipid="12136" />
<word wordid="&lt;aArad_n1AR" value="أعراض" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;aArad_n1AR" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;$araat" value="إشارات" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;$araat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Alamat" value="علامات" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;Alamat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;$adaed" value="شدائد" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;$adaed" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;adrar" value="أضرار" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;adrar" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;balaya" value="بلايا" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;balaya" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;tawar'a" value="طوارئ" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;tawar'a" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;fawajea" value="فواجع" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;fawajea" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;fawadeh" value="فوادح" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;fawadeh" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;kawareth" value="كوارث" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;kawareth" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;mehan" value="محن" synsetid="&gt;aArad_n1AR"  type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;mehan" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;makrohat" value="مكروهات" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;makrohat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;masaeb" value="مصائب" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;masaeb" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;masawea" value="مساوئ" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;masawea" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Elal" value="علل" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;Elal" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Ellat" value="علات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;Ellat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Eatilalat" value="اعتلالات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;Eatilalat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Da'aat" value="داءات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;Da'aat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;waakat" value="وعكات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;waakat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;askaam" value="أسقام" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;askaam" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;$akawa" value="شكاوى" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;$akawa" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;aMrad_n1AR" value="أمراض" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;aMrad_n1AR" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Fohosat" value="فحوصات" synsetid="&gt;Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="&lt;Fohosat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Taharieat" value="تحريات" synsetid="&gt;Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="&lt;Taharieat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Isteqsa'at" value="استقصاءات" synsetid="&gt;Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="&lt;Isteqsa'at" type="brokenPlural" authorshipid="12137" />

Now, when we run the following code snippet:

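        // Look up every word id whose value is the plural "علامات".
        List<string> wordId = _awn.Get_List_Word_Id_From_Value("علامات");
        List<string> synonyms = new List<string>();
        if (wordId != null)
        {
            foreach (string ss in wordId)
            {
                // Each word id belongs to a synset (e.g. ">aArad_n1AR" above).
                string temp = _awn.Get_Synset_ID_From_Word_Id(ss);
                if (temp == null)
                    continue;

                // All word ids in that synset are the plural synonyms.
                List<string> test = _awn.Get_List_Word_Id_From_Synset_ID(temp);
                if (test != null && test.Count != 0)
                {
                    foreach (string str in test)
                    {
                        // Map each word id back to its surface form, skipping duplicates.
                        string s = _awn.Get_Word_Value_From_Word_Id(str);
                        if (!synonyms.Contains(s))
                            synonyms.Add(s);
                    }
                }
            }
        }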
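If you'd rather not depend on the `_awn` wrapper, the same word → synset → words lookup can be run directly against the edited awn.xml. Below is a minimal sketch with LINQ to XML, assuming the layout shown above (`<word>` elements carrying `value` and `synsetid` attributes). `AwnPluralSynonyms` and `GetSynonyms` are hypothetical names, not part of any Arabic WordNet library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: plural-synonym lookup straight from awn.xml.
public static class AwnPluralSynonyms
{
    // Returns the values of all <word> elements that share a synset with `value`,
    // in document order, without duplicates.
    public static List<string> GetSynonyms(XDocument awn, string value)
    {
        var words = awn.Descendants("word").ToList();

        // Synsets containing the query word (a word may belong to several).
        var synsetIds = new HashSet<string>(
            words.Where(w => (string)w.Attribute("value") == value)
                 .Select(w => (string)w.Attribute("synsetid"))
                 .Where(id => id != null));

        var result = new List<string>();
        var seen = new HashSet<string>();
        foreach (var w in words)
        {
            var v = (string)w.Attribute("value");
            // Keep words from a matching synset; seen.Add returns false for duplicates.
            if (v != null && synsetIds.Contains((string)w.Attribute("synsetid")) && seen.Add(v))
                result.Add(v);
        }
        return result;
    }
}
```

A `HashSet` is used for the synset ids and the "already seen" check so each `<word>` is visited once. The original snippet's `List.Contains` is fine for short lists, but awn.xml holds many thousands of entries.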

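The `synonyms` list then contains every plural stored in the same synset as the query word. With the entries above, "علامات" belongs to the synset ">aArad_n1AR", so the list holds "أعراض", "إشارات", "علامات", "شدائد", "أضرار", "بلايا", and the rest of that synset, which are plural synonyms of the word "عرض". A plural from the ">aMrad_n1AR" synset likewise returns "علل", "علات", "اعتلالات", "داءات", "وعكات", "أسقام", "شكاوى", and "أمراض".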

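If you want to get the singular form of a plural, you can use any available morphological analyzer, such as "ALKhalil", an open-source Java project. Note that this only gives you the singular for a plural, not the other way around.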
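To see why a morphological analyzer is needed, consider the crude heuristic below. This is a swapped-in technique, not what the answer recommends. It handles only the regular feminine plural ending "ات" → "ة" (e.g. "علامات" → "علامة"). It leaves broken plurals such as "أعراض" untouched and gets words like "فحوصات" wrong (giving "فحوصة" rather than "فحص"). `ArabicPluralHeuristics` and `GuessSingular` are hypothetical names.

```csharp
using System;

// Hypothetical helper: naive singularization of sound feminine plurals only.
public static class ArabicPluralHeuristics
{
    // Replaces a trailing "ات" (U+0627 U+062A) with "ة" (U+0629).
    // Broken plurals and non-plural words ending in "ات" are not handled
    // correctly; use a real analyzer (e.g. ALKhalil) for general input.
    public static string GuessSingular(string word)
    {
        const string SoundFemininePlural = "\u0627\u062A"; // "ات"
        if (word.Length > 3 && word.EndsWith(SoundFemininePlural, StringComparison.Ordinal))
            return word.Substring(0, word.Length - 2) + "\u0629"; // append "ة"
        return word; // e.g. broken plurals such as "أعراض" come back unchanged
    }
}
```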