Arabic WordNet with plural words
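I am using Arabic WordNet in C# to get the synonyms of a singular word, e.g. "عرض",
and I get the following synonyms (علامة, أمار, شدو, ضر, شؤم, بليو, etc.).
My question is: is there a way to get the synonyms of a plural word, such as "علامات", from Arabic WordNet?
I need this because I have not found a way to get the singular from a plural word in Arabic, e.g. "علامات" => "علامة".
Thanks for any help.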
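I solved this by editing the awn.xml file and adding all the plural words I needed. For example, the word "عرض" has the plural form "أعراض", which has the following synonyms: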
<wordnet version="20">
<item itemid=">aArad_n1AR" offset="102231120" lexfile="" name="أعراض" type="synset" headword="" POS="n" source="" gloss="" authorshipid="1" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
<item itemid=">aMrad_n1AR" offset="102231121" lexfile="" name="أمراض" type="synset" headword="" POS="n" source="" gloss="" authorshipid="2" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
<item itemid=">Isteqsa'at" offset="102231121" lexfile="" name="استقصاءات" type="synset" headword="" POS="n" source="" gloss="" authorshipid="3" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
Then add the synonyms as follows:
<authorship author="ali" date="20180215" score="" comment="From suggested word" covering="0" authorshipid="12136" />
<word wordid="<aArad_n1AR" value="أعراض" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<aArad_n1AR" type="brokenPlural" authorshipid="12137" />
<word wordid="<$araat" value="إشارات" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<$araat" type="brokenPlural" authorshipid="12137" />
<word wordid="<Alamat" value="علامات" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<Alamat" type="brokenPlural" authorshipid="12137" />
<word wordid="<$adaed" value="شدائد" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<$adaed" type="brokenPlural" authorshipid="12137" />
<word wordid="<adrar" value="أضرار" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<adrar" type="brokenPlural" authorshipid="12137" />
<word wordid="<balaya" value="بلايا" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<balaya" type="brokenPlural" authorshipid="12137" />
<word wordid="<tawar'a" value="طوارئ" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<tawar'a" type="brokenPlural" authorshipid="12137" />
<word wordid="<fawajea" value="فواجع" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<fawajea" type="brokenPlural" authorshipid="12137" />
<word wordid="<fawadeh" value="فوادح" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<fawadeh" type="brokenPlural" authorshipid="12137" />
<word wordid="<kawareth" value="كوارث" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<kawareth" type="brokenPlural" authorshipid="12137" />
<word wordid="<mehan" value="محن" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<mehan" type="brokenPlural" authorshipid="12137" />
<word wordid="<makrohat" value="مكروهات" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<makrohat" type="brokenPlural" authorshipid="12137" />
<word wordid="<masaeb" value="مصائب" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<masaeb" type="brokenPlural" authorshipid="12137" />
<word wordid="<masawea" value="مساوئ" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<masawea" type="brokenPlural" authorshipid="12137" />
<word wordid="<Elal" value="علل" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<Elal" type="brokenPlural" authorshipid="12137" />
<word wordid="<Ellat" value="علات" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<Ellat" type="brokenPlural" authorshipid="12137" />
<word wordid="<Eatilalat" value="اعتلالات" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<Eatilalat" type="brokenPlural" authorshipid="12137" />
<word wordid="<Da'aat" value="داءات" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<Da'aat" type="brokenPlural" authorshipid="12137" />
<word wordid="<waakat" value="وعكات" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<waakat" type="brokenPlural" authorshipid="12137" />
<word wordid="<askaam" value="أسقام" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<askaam" type="brokenPlural" authorshipid="12137" />
<word wordid="<$akawa" value="شكاوى" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<$akawa" type="brokenPlural" authorshipid="12137" />
<word wordid="<aMrad_n1AR" value="أمراض" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<aMrad_n1AR" type="brokenPlural" authorshipid="12137" />
<word wordid="<Fohosat" value="فحوصات" synsetid=">Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="<Fohosat" type="brokenPlural" authorshipid="12137" />
<word wordid="<Taharieat" value="تحريات" synsetid=">Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="<Taharieat" type="brokenPlural" authorshipid="12137" />
<word wordid="<Isteqsa'at" value="استقصاءات" synsetid=">Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="<Isteqsa'at" type="brokenPlural" authorshipid="12137" />
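Typing dozens of near-identical `<word>`/`<form>` pairs by hand is error-prone, so the same entries could be generated instead. The following is only a minimal sketch using `System.Xml.Linq`; the class and method names (`PluralEntryBuilder`, `BuildPair`, `AddPlurals`) are hypothetical, and the attribute set simply mirrors the listing above rather than any official AWN schema.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper, not part of any AWN library.
public static class PluralEntryBuilder
{
    // Builds one <word>/<form> pair with the same attributes as the hand-written entries.
    public static IEnumerable<XElement> BuildPair(
        string wordId, string value, string synsetId, string formValue, string authorshipId)
    {
        yield return new XElement("word",
            new XAttribute("wordid", wordId),
            new XAttribute("value", value),
            new XAttribute("synsetid", synsetId),
            new XAttribute("type", "brokenPlural"),
            new XAttribute("frequency", ""),
            new XAttribute("corpus", ""),
            new XAttribute("authorshipid", authorshipId));

        yield return new XElement("form",
            new XAttribute("value", formValue),
            new XAttribute("wordid", wordId),
            new XAttribute("type", "brokenPlural"),
            new XAttribute("authorshipid", authorshipId));
    }

    // Appends one pair per (wordId, value) entry to the given parent element.
    public static void AddPlurals(
        XElement parent, string synsetId, string formValue, string authorshipId,
        IEnumerable<KeyValuePair<string, string>> plurals)
    {
        foreach (var p in plurals)
            parent.Add(BuildPair(p.Key, p.Value, synsetId, formValue, authorshipId));
    }
}
```

Because the ids contain `<`, building the attributes through `XAttribute` lets the serializer escape them when the document is saved, which keeps the file parseable.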
Now, when we run the following code snippet:
// Find every word id whose value is the plural form we are looking up.
List<string> wordId = _awn.Get_List_Word_Id_From_Value("علامات");
List<string> synonyms = new List<string>();
if (wordId != null)
{
    foreach (string ss in wordId)
    {
        // Resolve the synset of this word, then list every word id in that synset.
        string temp = _awn.Get_Synset_ID_From_Word_Id(ss);
        List<string> test = _awn.Get_List_Word_Id_From_Synset_ID(temp);
        if (test != null && test.Count != 0)
        {
            foreach (string str in test)
            {
                // Collect each distinct word value as a synonym.
                string s = _awn.Get_Word_Value_From_Word_Id(str);
                if (!synonyms.Contains(s))
                    synonyms.Add(s);
            }
        }
    }
}
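To make the lookup itself easier to follow, here is a rough sketch of the same word, synset, words chain done directly over the XML with LINQ to XML. It does not use the `_awn` object from the snippet; `SynsetLookup` and `GetSynonyms` are hypothetical names, and the sketch assumes the `<word>` elements carry `value` and `synsetid` attributes as in the entries shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any AWN library.
public static class SynsetLookup
{
    // Returns the distinct values of every <word> that shares a synset with `value`.
    // Like the snippet above, the queried word itself is part of the result.
    public static List<string> GetSynonyms(XElement root, string value)
    {
        // Synset ids of all <word> entries whose value matches the query.
        var synsetIds = new HashSet<string>(
            root.Descendants("word")
                .Where(w => (string)w.Attribute("value") == value)
                .Select(w => (string)w.Attribute("synsetid"))
                .Where(id => id != null));

        // Values of all words that belong to one of those synsets.
        return root.Descendants("word")
            .Where(w =>
            {
                var id = (string)w.Attribute("synsetid");
                return id != null && synsetIds.Contains(id);
            })
            .Select(w => (string)w.Attribute("value"))
            .Where(v => v != null)
            .Distinct()
            .ToList();
    }
}
```

With a document loaded through `XDocument.Load`, a call would look like `SynsetLookup.GetSynonyms(doc.Root, "علامات")`. A word that appears in no `<word>` element yields an empty list.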
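we get seven words in the synonyms list: six that translate as "diseases" and one that translates as "complaints". These are the plural forms of the synonyms of the word "عرض".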
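If you want to get the singular from a plural, you can use any available morphological analyzer, such as "ALKhalil", which is an open-source Java project. Note that this only maps a plural to its singular, not the other way round.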
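The two approaches can also be combined: try the plural directly, and fall back to the singular only when the WordNet has no entry for it. The sketch below shows one possible arrangement. `PluralFallback` and both delegates are hypothetical; in particular `toSingular` is only a stand-in for whatever call a morphological analyzer would expose, and no real analyzer API is used here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any AWN or analyzer library.
public static class PluralFallback
{
    // lookup:     returns the synonyms of a word (null or empty when the word is unknown).
    // toSingular: stand-in for a morphological analyzer; returns null when no singular is known.
    public static List<string> SynonymsOrSingularSynonyms(
        string word,
        Func<string, List<string>> lookup,
        Func<string, string> toSingular)
    {
        // First try the word as given, e.g. a plural that was added to awn.xml.
        var direct = lookup(word);
        if (direct != null && direct.Count > 0)
            return direct;

        // Otherwise reduce it to a singular and look that up instead.
        var singular = toSingular(word);
        if (singular == null || singular == word)
            return new List<string>();

        return lookup(singular) ?? new List<string>();
    }
}
```

In the fallback case the result holds synonyms of the singular, so the words come back in singular form rather than as plurals.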