Develop a Python/PySpark program to display similar words

It should print the similar words in a single column.

from fuzzywuzzy import fuzz
from fuzzywuzzy import process
query = "Apple"
#set of DATA 25 records
choices = ["apil",
    "apple",
    "Apille",
    "aple",
    "apil",
    "appple",
    "Apple APPLE",
    "Apil Orange",
    "apples"
]
process.extract(query, choices)
#### Printing Accuracy Value
print ("List of ratios: ")
print (process.extract(query, choices), "\n")
#process.extractone(query, choices)
print ("\nBest among the above list ----->",process.extractOne(query, choices))

Output:

List of ratios:

[('apple', 100), ('appple', 91), ('apples', 91), ('Apple APPLE', 90), ('aple', 89)]

Best among the above list -----> ('apple', 100)

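I only had to change one line of your snippet and add one more. Comments mark each change and explain what it does. I'm not sure about the exact output format you want, so if this isn't it, feel free to ask again.

If you want to understand how the last line works in more depth, look up list comprehensions.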

from fuzzywuzzy import fuzz
from fuzzywuzzy import process
query = "Apple"
#set of DATA 25 records
choices = ["apil",
    "apple",
    "Apille",
    "aple",
    "apil",
    "appple",
    "Apple APPLE",
    "Apil Orange",
    "apples"
]
# 1st change here
# The next line stores a list of tuples, each pairing a choice with its similarity score.
# Judging by the output in your snippet, the entries are ordered by score, highest first.
ordered_choices = process.extract(query, choices)
#### Printing Accuracy Value
print ("List of ratios: ")
print (process.extract(query, choices), "\n")
#process.extractone(query, choices)
print ("\nBest among the above list ----->",process.extractOne(query, choices))

# 2nd change here
# The following line takes the first element of each tuple in the list and adds it to another list, which is then printed.
print("\nOrdered choices: ", [choice for choice, value in ordered_choices])
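If you want the words in a single column rather than as a printed list, here is a minimal sketch of one way to do it. It assumes `ordered_choices` holds `(choice, score)` tuples like the ones in your output; the values are hard-coded here so the snippet runs without fuzzywuzzy, and the names `words` and `column` are only illustrative.

```python
# Sample (choice, score) pairs copied from the output shown in the question.
# In the real script this list would come from process.extract(query, choices).
ordered_choices = [
    ("apple", 100),
    ("appple", 91),
    ("apples", 91),
    ("Apple APPLE", 90),
    ("aple", 89),
]

# The list comprehension unpacks each tuple and keeps only the word.
words = [choice for choice, score in ordered_choices]

# Joining with newlines puts one word per line, i.e. a single column.
column = "\n".join(words)
print(column)
```

Note that your output shows only five of the nine choices. As far as I know, `process.extract` has a `limit` parameter that defaults to 5, so you may need to raise it if you want more matches.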