Populate Rows in CSV with jellyfish.metaphone() value of row
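I'm a complete Python newbie.
I'm trying to determine the metaphone codes for a list of names. These codes will later be compared to find potentially similar-sounding names.
The jellyfish module fits my needs well, and I can get the metaphone codes when I build the list by hand, like this: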
import jellyfish
names = ['alexander','algoma','angel','antler']
for i in names:
    print(i, "metaphone value =", jellyfish.metaphone(i))
##OUTPUT:
alexander metaphone value = ALKSNTR
algoma metaphone value = ALKM
angel metaphone value = ANJL
antler metaphone value = ANTLR
However, I need to get metaphone codes for a list of about 3,000 names. I created a .csv that contains the column headers I need and the existing list of names. It looks like this:
RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta
742,A F JOHNSON,A,,
1240,ABBEY,ABBEY,,
2133,ACES,ACES,,
362,ADAMS,ADAMS,,
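So ideally, FirstWordMeta should be the metaphone code of the word in each row's FirstWord column, and StMeta should be the metaphone code of the words in each row's ST_NAME column. I would like the output .csv to look like this: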
RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta
742,A F JOHNSON,A,A,A F JNSN
1240,ABBEY,ABBEY,SS,AB
2133,ACES,ACES,SS,SS
362,ADAMS,ADAMS,ATMS,ATMS
I tried the csv module, but I don't understand how to refer to a specific column when calling jellyfish.metaphone().
You can use the pandas module:
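import pandas as pd
import jellyfish
# Read every cell as text; keep_default_na=False leaves empty cells as ''
# instead of NaN, which jellyfish.metaphone() cannot handle.
data = pd.read_csv("test.csv", dtype=str, keep_default_na=False)  # Your filename here
# Compute the metaphone code for every row of each source column.
# Assigning whole columns avoids chained indexing such as data[col][i] = ...
data["FirstWordMeta"] = data["FirstWord"].apply(jellyfish.metaphone)
data["StMeta"] = data["ST_NAME"].apply(jellyfish.metaphone)
# Save to csv; index=False keeps the row index out of the file
data.to_csv("result.csv", index=False)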
You could try this:
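import csv
import jellyfish
# newline='' lets the csv module handle line endings itself
with open('input.csv', newline='') as inputfile:
    reader = csv.reader(inputfile)
    headers = next(reader)  # keep the header row unchanged
    inputdata = list(reader)
with open('output.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(headers)
    for row in inputdata:
        # Keep RID, ST_NAME and FirstWord, then append the two codes
        outputrow = row[:3] + [
            jellyfish.metaphone(row[2]),  # FirstWordMeta, from FirstWord
            jellyfish.metaphone(row[1]),  # StMeta, from ST_NAME
        ]
        writer.writerow(outputrow)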
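If you would rather refer to columns by their header names than by position, which is what the question was really asking about, the standard library's csv.DictReader and csv.DictWriter can do that. Below is a minimal sketch, not a definitive implementation. The function name fill_meta_columns and its encode parameter are made up for illustration, and the demo uses str.lower as a stand-in encoder so that it runs without jellyfish installed.

```python
import csv
import io


def fill_meta_columns(infile, outfile, encode):
    """Copy CSV rows from infile to outfile, filling the two code columns.

    encode is any function that maps a string to its phonetic code
    (hypothetical parameter; pass jellyfish.metaphone in real use).
    """
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Columns are addressed by header name rather than by index.
        row["FirstWordMeta"] = encode(row["FirstWord"])
        row["StMeta"] = encode(row["ST_NAME"])
        writer.writerow(row)


# In-memory demo with a stand-in encoder (str.lower), not real metaphone.
sample = io.StringIO(
    "RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta\n"
    "362,ADAMS,ADAMS,,\n"
)
result = io.StringIO()
fill_meta_columns(sample, result, str.lower)
print(result.getvalue())
```

With real files, you would open both with newline='' as in the answer above and call fill_meta_columns(inputfile, outputfile, jellyfish.metaphone). Because the rows are looked up by name, the code should keep working if the columns are later reordered.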