Populate rows in a CSV with the jellyfish.metaphone() value of each row

I'm a complete Python newbie.

I'm trying to determine the metaphone codes for a list of names. These codes will later be compared to find potentially similar-sounding names.

The jellyfish module fits my needs well, and I can get the metaphone codes when I build the list by hand, like this:

import jellyfish
names = ['alexander','algoma','angel','antler']
for i in names:
        print(i, "metaphone value =", jellyfish.metaphone(i))

##OUTPUT: 
alexander metaphone value = ALKSNTR
algoma metaphone value = ALKM
angel metaphone value = ANJL
antler metaphone value = ANTLR

However, I need the metaphone codes for a list of about 3,000 names. I created a .csv containing the column headers I need and the existing list of names. It looks like this:

RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta
742,A F JOHNSON,A,,
1240,ABBEY,ABBEY,,
2133,ACES,ACES,,
362,ADAMS,ADAMS,,

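So ideally, I need FirstWordMeta = the metaphone code of the word in each row's FirstWord column, and StMeta = the metaphone code of the words in each row's ST_NAME column. I'd like the output .csv to look like this: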

RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta
742,A F JOHNSON,A,A,A F JNSN
1240,ABBEY,ABBEY,SS,AB
2133,ACES,ACES,SS,SS
362,ADAMS,ADAMS,ATMS,ATMS

I tried the csv module, but I don't understand how to reference specific columns when calling jellyfish.metaphone().

You can use the pandas module:

import pandas as pd
import jellyfish

data = pd.read_csv("test.csv")  # Your filename here

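# Compute the metaphone code for every row, one whole column at a time.
# (Assigning the full column avoids chained indexing such as
# data["FirstWordMeta"][i] = ..., which pandas warns about and which
# may not modify the DataFrame at all.)
data["FirstWordMeta"] = data["FirstWord"].apply(jellyfish.metaphone)
data["StMeta"] = data["ST_NAME"].apply(jellyfish.metaphone)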

# Save to csv; index=False keeps pandas from adding an extra index column
data.to_csv("result.csv", index=False)

You can try this:

import csv
import jellyfish

with open('input.csv', newline='') as inputfile:
    reader = csv.reader(inputfile)
    headers = next(reader)
    inputdata = list(reader)

with open('output.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(headers)

    for row in inputdata:
        outputrow = row[:3] + [
            jellyfish.metaphone(row[2]),
            jellyfish.metaphone(row[1])
        ]
        writer.writerow(outputrow)
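
If you'd rather refer to the columns by header name than by position (which is what the question asks about), here is a minimal sketch using csv.DictReader and csv.DictWriter from the standard library. The helper name add_codes and its parameters are made up for illustration and are not part of csv or jellyfish. The encoder is passed in as an argument, on the assumption that you will plug in jellyfish.metaphone or any other function that maps a string to a string.

```python
import csv


def add_codes(in_path, out_path, encode, mapping):
    """Fill target columns with encode(source column), looked up by header name.

    mapping is {target_column: source_column}; both names must already
    exist in the CSV header. (Hypothetical helper, for illustration only.)
    """
    # newline='' is what the csv module docs recommend for files it reads or writes
    with open(in_path, newline='') as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames  # header row, in original order
        rows = list(reader)

    with open(out_path, 'w', newline='') as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            for target, source in mapping.items():
                row[target] = encode(row[source])
            writer.writerow(row)


# Usage with the columns from the question (requires jellyfish to be installed):
# import jellyfish
# add_codes('input.csv', 'output.csv', jellyfish.metaphone,
#           {'FirstWordMeta': 'FirstWord', 'StMeta': 'ST_NAME'})
```

Because each row is a dict keyed by the header, the code keeps working if the columns are later reordered, as long as the header names stay the same.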