Difflib error when applying it to two columns in a pandas DataFrame
My DataFrame looks like this:
Cities Cities_Dict
"San Francisco" ["San Francisco", "New York", "Boston"]
"Los Angeles" ["Los Angeles"]
"berlin" ["Munich", "Berlin"]
"Dubai" ["Dubai"]
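I want to create a new column that compares the city in the first column against the list of cities in the second column and finds the closest match.
I am using difflib for this: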
df["new_col"]=difflib.get_close_matches(df["Cities"],df["Cities_Dict"])
But I get this error:
TypeError: object of type 'float' has no len()
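difflib.get_close_matches expects a single string and a sequence of strings, not whole columns. Use DataFrame.apply with a lambda function and axis=1 to process the rows one at a time: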
import difflib, ast
#if necessary convert values to lists
#df['Cities_Dict'] = df['Cities_Dict'].apply(ast.literal_eval)
f = lambda x: difflib.get_close_matches(x["Cities"],x["Cities_Dict"])
df["new_col"] = df.apply(f, axis=1)
print (df)
Cities Cities_Dict new_col
0 San Francisco [San Francisco, New York, Boston] [San Francisco]
1 Los Angeles [Los Angeles] [Los Angeles]
2 berlin [Munich, Berlin] [Berlin]
3 Dubai [Dubai] [Dubai]
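For reference, here is a minimal sketch of what a single row-wise call does, using plain Python values copied from the sample row for "berlin". The `n` and `cutoff` arguments are written out with their documented defaults (3 and 0.6); the second call uses a stricter cutoff chosen purely for illustration.

```python
import difflib

# One row's worth of data from the sample frame.
word = "berlin"
candidates = ["Munich", "Berlin"]

# n caps the number of results; cutoff is the minimum similarity ratio (0 to 1).
matches = difflib.get_close_matches(word, candidates, n=3, cutoff=0.6)
print(matches)  # ['Berlin']

# 'berlin' vs 'Berlin' scores about 0.83, so a cutoff of 0.9 rejects it.
strict = difflib.get_close_matches(word, candidates, n=3, cutoff=0.9)
print(strict)  # []
```

The result is always a list, which is why new_col holds lists in the output above.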
Edit:
To return the first match as a string, or an empty string when no match is found:
f = lambda x: next(iter(difflib.get_close_matches(x["Cities"],x["Cities_Dict"])), '')
df["new_col"] = df.apply(f, axis=1)
print (df)
Cities Cities_Dict new_col
0 San Francisco [San Francisco, New York, Boston] San Francisco
1 Los Angeles [Los Angeles] Los Angeles
2 berlin [Munich, Berlin] Berlin
3 Dubai [Dubai] Dubai
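The sample data always produces a match, so the fallback never shows up in the output above. A small sketch of both branches follows; the word "Tokyo" is a made-up input, not part of the question's data.

```python
import difflib

# No candidate is similar enough, so get_close_matches returns an empty list.
no_match = difflib.get_close_matches("Tokyo", ["Munich", "Berlin"])
first_or_empty = next(iter(no_match), "")
print(repr(first_or_empty))  # ''

# With at least one match, next() returns the best (first) one.
hit = difflib.get_close_matches("berlin", ["Munich", "Berlin"])
first_hit = next(iter(hit), "")
print(first_hit)  # Berlin
```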
EDIT1: If the data may contain problematic values, use try-except:
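def f(x):
    try:
        # [0] takes the best match; raises IndexError if nothing matched
        return difflib.get_close_matches(x["Cities"], x["Cities_Dict"])[0]
    except Exception:
        # covers NaN or non-string values as well as the no-match case
        return ''
df["new_col"] = df.apply(f, axis=1)
print (df)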
Cities Cities_Dict new_col
0 NaN [San Francisco, New York, Boston]
1 Los Angeles [10]
2 berlin [Munich, Berlin] Berlin
3 Dubai [Dubai] Dubai
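If a broad try-except feels too permissive, one alternative is to validate the inputs explicitly. This is only a sketch under assumptions: the helper name `best_match` and its `default` parameter are hypothetical, and it assumes problematic values are either non-strings in Cities or non-list, non-string entries in Cities_Dict, as in the rows above.

```python
import difflib

def best_match(word, candidates, default=""):
    """Return the closest string in candidates to word, or default."""
    # NaN is a float, so a missing city fails the isinstance check.
    if not isinstance(word, str) or not isinstance(candidates, (list, tuple)):
        return default
    # Drop non-string candidates such as the number 10.
    valid = [c for c in candidates if isinstance(c, str)]
    matches = difflib.get_close_matches(word, valid, n=1)
    return matches[0] if matches else default

print(best_match(float("nan"), ["San Francisco", "New York", "Boston"]))  # prints an empty line
print(best_match("berlin", ["Munich", "Berlin"]))  # Berlin

# Hypothetical usage with the DataFrame from the question:
# df["new_col"] = [best_match(w, c) for w, c in zip(df["Cities"], df["Cities_Dict"])]
```

Unlike the try-except version, this leaves unexpected errors visible instead of silently turning them into empty strings.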