Python pandas: Efficiently compare rows of a dataframe?
I have a DataFrame 'dfm':
match group
adamant 86
adamant 86
adamant bild 86
360works 94
360works 94
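Where the 'group' column is the same, I want to compare the contents of the 'match' column pairwise and add the comparison result to another column, 'result'. For example, the expected result would be: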
group compare result
86 adamant, adamant same
86 adamant, adamant bild not same
86 adamant, adamant bild not same
94 360works,360works same
Can anyone help?
It's a bit hacky, but it seems to work for me:
import pandas as pd

# initialize the list to store the dictionaries
# that will create the new DataFrame
new_df_dicts = []
# group on 'group'; .groups maps each group key to its row labels
for group, indices in dfm.groupby('group').groups.items():
    # get the values in the 'match' column
    vals = dfm.loc[indices, 'match'].values
    # choose every possible pair from the array of column values
    for i in range(len(vals)):
        for j in range(i + 1, len(vals)):
            # compute the new values
            compare = vals[i] + ', ' + vals[j]
            if vals[i] == vals[j]:
                result = 'same'
            else:
                result = 'not same'
            # collect the row for the new DataFrame
            new_df_dicts.append({'group': group, 'compare': compare, 'result': result})
# create the new DataFrame
new_df = pd.DataFrame(new_df_dicts)
Here's my output:
compare group result
0 360works, 360works 94 same
1 adamant, adamant 86 same
2 adamant, adamant bild 86 not same
3 adamant, adamant bild 86 not same
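Previously I suggested appending rows to an already initialized DataFrame. Creating one DataFrame from a list of dictionaries, rather than appending to a DataFrame many times, runs about 9-10 times faster.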
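To make the difference between the two construction patterns concrete, here is a minimal sketch. The helper names (`make_row`, `build_from_records`, `build_by_growing`, `timed`) and the placeholder row contents are assumptions for illustration only, and `pd.concat` stands in for the row-by-row append because `DataFrame.append` is no longer available in pandas 2.0 and later. It does not reproduce the 9-10x figure above; the actual ratio will depend on the pandas version and the number of rows.

```python
import time

import pandas as pd


def make_row(i):
    # Hypothetical placeholder row with the same keys as the dicts in the answer.
    return {"group": i % 10, "compare": "a, b", "result": "same"}


def build_from_records(n):
    # Collect plain dicts first, then construct the DataFrame once.
    return pd.DataFrame([make_row(i) for i in range(n)])


def build_by_growing(n):
    # Grow the DataFrame one row at a time; every concat copies the rows so far.
    out = None
    for i in range(n):
        row = pd.DataFrame([make_row(i)])
        out = row if out is None else pd.concat([out, row], ignore_index=True)
    return out


def timed(fn, n):
    # Wall-clock time of a single call to fn(n), in seconds.
    start = time.perf_counter()
    fn(n)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1000
    print("from records:", timed(build_from_records, n))
    print("by growing:  ", timed(build_by_growing, n))
```

Both functions return the same rows; only the way the frame is assembled differs, which is why the list-of-dicts version avoids the repeated copying.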
Here's another option. I'm not sure whether it's more efficient, though:
import itertools
import pandas as pd

rows = []
for grp in set(dfm['group']):
    # every unordered pair of row labels within this group
    for combo in itertools.combinations(dfm[dfm['group'] == grp].index, 2):
        # compute the new values
        match1 = dfm.loc[combo[0], 'match']
        match2 = dfm.loc[combo[1], 'match']
        compare = match1 + ', ' + match2
        if match1 == match2:
            result = 'same'
        else:
            result = 'not same'
        # collect the row (DataFrame.append was removed in pandas 2.0)
        rows.append({'group': grp, 'compare': compare, 'result': result})
new_df = pd.DataFrame(rows)
print(new_df)
(Formatting borrowed from James's answer)
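Both answers above loop over the pairs in Python. As a possible alternative, not taken from either answer, the pairs can also be produced with a self-merge on 'group'. This is a sketch under a few assumptions: the sample `dfm` is rebuilt here from the table in the question, and the helper column `row` and the names `left`, `pairs`, and `pair_df` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
dfm = pd.DataFrame({
    "match": ["adamant", "adamant", "adamant bild", "360works", "360works"],
    "group": [86, 86, 86, 94, 94],
})

# Keep the original row label in a column so each unordered pair appears once.
left = dfm.reset_index().rename(columns={"index": "row"})

# Self-merge on 'group' gives every ordered pair of rows within a group.
pairs = left.merge(left, on="group", suffixes=("_a", "_b"))

# Keep only pairs where the first row comes before the second.
pairs = pairs[pairs["row_a"] < pairs["row_b"]]

same = (pairs["match_a"] == pairs["match_b"]).to_numpy()
pair_df = pd.DataFrame({
    "group": pairs["group"].to_numpy(),
    "compare": (pairs["match_a"] + ", " + pairs["match_b"]).to_numpy(),
    "result": np.where(same, "same", "not same"),
})
print(pair_df)
```

The trade-off is memory: the merge materializes every ordered pair within a group before filtering, so for very large groups the loop-based versions may be the safer choice.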