Comparing two columns in a pandas DataFrame and returning the difference
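I am comparing the contents of two Excel worksheets, which I have converted to pandas DataFrames with the columns placed side by side.
I have written some code that compares two columns and produces the output below, but some of the strings contain a lot of text, so I would like to display only the differences.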
+---------------------+-------------------------+---------------------------------------------+
| Old | New | Changes |
+---------------------+-------------------------+---------------------------------------------+
| Apple, Egg, Ham | Apple, Egg, Norway, Ham | Apple, Egg, Ham --> Apple, Egg, Norway, Ham |
| Instagram, Facebook | Instagram, Twitter | Instagram, Facebook --> Instagram, Twitter |
+---------------------+-------------------------+---------------------------------------------+
The ideal result would look like this:
+---------------------+-------------------------+---------------------+
| Old | New | Changes |
+---------------------+-------------------------+---------------------+
| Apple, Egg, Ham | Apple, Egg, Norway, Ham | +Norway |
| Instagram, Facebook | Instagram, Twitter | +Twitter, -Facebook |
+---------------------+-------------------------+---------------------+
Norway was added in row 1, Twitter was added in row 2, and Facebook was removed in row 2.
How can I solve this?
Convert the values to sets, then use their differences. Add + and - to the items in f-strings, and in the last step join everything with ', ':
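def f(x):
    # Split each cell into a set of items.
    old, new = set(x['Old'].split(', ')), set(x['New'].split(', '))
    d = old.difference(new)  # items removed (only in Old)
    e = new.difference(old)  # items added (only in New)
    # Prefix additions with + and removals with -, then join with ', '.
    return ', '.join([f'+{y}' for y in e] + [f'-{y}' for y in d])

df['Changes'] = df.apply(f, axis=1)
print(df)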
                   Old                      New              Changes
0      Apple, Egg, Ham  Apple, Egg, Norway, Ham              +Norway
1  Instagram, Facebook       Instagram, Twitter  +Twitter, -Facebook
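Two things the set-based approach does not guarantee are the order of the reported items (sets are unordered, so several additions in one row may come out in any order) and behaviour on empty cells, which typically arrive from Excel as NaN. A minimal sketch that keeps the original item order and treats missing cells as empty could look like the following. The helper name diff_items, its sep parameter, and the extra third row are my own assumptions for illustration, not part of the question.

```python
import pandas as pd


def diff_items(old, new, sep=","):
    """Return '+added, -removed' for two delimited strings, keeping item order."""

    def parse(value):
        # Treat missing cells (NaN/None) as empty lists.
        if pd.isna(value):
            return []
        # Strip whitespace around each item and drop empty fragments.
        items = [item.strip() for item in str(value).split(sep) if item.strip()]
        # dict.fromkeys removes duplicates while preserving first-seen order.
        return list(dict.fromkeys(items))

    old_items, new_items = parse(old), parse(new)
    old_set, new_set = set(old_items), set(new_items)
    added = [f"+{item}" for item in new_items if item not in old_set]
    removed = [f"-{item}" for item in old_items if item not in new_set]
    return ", ".join(added + removed)


# Sample data from the question, plus one hypothetical row with an empty Old cell.
df = pd.DataFrame({
    "Old": ["Apple, Egg, Ham", "Instagram, Facebook", None],
    "New": ["Apple, Egg, Norway, Ham", "Instagram, Twitter", "Pear"],
})
df["Changes"] = [diff_items(o, n) for o, n in zip(df["Old"], df["New"])]
print(df)
```

Iterating over the two columns with zip avoids building a Series for every row, which is what apply with axis=1 does, so it may also be somewhat faster on large sheets.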
Here is the approach I would take:
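def find_diff(x):
    # Split on ", " so items do not keep the space that follows each comma.
    old, new = set(x.Old.split(", ")), set(x.New.split(", "))
    more = new - old  # items added
    less = old - new  # items removed
    # Build one list so no stray separator appears when one side is empty.
    return ", ".join([f"+{item}" for item in more] + [f"-{item}" for item in less])

df.apply(find_diff, axis=1)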
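One detail worth checking with comma-separated cells is whitespace: splitting on a bare comma keeps the space that follows each comma, so the reported items carry a leading space, and cells that are spaced inconsistently can produce false differences. The short sketch below illustrates this with the strings from the question; the inconsistently spaced pair at the end is a made-up example.

```python
old = "Apple, Egg, Ham"
new = "Apple, Egg, Norway, Ham"

# Splitting on "," alone leaves the leading space on every item but the first.
raw = set(new.split(",")) - set(old.split(","))

# Stripping each item gives clean names regardless of spacing.
clean = {s.strip() for s in new.split(",")} - {s.strip() for s in old.split(",")}

print(raw)    # {' Norway'}
print(clean)  # {'Norway'}

# Hypothetical cells with inconsistent spacing: "Ham" and " Ham" are
# different strings, so an unchanged item is reported as a difference.
false_positive = set("Ham,Egg".split(",")) - set("Egg, Ham".split(","))
print(false_positive)  # {'Ham'}
```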