Differences in Data Frames with Partial Matches in Python
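I'll keep this brief. I'm looking for a summary of the differences between two data frames, where a partial match counts as no difference.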
import pandas as pd
import numpy as np
# First frame: plain sport names
abc = {'Sport' : ['Football', 'Basketball', 'Baseball', 'Hockey'], 'Year' : ['2021','2021','2022','2022'], 'ID' : ['1','2','3','4']}
abc = pd.DataFrame({k: pd.Series(v) for k, v in abc.items()})
abc
# Second frame: sport and league joined by ':' (the part before ':' should match 'Sport')
xyz = {'SportLeague' : ['Football:NFL', 'Basketball:NBA', 'Baseball:MLB', 'Hockey:NHL', 'Soccer:MLS'], 'Year' : ['2022','2022','2022','2022', '2022'], 'ID' : ['2','3','2','4', '1']}
xyz = pd.DataFrame({k: pd.Series(v) for k, v in xyz.items()})
xyz = xyz.sort_values(by = ['ID'], ascending = True)
xyz
Here's an idea using pd.DataFrame.compare, but you first need to reindex the data frames so that their rows and columns are labelled alike:
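# Split 'SportLeague' on ':' into separate 'Sport' and 'League' columns
xyz = xyz.assign(**xyz['SportLeague'].str.split(':', expand=True).set_axis(['Sport','League'], axis=1))
# Keep only the columns that abc has, in abc's column order
xyz_c = xyz.reindex(abc.columns, axis=1)
# Align abc to xyz_c's index (rows missing from abc become NaN), then compare cell by cell
xyz_c.compare(abc.reindex_like(xyz_c), keep_shape=True, keep_equal=True)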
Output:
        Sport              Year          ID
         self       other  self other  self other
4      Soccer         NaN  2022   NaN     1   NaN
0    Football    Football  2022  2021     2     1
2    Baseball    Baseball  2022  2022     2     3
1  Basketball  Basketball  2022  2021     3     2
3      Hockey      Hockey  2022  2022     4     4
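If a condensed summary is wanted rather than the full side-by-side table, one possible follow-up is to turn the aligned frames into a boolean "differs" mask and count from there. The sketch below is only an illustration under a few assumptions: rows are matched by index label (as in the compare call above), the part of SportLeague before the colon is what counts as a match for Sport, and the sort step is skipped because it does not affect label-based alignment. The names abc_c, equal, differs, diff_counts and rows_with_diff are made up for this example.

```python
import pandas as pd

abc = pd.DataFrame({
    'Sport': ['Football', 'Basketball', 'Baseball', 'Hockey'],
    'Year': ['2021', '2021', '2022', '2022'],
    'ID': ['1', '2', '3', '4'],
})
xyz = pd.DataFrame({
    'SportLeague': ['Football:NFL', 'Basketball:NBA', 'Baseball:MLB',
                    'Hockey:NHL', 'Soccer:MLS'],
    'Year': ['2022'] * 5,
    'ID': ['2', '3', '2', '4', '1'],
})

# Partial match: keep only the text before ':' and treat it as 'Sport'.
xyz_c = xyz.assign(Sport=xyz['SportLeague'].str.split(':').str[0])[abc.columns]

# Align abc to xyz_c's index and columns; rows missing from abc become NaN.
abc_c = abc.reindex_like(xyz_c)

# A cell is equal if the values match or if both sides are missing.
equal = xyz_c.eq(abc_c) | (xyz_c.isna() & abc_c.isna())
differs = ~equal

# Summary: number of differing cells per column, and which rows differ at all.
diff_counts = differs.sum()
rows_with_diff = differs.any(axis=1)

print(diff_counts)
print(xyz_c[rows_with_diff])
```

With this sample data, the Sport column only differs on the Soccer row, which has no counterpart in abc, so the partial matches themselves are not reported as differences.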