Data Merging with Pandas
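I ran some pcmark tests on different machines, and I now want to consolidate the results from all of them. I have simplified the final results for display. I tried several forms of merge with pandas; none gave the expected result, although one came close. Any suggestions would be appreciated.
DataFrame from machine 1: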
|------------|---------------------------|--------------|------------|
|Test Case | SubTest | App | Count |
|------------|---------------------------|--------------|------------|
|pcmark10 | AppStartUp | NaN | NaN |
|pcmark10 | PhotoEditing | NaN | NaN |
|pcmark10 | RenderingAndVisualization | NaN | NaN |
|pcmark10 | Spreadsheet | soffice.bin | 1.0 |
|pcmark10 | VideoConferencing | NaN | NaN |
|pcmark10 | VideoEditing | NaN | NaN |
|pcmark10 | WebBrowsing | NaN | NaN |
|pcmark10 | Writing | NaN | NaN |
|------------|---------------------------|--------------|------------|
DataFrame from machine 2:
|------------|---------------------------|--------------|------------|
|Test Case | SubTest | App | Count |
|------------|---------------------------|--------------|------------|
|pcmark10 | AppStartUp | NaN | NaN |
|pcmark10 | PhotoEditing | NaN | NaN |
|pcmark10 | RenderingAndVisualization | NaN | NaN |
|pcmark10 | Spreadsheet | NaN | NaN |
|pcmark10 | VideoConferencing | NaN | NaN |
|pcmark10 | VideoEditing | NaN | NaN |
|pcmark10 | WebBrowsing | chrome.exe | 2 |
|pcmark10 | Writing | NaN | NaN |
|------------|---------------------------|--------------|------------|
I would like the result to look like this:
|------------|---------------------------|--------------|------------|------------|
|Test Case | SubTest | App | Count_x | Count_y |
|------------|---------------------------|--------------|------------|------------|
|pcmark10 | AppStartUp | NaN | NaN | NaN |
|pcmark10 | PhotoEditing | NaN | NaN | NaN |
|pcmark10 | RenderingAndVisualization | NaN | NaN | NaN |
|pcmark10 | Spreadsheet | soffice.bin | 1.0 | NaN |
|pcmark10 | VideoConferencing | NaN | NaN | NaN |
|pcmark10 | VideoEditing | NaN | NaN | NaN |
|pcmark10 | WebBrowsing | chrome.exe | NaN | 2 |
|pcmark10 | Writing | NaN | NaN | NaN |
|------------|---------------------------|--------------|------------|------------|
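I tried an outer merge on all of the keys, and this is what I got. With how="outer", the App value in the pcmark10 WebBrowsing row comes out blank: chrome.exe is missing from the App column.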
|------------|---------------------------|--------------|------------|------------|
|Test Case | SubTest | App | Count_x | Count_y |
|------------|---------------------------|--------------|------------|------------|
|pcmark10 | AppStartUp | NaN | NaN | NaN |
|pcmark10 | PhotoEditing | NaN | NaN | NaN |
|pcmark10 | RenderingAndVisualization | NaN | NaN | NaN |
|pcmark10 | Spreadsheet | soffice.bin | 1.0 | NaN |
|pcmark10 | VideoConferencing | NaN | NaN | NaN |
|pcmark10 | VideoEditing | NaN | NaN | NaN |
|pcmark10 | WebBrowsing | NaN | NaN | 2 |
|pcmark10 | Writing | NaN | NaN | NaN |
|------------|---------------------------|--------------|------------|------------|
Merge command:
pd.merge(df1, df2, on=['Test Case', 'SubTest', 'App'], how="outer", indicator=True)
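To make the behaviour of this command easier to inspect, here is a minimal sketch. The two frames are rebuilt by hand from the tables above, and `make_frame` is a hypothetical helper; neither is the original test output. Because App is part of the join key, a row pairs up only when App agrees on both sides, so the soffice.bin and chrome.exe rows have no partner and are flagged by the indicator column. The exact rows you see may differ from the simplified table above.

```python
import numpy as np
import pandas as pd

subtests = ["AppStartUp", "PhotoEditing", "RenderingAndVisualization",
            "Spreadsheet", "VideoConferencing", "VideoEditing",
            "WebBrowsing", "Writing"]

def make_frame(results):
    """Hypothetical helper: results maps SubTest -> (App, Count); the rest are NaN."""
    rows = []
    for subtest in subtests:
        app, count = results.get(subtest, (np.nan, np.nan))
        rows.append({"Test Case": "pcmark10", "SubTest": subtest,
                     "App": app, "Count": count})
    return pd.DataFrame(rows)

# Assumed reconstruction of the two machines' results
df1 = make_frame({"Spreadsheet": ("soffice.bin", 1.0)})
df2 = make_frame({"WebBrowsing": ("chrome.exe", 2)})

merged = pd.merge(df1, df2, on=["Test Case", "SubTest", "App"],
                  how="outer", indicator=True)

# Rows that found no partner in the other frame
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["SubTest", "App", "_merge"]])
```

The `_merge` column added by `indicator=True` is a quick way to see which rows came from only one side.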
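In your case, merge on 'Test Case' and 'SubTest' only. App differs between the two frames, so it should not be part of the join key. Then use ffill or bfill across the two resulting App columns to create a single App column: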
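# Merge on the shared keys only; App and Count get _x / _y suffixes
(df1.merge(df2, on=['Test Case', 'SubTest'])
    # Take the first non-null App in each row (axis is keyword-only in pandas 2.x)
    .assign(App=lambda x: x.filter(like='App').bfill(axis=1).iloc[:, 0])
    .drop(['App_x', 'App_y'], axis=1)
)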
Output:
  Test Case                    SubTest  Count_x  Count_y          App
0  pcmark10                 AppStartUp      NaN      NaN          NaN
1  pcmark10               PhotoEditing      NaN      NaN          NaN
2  pcmark10  RenderingAndVisualization      NaN      NaN          NaN
3  pcmark10                Spreadsheet      1.0      NaN  soffice.bin
4  pcmark10          VideoConferencing      NaN      NaN          NaN
5  pcmark10               VideoEditing      NaN      NaN          NaN
6  pcmark10                WebBrowsing      NaN      2.0   chrome.exe
7  pcmark10                    Writing      NaN      NaN          NaN
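For a self-contained check, the sketch below rebuilds the two frames by hand (an assumption based on the tables in the question; `make_frame` is a hypothetical helper) and coalesces App with `Series.combine_first`, which is a swapped-in alternative to the filter/bfill step rather than the method shown above. It also passes `how="outer"` so that a SubTest present on only one machine would be kept; that choice is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd

keys = ["Test Case", "SubTest"]
subtests = ["AppStartUp", "PhotoEditing", "RenderingAndVisualization",
            "Spreadsheet", "VideoConferencing", "VideoEditing",
            "WebBrowsing", "Writing"]

def make_frame(results):
    """Hypothetical helper: results maps SubTest -> (App, Count); the rest are NaN."""
    missing = (np.nan, np.nan)
    apps = [results.get(s, missing)[0] for s in subtests]
    counts = [results.get(s, missing)[1] for s in subtests]
    return pd.DataFrame({"Test Case": "pcmark10",
                         "SubTest": subtests,
                         "App": pd.Series(apps, dtype="object"),
                         "Count": counts})

# Assumed reconstruction of the two machines' results
df1 = make_frame({"Spreadsheet": ("soffice.bin", 1.0)})
df2 = make_frame({"WebBrowsing": ("chrome.exe", 2)})

# Join on the shared keys only; App and Count get _x / _y suffixes
merged = df1.merge(df2, on=keys, how="outer")

# Prefer machine 1's App and fall back to machine 2's where it is missing
merged["App"] = merged["App_x"].combine_first(merged["App_y"])

result = merged[keys + ["App", "Count_x", "Count_y"]]
print(result)
```

If both machines report a different App for the same SubTest, `combine_first` silently keeps the first one, so it may be worth checking for such conflicts before dropping `App_x` and `App_y`.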