Combining two pandas DataFrames with the same columns into one string column
I have two pandas DataFrames:
Table 1:
+-------+-------------------+
| Name | Class |
+-------+-------------------+
| Alice | Physics |
| Bob | "" (empty string) |
+-------+-------------------+
Table 2:
+-------+-----------+
| Name | Class |
+-------+-----------+
| Alice | Chemistry |
| Bob | Math |
+-------+-----------+
Is there an easy way to combine them on the Class column, so that the resulting table looks like this:
+-------+--------------------+
| Name | Class |
+-------+--------------------+
| Alice | Physics, Chemistry |
| Bob | Math |
+-------+--------------------+
I also want to make sure there are no extra commas when the columns are combined. Thanks!
Try concat and groupby:
>>> pd.concat([df1, df2]).groupby("Name").agg(lambda x: ", ".join(i for i in x.tolist() if len(i.strip())>0)).reset_index()
    Name               Class
0  Alice  Physics, Chemistry
1    Bob                Math
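The one-liner above assumes that df1 and df2 already exist. A minimal end-to-end sketch, assuming they are the two tables from the question with Bob's missing class stored as an empty string, might look like this:

```python
import pandas as pd

# Assumed reconstruction of the two tables from the question.
df1 = pd.DataFrame({"Name": ["Alice", "Bob"], "Class": ["Physics", ""]})
df2 = pd.DataFrame({"Name": ["Alice", "Bob"], "Class": ["Chemistry", "Math"]})

# Stack the rows, then join the non-blank classes within each Name group.
combined = (
    pd.concat([df1, df2])
    .groupby("Name")["Class"]
    .agg(lambda s: ", ".join(c for c in s if c.strip()))
    .reset_index()
)
print(combined)
```

Filtering out blank strings before the join is what prevents a stray leading or trailing comma for Bob.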
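Another option, if the missing value is stored as NaN rather than an empty string:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name':['Alice','Bob'],
                   'Class':['Physics',np.nan]})
df2 = pd.DataFrame({'Name':['Alice','Bob'],
                    'Class':['Chemistry','Math']})
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df3 = pd.concat([df, df2]).dropna(subset=['Class']).groupby('Name')['Class'].apply(list).reset_index()
# join each list of classes into a single comma-separated string
df3['Class'] = df3['Class'].apply(lambda x: ', '.join(x))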
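The two approaches above filter differently: one skips blank strings, the other drops NaN. If the data might contain both, a rough sketch that handles either case could use a small helper. The name join_classes and the extra Carol row are made up here for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

def join_classes(values):
    # Hypothetical helper: keep only real, non-blank strings (skips NaN and "").
    kept = [v.strip() for v in values if isinstance(v, str) and v.strip()]
    return ", ".join(kept)

# Illustrative data: Bob has an empty string, Carol (made up) has NaN.
df_a = pd.DataFrame({"Name": ["Alice", "Bob", "Carol"],
                     "Class": ["Physics", "", np.nan]})
df_b = pd.DataFrame({"Name": ["Alice", "Bob", "Carol"],
                     "Class": ["Chemistry", "Math", "Art"]})

result = (
    pd.concat([df_a, df_b], ignore_index=True)
    .groupby("Name")["Class"]
    .agg(join_classes)
    .reset_index()
)
print(result)
```

Checking isinstance(v, str) avoids calling strip() on a float NaN, which would otherwise raise an AttributeError.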