Merge multiple pandas columns into a new column
I have a DataFrame in which some columns indicate whether a set of survey questions was seen. For example:
Q1_Seen Q2_Seen Q3_Seen Q4_Seen
Q1a nan nan nan
nan Q2a nan nan
nan nan Q3d nan
nan Q2c nan nan
I would like to merge these columns into a single column, say Q_Seen, of the form:
Q_Seen
Q1a
Q2a
Q3d
Q2c
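Note that the rows are mutually exclusive: if one column has a value in a given row, all the other columns are NaN in that row.
I tried doing this with pd.concat, but it did not seem to produce the right result.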
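The following worked for me:
import pandas as pd
df = pd.DataFrame({'Q1': [1, None, None], 'Q2': [None, 2, None], 'Q3': [None, None, 3]})
# concat is a top-level pandas function (pd.concat), not a DataFrame method.
# After dropna() each row label appears only once, so the assignment aligns by index.
df['Q'] = pd.concat([df['Q1'], df['Q2'], df['Q3']]).dropna()
There may be a more elegant solution, but this is the first thing that came to mind.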
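Another way to express the same idea is to back-fill across the columns and then take the first column. This is only a sketch and not part of the answer above: it assumes the column names from the question, and the sample frame is typed in by hand.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the question
df = pd.DataFrame({
    "Q1_Seen": ["Q1a", np.nan, np.nan, np.nan],
    "Q2_Seen": [np.nan, "Q2a", np.nan, "Q2c"],
    "Q3_Seen": [np.nan, np.nan, "Q3d", np.nan],
    "Q4_Seen": [np.nan, np.nan, np.nan, np.nan],
})
seen_cols = ["Q1_Seen", "Q2_Seen", "Q3_Seen", "Q4_Seen"]

# bfill(axis=1) pulls each row's non-null value leftwards,
# so the first column ends up holding it for every row.
df["Q_Seen"] = df[seen_cols].bfill(axis=1).iloc[:, 0]

print(df["Q_Seen"].tolist())
```

If a row ever held more than one value, this would keep the leftmost one, and a row with no value at all would stay NaN.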
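Try this (it relies on every row holding exactly one non-null value, so that the stacked values line up with the rows):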
df['Q_Seen'] = df.stack().values
>>> df
Q1_Seen Q2_Seen Q3_Seen Q4_Seen Q_Seen
Q1a nan nan nan Q1a
nan Q2a nan nan Q2a
nan nan Q3d nan Q3d
nan Q2c nan nan Q2c
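Because .values throws away the index, the one-liner above breaks as soon as a row has no value at all (the lengths no longer match). A possible variant, offered as a sketch rather than as the answerer's method, keeps the row labels and takes the first non-null value per row. The column names come from the question; the extra all-NaN row is added here purely for illustration.

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical row (index 4) in which nothing was seen
df = pd.DataFrame({
    "Q1_Seen": ["Q1a", np.nan, np.nan, np.nan, np.nan],
    "Q2_Seen": [np.nan, "Q2a", np.nan, "Q2c", np.nan],
    "Q3_Seen": [np.nan, np.nan, "Q3d", np.nan, np.nan],
    "Q4_Seen": [np.nan, np.nan, np.nan, np.nan, np.nan],
})
seen_cols = ["Q1_Seen", "Q2_Seen", "Q3_Seen", "Q4_Seen"]

# stack() yields a Series indexed by (row label, column name);
# grouping on the row label and taking first() picks the first non-null value per row.
stacked = df[seen_cols].stack()
df["Q_Seen"] = stacked.groupby(level=0).first()

print(df["Q_Seen"])
```

The assignment aligns on the row labels, so a row without any value simply ends up as NaN in Q_Seen.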
Using max() across the columns of each row, i.e. max(axis=1), lets you collapse all the values into a single column:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"Q1_Seen": ['Q1a', None, None, None], "Q2_Seen": [None, "Q2a", None, "Q2c"], "Q3_Seen": [None, None, "Q3d", None],"Q4_Seen": [None, None, None, None]})
In [3]: df
Out[3]:
Q1_Seen Q2_Seen Q3_Seen Q4_Seen
0 Q1a None None None
1 None Q2a None None
2 None None Q3d None
3 None Q2c None None
In [4]: df['Q_Seen'] = df.max(axis=1)
In [5]: df
Out[5]:
Q1_Seen Q2_Seen Q3_Seen Q4_Seen Q_Seen
0 Q1a None None None Q1a
1 None Q2a None None Q2a
2 None None Q3d None Q3d
3 None Q2c None None Q2c
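All of these approaches lean on the question's statement that the rows are mutually exclusive. A small check along the following lines can confirm that before collapsing the columns. It is a sketch only: rows_with_multiple_values is a hypothetical helper name, and the frame is rebuilt by hand from the question.

```python
import numpy as np
import pandas as pd

def rows_with_multiple_values(frame, cols):
    """Return the index labels of rows holding more than one non-null value in cols."""
    counts = frame[cols].notna().sum(axis=1)
    return counts[counts > 1].index.tolist()

# Sample frame rebuilt by hand from the question
df = pd.DataFrame({
    "Q1_Seen": ["Q1a", np.nan, np.nan, np.nan],
    "Q2_Seen": [np.nan, "Q2a", np.nan, "Q2c"],
    "Q3_Seen": [np.nan, np.nan, "Q3d", np.nan],
    "Q4_Seen": [np.nan, np.nan, np.nan, np.nan],
})
seen_cols = ["Q1_Seen", "Q2_Seen", "Q3_Seen", "Q4_Seen"]

# An empty list means every row has at most one value, as the question states
print(rows_with_multiple_values(df, seen_cols))  # []
```

A non-empty result lists the rows where collapsing would silently drop a value, so those rows are worth inspecting first.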