How to explode a pandas column of string-escaped lists into a column of int
I followed the pandas explode documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html
This code works when the lists hold plain strings.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [["1058","1057","1056","1055","1054"], np.nan, np.nan, ["10","57","56","55","54"]],
                   'B': 1,
                   'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})
df.explode('A')
which gives:
A B C
0 1058 1 [a, b, c]
0 1057 1 [a, b, c]
0 1056 1 [a, b, c]
0 1055 1 [a, b, c]
0 1054 1 [a, b, c]
1 NaN 1 NaN
2 NaN 1 []
3 10 1 [d, e]
3 57 1 [d, e]
3 56 1 [d, e]
3 55 1 [d, e]
3 54 1 [d, e]
How can I get the same exploded result for column A as above when the DataFrame values include the quotes?
df = pd.DataFrame({'A': [['\"1058\",\"1057\",\"1056\",\"1055\",\"1054\"'], np.nan, np.nan, ['\"10\",\"57\",\"56\",\"55\",\"54\"']],
                   'B': 1,
                   'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})
Use ast.literal_eval, which is a safer choice than eval:
import ast
df['A'] = df.A.apply(lambda x: ast.literal_eval(x[0]) if isinstance(x, list) else x)
df = df.explode('A')
print(df)
A B C
0 1058 1 [a, b, c]
0 1057 1 [a, b, c]
0 1056 1 [a, b, c]
0 1055 1 [a, b, c]
0 1054 1 [a, b, c]
1 NaN 1 NaN
2 NaN 1 []
3 10 1 [d, e]
3 57 1 [d, e]
3 56 1 [d, e]
3 55 1 [d, e]
3 54 1 [d, e]
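Note that the exploded values above are still strings, while the question asks for int. A minimal sketch of one way to finish the conversion, assuming every non-missing value is a whole number: pd.to_numeric parses the strings, and the nullable 'Int64' dtype keeps the missing rows as <NA> instead of forcing the column to float. The variable name out is only for illustration.

```python
import ast

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [['"1058","1057","1056","1055","1054"'], np.nan, np.nan,
          ['"10","57","56","55","54"']],
    'B': 1,
    'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']],
})

# Parse the single quoted string inside each list; leave NaN untouched.
df['A'] = df['A'].apply(
    lambda x: ast.literal_eval(x[0]) if isinstance(x, list) else x
)
out = df.explode('A')

# to_numeric turns '1058' into a number (NaN stays NaN, so dtype is float64);
# the nullable 'Int64' dtype then stores integers alongside <NA>.
out['A'] = pd.to_numeric(out['A']).astype('Int64')
print(out.dtypes)
```

If the column may contain non-numeric strings, pd.to_numeric(..., errors='coerce') would turn them into missing values rather than raising.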
Use pd.eval before explode:
>>> df.assign(A=df['A'].apply(lambda x: pd.eval(x) if pd.notna(x) and x else x)) \
.explode('A')
A B C
0 1058 1 [a, b, c]
0 1057 1 [a, b, c]
0 1056 1 [a, b, c]
0 1055 1 [a, b, c]
0 1054 1 [a, b, c]
1 NaN 1 NaN
2 NaN 1 []
3 10 1 [d, e]
3 57 1 [d, e]
3 56 1 [d, e]
3 55 1 [d, e]
3 54 1 [d, e]
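If you would rather not evaluate the strings at all, plain string splitting may be enough for data shaped like the example. This is only a sketch, not part of either approach above: split_quoted is a hypothetical helper, and it assumes the values themselves never contain commas or embedded quotes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [['"1058","1057","1056","1055","1054"'], np.nan, np.nan,
          ['"10","57","56","55","54"']],
    'B': 1,
    'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']],
})


def split_quoted(cell):
    # Hypothetical helper: turn ['"1","2"'] into ['1', '2'].
    # Anything that is not a non-empty list (e.g. NaN) passes through unchanged.
    if isinstance(cell, list) and cell:
        return [part.strip().strip('"') for part in cell[0].split(',')]
    return cell


out = df.assign(A=df['A'].apply(split_quoted)).explode('A')
print(out)
```

The exploded values are still strings here, so a numeric conversion would be needed afterwards if int is required.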