Splitting strings of tuples of different lengths into columns in a Pandas DF

Question

I have a dataframe that looks like this:

id	human_id
1	('apples', '2022-12-04', 'a5ted')
2	('bananas', '2012-2-14')
3	('2012-2-14', 'reda21', 'ss')
..	..

I would like a "pythonic" way to get output like this:

id	human_id	col1	col2	col3
1	('apples', '2022-12-04', 'a5ted')	apples	2022-12-04	a5ted
2	('bananas', '2012-2-14')	bananas	2012-2-14	np.NaN
3	('2012-2-14', 'reda21', 'ss')	2012-2-14	reda21	ss

import pandas as pd

df['a'], df['b'], df['c'] = df.human_id.str

The code I tried raises an error:

ValueError: not enough values to unpack (expected 2, got 1)

How can I split the values in the tuple into columns?

Thanks.

Answer 1

You can do this:

out = df.join(pd.DataFrame(df.human_id.tolist(),index=df.index,columns=['a','b','c']))
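For reference, here is a minimal self-contained sketch of the one-liner above. It assumes that human_id holds real tuples rather than their string representation, and that no tuple has more than three elements. The sample data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "human_id": [
        ("apples", "2022-12-04", "a5ted"),
        ("bananas", "2012-2-14"),
        ("2012-2-14", "reda21", "ss"),
    ],
})

# Each tuple becomes one row; shorter tuples are padded with a missing value.
parts = pd.DataFrame(df["human_id"].tolist(), index=df.index, columns=["a", "b", "c"])

# Reusing df.index keeps the new columns aligned with the original rows.
out = df.join(parts)
print(out)
```

If the column actually contains strings such as "('bananas', '2012-2-14')", they would need to be parsed into tuples first.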

Answer 2

You can do it like this. It simply puts None wherever a value is missing. You can then attach df1 to df.

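import pandas as pd

d = {'id': [1, 2, 3],
     'human_id': ["('apples', '2022-12-04', 'a5ted')",
                  "('bananas', '2012-2-14')",
                  "('2012-2-14', 'reda21', 'ss')"
                 ]}

df = pd.DataFrame(data=d)

# The tuples are stored as strings, so collect them for parsing
list_human_id = list(df['human_id'])

# Turn each string into a real tuple (eval is only safe on trusted data)
newList = []
for val in list_human_id:
    newList.append(eval(val))

# Shorter tuples are padded with None
df1 = pd.DataFrame(newList, columns=['col1', 'col2', 'col3'])

print(df1)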

Output


        col1        col2   col3
0     apples  2022-12-04  a5ted
1    bananas   2012-2-14   None
2  2012-2-14      reda21     ss
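As a variation on this answer, the sketch below swaps eval for ast.literal_eval, which only accepts Python literals, and joins the new columns back onto df. It assumes every string in human_id is a valid tuple literal with at most three elements; the name result is only illustrative.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "human_id": [
        "('apples', '2022-12-04', 'a5ted')",
        "('bananas', '2012-2-14')",
        "('2012-2-14', 'reda21', 'ss')",
    ],
})

# literal_eval parses literals only, so it cannot execute arbitrary code.
parsed = df["human_id"].apply(ast.literal_eval)

# Build the split columns on the same index, then join them onto df.
df1 = pd.DataFrame(parsed.tolist(), index=df.index, columns=["col1", "col2", "col3"])
result = df.join(df1)
print(result)
```

Joining on the shared index avoids relying on row order when df has a non-default index.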

Answer 3

The columns are created dynamically from the length of each tuple, within the same dataframe.

import pandas as pd

id = [1, 2, 3]
human_id = [('apples', '2022-12-04', 'a5ted')
            ,('bananas', '2012-2-14')
            , ('2012-2-14', 'reda21', 'ss')]

df = pd.DataFrame({'id': id, 'human_id': human_id})

print("*"*20,'Dataframe',"*"*20)
print(df.to_string())

print()

print("*"*20,'Split Data',"*"*20)

for row, values in enumerate(df['human_id']):

    for col, value in enumerate(values, start=1):

        # Assigning to a new label creates the column; unfilled cells stay NaN
        name_column = 'col' + str(col)
        df.loc[df.index[row], name_column] = str(value)

print(df.to_string())
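The same dynamic naming can probably be done without an explicit loop. The sketch below is one possible approach, assuming human_id holds real tuples: when no column names are given, pandas sizes the frame to the longest tuple, and the columns can then be renamed col1, col2, and so on. Unlike the loop above, it does not convert the values to str.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "human_id": [
        ("apples", "2022-12-04", "a5ted"),
        ("bananas", "2012-2-14"),
        ("2012-2-14", "reda21", "ss"),
    ],
})

# Without explicit names the columns are labelled 0, 1, 2, ... up to the
# length of the longest tuple.
parts = pd.DataFrame(df["human_id"].tolist(), index=df.index)
parts.columns = [f"col{i + 1}" for i in range(parts.shape[1])]

df = df.join(parts)
print(df.to_string())
```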

Tags: python, python-3.x, pandas, sklearn-pandas, jupyter-notebook