Splitting strings of tuples of different lengths to columns in Pandas DF
I have a dataframe that looks like this:
id | human_id |
---|---|
1 | ('apples', '2022-12-04', 'a5ted') |
2 | ('bananas', '2012-2-14') |
3 | ('2012-2-14', 'reda21', 'ss') |
.. | .. |
I would like a "pythonic" way to get output like this:
id | human_id | col1 | col2 | col3 |
---|---|---|---|---|
1 | ('apples', '2022-12-04', 'a5ted') | apples | 2022-12-04 | a5ted |
2 | ('bananas', '2012-2-14') | bananas | 2012-2-14 | np.NaN |
3 | ('2012-2-14', 'reda21', 'ss') | 2012-2-14 | reda21 | ss |
Here is what I tried:
import pandas as pd
df['a'], df['b'], df['c'] = df.human_id.str
It raises an error:
ValueError: not enough values to unpack (expected 2, got 1)
How can I split the values in the tuples into columns?
Thanks.
You can do this:
out = df.join(pd.DataFrame(df.human_id.tolist(),index=df.index,columns=['a','b','c']))
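For anyone who wants to run the one-liner above end to end, here is a minimal sketch. It assumes `human_id` holds real tuples (not their string representation) and that no tuple has more than three elements; the intermediate name `parts` is mine, not from the answer.

```python
import pandas as pd

# Sample data from the question: tuples of varying length.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "human_id": [
        ("apples", "2022-12-04", "a5ted"),
        ("bananas", "2012-2-14"),
        ("2012-2-14", "reda21", "ss"),
    ],
})

# Turn each tuple into one row of a new frame; shorter tuples get a
# missing value in the trailing columns.
parts = pd.DataFrame(df["human_id"].tolist(), index=df.index,
                     columns=["a", "b", "c"])

# Attach the new columns to the original frame, aligned on the index.
out = df.join(parts)
print(out)
```

Passing `index=df.index` matters when the original frame does not have a default 0..n-1 index, since `join` aligns on index labels.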
You can do it like this. It simply puts None wherever a value is missing. You can then append df1 to df.
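import ast
import pandas as pd

d = {'id': [1, 2, 3],
     'human_id': ["('apples', '2022-12-04', 'a5ted')",
                  "('bananas', '2012-2-14')",
                  "('2012-2-14', 'reda21', 'ss')"]}
df = pd.DataFrame(data=d)

# Parse each string into a real tuple; ast.literal_eval is used here as a
# safer stand-in for eval, since it only accepts Python literals.
list_human_id = list(df['human_id'])
newList = []
for val in list_human_id:
    newList.append(ast.literal_eval(val))

# Shorter tuples leave the trailing columns empty (None).
df1 = pd.DataFrame(newList, columns=['col1', 'col2', 'col3'])
print(df1)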
Output
        col1        col2   col3
0     apples  2022-12-04  a5ted
1    bananas   2012-2-14   None
2  2012-2-14      reda21     ss
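The answer says df1 can be appended to df but does not show that step. One possible way, sketched below, is a column-wise `pd.concat`. This assumes `human_id` holds strings as in this answer; the name `result` and the use of `ast.literal_eval` for parsing are my choices for illustration.

```python
import ast
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "human_id": [
        "('apples', '2022-12-04', 'a5ted')",
        "('bananas', '2012-2-14')",
        "('2012-2-14', 'reda21', 'ss')",
    ],
})

# Parse every string into a tuple; literal_eval rejects anything that is
# not a plain Python literal.
parsed = df["human_id"].apply(ast.literal_eval)

# Build the split columns on the same index as df so the rows line up.
df1 = pd.DataFrame(parsed.tolist(), index=df.index,
                   columns=["col1", "col2", "col3"])

# axis=1 places df1's columns next to df's columns.
result = pd.concat([df, df1], axis=1)
print(result)
```

`df.join(df1)` should give the same frame here, since both share the same index.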
The columns are created dynamically from the length of each tuple, using the same dataframe:
import pandas as pd

id = [1, 2, 3]
human_id = [('apples', '2022-12-04', 'a5ted'),
            ('bananas', '2012-2-14'),
            ('2012-2-14', 'reda21', 'ss')]
df = pd.DataFrame({'id': id, 'human_id': human_id})
print("*"*20, 'Dataframe', "*"*20)
print(df.to_string())
print()
print("*"*20, 'Split Data', "*"*20)

# Walk every tuple and write each element into col1, col2, ... for that row.
# A column is created the first time its name is used.
row = 0
for x in df['human_id']:
    col = 1
    for xx in x:
        name_column = 'col' + str(col)
        df.loc[df.index[row], name_column] = str(xx)
        col += 1
    row += 1
print(df.to_string())
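The same dynamic naming can probably be had without the nested loop. The sketch below is an alternative, not the answer author's method: it lets the `DataFrame` constructor decide how many columns are needed and then renames them. It assumes `human_id` holds real tuples; `parts` and `out` are names I introduced.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "human_id": [
        ("apples", "2022-12-04", "a5ted"),
        ("bananas", "2012-2-14"),
        ("2012-2-14", "reda21", "ss"),
    ],
})

# Without a columns= argument, the frame gets as many columns as the
# longest tuple, labelled 0, 1, 2, ...
parts = pd.DataFrame(df["human_id"].tolist(), index=df.index)

# Rename the positional labels to col1, col2, col3, ...
parts.columns = [f"col{i + 1}" for i in range(parts.shape[1])]

out = df.join(parts)
print(out)
```

Unlike the loop, this does not convert the elements with `str()`, so non-string tuple members keep their original type.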