How to split tuples in all columns of a dataframe
The columns of my dataframe contain tuples and empty cells. The number of columns is dynamic, and the columns have no labels.
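|       | 0                 | 1                 | 2                 |
|-------|-------------------|-------------------|-------------------|
| **0** | (name, score, ID) | (name, score, ID) | None              |
| **1** | (name, score, ID) | None              | None              |
| **2** | None              | None              | (name, score, ID) |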
I want to split the tuples in all columns into separate columns, i.e.:
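|       | Name0 | Score0 | ID0  | Name1 | Score1 | ID1  | Name2 | Score2 | ID2  |
|-------|-------|--------|------|-------|--------|------|-------|--------|------|
| **0** | name  | score  | ID   | name  | score  | ID   | None  | None   | None |
| **1** | name  | score  | ID   | None  | None   | None | None  | None   | None |
| **2** | None  | None   | None | None  | None   | None | name  | score  | ID   |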
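I found the following:

```python
df1[['Name', 'Score', "ID"]] = pd.DataFrame(df1[0].tolist(), index=df1.index)
```

This basically works, but it only splits the tuples in the first column (`df1[0]`) into separate columns. I can't figure out how to apply it to the tuples in all columns.

Thanks for any help!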
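A minimal sketch of how the one-liner above could be extended to every column, assuming each non-empty cell is a 3-tuple and every empty cell is `None`. The small frame `df1` and the `FIELDS` list are illustrative assumptions, not data from the question. Empty cells are padded to `(None, None, None)` so the `DataFrame` constructor always sees three values per row, and the per-column frames are then placed side by side with `pd.concat(..., axis=1)`:

```python
import pandas as pd

# Hypothetical toy frame shaped like the one in the question:
# unlabeled columns holding either a (name, score, ID) tuple or None.
df1 = pd.DataFrame(
    {
        0: [("a", 1, 10), ("b", 2, 20), None],
        1: [("c", 3, 30), None, None],
        2: [None, None, ("d", 4, 40)],
    }
)

FIELDS = ["Name", "Score", "ID"]  # assumed: every tuple has exactly these three fields

parts = []
for col in df1.columns:
    # Pad empty cells so every row has one value per field.
    rows = [v if isinstance(v, tuple) else (None,) * len(FIELDS) for v in df1[col]]
    # Same idea as df1[0].tolist() in the question, but with generated column names.
    parts.append(
        pd.DataFrame(rows, index=df1.index, columns=[f"{f}{col}" for f in FIELDS])
    )

# Join the per-column frames side by side on the original row index.
result = pd.concat(parts, axis=1)
print(result)
```

Because empty cells become missing values, pandas will usually store the `Score`/`ID` columns that contain gaps as floats.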
Consider the following toy dataframe:
```python
import pandas as pd

df = pd.DataFrame(
    {
        0: {
            0: None,
            1: None,
            2: None,
            3: ("bartenbach gmbh rinner strasse 14 aldrans", 96, 1050),
            4: (
                "ait austrian institute of technology gmbh giefinggasse 4 wien",
                70,
                537,
            ),
        },
        1: {0: None, 1: None, 2: None, 3: None, 4: None},
        2: {0: None, 1: None, 2: None, 3: None, 4: None},
    }
)
print(df)
```

Output:

```
0 1 2
0 None None None
1 None None None
2 None None None
3 (bartenbach gmbh rinner strasse 14 aldrans, 96... None None
4 (ait austrian institute of technology gmbh gie... None None
```
You can iterate over each column, then over each value, unpack the tuple, and fill a new dataframe like this:
```python
new_df = pd.DataFrame()
# DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement.
for col_num, series in df.items():
    for i, value in enumerate(series.values):
        try:
            # Each non-empty cell holds a (name, score, ID) tuple.
            name, score, id_num = value
            new_df.loc[i, f"Name{col_num}"] = name
            new_df.loc[i, f"Score{col_num}"] = score
            new_df.loc[i, f"ID{col_num}"] = id_num
        except TypeError:
            # None cannot be unpacked, so empty cells are skipped.
            continue
new_df = new_df.reset_index(drop=True)
print(new_df)
```

Output:

```
Name0 Score0 ID0
0 bartenbach gmbh rinner strasse 14 aldrans 96.0 1050.0
1 ait austrian institute of technology gmbh gief... 70.0 537.0
```
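The loop above writes one cell at a time with `.loc`, which grows the frame cell by cell; the `96.0`/`70.0` in the output are probably a side effect of that, since a newly added row briefly holds `NaN` in the other columns, which forces a float dtype. A sketch of a variant (not part of the original answer) that collects the values in a plain dict first and builds the frame once, keeping the original row labels; the name `records` is hypothetical:

```python
import pandas as pd

# Same toy dataframe as above.
df = pd.DataFrame(
    {
        0: {
            0: None,
            1: None,
            2: None,
            3: ("bartenbach gmbh rinner strasse 14 aldrans", 96, 1050),
            4: ("ait austrian institute of technology gmbh giefinggasse 4 wien", 70, 537),
        },
        1: {0: None, 1: None, 2: None, 3: None, 4: None},
        2: {0: None, 1: None, 2: None, 3: None, 4: None},
    }
)

records = {}  # original row label -> {new column name: value}
for col_num, series in df.items():
    for idx, value in series.items():
        if not isinstance(value, tuple):
            continue  # skip empty (None) cells
        name, score, id_num = value
        row = records.setdefault(idx, {})
        row[f"Name{col_num}"] = name
        row[f"Score{col_num}"] = score
        row[f"ID{col_num}"] = id_num

# Build the frame in one step; each inner dict becomes one row.
new_df = pd.DataFrame.from_dict(records, orient="index")
print(new_df)
```

Call `new_df.reset_index(drop=True)` if you prefer the `0, 1, ...` index from the answer over the original row labels.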