How to split tuples in all columns of a dataframe

The columns in my dataframe contain tuples and empty cells. The number of columns is dynamic, and the columns have no labels.

          **0                1               2**
**0**    name,score,ID    name,score,ID      None
**1**    name,score,ID       None            None
**2**        None            None         name,score,ID

I want to split the tuples in every column into separate columns, i.e.:

      **Name0    Score0    ID0    Name1   Score1    ID1    Name2   Score2   ID2**
**0**    name    score     ID     name    score     ID      None    None    None
**1**    name    score     ID     None    None     None     None    None    None
**2**    None    None     None    None    None     None     name    score     ID

I found the following:

df1[['Name', 'Score', "ID"]] = pd.DataFrame(df1[0].tolist(), index=df1.index)

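This basically works, but it only splits the tuples of the first column (df1[0]) into separate columns. I can't figure out how to apply it to the tuples in all columns.

Any help is appreciated!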

Consider the following toy dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        0: {
            0: None,
            1: None,
            2: None,
            3: ("bartenbach gmbh rinner strasse 14 aldrans", 96, 1050),
            4: (
                "ait austrian institute of technology gmbh giefinggasse 4 wien",
                70,
                537,
            ),
        },
        1: {0: None, 1: None, 2: None, 3: None, 4: None},
        2: {0: None, 1: None, 2: None, 3: None, 4: None},
    }
)

print(df)
# Outputs
                                                   0     1     2
0                                               None  None  None
1                                               None  None  None
2                                               None  None  None
3  (bartenbach gmbh rinner strasse 14 aldrans, 96...  None  None
4  (ait austrian institute of technology gmbh gie...  None  None

You can iterate over each column and then over each value, unpack the tuple, and fill a new dataframe, like this:

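new_df = pd.DataFrame()

# DataFrame.items() yields (column label, Series) pairs;
# iteritems() was removed in pandas 2.0
for col_num, series in df.items():
    for i, value in enumerate(series.values):
        try:
            # Unpacking raises TypeError for None (or NaN) cells
            name, score, id_num = value
            new_df.loc[i, f"Name{col_num}"] = name
            new_df.loc[i, f"Score{col_num}"] = score
            new_df.loc[i, f"ID{col_num}"] = id_num
        except TypeError:
            # Empty cell: nothing to unpack, so skip it
            continue
# Rows that never held a tuple were never created; renumber the rest from 0
new_df = new_df.reset_index(drop=True)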

print(new_df)
# Outputs
                                               Name0  Score0     ID0
0          bartenbach gmbh rinner strasse 14 aldrans    96.0  1050.0
1  ait austrian institute of technology gmbh gief...    70.0   537.0
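
Note that the loop only creates rows and columns for cells that actually hold a tuple, so all-None rows and columns disappear from the result. If you need them kept, as in the desired output in the question, one possible alternative is to apply the one-liner from the question to every column and concatenate the pieces. This is only a sketch: the helper name split_tuple_columns and the FIELDS list are made up for illustration, and it assumes every non-empty cell is a 3-tuple in the order (name, score, ID).

```python
import pandas as pd

# Assumed order of the values inside each tuple (hypothetical constant)
FIELDS = ["Name", "Score", "ID"]


def split_tuple_columns(df, fields=FIELDS):
    """Expand every tuple column of df into len(fields) separate columns."""
    parts = []
    for col in df.columns:
        # Pad empty cells with a tuple of Nones so every row has the same width
        rows = [
            value if isinstance(value, tuple) else (None,) * len(fields)
            for value in df[col]
        ]
        # Same idea as the one-liner in the question, applied to one column
        part = pd.DataFrame(
            rows,
            index=df.index,
            columns=[f"{field}{col}" for field in fields],
        )
        parts.append(part)
    # Put the per-column pieces side by side, keeping the original row index
    return pd.concat(parts, axis=1)


# Small example: two unlabeled columns, each with one tuple and one empty cell
df = pd.DataFrame({0: [("a", 1, 10), None], 1: [None, ("b", 2, 20)]})
wide = split_tuple_columns(df)
print(wide)
```

Because each piece is built with index=df.index, the result has exactly as many rows as the input, and a column that holds no tuples at all still produces its three (empty) output columns.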