Split row to create header in pandas

I have a DataFrame df like:

age=14  gender=male     loc=NY  key=0012328434    Unnamed: 4
age=45  gender=female   loc=CS  key=834734hh43    pre="axe"
age=23  gender=female   loc=CA  key=545df35fdf    NaN
..
..
age=65  gender=male     loc=LA  key=dfdf545dfg    pre="cold"

I need this df to have a header and the redundant data removed, like desired_df:

age     gender          loc     key             pre
14      male            NY      0012328434      NaN
45      female          CS      834734hh43      axe
23      female          CA      545df35fdf      NaN
..
..
65      male            LA      dfdf545dfg      cold

What I tried:

df1 = df.str.split()
df_out = pd.DataFrame(df1.str[1::2].tolist(), columns=df1[0][0::2])

But this fails, obviously, since I don't have a df column name to start with. Any help is much appreciated.

(Untested!)

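headers = ['age', 'gender', 'loc', 'key', 'pre']

# Assumes the file was read with header=None, so the first row is still data;
# renaming the columns of df as shown above would discard the age=14 row
df.columns = headers
for name in df.columns:
    # Drop the 'name=' prefix (Series.str.removeprefix needs pandas >= 1.4),
    # then strip the quotes around values such as pre="axe"
    df[name] = df[name].str.removeprefix(f'{name}=').str.strip('"')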
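
For anyone on a pandas version without `str.removeprefix`, here is a minimal self-contained sketch of the same idea. It swaps in `str.split('=', n=1)` for the prefix removal, and the sample frame is rebuilt by hand from the rows in the question, so treat both as assumptions rather than as the asker's real data or the only way to do it.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the question's frame (as if read with header=None)
df = pd.DataFrame([
    ['age=14', 'gender=male', 'loc=NY', 'key=0012328434', np.nan],
    ['age=45', 'gender=female', 'loc=CS', 'key=834734hh43', 'pre="axe"'],
    ['age=23', 'gender=female', 'loc=CA', 'key=545df35fdf', np.nan],
    ['age=65', 'gender=male', 'loc=LA', 'key=dfdf545dfg', 'pre="cold"'],
])

headers = ['age', 'gender', 'loc', 'key', 'pre']

out = df.copy()
out.columns = headers
for name in headers:
    # Keep everything after the first '='; missing cells stay NaN
    values = out[name].str.split('=', n=1).str[1]
    # Remove the double quotes around values like "axe"
    out[name] = values.str.strip('"')

print(out)
```

This relies on the column order being fixed, since the header list is hard-coded. All values stay strings, so `age` would still need a cast if numbers are wanted.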
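
import pandas as pd

# Read without a header so the first key=value row is kept as data
# df = pd.read_csv(r'xyz.csv', header=None)

# Fill empty cells with a 'NaN=NaN' placeholder so every cell splits into a
# key/value pair, turn each row into a dict, then drop the dummy 'NaN' column
df1 = (pd.DataFrame(df.fillna('NaN=NaN')
         .apply(lambda x: dict(list(x.str.replace('"', '')
                          .str.split('='))), axis=1).to_list())
       .drop('NaN', axis=1, errors='ignore'))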


  age  gender loc         key   pre
0  14    male  NY  0012328434   NaN
1  45  female  CS  834734hh43   axe
2  23  female  CA  545df35fdf   NaN
3  65    male  LA  dfdf545dfg  cold
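
A variation on the dict-per-row idea above, as a rough sketch: skip the missing cells while building each dict, so no placeholder column has to be dropped afterwards. The sample frame is again rebuilt by hand from the question (an assumption), and the names `records` and `tidy` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the question's frame (as if read with header=None)
df = pd.DataFrame([
    ['age=14', 'gender=male', 'loc=NY', 'key=0012328434', np.nan],
    ['age=45', 'gender=female', 'loc=CS', 'key=834734hh43', 'pre="axe"'],
    ['age=23', 'gender=female', 'loc=CA', 'key=545df35fdf', np.nan],
    ['age=65', 'gender=male', 'loc=LA', 'key=dfdf545dfg', 'pre="cold"'],
])

# One dict per row: split each string cell on the first '=', skip NaN cells
records = [
    dict(cell.split('=', 1) for cell in row if isinstance(cell, str))
    for row in df.itertuples(index=False, name=None)
]

# Keys missing from a row (here 'pre') become NaN in the resulting frame
tidy = pd.DataFrame(records)

# Strip the double quotes around values such as "axe"
tidy = tidy.apply(lambda col: col.str.strip('"'))

print(tidy)
```

Because the column names come from the data itself, this does not depend on every row listing its fields in the same order.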