How to create a new column for transposed data

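I'm trying to transpose rows into new columns in a pandas DataFrame. Visit ID is the unique identifier. I tried df.pivot and df.melt, but df.melt seems to do the opposite of what I want. I'm new to Python and have made a start, but I'm pretty lost. Any suggestions?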

Current input:

Visit ID  DX Code  Insurance  Primary or Secondary
1         123      Aetna      Primary
1         234      Affinity   Secondary
2         456      VNS        Secondary
2         789      Medicare   Primary
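To make the example reproducible without the Excel file, the input table can be built directly. This is a sketch that assumes the sheet's headers are exactly the English names shown above; the asker's real column names may differ.

```python
import pandas as pd

# Hypothetical reconstruction of the input table. The column names are
# assumed to match the headers in the asker's Excel sheet.
df = pd.DataFrame({
    "Visit ID": [1, 1, 2, 2],
    "DX Code": [123, 234, 456, 789],
    "Insurance": ["Aetna", "Affinity", "VNS", "Medicare"],
    "Primary or Secondary": ["Primary", "Secondary", "Secondary", "Primary"],
})
print(df)
```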

Desired output:

Visit ID  DX Code  DX Code2  Primary   Secondary
1         123      234       Aetna     Affinity
2         456      789       Medicare  VNS

import pandas as pd

df = pd.read_excel(r'C:\Users\TEST.xlsx', sheet_name='Sheet1')

# pivot = df.pivot(index='Visit ID', columns='DX Code', values = 'DX ID')
# print(pivot)

# melt = df.melt(value_name='DX Code', var_name='DX Code2')
# print(melt)
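For reference, the df.pivot route can work once each row inside a visit gets a position number to pivot on: df.pivot needs a column whose values become the new column names, and neither "DX Code" nor a nonexistent "DX ID" column fits that role. The sketch below is a plain-pandas alternative, not code from the question; the helper column `n` is hypothetical, and the approach assumes the rows of each visit are already in the order the DX codes should appear.

```python
import pandas as pd

df = pd.DataFrame({
    "Visit ID": [1, 1, 2, 2],
    "DX Code": [123, 234, 456, 789],
    "Insurance": ["Aetna", "Affinity", "VNS", "Medicare"],
    "Primary or Secondary": ["Primary", "Secondary", "Secondary", "Primary"],
})

# Hypothetical helper column: number the rows within each visit (0, 1, ...).
# This relies on the existing row order inside each visit.
df["n"] = df.groupby("Visit ID").cumcount()

# One column per DX code position: n=0 -> "DX Code", n=1 -> "DX Code2".
dx = df.pivot(index="Visit ID", columns="n", values="DX Code")
dx.columns = ["DX Code" if n == 0 else f"DX Code{n + 1}" for n in dx.columns]

# Insurance already carries its own label, so pivot on that column directly.
ins = df.pivot(index="Visit ID", columns="Primary or Secondary", values="Insurance")

# Both pivots share the "Visit ID" index, so a join lines them up.
out = dx.join(ins).reset_index()
out.columns.name = None
print(out)
```

Because the DX codes are numbered by row position rather than by Primary/Secondary, visit 2 keeps the 456, 789 order from the desired output.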

You can use datar, which uses pandas as its backend but implements dplyr-like syntax:

>>> from datar.all import c, f, tribble, tibble, rep, paste0, pivot_wider
>>> 
>>> df = tribble(
...     f.Visit_ID, f.DX_Code, f.Insurance, f.Primary_or_Secondary,
...     1,          123,       "Aetna",     "Primary",
...     1,          234,       "Affinity",  "Secondary",
...     2,          456,       "VNS",       "Secondary",
...     2,          789,       "Medicare",  "Primary",
... )
>>> df
   Visit_ID  DX_Code Insurance Primary_or_Secondary
    <int64>  <int64>  <object>             <object>
0         1      123     Aetna              Primary
1         1      234  Affinity            Secondary
2         2      456       VNS            Secondary
3         2      789  Medicare              Primary

>>> # Create a new df with names and values
>>> df2 = tibble(
...     Visit_ID=rep(df.Visit_ID, 2),
...     name=c(paste0("DX Code", rep(c("", "2"), 2)), df.Primary_or_Secondary),
...     value=c(df.DX_Code, df.Insurance)
... )
>>> 
>>> df2
   Visit_ID       name     value
    <int64>   <object>  <object>
0         1    DX Code       123
1         1   DX Code2       234
2         2    DX Code       456
3         2   DX Code2       789
4         1    Primary     Aetna
5         1  Secondary  Affinity
6         2  Secondary       VNS
7         2    Primary  Medicare
>>> df2 >> pivot_wider()
   Visit_ID  DX Code DX Code2   Primary Secondary
    <int64> <object> <object>  <object>  <object>
0         1      123      234     Aetna  Affinity
1         2      456      789  Medicare       VNS
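The same idea, stacking everything into one long name/value table and then spreading it wide, can be sketched in plain pandas for readers who would rather not add a dependency. This is an illustrative translation, not datar's implementation: `pd.concat` plays the role of `tibble(...)` and `DataFrame.pivot` stands in for `pivot_wider()`. Unlike `rep(c("", "2"), 2)`, which assumes exactly two rows per visit, the labels here come from `groupby(...).cumcount()`.

```python
import pandas as pd

df = pd.DataFrame({
    "Visit ID": [1, 1, 2, 2],
    "DX Code": [123, 234, 456, 789],
    "Insurance": ["Aetna", "Affinity", "VNS", "Medicare"],
    "Primary or Secondary": ["Primary", "Secondary", "Secondary", "Primary"],
})

# Long part 1: DX codes, labelled "DX Code", "DX Code2", ... by position in the visit.
n = df.groupby("Visit ID").cumcount()
dx_long = pd.DataFrame({
    "Visit ID": df["Visit ID"],
    "name": ["DX Code" if i == 0 else f"DX Code{i + 1}" for i in n],
    "value": df["DX Code"],
})

# Long part 2: insurance, labelled by the Primary/Secondary column.
ins_long = pd.DataFrame({
    "Visit ID": df["Visit ID"],
    "name": df["Primary or Secondary"],
    "value": df["Insurance"],
})

df2 = pd.concat([dx_long, ins_long], ignore_index=True)

# Spread wide: each distinct "name" becomes a column filled from "value".
out = df2.pivot(index="Visit ID", columns="name", values="value").reset_index()
out.columns.name = None
print(out)
```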

Disclaimer: I am the author of the datar package.

You can use merge:

out = pd.merge(df[df['Primary or Secondary'] == 'Primary'],
               df[df['Primary or Secondary'] == 'Secondary'],
               on='Visit ID', suffixes=('', '2'))

The rest is just reformatting:

out = out[['Visit ID', 'DX Code', 'DX Code2', 'Insurance', 'Insurance2']] \
          .rename(columns={'Insurance': 'Primary', 'Insurance2': 'Secondary'})
>>> out
   Visit ID  DX Code  DX Code2   Primary Secondary
0         1      123       234     Aetna  Affinity
1         2      789       456  Medicare       VNS
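A self-contained version of the merge answer, useful for checking it without the Excel file, is sketched below. It builds the sample data inline and adds `validate="one_to_one"`, a real `pd.merge` parameter that is not in the original answer, so the merge raises an error if a visit has more than one Primary or more than one Secondary row. Note that "DX Code" here always comes from the Primary row, which is why visit 2 shows 789 then 456 rather than the row order in the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "Visit ID": [1, 1, 2, 2],
    "DX Code": [123, 234, 456, 789],
    "Insurance": ["Aetna", "Affinity", "VNS", "Medicare"],
    "Primary or Secondary": ["Primary", "Secondary", "Secondary", "Primary"],
})

primary = df[df["Primary or Secondary"] == "Primary"]
secondary = df[df["Primary or Secondary"] == "Secondary"]

# Left side keeps plain names (suffix ""), right side gets a "2" suffix.
# validate="one_to_one" raises MergeError if "Visit ID" repeats on either side.
out = pd.merge(primary, secondary, on="Visit ID",
               suffixes=("", "2"), validate="one_to_one")

# Keep only the wanted columns and name the insurance columns by role.
out = (out[["Visit ID", "DX Code", "DX Code2", "Insurance", "Insurance2"]]
       .rename(columns={"Insurance": "Primary", "Insurance2": "Secondary"}))
print(out)
```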