pandas wide_long vs (stack and melt) for data-frame transformation

pandas wide_long vs (stack and melt) for data-frame transformation

我有一个如下所示的数据框

df = pd.DataFrame({
'subject_ID':[1,2,3,4,5],
'date_visit':['1/1/2020','3/3/2200','13/11/2100','24/05/2198','30/03/2071'],
'a11fever':['Yes','No','Yes','Yes','No'],
'a12diagage':[36,34,42,40,np.nan],
'a12diagyr':[2021,3213,2091,4567,8901],
'a12diagyrago':[6,np.nan,9,np.nan,np.nan]})

我想转换数据帧,其中 一个主题的样本输出 如下所示

虽然我能够使用 pd.meltstack 成功地做到这一点,但我无法使用 wide_long.

做到同样的事情
pd.melt(df, id_vars =['subject_ID','date_visit'], value_vars =['a11fever', 'a12diagage', 'a12diagyr','a12diagyrago'])  # works fine

pd.wide_to_long(df, stubnames=['measurement', 'val'],i=(['subject_ID','date_visit']), j='grp').sort_index(level=0) # returns 0 records

df.set_index(['subject_ID','date_visit']).stack().reset_index() #works fine

我的另一个问题是,

a) 我们是否总是必须在 pd.meltvalue_vars 部分下提及我们想要转换的所有列名称。我的真实数据将有 120 多列。那么我必须在这里一一列举吗?

你能帮我解决一下如何使用 wide_long 吗?

Do we always have to mention all the column names that we would like to transform under value_vars section of pd.melt. My real data will have more than 120 columns. So do I have to mention all of them here?

不,没有必要,如果省略参数 value_vars 那么将使用所有列而不用于 id_vars:

df = pd.melt(df, id_vars =['subject_ID','date_visit'])
print (df)

    subject_ID  date_visit      variable value
0            1    1/1/2020      a11fever   Yes
1            2    3/3/2200      a11fever    No
2            3  13/11/2100      a11fever   Yes
3            4  24/05/2198      a11fever   Yes
4            5  30/03/2071      a11fever    No
5            1    1/1/2020    a12diagage    36
6            2    3/3/2200    a12diagage    34
7            3  13/11/2100    a12diagage    42
8            4  24/05/2198    a12diagage    40
9            5  30/03/2071    a12diagage   NaN
10           1    1/1/2020     a12diagyr  2021
11           2    3/3/2200     a12diagyr  3213
12           3  13/11/2100     a12diagyr  2091
13           4  24/05/2198     a12diagyr  4567
14           5  30/03/2071     a12diagyr  8901
15           1    1/1/2020  a12diagyrago     6
16           2    3/3/2200  a12diagyrago   NaN
17           3  13/11/2100  a12diagyrago     9
18           4  24/05/2198  a12diagyrago   NaN
19           5  30/03/2071  a12diagyrago   NaN

这不是 pd.wide_to_long 的用例,因为它会生成不正确的输出。您必须使用 stubnames,这些将被转换为列 (a11 & a12)。参见示例:

melt = pd.wide_to_long(df, 
                i=['subject_ID', 'date_visit'], 
                stubnames=['a11', 'a12'], 
                suffix='\D+', 
                j='fever_diag').reset_index()
    subject_ID  date_visit fever_diag  a11     a12
0            1    1/1/2020    diagage  NaN    36.0
1            1    1/1/2020     diagyr  NaN  2021.0
2            1    1/1/2020  diagyrago  NaN     6.0
3            1    1/1/2020      fever  Yes     NaN
4            2    3/3/2200    diagage  NaN    34.0
5            2    3/3/2200     diagyr  NaN  3213.0
6            2    3/3/2200  diagyrago  NaN     NaN
7            2    3/3/2200      fever   No     NaN
8            3  13/11/2100    diagage  NaN    42.0
9            3  13/11/2100     diagyr  NaN  2091.0
10           3  13/11/2100  diagyrago  NaN     9.0
11           3  13/11/2100      fever  Yes     NaN
12           4  24/05/2198    diagage  NaN    40.0
13           4  24/05/2198     diagyr  NaN  4567.0
14           4  24/05/2198  diagyrago  NaN     NaN
15           4  24/05/2198      fever  Yes     NaN
16           5  30/03/2071    diagage  NaN     NaN
17           5  30/03/2071     diagyr  NaN  8901.0
18           5  30/03/2071  diagyrago  NaN     NaN
19           5  30/03/2071      fever   No     NaN