Merge levels of the same variable that are in consecutive columns
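I have a CSV data file with two header rows: the first header row holds the questions, and the second holds the sub-headers, that is, the possible levels (answers) for each question. The current CSV looks like this: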
Header  Which country do you live?       Which country you previously visited?
Users   Canada  USA  UK  Mexico  Norway  India  Singapore  Pakistan
User 1  Canada                                  Singapore
User 2               UK                  India
User 3                   Mexico                            Pakistan
User 4                           Norway  India
I need to convert it into the following table:
Users   Which country do you live?  Which country you previously visited?
User 1  Canada                      Singapore
User 2  UK                          India
User 3  Mexico                      Pakistan
User 4  Norway                      India
Can someone help me solve this?
First back-fill the missing values along each row with bfill, then select the first column, and finally remove the second level of the MultiIndex with DataFrame.droplevel:
print (df.columns)
MultiIndex(levels=[['Header', 'Which country do you live?'],
['Canada', 'Mexico', 'UK', 'USA', 'Users']],
codes=[[0, 1, 1, 1, 1], [4, 0, 3, 2, 1]])
# If the first column is not the index yet, make it the index:
# df = df.set_index([df.columns[0]])
# If the missing values are empty strings, replace them with NaN first (needs: import numpy as np):
# df = df.replace('', np.nan)
df = df.bfill(axis=1).iloc[:, 0].reset_index().droplevel(level=1, axis=1)
print(df)
Header Which country do you live?
0 User 1 Canada
1 User 2 UK
2 User 3 Mexico
3 User 4 Norway
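To try the bfill step without the original file, here is a minimal self-contained sketch. The frame is a hypothetical reconstruction of the question's data for the first question only, with "Users" already set as the index, and it names the result with `rename` rather than `droplevel`, so treat it as one possible variant and not the exact code above.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data (first question only).
question = "Which country do you live?"
countries = ["Canada", "USA", "UK", "Mexico", "Norway"]
columns = pd.MultiIndex.from_tuples([(question, c) for c in countries])
index = pd.Index(["User 1", "User 2", "User 3", "User 4"], name="Users")
data = [
    ["Canada", np.nan, np.nan, np.nan, np.nan],
    [np.nan, np.nan, "UK", np.nan, np.nan],
    [np.nan, np.nan, np.nan, "Mexico", np.nan],
    [np.nan, np.nan, np.nan, np.nan, "Norway"],
]
df = pd.DataFrame(data, index=index, columns=columns)

# Back-fill along each row so the single answer lands in the first column.
filled = df.bfill(axis=1)
# The first column now holds every user's answer.
answer = filled.iloc[:, 0]
# The Series name is a (question, country) tuple; keep only the question.
answer = answer.rename(answer.name[0])
# Turn the "Users" index back into a regular column.
result = answer.reset_index()
print(result)
```

Because each row has exactly one non-missing cell per question, back-filling and taking the first column is the same as picking that one cell.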
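Edit: to handle several questions at once, group the columns by the first level of the MultiIndex and apply the same logic to each group:
# Note: groupby(..., axis=1) is deprecated in pandas 2.1 and later.
df = df.groupby(level=0, axis=1).apply(lambda x: x.bfill(axis=1).iloc[:, 0])
print(df)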
Header Which country do you live? Which country you previously visited?
0 User 1 Canada Singapore
1 User 2 UK India
2 User 3 Mexico Pakistan
3 User 4 Norway India
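Since recent pandas releases deprecate `groupby(..., axis=1)`, here is a sketch of an alternative that applies the same bfill logic per question with a plain dictionary comprehension. The data is again a hypothetical reconstruction of the question's table, and the variable names (`live`, `visited`, `answers`) are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's two-level header.
live = "Which country do you live?"
visited = "Which country you previously visited?"
countries_live = ["Canada", "USA", "UK", "Mexico", "Norway"]
countries_visited = ["India", "Singapore", "Pakistan"]
columns = pd.MultiIndex.from_tuples(
    [(live, c) for c in countries_live] + [(visited, c) for c in countries_visited]
)

# One (lived, visited) pair per user, taken from the question's table.
answers = {
    "User 1": ("Canada", "Singapore"),
    "User 2": ("UK", "India"),
    "User 3": ("Mexico", "Pakistan"),
    "User 4": ("Norway", "India"),
}
# Each row is NaN everywhere except in the cell matching the answer.
rows = [
    [c if c == home else np.nan for c in countries_live]
    + [c if c == trip else np.nan for c in countries_visited]
    for home, trip in answers.values()
]
df = pd.DataFrame(rows, index=pd.Index(list(answers), name="Users"), columns=columns)

# For every question (first column level), back-fill its sub-columns
# along the rows and keep the first one, which then holds the answer.
result = pd.DataFrame(
    {
        question: df[question].bfill(axis=1).iloc[:, 0]
        for question in df.columns.get_level_values(0).unique()
    }
)
print(result)
```

The result keeps "Users" as the index; call `result.reset_index()` if it should be a regular column as in the output above.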