"Un-melt" a DataFrame and keep the rest of the columns? Python Pandas
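I have a table in the format below, and I want to do the "opposite" of a melt. There is another question that addresses this, but its answer does not work with the many other columns I want to keep.
Original: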
COUNTRY  STATE  CATEGORY  RESTAURANT     STARS  REVIEWS
US       Texas  NaN       Texas Chicken  4.1    1,157
US       Texas  Spicy     Texas Chicken  4.1    1,157
US       Ohio   NaN       Mamas Shop     3.6    700
US       Ohio   NaN       Pizza Hut      4.5    855
US       Ohio   Pizza     Pizza Hut      4.5    855
Desired output:
COUNTRY  STATE  RESTAURANT     STARS  REVIEWS  SPICY  PIZZA
US       Texas  Texas Chicken  4.1    1,157    1      0
US       Ohio   Mamas Shop     3.6    700      0      0
US       Ohio   Pizza Hut      4.5    855      0      1
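Basically, I want to "group by" many columns while creating extra columns from the values in the CATEGORY column. In each of these extra columns, a restaurant that does not belong to that category should get a 0. I also don't want any extra column levels, since I intend to write the whole table out afterwards.
Any help with this is much appreciated, and thanks in advance!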
A combination of set_index, crosstab and reindex can "unmelt" the DataFrame and also handles the null values present in it:
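import pandas as pd

# set aside the required MultiIndex of country, state, restaurant, stars and reviews;
# de-duplicate it here rather than calling drop_duplicates() on the counts, which
# would merge different restaurants that happen to share the same counts
ind = df.set_index(['COUNTRY','STATE','RESTAURANT','STARS','REVIEWS']).index.unique()
# get the frequency count for Pizza and Spicy (rows with a NaN CATEGORY are skipped)
res = pd.crosstab([df.COUNTRY,df.STATE,df.RESTAURANT,df.STARS,df.REVIEWS],df.CATEGORY)
# reindex the frequency table with ind so restaurants without a category get zeros
res = res.reindex(ind,fill_value=0)
res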
CATEGORY                                       Pizza  Spicy
COUNTRY  STATE  RESTAURANT     STARS  REVIEWS
US       Texas  Texas Chicken  4.1    1,157    0      1
         Ohio   Mamas Shop     3.6    700      0      0
                Pizza Hut      4.5    855      1      0
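To try the set_index / crosstab / reindex steps end to end, here is a minimal, self-contained sketch. The sample frame is retyped from the question (REVIEWS is kept as a string because of the thousands separator, which is an assumption about the real data), the index is de-duplicated up front with unique(), and the final reset_index / rename_axis step is an addition of mine to flatten the result into ordinary columns, as the question asks.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question; REVIEWS stays a string ("1,157").
df = pd.DataFrame({
    "COUNTRY": ["US"] * 5,
    "STATE": ["Texas", "Texas", "Ohio", "Ohio", "Ohio"],
    "CATEGORY": [np.nan, "Spicy", np.nan, np.nan, "Pizza"],
    "RESTAURANT": ["Texas Chicken", "Texas Chicken", "Mamas Shop",
                   "Pizza Hut", "Pizza Hut"],
    "STARS": [4.1, 4.1, 3.6, 4.5, 4.5],
    "REVIEWS": ["1,157", "1,157", "700", "855", "855"],
})

keys = ["COUNTRY", "STATE", "RESTAURANT", "STARS", "REVIEWS"]

# Every distinct key combination, in order of first appearance.
ind = df.set_index(keys).index.unique()

# crosstab skips rows whose CATEGORY is NaN, so Mamas Shop is missing here.
counts = pd.crosstab([df[k] for k in keys], df["CATEGORY"])

# Reindex to bring back the missing restaurants with zeros, then move the
# index levels into regular columns and drop the "CATEGORY" columns label.
flat = counts.reindex(ind, fill_value=0).reset_index().rename_axis(columns=None)
print(flat)
```

After this, flat has a single level of column labels (the five key columns followed by Pizza and Spicy), so it can be written out directly.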
I think this should work:
pd.crosstab([df.COUNTRY,df.STATE,df.RESTAURANT,df.STARS,df.REVIEWS],
df['CATEGORY'].fillna('_')).drop(columns='_')
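Here is a hedged, self-contained sketch of the fillna variant, again on the sample data retyped from the question. The PLACEHOLDER name, the errors="ignore" argument, the upper-casing of the category columns, and the final flattening step are my additions for illustration. The sketch assumes "_" never occurs as a real category.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question; REVIEWS stays a string ("1,157").
df = pd.DataFrame({
    "COUNTRY": ["US"] * 5,
    "STATE": ["Texas", "Texas", "Ohio", "Ohio", "Ohio"],
    "CATEGORY": [np.nan, "Spicy", np.nan, np.nan, "Pizza"],
    "RESTAURANT": ["Texas Chicken", "Texas Chicken", "Mamas Shop",
                   "Pizza Hut", "Pizza Hut"],
    "STARS": [4.1, 4.1, 3.6, 4.5, 4.5],
    "REVIEWS": ["1,157", "1,157", "700", "855", "855"],
})

keys = ["COUNTRY", "STATE", "RESTAURANT", "STARS", "REVIEWS"]

# Label for rows without a category; assumed not to clash with a real one.
PLACEHOLDER = "_"

out = (
    pd.crosstab([df[k] for k in keys], df["CATEGORY"].fillna(PLACEHOLDER))
    # errors="ignore" avoids a KeyError when no row lacks a category.
    .drop(columns=PLACEHOLDER, errors="ignore")
    # Match the upper-case column names in the desired output.
    .rename(columns=str.upper)
    # Turn the index levels into plain columns and drop the columns label.
    .reset_index()
    .rename_axis(columns=None)
)
print(out)
```

Two things to keep in mind: crosstab sorts the rows by the key columns rather than keeping the original order, and the values are counts, so a restaurant listed twice under the same category would show 2 rather than 1.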