如何按数据框中的列分组并在循环中创建数据透视表
How to group by column in a dataframe and create pivot tables in a loop
我有以下 table df
.
ID CATEG LEVEL COLS VALUE COMMENT
1 A 3 Apple 388 comment1
1 A 3 Orange 204 comment1
1 A 2 Orange 322 comment1
1 A 1 Orange 716 comment1
1 A 1 Apple 282 comment1
1 A 2 Apple 555 comment1
1 A Berry 289 comment1
2 A Car 316 comment1
1 B Berry 297 comment1
1 B 3 Apple 756 comment1
1 B 2 Apple 460 comment1
1 B 3 Orange 497 comment1
1 B 2 Orange 831 comment1
1 B 1 Orange 225 comment1
1 B 1 Apple 395 comment1
2 B Car 486 comment1
1 C 2 Orange 320 comment1
1 C 1 Orange 208 comment1
1 C 1 Apple 464 comment1
1 C 2 Apple 613 comment1
1 C 3 Apple 369 comment1
1 C Berry 474 comment1
2 C Car 888 comment1
1 C 3 Orange 345 comment1
2 B Car 664 comment2
我想在 dataframe
中创建此视图并在 excel 中为每组 ID
写入。ID
1
的示例。在我的示例中,只有一个评论,所以 sheet 名称像 ID_COMMENT
像 1_comment1
:-
Berry Apple Orange
1 2 3 1 2 3
A 289 388 555 282 204 322 716
B 297 756 460 395 497 831 225
C 474 369 613 464 345 320 208
如果 LEVEL
是 None/na
我应该能够根据 COLS
和 comments
创建/拆分 df
单独使用名称“ID_NULL_COMMENT" as sheet 名称如:-
2_NULL_comment1
sheet :-
CATEG Car
A 316
B 486
C 888
2_NULL_comment2
sheet :-
CATEG Car
B 664
我尝试了什么:
from pandas import ExcelWriter
writer = ExcelWriter('Values.xlsx')
distinct_id_df= np.unique(df[['ID']], axis=0)
for ID in distinct_id_df.iloc[:,0] :
sample_df = pd.DataFrame()
for df in sample_df:
for i in(distinct_id_df):
distinct_id_df = df.groupby['ID'].pivot_table('VALUE', ['LEVEL','CATEEG'],'COLS')
sample_df = sample_df.append(df)
print(sample_df.shape, '===>', datetime.now())
sample_df.to_excel(writer,'{}''{}'.format(id).format(comments),index= False)
writer.save()
这显然不正确,我无法正确执行 pivot
并且还停留在如何正确循环以放置在不同的 sheet.
使用:
df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2], 'CATEG': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'B'], 'LEVEL': [3.0, 3.0, 2.0, 1.0, 1.0, 2.0, np.nan, np.nan, np.nan, 3.0, 2.0, 3.0, 2.0, 1.0, 1.0, np.nan, 2.0, 1.0, 1.0, 2.0, 3.0, np.nan, np.nan, 3.0, np.nan], 'COLS': ['Apple', 'Orange', 'Orange', 'Orange', 'Apple', 'Apple', 'Berry', 'Car', 'Berry', 'Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Apple', 'Car', 'Orange', 'Orange', 'Apple', 'Apple', 'Apple', 'Berry', 'Car', 'Orange', 'Car'], 'VALUE': [388, 204, 322, 716, 282, 555, 289, 316, 297, 756, 460, 497, 831, 225, 395, 486, 320, 208, 464, 613, 369, 474, 888, 345, 664], 'COMMENT': ['comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment2']})
#check misisng values
mask = df['LEVEL'].isna()
#split DataFrames for different processing
df1 = df[~mask]
df2 = df[mask]
#pivoting with differnet columns parameters
df1 = df1.pivot_table(index=['ID','COMMENT','CATEG'],
columns=['COLS','LEVEL'],
values='VALUE')
# print (df1)
df2 = df2.pivot_table(index=['ID','COMMENT','CATEG'], columns='COLS',values='VALUE')
# print (df1)
from pandas import ExcelWriter
with pd.ExcelWriter('Values.xlsx') as writer:
#groupby by first 2 levels ID, COMMENT
for (ids,comments), sample_df in df1.groupby(['ID','COMMENT']):
#removed first 2 levels, also removed only NaNs columns
df = sample_df.reset_index(level=[1], drop=True).dropna(how='all', axis=1)
#new sheetnames by f-strings
name = f'{ids}_{comments}'
#write to file
df.to_excel(writer,sheet_name=name)
for (ids,comments), sample_df in df2.groupby(['ID','COMMENT']):
df = sample_df.reset_index(level=[1], drop=True).dropna(how='all', axis=1)
name = f'{ids}_NULL_{comments}'
df.to_excel(writer,sheet_name=name)
无重复代码的另一种解决方案:
mask = df['LEVEL'].isna()
dfs = {'no_null': df[~mask], 'null': df[mask]}
from pandas import ExcelWriter
with pd.ExcelWriter('Values.xlsx') as writer:
for k, v in dfs.items():
if k == 'no_null':
add = ''
cols = ['COLS','LEVEL']
else:
add = 'NULL_'
cols = 'COLS'
df = v.pivot_table(index=['ID','COMMENT','CATEG'], columns=cols, values='VALUE')
for (ids,comments), sample_df in df.groupby(['ID','COMMENT']):
df = sample_df.reset_index(level=[1], drop=True).dropna(how='all', axis=1)
name = f'{ids}_{add}{comments}'
df.to_excel(writer,sheet_name=name)
我有以下 table df
.
ID CATEG LEVEL COLS VALUE COMMENT
1 A 3 Apple 388 comment1
1 A 3 Orange 204 comment1
1 A 2 Orange 322 comment1
1 A 1 Orange 716 comment1
1 A 1 Apple 282 comment1
1 A 2 Apple 555 comment1
1 A Berry 289 comment1
2 A Car 316 comment1
1 B Berry 297 comment1
1 B 3 Apple 756 comment1
1 B 2 Apple 460 comment1
1 B 3 Orange 497 comment1
1 B 2 Orange 831 comment1
1 B 1 Orange 225 comment1
1 B 1 Apple 395 comment1
2 B Car 486 comment1
1 C 2 Orange 320 comment1
1 C 1 Orange 208 comment1
1 C 1 Apple 464 comment1
1 C 2 Apple 613 comment1
1 C 3 Apple 369 comment1
1 C Berry 474 comment1
2 C Car 888 comment1
1 C 3 Orange 345 comment1
2 B Car 664 comment2
我想在 dataframe
中创建此视图并在 excel 中为每组 ID
写入。ID
1
的示例。在我的示例中,只有一个评论,所以 sheet 名称像 ID_COMMENT
像 1_comment1
:-
Berry Apple Orange
1 2 3 1 2 3
A 289 388 555 282 204 322 716
B 297 756 460 395 497 831 225
C 474 369 613 464 345 320 208
如果 LEVEL
是 None/na
我应该能够根据 COLS
和 comments
创建/拆分 df
单独使用名称“ID_NULL_COMMENT" as sheet 名称如:-
2_NULL_comment1
sheet :-
CATEG Car
A 316
B 486
C 888
2_NULL_comment2
sheet :-
CATEG Car
B 664
我尝试了什么:
from pandas import ExcelWriter
writer = ExcelWriter('Values.xlsx')
distinct_id_df= np.unique(df[['ID']], axis=0)
for ID in distinct_id_df.iloc[:,0] :
sample_df = pd.DataFrame()
for df in sample_df:
for i in(distinct_id_df):
distinct_id_df = df.groupby['ID'].pivot_table('VALUE', ['LEVEL','CATEEG'],'COLS')
sample_df = sample_df.append(df)
print(sample_df.shape, '===>', datetime.now())
sample_df.to_excel(writer,'{}''{}'.format(id).format(comments),index= False)
writer.save()
这显然不正确,我无法正确执行 pivot
并且还停留在如何正确循环以放置在不同的 sheet.
使用:
df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2], 'CATEG': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'B'], 'LEVEL': [3.0, 3.0, 2.0, 1.0, 1.0, 2.0, np.nan, np.nan, np.nan, 3.0, 2.0, 3.0, 2.0, 1.0, 1.0, np.nan, 2.0, 1.0, 1.0, 2.0, 3.0, np.nan, np.nan, 3.0, np.nan], 'COLS': ['Apple', 'Orange', 'Orange', 'Orange', 'Apple', 'Apple', 'Berry', 'Car', 'Berry', 'Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Apple', 'Car', 'Orange', 'Orange', 'Apple', 'Apple', 'Apple', 'Berry', 'Car', 'Orange', 'Car'], 'VALUE': [388, 204, 322, 716, 282, 555, 289, 316, 297, 756, 460, 497, 831, 225, 395, 486, 320, 208, 464, 613, 369, 474, 888, 345, 664], 'COMMENT': ['comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment2']})
#check misisng values
mask = df['LEVEL'].isna()
#split DataFrames for different processing
df1 = df[~mask]
df2 = df[mask]
#pivoting with differnet columns parameters
df1 = df1.pivot_table(index=['ID','COMMENT','CATEG'],
columns=['COLS','LEVEL'],
values='VALUE')
# print (df1)
df2 = df2.pivot_table(index=['ID','COMMENT','CATEG'], columns='COLS',values='VALUE')
# print (df1)
from pandas import ExcelWriter
with pd.ExcelWriter('Values.xlsx') as writer:
#groupby by first 2 levels ID, COMMENT
for (ids,comments), sample_df in df1.groupby(['ID','COMMENT']):
#removed first 2 levels, also removed only NaNs columns
df = sample_df.reset_index(level=[1], drop=True).dropna(how='all', axis=1)
#new sheetnames by f-strings
name = f'{ids}_{comments}'
#write to file
df.to_excel(writer,sheet_name=name)
for (ids,comments), sample_df in df2.groupby(['ID','COMMENT']):
df = sample_df.reset_index(level=[1], drop=True).dropna(how='all', axis=1)
name = f'{ids}_NULL_{comments}'
df.to_excel(writer,sheet_name=name)
无重复代码的另一种解决方案:
mask = df['LEVEL'].isna()
dfs = {'no_null': df[~mask], 'null': df[mask]}
from pandas import ExcelWriter
with pd.ExcelWriter('Values.xlsx') as writer:
for k, v in dfs.items():
if k == 'no_null':
add = ''
cols = ['COLS','LEVEL']
else:
add = 'NULL_'
cols = 'COLS'
df = v.pivot_table(index=['ID','COMMENT','CATEG'], columns=cols, values='VALUE')
for (ids,comments), sample_df in df.groupby(['ID','COMMENT']):
df = sample_df.reset_index(level=[1], drop=True).dropna(how='all', axis=1)
name = f'{ids}_{add}{comments}'
df.to_excel(writer,sheet_name=name)