How to create a frequency table of each subject from a given timetable using pandas?
Here is a timetable: columns = hours, rows = weekdays, data = subjects [weekday x hour]
           1                      2                      3                      4                      5                      6                      7
Name
Monday     Project                Project                Project                Data Science           Embedded Systems       Data Mining            Industrial Psychology
Tuesday    Project                Project                Project                Project                Data Science           Industrial Psychology  Embedded Systems
Wednesday  Data Science           Project                Project                Project                Project                Project                Project
Thursday   Data Mining            Industrial Psychology  Embedded Systems       Data Mining            Project                Project                Project
Friday     Industrial Psychology  Embedded Systems       Data Science           Data Mining            Project                Project                Project
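How can I generate a pandas.DataFrame where rows = weekdays, columns = subjects, and data = the frequency of each subject on the corresponding weekday?
Required table: [weekday x subject]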
           Data Mining  Data Science  Embedded Systems  Industrial Psychology  Project
Name
Monday     1            1             1                 1                      3
Tuesday    ...
Wednesday
Thursday
Friday
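# What I have tried so far (inside a class, hence `self`):
self.file = 'timetable.csv'
self.sdf = pd.read_csv(self.file, header=0, index_col="Name")
print(self.sdf.to_string())
# apply() works column by column here, so this counts subjects per hour, not per weekday
self.subject_frequency = self.sdf.apply(pd.value_counts)
print(self.subject_frequency.to_string())
self.subject_frequency["sum"] = self.subject_frequency.sum(axis=1)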
Use melt to flatten your dataframe, then pivot_table to reshape it:
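out = (
    # long format: one row per (weekday, hour) with the subject in 'Data'
    df.melt(var_name='Freq', value_name='Data', ignore_index=False)
      # count the hours for each (weekday, subject) pair; missing pairs become 0
      .pivot_table('Freq', 'Name', 'Data', fill_value=0, aggfunc='count')
      .loc[df.index]  # restore the original row order: Monday > Tuesday > ...
)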
Output:
>>> out
Data       Data Mining  Data Science  Embedded Systems  Industrial Psychology  Project
Name
Monday               1             1                 1                      1        3
Tuesday              0             1                 1                      1        4
Wednesday            0             1                 0                      0        6
Thursday             2             0                 1                      1        3
Friday               1             1                 1                      1        3
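For comparison, here is a self-contained sketch of two other ways to get the same counts. It is not part of the answer above: the timetable is rebuilt inline instead of being read from timetable.csv, and the names `timetable`, `long`, `freq` and `alt` are only illustrative. The first variant uses pd.crosstab on the melted frame; the second is the row-wise form of the value_counts attempt from the question.

```python
import pandas as pd

# Timetable from the question, rebuilt inline (assumption: same content as timetable.csv)
timetable = {
    "Monday": ["Project", "Project", "Project", "Data Science",
               "Embedded Systems", "Data Mining", "Industrial Psychology"],
    "Tuesday": ["Project", "Project", "Project", "Project",
                "Data Science", "Industrial Psychology", "Embedded Systems"],
    "Wednesday": ["Data Science", "Project", "Project", "Project",
                  "Project", "Project", "Project"],
    "Thursday": ["Data Mining", "Industrial Psychology", "Embedded Systems",
                 "Data Mining", "Project", "Project", "Project"],
    "Friday": ["Industrial Psychology", "Embedded Systems", "Data Science",
               "Data Mining", "Project", "Project", "Project"],
}
df = pd.DataFrame.from_dict(timetable, orient="index", columns=range(1, 8))
df.index.name = "Name"

# Variant 1: melt to long format, then cross-tabulate weekday against subject.
long = df.melt(ignore_index=False, value_name="Subject").reset_index()
freq = pd.crosstab(long["Name"], long["Subject"]).reindex(df.index)

# Variant 2: count values per row (axis=1) instead of per column,
# then fill the subjects that do not occur on a given day with 0.
alt = (
    df.apply(pd.Series.value_counts, axis=1)
      .fillna(0)
      .astype(int)
      .sort_index(axis=1)
)

print(freq)
```

crosstab sorts the weekdays alphabetically, which is why the result is reindexed with df.index at the end.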