Extracting and Grouping Sets of Columns in a Pandas DataFrame

Question

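I have a DataFrame built from a CSV file of population statistics spanning many years. The columns are monthly intervals (1999-01, 1999-02 ... 2016-12) and the rows are different population centres around the world (e.g. London, Toronto, Boston):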

df = pd.DataFrame({'1999-01' : [100, 5000, 8000], '1999-02' : [200, 6000, 9000], '1999-03' : [300, 7000, 10000], ..., 'cities' : ['CityA', 'CityB', 'CityC' ...]})

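I want to split these columns by quarter. For each row I would take the average population of 1999-01, 1999-02 and 1999-03, store it in a new column "1999Q1", and repeat this for every 3 months: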

df_quarter = pd.DataFrame({'1999Q1' : [200, 6000, 9000], '1999Q2' : ..., 'cities' : ['CityA', 'CityB', 'CityC' ...]})

#Q1 corresponds to months 01-03, Q2 to months 04-06, Q3 to months 07-09, Q4 to months 10-12, all inclusive

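However, I'm having trouble working out how to express this as a query. I've been considering .groupby() followed by .agg(), but I'm not sure how to efficiently specify groups of 3 columns and iterate across the columns. Can anyone point me in the right direction?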
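One way to make the .groupby().agg() idea work is to move the city names into the index, transpose so the months become rows, and group those rows by a key computed from each month label. The sketch below assumes every month column is labelled 'YYYY-MM'; the helper name quarter_label is hypothetical, and the values for 1999-04 to 1999-06 are made up for illustration.

```python
import pandas as pd

# Small frame in the shape described above; months 04-06 hold made-up values.
df = pd.DataFrame({'1999-01': [100, 5000, 8000],
                   '1999-02': [200, 6000, 9000],
                   '1999-03': [300, 7000, 10000],
                   '1999-04': [400, 8000, 11000],
                   '1999-05': [500, 9000, 12000],
                   '1999-06': [600, 10000, 13000],
                   'cities': ['CityA', 'CityB', 'CityC']})


def quarter_label(month: str) -> str:
    """Hypothetical helper: map 'YYYY-MM' to 'YYYYQn' (01-03 -> Q1, ..., 10-12 -> Q4)."""
    year, mm = month.split('-')
    return f"{year}Q{(int(mm) - 1) // 3 + 1}"


# Keep the city names as the index so only the numeric month columns are aggregated.
monthly = df.set_index('cities')

# groupby calls quarter_label on each row label of the transposed frame (the months),
# so the three months of a quarter share one key; agg('mean') averages each group.
df_quarter = monthly.T.groupby(quarter_label).agg('mean').T
print(df_quarter)
```

Because the key comes from the label text rather than from column positions, this does not depend on the columns being sorted; calling .reset_index() afterwards turns the city names back into a regular column.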

Edit: suppose the columns are not dates but something more abstract, so simple time-period resampling can't be used. For example:

#Prices of different foods from different vendors
df = pd.DataFrame({'oranges' : [2, 3, 7], 'apples' : [6, 3, 9], 'cheese' : [13, 9, 11], 'milk' : [6, 5, 12], 'vendors' : ['VendorA', 'VendorB', 'VendorC']})

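Now, if I want to create two columns, one combining the fruits and one combining the dairy products, is there a way to specify which indices (column labels) to aggregate?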
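For arbitrary labels, groupby also accepts a dict that maps each label to a group name. The sketch below assumes "combining" means averaging (swap .mean() for .sum() if totals are wanted); the groups dict and its 'fruit'/'dairy' names are hypothetical.

```python
import pandas as pd

# Data from the edit above: prices of different foods from different vendors.
df = pd.DataFrame({'oranges': [2, 3, 7], 'apples': [6, 3, 9],
                   'cheese': [13, 9, 11], 'milk': [6, 5, 12],
                   'vendors': ['VendorA', 'VendorB', 'VendorC']})

# Hypothetical mapping from each column label to the group it belongs to.
groups = {'oranges': 'fruit', 'apples': 'fruit',
          'cheese': 'dairy', 'milk': 'dairy'}

prices = df.set_index('vendors')
# Transpose so the food names become row labels, group them through the dict,
# average each group, then transpose back so vendors are rows again.
combined = prices.T.groupby(groups).mean().T
print(combined)
```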

Answer 1

You can aggregate with mean:

First convert the column labels with to_datetime and then to monthly periods with to_period, then resample across the columns (axis=1) at quarterly frequency:

import pandas as pd

df = pd.DataFrame({'1999-01':[4,5,4,5,5,4],
                   '1999-02':[7,8,9,4,2,3],
                   '1999-03':[1,3,5,7,1,0],
                   '1999-04':[1,3,5,7,1,0],
                   '1999-05':[5,3,6,9,2,4]}, index=list('abcdef'))

print (df)
   1999-01  1999-02  1999-03  1999-04  1999-05
a        4        7        1        1        5
b        5        8        3        3        3
c        4        9        5        5        6
d        5        4        7        7        9
e        5        2        1        1        2
f        4        3        0        0        4

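# Parse the 'YYYY-MM' labels as dates, then convert them to monthly periods
df.columns = pd.to_datetime(df.columns).to_period('M')
# Resample across the columns (axis=1) to quarters and average each quarter;
# the axis keyword of resample is deprecated in pandas 2.1 and later
df = df.resample('Q', axis=1).mean()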

print (df)
     1999Q1  1999Q2
a  4.000000     3.0
b  5.333333     3.0
c  6.000000     5.5
d  5.333333     8.0
e  2.666667     1.5
f  2.333333     2.0
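Newer pandas releases (2.1+) deprecate the axis keyword of resample. A sketch that should give the same quarterly means without it converts the labels directly to quarterly periods and groups the transposed frame; this is a swapped-in groupby technique, not the answer's resample call.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({'1999-01': [4, 5, 4, 5, 5, 4],
                   '1999-02': [7, 8, 9, 4, 2, 3],
                   '1999-03': [1, 3, 5, 7, 1, 0],
                   '1999-04': [1, 3, 5, 7, 1, 0],
                   '1999-05': [5, 3, 6, 9, 2, 4]}, index=list('abcdef'))

# Map each 'YYYY-MM' label straight to its quarterly period (e.g. 1999Q1).
quarters = pd.to_datetime(df.columns).to_period('Q')

# Rows of the transposed frame are the months; group them by quarter,
# average each group, and transpose back so a..f are rows again.
out = df.T.groupby(quarters).mean().T
print(out)
```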

