Monthly Averages Using Daily Data Using Python Pandas

Question

I have a text file with four columns: year, month, day, and snow depth. It contains daily data for the 30-year period 1979-2009.

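I would like to use pandas to calculate the 360 (30 years × 12 months) individual monthly averages (that is, isolate all the values for Jan 1979, Feb 1979, ... Dec 2009, and average each month). Can anyone help me out with some example code?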

1979    1   1   3
1979    1   2   3
1979    1   3   3
1979    1   4   3
1979    1   5   3
1979    1   6   3
1979    1   7   4
1979    1   8   5
1979    1   9   7
1979    1   10  8
1979    1   11  16
1979    1   12  16
1979    1   13  16
1979    1   14  18
1979    1   15  18
1979    1   16  18
1979    1   17  18
1979    1   18  20
1979    1   19  20
1979    1   20  20
1979    1   21  20
1979    1   22  20
1979    1   23  18
1979    1   24  18
1979    1   25  18
1979    1   26  18
1979    1   27  18
1979    1   28  18
1979    1   29  18
1979    1   30  18
1979    1   31  19
1979    2   1   19
1979    2   2   19
1979    2   3   19
1979    2   4   19
1979    2   5   19
1979    2   6   22
1979    2   7   24
1979    2   8   27
1979    2   9   29
1979    2   10  32
1979    2   11  32
1979    2   12  32
1979    2   13  32
1979    2   14  33
1979    2   15  33
1979    2   16  33
1979    2   17  34
1979    2   18  36
1979    2   19  36
1979    2   20  36
1979    2   21  36
1979    2   22  36
1979    2   23  36
1979    2   24  31
1979    2   25  29
1979    2   26  27
1979    2   27  27
1979    2   28  27

Answer 1

You'll want to group your data by year and month, and then calculate the mean of each group. Roughly:

import pandas as pd

# Read the file into a pandas.DataFrame,
# using 'one or more whitespace characters' as the separator
df = pd.read_csv("snow.txt", sep=r"\s+", names=["year", "month", "day", "snow_depth"])

# Show the first 5 rows of the DataFrame
print(df.head())

# Group data first by year, then by month
g = df.groupby(["year", "month"])

# For each group, calculate the average of only the snow_depth column
monthly_averages = g.aggregate({"snow_depth": "mean"})

For more on the split-apply-combine approach in pandas, see the "Group by: split-apply-combine" section of the pandas user guide.
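
To make the split-apply-combine steps concrete, here is a minimal, self-contained sketch. It assumes the four-column layout from the question. Instead of reading snow.txt, it parses five rows copied from the question's sample through io.StringIO so that it runs as-is. The variable names (sample, monthly) are illustrative only.

```python
import io

import pandas as pd

# Five rows copied from the question's sample (year, month, day, snow depth).
sample = """\
1979    1   6   3
1979    1   7   4
1979    1   8   5
1979    2   8   27
1979    2   9   29
"""

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",
    header=None,
    names=["year", "month", "day", "snow_depth"],
)

# Split by (year, month), apply mean to snow_depth,
# and combine the results into one row per month.
monthly = df.groupby(["year", "month"])["snow_depth"].mean().reset_index()

print(monthly)
```

With these five rows the result has two rows: a mean of 4.0 for January 1979 and 28.0 for February 1979. On the full file the same call should give one row per (year, month) pair present in the data.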

A DataFrame is a:

"Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)."

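For your purposes, the difference between a numpy ndarray and a DataFrame isn't too significant, but DataFrames have many features that will make your life easier, so I suggest reading up on them.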
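
As one illustration of such a feature (not part of the original answer), the year, month, and day columns can be combined into real dates and grouped by calendar month. This is a sketch under the assumption that the columns are named year, month, and day. The three rows are copied from the question's sample.

```python
import pandas as pd

# Three rows copied from the question's sample.
df = pd.DataFrame(
    {
        "year": [1979, 1979, 1979],
        "month": [1, 1, 2],
        "day": [1, 31, 1],
        "snow_depth": [3, 19, 19],
    }
)

# pd.to_datetime assembles dates from columns named year, month, and day.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Group by calendar month (as a monthly Period) and average the snow depth.
monthly = df.groupby(df["date"].dt.to_period("M"))["snow_depth"].mean()

print(monthly)
```

The result is indexed by monthly periods (1979-01, 1979-02) rather than by a (year, month) pair, which can be convenient for plotting or for selecting date ranges later.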

python

time-series

pandas