Predicting the future with pandas and statsmodels

Question

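What I need to do is plot future temperatures according to these requirements: "Assume that temperature is approximately a linear function of CO2 emissions, and estimate the coefficients of the linear function from the most recent data points (using the last 2 is fine; if you want to be more thorough, you can use the last 10 or so). In addition, assume that CO2 emissions will keep growing at the same rate as today (i.e., if 2016 CO2 emissions are X tons more than in 2015, then 2017 CO2 emissions will be X tons more than in 2016)."

I have 2 datasets: one with the temperature for each month of every year, and the other with the carbon level for each year.

(I've posted a merged and shortened version, since it's not that large. If seeing the unmodified datasets would be more helpful, I can post those too; you can see how the merge was done in my code below.)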

Year    Carbon    June

2000    6727  20.386
2001    6886  20.445
2002    6946  20.662
2003    7367  20.343
2004    7735  20.242
2005    8025  20.720
2006    8307  20.994
2007    8488  20.661
2008    8738  20.657
2009    8641  20.548
2010    9137  21.027
2011    9508  20.915
2012    9671  21.172

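What I have done so far is merge the two datasets and then try to predict the temperature of one month for future years. I restricted it to 2000-2012 to keep things simple and to make sure both tables are the same length, since one table is longer than the other. I'm quite new to Python and to coding in general, and I don't know how to proceed. Below you can see what I've tried so far: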

data1 = pd.read_csv("co2.csv", sep=',')
data2 = pd.read_csv("temperature.csv", sep=',')

data1 = data1.set_index('Year')
data2 = data2.set_index('Year')

data3 = data1.loc["2000":"2012"]

data4 = data2.loc["2000":"2012"]

data4 = data4.loc[:, "June":"June"]

data5 = pd.merge(data3,data4, how= 'left', left_index =True , right_index=True)

x = data5["Carbon"]

y = data5["June"]

model = sm.OLS(y,x).fit()

prediction = model.predict(x)

prediction.plot()


plt.show()

Answer 1

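The method `OLS.predict` does not take `x` as its argument; it takes the model parameters (and, optionally, exogenous data). The results object returned by `.fit()` has its own `predict`, which takes the exogenous data directly. Moreover, you have to add a constant to X, otherwise the linear regression is forced to pass through the origin. Here is an example: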

import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from io import StringIO

data = StringIO("""
Year Carbon June
2000 6727 20.386
2001 6886 20.445
2002 6946 20.662
2003 7367 20.343
2004 7735 20.242
2005 8025 20.720
2006 8307 20.994
2007 8488 20.661
2008 8738 20.657
2009 8641 20.548
2010 9137 21.027
2011 9508 20.915
2012 9671 21.172
""")

# Model training
df = pd.read_csv(data, index_col=0, sep=r'\s+')
Y_train = df['June']
X_train = df['Carbon']
X_train = sm.add_constant(X_train) # add this to your code
model = sm.OLS(Y_train, X_train)
results = model.fit()

# Prediction of future values
future_carbon = range(9700, 10000, 50)
X_pred = pd.DataFrame(data=future_carbon, columns=['Carbon'])
X_pred = sm.add_constant(X_pred)
prediction = model.predict(results.params, X_pred)

# Plot
plt.figure()
plt.plot(X_train['Carbon'], model.predict(results.params), '-r', label='Linear model')
plt.plot(X_pred['Carbon'], prediction, '--r', label='Linear prediction')
plt.scatter(df['Carbon'], df['June'], label='data')
plt.xlabel('Carbon')
plt.ylabel('June temperature')
plt.legend()
plt.show()

