How can I turn data imported into Python from a CSV file into a time series?
I want to convert data that I imported into Python from a .csv file into a time series.
GDP = pd.read_csv('GDP.csv')
GDP
Out[87]:
GDP growth (%)
0 0.5
1 -5.2
2 -7.9
3 -9.1
4 -10.3
5 -8.8
6 -7.4
7 -10.1
8 -8.4
9 -8.7
10 -7.9
11 -4.1
Since data imported from a .csv file comes in as a DataFrame, I first tried converting it to a pd.Series:
GDP2 = pd.Series(data = GDP, index = pd.date_range(start = '01-2010', end = '01-2018', freq = 'Q'))
But this is what I got:
GDP2
Out[90]:
2010-03-31 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2010-06-30 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2010-09-30 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2010-12-31 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2011-03-31 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2011-06-30 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2011-09-30 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2011-12-31 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2012-03-31 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2012-06-30 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2012-09-30 (G, D, P, , g, r, o, w, t, h, , (, %, ))
2012-12-31 (G, D, P, , g, r, o, w, t, h, , (, %, ))
Something similar happened when I tried it through pd.DataFrame; every value came back as NaN:
GDP2 = pd.DataFrame(data = GDP, index = pd.date_range(start = '01-2010', end = '01-2018', freq = 'Q'))
GDP2
Out[92]:
GDP growth (%)
2010-03-31 NaN
2010-06-30 NaN
2010-09-30 NaN
2010-12-31 NaN
2011-03-31 NaN
2011-06-30 NaN
2011-09-30 NaN
2011-12-31 NaN
2012-03-31 NaN
2012-06-30 NaN
2012-09-30 NaN
The same goes for when I tried reindex():
dates = pd.date_range(start = '01-2010', end = '01-2018', freq = 'Q')
dates
Out[100]:
DatetimeIndex(['2010-03-31', '2010-06-30', '2010-09-30', '2010-12-31',
'2011-03-31', '2011-06-30', '2011-09-30', '2011-12-31',
'2012-03-31', '2012-06-30', '2012-09-30', '2012-12-31',
'2013-03-31', '2013-06-30', '2013-09-30', '2013-12-31',
'2014-03-31', '2014-06-30', '2014-09-30', '2014-12-31',
'2015-03-31', '2015-06-30', '2015-09-30', '2015-12-31',
'2016-03-31', '2016-06-30', '2016-09-30', '2016-12-31',
'2017-03-31', '2017-06-30', '2017-09-30', '2017-12-31'],
dtype='datetime64[ns]', freq='Q-DEC')
GDP.reindex(dates)
Out[101]:
GDP growth (%)
2010-03-31 NaN
2010-06-30 NaN
2010-09-30 NaN
2010-12-31 NaN
2011-03-31 NaN
2011-06-30 NaN
2011-09-30 NaN
2011-12-31 NaN
2012-03-31 NaN
2012-06-30 NaN
2012-09-30 NaN
2012-12-31 NaN
I am surely making some silly beginner mistake, and I would be grateful if anyone could help. Cheers.
Use set_index:
df
gdp
0 0.5
1 -5.2
2 -7.9
3 -9.1
4 -10.3
5 -8.8
6 -7.4
7 -10.1
8 -8.4
9 -8.7
10 -7.9
11 -4.1
df = df.set_index(pd.date_range(start = '01-2010', end = '01-2013',freq = 'Q'))
gdp
2010-03-31 0.5
2010-06-30 -5.2
2010-09-30 -7.9
2010-12-31 -9.1
2011-03-31 -10.3
2011-06-30 -8.8
2011-09-30 -7.4
2011-12-31 -10.1
2012-03-31 -8.4
2012-06-30 -8.7
2012-09-30 -7.9
2012-12-31 -4.1
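The set_index approach above can be sketched end to end as follows. This is a minimal sketch under a few assumptions: the CSV is replaced by an in-memory string holding the values from the question, the names csv_text, gdp, dates and gdp_ts are illustrative, and the date range is built with periods=len(gdp) and the pd.offsets.QuarterEnd() offset object instead of an end date and the 'Q' string alias (recent pandas releases prefer 'QE', so the offset object sidesteps the version difference).

```python
import io

import pandas as pd

# Stand-in for GDP.csv (assumption): a single value column and no date column.
csv_text = (
    "GDP growth (%)\n"
    "0.5\n-5.2\n-7.9\n-9.1\n-10.3\n-8.8\n"
    "-7.4\n-10.1\n-8.4\n-8.7\n-7.9\n-4.1\n"
)
gdp = pd.read_csv(io.StringIO(csv_text))

# One quarter-end date per row, starting with Q1 2010.
# Using periods=len(gdp) guarantees the index length matches the data.
dates = pd.date_range(
    start="2010-01-01",
    periods=len(gdp),
    freq=pd.offsets.QuarterEnd(),
)

# set_index swaps the default 0..11 labels for the dates, keeping values in order.
gdp_ts = gdp.set_index(dates)
print(gdp_ts)
```

Tying the number of periods to len(gdp) avoids the length mismatch that an end date of '01-2018' would cause with only 12 rows of data.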
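To fix your code, pass GDP.values so that the data is assigned by position instead of being aligned on the old 0..11 index: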
GDP2 = pd.DataFrame(data = GDP.values, index = pd.date_range(start = '01-2010', end = '01-2013',freq = 'Q'))
GDP2
Out[71]:
0
2010-03-31 0.5
2010-06-30 -5.2
2010-09-30 -7.9
2010-12-31 -9.1
2011-03-31 -10.3
2011-06-30 -8.8
2011-09-30 -7.4
2011-12-31 -10.1
2012-03-31 -8.4
2012-06-30 -8.7
2012-09-30 -7.9
2012-12-31 -4.1
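The reason the original attempts returned NaN is that pandas aligns labelled data on its index: the old labels 0..11 match none of the new dates. The sketch below illustrates this for a Series, assuming a shortened four-row frame and illustrative names (gdp, dates, aligned, positional); to_numpy() is used here as the equivalent of .values.

```python
import pandas as pd

# Shortened stand-in for the imported data (assumption: first four values only).
gdp = pd.DataFrame({"GDP growth (%)": [0.5, -5.2, -7.9, -9.1]})

dates = pd.date_range(
    start="2010-01-01",
    periods=len(gdp),
    freq=pd.offsets.QuarterEnd(),
)

# Labelled data plus a new index: pandas looks up each date in the old
# index (0, 1, 2, 3), finds no match, and fills every row with NaN.
aligned = pd.Series(gdp["GDP growth (%)"], index=dates)

# A bare NumPy array carries no labels, so values are assigned by position.
positional = pd.Series(
    gdp["GDP growth (%)"].to_numpy(),
    index=dates,
    name="GDP growth (%)",
)

print(aligned)
print(positional)
```

Selecting the single column before converting also yields a true Series rather than a one-column DataFrame, which is what the question was originally after.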