Python - Pandas - Fill missing data based on existing levels
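- I use pandas and fetch my data from a SQL database.
- I have two tickers: one is a U.S. stock, the other a European stock. The two stocks do not necessarily trade on the same dates (holidays, etc.).
- All my data is stored in a single MultiIndex DataFrame.
- I want to fill missing values per level (per ticker).
Running the code below: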
import pandas as pd
import datetime
# Two tickers with different dates: EU has no row for today - 2 days
ticker_date = [('US', datetime.date.today() - datetime.timedelta(3)),
               ('US', datetime.date.today() - datetime.timedelta(2)),
               ('US', datetime.date.today() - datetime.timedelta(1)),
               ('EU', datetime.date.today() - datetime.timedelta(3)),
               ('EU', datetime.date.today() - datetime.timedelta(1))]
index_df = pd.MultiIndex.from_tuples(ticker_date)
example = pd.DataFrame([12.2, 12.5, 12.6, 15.1, 15], index_df, ['value'])
Output:
[Image: output from the code above]
I'm looking for a way to reshape my output so that missing data is filled with the previous value:
[Image: objective: add a Dec 11th row and fill it with the previous value]
I would do it like this:
In [24]: idx = pd.MultiIndex.from_product((
    example.index.get_level_values(0).unique(),
    example.index.get_level_values(1).unique()))
In [25]: example = example.reindex(idx).ffill()
In [26]: example
Out[26]:
               value
US 2017-12-10   12.2
   2017-12-11   12.5
   2017-12-12   12.6
EU 2017-12-10   15.1
   2017-12-11   15.1
   2017-12-12   15.0
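The two steps of this answer can be checked in isolation. The sketch below assumes fixed dates (2017-12-10 to 2017-12-12) instead of datetime.date.today(), so the result is reproducible. It shows that MultiIndex.from_product builds every (ticker, date) pair, that reindex inserts the missing pair as a NaN row, and that only then does ffill() fill the gap.

```python
import datetime

import pandas as pd

# Fixed dates instead of date.today() so the result is reproducible (assumption).
d10, d11, d12 = (datetime.date(2017, 12, day) for day in (10, 11, 12))
ticker_date = [('US', d10), ('US', d11), ('US', d12),
               ('EU', d10), ('EU', d12)]
example = pd.DataFrame([12.2, 12.5, 12.6, 15.1, 15.0],
                       index=pd.MultiIndex.from_tuples(ticker_date),
                       columns=['value'])

# Cartesian product of all tickers and all dates seen anywhere in the frame.
idx = pd.MultiIndex.from_product((
    example.index.get_level_values(0).unique(),
    example.index.get_level_values(1).unique()))

# reindex adds the missing ('EU', 2017-12-11) row, with value NaN.
reindexed = example.reindex(idx)
print(reindexed[reindexed['value'].isna()])

# ffill then copies the previous row's value into the gap.
filled = reindexed.ffill()
print(filled.loc[('EU', d11), 'value'])  # 15.1
```

Here the previous row of the gap happens to belong to the same ticker, which is why the plain ffill() gives the expected answer in this case.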
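Be careful with this solution: ffill() runs over the whole frame and ignores the boundaries of the outer index level, so a value can carry over from one ticker into the next. For example,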
import pandas as pd
import datetime
# Same data, but EU now has no row for today - 3 days (its first date)
ticker_date = [('US', datetime.date.today() - datetime.timedelta(3)),
               ('US', datetime.date.today() - datetime.timedelta(2)),
               ('US', datetime.date.today() - datetime.timedelta(1)),
               ('EU', datetime.date.today() - datetime.timedelta(2)),
               ('EU', datetime.date.today() - datetime.timedelta(1))]
index_df = pd.MultiIndex.from_tuples(ticker_date)
example = pd.DataFrame([12.2, 12.5, 12.6, 15.1, 15], index_df, ['value'])
idx = pd.MultiIndex.from_product((
    example.index.get_level_values(0).unique(),
    example.index.get_level_values(1).unique()))
example = example.reindex(idx).ffill()
print(example)
produces:
               value
US 2019-11-23   12.2
   2019-11-24   12.5
   2019-11-25   12.6
EU 2019-11-23   12.6   <==
   2019-11-24   15.1
   2019-11-25   15.0
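One way to keep the fill inside each ticker is to group by the outer level before filling: groupby(level=0).ffill() forward-fills each ticker separately, so EU's first date stays NaN instead of inheriting the last US price (the row marked <== above). This is a minimal sketch, again assuming fixed dates (2019-11-23 to 2019-11-25) in place of today() so the result is reproducible.

```python
import datetime

import pandas as pd

# Fixed dates standing in for today()-3, -2, -1 (assumption, for reproducibility).
d23, d24, d25 = (datetime.date(2019, 11, day) for day in (23, 24, 25))
ticker_date = [('US', d23), ('US', d24), ('US', d25),
               ('EU', d24), ('EU', d25)]
example = pd.DataFrame([12.2, 12.5, 12.6, 15.1, 15.0],
                       index=pd.MultiIndex.from_tuples(ticker_date),
                       columns=['value'])

idx = pd.MultiIndex.from_product((
    example.index.get_level_values(0).unique(),
    example.index.get_level_values(1).unique()))

# Forward-fill within each outer-level group (ticker) only, so no value
# crosses from the last US row into the first EU row.
filled = example.reindex(idx).groupby(level=0).ffill()
print(filled)

# EU has no earlier observation for 2019-11-23, so that row stays NaN.
```

The leading NaN can then be dropped with dropna() or kept, depending on what the downstream code expects. GroupBy.ffill also takes a limit argument if long gaps should not be filled indefinitely.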