Pandas: Difference between two series with different length (unaligned index)
Consider the following two series:
sri = inp.groupby(inp.index.date)['value'].count()
2009-01-12 7
2009-01-14 3
and
sro = out.groupby(out.index.date)['value'].count()
2009-01-03 1
2009-01-09 14
2009-01-10 61
2009-01-11 93
2009-01-12 106
2009-01-13 123
2009-01-14 130
When we subtract one from the other with sro - sri, we get:
2009-01-03 NaN
2009-01-09 NaN
2009-01-10 NaN
2009-01-11 NaN
2009-01-12 99.0
2009-01-13 NaN
2009-01-14 127.0
But the output I want is:
2009-01-03 1.0
2009-01-04 0.0
2009-01-05 0.0
2009-01-06 0.0
2009-01-07 0.0
2009-01-08 0.0
2009-01-09 14.0
2009-01-10 61.0
2009-01-11 93.0
2009-01-12 99.0
2009-01-13 123.0
2009-01-14 127.0
We can use the following workaround to produce the desired result:
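from datetime import timedelta
import pandas as pd

start_date = '2009-01-03'
end_date = '2009-01-15'  # exclusive upper bound
# Empty frame whose index covers every calendar day in the range
df = pd.DataFrame(
    index=pd.date_range(pd.to_datetime(start_date),
                        pd.to_datetime(end_date) - timedelta(days=1), freq='D').date)
# Outer-merge both counts onto the full index; days without data become 0
df = df.merge(sro.to_frame(), how='outer', left_index=True, right_index=True) \
       .merge(sri.to_frame(), how='outer', left_index=True, right_index=True).fillna(0)
print(df['value_x'] - df['value_y'])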
Is there a more compact solution that produces the same output?
For the plain subtraction, a simple approach is to use sub with fill_value=0:
sro.sub(sri, fill_value=0).convert_dtypes()
Output:
2009-01-03 1
2009-01-09 14
2009-01-10 61
2009-01-11 93
2009-01-12 99
2009-01-13 123
2009-01-14 127
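To see why the plain `sro - sri` yields NaN and what `fill_value` changes, here is a minimal sketch that rebuilds the two series from the string-indexed input given in this answer and compares both forms. The variable names `plain` and `filled` are only illustrative.

```python
import pandas as pd

sri = pd.Series({'2009-01-12': 7, '2009-01-14': 3})
sro = pd.Series({'2009-01-03': 1, '2009-01-09': 14, '2009-01-10': 61,
                 '2009-01-11': 93, '2009-01-12': 106, '2009-01-13': 123,
                 '2009-01-14': 130})

# Plain subtraction aligns on the union of labels; a label that is
# missing from either side produces NaN in the result.
plain = sro - sri

# sub() with fill_value=0 substitutes 0 for the missing side
# before subtracting, so no NaN is produced here.
filled = sro.sub(sri, fill_value=0)

print(plain.isna().sum())    # labels present in only one of the series
print(filled.isna().sum())
print(filled)
```

The result of `sub` is still float-valued, which is why the answer appends `convert_dtypes()` to get integers back.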
To also add the missing dates to the index:
idx = sro.index.union(sri.index)
(sro.sub(sri, fill_value=0)
    .reindex(pd.date_range(idx.min(), idx.max()).astype(str), fill_value=0)
    .convert_dtypes()
)
Output:
2009-01-03 1
2009-01-04 0
2009-01-05 0
2009-01-06 0
2009-01-07 0
2009-01-08 0
2009-01-09 14
2009-01-10 61
2009-01-11 93
2009-01-12 99
2009-01-13 123
2009-01-14 127
Input used:
sri = pd.Series({'2009-01-12': 7, '2009-01-14': 3})
sro = pd.Series({'2009-01-03': 1, '2009-01-09': 14, '2009-01-10': 61, '2009-01-11': 93, '2009-01-12': 106, '2009-01-13': 123, '2009-01-14': 130})
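The input above uses string labels, whereas the `groupby(... .index.date)` in the question produces `datetime.date` labels. One possible variant for that case, not part of the original answer, is to convert both indices to a `DatetimeIndex` and let `asfreq('D', fill_value=0)` insert the missing days. The helper name `daily_diff` is hypothetical, and the sketch assumes neither series contains NaN values of its own.

```python
import datetime as dt
import pandas as pd

def daily_diff(a: pd.Series, b: pd.Series) -> pd.Series:
    """Return a - b on a gap-free daily index, treating missing days as 0."""
    a = a.copy()
    b = b.copy()
    # Convert date-like labels (date objects or strings) to a DatetimeIndex
    a.index = pd.to_datetime(a.index)
    b.index = pd.to_datetime(b.index)
    # Subtract with 0 standing in for labels missing on either side
    diff = a.sub(b, fill_value=0)
    # Insert every missing calendar day between the first and last label
    return diff.asfreq('D', fill_value=0).astype(int)

# Same counts as in the question, keyed by datetime.date objects
sri = pd.Series({dt.date(2009, 1, 12): 7, dt.date(2009, 1, 14): 3})
sro = pd.Series({dt.date(2009, 1, 3): 1, dt.date(2009, 1, 9): 14,
                 dt.date(2009, 1, 10): 61, dt.date(2009, 1, 11): 93,
                 dt.date(2009, 1, 12): 106, dt.date(2009, 1, 13): 123,
                 dt.date(2009, 1, 14): 130})

result = daily_diff(sro, sri)
print(result)
```

Working on a `DatetimeIndex` avoids the round trip through strings; if plain date objects are needed afterwards, `result.index.date` gives them back.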