How to properly concatenate Pandas Series if the data is split between time periods?

With a Pandas datetime index, I count events per week and plot them. Each object is currently a pandas.core.series.Series. Because the data is downloaded per year, some weeks end up split across two Series. Here is an example:

Datetime
2005-12-18    1840
2005-12-25    1959
2006-01-01    1695

Datetime
2006-01-01     285
2006-01-08    1917
2006-01-15    1821
Freq: W-SUN, dtype: int64

The week of 2006-01-01 should have 285 + 1695 = 1980 events.

If I concatenate the two Series,

import pandas as pd
pd.concat([weeks2005, weeks2006])

that does not happen. Because of these discontinuities, there are large "spikes" in the data/plots. How can I fix this?
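For reference, the two pieces can be rebuilt from the output above; a plain concat then keeps both rows for the overlapping week instead of summing them (a minimal sketch of the situation):

import pandas as pd

# Rebuild the two yearly pieces shown above.
weeks2005 = pd.Series([1840, 1959, 1695],
                      index=pd.date_range('2005-12-18', periods=3, freq='W-SUN'))
weeks2006 = pd.Series([285, 1917, 1821],
                      index=pd.date_range('2006-01-01', periods=3, freq='W-SUN'))

# The overlapping week 2006-01-01 appears twice instead of once with 1980.
print(pd.concat([weeks2005, weeks2006]))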

You can use add with the parameter fill_value=0:

print(weeks2005.add(weeks2006, fill_value=0))
2005-12-18    1840
2005-12-25    1959
2006-01-01    1980
2006-01-08    1917
2006-01-15    1821
Freq: W-SUN, dtype: float64

Then you can convert back to int with astype:

print(weeks2005.add(weeks2006, fill_value=0).astype(int))
2005-12-18    1840
2005-12-25    1959
2006-01-01    1980
2006-01-08    1917
2006-01-15    1821
Freq: W-SUN, dtype: int32

Edit:

If you have 50 Series, you can use concat and then groupby by index with sum:

import pandas as pd

dt1 = pd.to_datetime('2005-12-18')
idx1 = pd.date_range(dt1, periods=3, freq='W-SUN')
weeks2005 = pd.Series([1840, 1959, 1695], index=idx1)

dt2 = pd.to_datetime('2006-01-01')
idx2 = pd.date_range(dt2, periods=3, freq='W-SUN')
weeks2006 = pd.Series([285, 1917, 1821], index=idx2)

dt3 = pd.to_datetime('2006-01-15')
idx3 = pd.date_range(dt3, periods=3, freq='W-SUN')
weeks2006a = pd.Series([100, 200, 500], index=idx3)

weeks = [weeks2005, weeks2006, weeks2006a]
print(weeks)
[2005-12-18    1840
2005-12-25    1959
2006-01-01    1695
Freq: W-SUN, dtype: int64, 2006-01-01     285
2006-01-08    1917
2006-01-15    1821
Freq: W-SUN, dtype: int64, 2006-01-15    100
2006-01-22    200
2006-01-29    500
Freq: W-SUN, dtype: int64]
#concat the list of Series
#(some index values are duplicated in the output Series)
concated_series = pd.concat([weeks2005, weeks2006, weeks2006a])
#concated_series = pd.concat(weeks)
print(concated_series)
#2005-12-18    1840
#2005-12-25    1959
#2006-01-01    1695
#2006-01-01     285
#2006-01-08    1917
#2006-01-15    1821
#2006-01-15     100
#2006-01-22     200
#2006-01-29     500
#dtype: int64

#group by index and aggregate with sum
output = concated_series.groupby(by=concated_series.index).sum()
#level=0 is the first level of a MultiIndex, but it works with a plain index too
#output = concated_series.groupby(level=0).sum()
print(output)

#2005-12-18    1840
#2005-12-25    1959
#2006-01-01    1980
#2006-01-08    1917
#2006-01-15    1921
#2006-01-22     200
#2006-01-29     500
#dtype: int64

More information and examples about groupby are here.
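As a related sketch (not part of the answer above), the add with fill_value=0 approach from the first answer also scales to a whole list of Series via functools.reduce, assuming the list weeks built in the example:

from functools import reduce

# Fold the list of weekly Series together; fill_value=0 keeps weeks
# that appear in only one Series, overlapping weeks are summed.
total = reduce(lambda a, b: a.add(b, fill_value=0), weeks).astype(int)
print(total)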

You can convert the Series to DataFrames and then merge them together, using the date as the key:

import pandas as pd

# Turn each Series into a DataFrame with an explicit date column.
df2005 = weeks2005.reset_index()
df2005.columns = ["Datetime", "Number"]
df2006 = weeks2006.reset_index()
df2006.columns = ["Datetime", "Number"]

# Outer merge on the date so the overlapping week appears only once;
# fillna(0) lets the two yearly counts be summed safely.
df_merge = pd.merge(df2005, df2006, on="Datetime", how="outer").fillna(0)
df_merge["Sum"] = df_merge["Number_x"] + df_merge["Number_y"]
df_merge = df_merge.drop(["Number_x", "Number_y"], axis=1)

print(df_merge)
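If the goal is to plot the weekly counts again, a short follow-up sketch (assuming df_merge from above) restores the datetime index and an integer dtype:

# Move the date column back into the index and cast the summed
# counts to int so the result looks like the original weekly Series.
result = df_merge.set_index("Datetime")["Sum"].astype(int).sort_index()
print(result)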