Time series resampling with column of type object

Good evening,

I want to resample an irregular time series that has a column of type object, but it doesn't work.

Here is my sample data:

Actual start date   Ingredients                 NumberShortage
2002-01-01          LEVOBUNOLOL HYDROCHLORIDE   1
2006-07-30          LEVETIRACETAM               1
2008-03-19          FLAVOXATE HYDROCHLORIDE     1
2010-01-01          LEVOTHYROXINE SODIUM        1
2011-04-01          BIMATOPROST                 1

I tried to resample my data frame daily, but it doesn't work with my code:

df3 = df1.resample('D', on='Actual start date').sum()

This is what it gives (the Ingredients column is gone):

Actual start date   NumberShortage
2002-01-01          1
2002-01-02          0
2002-01-03          0
2002-01-04          0
2002-01-05          0

And the result I want:

Actual start date   Ingredients                 NumberShortage
2002-01-01          LEVOBUNOLOL HYDROCHLORIDE   1
2002-01-02          NaN                         0
2002-01-03          NaN                         0
2002-01-04          NaN                         0
2002-01-05          NaN                         0

Any ideas?
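For comparison, here is a minimal sketch (not from the original thread) that stays with resample. In the question, .sum() was applied to every column and the Ingredients column disappeared from the result; giving each column its own aggregation keeps it. The sketch rebuilds the five sample rows above, and the choice of 'first' for Ingredients is an assumption: if a day has several ingredients, only one of them is kept.

```python
import pandas as pd

# The five sample rows from the question
df1 = pd.DataFrame({
    'Actual start date': pd.to_datetime(['2002-01-01', '2006-07-30', '2008-03-19',
                                         '2010-01-01', '2011-04-01']),
    'Ingredients': ['LEVOBUNOLOL HYDROCHLORIDE', 'LEVETIRACETAM',
                    'FLAVOXATE HYDROCHLORIDE', 'LEVOTHYROXINE SODIUM', 'BIMATOPROST'],
    'NumberShortage': [1, 1, 1, 1, 1],
})

# Aggregate each column explicitly instead of calling .sum() on everything:
# 'first' keeps one ingredient per day (NaN for days with no rows),
# 'sum' counts shortages (0 for days with no rows).
df3 = (df1
       .resample('D', on='Actual start date')
       .agg({'Ingredients': 'first', 'NumberShortage': 'sum'})
       .reset_index())

print(df3.head())
```

Days without a shortage come out with NaN for Ingredients and 0 for NumberShortage, which is the shape of the desired result above.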

Data details

I'm using an Excel file that contains several attributes; it was originally a CSV file (which can be downloaded from https://www.drugshortagescanada.ca/search?perform=0). I then group by 'Actual start date' and 'Ingredients' to get 'NumberShortage'.

Here is the source code:

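import pandas as pd

# Load the drug-shortage data
df = pd.read_excel("Data/Data.xlsx")

# Drop every row that has a missing value in any column
df = df.dropna(how='any')

# Count shortages per (start date, ingredient) pair
df = df.groupby(['Actual start date', 'Ingredients']).size().reset_index(name='NumberShortage')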
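To illustrate what this groupby produces, here is a small sketch. The first two rows are modelled on the sample Excel file shown below; the third row is HYPOTHETICAL, added only so that two ingredients share a start date. Because the grouping key is the (date, ingredient) pair, the same date can then appear on more than one row of the result.

```python
import pandas as pd

# Two rows modelled on the sample Excel file, plus one HYPOTHETICAL row
# that shares a start date with the first one.
df = pd.DataFrame({
    'Brand name': ['ACETAMINOPHEN', 'PMS-METHYLPHENIDATE ER', 'HYPOTHETICAL BRAND'],
    'Company Name': ['PHARMASCIENCE INC', 'PHARMASCIENCE INC', 'HYPOTHETICAL INC'],
    'Ingredients': ['ACETAMINOPHEN CODEINE', 'METHYLPHENIDATE', 'HYPOTHETICAL INGREDIENT'],
    'Actual start date': pd.to_datetime(['2017-03-23', '2017-03-28', '2017-03-23']),
})

# Same steps as above: drop incomplete rows, then count per (date, ingredient)
df = df.dropna(how='any')
grouped = (df.groupby(['Actual start date', 'Ingredients'])
             .size()
             .reset_index(name='NumberShortage'))

print(grouped)
# True whenever one date carries more than one ingredient
print(grouped['Actual start date'].duplicated().any())
```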

Finally, after applying your code here, this is the error I got:

Here is a sample of the Excel file:

Brand name               Company Name        Ingredients              Actual start date
ACETAMINOPHEN            PHARMASCIENCE INC   ACETAMINOPHEN CODEINE    2017-03-23
PMS-METHYLPHENIDATE ER   PHARMASCIENCE INC   METHYLPHENIDATE          2017-03-28

What you need here is rather reindex, using date_range as the source of the new dates and the date column as a temporary index:

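import pandas as pd

# Make sure the date column holds real datetimes, not strings
df['Actual start date'] = pd.to_datetime(df['Actual start date'])

(df
 .set_index('Actual start date')                  # dates become a temporary index
 .reindex(pd.date_range(df['Actual start date'].min(),
                        df['Actual start date'].max(), freq='D'))  # one row per day
 .fillna({'NumberShortage': 0})                   # days without data get 0 shortages
 .astype({'NumberShortage': int})                 # back to integers after the NaN fill
 .reset_index()
)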

Output:

          index                Ingredients  NumberShortage
0    2002-01-01  LEVOBUNOLOL HYDROCHLORIDE               1
1    2002-01-02                        NaN               0
2    2002-01-03                        NaN               0
3    2002-01-04                        NaN               0
4    2002-01-05                        NaN               0
...         ...                        ...             ...
3373 2011-03-28                        NaN               0
3374 2011-03-29                        NaN               0
3375 2011-03-30                        NaN               0
3376 2011-03-31                        NaN               0
3377 2011-04-01                BIMATOPROST               1

[3378 rows x 3 columns]
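The reindex approach assumes each date appears at most once in the temporary index. Once the data is grouped by both date and ingredient, a date can repeat, and pandas refuses to reindex an axis with duplicate labels (it raises a ValueError). A hedged alternative, not part of the original answer, is to build the full daily calendar as its own frame and left-merge the data onto it, so a day with several ingredients keeps one row per ingredient. The frame below is HYPOTHETICAL sample data chosen to contain a repeated date.

```python
import pandas as pd

# HYPOTHETICAL data: two ingredients share the same start date
df = pd.DataFrame({
    'Actual start date': pd.to_datetime(['2002-01-01', '2002-01-01', '2002-01-04']),
    'Ingredients': ['LEVOBUNOLOL HYDROCHLORIDE', 'HYPOTHETICAL INGREDIENT', 'LEVETIRACETAM'],
    'NumberShortage': [1, 1, 1],
})

# One row per calendar day between the first and last start date
calendar = pd.DataFrame({
    'Actual start date': pd.date_range(df['Actual start date'].min(),
                                       df['Actual start date'].max(), freq='D')
})

# A left merge keeps every day; days with no shortage get NaN in the data columns
result = calendar.merge(df, on='Actual start date', how='left')

# Missing counts become 0 and the column goes back to integers
result['NumberShortage'] = result['NumberShortage'].fillna(0).astype(int)

print(result)
```

Unlike reindex, the merge does not need unique dates, and the date column keeps its original name instead of becoming 'index'.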