Multi-conditional counter based on dates
I have this DataFrame:
df:
entrance leaving counter
1 2012-07-01 NaT NaN
2 2013-03-15 NaT NaN
3 2013-03-15 2013-04-15 NaN
4 2014-06-01 NaT NaN
5 2014-06-01 NaT NaN
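I want a counter that takes both date columns into account: it should increment on each entrance date and decrement whenever there is a leaving date. In addition, the date column below should advance one month at a time.
The desired output is: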
df_new:
date counter
2012-07 1
2012-08 1
... ...
2013-03 2
... ...
2014-06 4
I am incrementing based on entrance with the line below, but I could not manage to use np.where() to decrement when `df.entrance.notnull()`.
df.groupby([df['entrance'].dt.to_period("M")]).entrance.count().cumsum()
I think your question is underspecified. The counter cannot share the index of the original DF. Here is an example of why:
# Let's assume this is the DF:
entrance leaving counter
1 2012-07-01 NaT 1
2 2013-03-15 NaT 2
3 2013-03-15 2013-06-15 2 ?
4 2013-06-01 NaT 3 or 4? Depends if you count the exit in prev row or not
In any case, here is how to solve it:
# Load Data
import io
import pandas as pd

s = ''' entrance leaving counter
1 2012-07-01 NaT NaN
2 2013-03-15 NaT NaN
3 2013-03-15 2013-04-15 NaN
4 2014-06-01 NaT NaN
5 2014-06-01 NaT NaN'''
# DataFrame.from_csv has been removed from pandas; read_csv replaces it
df = pd.read_csv(io.StringIO(s), sep=r'\s+')
df['leaving'] = pd.to_datetime(df['leaving'])
df['entrance'] = pd.to_datetime(df['entrance'])
An unambiguous solution that does not follow the original index:
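# Counter: +1 at each entrance date, -1 at each leaving date, then a running total
entered = pd.Series(1, index=df['entrance'].dropna())
left = pd.Series(1, index=df['leaving'].dropna())
counter = entered.subtract(left, fill_value=0).cumsum()
# If you want it monthly (pandas 2.2+ deprecates 'M' here in favor of 'ME')
counter.resample('M').last().ffill()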
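To get a table shaped like the df_new in the question, one option is to count events per calendar month and accumulate. The following is only a sketch under assumptions: it rebuilds the sample data inline, and the names `entered`, `left`, `net`, `months` and `monthly` are illustrative rather than part of the original answer. It uses monthly periods instead of `resample`, so the result is labelled `2012-07`, `2012-08`, and so on.

```python
import pandas as pd

# Sample data from the question, rebuilt inline (assumption: same values).
df = pd.DataFrame({
    "entrance": pd.to_datetime(
        ["2012-07-01", "2013-03-15", "2013-03-15", "2014-06-01", "2014-06-01"]
    ),
    "leaving": pd.to_datetime([None, None, "2013-04-15", None, None]),
})

# Number of entrances and leavings per calendar month.
entered = df["entrance"].dropna().dt.to_period("M").value_counts()
left = df["leaving"].dropna().dt.to_period("M").value_counts()

# Net change per month: entrances minus leavings.
net = entered.sub(left, fill_value=0)

# Insert the months without any event, then take the running total.
months = pd.period_range(net.index.min(), net.index.max(), freq="M")
monthly = net.reindex(months, fill_value=0).cumsum().astype(int)

# Two-column frame: date, counter.
df_new = monthly.rename_axis("date").reset_index(name="counter")
print(df_new)
```

On the sample data this gives 1 for 2012-07, 3 for 2013-03 (both entrances are counted and the leaving falls in April), 2 for 2013-04 and 4 for 2014-06. The question's expected output lists 2 for 2013-03, so the counting convention still has to be settled by the asker.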
A solution that keeps the original index, but is somewhat ambiguous:
count_df = df.notna().cumsum()
df['counter'] = count_df['entrance'] - count_df['leaving']
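As a quick check of what the index-preserving version produces, here is a small self-contained run on the sample data. It is a sketch: the DataFrame is rebuilt inline, and only the two date columns are passed to `notna()`, which is a minor simplification of the snippet above.

```python
import pandas as pd

# Sample data from the question, rebuilt inline (assumption: same values).
df = pd.DataFrame(
    {
        "entrance": pd.to_datetime(
            ["2012-07-01", "2013-03-15", "2013-03-15", "2014-06-01", "2014-06-01"]
        ),
        "leaving": pd.to_datetime([None, None, "2013-04-15", None, None]),
    },
    index=[1, 2, 3, 4, 5],
)

# Running count of non-null values in each date column, row by row.
count_df = df[["entrance", "leaving"]].notna().cumsum()

# Entrances seen so far minus leavings seen so far.
df["counter"] = count_df["entrance"] - count_df["leaving"]
print(df["counter"].tolist())
```

The counter comes out as 1, 2, 2, 3, 4. Row 3 already has its leaving subtracted, even though that leaving happens a month after the entrance on the same row, which is why this variant is ambiguous.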