Pandas 滚动时间 window 字符串计数失败 - 为什么?
Pandas rolling time window fails on count of string - why?
尝试使用 pandas 通过时间索引和 count() 方法滚动时出现错误,我在这里缺少什么?
这里有一个例子:
d = {'vv': {pd.Timestamp('2020-01-13 08:22:00', freq='T'): 'aa',
pd.Timestamp('2020-01-13 08:23:00', freq='T'): 'bb',
pd.Timestamp('2020-01-13 08:24:00', freq='T'): 'cc',
pd.Timestamp('2020-01-13 08:25:00', freq='T'): np.nan,
pd.Timestamp('2020-01-13 08:26:00', freq='T'): 'dd'}}
df = pd.DataFrame(d)
df['vv'].rolling('72s').count()
得到这个:
ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _prep_values(self, values)
354 try:
--> 355 values = ensure_float64(values)
356 except (ValueError, TypeError) as err:
pandas\_libs\algos_common_helper.pxi in pandas._libs.algos.ensure_float64()
ValueError: could not convert string to float: 'aa'
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _apply(self, func, center, require_min_periods, floor, is_weighted, name, use_numba_cache, **kwargs)
535 try:
--> 536 values = self._prep_values(b.values)
537
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _prep_values(self, values)
356 except (ValueError, TypeError) as err:
--> 357 raise TypeError(f"cannot handle this type -> {values.dtype}") from err
358
TypeError: cannot handle this type -> object
The above exception was the direct cause of the following exception:
DataError Traceback (most recent call last)
<ipython-input-166-8676c880bf7e> in <module>
7 df = pd.DataFrame(d)
8
----> 9 df['vv'].rolling('72s').count()
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in count(self)
2048 if self.is_freq_type or isinstance(self.window, BaseIndexer):
2049 window_func = self._get_roll_func("roll_count")
-> 2050 return self._apply(window_func, center=self.center, name="count")
2051
2052 return super().count()
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _apply(self, func, center, require_min_periods, floor, is_weighted, name, use_numba_cache, **kwargs)
542 continue
543 else:
--> 544 raise DataError("No numeric types to aggregate") from err
545
546 if values.size == 0:
DataError: No numeric types to aggregate
Pandas rolling time window fails on count of string - why?
我认为这是错误。它可以很好地处理 window 的数值:
s = df['vv'].rolling(4).count()
print (s)
2020-01-13 08:22:00 1.0
2020-01-13 08:23:00 2.0
2020-01-13 08:24:00 3.0
2020-01-13 08:25:00 3.0
2020-01-13 08:26:00 3.0
Name: vv, dtype: float64
一个可能的想法是使用 count
将非缺失值替换为 Series.isna
到 1
:
d = {'vv': {pd.Timestamp('2020-01-13 08:22:00', freq='T'): 'aa',
pd.Timestamp('2020-01-13 08:23:00', freq='T'): 'bb',
pd.Timestamp('2020-01-13 08:24:00', freq='T'): 'cc',
pd.Timestamp('2020-01-13 08:25:00', freq='T'): np.nan,
pd.Timestamp('2020-01-13 08:26:00', freq='T'): 'dd'}}
df = pd.DataFrame(d)
df = df['vv'].where(df['vv'].isna(), 1).rolling('72s').count()
print (df)
2020-01-13 08:22:00 1.0
2020-01-13 08:23:00 2.0
2020-01-13 08:24:00 2.0
2020-01-13 08:25:00 1.0
2020-01-13 08:26:00 1.0
Name: vv, dtype: float64
详情:
print (df['vv'].where(df['vv'].isna(), 1))
2020-01-13 08:22:00 1
2020-01-13 08:23:00 1
2020-01-13 08:24:00 1
2020-01-13 08:25:00 NaN
2020-01-13 08:26:00 1
Name: vv, dtype: object
测试没有缺失值的第一个想法 sum
:
df = df['vv'].notna().rolling('72s').sum()
print (df)
2020-01-13 08:22:00 1.0
2020-01-13 08:23:00 2.0
2020-01-13 08:24:00 2.0
2020-01-13 08:25:00 1.0
2020-01-13 08:26:00 1.0
Name: vv, dtype: float64
尝试使用 pandas 通过时间索引和 count() 方法滚动时出现错误,我在这里缺少什么?
这里有一个例子:
d = {'vv': {pd.Timestamp('2020-01-13 08:22:00', freq='T'): 'aa',
pd.Timestamp('2020-01-13 08:23:00', freq='T'): 'bb',
pd.Timestamp('2020-01-13 08:24:00', freq='T'): 'cc',
pd.Timestamp('2020-01-13 08:25:00', freq='T'): np.nan,
pd.Timestamp('2020-01-13 08:26:00', freq='T'): 'dd'}}
df = pd.DataFrame(d)
df['vv'].rolling('72s').count()
得到这个:
ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _prep_values(self, values)
354 try:
--> 355 values = ensure_float64(values)
356 except (ValueError, TypeError) as err:
pandas\_libs\algos_common_helper.pxi in pandas._libs.algos.ensure_float64()
ValueError: could not convert string to float: 'aa'
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _apply(self, func, center, require_min_periods, floor, is_weighted, name, use_numba_cache, **kwargs)
535 try:
--> 536 values = self._prep_values(b.values)
537
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _prep_values(self, values)
356 except (ValueError, TypeError) as err:
--> 357 raise TypeError(f"cannot handle this type -> {values.dtype}") from err
358
TypeError: cannot handle this type -> object
The above exception was the direct cause of the following exception:
DataError Traceback (most recent call last)
<ipython-input-166-8676c880bf7e> in <module>
7 df = pd.DataFrame(d)
8
----> 9 df['vv'].rolling('72s').count()
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in count(self)
2048 if self.is_freq_type or isinstance(self.window, BaseIndexer):
2049 window_func = self._get_roll_func("roll_count")
-> 2050 return self._apply(window_func, center=self.center, name="count")
2051
2052 return super().count()
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _apply(self, func, center, require_min_periods, floor, is_weighted, name, use_numba_cache, **kwargs)
542 continue
543 else:
--> 544 raise DataError("No numeric types to aggregate") from err
545
546 if values.size == 0:
DataError: No numeric types to aggregate
Pandas rolling time window fails on count of string - why?
我认为这是错误。它可以很好地处理 window 的数值:
s = df['vv'].rolling(4).count()
print (s)
2020-01-13 08:22:00 1.0
2020-01-13 08:23:00 2.0
2020-01-13 08:24:00 3.0
2020-01-13 08:25:00 3.0
2020-01-13 08:26:00 3.0
Name: vv, dtype: float64
一个可能的想法是使用 count
将非缺失值替换为 Series.isna
到 1
:
d = {'vv': {pd.Timestamp('2020-01-13 08:22:00', freq='T'): 'aa',
pd.Timestamp('2020-01-13 08:23:00', freq='T'): 'bb',
pd.Timestamp('2020-01-13 08:24:00', freq='T'): 'cc',
pd.Timestamp('2020-01-13 08:25:00', freq='T'): np.nan,
pd.Timestamp('2020-01-13 08:26:00', freq='T'): 'dd'}}
df = pd.DataFrame(d)
df = df['vv'].where(df['vv'].isna(), 1).rolling('72s').count()
print (df)
2020-01-13 08:22:00 1.0
2020-01-13 08:23:00 2.0
2020-01-13 08:24:00 2.0
2020-01-13 08:25:00 1.0
2020-01-13 08:26:00 1.0
Name: vv, dtype: float64
详情:
print (df['vv'].where(df['vv'].isna(), 1))
2020-01-13 08:22:00 1
2020-01-13 08:23:00 1
2020-01-13 08:24:00 1
2020-01-13 08:25:00 NaN
2020-01-13 08:26:00 1
Name: vv, dtype: object
测试没有缺失值的第一个想法 sum
:
df = df['vv'].notna().rolling('72s').sum()
print (df)
2020-01-13 08:22:00 1.0
2020-01-13 08:23:00 2.0
2020-01-13 08:24:00 2.0
2020-01-13 08:25:00 1.0
2020-01-13 08:26:00 1.0
Name: vv, dtype: float64