Get CSV values on a pandas rolling function

Question

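I am trying to get a CSV string of the values in each window of a rolling method, but I get the error "must be real number, not str".

It seems the output of the applied function must be numeric. See https://github.com/pandas-dev/pandas/issues/23002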

import pandas as pd
df = pd.DataFrame({"a": range(1,10)})
df.head(10)

# Output
    a
0   1
1   2
2   3
3   4
4   5
5   6
6   7
7   8
8   9

What I tried:

to_csv = lambda x: x.to_csv(index=False)
# to_csv = lambda x: ",".join([str(d) for d in x])
df["running_csv"] = df.rolling(min_periods=1, window=3).apply(to_csv) # <= Causes Error

# Error:
# TypeError: must be real number, not str

Expected output:

    a   running_csv
0   1   1
1   2   1,2
2   3   1,2,3
3   4   2,3,4
4   5   3,4,5
5   6   4,5,6
6   7   5,6,7
7   8   6,7,8
8   9   7,8,9

Question: Is there an alternative way to get the CSV output shown above?

Answer 1

Something like this?

>>> df['running_csv'] = pd.Series(df.rolling(min_periods=1, window=3)).apply(lambda x:x.a.values)
>>> df
   a running_csv
0  1         [1]
1  2      [1, 2]
2  3   [1, 2, 3]
3  4   [2, 3, 4]
4  5   [3, 4, 5]
5  6   [4, 5, 6]
6  7   [5, 6, 7]
7  8   [6, 7, 8]
8  9   [7, 8, 9]

From here, further processing should be straightforward.
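
One way that remaining step might look, as a minimal sketch: it builds the per-window arrays by iterating the Rolling object directly (this assumes a pandas version where Rolling objects are iterable, 1.1 or later as far as I know), then joins each array into a comma-separated string. The variable name `windows` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": range(1, 10)})

# Iterating a Rolling object yields one sub-DataFrame per window;
# the first windows are shorter because they have fewer rows available.
windows = [w["a"].to_numpy() for w in df.rolling(window=3, min_periods=1)]

# Turn each array of numbers into one comma-separated string.
df["running_csv"] = [",".join(map(str, values)) for values in windows]
print(df)
```

The join happens in plain Python, so the numeric-only restriction of rolling().apply() never comes into play.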

Answer 2

It would be great if this could be done with:

df['a'].astype(str).rolling(min_periods=1, window=3).apply(''.join)

but, as you noted, rolling currently does not work with strings.

Here is one way to do it:

(pd.DataFrame({i: df['a'].astype(str).shift(i) for i in range(3)[::-1]})
   .fillna('')
   .apply(','.join, axis=1)
   .str.strip(',')
)

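Each row first becomes a string such as ",,1" or ",1,2" (empty strings stand in for the missing leading values), and .str.strip(',') then removes the leading commas, giving the comma-separated strings from the expected output in the question.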

Answer 3

rolling does not work if the output is not numeric; there are several issues about this on GitHub. However, it is still possible to get the result:

to_str_list = lambda x: ','.join(x[x.notna()].astype(int).astype(str).values.tolist())

# Build one column per window position, then join the non-missing values row by row.
df['running_csv'] = pd.concat([df['a'].shift(2), df['a'].shift(1), df['a']], axis=1) \
                      .apply(to_str_list, axis=1)

>>> df
   a running_csv
0  1           1
1  2         1,2
2  3       1,2,3
3  4       2,3,4
4  5       3,4,5
5  6       4,5,6
6  7       5,6,7
7  8       6,7,8
8  9       7,8,9
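
The three shifts above are hard-coded for a window of 3. A rough sketch of how the same idea could be generalized to any window size follows; `rolling_csv_by_shift` and its `window` and `sep` parameters are hypothetical names, not part of pandas. Unlike the code above, it converts to strings before shifting, so no cast back to int is needed.

```python
import pandas as pd

def rolling_csv_by_shift(s: pd.Series, window: int, sep: str = ",") -> pd.Series:
    """Hypothetical helper: join the last `window` values of `s` for each row."""
    as_str = s.astype(str)
    # One column per window position, oldest value first, current value last.
    shifted = pd.concat(
        [as_str.shift(i) for i in range(window - 1, -1, -1)],
        axis=1,
        keys=list(range(window)),
    )
    # Shifting leaves missing values in the first rows; drop them before joining.
    return shifted.apply(lambda row: sep.join(row.dropna()), axis=1)

df = pd.DataFrame({"a": range(1, 10)})
df["running_csv"] = rolling_csv_by_shift(df["a"], window=3)
print(df)
```

This still calls a Python function once per row, so it is a convenience rather than a speed-up.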

Answer 4

Building on @fsimonjetz's partial solution, we can complete it to produce the CSV values as follows:

df['running_csv'] = (pd.Series(df.rolling(min_periods=1, window=3))
                                 .apply(lambda x: x['a'].astype(str).values)
                                 .str.join(',')
                    )

Or simplify it further so that it uses only vectorized functions:

df['running_csv'] = (pd.Series(df['a'].astype(str).rolling(min_periods=1, window=3))
                                                  .str.join(','))

Now all of the (slow) .apply() calls and lambda functions have been replaced with (fast) vectorized functions.

Result:

print(df)

   a running_csv
0  1           1
1  2         1,2
2  3       1,2,3
3  4       2,3,4
4  5       3,4,5
5  6       4,5,6
6  7       5,6,7
7  8       6,7,8
8  9       7,8,9
