Dataframe column numpy

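I need some help here: I want df2 to have a col1 column that holds the corresponding cat value for each row. Below are the code, the actual output, and the expected output: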

import pandas as pd
import numpy as np

data = {'cat': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
        'date': ['2021-12-30', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06',\
                '2022-01-09', '2022-01-10', '2021-12-30', '2022-01-02', '2022-01-03', '2022-01-04', \
                '2022-01-05', '2022-01-06', '2022-01-09', '2022-01-10','2021-12-30', '2022-01-02', \
                '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-09', '2022-01-10'],
        'value': [99435, 99401, 100113, 100528, 100428, 100734, 99035, 100077, 100018, 99425, 100728, 100863, \
                99930, 100076, 100995, 100640, 99158, 100868, 100819, 99247, 100851, 100500, 100082, 99089],
        'act': [1.0000, 0.7981, 0.7785, 0.3563, 0.1916, 0.0000, 0.0000, 0.0233, 1.0000, 0.5625, 0.5774, \
                0.6777, 0.7300, 0.1951, 0.1966, 0.6413, 1.0000, 0.7905, 0.7867, 0.000, 0.8769, 0.4683, 0.7122, 0.7183]
    }

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['prev'] = df.groupby('cat')['act'].shift(1)
df['ln_return'] = np.log(df['act']/df['prev'].clip(0.000001)).clip(np.log(0.000001))
df['ln_return'] = df['ln_return'].mask(df['prev'].eq(0)&df['act'].eq(0), 0)
df = df.dropna()

a_avg = df.groupby('cat')['value'].mean().to_numpy()
a_test = a_avg / 1000
df2 = pd.DataFrame({'col2': a_avg, 'col3': a_test})
print(df2)

Out:                             Expected:
            col2        col3        col1           col2        col3
0  100045.142857  100.045143     0     1  100045.142857  100.045143
1  100379.571429  100.379571     1     2  100379.571429  100.379571
2  100208.000000  100.208000     2     3  100208.000000  100.208000

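You shouldn't drop down to numpy here, because that leads to alignment problems: to_numpy() throws away the index, which is exactly where the cat labels live after the groupby. Instead, work entirely within pandas. In this case you can groupby + agg, then use eval to create the column divided by 1000, and finally rename the cat column to whatever you want.

You can avoid eval if you split this into multiple statements and just do the arithmetic directly.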

(df.groupby('cat', as_index=False)
   .agg(col2=('value', 'mean'))
   .eval('col3 = col2/1000')
   .rename(columns={'cat': 'col1'}))

   col1           col2        col3
0     1  100045.142857  100.045143
1     2  100379.571429  100.379571
2     3  100208.000000  100.208000
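
For the eval-free variant mentioned above, a minimal sketch might look like the following. It assumes a small stand-in frame with made-up values (not the data from the question) and uses the same named aggregation, followed by ordinary column arithmetic:

```python
import pandas as pd

# Hypothetical stand-in data, only to keep the example self-contained.
df = pd.DataFrame({'cat': [1, 1, 2, 2, 3, 3],
                   'value': [1000, 3000, 4000, 6000, 7000, 9000]})

# Aggregate inside pandas; as_index=False keeps 'cat' as a regular column,
# so each mean stays attached to its category label.
df2 = df.groupby('cat', as_index=False).agg(col2=('value', 'mean'))

# Plain column arithmetic takes the place of eval.
df2['col3'] = df2['col2'] / 1000

# Rename the grouping column to the name wanted in the output.
df2 = df2.rename(columns={'cat': 'col1'})
print(df2)
```

Because the result never leaves pandas, col1 always lines up with the group it was computed from, even if some categories are filtered out earlier in the pipeline.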