Dataframe column numpy
I need some help here: I want col1 of df2 to return the corresponding cat value. Below are the code, the actual output, and the expected output:
import pandas as pd
import numpy as np

data = {'cat': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
        'date': ['2021-12-30', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06',
                 '2022-01-09', '2022-01-10', '2021-12-30', '2022-01-02', '2022-01-03', '2022-01-04',
                 '2022-01-05', '2022-01-06', '2022-01-09', '2022-01-10', '2021-12-30', '2022-01-02',
                 '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-09', '2022-01-10'],
        'value': [99435, 99401, 100113, 100528, 100428, 100734, 99035, 100077, 100018, 99425, 100728, 100863,
                  99930, 100076, 100995, 100640, 99158, 100868, 100819, 99247, 100851, 100500, 100082, 99089],
        'act': [1.0000, 0.7981, 0.7785, 0.3563, 0.1916, 0.0000, 0.0000, 0.0233, 1.0000, 0.5625, 0.5774,
                0.6777, 0.7300, 0.1951, 0.1966, 0.6413, 1.0000, 0.7905, 0.7867, 0.000, 0.8769, 0.4683, 0.7122, 0.7183]
        }
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['prev'] = df.groupby('cat')['act'].shift(1)
df['ln_return'] = np.log(df['act'] / df['prev'].clip(0.000001)).clip(np.log(0.000001))
df['ln_return'] = df['ln_return'].mask(df['prev'].eq(0) & df['act'].eq(0), 0)
df = df.dropna()
a_avg = df.groupby('cat')['value'].mean().to_numpy()
a_test = a_avg / 1000
df2 = pd.DataFrame({'col2': a_avg, 'col3': a_test})
print(df2)
Out:                               Expected:
            col2        col3          col1           col2        col3
0  100045.142857  100.045143       0     1  100045.142857  100.045143
1  100379.571429  100.379571       1     2  100379.571429  100.379571
2  100208.000000  100.208000       2     3  100208.000000  100.208000
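You shouldn't drop down to numpy, as this leads to alignment issues: to_numpy() returns only the values and discards the cat labels that the groupby result carries in its index. Instead, work entirely within pandas. In this case you can groupby + agg, then eval to create your column divided by 1000, then rename the cat column to the name you want.
You could avoid the eval if you split this into multiple statements rather than doing the math inside the chain.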
(df.groupby('cat', as_index=False)
.agg(col2=('value', 'mean'))
.eval('col3 = col2/1000')
.rename(columns={'cat': 'col1'}))
col1 col2 col3
0 1 100045.142857 100.045143
1 2 100379.571429 100.379571
2 3 100208.000000 100.208000
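For the multi-statement variant mentioned above, a minimal sketch might look like the following. It assumes a small stand-in frame with made-up values rather than the question's data, and the intermediate names `means` and `result` are only illustrative. The point it tries to show is that the group means come back indexed by cat, so the labels can be moved into a column instead of being thrown away.

```python
import pandas as pd

# Small stand-in frame (hypothetical values, not the question's data).
df = pd.DataFrame({
    'cat': [1, 1, 2, 2, 3, 3],
    'value': [100, 200, 300, 500, 1000, 3000],
})

# Step 1: the group means are a Series whose index holds the 'cat' labels.
# Calling .to_numpy() at this point would keep the values and drop those labels.
means = df.groupby('cat')['value'].mean()

# Step 2: name the values 'col2', turn the index into a column, call it 'col1'.
result = means.rename('col2').reset_index().rename(columns={'cat': 'col1'})

# Step 3: plain column arithmetic in place of eval.
result['col3'] = result['col2'] / 1000

print(result)
```

This should produce the same three-column layout as the chained version, at the cost of a couple of intermediate variables.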