How do I divide values across DataFrames in pandas?
I have an original dataset:
original_data_set
I read it in from a CSV file and then split it by field, like this:
loan_df = re_df.loc[re_df.field == 'loan_amount']
home_df = re_df.loc[re_df.field == 'home_value']
which produces:
loans
home_vals
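I want to divide the value column of one DataFrame by the value column of the other, but when I try
ltv_df = loan_df['value'] / home_df['value']
I get a Series of NaN values.
Does anyone have any suggestions?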
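The NaN values appear because pandas aligns the two Series on their index labels before dividing, and the filtered frames keep their original, non-matching row labels from re_df. Two options:
If you only need the values, NumPy division works: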
ltv_df = loan_df['value'].values / home_df['value'].values
[0.57238284 1.30293486]
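To see why the label-based division returns NaN while the positional one does not, here is a minimal sketch using the sample data from the setup in this answer. The variable names loan_labels, home_labels, aligned, and positional are illustrative only, and to_numpy() is used as an equivalent of .values.

```python
import numpy as np
import pandas as pd

# Same sample data as the setup in this answer.
re_df = pd.DataFrame({'loan_id': [1, 1, 2, 2],
                      'field': ['loan_amount', 'home_value'] * 2,
                      'value': [65037, 113625, 84395, 64773]})
loan_df = re_df.loc[re_df.field == 'loan_amount']
home_df = re_df.loc[re_df.field == 'home_value']

# The filtered frames keep the row labels they had in re_df.
loan_labels = loan_df.index.tolist()   # rows 0 and 2
home_labels = home_df.index.tolist()   # rows 1 and 3

# Series division aligns on labels; no label is shared, so every result is NaN.
aligned = loan_df['value'] / home_df['value']
all_nan = bool(aligned.isna().all())

# Converting to NumPy arrays drops the labels, so division is by position.
positional = loan_df['value'].to_numpy() / home_df['value'].to_numpy()
print(loan_labels, home_labels, all_nan, positional)
```

Positional division is only correct when both frames list the loans in the same order and have the same length.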
Or, if you need a DataFrame, use set_index, divide, then reset_index to get back to a DataFrame:
ltv_df = (
    loan_df.set_index('loan_id')['value'] /
    home_df.set_index('loan_id')['value']
).reset_index(name='result')
   loan_id    result
0        1  0.572383
1        2  1.302935
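One practical difference between the two options is that the set_index version pairs rows by loan_id rather than by position. The sketch below illustrates this by reversing home_df, which is an assumption made purely for demonstration; home_reversed, by_position, and by_loan_id are hypothetical names.

```python
import numpy as np
import pandas as pd

# Same sample data as the setup in this answer.
re_df = pd.DataFrame({'loan_id': [1, 1, 2, 2],
                      'field': ['loan_amount', 'home_value'] * 2,
                      'value': [65037, 113625, 84395, 64773]})
loan_df = re_df.loc[re_df.field == 'loan_amount']
home_df = re_df.loc[re_df.field == 'home_value']

# Reverse home_df to simulate frames whose rows are not in the same order.
home_reversed = home_df.iloc[::-1]

# Positional division now pairs loan 1 with the home value of loan 2.
by_position = loan_df['value'].to_numpy() / home_reversed['value'].to_numpy()

# Division on a loan_id index still pairs each loan with its own home value.
by_loan_id = (
    loan_df.set_index('loan_id')['value'] /
    home_reversed.set_index('loan_id')['value']
).reset_index(name='result')

print(by_position)
print(by_loan_id)
```

If the row order of the two frames cannot be guaranteed, the index-based division is the safer of the two.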
To get the values directly from the initial DataFrame:
ltv_df = (
    re_df.groupby('loan_id')['value']
    # assumes each loan_id has its loan_amount row before its home_value row
    .apply(lambda x: np.divide(*x))
    .reset_index(name='result')
)
   loan_id    result
0        1  0.572383
1        2  1.302935
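The groupby version depends on the row order within each loan_id. As an alternative that is not part of the original answer, a pivot-based sketch selects the numerator and denominator by field name instead. It assumes each (loan_id, field) pair occurs exactly once, since pivot raises an error on duplicates; the name wide is illustrative.

```python
import pandas as pd

# Same sample data as the setup in this answer.
re_df = pd.DataFrame({'loan_id': [1, 1, 2, 2],
                      'field': ['loan_amount', 'home_value'] * 2,
                      'value': [65037, 113625, 84395, 64773]})

# Reshape to one row per loan_id and one column per field name.
wide = re_df.pivot(index='loan_id', columns='field', values='value')

# Divide the named columns, then move loan_id back into a regular column.
ltv_df = (wide['loan_amount'] / wide['home_value']).reset_index(name='result')
print(ltv_df)
```

Because the columns are picked by name, this gives the same ratios regardless of how the rows of re_df are ordered.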
DataFrame setup:
import numpy as np
import pandas as pd
re_df = pd.DataFrame({'loan_id': [1, 1, 2, 2],
                      'field': ['loan_amount', 'home_value'] * 2,
                      'value': [65037, 113625, 84395, 64773]})
loan_df = re_df.loc[re_df.field == 'loan_amount']
home_df = re_df.loc[re_df.field == 'home_value']