How to use rpy2 to test significance using a for loop?
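I am trying to use R (with the help of the rpy2 package) to run t-tests on some variables in a pandas DataFrame. I use the magic functions in a Jupyter notebook to let Python interact with R. The interaction works, except inside a loop.
Here is the data frame: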
df.head()
Out[60]:
ID Category Num Vert_Horizon Description Fem_Valence_Mean \
0 Animals_001_h Animals 1 h Dead Stork 2.40
1 Animals_002_v Animals 2 v Lion 6.31
2 Animals_003_h Animals 3 h Snake 5.14
3 Animals_004_v Animals 4 v Wolf 4.55
4 Animals_005_h Animals 5 h Bat 5.29
Fem_Valence_SD Fem_Av/Ap_Mean Fem_Av/Ap_SD Arousal_Mean ... \
0 1.30 3.03 1.47 6.72 ...
1 2.19 5.96 2.24 6.69 ...
2 1.19 5.14 1.75 5.34 ...
3 1.87 4.82 2.27 6.84 ...
4 1.56 4.61 1.81 5.50 ...
Luminance Contrast JPEG_size80 LABL LABA LABB Entropy \
0 126.05 68.45 263028 51.75 -0.39 16.93 7.86
1 123.41 32.34 250208 52.39 10.63 30.30 6.71
2 135.28 59.92 190887 55.45 0.25 4.41 7.83
3 122.15 75.10 282350 49.84 3.82 1.36 7.69
4 131.81 59.77 329325 54.26 -0.34 -0.95 7.82
Classification valence_median_split temp_selection
0 Low_Valence OUT
1 High_Valence NaN
2 Low_Valence OUT
3 Low_Valence OUT
4 Low_Valence OUT
[5 rows x 35 columns]
This is what I tried:
%Rpush df
Variables = 'All_Valence_Mean', 'Male_Valence_Mean', 'Fem_Valence_Mean'
for var in Variables:
    %R var + '_Sig' <- t.test(var ~ valence_median_split, data = df, var.equal = TRUE)
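I was trying to save each result into a variable named after 'var' with the string "Sig" appended. That part is not essential; what I really want is for the command to recognize "var" as one of the variables in the list.
This is the error I get: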
Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
/anaconda3/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
warnings.warn(x, RRuntimeWarning)
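If you are more comfortable with R, push as much of the logic as possible into R. For example, the following stores the results in results, which you will be able to access from Python in subsequent notebook cells.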
%%R -i df -o results
Variables <- c("All_Valence_Mean", "Male_Valence_Mean",
               "Fem_Valence_Mean")
results <- list()
for (var in Variables) {
    results[[paste0(var, '_Sig')]] <- t.test(
        as.formula(paste(var, '~ valence_median_split')),
        data = df, var.equal = TRUE)
}
If you are more comfortable with Python, keep as much as possible in Python:
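from rpy2.robjects import Formula
from rpy2.robjects.packages import importr

# R's stats package; t.test is exposed to Python as t_test
stats = importr('stats')

Variables = ('All_Valence_Mean', 'Male_Valence_Mean',
             'Fem_Valence_Mean')
results = dict()
for var in Variables:
    # Build the formula text from the column name, e.g.
    # 'Fem_Valence_Mean ~ valence_median_split'.
    # Assumes df can be converted to an R data.frame at this point.
    results['%s_Sig' % var] = stats.t_test(
        Formula('%s ~ valence_median_split' % var),
        data=df, var_equal=True)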
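Both versions above rely on the same step: the column name is pasted into the formula text before R evaluates it. In the original attempt the %R line reaches R with the literal symbol var, which is why the error message shows formula = var ~ valence_median_split. The following is only a minimal pure-Python sketch of that string-building step, so it can be checked without R. The helper names formula_text and result_name are hypothetical; only the column names come from the question.

```python
# Hypothetical helpers for illustration; only the column names below
# are taken from the question.
GROUP_COLUMN = 'valence_median_split'
VARIABLES = ('All_Valence_Mean', 'Male_Valence_Mean', 'Fem_Valence_Mean')


def formula_text(column, group=GROUP_COLUMN):
    """Return the formula string R should see for one column."""
    return '%s ~ %s' % (column, group)


def result_name(column, suffix='_Sig'):
    """Return the key under which the test result would be stored."""
    return column + suffix


# Map each result name to the formula text that would be handed to
# rpy2.robjects.Formula (Python route) or as.formula (R route).
formulas = {result_name(v): formula_text(v) for v in VARIABLES}

for name, text in formulas.items():
    print(name, '->', text)
```

Column names that are not syntactic R names, such as Fem_Av/Ap_Mean in the data frame above, would need extra care in a formula (for example backtick quoting), which this sketch does not handle.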