How to correlate multiple subsets in R

Question

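How can I correlate each of 8 subsets separately with two different dependent variables? I keep getting the same correlation coefficients for two different subsets (example below). Here is the input: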

with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
                                                     mean.legit))

with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
                                                     mean.leegauthor))

with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
                                                     mean.legit))

with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
                                                     mean.leegauthor))

Output (I get this for both PARTYID_Strength = 1 and 2):

        Pearson's product-moment correlation

data:  PARTYID_Strength and mean.legit
t = 3.1005, df = 607, p-value = 0.002022
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.0458644 0.2023031
sample estimates:
      cor 
0.1248597 

        Pearson's product-moment correlation

data:  PARTYID_Strength and mean.leegauthor
t = 2.8474, df = 607, p-value = 0.004557
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03568431 0.19250344
sample estimates:
      cor 
0.1148091 

Sample data:

> dput(head(mydata2, 10))
``structure(list(PARTYID = c(1, 3, 1, 1, 1, 4, 3, 1, 1, 1), PARTYID_Other = 
c("NA", 
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), PARTYID_Strength = 
c(1, 
7, 1, 2, 1, 8, 1, 6, 1, 1), PARTYID_Strength_Other = c("NA", 
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), THERM_Dem = c(80, 
65, 85, 30, 76, 15, 55, 62, 90, 95), THERM_Rep = c(1, 45, 10, 
5, 14, 14, 0, 4, 10, 3), Gender = c("Female", "Male", "Male", 
"Female", "Female", "Male", "Male", "Female", "Female", "Male"
), `MEAN Age` = c(29.5, 49.5, 29.5, 39.5, 29.5, 21, 39.5, 39.5, 
29.5, 65), Age = c("25 - 34", "45 - 54", "25 - 34", "35 - 44", 
"25 - 34", "18 - 24", "35 - 44", "35 - 44", "25 - 34", "65+"), 
Ethnicity = c("White or Caucasian", "Asian or Asian American", 
"White or Caucasian", "White or Caucasian", "Hispanic or Latino", 
"White or Caucasian", "White or Caucasian", "White or Caucasian", 
"White or Caucasian", "White or Caucasian"), Ethnicity_Other = c("NA", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"), States = c("Texas", 
"Texas", "Ohio", "Texas", "Puerto Rico", "New Hampshire", 
"South Carolina", "Texas", "Texas", "Texas"), Education = c("Master's degree", 
"Bachelor's degree in college (4-year)", "Bachelor's degree in college (4-year)", 
"Master's degree", "Master's degree", "Less than high school degree", 
"Some college but no degree", "Master's degree", "Master's degree", 
"Some college but no degree"), `MEAN Income` = c(30000, 140000, 
150000, 60000, 80000, 30000, 30000, 120000, 150000, 60000
), Income = c("Less than ,000", "0,001 to 0,000", 
"More than 0,000", ",001 to ,000", ",001 to ,000", 
"Less than ,000", "Less than ,000", "0,001 to 0,000", 
"More than 0,000", ",001 to ,000"), mean.partystrength = c(3.875, 
2.875, 2.375, 3.5, 2.625, 3.125, 3.375, 3.125, 3.25, 3.625
), mean.traitrep = c(2.5, 2.625, 2.25, 2.625, 2.75, 1.875, 
2.75, 2.875, 2.75, 3), mean.traitdem = c(2.25, 2.625, 2.375, 
2.75, 2.625, 2.125, 1.875, 3, 2, 2.5), mean.leegauthor = c(1, 
2, 2, 4, 1, 4, 1, 1, 1, 1), mean.legit = c(1.71428571428571, 
3.28571428571429, 2.42857142857143, 2.42857142857143, 2.14285714285714, 
1.28571428571429, 1.42857142857143, 1.14285714285714, 2.14285714285714, 
1.28571428571429)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))``

Thanks!

Answer 1

To run the tests, create a vector of the columns of interest, then use sapply to apply an anonymous function to each column.

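fixed <- "PARTYID_Strength"                  # variable used in every test
cols <- c("mean.leegauthor", "mean.legit")   # columns to correlate with it

cor_test_result <- sapply(cols, function(x){
  # build a one-sided formula such as ~ PARTYID_Strength + mean.legit
  fmla <- paste(fixed, x, sep = "+")
  fmla <- as.formula(paste("~", fmla))
  cor.test(fmla, data = mydata2)
}, simplify = FALSE)   # simplify = FALSE keeps a named list of test results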

cor_test_result$mean.leegauthor
#
#        Pearson's product-moment correlation
#
#data:  PARTYID_Strength and mean.leegauthor
#t = 1.4804, df = 8, p-value = 0.177
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# -0.2343269  0.8462610
#sample estimates:
#      cor 
#0.4637152
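
A likely reason the question's calls all return the same result (df = 607 each time) is that `subset(mydata2, PARTYID_Strength = 1)` uses `=` rather than `==`. With `=`, R treats `PARTYID_Strength = 1` as a named argument that `subset()` disregards, so no rows are filtered and every test runs on the full data. The sketch below illustrates this on a small made-up data frame (`toy` and its values are assumptions for illustration, not the asker's data) and then runs one test per subset with `split()` and `lapply()`. Note that once rows are restricted to a single `PARTYID_Strength` value, that column is constant within the subset, so its correlation with anything is undefined. The sketch therefore correlates the two outcome columns within each group, which may or may not be the comparison the asker intended.

```r
# Made-up stand-in for mydata2; the values are illustrative only.
toy <- data.frame(
  PARTYID_Strength = c(1, 1, 1, 1, 2, 2, 2, 2),
  mean.legit       = c(1.7, 3.3, 2.4, 2.1, 1.3, 1.4, 1.1, 2.2),
  mean.leegauthor  = c(1, 2, 2, 1, 4, 1, 1, 3)
)

# `=` passes a named argument that subset() disregards, so no rows are dropped
# (recent R versions may also warn about the extra argument).
nrow(subset(toy, PARTYID_Strength = 1))   # 8: all rows kept

# `==` is the comparison operator, so only the matching rows are kept.
nrow(subset(toy, PARTYID_Strength == 1))  # 4

# One test per subset: split the rows by group, then test within each piece.
by_group <- split(toy, toy$PARTYID_Strength)
cor_by_group <- lapply(by_group, function(d) {
  cor.test(~ mean.legit + mean.leegauthor, data = d)
})

# Collect the estimated correlation for each subset into a named vector.
sapply(cor_by_group, function(res) unname(res$estimate))
```

On the real data, `split(mydata2, mydata2$PARTYID_Strength)` would give one piece per strength level, so the same pattern should cover all 8 subsets without repeating the call by hand.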
