How to correlate multiple subsets in R
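How do I correlate each of 8 subsets separately with two different dependent variables? I keep getting the same correlation coefficient for two different subsets (example below). Here is the input: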
with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
mean.legit))
with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
mean.leegauthor))
with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
mean.legit))
with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
mean.leegauthor))
Output (I get this for both PARTYID_Strength = 1 and 2):
Pearson's product-moment correlation

data:  PARTYID_Strength and mean.legit
t = 3.1005, df = 607, p-value = 0.002022
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.0458644 0.2023031
sample estimates:
      cor
0.1248597

Pearson's product-moment correlation

data:  PARTYID_Strength and mean.leegauthor
t = 2.8474, df = 607, p-value = 0.004557
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03568431 0.19250344
sample estimates:
      cor
0.1148091
Sample data:
> dput(head(mydata2, 10))
``structure(list(PARTYID = c(1, 3, 1, 1, 1, 4, 3, 1, 1, 1), PARTYID_Other =
c("NA",
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), PARTYID_Strength =
c(1,
7, 1, 2, 1, 8, 1, 6, 1, 1), PARTYID_Strength_Other = c("NA",
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), THERM_Dem = c(80,
65, 85, 30, 76, 15, 55, 62, 90, 95), THERM_Rep = c(1, 45, 10,
5, 14, 14, 0, 4, 10, 3), Gender = c("Female", "Male", "Male",
"Female", "Female", "Male", "Male", "Female", "Female", "Male"
), `MEAN Age` = c(29.5, 49.5, 29.5, 39.5, 29.5, 21, 39.5, 39.5,
29.5, 65), Age = c("25 - 34", "45 - 54", "25 - 34", "35 - 44",
"25 - 34", "18 - 24", "35 - 44", "35 - 44", "25 - 34", "65+"),
Ethnicity = c("White or Caucasian", "Asian or Asian American",
"White or Caucasian", "White or Caucasian", "Hispanic or Latino",
"White or Caucasian", "White or Caucasian", "White or Caucasian",
"White or Caucasian", "White or Caucasian"), Ethnicity_Other = c("NA",
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"), States = c("Texas",
"Texas", "Ohio", "Texas", "Puerto Rico", "New Hampshire",
"South Carolina", "Texas", "Texas", "Texas"), Education = c("Master's degree",
"Bachelor's degree in college (4-year)", "Bachelor's degree in college (4-year)",
"Master's degree", "Master's degree", "Less than high school degree",
"Some college but no degree", "Master's degree", "Master's degree",
"Some college but no degree"), `MEAN Income` = c(30000, 140000,
150000, 60000, 80000, 30000, 30000, 120000, 150000, 60000
), Income = c("Less than ,000", "0,001 to 0,000",
"More than 0,000", ",001 to ,000", ",001 to ,000",
"Less than ,000", "Less than ,000", "0,001 to 0,000",
"More than 0,000", ",001 to ,000"), mean.partystrength = c(3.875,
2.875, 2.375, 3.5, 2.625, 3.125, 3.375, 3.125, 3.25, 3.625
), mean.traitrep = c(2.5, 2.625, 2.25, 2.625, 2.75, 1.875,
2.75, 2.875, 2.75, 3), mean.traitdem = c(2.25, 2.625, 2.375,
2.75, 2.625, 2.125, 1.875, 3, 2, 2.5), mean.leegauthor = c(1,
2, 2, 4, 1, 4, 1, 1, 1, 1), mean.legit = c(1.71428571428571,
3.28571428571429, 2.42857142857143, 2.42857142857143, 2.14285714285714,
1.28571428571429, 1.42857142857143, 1.14285714285714, 2.14285714285714,
1.28571428571429)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))``
Thanks!
To run the tests, create a vector of the columns of interest, then sapply an anonymous function over each column.
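# Variable that is correlated with each column in `cols`
fixed <- "PARTYID_Strength"
cols <- c("mean.leegauthor", "mean.legit")
# Build a one-sided formula such as ~ PARTYID_Strength + mean.legit for each
# column and run cor.test() on it; simplify = FALSE keeps a named list
cor_test_result <- sapply(cols, function(x) {
  fmla <- as.formula(paste("~", fixed, "+", x))
  cor.test(fmla, data = mydata2)
}, simplify = FALSE)
cor_test_result$mean.leegauthor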
#
# Pearson's product-moment correlation
#
#data: PARTYID_Strength and mean.leegauthor
#t = 1.4804, df = 8, p-value = 0.177
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# -0.2343269 0.8462610
#sample estimates:
# cor
#0.4637152
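As for why both subsets in the question give identical results: in `subset(mydata2, PARTYID_Strength = 1)` the single `=` is read as a named argument, not a comparison. `subset()` has no argument of that name, so it is silently ignored and every row is kept. A comparison needs `==`. Note also that the subsetting variable is constant inside each subset, so its correlation with anything else is not defined there. The sketch below therefore assumes you want to correlate two other columns within each subset. The data frame `toy`, its values, and the choice of `THERM_Dem` as the second variable are illustrative assumptions, not taken from your full data.

```r
# Toy data standing in for mydata2 (hypothetical values for illustration only)
toy <- data.frame(
  PARTYID_Strength = rep(1:2, each = 6),
  THERM_Dem        = c(80, 65, 85, 30, 76, 15, 55, 62, 90, 95, 40, 20),
  mean.legit       = c(1.7, 3.3, 2.4, 2.4, 2.1, 1.3, 1.4, 1.1, 2.1, 1.3, 2.9, 3.0)
)

# `=` is treated as an unused named argument, so no filtering happens
n_all_rows <- nrow(subset(toy, PARTYID_Strength = 1))

# `==` is a logical comparison, so only the matching rows are kept
n_subset_rows <- nrow(subset(toy, PARTYID_Strength == 1))

# Split the data into one data frame per level of the subsetting variable,
# then run cor.test() on two columns that vary within each subset
by_strength <- split(toy, toy$PARTYID_Strength)
cor_by_subset <- lapply(by_strength, function(d) {
  cor.test(~ THERM_Dem + mean.legit, data = d)
})

# Collect the estimate and p-value into a matrix with one row per subset
summary_table <- t(sapply(cor_by_subset, function(res) {
  c(cor = unname(res$estimate), p.value = res$p.value)
}))
```

With your real data, `split(mydata2, mydata2$PARTYID_Strength)` should give one element per level (8 in your case), and the same `lapply()` call can be repeated or nested over `cols` to cover both dependent variables.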