Why is my R Code running fine on a virtual cluster, but not on my physical machine?
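I just have a quick question about my code. I am seeing a discrepancy between running my code in RStudio on a virtual cluster and running it on my physical machine.
We have to create an R Markdown file that reproduces an ANOVA table. My code runs just fine on the cluster.
Here is my code: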
```{r, message=FALSE, warning=FALSE}
wine <- read.csv("wine.csv")
cultivar <- as.factor(wine[, "Cultivar"])
alcohol <- wine[, "Alcohol"]
alcohol.list <- split(alcohol, cultivar)
alcohol.list
```
$`1`
[1] 14.23 13.20 13.16 14.37 13.24 14.20 14.39 14.06 14.83 13.86 14.10 14.12 13.75 14.75 14.38 13.63 14.30 13.83 14.19 13.64
[21] 14.06 12.93 13.71 12.85 13.50 13.05 13.39 13.30 13.87 14.02 13.73 13.58 13.68 13.76 13.51 13.48 13.28 13.05 13.07 14.22
[41] 13.56 13.41 13.88 13.24 13.05 14.21 14.38 13.90 14.10 13.94 13.05 13.83 13.82 13.77 13.74 13.56 14.22 13.29 13.72
$`2`
[1] 12.37 12.33 12.64 13.67 12.37 12.17 12.37 13.11 12.37 13.34 12.21 12.29 13.86 13.49 12.99 11.96 11.66 13.03 11.84 12.33
[21] 12.70 12.00 12.72 12.08 13.05 11.84 12.67 12.16 11.65 11.64 12.08 12.08 12.00 12.69 12.29 11.62 12.47 11.81 12.29 12.37
[41] 12.29 12.08 12.60 12.34 11.82 12.51 12.42 12.25 12.72 12.22 11.61 11.46 12.52 11.76 11.41 12.08 11.03 11.82 12.42 12.77
[61] 12.00 11.45 11.56 12.42 13.05 11.87 12.07 12.43 11.79 12.37 12.04
$`3`
[1] 12.86 12.88 12.81 12.70 12.51 12.60 12.25 12.53 13.49 12.84 12.93 13.36 13.52 13.62 12.25 13.16 13.88 12.87 13.32 13.08
[21] 13.50 12.79 13.11 13.23 12.58 13.17 13.84 12.45 14.34 13.48 12.36 13.69 12.85 12.96 13.78 13.73 13.45 12.82 13.58 13.40
[41] 12.20 12.77 14.16 13.71 13.40 13.27 13.17 14.13
oneway <- function(z)
{
ni <- sapply(z, length)
yi_bar <- sapply(z, mean)
s2i <- sapply(z, sd)
Y_bar <- mean(unlist(z))
g <- length(z)
N <-length(unlist(z))
Within_SS = sum((ni-1) * s2i^2)
Between_SS = sum(ni *((yi_bar)-(Y_bar))^2)
DF_Within = (N - g)
DF_Between = (g - 1)
list("WithinSS" = Within_SS, "BetweenSS"= Between_SS, "DFWithin" = DF_Within, "DFBetween" = DF_Between)
}
alcohol.aov <- oneway(alcohol.list)
alcohol.aov
oneway.table <- function(z)
{
Mean_SSW <- z[[1]]/z[[3]]
Mean_SSB <- z[[2]]/z[[4]]
F_value <- (Mean_SSB/Mean_SSW)
P_value <- pf(F_value, DF_Between, DF_Within, lower.tail = FALSE)
anova <- matrix(c( z[[4]], z[[3]], z[[2]], z[[1]], Mean_SSB, Mean_SSW, F_value, NA, P_value, NA), ncol =5)
dimnames(anova) <- list("Group" = c("cultivar", "Residuals"), "ANOVA" = c("DF", "Sum_Sq", "Mean_Sq", "F_value", "P_value"))
printCoefmat(anova, signif.stars = TRUE, has.Pvalue = TRUE, digits = 3, na.print="")
}
oneway.table(alcohol.aov)
My code worked fine on the virtual cluster, and I was able to reproduce this ANOVA table:
DF Sum_Sq Mean_Sq F_value P_value
cultivar 2.000 70.795 35.397 135 <2e-16 ***
Residuals 175.000 45.859 0.262
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
But when I run it on my local machine, I get this error message:
Error in pf(F_value, DF_Between, DF_Within, lower.tail = FALSE) : object 'DF_Between' not found
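I understand that DF_Between cannot be found in my second block of code, but why does it work on the cluster and not on my local machine?
I also re-ran my code, this time adding definitions for the variables: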
oneway.table <- function(z)
{
g <- length(z)
N <-length(unlist(z))
DF_Within <- (N - g)
DF_Between <- (g - 1)
Mean_SSW <- z[[1]]/z[[3]]
Mean_SSB <- z[[2]]/z[[4]]
F_value <- (Mean_SSB/Mean_SSW)
P_value <- pf(F_value, DF_Between, DF_Within, lower.tail = FALSE)
anova <- matrix(c( z[[4]], z[[3]], z[[2]], z[[1]], Mean_SSB, Mean_SSW, F_value, NA, P_value, NA), ncol =5)
dimnames(anova) <- list("Group" = c("cultivar", "Residuals"), "ANOVA" = c("DF", "Sum_Sq", "Mean_Sq", "F_value", "P_value"))
printCoefmat(anova, signif.stars = TRUE, has.Pvalue = TRUE, digits = 3, na.print="")
}
oneway.table(alcohol.aov)
But now my output looks like this:
ANOVA
Group DF Sum_Sq Mean_Sq F_value P_value
cultivar 2.000 70.795 35.397 135
Residuals 175.000 45.859 0.262
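There are no significance stars and no p-value at all. If anyone could help, it would be much appreciated.
Solution
Here is the fix, without the explanation.
Create a reproducible example: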
alcohol.list <- list("1"=c(14.2, 13.2),
"2"=c(12.3, 12.3),
"3"=c(12.8, 12.9))
alcohol.list
Your unmodified oneway function:
oneway <- function(z)
{
ni <- sapply(z, length)
yi_bar <- sapply(z, mean)
s2i <- sapply(z, sd)
Y_bar <- mean(unlist(z))
g <- length(z)
N <-length(unlist(z))
Within_SS = sum((ni-1) * s2i^2)
Between_SS = sum(ni *((yi_bar)-(Y_bar))^2)
DF_Within = (N - g)
DF_Between = (g - 1)
list("WithinSS" = Within_SS, "BetweenSS"= Between_SS, "DFWithin" = DF_Within, "DFBetween" = DF_Between)
}
alcohol.aov <- oneway(alcohol.list)
Finally, your oneway.table, with the p-value computed from the values stored in z:
oneway.table <- function(z)
{
  # Mean squares: each sum of squares divided by its degrees of freedom
  Mean_SSW <- z$WithinSS / z$DFWithin
  Mean_SSB <- z$BetweenSS / z$DFBetween
  F_value <- Mean_SSB / Mean_SSW
  # Take the degrees of freedom from z instead of relying on free variables
  P_value <- pf(F_value, z$DFBetween, z$DFWithin, lower.tail = FALSE)
  anova <- matrix(c(z$DFBetween, z$DFWithin, z$BetweenSS, z$WithinSS, Mean_SSB, Mean_SSW, F_value, NA, P_value, NA), ncol = 5)
  dimnames(anova) <- list("Group" = c("cultivar", "Residuals"), "ANOVA" = c("DF", "Sum_Sq", "Mean_Sq", "F_value", "P_value"))
  printCoefmat(anova, signif.stars = TRUE, has.Pvalue = TRUE, digits = 3, na.print = "")
}
oneway.table(alcohol.aov)
Returns:
DF Sum_Sq Mean_Sq F_value P_value
cultivar 2.000 1.990 0.995 5.91 0.091 .
Residuals 3.000 0.505 0.168
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
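As a sanity check, the hand-rolled numbers can be compared with R's built-in one-way ANOVA on the same toy data. This is only a sketch: the names toy and builtin are illustrative and not part of the original code.

```r
# Same toy data as in the reproducible example above
alcohol.list <- list("1" = c(14.2, 13.2),
                     "2" = c(12.3, 12.3),
                     "3" = c(12.8, 12.9))

# stack() turns the named list into a data frame with columns
# `values` (the measurements) and `ind` (the group label as a factor)
toy <- stack(alcohol.list)

# Built-in one-way ANOVA table for comparison with oneway.table()
builtin <- anova(lm(values ~ ind, data = toy))
print(builtin)

# Row 1 is the group effect, row 2 the residuals
builtin[["Sum Sq"]]      # between and within sums of squares
builtin[["Pr(>F)"]][1]   # p-value for the group effect
```

If the sums of squares and the p-value agree with the table above, the manual calculation is consistent with the built-in one.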
Explanation
In your original oneway.table, DF_Between is never created before the call to pf(). In fact, DF_Within is not created either, so neither exists in the function's scope.
This would work, for example:
# create DF_Between and DF_Within first and pass in all three as arguments
oneway.table <- function(z, DF_Between, DF_Within){
  Mean_SSW <- z[[1]]/z[[3]]
  Mean_SSB <- z[[2]]/z[[4]]
  F_value <- (Mean_SSB/Mean_SSW)
  P_value <- pf(F_value, DF_Between, DF_Within, lower.tail = FALSE)
  # ... (rest of the function unchanged)
}
This would also work:
oneway.table <- function(z){
  Mean_SSW <- z[[1]]/z[[3]]
  Mean_SSB <- z[[2]]/z[[4]]
  F_value <- (Mean_SSB/Mean_SSW)
  # z is the list returned by oneway(), whose elements are named DFBetween and DFWithin
  P_value <- pf(F_value, z$DFBetween, z$DFWithin, lower.tail = FALSE)
  # ... (rest of the function unchanged)
}
So would this:
oneway.table <- function(z){
  Mean_SSW <- z[[1]]/z[[3]]
  Mean_SSB <- z[[2]]/z[[4]]
  F_value <- (Mean_SSB/Mean_SSW)
  # create DF_Between and DF_Within directly in here, from the values stored in z;
  # length(z) would count the four elements of the summary list, not the groups
  DF_Within <- z[[3]]
  DF_Between <- z[[4]]
  P_value <- pf(F_value, DF_Between, DF_Within, lower.tail = FALSE)
  # ... (rest of the function unchanged)
}
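One caveat when building the degrees of freedom inside oneway.table: z is the four-element summary list returned by oneway(), not the raw data, so length(z) and length(unlist(z)) are both 4. That appears to be why the second attempt in the question printed no p-value. The sketch below reuses the rounded sums of squares from the question's output; the names ending in _wrong are illustrative only.

```r
# Summary list in the shape returned by oneway() for the wine data
# (sums of squares rounded as printed in the question)
z <- list(WithinSS = 45.859, BetweenSS = 70.795, DFWithin = 175, DFBetween = 2)

g <- length(z)           # 4 list elements, not 3 cultivars
N <- length(unlist(z))   # 4 numbers, not 178 observations

DF_Within_wrong  <- N - g   # 0
DF_Between_wrong <- g - 1   # 3

F_value <- (z$BetweenSS / z$DFBetween) / (z$WithinSS / z$DFWithin)

# A denominator df of 0 is invalid, so pf() returns NaN with a warning;
# the chunk option warning=FALSE hides that warning and na.print = "" prints a blank
p_wrong <- suppressWarnings(pf(F_value, DF_Between_wrong, DF_Within_wrong, lower.tail = FALSE))

# Using the degrees of freedom stored in z gives a proper p-value
p_right <- pf(F_value, z$DFBetween, z$DFWithin, lower.tail = FALSE)

print(c(wrong = p_wrong, right = p_right))
```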
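Whichever approach you choose, you just need to understand the lexical scoping rules that R uses. Skipping the long, tedious explanation, it goes like this: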
The search process that occurs goes as follows:
- If the value of a symbol is not found in the environment in which a function was defined, then the search is continued in the parent environment.
- The search continues down the sequence of parent environments until we hit the top-level environment; this is usually the global environment (workspace) or the namespace of a package.
- After the top-level environment, the search continues down the search list until we hit the empty environment.
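The lookup described in the list above can be seen in a small experiment. This is a sketch under assumptions: uses_free_variable, offset_value and demo_env are hypothetical names made up for the demonstration, and demo_env stands in for the session's workspace.

```r
# A function with a free variable, mirroring DF_Between in the question
uses_free_variable <- function(x) x + offset_value

# Give the function its own defining environment so the demo
# does not depend on whatever is already in your workspace
demo_env <- new.env(parent = baseenv())
environment(uses_free_variable) <- demo_env

# offset_value exists nowhere on the search chain yet, so the call fails
first_try <- tryCatch(uses_free_variable(1), error = function(e) e)
print(conditionMessage(first_try))

# Once the variable exists in the defining environment,
# the same unchanged function succeeds
assign("offset_value", 10, envir = demo_env)
second_try <- uses_free_variable(1)
print(second_try)   # 11
```

The function body never changes between the two calls; only the contents of an enclosing environment do, which is the difference between the two machines.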
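On your local machine, R first looks for DF_Between and DF_Within inside the oneway.table call itself. They are not there, so it searches the environment in which the function was defined and then each parent environment in turn. They are not found anywhere, so the search reaches the empty environment and R raises the "object not found" error.
On your cluster the search starts the same way: DF_Between and DF_Within are not found inside oneway.table, so R searches the parent environment, and this time it finds them there. So no error or exception is raised.
You can run ls() in both sessions to confirm that DF_Within and DF_Between do exist in the parent environment on the cluster but not on your local machine.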
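A minimal sketch of that check, assuming the stray variables live in the global environment (an assumption, since the question does not show the cluster's workspace). The names needed and found are illustrative.

```r
# Run this in each session (cluster and local) before calling oneway.table()
needed <- c("DF_Between", "DF_Within")

# inherits = FALSE restricts the lookup to the global environment itself
found <- vapply(needed, exists, logical(1), envir = globalenv(), inherits = FALSE)
print(found)

# Everything currently defined in the workspace
print(ls(envir = globalenv()))
```

If found is TRUE on the cluster and FALSE locally, the variables are most likely left over from earlier interactive work in the cluster session. Running the document in a fresh R session on both machines should then give the same error in both places.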