新手在 R studio 中尝试线性混合效果模型 - 完全失败

Newbie attempting linear mixed effects model in R studio - TOTAL FAIL

经过一个多小时的搜索(本论坛、Youtube、class 注释、google),我的问题没有找到任何帮助。我是一个对 R 或统计一无所知的新手。

我正在尝试在 R 中创建一个线性混合效应模型。我正在测量三个不同位置(佛罗里达州杰克逊维尔、佐治亚州奥古斯塔和佐治亚州亚特兰大)的叶宽,并且在这三个位置内有一个高-氮肥和低氮地块。我有 50 棵树的 150 个叶子测量值。

我有限的理解告诉我叶宽是连续的响应变量,城市和地块是离散的解释变量。 运行dom 效应将是单个树,因为单个树中的叶子宽度是非独立的。

我用"nlme"做了一个模型:

leaf.width.model <- lme(width ~ city*plot, (1|tree.id), data=leaf)

然后我 运行 进行了方差分析测试,它表明城市和城市与地块之间的相互作用发生了一些变化。这就是我被困的地方。我想制作一个包含所有三个城市的线路的地块,但我不知道该怎么做。当我尝试使用 plot 函数时,我只会得到一个箱线图。

我确实尝试了几个小时,但比以前更加迷茫和困惑。

1) 如何制作这张图?

2) 我还应该做哪些其他测试来分析 and/or 可视化此数据?

我永远感激任何帮助。非常想学R和stats,但是越来越气馁

谢谢,

富有

P.S 这里是 dput 函数的输出:

> dput(tree) structure(list(tree.id = structure(c(24L, 24L, 32L, 25L, 25L, 24L, 24L, 32L, 25L, 25L, 43L, 45L, 45L, 43L, 23L, 23L, 45L, 45L, 23L, 23L, 41L, 41L, 38L, 11L, 11L, 38L, 41L, 41L, 11L, 11L, 14L, 14L, 29L, 13L, 13L, 14L, 14L, 29L, 13L, 13L, 4L, 4L, 1L, 1L, 20L, 1L, 1L, 20L, 6L, 8L, 8L, 5L, 5L, 6L, 4L, 4L, 8L, 8L, 5L, 5L, 9L, 9L, 10L, 10L, 12L, 12L, 13L, 13L, 22L, 22L, 23L, 23L, 24L, 24L, 25L, 25L, 25L, 25L, 40L, 40L, 41L, 41L, 38L, 38L, 39L, 39L, 14L, 14L, 14L, 15L, 15L, 28L, 28L, 29L, 29L, 35L, 35L, 36L, 36L, 37L, 37L, 42L, 42L, 43L, 43L, 44L, 44L, 45L, 45L, 46L, 46L, 47L, 47L, 2L, 1L, 3L, 3L, 4L, 4L, 7L, 11L, 11L, 16L, 16L, 20L, 20L, 21L, 21L, 17L, 17L, 18L, 18L, 19L, 19L, 26L, 26L, 27L, 27L, 30L, 30L, 31L, 31L, 32L, 32L, 33L, 33L, 34L, 34L, 48L), .Label = c("Tree_112", "Tree_112 ", "Tree_115", "Tree_130", "Tree_137", "Tree_139", "Tree_140", "Tree_141", "Tree_153", "Tree_154", "Tree_156", "Tree_159", "Tree_166", "Tree_169", "Tree_171", "Tree_180", "Tree_182", "Tree_184", "Tree_185", "Tree_202", "Tree_213", "Tree_218", "Tree_222", "Tree_227", "Tree_239", "Tree_242", "Tree_246", "Tree_247", "Tree_252", "Tree_260", "Tree_267", "Tree_269", "Tree_271", "Tree_272", "Tree_291", "Tree_293", "Tree_298", "Tree_327", "Tree_329", "Tree_336", "Tree_350", "Tree_401", "Tree_403", "Tree_405", "Tree_407", "Tree_409", "Tree_420", "Tree_851"), class = "factor"), city = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Atlanta", "Augusta", "Jacksonville"), class = "factor"),     plot = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("High-N", "Low-N"), class = "factor"), width = c(0.66, 0.716, 0.682, 0.645, 0.645, 0.696, 0.733,
0.707, 0.668, 0.686, 0.617, 0.733, 0.73, 0.615, 0.669, 0.746, 0.687, 0.682, 0.76, 0.713, 0.651, 0.664, 0.679, 0.729, 0.756, 
0.669, 0.647, 0.713, 0.767, 0.685, 0.69, 0.731, 0.781, 0.729, 
0.725, 0.739, 0.769, 0.791, 0.676, 0.688, 0.719, 0.753, 0.748, 
0.791, 0.785, 0.78, 0.723, 0.756, 0.664, 0.645, 0.653, 0.615, 
0.591, 0.642, 0.693, 0.716, 0.694, 0.676, 0.662, 0.629, 0.665, 
0.748, 0.726, 0.693, 0.715, 0.714, 0.764, 0.732, 0.61, 0.721, 
0.703, 0.713, 0.746, 0.752, 0.662, 0.733, 0.707, 0.674, 0.734, 
0.79, 0.732, 0.794, 0.703, 0.712, 0.737, 0.731, 0.747, 0.746, 
0.787, 0.709, 0.716, 0.764, 0.77, 0.764, 0.802, 0.663, 0.777, 
0.642, 0.779, 0.81, 0.724, 0.645, 0.68, 0.637, 0.695, 0.768, 
0.761, 0.7, 0.759, 0.726, 0.696, 0.794, 0.774, 0.799, 0.747, 
0.606, 0.691, 0.733, 0.707, 0.698, 0.706, 0.72, 0.694, 0.697, 
0.737, 0.716, 0.73, 0.706, 0.667, 0.734, 0.528, 0.695, 0.684, 
0.763, 0.733, 0.809, 0.6, 0.676, 0.718, 0.759, 0.609, 0.665, 
0.667, 0.647, 0.701, 0.663, 0.688, 0.693, 0.899)), .Names = c("tree.id", "city", "plot", "width"), class = "data.frame", row.names = c(NA, -149L))

非常感谢大家的意见,衷心感谢大家的帮助!

正如评论中所建议的那样,线图可能对您的数据没有意义,因为您正在研究宽度在离散类别中的变化方式(在不同的城市和不同的地块中)。箱线图很有意义,因为您可以为城市和地块的每个交互制作它们。为了让您了解您可以做什么,我生成了一些假数据并制作了一个可能对您有帮助的情节示例:

# fake data
leaf <- data.frame(tree.id = rep(1:50, each = 3), 
                   city = rep(c("Jackson", "Augusta", "Atlanta"), each = 50),
                   plot = rep(1:6, each = 25))
# I'll make the average of width different for each plot
leaf$width <- rnorm(nrow(leaf), leaf$plot, 1)

# plotting the data
library(ggplot2) # this is a great library for plotting in R

ggplot(leaf, aes(x = factor(plot), y = width, color = factor(plot))) + 
  facet_grid(~city, scales = 'free_x') + # This creates a subplot for each city
  geom_boxplot() + 
  geom_point(position = "jitter") + 
  theme_bw()

在这个图中,我添加了点(每棵树的叶子宽度),但我 'jittered' 它们,这意味着稍微扰动它们的位置,这样它们就不会堆积在一起,并且都是可见的。如果您愿意,可以将其删除。

探索性数据分析应该很有趣!而且我认为可视化是开始统计的好地方。希望这对您有所帮助。

leaf.width.model <- lme(width ~ city*plot, (1|tree.id), data=leaf)

在这个模型中,如果你想绘制一些东西,你可能正在尝试回答: 对于每种地块,每个城市中所有树木的平均叶子宽度是多少。

要在图中显示此信息,您需要在 y 轴上绘制 width 在 x 轴上绘制 plot(高氮和低氮)并将数据分组 city .然后你会得到你正在走的3条线。但是,您需要获取每个组的平均宽度,因为您只想显示城市差异。

从原始数据中获取此图:(使用 gfgm 提供的假数据)

set.seed(100)
leaf <- data.frame(tree.id = rep(1:50, each = 3), 
                   city = rep(c("Jackson", "Augusta", "Atlanta"), each = 50),
                   plot = rep(c(1, 0), each = 25))

# I'll make the average of width different for each plot
leaf$width <- rnorm(nrow(leaf), leaf$plot, 1)

library(plotly)
library(tidyverse)
leaf %>% 
  group_by(city,plot) %>% 
  summarise(avwidth = mean(width, na.rm=T),
            avsd = 1.96*sd(width, na.rm=T)/sqrt(25)) %>%
  plot_ly(x = ~plot, y = ~avwidth, color= ~city, 
          type="scatter", mode="markers+lines", 
          error_y = ~list(array=avsd)
          )