Plotting multiple slope plots for multiple variables (with a for loop) (facing issues with key redundancy)
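Although I managed to draw a multiple-slope plot with fake data (see the reproducible example below), I am having trouble adapting the code to my real data and keep running into errors about key redundancy.
First, some context: I have a dataset with many "_x" and "_y" variables, which are measurements at time 1 and time 2. They are stored as separate columns of the same row, because each entry has both a time 1 and a time 2 value, and I want to plot the slope for each individual, with one plot per variable (pair of variables).
With some help, I managed to do this for one set of variables in the following reproducible example, which has no "_x" or "_y" column names. However, when I try to adapt this code (using select() to take only these columns instead of the whole dataset, changing the column names to mimic the example, changing the regex, and so on), I keep facing the key redundancy error: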
Error in spread():
! Each row of output must be identified by a unique combination of keys.
Keys are shared for 195 rows:
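I suspect this happens because some values in my data really are identical, but since there is an Id column that should not be a problem, and I don't understand what I can do to solve it.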
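As a hedged illustration of what this error message means (not a diagnosis of the real data, which is not shown here): spread() fails whenever two rows share the same identifying columns and the same key. The toy data frame `long` and the helper column `occurrence` below are made up for this sketch.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: Id 1 has two rows for key "x",
# so the combination (Id, key) is not unique
long <- data.frame(
  Id  = c(1, 1, 2, 2, 1),
  key = c("x", "y", "x", "y", "x"),
  val = c(5, 8, 10, 14, 6)
)

# spread(long, key, val) would stop with the "unique combination of keys" error

# Count each key combination to find the offending rows
dupes <- long %>%
  count(Id, key) %>%
  filter(n > 1)

# One possible workaround (an assumption, it may not suit every dataset):
# number the repeats within each (Id, key) so that every row is identified
wide <- long %>%
  group_by(Id, key) %>%
  mutate(occurrence = row_number()) %>%
  ungroup() %>%
  spread(key, val)
```

If `dupes` is not empty for the real data, the Id column alone does not identify the rows, for example because other columns were dropped or an Id occurs more than once.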
Foo example:
library(tidyverse)
Id <- rep(1:10)
a = c(5,10,15,12,13,25,12,13,11,9)
b = c(8,14,20,13,19,29,15,19,20,11)
c = c(10,14,20,1.5,9,21,13,21,11,10)
d = c(15,9,20,14,12,5,12,13,12,30)
group = as.factor( rep(1:2,each=5) )
data = data.frame(Id,a,b,c,d,group)
case_mapping <- data.frame(
key = c("a", "b", "c", "d"),
key2 = c("x1", "x2", "y1", "y2")
)
data %>%
gather(key, val, c(a:d)) %>%
left_join(case_mapping, by = "key") %>%
select(-key) %>%
extract(key2, into = c("key", "order"), "([a-z])([0-9])") %>%
spread(key, val) %>%
ggplot() +
aes(x, y, group = Id, color = group) + xlab("Age")+ #ggtitle(paste("Variable")+
geom_point() +
geom_line()
Now an example of my data.
library(tidyverse)
Id <- rep(1:10)
var1_x = c(5,10,15,12,13,25,12,13,11,9)
var2_x = c(8,14,20,13,19,29,NA,19,20,11) # just adding some nas.
var3_x = c(10,14,20,1.5,9,21,13,21,11,10)
var1_y = var1_x+3
var2_y = var2_x*2
var3_y = c(10,14,20,1.5,9,21,13,21,11,10) #same, just to see.
age1 = c(15,9,20,14,12,5,12,13,12,30)
age2 = c(18,19,24,16,15,9,16,19,14,37)
group = as.factor( rep(1:2,each=5) )
data = data.frame(Id,var1_x,var2_x,var3_x, var1_y,var2_y,var3_y,age1,age2,group)
Now, should I create a for loop so that I can pair the variables properly?
First we create two character vectors holding the _x and _y column names.
sub_x = colnames(data)[2:4] # sub x
sub_y = colnames(data)[5:7] # suby
Now we should be able to implement the for loop.
for( i in 1:length(sub_x)) {
# We define the matching keys.
case_mapping <- data.frame(
key = c(sub_x[i],sub_y[i], "age1", "age2"),
key2 = c("x1", "x2", "y1", "y2")
)
# And now we should be able to plot this.
data %>%
gather(key, val, c(!!sym(sub_x[i]), !!sym(sub_y[i]), age1,age2 )) %>%
left_join(case_mapping, by = "key") %>%
select(-key) %>%
extract(key2, into = c("key", "order"), "([a-z])([0-9])") %>%
spread(key, val) %>%
ggplot() +
aes(x, y, group = Id, color = group) +
xlab("Age")+
geom_point() +
geom_line()
}
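But this does not give me any output, and when I try to adapt it, it throws errors from gather(). I hope you can help me understand what I am doing wrong.
PS: Sorry if my grammar is not completely right, English is my second language.
Edit for clarification:
I intend to plot something like this for each variable. (It would be great if there were a way to indicate the Id of each slope, so that I don't have to look it up in the data to see which one it corresponds to.)
Edit 2
With Tjebo's help I have sort of "solved it", but I still need to automate, through dplyr, the construction of this data_long2 from the provided data_long1.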
data_long2 <- data.frame(
  Id = rep(data_long$Id, 2),
  Group = rep(data_long$group, 2),
  Var = rep(data_long$var, 2),
  Valueage = c(data_long$age1, data_long$age2),
  Valuevar = c(data_long$x, data_long$y)
)
ggplot(data_long2) +
## I've removed the grouping by ID, because there was only one observation per ID
aes(Valueage, Valuevar, color=Id) +
geom_point() +
geom_line(aes(group = Id))+
# geom_line() +
## you can for example facet by your new variable column
facet_grid(~Var)
#> Warning: Removed 1 rows containing missing values (geom_point).
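And changing the color group.
I think you might be overcomplicating things. As I understand it, you are struggling to reshape your data and then plot all the variables, right?
Below is one approach that reshapes with the new-ish pivot_longer (it has amazing functionality, especially for "multiple gathering") and then facets instead of looping.
Update
You basically need to pivot longer twice.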
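To see what a single pivot with the special ".value" name does on its own, here is a minimal sketch on made-up data. The names `toy` and `step1` are assumptions for illustration only and do not appear in the original post.

```r
library(tidyr)

# Hypothetical wide data: two variables, each measured at time "x" and time "y"
toy <- data.frame(
  Id = 1:2,
  var1_x = c(5, 10), var1_y = c(8, 13),
  var2_x = c(8, 14), var2_y = c(16, 28)
)

# The part of the name before "_" goes into a new column "var";
# the part after "_" (".value") becomes the name of a value column,
# so the result has one row per Id and variable, with columns x and y
step1 <- pivot_longer(
  toy,
  cols = matches("_[xy]$"),
  names_to = c("var", ".value"),
  names_pattern = "(.*)_(.*)"
)

print(step1)
```

The second pivot applies the same idea again, this time splitting names that end in 1 or 2 so that the two time points end up stacked in rows.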
library(tidyverse)
Id <- rep(1:10)
var1_x = c(5,10,15,12,13,25,12,13,11,9)
var2_x = c(8,14,20,13,19,29,NA,19,20,11) # just adding some nas.
var3_x = c(10,14,20,1.5,9,21,13,21,11,10)
var1_y = var1_x+3
var2_y = var2_x*2
var3_y = c(10,14,20,1.5,9,21,13,21,11,10) #same, just to see.
age1 = c(15,9,20,14,12,5,12,13,12,30)
age2 = c(18,19,24,16,15,9,16,19,14,37)
group = as.factor( rep(1:2,each=5) )
data = data.frame(Id,var1_x,var2_x,var3_x, var1_y,var2_y,var3_y,age1,age2,group)
data_long <-
data %>%
## make use of the cool pivot_longer function
pivot_longer(cols = matches("_[xy]$"),
names_to = c("var", ".value"),
names_pattern = "(.*)_(.*)") %>%
## now make even longer! all y (currently confusingly called x and y) belong into one column
## and all x (currently called age1 and age2) in another column
## this is easier with a similar pattern in both, therefore renaming
## note the .value name is switched when compared with the first pivoting
rename(y1= x, y2 = y) %>%
pivot_longer(
matches("[12]$"),
names_to = c(".value", "set"),
names_pattern = "(.+)([0-9]+)"
)
ggplot(data_long) +
## I've removed the grouping by ID, because there was only one observation per ID
aes(age, y, color = as.character(Id)) +
geom_point() +
geom_line() +
## you can for example facet by your new variable column
facet_grid(~var)
#> Warning: Removed 2 rows containing missing values (geom_point).
Creating each plot separately in a loop:
## split by your new variable and run a loop to create a list of plots
ls_p <- lapply(split(data_long, data_long$var), function(.x){
ggplot(.x) +
## I've removed the grouping by ID, because there was only one observation per ID
aes(age, y, color = as.character(Id)) +
geom_point() +
geom_line() +
## you can for example facet by your new variable column
facet_grid(~var)
} )
## you can then either print them separately or all together, e.g. with patchwork
patchwork::wrap_plots(ls_p) + patchwork::plot_layout(ncol = 1)
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Warning: Removed 2 row(s) containing missing values (geom_path).
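If an explicit for loop is preferred over lapply, keep in mind that a ggplot object created inside a for loop is not printed automatically; it has to be passed to print() (or stored and printed later). The sketch below also labels each slope with its Id via geom_text(), which is one possible way to address the labelling wish from the question. The data frame `toy_long` is hypothetical and only mimics the column names of data_long.

```r
library(ggplot2)

# Hypothetical long-format data with the same column names as data_long
toy_long <- data.frame(
  Id  = rep(1:3, each = 2, times = 2),
  var = rep(c("var1", "var2"), each = 6),
  set = rep(c("1", "2"), times = 6),
  age = rep(c(15, 18, 9, 19, 20, 24), times = 2),
  y   = c(5, 8, 10, 13, 15, 18, 8, 16, 14, 28, 20, 40)
)

plots <- list()
for (v in unique(toy_long$var)) {
  d <- toy_long[toy_long$var == v, ]
  p <- ggplot(d, aes(age, y, group = Id, color = factor(Id))) +
    geom_point() +
    geom_line() +
    # write the Id next to the second time point of each slope
    geom_text(
      data = d[d$set == "2", ],
      aes(label = Id),
      hjust = -0.5,
      show.legend = FALSE
    ) +
    labs(title = v, x = "Age", color = "Id")
  print(p)          # without print(), nothing is drawn inside the loop
  plots[[v]] <- p   # keep the plot for later use, e.g. saving to a file
}
```

Storing the plots in a named list keeps the loop version compatible with patchwork::wrap_plots() as used above.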
Created on 2022-05-31 by the reprex package (v2.0.1)