tidyr::spread() function throws an error

Question

I was trying to use the gather and spread functions from the tidyverse, but the spread call throws an error.

library(caret)
library(tidyr)  # re-exports %>%, which the pipes below need

dataset<-iris

# gather function is to convert wide data to long data

dataset_gather<-dataset %>% tidyr::gather(key=Type,value = Values,1:4)

head(dataset_gather)

# spread is the opposite of gather

The code below throws an error like this: Error: Duplicate identifiers for rows

dataset_spead<- dataset_gather%>%tidyr::spread(key = Type,value = Values)

Answer 1

We can use data.table:

library(data.table)
dcast(melt(setDT(dataset, keep.rownames = TRUE), id.var = c("rn", "Species")), rn + Species ~ variable)
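To make the one-liner easier to follow, here is a rough step-by-step sketch of the same round trip. The names dt, long and wide are illustrative only, and as.data.table() is swapped in for setDT() so that the original data frame is not modified by reference.

```r
library(data.table)

# Copy iris into a data.table; keep.rownames = TRUE stores the
# row names in a character column called "rn"
dt <- as.data.table(iris, keep.rownames = TRUE)

# Wide to long: "rn" and "Species" identify each original row,
# the four measurement columns are stacked into variable/value
long <- melt(dt, id.vars = c("rn", "Species"))

# Long to wide: one output row per rn + Species combination
wide <- dcast(long, rn + Species ~ variable, value.var = "value")

head(wide)
```

Because "rn" is unique for every original row, dcast() has exactly one value per cell and does not need to aggregate. Note that "rn" is stored as character, so the result is likely to come back in the order "1", "10", "100", ... rather than in the original row order; converting it to integer first should avoid that.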

Answer 2

Added later: Sorry @alistaire, I only saw your comment on the original post after posting this response.

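From what I understand, Error: Duplicate identifiers for rows... occurs when several values share the same identifier. For example, in the original 'iris' dataset, the first five rows of Species = setosa all have a Petal.Width of 0.2, and three of them have a Petal.Length of 1.4. Gathering these data isn't a problem, but once they are gathered, the only columns left to identify a value are Species and Type, so when you try to spread them, the function doesn't know what belongs to what. That is, it cannot tell which 0.2 Petal.Width and which 1.4 Petal.Length belong to which setosa row.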
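A quick way to see the problem is to count how many values share each identifier after gathering. This is only a sketch; the names long and dupes are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Gather the four measurement columns, as in the question
long <- iris %>%
  gather(key = Type, value = Values, 1:4)

# Count how many values share each Species/Type combination;
# anything above 1 means spread() cannot place the values unambiguously
dupes <- long %>%
  count(Species, Type)

head(dupes)
```

Every Species/Type combination holds 50 values here, one for each flower of that species, so there is no single value that spread() could put into each cell.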

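The (tidyverse) solution I use in these situations is to create a unique marker for each row of data at the gathering stage, so that the function can keep track of which duplicated values belong to which rows when you want to spread them out again. See the example below: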

# Load packages
library(dplyr)
library(tidyr)

# Get data
dataset <- iris

# View dataset
head(dataset)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

# Gather data
dataset_gathered <- dataset %>%
  # Create a unique identifier for each row
  mutate(marker = row_number(Species)) %>%
  # Gather the data
  gather(key = Type, value = Values, 1:4)

# View gathered data
head(dataset_gathered)
#>   Species marker         Type Values
#> 1  setosa      1 Sepal.Length    5.1
#> 2  setosa      2 Sepal.Length    4.9
#> 3  setosa      3 Sepal.Length    4.7
#> 4  setosa      4 Sepal.Length    4.6
#> 5  setosa      5 Sepal.Length    5.0
#> 6  setosa      6 Sepal.Length    5.4

# Spread it out again
dataset_spread <- dataset_gathered %>%
  # Group the data by the marker
  group_by(marker) %>%
  # Spread it out again
  spread(key = Type, value = Values) %>%
  # Not essential, but remove marker
  ungroup() %>%
  select(-marker)

# View spread data
head(dataset_spread)
#> # A tibble: 6 x 5
#>   Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>    <fctr>        <dbl>       <dbl>        <dbl>       <dbl>
#> 1  setosa          1.4         0.2          5.1         3.5
#> 2  setosa          1.4         0.2          4.9         3.0
#> 3  setosa          1.3         0.2          4.7         3.2
#> 4  setosa          1.5         0.2          4.6         3.1
#> 5  setosa          1.4         0.2          5.0         3.6
#> 6  setosa          1.7         0.4          5.4         3.9

(As always, thanks to Jenny Bryan for the reprex package)
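For what it's worth, the same row-marker idea should carry over to pivot_longer() and pivot_wider(), which are available in tidyr 1.0.0 and later. The sketch below is not part of the original answer; row_id, long and wide are illustrative names.

```r
library(dplyr)
library(tidyr)

# Add a unique row identifier before reshaping to long format
long <- iris %>%
  mutate(row_id = row_number()) %>%
  pivot_longer(cols = 1:4, names_to = "Type", values_to = "Values")

# Reshape back to wide; row_id and Species together identify each row,
# so every output cell receives exactly one value
wide <- long %>%
  pivot_wider(names_from = Type, values_from = Values) %>%
  # Make the original row order explicit, then drop the helper column
  arrange(row_id) %>%
  select(-row_id)

head(wide)
```

Here the identifier is created with row_number() with no arguments, which simply numbers the rows from top to bottom.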
