reshape or dcast long to wide with no value.var (2 columns)
I have a data.frame df with 2 columns. The first six rows are shown here, but it contains many more block sequences, each spanning 3 rows:
blocksequenceid description
M049-S215-S085 ECDTM-49
M049-S215-S085 ICD-215
M049-S215-S085 ICD-85
M049-S213-S044 ECDTM-49
M049-S213-S044 ICD-213
M049-S213-S044 ICD-44
I would like to convert it to this format:
blocksequenceid description1 description2 description3
M049-S215-S085 ECDTM-49 ICD-215 ICD-85
M049-S213-S044 ECDTM-49 ICD-213 ICD-44
I looked at dcast and reshape, but I don't know what to do when reshape reports ERROR: column time not found, and I'm not sure whether dcast is appropriate here. This is what I tried:
reshape(df, idvar='blocksequenceid', timevar = 'description', direction = 'wide')
reshape(df, idvar='blocksequenceid', v.names = 'description', direction = 'wide')
I'm sure this is simple, but I'm missing something.
Here is reproducible data.
# Named txt rather than t so that base::t() is not masked
txt <- 'blocksequenceid description
M049-S215-S085 ECDTM-49
M049-S215-S085 ICD-215
M049-S215-S085 ICD-85'
df <- read.table(text = txt, header = TRUE)
Here is one possible solution.
library(tidyverse)

df %>%
  rename(description1 = description) %>%   # free up the name "description"
  mutate(description = row_number()) %>%   # 1, 2, 3: becomes the column suffix
  spread(description, description1, sep = "")
#   blocksequenceid description1 description2 description3
# 1  M049-S215-S085     ECDTM-49      ICD-215       ICD-85
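In current versions of tidyr, spread() is superseded by pivot_wider(), so the same idea can be sketched with the newer function. This is only a sketch under a few assumptions: the helper column name n is made up for illustration, and the group_by() call is not needed for a single block but keeps the numbering per block if more blocks are present.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = 'blocksequenceid description
M049-S215-S085 ECDTM-49
M049-S215-S085 ICD-215
M049-S215-S085 ICD-85', header = TRUE)

wide <- df %>%
  group_by(blocksequenceid) %>%
  mutate(n = row_number()) %>%   # hypothetical helper: position within the block
  ungroup() %>%
  pivot_wider(names_from = n,
              values_from = description,
              names_prefix = "description")   # yields description1, description2, ...

print(wide)
```

Here names_prefix plays the role that sep = "" played in spread(): it glues the original column name onto the row index to build the new column names.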
Edit for the modified data
# Named txt rather than t so that base::t() is not masked
txt <- 'blocksequenceid description
M049-S215-S085 ECDTM-49
M049-S215-S085 ICD-215
M049-S215-S085 ICD-85
M049-S213-S044 ECDTM-49
M049-S213-S044 ICD-213
M049-S213-S044 ICD-44'
df <- read.table(text = txt, header = TRUE)
With the updated data, you should group_by(blocksequenceid) first, so that row_number() restarts at 1 within each block.
library(tidyverse)

df %>%
  rename(description1 = description) %>%
  group_by(blocksequenceid) %>%            # number the rows within each block
  mutate(description = row_number()) %>%
  spread(description, description1, sep = "")
# # A tibble: 2 x 4
# # Groups:   blocksequenceid [2]
#   blocksequenceid description1 description2 description3
#   <chr>           <chr>        <chr>        <chr>
# 1 M049-S213-S044  ECDTM-49     ICD-213      ICD-44
# 2 M049-S215-S085  ECDTM-49     ICD-215      ICD-85
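Since the question asks about reshape and dcast specifically, here is a rough sketch of how both could be made to work. What the data lacks is a column saying which of the three positions each row occupies within its block, so the sketch builds one first. The names long, time, dt, wide_base and wide_dt are made up for illustration, and the dcast variant assumes the data.table package is installed.

```r
library(data.table)

df <- read.table(text = 'blocksequenceid description
M049-S215-S085 ECDTM-49
M049-S215-S085 ICD-215
M049-S215-S085 ICD-85
M049-S213-S044 ECDTM-49
M049-S213-S044 ICD-213
M049-S213-S044 ICD-44', header = TRUE)

# Base R: add a within-block counter, then use it as timevar
long <- df
long$time <- ave(seq_len(nrow(long)), long$blocksequenceid, FUN = seq_along)
wide_base <- reshape(long,
                     idvar     = "blocksequenceid",
                     timevar   = "time",
                     v.names   = "description",
                     direction = "wide",
                     sep       = "")   # description1 rather than description.1

# data.table: rowid() builds the same counter inside the formula
dt <- as.data.table(df)
wide_dt <- dcast(dt,
                 blocksequenceid ~ rowid(blocksequenceid, prefix = "description"),
                 value.var = "description")

print(wide_base)
print(wide_dt)
```

So dcast is usable here after all: description itself is the value.var, and the missing piece is only the per-block index that becomes the new column names.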