How do you apply a function to each cell of a column?
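I am using the following function, which works well, to parse text data and find the percentage of arterial stenosis in patient medical records.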
library(dplyr)  # provides tibble(), pull() and %>%

txt <- "Small caliber RCA with 50% proximal and 70% mid stenoses."
coronary_anatomy <- function(x) {
  # Check that the input is a character string
  if (!is.character(x)) {stop("Requires character string", call. = FALSE)}
  # Establish variables
  epicardial <- c("LM", "LAD", "LCX", "RCA")
  mods <- c("proximal", "mid", "distal", "ostial")
  # Split the sentence into word tokens, preserving case
  sentence <-
    tibble(line = 1, sentence = x) %>%
    tidytext::unnest_tokens(input = sentence, output = word, to_lower = FALSE) %>%
    pull(word)
  # Identify number/locations of disease
  artery <- sentence[which(sentence %in% epicardial)]
  locs <- grep("\\d+", sentence)  # the backslash must be doubled inside an R string
  mlocs <- which(sentence %in% mods)
  # Find the nearest neighbors to identify which modifier goes with which location
  space <- combn(mlocs, length(locs))
  dist <- apply(space, 2, function(x) {sum(abs(locs - x))})
  matched <- space[, which.min(dist)]
  tbl <-
    tibble(
      anatomy = paste(sentence[matched], artery),
      stenosis = as.numeric(sentence[locs])
    )
  # Return
  return(tbl)
}
# Test it out
coronary_anatomy(txt)
Output:
# A tibble: 2 x 2
  anatomy      stenosis
  <chr>           <dbl>
1 proximal RCA       50
2 mid RCA            70
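The code works well. But now I am running into the problem of applying it at a larger scale. I would like to apply this code to a data frame that contains a whole column of patient records. A simplified version of the data frame I want to run the function over is shown below.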
# A tibble: 2 x 2
  PatientID Records
  <chr>     <chr>
1 1234      Small caliber RCA with 50% proximal and 70% mid stenoses
2 1235      Small caliber LCX with 40% proximal and 70% mid stenoses
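So here is the problem. I would like to somehow run this function over the entire Records column. However, the function (as shown above) returns a tibble whose size varies with the amount of information available to parse.
Does anyone smarter than me know how to run this function over every cell of the data table column that contains the medical records, and collect the output in an organized way, given that each output is a tibble?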
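If speed isn't an issue, you can use lapply or purrr::map (or even a for loop) to iterate over each row of the data, save each tibble result in a list, and then combine the list of tibbles into one nice big tibble to work with. For example,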
# dplyr and lapply
result_list = lapply(your_data$Records, coronary_anatomy)
names(result_list) = your_data$PatientID
result_tbl = bind_rows(result_list, .id = "PatientID")
result_tbl
# # A tibble: 4 x 3
#   PatientID anatomy      stenosis
#   <chr>     <chr>           <dbl>
# 1 1234      proximal RCA       50
# 2 1234      mid RCA            70
# 3 1235      proximal LCX       40
# 4 1235      mid LCX            70
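The same list-then-bind pattern can be sketched with purrr::map, which is mentioned above but not shown. To keep the block runnable without tidytext, it uses toy_parser, a hypothetical stand-in for coronary_anatomy that only extracts the percentages. The your_data tibble is likewise an assumed reconstruction of the two-row example from the question.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for coronary_anatomy(): it only pulls out the
# numbers that precede a "%" sign, so this block runs without tidytext.
toy_parser <- function(x) {
  pct <- regmatches(x, gregexpr("[0-9]+(?=%)", x, perl = TRUE))[[1]]
  tibble(stenosis = as.numeric(pct))
}

# Assumed example data, mirroring the two-row data frame in the question.
your_data <- tibble(
  PatientID = c("1234", "1235"),
  Records = c(
    "Small caliber RCA with 50% proximal and 70% mid stenoses",
    "Small caliber LCX with 40% proximal and 70% mid stenoses"
  )
)

result_tbl <- your_data$Records %>%
  set_names(your_data$PatientID) %>%  # name each record by its patient ID
  map(toy_parser) %>%                 # one tibble per record, of any size
  bind_rows(.id = "PatientID")        # stack them, keeping names as a column

result_tbl
```

Swapping toy_parser for coronary_anatomy should give the anatomy and stenosis columns shown above, since bind_rows only needs each list element to be a data frame.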
If you are using dplyr 1.0 or later, you can also simply use group_by and summarize:
your_data %>%
  group_by(PatientID) %>%
  summarize(coronary_anatomy(Records))
# `summarise()` regrouping output by 'PatientID' (override with `.groups` argument)
# # A tibble: 4 x 3
# # Groups:   PatientID [2]
#   PatientID anatomy      stenosis
#       <int> <chr>           <dbl>
# 1      1234 proximal RCA       50
# 2      1234 mid RCA            70
# 3      1235 proximal LCX       40
# 4      1235 mid LCX            70
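On dplyr 1.1.0 and later, returning more than one row per group from summarize is deprecated, and reframe is the intended replacement for this use. The sketch below shows the grouped approach rewritten that way. As assumptions, it uses a hypothetical stand-in parser, toy_parser, instead of coronary_anatomy so that it runs without tidytext, and it rebuilds the two-row example data by hand.

```r
library(dplyr)  # reframe() requires dplyr >= 1.1.0

# Hypothetical stand-in for coronary_anatomy(): it only pulls out the
# numbers that precede a "%" sign.
toy_parser <- function(x) {
  pct <- regmatches(x, gregexpr("[0-9]+(?=%)", x, perl = TRUE))[[1]]
  tibble(stenosis = as.numeric(pct))
}

# Assumed example data, mirroring the two-row data frame in the question.
your_data <- tibble(
  PatientID = c("1234", "1235"),
  Records = c(
    "Small caliber RCA with 50% proximal and 70% mid stenoses",
    "Small caliber LCX with 40% proximal and 70% mid stenoses"
  )
)

result_tbl <- your_data %>%
  group_by(PatientID) %>%
  reframe(toy_parser(Records))  # any number of rows per group is allowed

result_tbl
```

Unlike summarize, reframe always returns an ungrouped tibble, so no `.groups` message appears. Both grouped variants assume one record per PatientID. If a patient has several rows, the parser receives all of that patient's records at once, so grouping by a row identifier or using rowwise() may be safer.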