How can I convert this old dplyr syntax?
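I'm new to dplyr, and I'm having trouble (i) understanding its syntax and (ii) converting code written for an older version into code I can use with the latest version (dplyr 1.0.2). In particular, I'm confused by the following two lines of code: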
mutate_each(funs(replace(.,.=="NOT ANSWERED",NA))) %>%
mutate_each(funs(ordered(.,c("NOT AT ALL","ONCE A WEEK", "2-4 TIMES PER WEEK/HALF THE TIME", "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS"))))
I think the first line is meant to replace every "NOT ANSWERED" with NA.
Do you think the following conversion is appropriate?
mutate(across(everything(),~replace(., .== "NOT ANSWERED", NA)))
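However, I don't understand what the second line does. I believe it creates some sort of ordered variable with "NOT AT ALL", "ONCE A WEEK", "2-4 TIMES PER WEEK/HALF THE TIME", and "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS" as levels.
Do you have any suggestions about what this line does and how to convert it to the new syntax using mutate(across())?
Some context
I'm trying to follow a tutorial on how to use the Bootnet R package. The following text is from the first part of the tutorial: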
To download the dataset, go to:
https://datashare.nida.nih.gov/study/nida-ctn-0015 and click on
“CTN-0015 Data Files”. The relevant data file is called “qs.csv”,
which can be loaded into R by using the default read.csv function:
FullData <- read.csv("qs.csv", stringsAsFactors = FALSE)
This loads the data in long format, which contains a column with
subject id’s, a column with the names of the administered items, and a
third column containing the item responses. For network analysis, we
need the data to be in wide format. Furthermore, we need to assign
that the response "NOT ANSWERED" indicates a missing response and
other responses are ordinal. Finally, we need to extract relevant
dataset at baseline measure for the PTSD symptom frequency scores. To
do this, we can utilize the dplyr and tidyr R packages as follows:
# Load packages:
library("dplyr")
library("tidyr")
# Frequency at baseline:
Data <- FullData %>%
filter(EPOCH == "BASELINE",grepl("^PSSR\\d+A$",QSTESTCD)) %>%
select(USUBJID,QSTEST,QSORRES) %>%
spread(QSTEST, QSORRES) %>%
select(-USUBJID) %>%
mutate_each(funs(replace(.,.=="NOT ANSWERED",NA))) %>%
mutate_each(funs(ordered(.,c("NOT AT ALL","ONCE A WEEK", "2-4 TIMES PER WEEK/HALF THE TIME", "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS"))))
names(Data) <- seq_len(ncol(Data))
The tutorial continues in the second part.
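ordered is used to create an ordered factor whose levels follow the order in which they are listed (here, the order of the character vector passed as the second argument). Since both calls are applied to the same columns, you can combine them into a single function. Try: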
library(dplyr)
vals <- c("NOT AT ALL","ONCE A WEEK", "2-4 TIMES PER WEEK/HALF THE TIME", "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS")
Data <- FullData %>%
#....
#....
#....
mutate(across(everything(), ~ ordered(replace(., . == "NOT ANSWERED", NA), levels = vals)))
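To see what the combined call does to a single column, here is a minimal base R sketch. The response vector x is made up for illustration and is not taken from qs.csv; only vals comes from the code above.

```r
# Levels in the intended order, from lowest to highest frequency
vals <- c("NOT AT ALL", "ONCE A WEEK",
          "2-4 TIMES PER WEEK/HALF THE TIME",
          "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS")

# Hypothetical responses for one item (illustrative data only)
x <- c("ONCE A WEEK", "NOT ANSWERED", "NOT AT ALL",
       "5 OR MORE TIMES PER WEEK/ALMOST ALWAYS")

# Step 1: mark "NOT ANSWERED" as missing
x_na <- replace(x, x == "NOT ANSWERED", NA)

# Step 2: convert to an ordered factor with the levels given by vals
x_ord <- ordered(x_na, levels = vals)

levels(x_ord)      # the four levels, in the order of vals
as.integer(x_ord)  # 2 NA 1 4: position of each response among the levels

# Because the factor is ordered, comparisons between levels are meaningful
x_ord < "2-4 TIMES PER WEEK/HALF THE TIME"  # TRUE NA TRUE FALSE
```

Inside mutate(across(...)), the same two steps are applied to every column. Note that ordered() already turns any value that is not among the levels into NA, so the replace() step mainly makes the intent explicit.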