Identifying strings in folder names to create variables (stringi, R)
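I hope you are doing well.
I have a list of CSV files that follow a naming convention like this: "SubB1V2timecourses_chanHbO_Cond2_202010281527".
I would like to merge all the files into one dataset and add variables such as ID (B1V2), chromophore (HbO in this case, but other files are labeled Hbb), and condition (Cond2 in this case, but it can be Cond1-Cond9).
Below is my current function. So far I can read in the ID, the time (which is a separate Excel document), and the data. However, I get NA for condition and chromophore. Is there something missing in my string specification?
Any help is much appreciated.
Take care and stay well,
Caroline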
multmerge <- function(mypath){
  require(stringi)
  require(readxl)
  filenames <- list.files(path=mypath, full.names=TRUE) #path=mypath
  datalist <- lapply(filenames, function(x){
    df <- read.csv(file=x,header= TRUE)
    ID <- unlist(stri_extract_all_regex(toupper(x), "B\\d+"))
    Condition <- unlist(stri_extract_all_regex(tolower(x), "Cond\\d+"))
    Chromophore <- ifelse(stri_detect_regex(toupper(x), "HbO"), "HbO",
                   ifelse(stri_detect_regex(toupper(x), "Hbb"), "Hbb", "NA"))
    #ifelse(stri_detect_regex(tolower(x), "nonsocial"),"NonSocial",
    # ifelse(stri_detect_regex(tolower(x),"social-inverted"), "social_inverted",
    # ifelse(stri_detect_regex(tolower(x),"social"), "social", "NA")))
    # time <- read_excel("time4hz.xlsx")
    df <- data.frame(ID, time, Condition, Chromophore, df)
    return(df)
  }) # end read-in function
  Reduce(function(x,y) {merge(x,y,all = TRUE)}, datalist)
}
Perhaps you want something like strcapture? For example, if you have a list of file names like this
filenames <- c(
"/path/to/SubB1V2timecourses_chanHbO_Cond2_202010281527",
"/path/to/SubB4V9timecourses_chanHbb_Cond7_202010011527"
)
then
strcapture(
  "Sub([^_]+)timecourses_chan([^_]+)_([^_]+)_\\d+",
  basename(filenames),
  data.frame(ID = character(), chromophore = character(), condition = character())
)
returns
    ID chromophore condition
1 B1V2         HbO     Cond2
2 B4V9         Hbb     Cond7
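One thing worth knowing about strcapture is that a name which does not match the pattern comes back as a row of NA values, not as an error. The sketch below illustrates that with a made-up second file name ("notes.txt" is purely hypothetical) and shows one possible way to drop such files before reading anything in.

```r
# Hypothetical file names: the first follows the convention, the second does not.
filenames <- c(
  "/path/to/SubB1V2timecourses_chanHbO_Cond2_202010281527",
  "/path/to/notes.txt"
)

meta <- strcapture(
  "Sub([^_]+)timecourses_chan([^_]+)_([^_]+)_\\d+",
  basename(filenames),
  data.frame(ID = character(), chromophore = character(), condition = character())
)

# Rows for non-matching names are NA in every column.
matched <- complete.cases(meta)

# Keep only the files whose names could be parsed.
filenames_ok <- filenames[matched]
meta_ok <- meta[matched, ]
```

Filtering up front like this keeps stray files in the folder from silently adding NA metadata to the merged result.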
Combining this with your multmerge:
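multmerge <- function(mypath){
  # Full paths are needed for reading; basename() strips the directory for matching.
  filenames <- list.files(path = mypath, full.names = TRUE)
  # One row of metadata per file, parsed from the file name.
  metadata <- strcapture(
    "Sub([^_]+)timecourses_chan([^_]+)_([^_]+)_\\d+",
    basename(filenames),
    data.frame(ID = character(), chromophore = character(), condition = character())
  )
  datalist <- lapply(seq_along(filenames), function(i, nms, info) {
    df <- read.csv(file = nms[[i]], header = TRUE)
    # The single metadata row is recycled across all rows of df;
    # row.names = NULL avoids a warning about the discarded row name.
    data.frame(info[i, ], df, row.names = NULL)
  }, filenames, metadata)
  Reduce(function(x,y) {merge(x, y, all = TRUE)}, datalist)
}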
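As for why the original function produced NA for condition and chromophore: it looks like a case mismatch. The name is lowercased before searching for "Cond" and uppercased before searching for "HbO" and "Hbb", so those mixed-case patterns can no longer match. The sketch below reproduces this in base R (stringi's regex functions are also case-sensitive by default) on a single example name. The patterns "Hb[Ob]" and "B\\d+V\\d+" are my own assumptions about what the names look like, not something taken from your data.

```r
# Example file name following the convention from the question.
x <- "SubB1V2timecourses_chanHbO_Cond2_202010281527"

# After tolower(), "Cond2" has become "cond2", so a capitalised pattern fails.
grepl("Cond\\d+", tolower(x))   # FALSE
# After toupper(), "HbO" has become "HBO", so a mixed-case pattern fails.
grepl("HbO", toupper(x))        # FALSE

# Matching against the unmodified name works.
condition   <- regmatches(x, regexpr("Cond\\d+", x))    # "Cond2"
# Assumption: chromophores are only ever labeled HbO or Hbb.
chromophore <- regmatches(x, regexpr("Hb[Ob]", x))      # "HbO"
# Assumption: IDs always look like B<digits>V<digits>.
id          <- regmatches(x, regexpr("B\\d+V\\d+", x))  # "B1V2"
```

Note also that the original ID pattern "B\\d+" would stop at "B1" and leave out the "V2" part.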