R: Paste a data frame column for grepl
I have two lines of R code that should, in theory, do the same thing.
I want to use them to set a value in a column.
a <- paste(ref.dg.safe[,'safewords'], collapse="|")
"c(\"NO BATTERIES\", \"COSTUME\", \"CABLE\", \"BAG\", \"CLOTHING\")"
b <- paste(ref.dg.safe$safewords, collapse="|")
NO BATTERIES|COSTUME|CABLE|BAG|CLOTHING|
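I want to get the second output from the first line of code, because I get a partial matching error when I use "b" inside a function.
I would also like to understand why the two outputs are so different.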
Update:
I originally imported the dataset with this line:
ref.dg.safe <- unique(tbl_df(read.csv("~/Projects/foo_project/REF_SafeList.txt", sep = "\t", as.is = TRUE, strip.white=TRUE)))
The dput output looks like this:
structure(list(safewords = c("NO BATTERIES", "COSTUME", "CABLE",
"BAG", "CLOTHING", "BRACELET", "FAUCET", "IRON", "CASE", "NO BATTERY",
"BELT", "JACKET", "CONVERTER", "HAIR", "GLASS", "SHOE", "ROUTER",
"LABEL", "ADAPTOR", "SILICONE", "EARPHONE", "SPONGE", "WOOD",
"TANKTOP", "WALLET", "TUBE", "TRIPODS", "STONE", "LAMP", "HEADPHONES",
"COOKIECUTTERS", "CONVERTERS", "COWLEATHER", "INFLATABLETOY",
"HEADPHONE", "LABLE", "ROMPER", "POLE", "PROBE", "FIBEROPTIC",
"APRON", "TABLECLOTH", "AVR", "TABLEBASE", "DESK", "BEAUTYGOODS",
"SEAT", "NOBATTERIES", "SHEOS", "CHARGERS", "STAPLER", "SATCHEL"
)), .Names = "safewords", class = c("tbl_df", "data.frame"), row.names = c(NA,
-52L))
To answer the why:
> class(df[,"safewords"])
[1] "tbl_df" "data.frame"
> class(df$safewords)
[1] "character"
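This is due to how the [ and $ operators work and whether they simplify their result. On a tbl_df, [ always returns another tbl_df, even when a single column is selected, while $ extracts the column itself as a character vector. paste() then runs as.character() on the one-column table, which deparses the whole column into a single string, hence the "c(\"NO BATTERIES\", ...)" output. See the documentation for data.frame and the subsetting operators for details.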
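The difference can be reproduced without dplyr. The following is a minimal sketch, assuming a small base data.frame as a hypothetical stand-in for ref.dg.safe; drop = FALSE is used only to imitate the way [ behaves on a tbl_df.

```r
# Hypothetical three-row stand-in for ref.dg.safe (base R only).
df <- data.frame(safewords = c("NO BATTERIES", "COSTUME", "CABLE"),
                 stringsAsFactors = FALSE)

# drop = FALSE keeps a one-column data frame, which is what `[` always
# returns on a tbl_df. `$` extracts the column as a plain vector.
as_frame  <- df[, "safewords", drop = FALSE]
as_vector <- df$safewords

class(as_frame)   # "data.frame"
class(as_vector)  # "character"

# paste() calls as.character() on each argument. For a list-like object
# such as a data frame, that deparses each column into one string.
a <- paste(as_frame, collapse = "|")
b <- paste(as_vector, collapse = "|")

print(a)  # a single string that starts with c(
print(b)  # "NO BATTERIES|COSTUME|CABLE"
```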
One workaround is to make the first form lose its data.frame status with unlist,
like this:
> paste(unlist(df[,"safewords"]),collapse="|")
[1] "NO BATTERIES|COSTUME|CABLE|BAG|CLOTHING|BRACELET[...]"
I trimmed part of the output to keep it readable here.
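As an alternative to unlist, the [[ operator returns the column as a vector for both a plain data.frame and a tbl_df, so the pattern can be built directly and passed to grepl. This is only a sketch: the data frame is a shortened stand-in for ref.dg.safe, and the descriptions vector is made up for illustration.

```r
# Hypothetical shortened stand-in for ref.dg.safe.
df <- data.frame(safewords = c("NO BATTERIES", "COSTUME", "CABLE"),
                 stringsAsFactors = FALSE)

# `[[` extracts the column as a character vector, so paste() collapses
# the individual words instead of deparsing the whole column.
pattern <- paste(df[["safewords"]], collapse = "|")

# Made-up item descriptions to match against the alternation pattern.
descriptions <- c("USB CABLE 2M", "LITHIUM CELL")
is_safe <- grepl(pattern, descriptions)

print(pattern)  # "NO BATTERIES|COSTUME|CABLE"
print(is_safe)  # TRUE FALSE
```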