dplyr filter function in combination with agrep
I am trying to keep only the rows of a table whose title column contains the word "dog", but I cannot get it to work.
Here is a sample of the data:
ID NozamaItemID NozamaTitle
1 4557 12000017544 Starbucks Double Shot Espresso Light (4 Count, 6.5 Fl Oz Each)
2 4558 12000021992 Pepsi, 8Ct, 12Oz Bottle
3 4559 12000024542 Zuke'S Natural Hip Action dog Treats, 3 Oz
4 4560 12000030680 Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans
5 4561 12000030680 Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans
6 4562 12000030680 Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans
The code below should work, but it does not:
amzp <- select(amz, ID, NozamaItemID, NozamaTitle, NozamaCustomerID)
searchTerm="cat|dog"
amzp.a <- mutate(amzp, animalFood = ifelse(grepl(searchTerm, amzp$NozamaTitle, ignore.case = TRUE) == TRUE, TRUE, FALSE))
I expect the result for row 3 to be TRUE. Any help is much appreciated. Thanks.
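I'm not quite sure what you are trying to achieve, but if your goal is simply to keep only the rows where the word "dog" appears in the NozamaTitle column, you just need to use dplyr::filter. Using chickwts as a stand-in for a minimal reproducible example: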
library(dplyr)
levels(chickwts$feed)
# [1] "casein" "horsebean" "linseed" "meatmeal" "soybean"
# [6] "sunflower"
df <- filter(chickwts, grepl("bean", feed))
df
# weight feed
# 1 179 horsebean
# 2 160 horsebean
# 3 136 horsebean
# ...
# 11 243 soybean
# 12 230 soybean
# 13 248 soybean
# ...
Is this what you want?
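Applied to data shaped like the question's, the same pattern might look like the sketch below. The data frame `titles` is an assumption made for illustration (a cut-down, two-column version of the sample rows), and `ignore.case = TRUE` is added so that "Dog" would match as well.

```r
library(dplyr)

# Hypothetical cut-down version of the question's sample data
titles <- data.frame(
  ID = 4557:4560,
  NozamaTitle = c(
    "Starbucks Double Shot Espresso Light (4 Count, 6.5 Fl Oz Each)",
    "Pepsi, 8Ct, 12Oz Bottle",
    "Zuke'S Natural Hip Action dog Treats, 3 Oz",
    "Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans"
  ),
  stringsAsFactors = FALSE
)

# Keep only the rows whose title matches "dog", ignoring case
dogs <- filter(titles, grepl("dog", NozamaTitle, ignore.case = TRUE))
dogs$ID
```

Only the Zuke'S row (ID 4559) should remain in `dogs`. Swapping the pattern for "cat|dog" would keep rows matching either word.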
You are close; you just need to get rid of the ifelse:
amzp.a <- mutate(amzp, animalFood = grepl(searchTerm,
NozamaTitle, ignore.case = TRUE))
which gives:
> amzp.a
ID NozamaItemID NozamaTitle animalFood
1 4557 12000017544 Starbucks Double Shot Espresso Light (4 Count, 6.5 Fl Oz Each) FALSE
2 4558 12000021992 Pepsi, 8Ct, 12Oz Bottle FALSE
3 4559 12000024542 Zuke'S Natural Hip Action dog Treats, 3 Oz TRUE
4 4560 12000030680 Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans FALSE
5 4561 12000030680 Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans FALSE
6 4562 12000030680 Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans FALSE
Used data:
amzp <- structure(list(ID = 4557:4562,
NozamaItemID = c(12000017544, 12000021992, 12000024542, 12000030680, 12000030680, 12000030680),
NozamaTitle = structure(c(4L, 1L, 2L, 3L, 3L, 3L), .Label = c("Pepsi, 8Ct, 12Oz Bottle","Zuke'S Natural Hip Action dog Treats, 3 Oz","Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans","Starbucks Double Shot Espresso Light (4 Count, 6.5 Fl Oz Each)"), class = "factor")),
.Names = c("ID", "NozamaItemID", "NozamaTitle"), class = "data.frame", row.names = c(NA, -6L))
Edit: your original code:
amzp.a <- mutate(amzp, animalFood = ifelse(grepl(searchTerm, amzp$NozamaTitle, ignore.case = TRUE) == TRUE, TRUE, FALSE))
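does work. Although it contains a couple of unnecessary components (the ifelse call and the use of data$column inside a standard dplyr function), it gives the desired result: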
> amzp.a
ID NozamaItemID NozamaTitle animalFood
1 4557 12000017544 Starbucks Double Shot Espresso Light (4 Count, 6.5 Fl Oz Each) FALSE
2 4558 12000021992 Pepsi, 8Ct, 12Oz Bottle FALSE
3 4559 12000024542 Zuke'S Natural Hip Action dog Treats, 3 Oz TRUE
4 4560 12000030680 Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans FALSE
5 4561 12000030680 Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans FALSE
6 4562 12000030680 Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans FALSE
So you may want to describe the "does not work" statement in more detail.
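To see why the ifelse wrapper is redundant, here is a minimal base R sketch. The vector `titles` and the names `plain` and `wrapped` are made up for illustration; only `searchTerm` comes from the question.

```r
searchTerm <- "cat|dog"

# Hypothetical two-element sample taken from the question's titles
titles <- c("Pepsi, 8Ct, 12Oz Bottle",
            "Zuke'S Natural Hip Action dog Treats, 3 Oz")

# grepl() already returns a logical vector, one element per title
plain <- grepl(searchTerm, titles, ignore.case = TRUE)

# Wrapping that vector in ifelse(... == TRUE, TRUE, FALSE) maps TRUE to TRUE
# and FALSE to FALSE, so the values are unchanged
wrapped <- ifelse(plain == TRUE, TRUE, FALSE)

identical(plain, wrapped)
```

Because both vectors are the same, the grepl() call can be assigned to the new column directly, as in the shorter mutate() call above.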