Delete rows with blank values after running unnest_tokens and removing stop words?
This is my df:
df <- structure(list(id = 1:50, strain_id = c(6L, 6L, 7L, 12L, 19L,
35L, 81L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L,
100L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L,
123L, 202L, 202L, 202L, 202L, 202L, 202L, 202L, 202L, 202L, 202L,
202L, 246L, 246L, 246L, 246L, 246L, 246L, 246L, 246L, 246L, 246L,
246L), name = c("Anorexia and Cachexia", "Autoimmune Diseases and Inflammation",
"Psychiatric Symptoms", "Autoimmune Diseases and Inflammation",
"Pain", "Autoimmune Diseases and Inflammation", "Dependency and Withdrawal",
"Anorexia and Cachexia", "Spasticity", "Movement Disorders",
"Pain", "Glaucoma", "Epilepsy", "Asthma", "Dependency and Withdrawal",
"Psychiatric Symptoms", "Autoimmune Diseases and Inflammation",
"Nausea and Vomiting", "Anorexia and Cachexia", "Spasticity",
"Movement Disorders", "Pain", "Glaucoma", "Epilepsy", "Asthma",
"Dependency and Withdrawal", "Psychiatric Symptoms", "Autoimmune Diseases and Inflammation",
"Nausea and Vomiting", "Anorexia and Cachexia", "Spasticity",
"Movement Disorders", "Pain", "Glaucoma", "Epilepsy", "Asthma",
"Dependency and Withdrawal", "Psychiatric Symptoms", "Autoimmune Diseases and Inflammation",
"Nausea and Vomiting", "Anorexia and Cachexia", "Spasticity",
"Movement Disorders", "Pain", "Glaucoma", "Epilepsy", "Asthma",
"Dependency and Withdrawal", "Psychiatric Symptoms", "Autoimmune Diseases and Inflammation"
), rating = c(4, 4, 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 4, 3, 5, 5,
5, 3, 3, 5, 5, 4, 3, 4, 4, 4, 3, 4, 3, 3, 2, 3, 4, 4, 3, 2, 5,
3, 3, 3, 3, 4, 4, 3, 5, 3, 1, 3, 4, 3), dose = c(3, 3, 3, 3,
3, 3, 1, 3, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 3,
3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 1, 2, 2, 1, 3, 2,
3, 2, 2, 3), info = c("Affects / helps even in small doses very well at / against Anorexia and Cachexia.",
"Affects / helps even in small doses very well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps even in small doses extremly well at / against Psychiatric Symptoms.",
"Affects / helps even in small doses extremly well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps even in small doses very well at / against Pain.",
"Affects / helps even in small doses extremly well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps only in heavy doses extremly well at / against Dependency and Withdrawal.",
"Affects / helps even in small doses extremly well at / against Anorexia and Cachexia.",
"Affects / helps in average doses very well at / against Spasticity.",
"Affects / helps only in heavy doses extremly well at / against Movement Disorders.",
"Affects / helps in average doses extremly well at / against Pain.",
"Affects / helps in average doses very well at / against Glaucoma.",
"Affects / helps in average doses very well at / against Epilepsy.",
"Affects / helps even in small doses well at / against Asthma.",
"Affects / helps in average doses extremly well at / against Dependency and Withdrawal.",
"Affects / helps in average doses extremly well at / against Psychiatric Symptoms.",
"Affects / helps in average doses extremly well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps in average doses well at / against Nausea and Vomiting.",
"Affects / helps in average doses well at / against Anorexia and Cachexia.",
"Affects / helps even in small doses extremly well at / against Spasticity.",
"Affects / helps even in small doses extremly well at / against Movement Disorders.",
"Affects / helps in average doses very well at / against Pain.",
"Affects / helps in average doses well at / against Glaucoma.",
"Affects / helps in average doses very well at / against Epilepsy.",
"Affects / helps even in small doses very well at / against Asthma.",
"Affects / helps even in small doses very well at / against Dependency and Withdrawal.",
"Affects / helps in average doses well at / against Psychiatric Symptoms.",
"Affects / helps in average doses very well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps in average doses well at / against Nausea and Vomiting.",
"Affects / helps in average doses well at / against Anorexia and Cachexia.",
"Affects / helps in average doses low at / against Spasticity.",
"Affects / helps in average doses well at / against Movement Disorders.",
"Affects / helps in average doses very well at / against Pain.",
"Affects / helps in average doses very well at / against Glaucoma.",
"Affects / helps in average doses well at / against Epilepsy.",
"Affects / helps even in small doses low at / against Asthma.",
"Affects / helps in average doses extremly well at / against Dependency and Withdrawal.",
"Affects / helps in average doses well at / against Psychiatric Symptoms.",
"Affects / helps in average doses well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps in average doses well at / against Nausea and Vomiting.",
"Affects / helps only in heavy doses well at / against Anorexia and Cachexia.",
"Affects / helps in average doses very well at / against Spasticity.",
"Affects / helps in average doses very well at / against Movement Disorders.",
"Affects / helps only in heavy doses well at / against Pain.",
"Affects / helps even in small doses extremly well at / against Glaucoma.",
"Affects / helps in average doses well at / against Epilepsy.",
"Affects / helps even in small doses very low at / against Asthma.",
"Affects / helps in average doses well at / against Dependency and Withdrawal.",
"Affects / helps in average doses very well at / against Psychiatric Symptoms.",
"Affects / helps even in small doses well at / against Autoimmune Diseases and Inflammation."
), votes = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA, 50L), class = "data.frame")
I need to process the name column.
df %>%
  tidytext::unnest_tokens(input = name,
                          output = word,
                          token = "words",
                          format = "text",
                          drop = TRUE,
                          to_lower = TRUE) %>%
  # removeWords() turns a stop word into "", which leaves blank rows behind
  dplyr::mutate(word = sapply(word, tm::removePunctuation, ucp = TRUE),
                word = tm::removeWords(word, tm::stopwords("en")),
                word = tm::stripWhitespace(word)) %>%
  dplyr::filter(!word == "")
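Which function or setting should I use so that I can avoid the explicit filter (dplyr::filter(!word == "")) and still drop the rows with blank values?
In other words, I want my code to filter out rows with empty values in a specific column automatically, through a setting or a function.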
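I can reproduce your result using only functions from tidytext.
The tm functions are not needed, because unnest_tokens from tidytext already strips punctuation and whitespace (unless told otherwise). To remove the unwanted stop words, use anti_join from dplyr together with the stop_words data set from tidytext. Because anti_join drops the matching rows entirely instead of blanking the word, no empty strings are left to filter out.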
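library(dplyr)  # provides %>% and anti_join

df %>%
  tidytext::unnest_tokens(input = name,
                          output = word,
                          token = "words",
                          format = "text",
                          drop = TRUE,
                          to_lower = TRUE) %>%
  # drop every row whose word appears in the stop word list
  anti_join(tidytext::stop_words, by = "word")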
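One caveat: stop_words bundles three lexicons (onix, SMART and snowball), so it is broader than tm::stopwords("en"). If you want something closer to the original list, a minimal sketch is to keep only the snowball lexicon before joining; as far as I know that is the nearest match, but I have not compared the two lists entry by entry. The data frame `toy` and the names `snowball_stops` and `tokens` below are made up for illustration and stand in for df and its name column.

```r
library(dplyr)
library(tidytext)

# Hypothetical three-row stand-in for the `name` column of df
toy <- data.frame(
  id = 1:3,
  name = c("Anorexia and Cachexia", "Pain", "Nausea and Vomiting"),
  stringsAsFactors = FALSE
)

# Assumption: the snowball lexicon is the closest match to tm::stopwords("en")
snowball_stops <- filter(stop_words, lexicon == "snowball")

tokens <- toy %>%
  # one row per lower-cased word; punctuation is stripped by the tokenizer
  unnest_tokens(output = word, input = name) %>%
  # rows whose word is a stop word are removed outright, so no "" remains
  anti_join(snowball_stops, by = "word")

tokens
```

On this toy input the two "and" rows disappear and five word rows remain, none of them blank, so the trailing dplyr::filter(!word == "") is not needed.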