Use a library function in the apply function
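I have a data frame with a column named 'msgText' that contains text. I want to create another column named 'wordcount' that holds the number of words in each row of 'msgText'.
The column looks like this: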
head(all_transcripts$msgText)
[1] "Hi, my name is Chris and I am a programmer"
[2] "I have worked with R for 12 years"
[3] "Being a programmer I have many questions"
[4] "The fellow programmers at Whosebug help me to get the answer"
[5] "This help has saved my life many times."
[6] "Thanks Whosebug!"
The result I want is:
head(all_transcripts$wordcount)
[1] 10
[2] 8
[3] 7
[4] 11
[5] 8
[6] 2
For this I am using the wordcount function from the ngram library.
I tried:
all_transcripts$wordcount <- apply(all_transcripts, 2,
wordcount(all_transcripts$msgText))
However, doing so gives me the following error:
Error in match.fun(FUN) : 'wordcount(all_transcripts$msgText)' is
not a function, character or symbol
How do I use the apply function correctly, without running a for loop over the dataset?
We can loop over the elements of 'msgText' and apply the wordcount function to each one:
library(ngram)
library(tidyverse)
all_transcripts %>%
mutate(wordcount = map_int(msgText, wordcount))
# msgText wordcount
#1 Hi, my name is Chris and I am a programmer 10
#2 I have worked with R for 12 years 8
#3 Being a programmer I have many questions 7
#4 The fellow programmers at Whosebug help me to get the answer 11
#5 This help has saved my life many times. 8
#6 Thanks Whosebug! 2
Or with base R:
all_transcripts$wordcount <- sapply(all_transcripts$msgText, wordcount)
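One detail of the sapply call is worth knowing: when the input is a character vector, sapply uses the strings themselves as names on the result. The sketch below shows the difference against vapply, which also checks that every result is a single integer. Here count_words is a hypothetical base R stand-in for ngram's wordcount (it assumes words are separated by single spaces), used only so the example runs without any package.

```r
# count_words is a hypothetical stand-in for ngram::wordcount
# (assumption: words are separated by single spaces).
count_words <- function(x) lengths(strsplit(x, " ", fixed = TRUE))

msgs <- c("I have worked with R for 12 years", "Thanks Whosebug!")

# sapply() on a character vector names each result after its input string.
named_counts <- sapply(msgs, count_words)

# vapply() enforces the result type and, with USE.NAMES = FALSE, drops the names.
plain_counts <- vapply(msgs, count_words, integer(1), USE.NAMES = FALSE)

print(names(named_counts))
print(plain_counts)
```

If the names are not wanted in the new column, passing USE.NAMES = FALSE to sapply has the same effect.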
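There are two problems in the OP's code. First, apply with MARGIN = 2 loops over the columns of the data frame, whereas what we want is to loop over the elements of a vector (all_transcripts$msgText), which has no dim attribute, so apply is not the right tool for it. Second, the third argument is wordcount(all_transcripts$msgText), i.e. the result of calling the function rather than the function itself, which is why match.fun reports that it is "not a function, character or symbol".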
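The following is a minimal sketch of that failure and two ways around it. It uses a hypothetical helper, count_words, as a base R stand-in for ngram's wordcount (assuming single spaces between words) and a two-row data frame df made up for illustration, so it runs without any package.

```r
# count_words is a hypothetical stand-in for ngram::wordcount
# (assumption: words are separated by single spaces).
count_words <- function(x) lengths(strsplit(x, " ", fixed = TRUE))

df <- data.frame(
  msgText = c("I have worked with R for 12 years", "Thanks Whosebug!"),
  stringsAsFactors = FALSE
)

# FUN receives the result of a call (an integer vector), so match.fun() fails.
err <- tryCatch(
  apply(df, 2, count_words(df$msgText)),
  error = function(e) conditionMessage(e)
)
print(err)

# Fix 1: pass the function itself and walk the rows (MARGIN = 1).
by_row <- apply(df["msgText"], 1, count_words)

# Fix 2: skip apply() and iterate over the column, which is a plain vector.
by_element <- sapply(df$msgText, count_words, USE.NAMES = FALSE)

print(by_row)
print(by_element)
```

Both fixes pass count_words without parentheses, so the apply-family function receives something it can call once per row or element.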
Data
all_transcripts <- structure(list(msgText = c("Hi, my name is Chris and I am a programmer",
"I have worked with R for 12 years", "Being a programmer I have many questions",
"The fellow programmers at Whosebug help me to get the answer",
"This help has saved my life many times.", "Thanks Whosebug!"
)), class = "data.frame", row.names = c(NA, -6L))
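Consider the vectorized lengths and strsplit functions to count words in base R, without any apply call (in this answer the text column is named text):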
all_transcripts$word_count <- lengths(strsplit(all_transcripts$text, split=" "))
all_transcripts
# text word_count
# 1 Hi, my name is Chris and I am a programmer 10
# 2 I have worked with R for 12 years 8
# 3 Being a programmer I have many questions 7
# 4 The fellow programmers at Whosebug help me to get the answer 11
# 5 This help has saved my life many times. 8
# 6 Thanks Whosebug! 2
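Splitting on a single space counts an empty token wherever there are two spaces in a row or a leading space. As a hedged sketch, assuming the messages may contain such irregular whitespace (the msgs vector here is made up for illustration), trimming first and splitting on runs of whitespace avoids the overcount:

```r
# Made-up messages with irregular whitespace, for illustration only.
msgs <- c("two  spaces here", " leading space", "Thanks Whosebug!")

# Splitting on one literal space yields empty tokens for the extra spaces.
naive <- lengths(strsplit(msgs, " ", fixed = TRUE))

# Trim the ends, then split on one or more whitespace characters.
robust <- lengths(strsplit(trimws(msgs), "\\s+"))

print(naive)   # 4 3 2
print(robust)  # 3 2 2
```

For the sample data above the two variants agree, since every message uses single spaces only.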
Data
all_transcripts <- data.frame(text=c("Hi, my name is Chris and I am a programmer",
"I have worked with R for 12 years",
"Being a programmer I have many questions",
"The fellow programmers at Whosebug help me to get the answer",
"This help has saved my life many times.",
"Thanks Whosebug!"),
stringsAsFactors=FALSE)