Looping and adding to a counter in R
I have a data frame df with several columns; only the relevant ones are shown below.
node | precedingWord
-------------------------
A-bom de
A-bom die
A-bom de
A-bom een
A-bom n
A-bom de
acroniem het
acroniem t
acroniem het
acroniem n
acroniem een
act de
act het
act die
act dat
act t
act n
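I want to use these values to count the precedingWords per node, but split into subcategories. For instance: one column to add values to titled neuter, another titled non-neuter, and a last one titled rest. neuter would contain all rows whose precedingWord is one of t, het, or dat. non-neuter would contain de and die, and rest would contain everything that belongs to neither neuter nor non-neuter. (It would be nice if this could be dynamic, in other words if rest used some sort of inverse of the variables used for neuter and non-neuter, or simply subtracted the values in neuter and non-neuter from the number of rows with that node.)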
Example output (in a new data frame, say freqDf) would look like this:
node | neuter | nonNeuter | rest
-----------------------------------------
A-bom 0 4 2
acroniem 3 0 2
act 3 2 1
To create freqDf$node, I can do this:
freqDf<- data.frame(node = unique(df$node), stringsAsFactors = FALSE)
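But that's as far as I've got; I don't know how to proceed. I thought I could do something like this, but unfortunately the ++ operator doesn't work the way I hoped.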
freqDf$neuter[grep("dat|het|t", df$precedingWord, perl=TRUE)] <- ++
freqDf$nonNeuter[grep("de|die", df$precedingWord, perl=TRUE)] <- ++
e <- table(df$Node)
freqDf$rest <- as.numeric(e - freqDf$neuter - freqDf$nonNeuter)
Also, this doesn't work per node. I need some sort of loop that runs automatically for each distinct value in freqDf$node.
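For reference, here is a sketch of the counter loop the question describes. R has no ++ operator, so each counter is incremented with x <- x + 1L instead. The sample data is rebuilt inline so the block runs on its own; freqDf and its column names follow the question. The vectorized answers below are usually shorter and faster than an explicit loop.

```r
# Sample data from the question, rebuilt so this block is self-contained
df <- data.frame(
  node = rep(c("A-bom", "acroniem", "act"), times = c(6, 5, 6)),
  precedingWord = c("de", "die", "de", "een", "n", "de",
                    "het", "t", "het", "n", "een",
                    "de", "het", "die", "dat", "t", "n"),
  stringsAsFactors = FALSE
)
neuter <- c("t", "het", "dat")
non.neuter <- c("de", "die")

# One output row per node, with every counter starting at zero
freqDf <- data.frame(node = unique(df$node), neuter = 0L, nonNeuter = 0L,
                     rest = 0L, stringsAsFactors = FALSE)

for (i in seq_len(nrow(df))) {
  row <- match(df$node[i], freqDf$node)  # output row for this node
  word <- df$precedingWord[i]
  # R has no ++ operator: increment with x <- x + 1L
  if (word %in% neuter) {
    freqDf$neuter[row] <- freqDf$neuter[row] + 1L
  } else if (word %in% non.neuter) {
    freqDf$nonNeuter[row] <- freqDf$nonNeuter[row] + 1L
  } else {
    freqDf$rest[row] <- freqDf$rest[row] + 1L
  }
}
freqDf
```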
One way is to replace the values with their categories and then use the table function to produce the frequencies.
neuter <- c("t", "het", "dat")
non.neuter <- c("de", "die")
df$precedingWord[df$precedingWord %in% neuter] <- "neuter"
df$precedingWord[df$precedingWord %in% non.neuter] <- "non.neuter"
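# Test against the new labels: the original words have already been replaced above,
# so checking against c(neuter, non.neuter) here would turn everything into "rest"
df$precedingWord[!df$precedingWord %in% c("neuter", "non.neuter")] <- "rest"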
table(df)
precedingWord
node neuter non.neuter rest
A-bom 0 4 2
acroniem 3 0 2
act 3 2 1
But I'm sure there is a better solution, for example with the dplyr package.
EDIT: maybe something like this, starting again from the original df (it doesn't overwrite your "precedingWord" column, but adds a new "gender" column):
library(dplyr)
df %>%
mutate(gender = ifelse(!precedingWord %in% c(neuter, non.neuter), "rest",
ifelse(precedingWord %in% neuter, "neuter", "non.neuter"))) %>%
count(node, gender)
Source: local data frame [7 x 3]
Groups: node
node gender n
1 A-bom non.neuter 4
2 A-bom rest 2
3 acroniem neuter 3
4 acroniem rest 2
5 act neuter 3
6 act non.neuter 2
7 act rest 1
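The nested ifelse() calls can also be written with dplyr's case_when(), which checks its conditions in order and makes the final "rest" fallback explicit. This is an alternative sketch, not the answer's original code; it rebuilds the sample data inline so it runs on its own.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  node = rep(c("A-bom", "acroniem", "act"), times = c(6, 5, 6)),
  precedingWord = c("de", "die", "de", "een", "n", "de",
                    "het", "t", "het", "n", "een",
                    "de", "het", "die", "dat", "t", "n"),
  stringsAsFactors = FALSE
)
neuter <- c("t", "het", "dat")
non.neuter <- c("de", "die")

# case_when() returns the value of the first matching condition;
# TRUE ~ "rest" catches every word not listed in either category
counts <- df %>%
  mutate(gender = case_when(
    precedingWord %in% neuter ~ "neuter",
    precedingWord %in% non.neuter ~ "non.neuter",
    TRUE ~ "rest"
  )) %>%
  count(node, gender)
counts
```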
# And if you want the same output you put in your question, you can use table
df2 <- mutate(df, gender = ifelse(!precedingWord %in% c(neuter, non.neuter), "rest",
ifelse(precedingWord %in% neuter, "neuter", "non.neuter")))
table(df2$node, df2$gender)
neuter non.neuter rest
A-bom 0 4 2
acroniem 3 0 2
act 3 2 1
EDIT: converting the table into a data frame you can work with
myTable <- table(df2$node, df2$gender) %>%
as.data.frame.matrix %>%
mutate(node = row.names(.))
> myTable
neuter non.neuter rest node
1 0 4 2 A-bom
2 3 0 2 acroniem
3 3 2 1 act
> str(myTable)
'data.frame': 3 obs. of 4 variables:
$ neuter : int 0 3 3
$ non.neuter: int 4 0 2
$ rest : int 2 2 1
$ node : chr "A-bom" "acroniem" "act"
# And here is a more understandable way if you are not familiar with piping
# To learn more about forward piping : https://github.com/smbache/magrittr
myTable <- table(df2$node, df2$gender)
myTable2 <- as.data.frame.matrix(myTable)
myTable3 <- mutate(myTable2, node = row.names(myTable2))
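If you prefer base R only, here is a sketch that produces the same wide layout with node as the first column, matching the question's freqDf. The df2 below is a hand-built stand-in for the df2 created above (a node column plus the derived gender column).

```r
# Stand-in for df2: node plus the gender category of each preceding word
df2 <- data.frame(
  node = rep(c("A-bom", "acroniem", "act"), times = c(6, 5, 6)),
  gender = c("non.neuter", "non.neuter", "non.neuter", "rest", "rest", "non.neuter",
             "neuter", "neuter", "neuter", "rest", "rest",
             "non.neuter", "neuter", "non.neuter", "neuter", "neuter", "rest"),
  stringsAsFactors = FALSE
)

m <- table(df2$node, df2$gender)
# as.data.frame.matrix() keeps the wide layout; rownames() recovers the node names
freqDf <- data.frame(node = rownames(m), as.data.frame.matrix(m),
                     row.names = NULL, stringsAsFactors = FALSE)
freqDf
```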
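You usually don't need loops in R. It is designed to operate on all elements of a data structure at once, through vectorized functions and the apply family of commands. In this case you don't even need tapply, because the table function already does what you want.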
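For comparison, this is roughly what a tapply() version looks like. It builds the question's freqDf layout and derives rest by subtracting the two named categories from each node's row count, which is one of the options the question suggests. The sample data is rebuilt inline; nNeuter, nNonNeuter and nTotal are illustrative names.

```r
# Sample data from the question
df <- data.frame(
  node = rep(c("A-bom", "acroniem", "act"), times = c(6, 5, 6)),
  precedingWord = c("de", "die", "de", "een", "n", "de",
                    "het", "t", "het", "n", "een",
                    "de", "het", "die", "dat", "t", "n"),
  stringsAsFactors = FALSE
)
neuter <- c("t", "het", "dat")
non.neuter <- c("de", "die")

# %in% returns a logical vector; summing it per node counts the matches
nNeuter <- tapply(df$precedingWord %in% neuter, df$node, sum)
nNonNeuter <- tapply(df$precedingWord %in% non.neuter, df$node, sum)
nTotal <- tapply(df$precedingWord, df$node, length)

# "rest" is whatever remains after subtracting the two named categories
freqDf <- data.frame(node = names(nTotal),
                     neuter = as.vector(nNeuter),
                     nonNeuter = as.vector(nNonNeuter),
                     rest = as.vector(nTotal - nNeuter - nNonNeuter),
                     stringsAsFactors = FALSE)
freqDf
```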
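Julien's answer works for your example, but it will fail in the (possibly uncommon) case where no words of a given type are present. For example, if you had no "neuter" words, the "neuter" column would be missing from the table instead of showing all zeros as you might expect. To get around this, you can treat the word type as a factor.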
Note that in the code below I added a fourth type of word ("nonword") to demonstrate the zero-count case.
# Sample data from the question
df <- as.data.frame(matrix(c("A-bom","de","A-bom","die","A-bom","de","A-bom","een","A-bom","n","A-bom","de",
                             "acroniem","het","acroniem","t","acroniem","het","acroniem","n","acroniem","een",
                             "act","de","act","het","act","die","act","dat","act","t","act","n"),
                           byrow = TRUE, ncol = 2), stringsAsFactors = FALSE)
names(df) <- c("node", "precedingWord")
# Dictionary of word types.
# I added a fourth type of word ("nonword") to demonstrate what happens
# if no words of a given type are present.
classes <- c("t" = "neuter", "het" = "neuter", "dat" = "neuter",
             "de" = "non-neuter", "die" = "non-neuter", "blorble" = "nonword")
# Look up each word's class; words not in the dictionary become "rest"
df$class <- ifelse(!is.na(classes[df$precedingWord]), classes[df$precedingWord], "rest")
# Note that the fourth category, "nonword", is missing from this table
table(df$node, df$class)
# Declare every possible class as a factor level, including unobserved ones
df$class <- factor(df$class, levels = c(unique(classes), "rest"))
# Now the unrepresented "nonword" category appears as a column of zeros
table(df$node, df$class)
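The mechanism in isolation: table() reports a count for every declared level of a factor, including levels that never occur, while a plain character vector only produces entries for values that are actually present. A minimal sketch with made-up values:

```r
# A factor keeps every declared level, even ones with no observations
words <- factor(c("neuter", "rest", "neuter"),
                levels = c("neuter", "non-neuter", "rest"))
counts <- table(words)
print(counts)   # non-neuter is listed with a count of 0

# The same values as plain characters: only observed values become entries
print(table(as.character(words)))   # no non-neuter entry
```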