How to use the output of match() to write an if statement based on the value in the indexed row
I have a hierarchical data frame (df) like the one below. Where the Document value is "NA", the corresponding SubDoc is the top level of a document.
Document SubDoc Level
*NA* Document1 "1"
Document1 SubDocument1 "NA"
Document1 SubDocument2 "NA"
Document1 SubDocument3 "NA"
Document1 SubDocument4 "NA"
SubDocument1 Outcome1 "NA"
SubDocument1 Outcome2 "NA"
SubDocument1 Outcome3 "NA"
SubDocument2 Outcome1 "NA"
SubDocument2 Outcome2 "NA"
SubDocument3 Outcome1 "NA"
*NA* Document2 "1"
Document2 SubDoc1 "NA"
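and so on...
I want Level to represent how many levels down from the top a document sits. At the moment I assign level 1 by checking whether the Document column is empty and, if it is, setting Level to 1: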
df$Level <- ifelse(is.na(df$Document), "1", "NA")
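Now I want to assign all the levels below that by checking whether the string in the Document column matches one in the SubDoc column (since the row it matches holds its parent's level):
match(df$Document, df$SubDoc)
which returns the indices of the rows where they match, in this case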
"NA",1,1,1,1,2,2,2
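What I would like to do is take these index numbers and write a statement such that, if the Level value of the row at the returned index == 1, then Level gets 2. For example, row 2 matches row 1, and in that row Level == 1, so row 2 gets a Level value of 2. The resulting data frame would look like this: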
Document SubDoc Level
*NA* Document1 "1"
Document1 SubDocument1 "2"
Document1 SubDocument2 "2"
Document1 SubDocument3 "2"
Document1 SubDocument4 "2"
SubDocument1 Outcome1 "3"
SubDocument1 Outcome2 "3"
SubDocument1 Outcome3 "3"
SubDocument2 Outcome1 "3"
SubDocument2 Outcome2 "3"
SubDocument3 Outcome1 "3"
*NA* Document2 "1"
Document2 SubDoc1 "2"
However, I am not sure how to write a solution for this. Any help is appreciated.
We can use match
df$Level <- match(df$Document, unique(df$Document))
df$Level
#[1] 1 2 2 2 2 3 3 3
or factor (exclude = NULL keeps NA as a level, so the result agrees with match)
as.integer(factor(df$Document, levels = unique(df$Document), exclude = NULL))
Note: it is not clear here whether the NA in the OP's dataset is a real NA or the quoted string "NA".
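If the column turns out to hold the string "NA" rather than a real missing value, one option is to convert it before calling is.na() or match(). This is a minimal sketch on a hypothetical three-row frame (the name `df_chr` is made up for illustration), not the OP's actual data:

```r
# Hypothetical frame in which the missing parent is stored as the text "NA"
df_chr <- data.frame(
  Document = c("NA", "Document1", "Document1"),
  SubDoc   = c("Document1", "SubDocument1", "SubDocument2"),
  stringsAsFactors = FALSE
)

# %in% never returns NA, so this is safe even if some values are already missing
df_chr$Document[df_chr$Document %in% "NA"] <- NA

# Only the first row is now treated as missing
is.na(df_chr$Document)
```

After this step, is.na(df$Document) identifies the top-level rows in the same way as it does for the data below.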
data
df <- structure(list(Document = c(NA, "Document", "Document", "Document",
"Document", "SubDocument1", "SubDocument1", "SubDocument1"),
SubDoc = c("Document", "SubDocument1", "SubDocument2", "SubDocument3",
"SubDocument4", "Outcome1", "Outcome2", "Outcome3"), Level = c(1L,
NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-8L))
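One thing that may be worth checking before relying on the match/unique approach: it numbers the distinct Document values in order of first appearance, which coincides with depth on the 8-row sample above but need not do so once a second top-level document appears. A small check on hypothetical two-tree data (`df2` and `by_first_seen` are illustrative names, not from the original post):

```r
# Hypothetical data with two separate document trees
df2 <- data.frame(
  Document = c(NA, "Document1", "SubDocument1", NA, "Document2"),
  SubDoc   = c("Document1", "SubDocument1", "Outcome1", "Document2", "SubDoc1"),
  stringsAsFactors = FALSE
)

# Position of each Document value among the distinct values, in order of appearance
by_first_seen <- match(df2$Document, unique(df2$Document))
by_first_seen
# [1] 1 2 3 1 4
```

The last row is a direct child of Document2, so its depth would be 2, yet it is numbered 4 because "Document2" is the fourth distinct value seen. For data shaped like the question's full example, a parent lookup through SubDoc appears to be needed.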
I was able to solve this using the following code
# Top-level rows have no parent document
df$Level[is.na(df$Document)] <- 1
# Look up each row's parent row via SubDoc and copy the parent's Level
df["ParentLevel"] <- df[match(df$Document, df$SubDoc), "Level"]
df$Level[which(df$ParentLevel == 1)] <- 2
df["ParentLevel"] <- df[match(df$Document, df$SubDoc), "Level"]
df$Level[which(df$ParentLevel == 2)] <- 3
df["ParentLevel"] <- df[match(df$Document, df$SubDoc), "Level"]
df$Level[which(df$ParentLevel == 3)] <- 4
etc.
From here it is just a matter of wrapping this in a loop and dropping the ParentLevel column from the dataset. The output looks like this:
Document SubDoc Level
*NA* Document1 1
Document1 SubDocument1 2
Document1 SubDocument2 2
Document1 SubDocument3 2
Document1 SubDocument4 2
SubDocument1 Outcome1 3
SubDocument1 Outcome2 3
SubDocument1 Outcome3 3
SubDocument2 Outcome1 3
SubDocument2 Outcome2 3
SubDocument3 Outcome1 3
*NA* Document2 1
Document2 SubDoc1 2
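The loop mentioned above could be sketched roughly as follows. This is one possible way to write it, not necessarily what was actually run: the sample frame is a shortened, hypothetical version of the question's data, `parent` and `todo` are illustrative names, and the parent index is kept in a vector so no ParentLevel column has to be created and removed.

```r
# Shortened, hypothetical version of the data in the question
df <- data.frame(
  Document = c(NA, "Document1", "Document1", "SubDocument1",
               "SubDocument1", NA, "Document2"),
  SubDoc   = c("Document1", "SubDocument1", "SubDocument2", "Outcome1",
               "Outcome2", "Document2", "SubDoc1"),
  stringsAsFactors = FALSE
)

# Row index of each row's parent (NA for top-level rows).
# match() returns the first hit, so SubDoc names are assumed to be unique.
parent <- match(df$Document, df$SubDoc)

# Start with level 1 for rows that have no parent document
df$Level <- NA_integer_
df$Level[is.na(df$Document)] <- 1L

# Each pass fills in the rows whose parent already has a level
repeat {
  todo <- which(is.na(df$Level) & !is.na(df$Level[parent]))
  if (length(todo) == 0L) break
  df$Level[todo] <- df$Level[parent[todo]] + 1L
}

df$Level
# [1] 1 2 2 3 3 1 2
```

Rows whose Document value never appears in SubDoc are left as NA, and the loop stops as soon as a pass assigns nothing, so it does not depend on knowing the maximum depth in advance.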