Shorter method to replace entries in R
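I've recently started learning R. This is the source file I'm using (https://github.com/cosname/art-r-translation/blob/master/data/Grades.txt). Is there any way to change the letter grades from A to 4.0, A- to 3.7, etc. without using a loop?
I ask because with 1M entries, a "for" loop may not be the most efficient way to modify the data. Any help would be greatly appreciated.
Since one poster asked me to post my code, I decided to try a for loop to see whether I could get it working. Here is my code: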
mygrades<-read.table("grades.txt",header = TRUE)
i <- for (i in 1:nrow(mygrades))
{
  #print(i)
  #for now, see whether As get replaced with 4.0.
  if(mygrades[i,1]=="A")
  {
    mygrades[i,1]=4.0
  }
  else if (mygrades[i,2]=="A")
  {
    mygrades[i,2]=4.0
  }
  else if (mygrades[i,3]=="A")
  {
    mygrades[i,3]=4.0
  }
  else
  {
    #do nothing...continues
  }
}
write.table(mygrades,"newgrades.txt")
However, the output is a bit strange. For some "A"s I get NA, while others stay as they are. Can someone help me with this code?
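The NA values are consistent with how R handles factors: before R 4.0.0, read.table() converted text columns to factors by default, and assigning a value that is not one of a factor's existing levels stores NA (with an "invalid factor level, NA generated" warning). A minimal sketch with a made-up three-element vector:

```r
# A factor only accepts values from its existing set of levels
grades <- factor(c("A", "B+", "A"))

# 4.0 is not a known level, so R stores NA and emits a warning
grades[1] <- 4.0
print(grades)

# A plain character vector (what stringsAsFactors = FALSE gives you)
# accepts the replacement; the number 4.0 is coerced to the string "4"
grades_chr <- c("A", "B+", "A")
grades_chr[1] <- 4.0
print(grades_chr)
```

The coercion to "4" also explains why the later output shows `4` rather than `4.0` in the recoded cells.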
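@alistaire, I did try Hadley's lookup table, and it works. I also looked at the dplyr code, and it works well too. However, for my own understanding, I'm still trying to get the for loop to work. Note that I've only had the R book open for two days. Here is the revised code.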
#there was one mistake in my code: I didn't use stringsAsFactors=FALSE.
#now, this code doesn't work for all "A"s. It spits out 4.0 for some As
#and doesn't do so for others. Why would that be?
mygrades<-read.table("grades.txt",header = TRUE,stringsAsFactors=FALSE)
i <- for (i in 1:nrow(mygrades))
{
  #print(i)
  if(mygrades[i,1]=="A")
  {
    mygrades[i,1]=4.0
  }
  else if (mygrades[i,2]=="A")
  {
    mygrades[i,2]=4.0
  }
  else if (mygrades[i,3]=="A")
  {
    mygrades[i,3]=4.0
  }
  else
  {
    #do nothing...continues
  }
}
write.table(mygrades,"newgrades.txt")
The output is:
"final_exam" "quiz_avg" "homework_avg"
"1" "C" "4" "A"
"2" "C-" "B-" "4"
"3" "D+" "B+" "4"
"4" "B+" "B+" "4"
"5" "F" "B+" "4"
"6" "B" "A-" "4"
"7" "D+" "B+" "A-"
"8" "D" "A-" "4"
"9" "F" "B+" "4"
"10" "4" "C-" "B+"
"11" "A+" "4" "A"
"12" "A-" "4" "A"
"13" "B" "4" "A"
"14" "D-" "A-" "4"
"15" "A+" "4" "A"
"16" "B" "A-" "4"
"17" "F" "D" "A-"
"18" "B" "4" "A"
"19" "B" "B+" "4"
"20" "A+" "A-" "4"
"21" "4" "A" "A"
"22" "B" "B+" "4"
"23" "D" "B+" "4"
"24" "A-" "A-" "4"
"25" "F" "4" "A"
"26" "B+" "B+" "4"
"27" "A-" "B+" "4"
"28" "A+" "4" "A"
"29" "4" "A-" "A"
"30" "A+" "A-" "4"
"31" "4" "B+" "A-"
"32" "B+" "B+" "4"
"33" "C" "4" "A"
As you can see in the first row, the first A was recoded to 4, but the second A was not. Any idea why this happens?
Thanks in advance.
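The pattern in that output follows from the `else if` chain: within one row, as soon as one column matches "A" the remaining branches are skipped, so at most one "A" per row is recoded. Row 1 has "A" in both quiz_avg and homework_avg, and only the first was changed. A minimal sketch of a loop that checks every cell instead, using an independent `if` per cell inside a nested loop over columns; the small data frame is a made-up stand-in for grades.txt, and only the A to 4.0 rule is applied:

```r
# Stand-in for grades.txt, read with stringsAsFactors = FALSE
mygrades <- data.frame(
  final_exam   = c("C", "A", "B+"),
  quiz_avg     = c("A", "A-", "A"),
  homework_avg = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# Visit every cell: the inner loop covers all columns, and each cell
# gets its own independent if (no else-if chain to short-circuit)
for (i in seq_len(nrow(mygrades))) {
  for (j in seq_len(ncol(mygrades))) {
    if (mygrades[i, j] == "A") {
      mygrades[i, j] <- "4.0"  # keep the column character
    }
  }
}

print(mygrades)
```

The same single replacement can also be written without any loop as `mygrades[mygrades == "A"] <- "4.0"`, because comparing a data frame with a string yields a logical matrix that `[<-` accepts as an index.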
A typical base R approach is to use a named vector as a lookup table, e.g.
# data with fewer levels for simplicity
df <- data.frame(x = rep(1:3, 2), y = rep(1:2, 3))
lookup <- c(`1` = "A", `2` = "B", `3` = "C")
and then subset it by each column:
data.frame(lapply(df, function(x){lookup[x]}))
## x y
## 1 A A
## 2 B B
## 3 C A
## 4 A B
## 5 B A
## 6 C B
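Applied to the letter grades, the lookup keys are strings, so `lookup[x]` matches by name rather than by position. In the sketch below, the point values for grades other than A and A- are an assumption (a common 4.0 scale, with A+ capped at 4.0); adjust them to your own scheme. The `as.character()` call matters: indexing a vector by a factor uses the factor's integer codes, not its labels, which would silently return wrong values.

```r
# Grade points: A and A- come from the question; the rest are an
# assumed 4.0-scale mapping
points <- c("A+" = 4.0, "A" = 4.0, "A-" = 3.7,
            "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
            "C+" = 2.3, "C" = 2.0, "C-" = 1.7,
            "D+" = 1.3, "D" = 1.0, "D-" = 0.7,
            "F"  = 0.0)

# Small made-up stand-in for the grades file, deliberately stored as
# factors to show why as.character() is needed
grades <- data.frame(
  final_exam   = c("C", "C-", "A"),
  quiz_avg     = c("A", "B-", "C-"),
  homework_avg = c("A", "A", "B+"),
  stringsAsFactors = TRUE
)

# Look up every column at once; unname() drops the grade labels that
# indexing by name would otherwise attach to each value
numeric_grades <- data.frame(
  lapply(grades, function(col) unname(points[as.character(col)]))
)
print(numeric_grades)
```

Because each column is translated in a single vectorized subset rather than cell by cell, this should scale far better to the 1M-entry case than the explicit loop.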
Alternatively, dplyr recently added a recode() function that is handy for this kind of job:
library(dplyr)
df <- read.table('https://raw.githubusercontent.com/cosname/art-r-translation/master/data/Grades.txt', header = TRUE)
df %>% mutate_all(funs(recode(., A = '4.0',
`A-` = '3.7'))) %>% # etc.
as_data_frame() # for prettier printing
## # A tibble: 33 x 3
## final_exam quiz_avg homework_avg
## <fctr> <fctr> <fctr>
## 1 C 4.0 4.0
## 2 C- B- 4.0
## 3 D+ B+ 4.0
## 4 B+ B+ 4.0
## 5 F B+ 4.0
## 6 B 3.7 4.0
## 7 D+ B+ 3.7
## 8 D 3.7 4.0
## 9 F B+ 4.0
## 10 39 C- B+
## # ... with 23 more rows
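Since this answer was written, `funs()` and `mutate_all()` have been deprecated or superseded in dplyr, and `as_data_frame()` was replaced by `as_tibble()`. A sketch of the same recode in the current idiom, assuming dplyr 1.0 or later (for `across()`); as in the snippet above, only A and A- are mapped and all other grades are left unchanged. The tibble is a made-up stand-in so the example runs without downloading the file:

```r
library(dplyr)

# Small stand-in for the grades file (character columns)
grades <- tibble(
  final_exam   = c("C", "C-", "A"),
  quiz_avg     = c("A", "B-", "C-"),
  homework_avg = c("A", "A", "A-")
)

# across(everything(), ...) replaces mutate_all(); the ~ lambda replaces funs()
recoded <- grades %>%
  mutate(across(everything(), ~ recode(.x, A = "4.0", `A-` = "3.7")))

print(recoded)
```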