How to convert the records in a data frame into 0's and 1's in R?
My sample data frame is as follows
p<-c("name1","name2","name3","name4","name5")
x<-c(seq(0,4,by=1))
y<-c(0,0,1,1,2)
z<-c(11,2,1,0,1)
df<-data.frame(p,x,y,z)
I want to convert the data frame above into the following format
p<-c("name1","name2","name3","name4","name5")
x<-c(0,1,1,1,1)
y<-c(0,0,1,1,1)
z<-c(1,1,1,0,1)
df<-data.frame(p,x,y,z)
That is, I want every nonzero record (1 or greater) to become 1 and every zero to stay 0. Please help.
You can use the function sign for this purpose:
df[c("x","y","z")] <- sign(df[c("x","y","z")])
df
# p x y z
# 1 name1 0 0 1
# 2 name2 1 0 1
# 3 name3 1 1 1
# 4 name4 1 1 0
# 5 name5 1 1 1
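One caveat worth sketching, assuming the data could ever contain negative values (the sample does not): sign returns -1 for negatives, while a comparison against zero maps every nonzero value to 1. The vector v below is a hypothetical example, not part of the question.

```r
# Hypothetical vector with a negative value, for illustration only
v <- c(-3, 0, 2, 11)

sign(v)        # -1  0  1  1  (negatives become -1)
(v != 0) + 0L  #  1  0  1  1  (every nonzero value becomes 1)
```

For the non-negative sample data in the question the two give the same 0/1 result.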
You could also do
df[-1] <- (df[-1]!=0)+0L
or
df[-1] <- (!!df[-1])+0L
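To show why these two one-liners work, here is a minimal sketch on a plain vector (v is a hypothetical example): the comparison or double negation yields a logical value, and adding 0L coerces TRUE/FALSE to 1/0.

```r
# Hypothetical vector, for illustration only
v <- c(0, 1, 2, 11)

is_nonzero <- v != 0
is_nonzero      # FALSE  TRUE  TRUE  TRUE
!!v             # FALSE  TRUE  TRUE  TRUE  (same logical result)
is_nonzero + 0L # 0 1 1 1  (logical coerced to integer)
```

On a data frame the comparison returns a logical matrix. Adding 0L keeps its dimensions, whereas as.integer() would drop them, which is presumably why the answer uses + 0L.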
Benchmarks
set.seed(24)
df2 <- as.data.frame(matrix(sample(0:10, 5000*5000, replace=TRUE), ncol=5000))
system.time((df2!=0)+0L)
# user system elapsed
# 0.801 0.061 0.861
system.time(sign(df2))
# user system elapsed
# 1.315 0.022 1.336
system.time((!!df2)+0L)
# user system elapsed
# 0.602 0.044 0.647
library(microbenchmark)
microbenchmark(pascal=sign(df2), akrun=(!!df2)+0L, times=20L, unit='relative')
#Unit: relative
# expr min lq mean median uq max neval cld
# pascal 2.184227 2.164029 2.163411 2.142952 2.138964 2.196735 20 b
# akrun 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 20 a
Use ifelse to conditionally assign 1 or 0 to each element:
df[, 2:4] <- ifelse(df[, 2:4] == 0, 0, 1)
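As a quick sanity check, the sketch below rebuilds the sample data frame from the question and confirms that the approaches in this thread agree with the desired output. The names expected, results, and agree are introduced here for illustration only.

```r
# Sample data from the question
df <- data.frame(p = paste0("name", 1:5),
                 x = 0:4,
                 y = c(0, 0, 1, 1, 2),
                 z = c(11, 2, 1, 0, 1))

# Desired 0/1 values from the question, as a matrix
expected <- cbind(x = c(0, 1, 1, 1, 1),
                  y = c(0, 0, 1, 1, 1),
                  z = c(1, 1, 1, 0, 1))

# Apply each approach to the numeric columns only
results <- list(
  sign   = as.matrix(sign(df[-1])),
  neq    = (df[-1] != 0) + 0L,
  notnot = (!!df[-1]) + 0L,
  ifelse = ifelse(df[, 2:4] == 0, 0, 1)
)

# TRUE for every approach whose values match the desired output
agree <- vapply(results, function(m) all(m == expected), logical(1))
agree
```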