R: removing duplicates while keeping the max value
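For each group of rows that share the same var,
I need to take the maximum of each column. I tried the plyr approach, but it is very, very slow. Any optimization suggestions or alternatives?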
x <- ddply(x, "var", numcolwise(max))
Sample data is below, but my real data has about 10K rows and 5K columns.
var L1 L2 L3 L4 L5 L6
var1 0.509100101 0.015739581 0.957789164 0.902292389 0.826366893 0.182984811
var1 0.879927428 0.652231012 0.937680366 0.275712565 0.697747469 0.839823493
var1 0.708760238 0.650970415 0.342625412 0.945388553 0.780597336 0.125712247
var4 0.169279846 0.712183393 0.805263513 0.119002298 0.774175311 0.656468809
var5 0.343674627 0.808923336 0.339527067 0.73768585 0.166461354 0.588347206
var5 0.690868073 0.195300702 0.16614307 0.112241319 0.747591381 0.9433346
var5 0.209711182 0.764779757 0.283426455 0.373367398 0.225350503 0.492510721
var8 0.805516572 0.842888657 0.275147394 0.960842561 0.734521004 0.605078001
var9 0.534889726 0.444212963 0.165563287 0.590991951 0.244009571 0.188597899
var10 0.499110864 0.496455884 0.993483606 0.444440833 0.755452203 0.895849025
var11 0.366111435 0.580177811 0.411523103 0.111694655 0.892282071 0.777860302
var12 0.200600203 0.809859039 0.890912839 0.063123472 0.107956515 0.441015454
var13 0.46876789 0.529304262 0.410794171 0.072144288 0.497160109 0.291629322
var14 0.546108044 0.436783087 0.315849023 0.610337503 0.552454722 0.50320375
var14 0.206252562 0.919754189 0.060999717 0.808003665 0.441351656 0.190068706
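You can try
library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr; across() replaces them
x %>%
  group_by(var) %>%
  summarise(across(everything(), max))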
Or
library(data.table)
# setDT() converts x to a data.table by reference; .SD holds the non-grouping columns
setDT(x)[, lapply(.SD, max), by = var]
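To see what the data.table call returns, here is a minimal sketch on a made-up toy data frame (the values and the two-column shape are assumptions for illustration, not the asker's real data):

```r
library(data.table)

# Hypothetical toy data: "var" repeats, L1 and L2 are numeric columns
x <- data.frame(
  var = c("var1", "var1", "var4", "var5", "var5"),
  L1  = c(0.51, 0.88, 0.17, 0.34, 0.69),
  L2  = c(0.02, 0.65, 0.71, 0.81, 0.20)
)

# One row per var; each remaining column is reduced to its maximum within the group
res <- setDT(x)[, lapply(.SD, max), by = var]
print(res)
```

Groups come back in order of first appearance, so res has one row each for var1, var4 and var5.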
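Melt the data into "long" format, sort by value in decreasing order, and take the first instance of each combination of var and variable:
library(magrittr)
library(plyr)
library(reshape2)
# x is the data frame from the question
x %>%
  melt(id.vars = "var") %>%
  arrange(-value) %>%
  # key on (var, variable), not (var, value), so only the largest value per pair survives
  subset(!duplicated(interaction(var, variable)))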
If necessary, convert the data back to "wide" format:
... %>%
  dcast(var ~ variable)
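The melt, sort, de-duplicate and cast steps can be sketched end to end as follows. This is only an illustration on a made-up toy data frame (the values are assumptions), with the de-duplication keyed on var and variable as the description says:

```r
library(magrittr)
library(plyr)
library(reshape2)

# Hypothetical toy data with duplicated "var" values
x <- data.frame(
  var = c("var1", "var1", "var5", "var5"),
  L1  = c(0.51, 0.88, 0.34, 0.69),
  L2  = c(0.65, 0.02, 0.81, 0.20)
)

long <- x %>%
  melt(id.vars = "var") %>%   # long format: columns var, variable, value
  arrange(-value) %>%         # largest values first
  subset(!duplicated(interaction(var, variable)))  # first (= max) row per pair

# Back to one row per var and one column per original variable
wide <- dcast(long, var ~ variable, value.var = "value")
print(wide)
```

Because the rows are sorted by decreasing value before de-duplication, the first row kept for each (var, variable) pair is that pair's maximum.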