Find max of factor and index that max in R

Question

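This should be really simple, but I haven't been able to figure it out. I want to get the maximum value of each group, which I do as follows.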

ddply(dd,~group,summarise,max=max(value))

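But instead of returning only the value and the group, I want to return the value, the group, and another column, date, which I tried to index as below (this obviously doesn't work). How can I do that? Thanks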

ddply(dd,~group,summarise,max=max(value))['date']

Answer 1

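If you're after the date that corresponds to the row with the maximum value, try `subset` to get the row(s) holding the maximum, together with `select` to keep only the columns you're after.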

# reproducible example using `iris`
library(plyr)

# your original
ddply(iris, ~Species, summarise, max=max(Sepal.Length))
#      Species max
# 1     setosa 5.8
# 2 versicolor 7.0
# 3  virginica 7.9


# now we want to get the Sepal.Width that corresponds to max sepal.length too.
ddply(iris, ~Species, subset, Sepal.Length==max(Sepal.Length),
      select=c('Species', 'Sepal.Length', 'Sepal.Width'))
#      Species Sepal.Length Sepal.Width
# 1     setosa          5.8         4.0
# 2 versicolor          7.0         3.2
# 3  virginica          7.9         3.8

(Or, instead of using `select` in the `subset` call, use `[, c('columns', 'I', 'want')]` after the `ddply`.) If several rows of the same species attain the maximum, this returns all of them.
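As a minimal sketch of the alternative mentioned above, the column selection can be done after `ddply` instead of inside `subset`. The name `res` is only an illustrative placeholder.

```r
library(plyr)

# Keep every row that attains the per-species maximum of Sepal.Length
res <- ddply(iris, ~Species, subset, Sepal.Length == max(Sepal.Length))

# Pick the columns of interest afterwards
res <- res[, c("Species", "Sepal.Length", "Sepal.Width")]
res
```

For `iris` this gives one row per species, because no species has a tie at its maximum sepal length.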

You can also do it with `summarise` by just adding your `date` definition to the call, but it is slightly less efficient (it computes the maximum twice):

ddply(iris, ~Species, summarise,
      max=max(Sepal.Length),
      width=Sepal.Width[which.max(Sepal.Length)])

This returns only one row per species. If several flowers share the maximum sepal length for their species, only the first is returned (`which.max` returns the first matching index).
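To make the difference in tie handling concrete, here is a small sketch on made-up data shaped like the question's `dd`. The data frame contents and the names `all_max` and `first_max` are assumptions for illustration, not taken from the original post.

```r
library(plyr)

# Hypothetical data with the question's columns: group, value, date.
# Group "a" has two rows tied at the maximum value 5.
dd <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  value = c(1, 5, 5, 2, 3),
  date  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                    "2020-02-01", "2020-02-02"))
)

# subset keeps both tied rows in group "a"
all_max <- ddply(dd, ~group, subset, value == max(value))

# summarise + which.max keeps only the first tied row in group "a"
first_max <- ddply(dd, ~group, summarise,
                   max  = max(value),
                   date = date[which.max(value)])
```

Which of the two is preferable depends on whether tied rows should all be kept or collapsed to a single row per group.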

Answer 2

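If we use `data.table` (with the `iris` dataset), we convert the `data.frame` to a `data.table`, group by the grouping variable ('Species'), get the index of the `max` value of one variable ('Sepal.Length'), and use it to subset the columns specified in `.SDcols`.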

library(data.table)
dt <- as.data.table(iris)
dt[, .SD[which.max(Sepal.Length)], by = Species,
   .SDcols = c('Sepal.Length', 'Sepal.Width')]
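A possible variant, sketched here as an assumption rather than as part of the original answer, collects the row numbers of the per-group maxima with `.I` and then indexes the table once. This keeps all columns of the matching rows. The names `idx` and `res` are illustrative.

```r
library(data.table)

dt <- as.data.table(iris)

# .I holds the row numbers within dt; which.max picks the first row
# attaining the maximum Sepal.Length in each Species group
idx <- dt[, .I[which.max(Sepal.Length)], by = Species]$V1

# Index the full table with those row numbers (all columns are kept)
res <- dt[idx]
res
```

As with `which.max` elsewhere, ties within a group resolve to the first matching row.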

