How to mutate values of a tibble in long format

Question

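I want to normalize the values of each indicator using its min and max. Is it possible to keep the tibble in long format while doing so? (Below I use left_join, which adds the summary columns and makes the table wider.)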

library(tidyverse)

df <- tibble(ind =c(1, 2),
             `2015` = c(3,10),
             `2016` = c(7,18),
             `2017` = c(1,4))

# long format
df2 <- df %>%
    gather("year", "value", 2:4)

df3 <- df2 %>%
    group_by(ind) %>%
    summarise(mn = min(value),
              mx = max(value))

# wide format? 
df4 <- left_join(df2, df3, by = c("ind"="ind"))

df5 <- df4 %>%
  mutate(value2 = (value-mn)/(mx-mn))
Created on 2019-10-07 by the reprex package (v0.3.0)

Answer 1

Instead of doing a left_join, you can create the columns with mutate and skip the summarise step:

library(dplyr)
df2 %>% 
    group_by(ind) %>%
    mutate(mn = min(value), mx = max(value)) %>%
    ungroup %>%
    mutate(value2 = (value - mn)/(mx-mn))

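Note: here we assume the OP needs the columns 'mx' and 'mn' in the final output. If only 'value2' is wanted, there is no need to create the extra columns, as @Gregor mentioned in the comments: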

df2 %>%
    group_by(ind) %>%
    mutate(value2 = (value - min(value))/(max(value) - min(value)))
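
One edge case the snippet above does not cover: if every value within a group is identical, max(value) - min(value) is zero and the division yields NaN. The following is a minimal sketch of one way to guard against that. It assumes returning NA is acceptable for constant groups. The helper name rescale01 and the rebuilt df_long are illustrative only and not part of the original answer.

```r
library(dplyr)
library(tibble)

# Long-format data equivalent to df2 from the question (rebuilt here so the
# example is self-contained)
df_long <- tibble(
  ind   = c(1, 2, 1, 2, 1, 2),
  year  = c("2015", "2015", "2016", "2016", "2017", "2017"),
  value = c(3, 10, 7, 18, 1, 4)
)

# Hypothetical helper: min-max rescaling that returns NA (an assumption about
# the desired behaviour) when all values in the group are equal
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(NA_real_, length(x)))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Grouped mutate keeps the data in long format: one row per ind/year pair
result <- df_long %>%
  group_by(ind) %>%
  mutate(value2 = rescale01(value)) %>%
  ungroup()
```

Because the helper is called inside a grouped mutate, the minimum and maximum are computed per 'ind', just as in the inline version.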

Also, with tidyr_1.0.0, instead of gather you can use pivot_longer, which is more general because it can handle multiple sets of columns when reshaping from 'wide' to 'long':

library(tidyr)
df %>% 
   pivot_longer(cols = -ind) %>% 
   group_by(ind) %>% 
   mutate(mn = min(value), mx = max(value))  %>%
   ungroup %>%
   mutate(value2 = (value - mn)/(mx-mn))
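
By default, pivot_longer calls the column holding the old column names 'name' rather than 'year'. As a small sketch, assuming you want the same column names that the gather call in the question produced, the names_to and values_to arguments can be set explicitly. The variable name df_long is illustrative only.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same wide data as in the question
df <- tibble(ind = c(1, 2),
             `2015` = c(3, 10),
             `2016` = c(7, 18),
             `2017` = c(1, 4))

df_long <- df %>%
  # names_to / values_to reproduce the "year" and "value" columns from gather
  pivot_longer(cols = -ind, names_to = "year", values_to = "value") %>%
  group_by(ind) %>%
  # per-indicator min-max normalization, computed inline
  mutate(value2 = (value - min(value)) / (max(value) - min(value))) %>%
  ungroup()
```

Note that 'year' is a character column here, since it comes from the column names.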
