Summarize simultaneous group_by of 2 separate dataframes with a common numerical field

Question

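I have a data frame A with the columns 'name' and 'measure', and another data frame B with the columns 'type' and 'measure'. Using the 'measure' field, I want to run a summary operation for every 'name' group in A against every 'type' group in B. For example: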

A has:

Name   | Measure
George | 5
George | 6
Tyrone | 7
Tyrone | 3

B has:

Type | Measure
cold | 3
cold | 2
hot  | 1
hot  | 5

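I want a summary for each pair: George and cold, George and hot, Tyrone and cold, Tyrone and hot. In each summary I find the minimum absolute difference (George on cold would be min(abs(c(5-3, 5-2, 6-3, 6-2))) = 2), and then, for each 'name', I find the 'Type' with the lowest such score. How exactly should I do this for a large dataset with many groups?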
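For concreteness, the George/cold score can be checked in base R. This is only a sketch of the arithmetic: the two vectors are the example values from the tables above, and using `outer()` to form every pairwise difference is one possible choice, not a requirement of the problem.

```r
# Measures for the George group in A and the cold group in B (example values above)
george <- c(5, 6)
cold <- c(3, 2)

# All pairwise differences: rows follow george, columns follow cold
diffs <- outer(george, cold, "-")

# Smallest absolute difference across every pair
min_abs_diff <- min(abs(diffs))
print(min_abs_diff)  # 2
```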

Answer 1

There may be a simpler way, but you could do this:

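library(tidyverse)

# Build every Name x Type combination
crossing(
  distinct(A, Name),
  distinct(B, Type)
) %>%
  # Attach the measures of each group; the shared column becomes Measure.x / Measure.y
  left_join(A, by = 'Name') %>%
  left_join(B, by = 'Type') %>%
  # Smallest absolute difference within each Name/Type pair
  group_by(Name, Type) %>%
  summarise(minAbsDiff = min(abs(Measure.x - Measure.y))) %>%
  # For each Name, keep the Type with the lowest score
  group_by(Name) %>%
  slice(which.min(minAbsDiff))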

Output:

# A tibble: 2 x 3
# Groups:   Name [2]
  Name   Type  minAbsDiff
  <fct>  <fct>      <int>
1 George hot            0
2 Tyrone cold           0
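
To reproduce the result without the tidyverse, here is a minimal base R sketch. It assumes the example data from the question with the column names used above (`Name`, `Type`, `Measure`); the helper objects `a_groups`, `b_groups`, `scores` and `best` are names introduced here for illustration only. Instead of joining every row of A to every row of B in one table, it compares one pair of groups at a time.

```r
# Example data from the question (column names as used in the answer above)
A <- data.frame(Name = c("George", "George", "Tyrone", "Tyrone"),
                Measure = c(5, 6, 7, 3))
B <- data.frame(Type = c("cold", "cold", "hot", "hot"),
                Measure = c(3, 2, 1, 5))

# Split each Measure column into one numeric vector per group
a_groups <- split(A$Measure, A$Name)
b_groups <- split(B$Measure, B$Type)

# scores[name, type] = smallest |a - b| over all a in the name group
# and all b in the type group
scores <- sapply(b_groups, function(b) {
  sapply(a_groups, function(a) min(abs(outer(a, b, "-"))))
})

# For each name, keep the type with the lowest score (first one on ties)
best <- data.frame(
  Name = rownames(scores),
  Type = colnames(scores)[apply(scores, 1, which.min)],
  minAbsDiff = apply(scores, 1, min),
  row.names = NULL
)
print(best)
```

On the example data this should pick hot for George and cold for Tyrone, matching the output above. Note that `outer()` still builds a full matrix for each pair of groups, so very large groups may need a different strategy, and that `sapply()` only returns a matrix here because A has more than one name group.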

group-by

r

plyr