How do I bin a variable across a number of observations for each specimen?

Question

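New R user here. I have measured the colors (hues) of a set of company logos. The number of observations can differ from logo to logo. My data is formatted as follows: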

Industry <- c("Fossil", "Fossil", "Fossil", "Fossil", "Fossil", "Renewable", "Renewable", "Renewable")
Logo <- c("Petrox", "Petrox", "Petrox", "Petrox", "Petrox", "Windo", "Windo", "Windo")
Hue <- c(36, 37, 43, 185, 190, 356, 310, 25)
df <- data.frame(Industry, Logo, Hue)

I have been trying to use cut() to bin the df$Hue variable for each logo in the sample:

# set up cut-off values 
breaks <- c(0,45,90,135,180,225,270,315,360)

# specify interval/bin labels
labels <- c("[0-45)","[45-90)", "[90-135)", "[135-180)", "[180-225)", "[225-270)","[270-315)", "[315-360)")
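
One detail worth checking with these labels: by default cut() builds intervals that are open on the left and closed on the right, such as (0,45], while the labels describe [0-45). The sketch below compares the default with right = FALSE on three hue values that are made up for illustration; which behavior is wanted is an assumption that depends on how boundary values should be treated.

```r
breaks <- c(0, 45, 90, 135, 180, 225, 270, 315, 360)
labels <- c("[0-45)", "[45-90)", "[90-135)", "[135-180)",
            "[180-225)", "[225-270)", "[270-315)", "[315-360)")

# Illustrative boundary values, not taken from the logo data
hue <- c(0, 45, 359)

# Default (right = TRUE): bins are (0,45], (45,90], ...
# so 0 falls outside every bin and 45 lands in the first one
default_bins <- as.character(cut(hue, breaks, labels))

# right = FALSE: bins are [0,45), [45,90), ... which matches the labels
left_closed_bins <- as.character(cut(hue, breaks, labels, right = FALSE))
```

With right = FALSE a hue of exactly 360 would fall outside the last bin unless include.lowest = TRUE is also set.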

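I would like to get a data frame with one row per logo and one column per bin, counting how many observations of each logo fall within each interval, like this: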

Industry	Logo	[0-45)	[45-90)	[90-135)	[135-180)	[180-225)	[225-270)	[270-315)	[315-360)
Fossil	Petrox	3	0	0	0	2	0	0	0
Renewable	Windo	1	0	0	0	0	0	1	1

I have been looking for a good solution, but so far I have not found a useful answer. Is there a simple way to combine subset() or split() with cut()? I am sure that what I need is something very simple.

Answer 1

You can use cut to bin the data, complete the sequence of bins, and use pivot_wider to get the data in wide format.

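library(dplyr)
library(tidyr)

df %>%
  # bin Hue and count observations per Industry, Logo, and bin
  count(Industry, Logo, Hue = cut(Hue, breaks, labels)) %>%
  # add the bins that have no observations, with a count of 0
  complete(Industry, Hue = labels, fill = list(n = 0)) %>%
  # the added rows have NA for Logo; carry the last seen value down
  fill(Logo) %>%
  # restore the original bin order before widening
  arrange(match(Hue, labels)) %>%
  pivot_wider(names_from = Hue, values_from = n)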

#   Industry  Logo   `[0-45)` `[45-90)` `[90-135)` `[135-180)` `[180-225)` `[225-270)` `[270-315)` `[315-360)`
#  <chr>     <chr>     <dbl>     <dbl>      <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#1 Fossil    Petrox        3         0          0           0           2           0           0           0
#2 Renewable Windo         1         0          0           0           0           0           1           1
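
The pipeline above completes the bins per Industry and then relies on fill() to restore Logo. That works here because each industry has a single logo with at least one observation in the first bin. A possible variant, sketched under the assumption that tidyr's nesting() helper is available, completes over the observed Industry/Logo pairs instead, so fill() is not needed. The extra logo "Shello" and its hue values are made up to illustrate the case of several logos per industry.

```r
library(dplyr)
library(tidyr)

# Data from the question plus a hypothetical second fossil logo ("Shello")
df <- data.frame(
  Industry = c(rep("Fossil", 5), rep("Renewable", 3), rep("Fossil", 2)),
  Logo     = c(rep("Petrox", 5), rep("Windo", 3), rep("Shello", 2)),
  Hue      = c(36, 37, 43, 185, 190, 356, 310, 25, 50, 60)
)

breaks <- c(0, 45, 90, 135, 180, 225, 270, 315, 360)
labels <- c("[0-45)", "[45-90)", "[90-135)", "[135-180)",
            "[180-225)", "[225-270)", "[270-315)", "[315-360)")

wide <- df %>%
  # bin Hue; convert the factor to character so it lines up with `labels`
  mutate(Hue = as.character(cut(Hue, breaks, labels))) %>%
  count(Industry, Logo, Hue) %>%
  # nesting() keeps only Industry/Logo pairs that occur in the data,
  # so every logo gets all eight bins and Logo is never NA
  complete(nesting(Industry, Logo), Hue = labels, fill = list(n = 0L)) %>%
  # put the bins back in their original order before widening
  arrange(match(Hue, labels)) %>%
  pivot_wider(names_from = Hue, values_from = n)
```

The result has one row per logo, including "Shello", and one count column per bin in the order given by labels.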

