Scale rows of data

Question

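I have a data frame with customer information in the rows and time periods (months) in the columns. I use this format for clustering. I want to scale the values within each row. I can do it with the code below, but there are a few problems:

The code is too complicated for what should be a simple operation.
In some cases the scale function returns NaN.
Typing out explicit customer names (vars = c("A","B",...)) won't work, because the real data has thousands of customers.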

Here is my sample data and code:

mydata 
  cust P1  P2 P3  P4 P5  P6 P7  P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20
1    A  1 1.0  1 1.0  1 1.0  1 1.0  1 1.0   1 1.0   1 1.0   1 1.0   1 1.0   1 1.0
2    B  5 5.0  5 5.0  5 5.0  5 5.0  5 5.0   5 5.0   5 5.0   5 5.0   5 5.0   5 5.0
3    C  9 9.0  9 9.0  9 9.0  9 9.0  9 9.0   9 9.0   9 9.0   9 9.0   9 9.0   9 9.0
4    D  0 1.0  2 1.0  0 1.0  2 1.0  0 1.0   2 1.0   0 1.0   2 1.0   0 1.0   2 1.0
5    E  4 5.0  6 5.0  4 5.0  6 5.0  4 5.0   6 5.0   4 5.0   6 5.0   4 5.0   6 5.0
6    F  8 9.0 10 9.0  8 9.0 10 9.0  8 9.0  10 9.0   8 9.0  10 9.0   8 9.0  10 9.0
7    G  2 1.5  1 0.5  0 0.5  1 1.5  2 1.5   1 0.5   0 0.5   1 1.5   2 1.5   1 0.5
8    H  6 5.5  5 4.5  4 4.5  5 5.5  6 5.5   5 4.5   4 4.5   5 5.5   6 5.5   5 4.5
9    I 10 9.5  9 8.5  8 8.5  9 9.5 10 9.5   9 8.5   8 8.5   9 9.5  10 9.5   9 8.5

The code I am using:

library(dplyr)
library(tidyr)
# first transpose the data
g_mydata = mydata %>% gather(period,value,-cust)
spr_mydata = g_mydata %>% spread(cust,value)
# then scale each customer's values across the periods
sc_mydata = spr_mydata %>% 
      mutate_each_(funs(scale),vars = c("A","B","C","D","E","F","G","H","I") )   
# then transpose again back to original format
g_scdata = sc_mydata %>% gather(cust,value,-period)
scaled_data = g_scdata %>% spread(period,value)

Thanks for any help or suggestions.

Answer 1

You can always try apply():

# spr_mydata has one column per customer, so scale over columns (MARGIN = 2)
sc_mydata = apply(spr_mydata[, -1], 2, scale)
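
The same row-wise scaling can also be sketched directly on the wide data, without transposing first. This is a minimal sketch, assuming a small stand-in for mydata (three customers, four periods) and assuming that every column except the first holds a period:

```r
# Hypothetical subset of the question's data: one row per customer,
# one column per period (only four periods are kept here).
mydata <- data.frame(
  cust = c("D", "E", "G"),
  P1 = c(0, 4, 2),
  P2 = c(1, 5, 1.5),
  P3 = c(2, 6, 1),
  P4 = c(1, 5, 0.5)
)

# Select the period columns by position, so no customer names are typed out.
vals <- as.matrix(mydata[, -1])

# apply() over rows returns one column per input row,
# so the result is transposed back to customers-by-periods.
scaled <- t(apply(vals, 1, function(x) (x - mean(x)) / sd(x)))
colnames(scaled) <- colnames(vals)

# Reattach the customer labels.
scaled_data <- data.frame(cust = mydata$cust, scaled)
```

t(scale(t(vals))) should give the same numbers, since scale() centers each column and divides it by its standard deviation.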

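If the NaN values get in the way, you can also try running scale() directly on the numeric columns of spr_mydata, which is already the transposed form of mydata:

# drop the first column (period), not the first row
scale(spr_mydata[, -1])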
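
The NaN values most likely come from customers whose values never change (A, B and C in the sample data): their standard deviation is 0, so scaling divides 0 by 0. A minimal sketch of one way to guard against that follows. scale_row() is a hypothetical helper, and returning 0 for a flat series is an assumption about what is wanted, not the only reasonable convention:

```r
# scale_row() is a hypothetical helper, not part of base R or dplyr.
scale_row <- function(x) {
  s <- sd(x)
  if (is.na(s) || s == 0) {
    # Flat series: every centered value is 0, so return zeros instead of 0/0.
    return(rep(0, length(x)))
  }
  (x - mean(x)) / s
}

flat   <- c(5, 5, 5, 5)   # like customer B in the sample data
varied <- c(4, 5, 6, 5)   # like customer E in the sample data

nan_result  <- as.numeric(scale(flat))  # all NaN, because sd(flat) is 0
flat_scaled <- scale_row(flat)          # all zeros
var_scaled  <- scale_row(varied)        # mean 0, standard deviation 1
```

A helper like this could be passed to apply() in place of scale.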

Answer 2

Here is a dplyr approach.

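# reshape to long format: one row per customer and period
long_data = 
  mydata %>% 
  gather(period, value, -cust)

# customers whose values vary (sd != 0) and can therefore be scaled
to_scale = 
  long_data %>%
  group_by(cust) %>%
  summarize(sd = sd(value)) %>%
  filter(sd != 0) %>%
  select(-sd)

# constant customers would give NaN, so set their scaled values to 0
flat = 
  long_data %>%
  anti_join(to_scale, by = "cust") %>%
  mutate(value = 0)

# scale each remaining customer, then return to wide format
wide_scale = 
  long_data %>%
  right_join(to_scale, by = "cust") %>%
  group_by(cust) %>%
  mutate(value = 
           value %>%
           scale %>%
           as.numeric %>%   # drop the matrix shape returned by scale()
           signif(7)) %>%
  ungroup %>%               # so that cust can be dropped further down
  bind_rows(flat) %>%
  spread(period, value)

# one row per distinct scaled profile
type = 
  wide_scale %>%
  select(-cust) %>%
  distinct %>%
  mutate(type_ID = 1:n())

# map each customer to the profile it matches
customer__type = 
  type %>%
  left_join(wide_scale) %>%
  select(type_ID, cust)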
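
For comparison, the same idea (scale each customer, treat flat customers as all zeros, then label customers that share a scaled profile) can be sketched in base R. This is only a rough sketch under stated assumptions: the data is a hypothetical five-customer, four-period subset of the question's sample, and customer_type is an illustrative name rather than anything from the code above.

```r
# Hypothetical subset of the question's data (four periods only).
mydata <- data.frame(
  cust = c("A", "B", "D", "E", "G"),
  P1 = c(1, 5, 0, 4, 2),
  P2 = c(1, 5, 1, 5, 1.5),
  P3 = c(1, 5, 2, 6, 1),
  P4 = c(1, 5, 1, 5, 0.5)
)

vals <- as.matrix(mydata[, -1])

# Scale each row; flat rows (sd of 0) are set to 0 instead of NaN.
scaled <- t(apply(vals, 1, function(x) {
  s <- sd(x)
  if (s == 0) rep(0, length(x)) else (x - mean(x)) / s
}))

# Round so that tiny floating-point differences do not split a profile.
scaled <- signif(scaled, 7)

# Build one text key per row; customers with the same key share a profile.
key <- apply(scaled, 1, paste, collapse = "|")
customer_type <- data.frame(
  type_ID = match(key, unique(key)),
  cust = mydata$cust
)
```

Here A and B end up with the same type_ID (both flat), D and E share another (same shape at different levels), and G gets its own.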
