How to move data up and get rid of NA?

photo of current data

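The data shows NA at some points, but the missing information is right below it, in a row with the same UPC, Store, and Week. How can I group my data to avoid the redundant rows and the NA values?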

Here is my code so far:

`library(tidyverse)
RD <- read.csv("Raw Soft Drinks Sales Data.csv")
U  <- read.csv("UPC Soft Drinks.csv") %>%
  mutate(UPC   = as.factor(UPC),
         BRAND = as.factor(BRAND),
         CLASS = as.factor(CLASS))
RDX <- RD %>%
  filter(UPC != "Total") %>%
  select (-c(Total.Q1,Total.Q2,Total.Q3,Total.Q4))


RDXL <- RDX %>%
  pivot_longer(
    cols = starts_with("Week"),
#    cols = X1:X52,
#    cols = !c("STORE","UPC"),
    names_to = "WEEK", 
    names_prefix = "Week",
    values_to = "UNITS",
    values_drop_na = TRUE)

RDW <- pivot_wider(RDXL, names_from = "ITEM", values_from = "UNITS")%>%
  select(-TOTAL)

`

This is what the original dataset looks like: original data

I need Store, UPC, Dollars, Units, Feat, Deal, and Week each as their own column.

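I think you probably want all of your data in wide format. If your current data has a single value per week for each data column (DOLLARS:DEAL) and NA everywhere else in that column, you may want to coalesce the rows like this:

Minimal example data

library(tibble)

df <- tibble(store=c(1,1,1), UPC=rep(1200000044, 3), week=c(1, 1, 1), DOLLARS=c(10.1, NA, NA), UNITS=c(NA, 30, NA), FEAT=c(NA, NA, 'unknown'))
df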

# A tibble: 3 x 6
  store        UPC  week DOLLARS UNITS FEAT   
  <dbl>      <dbl> <dbl>   <dbl> <dbl> <chr>  
1     1 1200000044     1    10.1    NA NA     
2     1 1200000044     1    NA      30 NA     
3     1 1200000044     1    NA      NA unknown

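A solution

You can group_by the ID columns (UPC and week here; add store if the data covers more than one store), then sort each data column with na.last = TRUE so the non-NA values move to the top of each group, and finally drop the rows that are left entirely NA:

library(dplyr)

df %>%
  group_by(UPC, week) %>%
  # sort() with na.last = TRUE pushes the NAs to the bottom of each group
  mutate(across(DOLLARS:FEAT, ~ sort(.x, na.last = TRUE))) %>%
  # drop the rows in which every data column is NA
  filter(!if_all(DOLLARS:FEAT, is.na))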

# A tibble: 1 x 6
# Groups:   UPC, week [1]
  store        UPC  week DOLLARS UNITS FEAT   
  <dbl>      <dbl> <dbl>   <dbl> <dbl> <chr>  
1     1 1200000044     1    10.1    30 unknown
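If every group is expected to hold at most one non-NA value per column, the same collapse can also be sketched with summarise(). This is only an illustration under that assumption, not the method used above; first_non_na is a hypothetical helper defined here, not a dplyr function, and store is added to the grouping because the question says rows share the same Store as well.

```r
library(dplyr)
library(tibble)

# Same toy data as in the minimal example.
df <- tibble(
  store   = c(1, 1, 1),
  UPC     = rep(1200000044, 3),
  week    = c(1, 1, 1),
  DOLLARS = c(10.1, NA, NA),
  UNITS   = c(NA, 30, NA),
  FEAT    = c(NA, NA, "unknown")
)

# Hypothetical helper: first non-NA value of x, or NA if x has none.
first_non_na <- function(x) x[!is.na(x)][1]

# One output row per store/UPC/week combination.
collapsed <- df %>%
  group_by(store, UPC, week) %>%
  summarise(across(DOLLARS:FEAT, first_non_na), .groups = "drop")

collapsed
```

Unlike the sort-and-filter approach, this always returns exactly one row per group, so any second non-NA value within a group would be silently discarded.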

A minimal representation of the original data

df2<-tibble(store=c(1, 1), UPC=rep(1200000044, 2), ITEM=c('DOLLARS', 'UNITS'), Week.1=c(58.4, 29), Week.2=c(118.8, 55))

df2

# A tibble: 2 x 5
  store        UPC ITEM    Week.1 Week.2
  <dbl>      <dbl> <chr>    <dbl>  <dbl>
1     1 1200000044 DOLLARS   58.4   119.
2     1 1200000044 UNITS     29      55 

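A solution for the original data

Starting from the original data, you can avoid the NA rows altogether by calling pivot_longer and then pivot_wider:

library(dplyr)
library(tidyr)

df2 %>%
  # one row per store/UPC/ITEM/week
  pivot_longer(cols = starts_with('Week'), names_to = 'week', values_to = 'value') %>%
  # one column per ITEM (DOLLARS, UNITS, ...)
  pivot_wider(names_from = ITEM, values_from = value)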

# A tibble: 2 x 5
  store        UPC week   DOLLARS UNITS
  <dbl>      <dbl> <chr>    <dbl> <dbl>
1     1 1200000044 Week.1    58.4    29
2     1 1200000044 Week.2   119.     55
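The week column above still holds strings such as Week.1. A possible refinement, assuming the real column names follow the same Week.<number> pattern as df2, is to strip the prefix and convert to integer while pivoting. The sketch below uses the names_prefix and names_transform arguments of pivot_longer; the name tidy is only illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same toy data as df2 in the minimal representation.
df2 <- tibble(
  store  = c(1, 1),
  UPC    = rep(1200000044, 2),
  ITEM   = c("DOLLARS", "UNITS"),
  Week.1 = c(58.4, 29),
  Week.2 = c(118.8, 55)
)

tidy <- df2 %>%
  pivot_longer(
    cols            = starts_with("Week"),
    names_to        = "week",
    names_prefix    = "Week\\.",                 # regular expression removed from the names
    names_transform = list(week = as.integer),   # "1" -> 1L
    values_to       = "value"
  ) %>%
  pivot_wider(names_from = ITEM, values_from = value)

tidy
```

With week stored as an integer, sorting and filtering by week behave numerically, so week 10 sorts after week 9 instead of after week 1.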

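We could also reorder each column within a group so that the non-NA values come first, and then keep only the rows in which every data column is filled: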

library(dplyr)
df %>%
    group_by(UPC, week) %>%
    mutate(across(DOLLARS:FEAT, ~ .[order(is.na(.))])) %>%
    filter(if_all(DOLLARS:FEAT, Negate(is.na)))
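The two filters shown in this thread are not equivalent: !if_all(..., is.na) keeps a row as long as at least one column has a value, whereas if_all(..., Negate(is.na)) keeps only fully populated rows. The sketch below illustrates the difference on a hypothetical group (df3, invented for this example) in which FEAT was never recorded.

```r
library(dplyr)
library(tibble)

# Hypothetical group: FEAT is NA in every row.
df3 <- tibble(
  UPC     = rep(1200000044, 2),
  week    = c(1, 1),
  DOLLARS = c(10.1, NA),
  UNITS   = c(NA, 30),
  FEAT    = c(NA_character_, NA_character_)
)

# Move the non-NA values to the top of each group.
moved_up <- df3 %>%
  group_by(UPC, week) %>%
  mutate(across(DOLLARS:FEAT, ~ .x[order(is.na(.x))])) %>%
  ungroup()

# Keeps rows that have at least one non-NA data column.
keep_any <- moved_up %>% filter(!if_all(DOLLARS:FEAT, is.na))

# Keeps only rows in which every data column is non-NA.
keep_all <- moved_up %>% filter(if_all(DOLLARS:FEAT, Negate(is.na)))

nrow(keep_any)
nrow(keep_all)
```

Here keep_any retains the combined row with DOLLARS and UNITS, while keep_all is empty because FEAT is missing, so the stricter filter only suits data where every column is guaranteed a value in each group.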