How to move data up and get rid of NA?
photo of current data
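My data shows NA at some points, but the missing information sits in the row directly below it, for the same UPC, Store and Week. How can I group my data to avoid the redundant rows and the NA values?
Here is my code so far: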
    library(tidyverse)

    RD <- read.csv("Raw Soft Drinks Sales Data.csv")
    U <- read.csv("UPC Soft Drinks.csv") %>%
      mutate(UPC = as.factor(UPC),
             BRAND = as.factor(BRAND),
             CLASS = as.factor(CLASS))

    RDX <- RD %>%
      filter(UPC != "Total") %>%
      select(-c(Total.Q1, Total.Q2, Total.Q3, Total.Q4))

    RDXL <- RDX %>%
      pivot_longer(
        cols = starts_with("Week"),
        # cols = X1:X52,
        # cols = !c("STORE","UPC"),
        names_to = "WEEK",
        names_prefix = "Week",
        values_to = "UNITS",
        values_drop_na = TRUE)

    RDW <- pivot_wider(RDXL, names_from = "ITEM", values_from = "UNITS") %>%
      select(-TOTAL)
This is what the original dataset looks like:
original data
I need Store, UPC, Dollars, Units, Feat, Deal and Week each as their own column.
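I think you probably want to get all of your data into wide format.
If each row of your current data holds a value in only one of the data columns (DOLLARS:DEAL) for a given week, with all the other data columns NA, you may want to merge those rows like this:
Minimal example data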
    library(tibble)

    # Assign the example to df so the solution code can refer to it
    df <- tibble(store = c(1, 1, 1), UPC = rep(1200000044, 3), week = c(1, 1, 1),
                 DOLLARS = c(10.1, NA, NA), UNITS = c(NA, 30, NA),
                 FEAT = c(NA, NA, 'unknown'))
    df
    # A tibble: 3 x 6
      store        UPC  week DOLLARS UNITS FEAT
      <dbl>      <dbl> <dbl>   <dbl> <dbl> <chr>
    1     1 1200000044     1    10.1    NA NA
    2     1 1200000044     1      NA    30 NA
    3     1 1200000044     1      NA    NA unknown
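A solution
You can `group_by` the ID columns (UPC, week) and then `sort` each data column with `na.last = TRUE`. Sorting pushes the NAs to the bottom of every column, so the values move up into the first row of each group, and the rows that are left entirely NA can then be filtered out: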
    library(dplyr)

    df %>%
      group_by(UPC, week) %>%
      # within each group, move the non-NA values of every column to the top
      mutate(across(DOLLARS:FEAT, ~ sort(.x, na.last = TRUE))) %>%
      # drop the rows in which every data column is now NA
      filter(!if_all(DOLLARS:FEAT, is.na))
    # A tibble: 1 x 6
    # Groups:   UPC, week [1]
      store        UPC  week DOLLARS UNITS FEAT
      <dbl>      <dbl> <dbl>   <dbl> <dbl> <chr>
    1     1 1200000044     1    10.1    30 unknown
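As a variation on the approach above, the rows can also be collapsed directly with `summarise`, which avoids the separate filter step. This is only a sketch under the assumption that each group has at most one non-NA value per column. The helper `first_non_na` is a hypothetical name introduced here, and adding `store` to the grouping is an assumption based on the question, which says the rows share UPC, Store and Week.

```r
library(dplyr)
library(tibble)

# Same minimal example data as above
df <- tibble(
  store   = c(1, 1, 1),
  UPC     = rep(1200000044, 3),
  week    = c(1, 1, 1),
  DOLLARS = c(10.1, NA, NA),
  UNITS   = c(NA, 30, NA),
  FEAT    = c(NA, NA, "unknown")
)

# Hypothetical helper: first non-NA value of x, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

collapsed <- df %>%
  group_by(store, UPC, week) %>%                      # assumed ID columns
  summarise(across(DOLLARS:FEAT, first_non_na), .groups = "drop")

collapsed
```

Note that `sort` reorders the values of each column independently, so if a group could hold several non-NA values in one column, either approach would need a closer look at which values belong together.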
Minimal representation of the original data
    df2 <- tibble(store = c(1, 1), UPC = rep(1200000044, 2),
                  ITEM = c('DOLLARS', 'UNITS'),
                  Week.1 = c(58.4, 29), Week.2 = c(118.8, 55))
    df2
    # A tibble: 2 x 5
      store        UPC ITEM    Week.1 Week.2
      <dbl>      <dbl> <chr>    <dbl>  <dbl>
    1     1 1200000044 DOLLARS   58.4   119.
    2     1 1200000044 UNITS     29      55
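Solution for the original data
Starting from the original data, you can chain `pivot_longer` and `pivot_wider`: first stack the week columns into rows, then spread the ITEM values back out into one column each: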
    library(tidyr)

    df2 %>%
      pivot_longer(cols = starts_with('Week'), names_to = 'week', values_to = 'value') %>%
      pivot_wider(names_from = ITEM, values_from = value)
    # A tibble: 2 x 5
      store        UPC week   DOLLARS UNITS
      <dbl>      <dbl> <chr>    <dbl> <dbl>
    1     1 1200000044 Week.1    58.4    29
    2     1 1200000044 Week.2   119.     55
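If the week should end up as a number rather than a label such as `Week.1`, the same pipeline can strip the prefix while reshaping. The sketch below assumes the week columns are named `Week.1`, `Week.2`, and so on, as in the minimal data. The real file may use a different pattern, in which case the `names_prefix` regular expression would need adjusting.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same minimal representation of the original data as above
df2 <- tibble(
  store  = c(1, 1),
  UPC    = rep(1200000044, 2),
  ITEM   = c("DOLLARS", "UNITS"),
  Week.1 = c(58.4, 29),
  Week.2 = c(118.8, 55)
)

tidy <- df2 %>%
  pivot_longer(
    cols = starts_with("Week"),
    names_to = "week",
    names_prefix = "Week\\.",                  # regex: drop the literal "Week." prefix
    names_transform = list(week = as.integer), # turn "1", "2" into integers
    values_to = "value"
  ) %>%
  pivot_wider(names_from = ITEM, values_from = value)

tidy
```

Here `week` comes out as an integer column, which makes it easier to sort or join on later.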
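We could also reorder each column so that its non-NA values come first, and then keep only the rows in which every data column has a value:
    library(dplyr)

    df %>%
      group_by(UPC, week) %>%
      # order(is.na(.x)) puts the non-NA positions first without sorting the values
      mutate(across(DOLLARS:FEAT, ~ .x[order(is.na(.x))])) %>%
      # keep only the rows where no data column is NA
      filter(if_all(DOLLARS:FEAT, Negate(is.na)))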
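The two answers filter differently, which matters when a group has no value at all for one of the columns. The sketch below uses made-up data (`df_partial` is a hypothetical example, not taken from the question) in which FEAT is missing for the whole group, and applies both filters to the same reordered data.

```r
library(dplyr)
library(tibble)

# Hypothetical group in which FEAT was never recorded
df_partial <- tibble(
  store   = c(1, 1),
  UPC     = rep(1200000044, 2),
  week    = c(2, 2),
  DOLLARS = c(20.5, NA),
  UNITS   = c(NA, 12),
  FEAT    = c(NA_character_, NA_character_)
)

# Move the non-NA values to the top of each column within the group
moved_up <- df_partial %>%
  group_by(UPC, week) %>%
  mutate(across(DOLLARS:FEAT, ~ .x[order(is.na(.x))])) %>%
  ungroup()

# Filter from the first answer: drop rows where every data column is NA
keep_any <- moved_up %>% filter(!if_all(DOLLARS:FEAT, is.na))

# Filter from the second answer: keep rows where no data column is NA
keep_complete <- moved_up %>% filter(if_all(DOLLARS:FEAT, ~ !is.na(.x)))

nrow(keep_any)       # 1: the row with DOLLARS and UNITS survives, FEAT stays NA
nrow(keep_complete)  # 0: the group is dropped because FEAT is never filled
```

Which behaviour is preferable depends on whether incomplete store/UPC/week combinations should be kept in the result.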