在单独的行 R 之后清除组中除第一行之外的变量
Clear the variables except first row in a group after separate row R
我必须完成 3 件事:
- 由 Col 拆分的单独行:"Date" 包含不同的年份。
- EXCEPT Col :"Fix" , 只保留 FIRST row within a group 有数据。
计算 Col : "Price" 偶数除以拆分的行数。
# Dataset call df
Name Fix Price Date
Tom 600 500 2016-01-01
John 800 400 2016-01-03;2016-01-09
Mary 1100 1000 2016-01-04;2017-09-01;2017-08-10;2018-01-01
Joe 30 25 2016-01-04;2017-09-01
Paul 400 304 2015-01-02
Alex 505 238 2018-01-02;2018-02-02
# Targeted df
Name Fix Price Date
Tom 600 500 2016-01-01
John 800 400 2016-01-03;2016-01-09
Mary 1100 250 2016-01-04
Mary 0 250 2017-09-01
Mary 0 250 2017-08-10
Mary 0 250 2018-01-01
Joe 30 12.5 2016-01-04
Joe 0 12.5 2017-09-01
Paul 400 304 2015-01-02
Alex 505 238 2018-01-02;2018-02-02
我有实现 1 和 3 的方法,但我无法制作 2,因为我不知道 fill() 的任何 Diverse 函数。
# Find the SPLIT TARGET first :
inds <- sapply(strsplit(df$Date, ";"), function(x)
length(unique(format(as.Date(x), "%Y"))) > 1) ### This approach actually
does not works in my actual data when over 1 Million rows , i chunk it into a smaller data
to fit this approach's limit.
library(tidyverse)
library(stringr)
#Select those indices
df[inds, ] %>% mutate(Price = Price / (str_count(Date,";") + 1)) %>%
separate_rows(Date, sep = ";") %>%
bind_rows(df[!inds,])
*请提醒您不能使用 Col : " Name " 作为某些内容,因为它们仅表示数据集具有其他必须由 separate_rows
复制的值
我怎样才能清除第 2 点?这让我发疯。感谢您提前提出任何意见
dt = read.table(text = "
Name Fix Price Date
Tom 600 500 2016-01-01
John 800 400 2016-01-03;2016-01-09
Mary 1100 1000 2016-01-04;2017-09-01;2017-08-10;2018-01-01
Joe 30 25 2016-01-04;2017-09-01
Paul 400 304 2015-01-02
Alex 505 238 2018-01-02;2018-02-02
", header=T, stringsAsFactors=F)
library(tidyverse)
library(lubridate)
dt %>%
separate_rows(Date, sep=";") %>% # separate dates
group_by(Name, year = year(ymd(Date))) %>% # for each Name and year of the date
summarise(Fix = unique(Fix), # keep Fix
Price = unique(Price), # keep Price
Date = paste0(Date, collapse = ";")) %>% # combine dates with same year
mutate(Fix = ifelse(row_number() > 1, 0, Fix), # update Fix values
Price = Price/length(Price)) %>% # divide Price by number of rows
ungroup() # forget the grouping
# # A tibble: 9 x 5
# Name year Fix Price Date
# <chr> <dbl> <dbl> <dbl> <chr>
# 1 Alex 2018 505 238 2018-01-02;2018-02-02
# 2 Joe 2016 30 12.5 2016-01-04
# 3 Joe 2017 0 12.5 2017-09-01
# 4 John 2016 800 400 2016-01-03;2016-01-09
# 5 Mary 2016 1100 333. 2016-01-04
# 6 Mary 2017 0 333. 2017-09-01;2017-08-10
# 7 Mary 2018 0 333. 2018-01-01
# 8 Paul 2015 400 304 2015-01-02
# 9 Tom 2016 600 500 2016-01-01
我必须完成 3 件事:
- 由 Col 拆分的单独行:"Date" 包含不同的年份。
- EXCEPT Col :"Fix" , 只保留 FIRST row within a group 有数据。
计算 Col : "Price" 偶数除以拆分的行数。
# Dataset call df Name Fix Price Date Tom 600 500 2016-01-01 John 800 400 2016-01-03;2016-01-09 Mary 1100 1000 2016-01-04;2017-09-01;2017-08-10;2018-01-01 Joe 30 25 2016-01-04;2017-09-01 Paul 400 304 2015-01-02 Alex 505 238 2018-01-02;2018-02-02 # Targeted df Name Fix Price Date Tom 600 500 2016-01-01 John 800 400 2016-01-03;2016-01-09 Mary 1100 250 2016-01-04 Mary 0 250 2017-09-01 Mary 0 250 2017-08-10 Mary 0 250 2018-01-01 Joe 30 12.5 2016-01-04 Joe 0 12.5 2017-09-01 Paul 400 304 2015-01-02 Alex 505 238 2018-01-02;2018-02-02
我有实现 1 和 3 的方法,但我无法制作 2,因为我不知道 fill() 的任何 Diverse 函数。
# Find the SPLIT TARGET first :
inds <- sapply(strsplit(df$Date, ";"), function(x)
length(unique(format(as.Date(x), "%Y"))) > 1) ### This approach actually
does not works in my actual data when over 1 Million rows , i chunk it into a smaller data
to fit this approach's limit.
library(tidyverse)
library(stringr)
#Select those indices
df[inds, ] %>% mutate(Price = Price / (str_count(Date,";") + 1)) %>%
separate_rows(Date, sep = ";") %>%
bind_rows(df[!inds,])
*请提醒您不能使用 Col : " Name " 作为某些内容,因为它们仅表示数据集具有其他必须由 separate_rows
复制的值
我怎样才能清除第 2 点?这让我发疯。感谢您提前提出任何意见
dt = read.table(text = "
Name Fix Price Date
Tom 600 500 2016-01-01
John 800 400 2016-01-03;2016-01-09
Mary 1100 1000 2016-01-04;2017-09-01;2017-08-10;2018-01-01
Joe 30 25 2016-01-04;2017-09-01
Paul 400 304 2015-01-02
Alex 505 238 2018-01-02;2018-02-02
", header=T, stringsAsFactors=F)
library(tidyverse)
library(lubridate)
dt %>%
separate_rows(Date, sep=";") %>% # separate dates
group_by(Name, year = year(ymd(Date))) %>% # for each Name and year of the date
summarise(Fix = unique(Fix), # keep Fix
Price = unique(Price), # keep Price
Date = paste0(Date, collapse = ";")) %>% # combine dates with same year
mutate(Fix = ifelse(row_number() > 1, 0, Fix), # update Fix values
Price = Price/length(Price)) %>% # divide Price by number of rows
ungroup() # forget the grouping
# # A tibble: 9 x 5
# Name year Fix Price Date
# <chr> <dbl> <dbl> <dbl> <chr>
# 1 Alex 2018 505 238 2018-01-02;2018-02-02
# 2 Joe 2016 30 12.5 2016-01-04
# 3 Joe 2017 0 12.5 2017-09-01
# 4 John 2016 800 400 2016-01-03;2016-01-09
# 5 Mary 2016 1100 333. 2016-01-04
# 6 Mary 2017 0 333. 2017-09-01;2017-08-10
# 7 Mary 2018 0 333. 2018-01-01
# 8 Paul 2015 400 304 2015-01-02
# 9 Tom 2016 600 500 2016-01-01