Duplicate rows based on some criteria in SQL or R
I generated a toy dataset in R:
data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"), c1 = c(1,2,3,0,5,0), c2 = c(0, 3, 5, 0,4,0), c3 = c(0, 0,1,0,0,3), c4=c(0,0,0,1,0,0))
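It looks like this:
    name c1 c2 c3 c4
1    Tom  1  0  0  0
2  Shane  2  3  0  0
3 Daniel  3  5  1  0
4  Akira  0  0  0  1
5   Jack  5  4  0  0
6    Zoe  0  0  3  0
I only care about columns c1, c2, c3 and c4. If a particular row has more than one value greater than 0, we need to duplicate the row so that each copy contains only one value greater than 0, and then delete the original row.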
For example, the second row has two values greater than 0 (c1: 2, c2: 3), so we have to duplicate that row into two rows that look like this:
Shane 2 0 0 0
Shane 0 3 0 0
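The rule above comes down to counting, for each row, how many of c1 to c4 are greater than 0. Any row with a count above 1 has to be split, and the count is the number of copies it needs. Here is a minimal base-R sketch of that check on the toy data. The names `n_pos` and `needs_split` are only illustrative:

```r
# The toy data from the question
df <- data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
                 c1 = c(1, 2, 3, 0, 5, 0),
                 c2 = c(0, 3, 5, 0, 4, 0),
                 c3 = c(0, 0, 1, 0, 0, 3),
                 c4 = c(0, 0, 0, 1, 0, 0))

# How many of c1..c4 are greater than 0 in each row
n_pos <- rowSums(df[c("c1", "c2", "c3", "c4")] > 0)

# Rows with more than one positive value are the ones to duplicate
needs_split <- df$name[n_pos > 1]
print(needs_split)  # Shane, Daniel and Jack
```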
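I am trying to build a SQL query to capture this. However, I'm not sure whether any SQL function can detect multiple non-zero values in a given row without looking at the result first. In any case, if such a magic SQL function exists, the final result should have one row per value greater than 0, with every other column set to 0 (as in the Shane example above).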
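I am also considering doing this in R. The only way I know to duplicate rows in R is the do.call() function combined with rbind(). However, that doesn't work for my case. Could you give me any hints? Thanks a lot :)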
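do.call() and rbind() can in fact do this if each input row is first turned into its own small data frame, with one row per positive value. Here is a minimal sketch. It assumes the value columns are exactly c1 to c4, and `split_row` is a hypothetical helper name:

```r
df <- data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
                 c1 = c(1, 2, 3, 0, 5, 0),
                 c2 = c(0, 3, 5, 0, 4, 0),
                 c3 = c(0, 0, 1, 0, 0, 3),
                 c4 = c(0, 0, 0, 1, 0, 0))
value_cols <- c("c1", "c2", "c3", "c4")

# Turn input row i into one output row per value greater than 0
split_row <- function(i) {
  vals <- unlist(df[i, value_cols])
  pos <- which(vals > 0)
  out <- matrix(0, nrow = length(pos), ncol = length(value_cols),
                dimnames = list(NULL, value_cols))
  # Copy each positive value into its own row; everything else stays 0
  out[cbind(seq_along(pos), pos)] <- vals[pos]
  data.frame(name = rep(df$name[i], length(pos)), out)
}

# Build the pieces row by row, then stack them with rbind()
result <- do.call(rbind, lapply(seq_len(nrow(df)), split_row))
print(result)
```

The rows come out grouped by the original row order: Tom, then both Shane rows, then the three Daniel rows, and so on.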
You can do this with a few tidyverse functions. First, we enter your sample data:
library(tidyverse)
dd <- tribble(~name, ~c1, ~c2, ~c3, ~c4,
"Tom", 1, 0, 0, 0,
"Shane", 2, 3, 0, 0,
"Daniel", 3, 5, 1, 0,
"Akira", 0, 0, 0 ,1,
"Jack", 5, 4, 0, 0,
"Zoe", 0, 0, 3, 0)
Then we gather, filter and spread to get the rows you want. Adding a row ID keeps the different values on separate rows.
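dd %>%
  gather("var", "val", -name) %>%  # long format: one row per name/column pair
  rowid_to_column() %>%            # unique id so each value stays on its own row
  filter(val>0) %>%                # keep only values greater than 0
  spread(var, val, fill=0) %>%     # back to wide, filling the other columns with 0
  select(-rowid)                   # drop the helper id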
# A tibble: 10 x 5
# name c1 c2 c3 c4
# * <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Tom 1 0 0 0
# 2 Shane 2 0 0 0
# 3 Daniel 3 0 0 0
# 4 Jack 5 0 0 0
# 5 Shane 0 3 0 0
# 6 Daniel 0 5 0 0
# 7 Jack 0 4 0 0
# 8 Daniel 0 0 1 0
# 9 Zoe 0 0 3 0
# 10 Akira 0 0 0 1
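The same gather, filter and spread idea can be sketched in base R for readers without the tidyverse. This is an illustrative equivalent, not the answer's code. `stack()` plays the role of gather(), and a logical subset plays filter(). Writing each kept value into a zero matrix at (its own row, its column) plays spread(fill = 0), with the running row index doing the job of rowid:

```r
dd <- data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
                 c1 = c(1, 2, 3, 0, 5, 0),
                 c2 = c(0, 3, 5, 0, 4, 0),
                 c3 = c(0, 0, 1, 0, 0, 3),
                 c4 = c(0, 0, 0, 1, 0, 0))
value_cols <- c("c1", "c2", "c3", "c4")

# gather(): stack() goes column by column, so repeat the names once per column
long <- data.frame(name = rep(dd$name, times = length(value_cols)),
                   stack(dd[value_cols]))

# filter(): keep only the values greater than 0
long <- long[long$values > 0, ]

# spread(..., fill = 0): each kept value gets its own row in a zero matrix
wide <- matrix(0, nrow = nrow(long), ncol = length(value_cols),
               dimnames = list(NULL, value_cols))
wide[cbind(seq_len(nrow(long)), match(long$ind, value_cols))] <- long$values
result <- data.frame(name = long$name, wide)
print(result)
```

Because stack() walks the columns in order, the rows appear in the same order as the tibble above: all c1 values first, then c2, c3 and c4.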
Perhaps another option, using CROSS APPLY (SQL Server syntax).
Example
Select A.Name
,B.*
From YourTable A
Cross Apply ( values (C1,0,0,0)
,(0,C2,0,0)
,(0,0,C3,0)
,(0,0,0,C4)
) B (C1,C2,C3,C4)
Where B.C1+B.C2+B.C3+B.C4<>0
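The CROSS APPLY (VALUES ...) above expands every row into four candidate rows, each keeping one of C1 to C4. The WHERE clause then drops candidates whose sum is 0. Here is a rough R analogue of the same idea, offered as an illustration rather than part of the SQL answer: repeat each row four times and multiply it by a stacked identity matrix. Like the SQL, it keeps any non-zero value, so negative values would survive too:

```r
df <- data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
                 c1 = c(1, 2, 3, 0, 5, 0),
                 c2 = c(0, 3, 5, 0, 4, 0),
                 c3 = c(0, 0, 1, 0, 0, 3),
                 c4 = c(0, 0, 0, 1, 0, 0))

vals <- as.matrix(df[c("c1", "c2", "c3", "c4")])
n <- nrow(vals)

# Like CROSS APPLY (VALUES ...): four candidate rows per input row, where
# candidate k keeps only column k (the repeated diag(4) is the mask)
mask <- do.call(rbind, rep(list(diag(4)), n))
cand <- vals[rep(seq_len(n), each = 4), ] * mask

# Like WHERE C1+C2+C3+C4 <> 0: drop the candidates whose sum is 0
keep <- rowSums(cand) != 0
result <- data.frame(name = rep(df$name, each = 4)[keep], cand[keep, ],
                     row.names = NULL)
print(result)
```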
Another option using UNION ALL.
select name,c1,0 as c2,0 as c3,0 as c4 from tbl where c1>0
union all
select name,0,c2,0,0 from tbl where c2>0
union all
select name,0,0,c3,0 from tbl where c3>0
union all
select name,0,0,0,c4 from tbl where c4>0
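Each SELECT in the UNION ALL handles one column: it keeps the rows where that column is positive and sets the other three to 0. The same column-by-column structure in R might look like the sketch below. It is an illustrative translation, and `one_column` is a hypothetical helper:

```r
df <- data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
                 c1 = c(1, 2, 3, 0, 5, 0),
                 c2 = c(0, 3, 5, 0, 4, 0),
                 c3 = c(0, 0, 1, 0, 0, 3),
                 c4 = c(0, 0, 0, 1, 0, 0))
value_cols <- c("c1", "c2", "c3", "c4")

# One "SELECT ... WHERE col > 0" per value column
one_column <- function(col) {
  part <- df[df[[col]] > 0, c("name", value_cols)]
  others <- setdiff(value_cols, col)
  # Zero every value column except the current one (works for 0 rows too)
  part[others] <- part[others] * 0
  part
}

# UNION ALL: stack the four pieces and reset the row names
result <- do.call(rbind, lapply(value_cols, one_column))
rownames(result) <- NULL
print(result)
```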
df1 = data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
c1 = c(1,2,3,0,5,0),
c2 = c(0, 3, 5, 0,4,0),
c3 = c(0, 0,1,0,0,3),
c4=c(0,0,0,1,0,0))
# Count the values greater than 0 in c1..c4 (rowSums on the numeric columns;
# apply() would first coerce the whole data frame to a character matrix)
n_pos = rowSums(df1[-1] > 0)
# Repeat each row once per positive value (rows with none are dropped)
df2 = df1[rep(1:NROW(df1), n_pos),]
# Column (within c1..c4) that each repeated row should keep
keep = unlist(lapply(1:NROW(df1), function(i) which(df1[i, -1] > 0)))
idx = cbind(1:NROW(df2), keep)
# Zero all value columns, then put back the single kept value per row
df3 = df2
df3[-1] = 0
df3[-1][idx] = df2[-1][idx]
df3
# name c1 c2 c3 c4
#1 Tom 1 0 0 0
#2 Shane 2 0 0 0
#2.1 Shane 0 3 0 0
#3 Daniel 3 0 0 0
#3.1 Daniel 0 5 0 0
#3.2 Daniel 0 0 1 0
#4 Akira 0 0 0 1
#5 Jack 5 0 0 0
#5.1 Jack 0 4 0 0
#6 Zoe 0 0 3 0
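Consider base R with by(), which builds a zero-filled data frame for each distinct name and then row-binds all of the data frames at the end, similar to a SQL UNION: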
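# df1 is the sample data frame from the question (df alone is R's F density function)
df_list <- by(df1, df1$name, FUN = function(d){
  # Four candidate rows per name, each holding one of c1..c4
  # (max() collapses the group, assuming one input row per name)
  tmp <- data.frame(name = d$name[1],
                    c1 = c(max(d$c1), rep(0, 3)),
                    c2 = c(0, max(d$c2), rep(0, 2)),
                    c3 = c(rep(0, 2), max(d$c3), 0),
                    c4 = c(rep(0, 3), max(d$c4)))
  # Drop the candidate rows that are all zero
  tmp <- tmp[rowSums(tmp[-1])!=0,]
  row.names(tmp) <- NULL
  tmp
})
# Stack the per-name pieces; by() returns them sorted by name
final_df <- do.call(rbind, unname(df_list))
final_df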