Pull out all transactions after bad credit is found

Question

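Below is a small dataset of transaction records with an ID, the day of the month (DATE), and a dummy variable Bad_CREDIT. For each ID, I want to pull out all transactions from the first bad-credit transaction onward. The OUTPUT column shows the correct result, i.e. rows 1, 2, 3, 5, 6, 8 and 10.

This is just an example; the real data may have thousands of rows. SQL, R or SPSS would all be fine. Thanks.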

DATE	ID	Bad_CREDIT	OUTPUT
12	A	1	1
15	A	1	1
18	A	0	1
2	B	0	0
10	B	1	1
20	B	0	1
5	C	0	0
15	C	1	1
1	D	0	0
9	E	1	1

Answer 1

If I understand correctly, you can use a window function:

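-- For each id, find the earliest date with bad_credit = 1,
-- then keep only the rows dated on or after it.
select t.*
from (select t.*,
             min(case when bad_credit = 1 then date end) over (partition by id) as min_bd_date
      from t
     ) t
where date >= min_bd_date;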

You can also do this with a correlated subquery:

select t.*
from t
where t.date >= (select min(t2.date)
                 from t t2
                 where t2.id = t.id and
                       t2.bad_credit = 1
                );
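To try the correlated subquery without a database server, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite table via Python's sqlite3 module. The function name `pull_out` and the constant `ROWS` are made up for this illustration, and SQLite is only an assumption; the question does not say which database is in use.

```python
import sqlite3

# Sample rows from the question: (date, id, bad_credit)
ROWS = [
    (12, "A", 1), (15, "A", 1), (18, "A", 0),
    (2, "B", 0), (10, "B", 1), (20, "B", 0),
    (5, "C", 0), (15, "C", 1),
    (1, "D", 0),
    (9, "E", 1),
]


def pull_out(rows):
    """Return the rows dated on or after each id's first bad-credit row."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (date integer, id text, bad_credit integer)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    query = """
        select t.date, t.id, t.bad_credit
        from t
        where t.date >= (select min(t2.date)
                         from t t2
                         where t2.id = t.id and
                               t2.bad_credit = 1)
        order by t.id, t.date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pull_out(ROWS):
        print(row)
```

An ID with no bad-credit row (D in the sample) drops out because the subquery returns NULL for it, and a comparison with NULL is never true.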

Answer 2

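If the data lives in a database, I think SQL is probably the better place to solve this. But if you already have it in R, then ...

Here is an R approach using dplyr (it relies on the rows already being in date order within each ID, as they are in the sample):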

library(dplyr)
dat %>%
  group_by(ID) %>%
  mutate(OUTPUT2 = +cumany(Bad_CREDIT)) %>%
  ungroup()
# # A tibble: 10 x 5
#     DATE ID    Bad_CREDIT OUTPUT OUTPUT2
#    <int> <chr>      <int>  <int>   <int>
#  1    12 A              1      1       1
#  2    15 A              1      1       1
#  3    18 A              0      1       1
#  4     2 B              0      0       0
#  5    10 B              1      1       1
#  6    20 B              0      1       1
#  7     5 C              0      0       0
#  8    15 C              1      1       1
#  9     1 D              0      0       0
# 10     9 E              1      1       1

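Because this is really just a simple grouped operation, base R and data.table versions are equally simple. Note that cumany still comes from dplyr here; for 0/1 values, cummax is a base R equivalent.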

+ave(dat$Bad_CREDIT, dat$ID, FUN=cumany)
#  [1] 1 1 1 0 1 1 0 1 0 1

library(data.table)
datDT <- as.data.table(dat)
datDT[, OUTPUT2 := +cumany(Bad_CREDIT), by = .(ID)]
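For readers who don't use R, the grouped "cumulative any" idea behind these snippets can be sketched in plain Python. This is an illustration of the logic, not a translation of dplyr's internals; `flag_after_bad_credit` and `ROWS` are hypothetical names. Unlike the R snippets above, this sketch sorts by ID and date itself.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Sample rows from the question: (date, id, bad_credit)
ROWS = [
    (12, "A", 1), (15, "A", 1), (18, "A", 0),
    (2, "B", 0), (10, "B", 1), (20, "B", 0),
    (5, "C", 0), (15, "C", 1),
    (1, "D", 0),
    (9, "E", 1),
]


def flag_after_bad_credit(rows):
    """Return (date, id, bad_credit, output) tuples sorted by id, then date."""
    ordered = sorted(rows, key=itemgetter(1, 0))  # sort by id, then date
    flagged = []
    for _, group in groupby(ordered, key=itemgetter(1)):
        group = list(group)
        # Running maximum of a 0/1 flag: stays 1 once a bad row has been seen.
        running = accumulate((row[2] for row in group), max)
        flagged.extend(row + (out,) for row, out in zip(group, running))
    return flagged


if __name__ == "__main__":
    for row in flag_after_bad_credit(ROWS):
        print(row)
```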

Answer 3

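You can arrange the data by ID and DATE, then assign 0 to the first row of each ID when its first Bad_CREDIT value is 0. Note that this only zeroes out the first row, so it matches the sample but not an ID with more than one clean row before its first bad-credit row; a cumulative check such as cummax(Bad_CREDIT) covers that case.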

library(dplyr)

df %>%
  arrange(ID, DATE) %>%
  group_by(ID) %>%
  mutate(OUTPUT = as.integer(!(first(Bad_CREDIT) == 0 & row_number() == 1)))

#    DATE ID    Bad_CREDIT OUTPUT
#   <int> <chr>      <int>  <int>
# 1    12 A              1      1
# 2    15 A              1      1
# 3    18 A              0      1
# 4     2 B              0      0
# 5    10 B              1      1
# 6    20 B              0      1
# 7     5 C              0      0
# 8    15 C              1      1
# 9     1 D              0      0
#10     9 E              1      1
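One thing worth checking with a first-row rule like this is an ID that has more than one clean row before its first bad-credit row. The sketch below, in plain Python rather than R, compares the first-row rule with a cumulative rule on one ID from the question and on one made-up ID. The function names and the `hypothetical` input are assumptions for illustration only.

```python
def first_row_rule(flags):
    """Zero only the first row, and only when that row is clean (0)."""
    return [int(not (flags[0] == 0 and pos == 0)) for pos in range(len(flags))]


def cumulative_rule(flags):
    """Flag a row once any row up to and including it is bad (1)."""
    seen_bad = 0
    out = []
    for flag in flags:
        seen_bad = max(seen_bad, flag)
        out.append(seen_bad)
    return out


sample_b = [0, 1, 0]         # Bad_CREDIT values of ID B in the question
hypothetical = [0, 0, 1, 0]  # made-up ID with two clean rows first

if __name__ == "__main__":
    for flags in (sample_b, hypothetical):
        print(flags, first_row_rule(flags), cumulative_rule(flags))
```

The two rules agree on `sample_b` but differ on the second row of `hypothetical`, which the first-row rule flags even though no bad credit has occurred yet.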

Data

df <- structure(list(DATE = c(12L, 15L, 18L, 2L, 10L, 20L, 5L, 15L, 
1L, 9L), ID = c("A", "A", "A", "B", "B", "B", "C", "C", "D", 
"E"), Bad_CREDIT = c(1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L)), 
row.names = c(NA, -10L), class = "data.frame")

Answer 4

You can use EXISTS as follows:

select t.* from your_table t
where exists
     (select 1
        from your_table tt
       where t.id = tt.id
         and t.date >= tt.date 
         and tt.bad_credit = 1);
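If a 0/1 flag like the question's OUTPUT column is wanted instead of a filtered result, the same EXISTS test can go in the select list. The sketch below assumes SQLite, where EXISTS evaluates to the integer 0 or 1, and runs it through Python's sqlite3 module. `flag_rows` and `ROWS` are hypothetical names, and other databases may need a CASE expression around EXISTS.

```python
import sqlite3

# Sample rows from the question: (date, id, bad_credit)
ROWS = [
    (12, "A", 1), (15, "A", 1), (18, "A", 0),
    (2, "B", 0), (10, "B", 1), (20, "B", 0),
    (5, "C", 0), (15, "C", 1),
    (1, "D", 0),
    (9, "E", 1),
]


def flag_rows(rows):
    """Return (date, id, bad_credit, output) with output = 1 from the first bad row on."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table your_table (date integer, id text, bad_credit integer)"
    )
    con.executemany("insert into your_table values (?, ?, ?)", rows)
    query = """
        select t.date, t.id, t.bad_credit,
               exists (select 1
                         from your_table tt
                        where t.id = tt.id
                          and t.date >= tt.date
                          and tt.bad_credit = 1) as output
          from your_table t
         order by t.id, t.date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in flag_rows(ROWS):
        print(row)
```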

Answer 5

This one is for SPSS:

sort cases by ID date.
compute PullOut=Bad_CREDIT.
if $casenum>1 and ID=lag(ID) and lag(PullOut)=1 PullOut=1.
exe.
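As I read the syntax above, it sorts the cases, starts PullOut at Bad_CREDIT, and then carries a 1 forward from the previous case whenever that case belongs to the same ID. A rough Python mirror of that single pass, with the hypothetical names `pull_out_flags` and `ROWS`, might look like this:

```python
# Sample rows from the question: (date, id, bad_credit)
ROWS = [
    (12, "A", 1), (15, "A", 1), (18, "A", 0),
    (2, "B", 0), (10, "B", 1), (20, "B", 0),
    (5, "C", 0), (15, "C", 1),
    (1, "D", 0),
    (9, "E", 1),
]


def pull_out_flags(rows):
    """Single pass over rows sorted by id and date, carrying the flag forward."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # sort cases by ID date
    result = []
    prev_id, prev_flag = None, 0
    for date, id_, bad in ordered:
        flag = bad                               # compute PullOut=Bad_CREDIT
        if id_ == prev_id and prev_flag == 1:    # same ID and previous PullOut is 1
            flag = 1
        result.append((date, id_, bad, flag))
        prev_id, prev_flag = id_, flag
    return result


if __name__ == "__main__":
    for row in pull_out_flags(ROWS):
        print(row)
```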
