Count based on multiple conditions using sqldf
Hi all, I'm writing a SQL query in R using sqldf and seem to have hit a roadblock. I have a table with an ID column, two date columns, and a group-by column.
AlertDate AppointmentDate ID Branch
01/01/20 04/01/20 1 W1
01/01/20 09/01/20 1 W1
08/01/20 09/01/20 1 W2
01/01/20 23/01/20 1 W1
The query I'm writing is
sqldf('select Branch,count(ID) from df where AlertDate <= AppointmentDate
and AppointmentDate <AlertDate+7 group by Branch')
The result I get from this query is
Branch Count
W1 1
W2 1
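which is correct for the query as written. What I want instead is this: when my second condition (AppointmentDate < AlertDate+7) is false, the row should not be dropped but counted in a later group based on the dates. For example, if the alert date is 01/01/20 and the appointment date is 23/01/20, the row should be counted in W4, i.e. ceil((AppointmentDate - AlertDate)/7). So the result I want in the end is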
Branch Count
W1 1
W2 2
W4 1
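The second row should be counted in W2 and the fourth row in W4 instead of being dropped. I'm trying to do this in SQL via sqldf in R, but any solution in either R or SQL works for me.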
Output of dput(test)
structure(list(AlertDate = structure(c(18262, 18262, 18269, 18262), class = "Date"), AppointmentDate = structure(c(18265, 18270,18270, 18284), class =
"Date"), ID = c(1, 1, 1, 1), Branch = c("W1","W1", "W2", "W1")), class = c("spec_tbl_df", "tbl_df", "tbl","data.frame"), row.names = c(NA, -4L), problems =
structure(list( row = 4L, col = "Branch", expected = "", actual = "embedded null",
file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L), class = c("tbl_df", "tbl", "data.frame")), spec = structure(list( cols = list(AlertDate =
structure(list(format = "%d/%m/%y"), class = c("collector_date",
"collector")), AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date", "collector")), ID = structure(list(), class = c("collector_double", "collector")), Branch = structure(list(), class =
c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1), class = "col_spec"))
Here is one approach using data.table
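df <- structure(list(AlertDate = structure(c(18262, 18262, 18269, 18262), class = "Date"), AppointmentDate = structure(c(18265, 18270,18270, 18284), class =
"Date"), ID = c(1, 1, 1, 1), Branch = c("W1","W1", "W2", "W1")), class = c("spec_tbl_df", "tbl_df", "tbl","data.frame"), row.names = c(NA, -4L), problems =
structure(list( row = 4L, col = "Branch", expected = "", actual = "embedded null",
file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L), class = c("tbl_df", "tbl", "data.frame")), spec = structure(list( cols = list(AlertDate =
structure(list(format = "%d/%m/%y"), class = c("collector_date",
"collector")), AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date", "collector")), ID = structure(list(), class = c("collector_double", "collector")), Branch = structure(list(), class =
c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1), class = "col_spec"))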
I convert it to a data.table and create a new column that implements your logic.
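library(data.table)
df <- data.table(df)
# Keep only rows where the alert does not come after the appointment
df <- df[AlertDate <= AppointmentDate]
# Gaps of 7 or more days go to week ceiling(days / 7); shorter gaps keep their Branch
df[, new_branch := ifelse(as.numeric(AppointmentDate - AlertDate) >= 7,
                          paste0("W", as.character(ceiling(as.numeric(AppointmentDate - AlertDate) / 7))),
                          Branch)]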
Here is the resulting table
AlertDate AppointmentDate ID Branch new_branch
1: 2020-01-01 2020-01-04 1 W1 W1
2: 2020-01-01 2020-01-09 1 W1 W2
3: 2020-01-08 2020-01-09 1 W2 W2
4: 2020-01-01 2020-01-23 1 W1 W4
Here is the group-by result
df[, .(.N, alert=head(AlertDate,1), appoint=head(AppointmentDate,1)), by = list(new_branch)]
new_branch N alert appoint
1: W1 1 2020-01-01 2020-01-04
2: W2 2 2020-01-01 2020-01-09
3: W4 1 2020-01-01 2020-01-23
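Since the question asked for sqldf specifically, the same re-bucketing can probably be expressed in SQL as well. What follows is only a sketch under a few assumptions: sqldf is using its default SQLite backend, the Date columns reach SQLite as day counts (so subtracting them gives a gap in days), and the sample data is rebuilt as a plain data.frame. The names `days`, `new_branch` and `res` are made up for illustration. Because `ceil()` may not be available in every SQLite build, the week number is computed with integer arithmetic, `(days + 6) / 7`, which matches `ceiling(days / 7)` for whole, non-negative day counts.

```r
library(sqldf)

# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  AlertDate       = as.Date(c("2020-01-01", "2020-01-01", "2020-01-08", "2020-01-01")),
  AppointmentDate = as.Date(c("2020-01-04", "2020-01-09", "2020-01-09", "2020-01-23")),
  ID              = c(1, 1, 1, 1),
  Branch          = c("W1", "W1", "W2", "W1"),
  stringsAsFactors = FALSE
)

res <- sqldf("
  select new_branch as Branch, count(ID) as Count
  from (
    select ID,
           -- gaps of 7+ days move to week ceil(days / 7); shorter gaps keep Branch
           case when days >= 7
                then 'W' || ((days + 6) / 7)
                else Branch
           end as new_branch
    from (
      select ID, Branch,
             cast(AppointmentDate - AlertDate as integer) as days
      from df
      where AlertDate <= AppointmentDate
    )
  )
  group by new_branch
  order by new_branch
")
print(res)
```

On the sample data this should reproduce the counts from the data.table approach above (W1, W2 and W4), but it is worth checking against your real data, in particular how gaps of exactly 7 days should be labelled.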