Compare two data frames to populate the Days range in R
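I have two data frames, DF1 and DF2. I need to compare the Days column of DF1 against the Low_Range and Hi_Range columns of DF2 and get the matching Days_Range value into the result data frame.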
Items=c("Vegetables","Fruits","Grocery","Dairy Product")
Days=c(16,5,41,25)
DF1=data.frame(Items,Days)
Low_Range=c(0,8,15,22,31,61)
Hi_Range=c(7,14,21,30,60,90)
Days_Range=c("within 7 days","8 to 14 days","15 to 21 days","22 to 30 days","31 to 60 days","61 to 90 days")
DF2=data.frame(Low_Range,Hi_Range,Days_Range)
Days_Slot=c("15 to 21 days","within 7 days","31 to 60 days","22 to 30 days")
DF_Result=data.frame(Items,Days,Days_Slot)
DF_Result is the result I am after: DF1 with Days_Slot added as a new column.
Can anyone help with this?
You can use fuzzyjoin.
fuzzyjoin::fuzzy_left_join(DF1, DF2,
by = c('Days' = 'Low_Range', 'Days' = 'Hi_Range'),
match_fun = c(`>=`, `<=`))
# Items Days Low_Range Hi_Range Days_Range
#1 Vegetables 16 15 21 15 to 21 days
#2 Fruits 5 0 7 within 7 days
#3 Grocery 41 31 60 31 to 60 days
#4 Dairy Product 25 22 30 22 to 30 days
If your dataset is large, you can also try data.table.
library(data.table)
setDT(DF1)
setDT(DF2)
DF2[DF1, on = .(Low_Range <= Days, Hi_Range >= Days)]
This can be solved by updating in a non-equi join:
library(data.table)
setDT(DF1)[setDT(DF2), on = .(Days >= Low_Range, Days <= Hi_Range),
Days_Slot := Days_Range][]
Items Days Days_Slot
1: Vegetables 16 15 to 21 days
2: Fruits 5 within 7 days
3: Grocery 41 31 to 60 days
4: Dairy Product 25 22 to 30 days
Note that DF1 is updated by reference, i.e., the new column Days_Slot is appended to DF1 without copying the object.
Since the intervals are contiguous, the matching Days_Range can also be determined by a rolling join:
library(data.table)
setDT(DF1)
setDT(DF2)
DF1[, Days_Slot := DF2[DF1, on = .(Low_Range = Days), roll = TRUE]$Days_Range][]
Items Days Days_Slot
1: Vegetables 16 15 to 21 days
2: Fruits 5 within 7 days
3: Grocery 41 31 to 60 days
4: Dairy Product 25 22 to 30 days
Again, a new column Days_Slot is appended to DF1 by reference.
By the way, a backward rolling join gives the same result:
DF1[, Days_Slot := DF2[DF1, on = .(Hi_Range = Days), roll = -Inf]$Days_Range][]
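For comparison, the same "contiguous intervals" idea can be sketched in base R with findInterval(), which returns, for each value, the index of the last lower bound that does not exceed it. This is not one of the approaches given above, only a minimal illustration; it assumes Low_Range is sorted in ascending order. The two guard lines are an addition of this sketch: they set the slot to NA for values that fall below the first lower bound or above the upper bound of the slot they land in (for example a non-integer such as 7.5, or anything over 90), which a plain rolling lookup would silently assign to the preceding slot.

```r
# Base R sketch (not from the answers above): interval lookup via findInterval().
# Assumption: DF2$Low_Range is sorted in ascending order.
DF1 <- data.frame(Items = c("Vegetables", "Fruits", "Grocery", "Dairy Product"),
                  Days  = c(16, 5, 41, 25))
DF2 <- data.frame(Low_Range  = c(0, 8, 15, 22, 31, 61),
                  Hi_Range   = c(7, 14, 21, 30, 60, 90),
                  Days_Range = c("within 7 days", "8 to 14 days", "15 to 21 days",
                                 "22 to 30 days", "31 to 60 days", "61 to 90 days"))

# Index of the last Low_Range value that is <= Days (0 if Days is below all of them)
idx <- findInterval(DF1$Days, DF2$Low_Range)

# Guard 1: values below the first lower bound have no slot
idx[idx == 0] <- NA
# Guard 2: values above the upper bound of their slot have no slot either
idx[!is.na(idx) & DF1$Days > DF2$Hi_Range[idx]] <- NA

# Copy DF1 and attach the label; as.character() protects against factor columns
DF_Result <- DF1
DF_Result$Days_Slot <- as.character(DF2$Days_Range)[idx]
DF_Result
```

Unlike the data.table versions, this creates a modified copy of DF1 rather than updating it by reference.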