Reduce/filter data based on class and date occurrence

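I have a dataset of different vessels in different areas. The data output I get records each vessel's name, its type (e.g. Fishing/Cargo), the time it entered the zone, the time it left, and its duration in the zone. DOS is just the distance offshore, i.e. the zone I am looking at.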

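My problem is that fishing vessels often run transects, going in and out of the zone several times a day, so they are listed multiple times in my report output.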

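I want to consolidate the fishing-vessel data so that if a vessel with the same name (for Type: Fishing only) is recorded more than once per day, all but one of those records are removed. For simplicity, perhaps just look at the "First seen in zone date", since I think it gets more complicated when a given duration spans several days (I can come back to that idea later).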
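A minimal base-R sketch of that rule, assuming the column names used in the dummy data below and assuming that days are split at UTC midnight: for Fishing rows only, keep the earliest entry per vessel per calendar day of `First seen inside`, and pass every other row through untouched. `dedupe_fishing_daily` is a hypothetical helper name and the toy rows are invented for illustration.

```r
# Hypothetical helper: keep the earliest entry per fishing vessel per calendar
# day of `First seen inside`; rows of any other Type are returned unchanged.
# tz = "UTC" is an assumption: use the zone whose midnight should split days.
dedupe_fishing_daily <- function(df, tz = "UTC") {
  first_seen <- df$`First seen inside`
  day <- format(first_seen, "%Y-%m-%d", tz = tz)
  key <- paste(df$Name, df$Type, day)             # one key per vessel, type and day
  o <- order(first_seen)                          # earliest entry first
  repeat_visit <- logical(nrow(df))
  repeat_visit[o] <- duplicated(key[o])           # TRUE for 2nd, 3rd, ... entry that day
  remove_row <- repeat_visit & df$Type == "Fishing"  # only thin out fishing vessels
  df[!remove_row, , drop = FALSE]
}

# Invented toy data: vessel A (Fishing) enters twice on 27 April, C is Cargo
toy <- data.frame(Name = c("A", "A", "A", "C", "C"),
                  Type = c("Fishing", "Fishing", "Fishing", "Cargo", "Cargo"))
toy$`First seen inside` <- as.POSIXct(
  c("2019-04-27 10:00", "2019-04-27 08:00", "2019-04-28 09:00",
    "2019-04-27 01:00", "2019-04-27 05:00"), tz = "UTC")
dedupe_fishing_daily(toy)  # drops row 1 (A's later entry on 27 April); both Cargo rows stay
```

Ordering by time before calling `duplicated()` makes "first" mean the earliest entry of the day, whatever order the rows have in the report.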

Dummy data:

 df <- structure(list(Name = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 
 3L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 8L, 
 8L, 9L), .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I"
 ), class = "factor"), Type = structure(c(2L, 2L, 2L, 2L, 2L, 
 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
 2L, 1L, 1L, 2L), .Label = c("Cargo", "Fishing"), class = "factor"), 
 `First seen inside` = structure(c(1556385360, 1556393640, 
 1556002200, 1556260260, 1556518860, 1556136660, 1556278500, 
 1556285820, 1556391480, 1556509620, 1556319480, 1556214120, 
 1556235600, 1556325540, 1556326920, 1556329500, 1556330220, 
 1556330580, 1556330880, 1556330940, 1556332980, 1556339880, 
 1556340900, 1556344140, 1556344500, 1556345220, 1556346420, 
 1556348220, 1556348520, 1556350860, 1556351460, 1556356620, 
 1556360220, 1556365920, 1556366520, 1556367180, 1556076420, 
 1556166900, 1556154840, 1556454900, 1556291220), class = c("POSIXct", 
 "POSIXt"), tzone = ""), `Last seen inside` = structure(c(34L, 
 35L, 1L, 8L, 38L, 3L, 7L, 9L, 36L, 38L, 27L, 4L, 5L, 10L, 
 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 
 23L, 24L, 25L, 26L, 28L, 29L, 30L, 31L, 32L, 33L, 2L, 6L, 
 37L, 38L, 38L), .Label = c("4/23/2019 14:27", "4/24/2019 21:23", 
 "4/25/2019 00:00", "4/25/2019 10:47", "4/25/2019 16:59", 
 "4/25/2019 23:49", "4/26/2019 05:17", "4/26/2019 13:39", 
 "4/26/2019 15:12", "4/26/2019 17:54", "4/26/2019 18:05", 
 "4/26/2019 18:51", "4/26/2019 19:00", "4/26/2019 19:06", 
 "4/26/2019 19:08", "4/26/2019 19:13", "4/26/2019 21:24", 
 "4/26/2019 21:38", "4/26/2019 22:02", "4/26/2019 22:51", 
 "4/26/2019 22:55", "4/26/2019 23:22", "4/26/2019 23:51", 
 "4/27/2019 00:00", "4/27/2019 00:36", "4/27/2019 00:42", 
 "4/27/2019 01:17", "4/27/2019 02:06", "4/27/2019 03:11", 
 "4/27/2019 04:30", "4/27/2019 05:00", "4/27/2019 05:03", 
 "4/27/2019 05:13", "4/27/2019 10:29", "4/27/2019 12:42", 
 "4/27/2019 17:21", "4/28/2019 03:47", "4/29/2019 09:56"), class = "factor"),
 `Time in zone` = structure(c(5L, 31L, 6L, 7L, 2L, 3L, 23L, 
 30L, 26L, 4L, 32L, 27L, 9L, 8L, 22L, 28L, 22L, 22L, 1L, 24L, 
 15L, 1L, 29L, 18L, 1L, 8L, 17L, 22L, 19L, 16L, 14L, 25L, 
 13L, 31L, 16L, 1L, 12L, 10L, 21L, 11L, 20L), .Label = c("", 
 "10h 35m", "10h 49m", "13h 9m", "13m", "14h 37m", "14h 8m", 
 "15m", "19m", "1d 2h 14m", "1d 4h 21m", "1d 56m", "1h 13m", 
 "1h 15m", "1h 41m", "1m", "24m", "2m", "34m", "3d 1h 49m", 
 "3d 9h 33m", "3m", "42m", "4m", "54m", "5h 23m", "5m", "6m", 
 "7m", "8h 35m", "8m", "9h 19m"), class = "factor"), DOS = structure(c(1L, 
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0-12", class = 
 "factor")), row.names = c(NA, 
 -41L), class = "data.frame")
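In this dput, `Last seen inside` and `Time in zone` are factors of text labels, so comparing or summing them needs a conversion first. A sketch, assuming the labels always follow the `4/27/2019 10:29` (month/day/year hour:minute) and `1d 2h 14m` patterns seen above and that UTC is the right time zone; `parse_last_seen` and `parse_duration_min` are hypothetical helpers.

```r
# Hypothetical helper: factor labels like "4/27/2019 10:29" -> POSIXct
parse_last_seen <- function(x, tz = "UTC") {
  as.POSIXct(as.character(x), format = "%m/%d/%Y %H:%M", tz = tz)
}

# Hypothetical helper: labels like "1d 2h 14m" -> total minutes; "" -> NA
parse_duration_min <- function(x) {
  parts_list <- strsplit(trimws(as.character(x)), "\\s+")
  vapply(parts_list, function(parts) {
    if (anyNA(parts) || all(parts == "")) return(NA_real_)
    num  <- as.numeric(sub("[dhm]$", "", parts))  # "14m" -> 14
    unit <- sub("^[0-9]+", "", parts)             # "14m" -> "m"
    mult <- c(d = 1440, h = 60, m = 1)[unit]      # minutes per unit
    sum(num * mult)
  }, numeric(1))
}

parse_duration_min(c("1d 2h 14m", "13m", ""))  # 1574 13 NA
parse_last_seen("4/27/2019 10:29")             # 2019-04-27 10:29 in UTC, as POSIXct
```

With the dummy data loaded, `df$minutes_in_zone <- parse_duration_min(df$`Time in zone`)` would add a numeric duration column, with the empty `""` labels becoming `NA`.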

So, for example, in my dummy dataset the fishing vessel E is logged many times on the same day.

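I hope this makes some sense? I'm not sure whether this is possible with a dplyr filter, or with mutate, based on multiple occurrences on the same day. Any suggestions on how to manage this "problem" would be great... otherwise I may have to do some manual work on the dataset :(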
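In base R, the "mutate, then filter" idea can be sketched with `ave()`: annotate each row with how many times that vessel entered the zone on that day and whether the row is the day's first entry, then filter on the flag. Assumptions: grouping is by `Name` and the UTC calendar day of `First seen inside`; `flag_same_day`, `entries_that_day` and `first_that_day` are hypothetical names, and the toy vessels X and Y are invented.

```r
# Hypothetical helper: annotate rows instead of dropping them (like a mutate step)
flag_same_day <- function(df, tz = "UTC") {
  secs <- as.numeric(df$`First seen inside`)                    # seconds since epoch
  day  <- format(df$`First seen inside`, "%Y-%m-%d", tz = tz)   # calendar day (UTC assumed)
  df$entries_that_day <- ave(secs, df$Name, day, FUN = length)  # entries per vessel per day
  df$first_that_day   <- secs == ave(secs, df$Name, day, FUN = min)  # earliest that day?
  df
}

# Invented toy data: X enters twice on 27 April and once on 28 April; Y once
toy <- data.frame(Name = c("X", "X", "X", "Y"), Type = "Fishing")
toy$`First seen inside` <- as.POSIXct(
  c("2019-04-27 00:39", "2019-04-27 01:02", "2019-04-28 03:00", "2019-04-27 02:00"),
  tz = "UTC")
flagged <- flag_same_day(toy)
flagged$entries_that_day  # 2 2 1 1
flagged$first_that_day    # TRUE FALSE TRUE TRUE
subset(flagged, Type != "Fishing" | first_that_day)  # one fishing row per vessel per day
```

Keeping the flags as columns before filtering makes it easy to inspect which entries would be dropped; the final `subset()` leaves non-fishing rows alone.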

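library(dplyr)

df %>%
  filter(Type == "Fishing") %>%                          # keep fishing vessels only
  group_by(Name, DOS, as.Date(`First seen inside`)) %>%  # one group per vessel, zone and day
  summarize(last = max(as.Date(`Last seen inside`, format = "%m/%d/%Y")))  # latest exit date per group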

Something like this? Result:

# A tibble: 10 x 4
# Groups:   Name, DOS [6]
   Name  DOS   `as.Date(\`First seen inside\`)` last      
   <fct> <fct> <date>                           <date>    
 1 A     0-12  2019-04-27                       2019-04-27
 2 B     0-12  2019-04-23                       2019-04-23
 3 B     0-12  2019-04-26                       2019-04-26
 4 B     0-12  2019-04-29                       2019-04-29
 5 D     0-12  2019-04-26                       2019-04-27
 6 E     0-12  2019-04-25                       2019-04-25
 7 E     0-12  2019-04-27                       2019-04-27
 8 G     0-12  2019-04-24                       2019-04-24
 9 G     0-12  2019-04-25                       2019-04-25
10 I     0-12  2019-04-26                       2019-04-29
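Note that `summarize()` returns one row per group containing only the grouping columns and `last`, and that the `filter()` step drops the Cargo vessels entirely. The result also shows visits whose last-seen date is later than their first-seen date (D, and I from 2019-04-26 to 2019-04-29), which is the multi-day case the question sets aside. One way to handle it, sketched here under the assumptions that both timestamp columns are already POSIXct (convert the `Last seen inside` factor first) and that days are split at UTC midnight, is to expand each visit into every calendar day it touches and then keep one row per vessel per day. `vessel_days` is a hypothetical helper and the timestamps are invented.

```r
# Hypothetical helper: every calendar day each visit touches, one row per vessel per day
vessel_days <- function(name, first_seen, last_seen, tz = "UTC") {
  from <- as.integer(as.Date(first_seen, tz = tz))  # day number of entry
  to   <- as.integer(as.Date(last_seen,  tz = tz))  # day number of exit
  days <- Map(seq, from, to)                        # all days in between, inclusive
  out <- data.frame(
    Name = rep(name, lengths(days)),
    day  = as.Date(unlist(days), origin = "1970-01-01")
  )
  out <- unique(out)      # several visits on one day collapse to one row
  rownames(out) <- NULL
  out
}

# Invented visits: the first crosses midnight, the second stays within 26 April
entered <- as.POSIXct(c("2019-04-26 22:00", "2019-04-26 23:30"), tz = "UTC")
left    <- as.POSIXct(c("2019-04-27 01:00", "2019-04-26 23:50"), tz = "UTC")
vessel_days(c("X", "X"), entered, left)  # X on 2019-04-26 and 2019-04-27: two rows
```

Counting rows per vessel in this output then gives the number of distinct days present, rather than the number of zone entries.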