mutate(percentage = n / sum(n)) - not correctly calculating percentage

Question

Mutate output

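I have been using the code below to calculate the hourly percentage of each behaviour (time column: d h), but it scrambles the order of the time column and calculates the percentages incorrectly. I have attached a sample of the output and some data. Any help is much appreciated!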

S06Behav <- S06 %>%
 group_by(Time, PredictedBehaviorFull, Context)%>%
 summarise(count= n())

S06Proportions<-S06Behav %>%
 group_by(Time, PredictedBehaviorFull, Context) %>%
 summarise(n = sum(count)) %>%
 mutate(percentage = n / sum(n))

A sample of my data is at https://pastebin.com/KE0xEzk7

Thanks

Answer 1

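I think the percentages are not calculated as expected because, in your code, the numerator and the denominator are the same value, so the proportion comes out as 1.0. After summarise(), dplyr drops only the last grouping variable (Context), so the data is still grouped by Time and PredictedBehaviorFull, and sum(n) is taken within that group rather than across the whole hour.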
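To illustrate, here is a minimal sketch on made-up data. The tibble `toy` and its values are assumptions for illustration only, not the asker's data. It reproduces the grouping from the question and shows which groups mutate() actually sees.

```r
library(dplyr)

# Hypothetical toy data: two hours, one Context per hour
toy <- tibble(
  Time = c("23 19", "23 19", "23 19", "24 10"),
  PredictedBehaviorFull = c("Bait", "Bait", "No_OP", "No_OP"),
  Context = c("Present", "Present", "Present", "Absent")
)

res <- toy %>%
  group_by(Time, PredictedBehaviorFull, Context) %>%
  # "drop_last" is the default behaviour, made explicit here
  summarise(n = n(), .groups = "drop_last") %>%
  # still grouped by Time and PredictedBehaviorFull at this point
  mutate(percentage = n / sum(n))

group_vars(res)   # the grouping that mutate() worked within
res$percentage    # one row per group, so n / sum(n) is n / n
```

Each Time and PredictedBehaviorFull group holds a single row here, so every percentage is n divided by itself.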

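I'm not completely sure I understand your question, but if by "scrambles the order of the time column" you mean that the whole Time column is incorrect, then you would be better off using the lubridate package to build your Time column.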

library(dplyr)     # for %>%, mutate(), group_by(), summarise()
library(stringr)   # for str_c()
library(lubridate)

S06 %>% 
  
  # first we convert the Timestamp column into datetime format
  mutate(
    Timestamp = ymd_hms(Timestamp)
  ) %>% 
  
  # then, we can extract the components from the Timestamp
  mutate(
    date = date(Timestamp),
    hour = lubridate::hour(Timestamp), 
    timestamp_hour = ymd_h(str_c(date, ' ', hour))
  ) %>%

  {. ->> S06_a} # this saves the data as 'S06_a' to use next
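As an alternative sketch, `lubridate::floor_date()` rounds a datetime down to the start of its hour in one step, which avoids pasting the date and hour back together. The timestamps below are made up for illustration, and the `"YYYY-MM-DD HH:MM:SS"` format is an assumption. Only the column name `Timestamp` comes from the code above.

```r
library(dplyr)
library(lubridate)

# Hypothetical timestamps, assumed to be "YYYY-MM-DD HH:MM:SS" text
toy_ts <- tibble(
  Timestamp = c("2020-05-23 19:05:10",
                "2020-05-23 19:59:59",
                "2020-05-24 10:00:01")
)

toy_hours <- toy_ts %>%
  mutate(
    Timestamp = ymd_hms(Timestamp),                        # parse text to POSIXct (UTC)
    timestamp_hour = floor_date(Timestamp, unit = "hour")  # truncate to the hour
  )

toy_hours
```

Because `timestamp_hour` is a real datetime rather than text, sorting and grouping on it follow chronological order.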

Then, if I have understood correctly, you want the percentage of observations for each behaviour type within each hour.

S06_a %>% 
  
  # then, work out the total number of observations per hour, context and behaviour
  group_by(timestamp_hour, Context, PredictedBehaviorFull) %>% 
  summarise(
    behav_total = n()
  ) %>% 
  
  # calculate the total number of observations per hour
  group_by(timestamp_hour) %>% 
  mutate(
    hour_total = sum(behav_total), 
    percentage = behav_total / hour_total
  )

This produces the following output:

# A tibble: 7 x 6
# Groups:   timestamp_hour [3]
  timestamp_hour      Context PredictedBehaviorFull behav_total hour_total percentage
  <dttm>              <chr>   <chr>                       <int>      <int>      <dbl>
1 2020-05-23 19:00:00 Present Bait                         1971       2184    0.902  
2 2020-05-23 19:00:00 Present Boat                           96       2184    0.0440 
3 2020-05-23 19:00:00 Present No_OP                         117       2184    0.0536 
4 2020-05-24 10:00:00 Absent  Bait                            9       1202    0.00749
5 2020-05-24 10:00:00 Absent  No_OP                        1193       1202    0.993  
6 2020-05-24 11:00:00 Absent  Bait                            5        129    0.0388 
7 2020-05-24 11:00:00 Absent  No_OP                         124        129    0.961
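A quick sanity check for a result like this is that the shares within each hour add up to 1. The sketch below rebuilds the counts from the output table by hand (the object names `out` and `check` are mine) and recomputes the hourly totals.

```r
library(dplyr)

# Counts copied from the output table above
out <- tibble(
  timestamp_hour = c(rep("2020-05-23 19:00:00", 3),
                     rep("2020-05-24 10:00:00", 2),
                     rep("2020-05-24 11:00:00", 2)),
  PredictedBehaviorFull = c("Bait", "Boat", "No_OP", "Bait", "No_OP", "Bait", "No_OP"),
  behav_total = c(1971, 96, 117, 9, 1193, 5, 124)
)

check <- out %>%
  group_by(timestamp_hour) %>%
  summarise(
    hour_total  = sum(behav_total),                     # observations per hour
    total_share = sum(behav_total / sum(behav_total))   # should be 1 per hour
  )

check
```

If `total_share` is not 1 for some hour, the grouping used for the denominator is probably not the one intended.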
