mutate(percentage = n / sum(n)) - not calculating percentage correctly
[Image: mutate output]
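I have been using the code below to calculate the hourly percentage of each behaviour (the Time column holds day and hour, "d h"), but it scrambles the order of the Time column and calculates the percentages incorrectly. I have attached a sample of the output and some data. Any help is greatly appreciated!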
S06Behav <- S06 %>%
  group_by(Time, PredictedBehaviorFull, Context) %>%
  summarise(count = n())
S06Proportions <- S06Behav %>%
  group_by(Time, PredictedBehaviorFull, Context) %>%
  summarise(n = sum(count)) %>%
  mutate(percentage = n / sum(n))
A sample of my data is at https://pastebin.com/KE0xEzk7
Thanks
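I think the percentages are not coming out as expected because, as the code is written, each percentage divides a value by an identical value, so the proportion is 1.0. summarise() drops only the last grouping variable (Context), so the mutate() still runs within each Time and PredictedBehaviorFull group, where sum(n) is usually just n itself.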
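To make the grouping issue concrete, here is a minimal sketch on made-up data. The `toy` tibble and its values are hypothetical and only mirror the column names in the question; it assumes dplyr is installed.

```r
library(dplyr)

# Toy data (hypothetical): one hour, two behaviours, a single context
toy <- tibble(
  Time = c("23 19", "23 19", "23 19", "23 19"),
  PredictedBehaviorFull = c("Bait", "Bait", "Bait", "No_OP"),
  Context = "Present"
)

# Pattern from the question: summarise() drops only the last grouping
# variable (Context), so the result is still grouped by Time and
# PredictedBehaviorFull and sum(n) equals n in every group
by_behaviour <- toy %>%
  group_by(Time, PredictedBehaviorFull, Context) %>%
  summarise(n = n()) %>%
  mutate(percentage = n / sum(n))

# Regrouping by Time alone makes sum(n) the hourly total
by_hour <- toy %>%
  count(Time, PredictedBehaviorFull, Context) %>%
  group_by(Time) %>%
  mutate(percentage = n / sum(n)) %>%
  ungroup()
```

In `by_behaviour` every percentage is 1, while in `by_hour` the shares within the hour are 0.75 for Bait and 0.25 for No_OP.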
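I'm not entirely sure what the problem is, but if by "scrambles the order of the time column" you mean that the whole Time column is wrong, then you would be better off using the lubridate package to build your Time column.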
library(dplyr)     # mutate() and %>%
library(stringr)   # str_c()
library(lubridate)
S06 %>%
  # first we convert the Timestamp column into datetime format
  mutate(
    Timestamp = ymd_hms(Timestamp)
  ) %>%
  # then, we can extract the components from the Timestamp
  mutate(
    date = date(Timestamp),
    hour = lubridate::hour(Timestamp),
    timestamp_hour = ymd_h(str_c(date, ' ', hour))
  ) %>%
  {. ->> S06_a} # this saves the data as 'S06_a' to use next
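As an aside, the hourly bucket can also be built with lubridate's floor_date() instead of pasting the date and hour back together. This is a swapped-in alternative, not the code above, and the `toy` timestamps are hypothetical stand-ins for the Timestamp column. Because the result is a real datetime, it sorts chronologically, which may help if Time is currently stored as text.

```r
library(dplyr)
library(lubridate)

# Hypothetical timestamps standing in for the Timestamp column
toy <- tibble(
  Timestamp = c("2020-05-24 10:00:01", "2020-05-23 19:05:10", "2020-05-23 19:59:59")
)

toy_hourly <- toy %>%
  mutate(
    Timestamp = ymd_hms(Timestamp),
    # floor_date() rounds each datetime down to the start of its hour
    timestamp_hour = floor_date(Timestamp, unit = "hour")
  ) %>%
  # datetimes sort chronologically, so the 23rd comes before the 24th
  arrange(timestamp_hour)
```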
Then, if I have understood correctly, you want to work out the percentage of observations of each behaviour type within each hour.
S06_a %>%
  # then, work out the total number of observations per hour, context and behaviour
  group_by(timestamp_hour, Context, PredictedBehaviorFull) %>%
  summarise(
    behav_total = n()
  ) %>%
  # calculate the total number of observations per hour
  group_by(timestamp_hour) %>%
  mutate(
    hour_total = sum(behav_total),
    percentage = behav_total / hour_total
  )
This produces the following output:
# A tibble: 7 x 6
# Groups:   timestamp_hour [3]
  timestamp_hour      Context PredictedBehaviorFull behav_total hour_total percentage
  <dttm>              <chr>   <chr>                       <int>      <int>      <dbl>
1 2020-05-23 19:00:00 Present Bait                         1971       2184      0.902
2 2020-05-23 19:00:00 Present Boat                           96       2184     0.0440
3 2020-05-23 19:00:00 Present No_OP                         117       2184     0.0536
4 2020-05-24 10:00:00 Absent  Bait                            9       1202    0.00749
5 2020-05-24 10:00:00 Absent  No_OP                        1193       1202      0.993
6 2020-05-24 11:00:00 Absent  Bait                            5        129     0.0388
7 2020-05-24 11:00:00 Absent  No_OP                         124        129      0.961
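One quick way to confirm the percentages are computed per hour is to check that they sum to 1 within each timestamp_hour. The sketch below re-enters the counts from the printed output by hand; the object names `result` and `check` are made up for illustration.

```r
library(dplyr)

# Counts copied from the printed output
result <- tibble(
  timestamp_hour = c(
    rep("2020-05-23 19:00:00", 3),
    rep("2020-05-24 10:00:00", 2),
    rep("2020-05-24 11:00:00", 2)
  ),
  behav_total = c(1971, 96, 117, 9, 1193, 5, 124)
)

# Per hour: the total count and the sum of the shares, which should be 1
check <- result %>%
  group_by(timestamp_hour) %>%
  summarise(
    hour_total = sum(behav_total),
    share_sum = sum(behav_total / sum(behav_total))
  )
```

If any `share_sum` differs from 1, the grouping at the mutate() step is probably not the one intended.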