How to find the difference of ratio in multiple columns using R
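My data frame contains values for 4 different years. I need to find out how these values change across the years, i.e. which city's value changes the most and which city's changes the least.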
City Ratio1 Ratio2 Ratio3 Ratio4
A 1.0177722 1.0173251 1.0133026 1.0140027
B 1.0132619 1.0122653 1.0128473 1.0111068
C 1.0689484 1.0640355 1.0625305 1.0544790
..... other 1000 entries
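I tried doing this with differences, but had no luck. The question is: which city's ratio changes the most between Ratio1 and Ratio4, and which city's changes the least?
I tried using the mutate function to compute the variance, but it gives me warnings: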
DF<- DF%>% mutate(vari = var(Ratio1:Ratio4,na.rm = T))
Warning messages:
1: In POP_2013_ratio:POP_2016_ratio :
numerical expression has 439 elements: only the first used
2: In POP_2013_ratio:POP_2016_ratio :
numerical expression has 439 elements: only the first used
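The warning comes from `Ratio1:Ratio4`. Inside `var()`, `:` is the base R sequence operator, so it is applied to two whole columns and only uses the first element of each, rather than selecting a range of columns. What the question needs is a variance computed across each row. Below is a minimal base R sketch of that idea, assuming the columns are named `Ratio1` to `Ratio4` as in the sample (the warning suggests the real names are different) and using only the three rows shown:

```r
# Sample rows from the question; the real data has about 1000 more.
DF <- data.frame(
  City   = c("A", "B", "C"),
  Ratio1 = c(1.0177722, 1.0132619, 1.0689484),
  Ratio2 = c(1.0173251, 1.0122653, 1.0640355),
  Ratio3 = c(1.0133026, 1.0128473, 1.0625305),
  Ratio4 = c(1.0140027, 1.0111068, 1.0544790)
)

# Column names are an assumption taken from the sample table.
ratio_cols <- paste0("Ratio", 1:4)

# apply(..., 1, f) calls f once per row, so this is a per-city variance.
DF$vari <- apply(DF[ratio_cols], 1, var, na.rm = TRUE)

print(DF)
```

This uses no extra packages and does not require city names to be unique.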
R's data.table package has a very concise way to create new columns from existing ones:
library(data.table)

dt <- data.table(City   = c("A", "B", "C"),
                 Ratio1 = c(1.0177722, 1.0132619, 1.0689484),
                 Ratio2 = c(1.0173251, 1.0122653, 1.0640355),
                 Ratio3 = c(1.0133026, 1.0128473, 1.0625305),
                 Ratio4 = c(1.0140027, 1.0111068, 1.0544790))
>dt
City Ratio1 Ratio2 Ratio3 Ratio4
1: A 1.017772 1.017325 1.013303 1.014003
2: B 1.013262 1.012265 1.012847 1.011107
3: C 1.068948 1.064035 1.062531 1.054479
You can try out a few measures and see which one suits you best:
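# Signed change from the first to the last year
dt[, diff := Ratio4 - Ratio1]
# Magnitude of that change, ignoring direction
dt[, abs_diff := abs(Ratio4 - Ratio1)]
# Spread between the largest and smallest ratio; by = City evaluates it once per city (assumes one row per city)
dt[, range := max(c(Ratio1, Ratio2, Ratio3, Ratio4)) - min(c(Ratio1, Ratio2, Ratio3, Ratio4)), by = City]
# Sample variance of the four ratios, again once per city
dt[, variance := var(c(Ratio1, Ratio2, Ratio3, Ratio4)), by = City]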
>dt
City Ratio1 Ratio2 Ratio3 Ratio4 diff abs_diff range variance
1: A 1.017772 1.017325 1.013303 1.014003 -0.0037695 0.0037695 0.0044696 5.174612e-06
2: B 1.013262 1.012265 1.012847 1.011107 -0.0021551 0.0021551 0.0021551 8.766456e-07
3: C 1.068948 1.064035 1.062531 1.054479 -0.0144694 0.0144694 0.0144694 3.609233e-05
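With about 1000 cities, and possibly more year columns later, it may be more convenient not to type every column name or rely on `by = City`. The following is a sketch of one alternative, assuming every ratio column name starts with `Ratio` (an assumption based on the sample). It selects those columns with `.SDcols` and computes the measures row by row:

```r
library(data.table)

dt <- data.table(City   = c("A", "B", "C"),
                 Ratio1 = c(1.0177722, 1.0132619, 1.0689484),
                 Ratio2 = c(1.0173251, 1.0122653, 1.0640355),
                 Ratio3 = c(1.0133026, 1.0128473, 1.0625305),
                 Ratio4 = c(1.0140027, 1.0111068, 1.0544790))

# Pick the ratio columns by name pattern (assumed naming scheme).
ratio_cols <- grep("^Ratio", names(dt), value = TRUE)

# .SD holds only the columns listed in .SDcols; apply(.SD, 1, f) runs f per row.
dt[, variance := apply(.SD, 1, var), .SDcols = ratio_cols]
dt[, range := apply(.SD, 1, function(x) max(x) - min(x)), .SDcols = ratio_cols]

print(dt)
```

Because the computation is row-wise, it still works if a city name appears more than once.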
Once you settle on the criterion you want to use (say, variance), you can select the top city with:
> dt[order(-variance)][1]
City Ratio1 Ratio2 Ratio3 Ratio4 diff abs_diff range variance
1: C 1.068948 1.064035 1.062531 1.054479 -0.0144694 0.0144694 0.0144694 3.609233e-05
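The question also asks for the city that changes the least. Here is a small sketch of picking both ends of the ranking, using variance as the criterion and the same three sample rows. The names `most_variable`, `least_variable` and `top10` are only illustrative:

```r
library(data.table)

dt <- data.table(City   = c("A", "B", "C"),
                 Ratio1 = c(1.0177722, 1.0132619, 1.0689484),
                 Ratio2 = c(1.0173251, 1.0122653, 1.0640355),
                 Ratio3 = c(1.0133026, 1.0128473, 1.0625305),
                 Ratio4 = c(1.0140027, 1.0111068, 1.0544790))

# Row-wise variance over the four ratio columns.
dt[, variance := apply(.SD, 1, var), .SDcols = paste0("Ratio", 1:4)]

# which.max / which.min return the row index of the extreme value.
most_variable  <- dt[which.max(variance)]
least_variable <- dt[which.min(variance)]

# Up to ten most variable cities, useful on the full data set.
top10 <- head(dt[order(-variance)], 10)

print(most_variable)
print(least_variable)
```

If several cities tie, `which.max()` and `which.min()` return only the first one. `dt[variance == max(variance)]` would keep all of them.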