Get minimum value difference and case in data.table R
I have a data.table in R that looks like this:
State City City.Population num.stores
state A City_1 523 5
state A City_2 456 NA
state A City_3 1230 52
state A City_4 780 NA
state B City_5 788 NA
state B City_6 111 15
state B City_7 897 NA
state B City_8 5 48
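For each City, I want to get the num.stores of another city in the same state by comparing their populations. For example, for City_1 in State A, City_2 is more similar in population than City_3, because the population difference is 67 (between City_1 and City_2, both in State A) compared with 707 (City_1 vs. City_3), so City_1's 5 stores are assigned to City_2. My final result would look like this: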
State City City.Population num.stores assigned.stores similar_pop_city
state A City_1 523 5 5
state A City_2 456 NA 5 City_1 (City_1 is closer in population and not null)
state A City_3 1230 52 52
state A City_4 780 NA 52 City_1 (City_1 is closer in population and not null)
state B City_5 788 NA 15 City_6 (City_6 is closer in population and not null)
state B City_6 111 15 15
state B City_7 897 NA 15 City_6 (City_6 is closer in population and not null)
state B City_8 5 48 5
I tried the following, but I am still missing the right logic:
d.f[, which.min(City.Population -City.Population), .(State)]
But it only returns 1.
P.S. When comparing against the other cities in the state, I should also exclude the city itself.
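As a side note on the attempt above, here is a minimal base R sketch of why it returns 1 and how pairwise differences could be built instead. The names pop, self_diff, d and closest are illustrative only, and this sketch does not yet handle the NA condition on num.stores.

```r
pop <- c(523, 456, 1230, 780)  # State A populations from the question

# Subtracting a vector from itself gives all zeros, and which.min()
# returns the index of the first minimum, so the result is always 1.
self_diff <- pop - pop
first_min <- which.min(self_diff)

# Pairwise absolute differences instead: d[i, j] is |pop[i] - pop[j]|.
d <- abs(outer(pop, pop, "-"))
diag(d) <- Inf                     # exclude each city itself
closest <- apply(d, 1, which.min)  # index of the nearest other city per row
```

Here closest holds, for each city, the position of the city with the most similar population within the same vector.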
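I am not sure whether my output is wrong or whether your desired output still contains some errors (for example, for City_4 you specify City_4 itself as Closest.City).
For every City whose num.stores is NA, I first assign as Closest.City the City with the most similar population among those whose num.stores is not NA. Then I create the column assigned.stores, which copies num.stores where it is known and otherwise takes the num.stores of the Closest.City.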
Data
State <- c(rep("State A", 4), rep("State B", 4))
City <- c("City_1", "City_2", "City_3", "City_4", "City_5", "City_6", "City_7", "City_8")
City.Population <- c(523, 456, 1230, 780, 788, 111, 897, 5)
num.stores <- c(5, NA, 52, NA, NA, 15, NA, 48)
d.f <- data.frame(State, City, City.Population, num.stores)
Code
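library(dplyr)

d.f %>%
  group_by(State) %>%
  # For each city with NA num.stores, pick the city in the same state that has
  # a known num.stores and the smallest absolute population difference.
  mutate(Closest.City = ifelse(is.na(num.stores),
    sapply(City.Population, function(p)
      City[!is.na(num.stores)][which.min(abs(p - City.Population[!is.na(num.stores)]))]),
    NA_character_)) %>%
  # Keep known num.stores; otherwise copy the value from Closest.City.
  mutate(assigned.stores = ifelse(is.na(num.stores), num.stores[match(Closest.City, City)], num.stores))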
Output
# A tibble: 8 x 6
# Groups: State [2]
State City City.Population num.stores Closest.City assigned.stores
<chr> <chr> <dbl> <dbl> <chr> <dbl>
1 State A City_1 523 5 NA 5
2 State A City_2 456 NA City_1 5
3 State A City_3 1230 52 NA 52
4 State A City_4 780 NA City_1 5
5 State B City_5 788 NA City_6 15
6 State B City_6 111 15 NA 15
7 State B City_7 897 NA City_6 15
8 State B City_8 5 48 NA 48
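Since the question asks about data.table, the same two steps could also be sketched there. This is only one possible approach, not a definitive implementation: nearest_known is a hypothetical helper, d.t is an illustrative name, and the sketch assumes every state has at least one city with a known num.stores.

```r
library(data.table)

d.t <- data.table(
  State = rep(c("State A", "State B"), each = 4),
  City = paste0("City_", 1:8),
  City.Population = c(523, 456, 1230, 780, 788, 111, 897, 5),
  num.stores = c(5, NA, 52, NA, NA, 15, NA, 48)
)

# Hypothetical helper: for each population in `pop`, return the position of
# the nearest element (by absolute difference) among those flagged `known`.
# Assumes at least one TRUE in `known`.
nearest_known <- function(pop, known) {
  idx <- which(known)
  vapply(pop, function(p) idx[which.min(abs(p - pop[idx]))], integer(1))
}

# Within each state, look up the nearest known city and fill in both columns.
d.t[, c("Closest.City", "assigned.stores") := {
  missing <- is.na(num.stores)
  idx <- nearest_known(City.Population, !missing)
  list(
    ifelse(missing, City[idx], NA_character_),   # only set where stores are NA
    ifelse(missing, num.stores[idx], num.stores) # keep known values as they are
  )
}, by = State]
```

A city with a known num.stores is its own nearest known city (difference 0), so restricting the candidates to known cities also takes care of excluding a city from being matched to itself when its value is missing.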