Get minimum value difference and case in data.table R

I have a data.table in R that looks like this:

  State     City City.Population  num.stores
state A   City_1             523           5
state A   City_2             456          NA
state A   City_3            1230          52 
state A   City_4             780          NA
state B   City_5             788          NA
state B   City_6             111          15
state B   City_7             897          NA 
state B   City_8               5          48  

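For each City, I would like to get the num.stores of another city by comparing it with the populations of the other cities in the same state. For example, for City_1 in State A, City_2 is more similar in population than City_3, because the population difference is 67 (between City_1 and City_2, both from State A) compared with 707 (between City_1 and City_3), so the 5 stores are assigned to City_2. My final result would look like this: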

  State     City City.Population  num.stores  assigned.stores  similar_pop_city
state A   City_1             523           5                5
state A   City_2             456          NA                5            City_1  (City_1 is closer in population and not null)
state A   City_3            1230          52               52 
state A   City_4             780          NA               52            City_1  (City_1 is closer in population and not null)
state B   City_5             788          NA               15            City_6  (City_6 is closer in population and not null)
state B   City_6             111          15               15
state B   City_7             897          NA               15            City_6  (City_6 is closer in population and not null)
state B   City_8               5          48                5  

I tried the following, but I am still missing the right logic:

d.f[, which.min(City.Population -City.Population), .(State)]

But it only returns 1.

P.S. When comparing against the other cities in the state, the city itself should also be excluded.

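I'm not sure whether I am getting the wrong output or whether your desired output still contains some errors (for example, for City_4 you assigned City_4 (itself) as Closest.City).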

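First, for each City with NA in num.stores, I assign as Closest.City the nearest City (by population) among those in the same State whose num.stores is not NA. After that I create the column assigned.stores, which copies num.stores for a City whose num.stores is known and otherwise takes the num.stores of its Closest.City.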

Data

State <- c(rep("State A", 4), rep("State B", 4))
City <- c("City_1", "City_2", "City_3", "City_4", "City_5", "City_6", "City_7", "City_8")
City.Population <- c(523, 456, 1230, 780, 788, 111, 897, 5)
num.stores <- c(5, NA, 52, NA, NA, 15, NA, 48)
d.f <- data.frame(State, City, City.Population, num.stores)

Code

library(dplyr)

d.f %>%
  group_by(State) %>%
  # For each city, look up the nearest-population city with a known num.stores
  mutate(Closest.City = ifelse(is.na(num.stores),
     sapply(City.Population, function(p) {
       known <- !is.na(num.stores)
       City[known][which.min(abs(p - City.Population[known]))]
     }), NA)) %>%
  mutate(assigned.stores = ifelse(is.na(num.stores), num.stores[match(Closest.City, City)], num.stores))

Output

# A tibble: 8 x 6
# Groups:   State [2]
  State   City   City.Population num.stores Closest.City assigned.stores
  <chr>   <chr>            <dbl>      <dbl> <chr>                  <dbl>
1 State A City_1             523          5 NA                         5
2 State A City_2             456         NA City_1                     5
3 State A City_3            1230         52 NA                        52
4 State A City_4             780         NA City_1                     5
5 State B City_5             788         NA City_6                    15
6 State B City_6             111         15 NA                        15
7 State B City_7             897         NA City_6                    15
8 State B City_8               5         48 NA                        48
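
Since the question asks about data.table, the same idea can be sketched there as well. This is only a rough equivalent, not a tested drop-in: nearest_known is a hypothetical helper name introduced here, the output column names are borrowed from the question's desired result, and ties in population distance are resolved by which.min, which keeps the first match.

```r
library(data.table)

dt <- data.table(
  State = rep(c("State A", "State B"), each = 4),
  City = paste0("City_", 1:8),
  City.Population = c(523, 456, 1230, 780, 788, 111, 897, 5),
  num.stores = c(5, NA, 52, NA, NA, 15, NA, 48)
)

# Hypothetical helper: for every population in `pop`, return the position
# (within the group) of the city with a known `stores` value whose
# population is nearest. Returns NA if no city in the group has a known value.
nearest_known <- function(pop, stores) {
  known <- which(!is.na(stores))
  if (length(known) == 0L) return(rep(NA_integer_, length(pop)))
  vapply(pop, function(p) known[which.min(abs(p - pop[known]))], integer(1))
}

# Fill both columns per State; only rows with a missing num.stores use the lookup,
# so a city is never matched with itself.
dt[, c("similar_pop_city", "assigned.stores") := {
  idx <- nearest_known(City.Population, num.stores)
  missing <- is.na(num.stores)
  list(ifelse(missing, City[idx], NA_character_),
       ifelse(missing, num.stores[idx], num.stores))
}, by = State]

print(dt)
```

Because the candidates are restricted to cities whose num.stores is not NA, the requirement to exclude the city itself is satisfied without an extra check.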