Automating finding and converting values in R

Question

I have a sample data set with 45 rows, shown below.

   itemid                   title  release_date
16    573          Body Snatchers          1993
17    670          Body Snatchers          1993
41   1645        Butcher Boy, The          1998
42   1650        Butcher Boy, The          1998
1     218               Cape Fear          1991
18    673               Cape Fear          1962
27   1234   Chairman of the Board          1998
43   1654   Chairman of the Board          1998
2     246             Chasing Amy          1997
5     268             Chasing Amy          1997
11    309                Deceiver          1997
37   1606                Deceiver          1997
28   1256 Designated Mourner, The          1997
29   1257 Designated Mourner, The          1997
12    329      Desperate Measures          1998
13    348      Desperate Measures          1998
9     304           Fly Away Home          1996
15    500           Fly Away Home          1996
26   1175               Hugo Pool          1997
39   1617               Hugo Pool          1997
31   1395       Hurricane Streets          1998
38   1607       Hurricane Streets          1998
10    305          Ice Storm, The          1997
21    865          Ice Storm, The          1997
4     266      Kull the Conqueror          1997
19    680      Kull the Conqueror          1997
22    876             Money Talks          1997
24    881             Money Talks          1997
35   1477              Nightwatch          1997
40   1625              Nightwatch          1997
6     274                 Sabrina          1995
14    486                 Sabrina          1954
33   1442     Scarlet Letter, The          1995
36   1542     Scarlet Letter, The          1926
3     251         Shall We Dance?          1996
30   1286         Shall We Dance?          1937
32   1429           Sliding Doors          1998
45   1680           Sliding Doors          1998
20    711  Substance of Fire, The          1996
44   1658  Substance of Fire, The          1996
23    878          That Darn Cat!          1997
25   1003          That Darn Cat!          1997
34   1444          That Darn Cat!          1965
7     297             Ulee's Gold          1997
8     303             Ulee's Gold          1997

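What I want to do is convert the itemid depending on whether the movie title and the release date are the same. For example, the movie "Ulee's Gold" has two item IDs, 297 and 303. I am trying to find a way to automate checking the release dates so that, if they are the same, that movie's itemid[2] is replaced with itemid[1]. Currently I do this by hand: I copy the itemids into two vectors, x and y, and then replace the values with a loop over those vectors. I was wondering whether there is a better way, since here only 18 movies have multiple IDs but the full data set has several hundred. Finding and fixing them by hand would be very time-consuming.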
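As a sketch of the "finding" step, assuming the data frame is called `df` and has the columns shown in the table, `duplicated()` can flag every title/release_date pair that occurs more than once. Only a few rows of the table are reproduced here; `dups` is a hypothetical name for the result.

```r
# A few rows copied from the table above (stand-in for the full data set)
df <- data.frame(
  itemid = c(218, 673, 878, 1003, 1444, 297, 303),
  title = c("Cape Fear", "Cape Fear", "That Darn Cat!", "That Darn Cat!",
            "That Darn Cat!", "Ulee's Gold", "Ulee's Gold"),
  release_date = c(1991, 1962, 1997, 1997, 1965, 1997, 1997),
  stringsAsFactors = FALSE
)

key_cols <- df[c("title", "release_date")]

# TRUE for every row whose title/release_date pair appears more than once:
# duplicated() marks later copies, fromLast = TRUE marks earlier copies
is_dup <- duplicated(key_cols) | duplicated(key_cols, fromLast = TRUE)

# Rows that share a title AND a release date with another row
dups <- df[is_dup, ]
print(dups)
```

Cape Fear (1991 vs. 1962) and the 1965 "That Darn Cat!" are not flagged, because their release dates differ.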

Here is the code I am currently using:

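# Duplicate itemids (x) and the itemids they should become (y), typed by hand
x <- c(670,1650,1654,268,1606,1257,348,500,1617,1607,865,680,881,1625,1680,1658,1003,303)
y <- c(573,1645,1234,246,309,1256,329,304,1175,1395,305,266,876,1477,1429,711,878,297)

# Replace each duplicate itemid x[i] with y[i].
# Match on the itemid value: df$itemid[x[i]] would treat x[i] as a row
# number, which only works if every row number equals its itemid.
for (i in seq_along(x)) {
  df$itemid[df$itemid == x[i]] <- y[i]
}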
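A minimal sketch of how x and y could be built automatically instead of typed by hand, assuming the columns itemid, title and release_date from the table. It keeps the first itemid listed for each title/release_date pair (the itemid[1] rule described above), which is not necessarily the smallest one. The "|" separator in the key is an arbitrary choice and assumes no title contains it.

```r
# Rows copied from the table above (stand-in for the full data set)
df <- data.frame(
  itemid = c(878, 1003, 1444, 297, 303),
  title = c(rep("That Darn Cat!", 3), rep("Ulee's Gold", 2)),
  release_date = c(1997, 1997, 1965, 1997, 1997),
  stringsAsFactors = FALSE
)

# One string key per title/release_date combination
key <- paste(df$title, df$release_date, sep = "|")

# match(key, key) gives, for every row, the row of the first
# occurrence of the same key
first <- match(key, key)
dup <- first != seq_along(key)

# The x and y vectors from the question, built automatically
x <- df$itemid[dup]          # ids to be replaced
y <- df$itemid[first[dup]]   # first id of the same movie

# Replace every later id with its group's first id in one vectorized step
df$itemid <- df$itemid[first]

print(data.frame(x, y))
print(df)
```

Because the replacement is a single indexing operation, it scales to the full data set without a loop.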

Is there a better way to do this?

Answer 1

I think you can do this directly with dplyr.

Using your comment above, here is a short example:

# Small example taken from the question's data
itemid <- c(878,1003,1444,297,303)
title <- c(rep("That Darn Cat!", 3), rep("Ulee's Gold", 2))
year <- c(1997,1997,1965,1997,1997)

temp <- data.frame(itemid,title,year)
temp

library(dplyr)

# Within each title/year group, store the smallest itemid in a new column
temp %>% group_by(title,year) %>% mutate(itemid1 = min(itemid))

(For some reason I renamed 'release_date' to 'year'.) This groups the rows by title and year, finds the smallest itemid in each group, and uses mutate to create a new variable holding that lowest itemid.

This gives:

#  itemid          title year itemid1
#1    878 That Darn Cat! 1997     878
#2   1003 That Darn Cat! 1997     878
#3   1444 That Darn Cat! 1965    1444
#4    297    Ulee's Gold 1997     297
#5    303    Ulee's Gold 1997     297
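For comparison, a base R sketch of the same grouping using `ave()` instead of dplyr. The grouping key is built with `paste()` so that only title/year combinations that actually occur form groups; `ave()` returns a vector aligned with the original rows, much like `group_by()` plus `mutate()`.

```r
temp <- data.frame(
  itemid = c(878, 1003, 1444, 297, 303),
  title = c(rep("That Darn Cat!", 3), rep("Ulee's Gold", 2)),
  year = c(1997, 1997, 1965, 1997, 1997)
)

# Smallest itemid within each title/year group, one value per row
temp$itemid1 <- ave(temp$itemid, paste(temp$title, temp$year), FUN = min)
print(temp)
```

To overwrite itemid instead of adding itemid1, the dplyr version could use `mutate(itemid = min(itemid))` followed by `ungroup()`.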

automation

r

function