Rowwise operation within for loop using dplyr

Question

I have some transport data on which I want to perform an if comparison row by row inside a for loop. The data looks something like this.

# Using the iris dataset 
> iris <- as.data.frame(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

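The result should record, for each species, the pairs of sepal lengths that occur at the same petal width (this is only an example and has no scientific meaning). It would produce something like this: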

Species Petal.Width Sepal.Length1 Sepal.Length2
setosa          0.2         5.1             4.9
setosa          0.2         5.1             4.7
setosa          0.2         4.9             4.7
setosa          0.2         5.1             4.6
...

My initial Python-ish idea was to nest for loops, something like this:

for s in unique(Species):
  for i in 1:nrow(iris):
    for j in 1:nrow(iris):
      if iris$Petal.Width[i,] == iris$Petal.Width[j,]:
        Output$Species = iris$Species[i,]
        Output$Petal.Width = iris$Petal.Width[i,]
        Output$Sepal.Length1= iris$Sepal.Length[i,]
        Output$Sepal.Length2= iris$Sepal.Length[j,]
    end
  end
end

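I thought about first using group_by to split the data by Species, which would take the place of the first loop, for s in unique(Species):. But I don't know how to compare each observation in the dataset row by row and store the result as in the second code block. I have seen related questions elsewhere. I apologize if the code above is not very clear; this is my first time asking a question here.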

Answer 1

Using dplyr:

library(dplyr)

iris %>%
  group_by(Species, Petal.Width) %>%
  mutate(n = n()) %>%    # number of rows in each Species/Petal.Width group
  filter(n > 1) %>%      # keep only groups with more than one row
  mutate(Sepal.Length1 = Sepal.Length,
         Sepal.Length2 = Sepal.Length1 - Petal.Width) %>%
  arrange(Petal.Width) %>%
  select(Species, Petal.Width, Sepal.Length1, Sepal.Length2)

This groups by Species and Petal.Width, counts the rows in each group, keeps only the groups that contain more than one row, then copies Sepal.Length into Sepal.Length1 and creates a new variable Sepal.Length2 = Sepal.Length1 - Petal.Width.
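Note that the pipeline above returns one row per observation rather than one row per pair. If the goal is the pairs of sepal lengths shown in the question, one possible approach is to join the data to itself on Species and Petal.Width. The following is only a minimal sketch, assuming dplyr 1.1.0 or later (for the `relationship` argument); `row_id`, `iris_id` and `pairs` are names introduced here for illustration and do not come from the question.

```r
library(dplyr)

# Tag every observation with a row id so each pair can be kept only once
iris_id <- iris %>%
  mutate(row_id = row_number())

pairs <- iris_id %>%
  # Match every row with all rows sharing the same Species and Petal.Width;
  # non-key columns get the suffixes "1" and "2"
  inner_join(iris_id,
             by = c("Species", "Petal.Width"),
             suffix = c("1", "2"),
             relationship = "many-to-many") %>%
  # Drop self-matches and mirrored duplicates (i, j) / (j, i)
  filter(row_id1 < row_id2) %>%
  select(Species, Petal.Width, Sepal.Length1, Sepal.Length2)

head(pairs)
```

The `row_id1 < row_id2` filter plays the role of the inner `i`/`j` loops in the question: each unordered pair of rows is visited exactly once, and no explicit loop over species is needed because Species is part of the join key.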

To record Sepal.Length for each Species within defined ranges of Petal.Width:

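# Range of petal widths across the whole dataset
minpw <- min(iris$Petal.Width)
maxpw <- max(iris$Petal.Width)

iris %>%
  # include.lowest = TRUE keeps rows equal to minpw in the first bin
  group_by(Sepal.Length, Species,
           petal_width_range = cut(Petal.Width,
                                   breaks = seq(minpw, maxpw, by = 0.2),
                                   include.lowest = TRUE)) %>%
  summarise(count = n())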
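To see how cut assigns observations to ranges, it may help to look at the bins on their own. This is a small sketch using breaks of 0 to 2.5 in steps of 0.5, which are an assumption chosen here for readability and not the breaks used above; `breaks` and `bins` are illustrative names.

```r
# Illustrative breaks: 0, 0.5, 1, 1.5, 2, 2.5 (an assumption for this sketch)
breaks <- seq(0, 2.5, by = 0.5)

# cut() returns a factor; by default each interval is open on the left
# and closed on the right, e.g. (0,0.5]
bins <- cut(iris$Petal.Width, breaks = breaks)

# Number of observations falling into each petal width range
table(bins)
```

Because the first break lies below the smallest petal width, no observation sits exactly on the lowest boundary, so every row gets a bin even without include.lowest.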
