How to obtain the difference between two observations in the same column in R

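I have a dataset with 3 different indices, and I need the difference between HPI1 and HPI2 and between HPI2 and HPI3, for both rent and sale. My first thought was to reshape this data frame into one with a column per HPI and then create the new columns by simple subtraction, but I don't know how to do that. Another idea is to keep the values in the same column and condition the subtraction on operation and year. Here is a sample of the data: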

df <- tribble(
  ~Index,    ~Value, ~Operation, ~Year,
  "HPI1",    0.9,    "Sale", "2017",
  "HPI2",    1.1,    "Sale", "2017",
  "HPI3",    0.89,   "Sale", "2017",
  "HPI1",    1.12,   "Rent", "2017",
  "HPI2",    0.85,   "Rent", "2017",
  "HPI3",    1.22,   "Rent", "2017",
  "HPI1",    0.91,   "Sale", "2018",
  "HPI2",    1.02,   "Sale", "2018",
  "HPI3",    0.9,    "Sale", "2018",
  "HPI1",    1.1,    "Rent", "2018",
  "HPI2",    0.89,   "Rent", "2018",
  "HPI3",    1.12,   "Rent", "2018")

Any help would be appreciated. Thanks!

First, you have to reshape your data from long to wide format. The data.table package can help with that. Try this (df is your data):

library(data.table)
dt <- as.data.table(df)
dt <- dcast(dt, ... ~ Index, value.var = "Value")

Output:

#    Operation Year HPI1 HPI2 HPI3
# 1:      Rent 2017 1.12 0.85 1.22
# 2:      Rent 2018 1.10 0.89 1.12
# 3:      Sale 2017 0.90 1.10 0.89
# 4:      Sale 2018 0.91 1.02 0.90
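The `...` on the left-hand side of the formula stands in for the remaining columns, which the output above shows to be Operation and Year. As a minimal sketch, the same reshape can be written with those columns spelled out, which may be safer if the data later gains extra columns. The data is rebuilt inline so the block runs on its own; the names `dt_long` and `wide` are only illustrative.

```r
library(data.table)

# Sample data from the question, rebuilt so this block is self-contained
dt_long <- data.table(
  Index     = rep(c("HPI1", "HPI2", "HPI3"), times = 4),
  Value     = c(0.9, 1.1, 0.89, 1.12, 0.85, 1.22,
                0.91, 1.02, 0.9, 1.1, 0.89, 1.12),
  Operation = rep(c("Sale", "Rent", "Sale", "Rent"), each = 3),
  Year      = rep(c("2017", "2018"), each = 6)
)

# Name the id columns explicitly instead of relying on `...`
wide <- dcast(dt_long, Operation + Year ~ Index, value.var = "Value")
print(wide)
```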

Then you can compute whatever you need. Try:

# Add both difference columns by reference (data.table syntax, so dplyr is not needed)
dt[, `:=`(Col1 = HPI1 - HPI2, Col2 = HPI2 - HPI3)]

Output:

#    Operation Year HPI1 HPI2 HPI3  Col1  Col2
# 1:      Rent 2017 1.12 0.85 1.22  0.27 -0.37
# 2:      Rent 2018 1.10 0.89 1.12  0.21 -0.23
# 3:      Sale 2017 0.90 1.10 0.89 -0.20  0.21
# 4:      Sale 2018 0.91 1.02 0.90 -0.11  0.12

Here is another way:

library(dplyr)
library(tidyr)

df <- tribble(
  ~Index,    ~Value, ~Operation, ~Year,
  "HPI1",    0.9,    "Sale", "2017",
  "HPI2",    1.1,    "Sale", "2017",
  "HPI3",    0.89,   "Sale", "2017",
  "HPI1",    1.12,   "Rent", "2017",
  "HPI2",    0.85,   "Rent", "2017",
  "HPI3",    1.22,   "Rent", "2017",
  "HPI1",  0.91,   "Sale", "2018",
  "HPI2",  1.02,   "Sale", "2018",
  "HPI3",    0.9,    "Sale", "2018",
  "HPI1",    1.1,    "Rent", "2018",
  "HPI2",    0.89,   "Rent", "2018",
  "HPI3",    1.12,   "Rent", "2018")

df %>%
  group_by(Year, Operation) %>% 
  pivot_wider(names_from = Index, values_from = Value) %>%
  mutate(HPI1_HPI2_diff = HPI1 - HPI2, 
         HPI2_HPI3_diff = HPI2 - HPI3)

# A tibble: 4 x 7
# Groups:   Year, Operation [4]
  Operation Year   HPI1  HPI2  HPI3 HPI1_HPI2_diff HPI2_HPI3_diff
  <chr>     <chr> <dbl> <dbl> <dbl>          <dbl>          <dbl>
1 Sale      2017   0.9   1.1   0.89         -0.2             0.21
2 Rent      2017   1.12  0.85  1.22          0.27           -0.37
3 Sale      2018   0.91  1.02  0.9          -0.110           0.12
4 Rent      2018   1.1   0.89  1.12          0.21           -0.23

  1. Use pivot_wider from the tidyr package to convert the data to wide format.
  2. Compute the differences.

df %>% 
  pivot_wider(
    names_from = "Index", 
    values_from = "Value"
  ) %>% 
  mutate(diff_1v2 = HPI1 - HPI2,
         diff_2v3 = HPI2 - HPI3)

Output:

# A tibble: 4 x 7
  Operation Year   HPI1  HPI2  HPI3 diff_1v2 diff_2v3
  <chr>     <chr> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 Sale      2017   0.9   1.1   0.89   -0.2       0.21
2 Rent      2017   1.12  0.85  1.22    0.27     -0.37
3 Sale      2018   0.91  1.02  0.9    -0.110     0.12
4 Rent      2018   1.1   0.89  1.12    0.21     -0.23
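
The question also raised the idea of keeping the values in a single column and conditioning on operation and year. A minimal sketch of that approach, assuming each Operation/Year group has exactly one row per index and that sorting by Index puts HPI1, HPI2, HPI3 in the intended order: dplyr's lead() subtracts the next index's value within each group. The column names `Next_Index` and `Diff` and the object `long_diffs` are illustrative, not from the original answers.

```r
library(dplyr)

# Sample data from the question
df <- tibble::tribble(
  ~Index, ~Value, ~Operation, ~Year,
  "HPI1", 0.9,    "Sale", "2017",
  "HPI2", 1.1,    "Sale", "2017",
  "HPI3", 0.89,   "Sale", "2017",
  "HPI1", 1.12,   "Rent", "2017",
  "HPI2", 0.85,   "Rent", "2017",
  "HPI3", 1.22,   "Rent", "2017",
  "HPI1", 0.91,   "Sale", "2018",
  "HPI2", 1.02,   "Sale", "2018",
  "HPI3", 0.9,    "Sale", "2018",
  "HPI1", 1.1,    "Rent", "2018",
  "HPI2", 0.89,   "Rent", "2018",
  "HPI3", 1.12,   "Rent", "2018"
)

long_diffs <- df %>%
  arrange(Operation, Year, Index) %>%   # order HPI1, HPI2, HPI3 within each group
  group_by(Operation, Year) %>%
  mutate(
    Next_Index = lead(Index),           # the index being subtracted
    Diff       = Value - lead(Value)    # HPI1 - HPI2, then HPI2 - HPI3
  ) %>%
  ungroup() %>%
  filter(!is.na(Diff))                  # drop HPI3 rows, which have no successor

print(long_diffs)
```

This keeps the data in long format, with one row per pair of consecutive indices, which can be convenient for plotting or further grouping. The wide-format answers above are easier to read when only these two differences are needed.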