How to get the difference between two observations in the same column in R
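I have a dataset with 3 different indices, and I need the difference between HPI1 and HPI2 and the difference between HPI2 and HPI3, for both rent and sale.
The first thing I thought of was converting this DF into a DF with one column per HPI and then creating the new columns by simple subtraction, but I don't know how to do that.
Another idea was to keep the values in the same column and condition on operation and date.
Here is an example of the data: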
df <- tribble(
  ~Index, ~Value, ~Operation, ~Year,
  "HPI1", 0.9, "Sale", "2017",
  "HPI2", 1.1, "Sale", "2017",
  "HPI3", 0.89, "Sale", "2017",
  "HPI1", 1.12, "Rent", "2017",
  "HPI2", 0.85, "Rent", "2017",
  "HPI3", 1.22, "Rent", "2017",
  "HPI1", 0.91, "Sale", "2018",
  "HPI2", 1.02, "Sale", "2018",
  "HPI3", 0.9, "Sale", "2018",
  "HPI1", 1.1, "Rent", "2018",
  "HPI2", 0.89, "Rent", "2018",
  "HPI3", 1.12, "Rent", "2018")
Any help would be greatly appreciated. Thanks!
First, you have to reshape your data from long to wide format. The data.table package can help with that. Try this (df is your data):
library(data.table)
dt <- as.data.table(df)
dt <- dcast(dt,...~Index,value.var="Value")
Output:
# Operation Year HPI1 HPI2 HPI3
# 1: Rent 2017 1.12 0.85 1.22
# 2: Rent 2018 1.10 0.89 1.12
# 3: Sale 2017 0.90 1.10 0.89
# 4: Sale 2018 0.91 1.02 0.90
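Then you can compute whatever you need. Try this (mutate and %>% come from dplyr, so load it first):
library(dplyr)
dt <- dt %>% mutate(Col1 = HPI1 - HPI2, Col2 = HPI2 - HPI3)
Output: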
# Operation Year HPI1 HPI2 HPI3 Col1 Col2
# 1: Rent 2017 1.12 0.85 1.22 0.27 -0.37
# 2: Rent 2018 1.10 0.89 1.12 0.21 -0.23
# 3: Sale 2017 0.90 1.10 0.89 -0.20 0.21
# 4: Sale 2018 0.91 1.02 0.90 -0.11 0.12
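The subtraction step can also be written without dplyr, using data.table's own by-reference assignment. This is a minimal sketch, assuming the same column names as in the question; the data.frame call only rebuilds the example data so the snippet runs on its own.

```r
library(data.table)

# Rebuild the example data from the question (same 12 rows, same order).
df <- data.frame(
  Index     = rep(c("HPI1", "HPI2", "HPI3"), times = 4),
  Value     = c(0.9, 1.1, 0.89, 1.12, 0.85, 1.22,
                0.91, 1.02, 0.9, 1.1, 0.89, 1.12),
  Operation = rep(rep(c("Sale", "Rent"), each = 3), times = 2),
  Year      = rep(c("2017", "2018"), each = 6)
)

# Long to wide: one row per Operation/Year, one column per index.
dt <- dcast(as.data.table(df), Operation + Year ~ Index, value.var = "Value")

# Add both difference columns in place; no pipe or mutate() needed.
dt[, `:=`(Col1 = HPI1 - HPI2, Col2 = HPI2 - HPI3)]

print(dt)
```

Because `:=` modifies dt by reference, there is no need to reassign the result.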
Here is another way:
library(dplyr)
library(tidyr)
df <- tribble(
~Index, ~Value, ~Operation, ~Year,
"HPI1", 0.9, "Sale", "2017",
"HPI2", 1.1, "Sale", "2017",
"HPI3", 0.89, "Sale", "2017",
"HPI1", 1.12, "Rent", "2017",
"HPI2", 0.85, "Rent", "2017",
"HPI3", 1.22, "Rent", "2017",
"HPI1", 0.91, "Sale", "2018",
"HPI2", 1.02, "Sale", "2018",
"HPI3", 0.9, "Sale", "2018",
"HPI1", 1.1, "Rent", "2018",
"HPI2", 0.89, "Rent", "2018",
"HPI3", 1.12, "Rent", "2018")
df %>%
  group_by(Year, Operation) %>%
  pivot_wider(names_from = Index, values_from = Value) %>%
  mutate(HPI1_HPI2_diff = HPI1 - HPI2,
         HPI2_HPI3_diff = HPI2 - HPI3)
# A tibble: 4 x 7
# Groups: Year, Operation [4]
Operation Year HPI1 HPI2 HPI3 HPI1_HPI2_diff HPI2_HPI3_diff
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Sale 2017 0.9 1.1 0.89 -0.2 0.21
2 Rent 2017 1.12 0.85 1.22 0.27 -0.37
3 Sale 2018 0.91 1.02 0.9 -0.110 0.12
4 Rent 2018 1.1 0.89 1.12 0.21 -0.23
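The question's second idea, keeping the values in a single column and conditioning on operation and year, can also be done with group_by. The following is only a sketch: the column name diff_to_next is made up for illustration, and it assumes the index labels sort in the intended order (HPI1, HPI2, HPI3).

```r
library(dplyr)

# Example data from the question.
df <- tribble(
  ~Index, ~Value, ~Operation, ~Year,
  "HPI1", 0.9,  "Sale", "2017",
  "HPI2", 1.1,  "Sale", "2017",
  "HPI3", 0.89, "Sale", "2017",
  "HPI1", 1.12, "Rent", "2017",
  "HPI2", 0.85, "Rent", "2017",
  "HPI3", 1.22, "Rent", "2017",
  "HPI1", 0.91, "Sale", "2018",
  "HPI2", 1.02, "Sale", "2018",
  "HPI3", 0.9,  "Sale", "2018",
  "HPI1", 1.1,  "Rent", "2018",
  "HPI2", 0.89, "Rent", "2018",
  "HPI3", 1.12, "Rent", "2018")

res <- df %>%
  group_by(Operation, Year) %>%
  arrange(Index, .by_group = TRUE) %>%          # HPI1, HPI2, HPI3 within each group
  mutate(diff_to_next = Value - lead(Value)) %>% # HPI1 - HPI2, HPI2 - HPI3, then NA
  ungroup()

print(res)
```

Each HPI3 row gets NA because there is no following index within its group. This keeps the data in long format, which may be convenient if more indices are added later.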
- Use pivot_wider from the tidyr package to convert the data to wide format.
- Compute the differences.
library(dplyr)
library(tidyr)

df %>%
  pivot_wider(
    names_from = "Index",
    values_from = "Value"
  ) %>%
  mutate(diff_1v2 = HPI1 - HPI2,
         diff_2v3 = HPI2 - HPI3)
Output:
# A tibble: 4 x 7
Operation Year HPI1 HPI2 HPI3 diff_1v2 diff_2v3
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Sale 2017 0.9 1.1 0.89 -0.2 0.21
2 Rent 2017 1.12 0.85 1.22 0.27 -0.37
3 Sale 2018 0.91 1.02 0.9 -0.110 0.12
4 Rent 2018 1.1 0.89 1.12 0.21 -0.23
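If neither data.table nor the tidyverse is available, the same reshape-then-subtract idea can be sketched in base R with reshape(). This is an illustrative alternative rather than part of the answers above; note that reshape() prefixes the new columns with the value column's name (Value.HPI1 and so on), and the names diff_1v2 and diff_2v3 are chosen here for illustration.

```r
# Example data from the question, as a plain data.frame.
df <- data.frame(
  Index     = rep(c("HPI1", "HPI2", "HPI3"), times = 4),
  Value     = c(0.9, 1.1, 0.89, 1.12, 0.85, 1.22,
                0.91, 1.02, 0.9, 1.1, 0.89, 1.12),
  Operation = rep(rep(c("Sale", "Rent"), each = 3), times = 2),
  Year      = rep(c("2017", "2018"), each = 6)
)

# Long to wide: one row per Operation/Year combination.
wide <- reshape(df,
                idvar     = c("Operation", "Year"),
                timevar   = "Index",
                direction = "wide")

# Differences between adjacent indices.
wide$diff_1v2 <- wide$Value.HPI1 - wide$Value.HPI2
wide$diff_2v3 <- wide$Value.HPI2 - wide$Value.HPI3

print(wide)
```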