Replace missing values in an R data frame
I have this data:
Date | Price |
---|---|
"2021-01-01" | 1 |
"2021-01-02" | NA |
"2021-01-03" | NA |
"2021-01-04" | NA |
"2021-01-05" | NA |
"2021-01-06" | 6 |
"2021-01-07" | NA |
"2021-01-08" | NA |
"2021-01-09" | 3 |
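I want to replace the missing values with the mean, so that the final result looks like this:
Date | Price |
---|---|
"2021-01-01" | 1 |
"2021-01-02" | 2 |
"2021-01-03" | 3 |
"2021-01-04" | 4 |
"2021-01-05" | 5 |
"2021-01-06" | 6 |
"2021-01-07" | 5 |
"2021-01-08" | 4 |
"2021-01-09" | 3 |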
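I assume you have several price columns from which Price is derived, and that you want to create a new column named Price that holds their row-wise mean and contains no NA values.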
library(dplyr)  # provides %>% and mutate(); loading all of tidyverse is not required

# Example data: two price columns, the last two values of col1 are missing.
# runif() is random, so the numbers differ from run to run.
Date <- c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05",
          "2021-01-06", "2021-01-07", "2021-01-08", "2021-01-09", "2021-01-08", "2021-01-09")
your.price.col1 <- c(floor(runif(9, 0, 100)), NA, NA)
your.price.col2 <- c(floor(runif(9, 0, 100)), 33, 44)
df <- data.frame(Date, your.price.col1, your.price.col2)

# Select the price columns to include in the mean with [2:3] (col1 and col2)
df %>%
  mutate(Price = rowMeans(df[2:3], na.rm = TRUE))
Date your.price.col1 your.price.col2 Price
1 2021-01-01 96 55 75.5
2 2021-01-02 22 43 32.5
3 2021-01-03 68 62 65.0
4 2021-01-04 18 51 34.5
5 2021-01-05 27 6 16.5
6 2021-01-06 26 30 28.0
7 2021-01-07 32 22 27.0
8 2021-01-08 53 95 74.0
9 2021-01-09 74 78 76.0
10 2021-01-08 NA 33 33.0
11 2021-01-09 NA 44 44.0
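Worth noting: the expected output in the question is not a row or column mean but a straight line drawn between the neighboring known prices. A minimal sketch in base R that contrasts the two, assuming the Price values from the question; the variable names (price, mean_filled, interpolated) are made up for illustration.

```r
# Price column from the question
price <- c(1, NA, NA, NA, NA, 6, NA, NA, 3)

# Option A: fill every NA with the mean of the known values (10 / 3)
mean_filled <- ifelse(is.na(price), mean(price, na.rm = TRUE), price)

# Option B: linear interpolation over the row positions.
# Only the known points are passed to approx(); xout asks for every position.
idx <- seq_along(price)
known <- !is.na(price)
interpolated <- approx(x = idx[known], y = price[known], xout = idx)$y

print(mean_filled)
print(interpolated)
```

Only option B reproduces the 1, 2, 3, 4, 5, 6, 5, 4, 3 pattern the question asks for, which is why the interpolation-based answers fit the question better.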
One option is na_interpolation from the imputeTS package:
imputeTS::na_interpolation(c(1, NA, NA, 4))
# 1 2 3 4
imputeTS::na_interpolation(c(6, NA, NA, 3))
# 6 5 4 3
You can use zoo::na.approx:
library(zoo)
# dat is assumed to be the data frame from the question (columns Date and Price)
na.approx(dat$Price)
# [1] 1 2 3 4 5 6 5 4 3
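The call above relies on a data frame named dat that is never defined in the thread. Below is a rough end-to-end sketch, assuming dat has the Date and Price columns from the question. It swaps in approx() from base R's stats package in place of zoo, so nothing extra has to be installed, and it interpolates against the dates rather than the row positions, so an unevenly spaced date column would still be handled sensibly.

```r
# Rebuild the question's data (assumed layout: Date and Price columns)
dat <- data.frame(
  Date  = as.Date("2021-01-01") + 0:8,
  Price = c(1, NA, NA, NA, NA, 6, NA, NA, 3)
)

# Use the dates (as day counts) as the x axis and only the known prices as input
days  <- as.numeric(dat$Date)
known <- !is.na(dat$Price)

# Write the interpolated values back into the Price column
dat$Price <- approx(x = days[known], y = dat$Price[known], xout = days)$y

print(dat$Price)
# 1 2 3 4 5 6 5 4 3
```

With approx()'s default settings, missing values before the first or after the last known price stay NA, since there is nothing to interpolate between.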