How do I find a curve that fits a series of points in R?
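I need to find the equation of a power curve that fits the number of people contaminated by a certain disease each day, so that I can make predictions. The data are as follows: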
Day Contaminated
26/feb 1
29/feb 2
04/mar 3
05/mar 8
06/mar 13
07/mar 19
08/mar 25
10/mar 34
11/mar 52
12/mar 81
13/mar 98
14/mar 121
15/mar 176
16/mar 234
17/mar 291
18/mar 428
19/mar 621
20/mar 904
21/mar 1128
22/mar 1546
23/mar 1891
24/mar 2201
25/mar 2433
I think I need to run a power-curve regression (nonlinear regression) in R, but I don't know how to implement it.
Here is an approach using data.table, ggplot2, and nls.
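First, let's convert the dates to a standard format and then to integers (days since the first observation), so that we can do calculations with them.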
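library(data.table)
library(ggplot2)
setDT(data)
# Assumption: the data are from 2020. Without an explicit year, as.Date()
# uses the current year, and 29/feb becomes NA outside leap years.
data[, Day := as.Date(paste0(Day, "/2020"), "%d/%b/%Y")]
# Days elapsed since the first observation (26/feb is day 0).
data[, Int := as.integer(Day) - min(as.integer(Day))]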
Then we use nls to fit the model to the data. We will use the formula y = a * x ^ b.
nls(formula = Contaminated ~ a * Int ^ b, data,start=list(a=1,b=1))
# Nonlinear regression model
# model: Contaminated ~ a * Int^b
# data: data
# a b
#2.272e-05 5.571e+00
# residual sum-of-squares: 123279
#
#Number of iterations to convergence: 48
#Achieved convergence tolerance: 7.832e-07
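Since the goal is prediction, one possible next step is to keep the fitted model in a variable and pass future day offsets to predict(). The sketch below is self-contained and uses base R only. The names d, fit, future and forecast are illustrative, and the Int values are typed in by hand on the assumption that the dates fall in 2020 (so that 29/feb exists).

```r
# Days since 26/feb (assuming the year 2020) and the observed counts.
d <- data.frame(
  Int = c(0, 3, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21,
          22, 23, 24, 25, 26, 27, 28),
  Contaminated = c(1, 2, 3, 8, 13, 19, 25, 34, 52, 81, 98, 121, 176, 234,
                   291, 428, 621, 904, 1128, 1546, 1891, 2201, 2433)
)

# Same model and starting values as above, but keep the fitted object.
fit <- nls(Contaminated ~ a * Int^b, data = d, start = list(a = 1, b = 1))

# Predict the next seven days (Int = 29 corresponds to 26/mar).
future <- data.frame(Int = 29:35)
forecast <- cbind(future, Predicted = predict(fit, newdata = future))
print(forecast)
```

coef(fit) returns the estimated a and b if the equation itself is needed. Predictions far beyond day 28 rest entirely on the assumed power-law shape, so they are best read as rough projections.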
Now we can look at the result with ggplot.
ggplot(data, aes(x = Int, y = Contaminated)) +
  geom_point() +
  scale_x_continuous(breaks = c(0, 10, 20),
                     labels = data$Day[data$Int %in% c(0, 10, 20)]) +
  xlab("Date") +
  geom_smooth(method = "nls", formula = y ~ a * x ^ b,
              method.args = list(start = c(a = 1, b = 1)),
              se = FALSE, linetype = 1)
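A power curve y = a * x ^ b becomes a straight line after taking logs of both sides, log(y) = log(a) + b * log(x), so a log-log fit with lm() gives a quick cross-check of the shape seen in the plot. The sketch below is self-contained base R. The names d, pos, lin_fit, a_hat and b_hat are illustrative, the Int values assume the dates fall in 2020, and the first day (Int = 0) is dropped because log(0) is not finite. This is a different estimator from nls (it minimises squared error on the log scale), so its coefficients are not expected to match the nls output.

```r
# Days since 26/feb (assuming the year 2020) and the observed counts.
d <- data.frame(
  Int = c(0, 3, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21,
          22, 23, 24, 25, 26, 27, 28),
  Contaminated = c(1, 2, 3, 8, 13, 19, 25, 34, 52, 81, 98, 121, 176, 234,
                   291, 428, 621, 904, 1128, 1546, 1891, 2201, 2433)
)

# log(0) is -Inf, so keep only the rows with a positive day offset.
pos <- subset(d, Int > 0)

# Linear regression on the log-log scale: the slope estimates b,
# and the intercept estimates log(a).
lin_fit <- lm(log(Contaminated) ~ log(Int), data = pos)
a_hat <- exp(coef(lin_fit)[["(Intercept)"]])
b_hat <- coef(lin_fit)[["log(Int)"]]

print(c(a = a_hat, b = b_hat))
print(summary(lin_fit)$r.squared)
```

If the residuals of this fit bend systematically, the points may be drifting away from a pure power law. The values a_hat and b_hat can also be tried as starting values for nls in place of a = 1, b = 1.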
Data
data <- structure(list(Day = c("26/feb", "29/feb", "04/mar", "05/mar",
"06/mar", "07/mar", "08/mar", "10/mar", "11/mar", "12/mar", "13/mar",
"14/mar", "15/mar", "16/mar", "17/mar", "18/mar", "19/mar", "20/mar",
"21/mar", "22/mar", "23/mar", "24/mar", "25/mar"), Contaminated = c(1L,
2L, 3L, 8L, 13L, 19L, 25L, 34L, 52L, 81L, 98L, 121L, 176L, 234L,
291L, 428L, 621L, 904L, 1128L, 1546L, 1891L, 2201L, 2433L)), class = "data.frame", row.names = c(NA,
-23L))