Calculate Gaussian curve fitting on a list
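I have list data as shown below. For each element of the list I want to perform a nonlinear-regression Gaussian curve fit of counts against mids, and report the mean and standard deviation.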
mylist<- structure(list(A = structure(list(breaks = c(-10, -9,
-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4), counts = c(1L,
0L, 1L, 5L, 9L, 38L, 56L, 105L, 529L, 2858L, 17L, 2L, 0L, 2L),
density = c(0.000276014352746343, 0, 0.000276014352746343,
0.00138007176373171, 0.00248412917471709, 0.010488545404361,
0.0154568037537952, 0.028981507038366, 0.146011592602815,
0.788849020149048, 0.00469224399668783, 0.000552028705492686,
0, 0.000552028705492686), mids = c(-9.5, -8.5, -7.5, -6.5,
-5.5, -4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5),
xname = "x", equidist = TRUE), .Names = c("breaks", "counts",
"density", "mids", "xname", "equidist"), class = "histogram"),
B = structure(list(breaks = c(-7, -6, -5,
-4, -3, -2, -1, 0), counts = c(2L, 0L, 6L, 2L, 2L, 1L, 3L
), density = c(0.125, 0, 0.375, 0.125, 0.125, 0.0625, 0.1875
), mids = c(-6.5, -5.5, -4.5, -3.5, -2.5, -1.5, -0.5), xname = "x",
equidist = TRUE), .Names = c("breaks", "counts", "density",
"mids", "xname", "equidist"), class = "histogram"), C = structure(list(
breaks = c(-7, -6, -5, -4, -3, -2, -1, 0, 1), counts = c(2L,
2L, 4L, 5L, 14L, 22L, 110L, 3L), density = c(0.0123456790123457,
0.0123456790123457, 0.0246913580246914, 0.0308641975308642,
0.0864197530864197, 0.135802469135802, 0.679012345679012,
0.0185185185185185), mids = c(-6.5, -5.5, -4.5, -3.5,
-2.5, -1.5, -0.5, 0.5), xname = "x", equidist = TRUE), .Names = c("breaks",
"counts", "density", "mids", "xname", "equidist"), class = "histogram")), .Names = c("A",
"B", "C"))
I have read Fitting a density curve to a histogram in R, but that shows how to draw a fitted curve over a histogram. What I want are the best-fit values for "Mean" and "SD".
If I do this in PRISM, I should get the following results. For A:
Mids Counts
-9.5 1
-8.5 0
-7.5 1
-6.5 5
-5.5 9
-4.5 38
-3.5 56
-2.5 105
-1.5 529
-0.5 2858
0.5 17
1.5 2
2.5 0
3.5 2
Running a nonlinear-regression Gaussian curve fit gives:
Best-fit values
  Amplitude  3537
  Mean       -0.751
  SD         0.3842
Second group, B:
Mids Counts
-6.5 2
-5.5 0
-4.5 6
-3.5 2
-2.5 2
-1.5 1
-0.5 3
Best-fit values
  Amplitude  7.672
  Mean       -4.2
  SD         0.4275
Third group, C:
Mids Counts
-6.5 2
-5.5 2
-4.5 4
-3.5 5
-2.5 14
-1.5 22
-0.5 110
0.5 3
I get:
Best-fit values
  Amplitude  120.7
  Mean       -0.6893
  SD         0.4397
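Plugging PRISM's reported values for A into an unnormalised Gaussian, Amplitude * exp(-0.5 * ((x - Mean) / SD)^2), reproduces the three tallest bins closely (about 2857 at -0.5, 529 at -1.5 and 17.6 at 0.5). That suggests PRISM fits this curve directly to the counts by least squares. A minimal R sketch of such a fit with base `nls()`, assuming that model and unweighted least squares; the helper `fit_gauss_counts` is hypothetical and its output has not been checked against PRISM:

```r
# Hypothetical helper: unweighted nonlinear least-squares fit of
#   counts ~ amp * exp(-0.5 * ((mids - mu) / sigma)^2)
# The model form is an assumption about what PRISM fits.
fit_gauss_counts <- function(mids, counts) {
  d <- data.frame(x = mids, y = counts)
  # Start at the tallest bin, with SD equal to one bin width (assumes equal-width bins)
  start <- list(amp = max(d$y), mu = d$x[which.max(d$y)], sigma = diff(mids)[1])
  fit <- tryCatch(
    nls(y ~ amp * exp(-0.5 * ((x - mu) / sigma)^2), data = d, start = start,
        algorithm = "port", lower = c(0, -Inf, 1e-6)),  # bounds: amp >= 0, sigma > 0
    error = function(e) NULL  # nls can fail to converge on awkward data
  )
  if (is.null(fit)) return(c(amp = NA, mu = NA, sigma = NA))
  coef(fit)
}

# Group A from the question
mids_A <- c(-9.5, -8.5, -7.5, -6.5, -5.5, -4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5)
counts_A <- c(1, 0, 1, 5, 9, 38, 56, 105, 529, 2858, 17, 2, 0, 2)
fit_A <- fit_gauss_counts(mids_A, counts_A)
print(fit_A)
```

With the question's list, `t(sapply(mylist, function(h) fit_gauss_counts(h$mids, h$counts)))` gives one row per group. `nls()` only finds a local minimum, so for an irregular group like B the result can depend on the starting values.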
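To convert the histogram back into estimates of the mean and standard deviation, first expand it by repeating each bin's value as many times as that bin's count. This gives an approximation of the original data.
Based on your example above: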
#extract the mid points and create list of simulated data
simdata<-lapply(mylist, function(x){rep(x$mids, x$counts)})
#if the original data were integers then this may give a better estimate
#simdata<-lapply(mylist, function(x){rep(x$breaks[-1], x$counts)})
#find the mean and sd of simulated data
means<-lapply(simdata, mean)
sds<-lapply(simdata, sd)
#or use sapply in the above 2 lines depending on future process needs
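The expansion is only needed so that `mean()` and `sd()` can be called. A minimal sketch that computes the same two numbers directly from the counts, making it explicit that the estimate is a count-weighted mean and SD of the bin values (the helper name `binned_moments` is hypothetical):

```r
# Hypothetical helper: count-weighted mean and SD of the bin values.
# Gives the same numbers as mean()/sd() on rep(mids, counts);
# the SD uses the n - 1 denominator, like sd().
binned_moments <- function(mids, counts) {
  n <- sum(counts)
  m <- weighted.mean(mids, counts)
  s <- sqrt(sum(counts * (mids - m)^2) / (n - 1))
  c(mean = m, sd = s)
}

# Group C from the question
mids_C <- c(-6.5, -5.5, -4.5, -3.5, -2.5, -1.5, -0.5, 0.5)
counts_C <- c(2, 2, 4, 5, 14, 22, 110, 3)
est_C <- binned_moments(mids_C, counts_C)
print(est_C)
```

With the question's list: `t(sapply(mylist, function(h) binned_moments(h$mids, h$counts)))`.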
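If your data are integers, using the break points rather than the midpoints as the bin values will give a better estimate. Depending on how the histogram was computed (i.e. hist()'s right = TRUE/FALSE), the result may be shifted by one.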
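To make the off-by-one concrete, a small sketch with made-up integer data (the values in `x` are illustrative, not from the question): with `right = TRUE`, the default, a value sitting exactly on a break is counted in the bin to its left, so the upper break `breaks[-1]` recovers it; with `right = FALSE` it moves one bin to the right.

```r
# Illustrative integer data (not from the question) and integer breaks
x <- c(1, 2, 2, 3)
brk <- 0:3

# right = TRUE (default): bins are [0,1], (1,2], (2,3]
h_right <- hist(x, breaks = brk, right = TRUE, plot = FALSE)
# right = FALSE: bins are [0,1), [1,2), [2,3]
h_left <- hist(x, breaks = brk, right = FALSE, plot = FALSE)

h_right$counts  # 1 2 1
h_left$counts   # 0 1 3

# With right = TRUE each integer sits on its bin's upper break,
# so repeating breaks[-1] recovers the original values exactly
rep(h_right$breaks[-1], h_right$counts)  # 1 2 2 3
```

For the question's histograms, whether `breaks[-1]` is the right choice therefore depends on which `right` setting produced them.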
Edit
I thought this would be easy. I looked at the video, and the sample data shown there is:
mids<-seq(-7, 7)
counts<-c(7, 1, 2, 2, 2, 5, 217, 70, 18, 0, 2, 1, 2, 0, 1)
simdata<-rep(mids, counts)
The video's result is mean = -0.7359 and SD = 0.4571. The solution I found that comes closest to that result uses the "fitdistrplus" package:
library(fitdistrplus)
fitdist(simdata, "norm", method = "mge")
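Using maximum goodness-of-fit estimation ("mge") gives mean = -0.7597280 and sd = 0.8320465.
At this point the methods above give a close estimate, but not an exact match. I don't know what technique was used in the video to compute the fit.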
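Edit #2
The solutions above recreate the original data and then either take its mean/SD or fit it with the fitdistrplus package. This attempt instead performs a least-squares fit of a Gaussian distribution.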
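Written out, with notation introduced here ($m_i$ the bin midpoints, $c_i$ the counts, $N = \sum_i c_i$, $\phi$ the normal density), the first line is the objective that `nnorm` minimises; the second is, for contrast, an unnormalised Gaussian whose parameter names match PRISM's output (an assumption about what PRISM fits):

```latex
\begin{aligned}
(\hat\mu,\hat\sigma) &= \arg\min_{\mu,\sigma} \sum_i \Bigl(\phi(m_i;\mu,\sigma) - \tfrac{c_i}{N}\Bigr)^2, \\
(\hat A,\hat\mu,\hat\sigma) &= \arg\min_{A,\mu,\sigma} \sum_i \Bigl(c_i - A\exp\bigl(-\tfrac12\bigl(\tfrac{m_i-\mu}{\sigma}\bigr)^2\bigr)\Bigr)^2 .
\end{aligned}
```

Comparing density values with proportions in the first line is reasonable here only because every bin is 1 unit wide, so $\phi(m_i)\cdot 1$ approximates the bin probability. In the second line the area under the curve is not tied to $N$, so the fit can follow the tall central bins and largely ignore the tails; if it were tied, $A$ would equal $N w / (\sigma\sqrt{2\pi})$ for bin width $w$.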
simdata<-lapply(mylist, function(x){rep(x$mids, x$counts)})
means<-sapply(simdata, mean)
sds<-sapply(simdata, sd)
#Data from video
#mids<-seq(-7, 7)
#counts<-c(7, 1, 2, 2, 2, 5, 217, 70, 18, 0, 2, 1, 2, 0, 1)
#make list of the bins and distribution in each bin
mids<-lapply(mylist, function(x){x$mids})
dis<-lapply(mylist, function(x) {x$counts/sum(x$counts)})
#function to perform the least square fit
nnorm<-function(values, mids, dis) {
means<-values[1]
sds<-values[2]
#print(paste(means, sds))
#calculate out the Gaussian distribution for each bin
modeld<-dnorm(mids, means, sds)
#sum of the squares
diff<-sum( (modeld-dis)^2)
diff
}
#use optim function with the mean and sd as initial guesses
#find the minimum with the mean and SD as fit parameters
lapply(1:3, function(i) {optim(c(means[[i]], sds[[i]]), nnorm, mids=mids[[i]], dis=dis[[i]])})
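This solution gives answers closer to the PRISM results, but still not identical. Here is a comparison of all 4 solutions.
From the table, the least-squares fit (the one above) gives the closest approximation. Perhaps adjusting how dnorm is evaluated at the midpoints could help. Case B's data fits a normal distribution worst, yet PRISM still produces a small standard deviation, while the other methods all give similar values. PRISM may apply some kind of data filtering to remove outliers before fitting.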
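The remark about adjusting the midpoint `dnorm` suggests one variant, sketched here under the same least-squares setup: instead of the density at each midpoint, compare the exact normal probability of each bin, `diff(pnorm(breaks, mu, sigma))`, with the observed proportions. The helper `bin_sse` is hypothetical, and whether this moves the estimates closer to PRISM has not been checked.

```r
# Hypothetical helper: sum of squared differences between the normal
# probability mass in each bin and the observed bin proportions.
bin_sse <- function(par, breaks, dis) {
  mu <- par[1]
  sigma <- par[2]
  if (sigma <= 0) return(Inf)           # outside the valid parameter space
  p <- diff(pnorm(breaks, mu, sigma))   # P(lower < X <= upper) for each bin
  sum((p - dis)^2)
}

# Group C from the question
breaks_C <- c(-7, -6, -5, -4, -3, -2, -1, 0, 1)
counts_C <- c(2, 2, 4, 5, 14, 22, 110, 3)
dis_C <- counts_C / sum(counts_C)

# Starting values from the expanded data (midpoints repeated by count), as in the answer
x_C <- rep(head(breaks_C, -1) + diff(breaks_C) / 2, counts_C)
fit_C <- optim(c(mean(x_C), sd(x_C)), bin_sse, breaks = breaks_C, dis = dis_C)
fit_C$par
```

Nelder-Mead (the `optim()` default) tolerates the `Inf` returned for non-positive `sigma` as long as the starting point is finite. For the whole list: `lapply(mylist, function(h) { x <- rep(h$mids, h$counts); optim(c(mean(x), sd(x)), bin_sse, breaks = h$breaks, dis = h$counts / sum(h$counts))$par })`.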