R: Fitting a chi-squared distribution with a large x range
It is easy to get a good fit of a chi-squared distribution over a limited range:
library(MASS)
nnn <- 1000
set.seed(101)
chii <- rchisq(nnn, 4, ncp = 0)  ## Generate a chi-squared sample
chi_df <- fitdistr(chii, "chi-squared", start = list(df = 3), method = "BFGS")  ## Fit
chi_k <- chi_df$estimate[["df"]]  ## Estimated degrees of freedom
chi_hist <- hist(chii, breaks = 50, freq = FALSE)  ## Plot the histogram
curve(dchisq(x, df = chi_k), add = TRUE, col = "green", lwd = 3)  ## Overlay the fitted density
But suppose I have a dataset whose distribution is spread out along the x-axis, with values given by something like:
chii <- 5*rchisq(nnn,4, ncp = 0)
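Without knowing this multiplicative factor of 5 for a real dataset, how do I normalise the rchisq() output (or more complex data) so that fitdistr() gives a good fit?
Thanks in advance for your help!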
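You will have to loop over the degrees of freedom to find the best fit for your data. As you probably know, the mean of a chi-squared distribution equals its degrees of freedom; let's use that to rescale your data and solve your problem.
In short, you loop over candidate degrees of freedom and keep the one that best fits your rescaled data.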
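The rescaling step relies on one identity, written out here as a short derivation. The symbols c (the unknown multiplier), k (the true degrees of freedom) and Y are notation introduced for this note and do not appear in the code.

```latex
X = cY,\quad Y \sim \chi^2_k
\;\Longrightarrow\;
\mathbb{E}[X] = ck
\;\Longrightarrow\;
\frac{X}{\mathbb{E}[X]}\,k = \frac{cY}{ck}\,k = Y \sim \chi^2_k .
```

So when the tested degrees of freedom equal the true k, the rescaled sample is, up to sampling error in the mean, a plain chi-squared sample, and the fitted df should roughly agree with the tested one.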
library(MASS)
nnn <- 1000
set.seed(101)
mult <- round(runif(1, 1, 100))         # random multiplier, treated as unknown below
chii <- mult * rchisq(nnn, 4, ncp = 0)  # generate a scaled chi-squared sample
max_df <- 100                           # largest degrees of freedom to test (1 to 100)
chi_df_disp <- rep(NA_real_, max_df)
# loop over candidate degrees of freedom
for (i in seq_len(max_df)) {
  # rescale the sample so that its mean equals the candidate df
  chii_adjusted <- (chii / mean(chii)) * i
  chi_fit <- fitdistr(chii_adjusted, "chi-squared", start = list(df = i), method = "BFGS")  # fit
  chi_df_disp[i] <- chi_fit$estimate / i  # ratio of fitted df to tested df
}
# pick the candidate whose ratio is closest to 1 (fitted df best matches tested df)
real_df <- which.min(abs(chi_df_disp - 1))
print(real_df)  # estimated degrees of freedom after rescaling
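As a quick cross-check on the loop, the degrees of freedom can also be estimated without any fitting, assuming the data really are a multiple of a chi-squared variable. For X = c·Y with Y chi-squared on k degrees of freedom, the mean is c·k and the variance is 2·c²·k, so k ≈ 2·mean²/var and c ≈ var/(2·mean). The names `samp`, `true_mult`, `k_mom` and `c_mom` are illustrative and not from the original answer; this is a rough sketch, not a replacement for the fit.

```r
set.seed(101)
nnn <- 1000
true_mult <- 5                      # assumed multiplier, used only to simulate data
samp <- true_mult * rchisq(nnn, 4)  # scaled chi-squared sample with df = 4

m <- mean(samp)
v <- var(samp)
k_mom <- 2 * m^2 / v   # method-of-moments estimate of the degrees of freedom
c_mom <- v / (2 * m)   # method-of-moments estimate of the multiplier
print(c(df = k_mom, multiplier = c_mom))
```

If `k_mom` lands far from the `real_df` found by the loop, the scaled chi-squared assumption is probably a poor description of the data.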
Now you can use the "real" degrees of freedom to rescale your data and draw the theoretical density curve.
chii_adjusted <- (chii / mean(chii)) * real_df  ## Rescale so the mean equals real_df
chi_hist <- hist(chii_adjusted, breaks = 50, freq = FALSE)  ## Plot the histogram
curve(dchisq(x, df = real_df), add = TRUE, col = "green", lwd = 3)  ## Overlay the theoretical density
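An alternative that avoids the grid over integer degrees of freedom, offered as a sketch rather than as part of the original answer: a chi-squared variable with k degrees of freedom multiplied by c is a gamma variable with shape k/2 and scale 2c, so `fitdistr()` with `"gamma"` can estimate both quantities at once. The data are divided by their mean first to keep the optimiser well scaled; the names `samp`, `true_mult`, `df_hat` and `mult_hat` are illustrative.

```r
library(MASS)

set.seed(101)
nnn <- 1000
true_mult <- 5                      # assumed multiplier, used only to simulate data
samp <- true_mult * rchisq(nnn, 4)  # scaled chi-squared sample with df = 4

# Rescaling by the mean changes only the gamma rate, not the shape
samp_scaled <- samp / mean(samp)
gam_fit <- suppressWarnings(fitdistr(samp_scaled, "gamma"))

df_hat <- 2 * gam_fit$estimate[["shape"]]  # shape = k/2, so k = 2 * shape
mult_hat <- mean(samp) / df_hat            # mean of c * chi-squared(k) is c * k

print(c(df = df_hat, multiplier = mult_hat))
```

The fitted density can then be drawn over the histogram of the unscaled data with `curve(dgamma(x, shape = df_hat / 2, scale = 2 * mult_hat), add = TRUE)`. Unlike the loop, this estimate of the degrees of freedom is not restricted to integers.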