Fitting distributions with R

Good afternoon. I have a vector 'a' with 16000 values. I obtained descriptive statistics with the following:

library(pastecs)   # provides stat.desc()
library(timeDate)  # provides skewness() and kurtosis()
stat.desc(a)
skewness(a)
kurtosis(a)

In particular, skewness = -0.5012 and kurtosis = 420.8073 (1)

Then I built a histogram of my empirical data:

 hist(a, col="lightblue", breaks = 140, border="white", main="",
         xlab="Value",xlim=c(-0.001,0.001))

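After that, I tried to fit a theoretical distribution to my empirical data. I chose the Variance-Gamma distribution and tried to estimate its parameters from my data: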

library(VarianceGamma)
a_VG <- vgFit(a)

The parameter estimates are: vgC = -11.7485, sigma = 0.4446, theta = 11.7193, nu = 0.1186 (2)

Next, I used the parameters from (2) to draw a sample from the Variance-Gamma distribution and plotted a histogram of the simulated values:

VG<-rvg(length(a),vgC=-11.7485,sigma=0.4446,theta=11.7193,nu=0.1186)
 hist(VG,breaks=140,col="orange",main="",xlab="Value")

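But the second histogram looks completely different from the first (empirical) one, even though it is built from the parameters (2) that I estimated from the empirical data.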

What is wrong with my code? How can I fix it?

P.S. When I enter dput(a[abs(a) > 5e-4]), I get:

c(0.000801110480004752, 0.000588162271316861, 0.000555169128569233, 
0.000502563410256229, 0.000854633994686438, 0.00593622112246628, 
-0.000506168123513007, -0.000502909585836875, 0.000720924373137422, 
0.00119141739181039, 0.000548159382141478, -0.000516511318695123, 
-0.000744590777740584, 0.000595213912401249, 0.000514055190913965, 
-0.000589061375421807, -0.00175392114572581, 0.000745548313668465, 
-0.00075910234096277, -0.00059987613053103, 0.000583568488865538, 
0.00426484136013094, 0.000610760059768012, 0.000575522836335551, 
0.000823785810599276, 0.00181936036509178, -0.00073316272551871, 
-0.00184238143420679, -0.000519146793923397, -0.00120324664043103, 
-0.000882469414168696, -0.00148118339830283, 0.000929612782487155, 
0.000565364610238817, 0.000578158613453894, 0.00060479145432879, 
-0.00520576206828594, 0.000708404040882016, 0.00105224485893451, 
0.000636486872540587, -0.00359655507585543, 0.000769164650506582, 
0.000635701125126786, 0.000570489501935612, -0.000641260260277221, 
0.000735092947873994, 0.000757195823062773, 0.000556002742616357, 
-0.00207489740356159, -0.000553386431560554, 0.000511326871983186, 
0.000504591469525195, -0.000749886905655472, -0.0013939718643865, 
-0.000513742626250036, -0.00105021597423516, -0.00156667292147716, 
0.000864563166150134, 0.00433724128055069, 0.00053855648931922, 
-0.00150732363190365, 0.00052621785349416, 0.000987781100809215, 
0.000560725818171903, 0.00176012436713435, -0.000594895431092368, 
-0.000686229580335151, 0.00138682284509528, -0.000531964338888358, 
-0.00179959148771403, 0.000574543871314503, -0.000686996216439084, 
-0.000559043343629995, 0.00055881173674166, -0.000636332688477736, 
-0.000623778186703561, -0.00173834148094443, -0.000567224129968125, 
-0.00122578683434504, 0.00130960156515414, -0.000548203197176633, 
-0.000522749285863711, -0.000820371086264871, 0.000756014225812507, 
-0.000714081490558627, -0.000617600335221624, 0.000523639760748651, 
-0.000578502663833191, 0.00107478825239227, 0.000612725356974764, 
-0.00065509337422931, 0.000505887803587513, -0.000566716376848575, 
0.000511727090058756, 0.000572807738912218, -0.000756026937699161, 
0.000547948751494332, 0.000628323894238392, -0.000541350489317693, 
-0.00133529454372372, -0.000590618859845904, -0.000700581963648972, 
0.000735987224462775, 0.000528958898682319, 0.000838250041022448, 
-0.000519084424130511, -0.00052258402856431, -0.000538130765869838, 
-0.000631819887885854, 0.00054800880764283, 0.00266115500510899, 
-0.000839092093771754, 0.000559253571783103, -0.000801028189803432, 
-0.000608879021022801, -0.000538018076854385, -0.000689859734395171, 
0.00329650346269972, 0.000765494493951024, -0.000689450477848297, 
-0.000560199139975737, 0.00159082699266122, -0.00208548663121455, 
-0.000598493596793759, 0.000563544422691464, 0.000626996183768824, 
-0.000653166846808162, -0.000851350174739807, -0.00140687473245116, 
-0.000887003220306326, -0.000765614651347946, -0.00100676206277761, 
0.000724714394852555, 0.00108872127644233, -0.000678558537305918, 
-0.000705087556212902, 0.000544828152248655, -0.000791700964308362, 
0.000606125736727137, -0.00119335967326073, 0.00075413211796338, 
0.000526038939010931, 0.00086543737231537, -0.000817788712950573, 
-0.000584070926663571, 0.000619657281937691, 0.000680783312420274, 
-0.000513831718574664, -0.00050972403875349, -0.00114542220685365, 
-0.00070564389723593, -0.01057964950882, -0.000610357922434801, 
0.000818264221596365, 0.000940825400308043, -0.000726555639413817, 
-0.000591089505560305, 0.000564738888193972, -0.00068515060569041, 
0.000668920238348747, -0.00110103375121717, -0.0015480433031172, 
0.000663030855223568, 0.000500097431997304, -0.000600730311271391, 
-0.000672397772962796, -0.000607852365856587, 0.000536711920570809, 
0.000595055206488837, 0.000523123873687581, 0.000977280737528119, 
0.000616410821629998, 0.000788593666889881, -0.000671642905915704, 
0.000717328711735021, -0.000551853104219902, -0.000565153434708421, 
-0.000802585212152707, 0.000536342062561701, 0.000682048510343591, 
-0.000541902545439399, 0.000779676683974273, 0.000698841439971787, 
0.000559313965908359, -0.00064986819016255, 0.000795421518319017, 
0.00364973919549527, 0.000669658692276087, 0.00109045476974678, 
0.000514411572742901, 0.000503832507211754, -0.000507376233564116, 
0.001232871590787, 0.000561820312542594, -0.000501190337518054, 
-0.000769036505996468, -0.000695537959007453, -0.000572065848166048, 
-0.00167929926328192, 0.000597078186826749, 0.00710238430870014, 
0.000745192112519888, -0.00116091022028009, -0.000791139281769659, 
-0.00148898466632552, 0.000565144038962018, -0.000514019821833855, 
-0.00148427996685285, -0.000822717245339888, -0.00062922111212238, 
-0.000636011367371125, 0.00119640327632808, 0.000548455410294579, 
0.000652678152560426, 0.000509244387833618, 0.000961872348987924, 
0.000662064072514568, -0.00068116858054168, -0.000569930302445343, 
0.00188358126928101, 0.00130560555273895, 0.000593470885775105, 
0.00160093110088155, 0.000785262438315115, -0.000912313442922752, 
0.000609996052359563, 0.000720137994393966, 0.000568163899000496, 
0.00128685533068307, -0.000756787473447318, 0.000765932134255465, 
0.00064884753100003, 0.000687571386270847, -0.000582094290400903, 
-0.000693177295971736, -0.000601776208094762, 0.000503616387996786, 
-0.000615095866544735, -0.000799593899689199, 0.000773750859128342, 
-0.000522576090260074, 0.000503578107212022, -0.00104492224837571, 
0.000547928732299141, 0.00310304337507183, 0.000893382870797765, 
-0.000577792878910799, -0.000647710366578735, -0.00061992948706191, 
0.000825702487162516, 0.000606579510524341, 0.000552792484727505, 
0.000688600840895504, 0.000505093563534231, -0.000728420573667066, 
-0.00157924525963438, -0.000603846616019865, -0.000521941317177976, 
0.00150498158245682, -0.000584572670337735, 0.000713757870583365, 
0.000524287801789924, 0.00107217649464886, 0.00213147531822244, 
0.000566012832157625, -0.00069828890607937, 0.000641567963736378, 
-0.000509531713644762, -0.000547564140049417, -0.00115275240244728, 
0.000560465768010943, -0.000651807371497171, -0.00096487058986483, 
0.000753687665266511, -0.000665599418910645, -0.000691278087025182, 
-0.000578010050725553, -0.000685833148198256, 0.000698470819832764, 
0.00102943368139208, -0.000725840586788706, 0.00125882415960632, 
-0.000630791474954151, -0.000764813558678412, -0.000638539347184164, 
0.000654486496518558, 0.000547453642294471, 0.000572020020495501, 
-0.000605791001705214, 0.00660211658324172, 0.00114928683282756, 
0.000985676480677711, -0.000694668292547718, -0.000528955637964401, 
0.000647975568638159, 0.00116454536417443, 0.000506748841724303, 
-0.000500925156604382, -0.000567015088082101, 0.00128711230206946, 
0.000533633762033858, 0.00505991432758357, 0.000518058378462527, 
-0.000592822519784875, 0.00177414999018666, 0.00059845426944527, 
-0.000511614433724716, 0.0016614697907098, 0.000852196464322219, 
0.00241689725305427, -0.000614317948913978, -0.000729717143318709, 
-0.000612900648802039, -0.000727983564232204, -0.000694965869158182, 
-0.000527752006066251, -0.000584233784708843, 0.000522097476268968, 
0.000543092880677776, 0.000947121210698398, -0.00241810275096377, 
0.00181893137435019, 0.000931873879297385, 0.000512116215015013, 
0.000724985702444059, -0.000566713495050664, 0.000603953591362227
)

The data after fitting look like this (empirical histogram in blue, theoretical histogram in orange):

The result is the same when freq=FALSE is included in hist.

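This is all due to outliers in a that are not represented in the histogram you show. They are probably the reason for the very high kurtosis and for the vgFit() algorithm failing to find a good fit.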
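To illustrate how strongly a handful of extreme values can drive kurtosis, here is a minimal sketch in base R. The helper excess_kurtosis() is a hypothetical plain moment formula (not the timeDate implementation), and the normal "bulk" is only an assumed stand-in for the 16000 values, which are not available here. The five extremes are taken from the dput() output in the question.

```r
# Hypothetical helper: moment-based sample excess kurtosis.
excess_kurtosis <- function(x) {
  m  <- mean(x)
  s2 <- mean((x - m)^2)
  mean((x - m)^4) / s2^2 - 3
}

set.seed(42)
# ASSUMPTION: a normal stand-in for the well-behaved bulk of the data.
bulk <- rnorm(16000, mean = 0, sd = 1.5e-4)

# Five of the most extreme values posted in the question.
extremes <- c(-0.01057964950882, 0.00710238430870014, 0.00660211658324172,
              0.00593622112246628, -0.00520576206828594)
with_extremes <- c(bulk, extremes)

k_bulk <- excess_kurtosis(bulk)
k_all  <- excess_kurtosis(with_extremes)
print(c(bulk = k_bulk, with_extremes = k_all))
```

Under these assumptions, the bulk alone has excess kurtosis near zero, while adding just five points out of 16005 pushes it far higher.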

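Type dput(a[abs(a) > 5e-4]) in the console and copy the output into your question. People can then recreate something similar to the vector a without needing all 16000 values, and debug the vgFit problem.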
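One rough way to rebuild a similar vector from such output might look like the sketch below. The name a_approx and the uniform filler are assumptions made purely for illustration: the real distribution of the values inside the cutoff is unknown, so this only reproduces the tails, not the exact shape of the centre.

```r
set.seed(1)
n_total <- 16000   # length of `a` as stated in the question

# Paste the full dput() output here; only three of the posted values are shown.
tail_vals <- c(0.000801110480004752, 0.00593622112246628, -0.01057964950882)

# ASSUMPTION: fill the unseen bulk with uniform draws inside the +/- 5e-4
# cutoff. The true values in this range were not posted.
bulk <- runif(n_total - length(tail_vals), min = -5e-4, max = 5e-4)

a_approx <- c(bulk, tail_vals)
length(a_approx)
range(a_approx)
```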


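Thanks for providing the additional data. There are some extreme values in there, but I don't think they are what is causing the problem with vgFit. Fitting 4 parameters that can take almost any value is difficult, but you can help the optimizer by rescaling the data to typical values. Try this: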

b <- (a-mean(a))/sd(a)
vgf <- vgFit(b)
vgf$param
VG <- rvg(16000, param = vgf$param)
VG_rescaled <- VG*sd(a)+mean(a)
hist(VG_rescaled, breaks=140, col="orange", main="", xlab="Value")

See whether the two histograms are now close enough.
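When comparing, note that the two hist() calls in the question do not share an x-range (only the first one sets xlim), so the plots can look different even for similar samples. A minimal sketch of a like-for-like comparison in base R follows. The vectors emp and sim are stand-ins (scaled t draws, an assumption) so the code runs on its own; replace them with a and VG_rescaled.

```r
set.seed(123)
# ASSUMPTION: stand-in samples; substitute `a` and `VG_rescaled`.
emp <- rt(16000, df = 3) * 1e-4
sim <- rt(16000, df = 3) * 1e-4

# Shared breaks covering both samples, so the bins line up exactly.
rng  <- range(c(emp, sim))
brks <- seq(rng[1], rng[2], length.out = 141)   # 140 bins

h_emp <- hist(emp, breaks = brks, plot = FALSE)
h_sim <- hist(sim, breaks = brks, plot = FALSE)

# Overlay on the density scale, zoomed to the window used in the question.
plot(h_emp, freq = FALSE, col = "lightblue", border = "white",
     xlim = c(-0.001, 0.001), main = "", xlab = "Value")
plot(h_sim, freq = FALSE, col = rgb(1, 0.65, 0, 0.5), border = "white",
     add = TRUE)

# A quantile-quantile plot compares the tails as well as the centre.
qqplot(emp, sim, xlab = "Empirical quantiles", ylab = "Simulated quantiles")
abline(0, 1)
```

This does not change the fit itself. It only makes sure that any remaining difference between the histograms comes from the samples rather than from the plotting ranges.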