How to get cumulative distribution function correctly for my data in python?

Hi all, I have a list of values for which I need the cumulative distribution function. I have stored this list in a variable named yvalues:

[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, 110.0, 111.0, 112.0, 113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0, 128.0, 129.0, 130.0, 131.0, 132.0, 133.0, 134.0, 135.0, 136.0, 137.0, 138.0, 139.0, 140.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0, 153.0, 154.0, 155.0, 156.0, 157.0, 158.0, 159.0, 160.0, 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0, 169.0, 170.0, 171.0, 172.0, 173.0, 174.0, 175.0, 176.0, 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0, 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0, 192.0, 193.0, 194.0, 195.0, 196.0, 197.0, 198.0, 199.0, 200.0, 201.0, 202.0, 203.0, 204.0, 205.0, 206.0, 207.0, 208.0, 209.0, 210.0, 211.0, 212.0, 213.0, 214.0, 215.0, 216.0, 217.0, 218.0, 219.0, 220.0, 221.0, 222.0, 223.0, 224.0, 225.0, 226.0, 227.0, 228.0, 229.0, 230.0, 231.0, 232.0, 233.0, 234.0, 235.0, 236.0, 237.0, 238.0, 239.0, 240.0, 241.0, 242.0, 243.0, 244.0, 245.0, 246.0, 247.0, 248.0, 249.0, 250.0, 251.0, 252.0, 253.0, 254.0, 255.0, 256.0, 257.0, 259.0, 260.0, 261.0, 262.0, 263.0, 264.0, 265.0, 266.0, 267.0, 268.0, 269.0, 270.0, 271.0, 272.0, 273.0, 274.0, 275.0, 276.0, 277.0, 278.0, 279.0, 280.0, 281.0, 282.0, 283.0, 284.0, 285.0, 286.0, 287.0, 288.0, 289.0, 290.0, 291.0, 292.0, 293.0, 294.0, 295.0, 296.0, 298.0, 299.0, 300.0, 301.0, 302.0, 
303.0, 304.0, 305.0, 306.0, 307.0, 308.0, 309.0, 310.0, 311.0, 313.0, 315.0, 316.0, 317.0, 318.0, 319.0, 320.0, 321.0, 322.0, 323.0, 324.0, 325.0, 326.0, 327.0, 328.0, 329.0, 331.0, 332.0, 333.0, 334.0, 335.0, 336.0, 337.0, 338.0, 339.0, 340.0, 341.0, 342.0, 343.0, 344.0, 345.0, 346.0, 347.0, 349.0, 350.0, 352.0, 353.0, 354.0, 355.0, 356.0, 357.0, 358.0, 359.0, 360.0, 362.0, 363.0, 364.0, 365.0, 367.0, 368.0, 370.0, 371.0, 372.0, 375.0, 376.0, 377.0, 378.0, 379.0, 380.0, 381.0, 383.0, 384.0, 386.0, 389.0, 390.0, 391.0, 392.0, 393.0, 395.0, 396.0, 397.0, 398.0, 399.0, 400.0, 402.0, 403.0, 404.0, 405.0, 411.0, 412.0, 413.0, 414.0, 415.0, 416.0, 417.0, 419.0, 420.0, 424.0, 425.0, 426.0, 427.0, 428.0, 429.0, 430.0, 431.0, 432.0, 433.0, 434.0, 435.0, 436.0, 438.0, 439.0, 440.0, 442.0, 443.0, 445.0, 446.0, 447.0, 448.0, 452.0, 454.0, 456.0, 458.0, 460.0, 461.0, 462.0, 463.0, 464.0, 467.0, 468.0, 469.0, 470.0, 475.0, 477.0, 479.0, 480.0, 481.0, 482.0, 483.0, 485.0, 486.0, 487.0, 488.0, 492.0, 493.0, 495.0, 500.0, 502.0, 505.0, 508.0, 509.0, 511.0, 514.0, 515.0, 516.0, 517.0, 518.0, 519.0, 520.0, 524.0, 526.0, 527.0, 528.0, 530.0, 531.0, 532.0, 533.0, 534.0, 535.0, 536.0, 537.0, 540.0, 541.0, 545.0, 546.0, 547.0, 548.0, 551.0, 552.0, 553.0, 555.0, 558.0, 562.0, 563.0, 565.0, 566.0, 567.0, 569.0, 570.0, 572.0, 573.0, 574.0, 575.0, 577.0, 579.0, 583.0, 585.0, 587.0, 588.0, 591.0, 593.0, 594.0, 597.0, 599.0, 601.0, 602.0, 607.0, 610.0, 613.0, 614.0, 622.0, 624.0, 627.0, 629.0, 630.0, 631.0, 632.0, 633.0, 636.0, 637.0, 638.0, 640.0, 645.0, 649.0, 654.0, 655.0, 656.0, 658.0, 662.0, 668.0, 676.0, 677.0, 679.0, 682.0, 685.0, 689.0, 691.0, 696.0, 697.0, 699.0, 700.0, 702.0, 703.0, 706.0, 707.0, 721.0, 722.0, 725.0, 727.0, 731.0, 733.0, 735.0, 740.0, 744.0, 747.0, 751.0, 754.0, 760.0, 770.0, 778.0, 779.0, 781.0, 782.0, 791.0, 798.0, 805.0, 807.0, 825.0, 835.0, 840.0, 846.0, 851.0, 877.0, 882.0, 887.0, 893.0, 900.0, 919.0, 926.0, 929.0, 944.0, 959.0, 961.0, 979.0, 984.0, 1012.0, 
1017.0, 1042.0, 1043.0, 1048.0, 1055.0, 1062.0, 1077.0, 1089.0, 1111.0, 1128.0, 1162.0, 1203.0, 1204.0, 1243.0, 1300.0, 1318.0, 1325.0, 1339.0, 1362.0, 1425.0, 1483.0, 1512.0, 1657.0, 1671.0, 1709.0, 1751.0, 1812.0, 1889.0, 1955.0, 2138.0, 2147.0, 2171.0, 2205.0, 2278.0, 2558.0, 2574.0, 2781.0, 2783.0, 2790.0, 2815.0, 3019.0, 3034.0, 3278.0, 3292.0, 3415.0, 3452.0, 3579.0, 3760.0, 3857.0, 3944.0, 4111.0, 4698.0, 4994.0, 5191.0, 5586.0, 5647.0, 5874.0, 6072.0, 6440.0, 6491.0, 6772.0, 7973.0, 8341.0, 13170.0, 74473.0, 76745.0, 78061.0, 78955.0, 79225.0, 79500.0, 80509.0, 80968.0, 81203.0, 81462.0, 81506.0, 81761.0, 81989.0, 82215.0, 82426.0, 83003.0, 83011.0, 83108.0, 83129.0, 83425.0, 83457.0, 83553.0, 83609.0, 83705.0, 83844.0, 83973.0, 83996.0, 84075.0, 84283.0, 84336.0, 84524.0, 84676.0, 84787.0, 84830.0, 84943.0, 84944.0, 84960.0, 85071.0, 85088.0, 85170.0, 85194.0, 85235.0, 85353.0, 85400.0, 85557.0, 85589.0, 85599.0, 85600.0, 85716.0, 85820.0, 85824.0, 85830.0, 85846.0, 85934.0, 86022.0, 86067.0, 86177.0, 86186.0, 86195.0, 86228.0, 86279.0, 86282.0, 86289.0, 86327.0, 86336.0, 86340.0, 86359.0, 86366.0, 86370.0, 86371.0, 86376.0, 86377.0, 86385.0, 86390.0, 86391.0, 86396.0, 86397.0, 86398.0, 86399.0, 471967.0, 545161.0, 583973.0]

I tried:

import numpy as np

a = yvalues  # the list shown above
num_bins = len(a)
counts, bin_edges = np.histogram(a, bins=num_bins, density=True)  # formerly normed=True
cdf = np.cumsum(counts)

My cdf output is the array below, which is wrong, because the last value of a CDF should be 1. Please help; I don't know what I am doing wrong. Thanks in advance.

>>> cdf array([ 0.0009658 , 0.00104114, 0.00106169, 0.00107539, 0.00108909, 0.00109252, 0.00109594, 0.00110279, 0.00110793, 0.00110964, 0.00111135, 0.00111135, 0.00111135, 0.00111135, 0.00111135, 0.00111135, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111307, 0.00111478, 0.00111478, 0.00111478, 0.00111649, 0.0011182 , 0.00111991, 0.00112334, 0.00112505, 0.00112848, 0.00113533, 0.00113875, 0.00115416, 0.00116615, 0.0011867 , 0.00120896, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 
0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 
0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 
0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00124835, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 
0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125006, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125177, 0.00125348])

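With normed=True (named density=True in newer NumPy), counts can be interpreted as PDF values, i.e. probability densities rather than probabilities: the histogram is scaled so that its area, not the plain sum of counts, equals 1.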

counts, bin_edges = np.histogram(a, bins=num_bins, density=True)  # normed=True in older NumPy
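A quick check of that claim, as a minimal sketch using a small made-up array (`values` is purely illustrative, not the asker's yvalues). It assumes a NumPy version that accepts `density=True`. The densities sum to 1/dx rather than 1, while densities times bin width sum to 1.

```python
import numpy as np

values = np.array([0.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 100.0])  # illustrative data only

# density=True is the current name for the old normed=True option.
counts, bin_edges = np.histogram(values, bins=len(values), density=True)
dx = bin_edges[1] - bin_edges[0]  # every bin has this width when bins is an int

print(counts.sum())         # sum of densities: equals 1/dx here, not 1
print((counts * dx).sum())  # area under the histogram: 1 (up to rounding)
print(1 / dx)
```

This is the same effect seen in the question: a plain cumulative sum of densities ends at 1/dx.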

The CDF is then given by:
dx = bin_edges[1]-bin_edges[0]
cdf = np.cumsum(counts*dx)

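The bin edges are evenly spaced, so dx is constant, and counts*dx is the probability mass of each bin. Taking np.cumsum of these probability masses gives the cumulative distribution function, whose last value is 1. The original code summed the densities without the dx factor, so its last value came out as 1/dx (about 0.00125) instead of 1.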
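A slightly more general form of the same computation, sketched as a hypothetical helper `histogram_cdf` (not a NumPy function): it multiplies each density by its own bin width via `np.diff(bin_edges)`, so it also works when `bins` is an array of unequal edges. As a cross-check, the cumulative raw counts (`density=False`) divided by their total give the same result.

```python
import numpy as np


def histogram_cdf(values, bins):
    """Hypothetical helper: binned CDF evaluated at the right edge of each bin."""
    counts, bin_edges = np.histogram(values, bins=bins, density=True)
    # Probability mass per bin = density * bin width; np.diff also covers unequal widths.
    mass = counts * np.diff(bin_edges)
    return bin_edges[1:], np.cumsum(mass)


values = np.array([0.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 100.0])  # illustrative data only

# Equal-width bins, as in the question (number of bins = number of samples).
x, cdf = histogram_cdf(values, bins=len(values))

# Unequal bin edges work too.
x2, cdf2 = histogram_cdf(values, bins=[0, 5, 20, 100])

# Cross-check: raw counts divided by their total give the same CDF.
raw_counts, _ = np.histogram(values, bins=len(values))
cdf_from_counts = np.cumsum(raw_counts) / raw_counts.sum()

print(cdf)
print(cdf2)
print(np.allclose(cdf, cdf_from_counts))
```

The right bin edges are returned as the x positions because the cumulative mass up to bin i covers everything up to that bin's right edge.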

assert np.allclose(cdf[-1], 1)
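One thing worth noting for this particular data: the values run from 0 to 583973, so with len(a) equal-width bins each bin is about 1/0.00125, roughly 800, wide, and judging from the printed output (first value about 0.00097 against a final 0.00125) around three quarters of the samples land in the very first bin. If the goal is simply the CDF of the sample, an empirical CDF avoids binning altogether. A minimal sketch, with `empirical_cdf` as a hypothetical helper name and made-up illustrative data:

```python
import numpy as np


def empirical_cdf(values):
    """Hypothetical helper: empirical CDF of a 1-D sample, no binning.

    Returns the sorted sample x and y[i] = (i + 1) / n, the fraction of
    samples <= x[i]. With repeated values, the last y at each tie is F(x).
    """
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


values = [0.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 100.0]  # illustrative data only
x, y = empirical_cdf(values)

# Evaluate F(t) at arbitrary points: number of samples <= t, divided by n.
t = np.array([4.0, 50.0, 1000.0])
F_t = np.searchsorted(x, t, side="right") / x.size
print(F_t)
```

Plotting x against y as a staircase (for example with matplotlib's `plt.step(x, y, where="post")`) gives the usual ECDF plot; given the heavy tail in yvalues, a logarithmic x axis may be easier to read.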