cut() function and controlling the frequency within intervals

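My question is simple: the cut() function lets me choose break points along which the range of a vector is split into intervals. I would like to control the number of observations within the newly created intervals, in a way similar to what I get by passing quantile() output as the breaks of a cut() call. However, I don't want to use quantiles, because I want to choose fixed intervals: that way I can match them across different databases for further comparison, and the labels of the newly cut vector contain the same discrete values.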

For the quantile approach I used to do:

df$z <- cut(df$x, quantile(df$x, (0:10)/10), include.lowest = TRUE)

That's straightforward. My new approach is simpler still; it looks something like this:

df$z<-cut(df$x, c(0.04,0.055,0.06,0.065,0.07,0.075,0.08,0.085,0.09,0.095,0.11), include.lowest=T)

Then I have another variable for which I want to compute some statistics by level of the discretized variable.

So it would look like this:

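# cut() labels levels as "[a,b]" strings, so compare on the interval index instead of "1", "2", ...
df$zi <- as.integer(df$z)
df$conf.intx <- ifelse(df$zi == 1, t.test(df$y[df$zi %in% 1])$conf.int[1],
                ifelse(df$zi == 2, t.test(df$y[df$zi %in% 2])$conf.int[1],
                ifelse(df$zi == 3, t.test(df$y[df$zi %in% 3])$conf.int[1],
                ifelse(df$zi == 4, t.test(df$y[df$zi %in% 4])$conf.int[1], NA))))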
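A vectorized alternative to the nested ifelse() is to compute the lower confidence bound once per level of z with tapply() and then look up each row's own bound. This is a minimal sketch, assuming df has a numeric column y and a factor z produced by cut(); the simulated df and its breaks are made-up stand-ins for the real data.

```r
set.seed(42)
# Stand-in data: hypothetical predictor x and response y
df <- data.frame(x = runif(200, 3, 9.5))
df$y <- 2 * df$x + rnorm(200)
df$z <- cut(df$x, breaks = c(3, 4.5, 6, 7.5, 9.5), include.lowest = TRUE)

# Lower bound of the 95% t-test CI of y within each interval of z
lower <- tapply(df$y, df$z, function(v) t.test(v)$conf.int[1])

# Give every row the bound of its own interval; as.vector() drops names/dims
df$conf.intx <- as.vector(lower[as.character(df$z)])
```

Unlike the ifelse() chain, this works for any number of intervals. t.test() still needs at least two values per interval, which is exactly why a minimum count per interval matters.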

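But to compute this t-test confidence interval for each 'pool' of y values (a numeric variable, with as many values as there are observations in the corresponding discrete interval), I need to control the number of values in each interval created for z, so that my tests remain valid, at least as far as the number of observations is concerned.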

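In short, I need an automated procedure that builds the vector of breaks for the z variable so that each interval contains a minimum number of observations. To complicate things further, the breaks should be the same for two different databases, and I don't know whether that is possible.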
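One way to get breaks that are both fixed and well populated is to start from a fine grid chosen in advance and merge adjacent grid cells until every interval holds at least m observations in every dataset at once. The sketch below uses a hypothetical helper, min_count_breaks(), which is not from any package; the grid step of 0.1, m = 30 and the simulated samples are assumptions. Merging left to right is only one possible strategy, and leftover sparse cells at the top are folded into the last interval.

```r
# Hypothetical helper (not from any package): merge adjacent cells of a
# fixed grid until every interval holds at least m values in every sample.
# The grid must cover all values; values outside it are ignored.
min_count_breaks <- function(samples, grid, m) {
  n_cells <- length(grid) - 1
  # counts[i, j] = number of values of sample j falling in grid cell i
  counts <- matrix(
    vapply(samples, function(v)
      as.numeric(table(cut(v, grid, include.lowest = TRUE))),
      numeric(n_cells)),
    nrow = n_cells)
  keep <- grid[1]                # left edge of the first interval
  acc <- numeric(length(samples))
  for (i in seq_len(n_cells)) {
    acc <- acc + counts[i, ]
    if (all(acc >= m)) {         # enough values in every sample: close interval
      keep <- c(keep, grid[i + 1])
      acc[] <- 0
    }
  }
  last <- grid[length(grid)]
  if (keep[length(keep)] != last) {
    # Leftover cells could not reach m on their own
    if (length(keep) >= 2) keep[length(keep)] <- last  # fold into last interval
    else keep <- c(keep, last)                          # no interval reached m
  }
  keep
}

set.seed(1)
a <- rnorm(962, mean = 6.3, sd = 1.1)   # made-up stand-ins for the two databases
b <- rnorm(455, mean = 6.2, sd = 1.2)
grid <- seq(0, 12, by = 0.1)            # fixed in advance, independent of the data
brks <- min_count_breaks(list(a, b), grid, m = 30)
ta <- table(cut(a, brks, include.lowest = TRUE))
tb <- table(cut(b, brks, include.lowest = TRUE))
```

Because the breaks are computed from the pooled counts, both databases share exactly the same intervals; adding a third database later would mean recomputing them.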

Any help on this is welcome; thanks in advance.

EDIT: here is a sample of my x data.

    structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333, 
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175, 
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975, 
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372, 
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833, 
5.1161917, 7.0606333, 5.2622667, 6.780925, 5.4615417, 6.48185, 
5.51585, 6.2224333, 5.3660667, 7.196525, 6.2984083, 7.0137833, 
7.4490083, 5.9712333, 6.4287833, 7.6693917, 6.4406417, 5.4135083, 
7.16245, 7.2267, 5.820325, 6.066175, 5.760975, 6.4775, 6.2625, 
5.5182583, 8.446625, 8.19025, 6.7955333, 4.7899583, 6.5680167, 
4.5965917, 6.3539333, 4.6639, 6.0489667, 4.9047833, 5.353625, 
4.711425, 6.6268833, 5.5458083, 6.3271917, 6.4591417, 5.1843917, 
5.6117167, 7.1828417, 5.6956917, 5.0271917, 6.741875, 6.68305, 
4.7859667, 5.3068667, 5.3245, 5.745675, 5.7518917, 5.37945, 8.0030417, 
7.7064583, 6.2935333, 5.1838667, 6.9369333, 4.9734583, 6.7257167, 
5.0510333, 6.4257667, 5.2858083, 5.7285167, 5.084, 7.0092833, 
5.905875, 6.6893417, 6.8319583, 5.5558083, 5.9854833, 7.5552167, 
6.064625, 5.3990333, 7.115175, 7.0600167, 5.1644833, 5.6848667, 
5.7014417, 6.1051, 6.1186333, 5.7217667, 8.3685417, 8.071325, 
6.6547333, 5.5972417, 7.4226, 5.539725, 7.26335, 5.645975, 6.87475, 
5.8486167, 6.3001667, 5.5997833, 7.4353167, 6.5089583, 7.213625, 
7.3125667, 6.12095, 6.5410083, 8.0639083, 6.6505167, 5.8886417, 
7.6301167, 7.5850417, 5.7693667, 6.2480167, 6.1847167, 6.6896167, 
6.6323917, 6.1972167, 8.8560333, 8.5501083, 7.1036167, 4.9929583, 
6.9839583, 5.3847417, 6.8814417, 5.59555, 6.7867167, 5.7831333, 
6.9370917, 5.7400917, 7.6922, 6.3151, 7.084725, 7.0414417, 5.95435, 
6.4274167, 7.6692167, 6.9159, 6.0856083, 7.3079583, 7.1937667, 
5.744675, 5.946525, 6.0651833, 6.8488833, 6.5924333, 5.772025, 
8.3281167, 8.5475917, 6.7952917, 8.248525, 5.1931083, 7.0688917, 
5.4793583, 7.0091583, 5.7593, 7.1053333, 5.9382583, 7.1765417, 
6.003075, 7.7699833, 6.2757333, 7.2446583, 7.179275, 6.0013083, 
6.447975, 7.7845833, 6.9071083, 6.1009, 7.425425, 7.4619083, 
5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417, 6.0763917, 8.5926583, 
8.7468417, 7.2485167, 8.5096833, 5.1541, 7.0479917, 5.43065, 
6.9689083, 5.7356, 7.0842917, 5.9051667, 7.1283333, 5.9666667, 
7.7295583, 6.249925, 7.21005, 7.1427167, 5.9675583, 6.4135667, 
7.7448583, 6.874275, 6.0679333, 7.388675, 7.429025, 5.911225, 
6.1757167, 6.095225, 7.045775, 6.9870833, 6.0567333, 8.5771167, 
8.7541917, 7.3187333, 8.5092083, 5.5746, 7.342925, 5.8561667, 
7.4704667, 5.922225, 6.9787, 6.1564167, 7.6059667, 5.9122917, 
7.7848833, 6.6192, 7.34055, 7.2352417, 5.9776083, 6.5197583, 
7.4891583, 7.2185667, 6.4710167, 7.70945, 7.5078083, 6.1470417, 
6.66115, 6.6899333, 7.4454083, 7.2270917, 6.350075, 8.3156667, 
8.9007917, 6.7578083, 8.3258083, 5.1996, 6.9688833, 5.3592917, 
6.7583417, 5.5623583, 6.756375, 5.7361, 7.120425, 5.6567, 7.6174667, 
6.1474833, 7.1442167, 6.74475, 5.5820333, 6.0106, 7.142675, 6.667475, 
5.9067917, 7.2392, 7.058675, 5.6394417, 5.9119167, 5.8367333, 
6.798025, 6.694675, 5.8565917, 8.6035083, 8.912375, 7.0501083, 
8.38045, 4.8478083, 6.7493167, 5.3686667, 6.5152333, 5.282025, 
6.5464333, 5.5085583, 6.870975, 5.4757667, 7.318, 5.92225, 6.9300417, 
6.5758083, 5.4233083, 5.8295583, 7.0451, 6.4790083, 5.68255, 
6.9632833, 6.9965833, 5.5005667, 5.717725, 5.5938083, 6.5309, 
6.4824583, 5.4429833, 8.072575, 8.3635, 6.5797167, 8.0352333, 
4.6289833, 6.64105, 4.8883833, 6.2025833, 5.2291833, 6.4814667, 
5.2211083, 6.5780083, 5.196275, 7.030725, 5.6001583, 6.620475, 
6.2858333, 5.114375, 5.5424417, 6.7784917, 6.1561333, 5.339375, 
6.6249083, 6.6248583, 5.139775, 5.4195, 5.4531833, 6.3348583, 
6.4041417, 5.292, 7.6243833, 7.9624583, 6.3226417, 7.761175, 
4.8419083, 6.8384083, 5.3500417, 6.5903333, 5.33275, 6.732575, 
5.4486, 6.8069417, 5.4569583, 7.26275, 5.835525, 6.8680333, 6.6712333, 
5.4720417, 5.904325, 7.1506917, 6.4746833, 5.638675, 6.9570667, 
7.0017333, 5.5033667, 5.6859333, 5.651875, 6.5903, 6.529725, 
5.4819667, 7.971975, 8.2337833, 6.5815333, 7.9736583, 5.7711917, 
7.543325, 5.8986917, 7.5081333, 6.2920333, 7.5321667, 6.4908917, 
7.7616583, 6.4509417, 8.08035, 6.8219, 7.7939167, 7.6491333, 
6.4773583, 6.9338667, 8.1865583, 7.3998917, 6.572125, 7.9198417, 
8.0568, 6.5880333, 6.8299667, 6.7399833, 7.6436, 7.509275, 6.5139833, 
9.1520167, 9.3580667, 7.65415, 9.0725167, 5.7483583, 7.5230417, 
5.89105, 7.4808833, 6.1969667, 7.4923583, 6.4092583, 7.70695, 
6.3970833, 8.0971333, 6.7949083, 7.76445, 7.6170167, 6.4494333, 
6.8997, 8.1575333, 7.3728417, 6.544075, 7.888, 8.0215, 6.5484, 
6.7911667, 6.7121917, 7.6179083, 7.4731167, 6.4629167, 9.1226333, 
9.3307083, 7.6230583, 9.024875, 5.543925, 7.1460833, 5.6575583, 
7.5986083, 6.027075, 7.4386167, 6.3500333, 7.6694833, 6.3682583, 
8.0843333, 6.7181083, 7.7376, 7.5818583, 6.4010667, 6.8440083, 
8.1217917, 7.3290833, 6.5187333, 7.8591667, 7.9898583, 6.5051, 
6.7251167, 6.6881333, 7.477675, 7.3571333, 6.3351833, 8.881575, 
9.12315, 7.3851, 8.8008667, 5.3437833, 7.1560417, 5.5748, 7.4622583, 
5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685, 8.0270917, 
6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417, 8.0401167, 
7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667, 6.6103333, 
7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625, 7.2460917, 
8.660125, 5.2502833, 7.2591, 5.6425417, 6.889925, 5.353675, 6.50635, 
6.260675, 7.4236583, 5.9076417, 7.3915, 6.2134917, 7.1645333, 
6.922675, 6.0295417, 6.1687917, 7.2771083, 6.6152333, 6.3299417, 
7.167325, 6.647275, 5.726475, 5.93905, 6.2888583, 6.7497167, 
6.4364083, 5.8906583, 7.6052917, 8.039425, 6.5672833, 7.8754667, 
6.3086333, 5.352025, 7.2849417, 5.7184833, 6.9675917, 5.5615333, 
6.6157917, 6.3505417, 7.4881, 6.0007417, 7.5110583, 6.35525, 
7.254075, 7.0289083, 6.1994417, 6.2860833, 7.372575, 6.735975, 
6.4628917, 7.3102167, 6.8619417, 5.9123667, 6.1611917, 6.4854083, 
6.8942417, 6.563625, 6.0610083, 7.941625, 8.6969167, 6.66075, 
8.1197167, 6.2802, 3.9638, 5.870825, 4.1852, 5.5841417, 4.3007583, 
5.2352167, 4.4281417, 5.819425, 4.1990917, 5.9338917, 4.89765, 
5.7204333, 5.6546833, 4.5632167, 4.9803333, 5.6962417, 5.247725, 
4.7092583, 6.0145417, 5.6403917, 4.4016917, 4.7181, 4.5007833, 
5.2828917, 5.1314167, 4.7492, 6.777575, 6.9040083, 4.9760583, 
6.4471917, 5.0952833, 3.712725, 5.8215333, 4.025725, 5.5635, 
4.2354083, 5.143525, 4.4900083, 5.6802417, 4.1214333, 5.8128, 
4.7525583, 5.6412583, 5.5534917, 4.487475, 4.8237833, 5.6156917, 
5.0573, 4.5755417, 5.8096083, 5.5252083, 4.3145583, 4.5437417, 
4.194675, 5.0100833, 4.8972333, 4.590025, 6.6441417, 6.5789417, 
4.6947667, 6.1648167, 4.8517333, 3.982925, 5.7966833, 4.1607083, 
5.5564833, 4.2557417, 5.2304083, 4.8661333, 5.912875, 4.4988333, 
6.03915, 4.9131583, 5.8518667, 5.6578583, 4.773225, 4.8958583, 
5.8759833, 5.204725, 4.8961667, 5.9217, 5.58395, 4.5410667, 4.73445, 
4.5922333, 5.2517333, 5.0220333, 4.619475, 6.4883667, 6.429175, 
4.6796417, 6.3171083, 4.93615, 3.9278833, 5.7590417, 4.1155667, 
5.612725, 4.2199833, 5.2126667, 4.805275, 5.8888833, 4.4363, 
6.0380083, 4.892, 5.8192083, 5.64205, 4.708825, 4.8751583, 5.833775, 
5.2210417, 4.853225, 5.924225, 5.5856583, 4.5386167, 4.7280917, 
4.5618, 5.264425, 5.03855, 4.5539, 6.4993, 6.4900667, 4.6749083, 
6.2961333, 4.918525, 4.0890583, 6.33385, 4.3470083, 5.9645, 4.6541833, 
5.5438667, 4.9556583, 6.1590583, 4.6379417, 6.2876833, 5.2235167, 
6.1387167, 6.0547583, 4.9545667, 5.254125, 6.05395, 5.4813417, 
4.9971333, 6.2266583, 5.9172833, 4.7275917, 4.9274917, 4.443575, 
5.3164917, 5.2507083, 5.1704583, 7.173075, 6.9351583, 5.0816667, 
6.5568, 5.3417667, 5.1705167, 7.0777833, 5.6253333, 7.231225, 
5.5799167, 6.6942917, 6.1014583, 7.538725, 5.7152667, 7.459275, 
6.2406083, 7.064925, 6.9234417, 5.8328833, 6.1819583, 7.2127583, 
6.8071583, 6.2599417, 7.2975417, 6.973875, 5.804125, 6.1944667, 
6.38855, 7.0553583, 6.8393167, 6.1275417, 7.9986833, 8.5846, 
6.4682167, 8.0134583, 6.1805917, 5.0699583, 6.9006667, 5.36365, 
6.9204917, 5.4478667, 6.5391583, 6.0647417, 7.2951667, 5.6632833, 
7.25595, 6.1057333, 6.9578417, 6.8235583, 5.8671833, 6.0716417, 
7.060175, 6.5401, 6.1229417, 7.1305083, 6.7823417, 5.62415, 5.9202, 
5.9957167, 6.7142167, 6.4706417, 5.9004667, 7.8304583, 8.2144667, 
6.1530583, 7.6896417, 5.9285333, 4.2625417, 5.9677583, 4.58695, 
6.0400083, 4.4215333, 5.6052833, 5.04165, 6.48845, 4.6423583, 
6.1688833, 5.0256167, 5.926725, 5.7214667, 4.746375, 4.9828, 
6.1583083, 5.6903, 5.217375, 6.1341583, 5.7868083, 4.5895333, 
4.98235, 5.159725, 5.7866167, 5.6300833, 4.882975, 6.7210833, 
7.4314833, 5.2493083, 6.8503833, 5.2225583, 3.8417833, 5.9798, 
4.1168583, 5.63415, 4.3311333, 5.0777667, 4.6606833, 5.789425, 
4.3565167, 5.9736167, 4.8910667, 5.9445417, 5.699275, 4.6897167, 
4.9036083, 5.8767, 5.088675, 4.6224417, 5.8052833, 5.5697167, 
4.3237, 4.6084333, 4.2958833, 5.1394417, 5.0137583, 4.7711, 6.771275, 
6.5984417, 4.845625, 6.3338083, 5.1370333, 3.1820167, 5.2699667, 
3.4827167, 5.0992583, 3.7040583, 4.6358583, 4.1604917, 5.2488333, 
3.7522, 5.3774167, 4.2636167, 5.1998167, 5.0456333, 4.051475, 
4.289175, 5.1718917, 4.5787083, 4.1461667, 5.2983167, 5.03025, 
3.8709333, 4.0917167, 3.731925, 4.5584167, 4.4200333, 4.061375, 
6.064225, 6.02975, 4.1590167, 5.6589083, 4.2614833, 3.68695, 
5.587375, 3.91725, 5.3387, 4.0061667, 4.9563833, 4.1942, 5.6720583, 
3.9584333, 5.6873583, 4.6251, 5.4801417, 5.3975583, 4.2382, 4.6710917, 
5.4898083, 5.0469667, 4.4950083, 5.72005, 5.46085, 4.30355, 4.5525917, 
4.3681667, 5.1723167, 5.0331417, 4.4793083, 6.5492917, 6.720225, 
4.7550917, 6.197775, 4.8082917, 4.09925, 5.986525, 4.3104417, 
5.68455, 4.4287167, 5.3555667, 4.5191083, 5.9269833, 4.2695917, 
5.9984167, 4.981225, 5.8049917, 5.7680667, 4.5736667, 5.0673583, 
5.7443583, 5.2811083, 4.719175, 6.0376667, 5.73875, 4.3947333, 
4.8157333, 4.6093417, 5.3906417, 5.2357417, 4.684825, 6.8885583, 
7.018425, 5.0878167, 6.5122333, 5.2084, 3.810525, 6.2600083, 
3.6246583, 5.7396417, 4.0617917, 5.6724583, 4.2505833, 4.7518417, 
4.1232, 6.208375, 4.5881167, 5.252575, 5.71795, 4.0840583, 4.700325, 
6.2360333, 4.701725, 3.922525, 5.5162167, 5.6220333, 3.8836833, 
4.4883667, 4.5398583)), .Names = "x", row.names = c(NA, -962L
), class = "data.frame")

Suppose I want 30 values ('n') per interval. This is the code I used:

df$z<-cut(df$x, seq(30,length(df$x),by=30)/length(df$x), include.lowest=T)

This gives me:

> table(df$z)

[0.0312,0.0624] (0.0624,0.0936]  (0.0936,0.125]   (0.125,0.156]   (0.156,0.187]   (0.187,0.218]   (0.218,0.249]   (0.249,0.281]   (0.281,0.312]   (0.312,0.343]   (0.343,0.374] 
              0               0               0               0               0               0               0               0               0               0               0 
  (0.374,0.405]   (0.405,0.437]   (0.437,0.468]   (0.468,0.499]    (0.499,0.53]    (0.53,0.561]   (0.561,0.593]   (0.593,0.624]   (0.624,0.655]   (0.655,0.686]   (0.686,0.717] 
              0               0               0               0               0               0               0               0               0               0               0 
  (0.717,0.748]    (0.748,0.78]    (0.78,0.811]   (0.811,0.842]   (0.842,0.873]   (0.873,0.904]   (0.904,0.936]   (0.936,0.967]   (0.967,0.998] 
              0               0               0               0               0               0               0               0               0 
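Every count is 0 because the breaks passed to cut() are probabilities between about 0.031 and 0.998, while x runs from about 3.18 to 9.36 (the limits in the quantile table of this question). Every value of x lies above the last break, becomes NA, and table() does not count NAs. The check below uses a made-up evenly spaced vector over that range instead of the real data, and then maps the same probabilities through quantile() so that they become values of x.

```r
# Stand-in for df$x: same length and roughly the same range as the sample
x <- seq(3.18, 9.36, length.out = 962)

# The breaks from the question: probabilities, not values of x
probs <- seq(30, length(x), by = 30) / length(x)
z <- cut(x, probs, include.lowest = TRUE)

range(probs)     # roughly 0.031 to 0.998
sum(is.na(z))    # 962: every value falls outside the breaks

# Turning the probabilities into x-values with quantile() fixes it
z_ok <- cut(x, quantile(x, c(0, probs, 1)), include.lowest = TRUE)
```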

What I want is a result similar to what I get with quantiles:

df$zbis<-cut(df$x, quantile(df$x, (0:20)/20), include.lowest=T)
table(df$zbis)

[3.18,4.29] (4.29,4.62] (4.62,4.89] (4.89,5.14] (5.14,5.33] (5.33,5.53] (5.53,5.66]  (5.66,5.8]  (5.8,5.94]  (5.94,6.1]  (6.1,6.26] (6.26,6.45] (6.45,6.58] (6.58,6.74] (6.74,6.93] 
         49          48          48          48          48          48          48          48          48          48          48          48          48          48          48 
(6.93,7.14] (7.14,7.34] (7.34,7.62] (7.62,8.06] (8.06,9.36] 
         48          48          48          48          49 

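Except that I want this to be reproducible on another database, so I can't use the quantile function: I would not get the same intervals on different databases.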
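The trade-off can be seen on two simulated samples (made-up stand-ins for the two databases): quantile breaks are recomputed from each sample and therefore differ, while a break vector fixed in advance gives identical levels for both, at the cost of uneven counts per interval. The range 4 to 8.5 and the step of 0.5 are arbitrary assumptions; -Inf and Inf catch anything outside that range.

```r
set.seed(2)
a <- rnorm(962, mean = 6.3, sd = 1.1)   # stand-in for the first database
b <- rnorm(455, mean = 6.0, sd = 1.2)   # stand-in for the second database

# Quantile-based breaks depend on each sample, so they differ
qa <- quantile(a, (0:20) / 20)
qb <- quantile(b, (0:20) / 20)

# Breaks fixed in advance give the same intervals for both samples
fixed <- c(-Inf, seq(4, 8.5, by = 0.5), Inf)
za <- cut(a, fixed)
zb <- cut(b, fixed)
identical(levels(za), levels(zb))   # TRUE
table(za)                           # but counts per interval now vary
table(zb)
```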

SECOND EDIT: here is a second sample from another database. 'x' is the same variable, and the ranges are similar.

structure(list(x = c(5.319125, 7.3036667, 5.5166167, 7.0308333, 
5.6812917, 6.5496583, 5.6621833, 6.4682, 5.4897417, 7.185175, 
6.44905, 7.2055833, 7.629375, 6.2282833, 6.6813917, 7.7976, 6.683975, 
5.5089083, 7.307475, 7.3958667, 6.2036583, 6.2488833, 5.9372, 
6.6180167, 6.4167833, 5.640275, 8.7416917, 8.3134167, 6.8996833, 
5.1931083, 7.0688917, 5.4793583, 7.0091583, 5.7593, 7.1053333, 
5.9382583, 7.1765417, 6.003075, 7.7699833, 6.2757333, 7.2446583, 
7.179275, 6.0013083, 6.447975, 7.7845833, 6.9071083, 6.1009, 
7.425425, 7.4619083, 5.9380667, 6.2116, 6.13315, 7.0852, 7.0047417, 
6.0763917, 8.5926583, 8.7468417, 7.2485167, 8.5096833, 5.177275, 
7.09985, 5.6444667, 7.0102417, 5.7303833, 7.0383333, 5.9870583, 
7.3342083, 5.9363667, 7.7753333, 6.38355, 7.389575, 7.0396667, 
5.889625, 6.29395, 7.51135, 6.940925, 6.1455417, 7.4281833, 7.4657167, 
5.9707083, 6.1902083, 6.0936167, 6.9595167, 6.85065, 5.8525, 
8.5148083, 8.805625, 7.00665, 8.4457, 5.3437833, 7.1560417, 5.5748, 
7.4622583, 5.9412417, 7.3428667, 6.2594167, 7.5839167, 6.28685, 
8.0270917, 6.6388333, 7.6611, 7.50065, 6.3217167, 6.7594417, 
8.0401167, 7.252425, 6.444, 7.77975, 7.9104167, 6.42495, 6.6421667, 
6.6103333, 7.3489417, 7.23205, 6.2059333, 8.726725, 8.994625, 
7.2460917, 8.660125, 3.614125, 5.6345917, 3.9410417, 5.2901417, 
4.0147333, 4.766825, 4.4500417, 5.5189, 4.11375, 5.6350667, 4.5756917, 
5.5998833, 5.3663, 4.44405, 4.5767417, 5.552025, 4.847425, 4.4382583, 
5.5769417, 5.2390667, 4.0610917, 4.4054833, 4.1917, 4.9029083, 
4.6935917, 4.3499417, 6.0562333, 6.081225, 4.45855, 6.0121583, 
4.740275, 4.5028, 6.4177833, 4.8716417, 6.1469917, 4.6208917, 
5.7748083, 5.4530083, 6.694125, 5.0944333, 6.5123167, 5.3257083, 
6.2765333, 6.0149167, 5.1815583, 5.30715, 6.4149083, 5.82245, 
5.515425, 6.3654333, 5.8472833, 4.9798917, 5.1833583, 5.5210333, 
6.0410667, 5.7377917, 5.2666083, 7.0378167, 7.744175, 5.718725, 
7.3220583, 5.24325, 5.3256, 7.2155167, 5.696925, 7.0029667, 5.5235, 
6.7261083, 6.2810667, 7.546825, 5.90915, 7.3299167, 6.2227333, 
7.147075, 6.9142417, 6.0012083, 6.1725333, 7.29815, 6.7, 6.3454583, 
7.2129583, 6.7559833, 5.8115, 6.0756667, 6.458225, 6.9969167, 
6.778825, 6.2245833, 8.0809583, 8.875325, 6.7210917, 8.3203, 
6.3513, 5.2591333, 7.1404917, 5.6266417, 6.9356, 5.4568, 6.6604, 
6.206025, 7.48525, 5.8323667, 7.24635, 6.1446583, 7.066275, 6.8334, 
5.9198667, 6.09505, 7.2206583, 6.63085, 6.270075, 7.1397333, 
6.689125, 5.7441333, 6.042575, 6.38255, 6.9325833, 6.7175667, 
6.1592, 8.00415, 8.8051167, 6.647125, 8.2465667, 6.2788167, 6.49435, 
8.1847583, 6.664475, 8.0528583, 6.6822417, 7.376, 7.1517833, 
8.2306833, 6.8584583, 8.3052167, 7.288375, 8.2758583, 7.7162583, 
7.2807833, 7.0459, 8.2507833, 7.5855, 7.0505917, 8.2230167, 8.1669, 
6.8184667, 6.9700583, 7.0936167, 7.7615667, 7.6239083, 7.0921667, 
9.02585, 9.3416167, 7.6256333, 9.0869333, 8.0984667, 4.116325, 
6.1680917, 4.56965, 5.797725, 4.36085, 5.42455, 5.144075, 6.1531833, 
4.77825, 6.2533417, 5.0192083, 5.99395, 5.6934083, 4.9074167, 
4.9823083, 5.9861667, 5.4068833, 5.1872833, 6.10095, 5.659325, 
4.6632833, 4.86315, 5.221775, 5.5878, 5.3217083, 4.8202333, 6.4883083, 
6.69355, 4.952075, 6.7075583, 5.00015, 5.2502833, 7.2591, 5.6425417, 
6.889925, 5.353675, 6.50635, 6.260675, 7.4236583, 5.9076417, 
7.3915, 6.2134917, 7.1645333, 6.922675, 6.0295417, 6.1687917, 
7.2771083, 6.6152333, 6.3299417, 7.167325, 6.647275, 5.726475, 
5.93905, 6.2888583, 6.7497167, 6.4364083, 5.8906583, 7.6052917, 
8.039425, 6.5672833, 7.8754667, 6.3086333, 5.352025, 7.2849417, 
5.7184833, 6.9675917, 5.5615333, 6.6157917, 6.3505417, 7.4881, 
6.0007417, 7.5110583, 6.35525, 7.254075, 7.0289083, 6.1994417, 
6.2860833, 7.372575, 6.735975, 6.4628917, 7.3102167, 6.8619417, 
5.9123667, 6.1611917, 6.4854083, 6.8942417, 6.563625, 6.0610083, 
7.941625, 8.6969167, 6.66075, 8.1197167, 6.2802, 3.9638, 5.870825, 
4.1852, 5.5841417, 4.3007583, 5.2352167, 4.4281417, 5.819425, 
4.1990917, 5.9338917, 4.89765, 5.7204333, 5.6546833, 4.5632167, 
4.9803333, 5.6962417, 5.247725, 4.7092583, 6.0145417, 5.6403917, 
4.4016917, 4.7181, 4.5007833, 5.2828917, 5.1314167, 4.7492, 6.777575, 
6.9040083, 4.9760583, 6.4471917, 5.0952833, 3.712725, 5.8215333, 
4.025725, 5.5635, 4.2354083, 5.143525, 4.4900083, 5.6802417, 
4.1214333, 5.8128, 4.7525583, 5.6412583, 5.5534917, 4.487475, 
4.8237833, 5.6156917, 5.0573, 4.5755417, 5.8096083, 5.5252083, 
4.3145583, 4.5437417, 4.194675, 5.0100833, 4.8972333, 4.590025, 
6.6441417, 6.5789417, 4.6947667, 6.1648167, 4.8517333, 4.1059833, 
5.9023167, 4.2812417, 5.6593917, 4.3587583, 5.3359583, 4.983275, 
6.0223417, 4.6178333, 6.1545333, 5.0244667, 5.9596, 5.7608833, 
4.8875333, 4.9990583, 5.9919333, 5.3157417, 5.0169333, 6.024775, 
5.6717167, 4.6372083, 4.8370583, 4.7311333, 5.3704, 5.133575, 
4.7174917)), .Names = "x", row.names = c(NA, -455L), class = "data.frame")

Update after some comments

Since you said a minimum number of cases per group is fine for you, I would go with Hmisc::cut2:

v <- rnorm(10, 0, 1)
Hmisc::cut2(v, m = 3) # minimum of 3 cases per group

The documentation of cut2 says:

m   desired minimum number of observations in a group.
    The algorithm does not guarantee that all groups will have at least m observations.

Same cuts for separate variables

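If the distributions of the variables are very similar, you can extract the exact cut points by setting the argument onlycuts = TRUE and reuse them for the other variable. If the distributions differ, you will end up with very few cases in some intervals.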

Using your data (df1 is your first sample, df2 the second):

library(magrittr)
library(Hmisc)

cuts <- cut2(df1$x, g = 20, onlycuts = T) # determine cuts based on df1

cut2(df1$x, cuts = cuts) %>% table
cut2(df2$x, cuts = cuts) %>% table*2 # multiplied by two for better comparison
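If you prefer not to depend on Hmisc, the same idea can be sketched in base R (this is a swapped-in alternative, not what cut2 does internally): compute quantile breaks on the first sample only, replace the outer breaks with -Inf and Inf so that values of the second sample outside the first sample's range are not turned into NA, and reuse the vector. df1 and df2 are simulated stand-ins here.

```r
set.seed(3)
df1 <- data.frame(x = rnorm(962, 6.3, 1.1))   # stand-in for the first sample
df2 <- data.frame(x = rnorm(455, 6.0, 1.2))   # stand-in for the second sample

# 20 groups of roughly equal size, determined on df1 only
cuts <- quantile(df1$x, (0:20) / 20, names = FALSE)
# Open the outer edges so out-of-range values in df2 are not NA
cuts[1] <- -Inf
cuts[length(cuts)] <- Inf

t1 <- table(cut(df1$x, cuts))
t2 <- table(cut(df2$x, cuts))
# df2 counts rescaled to df1's size for easier comparison
round(rbind(df1 = t1, df2 = t2 * nrow(df1) / nrow(df2)), 1)
```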

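This is a good example of how not to ask a question. At last we have an example to which the posted code can be applied. (You apparently pasted the exact code from my comment naively, without thinking about how 'n' and 'N' should be expressed in the context of your question.) I did need to add prob=c( seq(...) , 1) to capture the highest value.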

This assumes you want groups of 100 (although it is still unclear why you need that).

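# The first break is the 100/N quantile, so the lowest 100 values fall outside every interval (NA)
x$xct <- cut(x$x, breaks = quantile(x$x, prob = c(seq(100, length(x$x), by = 100)/length(x$x), 1)))
table(x$xct)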

(4.64,5.17] (5.17,5.57] (5.57,5.85] (5.85,6.17] (6.17,6.51] (6.51,6.85] 
        100         100         100         100         100         100 
(6.85,7.26] (7.26,7.94] (7.94,9.36] 
        100         100          62
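The table above holds only 862 of the 962 values because the lowest 100 fall below the first break. A sketch that keeps them, assuming the same goal of groups of 100: prepend probability 0 and use include.lowest = TRUE. unique() guards against a duplicated 1 (which makes cut() fail with non-unique breaks) when N happens to be a multiple of the group size. The data frame below is a simulated stand-in of the same size as the sample.

```r
set.seed(4)
x <- data.frame(x = runif(962, 3.18, 9.36))   # stand-in with N = 962

size <- 100
N <- nrow(x)
# 0 gives the lowest values their own group; unique() drops a repeated 1
p <- unique(c(0, seq(size, N, by = size) / N, 1))
x$xct <- cut(x$x, breaks = quantile(x$x, probs = p), include.lowest = TRUE)
table(x$xct)   # nine groups of 100, then a last group of 62
```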