How to check convexity of data in R?

I am trying to identify the convexity of data with a function, rather than by visually inspecting plots.

Here is the data:

          x1        x2           x3        y
1   2.302585 0.8340324 -0.181482974 1.455921
2   2.397895 0.8745914 -0.133998493 1.504507
3   2.484907 0.9102351 -0.094052368 1.546571
4   2.564949 0.9419387 -0.059815044 1.583520
5   2.639057 0.9704218 -0.030024476 1.616374
6   2.708050 0.9962289 -0.003778236 1.645887
7   2.772589 1.0197814  0.019588330 1.672631
8   2.833213 1.0414115  0.040577028 1.697047
9   2.890372 1.0613851  0.059574781 1.719478
10  2.944439 1.0799183  0.076885390 1.740201
11  2.995732 1.0971887  0.092751181 1.759435
12  3.044522 1.1133441  0.107368148 1.777364
13  3.091042 1.1285084  0.120896759 1.794136
14  3.135494 1.1427868  0.133469846 1.809879
15  3.178054 1.1562690  0.145198448 1.824699
16  3.218876 1.1690322  0.156176206 1.838688
17  3.258097 1.1811431  0.166482733 1.851924
18  3.295837 1.1926601  0.176186204 1.864476
19  3.332205 1.2036341  0.185345397 1.876404
20  3.367296 1.2141100  0.194011296 1.887760
21  3.401197 1.2241275  0.202228379 1.898591
22  3.433987 1.2337220  0.210035646 1.908938
23  3.465736 1.2429250  0.217467466 1.918838
24  3.496508 1.2517646  0.224554260 1.928325
25  3.526361 1.2602663  0.231323069 1.937428
26  3.555348 1.2684530  0.237798021 1.946174
27  3.583519 1.2763453  0.244000729 1.954587
28  3.610918 1.2839620  0.249950617 1.962690
29  3.637586 1.2913203  0.255665198 1.970502
30  3.663562 1.2984358  0.261160310 1.978041
31  3.688879 1.3053227  0.266450321 1.985326
32  3.713572 1.3119942  0.271548296 1.992370
33  3.737670 1.3184623  0.276466149 1.999187
34  3.761200 1.3247381  0.281214769 2.005792
35  3.784190 1.3308318  0.285804134 2.012195
36  3.806662 1.3367528  0.290243403 2.018407
37  3.828641 1.3425100  0.294541006 2.024440
38  3.850148 1.3481115  0.298704714 2.030301
39  3.871201 1.3535648  0.302741703 2.036000
40  3.891820 1.3588770  0.306658617 2.041545
41  3.912023 1.3640546  0.310461612 2.046944
42  3.931826 1.3691039  0.314156406 2.052203
43  3.951244 1.3740304  0.317748315 2.057329
44  3.970292 1.3788396  0.321242292 2.062327
45  3.988984 1.3835366  0.324642955 2.067205
46  4.007333 1.3881260  0.327954621 2.071967
47  4.025352 1.3926123  0.331181324 2.076617
48  4.043051 1.3969997  0.334326845 2.081162
49  4.060443 1.4012921  0.337394728 2.085605
50  4.077537 1.4054932  0.340388301 2.089950
51  4.094345 1.4096066  0.343310691 2.094201
52  4.110874 1.4136356  0.346164843 2.098362
53  4.127134 1.4175833  0.348953529 2.102437
54  4.143135 1.4214527  0.351679364 2.106428
55  4.158883 1.4252465  0.354344815 2.110339
56  4.174387 1.4289676  0.356952216 2.114173
57  4.189655 1.4326183  0.359503770 2.117932
58  4.204693 1.4362012  0.362001567 2.121620
59  4.219508 1.4397185  0.364447583 2.125238
60  4.234107 1.4431723  0.366843695 2.128789
61  4.248495 1.4465649  0.369191683 2.132275
62  4.262680 1.4498980  0.371493238 2.135699
63  4.276666 1.4531738  0.373749966 2.139062
64  4.290459 1.4563938  0.375963396 2.142367
65  4.304065 1.4595599  0.378134984 2.145615
66  4.317488 1.4626738  0.380266116 2.148808
67  4.330733 1.4657369  0.382358113 2.151948
68  4.343805 1.4687508  0.384412236 2.155036
69  4.356709 1.4717169  0.386429689 2.158074
70  4.369448 1.4746367  0.388411622 2.161063
71  4.382027 1.4775113  0.390359131 2.164005
72  4.394449 1.4803422  0.392273270 2.166901
73  4.406719 1.4831305  0.394155042 2.169753
74  4.418841 1.4858774  0.396005410 2.172561
75  4.430817 1.4885839  0.397825296 2.175327
76  4.442651 1.4912513  0.399615585 2.178052
77  4.454347 1.4938805  0.401377124 2.180737
78  4.465908 1.4964726  0.403110725 2.183384
79  4.477337 1.4990284  0.404817171 2.185992
80  4.488636 1.5015490  0.406497210 2.188564
81  4.499810 1.5040351  0.408151563 2.191099
82  4.510860 1.5064877  0.409780924 2.193600
83  4.521789 1.5089076  0.411385958 2.196066
84  4.532599 1.5112956  0.412967306 2.198499
85  4.543295 1.5136525  0.414525586 2.200900
86  4.553877 1.5159789  0.416061392 2.203269
87  4.564348 1.5182757  0.417575296 2.205607
88  4.574711 1.5205435  0.419067852 2.207914
89  4.584967 1.5227830  0.420539590 2.210192
90  4.595120 1.5249948  0.421991025 2.212442
91  4.605170 1.5271796  0.423422652 2.214663
92  4.615121 1.5293380  0.424834950 2.216856
93  4.624973 1.5314705  0.426228380 2.219022
94  4.634729 1.5335777  0.427603389 2.221162
95  4.644391 1.5356602  0.428960408 2.223277
96  4.653960 1.5377185  0.430299854 2.225366
97  4.663439 1.5397532  0.431622130 2.227430
98  4.672829 1.5417646  0.432927627 2.229470
99  4.682131 1.5437534  0.434216722 2.231486
100 4.691348 1.5457199  0.435489780 2.233479
101 4.700480 1.5476647  0.436747155 2.235450
102 4.709530 1.5495882  0.437989191 2.237398
103 4.718499 1.5514907  0.439216219 2.239325
104 4.727388 1.5533728  0.440428562 2.241230
105 4.736198 1.5552348  0.441626530 2.243114
106 4.744932 1.5570771  0.442810428 2.244977
107 4.753590 1.5589002  0.443980548 2.246820
108 4.762174 1.5607043  0.445137176 2.248644
109 4.770685 1.5624898  0.446280589 2.250448
110 4.779123 1.5642572  0.447411053 2.252233
111 4.787492 1.5660066  0.448528831 2.254000
112 4.795791 1.5677386  0.449634176 2.255748
113 4.804021 1.5694533  0.450727333 2.257478
114 4.812184 1.5711511  0.451808541 2.259191
115 4.820282 1.5728323  0.452878034 2.260886
116 4.828314 1.5744973  0.453936037 2.262564
117 4.836282 1.5761462  0.454982769 2.264225
118 4.844187 1.5777794  0.456018445 2.265870
119 4.852030 1.5793972  0.457043273 2.267499
120 4.859812 1.5809998  0.458057455 2.269112
121 4.867534 1.5825875  0.459061188 2.270709
122 4.875197 1.5841606  0.460054665 2.272291
123 4.882802 1.5857192  0.461038071 2.273858
124 4.890349 1.5872637  0.462011589 2.275411
125 4.897840 1.5887943  0.462975396 2.276948
126 4.905275 1.5903111  0.463929665 2.278471
127 4.912655 1.5918145  0.464874564 2.279981
128 4.919981 1.5933047  0.465810258 2.281476
129 4.927254 1.5947818  0.466736906 2.282958
130 4.934474 1.5962461  0.467654665 2.284426
131 4.941642 1.5976978  0.468563687 2.285881
132 4.948760 1.5991370  0.469464120 2.287323
133 4.955827 1.6005641  0.470356109 2.288752
134 4.962845 1.6019791  0.471239796 2.290169
135 4.969813 1.6033823  0.472115319 2.291573
136 4.976734 1.6047738  0.472982813 2.292966
137 4.983607 1.6061539  0.473842408 2.294346
138 4.990433 1.6075226  0.474694234 2.295714
139 4.997212 1.6088802  0.475538416 2.297071
140 5.003946 1.6102269  0.476375077 2.298416
141 5.010635 1.6115627  0.477204337 2.299750
142 5.017280 1.6128879  0.478026312 2.301072
143 5.023881 1.6142026  0.478841118 2.302384
144 5.030438 1.6155070  0.479648865 2.303685
145 5.036953 1.6168013  0.480449664 2.304975
146 5.043425 1.6180854  0.481243622 2.306255
147 5.049856 1.6193597  0.482030842 2.307524
148 5.056246 1.6206243  0.482811428 2.308784
149 5.062595 1.6218792  0.483585480 2.310033
150 5.068904 1.6231247  0.484353094 2.311272
151 5.075174 1.6243608  0.485114368 2.312502
152 5.081404 1.6255877  0.485869395 2.313721
153 5.087596 1.6268055  0.486618267 2.314932
154 5.093750 1.6280143  0.487361074 2.316133
155 5.099866 1.6292143  0.488097904 2.317324
156 5.105945 1.6304056  0.488828843 2.318507
157 5.111988 1.6315883  0.489553975 2.319681
158 5.117994 1.6327625  0.490273383 2.320845
159 5.123964 1.6339284  0.490987149 2.322001
160 5.129899 1.6350859  0.491695351 2.323149
161 5.135798 1.6362353  0.492398067 2.324287
162 5.141664 1.6373767  0.493095373 2.325418
163 5.147494 1.6385101  0.493787345 2.326540
164 5.153292 1.6396357  0.494474056 2.327654
165 5.159055 1.6407535  0.495155576 2.328760
166 5.164786 1.6418637  0.495831977 2.329858
167 5.170484 1.6429663  0.496503328 2.330948
168 5.176150 1.6440615  0.497169696 2.332030
169 5.181784 1.6451493  0.497831147 2.333104
170 5.187386 1.6462299  0.498487747 2.334171
171 5.192957 1.6473033  0.499139560 2.335231
172 5.198497 1.6483696  0.499786649 2.336283
173 5.204007 1.6494288  0.500429074 2.337327
174 5.209486 1.6504812  0.501066896 2.338365
175 5.214936 1.6515268  0.501700175 2.339395
176 5.220356 1.6525656  0.502328968 2.340419
177 5.225747 1.6535977  0.502953333 2.341435
178 5.231109 1.6546232  0.503573326 2.342445
179 5.236442 1.6556423  0.504189002 2.343448
180 5.241747 1.6566548  0.504800414 2.344444
181 5.247024 1.6576611  0.505407616 2.345433
182 5.252273 1.6586610  0.506010660 2.346416
183 5.257495 1.6596547  0.506609598 2.347393
184 5.262690 1.6606423  0.507204479 2.348363
185 5.267858 1.6616239  0.507795352 2.349326
186 5.273000 1.6625994  0.508382267 2.350284
187 5.278115 1.6635690  0.508965271 2.351235
188 5.283204 1.6645327  0.509544412 2.352181
189 5.288267 1.6654906  0.510119734 2.353120
190 5.293305 1.6664428  0.510691284 2.354053
191 5.298317 1.6673893  0.511259105 2.354981
192 5.303305 1.6683302  0.511823242 2.355902
193 5.308268 1.6692655  0.512383738 2.356818
194 5.313206 1.6701954  0.512940635 2.357728
195 5.318120 1.6711199  0.513493974 2.358633
196 5.323010 1.6720389  0.514043797 2.359532
197 5.327876 1.6729527  0.514590144 2.360425
198 5.332719 1.6738612  0.515133054 2.361314
199 5.337538 1.6747645  0.515672566 2.362196
200 5.342334 1.6756627  0.516208719 2.363074
201 5.347108 1.6765558  0.516741550 2.363946
202 5.351858 1.6774438  0.517271096 2.364813
203 5.356586 1.6783269  0.517797394 2.365674
204 5.361292 1.6792050  0.518320480 2.366531
205 5.365976 1.6800783  0.518840389 2.367383
206 5.370638 1.6809467  0.519357155 2.368229
207 5.375278 1.6818104  0.519870814 2.369071
208 5.379897 1.6826693  0.520381398 2.369908
209 5.384495 1.6835235  0.520888942 2.370740
210 5.389072 1.6843731  0.521393476 2.371567
211 5.393628 1.6852182  0.521895035 2.372390
212 5.398163 1.6860587  0.522393649 2.373208
213 5.402677 1.6868946  0.522889349 2.374021
214 5.407172 1.6877262  0.523382166 2.374830
215 5.411646 1.6885533  0.523872131 2.375634
216 5.416100 1.6893761  0.524359273 2.376433
217 5.420535 1.6901945  0.524843622 2.377229
218 5.424950 1.6910087  0.525325207 2.378019
219 5.429346 1.6918186  0.525804055 2.378806
220 5.433722 1.6926244  0.526280195 2.379588
221 5.438079 1.6934259  0.526753655 2.380366
222 5.442418 1.6942234  0.527224461 2.381140
223 5.446737 1.6950168  0.527692642 2.381909
224 5.451038 1.6958061  0.528158222 2.382675
225 5.455321 1.6965915  0.528621229 2.383436
226 5.459586 1.6973729  0.529081687 2.384193
227 5.463832 1.6981503  0.529539623 2.384947
228 5.468060 1.6989239  0.529995061 2.385696
229 5.472271 1.6996936  0.530448026 2.386441
230 5.476464 1.7004596  0.530898541 2.387183
231 5.480639 1.7012217  0.531346632 2.387920
232 5.484797 1.7019801  0.531792321 2.388654
233 5.488938 1.7027347  0.532235632 2.389384
234 5.493061 1.7034857  0.532676587 2.390110
235 5.497168 1.7042331  0.533115210 2.390833
236 5.501258 1.7049768  0.533551522 2.391552
237 5.505332 1.7057170  0.533985546 2.392267
238 5.509388 1.7064536  0.534417303 2.392979
239 5.513429 1.7071867  0.534846815 2.393687
240 5.517453 1.7079163  0.535274102 2.394391
241 5.521461 1.7086425  0.535699186 2.395092
242 5.525453 1.7093652  0.536122088 2.395790
243 5.529429 1.7100846  0.536542826 2.396484
244 5.533389 1.7108006  0.536961422 2.397175
245 5.537334 1.7115132  0.537377895 2.397862
246 5.541264 1.7122226  0.537792264 2.398546
247 5.545177 1.7129286  0.538204550 2.399227
248 5.549076 1.7136314  0.538614769 2.399904
249 5.552960 1.7143310  0.539022943 2.400578
250 5.556828 1.7150275  0.539429088 2.401249

If we plot the data,

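# assumes the data above is stored in a data frame named df, as in the answers below
plot(y ~ x1, data = df, type = "l")
plot(y ~ x2, data = df, type = "l")
plot(y ~ x3, data = df, type = "l")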

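we see that the first plot looks concave, the second looks straight, and the third looks slightly convex. Is there a function in R that can test this? That is, is there a function that can identify the convexity of data (rather than of a function)?

Thanks!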

You can use the CVXR package to identify the convexity of data. I will give an example of how to approach this. See this link for additional information (https://cran.r-project.org/web/packages/CVXR/vignettes/cvxr_intro.html):

library(kableExtra)
set.seed(123)

n <- 100
p <- 10
beta <- -4:5   # beta is just -4 through 5.

X <- matrix(rnorm(n * p), nrow=n)
colnames(X) <- paste0("beta_", beta)
Y <- X %*% beta + rnorm(n)

ls.model <- lm(Y ~ 0 + X)   # There is no intercept in our model above
m <- data.frame(ls.est = coef(ls.model))
rownames(m) <- paste0("$\\beta_{", 1:p, "}$")

# load packages
suppressWarnings(library(CVXR, warn.conflicts=FALSE))

betaHat <- Variable(p)

objective <- Minimize(sum((Y - X %*% betaHat)^2))

problem <- Problem(objective)

result <- solve(problem)

m <- cbind(coef(ls.model), result$getValue(betaHat))
colnames(m) <- c("lm est.", "CVXR est.")
rownames(m) <- paste0("$\\beta_{", 1:p, "}$")
kbl(m)

Output: (a table comparing the lm and CVXR coefficient estimates; not reproduced here)

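Method 1:

Adapting the convexity measure from this paper, measure how much longer (proportionally) the path along the points is than the path along the convex hull of the same points.

Base R solution: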

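# Length of the polyline through the points (x, y), taken in the given order
path_length <- function(x, y) {
  sum(sqrt(diff(x)^2 + diff(y)^2))
}

conv_meas <- function(x, y) {
  # Convex hull of the data plus two extra corner points at min(y),
  # placed at max(x) and min(x)
  ch <- chull(c(x, range(x)[2:1]), c(y, rep(min(y), 2)))
  # Keep only the hull vertices that are original data points
  ch <- ch[ch <= length(x)]
  pl <- path_length(x[ch], y[ch])
  # Proportional excess of the full path over the hull path
  (path_length(x, y) - pl)/pl
}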

conv_meas(df$x1, df$y)
#> [1] 1.053814e-11
conv_meas(df$x2, df$y)
#> [1] 1.407418e-09
conv_meas(df$x3, df$y)
#> [1] 0.002536735

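A value of exactly zero means the points are perfectly convex, in the sense that every point lies on the hull. The further from zero, the less convex. Because the hull is closed along min(y), it is the concave-looking x1 curve that scores closest to zero here.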
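As a quick sanity check of how to read this measure, here is a minimal sketch on synthetic curves whose shape is known in advance. It uses a slightly modified copy of the function above under the hypothetical name conv_meas_sorted, which sorts the hull indices so the hull path is always walked from left to right (chull() returns indices in clockwise order, with no guaranteed starting point). The test curves sqrt(x) and x^2 are chosen purely for illustration and are not part of the original data.

```r
# Length of the polyline through the points, in the order given
path_length <- function(x, y) {
  sum(sqrt(diff(x)^2 + diff(y)^2))
}

# Hypothetical variant of conv_meas() from Method 1: the same computation,
# except that the hull indices are sorted (assumes x is in increasing order)
conv_meas_sorted <- function(x, y) {
  ch <- chull(c(x, range(x)[2:1]), c(y, rep(min(y), 2)))
  ch <- sort(ch[ch <= length(x)])
  pl <- path_length(x[ch], y[ch])
  (path_length(x, y) - pl) / pl
}

x <- seq(0, 1, length.out = 101)
m_concave <- conv_meas_sorted(x, sqrt(x))  # concave-down curve
m_convex  <- conv_meas_sorted(x, x^2)      # concave-up curve
print(c(concave = m_concave, convex = m_convex))
```

The concave-down curve should come out at (numerically) zero, since all of its points sit on the hull, while the concave-up curve gives a clearly positive value.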

Method 2:

Compute the convex hull along the upper and lower "inside" of the curve, and take the ratio of the two path lengths.

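conv_meas <- function(x, y) {
  yrng <- range(y)
  # x-coordinates of the data plus two corner points at max(x) and min(x)
  xhull <- c(x, range(x)[2:1])
  # One hull with the corners placed at min(y), one with them at max(y)
  chDown <- chull(xhull, c(y, rep(yrng[1], 2)))
  chUp <- chull(xhull, c(y, rep(yrng[2], 2)))
  # Drop the corner points and put the remaining indices in data order
  chDown <- sort(chDown[chDown <= length(x)])
  chUp <- sort(chUp[chUp <= length(x)])
  # Ratio of the two hull path lengths
  path_length(x[chDown], y[chDown])/path_length(x[chUp], y[chUp])
}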

conv_meas(df$x1, df$y)
#> [1] 1.003598
conv_meas(df$x2, df$y)
#> [1] 1.000438
conv_meas(df$x3, df$y)
#> [1] 0.9974697
set.seed(123)
conv_meas(1:100, log(1:100) + runif(100))
#> [1] 1.007082
conv_meas(1:100, exp((1:100 - 50)/20) + runif(100))
#> [1] 0.9871246
conv_meas(1:100, -(1:100))
#> [1] 1

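Values < 1 indicate convex (concave up); values > 1 indicate concave (concave down). A value of exactly 1 indicates a line (or rather, that the data are no more convex than concave).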
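If a label is more convenient than a raw ratio, the rule above can be wrapped in a small helper. The following is only a sketch: classify_shape and its tol argument are hypothetical names, and the tolerance of 1e-6 is an arbitrary choice that would need tuning for noisy data (with it, the x2 ratio of 1.000438 above would still be reported as concave).

```r
# Length of the polyline through the points, in the order given
path_length <- function(x, y) {
  sum(sqrt(diff(x)^2 + diff(y)^2))
}

# Method 2 ratio, copied from the answer above
conv_meas <- function(x, y) {
  yrng <- range(y)
  xhull <- c(x, range(x)[2:1])
  chDown <- chull(xhull, c(y, rep(yrng[1], 2)))
  chUp <- chull(xhull, c(y, rep(yrng[2], 2)))
  chDown <- sort(chDown[chDown <= length(x)])
  chUp <- sort(chUp[chUp <= length(x)])
  path_length(x[chDown], y[chDown]) / path_length(x[chUp], y[chUp])
}

# Hypothetical helper: map the ratio to a label, treating values within
# `tol` of 1 as a straight line
classify_shape <- function(x, y, tol = 1e-6) {
  r <- conv_meas(x, y)
  if (abs(r - 1) <= tol) {
    "linear"
  } else if (r < 1) {
    "convex"
  } else {
    "concave"
  }
}

x <- seq(0, 1, length.out = 101)
labels <- c(classify_shape(x, x^2),
            classify_shape(x, sqrt(x)),
            classify_shape(x, 2 * x + 1))
print(labels)
```

On these three synthetic curves the helper should report convex, concave, and linear, in that order.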