Reduce large data-frame of samples to ensure maximum variability between samples
I have a list of vectors, where each entry in the list is a vector of indices, for example:
list(c(563L, 688L, 630L, 160L, 568L, 908L, 457L, 798L, 3L, 558L,
56L, 389L, 506L, 106L, 807L, 556L, 809L, 63L, 343L, 242L, 470L,
894L, 804L, 970L, 406L, 881L, 893L, 952L, 126L, 827L, 282L, 910L,
61L, 66L, 763L, 787L, 337L, 41L, 712L, 144L, 450L, 12L, 200L,
574L, 945L, 236L, 336L, 684L, 280L, 721L, 233L, 686L, 64L, 504L,
174L, 934L, 40L, 850L, 26L, 799L, 853L, 978L), c(85L, 564L, 591L,
662L, 377L, 536L, 325L, 402L, 72L, 410L, 687L, 216L, 603L, 67L,
794L, 388L, 627L, 376L, 863L, 491L, 598L, 861L, 991L, 651L, 670L,
401L, 459L, 39L, 997L, 806L, 623L, 954L), c(427L, 791L, 212L,
779L, 657L, 740L, 800L, 838L, 104L, 985L, 167L, 486L, 685L, 739L,
60L, 862L, 130L, 134L, 175L, 375L, 683L, 885L, 575L, 859L, 341L,
726L, 472L, 802L, 76L, 424L, 177L, 624L, 189L, 334L, 378L, 329L,
581L, 224L, 851L, 218L, 993L, 678L, 248L, 365L, 188L, 774L, 58L,
813L, 514L, 59L, 777L, 485L, 606L, 480L, 826L, 350L, 608L, 27L,
661L, 775L, 340L, 10L, 207L, 260L, 483L, 150L, 205L), c(138L,
587L, 165L, 1L, 722L, 300L, 500L, 535L, 832L, 392L, 432L, 139L,
744L, 676L, 839L, 107L, 769L, 589L, 647L, 548L, 704L, 197L, 689L,
111L, 342L, 319L, 567L, 17L, 925L, 5L, 116L, 493L, 241L, 965L
), c(89L, 440L, 228L, 884L, 88L, 147L, 413L, 821L, 70L, 95L,
71L, 917L, 463L, 990L, 672L, 981L, 765L, 937L, 75L, 766L, 374L,
636L, 449L, 816L, 1000L, 356L, 629L), c(421L, 650L, 453L, 666L,
584L, 717L, 220L, 605L, 182L, 811L, 157L, 523L, 28L, 527L, 737L,
812L, 263L, 675L, 132L, 879L, 438L, 451L, 883L, 950L, 114L, 466L,
348L, 711L, 209L, 887L, 593L, 949L, 349L, 764L, 595L, 736L, 660L,
801L, 118L, 877L), c(23L, 231L, 78L, 988L, 55L, 57L, 753L, 994L,
437L, 202L, 842L, 190L, 822L, 968L, 331L, 733L, 782L, 886L, 105L,
943L, 743L, 815L, 311L, 498L, 792L, 795L, 184L, 728L, 573L, 771L,
117L, 251L, 192L, 735L, 15L, 776L, 295L, 677L, 631L, 235L, 237L,
705L, 856L, 97L, 725L), c(229L, 671L, 129L, 405L, 115L, 644L,
98L, 492L, 871L, 935L, 435L, 707L, 773L, 754L, 803L, 120L, 656L,
345L, 875L, 330L, 533L, 366L, 240L, 408L, 332L, 577L, 550L, 452L,
963L, 8L, 187L, 226L, 901L, 371L, 426L, 339L, 519L, 86L, 501L,
274L, 831L), c(16L, 79L, 68L, 477L, 133L, 659L, 2L, 973L, 264L,
953L, 90L, 234L, 420L, 588L, 21L, 788L, 363L, 539L, 227L, 565L,
30L, 642L, 786L, 982L, 347L, 680L, 52L, 96L, 592L, 409L, 643L,
81L, 419L, 245L, 658L, 416L, 590L, 448L, 819L, 277L, 357L, 442L,
789L, 516L, 980L, 93L, 998L, 149L, 166L, 299L, 454L, 529L, 986L,
127L, 541L, 45L, 829L, 289L, 418L, 179L, 310L, 113L, 729L), c(429L,
781L, 303L, 434L, 83L, 259L, 387L, 583L, 393L, 770L, 246L, 428L,
947L, 976L, 31L, 382L, 710L, 944L, 164L, 868L, 373L, 899L, 74L,
468L, 614L, 701L, 221L, 645L, 268L, 785L, 293L, 632L, 24L, 749L,
283L, 741L, 796L, 915L), c(258L, 844L, 649L, 752L, 474L, 613L,
351L, 551L, 309L, 380L, 497L, 724L, 327L, 992L, 845L, 607L, 818L,
693L, 914L, 291L, 720L, 633L, 974L, 367L, 639L, 94L, 467L, 92L,
522L, 141L, 496L, 276L, 542L, 665L, 695L, 634L, 602L, 913L, 396L,
597L, 443L, 892L, 65L, 394L, 222L, 778L, 169L, 960L, 35L, 655L,
422L, 927L, 154L, 215L, 262L, 203L, 880L, 217L, 423L, 755L, 904L,
180L, 620L), c(507L, 628L, 29L, 902L, 738L, 897L, 664L, 967L,
294L, 682L, 254L, 302L, 128L, 559L, 511L, 526L, 7L, 742L, 464L,
621L, 265L, 599L, 102L, 546L, 458L, 969L, 751L, 860L, 326L, 873L,
335L, 580L, 499L, 962L, 290L, 557L, 213L, 716L, 53L, 835L, 600L,
610L, 321L, 673L, 713L, 876L, 244L, 462L, 136L, 272L, 195L, 447L,
230L, 679L, 465L, 611L, 297L, 731L, 44L, 824L, 162L, 837L), c(446L,
561L, 391L, 652L, 857L, 946L, 560L, 784L, 854L, 204L, 512L, 82L,
455L, 372L, 407L, 328L, 808L, 152L, 178L, 185L, 543L, 108L, 473L,
490L, 955L, 719L, 757L, 198L, 338L, 223L, 919L, 531L, 653L, 734L,
923L, 487L, 637L, 398L, 431L, 46L, 848L, 324L, 948L, 43L, 183L,
288L, 697L, 87L, 307L, 42L, 571L, 360L, 433L, 390L, 569L, 956L,
534L, 6L, 381L, 549L, 301L, 920L, 69L, 322L, 267L, 503L, 285L,
961L, 370L, 425L), c(344L, 959L, 364L, 552L, 11L, 481L, 287L,
891L, 692L, 762L, 47L, 292L, 358L, 810L, 942L, 730L, 746L, 638L,
750L, 759L, 761L, 140L, 444L, 191L, 805L, 306L, 691L, 170L, 715L,
508L, 984L, 461L, 911L, 103L, 938L, 718L, 928L), c(124L, 284L,
123L, 513L, 417L, 933L, 121L, 168L, 208L, 385L, 32L, 273L, 869L,
932L, 397L, 509L, 239L, 797L, 379L, 723L, 898L, 163L, 320L, 833L,
151L, 906L, 648L, 732L, 279L, 834L, 489L, 840L, 783L, 971L, 49L,
145L, 253L, 352L, 137L, 261L, 247L, 143L, 544L, 109L, 921L, 830L,
972L, 585L, 690L, 609L, 703L, 250L, 708L, 225L, 889L, 181L, 987L,
54L, 502L, 148L, 355L, 888L, 579L, 983L, 825L, 855L, 62L, 918L,
979L, 586L, 681L, 384L, 709L, 333L, 758L, 194L, 368L), c(646L,
930L, 361L, 399L, 13L, 298L, 395L, 975L, 482L, 940L, 596L, 772L,
700L, 843L, 171L, 537L, 173L, 836L, 767L, 989L, 532L, 890L, 99L,
865L, 142L, 135L, 271L, 346L, 441L, 48L, 941L, 866L, 201L, 872L,
36L, 520L, 530L, 77L, 270L), c(238L, 699L, 22L, 50L, 615L, 702L,
4L, 469L, 101L, 314L, 616L, 995L, 996L, 414L, 566L, 249L, 572L,
369L, 553L, 158L, 159L, 199L, 317L, 515L, 517L, 524L, 562L, 19L,
476L, 20L, 146L, 618L, 895L, 312L, 912L), c(768L, 939L, 578L,
849L, 196L, 640L, 323L, 635L, 304L, 318L, 874L, 977L, 488L, 619L,
155L, 905L, 9L, 112L, 484L, 847L, 313L, 900L, 494L, 727L, 625L,
931L, 119L, 846L, 186L, 219L, 471L, 696L, 404L, 460L, 668L, 896L,
439L, 964L, 275L, 756L, 411L, 878L, 538L, 669L, 478L, 570L, 255L,
547L, 257L, 841L, 37L, 576L, 456L, 663L, 525L, 817L, 612L, 820L
), c(243L, 594L, 33L, 176L, 415L, 667L, 748L, 852L, 232L, 922L,
308L, 436L, 153L, 505L, 14L, 281L, 316L, 495L, 540L, 622L, 156L,
926L, 521L, 698L, 545L, 760L, 84L, 210L, 359L, 131L, 745L, 34L,
91L, 555L, 858L, 445L, 867L, 125L, 814L, 604L, 706L, 315L, 654L,
747L, 936L, 269L, 957L), c(80L, 924L, 110L, 193L, 958L, 296L,
475L, 18L, 907L, 626L, 999L, 278L, 362L, 51L, 641L, 211L, 929L,
122L, 694L, 73L, 353L, 25L, 100L, 305L, 864L, 214L, 790L, 286L,
518L, 674L, 206L, 400L, 554L, 903L, 780L, 916L, 38L, 430L, 617L,
823L, 172L, 966L, 412L, 951L, 510L, 828L, 479L, 909L, 266L, 582L,
870L, 882L, 161L, 252L, 256L, 383L, 403L, 601L, 386L, 793L, 528L,
354L, 714L))
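where each entry (i.e. each vector in the list) represents a group obtained with a clustering method.
I have the following piece of code, which takes this list of vectors and the desired number of samples, and returns a data frame in which each row represents one sample and each column holds the index drawn from one of the groups.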
groups_samples <- function(groups, repetition) {
return(as.data.frame(sapply(groups, sample, repetition, TRUE)))
}
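For example, the following call (with the list above stored in ll) returns a data frame such as the one shown after it: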
df <- groups_samples(ll, 100)
structure(list(V1 = c(106L, 686L, 721L, 200L, 970L, 910L, 556L,
807L, 908L, 568L, 688L, 389L, 56L, 470L, 630L, 893L, 574L, 236L,
804L, 798L, 721L, 934L, 763L, 807L, 457L, 568L, 684L, 934L, 787L,
450L, 688L, 64L, 568L, 934L, 894L, 558L, 568L, 343L, 450L, 853L,
336L, 64L, 712L, 144L, 934L, 144L, 809L, 763L, 457L, 763L, 558L,
457L, 688L, 763L, 504L, 66L, 406L, 881L, 3L, 343L, 556L, 799L,
712L, 568L, 61L, 799L, 908L, 688L, 64L, 881L, 236L, 787L, 66L,
160L, 853L, 343L, 809L, 200L, 827L, 893L, 894L, 799L, 470L, 406L,
337L, 389L, 63L, 952L, 236L, 337L, 763L, 41L, 945L, 144L, 56L,
978L, 233L, 978L, 881L, 910L), V2 = c(72L, 651L, 861L, 651L,
591L, 72L, 564L, 662L, 402L, 623L, 603L, 377L, 401L, 603L, 598L,
67L, 991L, 376L, 67L, 325L, 325L, 377L, 536L, 861L, 564L, 670L,
806L, 377L, 687L, 603L, 954L, 627L, 67L, 388L, 954L, 564L, 991L,
564L, 591L, 863L, 376L, 991L, 85L, 85L, 564L, 598L, 591L, 687L,
806L, 564L, 401L, 72L, 603L, 536L, 459L, 603L, 954L, 67L, 216L,
410L, 687L, 806L, 623L, 388L, 67L, 401L, 491L, 662L, 85L, 627L,
598L, 954L, 459L, 591L, 997L, 687L, 687L, 536L, 863L, 459L, 670L,
459L, 603L, 401L, 39L, 687L, 39L, 651L, 991L, 376L, 388L, 954L,
997L, 85L, 39L, 627L, 861L, 670L, 39L, 459L), V3 = c(424L, 775L,
862L, 791L, 683L, 826L, 60L, 205L, 802L, 740L, 58L, 985L, 683L,
341L, 838L, 212L, 993L, 59L, 851L, 657L, 375L, 885L, 150L, 167L,
218L, 205L, 58L, 260L, 341L, 661L, 791L, 350L, 726L, 378L, 188L,
150L, 60L, 813L, 774L, 104L, 207L, 207L, 485L, 514L, 424L, 514L,
859L, 130L, 350L, 188L, 188L, 740L, 859L, 177L, 212L, 802L, 606L,
104L, 608L, 260L, 329L, 993L, 427L, 427L, 485L, 472L, 859L, 424L,
661L, 514L, 791L, 678L, 993L, 726L, 188L, 340L, 483L, 150L, 340L,
514L, 606L, 248L, 205L, 188L, 581L, 813L, 175L, 657L, 862L, 775L,
212L, 341L, 27L, 885L, 575L, 334L, 350L, 486L, 483L, 340L), V4 = c(138L,
493L, 111L, 241L, 548L, 107L, 548L, 965L, 839L, 1L, 139L, 1L,
165L, 769L, 111L, 965L, 548L, 1L, 676L, 319L, 689L, 769L, 567L,
197L, 139L, 319L, 319L, 832L, 116L, 500L, 392L, 704L, 689L, 500L,
689L, 832L, 165L, 138L, 116L, 676L, 197L, 589L, 832L, 165L, 925L,
165L, 647L, 832L, 116L, 744L, 587L, 925L, 500L, 116L, 107L, 832L,
500L, 319L, 17L, 925L, 116L, 548L, 17L, 107L, 676L, 111L, 832L,
925L, 111L, 107L, 17L, 722L, 139L, 432L, 319L, 548L, 241L, 769L,
319L, 17L, 689L, 342L, 165L, 722L, 676L, 319L, 197L, 241L, 139L,
139L, 111L, 744L, 689L, 722L, 965L, 432L, 647L, 432L, 1L, 111L
), V5 = c(816L, 95L, 884L, 821L, 88L, 374L, 981L, 672L, 70L,
71L, 89L, 95L, 374L, 75L, 917L, 765L, 917L, 449L, 71L, 884L,
766L, 70L, 672L, 89L, 816L, 937L, 937L, 440L, 413L, 1000L, 1000L,
413L, 70L, 356L, 821L, 440L, 990L, 821L, 147L, 356L, 629L, 374L,
766L, 766L, 71L, 937L, 89L, 95L, 917L, 937L, 937L, 449L, 95L,
463L, 1000L, 440L, 821L, 884L, 917L, 816L, 89L, 1000L, 766L,
356L, 765L, 440L, 75L, 463L, 440L, 440L, 765L, 636L, 672L, 629L,
88L, 356L, 374L, 374L, 463L, 95L, 463L, 75L, 71L, 89L, 449L,
88L, 990L, 884L, 765L, 463L, 884L, 672L, 463L, 449L, 629L, 821L,
981L, 75L, 990L, 440L), V6 = c(650L, 675L, 737L, 466L, 883L,
877L, 209L, 887L, 584L, 263L, 605L, 132L, 584L, 950L, 650L, 451L,
737L, 453L, 348L, 675L, 949L, 349L, 209L, 584L, 801L, 593L, 711L,
666L, 466L, 605L, 527L, 666L, 584L, 717L, 114L, 660L, 118L, 466L,
811L, 595L, 438L, 28L, 593L, 811L, 118L, 711L, 605L, 593L, 466L,
650L, 801L, 438L, 348L, 349L, 118L, 584L, 114L, 584L, 801L, 209L,
157L, 466L, 801L, 182L, 812L, 132L, 523L, 666L, 605L, 527L, 950L,
950L, 812L, 421L, 584L, 801L, 132L, 182L, 737L, 887L, 883L, 605L,
737L, 711L, 28L, 675L, 220L, 157L, 118L, 887L, 675L, 132L, 736L,
811L, 887L, 438L, 182L, 717L, 737L, 950L), V7 = c(994L, 202L,
311L, 725L, 437L, 725L, 776L, 295L, 792L, 57L, 57L, 295L, 842L,
15L, 776L, 331L, 822L, 795L, 78L, 988L, 498L, 822L, 988L, 782L,
776L, 728L, 631L, 725L, 735L, 573L, 105L, 295L, 23L, 78L, 202L,
117L, 190L, 705L, 105L, 57L, 792L, 251L, 251L, 968L, 192L, 23L,
231L, 822L, 295L, 231L, 631L, 842L, 57L, 235L, 815L, 331L, 117L,
705L, 331L, 994L, 795L, 237L, 815L, 815L, 23L, 822L, 235L, 631L,
78L, 97L, 57L, 192L, 677L, 184L, 57L, 231L, 231L, 753L, 733L,
237L, 743L, 677L, 631L, 988L, 815L, 311L, 815L, 311L, 771L, 728L,
23L, 988L, 728L, 705L, 97L, 988L, 994L, 57L, 728L, 192L), V8 = c(754L,
875L, 332L, 935L, 86L, 339L, 86L, 644L, 339L, 501L, 803L, 229L,
644L, 426L, 550L, 129L, 330L, 129L, 229L, 86L, 773L, 803L, 129L,
901L, 452L, 8L, 229L, 98L, 129L, 366L, 187L, 8L, 773L, 187L,
229L, 8L, 98L, 935L, 98L, 345L, 754L, 533L, 332L, 550L, 240L,
875L, 773L, 229L, 426L, 754L, 120L, 803L, 129L, 901L, 901L, 644L,
345L, 707L, 707L, 773L, 533L, 120L, 332L, 330L, 803L, 86L, 803L,
8L, 226L, 345L, 871L, 240L, 550L, 963L, 330L, 345L, 226L, 533L,
366L, 452L, 803L, 405L, 803L, 405L, 550L, 577L, 8L, 339L, 901L,
577L, 330L, 229L, 330L, 656L, 452L, 330L, 519L, 226L, 366L, 435L
), V9 = c(643L, 953L, 642L, 21L, 592L, 16L, 127L, 539L, 409L,
516L, 419L, 277L, 986L, 590L, 45L, 980L, 998L, 516L, 541L, 980L,
454L, 81L, 149L, 986L, 227L, 45L, 420L, 363L, 986L, 90L, 409L,
986L, 953L, 45L, 982L, 588L, 68L, 127L, 127L, 16L, 418L, 21L,
953L, 442L, 418L, 419L, 565L, 980L, 659L, 16L, 149L, 448L, 789L,
454L, 516L, 2L, 127L, 79L, 277L, 980L, 234L, 357L, 357L, 642L,
980L, 680L, 729L, 81L, 21L, 454L, 986L, 357L, 980L, 973L, 680L,
592L, 788L, 2L, 264L, 79L, 680L, 729L, 52L, 986L, 539L, 79L,
277L, 416L, 786L, 477L, 113L, 454L, 419L, 442L, 953L, 79L, 245L,
788L, 93L, 234L), V10 = c(31L, 468L, 468L, 387L, 164L, 796L,
701L, 785L, 915L, 614L, 741L, 770L, 770L, 583L, 373L, 373L, 393L,
221L, 303L, 83L, 74L, 785L, 387L, 741L, 741L, 393L, 468L, 701L,
382L, 393L, 387L, 899L, 429L, 947L, 781L, 781L, 645L, 645L, 710L,
915L, 74L, 796L, 259L, 749L, 373L, 393L, 246L, 632L, 785L, 259L,
614L, 785L, 428L, 741L, 632L, 382L, 770L, 710L, 781L, 749L, 868L,
915L, 434L, 221L, 429L, 303L, 393L, 468L, 632L, 976L, 781L, 373L,
947L, 428L, 781L, 781L, 645L, 868L, 645L, 710L, 283L, 31L, 868L,
583L, 915L, 246L, 373L, 373L, 781L, 164L, 428L, 710L, 373L, 303L,
632L, 868L, 614L, 947L, 74L, 382L), V11 = c(351L, 154L, 423L,
496L, 818L, 913L, 665L, 913L, 380L, 720L, 542L, 380L, 634L, 551L,
258L, 818L, 634L, 474L, 222L, 639L, 974L, 755L, 262L, 665L, 522L,
217L, 927L, 351L, 755L, 914L, 380L, 65L, 844L, 633L, 613L, 222L,
649L, 892L, 752L, 423L, 755L, 169L, 904L, 309L, 639L, 276L, 217L,
394L, 291L, 522L, 203L, 720L, 35L, 422L, 724L, 423L, 720L, 914L,
180L, 327L, 92L, 422L, 258L, 467L, 724L, 620L, 665L, 367L, 639L,
443L, 892L, 724L, 141L, 422L, 327L, 396L, 92L, 309L, 844L, 258L,
914L, 634L, 497L, 222L, 141L, 880L, 467L, 443L, 496L, 913L, 394L,
217L, 35L, 396L, 35L, 880L, 351L, 755L, 474L, 215L), V12 = c(102L,
546L, 682L, 464L, 162L, 876L, 162L, 302L, 682L, 162L, 302L, 53L,
967L, 679L, 837L, 824L, 44L, 53L, 294L, 738L, 254L, 557L, 546L,
7L, 902L, 244L, 128L, 499L, 621L, 499L, 458L, 526L, 837L, 465L,
290L, 969L, 265L, 507L, 835L, 837L, 546L, 136L, 897L, 213L, 195L,
244L, 465L, 835L, 464L, 621L, 162L, 511L, 969L, 230L, 580L, 335L,
610L, 969L, 546L, 897L, 835L, 447L, 526L, 302L, 464L, 302L, 682L,
628L, 610L, 272L, 53L, 254L, 969L, 962L, 511L, 621L, 290L, 458L,
559L, 860L, 136L, 507L, 462L, 136L, 462L, 731L, 873L, 462L, 335L,
897L, 580L, 447L, 628L, 731L, 7L, 335L, 102L, 128L, 679L, 742L
), V13 = c(108L, 637L, 757L, 734L, 534L, 42L, 808L, 322L, 757L,
204L, 808L, 324L, 288L, 82L, 285L, 961L, 955L, 652L, 808L, 961L,
503L, 549L, 697L, 87L, 734L, 43L, 204L, 455L, 398L, 961L, 183L,
433L, 431L, 854L, 490L, 69L, 407L, 808L, 398L, 69L, 87L, 338L,
446L, 178L, 6L, 198L, 82L, 543L, 370L, 534L, 87L, 267L, 455L,
360L, 534L, 407L, 431L, 446L, 854L, 857L, 46L, 637L, 848L, 923L,
560L, 531L, 919L, 223L, 307L, 561L, 6L, 719L, 560L, 43L, 734L,
288L, 324L, 87L, 808L, 322L, 757L, 446L, 425L, 324L, 757L, 857L,
87L, 848L, 223L, 503L, 307L, 152L, 503L, 757L, 956L, 152L, 43L,
69L, 719L, 637L), V14 = c(746L, 805L, 191L, 47L, 508L, 508L,
715L, 461L, 928L, 750L, 140L, 746L, 364L, 552L, 287L, 984L, 481L,
715L, 762L, 959L, 750L, 344L, 959L, 959L, 306L, 911L, 103L, 638L,
759L, 761L, 750L, 444L, 692L, 692L, 761L, 481L, 552L, 942L, 810L,
938L, 306L, 762L, 344L, 942L, 344L, 364L, 552L, 891L, 11L, 103L,
762L, 287L, 891L, 358L, 730L, 959L, 750L, 191L, 718L, 959L, 358L,
306L, 287L, 692L, 746L, 461L, 750L, 170L, 358L, 911L, 805L, 938L,
481L, 759L, 750L, 140L, 715L, 959L, 928L, 692L, 461L, 750L, 306L,
762L, 691L, 306L, 287L, 481L, 170L, 746L, 810L, 762L, 358L, 292L,
750L, 191L, 47L, 942L, 344L, 191L), V15 = c(987L, 972L, 151L,
397L, 250L, 825L, 681L, 825L, 723L, 49L, 585L, 109L, 833L, 137L,
49L, 690L, 681L, 253L, 385L, 921L, 708L, 151L, 109L, 385L, 54L,
247L, 979L, 121L, 225L, 124L, 825L, 417L, 320L, 979L, 681L, 918L,
145L, 397L, 681L, 145L, 586L, 709L, 284L, 840L, 121L, 368L, 250L,
898L, 840L, 109L, 417L, 513L, 544L, 194L, 417L, 544L, 320L, 987L,
840L, 987L, 888L, 489L, 855L, 906L, 62L, 579L, 379L, 783L, 368L,
379L, 49L, 732L, 279L, 509L, 54L, 145L, 797L, 979L, 709L, 840L,
368L, 830L, 502L, 123L, 681L, 194L, 855L, 703L, 247L, 833L, 609L,
830L, 708L, 609L, 509L, 397L, 987L, 609L, 320L, 124L), V16 = c(346L,
48L, 865L, 865L, 173L, 890L, 482L, 13L, 537L, 171L, 482L, 940L,
843L, 173L, 975L, 866L, 142L, 646L, 482L, 700L, 395L, 298L, 975L,
890L, 361L, 173L, 890L, 975L, 940L, 271L, 395L, 989L, 395L, 142L,
865L, 361L, 399L, 441L, 441L, 772L, 142L, 520L, 142L, 520L, 975L,
930L, 890L, 989L, 530L, 866L, 941L, 530L, 596L, 890L, 36L, 441L,
346L, 865L, 173L, 646L, 270L, 441L, 866L, 866L, 346L, 441L, 482L,
872L, 36L, 890L, 271L, 13L, 36L, 836L, 767L, 395L, 890L, 537L,
395L, 530L, 346L, 346L, 940L, 173L, 865L, 772L, 520L, 171L, 48L,
866L, 135L, 298L, 135L, 77L, 361L, 872L, 395L, 596L, 772L, 532L
), V17 = c(912L, 146L, 312L, 22L, 618L, 317L, 618L, 199L, 369L,
101L, 515L, 4L, 476L, 699L, 517L, 317L, 159L, 517L, 553L, 616L,
995L, 314L, 317L, 314L, 562L, 101L, 249L, 369L, 615L, 562L, 476L,
702L, 312L, 312L, 515L, 101L, 159L, 572L, 101L, 618L, 895L, 317L,
616L, 618L, 572L, 562L, 4L, 517L, 312L, 312L, 249L, 699L, 312L,
158L, 469L, 20L, 524L, 476L, 572L, 249L, 50L, 19L, 249L, 912L,
469L, 476L, 101L, 146L, 616L, 618L, 476L, 20L, 146L, 249L, 50L,
101L, 158L, 517L, 238L, 515L, 895L, 553L, 702L, 146L, 312L, 517L,
158L, 895L, 517L, 101L, 314L, 238L, 22L, 146L, 317L, 895L, 469L,
912L, 369L, 572L), V18 = c(525L, 635L, 488L, 456L, 878L, 119L,
119L, 849L, 768L, 817L, 931L, 275L, 460L, 900L, 494L, 669L, 846L,
488L, 768L, 494L, 570L, 439L, 878L, 275L, 471L, 896L, 768L, 619L,
727L, 977L, 155L, 155L, 896L, 112L, 817L, 768L, 411L, 304L, 964L,
612L, 905L, 768L, 456L, 255L, 119L, 404L, 304L, 576L, 219L, 756L,
612L, 668L, 255L, 768L, 196L, 668L, 155L, 931L, 896L, 878L, 488L,
576L, 640L, 37L, 846L, 494L, 257L, 37L, 411L, 411L, 625L, 820L,
304L, 112L, 619L, 9L, 669L, 494L, 471L, 323L, 318L, 570L, 817L,
578L, 878L, 696L, 977L, 768L, 896L, 525L, 669L, 841L, 471L, 727L,
619L, 304L, 874L, 931L, 37L, 619L), V19 = c(926L, 281L, 957L,
308L, 315L, 814L, 622L, 153L, 858L, 315L, 867L, 176L, 555L, 210L,
867L, 540L, 555L, 867L, 622L, 852L, 540L, 436L, 269L, 505L, 436L,
505L, 654L, 505L, 91L, 125L, 131L, 706L, 243L, 125L, 922L, 281L,
91L, 359L, 33L, 957L, 232L, 698L, 555L, 540L, 667L, 34L, 545L,
698L, 555L, 308L, 926L, 445L, 316L, 748L, 243L, 14L, 521L, 232L,
654L, 243L, 232L, 359L, 156L, 131L, 555L, 359L, 521L, 852L, 706L,
957L, 308L, 125L, 91L, 852L, 315L, 604L, 604L, 760L, 604L, 936L,
521L, 747L, 922L, 555L, 243L, 521L, 316L, 867L, 84L, 176L, 814L,
232L, 315L, 316L, 555L, 505L, 745L, 505L, 232L, 540L), V20 = c(554L,
882L, 823L, 386L, 966L, 694L, 286L, 354L, 214L, 25L, 25L, 110L,
353L, 475L, 479L, 252L, 582L, 999L, 266L, 211L, 18L, 278L, 828L,
412L, 528L, 386L, 296L, 353L, 412L, 80L, 206L, 714L, 18L, 211L,
475L, 554L, 38L, 882L, 25L, 362L, 510L, 110L, 206L, 823L, 362L,
694L, 256L, 479L, 582L, 25L, 828L, 193L, 951L, 80L, 793L, 999L,
882L, 903L, 38L, 386L, 354L, 214L, 916L, 25L, 110L, 864L, 882L,
25L, 353L, 780L, 296L, 864L, 510L, 38L, 386L, 400L, 694L, 793L,
999L, 122L, 278L, 475L, 916L, 903L, 958L, 161L, 828L, 73L, 790L,
73L, 430L, 18L, 958L, 828L, 582L, 383L, 51L, 278L, 18L, 122L)), class = "data.frame", row.names = c(NA,
-100L))
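What I want to do now is reduce the number of entries, say from 100 to 50, where each entry is a set of indices, one from each group. I tried computing a distance matrix with several methods and selecting the most distant entries, but when I inspected the result it was not very informative.
Is there a way to do this, perhaps by taking the list of vectors into account, or with some other more sophisticated method?
Any help/insights would be appreciated.
Edit - clarifying the objective
Suppose I draw 100 groups, each containing 1 element from each vector in the list.
Some groups are close to others: say two groups differ in only 1 element, so I might want to discard one of them. Or they might differ in only 2 elements, and so on. In the end I want to keep the K groups that are as "distant" from each other as possible.
It would also be nice if the number of elements in each vector could be taken into account, as some kind of weighting.
Edit No. 2
For the following list(c(1L, 5L, 6L), c(3L, 4L, 2L, 9L), c(8L, 7L, 10L))
we get the following data frame: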
structure(list(V1 = c(1L, 5L, 6L, 1L, 6L, 1L, 1L, 6L, 1L, 5L,
5L, 5L, 1L, 1L, 5L, 6L, 5L, 6L, 6L, 5L, 5L, 5L, 6L, 5L, 6L, 1L,
6L, 1L, 1L, 1L, 5L, 5L, 6L, 6L, 5L, 1L, 6L, 6L, 5L, 6L, 1L, 1L,
5L, 5L, 5L, 1L, 6L, 5L, 1L, 5L, 5L, 5L, 5L, 1L, 5L, 5L, 1L, 6L,
5L, 6L, 5L, 6L, 5L, 1L, 5L, 1L, 5L, 6L, 5L, 1L, 6L, 1L, 6L, 1L,
1L, 5L, 5L, 6L, 1L, 5L, 1L, 5L, 5L, 6L, 6L, 1L, 1L, 6L, 6L, 6L,
5L, 5L, 1L, 6L, 1L, 1L, 6L, 5L, 5L, 1L), V2 = c(9L, 3L, 9L, 4L,
2L, 4L, 3L, 3L, 3L, 2L, 2L, 9L, 3L, 3L, 2L, 2L, 9L, 9L, 9L, 3L,
4L, 3L, 2L, 3L, 4L, 2L, 2L, 3L, 4L, 9L, 9L, 2L, 3L, 2L, 9L, 9L,
3L, 2L, 4L, 4L, 3L, 4L, 3L, 2L, 2L, 9L, 9L, 2L, 4L, 4L, 4L, 9L,
2L, 3L, 9L, 3L, 3L, 2L, 2L, 2L, 4L, 2L, 4L, 3L, 3L, 3L, 2L, 9L,
9L, 9L, 2L, 9L, 3L, 3L, 9L, 4L, 3L, 3L, 4L, 3L, 4L, 4L, 4L, 4L,
2L, 9L, 9L, 4L, 9L, 2L, 2L, 9L, 4L, 4L, 9L, 9L, 2L, 4L, 4L, 3L
), V3 = c(7L, 7L, 7L, 8L, 7L, 7L, 7L, 7L, 10L, 8L, 10L, 8L, 7L,
7L, 10L, 10L, 10L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 10L, 7L, 10L,
10L, 7L, 8L, 7L, 8L, 7L, 8L, 8L, 8L, 7L, 8L, 8L, 8L, 10L, 7L,
8L, 7L, 7L, 10L, 7L, 7L, 10L, 7L, 10L, 8L, 8L, 7L, 10L, 10L,
10L, 8L, 8L, 10L, 7L, 8L, 8L, 10L, 8L, 10L, 10L, 10L, 8L, 10L,
10L, 10L, 8L, 10L, 8L, 7L, 10L, 7L, 7L, 10L, 8L, 7L, 8L, 10L,
7L, 8L, 10L, 7L, 7L, 7L, 7L, 10L, 7L, 7L, 10L, 10L, 7L, 7L, 8L,
10L)), class = "data.frame", row.names = c(NA, -100L))
Running @Allan Cameron's code produces the following best 5:
V1 V2 V3
26 1 2 7
68 6 9 10
7 1 3 7
17 5 9 10
13 1 3 7
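As you describe it, the notion of an overall "distance" between two groups is a bit vague. It is clear that a pair like c(1, 5, 2, 6) and c(2, 9, 12, 3) is closer than a pair like c(1, 5, 2, 6) and c(101, 78, 96, 54), but should there be a penalty for exact matches? Does the variance matter? In the absence of a clearer notion of distance, the best measure we have is the mean of each group. This is easily obtained with rowMeans(df).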
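To make the mean-based comparison concrete, here is a small sketch using the three example vectors from the paragraph above. The names `groups`, `group_means` and `mean_gaps`, and the row labels, are only illustrative and not part of the original code.

```r
# Three example groups, one per row (labels a, b, c are arbitrary)
groups <- rbind(
  a = c(1, 5, 2, 6),
  b = c(2, 9, 12, 3),
  c = c(101, 78, 96, 54)
)

# One mean per group
group_means <- rowMeans(groups)

# Absolute difference between every pair of group means
mean_gaps <- abs(outer(group_means, group_means, "-"))
mean_gaps
```

Under this measure the gap between a and b is far smaller than the gap between a and c, which matches the intuition described above.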
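The notion of the "K furthest groups" is also somewhat vague. The distance between groups is a function of pairs of groups, not of individual groups. If K = 1, then presumably any group will do. If K = 2, you want the pair of groups with the largest difference in means. Beyond that it is not clear what you are looking for, but one approach is to find the set of K groups whose means have the highest variance.
So if we do something like: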
k <- 5
group_means <- rowMeans(df)
indices <- seq(nrow(df))

# Start with the two rows whose means are furthest apart
k_furthest <- c(which.min(group_means), which.max(group_means))
k_vals <- c(min(group_means), max(group_means))
group_means <- group_means[-k_furthest]
indices <- indices[-k_furthest]

# Repeatedly add the row whose mean has the largest summed squared
# difference from the means selected so far
while(length(k_furthest) < k)
{
  best <- which.max(rowSums(sapply(k_vals, function(x) (x - group_means)^2)))
  k_vals <- c(k_vals, group_means[best])
  k_furthest <- c(k_furthest, indices[best])
  group_means <- group_means[-best]
  indices <- indices[-best]
}
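Then k_furthest will contain the indices of a set of 5 rows of the data frame whose means have the greatest variance between them. Your result will look like this: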
df[k_furthest,]
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
#> 63 236 794 885 300 71 114 725 492 52 468 92 128 948 191 585 441 414 196 156 18
#> 51 798 536 739 704 1000 883 237 644 299 915 695 860 338 47 972 890 996 939 957 793
#> 61 41 388 624 689 672 466 55 229 454 164 542 265 338 170 32 271 314 640 922 582
#> 33 970 598 775 548 228 132 842 644 986 781 818 679 920 287 825 361 562 756 748 929
#> 12 336 216 774 107 71 801 725 492 642 74 613 297 948 306 124 646 19 439 281 122
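Note that this algorithm really just alternates between taking the rows with the highest and lowest means at each iteration. Although this gives the greatest overall collective "difference" between samples, you may end up with some samples that are very close to each other, provided they are both far from the other samples. This may not be what you are looking for, which is why it is probably a good idea to specify exactly what "distance" means in this context.
Edit
With the further clarification and the new example from the OP, it seems we are looking to maximize the sum of element-wise differences between groups. That means we can do this: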
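If near-duplicate rows in the selection are a concern, one alternative worth considering is a greedy "farthest point" selection, which at each step adds the row whose nearest already-selected row is as far away as possible. The sketch below is not the method used in this answer: `select_maxmin` is a hypothetical helper, the choice of Manhattan distance is an assumption, and it assumes the data frame has at least two distinct rows.

```r
# Hypothetical helper (not from the original answer): greedy max-min selection
select_maxmin <- function(df, k, method = "manhattan") {
  # Full pairwise distance matrix between rows
  d <- as.matrix(dist(df, method = method))

  # Seed with one pair of rows that attains the largest distance
  chosen <- unname(which(d == max(d), arr.ind = TRUE)[1, ])

  while (length(chosen) < k) {
    remaining <- setdiff(seq_len(nrow(df)), chosen)
    # For each remaining row, distance to its nearest already-chosen row
    nearest <- apply(d[remaining, chosen, drop = FALSE], 1, min)
    # Keep the row that is furthest from everything chosen so far
    chosen <- c(chosen, remaining[which.max(nearest)])
  }
  head(chosen, k)
}

# Tiny illustrative data set: rows 1 and 2 are near-duplicates, as are rows 3 and 4
toy <- data.frame(V1 = c(0, 0, 10, 10, 5), V2 = c(0, 1, 10, 9, 5))
picked <- select_maxmin(toy, 3)
toy[picked, ]
```

Because each new row is judged by its closest selected neighbour rather than by a sum over all selected rows, two rows that are nearly identical are unlikely to be picked together. The result is still a greedy approximation, not a guaranteed optimum.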
distances <- as.data.frame(t(sapply(seq_len(nrow(df)), function(i) {
  # Sum of absolute element-wise differences between row i and every row
  a <- rowSums(apply(df, 2, function(x) abs(x[i] - x)))
  c(row = i, most_distant = which.max(a), difference = max(a))
})))
This gives us a data frame in which each row tells us which other group is most "distant" from it.
head(distances)
#> row most_distant difference
#> 1 1 16 15
#> 2 2 46 13
#> 3 3 9 14
#> 4 4 68 12
#> 5 5 46 15
#> 6 6 68 13
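The per-row totals above are sums of absolute element-wise differences, which is the Manhattan distance, so the same table can also be built from base R's `dist()`. The following is a minimal sketch on a made-up four-row data frame; `toy`, `d` and `toy_distances` are illustrative names only.

```r
# Small made-up data frame in the same shape as the OP's second example
toy <- data.frame(
  V1 = c(1L, 6L, 5L, 1L),
  V2 = c(9L, 2L, 3L, 4L),
  V3 = c(7L, 10L, 8L, 7L)
)

# Pairwise Manhattan distances between rows, as a full square matrix
d <- as.matrix(dist(toy, method = "manhattan"))

# For each row: which other row is furthest away, and by how much
toy_distances <- data.frame(
  row = seq_len(nrow(toy)),
  most_distant = unname(apply(d, 1, which.max)),
  difference = unname(apply(d, 1, max))
)
toy_distances
```

Computing the whole matrix once avoids recomputing the column-wise differences for every row, which may matter for larger data frames.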
If we sort this by the largest difference and take the first K groups mentioned in the first two columns, we get our result:
i <- unique(c(t(distances[order(-distances$difference)[seq(k)], 1:2])))[seq(k)]
df[i,]
#> V1 V2 V3
#> 1 1 9 7
#> 16 6 2 10
#> 5 6 2 7
#> 46 1 9 10
#> 26 1 2 7
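The OP's clarification also describes two groups as close when they differ in only one or two elements, and asks about weighting. If that is the intended notion, one could count mismatching positions instead of summing numeric differences. The sketch below is only an assumption about what such a measure might look like: `mismatch_dist` and its `weights` argument are hypothetical, and how the weights should relate to the size of each cluster is left to the caller.

```r
# Hypothetical helper: weighted count of columns in which two rows differ
mismatch_dist <- function(df, weights = rep(1, ncol(df))) {
  m <- as.matrix(df)
  n <- nrow(m)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    # t(m) has one column per row of df; compare each with row i,
    # scale each mismatch by its column weight, then total per row
    d[i, ] <- colSums((t(m) != m[i, ]) * weights)
  }
  d
}

# Made-up example: rows 1 and 2 differ in one column, rows 1 and 3 in all three
toy <- data.frame(V1 = c(1, 1, 6), V2 = c(9, 9, 2), V3 = c(7, 10, 10))
unweighted <- mismatch_dist(toy)
weighted <- mismatch_dist(toy, weights = c(1, 1, 2))
```

The resulting matrix can be passed through `as.dist()` and used anywhere a distance matrix is expected.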
382L, 393L, 387L, 899L, 429L, 947L, 781L, 781L, 645L, 645L, 710L,
915L, 74L, 796L, 259L, 749L, 373L, 393L, 246L, 632L, 785L, 259L,
614L, 785L, 428L, 741L, 632L, 382L, 770L, 710L, 781L, 749L, 868L,
915L, 434L, 221L, 429L, 303L, 393L, 468L, 632L, 976L, 781L, 373L,
947L, 428L, 781L, 781L, 645L, 868L, 645L, 710L, 283L, 31L, 868L,
583L, 915L, 246L, 373L, 373L, 781L, 164L, 428L, 710L, 373L, 303L,
632L, 868L, 614L, 947L, 74L, 382L), V11 = c(351L, 154L, 423L,
496L, 818L, 913L, 665L, 913L, 380L, 720L, 542L, 380L, 634L, 551L,
258L, 818L, 634L, 474L, 222L, 639L, 974L, 755L, 262L, 665L, 522L,
217L, 927L, 351L, 755L, 914L, 380L, 65L, 844L, 633L, 613L, 222L,
649L, 892L, 752L, 423L, 755L, 169L, 904L, 309L, 639L, 276L, 217L,
394L, 291L, 522L, 203L, 720L, 35L, 422L, 724L, 423L, 720L, 914L,
180L, 327L, 92L, 422L, 258L, 467L, 724L, 620L, 665L, 367L, 639L,
443L, 892L, 724L, 141L, 422L, 327L, 396L, 92L, 309L, 844L, 258L,
914L, 634L, 497L, 222L, 141L, 880L, 467L, 443L, 496L, 913L, 394L,
217L, 35L, 396L, 35L, 880L, 351L, 755L, 474L, 215L), V12 = c(102L,
546L, 682L, 464L, 162L, 876L, 162L, 302L, 682L, 162L, 302L, 53L,
967L, 679L, 837L, 824L, 44L, 53L, 294L, 738L, 254L, 557L, 546L,
7L, 902L, 244L, 128L, 499L, 621L, 499L, 458L, 526L, 837L, 465L,
290L, 969L, 265L, 507L, 835L, 837L, 546L, 136L, 897L, 213L, 195L,
244L, 465L, 835L, 464L, 621L, 162L, 511L, 969L, 230L, 580L, 335L,
610L, 969L, 546L, 897L, 835L, 447L, 526L, 302L, 464L, 302L, 682L,
628L, 610L, 272L, 53L, 254L, 969L, 962L, 511L, 621L, 290L, 458L,
559L, 860L, 136L, 507L, 462L, 136L, 462L, 731L, 873L, 462L, 335L,
897L, 580L, 447L, 628L, 731L, 7L, 335L, 102L, 128L, 679L, 742L
), V13 = c(108L, 637L, 757L, 734L, 534L, 42L, 808L, 322L, 757L,
204L, 808L, 324L, 288L, 82L, 285L, 961L, 955L, 652L, 808L, 961L,
503L, 549L, 697L, 87L, 734L, 43L, 204L, 455L, 398L, 961L, 183L,
433L, 431L, 854L, 490L, 69L, 407L, 808L, 398L, 69L, 87L, 338L,
446L, 178L, 6L, 198L, 82L, 543L, 370L, 534L, 87L, 267L, 455L,
360L, 534L, 407L, 431L, 446L, 854L, 857L, 46L, 637L, 848L, 923L,
560L, 531L, 919L, 223L, 307L, 561L, 6L, 719L, 560L, 43L, 734L,
288L, 324L, 87L, 808L, 322L, 757L, 446L, 425L, 324L, 757L, 857L,
87L, 848L, 223L, 503L, 307L, 152L, 503L, 757L, 956L, 152L, 43L,
69L, 719L, 637L), V14 = c(746L, 805L, 191L, 47L, 508L, 508L,
715L, 461L, 928L, 750L, 140L, 746L, 364L, 552L, 287L, 984L, 481L,
715L, 762L, 959L, 750L, 344L, 959L, 959L, 306L, 911L, 103L, 638L,
759L, 761L, 750L, 444L, 692L, 692L, 761L, 481L, 552L, 942L, 810L,
938L, 306L, 762L, 344L, 942L, 344L, 364L, 552L, 891L, 11L, 103L,
762L, 287L, 891L, 358L, 730L, 959L, 750L, 191L, 718L, 959L, 358L,
306L, 287L, 692L, 746L, 461L, 750L, 170L, 358L, 911L, 805L, 938L,
481L, 759L, 750L, 140L, 715L, 959L, 928L, 692L, 461L, 750L, 306L,
762L, 691L, 306L, 287L, 481L, 170L, 746L, 810L, 762L, 358L, 292L,
750L, 191L, 47L, 942L, 344L, 191L), V15 = c(987L, 972L, 151L,
397L, 250L, 825L, 681L, 825L, 723L, 49L, 585L, 109L, 833L, 137L,
49L, 690L, 681L, 253L, 385L, 921L, 708L, 151L, 109L, 385L, 54L,
247L, 979L, 121L, 225L, 124L, 825L, 417L, 320L, 979L, 681L, 918L,
145L, 397L, 681L, 145L, 586L, 709L, 284L, 840L, 121L, 368L, 250L,
898L, 840L, 109L, 417L, 513L, 544L, 194L, 417L, 544L, 320L, 987L,
840L, 987L, 888L, 489L, 855L, 906L, 62L, 579L, 379L, 783L, 368L,
379L, 49L, 732L, 279L, 509L, 54L, 145L, 797L, 979L, 709L, 840L,
368L, 830L, 502L, 123L, 681L, 194L, 855L, 703L, 247L, 833L, 609L,
830L, 708L, 609L, 509L, 397L, 987L, 609L, 320L, 124L), V16 = c(346L,
48L, 865L, 865L, 173L, 890L, 482L, 13L, 537L, 171L, 482L, 940L,
843L, 173L, 975L, 866L, 142L, 646L, 482L, 700L, 395L, 298L, 975L,
890L, 361L, 173L, 890L, 975L, 940L, 271L, 395L, 989L, 395L, 142L,
865L, 361L, 399L, 441L, 441L, 772L, 142L, 520L, 142L, 520L, 975L,
930L, 890L, 989L, 530L, 866L, 941L, 530L, 596L, 890L, 36L, 441L,
346L, 865L, 173L, 646L, 270L, 441L, 866L, 866L, 346L, 441L, 482L,
872L, 36L, 890L, 271L, 13L, 36L, 836L, 767L, 395L, 890L, 537L,
395L, 530L, 346L, 346L, 940L, 173L, 865L, 772L, 520L, 171L, 48L,
866L, 135L, 298L, 135L, 77L, 361L, 872L, 395L, 596L, 772L, 532L
), V17 = c(912L, 146L, 312L, 22L, 618L, 317L, 618L, 199L, 369L,
101L, 515L, 4L, 476L, 699L, 517L, 317L, 159L, 517L, 553L, 616L,
995L, 314L, 317L, 314L, 562L, 101L, 249L, 369L, 615L, 562L, 476L,
702L, 312L, 312L, 515L, 101L, 159L, 572L, 101L, 618L, 895L, 317L,
616L, 618L, 572L, 562L, 4L, 517L, 312L, 312L, 249L, 699L, 312L,
158L, 469L, 20L, 524L, 476L, 572L, 249L, 50L, 19L, 249L, 912L,
469L, 476L, 101L, 146L, 616L, 618L, 476L, 20L, 146L, 249L, 50L,
101L, 158L, 517L, 238L, 515L, 895L, 553L, 702L, 146L, 312L, 517L,
158L, 895L, 517L, 101L, 314L, 238L, 22L, 146L, 317L, 895L, 469L,
912L, 369L, 572L), V18 = c(525L, 635L, 488L, 456L, 878L, 119L,
119L, 849L, 768L, 817L, 931L, 275L, 460L, 900L, 494L, 669L, 846L,
488L, 768L, 494L, 570L, 439L, 878L, 275L, 471L, 896L, 768L, 619L,
727L, 977L, 155L, 155L, 896L, 112L, 817L, 768L, 411L, 304L, 964L,
612L, 905L, 768L, 456L, 255L, 119L, 404L, 304L, 576L, 219L, 756L,
612L, 668L, 255L, 768L, 196L, 668L, 155L, 931L, 896L, 878L, 488L,
576L, 640L, 37L, 846L, 494L, 257L, 37L, 411L, 411L, 625L, 820L,
304L, 112L, 619L, 9L, 669L, 494L, 471L, 323L, 318L, 570L, 817L,
578L, 878L, 696L, 977L, 768L, 896L, 525L, 669L, 841L, 471L, 727L,
619L, 304L, 874L, 931L, 37L, 619L), V19 = c(926L, 281L, 957L,
308L, 315L, 814L, 622L, 153L, 858L, 315L, 867L, 176L, 555L, 210L,
867L, 540L, 555L, 867L, 622L, 852L, 540L, 436L, 269L, 505L, 436L,
505L, 654L, 505L, 91L, 125L, 131L, 706L, 243L, 125L, 922L, 281L,
91L, 359L, 33L, 957L, 232L, 698L, 555L, 540L, 667L, 34L, 545L,
698L, 555L, 308L, 926L, 445L, 316L, 748L, 243L, 14L, 521L, 232L,
654L, 243L, 232L, 359L, 156L, 131L, 555L, 359L, 521L, 852L, 706L,
957L, 308L, 125L, 91L, 852L, 315L, 604L, 604L, 760L, 604L, 936L,
521L, 747L, 922L, 555L, 243L, 521L, 316L, 867L, 84L, 176L, 814L,
232L, 315L, 316L, 555L, 505L, 745L, 505L, 232L, 540L), V20 = c(554L,
882L, 823L, 386L, 966L, 694L, 286L, 354L, 214L, 25L, 25L, 110L,
353L, 475L, 479L, 252L, 582L, 999L, 266L, 211L, 18L, 278L, 828L,
412L, 528L, 386L, 296L, 353L, 412L, 80L, 206L, 714L, 18L, 211L,
475L, 554L, 38L, 882L, 25L, 362L, 510L, 110L, 206L, 823L, 362L,
694L, 256L, 479L, 582L, 25L, 828L, 193L, 951L, 80L, 793L, 999L,
882L, 903L, 38L, 386L, 354L, 214L, 916L, 25L, 110L, 864L, 882L,
25L, 353L, 780L, 296L, 864L, 510L, 38L, 386L, 400L, 694L, 793L,
999L, 122L, 278L, 475L, 916L, 903L, 958L, 161L, 828L, 73L, 790L,
73L, 430L, 18L, 958L, 828L, 582L, 383L, 51L, 278L, 18L, 122L)), class = "data.frame", row.names = c(NA,
-100L))
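What I want to do now is reduce the number of entries, say from 100 to 50, where each entry consists of one index from each group. I tried computing a distance matrix with several methods and picking the entries that are furthest apart, but when I checked the result it was not very informative.
Is there any way to do this, perhaps by working with the list of lists or with some other, more elaborate method?
I would appreciate any help or insights.
Edit - clarifying the objective
Let's say I have drawn 100 groups, each containing 1 element from each list of the nested list.
Some groups are close to others: say only 1 element differs between two groups, so I might want to discard one of them. The same goes for groups that differ in only 2 elements, and so on. In the end I want to be left with K groups that are as "distant" from each other as possible.
It would also be nice if the number of elements in each nested list could be taken into account, as some kind of weighting.
Edit No. 2
For the following list: list(c(1L, 5L, 6L), c(3L, 4L, 2L, 9L), c(8L, 7L, 10L))
we get the following data frame: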
structure(list(V1 = c(1L, 5L, 6L, 1L, 6L, 1L, 1L, 6L, 1L, 5L,
5L, 5L, 1L, 1L, 5L, 6L, 5L, 6L, 6L, 5L, 5L, 5L, 6L, 5L, 6L, 1L,
6L, 1L, 1L, 1L, 5L, 5L, 6L, 6L, 5L, 1L, 6L, 6L, 5L, 6L, 1L, 1L,
5L, 5L, 5L, 1L, 6L, 5L, 1L, 5L, 5L, 5L, 5L, 1L, 5L, 5L, 1L, 6L,
5L, 6L, 5L, 6L, 5L, 1L, 5L, 1L, 5L, 6L, 5L, 1L, 6L, 1L, 6L, 1L,
1L, 5L, 5L, 6L, 1L, 5L, 1L, 5L, 5L, 6L, 6L, 1L, 1L, 6L, 6L, 6L,
5L, 5L, 1L, 6L, 1L, 1L, 6L, 5L, 5L, 1L), V2 = c(9L, 3L, 9L, 4L,
2L, 4L, 3L, 3L, 3L, 2L, 2L, 9L, 3L, 3L, 2L, 2L, 9L, 9L, 9L, 3L,
4L, 3L, 2L, 3L, 4L, 2L, 2L, 3L, 4L, 9L, 9L, 2L, 3L, 2L, 9L, 9L,
3L, 2L, 4L, 4L, 3L, 4L, 3L, 2L, 2L, 9L, 9L, 2L, 4L, 4L, 4L, 9L,
2L, 3L, 9L, 3L, 3L, 2L, 2L, 2L, 4L, 2L, 4L, 3L, 3L, 3L, 2L, 9L,
9L, 9L, 2L, 9L, 3L, 3L, 9L, 4L, 3L, 3L, 4L, 3L, 4L, 4L, 4L, 4L,
2L, 9L, 9L, 4L, 9L, 2L, 2L, 9L, 4L, 4L, 9L, 9L, 2L, 4L, 4L, 3L
), V3 = c(7L, 7L, 7L, 8L, 7L, 7L, 7L, 7L, 10L, 8L, 10L, 8L, 7L,
7L, 10L, 10L, 10L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 10L, 7L, 10L,
10L, 7L, 8L, 7L, 8L, 7L, 8L, 8L, 8L, 7L, 8L, 8L, 8L, 10L, 7L,
8L, 7L, 7L, 10L, 7L, 7L, 10L, 7L, 10L, 8L, 8L, 7L, 10L, 10L,
10L, 8L, 8L, 10L, 7L, 8L, 8L, 10L, 8L, 10L, 10L, 10L, 8L, 10L,
10L, 10L, 8L, 10L, 8L, 7L, 10L, 7L, 7L, 10L, 8L, 7L, 8L, 10L,
7L, 8L, 10L, 7L, 7L, 7L, 7L, 10L, 7L, 7L, 10L, 10L, 7L, 7L, 8L,
10L)), class = "data.frame", row.names = c(NA, -100L))
Running @Allan Cameron's code yields the following top 5:
V1 V2 V3
26 1 2 7
68 6 9 10
7 1 3 7
17 5 9 10
13 1 3 7
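As you describe it, the concept of the overall "distance" between two groups is a little vague. It is clear that a pair like c(1, 5, 2, 6) and c(2, 9, 12, 3) is closer together than a pair like c(1, 5, 2, 6) and c(101, 78, 96, 54), but should exact matches be penalized? Does variance matter? Without a clearer notion of distance, the best measure we have is the mean of each group. This is easily obtained with rowMeans(df).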
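As a small illustration of using the mean as a one-number summary, the sketch below puts the example vectors into a hypothetical data frame (`df_demo` is not part of the original data, and its fourth row is made up) and treats the absolute difference between row means as the distance. It also shows a weakness of this measure: two different groups with the same mean end up at distance zero.

```r
# Hypothetical demo data: the three example groups from the text,
# plus a made-up fourth group that shares the mean of the first one
df_demo <- as.data.frame(rbind(c(1, 5, 2, 6),
                               c(2, 9, 12, 3),
                               c(101, 78, 96, 54),
                               c(4, 4, 3, 3)))

group_means <- rowMeans(df_demo)   # 3.5, 6.5, 82.25, 3.5

# Pairwise "distance", taken here as the absolute difference between means
mean_dist <- as.matrix(dist(group_means))

mean_dist[1, 2]   # 3
mean_dist[1, 3]   # 78.75
mean_dist[1, 4]   # 0, although rows 1 and 4 are different groups
```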
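The concept of "the K furthest groups" is also somewhat vague. Distance is a function of pairs of groups, not of individual groups. If K = 1, presumably any group will do. If K = 2, you want the pair of groups with the largest difference in means. Beyond that it is not clear what you are looking for, but one approach is to find the set of K groups whose means have the highest variance.
So if we do something like this: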
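k <- 5
group_means <- rowMeans(df)
indices <- seq(nrow(df))
# Seed the selection with the rows that have the lowest and highest means
k_furthest <- c(which.min(group_means), which.max(group_means))
k_vals <- c(min(group_means), max(group_means))
# Remove the seeded rows from the pool of candidates
group_means <- group_means[-k_furthest]
indices <- indices[-k_furthest]
while(length(k_furthest) < k)
{
  # Pick the candidate whose mean has the largest summed squared
  # difference from the means selected so far
  best <- which.max(rowSums(sapply(k_vals, function(x) (x - group_means)^2)))
  k_vals <- c(k_vals, group_means[best])
  k_furthest <- c(k_furthest, indices[best])
  # Drop the chosen row from the pool of candidates
  group_means <- group_means[-best]
  indices <- indices[-best]
}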
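Then k_furthest will hold the indices of the 5 rows of the data frame whose means have the greatest variance among them. Your result will look like this: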
df[k_furthest,]
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
#> 63 236 794 885 300 71 114 725 492 52 468 92 128 948 191 585 441 414 196 156 18
#> 51 798 536 739 704 1000 883 237 644 299 915 695 860 338 47 972 890 996 939 957 793
#> 61 41 388 624 689 672 466 55 229 454 164 542 265 338 170 32 271 314 640 922 582
#> 33 970 598 775 548 228 132 842 644 986 781 818 679 920 287 825 361 562 756 748 929
#> 12 336 216 774 107 71 801 725 492 642 74 613 297 948 306 124 646 19 439 281 122
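Note that this algorithm effectively just alternates between taking the row with the highest remaining mean and the row with the lowest remaining mean on each iteration. Although this gives the greatest overall collective "difference" between samples, you may end up with some samples that are very close to each other, as long as they are both far away from some other sample. This may not be what you are looking for, which is why it is probably a good idea to specify exactly what "distance" means in this context.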
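One way to avoid picking near-duplicates, offered here as a swapped-in alternative rather than as the method above, is greedy max-min ("farthest-point") selection: start from the most distant pair, then repeatedly add the row whose nearest already-chosen row is as far away as possible. The sketch below is a minimal version under stated assumptions: Manhattan distance is an arbitrary choice, `select_maxmin` is a hypothetical helper name, and `toy` simply reuses the five rows shown in the question's Edit No. 2 result.

```r
# Greedy max-min selection (a sketch, not the algorithm from the answer).
# Assumes 2 <= k <= nrow(df) and an all-numeric data frame.
select_maxmin <- function(df, k, method = "manhattan") {
  stopifnot(k >= 2, k <= nrow(df))
  d <- as.matrix(dist(df, method = method))
  # Start with one of the most distant pairs of rows
  chosen <- as.vector(which(d == max(d), arr.ind = TRUE)[1, ])
  while (length(chosen) < k) {
    candidates <- setdiff(seq_len(nrow(df)), chosen)
    # For each candidate, the distance to its nearest already-chosen row
    nearest <- apply(d[candidates, chosen, drop = FALSE], 1, min)
    # Keep the candidate that is furthest from everything chosen so far
    chosen <- c(chosen, candidates[which.max(nearest)])
  }
  chosen
}

# The five rows from the question's Edit No. 2 result; rows 3 and 5 are identical
toy <- data.frame(V1 = c(1, 6, 1, 5, 1),
                  V2 = c(2, 9, 3, 9, 3),
                  V3 = c(7, 10, 7, 10, 7))

sel <- select_maxmin(toy, 4)
sel   # rows 2, 1, 3, 4: the duplicate row 5 is left out
```

Because each new row is scored by its distance to the closest row already selected, an exact duplicate scores zero and is only taken once nothing else is left.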
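Edit
With the further clarification and the new example from the OP, it seems we are looking to maximize the sum of element-wise differences between groups. This means we can do the following: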
distances <- as.data.frame(t(sapply(1:nrow(df), function(i) {
a <- rowSums(apply(df, 2, function(x) abs(x[i] - x)))
c(row = i, most_distant = which.max(a), difference = max(a))
})))
This gives us a data frame in which each row tells us which other group is the most "distant" from that group, and by how much.
head(distances)
#> row most_distant difference
#> 1 1 16 15
#> 2 2 46 13
#> 3 3 9 14
#> 4 4 68 12
#> 5 5 46 15
#> 6 6 68 13
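The sum of absolute element-wise differences between two rows is the Manhattan (L1) distance, so the same table can presumably be built from base R's dist(). The sketch below is an equivalent formulation on a small hypothetical data frame (`toy` and `distances_alt` are illustrative names, not part of the original code).

```r
# Hypothetical toy data: 5 groups (rows), one index per column
toy <- data.frame(V1 = c(1, 6, 1, 5, 1),
                  V2 = c(2, 9, 3, 9, 3),
                  V3 = c(7, 10, 7, 10, 7))

# Full matrix of pairwise Manhattan distances between rows
d <- as.matrix(dist(toy, method = "manhattan"))

# For every row: which row is furthest away, and how far
distances_alt <- data.frame(
  row          = seq_len(nrow(toy)),
  most_distant = unname(apply(d, 1, which.max)),
  difference   = unname(apply(d, 1, max))
)
distances_alt
```

Keeping the full matrix `d` around also makes it easy to inspect the distances among any chosen subset, for example `d[c(1, 2, 4), c(1, 2, 4)]`.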
If we sort this by the largest difference and take the first K distinct groups mentioned in the first two columns, we get our result:
i <- unique(c(t(distances[order(-distances$difference)[seq(k)], 1:2])))[seq(k)]
df[i,]
#> V1 V2 V3
#> 1 1 9 7
#> 16 6 2 10
#> 5 6 2 7
#> 46 1 9 10
#> 26 1 2 7
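Since the values are index labels, the size of a numeric difference such as |1 - 9| may not carry much meaning. If what matters is only whether two groups share an element, as the question's "only 1 element differs" wording suggests, a Hamming-style count could be swapped in for the absolute differences. The following is a minimal sketch under that assumption: `hamming_matrix` is a hypothetical helper, and weighting each column by the size of its source list (3, 4 and 3 in the small example) is just one possible reading of the weighting request.

```r
# Weighted Hamming distance between every pair of rows (a sketch).
# `weights` is hypothetical: one non-negative number per column.
hamming_matrix <- function(df, weights = rep(1, ncol(df))) {
  m <- as.matrix(df)
  n <- nrow(m)
  d <- matrix(0, n, n)
  for (j in seq_len(ncol(m))) {
    # outer() flags every pair of rows whose j-th element differs
    d <- d + weights[j] * outer(m[, j], m[, j], FUN = "!=")
  }
  d
}

# Hypothetical toy data: rows 3 and 5 are identical
toy <- data.frame(V1 = c(1, 6, 1, 5, 1),
                  V2 = c(2, 9, 3, 9, 3),
                  V3 = c(7, 10, 7, 10, 7))

h  <- hamming_matrix(toy)                       # unweighted counts
hw <- hamming_matrix(toy, weights = c(3, 4, 3)) # weighted by list size

h[1, 2]    # 3: rows 1 and 2 differ in every position
h[1, 3]    # 1: rows 1 and 3 differ only in V2
hw[1, 3]   # 4: the same single difference, weighted by V2's list size
```

The resulting matrix can be searched for the most distant row in the same way as the absolute-difference sums above.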