Reduce a large data frame of samples to ensure maximum variability between samples

I have a list of vectors, where each entry in the list is a vector of indices, for example:

ll <- list(c(563L, 688L, 630L, 160L, 568L, 908L, 457L, 798L, 3L, 558L, 
56L, 389L, 506L, 106L, 807L, 556L, 809L, 63L, 343L, 242L, 470L, 
894L, 804L, 970L, 406L, 881L, 893L, 952L, 126L, 827L, 282L, 910L, 
61L, 66L, 763L, 787L, 337L, 41L, 712L, 144L, 450L, 12L, 200L, 
574L, 945L, 236L, 336L, 684L, 280L, 721L, 233L, 686L, 64L, 504L, 
174L, 934L, 40L, 850L, 26L, 799L, 853L, 978L), c(85L, 564L, 591L, 
662L, 377L, 536L, 325L, 402L, 72L, 410L, 687L, 216L, 603L, 67L, 
794L, 388L, 627L, 376L, 863L, 491L, 598L, 861L, 991L, 651L, 670L, 
401L, 459L, 39L, 997L, 806L, 623L, 954L), c(427L, 791L, 212L, 
779L, 657L, 740L, 800L, 838L, 104L, 985L, 167L, 486L, 685L, 739L, 
60L, 862L, 130L, 134L, 175L, 375L, 683L, 885L, 575L, 859L, 341L, 
726L, 472L, 802L, 76L, 424L, 177L, 624L, 189L, 334L, 378L, 329L, 
581L, 224L, 851L, 218L, 993L, 678L, 248L, 365L, 188L, 774L, 58L, 
813L, 514L, 59L, 777L, 485L, 606L, 480L, 826L, 350L, 608L, 27L, 
661L, 775L, 340L, 10L, 207L, 260L, 483L, 150L, 205L), c(138L, 
587L, 165L, 1L, 722L, 300L, 500L, 535L, 832L, 392L, 432L, 139L, 
744L, 676L, 839L, 107L, 769L, 589L, 647L, 548L, 704L, 197L, 689L, 
111L, 342L, 319L, 567L, 17L, 925L, 5L, 116L, 493L, 241L, 965L
), c(89L, 440L, 228L, 884L, 88L, 147L, 413L, 821L, 70L, 95L, 
71L, 917L, 463L, 990L, 672L, 981L, 765L, 937L, 75L, 766L, 374L, 
636L, 449L, 816L, 1000L, 356L, 629L), c(421L, 650L, 453L, 666L, 
584L, 717L, 220L, 605L, 182L, 811L, 157L, 523L, 28L, 527L, 737L, 
812L, 263L, 675L, 132L, 879L, 438L, 451L, 883L, 950L, 114L, 466L, 
348L, 711L, 209L, 887L, 593L, 949L, 349L, 764L, 595L, 736L, 660L, 
801L, 118L, 877L), c(23L, 231L, 78L, 988L, 55L, 57L, 753L, 994L, 
437L, 202L, 842L, 190L, 822L, 968L, 331L, 733L, 782L, 886L, 105L, 
943L, 743L, 815L, 311L, 498L, 792L, 795L, 184L, 728L, 573L, 771L, 
117L, 251L, 192L, 735L, 15L, 776L, 295L, 677L, 631L, 235L, 237L, 
705L, 856L, 97L, 725L), c(229L, 671L, 129L, 405L, 115L, 644L, 
98L, 492L, 871L, 935L, 435L, 707L, 773L, 754L, 803L, 120L, 656L, 
345L, 875L, 330L, 533L, 366L, 240L, 408L, 332L, 577L, 550L, 452L, 
963L, 8L, 187L, 226L, 901L, 371L, 426L, 339L, 519L, 86L, 501L, 
274L, 831L), c(16L, 79L, 68L, 477L, 133L, 659L, 2L, 973L, 264L, 
953L, 90L, 234L, 420L, 588L, 21L, 788L, 363L, 539L, 227L, 565L, 
30L, 642L, 786L, 982L, 347L, 680L, 52L, 96L, 592L, 409L, 643L, 
81L, 419L, 245L, 658L, 416L, 590L, 448L, 819L, 277L, 357L, 442L, 
789L, 516L, 980L, 93L, 998L, 149L, 166L, 299L, 454L, 529L, 986L, 
127L, 541L, 45L, 829L, 289L, 418L, 179L, 310L, 113L, 729L), c(429L, 
781L, 303L, 434L, 83L, 259L, 387L, 583L, 393L, 770L, 246L, 428L, 
947L, 976L, 31L, 382L, 710L, 944L, 164L, 868L, 373L, 899L, 74L, 
468L, 614L, 701L, 221L, 645L, 268L, 785L, 293L, 632L, 24L, 749L, 
283L, 741L, 796L, 915L), c(258L, 844L, 649L, 752L, 474L, 613L, 
351L, 551L, 309L, 380L, 497L, 724L, 327L, 992L, 845L, 607L, 818L, 
693L, 914L, 291L, 720L, 633L, 974L, 367L, 639L, 94L, 467L, 92L, 
522L, 141L, 496L, 276L, 542L, 665L, 695L, 634L, 602L, 913L, 396L, 
597L, 443L, 892L, 65L, 394L, 222L, 778L, 169L, 960L, 35L, 655L, 
422L, 927L, 154L, 215L, 262L, 203L, 880L, 217L, 423L, 755L, 904L, 
180L, 620L), c(507L, 628L, 29L, 902L, 738L, 897L, 664L, 967L, 
294L, 682L, 254L, 302L, 128L, 559L, 511L, 526L, 7L, 742L, 464L, 
621L, 265L, 599L, 102L, 546L, 458L, 969L, 751L, 860L, 326L, 873L, 
335L, 580L, 499L, 962L, 290L, 557L, 213L, 716L, 53L, 835L, 600L, 
610L, 321L, 673L, 713L, 876L, 244L, 462L, 136L, 272L, 195L, 447L, 
230L, 679L, 465L, 611L, 297L, 731L, 44L, 824L, 162L, 837L), c(446L, 
561L, 391L, 652L, 857L, 946L, 560L, 784L, 854L, 204L, 512L, 82L, 
455L, 372L, 407L, 328L, 808L, 152L, 178L, 185L, 543L, 108L, 473L, 
490L, 955L, 719L, 757L, 198L, 338L, 223L, 919L, 531L, 653L, 734L, 
923L, 487L, 637L, 398L, 431L, 46L, 848L, 324L, 948L, 43L, 183L, 
288L, 697L, 87L, 307L, 42L, 571L, 360L, 433L, 390L, 569L, 956L, 
534L, 6L, 381L, 549L, 301L, 920L, 69L, 322L, 267L, 503L, 285L, 
961L, 370L, 425L), c(344L, 959L, 364L, 552L, 11L, 481L, 287L, 
891L, 692L, 762L, 47L, 292L, 358L, 810L, 942L, 730L, 746L, 638L, 
750L, 759L, 761L, 140L, 444L, 191L, 805L, 306L, 691L, 170L, 715L, 
508L, 984L, 461L, 911L, 103L, 938L, 718L, 928L), c(124L, 284L, 
123L, 513L, 417L, 933L, 121L, 168L, 208L, 385L, 32L, 273L, 869L, 
932L, 397L, 509L, 239L, 797L, 379L, 723L, 898L, 163L, 320L, 833L, 
151L, 906L, 648L, 732L, 279L, 834L, 489L, 840L, 783L, 971L, 49L, 
145L, 253L, 352L, 137L, 261L, 247L, 143L, 544L, 109L, 921L, 830L, 
972L, 585L, 690L, 609L, 703L, 250L, 708L, 225L, 889L, 181L, 987L, 
54L, 502L, 148L, 355L, 888L, 579L, 983L, 825L, 855L, 62L, 918L, 
979L, 586L, 681L, 384L, 709L, 333L, 758L, 194L, 368L), c(646L, 
930L, 361L, 399L, 13L, 298L, 395L, 975L, 482L, 940L, 596L, 772L, 
700L, 843L, 171L, 537L, 173L, 836L, 767L, 989L, 532L, 890L, 99L, 
865L, 142L, 135L, 271L, 346L, 441L, 48L, 941L, 866L, 201L, 872L, 
36L, 520L, 530L, 77L, 270L), c(238L, 699L, 22L, 50L, 615L, 702L, 
4L, 469L, 101L, 314L, 616L, 995L, 996L, 414L, 566L, 249L, 572L, 
369L, 553L, 158L, 159L, 199L, 317L, 515L, 517L, 524L, 562L, 19L, 
476L, 20L, 146L, 618L, 895L, 312L, 912L), c(768L, 939L, 578L, 
849L, 196L, 640L, 323L, 635L, 304L, 318L, 874L, 977L, 488L, 619L, 
155L, 905L, 9L, 112L, 484L, 847L, 313L, 900L, 494L, 727L, 625L, 
931L, 119L, 846L, 186L, 219L, 471L, 696L, 404L, 460L, 668L, 896L, 
439L, 964L, 275L, 756L, 411L, 878L, 538L, 669L, 478L, 570L, 255L, 
547L, 257L, 841L, 37L, 576L, 456L, 663L, 525L, 817L, 612L, 820L
), c(243L, 594L, 33L, 176L, 415L, 667L, 748L, 852L, 232L, 922L, 
308L, 436L, 153L, 505L, 14L, 281L, 316L, 495L, 540L, 622L, 156L, 
926L, 521L, 698L, 545L, 760L, 84L, 210L, 359L, 131L, 745L, 34L, 
91L, 555L, 858L, 445L, 867L, 125L, 814L, 604L, 706L, 315L, 654L, 
747L, 936L, 269L, 957L), c(80L, 924L, 110L, 193L, 958L, 296L, 
475L, 18L, 907L, 626L, 999L, 278L, 362L, 51L, 641L, 211L, 929L, 
122L, 694L, 73L, 353L, 25L, 100L, 305L, 864L, 214L, 790L, 286L, 
518L, 674L, 206L, 400L, 554L, 903L, 780L, 916L, 38L, 430L, 617L, 
823L, 172L, 966L, 412L, 951L, 510L, 828L, 479L, 909L, 266L, 582L, 
870L, 882L, 161L, 252L, 256L, 383L, 403L, 601L, 386L, 793L, 528L, 
354L, 714L))

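Each entry (each nested vector) represents a group obtained with a clustering method.

I have the following piece of code, which takes this list of vectors and the desired number of samples, and returns a data frame in which each row represents one sample and each column holds the element drawn from one of the groups in the list.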

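groups_samples <- function(groups, repetition) {
  # Draw `repetition` indices with replacement from every group;
  # sapply() binds the draws into one column per group
  return(as.data.frame(sapply(groups, sample, size = repetition, replace = TRUE)))
}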

For example:

df <- groups_samples(ll, 100)

    structure(list(V1 = c(106L, 686L, 721L, 200L, 970L, 910L, 556L, 
807L, 908L, 568L, 688L, 389L, 56L, 470L, 630L, 893L, 574L, 236L, 
804L, 798L, 721L, 934L, 763L, 807L, 457L, 568L, 684L, 934L, 787L, 
450L, 688L, 64L, 568L, 934L, 894L, 558L, 568L, 343L, 450L, 853L, 
336L, 64L, 712L, 144L, 934L, 144L, 809L, 763L, 457L, 763L, 558L, 
457L, 688L, 763L, 504L, 66L, 406L, 881L, 3L, 343L, 556L, 799L, 
712L, 568L, 61L, 799L, 908L, 688L, 64L, 881L, 236L, 787L, 66L, 
160L, 853L, 343L, 809L, 200L, 827L, 893L, 894L, 799L, 470L, 406L, 
337L, 389L, 63L, 952L, 236L, 337L, 763L, 41L, 945L, 144L, 56L, 
978L, 233L, 978L, 881L, 910L), V2 = c(72L, 651L, 861L, 651L, 
591L, 72L, 564L, 662L, 402L, 623L, 603L, 377L, 401L, 603L, 598L, 
67L, 991L, 376L, 67L, 325L, 325L, 377L, 536L, 861L, 564L, 670L, 
806L, 377L, 687L, 603L, 954L, 627L, 67L, 388L, 954L, 564L, 991L, 
564L, 591L, 863L, 376L, 991L, 85L, 85L, 564L, 598L, 591L, 687L, 
806L, 564L, 401L, 72L, 603L, 536L, 459L, 603L, 954L, 67L, 216L, 
410L, 687L, 806L, 623L, 388L, 67L, 401L, 491L, 662L, 85L, 627L, 
598L, 954L, 459L, 591L, 997L, 687L, 687L, 536L, 863L, 459L, 670L, 
459L, 603L, 401L, 39L, 687L, 39L, 651L, 991L, 376L, 388L, 954L, 
997L, 85L, 39L, 627L, 861L, 670L, 39L, 459L), V3 = c(424L, 775L, 
862L, 791L, 683L, 826L, 60L, 205L, 802L, 740L, 58L, 985L, 683L, 
341L, 838L, 212L, 993L, 59L, 851L, 657L, 375L, 885L, 150L, 167L, 
218L, 205L, 58L, 260L, 341L, 661L, 791L, 350L, 726L, 378L, 188L, 
150L, 60L, 813L, 774L, 104L, 207L, 207L, 485L, 514L, 424L, 514L, 
859L, 130L, 350L, 188L, 188L, 740L, 859L, 177L, 212L, 802L, 606L, 
104L, 608L, 260L, 329L, 993L, 427L, 427L, 485L, 472L, 859L, 424L, 
661L, 514L, 791L, 678L, 993L, 726L, 188L, 340L, 483L, 150L, 340L, 
514L, 606L, 248L, 205L, 188L, 581L, 813L, 175L, 657L, 862L, 775L, 
212L, 341L, 27L, 885L, 575L, 334L, 350L, 486L, 483L, 340L), V4 = c(138L, 
493L, 111L, 241L, 548L, 107L, 548L, 965L, 839L, 1L, 139L, 1L, 
165L, 769L, 111L, 965L, 548L, 1L, 676L, 319L, 689L, 769L, 567L, 
197L, 139L, 319L, 319L, 832L, 116L, 500L, 392L, 704L, 689L, 500L, 
689L, 832L, 165L, 138L, 116L, 676L, 197L, 589L, 832L, 165L, 925L, 
165L, 647L, 832L, 116L, 744L, 587L, 925L, 500L, 116L, 107L, 832L, 
500L, 319L, 17L, 925L, 116L, 548L, 17L, 107L, 676L, 111L, 832L, 
925L, 111L, 107L, 17L, 722L, 139L, 432L, 319L, 548L, 241L, 769L, 
319L, 17L, 689L, 342L, 165L, 722L, 676L, 319L, 197L, 241L, 139L, 
139L, 111L, 744L, 689L, 722L, 965L, 432L, 647L, 432L, 1L, 111L
), V5 = c(816L, 95L, 884L, 821L, 88L, 374L, 981L, 672L, 70L, 
71L, 89L, 95L, 374L, 75L, 917L, 765L, 917L, 449L, 71L, 884L, 
766L, 70L, 672L, 89L, 816L, 937L, 937L, 440L, 413L, 1000L, 1000L, 
413L, 70L, 356L, 821L, 440L, 990L, 821L, 147L, 356L, 629L, 374L, 
766L, 766L, 71L, 937L, 89L, 95L, 917L, 937L, 937L, 449L, 95L, 
463L, 1000L, 440L, 821L, 884L, 917L, 816L, 89L, 1000L, 766L, 
356L, 765L, 440L, 75L, 463L, 440L, 440L, 765L, 636L, 672L, 629L, 
88L, 356L, 374L, 374L, 463L, 95L, 463L, 75L, 71L, 89L, 449L, 
88L, 990L, 884L, 765L, 463L, 884L, 672L, 463L, 449L, 629L, 821L, 
981L, 75L, 990L, 440L), V6 = c(650L, 675L, 737L, 466L, 883L, 
877L, 209L, 887L, 584L, 263L, 605L, 132L, 584L, 950L, 650L, 451L, 
737L, 453L, 348L, 675L, 949L, 349L, 209L, 584L, 801L, 593L, 711L, 
666L, 466L, 605L, 527L, 666L, 584L, 717L, 114L, 660L, 118L, 466L, 
811L, 595L, 438L, 28L, 593L, 811L, 118L, 711L, 605L, 593L, 466L, 
650L, 801L, 438L, 348L, 349L, 118L, 584L, 114L, 584L, 801L, 209L, 
157L, 466L, 801L, 182L, 812L, 132L, 523L, 666L, 605L, 527L, 950L, 
950L, 812L, 421L, 584L, 801L, 132L, 182L, 737L, 887L, 883L, 605L, 
737L, 711L, 28L, 675L, 220L, 157L, 118L, 887L, 675L, 132L, 736L, 
811L, 887L, 438L, 182L, 717L, 737L, 950L), V7 = c(994L, 202L, 
311L, 725L, 437L, 725L, 776L, 295L, 792L, 57L, 57L, 295L, 842L, 
15L, 776L, 331L, 822L, 795L, 78L, 988L, 498L, 822L, 988L, 782L, 
776L, 728L, 631L, 725L, 735L, 573L, 105L, 295L, 23L, 78L, 202L, 
117L, 190L, 705L, 105L, 57L, 792L, 251L, 251L, 968L, 192L, 23L, 
231L, 822L, 295L, 231L, 631L, 842L, 57L, 235L, 815L, 331L, 117L, 
705L, 331L, 994L, 795L, 237L, 815L, 815L, 23L, 822L, 235L, 631L, 
78L, 97L, 57L, 192L, 677L, 184L, 57L, 231L, 231L, 753L, 733L, 
237L, 743L, 677L, 631L, 988L, 815L, 311L, 815L, 311L, 771L, 728L, 
23L, 988L, 728L, 705L, 97L, 988L, 994L, 57L, 728L, 192L), V8 = c(754L, 
875L, 332L, 935L, 86L, 339L, 86L, 644L, 339L, 501L, 803L, 229L, 
644L, 426L, 550L, 129L, 330L, 129L, 229L, 86L, 773L, 803L, 129L, 
901L, 452L, 8L, 229L, 98L, 129L, 366L, 187L, 8L, 773L, 187L, 
229L, 8L, 98L, 935L, 98L, 345L, 754L, 533L, 332L, 550L, 240L, 
875L, 773L, 229L, 426L, 754L, 120L, 803L, 129L, 901L, 901L, 644L, 
345L, 707L, 707L, 773L, 533L, 120L, 332L, 330L, 803L, 86L, 803L, 
8L, 226L, 345L, 871L, 240L, 550L, 963L, 330L, 345L, 226L, 533L, 
366L, 452L, 803L, 405L, 803L, 405L, 550L, 577L, 8L, 339L, 901L, 
577L, 330L, 229L, 330L, 656L, 452L, 330L, 519L, 226L, 366L, 435L
), V9 = c(643L, 953L, 642L, 21L, 592L, 16L, 127L, 539L, 409L, 
516L, 419L, 277L, 986L, 590L, 45L, 980L, 998L, 516L, 541L, 980L, 
454L, 81L, 149L, 986L, 227L, 45L, 420L, 363L, 986L, 90L, 409L, 
986L, 953L, 45L, 982L, 588L, 68L, 127L, 127L, 16L, 418L, 21L, 
953L, 442L, 418L, 419L, 565L, 980L, 659L, 16L, 149L, 448L, 789L, 
454L, 516L, 2L, 127L, 79L, 277L, 980L, 234L, 357L, 357L, 642L, 
980L, 680L, 729L, 81L, 21L, 454L, 986L, 357L, 980L, 973L, 680L, 
592L, 788L, 2L, 264L, 79L, 680L, 729L, 52L, 986L, 539L, 79L, 
277L, 416L, 786L, 477L, 113L, 454L, 419L, 442L, 953L, 79L, 245L, 
788L, 93L, 234L), V10 = c(31L, 468L, 468L, 387L, 164L, 796L, 
701L, 785L, 915L, 614L, 741L, 770L, 770L, 583L, 373L, 373L, 393L, 
221L, 303L, 83L, 74L, 785L, 387L, 741L, 741L, 393L, 468L, 701L, 
382L, 393L, 387L, 899L, 429L, 947L, 781L, 781L, 645L, 645L, 710L, 
915L, 74L, 796L, 259L, 749L, 373L, 393L, 246L, 632L, 785L, 259L, 
614L, 785L, 428L, 741L, 632L, 382L, 770L, 710L, 781L, 749L, 868L, 
915L, 434L, 221L, 429L, 303L, 393L, 468L, 632L, 976L, 781L, 373L, 
947L, 428L, 781L, 781L, 645L, 868L, 645L, 710L, 283L, 31L, 868L, 
583L, 915L, 246L, 373L, 373L, 781L, 164L, 428L, 710L, 373L, 303L, 
632L, 868L, 614L, 947L, 74L, 382L), V11 = c(351L, 154L, 423L, 
496L, 818L, 913L, 665L, 913L, 380L, 720L, 542L, 380L, 634L, 551L, 
258L, 818L, 634L, 474L, 222L, 639L, 974L, 755L, 262L, 665L, 522L, 
217L, 927L, 351L, 755L, 914L, 380L, 65L, 844L, 633L, 613L, 222L, 
649L, 892L, 752L, 423L, 755L, 169L, 904L, 309L, 639L, 276L, 217L, 
394L, 291L, 522L, 203L, 720L, 35L, 422L, 724L, 423L, 720L, 914L, 
180L, 327L, 92L, 422L, 258L, 467L, 724L, 620L, 665L, 367L, 639L, 
443L, 892L, 724L, 141L, 422L, 327L, 396L, 92L, 309L, 844L, 258L, 
914L, 634L, 497L, 222L, 141L, 880L, 467L, 443L, 496L, 913L, 394L, 
217L, 35L, 396L, 35L, 880L, 351L, 755L, 474L, 215L), V12 = c(102L, 
546L, 682L, 464L, 162L, 876L, 162L, 302L, 682L, 162L, 302L, 53L, 
967L, 679L, 837L, 824L, 44L, 53L, 294L, 738L, 254L, 557L, 546L, 
7L, 902L, 244L, 128L, 499L, 621L, 499L, 458L, 526L, 837L, 465L, 
290L, 969L, 265L, 507L, 835L, 837L, 546L, 136L, 897L, 213L, 195L, 
244L, 465L, 835L, 464L, 621L, 162L, 511L, 969L, 230L, 580L, 335L, 
610L, 969L, 546L, 897L, 835L, 447L, 526L, 302L, 464L, 302L, 682L, 
628L, 610L, 272L, 53L, 254L, 969L, 962L, 511L, 621L, 290L, 458L, 
559L, 860L, 136L, 507L, 462L, 136L, 462L, 731L, 873L, 462L, 335L, 
897L, 580L, 447L, 628L, 731L, 7L, 335L, 102L, 128L, 679L, 742L
), V13 = c(108L, 637L, 757L, 734L, 534L, 42L, 808L, 322L, 757L, 
204L, 808L, 324L, 288L, 82L, 285L, 961L, 955L, 652L, 808L, 961L, 
503L, 549L, 697L, 87L, 734L, 43L, 204L, 455L, 398L, 961L, 183L, 
433L, 431L, 854L, 490L, 69L, 407L, 808L, 398L, 69L, 87L, 338L, 
446L, 178L, 6L, 198L, 82L, 543L, 370L, 534L, 87L, 267L, 455L, 
360L, 534L, 407L, 431L, 446L, 854L, 857L, 46L, 637L, 848L, 923L, 
560L, 531L, 919L, 223L, 307L, 561L, 6L, 719L, 560L, 43L, 734L, 
288L, 324L, 87L, 808L, 322L, 757L, 446L, 425L, 324L, 757L, 857L, 
87L, 848L, 223L, 503L, 307L, 152L, 503L, 757L, 956L, 152L, 43L, 
69L, 719L, 637L), V14 = c(746L, 805L, 191L, 47L, 508L, 508L, 
715L, 461L, 928L, 750L, 140L, 746L, 364L, 552L, 287L, 984L, 481L, 
715L, 762L, 959L, 750L, 344L, 959L, 959L, 306L, 911L, 103L, 638L, 
759L, 761L, 750L, 444L, 692L, 692L, 761L, 481L, 552L, 942L, 810L, 
938L, 306L, 762L, 344L, 942L, 344L, 364L, 552L, 891L, 11L, 103L, 
762L, 287L, 891L, 358L, 730L, 959L, 750L, 191L, 718L, 959L, 358L, 
306L, 287L, 692L, 746L, 461L, 750L, 170L, 358L, 911L, 805L, 938L, 
481L, 759L, 750L, 140L, 715L, 959L, 928L, 692L, 461L, 750L, 306L, 
762L, 691L, 306L, 287L, 481L, 170L, 746L, 810L, 762L, 358L, 292L, 
750L, 191L, 47L, 942L, 344L, 191L), V15 = c(987L, 972L, 151L, 
397L, 250L, 825L, 681L, 825L, 723L, 49L, 585L, 109L, 833L, 137L, 
49L, 690L, 681L, 253L, 385L, 921L, 708L, 151L, 109L, 385L, 54L, 
247L, 979L, 121L, 225L, 124L, 825L, 417L, 320L, 979L, 681L, 918L, 
145L, 397L, 681L, 145L, 586L, 709L, 284L, 840L, 121L, 368L, 250L, 
898L, 840L, 109L, 417L, 513L, 544L, 194L, 417L, 544L, 320L, 987L, 
840L, 987L, 888L, 489L, 855L, 906L, 62L, 579L, 379L, 783L, 368L, 
379L, 49L, 732L, 279L, 509L, 54L, 145L, 797L, 979L, 709L, 840L, 
368L, 830L, 502L, 123L, 681L, 194L, 855L, 703L, 247L, 833L, 609L, 
830L, 708L, 609L, 509L, 397L, 987L, 609L, 320L, 124L), V16 = c(346L, 
48L, 865L, 865L, 173L, 890L, 482L, 13L, 537L, 171L, 482L, 940L, 
843L, 173L, 975L, 866L, 142L, 646L, 482L, 700L, 395L, 298L, 975L, 
890L, 361L, 173L, 890L, 975L, 940L, 271L, 395L, 989L, 395L, 142L, 
865L, 361L, 399L, 441L, 441L, 772L, 142L, 520L, 142L, 520L, 975L, 
930L, 890L, 989L, 530L, 866L, 941L, 530L, 596L, 890L, 36L, 441L, 
346L, 865L, 173L, 646L, 270L, 441L, 866L, 866L, 346L, 441L, 482L, 
872L, 36L, 890L, 271L, 13L, 36L, 836L, 767L, 395L, 890L, 537L, 
395L, 530L, 346L, 346L, 940L, 173L, 865L, 772L, 520L, 171L, 48L, 
866L, 135L, 298L, 135L, 77L, 361L, 872L, 395L, 596L, 772L, 532L
), V17 = c(912L, 146L, 312L, 22L, 618L, 317L, 618L, 199L, 369L, 
101L, 515L, 4L, 476L, 699L, 517L, 317L, 159L, 517L, 553L, 616L, 
995L, 314L, 317L, 314L, 562L, 101L, 249L, 369L, 615L, 562L, 476L, 
702L, 312L, 312L, 515L, 101L, 159L, 572L, 101L, 618L, 895L, 317L, 
616L, 618L, 572L, 562L, 4L, 517L, 312L, 312L, 249L, 699L, 312L, 
158L, 469L, 20L, 524L, 476L, 572L, 249L, 50L, 19L, 249L, 912L, 
469L, 476L, 101L, 146L, 616L, 618L, 476L, 20L, 146L, 249L, 50L, 
101L, 158L, 517L, 238L, 515L, 895L, 553L, 702L, 146L, 312L, 517L, 
158L, 895L, 517L, 101L, 314L, 238L, 22L, 146L, 317L, 895L, 469L, 
912L, 369L, 572L), V18 = c(525L, 635L, 488L, 456L, 878L, 119L, 
119L, 849L, 768L, 817L, 931L, 275L, 460L, 900L, 494L, 669L, 846L, 
488L, 768L, 494L, 570L, 439L, 878L, 275L, 471L, 896L, 768L, 619L, 
727L, 977L, 155L, 155L, 896L, 112L, 817L, 768L, 411L, 304L, 964L, 
612L, 905L, 768L, 456L, 255L, 119L, 404L, 304L, 576L, 219L, 756L, 
612L, 668L, 255L, 768L, 196L, 668L, 155L, 931L, 896L, 878L, 488L, 
576L, 640L, 37L, 846L, 494L, 257L, 37L, 411L, 411L, 625L, 820L, 
304L, 112L, 619L, 9L, 669L, 494L, 471L, 323L, 318L, 570L, 817L, 
578L, 878L, 696L, 977L, 768L, 896L, 525L, 669L, 841L, 471L, 727L, 
619L, 304L, 874L, 931L, 37L, 619L), V19 = c(926L, 281L, 957L, 
308L, 315L, 814L, 622L, 153L, 858L, 315L, 867L, 176L, 555L, 210L, 
867L, 540L, 555L, 867L, 622L, 852L, 540L, 436L, 269L, 505L, 436L, 
505L, 654L, 505L, 91L, 125L, 131L, 706L, 243L, 125L, 922L, 281L, 
91L, 359L, 33L, 957L, 232L, 698L, 555L, 540L, 667L, 34L, 545L, 
698L, 555L, 308L, 926L, 445L, 316L, 748L, 243L, 14L, 521L, 232L, 
654L, 243L, 232L, 359L, 156L, 131L, 555L, 359L, 521L, 852L, 706L, 
957L, 308L, 125L, 91L, 852L, 315L, 604L, 604L, 760L, 604L, 936L, 
521L, 747L, 922L, 555L, 243L, 521L, 316L, 867L, 84L, 176L, 814L, 
232L, 315L, 316L, 555L, 505L, 745L, 505L, 232L, 540L), V20 = c(554L, 
882L, 823L, 386L, 966L, 694L, 286L, 354L, 214L, 25L, 25L, 110L, 
353L, 475L, 479L, 252L, 582L, 999L, 266L, 211L, 18L, 278L, 828L, 
412L, 528L, 386L, 296L, 353L, 412L, 80L, 206L, 714L, 18L, 211L, 
475L, 554L, 38L, 882L, 25L, 362L, 510L, 110L, 206L, 823L, 362L, 
694L, 256L, 479L, 582L, 25L, 828L, 193L, 951L, 80L, 793L, 999L, 
882L, 903L, 38L, 386L, 354L, 214L, 916L, 25L, 110L, 864L, 882L, 
25L, 353L, 780L, 296L, 864L, 510L, 38L, 386L, 400L, 694L, 793L, 
999L, 122L, 278L, 475L, 916L, 903L, 958L, 161L, 828L, 73L, 790L, 
73L, 430L, 18L, 958L, 828L, 582L, 383L, 51L, 278L, 18L, 122L)), class = "data.frame", row.names = c(NA, 
-100L))
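
A side note on `groups_samples` above: `sample()` treats a single number `x` as `1:x`, so a cluster that contains only one index would not be sampled as intended. Below is a minimal sketch of a guard, assuming such single-element clusters can occur; `safe_sample` and `groups_samples_safe` are hypothetical names that are not part of the original code.

```r
# Hypothetical helper: index into x so that length-1 vectors are returned as-is
safe_sample <- function(x, size) {
  x[sample.int(length(x), size, replace = TRUE)]
}

groups_samples_safe <- function(groups, repetition) {
  # One column per group, `repetition` rows (for repetition > 1)
  as.data.frame(sapply(groups, safe_sample, size = repetition))
}

set.seed(1)
small <- list(c(1L, 5L, 6L), 42L, c(8L, 7L, 10L))  # second group has one element
out   <- groups_samples_safe(small, 10)
```

With plain `sample`, the second column could contain any value from 1 to 42; with the guard it can only contain 42.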

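What I would like to do now is reduce the number of entries, say from 100 to 50, where each entry consists of one index from each group. I tried computing a distance matrix with several methods and selecting the furthest entries, but when I inspected the result it was not very informative.

Is there a way to do this, perhaps by taking the list of lists into account, or with some other more sophisticated method?

Any help or insights would be appreciated.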

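Edit - clarifying the objective

Suppose I drew 100 groups, each containing 1 element from each vector of the nested list.

Some groups are close to others: say two groups differ in only 1 element, so I would probably want to discard one of them. Or they might differ in only 2 elements, and so on. In the end I want to keep K groups that are as "distant" from each other as possible.

It would also be nice if the number of elements in each nested vector could be taken into account, as some kind of weighting.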
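
The "number of differing elements" idea can be made concrete as a count of mismatching columns between every pair of rows, optionally weighted per column. This is a minimal sketch only: `row_mismatch` is a hypothetical helper, the toy data is made up, and weighting by relative group size is just one assumed choice, since no weighting is defined here.

```r
# Hypothetical helper: (weighted) number of differing columns for every pair of rows
row_mismatch <- function(df, w = rep(1, ncol(df))) {
  m <- as.matrix(df)
  stopifnot(length(w) == ncol(m))
  d <- matrix(0, nrow(m), nrow(m))
  for (j in seq_len(ncol(m))) {
    # outer() compares column j of every row with column j of every other row
    d <- d + w[j] * outer(m[, j], m[, j], "!=")
  }
  d
}

groups <- list(c(1L, 5L, 6L), c(3L, 4L, 2L, 9L), c(8L, 7L, 10L))
toy    <- data.frame(V1 = c(1L, 1L, 6L), V2 = c(9L, 9L, 2L), V3 = c(7L, 10L, 10L))

plain    <- row_mismatch(toy)                       # unweighted mismatch counts
w        <- lengths(groups) / sum(lengths(groups))  # assumed weighting by group size
weighted <- row_mismatch(toy, w)
```

The resulting matrix could be passed to any selection routine that works on pairwise distances.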

Edit No. 2

For the following list, list(c(1L, 5L, 6L), c(3L, 4L, 2L, 9L), c(8L, 7L, 10L)), we get the following data frame:

structure(list(V1 = c(1L, 5L, 6L, 1L, 6L, 1L, 1L, 6L, 1L, 5L, 
5L, 5L, 1L, 1L, 5L, 6L, 5L, 6L, 6L, 5L, 5L, 5L, 6L, 5L, 6L, 1L, 
6L, 1L, 1L, 1L, 5L, 5L, 6L, 6L, 5L, 1L, 6L, 6L, 5L, 6L, 1L, 1L, 
5L, 5L, 5L, 1L, 6L, 5L, 1L, 5L, 5L, 5L, 5L, 1L, 5L, 5L, 1L, 6L, 
5L, 6L, 5L, 6L, 5L, 1L, 5L, 1L, 5L, 6L, 5L, 1L, 6L, 1L, 6L, 1L, 
1L, 5L, 5L, 6L, 1L, 5L, 1L, 5L, 5L, 6L, 6L, 1L, 1L, 6L, 6L, 6L, 
5L, 5L, 1L, 6L, 1L, 1L, 6L, 5L, 5L, 1L), V2 = c(9L, 3L, 9L, 4L, 
2L, 4L, 3L, 3L, 3L, 2L, 2L, 9L, 3L, 3L, 2L, 2L, 9L, 9L, 9L, 3L, 
4L, 3L, 2L, 3L, 4L, 2L, 2L, 3L, 4L, 9L, 9L, 2L, 3L, 2L, 9L, 9L, 
3L, 2L, 4L, 4L, 3L, 4L, 3L, 2L, 2L, 9L, 9L, 2L, 4L, 4L, 4L, 9L, 
2L, 3L, 9L, 3L, 3L, 2L, 2L, 2L, 4L, 2L, 4L, 3L, 3L, 3L, 2L, 9L, 
9L, 9L, 2L, 9L, 3L, 3L, 9L, 4L, 3L, 3L, 4L, 3L, 4L, 4L, 4L, 4L, 
2L, 9L, 9L, 4L, 9L, 2L, 2L, 9L, 4L, 4L, 9L, 9L, 2L, 4L, 4L, 3L
), V3 = c(7L, 7L, 7L, 8L, 7L, 7L, 7L, 7L, 10L, 8L, 10L, 8L, 7L, 
7L, 10L, 10L, 10L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 10L, 7L, 10L, 
10L, 7L, 8L, 7L, 8L, 7L, 8L, 8L, 8L, 7L, 8L, 8L, 8L, 10L, 7L, 
8L, 7L, 7L, 10L, 7L, 7L, 10L, 7L, 10L, 8L, 8L, 7L, 10L, 10L, 
10L, 8L, 8L, 10L, 7L, 8L, 8L, 10L, 8L, 10L, 10L, 10L, 8L, 10L, 
10L, 10L, 8L, 10L, 8L, 7L, 10L, 7L, 7L, 10L, 8L, 7L, 8L, 10L, 
7L, 8L, 10L, 7L, 7L, 7L, 7L, 10L, 7L, 7L, 10L, 10L, 7L, 7L, 8L, 
10L)), class = "data.frame", row.names = c(NA, -100L))

Running @Allan Cameron's code gives the following best 5:

   V1 V2 V3
26  1  2  7
68  6  9 10
7   1  3  7
17  5  9 10
13  1  3  7

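As you describe it, the notion of an overall "distance" between two groups is somewhat vague. Clearly a pair such as c(1, 5, 2, 6) and c(2, 9, 12, 3) is closer than a pair such as c(1, 5, 2, 6) and c(101, 78, 96, 54), but should exact matches be penalized? Does variance matter? Without a clearer notion of distance, the best measure we have is the mean of each group. This is easily obtained with rowMeans(df).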

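The notion of the "K furthest groups" is also somewhat vague. Distance between groups is a function of pairs of groups, not of a single group. If K = 1, presumably any group will do. If K = 2, you want the pair of groups with the greatest difference in means. Beyond that it is not clear what you are looking for, but one approach is to find the set of K groups whose means have the highest variance.

So if we do something like: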

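k <- 5

group_means <- rowMeans(df)   # one mean per sampled row
indices     <- seq(nrow(df))  # original row numbers, kept in step with group_means

# Seed the selection with the rows that have the lowest and highest means
k_furthest <- c(which.min(group_means), which.max(group_means))
k_vals     <- c(min(group_means), max(group_means))

# Remove the seeds from the pool of candidates
group_means <- group_means[-k_furthest]
indices     <- indices[-k_furthest]

# Greedily add the row whose mean has the largest summed squared
# difference from the means already selected
while(length(k_furthest) < k)
{
  best <- which.max(rowSums(sapply(k_vals, function(x) (x - group_means)^2)))
  k_vals <- c(k_vals, group_means[best])
  k_furthest <- c(k_furthest, indices[best])
  group_means <- group_means[-best]
  indices     <- indices[-best]
}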

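Then k_furthest will contain the set of 5 rows of the data frame whose means have the greatest variance among themselves. Your result will look like this: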

 df[k_furthest,]
#>     V1  V2  V3  V4   V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
#> 63 236 794 885 300   71 114 725 492  52 468  92 128 948 191 585 441 414 196 156  18
#> 51 798 536 739 704 1000 883 237 644 299 915 695 860 338  47 972 890 996 939 957 793
#> 61  41 388 624 689  672 466  55 229 454 164 542 265 338 170  32 271 314 640 922 582
#> 33 970 598 775 548  228 132 842 644 986 781 818 679 920 287 825 361 562 756 748 929
#> 12 336 216 774 107   71 801 725 492 642  74 613 297 948 306 124 646  19 439 281 122

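Note that this algorithm really just alternates between taking the rows with the highest and lowest means on each iteration. Although this produces the greatest overall collective "difference" between the samples, you may end up with some samples that are very close to each other, provided they are both far from another sample. This may not be what you are looking for, which is why it might be a good idea to specify exactly what you mean by "distance" in this context.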
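
If samples ending up close together is a concern, one alternative (a different technique from the mean-based selection above) is a greedy farthest-first selection: seed with the most distant pair of rows, then repeatedly add the row whose smallest distance to the rows already chosen is largest. This is a minimal sketch, assuming Manhattan distance is an acceptable measure and k is at least 2; `farthest_first` is a hypothetical name and the toy data is made up.

```r
# Hypothetical helper: greedy max-min ("farthest-first") selection of k rows
farthest_first <- function(df, k, method = "manhattan") {
  d <- as.matrix(dist(df, method = method))
  # Seed with the most distant pair of rows
  chosen <- as.vector(which(d == max(d), arr.ind = TRUE)[1, ])
  while (length(chosen) < k) {
    # For each row, the distance to its nearest already-chosen row
    nearest <- apply(d[, chosen, drop = FALSE], 1, min)
    nearest[chosen] <- -Inf  # never pick a row twice
    chosen <- c(chosen, which.max(nearest))
  }
  unname(chosen)
}

# Rows 1 and 2 are identical, so at most one of them should be selected
toy    <- data.frame(V1 = c(1, 1, 6, 6, 5), V2 = c(9, 9, 2, 2, 3), V3 = c(7, 7, 10, 7, 8))
picked <- farthest_first(toy, 3)
```

Because each new row is judged by its nearest selected neighbour, near-duplicates of an already selected row are the last to be chosen.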

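Edit

With the further clarification and the new example from the OP, it seems we are looking to maximize the sum of element-wise differences between groups. This means we can do: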

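# For each row i, sum the absolute element-wise differences to every row
# and record which row is furthest away
distances <- as.data.frame(t(sapply(seq_len(nrow(df)), function(i) {
  a <- rowSums(apply(df, 2, function(x) abs(x[i] - x)))
  c(row = i, most_distant = which.max(a), difference = max(a))
  })))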

This gives us a data frame in which each row tells us which other group is most "distant" from it.

head(distances)
#>   row most_distant difference
#> 1   1           16         15
#> 2   2           46         13
#> 3   3            9         14
#> 4   4           68         12
#> 5   5           46         15
#> 6   6           68         13

If we sort this by the largest difference and take the first K groups mentioned in the first two columns, we get our result:

i <- unique(c(t(distances[order(-distances$difference)[seq(k)], 1:2])))[seq(k)]

df[i,]
#>    V1 V2 V3
#> 1   1  9  7
#> 16  6  2 10
#> 5   6  2  7
#> 46  1  9 10
#> 26  1  2  7
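
To compare candidate selections, one rough check is the total pairwise element-wise difference within the chosen rows. This is a minimal sketch: `selection_spread` is a hypothetical helper, and it assumes the Manhattan distance from `dist()` matches the element-wise differences used above. The five rows printed above are re-entered by hand so the block runs on its own.

```r
# Hypothetical helper: sum of Manhattan distances over all pairs of selected rows
selection_spread <- function(df, rows) {
  sum(dist(df[rows, , drop = FALSE], method = "manhattan"))
}

# The five selected rows shown above, re-entered by hand
sel <- data.frame(V1 = c(1, 6, 6, 1, 1),
                  V2 = c(9, 2, 2, 9, 2),
                  V3 = c(7, 10, 7, 10, 7))
spread <- selection_spread(sel, 1:5)
```

A larger value means the selected rows are more spread out overall, although it says nothing about whether two of them are near-duplicates.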