Why does my first column become column 0 in a data frame?

Question

My input files have 8 columns. I have 38 files that I want to merge together. Input file: AAA.out

             pos     gpos         p1        ihh1        p2        ihh2  xpehh
 9.1022217  1022217 1.02222e+06 0.138333    901220  0.0738636   572286  0.454111  

 9.1024910  1024910 1.02491e+06 0.138333    900853  0.0738636   572286  0.453703  

 9.1041353  1041353 1.04135e+06 0.246667    852186  0.0738636   573584  0.3959  

 9.1070162  1070162 1.07016e+06 0.113333    870718  0   583622  0.400065

BBB.out

             pos      gpos          p1       ihh1       p2      ihh2    xpehh
  8.1135641 1135641 1.13564e+06 0.368333    639953  0.352273    512804  0.2215  
  8.1152035 1152035 1.15204e+06 0.00333333  651548  0   540213  0.187389
  8.1158202 1158202 1.1582e+06  0.358333    646188  0   540213  0.179129
  8.1178735 1178735 1.17874e+06 0.01    654438  0.409091    486335  0.29688
  8.1193344 1193344 1.19334e+06 0   651573  0   497049  0.270699
  8.1230464 1230464 1.23046e+06 0.373333    631599  0.505682    482294  0.269701

I tried to merge them with:

files <- list.files(pattern = "*.*.out", full.names = TRUE, recursive = FALSE)
# make a list of all out.files
uridata <- data.frame()
# go through each file, one by one, and add it to the 'uridata' df, above
big_list_of_data_frames <- lapply(files, read.table, skip = 0, header = TRUE, stringsAsFactors = FALSE)
big_data_frame <- do.call(rbind, big_list_of_data_frames)
new_fram <- big_data_frame[, c(1, 7)]

The dput output:
structure(list(pos = c(1022217L, 1024910L, 1041353L, 1070162L, 
1089884L), gpos = c(1022220, 1024910, 1041350, 1070160, 1089880
), p1 = c(0.138333, 0.138333, 0.246667, 0.113333, 0.113333), 
    ihh1 = c(901220L, 900853L, 852186L, 870718L, 870014L), p2 = c(0.0738636, 
0.0738636, 0.0738636, 0, 0), ihh2 = c(572286L, 572286L, 573584L, 
583622L, 583435L), xpehh = c(0.454111, 0.453703, 0.3959, 
0.400065, 0.399577)), class = "data.frame", row.names = c("9.1022217", 
"9.1024910", "9.1041353", "9.1070162", "9.1089884"))

I want my output file to be a CSV like this:

    ID             XPEHH  
    9.1022217     0.454111  
    9.1024910     0.453703
    9.1041353     0.3959 
    .
    .
    .
    8.1135641     0.2215

However, I don't know why the first column of the input file becomes column 0 in big_data_frame.

Can you give me some advice?

Answer 1

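You did a good job merging the files. Your problem lies in how you read the files with read.table: it assumes that the first column holds the row names when the header line has one fewer field than the data rows, which is the case here because the first column has no name. See here: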

> read.table(text=BBB, header=TRUE)
              pos    gpos         p1   ihh1       p2   ihh2    xpehh
8.1135641 1135641 1135640 0.36833300 639953 0.352273 512804 0.221500
8.1152035 1152035 1152040 0.00333333 651548 0.000000 540213 0.187389
8.1158202 1158202 1158200 0.35833300 646188 0.000000 540213 0.179129
8.1178735 1178735 1178740 0.01000000 654438 0.409091 486335 0.296880
8.1193344 1193344 1193340 0.00000000 651573 0.000000 497049 0.270699
8.1230464 1230464 1230460 0.37333300 631599 0.505682 482294 0.269701
> rownames(read.table(text=BBB, header=TRUE))
[1] "8.1135641" "8.1152035" "8.1158202" "8.1178735" "8.1193344" "8.1230464"

Have a look at what ?read.table says about row.names. TL;DR: disable this behaviour by setting it to NULL.

> read.table(text=BBB, row.names = NULL, header=TRUE)
  row.names     pos    gpos         p1   ihh1       p2   ihh2    xpehh
1 8.1135641 1135641 1135640 0.36833300 639953 0.352273 512804 0.221500
2 8.1152035 1152035 1152040 0.00333333 651548 0.000000 540213 0.187389
3 8.1158202 1158202 1158200 0.35833300 646188 0.000000 540213 0.179129
4 8.1178735 1178735 1178740 0.01000000 654438 0.409091 486335 0.296880
5 8.1193344 1193344 1193340 0.00000000 651573 0.000000 497049 0.270699
6 8.1230464 1230464 1230460 0.37333300 631599 0.505682 482294 0.269701
> rownames(read.table(text=BBB, row.names = NULL, header=TRUE))
[1] "1" "2" "3" "4" "5" "6"

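As you can see here, the first column is now conveniently named "row.names". If the column names are fixed in advance, you can simply supply a vector of names via the col.names argument to give the first column a name of your choice.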
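A minimal sketch of the col.names route, assuming all files share the same eight columns. The name "ID" is only a label chosen here for the unnamed first column. Skipping the header line with skip = 1 and header = FALSE, and forcing the first column to character with colClasses, are choices made for this sketch rather than part of the answer: without colClasses, an ID such as 9.1024910 would be parsed as a number and lose its trailing zero.

```r
# Stand-in for the contents of BBB.out (first rows from the question).
bbb_text <- paste(c(
  "             pos      gpos          p1       ihh1       p2      ihh2    xpehh",
  "  8.1135641 1135641 1.13564e+06 0.368333    639953  0.352273    512804  0.2215",
  "  8.1152035 1152035 1.15204e+06 0.00333333  651548  0   540213  0.187389"
), collapse = "\n")

# Assumed names: "ID" is a hypothetical label for the unnamed first column.
column_names <- c("ID", "pos", "gpos", "p1", "ihh1", "p2", "ihh2", "xpehh")

bbb <- read.table(
  text = bbb_text,
  header = FALSE,                           # do not parse the 7-field header
  skip = 1,                                 # skip the header line instead
  col.names = column_names,                 # supply all 8 names explicitly
  colClasses = c("character", rep(NA, 7)),  # keep IDs as text, guess the rest
  stringsAsFactors = FALSE
)

print(bbb[, c("ID", "xpehh")])
```

For a real file, replace text = bbb_text with the file name as the first argument.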

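For these examples I used the text argument to read the file contents from a string stored in the variable BBB; you will have to replace that with the file argument and your file name.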
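Putting this together with the merging code from the question, the whole workflow might look like the sketch below. It writes two small stand-in files to a temporary directory so that it runs on its own; the directory, the output name xpehh.csv and the column labels ID and XPEHH are assumptions taken from the desired output, not fixed names.

```r
# Create two small stand-in files (rows taken from the question).
tmp_dir <- tempfile("out_files_")
dir.create(tmp_dir)
writeLines(c(
  "             pos     gpos         p1        ihh1        p2        ihh2  xpehh",
  " 9.1022217  1022217 1.02222e+06 0.138333    901220  0.0738636   572286  0.454111",
  " 9.1024910  1024910 1.02491e+06 0.138333    900853  0.0738636   572286  0.453703"
), file.path(tmp_dir, "AAA.out"))
writeLines(c(
  "             pos      gpos          p1       ihh1       p2      ihh2    xpehh",
  "  8.1135641 1135641 1.13564e+06 0.368333    639953  0.352273    512804  0.2215"
), file.path(tmp_dir, "BBB.out"))

# List every file ending in .out (a regular expression, not a shell glob).
files <- list.files(tmp_dir, pattern = "\\.out$", full.names = TRUE)

# row.names = NULL keeps the unnamed first column as a regular
# column called "row.names" instead of turning it into row names.
frames <- lapply(files, read.table, header = TRUE, row.names = NULL,
                 stringsAsFactors = FALSE)
combined <- do.call(rbind, frames)

# Keep only the ID and xpehh columns, selected by name rather than position.
result <- data.frame(ID = combined[["row.names"]],
                     XPEHH = combined[["xpehh"]],
                     stringsAsFactors = FALSE)

# Hypothetical output file name.
csv_path <- file.path(tmp_dir, "xpehh.csv")
write.csv(result, csv_path, row.names = FALSE, quote = FALSE)
```

Selecting the columns by name avoids the off-by-one effect seen with big_data_frame[, c(1, 7)], where the positions shift depending on whether the first column was absorbed into the row names.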

r

dataframe

rbind