Organize .txt file into data frame in R

Question

I have a .txt file that looks exactly like this:

ENVI ASCII Plot File [Sun Mar  5 00:06:04 2017]
Column 1: Band Number
Column 2: Mean: red_1 [Magenta] 20 points~~7
Column 3: Mean: red_2 [Red] 12 points~~2 
Column 4: Mean: red_3 [Green] 12 points~~3
Column 5: Mean: red_4 [Blue] 15 points~~4
Column 6: Mean: red_5 [Yellow] 20 points~~5
Column 7: Mean: red_6 [Cyan] 25 points~~6
Column 8: Mean: red_7 [Maroon] 16 points~~8
Column 9: Mean: red_8 [Sea Green] 6 points~~9
Column 10: Mean: red_9 [Purple] 12 points~~10
Column 11: Mean: red_10 [Coral] 6 points~~11
Column 12: Mean: bcs_1 [Aquamarine] 16 points~~12
Column 13: Mean: bcs_2 [Orchid] 16 points~~13
Column 14: Mean: bcs_3 [Sienna] 30 points~~14
Column 15: Mean: bcs_4 [Chartreuse] 16 points~~15
Column 16: Mean: bcs_5 [Thistle] 25 points~~16
Column 17: Mean: bcs_6 [Red1] 16 points~~17
Column 18: Mean: bcs_7 [Red2] 15 points~~18
Column 19: Mean: bcs_8 [Red3] 12 points~~19
Column 20: Mean: bcs_9 [Green1] 20 points~~20
Column 21: Mean: bcs_10 [Green2] 20 points~~21
1.000000  0.061581  0.078073  0.057892  0.065844  0.090056  0.088098     0.089036  0.077258  0.055721  0.124091  0.037674  0.040654  0.037246  0.049291  0.041737  0.052611  0.059882  0.057625  0.054079  0.053647
2.000000  0.042688  0.037923  0.045340  0.046383  0.046419  0.047063  0.053226  0.049161  0.028502  0.026902  0.057672  0.045742  0.028775  0.041979  0.038616  0.046102  0.053043  0.029172  0.045776  0.040539
3.000000  0.018434  0.036316  0.032751  0.024035  0.027343  0.027738  0.036514  0.014953  0.022183  0.034359  0.010836  0.014596  0.011336  0.014386  0.011091  0.016790  0.014971  0.016921  0.016966  0.019890
4.000000  0.018490  0.015526  0.018201  0.014678  0.016888  0.013276  0.024992  0.019930  0.014847  0.007780  0.018094  0.009815  0.006283  0.014529  0.012734  0.009747  0.011569  0.007291  0.013920  0.008032

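I want to build a data frame in which each ROI (i.e. red_1, red_2, red_3, and so on) is a row and the Band Number values are the columns. This involves transposing the data, and I don't know how to do that. The final data frame should look like this: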

ROI    Band_1    Band_2    Band_3    Band_4
Red_1  0.061581  0.042688  0.018434  0.018490
Red_2  0.078073  0.037923  0.036316  0.015526
... and so forth

This is what I have so far:

# create an index for the lines that are needed
txt[-1:-22] # removes all rows except data

# find lines with names of ROIs
rep_date_entries = grep("Mean:", txt)

Any pointers on how to transpose the values would be greatly appreciated!

Answer 1

Using:

# reading the text file
txt <- readLines('name_of_file.txt')

# extract the column names from the text file:
# take the part after the last ':' on each 'Column' line, then keep only the first word
colnms <- sapply(strsplit(grep('^Column ', txt, value = TRUE),':'), function(i) trimws(tail(i,1)))
colnms <- sub('(\\w+).*', '\\1', colnms)

# reading the data lines into a dataframe with 'read.table'
# and use the 'col.names' parameter to assign the column names
dat <- read.table(text = txt, skip = 22, header = FALSE, col.names = colnms)

# reshape the data into the desired format
library(reshape2)
dat2 <- recast(dat, variable ~ paste0('Band_',Band), id.var = 'Band')
names(dat2)[1] <- 'ROI'

gives:

> dat2
      ROI   Band_1   Band_2   Band_3   Band_4
1   red_1 0.061581 0.042688 0.018434 0.018490
2   red_2 0.078073 0.037923 0.036316 0.015526
3   red_3 0.057892 0.045340 0.032751 0.018201
4   red_4 0.065844 0.046383 0.024035 0.014678
5   red_5 0.090056 0.046419 0.027343 0.016888
6   red_6 0.088098 0.047063 0.027738 0.013276
7   red_7 0.089036 0.053226 0.036514 0.024992
8   red_8 0.077258 0.049161 0.014953 0.019930
9   red_9 0.055721 0.028502 0.022183 0.014847
10 red_10 0.124091 0.026902 0.034359 0.007780
11  bcs_1 0.037674 0.057672 0.010836 0.018094
12  bcs_2 0.040654 0.045742 0.014596 0.009815
13  bcs_3 0.037246 0.028775 0.011336 0.006283
14  bcs_4 0.049291 0.041979 0.014386 0.014529
15  bcs_5 0.041737 0.038616 0.011091 0.012734
16  bcs_6 0.052611 0.046102 0.016790 0.009747
17  bcs_7 0.059882 0.053043 0.014971 0.011569
18  bcs_8 0.057625 0.029172 0.016921 0.007291
19  bcs_9 0.054079 0.045776 0.016966 0.013920
20 bcs_10 0.053647 0.040539 0.019890 0.008032
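
The hard-coded skip = 22 assumes exactly one title line followed by 21 'Column' lines. As a minimal sketch, assuming the header always ends with the last line that starts with 'Column ', the count could be derived from the file itself. The shortened sample text and the name n_header are made up for illustration (three columns instead of 21):

```r
# Shortened stand-in for the file contents (hypothetical sample;
# the real file has 21 'Column' lines and 21 values per data line)
txt <- c(
  "ENVI ASCII Plot File [Sun Mar  5 00:06:04 2017]",
  "Column 1: Band Number",
  "Column 2: Mean: red_1 [Magenta] 20 points~~7",
  "Column 3: Mean: red_2 [Red] 12 points~~2",
  "1.000000  0.061581  0.078073",
  "2.000000  0.042688  0.037923"
)

# Index of the last header line; everything after it is treated as data
n_header <- max(grep('^Column ', txt))

# Same column-name extraction as in the answer
colnms <- sapply(strsplit(grep('^Column ', txt, value = TRUE), ':'),
                 function(i) trimws(tail(i, 1)))
colnms <- sub('(\\w+).*', '\\1', colnms)

# Skip the header lines counted above instead of a fixed number
dat <- read.table(text = txt, skip = n_header, header = FALSE, col.names = colnms)
```

With the real file, txt would come from readLines() as above, and n_header should then evaluate to 22.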

The last step, reshaping the data, can also be done with the data.table package:

library(data.table)
dcast(melt(setDT(dat), id = 1, variable.name = 'ROI'), ROI ~ paste0('Band_',Band))
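
Since the question asks about transposing, here is a rough base R sketch of the same reshaping step with t(), without extra packages. It assumes dat is a plain data frame whose first column is Band; the small dat below is a stand-in built from the first two ROIs of the sample data, and the names m and dat3 are only illustrative:

```r
# Small stand-in for 'dat' as produced by read.table (first two ROIs only)
dat <- data.frame(
  Band  = c(1, 2, 3, 4),
  red_1 = c(0.061581, 0.042688, 0.018434, 0.018490),
  red_2 = c(0.078073, 0.037923, 0.036316, 0.015526)
)

# Transpose everything except the Band column; the ROI names become row names
m <- t(dat[, -1])
colnames(m) <- paste0('Band_', dat$Band)

# Move the row names into a regular 'ROI' column
dat3 <- data.frame(ROI = rownames(m), m, row.names = NULL)
```

This keeps the ROIs in their original column order, whereas the reshape-based versions rely on the factor levels of the melted variable to get the same ordering.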

