R read.table fill empty data with value above

Question

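I have a badly formatted text file that I need to read into R. I am already reading a bunch of other, less horribly formatted files with read.table, so I would like to keep using that function if possible.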

The file looks like this:

 M  D YY CONC
 7  1 78 15
        0.00
        0.15
        1.06
        1.21
       10.91
       34.55
       69.09
       87.27
       73.67
       38.65
       12.27
        2.27
        6.52
        0.45
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.19
        0.96
        4.59
        4.55
        4.59
        7.25
        7.13
       11.60
        1.06
        0.15
        1.50
        1.16
        0.00
        0.00
        0.00
        0.00
        0.00
  7  1 78 16
        0.00
        0.00
        0.00
        0.00
        7.25
        1.50
        9.00
       20.25
       51.25
       55.00
       53.75
        3.13
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.80
        0.98
        4.00
        2.47
        5.63
        3.50
        7.88
        0.43
        2.30
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
  7  1 78 17
        4.15
        0.00
        0.00
        0.15
        2.27
       16.36
       54.37
       67.96
       58.07
        3.58
        0.89
        0.20
        0.52
        0.59
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        0.00
        5.44
        0.00
        3.09
        3.26
        7.17
        9.39
        8.65
        3.09
        0.45
        7.41
        3.18
        0.00
        2.05
        0.00

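There is one CONC value per hour for the date given on the first line of each block. My end goal is to have the date repeated on every row and to add a column for the hour. So the first part should look like this: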

 M  D YY H CONC
 7  1 78 1 15
 7  1 78 2 0.00
 7  1 78 3 0.15
 7  1 78 4 1.06
 7  1 78 5 1.21
 7  1 78 6 10.91
 7  1 78 7 34.55
 7  1 78 8 69.09

I can read the file in with this:

monitor_datai <- read.table(file = file, header = TRUE, stringsAsFactors = FALSE, skip = 0, sep = "", fill = TRUE)

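The problem with this approach is that the first column ends up holding either the month (if the line has one) or the concentration (if the line has no month). It looks like this: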

head(monitor_datai)
     V1 V2 V3 V4
1  7.00  1 78 15
2  0.00 NA NA NA
3  0.15 NA NA NA
4  1.06 NA NA NA
5  1.21 NA NA NA
6 10.91 NA NA NA

So I need help reading the file in and fixing the format.

Thanks!

Answer 1

Here is my approach, using the data.table package.

I am not sure what the value of H should be... just 1:128, a counter within each group, or...? Please specify and I will add it to the answer.

I have included comments and intermediate results in the code below, so you can (hopefully) follow the steps and adjust them where needed.

library( data.table )
#read the file as-is, complete lines, no separator
DT <- fread( "./temp/testfile.txt", sep = "", skip = 1, header = FALSE )
# head(DT)
#            V1
# 1: 7  1 78 15
# 2:       0.00
# 3:       0.15
# 4:       1.06
# 5:       1.21
# 6:      10.91

#get column names from the file, store in a vector
colnames = names( fread( "./temp/testfile.txt", sep = " ", nrows = 1, header = TRUE ) )
#split the rows with a space in them into the four desired columns,
#   use a space (or multiple in a row) as separator
DT[ grepl(" ", V1), (colnames) := tstrsplit( V1, "[ ]+", perl = TRUE ) ]
#              V1    M    D   YY CONC
#   1: 7  1 78 15    7    1   78   15
#   2:       0.00 <NA> <NA> <NA> <NA>
#   3:       0.15 <NA> <NA> <NA> <NA>
#   4:       1.06 <NA> <NA> <NA> <NA>
#   5:       1.21 <NA> <NA> <NA> <NA>
# ---                               
# 124:       7.41 <NA> <NA> <NA> <NA>
# 125:       3.18 <NA> <NA> <NA> <NA>
# 126:       0.00 <NA> <NA> <NA> <NA>
# 127:       2.05 <NA> <NA> <NA> <NA>
# 128:       0.00 <NA> <NA> <NA> <NA>

#where CONC is.na, copy the value of V1
DT[ is.na( CONC ), CONC := V1 ]
#              V1    M    D   YY CONC
#   1: 7  1 78 15    7    1   78   15
#   2:       0.00 <NA> <NA> <NA> 0.00
#   3:       0.15 <NA> <NA> <NA> 0.15
#   4:       1.06 <NA> <NA> <NA> 1.06
#   5:       1.21 <NA> <NA> <NA> 1.21
# ---                               
# 124:       7.41 <NA> <NA> <NA> 7.41
# 125:       3.18 <NA> <NA> <NA> 3.18
# 126:       0.00 <NA> <NA> <NA> 0.00
# 127:       2.05 <NA> <NA> <NA> 2.05
# 128:       0.00 <NA> <NA> <NA> 0.00

#now we can drop the V1-column
DT[, V1 := NULL]
#set all columns to the right (numeric) type
DT[, (names(DT)) := lapply( .SD, as.numeric ) ]

#and fill down the missing values of M, D and YY
setnafill( DT, type = "locf", cols = c("M", "D", "YY") )

#      M D YY  CONC
#   1: 7 1 78 15.00
#   2: 7 1 78  0.00
#   3: 7 1 78  0.15
#   4: 7 1 78  1.06
#   5: 7 1 78  1.21
# ---             
# 124: 7 1 78  7.41
# 125: 7 1 78  3.18
# 126: 7 1 78  0.00
# 127: 7 1 78  2.05
# 128: 7 1 78  0.00
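
If H is meant to restart at 1 on each date line, as the desired output in the question suggests, the hour could be added roughly like this. This is only a sketch under that assumption: the `txt` vector is a made-up miniature of the file, and `is_header`, `block` and `num_cols` are names introduced here for illustration. The important point is that the block marker has to be computed while the date lines can still be told apart, that is, before the NA values are filled down.

```r
library(data.table)

# Hypothetical miniature of the file: two date blocks with three values each
# (the first value of a block sits on its date line, as in the question).
txt <- c(" M  D YY CONC",
         " 7  1 78 15",
         "        0.00",
         "        0.15",
         "  7  1 78 16",
         "        0.00",
         "        7.25")

# One string per line, without the column-name line and surrounding blanks.
DT <- data.table(V1 = trimws(txt[-1]))

# Lines that still contain a space are the date lines that open a block.
DT[, is_header := grepl(" ", V1)]
DT[is_header == TRUE, c("M", "D", "YY", "CONC") := tstrsplit(V1, "[ ]+")]
DT[is.na(CONC), CONC := V1]

# Block id: increases by one at every date line.
DT[, block := cumsum(is_header)]
# Assumption: H counts the rows inside a block, starting at 1 on the date line.
DT[, H := seq_len(.N), by = block]

# Convert to numeric and carry the date values down.
num_cols <- c("M", "D", "YY", "CONC")
DT[, (num_cols) := lapply(.SD, as.numeric), .SDcols = num_cols]
setnafill(DT, type = "locf", cols = c("M", "D", "YY"))

result <- DT[, .(M, D, YY, H, CONC)]
print(result)
```

With the real file, the same `is_header`, `block` and `H` steps should work on the table returned by `fread()`, as long as they run before `V1` is dropped.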
