Extracting multiple data frames from a text file using patterns and expressions in R

Question

I have the following input file containing XY1 series, with many unnecessary lines at the beginning.

Input file:
Unnecessary lines...
Unnecessary lines...
Unnecessary lines...
...........
...........

!Time Step
XY1 3 3 0 0
11908800 5
11912400   200
13737600 200

!Discharge
XY1 1 8 0 0
11908800    1840.593294 
11995200    1840.593294 !Day spin-up
12081600    1840.593294 !Day of simulation
12168000    2831.681991 !Day to ramp up flow
12254400    2831.681991 !Day of Simulation
12340800    4247.522986 !Day to ramp up flow
12427200    4247.522986 !Day of simulation
12513600    4247.522986 !+ 1-hour

!DS tailwater
XY1 2 8 0 0
11908800    103.0224
11995200    103.0224
12081600    103.0224
12168000    103.05288
12254400    103.05288
12340800    103.08336
12427200    103.08336
12513600    103.08336

!DS tailwater2
XY1 3 8 0 0
119088  103.0224
119520  90.0224
120800  115.0224
121000  103.05288
122400  110.05288
123800  103.08336
124200  101.08336
125600  105.08336
!ENDXY1

END

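The input file can contain more XY1 series. I only want the data frames below the XY1 headers that have 8 as the third field (e.g. XY1 1 8 0 0). I used grep("^XY1 \\d 8", dat) but I don't know how to put it in a loop.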
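For reference, the digit class has to be double-escaped inside an R string ("\d" on its own is a parse error), and \\d matches exactly one digit, so a header such as XY1 10 8 0 0 would be missed once there are more than nine series. A small sketch on a hypothetical character vector:

```r
# Hypothetical header lines, including a two-digit series id
dat <- c("!Time Step", "XY1 3 3 0 0", "!Discharge", "XY1 1 8 0 0",
         "!DS tailwater", "XY1 2 8 0 0", "XY1 10 8 0 0")

grep("^XY1 \\d 8", dat)      # single-digit ids only: 4 6
grep("^XY1 [0-9]+ 8 ", dat)  # ids of any length:     4 6 7
```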

Output df1 based on XY1 1 8 0 0:

Node        Value
11908800    1840.593294 
11995200    1840.593294
12081600    1840.593294 
12168000    2831.681991 
12254400    2831.681991 
12340800    4247.522986 
12427200    4247.522986 
12513600    4247.522986 

Output df2 based on XY1 2 8 0 0:
Node        Value
11908800    103.0224
11995200    103.0224
12081600    103.0224
12168000    103.05288
12254400    103.05288
12340800    103.08336
12427200    103.08336
12513600    103.08336 

Output df3 based on XY1 3 8 0 0:
Node    Value
119088  103.0224
119520  90.0224
120800  115.0224
121000  103.05288
122400  110.05288
123800  103.08336
124200  101.08336
125600  105.08336

Thanks a lot for your help.
I can get the lines with something like this:

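rm(list=ls(all=TRUE))
dat <- readLines("D:/Shuvashish/R_adh/AR_20base_201214.bc")
a <- grep("^XY1 \\d 8", dat)   # header lines of the wanted XY1 series
b <- grep("^!ENDXY1", dat)     # end marker placed after the last series

# first series: from the line after its header to 2 lines above the next header
df1 <- read.delim(text = dat[(a[1]+1):(a[2]-2)], sep = "", header = FALSE)
df1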
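One detail in this approach: some data lines carry trailing remarks such as !Day spin-up, which read.delim() with sep = "" would likely read as extra fields. read.table() with comment.char = "!" discards them. A sketch on a hypothetical excerpt of the first series:

```r
# Hypothetical excerpt: the data lines of the first series
lines <- c("11908800    1840.593294",
           "11995200    1840.593294 !Day spin-up",
           "12081600    1840.593294 !Day of simulation")

# comment.char = "!" discards everything from "!" to the end of each line
df1 <- read.table(text = lines, comment.char = "!",
                  col.names = c("Node", "Value"))
df1
```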

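How can I automate this in a for or while loop over all XY1 series, so that each series ends up in its own data frame (df1, df2, df3, etc.)? Thanks. If I try to get the next XY1 series, i.e. XY1 2 8 0 0, with the following code: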

df2 <- read.delim( text=dat[(a[2]+1):(a[3]-2)],sep = "",header = FALSE)

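it throws an error because there is no third XY1 series in the text, so a[3] does not exist. On the other hand, a[-1] only captures the start of the last XY1 series (in R, a[-1] actually drops the first element; the last element is a[length(a)] or tail(a, 1)). How can I get the end line in that case? I simply put !ENDXY1 after the last XY1 series, which I can find with grep: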

b <- grep("^!ENDXY1", dat)
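Given the header positions a and the end marker b, the start and end lines of every series can be computed at once. This sketch assumes, as the single-series code does, that each header is preceded by one blank line and one comment line, so a[-1] - 2 (every header except the first, minus 2) closes each series but the last, and b - 1 closes the last one. The sample vector is hypothetical:

```r
# Hypothetical excerpt of the input, one element per line
dat <- c("!Discharge", "XY1 1 8 0 0", "11908800    1840.593294",
         "11995200    1840.593294 !Day spin-up", "", "!DS tailwater",
         "XY1 2 8 0 0", "11908800    103.0224", "11995200    103.0224",
         "!ENDXY1", "", "END")

a <- grep("^XY1 \\d 8", dat)   # header lines: 2 7
b <- grep("^!ENDXY1", dat)     # end marker: 10

start <- a + 1                 # first data line of each series
end <- c(a[-1] - 2, b - 1)     # 2 lines above each following header; last series stops before !ENDXY1

# one character vector of data lines per series (blank lines are skipped later by read.table)
blocks <- Map(function(s, e) dat[s:e], start, end)
```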

If there are 100 XY1 series, how should I write the for loop and its condition? Thanks a lot for your help.
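For an arbitrary number of series, the loop does not need the !ENDXY1 marker if each block is assumed to end at the first blank line or line starting with "!". A minimal sketch under that assumption; read_series() and the sample vector are hypothetical, and the header pattern uses [0-9]+ so that multi-digit ids such as XY1 10 8 0 0 also match:

```r
# Hypothetical excerpt standing in for readLines() on the .bc file
dat <- c(
  "Unnecessary line...",
  "!Time Step",
  "XY1 3 3 0 0",
  "11908800 5",
  "",
  "!Discharge",
  "XY1 1 8 0 0",
  "11908800    1840.593294",
  "11995200    1840.593294 !Day spin-up",
  "12081600    1840.593294 !Day of simulation",
  "",
  "!DS tailwater",
  "XY1 2 8 0 0",
  "11908800    103.0224",
  "11995200    103.0224",
  "!ENDXY1",
  "",
  "END"
)

# Hypothetical helper: read the data lines that follow header line h,
# stopping at the first blank line or line that starts with "!"
read_series <- function(dat, h) {
  j <- h + 1
  while (j <= length(dat) && nzchar(trimws(dat[j])) && !startsWith(dat[j], "!")) {
    j <- j + 1
  }
  if (j == h + 1) {
    return(data.frame(Node = numeric(0), Value = numeric(0)))  # header with no data lines
  }
  # comment.char = "!" drops trailing remarks such as "!Day spin-up"
  read.table(text = dat[(h + 1):(j - 1)], comment.char = "!",
             col.names = c("Node", "Value"))
}

hdr <- grep("^XY1 [0-9]+ 8 ", dat)  # headers whose third field is 8

dfs <- list()
for (k in seq_along(hdr)) {
  dfs[[paste0("df", k)]] <- read_series(dat, hdr[k])
}
```

Each data frame is then available as dfs$df1, dfs$df2, and so on. Keeping them in one named list is usually easier to loop over afterwards than 100 separate objects; if separate objects are really needed, list2env(dfs, envir = .GlobalEnv) creates them.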

Answer 1

In base R you can do:

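# read the whole file into a single string
a <- paste0(readLines("try.txt"), collapse = "\n")
# data block after each XY1 header containing 8, up to the next line starting with "!"
b <- regmatches(a, gregexpr("(?s)XY1[^\n]+8.*?\n\\K.*?\n[!]", a, perl = TRUE))[[1]]
# strip the "!" remarks, then parse each block into a data frame
lapply(b, function(x) 
     read.table(text = gsub("(?m)[!].*", '', x, perl = TRUE), 
                col.names = c("Node", "Value")))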
[[1]]
      Node    Value
1 11908800 1840.593
2 11995200 1840.593
3 12081600 1840.593
4 12168000 2831.682
5 12254400 2831.682
6 12340800 4247.523
7 12427200 4247.523
8 12513600 4247.523

[[2]]
      Node    Value
1 11908800 103.0224
2 11995200 103.0224
3 12081600 103.0224
4 12168000 103.0529
5 12254400 103.0529
6 12340800 103.0834
7 12427200 103.0834
8 12513600 103.0834

[[3]]
    Node    Value
1 119088 103.0224
2 119520  90.0224
3 120800 115.0224
4 121000 103.0529
5 122400 110.0529
6 123800 103.0834
7 124200 101.0834
8 125600 105.0834
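The answer returns an unnamed list. To end up with separate df1, df2, ... objects as the question asks, one option is to name the list and copy it into the workspace with list2env(). A self-contained sketch using a trimmed, hypothetical copy of the input as a string; note that \K must be written as \\K inside an R string:

```r
# Trimmed, hypothetical copy of the input file as one string
txt <- paste(c(
  "!Discharge",
  "XY1 1 8 0 0",
  "11908800    1840.593294",
  "11995200    1840.593294 !Day spin-up",
  "",
  "!DS tailwater",
  "XY1 2 8 0 0",
  "11908800    103.0224",
  "11995200    103.0224",
  "!ENDXY1",
  "",
  "END"
), collapse = "\n")

# Same pattern as the answer: text after an XY1 header containing 8,
# up to the next line starting with "!" (\\K drops the header from the match)
blocks <- regmatches(txt, gregexpr("(?s)XY1[^\n]+8.*?\n\\K.*?\n[!]", txt, perl = TRUE))[[1]]

res <- lapply(blocks, function(x)
  read.table(text = gsub("(?m)[!].*", "", x, perl = TRUE),
             col.names = c("Node", "Value")))

# Name the list df1, df2, ... and copy each element into the global environment
names(res) <- paste0("df", seq_along(res))
invisible(list2env(res, envir = .GlobalEnv))
```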
