Replace some numeric values after a pattern in a string of txt in R (grep?)

Question

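I have several txt files that serve as input files for a model, and I need to change some of the model parameters to run a few experiments. There are many parameters, though, and changing them by hand is time-consuming. I have tried using readLines() and grep in R to search for and replace the parameter values, but without much success, and I hope someone can help. Thanks.

The file has lines like this: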

Bubbling Pressure 1          = 0.3389 .4423 .4118
Field Capacity 1             = 0.35  0.38  0.37
Wilting Point 1              = 0.13  0.14  0.13
Bulk Density 1               = 750. 1400. 1500.
Vertical Conductivity 1      = 2.904e-06  3.63e-05  3.63e-05

.....

Bubbling Pressure 3          = 0.2044 0.2876 0.2876
Field Capacity 3             = 0.31  0.33  0.33
Wilting Point 3              = 0.13  0.14  0.14
Bulk Density 3               = 750. 1400. 1500.
Vertical Conductivity 3      = 3.16e-06  3.95e-05  3.95e-05

...

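I want to double all of the Vertical Conductivity parameters... but I am not sure how to isolate the numbers written in scientific notation (e.g. "3.16e-06").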

Is there a way to isolate each number in the lines containing the pattern "Vertical Conductivity", e.g.

Vertical Conductivity 3      = 3.16e-06  3.95e-05  3.95e-05

and then double each number?

Vertical Conductivity 3      = 6.32e-06  7.90e-05  7.90e-05

I have managed to use grep to isolate each line of text containing the pattern "Vertical Conductivity", but I am not sure how to get at the numeric values...

Thanks, Nick

Answer 1

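Your data is not tidy, so the first step is to get it into a useful form. Hadley Wickham's tidyr package has the tools you need, and it combines well with his dplyr package, which lets you double the variable you care about.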

# read in data
df <- read.csv(text = 'Bubbling Pressure 1          = 0.3389 .4423 .4118
               Field Capacity 1             = 0.35  0.38  0.37
               Wilting Point 1              = 0.13  0.14  0.13
               Bulk Density 1               = 750. 1400. 1500.
               Vertical Conductivity 1      = 2.904e-06  3.63e-05  3.63e-05
               Bubbling Pressure 3          = 0.2044 0.2876 0.2876
               Field Capacity 3             = 0.31  0.33  0.33
               Wilting Point 3              = 0.13  0.14  0.14
               Bulk Density 3               = 750. 1400. 1500.
               Vertical Conductivity 3      = 3.16e-06  3.95e-05  3.95e-05', 
    sep = '=', header = FALSE, strip.white = TRUE)

Now tidy it:

library(tidyr)
library(dplyr)

# separate variable from identifier
df %>% separate(V1, c('var', 'var_id'), sep = ' (?=.$)', convert = TRUE) %>% 
    # separate values for each variable (`into` must be a character vector)
    separate(V2, as.character(1:3), sep = ' +', convert = TRUE) %>% 
    # melt values to long form so there's one observation per row
    gather(val_id, val, -var:-var_id, convert = TRUE) %>% 
    # spread variables so each column is one variable
    spread(var, val) %>%
    # use data.frame to make names without spaces
    data.frame() %>%
    # use dplyr::mutate to double vertical conductivity as desired
    mutate(Vertical.Conductivity = Vertical.Conductivity * 2)

#   var_id val_id Bubbling.Pressure Bulk.Density Field.Capacity Vertical.Conductivity
# 1      1      1            0.3389          750           0.35             5.808e-06
# 2      1      2            0.4423         1400           0.38             7.260e-05
# 3      1      3            0.4118         1500           0.37             7.260e-05
# 4      3      1            0.2044          750           0.31             6.320e-06
# 5      3      2            0.2876         1400           0.33             7.900e-05
# 6      3      3            0.2876         1500           0.33             7.900e-05
#   Wilting.Point
# 1          0.13
# 2          0.14
# 3          0.13
# 4          0.13
# 5          0.14
# 6          0.14

Answer 2

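We can do this easily with gsubfn, without changing the original structure into something the OP may or may not need.

Here we read the dataset with readLines, use grepl to get a logical index ('i1') of the elements of 'lines' that contain the substring 'Vertical Conductivity', and then use gsubfn to replace each matched value with twice its value.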

library(gsubfn)
i1 <- grepl("Vertical Conductivity", lines)
lines[i1] <- gsubfn("[0-9.]+e[-+][0-9]+", ~format(as.numeric(x)*2, 
                                           scientific = TRUE), lines[i1]) 
lines
#[1] "Bubbling Pressure 1          = 0.3389 .4423 .4118"           
#[2] "Field Capacity 1             = 0.35  0.38  0.37"             
#[3] "Wilting Point 1              = 0.13  0.14  0.13"             
#[4] "Bulk Density 1               = 750. 1400. 1500."             
#[5] "Vertical Conductivity 1      = 5.808e-06  7.26e-05  7.26e-05"
#[6] "Bubbling Pressure 3          = 0.2044 0.2876 0.2876"         
#[7] "Field Capacity 3             = 0.31  0.33  0.33"             
#[8] "Wilting Point 3              = 0.13  0.14  0.14"             
#[9] "Bulk Density 3               = 750. 1400. 1500."             
#[10] "Vertical Conductivity 3      = 6.32e-06  7.9e-05  7.9e-05"
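
If installing gsubfn is not an option, roughly the same replacement can be sketched in base R with gregexpr and `regmatches<-`. This is only a sketch under the same assumption as above, namely that every value on the matching lines is written in scientific notation with a signed exponent; the helper name double_sci is made up for illustration.

```r
# Two sample lines in the same format as the question's input file
lines <- c("Bulk Density 3               = 750. 1400. 1500.",
           "Vertical Conductivity 3      = 3.16e-06  3.95e-05  3.95e-05")

# Logical index of the lines that hold the parameter of interest
i1 <- grepl("Vertical Conductivity", lines, fixed = TRUE)

# Hypothetical helper: double each matched string, formatting one value at a time
double_sci <- function(x) {
  vapply(as.numeric(x) * 2, format, character(1), scientific = TRUE)
}

# Locate every number in scientific notation on the selected lines
sel <- lines[i1]
m <- gregexpr("[0-9.]+e[-+][0-9]+", sel)

# Replace each match in place with its doubled value
regmatches(sel, m) <- lapply(regmatches(sel, m), double_sci)
lines[i1] <- sel
lines
```

As with the gsubfn version, a plain decimal such as 0.001 on a matching line would not match the pattern and would be left unchanged.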

Data

lines <- trimws(readLines(textConnection(
          'Bubbling Pressure 1          = 0.3389 .4423 .4118
           Field Capacity 1             = 0.35  0.38  0.37
           Wilting Point 1              = 0.13  0.14  0.13
           Bulk Density 1               = 750. 1400. 1500.
           Vertical Conductivity 1      = 2.904e-06  3.63e-05  3.63e-05
           Bubbling Pressure 3          = 0.2044 0.2876 0.2876
           Field Capacity 3             = 0.31  0.33  0.33
           Wilting Point 3              = 0.13  0.14  0.14
           Bulk Density 3               = 750. 1400. 1500.
           Vertical Conductivity 3      = 3.16e-06  3.95e-05  3.95e-05')))

We can also read this directly from the file:

lines <- readLines("yourfile.txt")
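
Since the goal is a modified input file, the edited lines eventually have to be written back out. Below is a minimal base R sketch of the full round trip. The helper scale_param and the temporary files are hypothetical stand-ins, not part of the question. It splits each matching line at "=" so that the index in the parameter name (the "3" in "Vertical Conductivity 3") is never touched, and so that values not written in scientific notation are scaled too. Note that the values are rewritten in scientific notation, so the spacing and number of digits may differ from the original file.

```r
# Hypothetical helper: multiply every value of one parameter by `factor`
scale_param <- function(lines, name, factor = 2) {
  hit <- grepl(name, lines, fixed = TRUE)
  # Keep the text up to the "=" as it is, normalising to "= " after it
  lhs <- sub("=.*$", "= ", lines[hit])
  # Take everything to the right of "=" and split it into single values
  rhs <- trimws(sub("^[^=]*=", "", lines[hit]))
  new_rhs <- vapply(strsplit(rhs, "[[:space:]]+"), function(v) {
    paste(format(as.numeric(v) * factor, scientific = TRUE), collapse = "  ")
  }, character(1))
  lines[hit] <- paste0(lhs, new_rhs)
  lines
}

# A temporary file stands in for the real model input file
infile <- tempfile(fileext = ".txt")
writeLines(c("Bulk Density 3               = 750. 1400. 1500.",
             "Vertical Conductivity 3      = 3.16e-06  3.95e-05  3.95e-05"),
           infile)

lines <- readLines(infile)
lines <- scale_param(lines, "Vertical Conductivity")

# Write to a separate file so the original input stays untouched
outfile <- tempfile(fileext = ".txt")
writeLines(lines, outfile)
readLines(outfile)
```

Writing to a new file for each experiment, rather than overwriting the original, keeps the baseline parameter set available for comparison.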
