Replace numeric values after a pattern in a text string in R (grep?)
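I have several txt files that serve as input files for a model, and I need to change some of the model parameters to run experiments. There are many parameters, though, and changing them by hand is time-consuming. I tried using readLines() and grep in R to search for and replace the parameter values, but without much success. I hope someone can help. Thanks.
The file has lines like this: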
Bubbling Pressure 1 = 0.3389 .4423 .4118
Field Capacity 1 = 0.35 0.38 0.37
Wilting Point 1 = 0.13 0.14 0.13
Bulk Density 1 = 750. 1400. 1500.
Vertical Conductivity 1 = 2.904e-06 3.63e-05 3.63e-05
.....
Bubbling Pressure 3 = 0.2044 0.2876 0.2876
Field Capacity 3 = 0.31 0.33 0.33
Wilting Point 3 = 0.13 0.14 0.14
Bulk Density 3 = 750. 1400. 1500.
Vertical Conductivity 3 = 3.16e-06 3.95e-05 3.95e-05
...
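I want to double all of the vertical conductivity parameters, but I'm not sure how to isolate the numbers, which are written in scientific notation (e.g. "3.16e-06").
Is there a way to isolate each number in the lines that contain the pattern "Vertical Conductivity", for example
Vertical Conductivity 3 = 3.16e-06 3.95e-05 3.95e-05
and then double each number?
Vertical Conductivity 3 = 6.32e-06 7.90e-05 7.90e-05
I have managed to use grep to isolate each line of text containing the pattern "Vertical Conductivity", but I'm not sure how to get at the numeric values...
Thanks,
Nick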
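Your data isn't tidy, so the first step is to get it into a usable form. Hadley Wickham's tidyr package has the tools you need, and it combines nicely with his dplyr package, which lets you double the variable you care about.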
# read in data
df <- read.csv(text = 'Bubbling Pressure 1 = 0.3389 .4423 .4118
Field Capacity 1 = 0.35 0.38 0.37
Wilting Point 1 = 0.13 0.14 0.13
Bulk Density 1 = 750. 1400. 1500.
Vertical Conductivity 1 = 2.904e-06 3.63e-05 3.63e-05
Bubbling Pressure 3 = 0.2044 0.2876 0.2876
Field Capacity 3 = 0.31 0.33 0.33
Wilting Point 3 = 0.13 0.14 0.14
Bulk Density 3 = 750. 1400. 1500.
Vertical Conductivity 3 = 3.16e-06 3.95e-05 3.95e-05',
sep = '=', header = FALSE, strip.white = TRUE)
Now tidy it:
library(tidyr)
library(dplyr)
# separate variable from identifier
df %>% separate(V1, c('var', 'var_id'), sep = ' (?=.$)', convert = TRUE) %>%
# separate values for each variable
separate(V2, as.character(1:3), sep = ' +', convert = TRUE) %>%
# melt values to long form so there's one observation per row
gather(val_id, val, -var:-var_id, convert = TRUE) %>%
# spread variables so each column is one variable
spread(var, val) %>%
# use data.frame to make names without spaces
data.frame() %>%
# use dplyr::mutate to double vertical conductivity as desired
mutate(Vertical.Conductivity = Vertical.Conductivity * 2)
# var_id val_id Bubbling.Pressure Bulk.Density Field.Capacity Vertical.Conductivity
# 1 1 1 0.3389 750 0.35 5.808e-06
# 2 1 2 0.4423 1400 0.38 7.260e-05
# 3 1 3 0.4118 1500 0.37 7.260e-05
# 4 3 1 0.2044 750 0.31 6.320e-06
# 5 3 2 0.2876 1400 0.33 7.900e-05
# 6 3 3 0.2876 1500 0.33 7.900e-05
# Wilting.Point
# 1 0.13
# 2 0.14
# 3 0.13
# 4 0.13
# 5 0.14
# 6 0.14
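Note that the tidy result is a data frame, not a model input file, so it still has to be turned back into lines before the model can read it. Here is a minimal base-R sketch of that step. The helper name `to_lines` and the small stand-in data frame `tidy` are made up for illustration, and the sketch assumes the column layout shown above (`var_id`, `val_id`, then one column per variable).

```r
# Small stand-in for the tidied result (two variables only, for brevity)
tidy <- data.frame(
  var_id = c(1, 1, 1, 3, 3, 3),
  val_id = c(1, 2, 3, 1, 2, 3),
  Bulk.Density = c(750, 1400, 1500, 750, 1400, 1500),
  Vertical.Conductivity = c(5.808e-06, 7.26e-05, 7.26e-05,
                            6.32e-06, 7.9e-05, 7.9e-05)
)

# Hypothetical helper: rebuild "Name id = v1 v2 v3" lines from the data frame
to_lines <- function(d) {
  vars <- setdiff(names(d), c("var_id", "val_id"))
  out <- character(0)
  for (id in unique(d$var_id)) {
    rows <- d[d$var_id == id, ]
    rows <- rows[order(rows$val_id), ]
    for (v in vars) {
      # Format each value on its own so the values do not share a common width
      vals <- vapply(rows[[v]], format, character(1))
      # Column names use "." where the original labels had spaces
      label <- gsub(".", " ", v, fixed = TRUE)
      out <- c(out, paste0(label, " ", id, " = ", paste(vals, collapse = " ")))
    }
  }
  out
}

to_lines(tidy)
```

The rebuilt lines come out grouped by id in column order, and the numbers are reformatted (for example `750.` becomes `750`), so the original line order and exact number formatting are not preserved. Whether that matters depends on how strict the model's input parser is.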
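We can do this easily with gsubfn, without reshaping the original structure into something the OP may or may not want.
Here we read the dataset with readLines (see the Data section below), use grepl to get a logical index ('i1') of the elements of 'lines' that contain the substring 'Vertical Conductivity', and then use gsubfn to replace each matched value with twice its value.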
library(gsubfn)
i1 <- grepl("Vertical Conductivity", lines)
lines[i1] <- gsubfn("[0-9.]+e[-+][0-9]+", ~format(as.numeric(x)*2,
scientific = TRUE), lines[i1])
lines
#[1] "Bubbling Pressure 1 = 0.3389 .4423 .4118"
#[2] "Field Capacity 1 = 0.35 0.38 0.37"
#[3] "Wilting Point 1 = 0.13 0.14 0.13"
#[4] "Bulk Density 1 = 750. 1400. 1500."
#[5] "Vertical Conductivity 1 = 5.808e-06 7.26e-05 7.26e-05"
#[6] "Bubbling Pressure 3 = 0.2044 0.2876 0.2876"
#[7] "Field Capacity 3 = 0.31 0.33 0.33"
#[8] "Wilting Point 3 = 0.13 0.14 0.14"
#[9] "Bulk Density 3 = 750. 1400. 1500."
#[10] "Vertical Conductivity 3 = 6.32e-06 7.9e-05 7.9e-05"
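One caveat with the pattern used here: `[0-9.]+e[-+][0-9]+` only matches values written with an explicitly signed exponent, so a value such as `0.001` or `1e5` on the same line would be left unchanged. Below is a minimal base-R sketch of an alternative that splits each matching line at the "=" and scales everything to its right. The helper name `double_after_equals` and its `factor` argument are made up for illustration, and the sketch assumes every token after the "=" is a number.

```r
lines <- c(
  "Bulk Density 1 = 750. 1400. 1500.",
  "Vertical Conductivity 1 = 2.904e-06 3.63e-05 3.63e-05"
)

# Hypothetical helper: scale every value to the right of the first "="
double_after_equals <- function(x, factor = 2) {
  pos   <- regexpr("=", x, fixed = TRUE)
  label <- substr(x, 1, pos)            # e.g. "Vertical Conductivity 1 ="
  vals  <- substr(x, pos + 1, nchar(x)) # e.g. " 2.904e-06 3.63e-05 3.63e-05"
  scaled <- vapply(strsplit(trimws(vals), "\\s+"), function(v) {
    # Format each number separately to avoid padding to a common width
    paste(vapply(as.numeric(v) * factor, format, character(1),
                 scientific = TRUE),
          collapse = " ")
  }, character(1))
  paste(label, scaled)
}

i1 <- grepl("Vertical Conductivity", lines, fixed = TRUE)
lines[i1] <- double_after_equals(lines[i1])
lines
```

Splitting at the "=" also keeps the identifier in the label (the "1" in "Vertical Conductivity 1") out of reach, which a more permissive number regex applied to the whole line would not.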
Data
lines <- trimws(readLines(textConnection(
'Bubbling Pressure 1 = 0.3389 .4423 .4118
Field Capacity 1 = 0.35 0.38 0.37
Wilting Point 1 = 0.13 0.14 0.13
Bulk Density 1 = 750. 1400. 1500.
Vertical Conductivity 1 = 2.904e-06 3.63e-05 3.63e-05
Bubbling Pressure 3 = 0.2044 0.2876 0.2876
Field Capacity 3 = 0.31 0.33 0.33
Wilting Point 3 = 0.13 0.14 0.14
Bulk Density 3 = 750. 1400. 1500.
Vertical Conductivity 3 = 3.16e-06 3.95e-05 3.95e-05')))
We can also read this directly from the file:
lines <- readLines("yourfile.txt")
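Since the question is about several input files, the modified lines also need to be written back out. Here is a minimal sketch of the full round trip in base R, using `gregexpr` and `regmatches<-` in place of gsubfn. The function name `scale_param_file`, its arguments, and the `_x2` output suffix are assumptions for illustration, and the demo runs on a temporary file rather than a real model input.

```r
# Hypothetical helper: scale exponent-notation values on matching lines of
# one file and write the result to a separate output file.
scale_param_file <- function(infile, outfile,
                             param = "Vertical Conductivity", factor = 2) {
  lines <- readLines(infile)
  i1 <- grepl(param, lines, fixed = TRUE)
  if (any(i1)) {
    sel <- lines[i1]
    m <- gregexpr("[0-9.]+e[-+][0-9]+", sel)
    # Replace each matched number with its scaled value
    regmatches(sel, m) <- lapply(regmatches(sel, m), function(v) {
      vapply(as.numeric(v) * factor, format, character(1), scientific = TRUE)
    })
    lines[i1] <- sel
  }
  writeLines(lines, outfile)
  invisible(lines)
}

# Demo on a temporary file standing in for a real model input file
infile <- tempfile(fileext = ".txt")
writeLines(c("Bulk Density 3 = 750. 1400. 1500.",
             "Vertical Conductivity 3 = 3.16e-06 3.95e-05 3.95e-05"), infile)
outfile <- sub("\\.txt$", "_x2.txt", infile)
scale_param_file(infile, outfile)
readLines(outfile)
```

Writing to a new file keeps the original input untouched, which is handy when comparing experiments. To process a whole folder, one option is to loop over the result of `list.files()` with `pattern = "\\.txt$"` and `full.names = TRUE` and call the helper on each path.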