Regex for replacing commas only within brackets
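I have a dataset of ingredients in which each line is a comma-separated list of ingredients, for example:
Oats (24%) (Rolled, Bran), Coconut (13%) (Coconut , Preservative (220, 223)), Brown Sugar, Milk Solids, Golden Syrup (10%), Seeds (9%) (Sesame , Sunflower), Margarine (Vegetable Oil, Water, Salt, Emulsifiers (471, Soy Lecithin), Antioxidant (307)), Glucose, Milk Choc Compound (5%) (Sugar, Vegetable Oil, Milk Solids, Cocoa Powder, Emulsifiers (Soy Lecithin, 492), Natural Flavour), Natural Flavour
I would like to parse the file and replace only the commas inside brackets with semicolons. Brackets can be nested to any depth and can contain any number of commas. The result should look like this:
Oats (24%) (Rolled; Bran), Coconut (13%) (Coconut ; Preservative (220; 223)), Brown Sugar, Milk Solids, Golden Syrup (10%), Seeds (9%) (Sesame ; Sunflower), Margarine (Vegetable Oil; Water; Salt; Emulsifiers (471; Soy Lecithin); Antioxidant (307)), Glucose, Milk Choc Compound (5%) (Sugar; Vegetable Oil; Milk Solids; Cocoa Powder; Emulsifiers (Soy Lecithin; 492); Natural Flavour), Natural Flavour
Could I get some help with a regex to solve this? Thanks in advance.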
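1) gsubfn This can be done with gsubfn without a complicated regular expression. The regular expression consisting of a single dot matches one character at a time. For each string in the input character vector, the pre function initializes the counter k to 0; fun is then run for each match, with the matched character passed in through the x argument. Within fun, the counter k is incremented by 1 each time a ( is encountered and decremented by 1 each time a ) is encountered. If the counter is nonzero and a comma is encountered, a semicolon is returned in its place; otherwise the input character is returned unchanged. This is vectorized: if the input s is a character vector, each component is processed separately.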
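library(gsubfn)
p <- proto(k = 0,
  pre = function(this) this$k <- 0,   # reset the depth counter for each string
  fun = function(this, x) {
    if (x == "(") this$k <- k + 1     # entering a bracket
    if (x == ")") this$k <- k - 1     # leaving a bracket
    if (k && x == ",") ";" else x     # replace commas only while inside brackets
  })
gsubfn(".", p, s)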
giving:
[1] "Oats (24%) (Rolled; Bran), Coconut (13%) (Coconut ; Preservative (220; 223)), Brown Sugar, Milk Solids, Golden Syrup (10%), Seeds (9%) (Sesame ; Sunflower), Margarine (Vegetable Oil; Water; Salt; Emulsifiers (471; Soy Lecithin); Antioxidant (307)), Glucose, Milk Choc Compound (5%) (Sugar; Vegetable Oil; Milk Solids; Cocoa Powder; Emulsifiers (Soy Lecithin; 492); Natural Flavour), Natural Flavour"
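2) Base R A base R solution is to split the input into single characters, giving a list L of character vectors. Then, for each component chars of L, create a counter vector k of the same length as chars that holds the number of ( minus the number of ) seen up to that point. Replace the commas that correspond to a nonzero k with semicolons and collapse chars back into a single string. Like (1), this works on character vectors.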
L <- strsplit(s, "")
sapply(L, function(chars) {
  # running bracket depth at each character position
  k <- cumsum((chars == "(") - (chars == ")"))
  # commas at nonzero depth are inside brackets
  chars[k & chars == ","] <- ";"
  paste(chars, collapse = "")
})
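To try the cumulative-count idea on a smaller input, here is a minimal sketch that wraps the same steps in a function. The function name replace_nested_commas and the two sample strings are made up for illustration; vapply is used in place of sapply only so that the return type is fixed.

```r
# Hypothetical helper: same cumsum logic as above, wrapped in a function.
replace_nested_commas <- function(x) {
  vapply(strsplit(x, ""), function(chars) {
    # depth[i] = number of "(" minus number of ")" up to position i
    depth <- cumsum((chars == "(") - (chars == ")"))
    # a comma at nonzero depth sits inside at least one bracket
    chars[depth != 0 & chars == ","] <- ";"
    paste(chars, collapse = "")
  }, character(1))
}

# Illustrative input, not taken from the dataset
x <- c("a, b (c, d (e, f)), g", "no brackets, here")
result <- replace_nested_commas(x)
print(result)
```

The top-level commas in both strings are left alone, while the commas at depth 1 and depth 2 in the first string become semicolons.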
Note
The input string s is as follows:
s <- "Oats (24%) (Rolled, Bran), Coconut (13%) (Coconut , Preservative (220, 223)), Brown Sugar, Milk Solids, Golden Syrup (10%), Seeds (9%) (Sesame , Sunflower), Margarine (Vegetable Oil, Water, Salt, Emulsifiers (471, Soy Lecithin), Antioxidant (307)), Glucose, Milk Choc Compound (5%) (Sugar, Vegetable Oil, Milk Solids, Cocoa Powder, Emulsifiers (Soy Lecithin, 492), Natural Flavour), Natural Flavour"
You can use (?R), like this:
i <- gregexpr("\\(([^()]|(?R))*\\)", s, perl=TRUE)
regmatches(s, i)[[1]] <- gsub(",", ";", regmatches(s, i)[[1]])
s
#[1] "Oats (24%) (Rolled; Bran), Coconut (13%) (Coconut ; Preservative (220; 223)), Brown Sugar, Milk Solids, Golden Syrup (10%), Seeds (9%) (Sesame ; Sunflower), Margarine (Vegetable Oil; Water; Salt; Emulsifiers (471; Soy Lecithin); Antioxidant (307)), Glucose, Milk Choc Compound (5%) (Sugar; Vegetable Oil; Milk Solids; Cocoa Powder; Emulsifiers (Soy Lecithin; 492); Natural Flavour), Natural Flavour"
where a(?R)?z is a recursion that matches one or more letters a followed by exactly the same number of letters z.
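The a/z recursion can be checked directly in R. The sketch below is only an illustration: the helper name balanced_az and the sample strings are made up, and the pattern is written with an optional recursion, a(?R)?z, because the recursion needs a way to stop. The match is compared against the full string length instead of anchoring with ^ and $, since (?R) re-enters the whole pattern and would repeat the anchors too.

```r
# Hypothetical helper: TRUE when the whole string is n letters "a"
# followed by exactly n letters "z" (n >= 1).
balanced_az <- function(x) {
  m <- regexpr("a(?R)?z", x, perl = TRUE)
  start <- as.vector(m)            # position of the first match, -1 if none
  len <- attr(m, "match.length")   # length of that match
  # the match must begin at character 1 and cover the entire string
  start == 1 & len == nchar(x)
}

az_check <- balanced_az(c("az", "aaazzz", "aazzz", "aaz"))
print(az_check)
```

The first two strings are balanced and the last two are not, so the result is TRUE, TRUE, FALSE, FALSE.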
Data
s <- "Oats (24%) (Rolled, Bran), Coconut (13%) (Coconut , Preservative (220, 223)), Brown Sugar, Milk Solids, Golden Syrup (10%), Seeds (9%) (Sesame , Sunflower), Margarine (Vegetable Oil, Water, Salt, Emulsifiers (471, Soy Lecithin), Antioxidant (307)), Glucose, Milk Choc Compound (5%) (Sugar, Vegetable Oil, Milk Solids, Cocoa Powder, Emulsifiers (Soy Lecithin, 492), Natural Flavour), Natural Flavour"
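The regmatches code above indexes [[1]], so it handles a single string. For a dataset with one ingredient list per line, a possible generalization is to apply gsub to every element of the match list. This is a sketch under that assumption, not part of the original answer; the function name semicolons_in_brackets and the sample input are hypothetical.

```r
# Hypothetical helper: apply the recursive-regex replacement to each
# element of a character vector.
semicolons_in_brackets <- function(x) {
  # each match is one outermost balanced (...) group, nested groups included
  m <- gregexpr("\\(([^()]|(?R))*\\)", x, perl = TRUE)
  # swap commas for semicolons inside every matched group, element by element
  regmatches(x, m) <- lapply(regmatches(x, m), function(parts) {
    gsub(",", ";", parts, fixed = TRUE)
  })
  x
}

# Illustrative input, not taken from the dataset
lines <- c("a, b (c, d (e, f)), g", "no brackets, here")
converted <- semicolons_in_brackets(lines)
print(converted)
```

One difference from the counter-based approaches: this pattern only matches balanced groups, so a comma after an unmatched ( is left as is, whereas a depth counter would treat it as being inside a bracket.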