Remove parenthesized content under multiple conditions in Python

Given a list like this:

l = ['hydrogenated benzene (purity: 99.9 density (g/cm3), produced in ZB): SD', 
    'Car board price (tax included): JT Port', 
    'Ex-factory price (low-end price): Triethanolamine (85% commercial grade): North'
    ]

I would like to get the following result:

['hydrogenated benzene: SD', 'Car board price: JT Port', 'Ex-factory price: Triethanolamine: North']

My code is as follows:

import re

def remove_extra(content):
    pat1 = r'[\s]'    # whitespace (defined, but not part of the pattern applied below)
    pat2 = r'\(.*\)'  # content within parentheses
    return re.sub(pat2, '', str(content))

[remove_extra(item) for item in l]

It produces:

['hydrogenated benzene : SD',
 'Car board price : JT Port',
 'Ex-factory price : North']

As you can see, the last element of the result, 'Ex-factory price : North', is not what I expected. How can I get the result I need? Thanks.

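The inner parentheses make this complicated. The solution below works for your example, but it may not work for your entire dataset. If you run into errors, please update the question so we can find a solution.

This function first counts how many separate parenthesized groups the string contains, then removes them one by one.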

import re
import pandas as pd

def par_remover(st):
    begin = [m.start() for m in re.finditer(r'\(', st)]
    end = [m.start() for m in re.finditer(r'\)', st)]
    if not begin or not end:
        return st.replace(')', '')  # no complete group to remove
    # number of separate groups: every '(' opened before the first ')' belongs to one group
    count = len(begin) + 1 - len([i for i in begin if i < end[0]])
    for _ in range(count):
        begin = [m.start() for m in re.finditer(r'\(', st)]
        end = [m.start() for m in re.finditer(r'\)', st)]
        if not begin or not end:
            break  # all groups already removed
        # nesting depth of the first group decides which ')' closes it
        end1 = len([i for i in begin if i < end[0]])
        str_remove = st[st.find("("):end[end1 - 1] + 1]
        st = st.replace(str_remove, '')
    return st.replace(')', '')

df = pd.DataFrame({'value': l})

df['value'] = df['value'].apply(par_remover)

Result:

|    | value                                      |
|---:|:-------------------------------------------|
|  0 | hydrogenated benzene : SD                  |
|  1 | Car board price : JT Port                  |
|  2 | Ex-factory price : Triethanolamine : North |
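If the counting logic above is hard to follow, a single pass with a depth counter is another way to drop parenthesized text, including nested groups. This is only a sketch and not the function above: `strip_parens` is a name made up for illustration, and it assumes the parentheses are balanced (a stray `)` is simply dropped). Unlike the result in the table, it also trims the whitespace in front of each group.

```python
def strip_parens(text):
    """Drop everything inside (possibly nested) parentheses in one pass."""
    out = []
    depth = 0  # number of '(' currently open
    for ch in text:
        if ch == '(':
            if depth == 0:
                # trim whitespace that preceded a top-level group
                while out and out[-1].isspace():
                    out.pop()
            depth += 1
        elif ch == ')':
            if depth > 0:
                depth -= 1
        elif depth == 0:
            out.append(ch)  # keep only characters outside all groups
    return ''.join(out)

l = ['hydrogenated benzene (purity: 99.9 density (g/cm3), produced in ZB): SD',
     'Car board price (tax included): JT Port',
     'Ex-factory price (low-end price): Triethanolamine (85% commercial grade): North']

print([strip_parens(s) for s in l])
```

Because it never rescans the string, this avoids having to work out up front how many groups there are.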

The problem is actually not your third item but the first one, because it has nested parentheses. You should loop like this and use subn instead of sub:

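import re

def remove_text_between_parens(text):
    n = 1  # force at least one pass
    while n:
        # remove innermost groups; n is the number of replacements made
        text, n = re.subn(r'\s*\([^()]*\)\s*', '', text)
    return text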
>>> [remove_text_between_parens(t) for t in l]
['hydrogenated benzene: SD',
 'Car board price: JT Port',
 'Ex-factory price: Triethanolamine: North']
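To see why the loop is needed, it can help to look at the string after each pass. The sketch below is the same subn pattern with the intermediate strings recorded (`trace_removal` is a made-up name for illustration). Each pass can only remove groups that contain no other parentheses, so the inner `(g/cm3)` goes first and the outer group follows on the next pass.

```python
import re

def trace_removal(text):
    """Same loop as above, but keeps the string after every pass."""
    steps = [text]
    n = 1
    while n:
        text, n = re.subn(r'\s*\([^()]*\)\s*', '', text)
        if n:  # record only passes that changed something
            steps.append(text)
    return steps

sample = 'hydrogenated benzene (purity: 99.9 density (g/cm3), produced in ZB): SD'
for step in trace_removal(sample):
    print(step)
```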

The correct explanation is here:

You can modify the linked solution with \s* to remove the optional whitespace before the (:
import re

def remove_text_between_parens(text):
    n = 1  # run at least once
    while n:
        text, n = re.subn(r'\s*\([^()]*\)', '', text)  # remove non-nested/flat balanced parts
    return text

a = [remove_text_between_parens(item) for item in l]
print(a)

['hydrogenated benzene: SD', 
 'Car board price: JT Port', 
 'Ex-factory price: Triethanolamine: North']
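As a side note on why the pattern in the question over-matches on the third item: `.*` is greedy, so `\(.*\)` runs from the first `(` to the last `)` in the string and swallows `: Triethanolamine` along the way. A small comparison, as a sketch (the variable names are only for illustration):

```python
import re

s = 'Ex-factory price (low-end price): Triethanolamine (85% commercial grade): North'

# greedy: one match spanning from the first '(' to the last ')'
greedy = re.sub(r'\(.*\)', '', s)

# negated class: each match stops at the nearest ')', so the groups are removed separately
flat = re.sub(r'\s*\([^()]*\)', '', s)

print(greedy)
print(flat)
```

A single pass of the second pattern is enough here only because this item has no nested parentheses; nested input still needs the subn loop.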