Python Regex - remove all "." and special characters EXCEPT the decimal point
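I have some sentences that contain multiple "." characters.
How can I remove all special characters and "." from the data, except for decimal points?
A sample input is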
What? The Census Says It’s Counted 99.9 Percent of Households. Don’t Be Fooled.
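I want to remove every "." and every special character, except for the decimal point '.' inside a number.
The output should look like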
What The Census Says Its Counted 99.9 Percent of Households Dont Be Fooled
I tried
regex = re.compile('[^ (\w+\.\w+)0-9a-zA-Z]+')
regex.sub('', test)
but the output is
What The Census Says Its Counted 99.9 Percent of Households. Dont Be Fooled.
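Use a capturing group to capture only the decimal number, while also matching special characters (i.e. anything that is not whitespace or a word character).
In the replacement, refer back to the capturing group so that only the captured characters are kept. That is, the whole match is removed and replaced with the decimal number, if one was captured.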
import re
s = 'What? The Census Says It’s Counted 99.9 Percent of Households. Don’t Be Fooled.'
rgx = re.compile(r'(\d\.\d)|[^\s\w]')
# group(1) is None when a special character matched, so substitute '' in that case
rgx.sub(lambda x: x.group(1) or '', s)
# 'What The Census Says Its Counted 99.9 Percent of Households Dont Be Fooled'
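As a minimal sketch of the same capturing-group idea wrapped in a reusable function: the helper name strip_special and the second sample sentence are illustrative assumptions, not part of the original answer. Note that \w also keeps underscores.

```python
import re

# Same pattern as above: either a digit-dot-digit triple (captured),
# or any single character that is neither whitespace nor a word character.
_DECIMAL_OR_SPECIAL = re.compile(r'(\d\.\d)|[^\s\w]')

def strip_special(text):
    """Remove punctuation but keep dots that sit between two digits.

    strip_special is a hypothetical helper name used for illustration.
    """
    # group(1) is None when the second alternative matched, so fall back to ''.
    return _DECIMAL_OR_SPECIAL.sub(lambda m: m.group(1) or '', text)

print(strip_special('Pi is 3.14, e is 2.718... right?'))
# Pi is 3.14 e is 2.718 right
```

Because the captured triple is consumed by the match, a string such as '1.2.3' keeps only its first dot; whether that is acceptable depends on the data.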
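Or

Match every dot except those that sit between two digits, plus every special character other than a dot, and replace those matches with an empty string.
# a dot is removed if it has no digit on its left, or no digit on its right
re.sub(r'(?<!\d)\.|\.(?!\d)|[^\s\w.]', '', s)
# 'What The Census Says Its Counted 99.9 Percent of Households Dont Be Fooled'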
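To see how the lookarounds behave at the edges, here is a small sketch under stated assumptions: the constant name NON_DECIMAL and the two sample strings are made up for illustration. It uses a negative lookbehind (?<!\d) and a negative lookahead (?!\d), so a dot survives only when it has a digit on both sides.

```python
import re

# A dot is dropped when it lacks a digit on its left OR on its right;
# every other non-whitespace, non-word character is dropped unconditionally.
NON_DECIMAL = re.compile(r'(?<!\d)\.|\.(?!\d)|[^\s\w.]')

samples = [
    'Version 2.0 shipped in 2020.',   # trailing dot after a number is removed
    'Costs $.50 (or 0.5 dollars)!',   # leading dot in ".50" is removed
]
for sample in samples:
    print(NON_DECIMAL.sub('', sample))
# Version 2.0 shipped in 2020
# Costs 50 or 0.5 dollars
```

If values like ".50" should keep their dot, the first alternative would have to be relaxed; that is a choice about the data rather than about the regex technique.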
You need the following regex:
[^ 0-9a-zA-Z](?!(?<=\d\.)\d)
Or, if you need a fully Unicode-aware regex:
(?:_|[^\s\w])(?!(?<=\d\.)\d)
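See the regex demo. Details:
[^ 0-9a-zA-Z]
- any single character other than a space, an ASCII letter, or a digit
(?:_|[^\s\w])
- a _ character, or any single character other than whitespace and word characters
(?!(?<=\d\.)\d)
- a negative lookahead that fails the match if, immediately to the right of the current position, there is a digit that is itself immediately preceded by a digit and a dot.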
See a Python demo:
import re
s = 'What? The Census Says It’s Counted 99.9 Percent of Households. Don’t Be Fooled.'
print(re.sub(r'[^ 0-9a-zA-Z](?!(?<=\d\.)\d)', '', s))
# => What The Census Says Its Counted 99.9 Percent of Households Dont Be Fooled
print(re.sub(r'(?:_|[^\s\w])(?!(?<=\d\.)\d)', '', s))
# => What The Census Says Its Counted 99.9 Percent of Households Dont Be Fooled
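The two variants give the same result on the sample sentence, but they diverge on non-ASCII letters and underscores. The following sketch illustrates the difference; the constant names and the sample string are assumptions chosen for the demonstration.

```python
import re

ASCII_ONLY = re.compile(r'[^ 0-9a-zA-Z](?!(?<=\d\.)\d)')
UNICODE_AWARE = re.compile(r'(?:_|[^\s\w])(?!(?<=\d\.)\d)')

text = 'Maß_band: 2.5 m, ±0.1!'

# The ASCII variant treats 'ß' as a special character and removes it.
print(ASCII_ONLY.sub('', text))
# Maband 2.5 m 0.1

# The Unicode variant keeps 'ß' (a word character) but still drops '_'.
print(UNICODE_AWARE.sub('', text))
# Maßband 2.5 m 0.1
```

Which variant fits depends on whether the input is expected to contain letters outside a-z and A-Z.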