Meaning of re.compile(r"[\w']+") in Python

Question

I'm new to Python and am trying to work with some big data code, but I can't understand what the expression re.compile(r"[\w']+") means. Does anyone have any idea about this?

This is the code I am using.

from mrjob.job import MRJob
import re

WORD_REGEXP = re.compile(r"[\w']+")

class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        words = WORD_REGEXP.findall(line)
        for word in words:
            yield word.lower(), 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()

Answer 1

This is a regular expression that has been compiled for faster reuse (explained in this question: Is it worth using re.compile). The command re.compile is explained in the Python docs.
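As a minimal sketch of what compiling means in practice: the compiled pattern object offers the same methods as the module-level functions in re, so both calls below return the same list. The sample line is made up for illustration and is not from the question.

```python
import re

# Compile once, then reuse the pattern object (e.g. once per input line in a mapper).
WORD_REGEXP = re.compile(r"[\w']+")

line = "Don't panic, it's fine"  # illustrative input, not from the question

# Method on the compiled pattern object.
compiled_result = WORD_REGEXP.findall(line)

# Equivalent module-level call that takes the pattern string each time.
direct_result = re.findall(r"[\w']+", line)

print(compiled_result)
print(compiled_result == direct_result)
```

The comma is dropped because it is neither a word character nor an apostrophe.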

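As for this specific regular expression: it searches for groups of one or more characters, where each character is either a word character (the \w part: letters, digits, or underscore) or an apostrophe (also inside the square brackets). Note that whitespace is not matched, so in general this splits a line into words.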

See the query in a Python-specific regex tester to try it out, or on regex101, which provides an explanation of any regular expression.

In the phrase How's it going $#, this produces three matches: How's, it, and going. The group of symbols is not matched.
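The example phrase can be checked directly. This is a small sketch that reuses the WORD_REGEXP name from the question and lowercases the words the way the mapper does:

```python
import re

WORD_REGEXP = re.compile(r"[\w']+")

phrase = "How's it going $#"

# findall returns every non-overlapping run of word characters and apostrophes.
matches = WORD_REGEXP.findall(phrase)
print(matches)  # ["How's", 'it', 'going']

# The mapper in the question lowercases each word before counting it.
lowered = [word.lower() for word in matches]
print(lowered)  # ["how's", 'it', 'going']
```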

There are plenty of tutorials, and even some games, but you could start with regexone and get a better understanding by trying a few patterns out.

Answer 2

With the help of re.compile('\W') we can remove special characters from a string.

Example:

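import re

# Named "text" rather than "str" to avoid shadowing the built-in type
text = 'how many $ amount spend for Car??'
# Raw string so that \W (any single non-word character) reaches the regex engine unchanged
pattern = re.compile(r'\W')
x = pattern.sub(' ', text)
print(x)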

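Result (each non-word character is replaced by a space, so extra spaces remain where "$" and "??" were):

how many   amount spend for Car

Note: the special characters "$" and "?" are removed from the string.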
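Because \W matches a single non-word character and spaces count as non-word characters, a one-for-one replacement leaves runs of spaces behind. The sketch below compares it with a variant that is not part of the answer above: replacing each run with \W+ and stripping the ends. The variable names are illustrative.

```python
import re

text = 'how many $ amount spend for Car??'

# One replacement per non-word character: " $ " becomes three spaces, "??" becomes two.
one_for_one = re.sub(r'\W', ' ', text)

# One replacement per run of non-word characters, then trim leading/trailing spaces.
collapsed = re.sub(r'\W+', ' ', text).strip()

# repr() makes the extra and trailing spaces visible.
print(repr(one_for_one))
print(repr(collapsed))
```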

