Easy: Why am I getting "TypeError: 'int' object is not callable"? (see line 24 in code)

Question

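I know I'm doing something stupid, but I can't figure it out. Can someone smarter than me tell me what's wrong? Thank you. The script is supposed to open a URL, get the HTML, apply a regular expression to pull out the content of interest, store that content in a file, and then repeat for the next URL.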

from selenium import selenium
import unittest, time, re, csv, string, logging, codecs

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.baseurl.com")
        self.selenium.start()
        self.selenium.set_timeout("60000")

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('urlExtentions.csv', 'rb'))
        for row in spamReader:
            try:
                sel.open(row[0])
            except Exception, e:
                ofile = open('outputTest.csv', 'ab')
                ofile.write("error on %s: %s" % (row[0],e))
            else:
                time.sleep(5)
                htmlSource = sel.get_html_source()
                htmlSource2 = htmlSource.encode('utf-8')

    ##Next line throws "TypeError: 'int' object is not callable"

                bodyText = re.DOTALL('<h3>.*?<footer>', htmlSource2)

                ofile = open('output.txt', 'ab')
                ofile.write(bodyText.encode('utf-8') + '\n')
            ofile.close()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
     unittest.main()

Answer 1

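re.DOTALL is a constant in the re module. It is not a function, so you can't call it. It is meant to be passed as a flag in the flags argument of the re module's functions.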
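A quick interpreter check makes this concrete. This sketch uses Python 3, where re.DOTALL is a re.RegexFlag (a subclass of int), so on recent Python 3 versions the message names 'RegexFlag' rather than 'int' as it does in the question's Python 2 code:

```python
import re

# re.DOTALL is an integer flag value, not a function.
print(isinstance(re.DOTALL, int))
print(int(re.DOTALL))

caught = None
try:
    # This mirrors the question's code, which "calls" the constant.
    re.DOTALL('<h3>.*?<footer>', '<h3>x<footer>')
except TypeError as exc:
    caught = exc
    print('TypeError:', exc)
```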

If you want to search for the regular expression, use:

bodyText = re.search('<h3>.*?<footer>', htmlSource2, flags=re.DOTALL)
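Why the flag matters here: without re.DOTALL, "." does not match a newline, so the non-greedy ".*?" cannot cross the line breaks that real HTML usually has between <h3> and <footer>. A small sketch with a made-up HTML snippet:

```python
import re

# Hypothetical page fragment; real pages have newlines between tags.
html = "<h3>Title</h3>\n<p>Body text</p>\n<footer>"
pattern = '<h3>.*?<footer>'

# Without DOTALL, '.' stops at '\n', so no match is found.
without_flag = re.search(pattern, html)
print(without_flag)

# With DOTALL, '.' also matches '\n', so the whole span is found.
with_flag = re.search(pattern, html, flags=re.DOTALL)
print(repr(with_flag.group()))
```

The flag can also be passed positionally, re.search(pattern, html, re.DOTALL), or written inline as (?s) at the start of the pattern.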

re.search() returns a MatchObject, so you probably want to get the matched text:

bodyText = bodyText.group()
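One caveat: re.search() returns None when nothing matches, and calling .group() on None raises AttributeError. A minimal sketch that guards against this (the helper name extract_between is hypothetical, not from the question):

```python
import re

def extract_between(html):
    """Return the '<h3>...<footer>' span, or an empty string if absent."""
    match = re.search('<h3>.*?<footer>', html, flags=re.DOTALL)
    if match is None:
        # No match on this page: avoid calling .group() on None.
        return ''
    # group() with no argument returns the entire matched text (group 0).
    return match.group()

print(repr(extract_between("<h3>News</h3>\n<footer>")))
print(repr(extract_between("<p>no heading here</p>")))
```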

Note that you have already encoded the HTML to UTF-8:

htmlSource2 = htmlSource.encode('utf-8')

so you don't want to do it again here:

ofile.write(bodyText.encode('utf-8') + '\n')

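Remove the .encode() call there. In Python 2, calling .encode('utf-8') on a byte string that is already encoded first decodes it implicitly as ASCII, which raises UnicodeDecodeError as soon as the page contains non-ASCII characters.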
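Putting these fixes together, here is a sketch of the question's loop in Python 3. The Selenium calls are replaced by a hypothetical fetch_html(url) callable so the example runs on its own; in the question that role is played by sel.open(url) followed by sel.get_html_source(), and the time.sleep(5) is left out. The regex runs on text, the result is encoded to UTF-8 exactly once when it is written, and every file is opened with "with" so it is always closed. The file names come from the question; process_urls is a hypothetical name.

```python
import csv
import re

# Same pattern as in the answer, compiled once with DOTALL so '.' spans newlines.
PATTERN = re.compile('<h3>.*?<footer>', re.DOTALL)


def process_urls(csv_path, fetch_html, out_path='output.txt', err_path='outputTest.csv'):
    """Append the '<h3>...<footer>' span of each page listed in csv_path to out_path.

    fetch_html is a hypothetical callable (url -> str) standing in for
    sel.open(url) followed by sel.get_html_source().
    """
    with open(csv_path, newline='') as f:
        urls = [row[0] for row in csv.reader(f) if row]

    for url in urls:
        try:
            html = fetch_html(url)
        except Exception as e:
            # Log the failure and move on to the next URL.
            with open(err_path, 'a', encoding='utf-8') as err:
                err.write('error on %s: %s\n' % (url, e))
            continue

        match = PATTERN.search(html)
        if match is None:
            continue  # nothing to store for this page

        # Work with str while matching; encode to UTF-8 exactly once, here.
        with open(out_path, 'ab') as out:
            out.write(match.group().encode('utf-8') + b'\n')
```

Opening the CSV in text mode with newline='' is what the csv module documentation recommends in Python 3, replacing the question's Python 2 open(..., 'rb').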

Note that instead of a regular expression, you would be better off using a proper HTML parser here. BeautifulSoup, for example, would be a good choice.
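BeautifulSoup is a third-party package, so to keep this sketch dependency-free it uses the standard library's html.parser.HTMLParser instead; that is a swapped-in technique, not BeautifulSoup's API. It assumes the goal is the text from the first <h3> up to the first <footer>, and unlike the regex it returns that text without the markup. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SectionTextExtractor(HTMLParser):
    """Collects text content from the first <h3> up to the first <footer>."""

    def __init__(self):
        super().__init__()
        self.capturing = False
        self.finished = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.finished:
            return
        if tag == 'h3' and not self.capturing:
            self.capturing = True      # start collecting at the first <h3>
        elif tag == 'footer' and self.capturing:
            self.capturing = False     # stop at the first <footer> after it
            self.finished = True

    def handle_data(self, data):
        if self.capturing:
            self.chunks.append(data)


def extract_section_text(html):
    parser = SectionTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return ''.join(parser.chunks)


page = ('<html><body><h1>Site</h1><h3>Title</h3>\n'
        '<p>Hello <b>world</b></p><footer>(c) example</footer></body></html>')
print(repr(extract_section_text(page)))
```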


Tags: html, python, regex, selenium