How to find a string and return it to stdout in Python

Question

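I'm getting more familiar with Python and am struggling to do the following with BeautifulSoup and Python.

Expected:

* If the output of the script below contains the string 5378, it should email me the line in which the string occurs.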

#! /usr/bin/env python

from bs4 import BeautifulSoup
from lxml import html
import urllib2,re

import codecs
import sys
streamWriter = codecs.lookup('utf-8')[-1]
sys.stdout = streamWriter(sys.stdout)

BASE_URL = "http://outlet.us.dell.com/ARBOnlineSales/Online/InventorySearch.aspx?c=us&cs=22&l=en&s=dfh&brandid=2201&fid=111162"

webpage = urllib2.urlopen(BASE_URL)
soup = BeautifulSoup(webpage.read(), "lxml")
findcolumn = soup.find("div", {"id": "itemheader-FN"})
name = findcolumn.text.strip()
print name

I tried using findall(5378, name), but it returns empty brackets, like this: [].

If I try to pipe the output through grep instead, I run into a Unicode problem.

$ python dell.py | grep 5378
Traceback (most recent call last):
  File "dell.py", line 18, in <module>
    print name
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201d' in position 817: ordinal not in range(128)

Can someone tell me what I'm doing wrong in both cases?

Answer 1

The findall function (from the re module) expects its first argument to be a regular expression, which is a string, but you passed an integer. Try this:

re.findall("5378", name)

When printed, this outputs [u'5378'] if it finds something, or [] if nothing is found.
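
To make the point about the pattern type concrete, here is a minimal sketch that is not part of the original answer. The helper name find_all_occurrences and the sample text are made up for illustration, and the code is written so that it also runs on Python 3.

```python
import re


def find_all_occurrences(needle, text):
    """Return every literal occurrence of needle in text as a list."""
    # str() lets callers pass a number such as 5378;
    # re.escape keeps the needle literal if it contains regex metacharacters.
    pattern = re.escape(str(needle))
    return re.findall(pattern, text)


# Made-up sample text standing in for the scraped column
sample = u"Inspiron 13 5378 2-in-1\nLatitude 7480"

print(find_all_occurrences(5378, sample))    # one match
print(find_all_occurrences("9999", sample))  # no match, so an empty list
```

Converting the needle to a string inside the helper is only one possible design; passing "5378" as a string at the call site, as shown above, works just as well.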

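I suspect you want to retrieve the product name from the number, which means you have to iterate over the elements in findcolumn. We can use re.search() here to check each element's text for a single match.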

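for input_element in findcolumn.find_all("div"):
    # .text is already a unicode string, so no extra unicode() call is needed
    name = input_element.text.strip()
    # re.search returns None when the pattern does not occur in the text
    if re.search("5378", name) is not None:
        print name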
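
Since the goal is to get the lines on which the string occurs, the same re.search() check can also be applied line by line to plain text. The following is only a sketch under assumptions: matching_lines is a hypothetical helper, the sample text is invented rather than taken from the Dell page, and the code avoids Python 2-only syntax so it runs on Python 3 as well.

```python
import re


def matching_lines(text, needle):
    """Return the stripped lines of text that contain needle."""
    pattern = re.compile(re.escape(needle))
    # search() returns a match object or None, so it works as a truth test
    return [line.strip() for line in text.splitlines() if pattern.search(line)]


# Made-up sample standing in for the scraped page text
sample_text = u"""
Inspiron 13 5378 2-in-1
Latitude 7480
Inspiron 13 5378 Touch
"""

for line in matching_lines(sample_text, u"5378"):
    print(line)
```

The returned list could then be joined into the body of a notification email, or checked for emptiness to decide whether to send one at all.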

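As for the Unicode error, there are several solutions, depending on your operating system and configuration: reconfigure your system locale on Ubuntu, or encode your script's output with .encode()/unicode(). The error shows up when piping to grep because Python 2 cannot determine an output encoding for a pipe and falls back to ASCII.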
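
A minimal sketch of the .encode() route, assuming UTF-8 is what the consumer of the output (grep, in this case) expects. write_utf8 is a hypothetical helper, and an in-memory byte stream stands in for stdout so the result can be inspected.

```python
import io


def write_utf8(text, stream):
    """Encode text as UTF-8 and write it, plus a newline, to a byte stream."""
    stream.write(text.encode("utf-8") + b"\n")


# In-memory byte stream used here in place of standard output
buffer = io.BytesIO()

# u"\u201d" is the right double quotation mark from the traceback
write_utf8(u"13\u201d Inspiron 5378", buffer)

print(repr(buffer.getvalue()))
```

In a real script the second argument would be the process's standard output: sys.stdout itself on Python 2, or sys.stdout.buffer on Python 3, since both accept encoded bytes.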

python

beautifulsoup

python-2.7

python-unicode