Parsing HTML information with beautifulsoup4 in Python

Question

I'm working on a Python project with a friend. We want to build something that downloads the page http://projecteuler.net/problem, gets the problem you select, and prints it back out like this:

Project Euler Problem 7: 10001st prime

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.

What is the 10 001st prime number?

For example, you would run something like python script_name.py 7 and it would print out the text above.

I've been trying several ways to do this, but I don't understand how beautifulsoup4 works. If you can, an explanation of how the script works would also be very helpful.

Thanks!

Answer 1

This should get you started:

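#!/usr/bin/env python3
import sys
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Build the URL for the problem number passed on the command line, e.g. 7.
problem_url = "https://projecteuler.net/problem={}".format(sys.argv[1])

# Download the page and parse it with Python's built-in HTML parser.
with urlopen(problem_url) as problem_page:
    soup = BeautifulSoup(problem_page.read(), "html.parser")

# The problem statement lives in <div class="problem_content">.
problem_text = soup.find("div", {"class": "problem_content"}).text
print(problem_text)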
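To address the "how does it work" part of the question, here is a sketch of the same parsing steps run against a small inline HTML string, so it can be tried without network access. The sample markup and the `extract_problem_text` helper are hypothetical; only the `problem_content` class name comes from the script above.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup standing in for a downloaded problem page.
SAMPLE_HTML = """
<html><body>
  <h2>Largest palindrome product</h2>
  <div class="problem_content">
    <p>A palindromic number reads the same both ways.</p>
  </div>
</body></html>
"""


def extract_problem_text(html):
    # 1. Parse the raw HTML string into a tree of Tag objects.
    soup = BeautifulSoup(html, "html.parser")
    # 2. find() returns the first <div> whose class includes
    #    "problem_content", or None if there is no such element.
    div = soup.find("div", class_="problem_content")
    if div is None:
        raise ValueError("no problem_content div found")
    # 3. get_text() joins every text node inside the tag; strip=True trims
    #    each piece and drops whitespace-only pieces, " " separates them.
    return div.get_text(" ", strip=True)


print(extract_problem_text(SAMPLE_HTML))
```

`class_=` is BeautifulSoup's keyword form of the `{"class": ...}` dictionary used in the script (the trailing underscore is needed because `class` is a reserved word in Python). `find()` returns only the first match; `find_all()` returns a list of every match.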

Usage:

$ ./euler.py 4

Output:

A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99. Find the largest palindrome made from the product of two 3-digit numbers.
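The output above contains only the problem statement, not the heading line the question asks for. A minimal sketch of adding it, assuming the problem title sits in the page's first `<h2>` element (an assumption about Project Euler's markup, not verified here); the sample HTML and the `format_problem` function are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page layout may differ.
SAMPLE_HTML = """
<h2>10001st prime</h2>
<div class="problem_content">
  <p>By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13,
  we can see that the 6th prime is 13.</p>
  <p>What is the 10 001st prime number?</p>
</div>
"""


def format_problem(number, html):
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: the first <h2> on the page holds the problem title.
    title = soup.find("h2").get_text(strip=True)
    content = soup.find("div", class_="problem_content")
    # Collapse internal runs of whitespace (including line breaks in the
    # HTML source) so each paragraph becomes a single line of text.
    paragraphs = [" ".join(p.get_text().split()) for p in content.find_all("p")]
    return "Project Euler Problem {}: {}\n\n{}".format(
        number, title, "\n\n".join(paragraphs)
    )


print(format_problem(7, SAMPLE_HTML))
```

In the real script, `SAMPLE_HTML` would be replaced by the downloaded page and `number` by `sys.argv[1]`.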
