How do I read a URL in Python and then print each URL on the website?

Question

I am trying to figure out how to read a website and print only the lines that contain a URL. Every time I run my code I get this error:

AttributeError: module 'urllib' has no attribute 'urlopen'

My code is below:

import os
import subprocess
import urllib

datasource = urllib.urlopen("www.google.com")

while 1:
        line = datasource.readline()
        if line == "": break
        if (line.find("www") > -1) :
                print (line)


li = ['www.apple.com', 'www.google.com']
os.chdir('..')
os.chdir('..')
os.chdir('..')
os.chdir('Program Files (x86)\LinkChecker')

for s in li:
    os.system('Start .\linkchecker ' + s)

Answer 1

It looks like you are using Python 3.x, so you should use

urllib.request.urlopen
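As a small sketch of why the original call fails: in Python 3 the top-level urllib package has no urlopen attribute, and the function lives in the urllib.request submodule, which has to be imported explicitly. Nothing here touches the network.

```python
import urllib
import urllib.request  # the submodule must be imported explicitly

# The Python 2 spelling no longer exists on the top-level package.
try:
    urllib.urlopen
except AttributeError as exc:
    print("AttributeError:", exc)

# The Python 3 location of the same function.
print(callable(urllib.request.urlopen))
```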

Answer 2

Here is a very simple example.

It works with Python 3.2 and later.

import urllib.request
with urllib.request.urlopen("http://www.apple.com") as url:
    r = url.read()
print(r)

For reference, see this question: Urlopen attribute error.
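Note that read() returns bytes, so print(r) shows a b'...' literal rather than text. A minimal sketch of decoding the body follows. The helper name read_text and the UTF-8 fallback are assumptions made for illustration, not part of the answer.

```python
import urllib.request

def read_text(url, default_charset="utf-8"):
    """Hypothetical helper: fetch url and return the body as str."""
    with urllib.request.urlopen(url) as response:
        # Use the charset declared in the Content-Type header when present,
        # otherwise fall back to the assumed default.
        charset = response.headers.get_content_charset() or default_charset
        return response.read().decode(charset, errors="replace")

# Usage (needs network access):
# print(read_text("http://www.apple.com"))
```

The errors="replace" argument keeps the sketch from raising on pages whose declared encoding does not match their content.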

Answer 3

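The AttributeError occurs because the call should be urllib.request.urlopen, not urllib.urlopen.

Besides the AttributeError mentioned in the question, I ran into two more errors.

ValueError: unknown url type: 'www.google.com'

Solution: rewrite the line that defines datasource so that the URL includes the https part: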

datasource = urllib.request.urlopen("https://www.google.com")
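To illustrate the ValueError: urlopen rejects a URL without a scheme before it makes any network request. The sketch below reproduces the error and shows one possible way to add the missing scheme. The helper with_scheme and its default of https are assumptions for illustration only.

```python
import urllib.request

def with_scheme(url, default="https"):
    """Hypothetical helper: prepend a scheme when the URL has none."""
    if "://" in url:
        return url
    return default + "://" + url

# A bare host name is rejected while the request object is being built.
try:
    urllib.request.urlopen("www.google.com")
except ValueError as exc:
    print("ValueError:", exc)

print(with_scheme("www.google.com"))  # https://www.google.com
```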

TypeError: a bytes-like object is required, not 'str', raised at the line `if (line.find("www") > -1) :`.
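The TypeError comes from searching bytes with a str argument, since the response object yields bytes. A short sketch of the two usual ways around it follows. The sample line is made up for illustration and the UTF-8 encoding is an assumption.

```python
# A made-up line, shaped like what readline() returns from a response.
line = b'<a href="https://www.google.com">Google</a>'

# Searching bytes with a str argument raises TypeError.
try:
    line.find("www")
except TypeError as exc:
    print("TypeError:", exc)

# Option 1: decode the bytes to str first (UTF-8 assumed here).
text = line.decode("utf-8")
found_in_text = text.find("www") > -1

# Option 2: keep the bytes and search with a bytes pattern.
found_in_bytes = line.find(b"www") > -1

print(found_in_text, found_in_bytes)
```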

The complete corrected code is:

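import os
import urllib.request  # "import urllib" alone does not load the request submodule

# urlopen needs a full URL, scheme included
datasource = urllib.request.urlopen("https://www.google.com")

# The response yields bytes, so decode each line before searching it.
# Iterating stops at end of input, so no manual break is needed.
for raw_line in datasource:
    line = raw_line.decode("utf-8", errors="replace")
    if "www" in line:
        print(line)

li = ['www.apple.com', 'www.google.com']
os.chdir('..')
os.chdir('..')
os.chdir('..')
# Raw strings keep the backslashes in Windows paths literal
os.chdir(r'Program Files (x86)\LinkChecker')

for s in li:
    os.system(r'Start .\linkchecker ' + s)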
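The loop above prints whole lines that contain "www", which is not quite the same as printing each URL on the page. One possible approach, sketched here with the standard library's html.parser, is to collect the href value of every anchor tag. The names LinkCollector and extract_links and the sample HTML are made up for illustration. In practice the input would be the decoded page body.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical parser that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # value is None for attributes written without a value
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html_text):
    """Return the href values found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links

# Made-up sample standing in for a downloaded page.
sample = ('<p><a href="https://www.apple.com">Apple</a> '
          '<a href="https://www.google.com">Google</a></p>')

for link in extract_links(sample):
    print(link)
```

This only sees links written as anchor tags in the HTML source. URLs that appear in plain text or are generated by scripts would need a different approach.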

python

shell

urllib