Python 3: Why would you use urlparse/urlsplit?

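I'm not quite sure what these functions are for. I understand that they split a URL into its components, but why would that be useful, and what is an example of when to use urlparse?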

Use urlparse only if you need the params component; otherwise use urlsplit. I explain below what params are and why you might need them.

Reference

urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)

This is similar to urlparse(), but does not split the params from the URL. This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.
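To make the difference concrete, here is a small sketch comparing the two functions on the same URL. The URL itself, including the `;size=m` parameter, is made up for illustration.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL with a ";" parameter on the last path segment.
url = "http://www.example.com/products/women;size=m?color=green#reviews"

parsed = urlparse(url)  # 6 fields: scheme, netloc, path, params, query, fragment
split = urlsplit(url)   # 5 fields: scheme, netloc, path, query, fragment

print(parsed.path)    # /products/women
print(parsed.params)  # size=m
print(split.path)     # /products/women;size=m  (params stay inside the path)
print(split.query)    # color=green
```

If the URLs you deal with never carry `;` parameters, both functions give you the same path, and urlsplit is the simpler choice.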

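The hostname is always useful: you can store it in a variable for later use, or attach a different path, parameters, and query to it to build the URL of the page you want while scraping.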
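As a rough sketch of that idea, the snippet below pulls the host out of one URL and reuses it to build another. The port, the `/products/men` path, and the `color=blue` query are assumptions made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

page = urlsplit("http://www.example.com:8080/products/women?color=green")

host = page.hostname  # "www.example.com" (lowercased, without the port)
netloc = page.netloc  # "www.example.com:8080" (host plus port)

# Reuse the scheme and netloc with a different path and query.
other = urlunsplit((page.scheme, page.netloc, "/products/men", "color=blue", ""))
print(other)  # http://www.example.com:8080/products/men?color=blue
```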

About params:

FYI, this is what RFC 2396 says about parameters in a URL:

Extensive testing of current client applications demonstrated that the majority of deployed systems do not use the ";" character to indicate trailing parameter information, and that the presence of a semicolon in a path segment does not affect the relative parsing of that segment. Therefore, parameters have been removed as a separate component and may now appear in any path segment. Their influence has been removed from the algorithm for resolving a relative URI reference.

Splitting a URL into components is useful in scraping. For example, say the URL is http://www.example.com/products/women?color=green

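When you use urlparse, you get each component separately. Now suppose you have to change women to men, so that the URL becomes http://www.example.com/products/men?color=green, and then do the same for kids, girl, boy, and so on. With the URL already split, you only swap that one piece and put the URL back together.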
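One possible way to do that swap is sketched below. It assumes the category is always the last path segment, which holds for this example URL but may not hold for a real site.

```python
from urllib.parse import urlsplit

base = urlsplit("http://www.example.com/products/women?color=green")

# Everything before the last path segment: "/products"
parent = base.path.rsplit("/", 1)[0]

urls = []
for category in ("men", "kids", "girl", "boy"):
    # _replace returns a copy with one field changed; geturl() reassembles it.
    urls.append(base._replace(path=f"{parent}/{category}").geturl())

for u in urls:
    print(u)
# http://www.example.com/products/men?color=green
# http://www.example.com/products/kids?color=green
# http://www.example.com/products/girl?color=green
# http://www.example.com/products/boy?color=green
```

The query string (`color=green`) is carried over untouched, which is the main advantage over editing the URL with string replacement.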