Python 3: Why would you use urlparse/urlsplit?
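I'm not quite sure what these functions are for. I know they split a URL into its components, but why is that useful, and what is an example of when you would use urlparse?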
Use urlparse only if you need the params. I explain below what you would need the params for.
urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)
This is similar to urlparse(), but does not split the params from the
URL. This should generally be used instead of urlparse() if the more
recent URL syntax allowing parameters to be applied to each segment of
the path portion of the URL (see RFC 2396) is wanted.
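To make the difference concrete, here is a minimal sketch comparing the two functions. The URL is made up for illustration; the ";size=m" part is an assumed example of a params component and is not from the question.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL with a ";"-delimited params part on the last path segment.
url = "http://www.example.com/products/women;size=m?color=green#top"

parsed = urlparse(url)  # 6 fields: scheme, netloc, path, params, query, fragment
split = urlsplit(url)   # 5 fields: scheme, netloc, path, query, fragment

print(parsed.path)    # /products/women
print(parsed.params)  # size=m
print(split.path)     # /products/women;size=m  (params stay inside the path)
print(split.query)    # color=green
```

If you never need the ";" part as a separate field, urlsplit gives you the same information with one field fewer.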
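The hostname is always useful: you can store it in a variable for later use, or add a path, params, or a query to it to fetch the page you want when scraping.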
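A small sketch of that idea, assuming you start from one known page and want to request another page on the same host. The "/search" path and "q=shoes" query are hypothetical values chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

parts = urlsplit("http://www.example.com/products/women?color=green")

host = parts.hostname  # hostname only, lowercased, without any port
print(host)            # www.example.com

# Reuse the scheme and netloc to build other URLs on the same site.
base = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
search = urlunsplit((parts.scheme, parts.netloc, "/search", "q=shoes", ""))

print(base)    # http://www.example.com
print(search)  # http://www.example.com/search?q=shoes
```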
Regarding params:
FYI, this is what RFC 2396 says about params in a URL:
Extensive testing of current client applications demonstrated that the
majority of deployed systems do not use the ";" character to indicate
trailing parameter information, and that the presence of a semicolon
in a path segment does not affect the relative parsing of that
segment. Therefore, parameters have been removed as a separate
component and may now appear in any path segment. Their influence has
been removed from the algorithm for resolving a relative URI
reference.
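Params are useful when scraping. For example, take the URL http://www.example.com/products/women?color=green. When you run it through urlparse you get its components back (strictly speaking, women is the last path segment and color=green is the query; the params field holds only what follows a ";" in the last path segment). You can then change women to men, giving http://www.example.com/products/men?color=green, and do the same for kids, girl, boy, and so on.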
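The swap described above could be sketched like this. The helper name with_category is hypothetical, and it assumes the category is always the last path segment, as in the example URL.

```python
from urllib.parse import urlsplit, urlunsplit

def with_category(url, category):
    """Return url with its last path segment replaced by category."""
    parts = urlsplit(url)
    segments = parts.path.split("/")  # "/products/women" -> ["", "products", "women"]
    segments[-1] = category
    # SplitResult is a namedtuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(path="/".join(segments)))

url = "http://www.example.com/products/women?color=green"
for category in ["men", "kids", "girl", "boy"]:
    print(with_category(url, category))
```

Because only the path is rewritten, the query string (color=green) is carried over unchanged, which is harder to guarantee with plain string replacement.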