How to parse and then unparse a URL query string so that it ends up in the same format/encoding as before?
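Is there a way to take a URL, parse it to get the query, edit the query in Python, and then rebuild the URL so that it matches the original exactly (same format, encoding, etc.)? Here is my attempt using the urllib functions: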
>>> from urllib.parse import urlparse, parse_qs, urlencode, urlunparse
>>> working_url
'https://<some-netloc>/reports/sales-order-history?page=&sort_direction=&sort_column=&filter%5Bsearch%5D=&filter%5Bofficial%5D%5B0%5D%5Bname%5D=status&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=Pending%2CProcessing%2CReady%20to%20ship%2CDelivering%2CDelivered%2CCompleted&filter%5Bofficial%5D%5B1%5D%5Bname%5D=orderDate&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z'
>>> working_parse = urlparse(working_url)
>>> working_parse
ParseResult(scheme='https', netloc='<some-netloc>', path='/reports/sales-order-history', params='', query='page=&sort_direction=&sort_column=&filter%5Bsearch%5D=&filter%5Bofficial%5D%5B0%5D%5Bname%5D=status&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=Pending%2CProcessing%2CReady%20to%20ship%2CDelivering%2CDelivered%2CCompleted&filter%5Bofficial%5D%5B1%5D%5Bname%5D=orderDate&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z', fragment='')
>>> working_query_dict = parse_qs(working_parse.query)
Here is where I edit working_query_dict to change those timestamps. I then encode the dictionary again with urlencode and use urlunparse to turn it back into a real, working URL.
>>> working_query_dict
{'filter[official][0][name]': ['status'], 'filter[official][0][value]': ['Pending,Processing,Ready to ship,Delivering,Delivered,Completed'], 'filter[official][1][name]': ['orderDate'], 'filter[official][1][value]': ['2020-05-10T07:00:00.000Z,2020-05-18T06:59:59.999Z']}
>>> urlunparse((working_parse.scheme,working_parse.netloc,working_parse.path,working_parse.params,urlencode(working_query_dict),working_parse.fragment))
'https://<some-netloc>/reports/sales-order-history?filter%5Bofficial%5D%5B0%5D%5Bname%5D=%5B%27status%27%5D&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=%5B%27Pending%2CProcessing%2CReady+to+ship%2CDelivering%2CDelivered%2CCompleted%27%5D&filter%5Bofficial%5D%5B1%5D%5Bname%5D=%5B%27orderDate%27%5D&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=%5B%272020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z%27%5D'
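However, the resulting URL doesn't work: it doesn't resolve to the same place on the website. Just by looking at it you can see that it has changed, even though I didn't change any attributes or anything.
I'm thinking maybe I need to detect the encoding or format when doing parse_qs, and then use that format when doing urlencode? How can I do that?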
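OK, the key is the urlencode argument quote_via=urllib.parse.quote, which encodes spaces as %20 rather than the + produced by the default quote_plus. Also, parse_qs can be swapped for parse_qsl to preserve the order of the parameters, if you want an exact match. parse_qsl returns a list of (key, value) pairs instead of a dict of lists, so urlencode no longer stringifies the lists, and keep_blank_values=True retains empty parameters such as page=.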
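To see why the first attempt came out mangled, here is a small sketch on a shortened query. The query string and variable names are made up for illustration and are not from the original URL.

```python
from urllib.parse import parse_qs, parse_qsl, quote, urlencode

# Hypothetical short query, chosen only for illustration.
query = "page=&status=Ready%20to%20ship%2CDelivered"

# parse_qs drops blank values by default and wraps every value in a list.
as_dict = parse_qs(query)
# {'status': ['Ready to ship,Delivered']}

# Without doseq=True, urlencode calls str() on each list, so the brackets
# and quotes get percent-encoded into the value, and spaces become '+'.
mangled = urlencode(as_dict)
# 'status=%5B%27Ready+to+ship%2CDelivered%27%5D'

# parse_qsl keeps the order and, with keep_blank_values=True, the empty
# 'page' parameter; quote_via=quote encodes spaces as %20 instead of '+'.
pairs = parse_qsl(query, keep_blank_values=True)
restored = urlencode(pairs, quote_via=quote)
# restored == query
```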
This now works for me:
>>> from urllib.parse import quote, parse_qsl, urlencode
>>> urlencode(parse_qsl(working_parse.query, keep_blank_values=True), quote_via=quote) == working_parse.query
True
It takes a complex query (you can edit the attributes as needed), parses it, and urlencodes it back to the original query string.
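Putting the pieces together, the edit step could look something like the sketch below. The helper name replace_query_value, the example.com URL, and the new timestamps are all hypothetical; only the urllib.parse calls come from the standard library.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlparse, urlunparse


def replace_query_value(url, key, new_value):
    """Return url with every occurrence of key set to new_value.

    Hypothetical helper: all other parameters keep their order and values.
    """
    parts = urlparse(url)
    # Ordered (key, value) pairs, including parameters with empty values.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    edited = [(k, new_value if k == key else v) for k, v in pairs]
    # quote_via=quote percent-encodes spaces as %20 rather than '+'.
    new_query = urlencode(edited, quote_via=quote)
    # ParseResult is a namedtuple, so _replace swaps just the query part.
    return urlunparse(parts._replace(query=new_query))


# Made-up URL in the same style as the one in the question.
url = (
    "https://example.com/reports?page=&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D="
    "2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z"
)
new_url = replace_query_value(
    url,
    "filter[official][1][value]",
    "2020-06-01T07:00:00.000Z,2020-06-08T06:59:59.999Z",
)
```

Note that the round trip is only byte-for-byte exact when the original URL already follows this convention (%20 for spaces, uppercase hex digits, every reserved character escaped). A URL that uses + for spaces or lowercase escapes would still decode to the same values, but would not be reproduced identically.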