Python shows a different view of the content than the web browser
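I have a Python inventory update script that runs nightly and pulls inventory from a website. I recently started having problems, and on further investigation I found that when I look at the page source in a web browser (View Source), it looks normal. However, when I print it to the console with Python, it looks strange (and breaks the script). Has anyone seen something like this, or does anyone know what causes it?
The web browser shows this (URLs redacted):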
<ul class='vnav vnav__subnav vnav--level2'>
<li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Folding Tables</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Bookcases</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Printer Stands</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Computer Desks</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Office Chairs</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Filing Cabinets</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Letter Holders</a>
</li></ul>
</li>
<li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Bathroom</a>
<ul class='vnav vnav__subnav vnav--level2'>
<li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Bathroom Mirrors</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Bathroom Sinks</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Bathroom Cabinets</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Bathroom Vanities</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Laundry Hampers</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Bath Towel Sets</a>
</li><li class='vnav__item'><a href='https://xx.htm' class='vnav__link'>Shower Curtains</a>
</li></ul>
But Python's print() in the console shows this (URLs redacted):
<ul class="vnav vnav__subnav vnav--level2">
<li class="vnav__item"><a class="vnav__link" href="https://xx.htm">Folding Tables</a>
</li><li class="vnav__item"><a class="vnav__link" href="https://xx.htm">Bookcases</a>
</li><li class="vnav__item"><a class="vnav__link" href="https://xx.htm">Printer Stands</a>
</li><li class="vnav__item"><a class="vnav__link" href="https://xx.htm">Computer Desks</a>
</li><li class="vnav__item"><a class="vnav__link" href="https://xx.htm">Office Chairs</a>
</li><li class="vnav__item"><a class="vnav__link" href="https://xx.htm">Filing Cabinets</a>
</li><li class="vnav__item"><a class="vnav__link" href="https://xx.htm">Letter Holders</a>
</li></ul>
</li>
<li class="vnav__item"><a href="https:">/ / x x . h t m ' c l a s s = ' v n a v _ _ l i n k ' > B a t h r o o m / a >
u l c l a s s = ' v n a v v n a v _ _ s u b n a v v n a v - - l e v e l 2 ' >
l i c l a s s = ' v n a v _ _ i t e m ' > a h r e f = ' h t t p s : / / x x . h t m ' c l a s s = ' v n a v _ _ l i n k ' > B a t h r o o m M i r r o r s / a >
/ l i > l i c l a s s = ' v n a v _ _ i t e m ' > a h r e f = ' h t t p s : / / x x . h t m ' c l a s s = ' v n a v _ _ l i n k ' > B a t h r o m S i n k s / a >
/ l i > l i c l a s s = ' v n a v _ _ i t e m ' > a h r e f = ' h t t p s : / / x x . h t m ' c l a s s = ' v n a v _ _ l i n k ' > B a h r o o m C a b i n e t s / a >
/ l i > l i c l a s s = ' v n a v _ _ i t e m ' > a h r e f = ' h t t p s : / / x x . h t m ' c l a s s = ' v n a v _ _ l i n k ' > B a h r o o m V a n i t i e s / a >
/ l i > l i c l a s s = ' v n a v _ _ i t e m ' > a h r e f = ' h t t p s : / / x x . h t m ' c l a s s = ' v n a v _ _ l i n k ' > L a u n r y H a m p e r s / a >
/ l i > l i c l a s s = ' v n a v _ _ i t e m ' > a h r e f = ' h t t p s : / / x x . h t m ' c l a s s = ' v n a v _ _ l i n k ' > B a t h T o w e l S e t s / a >
/ l i > l i c l a s s = ' v n a v _ _ i t e m ' > a h r e f = ' h t t p s : / / x x . h t m ' c l a s s = ' v n a v _ _ l i n k ' > S h o w e C u r t a i n s / a >
/ l i > / u l >
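The content type is "text/html", and the encoding is "ISO-8859-1" in the web browser, but it shows as "UTF-8" when printed from Python. Also, in the Python console print() output, the rest of the HTML comes out with all the spaces between characters, except right at the end, where it returns to normal (apart from what looks like 2 sets of closing tags, which is a different issue):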
/ b o d y >
/ h t m l >
</a></li></ul></div></nav></body></html>
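One way to check the encoding mismatch described above is to read the charset straight from the Content-Type header value rather than relying on what the HTTP library or parser guessed. The following is only a minimal sketch using the standard library; the function name `charset_from_content_type` and the sample header strings are assumptions for illustration, not part of the original script.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset named in a Content-Type header value.

    Hypothetical helper: returns `default` when the header carries
    no charset parameter. The result is lowercased by the stdlib.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(failobj=default)


# Sample header values (assumed, not taken from the real site)
declared = charset_from_content_type("text/html; charset=ISO-8859-1")
missing = charset_from_content_type("text/html")
print(declared, missing)
```

If the header names no charset at all, different clients may fall back to different defaults, which would be one possible reason for the browser and Python reporting different encodings.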
Finally, if I try to decode using UTF-8 instead of ISO-8859-1, I get the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 74396: invalid continuation byte
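The error above is what UTF-8 decoding does when it meets a byte such as 0xe9 that is not followed by a valid continuation byte; in ISO-8859-1 that same byte is simply "é". The sketch below reproduces this on a made-up byte string and falls back to the declared charset. The helper name `decode_html` and the sample data are assumptions for illustration only.

```python
def decode_html(raw, declared="iso-8859-1"):
    """Decode bytes as UTF-8, falling back to the declared charset.

    Hypothetical helper: returns the decoded text and the name of
    the codec that actually succeeded.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Bytes like 0xe9 are valid ISO-8859-1 but not valid UTF-8 here
        return raw.decode(declared), declared


# Made-up sample containing the byte 0xe9 (not the real page content)
sample = "café table".encode("iso-8859-1")
text, used = decode_html(sample)
print(text, used)
```

ISO-8859-1 maps every byte to a character, so the fallback never raises; that also means it cannot tell you whether the declared charset was actually the right one.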
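Never mind, figured it out.
Important note: when working in different virtualenvs, always make sure your Python versions are the same. I didn't check this at first, but since I had been jumping back and forth between environments, I decided to verify. The Python version I assumed was in use was not the one actually running. Once I made the change... yes! Much better.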
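A quick way to verify which interpreter a given environment is really running is to print it from inside the script itself. This is a minimal sketch, assuming Python 3.3 or later (where `sys.base_prefix` exists); the function names `interpreter_info` and `require_major_minor` are made up for illustration.

```python
import sys


def interpreter_info():
    """Collect basic facts about the interpreter running this script."""
    return {
        "version": "%d.%d.%d" % sys.version_info[:3],
        "executable": sys.executable,
        "prefix": sys.prefix,
        # In a venv, sys.prefix differs from sys.base_prefix
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


def require_major_minor(expected):
    """Return True when the running interpreter matches (major, minor)."""
    return tuple(sys.version_info[:2]) == tuple(expected)


info = interpreter_info()
print(info["version"], info["executable"])
```

Running this in each virtualenv and comparing the output makes a version mismatch visible before it shows up as a parsing problem.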