Requests does not return html anymore - Python
I am trying to get the name from a public LinkedIn URL using Python requests (2.7).
The code used to work fine.
import requests
from bs4 import BeautifulSoup
url = "https://www.linkedin.com/in/linustorvalds"
html = requests.get(url).content
link = BeautifulSoup(html).title.text.split("|")[0].replace(" ","")
print link
The expected output is:
linustorvalds
Instead, I get the following error message:
AttributeError: 'NoneType' object has no attribute 'text'
The problem seems to be that html does not contain the real content of the page, so no 'title' element is found. This is the result of printing html:
<html><head>
<script type="text/javascript">
window.onload = function() {
var newLocation = "";
if (window.location.protocol == "http:") {
var cookies = document.cookie.split("; ");
for (var i = 0; i < cookies.length; ++i) {
if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
newLocation = "https:" + window.location.href.substring(window.location.protocol.length);
}
}
}
if (newLocation.length == 0) {
var domain = location.host;
var newDomainIndex = 0;
if (domain.substr(0, 6) == "touch.") {
newDomainIndex = 6;
}
else if (domain.substr(0, 7) == "tablet.") {
newDomainIndex = 7;
}
if (newDomainIndex) {
domain = domain.substr(newDomainIndex);
}
newLocation = "https://" + domain + "/uas/login?trk=sentinel_org_block&session_redirect=" + encodeURIComponent(window.location)
}
window.location.href = newLocation;
}
</script>
</head></html>
Am I being blocked? Any suggestions for making this code work as it did before?
Thanks a lot!
Try setting the User-Agent header:
html = requests.get(url, headers={"User-Agent": "Requests"}).content
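Putting the suggestion together with the original snippet, a minimal sketch might look like the following. It is written for Python 3, and the helper names `extract_name` and `fetch_name`, the explicit "html.parser" argument, and the 10-second timeout are illustrative assumptions rather than part of the original code. It also checks for a missing title, so a redirect page like the one shown in the question yields None instead of raising AttributeError. Whether the site actually serves the real page for a given User-Agent is up to the site and may change.

```python
import requests
from bs4 import BeautifulSoup

# The User-Agent value suggested in the answer; other strings may or may not work.
HEADERS = {"User-Agent": "Requests"}


def extract_name(html):
    """Return the part of <title> before the first "|", with spaces removed.

    Returns None when the document has no <title>, as with the
    JavaScript redirect page shown in the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text().split("|")[0].replace(" ", "")


def fetch_name(url):
    """Fetch url with the custom User-Agent and extract the name (hypothetical helper)."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return extract_name(response.content)
```

Calling `fetch_name("https://www.linkedin.com/in/linustorvalds")` would then return either the extracted name or None if the response still lacks a title.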