如何用 python 复制 URL 的所有代码
how to copy all the code of a URL with python
我想用Python3.6复制一个URL(http://modelseed.org/biochem/reactions/rxn00001)的全部代码,但是只能复制一部分代码,不知道为什么。
到目前为止,我尝试使用 "requests" 模块
import requests
page = requests.get("http://modelseed.org/biochem/reactions/rxn00001")
print(page.content)
和"urllib"
import urllib.request
site = urllib.request.urlopen("http://modelseed.org/biochem/reactions/rxn00001")
print(site.read())
包含 "Reaction Details" 信息的代码部分,如 "Name"、"ID" 和 "Abbreviation" 丢失,但如果我检查代码,它们是可见的在 Chrome 的开发者栏上。
我可以使用上面的两个代码下载的代码是:
<!DOCTYPE html>
<html lang="en" ng-app="ModelSEED">
<head>
<base href="/"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="initial-scale=1, maximum-scale=1, user-scalable=no" name="viewport">
<meta content="The ModelSEED is a resource for the reconstruction, exploration, comparison, and analysis of metabolic models." name="description"/>
<link href="/img/ModelSEED-favicon.png?v=2.0" rel="shortcut icon"/>
<meta content="nconrad" name="author"/>
<title>
ModelSEED
</title>
<link href="components/angular-material/angular-material.css" rel="stylesheet"/>
<link href="components/bootstrap/dist/css/bootstrap.min.css" rel="stylesheet"/>
<!-- to be removed -->
<link href="components/font-awesome/css/font-awesome.min.css" rel="stylesheet"/>
<link href="icomoon/style.css" rel="stylesheet"/>
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet"/>
<link href="http://fonts.googleapis.com/css?family=Montserrat:400,700" rel="stylesheet" type="text/css"/>
<link href="build/style.css" rel="stylesheet"/>
<!--<script src="https://cdn.socket.io/socket.io-1.3.7.js"></script>-->
<script src="build/site.js">
</script>
<!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
<script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
<![endif]-->
</meta>
</head>
<body>
<div style="height: 100%;" ui-view="">
</div>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-67412611-1', 'auto');
ga('send', 'pageview');
</script>
</body>
</html>
任何人都知道为什么 < div style="height: 100%;" ui-view="" > 和(就在 < body > 之后和 < script > 之前)之间的代码没有下载?
谢谢。
它是由 javascript 脚本插入的,因此,requests 和 urllib 都找不到它,您需要为此使用浏览器,您应该尝试使用 selenium 或 PhantomJS
类似于:
from selenium import webdriver
driver = webdriver.Chrome('./chromedriver')
driver.get(url)
driver.page_source
尝试获取此 url:https://www.patricbrc.org/api/model_reaction/?http_accept=application/json&eq(id,rxn00001)
我想用Python3.6复制一个URL(http://modelseed.org/biochem/reactions/rxn00001)的全部代码,但是只能复制一部分代码,不知道为什么。
到目前为止,我尝试使用 "requests" 模块
import requests
page = requests.get("http://modelseed.org/biochem/reactions/rxn00001")
print(page.content)
和"urllib"
import urllib.request
site = urllib.request.urlopen("http://modelseed.org/biochem/reactions/rxn00001")
print(site.read())
包含 "Reaction Details" 信息的代码部分,如 "Name"、"ID" 和 "Abbreviation" 丢失,但如果我检查代码,它们是可见的在 Chrome 的开发者栏上。
我可以使用上面的两个代码下载的代码是:
<!DOCTYPE html>
<html lang="en" ng-app="ModelSEED">
<head>
<base href="/"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="initial-scale=1, maximum-scale=1, user-scalable=no" name="viewport">
<meta content="The ModelSEED is a resource for the reconstruction, exploration, comparison, and analysis of metabolic models." name="description"/>
<link href="/img/ModelSEED-favicon.png?v=2.0" rel="shortcut icon"/>
<meta content="nconrad" name="author"/>
<title>
ModelSEED
</title>
<link href="components/angular-material/angular-material.css" rel="stylesheet"/>
<link href="components/bootstrap/dist/css/bootstrap.min.css" rel="stylesheet"/>
<!-- to be removed -->
<link href="components/font-awesome/css/font-awesome.min.css" rel="stylesheet"/>
<link href="icomoon/style.css" rel="stylesheet"/>
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet"/>
<link href="http://fonts.googleapis.com/css?family=Montserrat:400,700" rel="stylesheet" type="text/css"/>
<link href="build/style.css" rel="stylesheet"/>
<!--<script src="https://cdn.socket.io/socket.io-1.3.7.js"></script>-->
<script src="build/site.js">
</script>
<!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
<script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
<![endif]-->
</meta>
</head>
<body>
<div style="height: 100%;" ui-view="">
</div>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-67412611-1', 'auto');
ga('send', 'pageview');
</script>
</body>
</html>
任何人都知道为什么 < div style="height: 100%;" ui-view="" > 和(就在 < body > 之后和 < script > 之前)之间的代码没有下载?
谢谢。
它是由 javascript 脚本插入的,因此,requests 和 urllib 都找不到它,您需要为此使用浏览器,您应该尝试使用 selenium 或 PhantomJS
类似于:
from selenium import webdriver
driver = webdriver.Chrome('./chromedriver')
driver.get(url)
driver.page_source
尝试获取此 url:https://www.patricbrc.org/api/model_reaction/?http_accept=application/json&eq(id,rxn00001)