JavaScript - How to find and get specific values from a <script> tag in HTML
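I am scraping a website and trying to get specific values from a <script> tag in the HTML page. The page also contains many other <script> tags. The particular script I am targeting contains all the images I need to scrape.
I can't scrape the images directly with Cheerio, because they are not available on the main HTML page unless I click the main image to view all the other images.
What I need is something like this:
Find the <script> tag that contains the keyword {someImages}, then, for every key named {large}, return that key's value.
I have created an example below to explain my problem.
Any help is greatly appreciated. Thank you.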
<body>
<script type="text/javascript"> ... </script>
<script type="text/javascript"> ... </script>
<script type="text/javascript"> ... </script>
<script type="text/javascript"> ... </script>
.
.
.
<script type="text/javascript">
  var data = {
    'someImages': {
      'initial': [
        {
          "hiRes": "https://somewebsite/images/imageName1.jpg",
          "thumb": "https://somewebsite/images/imageName1.jpg",
          "large": "https://somewebsite/images/imageName1.jpg", // I would like to be able to get the value of large from this line
          "main": {
            "https://somewebsite/images/imageName1.jpg": [1654],
            "https://somewebsite/images/imageName1.jpg": [3416],
            "https://somewebsite/images/imageName1.jpg": [7560]
          }
        },
        {
          "hiRes": "https://somewebsite/images/imageName2.jpg",
          "thumb": "https://somewebsite/images/imageName2.jpg",
          "large": "https://somewebsite/images/imageName2.jpg", // I would like to be able to get the value of large from this line
          "main": {
            "https://somewebsite/images/imageName2.jpg": [2234],
            "https://somewebsite/images/imageName2.jpg": [3616],
            "https://somewebsite/images/imageName2.jpg": [7849]
          }
        },
        {
          "hiRes": "https://somewebsite/images/imageName3.jpg",
          "thumb": "https://somewebsite/images/imageName3.jpg",
          "large": "https://somewebsite/images/imageName3.jpg", // I would like to be able to get the value of large from this line
          "main": {
            "https://somewebsite/images/imageName3.jpg": [2344],
            "https://somewebsite/images/imageName3.jpg": [3556],
            "https://somewebsite/images/imageName3.jpg": [7490]
          }
        },
      ]
    }
  };
</script>
<script type="text/javascript"> ... </script>
<script type="text/javascript"> ... </script>
<script type="text/javascript"> ... </script>
<script type="text/javascript"> ... </script>
.
.
.
</body>
A simple regular expression will do the job. Use a capture group to capture the URL.
/"large": ?"(.+?)",/g
If needed, test it in Regexpal.
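As a rough sketch of how that regular expression might be applied in Node.js: the snippet below first isolates the <script> block that mentions someImages, then collects every captured URL with String.prototype.matchAll. The helper name extractLargeUrls and the sampleHtml string are made up for illustration, and the pattern is loosened slightly (any whitespace after the colon, no trailing comma required), so treat it as one possible approach rather than the definitive one.

```javascript
// Hypothetical helper (not from the original answer): returns every "large"
// URL found in the first <script> block that mentions the someImages key.
function extractLargeUrls(html) {
  // Capture the body of each <script> element.
  const scriptPattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  // Capture group 1 holds the URL that follows the "large" key.
  const largePattern = /"large":\s*"(.+?)"/g;

  for (const [, body] of html.matchAll(scriptPattern)) {
    if (!body.includes("someImages")) continue; // not the script we want
    return Array.from(body.matchAll(largePattern), (m) => m[1]);
  }
  return []; // no matching script found
}

// Assumed sample input, shortened from the page in the question.
const sampleHtml = `
<body>
<script type="text/javascript"> var other = 1; </script>
<script type="text/javascript">
var data = {
  'someImages': {
    'initial': [
      { "thumb": "https://somewebsite/images/imageName1.jpg",
        "large": "https://somewebsite/images/imageName1.jpg" },
      { "thumb": "https://somewebsite/images/imageName2.jpg",
        "large": "https://somewebsite/images/imageName2.jpg" }
    ]
  }
};
</script>
</body>`;

const urls = extractLargeUrls(sampleHtml);
console.log(urls);
```

When the page is already loaded with Cheerio, the text of each script element could be fed to the same pattern instead of matching the <script> tags by hand.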