JavaScript - How to find and get specific values from a <script> tag in HTML

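I'm scraping a website and trying to get specific values from a <script> tag in the HTML page. The page contains many other <script> tags as well; the specific script I'm targeting holds all the images I need to scrape.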

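I can't scrape the images directly with Cheerio, because they aren't available on the main HTML page unless I click the main image to view all the other images.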

What I need is something like this:

Find the <script> tag that contains the keyword {someImages}, then, for each key named {large}, return that key's value.
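The two steps described here can be sketched in plain JavaScript. This is a minimal sketch, not a tested scraper: it assumes you already have the text of every <script> element as an array of strings (the Cheerio call in the comment is one assumed way to get it), and the helper name findLargeImageUrls is hypothetical. Step 2 uses a regular expression, which only works while the values are simple double-quoted strings.

```javascript
// Hypothetical helper: pick the <script> body that mentions "someImages"
// and return every value stored under a "large" key inside it.
function findLargeImageUrls(scriptTexts) {
  // Step 1: find the first script whose text contains the keyword.
  const target = scriptTexts.find((text) => text.includes("someImages"));
  if (target === undefined) {
    return [];
  }
  // Step 2: capture the quoted value after each "large" key.
  // \s* tolerates any spacing around the colon; [^"]+ stops at the closing quote.
  const largeRe = /"large"\s*:\s*"([^"]+)"/g;
  return Array.from(target.matchAll(largeRe), (m) => m[1]);
}

// Assumed usage with Cheerio (not executed here; `html` would hold the page source):
//   const $ = cheerio.load(html);
//   const scripts = $("script").map((i, el) => $(el).html()).get();
//   console.log(findLargeImageUrls(scripts));

// Stand-in script texts: one unrelated script and a trimmed version of the target.
const scripts = [
  "var tracking = { page: 'home' };",
  `var data = { 'someImages': { 'initial': [
     { "thumb": "https://somewebsite/images/imageName1.jpg",
       "large": "https://somewebsite/images/imageName1.jpg" },
     { "large": "https://somewebsite/images/imageName2.jpg" }
  ] } };`,
];
console.log(findLargeImageUrls(scripts));
```

With these stand-in strings it logs the two "large" URLs and ignores the "thumb" entry and the unrelated script.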

I've put together an example below to explain my problem.

Thanks very much for your help!

<body>
    <script type="text/javascript"> ... </script>
    <script type="text/javascript"> ... </script>
    <script type="text/javascript"> ... </script>
    <script type="text/javascript"> ... </script>
    .
    .
    .
    
    <script type="text/javascript">
        var data = {
            'someImages': {
                'initial': [
                    {
                        "hiRes": "https://somewebsite/images/imageName1.jpg",
                        "thumb": "https://somewebsite/images/imageName1.jpg",
                        "large": "https://somewebsite/images/imageName1.jpg", // I would like to be able to get the value of large from this line
                        "main": {
                            "https://somewebsite/images/imageName1.jpg": [1654],
                            "https://somewebsite/images/imageName1.jpg": [3416],
                            "https://somewebsite/images/imageName1.jpg": [7560]
                        }
                    },
                    {
                        "hiRes": "https://somewebsite/images/imageName2.jpg",
                        "thumb": "https://somewebsite/images/imageName2.jpg",
                        "large": "https://somewebsite/images/imageName2.jpg", // I would like to be able to get the value of large from this line
                        "main": {
                            "https://somewebsite/images/imageName2.jpg": [2234],
                            "https://somewebsite/images/imageName2.jpg": [3616],
                            "https://somewebsite/images/imageName2.jpg": [7849]
                        }
                    },
                    {
                        "hiRes": "https://somewebsite/images/imageName3.jpg",
                        "thumb": "https://somewebsite/images/imageName3.jpg",
                        "large": "https://somewebsite/images/imageName3.jpg", // I would like to be able to get the value of large from this line
                        "main": {
                            "https://somewebsite/images/imageName3.jpg": [2344],
                            "https://somewebsite/images/imageName3.jpg": [3556],
                            "https://somewebsite/images/imageName3.jpg": [7490]
                        }
                    }
                ]
            }
        };
    </script>
    
    
    <script type="text/javascript"> ... </script>
    <script type="text/javascript"> ... </script>
    <script type="text/javascript"> ... </script>
    <script type="text/javascript"> ... </script>
    .
    .
    .

</body>

A simple regular expression will do the trick. Use a capture group to capture the URL.

/"large": ?"(.+?)",/g
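To see what this pattern returns, here is a small sketch that runs it unchanged in Node.js against a trimmed copy of the script text from the question (the variable names are illustrative). With the /g flag, String.prototype.match returns the whole matches rather than the capture groups, so the sketch uses matchAll and reads group 1 from each match.

```javascript
// The pattern from the answer, unchanged: group 1 captures the URL between the quotes.
const largeRe = /"large": ?"(.+?)",/g;

// A trimmed copy of the script body from the question.
const scriptText = `
  "hiRes": "https://somewebsite/images/imageName1.jpg",
  "large": "https://somewebsite/images/imageName1.jpg", // comment
  "large": "https://somewebsite/images/imageName2.jpg",
  "large": "https://somewebsite/images/imageName3.jpg",
`;

// matchAll yields one match object per hit; m[1] is the captured URL.
const urls = [...scriptText.matchAll(largeRe)].map((m) => m[1]);
console.log(urls);
```

The pattern requires a comma right after the closing quote, so a "large" entry that happened to be the last key in its object would be skipped. In the question's example every "large" line is followed by a comma, so all three URLs are found.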

Test it in Regexpal if you need to.