JavaScript code not extracted with Python code

Question

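I am trying to extract an .mp4 link from a website. The link only shows up in the browser's "Inspect Element" tab.

I read online that I need to use Selenium with something like PhantomJS to get that code. I tried it, but I only get the HTML that is visible under "Show source code":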

from selenium import webdriver

driver = webdriver.PhantomJS(executable_path=r'C:\Users\Nevendary\Desktop\phantomjs-2.1.1-windows\bin\phantomjs')
driver.get("https://filmovitica.com/pucanj-u-sljiviku-preko-reke-1978-domaci-film-gledaj-online/")
driver.implicitly_wait(30)

print(driver.page_source)

I expected to get code containing the following: https://fs40.gounlimited.to/tea5u5akd32qzxfffpqyfndb6resauu5w43w7enoxkvu6sjtrf5hfhbz3ika/v.mp4

But I only get the ordinary HTML of the page.

Answer 1

Instead of PhantomJS, try ChromeDriver with the headless option. This gave me the output you want.

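from selenium import webdriver

# Run Chrome without opening a browser window
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
# executable_path is the Selenium 3 keyword; Selenium 4 takes the driver path through a Service object instead
driver = webdriver.Chrome(executable_path='path of chrome driver', options=chrome_options)
driver.get("https://filmovitica.com/pucanj-u-sljiviku-preko-reke-1978-domaci-film-gledaj-online/")
print(driver.page_source)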

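Note: If you do not have a Chrome driver installed that matches your browser, you can download one from the link below. Please read the release notes on Chrome compatibility before downloading any driver. (Link: Download Chrome driver)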

Another approach is to use Beautiful Soup, which is a Python library.

import requests
from bs4 import BeautifulSoup

# Fetch the page's static HTML and parse it
data = requests.get('https://filmovitica.com/pucanj-u-sljiviku-preko-reke-1978-domaci-film-gledaj-online/')
soup = BeautifulSoup(data.text, 'html.parser')
print(soup)

Note: Installation is easy: pip install beautifulsoup4. You can read more about Beautiful Soup at the following link: Beautiful Soup

Answer 2

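There is no need to search the page source. You can read the src attribute of the video element directly, and that attribute holds the link you are looking for.

The video link is inside an iframe. Fetching the page source without switching to that frame will not return the video link.

I used chromedriver in the example.

Try this: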

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path="chromedriver.exe")

wait = WebDriverWait(driver, 20)
driver.get("https://filmovitica.com/pucanj-u-sljiviku-preko-reke-1978-domaci-film-gledaj-online/")

# find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed
vframe = driver.find_element(By.XPATH, "//iframe[@width='900']")

driver.switch_to.frame(vframe)

# Let the wait poll for the video element instead of looking it up once before waiting
videoElement = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#vplayer > div > div.container > video")))

print(videoElement.get_attribute('src'))

driver.quit()
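
Before switching frames, it can help to confirm which iframes the outer page actually embeds. The following is only a minimal sketch using the standard library: the names IframeCollector and iframe_sources, the sample markup, and the player.example URL are all made up for illustration, and on the real site you would pass driver.page_source (or the text of a requests response) instead of the sample string.

```python
from html.parser import HTMLParser


class IframeCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_sources(html):
    """Return the src values of all iframes found in the given HTML string."""
    collector = IframeCollector()
    collector.feed(html)
    return collector.sources


# Hypothetical markup standing in for driver.page_source
sample_html = """
<html><body>
  <h1>Film</h1>
  <iframe width="900" height="500" src="https://player.example/embed-abc123.html" allowfullscreen></iframe>
</body></html>
"""

print(iframe_sources(sample_html))
```

If the list is non-empty, the player probably lives in a separate document, which is consistent with the outer page source not containing the .mp4 link.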

Answer 3

Inspecting the HTML, the link does appear to be generated at the same URL that the iframe uses. You can use requests to get that URL:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://filmovitica.com/pucanj-u-sljiviku-preko-reke-1978-domaci-film-gledaj-online/')
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(res.content, 'lxml')
# Print the URL that the iframe loads
print(soup.select_one('iframe[allowfullscreen]')['src'])

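You can see how it (your string) is generated in one of the script tags at that URI: see the line highlighted in blue at the start, and then the code later in the JS.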
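
If the link appears as a plain string somewhere in the HTML or script text fetched from the iframe URL, a regular expression may be enough to pull it out. This is a rough sketch under that assumption: the helper name find_mp4_links and the sample script are hypothetical, and the real page may assemble or obfuscate the URL in JavaScript, in which case this finds nothing and a browser-driven approach is still needed.

```python
import re

# Matches http(s) URLs ending in .mp4, stopping at quotes, whitespace, or angle brackets
MP4_URL = re.compile(r"""https?://[^\s"'<>]+?\.mp4""")


def find_mp4_links(text):
    """Return the unique .mp4 URLs found in text, in order of first appearance."""
    seen = []
    for url in MP4_URL.findall(text):
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical script content; only the URL itself comes from the question
sample_script = """
<script>
  var sources = ["https://fs40.gounlimited.to/tea5u5akd32qzxfffpqyfndb6resauu5w43w7enoxkvu6sjtrf5hfhbz3ika/v.mp4"];
</script>
"""

print(find_mp4_links(sample_script))
```

In practice the input would be the body of the response for the iframe URL, for example requests.get(iframe_src).text.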

Tags: html, javascript, python, selenium, phantomjs