Finding a JavaScript variable containing a certain string with BeautifulSoup
I have a tricky task: I need to find some HTML inside a JavaScript variable and traverse it.
The variable looks like this:
<script>
    var someVar = new something.Something({
        content: 'This text has to be found<br /><table></table>',
        size: 230
    });
    ....
</script>
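The name of the JS variable is unknown, so I have to find it by the "This text has to be found" snippet. After that, I want to verify that it really is a JS variable, and then get the value <br /><table></table> so I can traverse it.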
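In this case, one option is to use a JavaScript parser such as slimit. The idea is to find all script tags, iterate over them, parse the code, walk the syntax tree, and check whether the right-hand side of each assignment node contains the text you are looking for: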
from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

data = """
<script>
    var someVar = new something.Something({
        content: 'This text has to be found<br /><table></table>',
        size: 230
    });
</script>
"""

text_to_find = 'This text has to be found'

# Name the HTML parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, 'html.parser')
for script in soup.find_all('script'):
    # Parse the JavaScript inside each <script> tag into a syntax tree
    parser = Parser()
    tree = parser.parse(script.text)
    for node in nodevisitor.visit(tree):
        # Check the right-hand side of every assignment node
        if isinstance(node, ast.Assign):
            # Not every right-hand node has a `value`; fall back to ''
            value = getattr(node.right, 'value', '')
            if text_to_find in value:
                print(value)
This prints 'This text has to be found<br /><table></table>'.
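To get from that printed value to something you can traverse, a possible next step is sketched below. It assumes the value still carries its surrounding quotes, as in the output shown, and that the HTML you want is whatever follows the marker text. `strip_js_quotes` and `TagCollector` are hypothetical helpers made up for this illustration, and the sketch uses the standard library's `html.parser` instead of BeautifulSoup only so that it runs without third-party packages.

```python
from html.parser import HTMLParser

# Value as printed by the loop above (assumption: the quotes are still attached).
raw_value = "'This text has to be found<br /><table></table>'"
text_to_find = 'This text has to be found'


def strip_js_quotes(literal):
    """Hypothetical helper: drop one pair of matching quotes around a literal."""
    if len(literal) >= 2 and literal[0] == literal[-1] and literal[0] in '\'"':
        return literal[1:-1]
    return literal


class TagCollector(HTMLParser):
    """Hypothetical helper: records the name of every opening tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


content = strip_js_quotes(raw_value)
# Keep only what follows the marker text.
html_fragment = content.split(text_to_find, 1)[1]

collector = TagCollector()
collector.feed(html_fragment)
collector.close()

print(html_fragment)
print(collector.tags)
```

If BeautifulSoup is what you want for the traversal, the same `html_fragment` string can be passed to `BeautifulSoup(html_fragment, 'html.parser')` instead. Note that this sketch does not decode JavaScript escape sequences (such as `\'` or `\n`) inside the literal.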
I'm not sure it fits your needs exactly, but I hope it is at least a start.
See also:
- JavaScript parser in Python
- Extracting text from script tag using BeautifulSoup in Python