Beautifulsoup, urllib2 and requests did not find all HTML tags from

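I am trying to scrape the 9gag comment section to run some sentiment analysis and label each post as positive or negative. The end goal is to train on data from thousands of posts and predict a post's sentiment from its comment count, post upvotes, the upvotes on its top ten comments, and its title.

I managed to scrape the title and upvotes from the hot section, but when I scrape the comments, the HTML parser does not show the relevant tags. I tried different libraries such as BS4, Requests, Pattern, and urllib/urllib2. I even tried 'html.parser' instead of lxml.

My question: is the 9gag comment section protected against scraping? If not, is there any reason why none of the parsers can fetch all the tags?

Update #2 - here is the code I used: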

    import requests
    from bs4 import BeautifulSoup

    url = ""  # post URL, elided in the source
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'html.parser')
    soup.findAll("div", attrs={"class": "comment-embed"})

The output is an empty list: []

Their comments are loaded via ReactJS, so you need to execute the JavaScript to scrape the comment section.
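
To make the empty list less mysterious, here is a minimal sketch (Python 3, standard library only) of what a parser sees when the comments are rendered client-side. The markup, the `comments-root` id, and the `post-title` class are all made up for illustration and are not 9gag's real HTML; only the `comment-embed` class comes from the question.

```python
from html.parser import HTMLParser

# Hypothetical static markup: the comment container is an empty mount point
# that client-side JavaScript would fill in after the page loads.
STATIC_HTML = """
<html><body>
  <h2 class="post-title">Some post title</h2>
  <div id="comments-root"></div>
  <script>/* client-side code would render the comments here */</script>
</body></html>
"""


class ClassCounter(HTMLParser):
    """Count start tags of a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_tags(html, tag, css_class):
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    return parser.count


comment_count = count_tags(STATIC_HTML, "div", "comment-embed")
title_count = count_tags(STATIC_HTML, "h2", "post-title")
print(comment_count, title_count)
```

The title is present in the served HTML, so it is found; the comment elements never appear in that HTML, so no parser can find them, whichever library is used.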



The data is loaded using React, but you can do a little parsing and get all the data you need in JSON format:

import ast
import re

import requests
from bs4 import BeautifulSoup
from urlparse import urljoin  # Python 2; on Python 3 use urllib.parse

base = ""  # site URL, elided in the source

# These are the params needed to get the JSON.
params = {"appId": "",
          "url": "",
          "count": "10",
          "level": "2",
          "order": "score",
          "mentionMapping": "true",
          "origin": ""}

js = "Request URL:"  # comment JSON endpoint; the URL itself is missing in the source

with requests.session() as s:
    r = s.get(base)
    soup = BeautifulSoup(r.content, "lxml")
    # Links to each actual page.
    links = [urljoin(base, a["href"]) for a in soup.select("a.badge-evt.point")]
    for link in links:
        cont = s.get(link).content
        soup = BeautifulSoup(cont, "lxml")
        # The params are all in the script body.
        script = soup.find("script", text=re.compile('appId')).text
        # Convert to a dict so we can pull what we need by key.
        data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
        params["appId"] = data["appId"]
        params["url"] = data["url"]
        page_json = s.get(js, params=params).json()
        for dct in page_json["payload"]["comments"]:
            print(dct)
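
The step that pulls `appId` and `url` out of the script body can be tried in isolation. This is only a sketch: the `script` string, the `window.initComments` name, and the values in it are invented, and the real page's script will look different.

```python
import ast

# Hypothetical inline script text; the real script and its values will differ.
script = "window.initComments({'appId': 'example_app', 'url': 'http://example.com/post/1'});"

# Slice from the first "{" to the last "}" to isolate the object literal.
literal = script[script.find("{"):script.rfind("}") + 1]

# literal_eval parses Python-style literals without executing any code.
# It works here only because this literal is also valid Python syntax.
data = ast.literal_eval(literal)

# Copy the two values the comment endpoint needs into the request params.
params = {"count": "10", "level": "2", "order": "score"}
params["appId"] = data["appId"]
params["url"] = data["url"]
print(params)
```

If the embedded object used JavaScript-only tokens such as `true` or `null`, `ast.literal_eval` would raise an error, and parsing the slice with `json.loads` would be the more likely fit, provided the object is strict JSON.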

If we run the code using only the first URL returned, we get:

In [28]: with requests.session() as s:
   ....:         r = s.get(base)
   ....:         soup = BeautifulSoup(r.content,"lxml")
   ....:         links = [urljoin(base, a["href"]) for a in soup.select("a.comment.badge-evt")][:1]
   ....:         for link in links:
   ....:                 cont = s.get(link).content
   ....:                 soup = BeautifulSoup(cont,"lxml")
   ....:                 script = soup.find("script", text=re.compile('appId')).text
   ....:                 data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
   ....:                 params["appId"] = data["appId"]
   ....:                 params["url"] = data["url"]
   ....:                 page_json = s.get(js, params=params).json()
   ....:                 for dct in page_json["payload"]["comments"]:
   ....:                         print(dct)
{u'hasNext': True, u'dislikeCount': 0, u'text': u'This is so awkward to watch ... and funny', u'userId': u'u_13759018032623', u'likeCount': 343, u'orderKey': u'score_00000000004834_14651297124662', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@twistedpickle.and also fake.', u'userId': u'u_145548331532421082', u'likeCount': 26, u'children': [], u'isCollapsed': 0, u'mediaText': u'@twistedpickle.and also fake.', u'section': u'', u'mentionMapping': {u'@twistedpickle': u'aBL7q1'}, u'commentId': u'c_146513113612585611', u'type': u'text', u'status': 0, u'parent': u'c_146512971246623391', u'timestamp': 1465131136, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'savage_ali', u'avatarUrl': u'', u'timestamp': u'1455483315', u'userId': u'u_145548331532421082', u'hashedAccountId': u'anbN66n', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'34323189', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 2, u'suppData': {}, u'richtext': u'@twistedpickle.and also fake.', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'This is so awkward to watch ... and funny', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146512971246623391', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129712, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'twistedpickle', u'avatarUrl': u'', u'timestamp': u'1375901803', u'userId': u'u_13759018032623', u'hashedAccountId': u'aBL7q1', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'1870095', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 1, u'suppData': {}, u'richtext': u'This is so awkward to watch ... and funny', u'childrenTotal': 19, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'Hahaha PANTURA', u'userId': u'u_143454521023534763', u'likeCount': 231, u'orderKey': u'score_00000000004076_14649387351969', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@deadfight nussittuna nukut paremmin', u'userId': u'u_141790386790069041', u'likeCount': 39, u'children': [], u'isCollapsed': 0, u'mediaText': u'@deadfight nussittuna nukut paremmin', u'section': u'', u'mentionMapping': {u'@deadfight': u'aYLgpy7'}, u'commentId': u'c_146513018381635287', u'type': u'text', u'status': 0, u'parent': u'c_146493873519691145', u'timestamp': 1465130183, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'lady_kappa', u'avatarUrl': u'', u'timestamp': u'1417903867', u'userId': u'u_141790386790069041', u'hashedAccountId': u'a5K8b5N', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'22251683', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 2, u'suppData': {}, u'richtext': u'@deadfight nussittuna nukut paremmin', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Hahaha PANTURA', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146493873519691145', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464938735, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'deadfight', u'avatarUrl': u'', u'timestamp': u'1434545210', u'userId': u'u_143454521023534763', u'hashedAccountId': u'aYLgpy7', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'27180133', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 1, u'suppData': {}, u'richtext': u'Hahaha PANTURA', u'childrenTotal': 16, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'', u'userId': u'u_141680114571912397', u'likeCount': 225, u'orderKey': u'score_00000000003373_14649381081078', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@shogun_ka_yo up you go', u'userId': u'u_144283683005248817', u'likeCount': 2, u'children': [], u'isCollapsed': 0, u'mediaText': u'@shogun_ka_yo up you go', u'section': u'', u'mentionMapping': {u'@shogun_ka_yo': u'aMQRLRW'}, u'commentId': u'c_146513150738658348', u'type': u'text', u'status': 0, u'parent': u'c_146493810810784782', u'timestamp': 1465131507, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'dergermanyball', u'avatarUrl': u'', u'timestamp': u'', u'userId': u'u_144283683005248817', u'hashedAccountId': u'a1dpXrY', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'29998985', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 2, u'suppData': {}, u'richtext': u'@shogun_ka_yo up you go', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146493810810784782', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464938108, u'embedMediaMeta': {u'embedImage': {u'type': u'ANIMATED', u'image': {u'url': u'', u'width': 400, u'height': 206}, u'animated': {u'url': u'', u'width': 400, u'height': 206}, u'video': {u'url': u'', u'width': 400, u'height': 206}}}, u'user': {u'displayName': u'shogun_ka_yo', u'avatarUrl': u'', u'timestamp': u'1416801145', u'userId': u'u_141680114571912397', u'hashedAccountId': u'aMQRLRW', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'22391718', u'permissions': []}, u'isUrl': 1, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 1, u'suppData': {}, u'richtext': u'[url][/url]', u'childrenTotal': 4, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'Now imagine if the genders were reversed', u'userId': u'u_143552720523387146', u'likeCount': 179, u'orderKey': u'score_00000000003144_14651301155438', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@rednotash hush little one. You're making sense now', u'userId': u'u_141363015125977644', u'likeCount': 77, u'children': [], u'isCollapsed': 0, u'mediaText': u'@rednotash hush little one. You're making sense now', u'section': u'', u'mentionMapping': {u'@rednotash': u'aOv8RMy'}, u'commentId': u'c_146513114535963914', u'type': u'text', u'status': 0, u'parent': u'c_146513011554386056', u'timestamp': 1465131145, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'srslydude', u'avatarUrl': u'', u'timestamp': u'1413630151', u'userId': u'u_141363015125977644', u'hashedAccountId': u'aYwvpZx', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'21558777', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 2, u'suppData': {}, u'richtext': u'@rednotash hush little one. You're making sense now', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Now imagine if the genders were reversed', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146513011554386056', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130115, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'rednotash', u'avatarUrl': u'', u'timestamp': u'1435527205', u'userId': u'u_143552720523387146', u'hashedAccountId': u'aOv8RMy', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'27823975', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 1, u'suppData': {}, u'richtext': u'Now imagine if the genders were reversed', u'childrenTotal': 9, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'Never let your waif follow you? Well she wouldnt follow you if you werent a dickhead. Women have the sixth sense . We know whats going on.', u'userId': u'u_145321627176216569', u'likeCount': 78, u'orderKey': u'score_00000000002462_14651303108023', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@marshmallowww What if I tell you that gender has nothing to do with it? Men have that "sixth sense" too.', u'userId': u'u_143741207696358239', u'likeCount': 56, u'children': [], u'isCollapsed': 0, u'mediaText': u'@marshmallowww What if I tell you that gender has nothing to do with it? Men have that "sixth sense" too.', u'section': u'', u'mentionMapping': {u'@marshmallowww': u'ab693MB'}, u'commentId': u'c_146513102333226094', u'type': u'text', u'status': 0, u'parent': u'c_146513031080236628', u'timestamp': 1465131023, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'the_hidden', u'avatarUrl': u'', u'timestamp': u'1437412076', u'userId': u'u_143741207696358239', u'hashedAccountId': u'aop4wG2', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'28267060', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 2, u'suppData': {}, u'richtext': u'@marshmallowww What if I tell you that gender has nothing to do with it? Men have that "sixth sense" too.', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Never let your waif follow you? Well she wouldnt follow you if you werent a dickhead. Women have the sixth sense . 
We know whats going on.', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146513031080236628', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130310, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'marshmallowww', u'avatarUrl': u'', u'timestamp': u'1453216271', u'userId': u'u_145321627176216569', u'hashedAccountId': u'ab693MB', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'33477821', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 1, u'suppData': {}, u'richtext': u'Never let your waif follow you? Well she wouldnt follow you if you werent a dickhead. Women have the sixth sense . We know whats going on.', u'childrenTotal': 20, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'But is correct that she can hit him? i mean, "no violence" right? if SHE is drunk and doing stupid things, and the husband go and hit her, is correct too? because equality.', u'userId': u'u_143329792027606743', u'likeCount': 54, u'orderKey': u'score_00000000001796_14651298735006', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@pcmasteracer yes it's correct', u'userId': u'u_143073218849877360', u'likeCount': 9, u'children': [], u'isCollapsed': 0, u'mediaText': u'@pcmasteracer yes it's correct', u'section': u'', u'mentionMapping': {u'@pcmasteracer': u'avnOvdq'}, u'commentId': u'c_146513013516459530', u'type': u'text', u'status': 0, u'parent': u'c_146512987350064451', u'timestamp': 1465130135, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'kkakuka97', u'avatarUrl': u'', u'timestamp': u'1430732188', u'userId': u'u_143073218849877360', u'hashedAccountId': u'a4j4NWy', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'26450856', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 2, u'suppData': {}, u'richtext': u'@pcmasteracer yes it's correct', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'But is correct that she can hit him? i mean, "no violence" right? if SHE is drunk and doing stupid things, and the husband go and hit her, is correct too? 
because equality.', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146512987350064451', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129873, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'pcmasteracer', u'avatarUrl': u'', u'timestamp': u'1433297920', u'userId': u'u_143329792027606743', u'hashedAccountId': u'avnOvdq', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'27225255', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 1, u'suppData': {}, u'richtext': u'But is correct that she can hit him? i mean, "no violence" right? if SHE is drunk and doing stupid things, and the husband go and hit her, is correct too? because equality.', u'childrenTotal': 7, u'isAnonymous': 0}
{u'hasNext': False, u'dislikeCount': 0, u'text': u'I can hear the 'BONG!'', u'userId': u'u_13987497367750', u'likeCount': 30, u'orderKey': u'score_00000000001168_14650124142865', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@yajirobe__ but not boing', u'userId': u'u_13775281935884', u'likeCount': 4, u'children': [], u'isCollapsed': 0, u'mediaText': u'@yajirobe__ but not boing', u'section': u'', u'mentionMapping': {u'@yajirobe__': u'avgE1Y5'}, u'commentId': u'c_146513060674619430', u'type': u'text', u'status': 0, u'parent': u'c_146501241428653553', u'timestamp': 1465130606, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'siophang', u'avatarUrl': u'', u'timestamp': u'1377528193', u'userId': u'u_13775281935884', u'hashedAccountId': u'aBQK6qO', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'11455251', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 2, u'suppData': {}, u'richtext': u'@yajirobe__ but not boing', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'I can hear the 'BONG!'', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146501241428653553', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465012414, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'yajirobe__', u'avatarUrl': u'', u'timestamp': u'1398749736', u'userId': u'u_13987497367750', u'hashedAccountId': u'avgE1Y5', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'16992199', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 1, u'suppData': {}, u'richtext': u'I can hear the 'BONG!'', u'childrenTotal': 1, u'isAnonymous': 0}
{u'hasNext': False, u'dislikeCount': 0, u'text': u'', u'userId': u'u_13907047642371', u'likeCount': 21, u'orderKey': u'score_00000000000967_14649476233018', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@kaylaruffalo mfw', u'userId': u'u_13907047642371', u'likeCount': 0, u'children': [], u'isCollapsed': 0, u'mediaText': u'@kaylaruffalo mfw', u'section': u'', u'mentionMapping': {u'@kaylaruffalo': u'adYKGQj'}, u'commentId': u'c_146494763324897147', u'type': u'text', u'status': 0, u'parent': u'c_146494762330186947', u'timestamp': 1464947633, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'kaylaruffalo', u'avatarUrl': u'', u'timestamp': u'1390704764', u'userId': u'u_13907047642371', u'hashedAccountId': u'adYKGQj', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'16005886', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 2, u'suppData': {}, u'richtext': u'@kaylaruffalo mfw', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146494762330186947', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464947623, u'embedMediaMeta': {u'embedImage': {u'type': u'ANIMATED', u'image': {u'url': u'', u'width': 500, u'height': 400}, u'animated': {u'url': u'', u'width': 500, u'height': 400}, u'video': {u'url': u'', u'width': 500, u'height': 400}}}, u'user': {u'displayName': u'kaylaruffalo', u'avatarUrl': u'', u'timestamp': u'1390704764', u'userId': u'u_13907047642371', u'hashedAccountId': u'adYKGQj', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'16005886', u'permissions': []}, u'isUrl': 1, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 1, u'suppData': {}, u'richtext': u'[url][/url]', u'childrenTotal': 1, u'isAnonymous': 0}
{u'hasNext': False, u'dislikeCount': 0, u'text': u'Look at the dude in the red shirt run XD', u'userId': u'u_144176454299618603', u'likeCount': 15, u'orderKey': u'score_00000000000806_14651298710300', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@crazybrownguy he knew he was next', u'userId': u'u_13976607580627', u'likeCount': 1, u'children': [], u'isCollapsed': 0, u'mediaText': u'@crazybrownguy he knew he was next', u'section': u'', u'mentionMapping': {u'@crazybrownguy': u'agGWL5q'}, u'commentId': u'c_146514413390208345', u'type': u'text', u'status': 0, u'parent': u'c_146512987103009031', u'timestamp': 1465144133, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'lightfoot2012', u'avatarUrl': u'', u'timestamp': u'1397660758', u'userId': u'u_13976607580627', u'hashedAccountId': u'axZPvbp', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'17248879', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 2, u'suppData': {}, u'richtext': u'@crazybrownguy he knew he was next', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'Look at the dude in the red shirt run XD', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146512987103009031', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129871, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'crazybrownguy', u'avatarUrl': u'', u'timestamp': u'1441764542', u'userId': u'u_144176454299618603', u'hashedAccountId': u'agGWL5q', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'29662036', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 1, u'suppData': {}, u'richtext': u'Look at the dude in the red shirt run XD', u'childrenTotal': 1, u'isAnonymous': 0}
{u'hasNext': True, u'dislikeCount': 0, u'text': u'', u'userId': u'u_144337172763285563', u'likeCount': 5, u'orderKey': u'score_00000000000626_14651301539010', u'children': [{u'hasNext': False, u'dislikeCount': 0, u'text': u'@wat_ya_doin I agree with that wife', u'userId': u'u_144337172763285563', u'likeCount': 3, u'children': [], u'isCollapsed': 0, u'mediaText': u'@wat_ya_doin I agree with that wife', u'section': u'', u'mentionMapping': {u'@wat_ya_doin': u'ay8yRoM'}, u'commentId': u'c_146513018506335085', u'type': u'text', u'status': 0, u'parent': u'c_146513015390105680', u'timestamp': 1465130185, u'embedMediaMeta': {u'dummy': []}, u'user': {u'displayName': u'wat_ya_doin', u'avatarUrl': u'', u'timestamp': u'', u'userId': u'u_144337172763285563', u'hashedAccountId': u'ay8yRoM', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'29948571', u'permissions': []}, u'isUrl': 0, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 2, u'suppData': {}, u'richtext': u'@wat_ya_doin I agree with that wife', u'childrenTotal': 0, u'isAnonymous': 0}], u'isCollapsed': 0, u'mediaText': u'', u'section': u'', u'mentionMapping': {u'dummy': u''}, u'commentId': u'c_146513015390105680', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130153, u'embedMediaMeta': {u'embedImage': {u'type': u'ANIMATED', u'image': {u'url': u'', u'width': 319, u'height': 260}, u'animated': {u'url': u'', u'width': 319, u'height': 260}, u'video': {u'url': u'', u'width': 318, u'height': 260}}}, u'user': {u'displayName': u'wat_ya_doin', u'avatarUrl': u'', u'timestamp': u'', u'userId': u'u_144337172763285563', u'hashedAccountId': u'ay8yRoM', u'profileUrls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u''}, u'accountId': u'29948571', u'permissions': []}, u'isUrl': 1, u'isLike': {u'value': 0}, u'permalink': u'', u'level': 1, u'suppData': {}, u'richtext': u'[url][/url]', u'childrenTotal': 3, u'isAnonymous': 0}

For example, we can pull the text from dct and then iterate over dct["children"] to get more comments:

In [30]: params = {"appId": "",
   ....:           "url": "",
   ....:           "count": "2",
   ....:           "level": "2",
   ....:           "order": "score",
   ....:           "mentionMapping": "true",
   ....:           "origin": ""}

In [31]: js = "Request URL:"

In [32]: with requests.session() as s:
   ....:         r = s.get(base)
   ....:         soup = BeautifulSoup(r.content,"lxml")
   ....:         links = [urljoin(base, a["href"]) for a in soup.select("a.badge-evt.point")][:1]
   ....:         for link in links:
   ....:                 cont = s.get(link).content
   ....:                 soup = BeautifulSoup(cont,"lxml")
   ....:                 script = soup.find("script", text=re.compile('appId')).text
   ....:                 data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
   ....:                 params["appId"] = data["appId"]
   ....:                 params["url"] = data["url"]
   ....:                 page_json = s.get(js, params=params).json()
   ....:                 for dct in page_json["payload"]["comments"]:
   ....:                         print(dct["text"])
   ....:                         for child in dct["children"]:
   ....:                                 print(child["text"])

Once again this is a post made by someone who has no idea what true love is. True love is jealous, painful, and difficult. It's a battle it always will be. You're either fighting yourself to be a better person, fighting life to give the other person the life they deserve or fighting the other person. But true love is worth all of it, its also beautiful, kind, gentle and warm.  No relationship is perfect. There is not "8 ways to know". The one for you is the one who will put up with your shit but at the same time make you want to make yourself a better person. Your true love will get on your nerves, piss you off, hurt you, but they will also love you, hold you up when you can't and forgive you. True love is when you find someone you can stand beside through anything, someone who would never want to hurt you  When you find someone you can trust no matter what. No one is perfect and there is more than one person in the world you can fall in love with, but when you find that person, you fi
@celticdraconian this Is so true
Comment complaining that this will lead straight to the "friendzone"
Comment saying the "Friendzone" is not a thing.
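
Since replies have the same shape as top-level comments, the nested loop generalizes to a small recursive walk. The sketch below (Python 3) is a suggestion rather than part of the original code; `iter_comments` is a hypothetical helper, and `sample` is a heavily trimmed copy of a few fields from the output shown earlier.

```python
def iter_comments(comments):
    """Yield (level, text, likeCount) for each comment and its nested replies."""
    for comment in comments:
        yield comment["level"], comment["text"], comment["likeCount"]
        # Replies use the same structure, so recurse into "children".
        yield from iter_comments(comment.get("children", []))


# Trimmed sample: only the fields used here, and fewer replies than the real data.
sample = [
    {
        "text": "This is so awkward to watch ... and funny",
        "likeCount": 343,
        "level": 1,
        "children": [
            {"text": "@twistedpickle.and also fake.", "likeCount": 26,
             "level": 2, "children": []},
        ],
    },
    {"text": "Hahaha PANTURA", "likeCount": 231, "level": 1, "children": []},
]

rows = list(iter_comments(sample))
for level, text, likes in rows:
    # Indent replies by their nesting level.
    print("  " * (level - 1) + "%s (%d likes)" % (text, likes))
```

Collecting `likeCount` alongside the text keeps the comment upvotes available as features for the sentiment model described in the question.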

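You can see I changed the count parameter to 2. To get all the data, you can set it to a very high number, such as "count": "1000", which returns everything you would otherwise get by repeatedly loading more comments on the page: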
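
A minimal sketch of that change (Python 3), assuming the endpoint accepts a large `count` as described above. The parameter names are the ones used in the answer; `big_params` and `query` are names introduced here for illustration, and `appId` and `url` stay empty because they are filled in per page.

```python
from urllib.parse import urlencode

# Same parameter names as in the answer; appId and url are left empty here
# because they are filled in per page from the script body.
params = {"appId": "", "url": "", "count": "2", "level": "2",
          "order": "score", "mentionMapping": "true", "origin": ""}

# Copy the dict and raise the count instead of mutating the original.
big_params = dict(params, count="1000")

# This is the query string requests would append when given params=big_params.
query = urlencode(big_params)
print(query)
```

Working on a copy keeps the small `count` around for quick test runs while the large one is used for the full crawl.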