如何使用 Graph API 和 User Token 抓取 Facebook 数据?

How to scrape Facebook data using Graph API and the User Token?

我正在尝试抓取 public 页面的 Facebook 数据。

我几个月前(也许 10 个月前)使用的代码运行良好。现在,当我想继续那个项目,但代码不再工作时。我曾经使用我的私人用户令牌,它会在几分钟后过期。但这对我的用例来说已经足够了。我不需要 App 和 App Review 等来获得永久令牌。

代码如下:

def getData(page, urlToConnect, startTime, filterStart, filterEnd):

    posts = []
    found = False

    try:
        while (True):
            #print(url)
            facebook_connection = urlopen(urlToConnect)
            data = facebook_connection.read().decode('utf8')
            json_object = json.loads(data)
            #posts=json_object["data"]
            allposts=json_object["data"]
            allposts = np.asarray(allposts)
            created = startTime
            for i in range(0,100,1):
                if (pd.to_datetime(allposts[i]['created_time']) > pd.to_datetime(created)):
                    posts.append(allposts[i])
                else:
                    print(" found date at this index: ", i)
                    posts.append(allposts[i])
                    found = True
                    break;
                if (i == 99):
                    urlToConnect = json_object["paging"]["next"]
            if (found == True):
                break; 



        df=pd.DataFrame(allposts)

        df['Angry'] = df['Angry'].astype(str).str.replace('{\'data\':(.*?)count\': ','')
        df['Angry'] = df['Angry'].str.replace(',(.*?)}}','')

        df['Haha'] = df['Haha'].astype(str).str.replace('{\'data\':(.*?)count\': ','')
        df['Haha'] = df['Haha'].str.replace('}}','')

        df['Love'] = df['Love'].astype(str).str.replace('{\'data\':(.*?)count\': ','')
        df['Love'] = df['Love'].str.replace('}}','')

        df['Sad'] = df['Sad'].astype(str).str.replace('{\'data\':(.*?)count\': ','')
        df['Sad'] = df['Sad'].str.replace(',(.*?)}}','')

        df['Wow'] = df['Wow'].astype(str).str.replace('{\'data\':(.*?)count\': ','')
        df['Wow'] = df['Wow'].str.replace('}}','')

        df['comments'] = df['comments'].astype(str).str.replace('{\'data\':(.*?)count\': ','')
        df['comments'] = df['comments'].str.replace(',(.*?)}}','')

        df['likes'] = df['likes'].astype(str).str.replace('{\'(.*?)count\':','')
        df['likes'] = df['likes'].str.replace(',(.*?)}}','')

        df['shares'] = df['shares'].astype(str).str.replace('{\'count\': ','')
        df['shares'] = df['shares'].str.replace('}','')

        df['date'], df['time'] = df['created_time'].astype(str).str.split('T', 1).str
        df['time'] = df['time'].str.replace('[+]0000','')

        # Convert NaN's to 0 (as string)
        df['shares'] = df['shares'].str.replace('nan','0')
        df['shares'] = df['shares'].str.replace('Nan','0')
        df['shares'] = df['shares'].str.replace('NaN','0')

        # Convert Series values from str to int
        df['shares'] = df['shares'].astype(int)
        df['likes'] = df['likes'].astype(int)
        df['comments'] = df['comments'].astype(int)
        df['Love'] = df['Love'].astype(int)
        df['Wow'] = df['Wow'].astype(int)
        df['Sad'] = df['Sad'].astype(int)
        df['Angry'] = df['Angry'].astype(int)
        df['Haha'] = df['Haha'].astype(int)


        # Sum over all number columns of one row
        col_list= list(df)
        df['total_reac'] = df[col_list].sum(axis=1)

        # Sort values by 'total_reac' column, descending
        df = df.sort_values(by='total_reac', ascending=False)

        # Convert column from str to datetime
        df['created_time'] = pd.to_datetime(df['created_time'])

        # Filter for dates needed
        df = df[(df['created_time'] > fStart) & (df['created_time'] <= fEnd)]


        # Save Dataframe as csv
        df.to_csv("Facebook_Posts_" + page + ".csv" )



    except Exception as ex:
        print (ex)

    return df



token="my_User__Token_Here (got from my personal  https://developers.facebook.com/tools/explorer)"

sTime = '2018-05-01'
fStart = '2018-05-01'
fEnd = '2018-05-29'


page_id="nytimes"

url="https://graph.facebook.com/3.2/"+page_id+"/posts/?fields=id,created_time,message,shares.summary(true).limit(0),comments.summary(true).limit(0),likes.summary(true),reactions.type(LOVE).limit(0).summary(total_count).as(Love),reactions.type(WOW).limit(0).summary(total_count).as(Wow),reactions.type(HAHA).limit(0).summary(total_count).as(Haha),reactions.type(SAD).limit(0).summary(1).as(Sad),reactions.type(ANGRY).limit(0).summary(1).as(Angry)&access_token="+token+"&limit=100"

dataNYT = getData(page_id, url, sTime, fStart, fEnd)


dataNYT.to_csv("NYT_posts.csv")

这是我现在遇到的错误:

HTTP Error 400: Bad Request

当我尝试在浏览器中输入请求的 url 时,出现此错误:

{
   "error": {
      "message": "Unknown path components: /nytimes/posts",
      "type": "OAuthException",
      "code": 2500,
      "fbtrace_id": "HsN9zi+byTD"
   }
}

有人有想法吗?

不确定为什么会出现该错误,当我尝试在 API 资源管理器中调用 API 时,我得到了正确的错误:

{
  "error": {
    "message": "(#10) To use 'Page Public Content Access', your use of this endpoint must be reviewed and approved by Facebook. To submit this 'Page Public Content Access' feature for review please read our documentation on reviewable features: https://developers.facebook.com/docs/apps/review.",
    "type": "OAuthException",
    "code": 10,
    "fbtrace_id": "AZJ2HjKFmkW"
  }
}

您确实需要一个 App,并且您确实需要 App Review。为了访问您不拥有的页面,您必须获得 Facebook 的 "Page Public Content Access" 批准。之后,您甚至可以使用永不过期的 App Access Token。但是您仍然需要一个应用程序,以便始终进行任何 API 访问。

更多信息:https://developers.facebook.com/docs/apps/review/feature/?locale=de_DE#reference-PAGES_ACCESS