GCP Cloud Functions for a Python script that outputs a data file

Question

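I'm new to GCP and not sure whether Cloud Functions is the right fit for me.

I have a Python script that uses tweepy to call the Twitter API and generates a CSV file containing the list of tweets for a specific username.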

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import tweepy
import datetime
import csv

def fetchTweets(username):
  # credentials from https://apps.twitter.com/
  consumerKey = "" # hidden for security reasons
  consumerSecret = "" # hidden for security reasons
  accessToken = "" # hidden for security reasons
  accessTokenSecret = "" # hidden for security reasons

  auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
  auth.set_access_token(accessToken, accessTokenSecret)

  api = tweepy.API(auth)

  startDate = datetime.datetime(2019, 1, 1, 0, 0, 0)
  endDate =   datetime.datetime.now()
  print (endDate)

  tweets = []
  tmpTweets = api.user_timeline(username)

  for tweet in tmpTweets:
      if tweet.created_at < endDate and tweet.created_at > startDate:
          tweets.append(tweet)

  lastid = ""
  while (tmpTweets[-1].created_at > startDate and tmpTweets[-1].id != lastid):
      print("Last Tweet @", tmpTweets[-1].created_at, " - fetching some more")
      lastid = tmpTweets[-1].id
      tmpTweets = api.user_timeline(username, max_id = tmpTweets[-1].id)
      for tweet in tmpTweets:
          if tweet.created_at < endDate and tweet.created_at > startDate:
              tweets.append(tweet)

  # # for CSV

  # transform the tweepy tweets into a 2D array that will populate the csv
  # (text is kept as str; encoding it to bytes would write b'...' literals in Python 3)
  outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in tweets]

  # write the csv
  with open('%s_tweets.csv' % username, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(["id","created","text"])
    writer.writerows(outtweets)

  # read the csv back and return its contents
  with open('%s_tweets.csv' % username, 'r', encoding='utf-8') as f:
    contents = f.read()
  return contents

fetchTweets('usernameofusertoretrieve') # this will be set manually in production

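I would like to run this script and retrieve the result (either as the CSV file or as the returned contents) through an HTTP request, for example from JavaScript. The script only needs to run once a day, but the generated data (the CSV) should be available whenever it is needed.

My questions are therefore:

a. Is GCP Cloud Functions the correct tool for the job? Or will this require something more extensive, and therefore a GCP VM instance?

b. What would need to be changed in the code to make it run on GCP?

Any help or advice on direction is also much appreciated.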

Answer 1

Your question is not easy to answer without more details. However, I will try to offer some insights.

is GCP Cloud Functions the correct tool for the job? or will this require something more extensive and therefore a GCP VM instance?

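It depends. Does your processing take less than 9 minutes on 1 CPU? Does your process use less than 2 GB of memory (application memory footprint + file size + size of the tweets array)?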

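Why the file size? Because only the /tmp directory is writable, and it is an in-memory file system, so the file counts against the function's memory.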
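As a minimal sketch of that constraint, the CSV path can be built under the temporary directory instead of the working directory. The names `write_tweets_csv`, `read_file` and the `rows` argument are hypothetical and not part of the original script; `tempfile.gettempdir()` normally resolves to /tmp on Linux, but that is an assumption about the runtime environment.

```python
import csv
import os
import tempfile


def write_tweets_csv(username, rows):
    """Write rows to <tmp dir>/<username>_tweets.csv and return the file path."""
    # Assumption: the temp dir is the writable, memory-backed /tmp described above.
    path = os.path.join(tempfile.gettempdir(), "%s_tweets.csv" % username)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created", "text"])
        writer.writerows(rows)
    return path


def read_file(path):
    """Return the whole file as a string."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```

Because the file lives in memory, a very large timeline would be better streamed or written straight to external storage rather than staged in /tmp.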

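If you need a timeout of up to 15 minutes, have a look at Cloud Run, which is very similar to Cloud Functions and which I personally prefer. The CPU and memory limits are the same for Cloud Functions and Cloud Run (but this should change in 2020, with more CPU and memory available).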

What would need to be changed in the code to make it run on GCP?

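First, write to and read from the /tmp directory. Then, if you want your file to be available all day, store it in Cloud Storage (https://cloud.google.com/storage/docs) and retrieve it at the start of the function. If it does not exist, generate it for the current day; otherwise, fetch the existing one.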
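The "generate once per day, otherwise reuse" logic could be sketched as follows. This is only an illustration: `daily_object_name`, `get_or_create_csv` and the date-based object naming are hypothetical, and `blob` is assumed to behave like a `google.cloud.storage.Blob` (only its `exists`, `download_as_text` and `upload_from_string` methods are used).

```python
import datetime


def daily_object_name(username, today=None):
    """Object name for one user and one day (hypothetical naming scheme)."""
    today = today or datetime.date.today()
    return "%s/%s_tweets.csv" % (today.isoformat(), username)


def get_or_create_csv(blob, generate):
    """Return the cached CSV if the blob exists, else generate and upload it.

    blob: object behaving like google.cloud.storage.Blob (assumption).
    generate: zero-argument callable returning the CSV contents as a string.
    """
    if blob.exists():
        return blob.download_as_text()
    contents = generate()
    blob.upload_from_string(contents, content_type="text/csv")
    return contents


# With the google-cloud-storage package installed, a real blob could be built with
# (the bucket name is a placeholder):
#   from google.cloud import storage
#   blob = storage.Client().bucket("my-bucket").blob(daily_object_name("someuser"))
```

Putting the date in the object name means yesterday's file is never returned by mistake; old objects can then be expired with a bucket lifecycle rule if desired.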

Then, replace the function signature def fetchTweets(username): with def fetchTweets(request): and read the username from the request parameters.
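A minimal sketch of that signature change, assuming the username is passed as a `username` query parameter (for example `?username=someuser`). In the Python runtime the `request` argument is a Flask request object; `fetch_tweets_csv` is a hypothetical stand-in for the body of the original function.

```python
def fetch_tweets_csv(username):
    """Hypothetical placeholder for the original tweepy + CSV logic."""
    return "id,created,text\n"


def fetchTweets(request):
    """HTTP entry point: read the username from the query string and return the CSV."""
    username = request.args.get("username")
    if not username:
        # Reject calls that do not say which timeline to fetch.
        return ("Missing 'username' query parameter", 400)
    contents = fetch_tweets_csv(username)
    # (body, status, headers) tuples are turned into a response by Flask.
    return (contents, 200, {"Content-Type": "text/csv"})
```

Returning the contents directly, rather than a file, keeps the function stateless, which is what lets a JavaScript client fetch the data with a plain HTTP GET.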

Finally, if you want the file generated once a day, set up a Cloud Scheduler job to call the function.

You didn't mention security. I recommend deploying your function in private mode.

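So, there are a lot of GCP serverless concepts in this answer, and I don't know how familiar you are with GCP. If you would like more detail on any part, don't hesitate to ask!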

python

tweepy

google-cloud-platform

google-cloud-functions

twitterapi-python