How can I automatically create BigQuery tables from my Cloud Storage bucket?
- google-cloud-platform
- google-bigquery
- google-cloud-functions
- google-cloud-storage
- google-cloud-scheduler
I want to create a job that runs every day at 2 AM. This job must create a BigQuery table by reading my files from a Cloud Storage bucket. How can I do this?
You can import your Firestore export directly into BigQuery. Set up a load job with sourceFormat equal to DATASTORE_BACKUP (yes, even for Firestore) and writeDisposition set to WRITE_TRUNCATE.
You can wrap this in a Cloud Function. You can call the API directly or use the client libraries. If you need a code sample, tell me your language and I'll see what I can do for you.
EDIT
You need to add these dependencies to your package.json:
"@google-cloud/bigquery": "^4.7.0",
"@google-cloud/storage": "^5.0.1",
Then, here is the function with static values. You can build something more dynamic if you want (for example, by reading the request parameters).
const {Storage} = require('@google-cloud/storage');
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();
const storage = new Storage();

const bucketName = "my_bucket"; // to change
const fileExport = "path/to/my_export.export_metadata"; // to change
const datasetId = "data"; // to change
const tableId = "dsexport"; // to change

exports.loadDSExport = async (req, res) => {
  // Configure the load job. For the full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'DATASTORE_BACKUP',
    autodetect: true,
    location: 'EU', // Set your correct region
    writeDisposition: "WRITE_TRUNCATE",
  };

  // Load data from a Google Cloud Storage file into the table.
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(fileExport), metadata);
  // load() waits for the job to finish.
  // This can take time; increase the function timeout if needed.

  // Check the job's status for errors.
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    // Handle the error and return an error code here.
    throw errors;
  }

  console.log(`Job ${job.id} completed.`);
  res.send(`Job ${job.id} completed.`);
};
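Note that because load() blocks until the job completes, a large export can exceed the default 60-second function timeout; you can raise it at deploy time, for example by adding --timeout=540 to the gcloud command below.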
Then deploy your function like this (here in private mode):
gcloud beta functions deploy --runtime nodejs10 --trigger-http --entry-point loadDSExport --region europe-west1 loadDSExport
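To get the daily 2 AM run asked for in the question, you can then trigger this private function with Cloud Scheduler. A minimal sketch; the job name, project ID, service account, and time zone are placeholder assumptions, and the service account needs the Cloud Functions Invoker role on the function:
# Placeholder values: replace MY_PROJECT, the service account, and the time zone.
gcloud scheduler jobs create http load-ds-export \
  --schedule="0 2 * * *" \
  --time-zone="Europe/Paris" \
  --http-method=POST \
  --uri="https://europe-west1-MY_PROJECT.cloudfunctions.net/loadDSExport" \
  --oidc-service-account-email="scheduler-invoker@MY_PROJECT.iam.gserviceaccount.com"
The OIDC token is what lets Cloud Scheduler authenticate to the function, which is required because it was deployed in private mode. You can also test the function manually with curl by passing an identity token, e.g. curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" followed by the function URL.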