How can I run Cloud DLP (Data Loss Prevention) on all BigQuery tables in my project?

According to the DLP docs, when you create an inspect job you need to specify a table reference:

{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"bigquery-public-data",
          "datasetId":"usa_names",
          "tableId":"usa_1910_current"
        },
        "rowsLimit":"1000",
        "sampleMethod":"RANDOM_START",
        "identifyingFields":[
          {
            "name":"name"
          }
        ]
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"FIRST_NAME"
        }
      ],
      "includeQuote":true
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT-ID]",
              "datasetId":"testingdlp",
              "tableId":"bqsample3"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}

This means I need to create an Inspect Job for each table. I want to find sensitive data across all my BigQuery resources. How can I do that?

To run DLP across all your BigQuery resources, you have two options.

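  • Programmatically fetch your BigQuery tables, then trigger an Inspect Job for each table.

    Pros: cheaper. From 1 GB to 50 TB, the price is US$1.00 per GB.

    Cons: it is a batch operation, so it is not executed in real time.

    Python sample of the idea: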

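    from google.cloud import bigquery

    project_id = "my-project"  # placeholder: replace with your project ID

    client = bigquery.Client(project=project_id)

    # Walk every dataset in the project, then every table in each dataset.
    for dataset in client.list_datasets(project=project_id):
        for table in client.list_tables(dataset.reference):
            # Create an Inspect Job for table.table_id here.
            print(f"{table.project}.{table.dataset_id}.{table.table_id}")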
    
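  • Programmatically fetch your BigQuery tables, query each table, and call the DLP Streaming Content API.

    Pros: it is a real-time operation.

    Cons: more expensive. Above 1 GB, the price is US$3.00 per GB.

    Java sample of the idea: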

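    // Assumes java.sql.Connection, java.sql.DatabaseMetaData and java.sql.ResultSet are imported.
    String url =
        String.format(
            "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=3;ProjectId=%s;",
            projectId);
    // Declare the Simba type explicitly; javax.sql.DataSource has no setURL method.
    com.simba.googlebigquery.jdbc42.DataSource ds =
        new com.simba.googlebigquery.jdbc42.DataSource();
    ds.setURL(url);
    Connection conn = ds.getConnection();
    DatabaseMetaData databaseMetadata = conn.getMetaData();
    // List every table visible through the connection.
    ResultSet tablesResultSet =
        databaseMetadata.getTables(conn.getCatalog(), null, "%", new String[]{"TABLE"});
    while (tablesResultSet.next()) {
        // Query your table data and call the DLP Streaming API
    }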
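
To make the first option more concrete, here is a minimal Python sketch of what "create an Inspect Job for each table" could look like. It assumes the `google-cloud-bigquery` and `google-cloud-dlp` packages are installed and that credentials are already configured. The helper names `build_inspect_job` and `inspect_all_tables`, the output table name `findings`, and the default info type are illustrative assumptions, not something prescribed by the DLP docs. The job body mirrors the JSON from the question, using the snake_case field names that the Python client expects.

```python
def build_inspect_job(project_id, dataset_id, table_id, output_project_id,
                      output_dataset_id="testingdlp", output_table_id="findings",
                      info_types=("FIRST_NAME",), rows_limit=1000):
    """Return an inspect_job dict for one BigQuery table (hypothetical helper)."""
    return {
        "storage_config": {
            "big_query_options": {
                "table_reference": {
                    "project_id": project_id,
                    "dataset_id": dataset_id,
                    "table_id": table_id,
                },
                # Sampling keeps the scanned volume, and therefore the cost, bounded.
                "rows_limit": rows_limit,
                "sample_method": "RANDOM_START",
            }
        },
        "inspect_config": {
            "info_types": [{"name": name} for name in info_types],
            "include_quote": True,
        },
        "actions": [
            {
                "save_findings": {
                    "output_config": {
                        "table": {
                            "project_id": output_project_id,
                            "dataset_id": output_dataset_id,
                            "table_id": output_table_id,
                        },
                        "output_schema": "BASIC_COLUMNS",
                    }
                }
            }
        ],
    }


def inspect_all_tables(project_id):
    """Create one inspect job per table in the project and return the job names."""
    # Imported lazily so build_inspect_job works without the client libraries.
    from google.cloud import bigquery, dlp_v2

    bq = bigquery.Client(project=project_id)
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}/locations/global"

    job_names = []
    for dataset in bq.list_datasets(project=project_id):
        for table in bq.list_tables(dataset.reference):
            job = dlp.create_dlp_job(
                request={
                    "parent": parent,
                    "inspect_job": build_inspect_job(
                        table.project, table.dataset_id, table.table_id, project_id
                    ),
                }
            )
            job_names.append(job.name)
    return job_names
```

Calling `inspect_all_tables("my-project")` would submit one batch job per table. In this sketch every job writes its findings to the same output table, and the output dataset is assumed to exist already.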
    

For a complete tutorial on the second option, there is a blog post that covers it.
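
As a rough illustration of the second option, the following Python sketch reads a small number of rows from one table and sends them to the DLP content inspection method as a table item. It assumes the `google-cloud-bigquery` and `google-cloud-dlp` packages and configured credentials. The helper names `rows_to_dlp_item` and `inspect_table_rows`, the `max_rows` default, and the default info type are assumptions made for illustration. Content requests have size limits, so a real scan would need to send the rows in batches.

```python
def rows_to_dlp_item(column_names, rows):
    """Convert column names and row tuples into a DLP table content item dict."""
    return {
        "table": {
            "headers": [{"name": name} for name in column_names],
            "rows": [
                # Every cell is sent as a string; NULL becomes an empty string.
                {"values": [{"string_value": "" if v is None else str(v)} for v in row]}
                for row in rows
            ],
        }
    }


def inspect_table_rows(project_id, table_id, info_types=("FIRST_NAME",), max_rows=100):
    """Inspect up to max_rows rows of one table and return the findings."""
    # Imported lazily so rows_to_dlp_item works without the client libraries.
    from google.cloud import bigquery, dlp_v2

    bq = bigquery.Client(project=project_id)
    dlp = dlp_v2.DlpServiceClient()

    # table_id is a full "project.dataset.table" string.
    row_iter = bq.list_rows(table_id, max_results=max_rows)
    column_names = [field.name for field in row_iter.schema]
    rows = [row.values() for row in row_iter]

    response = dlp.inspect_content(
        request={
            "parent": f"projects/{project_id}/locations/global",
            "inspect_config": {
                "info_types": [{"name": name} for name in info_types],
                "include_quote": True,
            },
            "item": rows_to_dlp_item(column_names, rows),
        }
    )
    return response.result.findings
```

For example, `inspect_table_rows("my-project", "bigquery-public-data.usa_names.usa_1910_current")` would return the findings for the first 100 rows synchronously, which is the real-time behaviour described above.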

Beware: "it is possible for costs to become very high, depending on the quantity of information that you instruct the Cloud DLP to scan. To learn several methods that you can use to keep costs down while also ensuring that you're using the Cloud DLP to scan the exact data that you intend to, see Keeping Cloud DLP costs under control."

The billing information was current at the time of writing. For the latest information, check the DLP billing docs page.