How to scan a BigQuery table with DLP looking for sensitive data?
I want to analyze my tables in BigQuery with DLP. Is it possible? How can I do it?
Yes, it is possible. You need to define a storage_config in order to use BigQuery. If you want to save the findings to another table, add a save_findings action to the job configuration. Without an action, you can only access the job's findings summary through the projects.dlpJobs.get method.
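For reference, the per-infoType summary returned by projects.dlpJobs.get can be tallied as sketched below. `summarize` is a hypothetical helper, and the job object is only assumed to expose the `inspect_details.result.info_type_stats` layout of the API response; a `SimpleNamespace` stub stands in for a real DlpJob so the access path is visible without calling the API:

```python
from types import SimpleNamespace as NS

def summarize(job):
    """Tally findings per infoType from a finished DLP inspect job.

    Assumes the job object follows the projects.dlpJobs.get response
    layout: job.inspect_details.result.info_type_stats is a list of
    entries with .info_type.name and .count.
    """
    return {
        stat.info_type.name: stat.count
        for stat in job.inspect_details.result.info_type_stats
    }

# Stub standing in for a real DlpJob, just to show the access path:
fake_job = NS(inspect_details=NS(result=NS(info_type_stats=[
    NS(info_type=NS(name='EMAIL_ADDRESS'), count=42),
    NS(info_type=NS(name='PHONE_NUMBER'), count=7),
])))
print(summarize(fake_job))  # {'EMAIL_ADDRESS': 42, 'PHONE_NUMBER': 7}
```

With a real client you would pass `client_dlp.get_dlp_job(...)`'s return value instead of the stub.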
Follow this Python example to call DLP to scan BigQuery:
from google.cloud import dlp_v2

client_dlp = dlp_v2.DlpServiceClient.from_service_account_json(JSON_FILE_NAME)

inspect_job_data = {
    'storage_config': {
        'big_query_options': {
            'table_reference': {
                'project_id': GCP_PROJECT_ID,
                'dataset_id': DATASET_ID,
                'table_id': TABLE_ID
            },
            'rows_limit': 10000,
            'sample_method': 'RANDOM_START',
        },
    },
    'inspect_config': {
        'info_types': [
            {'name': 'ALL_BASIC'},
        ],
    },
    'actions': [
        {
            'save_findings': {
                'output_config': {
                    'table': {
                        'project_id': GCP_PROJECT_ID,
                        'dataset_id': DATASET_ID,
                        'table_id': '{}_DLP'.format(TABLE_ID)
                    }
                }
            },
        },
    ]
}

operation = client_dlp.create_dlp_job(
    parent=client_dlp.project_path(GCP_PROJECT_ID),
    inspect_job=inspect_job_data)
And a query to analyze the results:
from google.cloud import bigquery

client_bq = bigquery.Client.from_service_account_json(JSON_FILE_NAME)

# Perform a query.
QUERY = (
    'WITH result AS ('
    'SELECT'
    ' c1.info_type.name,'
    ' c1.likelihood,'
    ' content_locations.record_location.record_key.big_query_key.table_reference AS bq,'
    ' content_locations.record_location.field_id AS column '
    'FROM '
    ' `' + GCP_PROJECT_ID + '.' + DATASET_ID + '.' + TABLE_ID + '_DLP` AS c1 '
    'CROSS JOIN UNNEST(c1.location.content_locations) AS content_locations '
    'WHERE c1.likelihood IN (\'LIKELY\', \'VERY_LIKELY\'))'
    'SELECT r.name AS info_type, r.likelihood, r.bq.project_id, r.bq.dataset_id,'
    ' r.bq.table_id, r.column.name, COUNT(*) AS count FROM result r GROUP BY 1,2,3,4,5,6 '
    'ORDER BY count DESC'
)
query_job = client_bq.query(QUERY)  # API request
rows = query_job.result()
for row in rows:
    print('RULES: {} ({}) | COLUMN: {}.{}.{}:{} | count->{}'.format(
        row.info_type, row.likelihood, row.project_id, row.dataset_id,
        row.table_id, row.name, row.count))
You can find more details here.
A community tutorial covering your use case has been published: dlp-to-datacatalog-tags.
By following it, you can run DLP on all your BigQuery resources and automatically create tags in Google Data Catalog.
You can then search for sensitive information using the Google Data Catalog search syntax.