For the deidentify_with_fpe() Python API wrapper for Google DLP, which arguments need to be passed?
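I am working through the Google Cloud DLP API documentation available here; specifically, this question is about deidentify_with_fpe().
My question is: in what format do the arguments need to be passed to the function so that it returns anonymised data? My current code is: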
def deidentify_with_fpe(
    string,
    info_types,
    alphabet=1,
    project='XXXX-data-development',
    surrogate_type=None,
    key_name='projects/XXXX-data-development/locations/global/keyRings/google-dlp-test-global/cryptoKeys/google-dlp-test-key-global',
    wrapped_key=WRAPPED
):
    "read file in for wrapped key"
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string using Format Preserving Encryption (FPE).
    Args:
        project: The Google Cloud project id to use as a parent resource.
        item: The string to deidentify (will be treated as text).
        alphabet: The set of characters to replace sensitive ones with. For
            more information, see https://cloud.google.com/dlp/docs/reference/
            rest/v2beta2/organizations.deidentifyTemplates#ffxcommonnativealphabet
        surrogate_type: The name of the surrogate custom info type to use. Only
            necessary if you want to reverse the deidentification process. Can
            be essentially any arbitrary string, as long as it doesn't appear
            in your dataset otherwise.
        key_name: The name of the Cloud KMS key used to encrypt ('wrap') the
            AES-256 key. Example:
            key_name = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/
            keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME'
        wrapped_key: The encrypted ('wrapped') AES-256 key to use. This key
            should be encrypted using the Cloud KMS key specified by key_name.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library
    import google.cloud.dlp
    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient(credentials='/Users/callumsmyth/virtual_envs/google_dlp_test/XXXX.json')
    dlp = dlp_client.from_service_account_json('/Users/callumsmyth/virtual_envs/google_dlp_test/XXXX.json')
    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)
    # The wrapped key is base64-encoded, but the library expects a binary
    # string, so decode it here.
    import base64
    # wrapped_key = base64.b64decode(wrapped_key)
    # Construct FPE configuration dictionary
    crypto_replace_ffx_fpe_config = {
        "crypto_key": {
            "kms_wrapped": {
                "wrapped_key": wrapped_key,
                "crypto_key_name": key_name,
            }
        },
        "common_alphabet": alphabet,
    }
    # Add surrogate type
    if surrogate_type:
        crypto_replace_ffx_fpe_config["surrogate_info_type"] = {
            "name": surrogate_type
        }
    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types]
    }
    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": crypto_replace_ffx_fpe_config
                    }
                }
            ]
        }
    }
    # Convert string to item
    item = {"value": string}
    # Call the API
    response = dlp.deidentify_content(
        parent,
        inspect_config=inspect_config,
        deidentify_config=deidentify_config,
        item=item,
    )
    # Print results
    print(response.item.value)
where WRAPPED is read as follows:
with open('mysecret.txt.encrypted', 'rb') as f:
    WRAPPED = f.read()
and mysecret.txt.encrypted was generated by this command in the terminal:
--keyring google-dlp-test-global --key google-dlp-test-key-global \
--plaintext-file google-token.txt \
--ciphertext-file mysecret.txt.encrypted
where google-token.txt was generated from here.
Calling deidentify_with_fpe('My name is john smith', ['FIRST_NAME']) produces the following error:
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Could not de-identify all content due to transformation errors. See the error details for an overview of all the transformation errors encountered."
debug_error_string = "{"created":"@1581675678.839972000","description":"Error received from peer ipv4:216.58.213.10:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Could not de-identify all content due to transformation errors. See the error details for an overview of all the transformation errors encountered.","grpc_status":3}"
The above exception was the direct cause of the following exception:
InvalidArgument: 400 Could not de-identify all content due to transformation errors. See the error details for an overview of all the transformation errors encountered.
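So I think my problem is with the key, before it gets encrypted. I cannot find anything in the documentation about how to obtain that key or how to pass it to the function.
I appreciate this is a lengthy submission, and any reply would be greatly appreciated. I have spent too long trying to do this and feel I am close to getting it working.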
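The error
"google.api_core.exceptions.InvalidArgument: 400 Could not de-identify all content due to transformation errors. See the error details for an overview of all the transformation errors encountered."
is a general error raised when de-identification of free-form text fails because of one or more transformation errors. Unfortunately, the Python library does not appear to expose the error details.
According to the service documentation [1], each detected token must be at least two characters long: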
The input value:
- Must be at least two characters long (or the empty string).
- Must be encoded as ASCII.
- Comprised of the characters specified by an "alphabet," which is the set of between 2 and 64 allowed characters in the input value. For more information, see the alphabet field in CryptoReplaceFfxFpeConfig.
[1] https://cloud.google.com/dlp/docs/transformations-reference#fpe
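The constraints quoted above can be checked locally before calling the API. This is only a minimal sketch and not part of the DLP client library: COMMON_ALPHABETS and fpe_input_problems are hypothetical names, and the character sets are the ones the documentation lists for the four common alphabets.

```python
import string

# Character sets of the common alphabets as described in the DLP documentation.
# This dictionary and the helper below are illustrative, not library code.
COMMON_ALPHABETS = {
    "NUMERIC": string.digits,
    "HEXADECIMAL": string.digits + "ABCDEF",
    "UPPER_CASE_ALPHA_NUMERIC": string.digits + string.ascii_uppercase,
    "ALPHA_NUMERIC": string.digits + string.ascii_uppercase + string.ascii_lowercase,
}


def fpe_input_problems(value, alphabet):
    """Return the reasons why `value` would violate the FPE input rules."""
    problems = []
    # The empty string is allowed, a single character is not.
    if len(value) == 1:
        problems.append("must be at least two characters long (or empty)")
    if not value.isascii():
        problems.append("must be encoded as ASCII")
    # Every character of the value has to belong to the chosen alphabet.
    outside = sorted(set(value) - set(alphabet))
    if outside:
        problems.append("characters not in alphabet: " + "".join(outside))
    return problems


print(fpe_input_problems("john", COMMON_ALPHABETS["NUMERIC"]))
print(fpe_input_problems("john", COMMON_ALPHABETS["ALPHA_NUMERIC"]))
```

A detected name such as john fails the check against a digits-only alphabet but passes against ALPHA_NUMERIC, which is the kind of mismatch that would lead to a transformation error.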
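Change the alphabet from 1 to one of the following:
Comprised of the characters specified by an alphabet. Valid options:
- NUMERIC
- HEXADECIMAL
- UPPER_CASE_ALPHA_NUMERIC
- ALPHA_NUMERIC
The input value:
- Must be at least two characters long (or the empty string).
- Must be comprised of the characters specified by an alphabet. The alphabet can contain between 2 and 95 characters. (An alphabet with 95 characters includes all printable characters in the US-ASCII character set.)
If your input has the format 111-222-333, your custom alphabet field should be: "customAlphabet": "-0123456789"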
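In the Python dictionaries used in the question, this would mean swapping the common_alphabet entry for a custom one. A minimal sketch, assuming the snake_case key custom_alphabet mirrors the REST field customAlphabet; build_fpe_config is a hypothetical helper and the key values are placeholders:

```python
def build_fpe_config(wrapped_key, key_name, custom_alphabet):
    """Build a crypto_replace_ffx_fpe_config dict that uses a custom alphabet."""
    # The alphabet may hold between 2 and 95 distinct characters.
    if len(set(custom_alphabet)) != len(custom_alphabet):
        raise ValueError("alphabet must not repeat characters")
    if not 2 <= len(custom_alphabet) <= 95:
        raise ValueError("alphabet must contain between 2 and 95 characters")
    return {
        "crypto_key": {
            "kms_wrapped": {
                "wrapped_key": wrapped_key,
                "crypto_key_name": key_name,
            }
        },
        # Used instead of "common_alphabet"; only one of the two should be set.
        "custom_alphabet": custom_alphabet,
    }


# Placeholder key material, for illustration only.
config = build_fpe_config(b"WRAPPED_KEY_BYTES", "YOUR_KEY_NAME", "-0123456789")

# Every character of an input such as 111-222-333 is covered by the alphabet.
print(set("111-222-333") <= set(config["custom_alphabet"]))
```

The resulting dictionary can be passed as the crypto_replace_ffx_fpe_config value in deidentify_config, in place of the one built inside deidentify_with_fpe().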