GCP AutoML TextSnippet longer than 10,000 characters
I have been using version 2.2.0 of the GCP AutoML Python library for text extraction, and it usually works well. Occasionally, though, it gives me this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/wr/api/simple_text_nlp.py", line 159, in extract_entity_from_text
predict_data, predict_id, predict_error = extract_entity.predict(text, model_path)
File "/usr/local/lib/python3.6/site-packages/ProfessorPatPending/ocr/textOperations.py", line 68, in predict
response = self.__client.predict(name=model_path, payload=payload)
File "/usr/local/lib/python3.6/site-packages/google/cloud/automl_v1/services/prediction_service/client.py", line 498, in predict
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "/usr/local/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 List of found errors: 1.Field: payload.text_snippet.content; Message: The provided string field value is longer than 10000: 10796
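The offending TextSnippet has more than 10,000 characters, but the documentation clearly states that up to 250,000 characters are allowed. Can anyone explain what is going on here?
The code that creates the text snippet is: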
from google.cloud import automl

# sa_json_file, text_data and model_path are defined elsewhere in the application
client = automl.PredictionServiceClient.from_service_account_file(sa_json_file)
text_snippet = automl.TextSnippet(content=text_data, mime_type="text/plain")
payload = automl.ExamplePayload(text_snippet=text_snippet)
response = client.predict(name=model_path, payload=payload)
For obvious reasons, I won't post text_data itself here.
Thanks.
The error you are seeing is raised by client.predict() because the TextSnippet you sent is longer than 10k characters. AutoML Entity Extraction is limited to 10k characters per prediction request.
AutoML Natural Language Entity Extraction
- A TextSnippet up to 10,000 characters, UTF-8 NFC encoded or a document in .PDF, .TIF or .TIFF format with size upto 20MB.
I suggest either splitting the TextSnippet and sending multiple requests, or trimming the TextSnippet to 10k characters so that it stays within the limit.
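The splitting approach could look roughly like the sketch below. It is not an official recipe: split_text and predict_in_chunks are hypothetical helper names, and it assumes the 10,000 limit is counted the same way Python's len() counts characters after NFC normalization. It prefers to cut at whitespace so that words (and hopefully entities) are not split in half.

```python
import unicodedata

MAX_CHARS = 10_000  # limit quoted in the error message and the docs


def split_text(text, max_chars=MAX_CHARS):
    """Yield (offset, chunk) pairs; every chunk has at most max_chars characters.

    Offsets refer to positions in the NFC-normalized text.
    """
    text = unicodedata.normalize("NFC", text)
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to break just after the last space or newline in the window;
            # fall back to a hard cut if the window contains no usable whitespace.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1
        yield start, text[start:end]
        start = end


def predict_in_chunks(client, model_path, text):
    """Send one predict request per chunk; return a list of (offset, response)."""
    # Imported here so split_text can be used without the library installed.
    from google.cloud import automl

    results = []
    for offset, chunk in split_text(text):
        snippet = automl.TextSnippet(content=chunk, mime_type="text/plain")
        payload = automl.ExamplePayload(text_snippet=snippet)
        response = client.predict(name=model_path, payload=payload)
        results.append((offset, response))
    return results
```

With the client from the question, this would be called as predict_in_chunks(client, model_path, text_data). Any character offsets in a response are presumably relative to the chunk that was sent, so add the returned offset to map them back to the full text. An entity that straddles a chunk boundary can still be missed; overlapping the chunks slightly is one way to reduce that risk.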