Google Cloud Natural Language API: Does analyzeSyntax API have a param for tokens to not include "*_UNKNOWN" value attributes in partOfSpeech result
I would like to know whether there is any way for the analyzeSyntax API endpoint to omit sub-attributes of the partOfSpeech dictionaries from the response JSON when their value is *_UNKNOWN. Looking at the details of the document input, I can't find any way to limit the contents of partOfSpeech in the response.
Is this something that can only be handled after the response, when cleaning the data?
Sample query, per the API docs, in a file named request.json:
{
"encodingType": "UTF8",
"document": {
"type": "PLAIN_TEXT",
"content": "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show. Sundar Pichai said in his keynote that users love their new Android phones."
}
}
Command executed:
curl "https://language.googleapis.com/v1/documents:analyzeSyntax?key=${API_KEY}" \
-s \
-X POST \
-H "Content-Type: application/json" \
--data-binary @request.json > response.json
Sample response:
{
"sentences": [
{
"text": {
"content": "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.",
"beginOffset": 0
}
},
{
"text": {
"content": "Sundar Pichai said in his keynote that users love their new Android phones.",
"beginOffset": 105
}
}
],
"tokens": [
{
"text": {
"content": "Google",
"beginOffset": 0
},
"partOfSpeech": {
"tag": "NOUN",
"aspect": "ASPECT_UNKNOWN",
"case": "CASE_UNKNOWN",
"form": "FORM_UNKNOWN",
"gender": "GENDER_UNKNOWN",
"mood": "MOOD_UNKNOWN",
"number": "SINGULAR",
"person": "PERSON_UNKNOWN",
"proper": "PROPER",
"reciprocity": "RECIPROCITY_UNKNOWN",
"tense": "TENSE_UNKNOWN",
"voice": "VOICE_UNKNOWN"
},
"dependencyEdge": {
"headTokenIndex": 7,
"label": "NSUBJ"
},
"lemma": "Google"
},
{
"text": {
"content": ",",
"beginOffset": 6
},
"partOfSpeech": {
"tag": "PUNCT",
"aspect": "ASPECT_UNKNOWN",
"case": "CASE_UNKNOWN",
"form": "FORM_UNKNOWN",
"gender": "GENDER_UNKNOWN",
"mood": "MOOD_UNKNOWN",
"number": "NUMBER_UNKNOWN",
"person": "PERSON_UNKNOWN",
"proper": "PROPER_UNKNOWN",
"reciprocity": "RECIPROCITY_UNKNOWN",
"tense": "TENSE_UNKNOWN",
"voice": "VOICE_UNKNOWN"
},
"dependencyEdge": {
"headTokenIndex": 0,
"label": "P"
},
"lemma": ","
},
...
...
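The response JSON is 819 lines long, and 314 of those lines (nearly 40% of the response!) are partOfSpeech attributes whose value is *_UNKNOWN. They carry no information, yet they significantly increase the amount of data in the response.
The documentation does not seem to offer a parameter that helps with this. Am I missing something, or does the API not support an argument that drops these keys when they are *_UNKNOWN? Can this only be managed post-response, with data cleaning?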
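If we look at the API specification, we find that the part-of-speech attributes are in fact enumerations. For example, Gender can be:
- GENDER_UNKNOWN
- FEMININE
- MASCULINE
- NEUTER
When you make REST API calls, you send and receive JSON payloads, and JSON represents enumeration values as their spelled-out string names. However, REST and JSON are not the only protocol for making requests to GCP services. You can also make gRPC calls. With gRPC, the data on the wire is encoded as protocol buffers. Google's language bindings let you make service calls over gRPC without being distracted by learning that technology. The value of gRPC is that messages are smaller and faster.
I don't see a mechanism at the API level for compressing what is transmitted (for example, asking for fields to be left out of the JSON response when using REST).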
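If the cleanup does have to happen after the response arrives, a minimal Python sketch could look like the following. It assumes the response has the shape shown in the question (a "tokens" list whose items carry a "partOfSpeech" dictionary); the function names strip_unknown, clean_response and clean_file are made up for illustration and are not part of any Google library.

```python
import json


def strip_unknown(part_of_speech):
    """Return a copy of a partOfSpeech dict without '*_UNKNOWN' values."""
    return {
        key: value
        for key, value in part_of_speech.items()
        if not (isinstance(value, str) and value.endswith("_UNKNOWN"))
    }


def clean_response(response):
    """Return a copy of an analyzeSyntax response with '*_UNKNOWN'
    partOfSpeech attributes dropped; the input is left unmodified."""
    cleaned = dict(response)
    if "tokens" in response:
        cleaned["tokens"] = [
            # Tokens without a partOfSpeech key are passed through as-is.
            {**token, "partOfSpeech": strip_unknown(token["partOfSpeech"])}
            if "partOfSpeech" in token
            else dict(token)
            for token in response["tokens"]
        ]
    return cleaned


def clean_file(src_path, dst_path):
    """Hypothetical helper: read a saved response and write the cleaned copy."""
    with open(src_path, encoding="utf-8") as src:
        response = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(clean_response(response), dst, indent=2, ensure_ascii=False)
```

With the curl command from the question, calling clean_file("response.json", "response.clean.json") would write the trimmed copy next to the original. This only shrinks what you store or pass on; the full payload is still transferred from the API.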