How can I JSON serialize an object from Google's natural language API? (No __dict__ attribute)
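I'm using the Google Natural Language API for a text-tagging project with sentiment analysis. I want to store my NL results as JSON. If I make a direct HTTP request to Google, a JSON response is returned.
However, when I use the provided Python libraries, an object is returned instead, and that object is not directly JSON serializable.
Here is a sample of my code: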
import os
import sys
import oauth2client.client
from google.cloud.gapic.language.v1beta2 import enums, language_service_client
from google.cloud.proto.language.v1beta2 import language_service_pb2

class LanguageReader:
    # class that parses, stores and reports language data from text
    def __init__(self, content=None):
        try:
            # attempts to authenticate credentials from env variable
            oauth2client.client.GoogleCredentials.get_application_default()
        except oauth2client.client.ApplicationDefaultCredentialsError:
            print("=== ERROR: Google credentials could not be authenticated! ===")
            # .get() avoids a KeyError when the variable is not set at all
            print("Current environment variable for this process is: {}".format(os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')))
            print("Run:")
            print(" $ export GOOGLE_APPLICATION_CREDENTIALS=/YOUR_PATH_HERE/YOUR_JSON_KEY_HERE.json")
            print("to set the authentication credentials manually")
            sys.exit()
        self.language_client = language_service_client.LanguageServiceClient()
        self.document = language_service_pb2.Document()
        self.document.type = enums.Document.Type.PLAIN_TEXT
        self.encoding = enums.EncodingType.UTF32
        self.results = None
        if content is not None:
            self.read_content(content)

    def read_content(self, content):
        self.document.content = content
        # call the API once and keep the response (the original called it twice)
        self.results = self.language_client.analyze_sentiment(self.document, self.encoding)
Now if you run:
sample_text = "I love R&B music. Marvin Gaye is the best. 'What's Going On' is one of my favorite songs. It was so sad when Marvin Gaye died."
resp = LanguageReader(sample_text).results
print(resp)
You get:
document_sentiment {
  magnitude: 2.40000009537
  score: 0.40000000596
}
language: "en"
sentences {
  text {
    content: "I love R&B music."
  }
  sentiment {
    magnitude: 0.800000011921
    score: 0.800000011921
  }
}
sentences {
  text {
    content: "Marvin Gaye is the best."
    begin_offset: 18
  }
  sentiment {
    magnitude: 0.800000011921
    score: 0.800000011921
  }
}
sentences {
  text {
    content: "\'What\'s Going On\' is one of my favorite songs."
    begin_offset: 43
  }
  sentiment {
    magnitude: 0.40000000596
    score: 0.40000000596
  }
}
sentences {
  text {
    content: "It was so sad when Marvin Gaye died."
    begin_offset: 90
  }
  sentiment {
    magnitude: 0.20000000298
    score: -0.20000000298
  }
}
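This is not JSON. It is an instance of a google.cloud.proto.language.v1beta2.language_service_pb2.AnalyzeSentimentResponse object, and it has no __dict__ attribute, so it cannot be serialized with json.dumps().
How can I either specify that the response should be in JSON, or serialize the object to JSON?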
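To illustrate the failure described above without calling the API, here is a minimal sketch. `SlottedSentiment` is a hypothetical stand-in class (it is not the real response type); it uses `__slots__` so that, like the response object, its instances have no `__dict__`. The `slots_to_dict` helper is also made up for this toy class and would not work on a protobuf message.

```python
import json


class SlottedSentiment:
    """Hypothetical stand-in for an object without __dict__ (not the real response class)."""
    __slots__ = ("score", "magnitude")

    def __init__(self, score, magnitude):
        self.score = score
        self.magnitude = magnitude


obj = SlottedSentiment(0.4, 2.4)

# json.dumps only understands built-in types, so a custom object raises TypeError.
try:
    json.dumps(obj)
    direct_error = None
except TypeError as exc:
    direct_error = str(exc)

# The common workaround json.dumps(obj.__dict__) is unavailable here.
has_dict = hasattr(obj, "__dict__")


def slots_to_dict(o):
    # Toy-only helper: builds a dict from the names listed in __slots__.
    return {name: getattr(o, name) for name in o.__slots__}


# A `default` hook lets json.dumps convert the object through the helper.
encoded = json.dumps(obj, default=slots_to_dict, sort_keys=True)
print(direct_error)
print(has_dict)
print(encoded)
```

For a real protobuf message the conversion has to come from the protobuf library itself rather than from a hand-written hook like this one.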
EDIT: @Zach pointed out Google's protobuf data interchange format. The preferred option appears to be these protobuf.json_format methods:
from google.protobuf.json_format import MessageToDict, MessageToJson
self.dict = MessageToDict(self.results)
self.json = MessageToJson(self.results)
From the docstring:
MessageToJson(message, including_default_value_fields=False, preserving_proto_field_name=False)
    Converts protobuf message to JSON format.
    Args:
      message: The protocol buffers message instance to serialize.
      including_default_value_fields: If True, singular primitive fields,
          repeated fields, and map fields will always be serialized. If
          False, only serialize non-empty fields. Singular message fields
          and oneof fields are not affected by this option.
      preserving_proto_field_name: If True, use the original proto field
          names as defined in the .proto file. If False, convert the field
          names to lowerCamelCase.
    Returns:
      A string containing the JSON formatted protocol buffer message.
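As a rough sketch of how these two functions behave, the snippet below runs them on a message type that ships with the protobuf package, so no credentials or API call are needed. Using `FieldDescriptorProto` as the message is purely an assumption made for illustration, as are the field values; an `AnalyzeSentimentResponse` would be passed in the same way. Only the `preserving_proto_field_name` option is exercised.

```python
import json

from google.protobuf import descriptor_pb2
from google.protobuf.json_format import MessageToDict, MessageToJson

# Stand-in message bundled with protobuf; the values are arbitrary.
message = descriptor_pb2.FieldDescriptorProto(
    name="score",
    type_name=".demo.Sentiment",
)

# Default behaviour: field names are converted to lowerCamelCase.
as_dict = MessageToDict(message)

# Keep the names exactly as written in the .proto file (snake_case).
as_dict_proto_names = MessageToDict(message, preserving_proto_field_name=True)

# MessageToJson returns a JSON string rather than a dict.
as_json = MessageToJson(message)

print(as_dict)
print(as_dict_proto_names)
print(as_json)

# The dict form is plain Python data, so json.dumps accepts it directly.
round_tripped = json.loads(json.dumps(as_dict))
```

MessageToDict is handy when the result needs further processing in Python before being written out, while MessageToJson is the shorter path when the string is stored as-is.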