How to index a JSON document directly from memory
I'm trying to index a JSON document, but it isn't working. So far I have tried the solutions posted at https://developer.ibm.com/answers/questions/361808/adding-a-json-document-to-a-discovery-collection-u/, but none of them work.
If I try:
discovery.addDocument({
  environment_id: config.watson.environment_id,
  collection_id: config.watson.collection_id,
  file: JSON.stringify({
    "ocorrencia_id": 9001
  })
}, (error, data) => {
  if (error) {
    console.error(error);
    return;
  }
  console.log(data);
});
it returns this error:
The Media Type [text/plain] of the input document is not supported. Auto correction was attempted, but the auto detected media type [text/plain] is also not supported. Supported Media Types are: application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml .
On the other hand, if I try:
discovery.addDocument({
  environment_id: config.watson.environment_id,
  collection_id: config.watson.collection_id,
  file: JSON.parse(JSON.stringify({
    "ocorrencia_id": 9001
  }))
}, (error, data) => {
  if (error) {
    console.error(error);
    return;
  }
  console.log(data);
});
I get this error:
TypeError: source.on is not a function
at Function.DelayedStream.create (C:\Temp\teste-watson\watson-orchestrator\node_modules\delayed-stream\lib\delayed_stream.js:33:10)
at FormData.CombinedStream.append (C:\Temp\teste-watson\watson-orchestrator\node_modules\combined-stream\lib\combined_stream.js:43:37)
at FormData.append (C:\Temp\teste-watson\watson-orchestrator\node_modules\form-data\lib\form_data.js:68:3)
at appendFormValue (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:324:21)
at Request.init (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:337:11)
at new Request (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:130:8)
at request (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\index.js:54:10)
at createRequest (C:\Temp\teste-watson\watson-orchestrator\node_modules\watson-developer-cloud\lib\requestwrapper.js:177:10)
at DiscoveryV1.addDocument (C:\Temp\teste-watson\watson-orchestrator\node_modules\watson-developer-cloud\discovery\v1.js:516:10)
at client.query.then.res (C:\Temp\teste-watson\watson-orchestrator\populate\populate.js:36:13)
at process._tickCallback (internal/process/next_tick.js:109:7)
Likewise, if I save the document to a temporary file and then use that:
const fs = require('fs');
const tempy = require('tempy');
const f = tempy.file({extension: 'json'});
fs.writeFileSync(f, JSON.stringify({
  "ocorrencia_id": 9001
}));
discovery.addDocument({
  environment_id: config.watson.environment_id,
  collection_id: config.watson.collection_id,
  file: fs.readFileSync(f)
}, (error, data) => {
  if (error) {
    console.error(error);
    return;
  }
  console.log(data);
});
Then this happens:
The Media Type [application/octet-stream] of the input document is not supported. Auto correction was attempted, but the auto detected media type [text/plain] is also not supported. Supported Media Types are: application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml .
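Given that other posts suggest using JSON.parse(), the API appears to accept a JS object, but none of the examples I have found, and nothing I have tried so far, seems to work. Is this a bug?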
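Update: saving to a temporary file and then using createReadStream() instead of readFileSync() works, but having to go through the disk for information that is already in memory is still a big hassle.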
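For reference, a minimal sketch of that temporary-file workaround, using only Node's built-in fs, os and path modules instead of tempy. The helper name writeTempJson and the file name document.json are made up for illustration; the addDocument call is left as a comment because it needs the discovery client and config from the snippets above. The working assumption, not verified here, is that the upload succeeds because a file stream exposes a path ending in .json.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes the object to a fresh temp directory as
// document.json and returns the full path of that file.
function writeTempJson(doc) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'discovery-'));
  const file = path.join(dir, 'document.json');
  fs.writeFileSync(file, JSON.stringify(doc));
  return file;
}

const tempPath = writeTempJson({ ocorrencia_id: 9001 });

// Unlike the Buffer returned by readFileSync(), a file stream has a `path`
// property, here ending in .json.
const stream = fs.createReadStream(tempPath);

// Assumed usage, with `discovery` and `config` set up as in the question:
// discovery.addDocument({
//   environment_id: config.watson.environment_id,
//   collection_id: config.watson.collection_id,
//   file: stream
// }, callback);
```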
I have also tried creating an in-memory stream from a Readable, but that failed as well:
var Readable = require('stream').Readable;
var s = new Readable();
s._read = function noop() {}; // redundant? see update below
s.push(JSON.stringify({
  "ocorrencia_id": 9001
}));
s.push(null);
discovery.addDocument({
  environment_id: config.watson.environment_id,
  collection_id: config.watson.collection_id,
  file: s
}, (error, data) => {
  if (error) {
    console.error(error);
    return;
  }
  console.log(data);
});
This one failed with:
Error: Unexpected end of multipart data
at Request._callback (C:\Temp\teste-watson\watson-orchestrator\node_modules\watson-developer-cloud\lib\requestwrapper.js:88:15)
at Request.self.callback (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:188:22)
at emitTwo (events.js:106:13)
at Request.emit (events.js:191:7)
at Request.<anonymous> (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:1171:10)
at emitOne (events.js:96:13)
at Request.emit (events.js:188:7)
at Gunzip.<anonymous> (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:1091:12)
at Gunzip.g (events.js:292:16)
at emitNone (events.js:91:20)
at Gunzip.emit (events.js:185:7)
at endReadableNT (_stream_readable.js:974:12)
at _combinedTickCallback (internal/process/next_tick.js:80:11)
at process._tickCallback (internal/process/next_tick.js:104:9) code: 500, error: 'Unexpected end of multipart data'
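The problem you are seeing is caused by the missing content type (it defaults to text/plain). When you supply the document to upload as a string, you need to provide the content type and a file name. In this case, you can try the following: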
discovery.addDocument({
  //other required parameters
  file: {
    value: JSON.stringify({ "ocorrencia_id": 9001 }),
    options: {
      filename: "some_file_name",
      contentType: "application/json; charset=utf-8"
    }
  }
}, callbackFn)
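The service checks the file name first and then the content to determine the type, but it does not seem to recognize JSON content correctly; it only sees text. The other answer will work as long as the file name ends in .json (it does not care about contentType).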
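To make the file name requirement concrete, here is a small sketch of a helper that wraps an in-memory object in the { value, options } shape shown in the other answer and guarantees a .json suffix. The name toJsonFilePart is hypothetical and not part of the SDK; the commented-out call assumes the discovery client and config from the question.

```javascript
// Hypothetical helper (not part of the SDK): builds a multipart file part
// from an in-memory object, making sure the file name ends in .json.
function toJsonFilePart(doc, name) {
  const base = name || 'document';
  const filename = base.endsWith('.json') ? base : base + '.json';
  return {
    value: JSON.stringify(doc),
    options: {
      filename: filename,
      contentType: 'application/json; charset=utf-8'
    }
  };
}

const part = toJsonFilePart({ ocorrencia_id: 9001 }, 'ocorrencia-9001');
console.log(part.options.filename); // ocorrencia-9001.json

// Assumed usage, with `discovery` and `config` set up as in the question:
// discovery.addDocument({
//   environment_id: config.watson.environment_id,
//   collection_id: config.watson.collection_id,
//   file: part
// }, callback);
```

Nothing touches the disk here, so this avoids the temporary-file round trip described in the question.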
However, we have added .addJsonDocument() and .updateJsonDocument() methods to the node.js SDK to make this even simpler:
discovery.addJsonDocument({
  environment_id: config.watson.environment_id,
  collection_id: config.watson.collection_id,
  // note: no JSON.stringify needed with addJsonDocument()
  file: {
    "ocorrencia_id": 9001
  }
}, (error, data) => {
  if (error) {
    console.error(error);
    return;
  }
  console.log(data);
});