No data in Neptune after bulk load
After loading a large amount of data from S3 into Neptune, I can't see any vertices in the database. Here is my loader status:
curl -G 'https://**.amazonaws.com:8182/loader/**?details=true&errors=true'
{
"status" : "200 OK",
"payload" : {
"feedCount" : [
{
"LOAD_FAILED" : 1
}
],
"overallStatus" : {
"fullUri" : "s3://**.nt",
"runNumber" : 1,
"retryNumber" : 0,
"status" : "LOAD_FAILED",
"totalTimeSpent" : 13035,
"startTime" : 1626033369,
"totalRecords" : 1745612081,
"totalDuplicates" : 3580674,
"parsingErrors" : 22,
"datatypeMismatchErrors" : 0,
"insertErrors" : 0
},
"failedFeeds" : [
{
"fullUri" : "s3://**.nt",
"runNumber" : 1,
"retryNumber" : 0,
"status" : "LOAD_FAILED",
"totalTimeSpent" : 13032,
"startTime" : 1626033372,
"totalRecords" : 1745612081,
"totalDuplicates" : 3580674,
"parsingErrors" : 22,
"datatypeMismatchErrors" : 0,
"insertErrors" : 0
}
],
"errors" : {
"startIndex" : 1,
"endIndex" : 10,
"loadId" : "**",
"errorLogs" : [
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "IRI includes string escapes: '\92'",
"fileName" : "s3://**.nt",
"recordNum" : 195142350
},
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "IRI includes string escapes: '\92'",
"fileName" : "s3://**.nt",
"recordNum" : 213781671
},
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "IRI includes string escapes: '\92'",
"fileName" : "s3://**.nt",
"recordNum" : 223606399
},
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "IRI includes string escapes: '\92'",
"fileName" : "s3://**.nt",
"recordNum" : 237802811
},
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "IRI includes string escapes: '\92'",
"fileName" : "s3://**.nt",
"recordNum" : 459805351
},
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "IRI includes string escapes: '\92'",
"fileName" : "s3://**.nt",
"recordNum" : 603488680
},
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "IRI includes string escapes: '\92'",
"fileName" : "s3://**.nt",
"recordNum" : 644623634
},
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "IRI includes string escapes: '\92'",
"fileName" : "s3://**.nt",
"recordNum" : 696970927
},
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "IRI includes string escapes: '\92'",
"fileName" : "s3://**.nt",
"recordNum" : 700557784
},
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "IRI includes string escapes: '\92'",
"fileName" : "s3://**.nt",
"recordNum" : 714098924
}
]
}
}
}
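As you can see, it reports 22 parsing errors out of ~1.7B total records. Since I set "failOnError" : "FALSE" in my request, I assumed the database would be fine apart from those 22 records, which I'm happy to lose.
At this point I'm sure the database is there, but after running a simple query I see nothing: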
curl -G "https://**.amazonaws.com:8182?gremlin=g.V().count()"
{"requestId":"**","status":{"message":"","code":200,"attributes":{"@type":"g:Map","@value":[]}},"result":{"data":{"@type":"g:List","@value":[{"@type":"g:Int64","@value":0}]},"meta":{"@type":"g:Map","@value":[]}}}
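It looks like you loaded RDF data (N-Triples format). With Amazon Neptune, RDF data must be queried using SPARQL. Gremlin can only be used with property graph data (loaded as CSV files by the bulk loader), which is why g.V().count() returns 0. To verify that you have some data, try a SPARQL query such as: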
SELECT ?s ?p ?o where {?s ?p ?o } LIMIT 1
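If you would rather run that check from a script than from curl, here is a minimal Python sketch. It assumes the cluster's SPARQL endpoint is reachable at the /sparql path on port 8182 and that IAM authentication is not enabled (a signed request would be needed otherwise). The host name and the helper names build_sparql_request, has_any_triple and run_query are placeholders made up for this example, not part of any Neptune SDK.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder; substitute your own cluster endpoint.
SPARQL_ENDPOINT = "https://your-neptune-endpoint:8182/sparql"

PROBE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1"


def build_sparql_request(endpoint, query):
    """Build (but do not send) a form-encoded POST request for a SPARQL query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def has_any_triple(results):
    """Return True if a SPARQL JSON result set contains at least one binding."""
    return len(results.get("results", {}).get("bindings", [])) > 0


def run_query(endpoint, query, timeout=30):
    """Send the query and decode the SPARQL JSON response."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires network access to the cluster (typically from inside its VPC).
    results = run_query(SPARQL_ENDPOINT, PROBE_QUERY)
    print("data present" if has_any_triple(results) else "no triples found")
```

If this prints "data present" while g.V().count() still returns 0, that is consistent with the data having been loaded as RDF rather than as a property graph.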