No data in Neptune after bulk load

After loading a large amount of data from S3 into Neptune, I can't see any vertices in the database. Here is my loader status:

curl -G 'https://**.amazonaws.com:8182/loader/**?details=true&errors=true'
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_FAILED" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://**.nt",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_FAILED",
            "totalTimeSpent" : 13035,
            "startTime" : 1626033369,
            "totalRecords" : 1745612081,
            "totalDuplicates" : 3580674,
            "parsingErrors" : 22,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        },
        "failedFeeds" : [
            {
                "fullUri" : "s3://**.nt",
                "runNumber" : 1,
                "retryNumber" : 0,
                "status" : "LOAD_FAILED",
                "totalTimeSpent" : 13032,
                "startTime" : 1626033372,
                "totalRecords" : 1745612081,
                "totalDuplicates" : 3580674,
                "parsingErrors" : 22,
                "datatypeMismatchErrors" : 0,
                "insertErrors" : 0
            }
        ],
        "errors" : {
            "startIndex" : 1,
            "endIndex" : 10,
            "loadId" : "**",
            "errorLogs" : [
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 195142350
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 213781671
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 223606399
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 237802811
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 459805351
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 603488680
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 644623634
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 696970927
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 700557784
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 714098924
                }
            ]
        }
    }
}

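As you can see, it reports 22 parsing errors and ~1.7B total records. Since I set "failOnError" : "FALSE" in my request, I would assume the database has everything loaded except those 22 items, which I'm totally fine with.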

At this point I'm sure the database is there, but I see nothing after running a simple query:

curl -G "https://**.amazonaws.com:8182?gremlin=g.V().count()"

{"requestId":"**","status":{"message":"","code":200,"attributes":{"@type":"g:Map","@value":[]}},"result":{"data":{"@type":"g:List","@value":[{"@type":"g:Int64","@value":0}]},"meta":{"@type":"g:Map","@value":[]}}}

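It looks like you loaded RDF data (N-Triples format). With Amazon Neptune, RDF data must be queried using SPARQL. Gremlin can only be used for property graph data (loaded as CSV files with the bulk loader). To verify that you have some data, try a SPARQL query such as: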

SELECT ?s ?p ?o where {?s ?p ?o } LIMIT 1
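
To run the query above from the command line, something like the following should work. It is only a sketch: the endpoint host is a placeholder (the real one is masked in the question), `run_sparql` is a hypothetical helper name, and it assumes the cluster accepts unauthenticated requests, as the curl calls in the question suggest. Neptune serves SPARQL at the /sparql path on the same port used for Gremlin.

```shell
# Placeholder endpoint (assumption): replace with your own Neptune cluster endpoint.
NEPTUNE_ENDPOINT="${NEPTUNE_ENDPOINT:-https://your-neptune-endpoint:8182}"

# SPARQL is served at the /sparql path of the cluster endpoint.
SPARQL_URL="${NEPTUNE_ENDPOINT}/sparql"

# Fetch one arbitrary triple to confirm that some RDF data is present.
SAMPLE_QUERY='SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1'

# Count all triples; with ~1.7B records this may run for a long time or time out.
COUNT_QUERY='SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }'

# Hypothetical helper: POST a form-encoded SPARQL query and print the raw response.
run_sparql() {
  curl -s -X POST --data-urlencode "query=$1" "$SPARQL_URL"
}
```

Calling `run_sparql "$SAMPLE_QUERY"` should return a single binding if any triples were loaded. Start with the LIMIT 1 query rather than the count, since it returns quickly even on a very large dataset.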