Error on importing 40M+ data using neo4j-import tool

I am importing 40M nodes with neo4j-import. Here is my shell command:

[luning@pinnacle bin]$ ./neo4j-import --into ../data/weibo.db --nodes:User "/data/weibo/user-header.csv,/data/weibo/users/000000_0.csv,/data/weibo/users/000001_0.csv,/data/weibo/users/000002_0.csv,/data/weibo/users/000003_0.csv,/data/weibo/users/000004_0.csv,/data/weibo/users/000005_0.csv,/data/weibo/users/000006_0.csv,/data/weibo/users/000007_0.csv,/data/weibo/users/000008_0.csv,/data/weibo/users/000009_0.csv,/data/weibo/users/000010_0.csv,/data/weibo/users/000011_0.csv,/data/weibo/users/000012_0.csv,/data/weibo/users/000013_0.csv,/data/weibo/users/000014_0.csv,/data/weibo/users/000015_0.csv,/data/weibo/users/000016_0.csv,/data/weibo/users/000017_0.csv,/data/weibo/users/000018_0.csv,/data/weibo/users/000019_0.csv,/data/weibo/users/000020_0.csv,/data/weibo/users/000021_0.csv,/data/weibo/users/000022_0.csv,/data/weibo/users/000023_1.csv,/data/weibo/users/000024_0.csv,/data/weibo/users/000025_0.csv" --delimiter "TAB"

Nodes
[*>:87.20 MB/s---------------------------|PROPERTIES(2)===============|NOD|v:227.03 MB/s(2)====] 48MImport error: Panic called, so exiting

Neo4j Import Tool
    neo4j-import is used to create a new Neo4j database from data in CSV files. See 
    the chapter "Import Tool" in the Neo4j Manual for details on the CSV file format 
    - a special kind of header is required.
Usage:
--into <store-dir>
    Database directory to import into. Must not contain existing database.
--nodes [:Label1:Label2] "<file1>,<file2>,..."
    Node CSV header and data. Multiple files will be logically seen as one big file 
    from the perspective of the importer. The first line must contain the header. 
    Multiple data sources like these can be specified in one import, where each data 
    source has its own header. Note that file groups must be enclosed in quotation 
    marks.
--relationships [:RELATIONSHIP_TYPE] "<file1>,<file2>,..."
    Relationship CSV header and data. Multiple files will be logically seen as one 
    big file from the perspective of the importer. The first line must contain the 
    header. Multiple data sources like these can be specified in one import, where 
    each data source has its own header. Note that file groups must be enclosed in 
    quotation marks.
--delimiter <delimiter-character>
    Delimiter character, or 'TAB', between values in CSV data. The default option is 
    ,.
--array-delimiter <array-delimiter-character>
    Delimiter character, or 'TAB', between array elements within a value in CSV

I checked the schemas of the files and they are all consistent. The import stops with:

Import error: Panic called, so exiting

Does anyone know how to fix this?

Here is my stack trace:

    java.lang.RuntimeException: Panic called, so exiting
    at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:63)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.anyStillExecuting(ExecutionSupervisor.java:79)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.finishAwareSleep(ExecutionSupervisor.java:102)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:64)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseDynamicExecution(ExecutionSupervisors.java:65)
    at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:226)
    at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:151)
    at org.neo4j.tooling.ImportTool.main(ImportTool.java:263)
Caused by: java.lang.RuntimeException: Panic called, so exiting
    at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:189)
    at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.await(AbstractStep.java:180)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.receive(ExecutorServiceStep.java:82)
    at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.sendDownstream(AbstractStep.java:226)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:103)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:87)
    at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:217)
Caused by: java.lang.RuntimeException: Panic called, so exiting
    ... 7 more
Caused by: java.lang.RuntimeException: Panic called, so exiting
    ... 7 more
Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: ERROR in input
  data source: BufferedCharSeeker[buffer:org.neo4j.csv.reader.SectionedCharBuffer@4ac5af5c, seekPos:2764030, line:2882236]
  in field: descriptions:string:4
  for header: [id:string, screenname:string, locations:string, descriptions:string, :IGNORE, profileimageurl:string, gender:string, followerscount:string, friendscount:string, statusescount:string, favouritescount:string, verified:string, verifiedreason:string, :IGNORE, :IGNORE, :IGNORE, :IGNORE, :IGNORE, :IGNORE, :IGNORE, darenint:string, :IGNORE, :IGNORE, updateddate:string]
  raw field value: 6:19:
  original error: Tried to read in a value larger than effective buffer size 8388608
    at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:152)
    at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:42)
    at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
    at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
    at org.neo4j.helpers.collection.NestingIterator.fetchNextOrNull(NestingIterator.java:61)
    at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
    at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
    at org.neo4j.unsafe.impl.batchimport.staging.IteratorBatcherStep.nextBatchOrNull(IteratorBatcherStep.java:54)
    at org.neo4j.unsafe.impl.batchimport.staging.InputIteratorBatcherStep.nextBatchOrNull(InputIteratorBatcherStep.java:42)
    at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:73)
    at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.run(ProducerStep.java:54)
Caused by: java.lang.IllegalStateException: Tried to read in a value larger than effective buffer size 8388608
    at org.neo4j.csv.reader.BufferedCharSeeker.fillBufferIfWeHaveExhaustedIt(BufferedCharSeeker.java:258)
    at org.neo4j.csv.reader.BufferedCharSeeker.nextChar(BufferedCharSeeker.java:231)
    at org.neo4j.csv.reader.BufferedCharSeeker.seek(BufferedCharSeeker.java:109)
    at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:81)
    ... 10 more

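One of the fields probably contains a quote that is never closed, so the CSV parser keeps reading until it finds the next quote. It is unlikely that you really have a field 8M in size (the "effective buffer size 8388608" in the error), so that is my guess.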
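A rough way to test that guess is to scan the data files for lines containing an odd number of double quotes. The sketch below is only a heuristic, not part of neo4j-import: the function names `find_unbalanced_quote_lines` and `scan_file` are made up for illustration, and it assumes `"` is the quote character and that no field is meant to span several lines.

```python
def find_unbalanced_quote_lines(lines, quote='"'):
    """Yield (line_number, text) for each line with an odd number of quote characters.

    Heuristic only: a quoted field that legitimately spans several lines
    would be reported as well.
    """
    for number, text in enumerate(lines, start=1):
        if text.count(quote) % 2 == 1:
            yield number, text.rstrip("\n")


def scan_file(path, quote='"'):
    """Return the suspicious lines of one data file (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(find_unbalanced_quote_lines(handle, quote))
```

Calling `scan_file("/data/weibo/users/000000_0.csv")` on each file in turn and looking at the reported lines, especially the `descriptions` column named in the error, should show whether a stray quote is the culprit.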

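I also got the "Import error: Executor has been shut down" and "Import error: Panic called, so exiting" errors when I tried to import data into a graph this way.

My data contained no quote characters (" or ') when I got these errors.

What solved my problem was removing all other special characters.

I may have missed something in the documentation, because I assumed all text in node properties would be read in as strings. It turns out neo4j-import does not like characters such as "&" and "/"!

Once I edited my data (yes, with sed!) so that it contained only alphanumeric characters, the import tool ran perfectly.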
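The exact sed command is not given here, so the following is only a sketch of that kind of cleanup in Python. The name `sanitize_line` is hypothetical. It assumes TAB-delimited input, as in the question, and that spaces are worth keeping. `str.isalnum()` is Unicode-aware, so Chinese text in the fields survives while characters such as `&`, `/`, `*` and quotes are dropped.

```python
def sanitize_line(line, delimiter="\t"):
    """Strip everything except alphanumerics and spaces from each field.

    The delimiter itself is preserved so the column layout stays intact.
    """
    fields = line.rstrip("\n").split(delimiter)
    cleaned = (
        "".join(ch for ch in field if ch.isalnum() or ch == " ")
        for field in fields
    )
    return delimiter.join(cleaned)
```

Note that this is lossy: URLs such as `profileimageurl` would be mangled, so it may be better to apply it only to free-text columns.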

I hit the same error. Removing special characters such as "*", "&", and "/" while keeping single quotes was enough to make it go away.