Datastax Graph Loader - Loading non-uniform JSON files' meta-properties

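Below are three sample JSON files and a Graph Loader script. The first file has the most complex content, most of which the loading script should ignore. The second is a simple variant that occurs frequently. The last is meant to give a sense of how widely the files can vary, and it is the most direct example of the current problem.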

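Before diving in, note that this is only an approximation of the data structure I'm actually working with and of the script that loads it. There are better ways to model people as vertices, but this is the first example I could come up with.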

Sample input JSON file 1

/*{
  "peopleInfo": [
    {
      "id": {
        "idProperty1": "property1Value",
        "idProperty2": "someUUID"
      }
    },
    {
      "people": [
        {
          "firstName": "person1FirstName",
          "lastName": "person1LastName",
          "sequence": 1
        },
        {
          "firstName": "person2FirstName",
          "lastName": "person2LastName",
          "sequence": 2
        },
        { //children and twins may be switched such that twins are sequence 3 & 4 and one or both of them have children with corresponding sequences
          "children": [
            {
              "firstName": "firstChildFirstName",
              "lastName": "firstChildLastName",
              "sequence": 3
            },
            {
              "firstName": "secondChildFirstName",
              "lastName": "secondChildLastName",
              "sequence": 4
            },
            {
              "twins": [
                {
                  "firstName": "firstTwinFirstName",
                  "lastName": "firstTwinLastName",
                  "sequence": 5
                },
                {
                  "firstName": "secondTwinFirstName",
                  "lastName": "secondTwinLastName",
                  "sequence": 6
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}*/

The second file does not contain any children.

Sample input JSON file 2

/*{
  "peopleInfo": [
    {
      "id": {
        "idProperty1": "property1Value",
        "idProperty2": "someUUID"
      }
    },
    {
      "people": [
        {
          "firstName": "person1FirstName",
          "lastName": "person1LastName",
          "sequence": 1
        },
        {
          "firstName": "person2FirstName",
          "lastName": "person2LastName",
          "sequence": 2
        }
      ]
    }
  ]
}*/

The third file contains twins but no single-born children.

Sample input JSON file 3

/*{
  "peopleInfo": [
    {
      "personsID": {
        "idProperty1": "property1Value",
        "idProperty2": "someUUID"
      }
    },
    {
      "people": [
        { // twins can exist without top-level people (parents work well to define this) and without other children. Also, children can exist without twins and without parents as well.
          "twins": [
            {
              "firstName": "firstTwinFirstName",
              "lastName": "firstTwinLastName",
              "sequence": 3
            },
            {
              "firstName": "secondTwinFirstName",
              "lastName": "secondTwinLastName",
              "sequence": 4
            }
          ]
        }
      ]
    }
  ]
}*/

Loading script

inputBaseDir = "/path/to/directories"

import java.io.File as javaFile;
def list = []

new javaFile(inputBaseDir).eachDir() { dir ->
  list << dir.getAbsolutePath()
}
for (item in list){
  def fileBuilder = File.directory(item)
  def peopleInfoMapper = fileBuilder.map {
    it['idProperty1'] = it.peopleInfo.id.idProperty1[0]
    it['idProperty2'] = it.peopleInfo.id.idProperty2[0]

    def ppl = it.peopleInfo.people[1]
    people = ppl.collect{
      if ( it['firstName'] != null){
        it['firstName'] = it['firstName']
      } else if ( it['lastName'] != null){
        it['lastName'] = it['lastName']
      } else if ( it['sequence'] != null) {
        it['sequence'] = it['sequence']
      }

      //filling the null values below is the temporary non-solution to get the data to load
      if ( it['firstName'] == null){
        it['firstName'] = ''
      }
      if ( it['lastName'] == null){
        it['lastName'] = ''
      }
      if ( it['sequence'] == null){
        it['sequence'] = 0
      }
      it
    }
    it['people'] = people
    it.remove('peopleInfo')
    it
    }
  load(peopleInfoMapper).asVertices {
    label "peopleInfo"
    key 'idProperty2'
    vertexProperty 'people',{
      value 'firstName'
      value 'lastName'
      value 'sequence'
      ignore 'children'
      ignore 'twins'
    }
  }
}

Problems

1

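Look at the third file: although the twins carry valid values, they should not affect the load, because ignoring the 'twins' key should ignore all of their meta-property values. I believe the exception below is thrown because this file has no top-level people who are not children or twins, so once the 'twins' key is ignored, all that remains for vertexProperty 'people' is an empty map. My non-answer simply fills that empty map with an empty string for each name and a zero for the sequence, and those placeholder values then get loaded into the database along with the real data.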

java.lang.IllegalArgumentException: [On field 'people'] Provided map does not contain property value on field [sequence]: {twin=[{firstName=firstTwinFirstName,lastName=firstTwinLastName, sequence=1},{firstName=secondTwinFirstName,lastName=secondTwinLastName,sequence=2}]}
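To see why the exception complains about the 'sequence' field, the following plain-Groovy sketch (no Graph Loader involved) mimics what is left of file 3's people list once the 'children' and 'twins' keys are dropped. It assumes the mapper receives the same nested Map/List structure that Groovy's `JsonSlurper` produces, and `stripIgnored` is a hypothetical helper that filters copies of the maps rather than calling `remove` in place.

```groovy
import groovy.json.JsonSlurper

// File 3 from the question, without the /* */ wrapper and // comment (neither is valid JSON)
def file3 = new JsonSlurper().parseText('''
{"peopleInfo": [
  {"personsID": {"idProperty1": "property1Value", "idProperty2": "someUUID"}},
  {"people": [
    {"twins": [
      {"firstName": "firstTwinFirstName", "lastName": "firstTwinLastName", "sequence": 3},
      {"firstName": "secondTwinFirstName", "lastName": "secondTwinLastName", "sequence": 4}
    ]}
  ]}
]}
''')

// Hypothetical helper: return copies of the person maps minus the keys the mapping ignores
List<Map> stripIgnored(List<Map> ppl) {
  ppl.collect { Map p -> p.findAll { k, v -> !(k in ['children', 'twins']) } }
}

// Same GPath as the load script: element [1] of peopleInfo carries 'people'
def remaining = stripIgnored(file3.peopleInfo.people[1])
println remaining   // prints [[:]]: one map with no firstName, lastName or sequence
```

The only entry that survives is an empty map, which is consistent with the loader reporting that the provided map has no value for a field declared with `value 'sequence'`.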

2

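Look at the first file: when the 'children' key (which also holds the twins) is ignored or removed outright, an empty map is still left behind as a placeholder. The loading script fills it with the same non-solution, and it gets loaded into the database along with the real data.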
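A minimal plain-Groovy sketch of the placeholder in file 1, again assuming the mapper sees what `JsonSlurper` would produce: dropping the ignored keys leaves two real person maps plus one empty map. Because an empty map is false under Groovy truth, calling `findAll()` with no closure discards the placeholder instead of filling it with dummy values.

```groovy
import groovy.json.JsonSlurper

// The 'people' array from file 1: two parents plus a 'children' entry that nests the twins
def ppl = new JsonSlurper().parseText('''
[
  {"firstName": "person1FirstName", "lastName": "person1LastName", "sequence": 1},
  {"firstName": "person2FirstName", "lastName": "person2LastName", "sequence": 2},
  {"children": [
    {"firstName": "firstChildFirstName", "lastName": "firstChildLastName", "sequence": 3},
    {"firstName": "secondChildFirstName", "lastName": "secondChildLastName", "sequence": 4},
    {"twins": [
      {"firstName": "firstTwinFirstName", "lastName": "firstTwinLastName", "sequence": 5},
      {"firstName": "secondTwinFirstName", "lastName": "secondTwinLastName", "sequence": 6}
    ]}
  ]}
]
''')

// Dropping the ignored keys leaves an empty placeholder map in third position
def stripped = ppl.collect { Map p -> p.findAll { k, v -> !(k in ['children', 'twins']) } }
println stripped[2]        // prints [:]

// Groovy truth: an empty map is false, so a bare findAll() keeps only real persons
def kept = stripped.findAll()
println kept*.sequence     // prints [1, 2]
```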

Is there a best practice for handling these issues?

I'm not sure this is the most conventional solution, but it seems to solve the problem:

import java.io.File as javaFile

inputBaseDir = "/path/to/directories"

// every subdirectory of inputBaseDir is loaded as its own directory source
def list = []
new javaFile(inputBaseDir).eachDir() { dir ->
  list << dir.getAbsolutePath()
}

for (item in list){
  def fileBuilder = File.directory(item)
  def peopleInfoMapper = fileBuilder.map {
    it['idProperty1'] = it.peopleInfo.id.idProperty1[0]
    it['idProperty2'] = it.peopleInfo.id.idProperty2[0]

    def ppl = it.peopleInfo.people[1]
    def people = ppl.collect{
      //removes k:v, leaving an empty map when nothing else was in it
      if (it['children'] != null){
        it.remove('children')
      }
      //removes k:v, leaving an empty map when nothing else was in it
      if (it['twins'] != null){
        it.remove('twins')
      }
      it //return the (possibly empty) map itself, not the value of the last if
    }.findAll() //only keeps non-empty maps: an empty map is false under Groovy truth
    if (people){
      it['people'] = people
    } else {
        /* removing people without desired meta-properties enables
         loader to proceed when empty maps from the removal of
         children and/or twins are present, while top-level
         persons aren't*/
        it.remove('people')
    }
    it.remove('peopleInfo')
    it
  }
  load(peopleInfoMapper).asVertices {
    label "peopleInfo"
    key 'idProperty2'
    vertexProperty 'people',{
      value 'firstName'
      value 'lastName'
      value 'sequence'
      ignore 'children'
      ignore 'twins'
    }
  }
}
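The people-cleaning logic in the mapper closure can be checked without running Graph Loader by pulling it into an ordinary function and feeding it the sample files. This is a sketch under assumptions: `flattenPeopleInfo` is a hypothetical name, the input is assumed to match what `JsonSlurper` produces, and the `?: []` guard for files with no people array at all goes beyond what the samples show.

```groovy
import groovy.json.JsonSlurper

// Hypothetical stand-alone version of the mapper closure body, so the
// transformation can be exercised with plain Groovy before a real load.
Map flattenPeopleInfo(Map doc) {
  def out = [:]
  // same GPath as the load script; files keyed 'personsID' instead of 'id' yield null here
  out['idProperty1'] = doc.peopleInfo.id.idProperty1[0]
  out['idProperty2'] = doc.peopleInfo.id.idProperty2[0]

  // element [1] of peopleInfo holds the 'people' array; '?: []' is an assumed guard
  List<Map> ppl = doc.peopleInfo.people[1] ?: []

  // copy each map without the ignored keys, then drop empty placeholders (Groovy truth)
  def people = ppl.collect { Map p -> p.findAll { k, v -> !(k in ['children', 'twins']) } }.findAll()

  // only emit the property when at least one real person remains
  if (people) {
    out['people'] = people
  }
  out
}

def sample = new JsonSlurper().parseText('''
{"peopleInfo": [
  {"id": {"idProperty1": "property1Value", "idProperty2": "someUUID"}},
  {"people": [
    {"firstName": "person1FirstName", "lastName": "person1LastName", "sequence": 1},
    {"children": [{"firstName": "c", "lastName": "c", "sequence": 2}]}
  ]}
]}
''')
println flattenPeopleInfo(sample).people*.firstName   // prints [person1FirstName]
```

Filtering copies rather than calling `remove` in place keeps the parsed input untouched, which makes the function easier to test repeatedly; inside the loader either approach should yield the same maps.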