Datastax Graph Loader - Loading non-uniform JSON files' meta-properties
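Below are 3 sample JSON files and my graph loader script. The first file is the most complex, and the loading script should ignore most of its contents. The second file is a simple variation that occurs often. The last file is there to give a sense of how widely the files can differ from one another, and it is the most direct example of where the current problem lies.
Before diving in, please note that this is only an approximation of the data structure I am actually working with and of its loading script. There are better ways to model vertices for people, but this was the first example I could think of.
Sample input JSON file 1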
/*{
  "peopleInfo": [
    {
      "id": {
        "idProperty1": "property1Value",
        "idProperty2": "someUUID"
      }
    },
    {
      "people": [
        {
          "firstName": "person1FirstName",
          "lastName": "person1LastName",
          "sequence": 1
        },
        {
          "firstName": "person2FirstName",
          "lastName": "person2LastName",
          "sequence": 2
        },
        { //children and twins may be switched such that twins are sequence 3 & 4 and one or both of them have children with corresponding sequences
          "children": [
            {
              "firstName": "firstChildFirstName",
              "lastName": "firstChildLastName",
              "sequence": 3
            },
            {
              "firstName": "secondChildFirstName",
              "lastName": "secondChildLastName",
              "sequence": 4
            },
            {
              "twins": [
                {
                  "firstName": "firstTwinFirstName",
                  "lastName": "firstTwinLastName",
                  "sequence": 5
                },
                {
                  "firstName": "secondTwinFirstName",
                  "lastName": "secondTwinLastName",
                  "sequence": 6
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}*/
The second file does not contain any children
Sample input JSON file 2
/*{
  "peopleInfo": [
    {
      "id": {
        "idProperty1": "property1Value",
        "idProperty2": "someUUID"
      }
    },
    {
      "people": [
        {
          "firstName": "person1FirstName",
          "lastName": "person1LastName",
          "sequence": 1
        },
        {
          "firstName": "person2FirstName",
          "lastName": "person2LastName",
          "sequence": 2
        }
      ]
    }
  ]
}*/
The third file contains twins, but no single-born children
Sample input JSON file 3
/*{
  "peopleInfo": [
    {
      "personsID": {
        "idProperty1": "property1Value",
        "idProperty2": "someUUID"
      }
    },
    {
      "people": [
        { // twins can exist without top level people (parents work well to define this) and without other children. Also, children can exist without twins and without parents as well.
          "twins": [
            {
              "firstName": "firstTwinFirstName",
              "lastName": "firstTwinLastName",
              "sequence": 3
            },
            {
              "firstName": "secondTwinFirstName",
              "lastName": "secondTwinLastName",
              "sequence": 4
            }
          ]
        }
      ]
    }
  ]
}*/
Loading script
inputBaseDir = "/path/to/directories"
// java.io.File is aliased so it does not clash with the File used in File.directory below
import java.io.File as javaFile;
// collect the absolute path of every subdirectory of inputBaseDir
def list = []
new javaFile(inputBaseDir).eachDir() { dir ->
    list << dir.getAbsolutePath()
}
for (item in list){
    def fileBuilder = File.directory(item)
    def peopleInfoMapper = fileBuilder.map {
        it['idProperty1'] = it.peopleInfo.id.idProperty1[0]
        it['idProperty2'] = it.peopleInfo.id.idProperty2[0]
        def ppl = it.peopleInfo.people[1]
        people = ppl.collect{
            if ( it['firstName'] != null){
                it['firstName'] = it['firstName']
            } else if ( it['lastName'] != null){
                it['lastName'] = it['lastName']
            } else if ( it['sequence'] != null) {
                it['sequence'] = it['sequence']
            }
            //filling the null values below is the temporary non-solution to get the data to load
            if ( it['firstName'] == null){
                it['firstName'] = ''
            }
            if ( it['lastName'] == null){
                it['lastName'] = ''
            }
            if ( it['sequence'] == null){
                it['sequence'] = 0
            }
            it
        }
        it['people'] = people
        it.remove('peopleInfo')
        it
    }
    load(peopleInfoMapper).asVertices {
        label "peopleInfo"
        key 'idProperty2'
        vertexProperty 'people',{
            value 'firstName'
            value 'lastName'
            value 'sequence'
            ignore 'children'
            ignore 'twins'
        }
    }
} // closes the for loop (this brace was missing)
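The `[0]` and `[1]` indexing in the mapper leans on how Groovy resolves a property on a list: the property is read from each element and the results come back as a list, with null where a map lacks the key. The following is a minimal plain-Groovy sketch of that behaviour, assuming each record reaches the mapper as nested maps and lists shaped like the JSON. The `record` variable is a hand-built stand-in, not loader output.

```groovy
// Hand-built stand-in for one parsed record (assumption: the loader
// hands the mapper nested Maps and Lists shaped like the JSON files).
def record = [
    peopleInfo: [
        [id: [idProperty1: 'property1Value', idProperty2: 'someUUID']],
        [people: [
            [firstName: 'person1FirstName', lastName: 'person1LastName', sequence: 1],
            [children: []]
        ]]
    ]
]

// Reading a property on a List reads it from every element;
// a map without that key contributes null.
def ids = record.peopleInfo.id
assert ids.size() == 2
assert ids[1] == null

// Hence [0] picks the id block and [1] picks the people block.
assert record.peopleInfo.id.idProperty1[0] == 'property1Value'
assert record.peopleInfo.people[1].size() == 2
```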
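Questions
1
Looking at the third file:
Although the twins contain allowed values, they should not affect the load, because ignoring the 'twins' key should ignore all of its meta-property values. In this case, I believe the exception below is thrown because there are no top-level people who are not children or twins, so once the 'twins' key is ignored, all that remains for the vertexProperty 'people' is an empty map. My non-answer simply filled that empty map with an empty string for each name and a zero for the sequence, and those values are loaded into the database alongside the real data.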
java.lang.IllegalArgumentException: [On field 'people'] Provided map
does not contain property value on field [sequence]:
{twin=[{firstName=firstTwinFirstName,lastName=firstTwinLastName,
sequence=1},{firstName=secondTwinFirstName,lastName=secondTwinLastName,sequence=2}]}
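2
Looking at the first file:
When the 'twins' key is ignored or removed outright, an empty map is still left behind as a placeholder. The loading script fills it with the same non-solution, and it is loaded into the database alongside the real data.
Is there a best practice for handling these problems?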
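Both symptoms can be reproduced without the loader. The sketch below is plain Groovy over hand-built lists that mirror the `people` arrays of files 3 and 1. It only shows what the mapper would be holding at that point. How the loader itself orders `ignore` against its check for required values is an assumption here, not something confirmed.

```groovy
// Keys that the vertexProperty block in the loading script declares as values.
def required = ['firstName', 'lastName', 'sequence']

// File 3: the only entry under "people" is the wrapper around "twins".
def file3People = [
    [twins: [
        [firstName: 'firstTwinFirstName', lastName: 'firstTwinLastName', sequence: 3],
        [firstName: 'secondTwinFirstName', lastName: 'secondTwinLastName', sequence: 4]
    ]]
]

// None of the required keys sit at the wrapper's own level, which is
// consistent with the "does not contain property value" message.
def missing = required.findAll { !file3People[0].containsKey(it) }
assert missing == ['firstName', 'lastName', 'sequence']

// File 1: removing the nested key does not remove the entry itself.
def file1People = [
    [firstName: 'person1FirstName', lastName: 'person1LastName', sequence: 1],
    [children: []]
]
file1People.each { it.remove('children') }
assert file1People.size() == 2
assert file1People[1] == [:]   // the empty placeholder map
```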
I don't know if this is the most conventional solution, but it seems to solve the problem
inputBaseDir = "/path/to/directories"
// java.io.File is aliased so it does not clash with the File used in File.directory below
import java.io.File as javaFile;
// collect the absolute path of every subdirectory of inputBaseDir
def list = []
new javaFile(inputBaseDir).eachDir() { dir ->
    list << dir.getAbsolutePath()
}
for (item in list){
    def fileBuilder = File.directory(item)
    def peopleInfoMapper = fileBuilder.map {
        it['idProperty1'] = it.peopleInfo.id.idProperty1[0]
        it['idProperty2'] = it.peopleInfo.id.idProperty2[0]
        def ppl = it.peopleInfo.people[1]
        people = ppl.collect{
            //removes k:v leaving an empty map
            if (it['children'] != null){
                it.remove('children')
            }
            //removes k:v leaving an empty map
            if (it['twins'] != null){
                it.remove('twins')
            }
            if ( it['firstName'] != null){
                it['firstName'] = it['firstName']
            } else if ( it['lastName'] != null){
                it['lastName'] = it['lastName']
            } else if ( it['sequence'] != null) {
                it['sequence'] = it['sequence']
            }
            //return the (possibly empty) map so findAll() below can filter it
            it
        }
        if (ppl['firstName'][0] != null && ppl['lastName'][0] != null){
            it['people'] = people.findAll() //only gathers non-empty maps from people
        } else {
            /* removing people without desired meta-properties enables
               loader to proceed when empty maps from the removal of
               children and/or twins are present, while top-level
               persons aren't*/
            it.remove('people')
        }
        it.remove('peopleInfo')
        it
    }
    load(peopleInfoMapper).asVertices {
        label "peopleInfo"
        key 'idProperty2'
        vertexProperty 'people',{
            value 'firstName'
            value 'lastName'
            value 'sequence'
            ignore 'children'
            ignore 'twins'
        }
    }
} // closes the for loop (this brace was missing)
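The filtering step can also be tried on its own, outside the loader. The sketch below is a variation on the script above rather than the same logic: it keeps an entry only if that entry carries all three keys, instead of testing only the first entry. `cleanPeople` and `required` are hypothetical names introduced for illustration, and the sample lists are hand-built stand-ins for the `people` arrays of files 1 and 3.

```groovy
// Keys every entry must carry to be loaded as 'people' meta-properties.
def required = ['firstName', 'lastName', 'sequence']

// Hypothetical helper (a plain closure, not a loader API): keep only the
// entries that carry all required keys, trimmed to exactly those keys.
def cleanPeople = { List<Map> people ->
    def complete = people.findAll { person ->
        required.every { key -> person[key] != null }
    }
    complete.collect { person -> person.subMap(required) }
}

// Shaped like file 1: two top-level people plus a "children" wrapper.
def file1 = [
    [firstName: 'person1FirstName', lastName: 'person1LastName', sequence: 1],
    [firstName: 'person2FirstName', lastName: 'person2LastName', sequence: 2],
    [children: [[firstName: 'firstChildFirstName', lastName: 'firstChildLastName', sequence: 3]]]
]
// Shaped like file 3: only a "twins" wrapper.
def file3 = [
    [twins: [[firstName: 'firstTwinFirstName', lastName: 'firstTwinLastName', sequence: 3]]]
]

assert cleanPeople(file1).size() == 2
assert cleanPeople(file1)*.sequence == [1, 2]
assert cleanPeople(file3).isEmpty()
```

Inside the mapper, the result could be assigned to `it['people']` when it is non-empty and `it.remove('people')` called otherwise, which mirrors the if/else in the script above.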