Importing OPoint data into OrientDB 2.2.x using ETL from a CSV file

This is related to my earlier questions:

  1. (I figured this one out)
  2. OrientDB spatial query to find all pairs within X km of each other (still looking for a useful answer)

In response to (2), I am thinking of converting my Nazca geoglyph dataset to WKT so that it lines up with the newer OrientDB 2.2.x Spatial Index feature.

My input CSV file, nazca_lines_wkt.csv, looks like this:

Name,Location
Hummingbird,POINT(-75.148892 -14.692131)
Monkey,POINT(-75.138532 -14.706940)
Condor,POINT(-75.126208 -14.697444)
Spider,POINT(-75.122381 -14.694145)
Spiral,POINT(-75.122746 -14.688277)
Hands,POINT(-75.113881 -14.694459)
Tree,POINT(-75.114520 -14.693898)
Astronaut,POINT(-75.079755 -14.745222)
Dog,POINT(-75.130788 -14.706401)
Wing,POINT(-75.100385 -14.680309)
Parrot,POINT(-75.107498 -14.689463)

I created an empty PLOCAL database, nazca-wkt.orientdb, and defined a GeoGlyphWKT vertex class:

CREATE DATABASE PLOCAL:nazca-wkt.orientdb admin admin plocal graph

CREATE CLASS GeoGlyphWKT EXTENDS V

CREATE PROPERTY GeoGlyphWKT.Name      STRING
CREATE PROPERTY GeoGlyphWKT.Location  EMBEDDED OPoint
CREATE PROPERTY GeoGlyphWKT.Tag       EMBEDDEDSET STRING

I have two .json files for the oetl script:

nazca_lines_wkt.json

{
    "config": {
        "log": "info",
        "fileDirectory": "./",
        "fileName": "nazca_lines_wkt.csv"
    }
}

commonGeoGlyphWKT.json

{
    "begin": [ { "let": { "name": "$filePath",  "expression": "$fileDirectory.append($fileName )" } } ],
    "config": { "log": "debug" },
    "source": { "file": { "path": "$filePath" } },
    "extractor":
        {
        "csv": { "ignoreEmptyLines": true,
                 "nullValue": "N/A",
                 "separator": ",",
                 "columnsOnFirstLine": true,
                 "dateFormat": "yyyy-MM-dd"
               }
        },
    "transformers": [ { "vertex": { "class": "GeoGlyphWKT" } } ],
    "loader": {
        "orientdb": {
            "dbURL": "plocal:nazca-wkt.orientdb",
            "dbType": "graph",
            "batchCommit": 1000
        }
    }
}

I run oetl with this command:

$ oetl.sh commonGeoGlyphWKT.json nazca_lines_wkt.json

But this fails with the following output:

$ oetl.sh commonGeoGlyphWKT.json nazca_lines_wkt.json
OrientDB etl v.2.2.13 (build 2.2.x@r90d7caa1e4af3fad86594e592c64dc1202558ab1; 2016-11-15 12:04:05+0000) www.orientdb.com
BEGIN ETL PROCESSOR
[file] INFO Reading from file ./nazca_lines_wkt.csv with encoding UTF-8
Started execution with 1 worker threads
Error in Pipeline execution: com.orientechnologies.orient.core.exception.OValidationException: impossible to convert value of field "Location"
    DB name="nazca-wkt.orientdb"
ETL process has problem: java.util.concurrent.ExecutionException: com.orientechnologies.orient.core.exception.OValidationException: impossible to convert value of field "Location"
    DB name="nazca-wkt.orientdb"
END ETL PROCESSOR
+ extracted 9 rows (0 rows/sec) - 9 rows -> loaded 0 vertices (0 vertices/sec) Total time: 16ms [0 warnings, 1 errors]

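I'm sure I'm missing something silly... Has anyone been able to use ETL to import a CSV file containing WKT strings for points, polygons, etc.?

Any help is appreciated!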

This worked for me:

commonGeoGlyphWKT2.json

{
  "source": { "file": { "path": "./nazca_lines_wkt.csv" } },
  "extractor": { "csv": {
    "separator": ",",
    "columns": ["Name:String","Location:String"] } },
  "transformers": [
    { "command": { "command": "INSERT INTO GeoGlyphWKT(Name,Location) values('${input.Name}', St_GeomFromText('${input.Location}'))"} }
  ],
  "loader": {
    "orientdb": {
        "dbURL": "plocal:/home/ivan/OrientDB/db_installati/enterprise/orientdb-enterprise-2.2.13/databases/stack40982509-spatial",
        "dbUser": "admin",
        "dbPassword": "admin",
        "dbType": "graph",
        "batchCommit": 1000
    }
  }
}
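
The likely reason this works where the plain vertex transformer did not: the csv extractor hands Location over as a string, while the property is declared EMBEDDED OPoint, and an embedded OPoint is a small document with a "coordinates" array rather than text. St_GeomFromText does that conversion inside the INSERT. The Python sketch below only illustrates the difference in shape; wkt_point_to_opoint is a hypothetical helper of mine, not part of OrientDB or oetl, and it handles 2D POINT values only.

```python
import re

# Matches a 2D WKT point such as "POINT(-75.148892 -14.692131)"
# or "POINT (-75.148892 -14.692131)" (optional space before the parenthesis).
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*([-+]?\d+(?:\.\d+)?)\s+([-+]?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_opoint(wkt):
    """Hypothetical helper: turn a WKT POINT string into an OPoint-shaped dict."""
    match = _WKT_POINT.match(wkt)
    if match is None:
        raise ValueError("not a 2D WKT POINT: %r" % wkt)
    x, y = (float(group) for group in match.groups())
    # WKT order is "x y", i.e. longitude first, then latitude.
    return {"@class": "OPoint", "coordinates": [x, y]}


csv_value = "POINT(-75.148892 -14.692131)"  # what the csv extractor passes on: a string
embedded = wkt_point_to_opoint(csv_value)   # the shape an EMBEDDED OPoint property holds
print(type(csv_value).__name__, "->", embedded)
```

In the ETL itself none of this is needed, since St_GeomFromText in the command transformer does the parsing on the database side.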

nazca_lines_wkt.csv

Name,Location
Hummingbird,POINT (-75.148892 -14.692131)
Monkey,POINT (-75.138532 -14.706940)
Condor,POINT(-75.126208 -14.697444)
Spider,POINT(-75.122381 -14.694145)
Spiral,POINT(-75.122746 -14.688277)
Hands,POINT(-75.113881 -14.694459)
Tree,POINT(-75.114520 -14.693898)
Astronaut,POINT(-75.079755 -14.745222)
Dog,POINT(-75.130788 -14.706401)
Wing,POINT(-75.100385 -14.680309)
Parrot,POINT(-75.107498 -14.689463)
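
One thing to watch with the command transformer: '${input.Name}' and '${input.Location}' are pasted into the SQL text, so a single quote in a name or a malformed WKT value can break the INSERT for that row. A minimal pre-flight check in Python, run before oetl, could look like this. The check_rows function and the last two sample rows ("Owl's Head" and "Broken") are made up for illustration; they are not in my data.

```python
import csv
import io
import re

# Accepts "POINT(x y)" and "POINT (x y)" with plain decimal coordinates.
WKT_POINT = re.compile(r"^POINT\s*\(\s*-?\d+(\.\d+)?\s+-?\d+(\.\d+)?\s*\)$")


def check_rows(csv_text):
    """Return (line_number, problem) pairs for rows that may break the INSERT template."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        if "'" in row["Name"]:
            problems.append((line_number, "single quote in Name"))
        if not WKT_POINT.match(row["Location"].strip()):
            problems.append((line_number, "Location is not a 2D WKT POINT"))
    return problems


# The first two data rows come from the CSV above; the last two are invented bad rows.
sample = """Name,Location
Hummingbird,POINT (-75.148892 -14.692131)
Condor,POINT(-75.126208 -14.697444)
Owl's Head,POINT(-75.1 -14.7)
Broken,POINT(-75.1)
"""

for line_number, problem in check_rows(sample):
    print("line %d: %s" % (line_number, problem))
```

This only covers POINT; polygons and other geometries would need their own patterns or a proper WKT parser.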

[ivan@canemagico-pc bin]$ ./oetl.sh commonGeoGlyphWKT2.json

OrientDB etl v.2.2.13 (build 2.2.x@r90d7caa1e4af3fad86594e592c64dc1202558ab1; 2016-11-15 12:04:05+0000) www.orientdb.com
[csv] INFO column types: {Name=STRING, Location=STRING}
BEGIN ETL PROCESSOR
[file] INFO Reading from file ./nazca_lines_wkt.csv with encoding UTF-8
Started execution with 1 worker threads
[orientdb] INFO committing
END ETL PROCESSOR
+ extracted 11 rows (0 rows/sec) - 11 rows -> loaded 11 vertices (0 vertices/sec) Total time: 244ms [0 warnings, 0 errors]

orientdb {db=stack40982509-spatial}> select from GeoGlyphWKT

+----+-----+-----------+-----------+-----------------------+
|#   |@RID |@CLASS     |Name       |Location               |
+----+-----+-----------+-----------+-----------------------+
|0   |#25:0|GeoGlyphWKT|Hummingbird|OPoint{coordinates:[2]}|
|1   |#25:1|GeoGlyphWKT|Spiral     |OPoint{coordinates:[2]}|
|2   |#25:2|GeoGlyphWKT|Dog        |OPoint{coordinates:[2]}|
|3   |#26:0|GeoGlyphWKT|Monkey     |OPoint{coordinates:[2]}|
|4   |#26:1|GeoGlyphWKT|Hands      |OPoint{coordinates:[2]}|
|5   |#26:2|GeoGlyphWKT|Wing       |OPoint{coordinates:[2]}|
|6   |#27:0|GeoGlyphWKT|Condor     |OPoint{coordinates:[2]}|
|7   |#27:1|GeoGlyphWKT|Tree       |OPoint{coordinates:[2]}|
|8   |#27:2|GeoGlyphWKT|Parrot     |OPoint{coordinates:[2]}|
|9   |#28:0|GeoGlyphWKT|Spider     |OPoint{coordinates:[2]}|
|10  |#28:1|GeoGlyphWKT|Astronaut  |OPoint{coordinates:[2]}|
+----+-----+-----------+-----------+-----------------------+

11 item(s) found. Query executed in 0.013 sec(s).
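
Since the end goal is finding pairs within X km of each other, it can help to have an independent cross-check for whatever the spatial query returns. The sketch below computes great-circle distances in plain Python with the haversine formula on a few of the points above. It does not use OrientDB at all, and it assumes a spherical Earth of radius 6371 km, so expect small differences from any database result. The names haversine_km and pairs_within are my own.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1 = (math.radians(v) for v in p)
    lon2, lat2 = (math.radians(v) for v in q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pairs_within(points, max_km):
    """Return (name_a, name_b, distance_km) for every pair closer than max_km."""
    result = []
    for a, b in combinations(sorted(points), 2):
        d = haversine_km(points[a], points[b])
        if d <= max_km:
            result.append((a, b, d))
    return result


# (lon, lat) in WKT order, copied from nazca_lines_wkt.csv
points = {
    "Hands": (-75.113881, -14.694459),
    "Tree": (-75.114520, -14.693898),
    "Spider": (-75.122381, -14.694145),
    "Astronaut": (-75.079755, -14.745222),
}

for a, b, d in pairs_within(points, 0.2):
    print("%s - %s: %.3f km" % (a, b, d))
```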