Scriptella - dynamically name csv file
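I want to create a csv file whose name includes a value that I obtain while the etl script is running. For example, I get a new value from a sequence and want to append it to the name of the csv. Sounds simple, but I'm really stuck...
My script: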
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
<description>Scriptella ETL</description>
<properties>
<include href="etl.properties"/> <!--Load from external properties file-->
</properties>
<!-- Connection declarations -->
<connection id="mypostgres" driver="$driver" url="$url" user="$user" password="$password" classpath="$classpath"/>
<connection driver="jexl" id="jexl"/>
<connection id="log" driver="text"/>
<query connection-id="mypostgres">
select nextval('transfer_id_seq') as tid
<script connection-id="jexl">
etl.globals['transferID'] = tid;
</script>
<script connection-id="log">
TransferID: ${etl.globals['transferID']}
</script>
</query>
<script connection-id="log">
TransferID (Outside query): ${etl.globals['transferID']}
</script>
<connection id="transfer-csv" driver="csv" url="transfer_${etl.globals['transferID']}.csv">
null_string=
quote=
</connection>
<script connection-id="transfer-csv">
col1, col2, col3
</script>
</etl>
My output:
C:\scriptella>scriptella
C:\java\jdk1.8\bin\java.exe -cp ;C:\dev\scriptella-1.1\lib\commons-compiler-jdk.jar;C:\dev\scriptella-1.1\lib\commons-compiler.jar;C:\dev\scriptella-1.1\lib\commons-jexl.jar;C:\dev\scriptella-1.
1\lib\commons-logging.jar;C:\dev\scriptella-1.1\lib\janino.jar;C:\dev\scriptella-1.1\lib\scriptella-core.jar;C:\dev\scriptella-1.1\lib\scriptella-drivers.jar;C:\dev\scriptella-1.1\lib\scriptella-tools
.jar scriptella.tools.launcher.EtlLauncher
23.02.2015 17:33:58 <WARNING> XML configuration warning in file:/C:/scriptella/etl.xml(35:7): The content of element type "etl" must match "(description?,properties?,connection*,(script*,
query*)*)".
23.02.2015 17:33:58 <INFO> Execution Progress.Initializing properties: 1%
23.02.2015 17:33:58 <INFO> Execution Progress.Initialized connection id=mypostgres, JdbcConnection{org.postgresql.jdbc4.Jdbc4Connection}, Dialect{PostgreSQL 9.3.2}, properties {}: 2%
23.02.2015 17:33:58 <INFO> Execution Progress.Initialized connection id=jexl, JexlConnection, Dialect{JEXL 2.0}, properties {}: 3%
23.02.2015 17:33:58 <INFO> Execution Progress.Initialized connection id=log, TextConnection, Dialect{Text 1.0}, properties {}: 4%
23.02.2015 17:33:58 <INFO> Execution Progress.Initialized connection id=transfer-csv, CsvConnection, Dialect{CSV 1.0}, properties {null_string=, quote=}: 5%
23.02.2015 17:33:58 <INFO> Execution Progress./etl/query[1] prepared: 6%
23.02.2015 17:33:58 <INFO> Execution Progress./etl/script[1] prepared: 7%
23.02.2015 17:33:58 <INFO> Execution Progress./etl/script[2] prepared: 10%
23.02.2015 17:33:58 <INFO> Registered JMX mbean: scriptella:type=etl,url="file:/C:/scriptella/etl.xml"
TransferID: 171
23.02.2015 17:33:58 <INFO> Execution Progress./etl/query[1] executed: 38%
TransferID (Outside query): 171
23.02.2015 17:33:58 <INFO> Execution Progress./etl/script[1] executed: 66%
23.02.2015 17:33:58 <INFO> Execution Progress./etl/script[2] executed: 95%
23.02.2015 17:33:58 <INFO> Execution Progress.Complete
23.02.2015 17:33:58 <INFO> Execution statistics:
Executed 1 query, 4 scripts, 4 statements
/etl/query[1]: Element successfully executed (1 statement). Working time 11 milliseconds. Avg throughput: 89,63 statements/sec.
/etl/query[1]/script[1]: Element successfully executed. Working time 9 milliseconds.
/etl/query[1]/script[2]: Element successfully executed (1 statement). Working time 4 milliseconds. Avg throughput: 206,37 statements/sec.
/etl/script[1]: Element successfully executed (1 statement). Working time 2 milliseconds. Avg throughput: 432,13 statements/sec.
/etl/script[2]: Element successfully executed (1 statement). Working time 2 milliseconds. Avg throughput: 447,04 statements/sec.
Total working time: 0,26 second
23.02.2015 17:33:58 <INFO> Successfully executed ETL file C:\scriptella\etl.xml
As you can see, the csv file name is wrong:
Directory of C:\scriptella
23.02.2015 17:33 <DIR> .
23.02.2015 17:33 <DIR> ..
23.02.2015 11:28 282 etl.properties
23.02.2015 17:32 1.239 etl.xml
23.02.2015 17:33 133 transfer_transferID.csv
3 File(s) 1.654 bytes
2 Dir(s) 741.036.032 bytes free
It is not possible to have dynamic connection elements, because Scriptella processes all connections at startup (see the 5% line in your log):
23.02.2015 17:33:58 <INFO> Execution Progress.Initialized connection id=transfer-csv, CsvConnection, Dialect{CSV 1.0}, properties {null_string=, quote=}: 5%
Your best option is to use the scriptella driver, which lets you call another etl.xml as a subroutine (global variables aren't actually needed):
etl.xml:
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
<description>Scriptella ETL</description>
<properties>
<include href="etl.properties"/> <!--Load from external properties file-->
</properties>
<!-- Connection declarations -->
<connection id="mypostgres" driver="$driver" url="$url" user="$user" password="$password" classpath="$classpath"/>
<connection id="log" driver="text"/>
<connection id="scriptella" driver="scriptella"/>
<query connection-id="mypostgres">
select nextval('transfer_id_seq') as tid
<script connection-id="log">
TransferID: $tid
</script>
<script connection-id="scriptella">
dynamic.xml
</script>
</query>
</etl>
dynamic.xml:
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
<connection id="transfer-csv" driver="csv" url="transfer_${tid}.csv">
null_string=
quote=
</connection>
<script connection-id="transfer-csv">
col1, col2, col3
</script>
</etl>
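Note: the ${var} syntax is required in the connection url of the dynamic.xml file.
Also, Scriptella cannot append to a csv file (it truncates the file every time), so I think you may need to rethink your process for what you are trying to accomplish. The Scriptella FAQ on Working with CSV Data suggests using HSQLDB text tables, which may help. Staging the data you need to export in HSQLDB or H2 may improve performance and make your process easier to maintain in the long run.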
In Scriptella 1.2 (I'm not sure whether this works in older Scriptella versions), you can set the CSV file name dynamically like this:
<connection id="out" driver="csv" url="my_report_${date:today('yyyyMMdd_HHmmss')}.csv"/>
According to: http://scriptella.org/reference/index.html#%3Cproperties%3E
See "Expressions and Variables Substitution".
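For context, here is a minimal sketch of how that connection might sit in a complete script. The connection properties (null_string, quote) and the col1, col2, col3 row are simply borrowed from the question's script, and the whole thing is untested. The expression does not depend on a value returned by a query, so it can presumably be resolved when the connection is initialized at startup. That would also be why this approach does not help with the sequence value from the question.

```xml
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <description>Sketch: csv file named after the current timestamp</description>
    <!-- All connections are declared before any script or query, as the DTD expects -->
    <connection id="out" driver="csv" url="my_report_${date:today('yyyyMMdd_HHmmss')}.csv">
        null_string=
        quote=
    </connection>
    <!-- Writes one row to the timestamped file -->
    <script connection-id="out">
        col1, col2, col3
    </script>
</etl>
```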
By the way, Scriptella lacks well-structured, convenient documentation.