Does Beam support a custom delimiter when reading from a text file?
I have a file in LDIF format, in which records are separated by an empty line:
dn: uid=12345,ab=users,xy=random
phone: 111
address: someaddress
email: true
username:abc
password:abc

dn: uid=12345,ab=users,xy=random
objectClass: inetOrgPerson
objectClass: top
phone: 111
address: someaddress
email: true
username:abcd
password:abcd
I would like to write something like:
data = (p
| 'Read File From GCS' >> beam.io.textio.ReadFromText('gs://my-ldif.ldiff', delimiter='\r\n')
)
But in Python there seems to be no option to specify a delimiter. I checked the official documentation, and it does not say how to set one:
Parses a text file as newline-delimited elements, by default assuming UTF-8 encoding. Supports newline delimiters \n and \r\n.
I see that Java has this. Can anyone say whether Python supports a custom delimiter?
PAssert.that(p.apply(TextIO.read().from(filename).withDelimiter(new byte[] {'|', '*'})))
.containsInAnyOrder(
"To be, or not to be: that |is the question: To be, or not to be: "
+ "that *is the question: Whether 'tis nobler in the mind to suffer ",
"The slings and arrows of outrageous fortune,|");
p.run();
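You are right: this is not yet possible in Python.
I found this open feature request ticket: https://issues.apache.org/jira/browse/BEAM-12730. This would be a good starter task for anyone interested in contributing to Beam!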
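Until that ticket is resolved, one possible workaround is to read each file as a whole and split it into records yourself. The following is only a minimal sketch of the splitting step, assuming the file fits in memory on a single worker; `split_ldif_records` is a hypothetical helper name, not part of Beam.

```python
import re
from typing import Iterator


def split_ldif_records(text: str) -> Iterator[str]:
    """Yield records from text in which records are separated by blank lines.

    Hypothetical helper for illustration; it is not part of Apache Beam.
    """
    # A separator is a line ending followed by one or more blank
    # (empty or whitespace-only) lines; both \n and \r\n are accepted.
    separator = r"\r?\n(?:[ \t]*\r?\n)+"
    for record in re.split(separator, text.strip()):
        if record:  # skip the empty string produced by empty input
            yield record


sample = "dn: uid=1\nphone: 111\n\ndn: uid=2\nphone: 222\n"
for record in split_ldif_records(sample):
    print(repr(record))
```

In a pipeline, a function like this could probably be applied with `beam.FlatMap` after reading whole files, for example with `apache_beam.io.fileio.MatchFiles` and `fileio.ReadMatches` and then calling `read_utf8()` on each file. Note that this gives up the parallel splitting of a single large file that `ReadFromText` provides.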