Does Beam support a custom delimiter when reading from a text file?

I have a file in LDIF format, in which records are separated by blank lines:

dn: uid=12345,ab=users,xy=random
phone: 111
address: someaddress
email: true
username:abc
password:abc


dn: uid=12345,ab=users,xy=random
objectClass: inetOrgPerson
objectClass: top
phone: 111
address: someaddress
email: true
username:abcd
password:abcd

I would like to write something like this:

data = (p
            | 'Read File From GCS' >> beam.io.textio.ReadFromText('gs://my-ldif.ldiff', delimiter='\r\n')
            )

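But there does not seem to be an option to specify a delimiter in Python. I checked the official documentation, and it does not mention how to set one: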

Parses a text file as newline-delimited elements, by default assuming UTF-8 encoding. Supports newline delimiters \n and \r\n.

I can see that the Java SDK has this. Can anyone tell me whether Python supports a custom delimiter?

 PAssert.that(p.apply(TextIO.read().from(filename).withDelimiter(new byte[] {'|', '*'})))
   .containsInAnyOrder(
     "To be, or not to be: that |is the question: To be, or not to be: "
       + "that *is the question: Whether 'tis nobler in the mind to suffer ",
     "The slings and arrows of outrageous fortune,|");
 p.run();

You are right: this is not yet possible in Python.

I found this open feature request ticket: https://issues.apache.org/jira/browse/BEAM-12730. It would make a good starter task for anyone interested in contributing to Beam!
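In the meantime, one possible workaround is to read each matched file as a whole and split it into records yourself. The sketch below is only an illustration under some assumptions: the helper names `split_ldif_records` and `read_ldif_records` are made up for this example, each file is assumed to be UTF-8 and small enough to fit in a worker's memory, and records are assumed to be separated by one or more blank lines.

```python
import re


def split_ldif_records(text):
    """Split LDIF text into records separated by one or more blank lines.

    Hypothetical helper for illustration; not part of Beam.
    """
    # Normalize Windows line endings so '\r\n\r\n' behaves like '\n\n'.
    normalized = text.replace('\r\n', '\n')
    # A newline, optional whitespace, then another newline marks a boundary.
    chunks = re.split(r'\n\s*\n', normalized)
    # Drop empty chunks produced by leading or trailing blank lines.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def read_ldif_records(pipeline, file_pattern):
    """Return a PCollection with one element per LDIF record.

    Hypothetical helper; reads every matched file fully into memory.
    """
    # Imported lazily so the splitting helper can be used without Beam.
    import apache_beam as beam
    from apache_beam.io import fileio

    return (
        pipeline
        | 'Match Files' >> fileio.MatchFiles(file_pattern)
        | 'Open Files' >> fileio.ReadMatches()
        | 'Split Records' >> beam.FlatMap(
            lambda readable_file: split_ldif_records(readable_file.read_utf8()))
    )
```

With a pipeline `p`, this could be used as `data = read_ldif_records(p, 'gs://my-ldif.ldiff')`. Note the trade-off: unlike `ReadFromText`, this approach does not split a single large file across workers, so it is best suited to many moderately sized files.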