Transforming java.io.BufferedReader into a Python object

Using the following code (source: https://docs.microsoft.com/en-us/azure/databricks/kb/python/hdfs-to-read-files):

URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
conf = sc._jsc.hadoopConfiguration()

conf.set(
  "fs.azure.account.key.<account-name>.blob.core.windows.net",
  "<account-access-key>")

fs = Path('wasbs://<container-name>@<account-name>.blob.core.windows.net/<file-path>/').getFileSystem(sc._jsc.hadoopConfiguration())
istream = fs.open(Path('wasbs://<container-name>@<account-name>.blob.core.windows.net/<file-path>/'))

reader = sc._gateway.jvm.java.io.BufferedReader(sc._jvm.java.io.InputStreamReader(istream))

while True:
  thisLine = reader.readLine()
  if thisLine is not None:
    print(thisLine)
  else:
    break

istream.close()

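I get an object reader of type java.io.BufferedReader, and I would like to read it with pandas, geopandas, or another library (rather than reading and printing it line by line as in the example).

Can you help me?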

Thanks, Lukas

I would try reading the BufferedReader contents into a string and then passing it to pd.read_csv(StringIO(string)):

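# Read the whole file into one string; lines() strips line breaks, so join with "\n"
from io import StringIO
import pandas as pd
string = reader.lines().collect(sc._jvm.java.util.stream.Collectors.joining("\n"))
df = pd.read_csv(StringIO(string))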
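
If you would rather keep the readLine() loop from the question, a rough alternative is to write each line into an in-memory buffer instead of printing it. The sketch below is only an illustration: drain_reader and FakeReader are hypothetical names I made up, and FakeReader merely stands in for the Java reader so the snippet runs without Spark. It assumes readLine() returns None at end of stream, as in the question's loop.

```python
from io import StringIO


def drain_reader(reader):
    """Copy every line from a readLine()-style reader into a text buffer."""
    buffer = StringIO()
    while True:
        line = reader.readLine()  # None signals end of stream, as in the question
        if line is None:
            break
        buffer.write(line)
        buffer.write("\n")  # readLine() drops the line terminator, so restore it
    buffer.seek(0)  # rewind so a consumer reads from the beginning
    return buffer


class FakeReader:
    """Hypothetical stand-in for java.io.BufferedReader (not part of any library)."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def readLine(self):
        return next(self._lines, None)


buffer = drain_reader(FakeReader(["id,name", "1,a", "2,b"]))
text = buffer.getvalue()
print(text)
```

With the real reader from the question you would call drain_reader(reader) and hand the returned buffer to pd.read_csv. Either way the whole file ends up in driver memory, so this only suits files that fit there.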