Transforming Java.io.BufferedReader into Python object

Question

Using the following code (source: https://docs.microsoft.com/en-us/azure/databricks/kb/python/hdfs-to-read-files):

URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
conf = sc._jsc.hadoopConfiguration()

conf.set(
  "fs.azure.account.key.<account-name>.blob.core.windows.net",
  "<account-access-key>")

fs = Path('wasbs://<container-name>@<account-name>.blob.core.windows.net/<file-path>/').getFileSystem(sc._jsc.hadoopConfiguration())
istream = fs.open(Path('wasbs://<container-name>@<account-name>.blob.core.windows.net/<file-path>/'))

reader = sc._gateway.jvm.java.io.BufferedReader(sc._jvm.java.io.InputStreamReader(istream))

while True:
  thisLine = reader.readLine()
  if thisLine is not None:
    print(thisLine)
  else:
    break

istream.close()

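I get an object reader of type java.io.BufferedReader, and I would like to read it with pandas, geopandas, or another library (rather than reading and printing it line by line as in the example).

Can you help me?

Thanks, Lukas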

Answer 1

I would try reading the BufferedReader content into a string and then reading that string with pd.read_csv(StringIO(string)):

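from io import StringIO
import pandas as pd
string = reader.lines().collect(sc._jvm.java.util.stream.Collectors.joining("\n"))  # lines() strips line terminators, so rejoin with "\n"
df = pd.read_csv(StringIO(string))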
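
If the stream-based call is awkward to debug, the same result can probably be had by draining the reader on the Python side, reusing the readLine() loop from the question. The sketch below is only an illustration: reader_to_stringio and FakeReader are hypothetical names that are not part of any library, and FakeReader exists only so the snippet can be tried without a Spark session. It assumes the reader returns None at end of input, as the question's loop already relies on.

```python
from io import StringIO


def reader_to_stringio(reader):
    """Drain a BufferedReader-like object into an in-memory text buffer.

    `reader` only needs a readLine() method that returns None at end of
    input (hypothetical helper, not a library function).
    """
    lines = []
    while True:
        line = reader.readLine()
        if line is None:
            break
        lines.append(line)
    # readLine() drops line terminators, so put them back between rows.
    return StringIO("\n".join(lines))


class FakeReader:
    """Stand-in for java.io.BufferedReader, for trying this without Spark."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def readLine(self):
        # Return the next line, or None once the input is exhausted.
        return next(self._lines, None)


buf = reader_to_stringio(FakeReader(["a,b", "1,2", "3,4"]))
text = buf.getvalue()
```

With the real reader from the question, this would be used as df = pd.read_csv(reader_to_stringio(reader)). Each readLine() call goes through the Py4J gateway and the whole file ends up in driver memory, so this is likely only reasonable for small files.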
