Transforming java.io.BufferedReader into a Python object
I am using the following code (source: https://docs.microsoft.com/en-us/azure/databricks/kb/python/hdfs-to-read-files):
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
conf = sc._jsc.hadoopConfiguration()
# Register the storage account key with the Hadoop configuration.
conf.set(
    "fs.azure.account.key.<account-name>.blob.core.windows.net",
    "<account-access-key>")
fs = Path('wasbs://<container-name>@<account-name>.blob.core.windows.net/<file-path>/').getFileSystem(sc._jsc.hadoopConfiguration())
istream = fs.open(Path('wasbs://<container-name>@<account-name>.blob.core.windows.net/<file-path>/'))
# Wrap the Hadoop input stream in a JVM-side BufferedReader.
reader = sc._gateway.jvm.java.io.BufferedReader(sc._jvm.java.io.InputStreamReader(istream))
# readLine() returns None once the end of the stream is reached.
while True:
    thisLine = reader.readLine()
    if thisLine is not None:
        print(thisLine)
    else:
        break
istream.close()
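This gives me an object, reader, of type java.io.BufferedReader. I would like to read it with pandas, geopandas, or another library, rather than reading and printing it line by line as in the example.
Could you help me?
Thanks,
Lucas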
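I would try reading the BufferedReader contents into a string and then parsing that string with pd.read_csv(StringIO(string)). Join the lines with a newline delimiter, because Collectors.joining() without an argument concatenates all rows into a single line:
import pandas as pd
from io import StringIO
string = reader.lines().collect(sc._jvm.java.util.stream.Collectors.joining("\n"))
df = pd.read_csv(StringIO(string))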
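The same idea can be tried out locally without a Spark cluster. The following is a minimal sketch, not the answer's exact method: it drains the reader on the Python side with readLine() instead of a JVM stream collector, and it parses the result with the standard-library csv module so that it runs anywhere. The names reader_to_text and FakeBufferedReader are hypothetical and exist only for illustration; FakeBufferedReader stands in for the py4j proxy and mimics only the readLine() behaviour shown in the question.

```python
import csv
import io


def reader_to_text(reader):
    """Drain an object exposing readLine() into one newline-joined string."""
    lines = []
    while True:
        line = reader.readLine()
        if line is None:  # end of stream, as in the question's loop
            break
        lines.append(line)
    return "\n".join(lines)


class FakeBufferedReader:
    """Hypothetical stand-in for the py4j BufferedReader proxy (testing only)."""

    def __init__(self, text):
        self._lines = iter(text.splitlines())

    def readLine(self):
        # Return the next line, or None when the input is exhausted.
        return next(self._lines, None)


reader = FakeBufferedReader("id,name\n1,a\n2,b")
buffer = io.StringIO(reader_to_text(reader))  # in-memory file-like object
rows = list(csv.DictReader(buffer))
print(rows)
```

With the real reader from the question, the same StringIO buffer could be passed to pd.read_csv, which accepts file-like objects. Note that either approach loads the whole file into driver memory, so it is only suitable for files that fit there.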