How to process data returned by Paramiko from a remote shell command in memory using Pandas?

I'm having trouble exporting data from Hive via Paramiko. Normally, when I work on the same server, I do the following to get around bad-line errors:

with open('xxx.tsv', 'r') as temp_f:
    # get No of columns in each line
    col_count = [ len(l.split(",")) for l in temp_f.readlines() ]
### Generate column names  (names will be 0, 1, 2, ..., maximum columns - 1)
column_names = [i for i in range(0, max(col_count))]
### Read csv
df2 = pd.read_csv('xxx.tsv', header=None, delimiter="\t", names=column_names)
df2 = df2.rename(columns=df2.iloc[0]).drop(df2.index[0])
df2 = df2[['content_id', 'title','product_id', 'type', 'episode_total','template_model','tags_name','grade','isdeleted' ,'actor']]

Now what I want to do is combine the code above with my own code, like this:

import paramiko 
import traceback
from io import StringIO 
import pandas as pd 

host = 'xxxx'
conn_obj = paramiko.SSHClient()
conn_obj.set_missing_host_key_policy(paramiko.AutoAddPolicy())

conn_obj.connect(host, username="xxxx",
                 password='xxxx')# insert username and password

query='"select content_id as content_id, title as title,product_id as product_id, type as type, episode_total as episode_total, template_model as template_model, tags_name as tags_name,grade as grade, isdeleted as isdeleted, actor as actor from aaa.aaa;"' 
hive_query = 'beeline xxxx --outputformat=tsv2 -e '+ query 
print(hive_query)
std_in, std_out, std_err = conn_obj.exec_command(hive_query)
edge_out_str = str(std_out.read())
edge_out_str_n = "\n".join(edge_out_str.split("\n")) 
edge_out_csv = StringIO(edge_out_str_n)
with open(edge_out_csv) as temp_f:
    #get No of columns in each line
    col_count = [ len(l.split(",")) for l in temp_f.readlines() ]
### Generate column names  (names will be 0, 1, 2, ..., maximum columns - 1)
column_names = [i for i in range(0, max(col_count))]
### Read csv
df2 = pd.read_csv(temp_f, header=None, delimiter="\t", names=column_names)
df2 = df2.rename(columns=df2.iloc[0]).drop(df2.index[0])
df2 = df2[['content_id', 'title','product_id', 'type', 'episode_total', 'template_model', 'tags_name','grade','isdeleted' ,'actor']]
conn_obj.close()

When I run the script, I get this error:

Error :Traceback (most recent call last):
  File "<ipython-input-13-360c6dba28e1>", line 21
    with open(edge_out_csv) as temp_f:
TypeError: expected str, bytes or os.PathLike object, not _io.StringIO

The StringIO is already a file-like object, so there is nothing to open(). Use it directly in place of the temp_f file:

with StringIO(edge_out_str_n) as edge_out_csv:
    # get No of columns in each line
    # (split on tabs, since the data is read with delimiter="\t")
    col_count = [ len(l.split("\t")) for l in edge_out_csv.readlines() ]
    ### Generate column names  (names will be 0, 1, 2, ..., maximum columns - 1)
    column_names = [i for i in range(0, max(col_count))]
    # Seek back to the beginning
    edge_out_csv.seek(0)
    ### Read csv
    df2 = pd.read_csv(edge_out_csv, header=None, delimiter="\t", names=column_names)
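
For reference, here is a minimal self-contained sketch of the same two-pass idea that runs without an SSH connection. The helper name tsv_bytes_to_frame and the sample bytes are made up for illustration; the sample stands in for whatever std_out.read() returns. It also decodes the bytes with .decode() instead of str(), because str() on a bytes object produces a "b'...'" literal with escaped newlines rather than the real text.

```python
from io import StringIO

import pandas as pd


def tsv_bytes_to_frame(raw: bytes, encoding: str = "utf-8") -> pd.DataFrame:
    """Parse tab-separated command output held in memory into a DataFrame."""
    # Decode the bytes; str(raw) would give "b'...'" with escaped newlines.
    text = raw.decode(encoding)

    with StringIO(text) as buf:
        # First pass: find the widest row so ragged lines do not break the parser.
        col_count = [len(line.rstrip("\n").split("\t")) for line in buf if line.strip()]
        column_names = list(range(max(col_count)))

        # Rewind: the buffer is exhausted after the first pass.
        buf.seek(0)

        # Second pass: read straight from the in-memory buffer.
        df = pd.read_csv(buf, header=None, delimiter="\t", names=column_names)

    # Promote the first row to the header, as in the question's code.
    return df.rename(columns=df.iloc[0]).drop(df.index[0])


# Hypothetical sample output; the last row carries one extra field.
sample = b"content_id\ttitle\tgrade\n1\tAlpha\tA\n2\tBeta\tB\textra\n"
frame = tsv_bytes_to_frame(sample)
print(frame[["content_id", "title", "grade"]])
```

In the real script you would pass std_out.read() in place of sample. The encoding is an assumption here; use whatever the remote beeline output actually is.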

Obligatory warning: Do not use AutoAddPolicy. You are losing protection against MITM attacks by doing so. For a correct solution, see Paramiko "Unknown Server".