Python HDFS file open using subprocess hdfs "cat: Illegal file pattern: Illegal character range near index 11"
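I am trying to load informatica log files stored in HDFS on a Hadoop cluster. I am using subprocess in Python to do this, but I believe the file name is causing an error, and I am not sure how to get around it.
The error I get is "cat: Illegal file pattern: Illegal character range near index 11"
My code is: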
import subprocess

proc = subprocess.Popen(["hadoop", "fs", "-cat", '/corp_staffs/IT/IICOE/process/infa_stats/WorkflowLogs/infra.[08-04-2015-(15_19)].1438719569664.log'], stdout=subprocess.PIPE)
# iterate over the command's output line by line
for line in proc.stdout:
    print(line)
I could rename every file so that cat does not treat the file name as a pattern, but I would rather not. Is there a way around this?
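hadoop fs treats the path argument as a glob pattern, so escape the special characters in the file name with a backslash before passing it:
import re
import subprocess

# match each character that hadoop's path globbing treats as special
quotechars = re.compile('|'.join(re.escape(s) for s in r'\[]()*?'))

def quote_name(filename):
    # r'\\\g<0>' emits a literal backslash followed by the matched character
    return quotechars.sub(r'\\\g<0>', filename)

proc = subprocess.Popen(
    [
        "hadoop", "fs", "-cat",
        quote_name('/corp_staffs/IT/IICOE/process/infa_stats/WorkflowLogs/infra.[08-04-2015-(15_19)].1438719569664.log')
    ], stdout=subprocess.PIPE)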
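To check what the escaping produces without touching the cluster, the idea can be sketched as a pair of small helpers. The names `escape_hdfs_glob` and `hdfs_cat_lines` are hypothetical, and including the braces `{` and `}` in the escaped set is my assumption (they are glob metacharacters in Hadoop, but the answer above does not escape them). The replacement is built with a function rather than a template string, which sidesteps the backslash-counting in `re.sub` templates.

```python
import re
import subprocess

# Characters assumed to be special in Hadoop path globs. The braces are an
# addition for illustration; the answer above escapes only \ [ ] ( ) * ?
GLOB_SPECIAL = re.compile(r'[\\\[\](){}*?]')


def escape_hdfs_glob(path):
    """Return path with every glob metacharacter prefixed by a backslash."""
    return GLOB_SPECIAL.sub(lambda m: '\\' + m.group(0), path)


def hdfs_cat_lines(path):
    """Yield the raw output lines of `hadoop fs -cat` for a literal path.

    Hypothetical helper: assumes the `hadoop` command is on PATH.
    """
    proc = subprocess.Popen(
        ["hadoop", "fs", "-cat", escape_hdfs_glob(path)],
        stdout=subprocess.PIPE,
    )
    for line in proc.stdout:
        yield line
    proc.wait()
```

For the file in the question, `escape_hdfs_glob('infra.[08-04-2015-(15_19)].1438719569664.log')` returns `infra.\[08-04-2015-\(15_19\)\].1438719569664.log`, so the brackets no longer open a character class whose `8-0` range is rejected.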