Get value from subprocess output
Here is my input:
info = subprocess.run(['pdfinfo', 'test.pdf'], stdout=subprocess.PIPE)
Here is the output captured in info.stdout:
b'Title: Aboriginal Custom Adoption Recognition\r\nAuthor:
Department of Justice\r\nCreator: PScript5.dll Version
5.2.2\r\nProducer: Acrobat Distiller 10.0.0 (Windows)\r\nCreationDate:
Wed Feb 20 11:12:48 2013 Eastern Standard Time\r\nModDate: Wed Feb 20
11:12:55 2013 Eastern Standard Time\r\nTagged: no\r\nUserProperties:
no\r\nSuspects: no\r\nForm: none\r\nJavaScript:
no\r\nPages: 6\r\nEncrypted: no\r\nPage size: 612 x 792
pts (letter)\r\nPage rot: 0\r\nFile size: 20059
bytes\r\nOptimized: no\r\nPDF version: 1.5\r\n'
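I want to get the integer value from Pages: 6 (i.e. the number of pages in the PDF). Is there a way to get this through subprocess? If not, are there any suggestions for getting that value consistently when I have a large number of PDFs?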
Just use a regular expression to grab the integer after 'Pages: '.
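import re
# subprocess.run() returns a CompletedProcess whose stdout is already a bytes object, so decode it directly (it has no .read() method).
print(int(re.findall(r'^Pages:\s+(\d+)', info.stdout.decode('utf-8'), flags=re.MULTILINE)[0]))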
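For the "large number of PDFs" part of the question, the same regex can be wrapped in a small helper. This is only a sketch: the names parse_page_count and page_count are made up for illustration, and it assumes pdfinfo is on the PATH and keeps printing a "Pages:" line as in the output above. It matches on the raw bytes, so no decoding step is needed, and it returns None instead of raising when the line is missing or pdfinfo fails.

```python
import re
import subprocess
from typing import Optional

# Bytes pattern: matches the "Pages:" line directly in the raw pdfinfo output.
PAGES_RE = re.compile(rb'^Pages:\s+(\d+)', re.MULTILINE)


def parse_page_count(pdfinfo_output: bytes) -> Optional[int]:
    """Return the page count found in pdfinfo output, or None if absent."""
    match = PAGES_RE.search(pdfinfo_output)
    return int(match.group(1)) if match else None


def page_count(pdf_path: str) -> Optional[int]:
    """Run pdfinfo on one file (hypothetical helper; assumes pdfinfo is installed)."""
    result = subprocess.run(
        ['pdfinfo', pdf_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        # pdfinfo could not read the file; let the caller decide what to do.
        return None
    return parse_page_count(result.stdout)
```

With that in place, looping over a list of paths and calling page_count(path) for each one gives either an int or None per file, so unreadable PDFs can be logged and skipped rather than stopping the whole batch.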