Get value from subprocess output

Question

This is my input:

info = subprocess.run(['pdfinfo', 'test.pdf'], stdout=subprocess.PIPE)

This is the output of info:

b'Title:          Aboriginal Custom Adoption Recognition\r\nAuthor:         
Department of Justice\r\nCreator:        PScript5.dll Version 
5.2.2\r\nProducer:       Acrobat Distiller 10.0.0 (Windows)\r\nCreationDate:     
Wed Feb 20 11:12:48 2013 Eastern Standard Time\r\nModDate:        Wed Feb 20 
11:12:55 2013 Eastern Standard Time\r\nTagged:         no\r\nUserProperties: 
no\r\nSuspects:       no\r\nForm:           none\r\nJavaScript:     
no\r\nPages:          6\r\nEncrypted:      no\r\nPage size:      612 x 792 
pts (letter)\r\nPage rot:       0\r\nFile size:      20059 
bytes\r\nOptimized:      no\r\nPDF version:    1.5\r\n'

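I want to get the integer value from Pages: 6 (i.e., the number of pages in the PDF). Is there a way to get this through subprocess? If not, any suggestions on how to get that value consistently when I have a large number of PDFs?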

Answer 1

Just use a regular expression to grab the integer after 'Pages: '.

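import re
# info.stdout is already a bytes object (not a stream), so decode it directly; there is no .read().
pages = int(re.findall(r'^Pages:\s+(\d+)', info.stdout.decode('utf-8'), flags=re.MULTILINE)[0])
print(pages)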
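For the "large number of PDFs" part of the question, the same regex can be wrapped in a couple of small helpers. This is only a rough sketch: the names parse_page_count, get_page_count and count_pages_in_folder are made up for illustration, and it assumes pdfinfo is on your PATH and prints a "Pages:" line like the one in the question.

```python
import re
import subprocess
from pathlib import Path

# Same pattern as above, compiled once so it can be reused for many files.
PAGES_RE = re.compile(r'^Pages:\s+(\d+)', flags=re.MULTILINE)

def parse_page_count(pdfinfo_output):
    """Return the page count found in pdfinfo's text output, or None if absent."""
    match = PAGES_RE.search(pdfinfo_output)
    return int(match.group(1)) if match else None

def get_page_count(pdf_path):
    """Run pdfinfo on one file; return its page count, or None if pdfinfo fails."""
    result = subprocess.run(['pdfinfo', str(pdf_path)],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        return None
    # errors='replace' keeps odd bytes in Title/Author from raising an exception.
    return parse_page_count(result.stdout.decode('utf-8', errors='replace'))

def count_pages_in_folder(folder):
    """Map each *.pdf file name in a folder to its page count (hypothetical helper)."""
    return {p.name: get_page_count(p) for p in sorted(Path(folder).glob('*.pdf'))}
```

Something like count_pages_in_folder('pdfs') would then give you a dict of file name to page count, with None for any file pdfinfo could not read. Keeping the parsing in its own function also lets you check the regex against saved output without running pdfinfo at all.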
