How to extract specific pages based on a formula?
I'm trying to extract pages from a 1000-page PDF, but I only need the pages in the pattern [9, 10, 17, 18, 25, 26, 33, 34, ...]. These numbers can be expressed by the formula: pg = 1/2 (7 - 3 (-1)^n + 8*n).
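Splitting the formula by the parity of n shows why it yields pairs of consecutive pages eight apart: odd n gives 9, 17, 25, ... and even n gives 10, 18, 26, ....

```latex
\mathrm{pg}(n) = \tfrac{1}{2}\left(8n - 3(-1)^n + 7\right)
= \begin{cases}
  4n + 5, & n \text{ odd}  \quad (n = 2k - 1 \;\Rightarrow\; \mathrm{pg} = 8k + 1),\\
  4n + 2, & n \text{ even} \quad (n = 2k \;\Rightarrow\; \mathrm{pg} = 8k + 2),
\end{cases}
\qquad k = 1, 2, 3, \dots
```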
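I tried defining the formula and passing it to tabula.read_pdf, but I'm not sure how to define the variable 'n', where 'n' ranges from 0 to 25. Right now I define it as a list, which I think is the problem...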
n = list(range(25+1))
pg = 1/2 (7 - 3 (-1)^n + 8*n)
df = tabula.read_pdf(path, pages = 'pg',index_col=0, multiple_tables=False)
When I run it, I get TypeError: 'int' object is not callable on the line pg = 1/2 (7 - 3 (-1)^n + 8*n). How should I define the variable so that tabula extracts the pages that match the formula?
The formula is x = 1/2(8n - 3(-1)^n + 7), with n starting at 1 (n = 0 would give page 2, not 9).
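The TypeError in the question comes from Python syntax rather than from tabula: Python has no implicit multiplication, so `1/2 (7 - ...)` is parsed as calling the integer `2` like a function, and `^` is bitwise XOR, not exponentiation (`**`). A direct translation of the formula, using a hypothetical helper name `page_for`, might look like this:

```python
def page_for(n: int) -> int:
    """Page number for index n, using pg = (8n - 3(-1)^n + 7) / 2."""
    # Every multiplication needs an explicit *, and the power needs **.
    # The numerator is always even, so integer division is exact.
    return (8 * n - 3 * (-1) ** n + 7) // 2


# ^ is XOR in Python, which is why it cannot stand in for a power:
print((-1) ** 2, (-1) ^ 2)  # 1 -3

print([page_for(n) for n in range(1, 9)])  # [9, 10, 17, 18, 25, 26, 33, 34]
```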
Step 1:
pg = []  # Empty list to store the page numbers calculated by the formula
for i in range(1, 25 + 1):  # n = 1..25; for a 1000-page PDF use range(1, 249), whose last pages are 993 and 994
    k = (8 * i - 3 * (-1) ** i + 7) // 2  # explicit * and **; the numerator is always even, so // is exact
    pg.append(k)
print(pg)  # This will give you the list of page numbers
#[9, 10, 17, 18, 25, 26, 33, 34, 41, 42, 49, 50, 57, 58, 65, 66, 73, 74, 81, 82, 89, 90, 97, 98, 105]
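The printed list follows a simpler rule: every page congruent to 1 or 2 modulo 8, starting at page 9. Filtering on that rule avoids working out how far n must run for a given PDF length. A minimal sketch, with `pages_up_to` as a hypothetical helper name and the page count passed in by hand (it is not read from the PDF here):

```python
from typing import List


def pages_up_to(num_pages: int) -> List[int]:
    """Pages 9, 10, 17, 18, ... that exist in a PDF with num_pages pages."""
    # Same sequence as the formula: every page congruent to 1 or 2 mod 8, from page 9 on
    return [p for p in range(9, num_pages + 1) if p % 8 in (1, 2)]


pg = pages_up_to(1000)  # 1000 = page count of the PDF in the question
print(len(pg), pg[:4], pg[-2:])  # 248 [9, 10, 17, 18] [993, 994]
```

The result matches the formula for n = 1 to 248, so it can be used in place of the loop.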
Step 2:
import pandas as pd
import tabula

# Now loop through each computed page; pg already holds 1-based page numbers, so no +1 is needed
df_combine = pd.DataFrame([])
for page in pg:
    df = tabula.read_pdf(path, pages=page, pandas_options={"index_col": 0}, multiple_tables=False, guess=False)  # modify it as per your requirement
    if isinstance(df, list):  # tabula-py 2.x returns a list of DataFrames
        df = pd.concat(df)
    df_combine = pd.concat([df_combine, df])  # keeps page order; you can choose between merge or concat as per your need
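Or
df_data = []
for page in pg:
    df = tabula.read_pdf(path, pages=page, pandas_options={"index_col": 0}, multiple_tables=False, guess=False)  # modify it as per your requirement
    df_data.extend(df if isinstance(df, list) else [df])  # tabula-py 2.x returns a list of DataFrames
df_combine = pd.concat(df_data)  # stacks the tables row-wise; axis=1 would place them side by side instead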
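tabula-py documents its `pages` argument as accepting a string such as `'1-2,3'` or an iterable of ints, so a single `read_pdf` call with `pages=pg` may be an alternative to the per-page loop (how the tables come back then depends on `multiple_tables` and the tabula-py version, so check the result). When a compact range string is handier, for example to log which pages were requested, a hypothetical helper `to_page_ranges` could build one:

```python
from typing import Iterable


def to_page_ranges(pages: Iterable[int]) -> str:
    """Collapse page numbers into a range string such as '9-10,17-18,25'."""
    parts = []
    start = prev = None
    for p in sorted(set(pages)):
        if start is not None and p == prev + 1:
            prev = p  # extend the current run of consecutive pages
            continue
        if start is not None:
            # close the previous run before starting a new one
            parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = p
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


print(to_page_ranges([9, 10, 17, 18, 25]))  # 9-10,17-18,25
```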
Reference link used to create the formula:
https://www.wolframalpha.com/widgets/view.jsp?id=a3af2e675c3bfae0f2ecce820c2bef43