Creating a one-page PDF with fit-to-page text

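I am looking for a way to generate a one-page PDF file from text of arbitrary length, with the font size fitted to the page automatically, sensible margins, and the text centred horizontally and vertically.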

command --text="Text of arbitrary length" --output=one-page-file.pdf

That is, I want to reproduce

magick -gravity center -background white -fill black -size 1728x972 -font /Users/marekkowalczyk/Library/Fonts/RobotoMono-Medium.ttf caption:"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat." -background white -extent 1920x1080 long.pdf

where the output file is a "real" PDF, not an image file embedded in a PDF; obviously with a PDF-generating tool (PostScript? TeX?) in place of ImageMagick.

I came up with the following hack.

  1. Create an SVG vector image.
convert\
    -gravity\
        center\
    -background\
        white\
    -fill\
        black\
    -size\
        1728x972\
    -extent\
        1920x1080\
    -font ~/Library/Fonts/RobotoMono-Medium.ttf\
    caption:"Lorem ipsum"\
    lorem.svg
  2. Convert it to a vector PDF. Note that producing the PDF directly with convert does not work, because the resulting file is just an embedded bitmap image.

svg2pdf lorem.svg lorem.pdf

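  3. Add a text layer with ocrmypdf. This step is necessary because the PDF from the previous step is only a vector image of the letter shapes, unlike a PDF rendered by LaTeX.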
ocrmypdf -l pol+eng --output-type pdfa --clean lorem.pdf lorem-ocr.pdf

Extremely hacky, but it gets the job done.

A proper solution would involve somehow accessing ImageMagick's internal layout engine and capturing its output before it is converted to a bitmap.

Update 2022-03-03:
In onepg.sh, added a - (dash) after paste -s -d ' ' to specify stdin as input. If ghostscript says "cannot open file /dev/stdout", I suggest editing onepg.sh as follows: change /dev/stdout to %stdout, /dev/stderr to %stderr, and -f /dev/stdin to -f - (but leave : ${infile='/dev/stdin'} as is). (End of update)


It has been a while since this question was asked. Nevertheless...

Here is a PostScript program (onepg.ps) and a POSIX shell script (onepg.sh) that use ghostscript 9.50 to create a one-page PDF, adjusting the font size to fill the page. Run as:

echo 'BZZZT ~ Train leaving in 45 minutes' | ./onepg.sh > bzzzt.pdf

Or, to convert a Central or Eastern European plain-text file to a PDF with blue text,

tocode=latin2 rgbtext='0 0 255' ./onepg.sh < some.txt > some.pdf

Or, for a compact standalone PostScript file and a trace log file,

TRACE=x logfile=file.log psoutfile=file.ps outfile=file.null ./onepg.sh < file.txt

Or, for a landscape A5-sized PNG file,

PAPERSIZE=a5 landscape=x outfile=file.png ./onepg.sh < file.txt

The driver shell script

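  • provides default values for files, encodings, page size and margins, font, etc.
  • converts and formats the input text, which may contain "~ " (tilde followed by a blank) as a section delimiter, using the standard tools iconv and sed
  • emits PostScript startup code to invoke the vert-centr procedure in onepg.ps
  • invokes ghostscript to produce the output file; the default format is PDF
  • uses shell parameter expansion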
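
Two of these points can be sketched in a few lines of POSIX sh: the `: ${var=default}` idiom, which lets the caller override any setting from the environment, and the margin arithmetic that turns a media box into an art box. This is an illustration only. The function name artbox_from_margins is hypothetical, and the margins are given as integer percentages so that plain shell arithmetic suffices, whereas onepg.sh uses fractional margins and bc.

```shell
#!/bin/sh
# Defaults: assigned only when the variable is unset, so that e.g.
#   marginx=20 ./script.sh
# overrides the value from the environment.
: "${marginx=12}"   # horizontal margin, percent of page width (assumed unit)
: "${marginy=8}"    # vertical margin, percent of page height (assumed unit)

# artbox_from_margins WIDTH HEIGHT  -->  "x y w h" in whole points
# Rounds half up using integer arithmetic only.
artbox_from_margins() {
  w=$1 h=$2
  x=$(( (w * marginx + 50) / 100 ))
  y=$(( (h * marginy + 50) / 100 ))
  aw=$(( (w * (100 - 2 * marginx) + 50) / 100 ))
  ah=$(( (h * (100 - 2 * marginy) + 50) / 100 ))
  echo "$x $y $aw $ah"
}

artbox_from_margins 595 842    # A4 portrait
```

For A4 (595 x 842 points) with 12% and 8% margins this prints 71 67 452 707, the same art box that appears in the example trace log further down.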

Note: a long word in a very short text may be truncated because the font is enlarged.


I should mention that I do PostScript once in a purple moon.

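For one-page output, efficiency is not a major concern, so the algorithm used is simple. The vert-centr procedure calls adjustfont to compute the font size at which the text fills the art box (the meaningful content of the page), by repeatedly calling linebreakr in a divide-and-conquer (bisection) manner. The search stops when the line count equals floor(artbox height / font size) or when the computed font size no longer changes. Finally, vert-centr displays the page, distributing the excess vertical whitespace evenly between the lines and centring each line horizontally; no other formatting is done.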
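
The search can be sketched outside PostScript. The following is a simplified illustration in POSIX sh, not a port of adjustfont: it assumes every glyph is half the font size wide (a crude stand-in for stringwidth), ignores the ~ section delimiter, and uses the hypothetical function names count_lines and fit_fontsize. The bounds 6 and 144 mirror fsMin and fsMax.

```shell
#!/bin/sh
set -f    # no filename expansion while splitting text into words

# count_lines TEXT FONTSIZE BOXWIDTH --> line count after greedy wrapping
# Assumption: each glyph is FONTSIZE/2 wide.
count_lines() {
  maxchars=$(( 2 * $3 / $2 ))
  lines=0 used=0
  for word in $1; do                 # unquoted on purpose: split on blanks
    len=${#word}
    if [ "$used" -eq 0 ]; then       # first word always starts a line
      lines=$(( lines + 1 )); used=$len
    elif [ $(( used + 1 + len )) -le "$maxchars" ]; then
      used=$(( used + 1 + len ))     # word fits after one blank
    else
      lines=$(( lines + 1 )); used=$len
    fi
  done
  echo "$lines"
}

# fit_fontsize TEXT BOXWIDTH BOXHEIGHT --> "fontsize linecount"
fit_fontsize() {
  lo=6 hi=144 fs=1
  while :; do
    last=$fs
    fs=$(( (lo + hi) / 2 ))          # integer midpoint prefers the smaller size
    n=$(count_lines "$1" "$fs" "$2")
    fit=$(( $3 / fs ))               # floor(box height / font size)
    if [ "$fit" -eq "$n" ] || [ "$fs" -eq "$last" ]; then
      break                          # filled the box, or no further progress
    elif [ "$fit" -lt "$n" ]; then
      hi=$fs                         # too many lines: try smaller sizes
    else
      lo=$fs                         # room to spare: try larger sizes
    fi
  done
  echo "$fs $n"
}

fit_fontsize 'BZZZT Train leaving in 45 minutes' 452 707   # prints: 126 5
```

With this width model the search visits font sizes 75, 109 and 126 before the line count (5) equals floor(707 / 126). Real font metrics give different numbers, as the trace log below shows.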

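The encodefont proc supports ASCII (standard encoding), Latin-1 and Latin-2. Input text converted with iconv --to-code="...//TRANSLIT" and the like may not be represented accurately. //TRANSLIT is convenient for UTF-8 input, but it leaves ? in the output wherever transliteration is not possible.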
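
The conversion step can be tried in isolation. The sketch below wraps the iconv call in a helper named to_8bit (a made-up name, not part of onepg.sh) and assumes an iconv that accepts the //TRANSLIT suffix, as GNU iconv does. What an unrepresentable character turns into depends on the iconv implementation and the locale, so only a case that fits the target encoding is shown.

```shell
#!/bin/sh
# to_8bit TOCODE --> filter: UTF-8 on stdin, 8-bit text on stdout
# //TRANSLIT asks iconv to approximate characters missing from the
# target set; it is an extension, not part of POSIX iconv.
to_8bit() {
  iconv -f UTF-8 -t "$1//TRANSLIT"
}

# Swedish text fits Latin-1, so each of the 17 characters becomes
# exactly one byte (the UTF-8 input is 19 bytes long).
printf 'Tåg till Göteborg' | to_8bit LATIN1 | wc -c   # prints 17
```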

If the onepg.sh script is invoked with a non-empty TRACE shell variable, the artbox is outlined in the output file and the following is written to the trace log (stderr, by default):

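  • artbox dimensions
  • font size computation table:
    1. font size range min:max
    2. current font size
    3. line count
    4. floor(artbox height / font size)
    5. stringwidth of the text in the current font
  • Y coordinate of each line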

Example trace log:

artbox: x=71 y=67 w=452 h=707 x+w=523 y+h=774
szrg    ftsz    lnct    h/ftsz  textw
6:144   75      69      9       23870
6:75    40      34      17      12730
6:40    23      19      30      7320
23:40   31      26      22      9866
23:31   27      23      26      8593
27:31   29      24      24      9229
lnypos: 749.0 719.542 690.083 660.625 631.167 601.708 572.25 542.792 513.333 483.875 454.417 424.958 395.5 366.042 336.583 307.125 277.667 248.208 218.75 189.292 159.833 130.375 100.917 71.4584
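
The lnypos row can be checked by hand from the numbers above it. Here is a small sketch, assuming the formulas used in vert-centr (lnyadj = (h - fontsize * lnct) / lnct, starting from y + h + lnyadj + 4) and using awk for the floating-point arithmetic; line_ypos is a hypothetical helper, not part of onepg.sh.

```shell
#!/bin/sh
# line_ypos BOX_Y BOX_H FONTSIZE LINECOUNT --> Y position of each line, top first
line_ypos() {
  LC_ALL=C awk -v y="$1" -v h="$2" -v fs="$3" -v n="$4" 'BEGIN {
    adj = (h - fs * n) / n          # spare height shared out per line
    pos = h + y + adj + 4           # start above the box; +4 as in vert-centr
    for (i = 1; i <= n; i++) {
      pos -= fs + adj               # step down one line plus its share
      printf "%.3f%s", pos, (i < n ? " " : "\n")
    }
  }'
}

line_ypos 67 707 29 24
```

With the final row of the table (font size 29, 24 lines) and the artbox values y=67 h=707, this prints 24 values starting at 749.000 and ending at 71.458, which agrees with the log up to rounding.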

File: onepg.ps

% onepg.ps -- convert text to fit one page, adapting font
%
% Notes:
% - invoke with accompanying POSIX shell script onepg.sh
% - intended for one-page texts, not for extreme-size texts or words
% - supports section breaks, e.g. (end para.~ Next para), see /SECT
%   NB: section delimiter must be followed by a word delimiter (blank)
% - /fsMin, /fsMax font sizes are defined in /adjustfont
% - for StandardEncoding /encodefont is not needed
% - use Latin-2 encoding vector for ISO 8859-2 compatibility
% - tested with ghostscript 9.50, evince 3.36.7, okular 1.9.3

/TRACE false def        % trace info flag
/SECT (~) 0 get def     % section delimiter char (use 7bit ascii)


/Trace { % (string) --> ...
  TRACE { print flush } if
} bind def

/strN { % any --> (string)
  32 string cvs
} bind def


% Concatenate N strings.
% (s1) (s2) (s3) ... (sN) n  -->  (s1s2s3...sN)
% origin:  (with comments)
/ncat {
    dup 1 add
    copy
    0 exch { exch length add } repeat
    string exch
    0 exch
    -1 1 {
        2 add -1 roll
        3 copy putinterval
        length add
    } for
    pop
} def 


% Split text into lines, call back for each, return line count.
% NB: newlines get no special treatment (so replace with word delimiter)
% stack: text word-delimiter maxwidth eolproc(lntext,lnwidth) --> lnct
/linebreakr {
  0 begin
    /eolproc exch def
    /maxlinewidth exch cvr def
    /delim exch def
    /qtxt exch def                  % queued text

    /qtxtlen qtxt length def
    /qtxtlnct 0 def
    /delimlen delim length def
    /delimwd delim stringwidth pop def
    {
        qtxtlen 0 le { exit } if
        /qtxtlnct qtxtlnct 1 add def
        /lntxt qtxt def             % rest of current line
        /lnlen 0 def
        /lnwidth 0.0 def
        { % process current line
            % string seek <search> post match pre true
            % string seek <search> string false
            lntxt delim search      % look for next delimiter
            /inq exch def           % queue not empty if found
            /nextword exch def
            /nextwordlen nextword length def
            inq { pop /lntxt exch def } if

            /atsect 0 def          % SECT at end of nextword?
            nextwordlen 0 ne { % if
              nextword nextwordlen 1 sub get SECT eq { % if
                /atsect 1 def
                /qtxtlnct qtxtlnct 1 add def
                /nextword nextword 0 nextwordlen 1 sub getinterval def
              } if
            } if
            % at end of line if passing max unless no words 
            % seen, in which case truncating a rather long word, 
            % cf. https://en.wikipedia.org/wiki/Longest_words
            /wordwidth nextword stringwidth pop def
            lnwidth wordwidth add maxlinewidth gt lnlen 0 gt and {
              exit      % FIXME: better to add delimwd before exit
            } if
            /lnwidth lnwidth wordwidth add delimwd add def
            inq not atsect 0 ne or {
              /lnlen lnlen nextwordlen add def
              exit
            } if
            /lnlen lnlen nextwordlen add delimlen add def
        } loop  % line
        % call back line+width
        qtxt 0 lnlen atsect sub getinterval lnwidth delimwd sub eolproc
        atsect 0 ne { () 0.0 eolproc } if   % call back linefeed
        % skip to next line
        /qtxtlen qtxtlen lnlen sub def
        /qtxt qtxt lnlen qtxtlen getinterval def
    } loop      % text
    qtxtlnct    % return line count
  end           % dict
} def
/linebreakr load 0 16 dict put


% Adjust font size to fill artbox by repeatedly calling linebreakr.
% stack: fontname artbox text word-delimiter --> fontsize linect
%
% Returns when linect == floor(artbox-height / fontsize)
% or when fontsize no longer changes after call to linebreakr.
%
% Detects and avoids oscillation as in:
%     height fontsize quotient linect
%       708     26     27.2     26
%       708     27     26.2     27
/adjustfont {
  0 begin
    /worddelim exch def
    /pgtext exch def
    /artbox exch def
    /fontname exch def

    /fsMin 6 def
    /fsMax 144 def
    /fontsize 1 def
    /ABX artbox 0 get def
    /ABY artbox 1 get def
    /ABW artbox 2 get def
    /ABH artbox 3 get def

    TRACE { % if
        % outline rectangle where text goes
        gsave
        .82 setgray  artbox rectstroke
        grestore

        % artbox coords
        % ... N ncat
        (artbox:)
        ( x=) ABX strN
        ( y=) ABY strN 
        ( w=) ABW strN 
        ( h=) ABH strN
        ( x+w=) ABX ABW add strN 
        ( y+h=) ABY ABH add strN 
        (\n)
        14 ncat Trace
        % fontsize computation table header
        (szrg\tftsz\tlnct\th/ftsz\ttextw\n) Trace
    } if

    { % loop
        /lastfs fontsize def
        % prefer smaller font size (using idiv)
        /fontsize fsMin fsMax add 2 idiv def
        fontname fontsize selectfont
        % count lines by splitting text using current font
        pgtext worddelim ABW { pop pop } linebreakr
        /linect exch def
        /lineqt ABH fontsize idiv def          % floor(ABH / fontsize)

        TRACE { % if
          % fontsize computation table row
          /textwd pgtext stringwidth pop def   % width in current font
          % ... N ncat
          fsMin strN (:) fsMax strN
          (\t) fontsize strN
          (\t) linect strN
          (\t) lineqt strN
          (\t) textwd cvi strN
          (\n)
          12 ncat Trace
        } if

        lineqt linect sub
        dup 0 eq                    % success
        fontsize lastfs eq or       % guard against infinite loop
        { pop exit } if
        0 lt { /fsMax fontsize def
        }{     /fsMin fontsize def
        } ifelse
    } loop
    fontsize linect     % return values
  end   % dict
} def
/adjustfont load 0 16 dict put


% Encode named font: fontname encid --> encfontname
% where encid is 0 StandardEncoding, 1 Latin-1, or 2 Latin-2
% e.g. /Helvetica 1 --> /encft1Helvetica
% origin of /encvec table: 
/encodefont {
  0 begin
        /encid exch def
        /fontnm exch def

        /myfontnm {
            (encft)
            encid strN
            fontnm 64 string cvs
            3 ncat
        } def
        /encvec encid 1 eq 
            { ISOLatin1Encoding }
            { StandardEncoding } ifelse
        def
        encid 2 eq { % if
        /encvec
            % Latin-2: first 144 entries same as in ISO Latin-1
            ISOLatin1Encoding 0 144 getinterval aload pop
            % x
                /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
                /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
            % x
                /nbspace /Aogonek /breve /Lslash /currency /Lcaron /Sacute /section
                /dieresis /Scaron /Scedilla /Tcaron /Zacute /hyphen /Zcaron /Zdotaccent
                /degree /aogonek /ogonek /lslash /acute /lcaron /sacute /caron
                /cedilla /scaron /scedilla /tcaron /zacute /hungarumlaut /zcaron /zdotaccent
            % x
                /Racute /Aacute /Acircumflex /Abreve /Adieresis /Lacute /Cacute /Ccedilla
                /Ccaron /Eacute /Eogonek /Edieresis /Ecaron /Iacute /Icircumflex /Dcaron
                /Dcroat /Nacute /Ncaron /Oacute /Ocircumflex /Ohungarumlaut /Odieresis /multiply
                /Rcaron /Uring /Uacute /Uhungarumlaut /Udieresis /Yacute /Tcedilla /germandbls
            % x
                /racute /aacute /acircumflex /abreve /adieresis /lacute /cacute /ccedilla
                /ccaron /eacute /eogonek /edieresis /ecaron /iacute /icircumflex /dcaron
                /dcroat /nacute /ncaron /oacute /ocircumflex /ohungarumlaut /odieresis /divide
                /rcaron /uring /uacute /uhungarumlaut /udieresis /yacute /tcedilla /dotaccent
            256 packedarray def
        } if
        fontnm findfont           % load the font
        0 dict copy begin         % copy it to a new dictionary
        /Encoding encvec def      % replace encoding vector
        myfontnm /FontName def    % replace font name
        currentdict end
        dup /FID undef            % remove internal data
        myfontnm exch definefont pop  % define the new font 

        myfontnm        % return value
  end % dict
} def
/encodefont load 0 4 dict put


% Justify text vertically by adjusting font size, centre horizontally, 
% and display in artbox.
% stack: pagetext fontname mediabox artbox rgbtext rgbbkg --> ...
/vert-centr {
  15 dict begin
    /rgbbkg exch def
    /rgbtext exch def
    /artbox exch def
    /mediabox exch def
    /fontname exch def
    /pgtext exch def

    /worddelim ( ) def
    /ABX artbox 0 get def
    /ABY artbox 1 get def
    /ABW artbox 2 get def
    /ABH artbox 3 get def

    rgbbkg {255 div} forall setrgbcolor mediabox rectfill
    rgbtext {255 div} forall setrgbcolor 

    % adjust font size, select font, centre text vertically
    fontname artbox pgtext worddelim adjustfont
    /lnct exch def
    /fontsize exch def

    /lnyadj ABH fontsize lnct mul sub lnct div def  % even out excess
    /lnypos ABH ABY add lnyadj add 4 add cvr def    % +4 looks better
    (lnypos:) Trace
    % split text into lines and display
    % args: pagetext delimiter maxlinewidth eolproc
    pgtext worddelim ABW { 
        % eolproc: linetext linewidth --> ...
        /lnypos lnypos fontsize sub lnyadj sub def
        ABX lnypos cvi moveto 
        % centre text horizontally
        ABW sub -2 div 0 rmoveto show
        ( ) Trace lnypos strN Trace 
    } linebreakr pop
    (\n) Trace
    showpage
  end   % dict
} def

% ---- startup code here ----

Example startup code:

%%Page: 1 1
/TRACE true def
(OBS! ~ Tåg till Göteborg avgår inom fyrtiofem minuter)
/Helvetica 1 encodefont
[0 0 595 842] [71 67 453 708 ] [0 0 0] [252 250 243]
vert-centr
%%Trailer

File: onepg.sh

#! /bin/sh
# Use ghostscript 9.50 to run onepg.ps
# e.g.
#   echo 'Train in 45 min' | ./onepg.sh > msg.pdf
#   tocode=latin2 rgbtext='0 0 255' ./onepg.sh < some.txt > some.pdf
#   TRACE=x logfile=file.log outfile=file.pdf ./onepg.sh < file.txt
#   infile=the.txt outfile=the.png devWpts=1600 devHpts=900 ./onepg.sh

# shellcheck disable=SC2223,SC2046,SC2086

## Set default values

: ${progps='./onepg.ps'}        ## PostScript program file
: ${TRACE=}                     ## non-empty to trace to ${logfile}
: ${psoutfile=}                 ## non-empty to emit raw PostScript
: ${infile='/dev/stdin'}        ## source text
: ${outfile='/dev/stdout'}      ## destination, e.g. my.pdf or my.ps
: ${logfile='/dev/stderr'}      ## e.g. my.trace.log or %stderr
: ${fromcode='UTF-8'}           ## encoding of ${infile}
: ${tocode='ASCII'}             ## ASCII | LATIN1 | LATIN2
: ${PAPERSIZE='a4'}             ## see `man paperconf`
: ${marginx=.12} ${marginy=.08} ## page margins (.08 = 8%)
: ${landscape=}                 ## non-empty for landscape orientation
: ${fontname='Helvetica'}       ## font name
: ${rgbtext='0 0 0'}            ## text colour RGB
: ${rgbbkg='252 250 243'}       ## background colour RGB

## Set up arguments

case ${tocode} in
  (LATIN2|latin2) encid=2 tocode='LATIN2//TRANSLIT' ;;
  (LATIN1|latin1) encid=1 tocode='LATIN1//TRANSLIT' ;;
  (ASCII|ascii|*) encid=0 tocode='ASCII//TRANSLIT' ;;
esac
case ${outfile} in
  (*.jpeg) gsdevice='jpeg' ;;
  (*.null) gsdevice='nullpage' ;;
  (*.pdf)  gsdevice='pdfwrite' ;;
  (*.png)  gsdevice='png16m' ;;
  (*.ps)   gsdevice='ps2write' ;;
  (*.txt)  gsdevice='txtwrite' ;;
  (*)      gsdevice='pdfwrite' ;;
esac
case ${PAPERSIZE} in
  ## portrait mode width and height dimensions in points
  (letter)
       : ${devWpts=612}  ${devHpts=792} ;;
  (a5) : ${devWpts=420}  ${devHpts=595} ;;
  (a4) : ${devWpts=595}  ${devHpts=842} ;;
  (a3) : ${devWpts=842}  ${devHpts=1191} ;;
  (*)  if test -z "${devWpts}"; then
           set -- $(LC_NUMERIC=C printf '%.0f ' \
                    $(paperconf -p "${PAPERSIZE}" -w -h))
           devWpts="$1"  devHpts="$2"
       fi ;;
esac
if test "${landscape}"
then _tmp="${devWpts}"  devWpts="${devHpts}"  devHpts="${_tmp}"
     _tmp="${marginx}"  marginx="${marginy}"  marginy="${_tmp}"
fi
mediabox2artbox() { ## x= y= w= h=
  set -- "$3*$marginx" "$4*$marginy" "$3-$3*$marginx*2" "$4-$4*$marginy*2"
  printf '(%s+0.5)/1\n' "$@" | bc | paste -s -d ' ' -
}
: ${mediabox="0 0 ${devWpts} ${devHpts}"}       ## x y width height
: ${artbox="$(mediabox2artbox ${mediabox})"}    ## same, within margins


## Emit PostScript, run ghostscript

{   cat << ENDCMT
%!PS-Adobe-2.0
%%BoundingBox: ${mediabox}
%%Creator: ${0##*/}
%%Pages: 1
%%Title: ${infile%.*}
%%EndComments
ENDCMT
    ## copy program stripping non-DSC comments and indentation
    sed -e '/^%%/! s/[[:blank:]]*%[^%]*$//' \
        -e 's/^[[:blank:]]*//' -e '/./!d' "${progps}"
    ## startup code
    cat << HERE
%%Page: 1 1
${TRACE:+/TRACE true def}
HERE
    ## convert text to 8-bit PostScript string, 
    ##  escape backslashes, paren:s, and newlines, enclose in paren:s
    iconv -f "${fromcode}" -t "${tocode}" < "${infile}" |
    sed -e 's/[\()]/\\&/g' -e '$!s/$/\\/' -e '1s/^/(/' -e '$s/$/)/'
    cat << ENDPS
/${fontname} ${encid} encodefont
[${mediabox}] [${artbox}] [${rgbtext}] [${rgbbkg}]
vert-centr
%%Trailer
ENDPS
} |
tee ${psoutfile:+"${psoutfile}"} |
gs -q -dBATCH -dNOPAUSE \
    -dDEVICEWIDTHPOINTS="${devWpts}" \
    -dDEVICEHEIGHTPOINTS="${devHpts}" \
    -sDEVICE="${gsdevice}" \
    -sOutputFile="${outfile}" \
    ${logfile:+-sstdout="${logfile}"} \
    -f /dev/stdin

One possible solution is to use the fitting library from tcolorbox:

\documentclass{article}

\usepackage[
  margin=0.5in,
  papersize={8.5in,11in} 
]{geometry}

\pagestyle{empty}

\usepackage{lmodern}

\usepackage[fitting]{tcolorbox}

\newtcolorbox{mybox}{
  colback=white,
  colframe=white,
  width=\textwidth,
  fit to height=\textheight,
  halign=center,
  valign=center,
  fontupper=\sffamily, 
  fit basedim=150pt
}

\begin{document}

\begin{mybox}
  Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna. Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem.
\end{mybox}%

\end{document}