Python-docx: Find and replace all placeholder numbers in Word doc with random numbers

Question

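I'm having trouble finding and replacing every occurrence of several placeholders within the paragraphs of a Word file. It's for a gamebook, so I'm trying to substitute random entry numbers for the placeholders used while drafting the book.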

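All placeholders begin with "#" (e.g. #1-5, #22-1, etc.). Set numbers, like the first entry (which is always "1"), don't have the "#" prefix. Each placeholder entry is paired with its random counterpart as a tuple, and the pairs are zipped together inside a tuple for reference.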

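It all works great for the headings, because that is a straight one-for-one paragraph swap in order. The problem comes when I iterate through the regular paragraphs (the second-to-last bit of the code). It only seems to replace the first eight numbers, then stops. I've tried setting up a loop, but that doesn't seem to help. Not sure what I'm missing. Code is below.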

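Edit: Here is how the two lists and the reference tuple are set up. In this test, only the first entry is pre-set, and no paragraph refers back to it. All the others are to be randomized and replaced within the paragraphs.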

Entry Working: ['#1-1', '#1-2', '#1-3', '#1-4', '#1-5', '#1-6', '#1-7', '#1-8', '#2', '#2-1', '#2-2', '#2-3', '#2-4', '#2-5', '#2-6', '#2-7', '#16', '#17', '#3', '#3-1', '#3-2', '#3-3', '#3-4', '#3-5', '#3-6', '#3-8', '#3-9']

Entry Numbers: ['2', '20', '12', '27', '23', '4', '11', '16', '26', '7', '25', '5', '3', '15', '17', '6', '18', '22', '10', '21', '19', '13', '28', '8', '14', '9', '24']

Reference: (('#1-1', '2'), ('#1-2', '20'), ('#1-3', '12'), ('#1-4', '27'), ('#1-5', '23'), ('#1-6', '4'), ('#1-7', '11'), ('#1-8', '16'), ('#2', '26'), ('#2-1', '7'), ('#2-2', '25'), ('#2-3', '5'), ('#2-4', '3'), ('#2-5', '15'), ('#2-6', '17'), ('#2-7', '6'), ('#16', '18'), ('#17', '22'), ('#3', '10'), ('#3-1', '21'), ('#3-2', '19'), ('#3-3', '13'), ('#3-4', '28'), ('#3-5', '8'), ('#3-6', '14'), ('#3-8', '9'), ('#3-9', '24'))
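
For reference, the pairing described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `build_reference` and its `seed` parameter are hypothetical names, and the heading texts are passed in as a list of strings instead of being read from a .docx file. It builds new filtered lists rather than removing items from a list while looping over it, since removing during iteration can skip elements.

```python
import random


def build_reference(headings, seed=None):
    """Pair each '#'-prefixed placeholder with a unique random entry number.

    Headings without the '#' prefix are treated as pre-set entries: they keep
    their number, and that number is withheld from the random pool.
    (Hypothetical helper, not part of python-docx.)
    """
    placeholders = [h for h in headings if h.startswith("#")]
    preset = {h for h in headings if not h.startswith("#")}
    # One number per heading (1..N), minus the numbers taken by pre-set entries.
    numbers = [str(n) for n in range(1, len(headings) + 1) if str(n) not in preset]
    # A seeded generator makes the shuffle repeatable; seed=None gives a fresh order.
    random.Random(seed).shuffle(numbers)
    return tuple(zip(placeholders, numbers))


reference = build_reference(["1", "#1-1", "#1-2", "#2"], seed=0)
```

Here the pre-set entry "1" is left out of the result, and the three placeholders share the numbers 2, 3 and 4 in shuffled order.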

Thanks for any assistance.

import sys, os, random
from docx import *

entryWorking = [] # The placeholder entries created for the draft gamebook


# Identify all paragraphs with a specific heading style (e.g. 'Heading 2')
def iter_headings( paragraphs, heading ) :
    for paragraph in paragraphs :
        if paragraph.style.name.startswith( heading ) :
            yield paragraph


# Open the .docx file
document = Document( 'TestFile.docx' )


# Search document for unique placeholder entries (must have a unique heading style)
for heading in iter_headings( document.paragraphs, 'Heading 2' ) :
    entryWorking.append( heading.text )


# Create list of randomized gamebook entry numbers
entryNumbers = [ i for i in range( len ( entryWorking ) + 1 ) ]

# Remove unnecessary entry zero (extra added above to compensate)
entryNumbers.remove( 0 )

# Convert to strings
entryNumbers = [ str( x ) for x in entryNumbers ]


# Identify pre-set entries (such as Entry 1), and remove from both lists
# This avoids pre-set numbers being replaced (i.e. they remain as is in the .docx)
# Pre-set entry numbers must _not_ have the "#" prefix in the .docx
for string in entryWorking :
    if string[ 0 ] != '#' :
        entryWorking.remove( string )
        if string in entryNumbers :
            entryNumbers.remove( string )

# Shuffle new entry numbers
random.shuffle( entryNumbers )


# Create tuple list of placeholder entries paired with random entry
reference = tuple( zip( entryWorking, entryNumbers ) )


# Replace placeholder headings with assigned randomized entry
for heading in iter_headings( document.paragraphs, 'Heading 2' ) :
    for entry in reference :
        if heading.text == entry[ 0 ] :
            heading.text = entry[ 1 ]


# Search through paragraphs for placeholders and replace with randomized entry
for paragraph in document.paragraphs :
    for run in paragraph.runs :
        for entry in reference :
            if run.text == entry[ 0 ] :
                run.text = entry [ 1 ]

                        
# Save the new document with final entries
document.save('Output.docx')

Answer 1

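In Word, runs break at arbitrary locations in the text. A placeholder such as #1-5 can therefore end up split across two or more runs, in which case no single run's text equals the placeholder and a per-run comparison never matches.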
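
To illustrate how a run break can hide a placeholder, here is a small sketch that stands in for a paragraph with a plain list of strings, one per run. The split points are invented for the example; in a real document they depend on how the text was typed and edited.

```python
# Stand-in for [run.text for run in paragraph.runs]; the split points are invented.
run_texts = ["Turn to ", "#1", "-5", " if you open the door."]
placeholder = "#1-5"

# Comparing each run on its own finds nothing, because no single run
# holds the whole placeholder.
found_in_single_run = any(text == placeholder for text in run_texts)

# The paragraph text is essentially the concatenation of its runs,
# and the placeholder is visible there.
paragraph_text = "".join(run_texts)
found_in_paragraph = placeholder in paragraph_text
```

With these values the per-run comparison fails while the paragraph-level search succeeds, which is why the replacement has to work on the paragraph text and then map the match back onto the runs.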

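You might be interested in the links in this answer, which demonstrate the (surprisingly complex) work required to do this sort of thing in the general case: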

How to use python-docx to replace text in a Word document and save

There are a couple of paragraph-level functions that do a good job of this and can be found on the GitHub site for python-docx.

This one will replace a regex-match with a replacement str. The replacement string will appear formatted the same as the first character of the matched string.

This one will isolate a run such that some formatting can be applied to that word or phrase, like highlighting each occurrence of "foobar" in the text or perhaps making it bold or appear in a larger font.

Fortunately, it's usually copy-pasteable with good results :)
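
As a rough illustration of what such a replacement function has to do, the sketch below models a paragraph as a plain list of run texts so the logic can be tried without a Word file. It is not the code from the python-docx GitHub site: `replace_in_run_texts` is a hypothetical name, the `search_from` bookkeeping is an addition, and the sketch assumes the regex never matches an empty string.

```python
import re


def replace_in_run_texts(run_texts, regex, replace_str):
    """Replace every match of `regex` in a list of run texts, in place.

    Each list item stands for one run's text. The replacement lands in the
    run where the match starts; parts of the match that spill into later
    runs are trimmed from those runs.
    """
    search_from = 0
    while True:
        match = regex.search("".join(run_texts), search_from)
        if not match:
            return run_texts
        start, end = match.start(), match.end()
        # Resume after the inserted text so a replacement is never re-matched.
        search_from = start + len(replace_str)

        # Find the run in which the match starts; make offsets run-relative.
        index = 0
        while start >= len(run_texts[index]):
            start -= len(run_texts[index])
            end -= len(run_texts[index])
            index += 1

        # Put the whole replacement into that run, dropping the matched part.
        text = run_texts[index]
        run_texts[index] = text[:start] + replace_str + text[end:]
        end -= len(text)

        # Trim whatever is left of the match from the following runs.
        while end > 0:
            index += 1
            text = run_texts[index]
            run_texts[index] = text[end:]
            end -= len(text)


runs = ["Turn to ", "#1", "-5", " now."]
replace_in_run_texts(runs, re.compile(re.escape("#1-5")), "23")
```

After the call, the second run holds the replacement and the third run is left empty, so the joined text reads "Turn to 23 now." while the run count is unchanged.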

Answer 2

Thanks for the assistance, scanny!

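The last issue I found after getting it to work was that I needed to add a "#" suffix after each reference number in the document to make sure they were unique (e.g. so that the random entry for #2 was not substituted into #2-1).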
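
The collision can be reproduced on a plain string. The sketch below also shows one possible alternative to the "#" suffix: a regex with a negative lookahead that only matches a placeholder when it is not followed by a digit or hyphen. `placeholder_regex` is a hypothetical helper, and the approach assumes a placeholder is never immediately followed by a hyphen or digit in ordinary prose.

```python
import re


def placeholder_regex(placeholder):
    """Compile a pattern that matches `placeholder` only as a whole token."""
    # The lookahead rejects a match that is followed by a digit or hyphen,
    # so "#2" does not match inside "#2-1" or "#22".
    return re.compile(re.escape(placeholder) + r"(?![\d-])")


text = "Turn to #2-1, or go back to #2."

# Plain substring replacement also rewrites the start of "#2-1".
naive = text.replace("#2", "26")

# The guarded pattern leaves "#2-1" alone and replaces only the bare "#2".
guarded = placeholder_regex("#2").sub("26", text)
```

A compiled pattern of this kind could be passed as the `regex` argument of a paragraph-level replace function in place of a pattern built from the bare placeholder.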

Working code below.

import random, re
from docx import Document



# Identify all paragraphs with a specific heading style (e.g. 'Heading 2')
def iter_headings( paragraphs, heading ) :
    for paragraph in paragraphs :
        if paragraph.style.name.startswith( heading ) :
            yield paragraph



def paragraph_replace_text( paragraph, regex, replace_str ) : # Credit to scanny on GitHub
    """Return `paragraph` after replacing all matches for `regex` with `replace_str`.

    `regex` is a compiled regular expression prepared with `re.compile(pattern)`
    according to the Python library documentation for the `re` module.
    """
    
    # --- a paragraph may contain more than one match, loop until all are replaced ---
    while True :
        text = paragraph.text
        
        match = regex.search( text )

        if not match :
            break


        # --- when there's a match, we need to modify run.text for each run that
        # --- contains any part of the match-string.
        runs = iter( paragraph.runs )
        start, end = match.start(), match.end()


        # --- Skip over any leading runs that do not contain the match ---
        for run in runs :
            run_len = len( run.text )

            if start < run_len :
                break

            start, end = start - run_len, end - run_len


        # --- Match starts somewhere in the current run. Replace match-str prefix
        # --- occurring in this run with entire replacement str.
        run_text = run.text

        run_len = len( run_text )

        run.text = "%s%s%s" % ( run_text[ :start ], replace_str, run_text[ end: ] )

        end -= run_len  # --- note this is run-len before replacement ---

        # --- Remove any suffix of match word that occurs in following runs. Note that
        # --- such a suffix will always begin at the first character of the run. Also
        # --- note a suffix can span one or more entire following runs.
        for run in runs :  # --- next and remaining runs, uses same iterator ---
            if end <= 0 :
                break

            run_text = run.text

            run_len = len( run_text )

            run.text = run_text[ end: ]

            end -= run_len

    # --- optionally get rid of any "spanned" runs that are now empty. This
    # --- could potentially delete things like inline pictures, so use your judgement.
    # for run in paragraph.runs :
    #     if run.text == "" :
    #         r = run._r
    #         r.getparent().remove( r )

    return paragraph


""" NOTE: Replace 'Doc.docx' with your filename """
# Open the .docx file
document = Document( 'Doc.docx' )


# Search document for unique placeholder entries (must have a unique heading style)
entryWorking = [] # The placeholder entries created for the draft gamebook


""" NOTE: Replace 'Heading 2' with your entry number header """
for heading in iter_headings( document.paragraphs, 'Heading 2' ) :
    entryWorking.append( heading.text )


# Create list of randomized gamebook entry numbers
entryNumbers = [ i for i in range( len ( entryWorking ) + 1 ) ]


# Remove unnecessary entry zero (extra added above to compensate)
entryNumbers.remove( 0 )


# Convert to strings
entryNumbers = [ str( x ) for x in entryNumbers ]


# Identify pre-set entries (such as Entry 1), and remove from both lists
# This avoids pre-set numbers being replaced (i.e. they remain as is in the .docx)
# Pre-set entry numbers must _not_ have the "#" prefix in the .docx
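# Build new lists rather than removing items while iterating, which can skip entries
presetEntries = [ s for s in entryWorking if not s.startswith( '#' ) ]
entryWorking = [ s for s in entryWorking if s.startswith( '#' ) ]
entryNumbers = [ n for n in entryNumbers if n not in presetEntries ]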


# Shuffle new entry numbers
random.shuffle( entryNumbers )


# Create tuple list of placeholder entries paired with random entry
reference = tuple( zip( entryWorking, entryNumbers ) )


# Replace placeholder headings with assigned randomized entry
for heading in iter_headings( document.paragraphs, 'Heading 2' ) :
    for entry in reference :
        if heading.text == entry[ 0 ] :
            heading.text = entry[ 1 ]


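# Search through paragraphs for placeholders and replace with randomized entry
for paragraph in document.paragraphs :
    for entry in reference :
        if entry[ 0 ] in paragraph.text :
            # Escape the placeholder so it is matched literally, not as a regex
            regex = re.compile( re.escape( entry[ 0 ] ) )
            paragraph_replace_text( paragraph, regex, entry[ 1 ] )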

                        
# Save the new document with final entries
document.save('Output.docx')
