Adding a Custom ID with Python & Regex

Question

I have a Markdown document, and I want to add a custom ID to each city entry. The basic layout of the document is:

#Country

## StateA

### CityA
#### Population
#### Government
#### History

### CityB
#### Population
#### Government
#### History

## StateB

### CityA
#### Population
#### Government
#### History

### CityB
#### Population
#### Government
#### History

I want to give each city a custom ID that includes a counter. For example, the IDs would look like:

#USA

## FL

### US_FL_00001
### US_FL_00002
### US_FL_00003

## GA

### US_GA_00001
### US_GA_00002
### US_GA_00003

I know it's relatively simple to select the city '###' headers with re.findall() and re.sub(), but how can I extract the state and keep a running counter for the IDs?

Answer 1

It looks like your sample input and sample output may differ, so I based my answer on your sample output. You can adjust it to your needs.

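The idea is to read in the input file and check each line to see whether it is a country, state, or city heading. The current country and state are remembered, the counter is reset at every new state, and each city heading is written to a new file as an ID built from the country, the state, and the counter. The '####' sub-headings are not copied to the output.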

import re

with open('input.md', 'r') as f:
    # read in the original file
    text = f.readlines()

# open the output file and loop through the original data
with open('output.md', 'w') as o:
    country_counter = counter = 0
    for line in text:
        # get the country
        m = re.match(r'^#([A-Za-z]+)', line)
        if m:
            country = m.group(1)
            # this checks to see if it is the first country
            # in the file. If so then we don't want the leading
            # newline characters
            if country_counter == 0:
                o.write(f'#{country}')
            else:
                o.write(f'\n\n#{country}')
            country_counter += 1

        # get the state
        m = re.match(r'^##\s([A-Za-z]+)', line)
        if m:
            state = m.group(1)
            # reset the counter
            counter = 0
            o.write(f'\n\n## {state}\n')

        # get the city
        m = re.match(r'^###\s([A-Za-z]+)', line)
        if m:
            # increase the counter and output the results;
            # the counter is zero-padded to 5 digits.
            # The ID uses the country name as written in the file
            # (e.g. 'USA'), not the city name.
            counter += 1
            o.write(f'\n### {country}_{state}_{counter:05}')
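If you would rather stay with the re.sub() approach mentioned in the question, a callable replacement can carry the current state and the counter from one match to the next. This is a minimal sketch, not a drop-in replacement for the loop above: the function name `add_city_ids` and the `country_code` parameter are hypothetical, and it assumes you pass the short country code (e.g. 'US') yourself, because the input only contains the full country name. Unlike the loop above, it keeps the '####' sub-headings and the rest of the text unchanged.

```python
import re

# Matches "#Country", "## State" or "### City" headers: one to three '#',
# optional spaces/tabs, then a name that does not start with '#'.
# "#### ..." lines never match, because after at most three '#' the next
# character would still be '#'.
HEADER_RE = re.compile(r'^(#{1,3})[ \t]*([^#\s].*)$', re.MULTILINE)


def add_city_ids(text, country_code):
    """Replace each '### City' header with '### <code>_<state>_<NNNNN>'.

    country_code is supplied by the caller (hypothetical parameter).
    The counter restarts at 1 after every '## State' header. A city that
    appears before any state header would get 'None' as its state.
    """
    state = None
    counter = 0

    def replace(m):
        nonlocal state, counter
        level, name = m.group(1), m.group(2).strip()
        if level == '##':
            # new state: remember it and reset the counter
            state = name
            counter = 0
            return m.group(0)
        if level == '###':
            # city header: swap the name for the zero-padded ID
            counter += 1
            return f'### {country_code}_{state}_{counter:05}'
        # country header: leave it as-is
        return m.group(0)

    # re.sub calls replace() for each match from left to right,
    # so the state seen last is the one the next city belongs to.
    return HEADER_RE.sub(replace, text)


if __name__ == '__main__':
    sample = '#USA\n\n## FL\n\n### CityA\n#### Population\n\n### CityB\n\n## GA\n\n### CityA\n'
    print(add_city_ids(sample, 'US'))
```

Because the counter lives in the enclosing function (via `nonlocal`), each call to `add_city_ids` starts fresh, so you can run it on several files without leftover state.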
