What is the best regex for a mix of capitalised words and symbols?

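I am trying to organise the following list: list of gaming genres

I want to separate the joined words, but my attempt does not handle capitalised acronyms correctly (e.g. PVP, MMORPG, MOBA, DeFi).

Currently, my regex code is as follows: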

[re.sub(r"(\w)([A-Z])", r"\1 \2", ele) for ele in genre_list]

As you can see below, it works for some entries but not for others:

['Collectible Open-World Virtual-World', 'Breeding Card PV P', 'Auto-Battler Breeding Strategy', 'Minigame Open-World Virtual-World', 'Action Simulation Sports', 'Adventure MM OStrategy', 'Adventure Casual Puzzle', 'Sports', 'Collectible Sci-Fi Virtual-World', 'Battle-Royalee Sports MO BA', 'Action PV PShooter', 'P VP Sci-Fi Tower-Defense', 'Action Battle-Royale', 'P VP Sci-Fi Shooter', 'Breeding Collectible Mining', 'Collectible De Fie Sports', 'Action Adventure Shooter', 'City-Building Collectible Simulation', 'Action Strategy', 'Adventure Open-World', 'Breeding Racing Sports', 'Open-World Virtual-World', 'Collectible Idle', 'Action Adventure', 'Card Collectible PV P', 'Battle-Royale Fantasy MO BA', 'City-Building', 'Building MM OStrategy', 'Adventure MM OR PG', 'Action Adventure Idle', 'M OB AR PG Strategy', 'M MO RP GStrategy', 'Card Collectible Idle', 'Open-World PV PR PG', 'De Fi MM OSpace', 'Collectible', 'Card Collectible PV P', 'Auto-Battler De Fi RP G', 'Adventure MM OOpen-World', 'Collectible Open-World Virtual-World', 'Collectible Idle RP G', 'Card Collectible PV P', 'Action Adventure PV P', 'Sci-Fi Shooter Survival', 'Action Strategy', 'Arcade Minigame', 'Breeding PV PRacing', 'M OB AP VP', 'Action Sports', 'P VP Space Turn-based', 'M MO Strategy Tower-Defense']

Can you help me figure out which regex would work best here? Or is regex not suitable for this list? Thanks!

Edit based on the comments: you just need to add a "+" after [A-Z], i.e. r"(\w)([A-Z]+)". This matches one or more uppercase letters.
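As a quick check of what the "+" variant does, the sketch below applies it to a few of the joined strings. The helper name split_plus is hypothetical, and the replacement r"\1 \2" is an assumption inferred from the outputs shown in the question.

```python
import re

def split_plus(s):
    # Put a space between a word character and the run of capitals after it.
    # The replacement r"\1 \2" is assumed; it is not given in the suggestion.
    return re.sub(r"(\w)([A-Z]+)", r"\1 \2", s)

for s in ["BreedingCardPVP", "AdventureMMOStrategy", "PVPSci-FiShooter"]:
    print(s, "=>", split_plus(s))
```

A trailing acronym such as the PVP in the first string stays whole. An acronym that is followed directly by a capitalised word, or that starts the string, still seems to come out wrong, so this pattern may not be enough on its own.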

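This is difficult because you have all-caps words that can end up glued together. If you have a list of those words, the problem is solvable.

You can use and extend the following code to get more accurate output: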

import re
l = ['Collectible Open-World Virtual-World', 'Breeding Card PV P', 'Auto-Battler Breeding Strategy', 'Minigame Open-World Virtual-World', 'Action Simulation Sports', 'Adventure MM OStrategy', 'Adventure Casual Puzzle', 'Sports', 'Collectible Sci-Fi Virtual-World', 'Battle-Royalee Sports MO BA', 'Action PV PShooter', 'P VP Sci-Fi Tower-Defense', 'Action Battle-Royale', 'P VP Sci-Fi Shooter', 'Breeding Collectible Mining', 'Collectible De Fie Sports', 'Action Adventure Shooter', 'City-Building Collectible Simulation', 'Action Strategy', 'Adventure Open-World', 'Breeding Racing Sports', 'Open-World Virtual-World', 'Collectible Idle', 'Action Adventure', 'Card Collectible PV P', 'Battle-Royale Fantasy MO BA', 'City-Building', 'Building MM OStrategy', 'Adventure MM OR PG', 'Action Adventure Idle', 'M OB AR PG Strategy', 'M MO RP GStrategy', 'Card Collectible Idle', 'Open-World PV PR PG', 'De Fi MM OSpace', 'Collectible', 'Card Collectible PV P', 'Auto-Battler De Fi RP G', 'Adventure MM OOpen-World', 'Collectible Open-World Virtual-World', 'Collectible Idle RP G', 'Card Collectible PV P', 'Action Adventure PV P', 'Sci-Fi Shooter Survival', 'Action Strategy', 'Arcade Minigame', 'Breeding PV PRacing', 'M OB AP VP', 'Action Sports', 'P VP Space Turn-based', 'M MO Strategy Tower-Defense']
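# Undo the earlier (incorrect) splitting by removing all whitespace.
l = [''.join(s.split()) for s in l]
# Known all-caps words that may be glued to each other or to other words.
allcaps = ['RPG', 'MOBA', 'PVP', 'MMO']
# Lowercase before uppercase, or uppercase before an uppercase+lowercase pair.
rx_1 = re.compile(r'[a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z])')
# An all-caps word at a word boundary that is directly followed by a letter.
rx_2 = re.compile( fr"\b(?:{r'|'.join(allcaps)})(?=[A-Za-z])" )
# An all-caps word at the end of a word that is directly preceded by a letter.
rx_3 = re.compile( fr"(?<=[A-Za-z])(?:{r'|'.join(allcaps)})\b" )
for s in l:
    print( r'{} => {}'.format(s, rx_3.sub(r" \g<0>", rx_2.sub(r"\g<0> ", rx_1.sub(r"\g<0> ", s)))) )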

Python demo. Output:

CollectibleOpen-WorldVirtual-World => Collectible Open-World Virtual-World
BreedingCardPVP => Breeding Card PVP
Auto-BattlerBreedingStrategy => Auto-Battler Breeding Strategy
MinigameOpen-WorldVirtual-World => Minigame Open-World Virtual-World
ActionSimulationSports => Action Simulation Sports
AdventureMMOStrategy => Adventure MMO Strategy
AdventureCasualPuzzle => Adventure Casual Puzzle
Sports => Sports
CollectibleSci-FiVirtual-World => Collectible Sci-Fi Virtual-World
Battle-RoyaleeSportsMOBA => Battle-Royalee Sports MOBA
ActionPVPShooter => Action PVP Shooter
PVPSci-FiTower-Defense => PVP Sci-Fi Tower-Defense
ActionBattle-Royale => Action Battle-Royale
PVPSci-FiShooter => PVP Sci-Fi Shooter
BreedingCollectibleMining => Breeding Collectible Mining
CollectibleDeFieSports => Collectible De Fie Sports
ActionAdventureShooter => Action Adventure Shooter
City-BuildingCollectibleSimulation => City-Building Collectible Simulation
ActionStrategy => Action Strategy
AdventureOpen-World => Adventure Open-World
BreedingRacingSports => Breeding Racing Sports
Open-WorldVirtual-World => Open-World Virtual-World
CollectibleIdle => Collectible Idle
ActionAdventure => Action Adventure
CardCollectiblePVP => Card Collectible PVP
Battle-RoyaleFantasyMOBA => Battle-Royale Fantasy MOBA
City-Building => City-Building
BuildingMMOStrategy => Building MMO Strategy
AdventureMMORPG => Adventure MMO RPG
ActionAdventureIdle => Action Adventure Idle
MOBARPGStrategy => MOBA RPG Strategy
MMORPGStrategy => MMO RPG Strategy
CardCollectibleIdle => Card Collectible Idle
Open-WorldPVPRPG => Open-World PVP RPG
DeFiMMOSpace => De Fi MMO Space
Collectible => Collectible
CardCollectiblePVP => Card Collectible PVP
Auto-BattlerDeFiRPG => Auto-Battler De Fi RPG
AdventureMMOOpen-World => Adventure MMO Open-World
CollectibleOpen-WorldVirtual-World => Collectible Open-World Virtual-World
CollectibleIdleRPG => Collectible Idle RPG
CardCollectiblePVP => Card Collectible PVP
ActionAdventurePVP => Action Adventure PVP
Sci-FiShooterSurvival => Sci-Fi Shooter Survival
ActionStrategy => Action Strategy
ArcadeMinigame => Arcade Minigame
BreedingPVPRacing => Breeding PVP Racing
MOBAPVP => MOBA PVP
ActionSports => Action Sports
PVPSpaceTurn-based => PVP Space Turn-based
MMOStrategyTower-Defense => MMO Strategy Tower-Defense

The [a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]) regex (see its demo) matches

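  • [a-z](?=[A-Z]) - a lowercase letter immediately followed by an uppercase letter
  • | - or
  • [A-Z](?=[A-Z][a-z]) - an uppercase letter followed by an uppercase letter and then a lowercase letter.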

We add a space after each of these matches.
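To see this first step in isolation, here is a minimal sketch that applies only the rx_1 pattern. The function name add_spaces is hypothetical.

```python
import re

# Same pattern as rx_1 in the code above.
rx_1 = re.compile(r"[a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z])")

def add_spaces(s):
    # \g<0> is the whole match, so this appends a space to the matched letter.
    return rx_1.sub(r"\g<0> ", s)

for s in ["AdventureMMOStrategy", "BreedingCardPVP", "MMORPGStrategy"]:
    print(s, "=>", add_spaces(s))
```

Adjacent acronyms such as MMORPG stay glued together at this stage, which is why the list of all-caps words is needed for the remaining steps.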

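The rx_2 and rx_3 regexes are built from the list of ALLCAPS words. rx_2 adds a space to the right of such a word when a letter follows it, and rx_3 adds a space to the left when a letter precedes it.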
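The three steps can also be wrapped into a reusable function, sketched below. The names build_splitter, camel, before and after are hypothetical. Escaping the words with re.escape and sorting them longest first are additions not present in the code above; they are meant as a precaution in case the list later contains overlapping entries or regex metacharacters.

```python
import re

def build_splitter(acronyms):
    # Longest first, so a longer entry would be tried before a shorter prefix of it.
    alt = "|".join(re.escape(a) for a in sorted(acronyms, key=len, reverse=True))
    camel = re.compile(r"[a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z])")   # same as rx_1
    before = re.compile(rf"\b(?:{alt})(?=[A-Za-z])")            # same role as rx_2
    after = re.compile(rf"(?<=[A-Za-z])(?:{alt})\b")            # same role as rx_3

    def split(s):
        s = camel.sub(r"\g<0> ", s)      # space after camel-case boundaries
        s = before.sub(r"\g<0> ", s)     # space after a leading acronym
        return after.sub(r" \g<0>", s)   # space before a trailing acronym

    return split

split = build_splitter(["RPG", "MOBA", "PVP", "MMO"])
for s in ["MOBARPGStrategy", "Open-WorldPVPRPG", "MOBAPVPRPG"]:
    print(s, "=>", split(s))
```

Mixed-case terms such as DeFi are still split into "De Fi" by the first step, as the output above shows, so they would need separate handling.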