Regex with uppercase words and dash

Question

Given text strings like the following:

wikiradio 27/09/2012 - LE QUATTRO GIORNATE DI NAPOLI raccontate da Ida Gribaudi

wikiradio 10/04/2013 - DAG HAMMARSKJOLD raccontato da Susanna Pesenti

I am using a regex to match the UPPERCASE WORDS of each string (that is, "LE QUATTRO GIORNATE DI NAPOLI" and "DAG HAMMARSKJOLD"). My code looks like this:

$title = $_GET["title"];
if (preg_match_all('/\b(?=[A-Z])[A-Z\' ]+(?=\W)/', $title, $match)) {
    // process matched portion...
}

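It almost always works, but when the $title string contains an apostrophe followed by a space, or a dash, it doesn't. For example, the uppercase words in these two titles are not matched: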

wikiradio 11/02/2014 - L'ABBE' PIERRE raccontato da Giovanni Anversa

wikiradio 22/12/2015 - JEAN-MICHEL BASQUIAT raccontato da Costantino D'Orazio

What am I missing?

Answer 1

Something like this may work for you:

\b[A-Z].*?(?= [a-z])

Regex online demo

Legend

    \b         # regex word boundary [1]
    [A-Z]      # any single uppercase letter
    .*?        # any char, repeated zero or more times in lazy mode
    (?= [a-z]) # matches when the next 2 chars are a space and any single lowercase letter

[1] a regex word boundary matches between a regex word char '\w' (i.e. [a-zA-Z0-9_])
    and a non-word char '\W' ([^a-zA-Z0-9_]), or at the start/end of the string
    when the adjacent char is a word char (zero-width, just like '^' and '$')
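
To see the pattern at work on the four titles from the question, here is a minimal PHP sketch. The helper name `extractUppercaseTitle` is hypothetical (it is not part of the original answer), and it uses `preg_match` rather than `preg_match_all`, on the assumption that only the first uppercase run per title is wanted.

```php
<?php
// Hypothetical helper: returns the first run that starts with an uppercase
// letter and ends right before " <lowercase letter>", or null if none exists.
function extractUppercaseTitle(string $title): ?string
{
    return preg_match('/\b[A-Z].*?(?= [a-z])/', $title, $m) === 1 ? $m[0] : null;
}

// Sample titles copied from the question.
$titles = [
    "wikiradio 27/09/2012 - LE QUATTRO GIORNATE DI NAPOLI raccontate da Ida Gribaudi",
    "wikiradio 10/04/2013 - DAG HAMMARSKJOLD raccontato da Susanna Pesenti",
    "wikiradio 11/02/2014 - L'ABBE' PIERRE raccontato da Giovanni Anversa",
    "wikiradio 22/12/2015 - JEAN-MICHEL BASQUIAT raccontato da Costantino D'Orazio",
];

foreach ($titles as $title) {
    echo extractUppercaseTitle($title), PHP_EOL;
}
// Prints:
// LE QUATTRO GIORNATE DI NAPOLI
// DAG HAMMARSKJOLD
// L'ABBE' PIERRE
// JEAN-MICHEL BASQUIAT
```

Because `.*?` accepts any character, apostrophes and dashes need no special handling. The trade-off is that the match depends on a space plus a lowercase letter following the uppercase run, so a title that ends right after the uppercase words would not match.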

Code demo on ideone

Update

An updated version that uses a character whitelist (we don't know whether it covers every possible character):

(?m)\b[A-Z][A-Z '-]*(?= |$)

Online demo of the updated version
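
The whitelist pattern can be tried from PHP in the same way. This is only a sketch: the helper name `extractWhitelistedTitle` and the last sample title (one that ends right after the uppercase words) are assumptions added for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: matches an uppercase letter followed by any mix of
// uppercase letters, spaces, apostrophes and hyphens, ending before a space
// or at the end of a line. Returns null when nothing matches.
function extractWhitelistedTitle(string $title): ?string
{
    return preg_match('/(?m)\b[A-Z][A-Z \'-]*(?= |$)/', $title, $m) === 1 ? $m[0] : null;
}

$titles = [
    "wikiradio 11/02/2014 - L'ABBE' PIERRE raccontato da Giovanni Anversa",
    "wikiradio 22/12/2015 - JEAN-MICHEL BASQUIAT raccontato da Costantino D'Orazio",
    // Assumed extra input: nothing follows the uppercase words.
    "wikiradio 10/04/2013 - DAG HAMMARSKJOLD",
];

foreach ($titles as $title) {
    echo extractWhitelistedTitle($title), PHP_EOL;
}
// Prints:
// L'ABBE' PIERRE
// JEAN-MICHEL BASQUIAT
// DAG HAMMARSKJOLD
```

The class `[A-Z '-]` is greedy and can swallow the trailing space, but the lookahead `(?= |$)` forces a backtrack so the match ends on the last uppercase word. Characters outside the whitelist, such as digits or accented capitals, are not covered by this pattern.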


regex

hyphen

preg-match

uppercase