preg_split() producing a single-element array instead of splitting based on regex

Someone may spot this immediately, but I have gone blind to this search pattern and am not sure what I am missing.

// test string
$stringToSplit = "I awoke in the dim light of the fire pit surrounded by daunting stone walls, my chest tight and my breath stolen by the creak of the heavy oak door opposite my bed. But it wasn’t my bed; that sack of feathers and the sheets of linen were unfamiliar to me. It was the place that my captor had left me. I found it strange, despite struggling against my bonds and having the memory of the cord tearing into my flesh, that no rip, no break of the skin remained. My hands were free, though; bits of rope—severed by a knife or a sword—lay on the floor beside me.";

//test split parameters
$split = '/["’“]?(A-Z)(((Mr|Ms|Mrs|Dr|Gen|Col|Maj|Capt|Lt|Sgt|Cpl|Pvt|Hon|Jr|Sr|St|Rev|Prof)\.\s+((?!\w{2,}[.?!][’\"]?\s+["’]?[A-Z]).))?)((?![.?!]["’]?\s+["’]?[A-Z]).)[.?!…—]+["’”]?/';

//split based on parameters
$splitText = preg_split($split, $stringToSplit);

//return split text
print_r($splitText);

Current output:

Array ( [0] => I awoke in the dim light of the fire pit surrounded by daunting stone walls, my chest tight and my breath stolen by the creak of the heavy oak door opposite my bed. But it wasn’t my bed; that sack of feathers and the sheets of linen were unfamiliar to me. It was the place that my captor had left me. I found it strange, despite struggling against my bonds and having the memory of the cord tearing into my flesh, that no rip, no break of the skin remained. My hands were free, though; bits of rope—severed by a knife or a sword—lay on the floor beside me. )

Desired output:

Array ( [0] => I awoke in the dim light of the fire pit surrounded by daunting stone walls, my chest tight and my breath stolen by the creak of the heavy oak door opposite my bed.
[1] => But it wasn’t my bed; that sack of feathers and the sheets of linen were unfamiliar to me.
[2] => It was the place that my captor had left me.
[3] => I found it strange, despite struggling against my bonds and having the memory of the cord tearing into my flesh, that no rip, no break of the skin remained.
[4] => My hands were free, though; bits of rope—severed by a knife or a sword—lay on the floor beside me. )

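The regex is complex because it is meant to find patterns that correctly split any string in the text, without getting hung up on abbreviations and other endings that are not the true end of a sentence. Not all of the rules apply to the sample text, but I need them in order to parse any given sample.

As it stands, the code returns a single key/value pair: key 0, whose value is the entire unsplit string.

Edited to add: for clarity, I am adding a larger text sample that shows the reason for some of the rules in the regex string.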

$stringToSplit = "I awoke in the dim light of the fire pit surrounded by daunting stone walls, my chest tight and my breath stolen by the creak of the heavy oak door opposite my bed. But it wasn’t my bed; that sack of feathers and the sheets of linen were unfamiliar to me. It was the place that my captor had left me. I found it strange, despite struggling against my bonds and having the memory of the cord tearing into my flesh, that no rip, no break of the skin remained. My hands were free, though; bits of rope—severed by a knife or a sword—lay on the floor beside me. They must be sure that I won’t… can’t escape. “Good,” my captor said, stepping the rest of the way into the room. “You’ve awakened.” The way he said it sent tingles racing along my skin. Whereas I considered waking up a trivial matter, this man seemed to reflect upon the act with some reverence. The man’s cloak, his cowl draped over his hair and forehead, matched the drab gray of my prison’s walls, and a shadow cast over his face made it impossible to distinguish any of his features. His eyes, though, were obvious, and they must have caught the firelight because they glowed pale blue. “My family…” I started, inching away as if I could escape through the stone at my back. “They’ll pay whatever ransom you ask. Please, I beg—” “You waste your breath.” The man approached, but he stopped at the table halfway and lay upon it folded cloth. “I am not the one who keeps you here.” “But you serve him… her? You must reason with your master—” “I must do nothing,” he replied, laughing. “And your family might not want you in your condition. Have you smelled yourself lately?” “No,” I said flatly, and it wasn’t until the man had said something that I noticed I couldn’t smell the wood roasting in the fireplace, or anything else for that matter. My whole body was numb except for my head, which still ached. I recalled that he had bashed me in the head with a club, but I couldn’t piece together much else. 
“Why are you keeping me here?” “You’ll see.” He gestured at the table. “I suggest you change.” And he closed the door behind him. I stood there for a time, consumed with loathing and hatred for the man. I glanced at the fire and then at the table. When I studied the door from where I stood, I realized that it had no lock, and the place seemed unlike any cell I’d ever seen. No prisoner, for all I knew, had ever been treated to his own fireplace, stuffed mattress, or wash basin. And so, believing my chances of escape slim and without any available options, I stripped the tattered clothes from my body. The shirt—the one my father had bought for me, the fine silk one—couldn’t be salvaged. The pants, too, were in ribbons and came off easily.";

If you want to split your string on every period (as in the example you showed), but not when the period is preceded by Mr|Ms|Mrs..., you can do this:

$stringToSplit = "I awoke in the dim light of the fire pit surrounded by daunting stone walls, my chest tight and my breath stolen by the creak of the heavy oak door opposite my bed. But it wasn’t my bed; that sack of feathers and the sheets of linen were unfamiliar to me. It was the place that my captor had left me. I found it strange, despite struggling against my bonds and having the memory of the cord tearing into my flesh, that no rip, no break of the skin remained. My hands were free, though; bits of rope—severed by a knife or a sword—lay on the floor beside me. They must be sure that I won’t… can’t escape. “Good,” my captor said, stepping the rest of the way into the room. “You’ve awakened.” The way he said it sent tingles racing along my skin. Whereas I considered waking up a trivial matter, this man seemed to reflect upon the act with some reverence. The man’s cloak, his cowl draped over his hair and forehead, matched the drab gray of my prison’s walls, and a shadow cast over his face made it impossible to distinguish any of his features. His eyes, though, were obvious, and they must have caught the firelight because they glowed pale blue. “My family…” I started, inching away as if I could escape through the stone at my back. “They’ll pay whatever ransom you ask. Please, I beg—” “You waste your breath.” The man approached, but he stopped at the table halfway and lay upon it folded cloth. “I am not the one who keeps you here.” “But you serve him… her? You must reason with your master—” “I must do nothing,” he replied, laughing. “And your family might not want you in your condition. Have you smelled yourself lately?” “No,” I said flatly, and it wasn’t until the man had said something that I noticed I couldn’t smell the wood roasting in the fireplace, or anything else for that matter. My whole body was numb except for my head, which still ached. I recalled that he had bashed me in the head with a club, but I couldn’t piece together much else. 
“Why are you keeping me here?” “You’ll see.” He gestured at the table. “I suggest you change.” And he closed the door behind him. I stood there for a time, consumed with loathing and hatred for the man. I glanced at the fire and then at the table. When I studied the door from where I stood, I realized that it had no lock, and the place seemed unlike any cell I’d ever seen. No prisoner, for all I knew, had ever been treated to his own fireplace, stuffed mattress, or wash basin. And so, believing my chances of escape slim and without any available options, I stripped the tattered clothes from my body. The shirt—the one my father had bought for me, the fine silk one—couldn’t be salvaged. The pants, too, were in ribbons and came off easily.";

$split = preg_split('/(?:(?<!Mr|Ms|Mrs|Dr|Gen|Col|Maj|Capt|Lt|Sgt|Cpl|Pvt|Hon|Jr|Sr|St|Rev|Prof)\.|[!?)"])/', iconv('UTF-8', 'ASCII//TRANSLIT', $stringToSplit));

var_dump(array_filter(array_map('trim', $split))); // array_map('trim', ...) strips surrounding whitespace; array_filter() then removes the empty elements

Edit: to split on periods, but not when the period is preceded by Mr|Ms|Mrs..., simply use a regex negative lookbehind.
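As an illustration of the negative lookbehind on its own, here is a minimal sketch. The sample sentence and the shortened title list are assumptions made for brevity, not part of the original code; the full list from the pattern above would slot into the same place.

```php
<?php
// Hypothetical sample sentence, not taken from the question.
$sample = 'Mr. Smith met Dr. Jones. They talked. Then they left.';

// Shortened title list (assumption). Each alternative has a fixed length,
// which is what PCRE expects inside a lookbehind.
$pattern = '/(?<!Mr|Mrs|Dr)\./';

// Split on every period that is not directly preceded by one of the titles,
// trim each piece, drop empty strings, and reindex the array from 0.
$pieces = preg_split($pattern, $sample);
$parts = array_values(array_filter(array_map('trim', $pieces), 'strlen'));

print_r($parts);
// Three elements: "Mr. Smith met Dr. Jones", "They talked", "Then they left"
```

Because the period itself is the delimiter here, it is consumed by the split and the pieces come back without their final punctuation.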

Let me know if this works for you now.

I simplified your regex a bit. I also used a negative lookbehind, which may not be supported if you run it in a browser.

You can try this, though.

(?<!Mr|Ms|Mrs|Dr|Gen|Col|Maj|Capt|Lt|Sgt|Cpl|Pvt|Hon|Jr|Sr|St|Rev|Prof)(?<!["”“'])[.!?]+(?!["“”'])

Tested against the larger text sample in Google Chrome v76.0.3809.132; everything works.

Features:

  • Matches dots
  • Does not match dots after Mr, Ms, etc.
  • Does not match dots directly next to a quotation mark (", “, ” or ')
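The three features can also be checked from PHP. The following is a minimal sketch, assuming the u modifier (so the curly quotes are read as UTF-8 characters), a shortened title list, and a hypothetical sample string; none of these come from the original answer.

```php
<?php
// Hypothetical sample, not taken from the question.
$sample = 'Mr. Fox ran. “Stop.” Why?!';

// Shortened title list (assumption). The u modifier tells PCRE to read the
// pattern and the subject as UTF-8, so each curly quote is one character.
$pattern = '/(?<!Mr|Ms|Mrs|Dr)(?<!["”“\'])[.!?]+(?!["“”\'])/u';

preg_match_all($pattern, $sample, $matches);

// $matches[0] lists every run of sentence-ending punctuation the pattern accepts.
print_r($matches[0]);
// Two matches: "." (after "ran") and "?!" (after "Why")
```

In this sample the dot after Mr and the dot just before the closing quote are skipped, while the other two punctuation runs are matched.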

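Edit:

A solution that keeps the delimiters is to wrap the dot match, together with its negative lookbehind, in a positive lookbehind.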

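// The u modifier makes PCRE treat the pattern and the subject as UTF-8, so the curly quotes are single characters.
$regex = "/(?<=(?<!Mr|Ms|Mrs|Dr|Gen|Col|Maj|Capt|Lt|Sgt|Cpl|Pvt|Hon|Jr|Sr|St|Rev|Prof)(?<![\"”“'])[!?.](?![!?.])(?![\"“”']))/u";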

$subject = "your text here";

$result = preg_split($regex, $subject, 0, PREG_SPLIT_NO_EMPTY);
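To show what the delimiter-preserving split returns, here is a minimal sketch on a short input. The sample text, the shortened title list, and the u modifier are assumptions for illustration only.

```php
<?php
// Hypothetical sample text, not taken from the question.
$subject = 'Dr. Grey woke up. “Good,” he said. Was it late? He left!';

// Shortened title list (assumption). The whole punctuation match sits inside a
// positive lookbehind, so the split point is zero-width and nothing is consumed.
$regex = '/(?<=(?<!Mr|Ms|Mrs|Dr)(?<!["”“\'])[!?.](?![!?.])(?!["“”\']))/u';

// PREG_SPLIT_NO_EMPTY drops the empty piece produced by the split point at the
// very end of the string; trim() removes the leading space of each sentence.
$result = array_map('trim', preg_split($regex, $subject, 0, PREG_SPLIT_NO_EMPTY));

print_r($result);
// Four elements: "Dr. Grey woke up.", "“Good,” he said.", "Was it late?", "He left!"
```

Each sentence keeps its closing punctuation, and the period after Dr does not start a new element.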