MATLAB: Simple string analysis - Find locations

Question

Here is an example modeled on a literary work that I would like to run a simple analysis on. Note the different parts:

str =   "Random info - at beginning-man. "+ ...
        "Random info still continues. "+ ...
        "CHAPTER 1. " + ...
        "Random info in middle one, "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence one of verse one, "+ ...
        "This still sentence one of verse one. "+ ...
        "2 This is sentence one of verse two. "+ ...
        "This is sentence two of verse two. "+ ...
        "3 This is sentence one of verse three; "+ ...
        "this still sentence one of verse three. "+ ...
        "CHAPTER 2. " + ...
        "Random info in middle two. "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence four? "+ ...
        "2 This is sentence five, "+ ...
        "3 this still sentence five but verse three!"+ ...
        "Random info at end's end."+ ...
        "Random info still continues. ";

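I am interested in all the data that can be called "Random info in middle", which comes after the chapter name and before the verses start.

I want to use the extractBetween function to extract the info between "CHAPTER #" and "1" (the first verse).

I know how to use extractBetween, but how do I determine the boundary positions (the end of each "CHAPTER #" and the start of the "1" that follows it) for an arbitrary number of chapters?

In the end I would like a result like this, with each chapter's random info placed in a table:

I tried regexp() and findstr(), but neither worked. Any help would be appreciated. Thanks!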

Answer 1

You can use a regular expression with regexp to match the text.

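% 'tokens' returns the captured groups, 'match' the full matched text
[tokens, matches] = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens', 'match');

for k = 1:numel(tokens)
    % tokens{k} holds the two captured groups for the k-th chapter
    fprintf('%s\t%s\n', tokens{k}(1), tokens{k}(2));
    % or: fprintf('%s\t%s\n', tokens{k});
end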

This prints:

CHAPTER 1   Random info in middle one, Random info still continues. 
CHAPTER 2   Random info in middle two. Random info still continues.
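
Since the question asks for the result as a table, the tokens can be gathered into one. The following is only a minimal sketch: it assumes str is a string scalar (so each tokens{k} is a 1-by-2 string array), it uses a shortened sample text instead of the full str from the question, and the column names Chapter and Info are made up for illustration.

```matlab
% Shortened sample text (an assumption for this sketch; the question's str
% should behave the same way)
str = "Intro. CHAPTER 1. Random info in middle one, Random info still continues. " + ...
      "1 Verse one. 2 Verse two. CHAPTER 2. Random info in middle two. " + ...
      "Random info still continues. 1 Verse one again.";

tokens = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens');

% Stack the 1-by-2 string arrays into an N-by-2 string array
parts = vertcat(tokens{:});

% Hypothetical column names; strtrim drops the trailing space of each capture
T = table(parts(:,1), strtrim(parts(:,2)), 'VariableNames', {'Chapter', 'Info'});
disp(T)
```

If str were a character vector instead, each tokens{k} would be a cell array, and the stacking step would need to be adapted.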

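Explanation of the regular expression (CHAPTER \d)\.\s*(.*?)1:

(CHAPTER \d) matches CHAPTER followed by any single digit; the surrounding () parentheses capture the match into the tokens variable.
\. matches the period.
\s* matches any whitespace that may follow.
(.*?)1 captures any text up to the next 1 in the text. Note that the question mark makes the match lazy; otherwise it would match everything up to the last 1 in str.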
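
The question also asked how to find the positions to pass to extractBetween. One possible way, sketched here as an alternative and not part of the answer above, is to let regexp return indices through its 'end' and 'start' options. The sketch assumes that every chapter heading is eventually followed by a standalone "1" surrounded by whitespace, uses \d+ so that multi-digit chapter numbers also match, and again works on a shortened sample text. The variable names chapEnd, verseStart and info are illustrative.

```matlab
% Shortened sample text (an assumption for this sketch)
str = "Intro. CHAPTER 1. Random info in middle one. 1 Verse one. 2 Verse two. " + ...
      "CHAPTER 2. Random info in middle two. 1 Verse one again.";

% Index of the last character (the period) of each "CHAPTER n." heading
chapEnd = regexp(str, 'CHAPTER \d+\.', 'end');
% Index of the whitespace just before each standalone "1"
verseStart = regexp(str, '\s1\s', 'start');

info = strings(numel(chapEnd), 1);
for k = 1:numel(chapEnd)
    % First verse marker located after this chapter heading
    stop = verseStart(find(verseStart > chapEnd(k), 1));
    % Numeric positions are inclusive; strtrim removes the boundary spaces
    info(k) = strtrim(extractBetween(str, chapEnd(k) + 1, stop));
end
disp(info)
```

Requiring whitespace around the "1" avoids stopping at a 1 embedded in a number such as 10, but a free-standing 1 inside the random info would still end the capture early, so the pattern may need tightening for real text.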
