MATLAB: Simple string analysis - Find locations

Here is an example of a literary text that I would like to run a simple analysis on. Note the different sections:

str =   "Random info - at beginning-man. "+ ...
        "Random info still continues. "+ ...
        "CHAPTER 1. " + ...
        "Random info in middle one, "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence one of verse one, "+ ...
        "This still sentence one of verse one. "+ ...
        "2 This is sentence one of verse two. "+ ...
        "This is sentence two of verse two. "+ ...
        "3 This is sentence one of verse three; "+ ...
        "this still sentence one of verse three. "+ ...
        "CHAPTER 2. " + ...
        "Random info in middle two. "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence four? "+ ...
        "2 This is sentence five, "+ ...
        "3 this still sentence five but verse three!"+ ...
        "Random info at end's end."+ ...
        "Random info still continues. ";

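I am interested in all the data that could be called "random info in middle", i.e. the text that comes after a chapter heading and before the verses start.

I want to use the extractBetween function to extract the text between "CHAPTER #" and "1" (the first verse).

I know how to use extractBetween, but how do I find the position of "CHAPTER #" before the text and of "1" (the first verse) after it, for an arbitrary number of chapters?

In the end I would like a result where the random info for each chapter is stored in a table.

I tried regexp() and findstr(), but neither worked. Any help would be greatly appreciated. Thank you!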

You can use regexp with a regular expression to match the text.

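% 'tokens' returns the captured groups; 'match' returns the full matched text.
[tokens, matches] = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens', 'match');

% str is a string scalar, so each tokens{k} is a 1-by-2 string array:
% the chapter heading and the text that follows it.
for k = 1:numel(tokens)
    fprintf('%s\t%s\n', tokens{k}(1), tokens{k}(2));
    % or: fprintf('%s\t%s\n', tokens{k});
end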

This prints:

CHAPTER 1   Random info in middle one, Random info still continues. 
CHAPTER 2   Random info in middle two. Random info still continues. 
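
Since the question asks for a table, here is a minimal sketch of how the tokens could be collected into one. It reuses the pattern above on a shortened stand-in for the original text. The stand-in string, the variable names parts and T, and the column names Chapter and RandomInfo are my own assumptions, not something from the question.

```matlab
% Shortened stand-in for the text in the question (assumption: same layout).
str = "Intro text. CHAPTER 1. Middle one, continues. 1 Verse one. 2 Verse two. " + ...
      "CHAPTER 2. Middle two. Continues. 1 Verse four? 2 Verse five.";

% Same pattern as in the answer: heading, period, then lazy text up to the next 1.
tokens = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens');

% Stack the per-match tokens into an N-by-2 array; string() also covers the
% case where regexp returns cell arrays of char vectors instead of strings.
parts = string(vertcat(tokens{:}));

% Column names are hypothetical; strtrim drops the trailing space before the verse number.
T = table(parts(:,1), strtrim(parts(:,2)), ...
          'VariableNames', {'Chapter', 'RandomInfo'});
disp(T)
```

This assumes at least one chapter is found; with no matches, the indexing into parts would fail and needs a guard.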

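Explanation of the regular expression (CHAPTER \d)\.\s*(.*?)1

  • (CHAPTER \d) matches CHAPTER followed by a single digit; the surrounding () parentheses capture the match into the tokens variable.
  • \. matches a period.
  • \s* matches any whitespace that may follow (zero or more characters).
  • (.*?)1 captures any text up to the next 1 in the text. Note that the question mark makes the match lazy; otherwise it would match everything up to the last 1 in str.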
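
If you would rather stay with extractBetween, as in the question, a position-based variant might look like the sketch below. It assumes a verse marker is a standalone "1" followed by whitespace and that every chapter has a first verse. The shortened sample string and the names headEnd, verseOne and info are hypothetical. Unlike the lazy (.*?)1 match, this version should not stop early at a 1 that sits inside a longer number such as 21.

```matlab
% Shortened stand-in for the text in the question (assumption: same layout).
str = "Intro. CHAPTER 1. Middle one, continues. 1 Verse one. 2 Verse two. " + ...
      "CHAPTER 2. Middle two. Continues. 1 Verse four? 2 Verse five.";

% The second output of regexp is the end index of each match, i.e. the last
% character of each "CHAPTER n. " heading (including trailing whitespace).
[~, headEnd] = regexp(str, 'CHAPTER \d+\.\s*');

% Start index of every standalone "1" that is followed by whitespace.
verseOne = regexp(str, '\<1\s');

info = strings(numel(headEnd), 1);
for k = 1:numel(headEnd)
    % First verse marker located after this chapter heading.
    nextVerse = verseOne(find(verseOne > headEnd(k), 1));
    % With numeric positions, extractBetween includes both end points.
    info(k) = strtrim(extractBetween(str, headEnd(k) + 1, nextVerse - 1));
end
disp(info)
```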