MATLAB: Simple string analysis - Find locations
Here is an example from a literary work that I would like to run a simple analysis on. Note the different parts:
str = "Random info - at beginning-man. "+ ...
"Random info still continues. "+ ...
"CHAPTER 1. " + ...
"Random info in middle one, "+ ...
"Random info still continues. "+ ...
"1 This is sentence one of verse one, "+ ...
"This still sentence one of verse one. "+ ...
"2 This is sentence one of verse two. "+ ...
"This is sentence two of verse two. "+ ...
"3 This is sentence one of verse three; "+ ...
"this still sentence one of verse three. "+ ...
"CHAPTER 2. " + ...
"Random info in middle two. "+ ...
"Random info still continues. "+ ...
"1 This is sentence four? "+ ...
"2 This is sentence five, "+ ...
"3 this still sentence five but verse three!"+ ...
"Random info at end's end."+ ...
"Random info still continues. ";
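I am interested in all the data that could be called "random info in middle": it comes after the chapter name and before the verses begin.
I would like to use the extractBetween function to extract the info between "CHAPTER #" and "1" (the first verse).
I know how to use extractBetween, but how do I determine the locations that bracket this text, at "CHAPTER #" and at the "1" of the first verse, for an arbitrary number of chapters?
In the end I would like an answer in which each chapter's random info is placed in a table.
I tried regexp() and findstr(), but neither worked.
Any help would be greatly appreciated. Thank you!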
You can use regexp with a regular expression to match the text.
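% tokens{k} holds the two captured groups of the k-th match:
% the chapter label and the text that follows it.
[tokens, matches] = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens', 'match');
for k = 1:numel(tokens)
    fprintf('%s\t%s\n', tokens{k}(1), tokens{k}(2));
    % or: fprintf('%s\t%s\n', tokens{k});
end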
This prints:
CHAPTER 1 Random info in middle one, Random info still continues.
CHAPTER 2 Random info in middle two. Random info still continues.
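The question asks for the result in a table rather than printed text. A minimal sketch of one way to get there, assuming the same pattern and a shortened stand-in for str; the variable names Chapter and RandomInfo are made up for illustration:

```matlab
% Shortened stand-in for the str defined in the question (assumption).
str = "Intro text. CHAPTER 1. Middle one, more. 1 Verse one. 2 Verse two. " + ...
      "CHAPTER 2. Middle two. More. 1 Verse four? 2 Verse five.";

% Same pattern as in the answer; each tokens{k} holds two captured groups.
tokens = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens');

% Stack the per-match tokens into an N-by-2 array, one row per chapter.
parts = vertcat(tokens{:});

% Column names are hypothetical; strtrim drops the trailing space
% that the lazy capture picks up before the verse number.
T = table(parts(:, 1), strtrim(parts(:, 2)), ...
          'VariableNames', {'Chapter', 'RandomInfo'});
disp(T)
```

Each row of T then pairs a chapter label with the text found between that label and the first verse.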
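Explanation of the regular expression (CHAPTER \d)\.\s*(.*?)1:
(CHAPTER \d) matches CHAPTER followed by a single digit; the surrounding () parentheses capture the match into the tokens variable.
\. matches the period.
\s* matches any whitespace that may follow.
(.*?)1 captures any text up to the next 1 in the text. Note that the question mark makes the match lazy; otherwise it would match all of the text up to the last 1 in str.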
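If the locations themselves are wanted, so that extractBetween can be used as the question describes, regexp can also return the start and end index of each captured group through its 'tokenExtents' option. The sketch below is a variation, not the code from the answer: it uses \d+ instead of \d so that a label such as CHAPTER 10 would also match, and it runs on a shortened stand-in for str. Like the original pattern, it assumes the random info itself never contains the digit 1.

```matlab
% Shortened stand-in for the str defined in the question (assumption).
str = "Intro text. CHAPTER 1. Middle one, more. 1 Verse one. 2 Verse two. " + ...
      "CHAPTER 2. Middle two. More. 1 Verse four? 2 Verse five.";

% One captured group only: the text between the chapter label and the
% first verse number. 'tokenExtents' returns its [start end] indices.
ext = regexp(str, 'CHAPTER \d+\.\s*(.*?)1', 'tokenExtents');

info = strings(numel(ext), 1);
for k = 1:numel(ext)
    startPos = ext{k}(1, 1);   % index of the first captured character
    endPos   = ext{k}(1, 2);   % index of the last captured character
    % extractBetween with numeric positions includes both end points.
    info(k) = strtrim(extractBetween(str, startPos, endPos));
end
disp(info)
```

The loop keeps each extractBetween call to a single pair of scalar positions, which avoids relying on how the function handles vectors of positions for a scalar string.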