Read a specific paragraph from text files in MATLAB

I am using MATLAB to extract words from text files. I have several text files, and I want to run textscan on the "AB" section of each file.

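I know how to read a specific line from a text file. However, I want to apply the same code to every text file in a folder, and the line number of the "AB" section differs from file to file, so I would have to change it each time.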

This is what all my text files look like (example):

PMID- 27401974
OWN - NLM
STAT- Publisher
DP - 2016 Jul 8
TI - North-seeking magnetotactic Gammaproteobacteria in the Southern Hemisphere.
LID - AEM.01545-16 [pii]
AB - Magnetotactic bacteria (MTB) comprise a phylogenetically diverse group of prokaryotes capable of orienting and navigating along magnetic field lines. Under oxic conditions, MTB in natural environments in the Northern Hemisphere generally display north-seeking (NS) polarity, swimming parallel to the Earth's magnetic field lines, while those in the Southern Hemisphere generally swim antiparallel to magnetic field lines (south-seeking (SS) polarity).
CI - Copyright (c) 2016, American Society for Microbiology. All Rights Reserved.
FAU - Leao, Pedro
AU - Leao P

Thanks in advance!

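I think regexp is your friend here. Instead of relying on a line number, read the file line by line, look for the line that starts with the "AB" tag, and keep collecting text until the next field tag appears: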

fid = fopen('/path/to/file.txt');
line = fgetl(fid);
target = '';
found_ab = false;
while ischar(line)
    line = strtrim(line); % remove leading and trailing whitespace
    if ~found_ab        
        res = regexp(line, '^AB\s*-?\s*(\S.*)$', 'tokens', 'once');
        if ~isempty(res)
            target = res{1};
            found_ab = true;
        end
    else
        % we found an "AB -" line, we see if there are multiple lines here
        res = regexp(line, '^[A-Z]+\s*-\s', 'once'); % next field tag, e.g. "CI - " or "PMID- "
        if ~isempty(res)
            % we reached the end of AB - lines
            break;
        end
        % there are multiple text lines for "AB - "
        target = [target, ' ', line]; % lines were trimmed, so rejoin them with a space
    end
    line = fgetl(fid);
end
fclose(fid);
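
Since the question asks for the same code to run on every text file in a folder, here is a minimal sketch of how the loop above could be wrapped in a helper and called once per file. The folder name 'abstracts', the '*.txt' pattern, and the helper name readAbstract are assumptions made for illustration; they do not come from the question. Local functions at the end of a script require MATLAB R2016b or later; on older versions, save readAbstract in its own file.

```matlab
% Hypothetical folder; replace with the directory that holds your files.
folder = 'abstracts';
files = dir(fullfile(folder, '*.txt'));   % assumes the files end in .txt
abstracts = cell(numel(files), 1);
for k = 1:numel(files)
    abstracts{k} = readAbstract(fullfile(folder, files(k).name));
end

% Hypothetical helper: returns the text of the "AB" field of one file,
% or '' if the file contains no such field.
function target = readAbstract(filename)
    fid = fopen(filename, 'r');
    if fid == -1
        error('Could not open %s', filename);
    end
    target = '';
    found_ab = false;
    tline = fgetl(fid);
    while ischar(tline)
        tline = strtrim(tline);
        if ~found_ab
            % look for the line that opens the AB field
            res = regexp(tline, '^AB\s*-?\s*(\S.*)$', 'tokens', 'once');
            if ~isempty(res)
                target = res{1};
                found_ab = true;
            end
        elseif ~isempty(regexp(tline, '^[A-Z]+\s*-\s', 'once'))
            break;  % the next field tag ends the AB section
        else
            % continuation line of the AB field
            target = [target, ' ', tline];
        end
        tline = fgetl(fid);
    end
    fclose(fid);
end
```

Each entry of abstracts is then a plain character vector, so it can be passed on to textscan, for example words = textscan(abstracts{1}, '%s'), to split it into words.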