在具有 python 的字符串列表中找到唯一的子字符串模式

find the unqiue substring pattern in a list of string with python

我有一个字符串列表如下:

['/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_5097-ses-2019-05-07_09_56_14.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_4061-ses-2017-09-26_14_07_37.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-002_S_1280-ses-2017-03-13_13_38_31.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-082_S_5282-ses-2019-06-17_10_11_15.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-018_S_4399-ses-2019-08-06_13_03_58.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-123_S_0106-ses-2018-10-11_12_54_59.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_2333-ses-2018-12-26_15_31_55.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-031_S_2018-ses-2019-01-24_11_26_13.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_0679-ses-2017-07-05_09_46_36.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0303-ses-2017-05-11_13_39_46.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0454-ses-2017-09-06_09_41_25.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_2187-ses-2019-10-09_13_19_17.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-116_S_4043-ses-2018-03-02_10_03_10.0.txt',

我希望在列表中提取具有模式 'sub-???_S_????' 的唯一主题 ID。

到目前为止我可以用:

unique_subject = re.search('(.*)_sub-(.*)-ses(.*).txt', all_files[0]).group(2)

但这只适用于单个字符串。我需要用循环来做。

unique_subject = set()

for f in all_files:
    unique_subject.add(re.search('(.*)_sub-(.*)-ses(.*).txt', f).group(2))

我想知道是否有更好的方法来做到这一点。最后,我想获得每个主题的第一节课。有快速的方法吗?

试试这个:

l = re.findall('\d{3}_S_\d{4}', ''.join(all_files))

您可以使用相同的正则表达式(我在会话部分添加了连字符)并将集更改为具有 key/value of subject/first 会话的字典。鉴于您希望以不同方式对待每个主题的第一行,我认为您当前使用列表元素循环的方法很好。

all_files = [
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_5097-ses-2019-05-07_09_56_14.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_4061-ses-2017-09-26_14_07_37.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-002_S_1280-ses-2017-03-13_13_38_31.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-082_S_5282-ses-2019-06-17_10_11_15.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-018_S_4399-ses-2019-08-06_13_03_58.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-123_S_0106-ses-2018-10-11_12_54_59.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_2333-ses-2018-12-26_15_31_55.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-031_S_2018-ses-2019-01-24_11_26_13.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_0679-ses-2017-07-05_09_46_36.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0303-ses-2017-05-11_13_39_46.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0454-ses-2017-09-06_09_41_25.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_2187-ses-2019-10-09_13_19_17.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-116_S_4043-ses-2018-03-02_10_03_10.0.txt'
]

import re
unique_subject = {}

for f in all_files:
    groups = re.search('(.*)_sub-(.*)-ses-(.*).txt', f)
    subject = groups.group(2)
    if subject not in unique_subject:
        session = groups.group(3)
        unique_subject[subject] = session

[print(f"{k} : {v}") for k, v in unique_subject.items()]

输出:

130_S_4817 : 2018-05-04_14_33_33.0
141_S_0767 : 2019-04-08_12_52_36.0
041_S_5097 : 2019-05-07_09_56_14.0
068_S_4061 : 2017-09-26_14_07_37.0
002_S_1280 : 2017-03-13_13_38_31.0
082_S_5282 : 2019-06-17_10_11_15.0
018_S_4399 : 2019-08-06_13_03_58.0
123_S_0106 : 2018-10-11_12_54_59.0
141_S_2333 : 2018-12-26_15_31_55.0
031_S_2018 : 2019-01-24_11_26_13.0
041_S_0679 : 2017-07-05_09_46_36.0
037_S_0303 : 2017-05-11_13_39_46.0
037_S_0454 : 2017-09-06_09_41_25.0
068_S_2187 : 2019-10-09_13_19_17.0
116_S_4043 : 2018-03-02_10_03_10.0