How to extract a substring by defining the delimiters before and after it

I have a data frame containing URLs, and I want to extract the piece of text that sits between two delimiters.

df
    URL
    https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg
    https://storage.com/vision/Carpet5020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg
    https://storage.com/vision/Metal8020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg

The expected output looks like this:

            URL                                                                           Type
 https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg      Glass2020
 https://storage.com/vision/Carpet5020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg     Carpet5020
 https://storage.com/vision/Metal8020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg      Metal8020

I would use df['URL'].str.extract, but I don't know how to define the text before and after the part I want.

One idea is to use Series.str.split and select the second-to-last value by indexing:

df['Type'] = df['URL'].str.split('/').str[-2]
print (df)
                                                 URL        Type
0  https://storage.com/vision/Glass2020/2020-02-0...   Glass2020
1  https://storage.com/vision/Carpet5020/2020-02-...  Carpet5020
2  https://storage.com/vision/Metal8020/2020-02-0...   Metal8020
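
To see why index -2 picks the right piece, here is a minimal sketch of the same split applied to a single URL in plain Python. The variable names are illustrative only, and it assumes the wanted value is always the last path segment before the file name:

```python
# One URL from the question, split on "/" the same way Series.str.split does per row.
url = "https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg"

parts = url.split("/")
# parts[-1] is the file name, parts[-2] is the folder directly above it.
type_value = parts[-2]
print(type_value)
```

This approach depends on position rather than on the surrounding text, so it would pick a different segment if the URL had extra folders between the type and the file name.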

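EDIT: To specify the text that surrounds the value you want (here vision/ before it and /2020 after it), use Series.str.extract with a capture group: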

df['Type'] = df['URL'].str.extract('vision/(.+)/2020')
print (df)
                                                 URL        Type
0  https://storage.com/vision/Glass2020/2020-02-0...   Glass2020
1  https://storage.com/vision/Carpet5020/2020-02-...  Carpet5020
2  https://storage.com/vision/Metal8020/2020-02-0...   Metal8020
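
If the delimiters vary, the pattern can be built from the "before" and "after" strings. The following is only a sketch: extract_between is a hypothetical helper, not a pandas function. It assumes both delimiters are literal strings and that the first match in each URL is the one you want:

```python
import re

import pandas as pd

df = pd.DataFrame({"URL": [
    "https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
    "https://storage.com/vision/Carpet5020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
    "https://storage.com/vision/Metal8020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
]})

def extract_between(series, before, after):
    """Hypothetical helper: return the text between two literal delimiters."""
    # re.escape keeps characters such as "." or "?" in the delimiters literal;
    # the lazy (.+?) stops at the first occurrence of `after`.
    pattern = re.escape(before) + r"(.+?)" + re.escape(after)
    # expand=False returns a Series when the pattern has a single capture group.
    return series.str.extract(pattern, expand=False)

df["Type"] = extract_between(df["URL"], "vision/", "/")
print(df["Type"].tolist())
```

Using "/" as the closing delimiter avoids hard-coding the year 2020, so the same call should keep working for files dated in other years. Rows where the pattern does not match get NaN.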

Try str.split:

df['Type'] = df.URL.str.split('/').str[-2]