How to extract a substring by defining the delimiters before and after it
I have a data frame that contains URLs, and I want to extract the part that sits between two delimiters.
df
URL
https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg
https://storage.com/vision/Carpet5020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg
https://storage.com/vision/Metal8020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg
The expected output looks like this:
URL Type
https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg Glass2020
https://storage.com/vision/Carpet5020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg Carpet5020
https://storage.com/vision/Metal8020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg Metal8020
I would use df['URL'].str.extract, but I need to understand how to define the delimiters before and after the part I want.
One idea is to use Series.str.split and select the second-to-last value by indexing:
df['Type'] = df['URL'].str.split('/').str[-2]
print(df)
URL Type
0 https://storage.com/vision/Glass2020/2020-02-0... Glass2020
1 https://storage.com/vision/Carpet5020/2020-02-... Carpet5020
2 https://storage.com/vision/Metal8020/2020-02-0... Metal8020
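For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the three URLs in the question; the rsplit variant and the names `types` and `types_rsplit` are my own additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the URLs shown in the question.
df = pd.DataFrame({
    "URL": [
        "https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
        "https://storage.com/vision/Carpet5020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
        "https://storage.com/vision/Metal8020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
    ]
})

# Split every URL on "/" and keep the second-to-last piece,
# i.e. the folder that directly contains the file.
df["Type"] = df["URL"].str.split("/").str[-2]
types = df["Type"].tolist()

# Variant (an assumption, not from the answer): split from the right at most
# twice, so only the last two separators are considered.
types_rsplit = df["URL"].str.rsplit("/", n=2).str[-2].tolist()

print(types)
print(types_rsplit)
```

Both variants depend only on the position of the folder in the path, so they should keep working if the date in the file name changes, but they would pick the wrong piece if the folder depth after it changes.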
EDIT: To specify the text that comes before and after the expected output, use Series.str.extract:
df['Type'] = df['URL'].str.extract('vision/(.+)/2020')
print(df)
URL Type
0 https://storage.com/vision/Glass2020/2020-02-0... Glass2020
1 https://storage.com/vision/Carpet5020/2020-02-... Carpet5020
2 https://storage.com/vision/Metal8020/2020-02-0... Metal8020
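The pattern above hard-codes both delimiters, including the year. As a rough sketch of a more general form, the helper below takes the "before" and "after" strings as arguments. The name `extract_between`, the non-greedy `(.+?)` group, and the fourth non-matching URL are assumptions I made for illustration; they do not come from the original answer.

```python
import re

import pandas as pd


def extract_between(series, before, after):
    """Hypothetical helper: return the text between `before` and `after`."""
    # re.escape keeps the delimiters literal even if they contain regex
    # metacharacters; (.+?) is non-greedy, so it stops at the first `after`.
    pattern = f"{re.escape(before)}(.+?){re.escape(after)}"
    # expand=False returns a Series because the pattern has one capture group.
    return series.str.extract(pattern, expand=False)


urls = pd.Series([
    "https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
    "https://storage.com/vision/Carpet5020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
    "https://storage.com/vision/Metal8020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
    "https://storage.com/other/file.jpg",  # made-up URL without "vision/"
])

types = extract_between(urls, "vision/", "/")
print(types.tolist())
```

Rows where the delimiters are not found come back as NaN rather than raising an error, so it may be worth checking for missing values before using the new column.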
Try str.split:
df['Type'] = df.URL.str.split('/').str[-2]