URL parsing: get N folders followed by the filename, starting from a certain folder
I have a URL that can contain any number of folders and ends with filename.extension.
Example:
https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg
I am trying to get everything after the version folder /v87879798, so:
images/profile/oaz4wkjkjsbzxa3xlkmu.jpg
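I have tried several approaches, but none of them worked. I suspect I need a regular expression, but I don't know regex well enough to build one. One of the things I tried: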
import os
from urllib.parse import urlparse

url = "https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg"
parsed_url = urlparse(url)
# path is /user/image/upload/v87879798/images/profile
path = os.path.dirname(parsed_url.path)
# file_name is oaz4wkjkjsbzxa3xlkmu.jpg
file_name = os.path.basename(parsed_url.path)
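But nothing has worked so far. Any help would be appreciated.
Edit:
Sorry, I forgot to mention that by N folders I mean that any of the following URLs is possible: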
https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg
https://cdn.example.com/user/image/upload/v87879798/images/oaz4wkjkjsbzxa3xlkmu.jpg
https://cdn.example.com/user/image/upload/v87879798/oaz4wkjkjsbzxa3xlkmu.jpg
Use the regular expression pattern r"v\d+\/(.+)$", assuming the random number is always prefixed with v.
For example:
import re
from urllib.parse import urlparse
ptrn = re.compile(r"v\d+\/(.+)$")
url = "https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg"
parsed_url = urlparse(url)
print(ptrn.search(parsed_url.path).group(1))
Output:
images/profile/oaz4wkjkjsbzxa3xlkmu.jpg
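Note that ptrn.search(...) returns None when a URL has no v<digits> folder, so calling .group(1) on it would raise AttributeError. Below is a minimal sketch of a guarded variant. The helper name path_after_version and the leading / in the pattern (added so that a folder such as rev123 does not match) are assumptions for illustration, not part of the original answer.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical pattern: the leading "/" ties the match to the start of a path segment.
VERSION_RE = re.compile(r"/v\d+/(.+)$")


def path_after_version(url: str) -> Optional[str]:
    """Return everything after the /v<digits>/ folder, or None if it is absent."""
    match = VERSION_RE.search(urlparse(url).path)
    return match.group(1) if match else None


print(path_after_version(
    "https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg"
))
print(path_after_version(
    "https://cdn.example.com/user/image/upload/oaz4wkjkjsbzxa3xlkmu.jpg"
))
```

Returning None leaves it to the caller to decide how to treat URLs without a version folder, instead of failing inside the helper.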
Demo:
ptrn = re.compile(r"v\d+\/(.+)$")
urls = [
    "https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg",
    "https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg",
    "https://cdn.example.com/user/image/upload/v87879798/images/oaz4wkjkjsbzxa3xlkmu.jpg",
    "https://cdn.example.com/user/image/upload/v87879798/oaz4wkjkjsbzxa3xlkmu.jpg",
]
for url in urls:
    parsed_url = urlparse(url)
    print(ptrn.search(parsed_url.path).group(1))
Output:
images/profile/oaz4wkjkjsbzxa3xlkmu.jpg
images/profile/oaz4wkjkjsbzxa3xlkmu.jpg
images/oaz4wkjkjsbzxa3xlkmu.jpg
oaz4wkjkjsbzxa3xlkmu.jpg
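If you would rather avoid a regular expression, the same result can probably be reached by splitting the path into segments and looking for the first one that is v followed by digits. This is only a sketch of an alternative, not the approach from the answer above; the function name path_after_version_split is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def path_after_version_split(url: str) -> Optional[str]:
    """Return the path after the first v<digits> segment, or None if not found."""
    segments = urlparse(url).path.strip("/").split("/")
    for index, segment in enumerate(segments):
        # A version segment is assumed to be "v" followed only by digits.
        if segment.startswith("v") and segment[1:].isdigit():
            rest = segments[index + 1:]
            return "/".join(rest) if rest else None
    return None


for url in [
    "https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg",
    "https://cdn.example.com/user/image/upload/v87879798/images/oaz4wkjkjsbzxa3xlkmu.jpg",
    "https://cdn.example.com/user/image/upload/v87879798/oaz4wkjkjsbzxa3xlkmu.jpg",
]:
    print(path_after_version_split(url))
```

Comparing whole segments means a folder that merely contains v and digits somewhere in its name is not mistaken for the version folder.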