Parse URL in pandas df column and grab value of specific index
I have a pandas df with a url column. The data looks like this:
row url
1 'https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese'
2 'https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/'
3 'https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/'
I only need the value at the second index, i.e. cooking, holiday-recipes, and so on.
Desired output:
row url
1 cooking
2 holiday-recipes
3 kitchen-tools
I tried to parse the url into separate columns and then drop the columns I don't need. Here is the code:
df['protocol'],df['domain'],df['path']=zip(*df['url'].map(urlparse(df['url']).urlsplit))
The error message is: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Is there a better way to solve this? How do I grab a specific index?
Is this what you are looking for?
df['url'] = df['url'].str.split('/').str[3]
print(df)
row url
0 1 cooking
1 2 holiday-recipes
2 3 kitchen-tools
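As for the error in the question: it most likely comes from passing the whole Series to urlparse (urlparse(df['url'])), which works on one string at a time. If you would rather keep a URL parser than count slashes, here is a minimal sketch that applies the standard-library urllib.parse.urlsplit per row. The helper name first_path_segment and the new column name section are made up for illustration, and the sketch assumes the URLs are stored without the surrounding quote characters shown in the question.

```python
from urllib.parse import urlsplit

import pandas as pd


def first_path_segment(url):
    """Return the first non-empty path segment of a URL, or None if there is none."""
    # urlsplit handles a single string, so it has to be called once per row.
    segments = [s for s in urlsplit(url).path.split('/') if s]
    return segments[0] if segments else None


# Sample data copied from the question (assumed to be plain strings without quotes).
df = pd.DataFrame({
    'row': [1, 2, 3],
    'url': [
        'https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese',
        'https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/',
        'https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/',
    ],
})

# Series.map calls the helper on each URL string individually.
df['section'] = df['url'].map(first_path_segment)
print(df[['row', 'section']])
```

Unlike .str.split('/').str[3], this does not depend on the position of the slashes in the scheme and host, and it returns None for a URL that has no path.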
Another way is to match the letters and hyphens (-) that come immediately after com/:
df['url'] = df['url'].str.extract(r'((?<=com/)[a-z-]+)')
url
0 cooking
1 holiday-recipes
2 kitchen-tools
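Note that the lookbehind on com/ only works for .com domains, and [a-z-]+ only matches lowercase letters and hyphens. A hedged variant, assuming every URL starts with http:// or https://, anchors on the host instead and captures everything up to the next slash. The column name section is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question (assumed to be plain strings without quotes).
df = pd.DataFrame({
    'url': [
        'https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese',
        'https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/',
        'https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/',
    ],
})

# Skip the scheme and host, then capture the first path segment.
# expand=False makes str.extract return a Series for a single capture group.
pattern = r'^https?://[^/]+/([^/]+)'
df['section'] = df['url'].str.extract(pattern, expand=False)
print(df['section'])
```

Rows that do not match the pattern (for example a URL with no path) come back as NaN rather than raising an error.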