Regex to extract both video id or playlist id from youtube url

Question

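I would like to know how to use a single regex to extract either the YouTube video ID or the playlist ID, depending on the URL. The regex should also make sure the domain is youtube.com. Here are some of the results I need:

Extract the playlist ID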

    https://www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r
    www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r
    http://www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r
    https://www.youtube.com/embed/videoseries?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r

Extract the video ID

    https://www.youtube.com/watch?v=fqMfRi2gJok&index=1&list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r
    https://www.youtube.com/watch?v=fqMfRi2gJok
    http://youtu.be/cCnrX1w5luM
    http://youtube.com/embed/cCnrX1w5luM
    http://youtube.com/v/cCnrX1w5luM
    https://www.youtube.com/v/cCnrX1w5luM
    www.youtube.com/v/cCnrX1w5luM
    youtube.com/v/cCnrX1w5luM

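These are only example URLs. I need to extract the respective ID for every possible YouTube link structure.

In short: extract the video ID and, if there is none, get the playlist ID.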

Answer 1

Here it is:

/\?(?:v|list)=(\w*)/g

You can use the regex OR operator (|).

You can test it and see it working here:

https://regex101.com/r/mI3qY9/2

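Update

I updated the regex (thanks for the comment about capturing underscores) and made the first group non-capturing.

Updated to also catch: youtu.be/cCnrX1w5luM

/(?:\?v=|\?list=|be\/)(\w*)/g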

https://regex101.com/r/mI3qY9/6
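
As a rough illustration of how the updated pattern might be used from JavaScript, here is a minimal sketch. The helper name `extractIds` is hypothetical (it is not part of the answer), and the `/` after `be` is escaped so the pattern is a valid JavaScript regex literal.

```javascript
// Updated pattern from the answer, written as a JavaScript literal.
const ID_PATTERN = /(?:\?v=|\?list=|be\/)(\w*)/g;

// Hypothetical helper: returns every captured ID, in order of appearance.
function extractIds(url) {
  return Array.from(url.matchAll(ID_PATTERN), (match) => match[1]);
}

console.log(extractIds("https://www.youtube.com/watch?v=fqMfRi2gJok"));
// ["fqMfRi2gJok"]
console.log(extractIds("http://youtu.be/cCnrX1w5luM"));
// ["cCnrX1w5luM"]
console.log(extractIds("https://www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"));
// ["PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"]
```

Note that this pattern only looks at `?v=`, `?list=` and `be/`, so the `/embed/` and `/v/` URL forms from the question, as well as a `&list=` parameter, are not picked up. `\w` also stops at a hyphen, which can appear in YouTube IDs.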

Answer 2

https://regex101.com/r/mI3qY9/4

This regex assumes you are giving it a legitimate YouTube link. It grabs every v and every list together:

/(?:(?:\?|&)(?:v|list)=|embed\/|v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)/g

Breakdown:

/
(?:                         //non-capturing group
  (?:\?|&)(?:v|list)=       //? or & followed by v= or list=
  |                         //or
  embed\/                   //embed/
  |                         //or
  v\/                       //v/
  |                         //or
  youtu\.be\/               //youtu.be/
)
(                           //capturing group: the ID
  (?!videoseries)           //will not capture "videoseries"
  [a-zA-Z0-9_]*             //capture any letters, digits or underscores that follow
)
/g                          //global
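
To see the breakdown in action, the following sketch runs the combined pattern over a few of the question's URLs. The helper name `extractAllIds` is an assumption made for illustration only.

```javascript
// Combined pattern from the answer: grabs every v and list ID in a URL.
const ANY_ID = /(?:(?:\?|&)(?:v|list)=|embed\/|v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)/g;

// Hypothetical helper: collects capture group 1 of every match.
function extractAllIds(url) {
  return Array.from(url.matchAll(ANY_ID), (match) => match[1]);
}

console.log(extractAllIds("https://www.youtube.com/watch?v=fqMfRi2gJok&index=1&list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"));
// ["fqMfRi2gJok", "PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"]

// "embed/" is rejected here by the (?!videoseries) lookahead,
// so only the list ID is returned.
console.log(extractAllIds("https://www.youtube.com/embed/videoseries?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"));
// ["PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"]
```

The result is a flat list, so the caller cannot tell from it alone which entry is a video ID and which is a playlist ID.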

But then you may not be able to tell which match is a v and which is a list, so:

This grabs only v:

/(?:(?:\?|&)v=|embed\/|v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)/g

This grabs only list:

/(?:(?:\?|&)list=)((?!videoseries)[a-zA-Z0-9_]*)/g
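
The question's rule (take the video ID and fall back to the playlist ID) could be sketched with these two patterns roughly as follows. The function name `extractYouTubeId` and the `{ type, id }` return shape are assumptions for illustration. The `g` flag is dropped here because only the first match is needed and `String.prototype.match` returns capture groups only for non-global patterns.

```javascript
// Video-only and list-only patterns from the answer, without the g flag.
const VIDEO_ID = /(?:(?:\?|&)v=|embed\/|v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)/;
const LIST_ID = /(?:(?:\?|&)list=)((?!videoseries)[a-zA-Z0-9_]*)/;

// Hypothetical helper: prefer the video ID, fall back to the playlist ID.
function extractYouTubeId(url) {
  const video = url.match(VIDEO_ID);
  if (video && video[1]) {
    return { type: "video", id: video[1] };
  }
  const list = url.match(LIST_ID);
  if (list && list[1]) {
    return { type: "playlist", id: list[1] };
  }
  return null; // neither kind of ID was found
}

console.log(extractYouTubeId("https://www.youtube.com/watch?v=fqMfRi2gJok&index=1&list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"));
// { type: "video", id: "fqMfRi2gJok" }
console.log(extractYouTubeId("https://www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"));
// { type: "playlist", id: "PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r" }
```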

This grabs only v, and only from YouTube URLs:

/(?:youtube\.com.*(?:\?|&)(?:v)=|youtube\.com.*embed\/|youtube\.com.*v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)/g

Only list, and only from YouTube URLs:

/(?:youtube\.com.*(?:\?|&)(?:list)=)((?!videoseries)[a-zA-Z0-9_]*)/g

These are basically the same as before, but with youtube\.com.* added to the regex, so they will not catch e.g. http://example.com/v/abc

https://regex101.com/r/mI3qY9/5

Explanation:

youtube\.com.*          //matches youtube.com followed by any number of characters
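
A quick way to check the effect of the added domain prefix is sketched below. The helper names `youtubeVideoId` and `youtubePlaylistId` are hypothetical, and the patterns are used without the `g` flag so that `match` exposes the capture group.

```javascript
// Domain-restricted patterns from the answer, without the g flag.
const YT_VIDEO = /(?:youtube\.com.*(?:\?|&)(?:v)=|youtube\.com.*embed\/|youtube\.com.*v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)/;
const YT_LIST = /(?:youtube\.com.*(?:\?|&)(?:list)=)((?!videoseries)[a-zA-Z0-9_]*)/;

// Hypothetical helpers: return the captured ID, or null when nothing matches.
function youtubeVideoId(url) {
  const match = url.match(YT_VIDEO);
  return match ? match[1] : null;
}

function youtubePlaylistId(url) {
  const match = url.match(YT_LIST);
  return match ? match[1] : null;
}

console.log(youtubeVideoId("youtube.com/v/cCnrX1w5luM"));   // "cCnrX1w5luM"
console.log(youtubeVideoId("http://example.com/v/abc"));    // null
console.log(youtubePlaylistId("www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"));
// "PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"
```

Because `youtube\.com` is not anchored to the start of the string, a URL on another host that merely contains `youtube.com` somewhere would still match, so this is a light check rather than strict domain validation.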

Answer 3

Your question clearly contains two patterns.

The first:

^.*?(?:v|list)=(.*?)(?:&|$)

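for any URL with an explicit attribute, or in other words any URL that has an = sign in it.

Explanation

^.*?(?:v|list)=: any string up to v= or list=. The lazy .*? stops at the first of the two, so v is preferred over list whenever v= comes first in the URL, as in the watch URLs above.

(.*?)(?:&|$): any string ending at an & sign or at the end-of-line anchor $. The lazy (.*?) stops at the first &, so & is preferred over $.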

The second:

^(?:(?!=).)*\/(.*)$

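for any URL without attributes, i.e. with no = sign in it.

Explanation

^(?:(?!=).)*\/: any string that contains no = sign (enforced by the negative lookahead (?!=)) up to a / sign,

(.*)$: any string up to the end of the line.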

Combining them into a single regex, we get

^(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?.*?(?:v|list)=(.*?)(?:&|$)|^(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?(?:(?!=).)*\/(.*)$

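Here, we added

(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)? to handle the various forms of www.youtube.com URLs.

This should help you get what you want.

See: DEMO

Important note: in this question the asker wants to extract the ID from www.youtube.com and prefers the "video id" over the "playlist id".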
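
Since the combined pattern has two alternatives, the ID ends up in capture group 1 (URLs with an = sign) or in capture group 2 (URLs without one). A minimal sketch of reading it back in JavaScript follows. The helper name `extractId` is hypothetical, and the `trim()` call is an added assumption to guard against stray whitespace around a pasted URL.

```javascript
// Combined pattern from the answer, written as a JavaScript literal.
const COMBINED = /^(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?.*?(?:v|list)=(.*?)(?:&|$)|^(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?(?:(?!=).)*\/(.*)$/;

// Hypothetical helper: returns the ID from whichever group participated.
function extractId(url) {
  const match = COMBINED.exec(url.trim());
  if (!match) {
    return null; // not a recognised YouTube URL
  }
  // Group 1 is set for "=" style URLs, group 2 for path style URLs.
  return match[1] !== undefined ? match[1] : match[2];
}

console.log(extractId("https://www.youtube.com/watch?v=fqMfRi2gJok&index=1&list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"));
// "fqMfRi2gJok"
console.log(extractId("http://youtu.be/cCnrX1w5luM"));
// "cCnrX1w5luM"
console.log(extractId("www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"));
// "PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"
```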


javascript

regex

youtube