How to construct a re.findall regex in Python to capture YouTube timestamps
Script
from __future__ import unicode_literals
import youtube_dl
import pandas as pd
import csv
import re

# Initialize YouTube-DL Array
ydl_opts = {}

# read the csv file
number_of_rows = pd.read_csv('single.csv')

# Scrape Online Product
def run_scraper():
    # Read CSV to List
    with open("single.csv", "r") as f:
        csv_reader = csv.reader(f)
        next(csv_reader)
        # Scrape Data From Store
        for csv_line_entry in csv_reader:
            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                meta = ydl.extract_info(csv_line_entry[0], download=False)
                description = meta['description']
                #print('Description :', description)
                # Function to Capture Timestamp Descriptions
                get_links(description)

def get_links(description):
    # Format: Timestamp + Text
    description_text = re.findall(r'(\d{2}:\d{2}?.*)', description)
    print(description_text)
    print()
    # Format: Text + Timestamp
    description_text1 = re.findall(r'(.*\d{2}:\d{2}?)', description)
    print(description_text1)

run_scraper()
CSV file
Videos, Format
https://www.youtube.com/watch?v=kqtD5dpn9C8, Format: Timestamp + Text
https://www.youtube.com/watch?v=pJ3IPRqiD2M, Format: Text + Timestamp
https://www.youtube.com/watch?v=rfscVS0vtbw, No Regex in code
https://www.youtube.com/watch?v=t8pPdKYpowI, No Regex in code
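Only the first column of this file is used by the script. As a minimal sketch (not part of the original post), the URLs could be read by column name instead of by index. The helper name read_video_urls is hypothetical, and the CSV rows are inlined with io.StringIO so the snippet runs without single.csv on disk.

```python
import csv
import io

# Inline copy of the first rows of single.csv so the sketch runs without the file.
csv_text = """Videos, Format
https://www.youtube.com/watch?v=kqtD5dpn9C8, Format: Timestamp + Text
https://www.youtube.com/watch?v=pJ3IPRqiD2M, Format: Text + Timestamp
"""


def read_video_urls(handle):
    """Return the value of the 'Videos' column for every data row."""
    # skipinitialspace drops the blank after each comma, so the second
    # column is named "Format" rather than " Format".
    reader = csv.DictReader(handle, skipinitialspace=True)
    return [row["Videos"] for row in reader]


urls = read_video_urls(io.StringIO(csv_text))
print(urls)
```

With the real file, the same helper would be called as read_video_urls(open("single.csv", newline="")) inside a with block.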
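My script reads YouTube URLs from a CSV file and fetches each video's description, which typically contains an introduction, links, timestamps, and so on.
I only want to capture the timestamp entries of the YouTube description, as highlighted in the image below:
[Image: a YouTube video description with the timestamp list highlighted]
I know that YouTube timestamp formats are inconsistent, so I have included a few examples in the CSV file.
In my function get_links, I have partially extracted the timestamps for the "Timestamp + Text" and "Text + Timestamp" formats, which covers 2 of the 4 URLs listed in the CSV.
I need a way to display only the text (description) part of each timestamp entry, regardless of which format is used, for all 4 URLs in the CSV.
Any help would be appreciated.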
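To make the partial result concrete, here is a small offline check of the two patterns from get_links. The description string is made up for illustration and does not come from any of the four videos; it mixes both formats in one text.

```python
import re

# Hypothetical description text, made up for illustration only.
description = (
    "Learn Python in one video.\n"
    "00:00 Introduction\n"
    "01:45 Variables\n"
    "Strings - 12:30\n"
    "Subscribe for more!"
)

# The two patterns from get_links, unchanged.
timestamp_first = re.findall(r'(\d{2}:\d{2}?.*)', description)
text_first = re.findall(r'(.*\d{2}:\d{2}?)', description)

print(timestamp_first)  # ['00:00 Introduction', '01:45 Variables', '12:30']
print(text_first)       # ['00:00', '01:45', 'Strings - 12:30']
```

Each pattern returns the timestamp together with the text for the format it was written for, and only the bare timestamp for the other format. Neither returns the text on its own, which is the part being asked for.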
Try:
import youtube_dl
import pandas as pd
import csv
import re

# Initialize YouTube-DL Array
ydl_opts = {}

r_pat = re.compile(r"\d+:\d+")
r_pat2 = re.compile(r"[^A-Za-z]*\d+:\d+:?\d*?[^A-Za-z]*")

# Scrape Online Product
def run_scraper():
    # Read CSV to List
    with open("single.csv", "r") as f:
        csv_reader = csv.reader(f)
        next(csv_reader)
        # Scrape Data From Store
        for csv_line_entry in csv_reader:
            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                meta = ydl.extract_info(csv_line_entry[0], download=False)
                description = meta["description"]
                out = get_links(description)
                print(*out, sep="\n")
                print("-" * 80)

def get_links(description):
    rv = []
    for line in description.splitlines():
        if r_pat.search(line):
            rv.append(r_pat2.sub("", line))
    return rv

run_scraper()
Prints:
[youtube] kqtD5dpn9C8: Downloading webpage
Introduction
What You Can Do With Python
Your First Python Program
Variables
Receiving Input
Type Conversion
Strings
Arithmetic Operators
Operator Precedence
Comparison Operators
Logical Operators
If Statements
Exercise
While Loops
Lists
List Methods
For Loops
The range() Function
Tuples
--------------------------------------------------------------------------------
[youtube] pJ3IPRqiD2M: Downloading webpage
Python Course
What is Python
Why choose Python
Features of Python
Applications of Python
Salary Trends
Quiz
Installing Python
Python Variable
Python Tokens
...and so on.
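The get_links function above can also be tried without any network access. The sketch below copies the two patterns and the function as they are and runs them on a made-up description (the sample lines are assumptions, not taken from the videos) that covers both formats, an hour-long timestamp in parentheses, and a title that starts with a digit.

```python
import re

# Same two patterns as in the answer above.
r_pat = re.compile(r"\d+:\d+")
r_pat2 = re.compile(r"[^A-Za-z]*\d+:\d+:?\d*?[^A-Za-z]*")


def get_links(description):
    """Return the title part of every line that contains a timestamp."""
    rv = []
    for line in description.splitlines():
        # Keep only lines that contain something shaped like mm:ss.
        if r_pat.search(line):
            # Remove the timestamp plus any non-letter characters around it.
            rv.append(r_pat2.sub("", line))
    return rv


# Hypothetical description, invented for this check.
sample = "\n".join([
    "Learn Python in one video.",
    "00:00 Introduction",
    "(1:02:03) Tuples",
    "Strings - 12:30",
    "05:00 3D Graphics",
    "Subscribe for more!",
])

titles = get_links(sample)
print(*titles, sep="\n")
# Introduction
# Tuples
# Strings
# D Graphics
```

The last line shows a limit of this approach: r_pat2 strips every non-letter character next to the timestamp, so a title that begins with a digit loses it ("3D Graphics" becomes "D Graphics"). The same applies to titles that begin with a character outside A-Z and a-z.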