拆分函数不适用于字符串和列表
Split function doesnt work for string and for list
刚刚进行了我的第一次网络抓取,我已经有了要提取的元素,但我找不到将它们打印为编号列表的功能。我现在的代码:
r = requests.get('https://mmazurek.dev/category/programowanie-2/page/3/', proxies={'http':'82.119.170.106'})
page = soup(r.content, "html.parser")
contents=page.findAll(None, class_="post-title-link")
for content in contents:
text_content=list(content.get_text())
first_letter=str(text_content[0])
x="".join(first_letter)
listToStr = "".join(map(str, text_content))
print(listToStr)
目的是让列表打印如下:
- P....
- J...
- ...
希望您不介意这是波兰语文本;)
def get_html(url, useragent=None, proxy=None):
session = requests.Session()
request = session.get(url=url, headers=useragent, proxies=proxy)
if request.status_code == 200:
soup = bs(request.text, 'lxml')
return soup
else:
print("Error " + str(request.status_code))
return request.status_code
def parse(soup):
data = []
contents = soup.findAll(None, class_="post-title-link")
for i, content in enumerate(contents):
text = content.text
href = content['href']
data.append([
i,
text,
href,
])
return data
return data
data = parse(get_html('https://mmazurek.dev/category/programowanie-2/page/3/', proxy={'http': '82.119.170.106'}))
print(data)
刚刚进行了我的第一次网络抓取,我已经有了要提取的元素,但我找不到将它们打印为编号列表的功能。我现在的代码:
r = requests.get('https://mmazurek.dev/category/programowanie-2/page/3/', proxies={'http':'82.119.170.106'})
page = soup(r.content, "html.parser")
contents=page.findAll(None, class_="post-title-link")
for content in contents:
text_content=list(content.get_text())
first_letter=str(text_content[0])
x="".join(first_letter)
listToStr = "".join(map(str, text_content))
print(listToStr)
目的是让列表打印如下:
- P....
- J...
- ...
希望您不介意这是波兰语文本;)
def get_html(url, useragent=None, proxy=None):
session = requests.Session()
request = session.get(url=url, headers=useragent, proxies=proxy)
if request.status_code == 200:
soup = bs(request.text, 'lxml')
return soup
else:
print("Error " + str(request.status_code))
return request.status_code
def parse(soup):
data = []
contents = soup.findAll(None, class_="post-title-link")
for i, content in enumerate(contents):
text = content.text
href = content['href']
data.append([
i,
text,
href,
])
return data
return data
data = parse(get_html('https://mmazurek.dev/category/programowanie-2/page/3/', proxy={'http': '82.119.170.106'}))
print(data)