How to change this formula for python? Newbie to coding, any help is appreciated
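Hello, I got this code from the web. It currently fetches the URL and company name for the tickers below. Is there any way to update it so that it shows the sector and industry instead of the URL and company? I'm new to coding, so any help is much appreciated :)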
Here is the code:
import re

import bs4 as BeautifulSoup
import requests

symbols = ['SBUX', 'MET', 'CAT', 'JNJ', 'ORCL']
headers = {'User-agent': 'Mozilla/5.0'}
mySymbols = {}

for s in symbols:
    vals = {}
    url = "https://finance.yahoo.com/quote/{}/profile?p={}".format(s, s)
    webpage = requests.get(url, headers=headers)
    soup = BeautifulSoup.BeautifulSoup(webpage.content, "html.parser")
    # Company name = page title with everything from the first "(" onwards removed
    title = soup.find("title")
    tmp = title.get_text()
    rxTitle = re.compile(r'\(.*$')
    coName = rxTitle.sub("", tmp)
    # Look for a non-Yahoo link that has a target and an empty title
    # (presumably the company's own website)
    for link in soup.find_all('a', href=True):
        try:
            if link['target'] and "" == link['title']:
                m = re.search('yahoo', link['href'], flags=re.IGNORECASE)
                if m is None:
                    url = link['href']
                    webpage = requests.get(url, headers=headers)
                    soup = BeautifulSoup.BeautifulSoup(webpage.content, "html.parser")
                    vals = {"company": coName, "url": link['href']}
                    print(s, vals)
                    mySymbols[s] = vals
        except (KeyError, requests.RequestException):
            # Skip links without a target/title attribute and sites that fail to load
            pass
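The company name in the code above comes from deleting everything from the first "(" to the end of the page title. A minimal sketch of just that step, run on a made-up title string (the real Yahoo title format is an assumption here); unlike the original, it also strips the trailing space left behind:

```python
import re

# Same pattern as rxTitle: "(" followed by everything up to the end of the string
rx_title = re.compile(r'\(.*$')

def company_name(title_text):
    # Drop the "(TICKER) ..." suffix, then remove the leftover trailing space
    return rx_title.sub("", title_text).strip()

sample_title = "Starbucks Corporation (SBUX) Company Profile"  # hypothetical title text
print(company_name(sample_title))
```

A title with no "(" is returned unchanged apart from the strip.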
Looking at one of the pages, I can see that the sector is in a span with 'class'='Fw(600)' and 'data-reactid'=21, and the industry is in the one with data-reactid=25, so you could use:
sector = soup.find('span', {'class':'Fw(600)','data-reactid': '21'})
print(sector.next)
industry = soup.find('span', {'class':'Fw(600)','data-reactid': '25'})
print(industry.next)
sector.next gets the content inside the span rather than returning the whole tag.
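To see what the attribute lookup returns without hitting the network, here is a small sketch on made-up markup that only mimics the attributes described above (the HTML and its values are hypothetical, not copied from Yahoo). It uses `.next_element`, the BeautifulSoup 4 name for the element parsed right after the tag, which for a span holding only text is that text:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: value spans carry class "Fw(600)" and a data-reactid attribute
html = """
<div>
  <span>Sector</span>: <span class="Fw(600)" data-reactid="21">Consumer Cyclical</span>
  <span>Industry</span>: <span class="Fw(600)" data-reactid="25">Restaurants</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match on both attributes, as in the answer
sector = soup.find('span', {'class': 'Fw(600)', 'data-reactid': '21'})
industry = soup.find('span', {'class': 'Fw(600)', 'data-reactid': '25'})

print(sector)               # the whole <span ...>...</span> tag
print(sector.next_element)  # only the text node inside the span
print(industry.get_text())  # get_text() also returns just the text
```

If the page markup changes so that no span matches, `find` returns `None`, and accessing `.next_element` on it raises `AttributeError`.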
A better approach, which finds the 'Sector' and 'Industry' label spans and returns the span that follows each one, is coded in full below:
import bs4 as BeautifulSoup
import requests

headers = {'User-agent': 'Mozilla/5.0'}

def get_tags(url):
    # Download the profile page and parse it
    webpage = requests.get(url, headers=headers)
    soup = BeautifulSoup.BeautifulSoup(webpage.content, "html.parser")
    results = {}
    title = soup.find("title")
    if title is not None:
        results['title'] = title.get_text()
    spans = soup.find_all('span')
    # Each value sits in the span right after its 'Sector' / 'Industry' label;
    # stop one short of the end so spans[i + 1] always exists
    for i in range(len(spans) - 1):
        label = spans[i].text
        if label in ('Sector', 'Industry'):
            results[label] = spans[i + 1].text
    return results

symbols = ['SBUX', 'MET', 'CAT', 'JNJ', 'ORCL']
for s in symbols:
    url = "https://finance.yahoo.com/quote/{}/profile?p={}".format(s, s)
    results = get_tags(url)
    print(results)
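The same label-then-value idea can be tested offline and plugged back into the question's `mySymbols` dictionary. This sketch swaps the list indexing for BeautifulSoup's `find(..., string=...)` plus `find_next('span')`, assuming a reasonably recent BeautifulSoup 4; the HTML snippets and the `EXAMPLE` ticker are made up for illustration:

```python
from bs4 import BeautifulSoup

def sector_and_industry(html):
    """Return {'Sector': ..., 'Industry': ...} for whichever labels are present."""
    soup = BeautifulSoup(html, "html.parser")
    results = {}
    for label in ('Sector', 'Industry'):
        # Span whose entire text is the label
        label_span = soup.find('span', string=label)
        if label_span is not None:
            # First span that comes after the label in document order
            value_span = label_span.find_next('span')
            if value_span is not None:
                results[label] = value_span.get_text(strip=True)
    return results

# Hypothetical markup standing in for downloaded profile pages
sample_pages = {
    'SBUX': '<p><span>Sector</span>: <span>Consumer Cyclical</span>'
            '<span>Industry</span>: <span>Restaurants</span></p>',
    'EXAMPLE': '<p><span>Sector</span>: <span>Technology</span></p>',
}

mySymbols = {}
for symbol, html in sample_pages.items():
    mySymbols[symbol] = sector_and_industry(html)
    print(symbol, mySymbols[symbol])
```

In the live version, `html` would be `webpage.content` from `requests.get`. A missing label simply leaves that key out instead of raising an error.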