Web scraping different football live scores sites

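The application I'm developing needs live football scores in its database. The APIs I've found are incomplete or lack some features I need. Is it legal to scrape live-score websites? I was thinking of scraping several different sites so that I don't generate much traffic on any one of them. What do you think? Thanks.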

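I don't think parsing data from a website is illegal. You'll probably want to do something like this: a program that goes to a specified web page, grabs the data from a particular line, and saves it to a file for another program to use.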

# Fetch the price of a stock from a quotes page and save it to a file.
# Written for Python 3 (the Python 2 names HTMLParser, raw_input and the
# print statement have been updated).
import time
from html.parser import HTMLParser

import requests


# Create a subclass and override the handler for character and text data
# (tag contents).
class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        pos = self.getpos()  # (line number, column offset) of this text
        # The price is on line 154 of the page source; this stops working
        # as soon as the page layout changes.
        if pos[0] == 154:
            price = data
            print(price)
            # Open the file for writing (creating it if it doesn't exist)
            # so another program can read the latest value.
            with open("price.txt", "w") as f:
                f.write(price)


def main():
    # Ask for the symbol once, then keep polling the same page.
    stockname = input('stock symbol')
    while True:
        # Use a fresh parser for every download so line counting restarts.
        parser = MyHTMLParser()
        r = requests.get('http://stocks.tradingcharts.com/stocks/quotes/' + stockname)
        parser.feed(r.text)
        # Put it on a timer since the page is updated once every 5 minutes.
        time.sleep(300)


if __name__ == "__main__":
    main()
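
The program above finds the value by its line number in the page source (line 154), which breaks whenever the site adds or removes a line. A possibly more robust variation is to match on the surrounding tag instead. The following is only a sketch of that idea: the `<span class="score">` markup, the `ScoreParser` class and the `extract_scores` helper are all made up for illustration, so you would need to inspect the real page to see which tag and attribute actually wrap the score.

```python
from html.parser import HTMLParser


class ScoreParser(HTMLParser):
    """Collect the text of every <span class="score"> element.

    The tag name and class are hypothetical; inspect the real page to
    find the markup that wraps the value you need.
    """

    def __init__(self):
        super().__init__()
        self.in_score = False  # True while inside a matching <span>
        self.scores = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "span" and ("class", "score") in attrs:
            self.in_score = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_score = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a matching <span>.
        if self.in_score and data.strip():
            self.scores.append(data.strip())


def extract_scores(html):
    """Return the list of score strings found in an HTML document."""
    parser = ScoreParser()
    parser.feed(html)
    parser.close()
    return parser.scores


# Made-up sample markup standing in for a downloaded page.
sample = """
<div class="match">
  <span class="team">Home FC</span>
  <span class="score">2-1</span>
  <span class="team">Away United</span>
</div>
"""
print(extract_scores(sample))
```

To use it with a real page you would pass `r.text` from `requests.get(...)` to `extract_scores` instead of the sample string. Note that this simple flag does not handle a `<span>` nested inside the score element.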