Creating a Python program that scrapes files from a website

This is what I have so far:

import urllib
Champions=["Aatrox","Ahri","Akali","Alistar","Amumu","Anivia","Annie","Ashe","Azir","Blitzcrank","Brand","Braum","Caitlyn","Cassiopeia","ChoGath","Corki","Darius","Diana","DrMundo","Draven","Elise","Evelynn","Ezreal","Fiddlesticks","Fiora","Fizz","Galio","Gangplank","Garen","Gnar","Gragas","Graves","Hecarim","Heimerdinger","Irelia","Janna","JarvanIV","Jax","Jayce","Jinx","Kalista","Karma","Karthus","Kassadin","Katarina","Kayle","Kennen","KhaZix","KogMaw","LeBlanc","LeeSin","Leona","Lissandra","Lucian","Lulu","Lux","Malphite","Malzahar","Maokai","MasterYi","MissFortune","Mordekaiser","Morgana","Nami","Nasus","Nautilus","Nidalee","Nocturne","Nunu","Olaf","Orianna","Pantheon","Poppy","Quinn","Rammus","RekSai","Renekton","Rengar","Riven","Rumble","Ryze","Sejuani","Shaco","Shen","Shyvana","Singed","Sion","Sivir","Skarner","Sona","Soraka","Swain","Syndra","Talon","Taric","Teemo","Thresh","Tristana","Trundle","Tryndamere","TwistedFate","Twitch","Udyr","Urgot","Varus","Vayne","Veigar","VelKoz","Vi","Viktor","Vladimir","Volibear","Warwick","Wukong","Xerath","XinZhao","Yasuo","Yorick","Zac","Zed","Ziggs","Zilean","Zyra"]
currentCount=0
while currentCount < len(Champions):
    urllib.urlretrieve("http://www.lolflavor.com/champions/"+Champions[currentCount]+ "/Recommended/"+Champions[currentCount]+"_lane_scrape.json","C:\Users\Jay\Desktop\LolFlavor\ " +Champions[currentCount]+ "\ "+Champions[currentCount]+ "_lane_scrape.json")
    currentCount+=1

The purpose of the program is to use the list and currentCount to get a champion name, go to the website (for "Aatrox", for example, http://www.lolflavor.com/champions/Aatrox/Recommended/Aatrox_lane_scrape.json), then download and store the file, in this case in the folder LolFlavor/Aatrox/Aatrox_lane_scrape.json.

The places where "Aatrox" appears in the URL change for each champion. Can anyone help me get this working?

EDIT: current code, which raises a ValueError:

import json
import os
import requests
Champions=["Aatrox","Ahri","Akali","Alistar","Amumu","Anivia","Annie","Ashe","Azir","Blitzcrank","Brand","Braum","Caitlyn","Cassiopeia","ChoGath","Corki","Darius","Diana","DrMundo","Draven","Elise","Evelynn","Ezreal","Fiddlesticks","Fiora","Fizz","Galio","Gangplank","Garen","Gnar","Gragas","Graves","Hecarim","Heimerdinger","Irelia","Janna","JarvanIV","Jax","Jayce","Jinx","Kalista","Karma","Karthus","Kassadin","Katarina","Kayle","Kennen","KhaZix","KogMaw","LeBlanc","LeeSin","Leona","Lissandra","Lucian","Lulu","Lux","Malphite","Malzahar","Maokai","MasterYi","MissFortune","Mordekaiser","Morgana","Nami","Nasus","Nautilus","Nidalee","Nocturne","Nunu","Olaf","Orianna","Pantheon","Poppy","Quinn","Rammus","RekSai","Renekton","Rengar","Riven","Rumble","Ryze","Sejuani","Shaco","Shen","Shyvana","Singed","Sion","Sivir","Skarner","Sona","Soraka","Swain","Syndra","Talon","Taric","Teemo","Thresh","Tristana","Trundle","Tryndamere","TwistedFate","Twitch","Udyr","Urgot","Varus","Vayne","Veigar","VelKoz","Vi","Viktor","Vladimir","Volibear","Warwick","Wukong","Xerath","XinZhao","Yasuo","Yorick","Zac","Zed","Ziggs","Zilean","Zyra"]
for champ in Champions:
    os.makedirs("C:\Users\Jay\Desktop\LolFlavor\{}\Recommended".format(champ), exist_ok=True)
    with open(r"C:\Users\Jay\Desktop\LolFlavor\{}\Recommended\{}_lane_scrape.json".format(champ,champ),"w") as f:
        r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_lane_scrape.json".format(champ,champ))
        json.dump(r.json(),f)
    with open(r"C:\Users\Jay\Desktop\LolFlavor\{}\Recommended\{}_jungle_scrape.json".format(champ,champ),"w") as f:
        r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_jungle_scrape.json".format(champ,champ))
        json.dump(r.json(),f)
    with open(r"C:\Users\Jay\Desktop\LolFlavor\{}\Recommended\{}_support_scrape.json".format(champ,champ),"w") as f:
        r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_support_scrape.json".format(champ,champ))
        json.dump(r.json(),f)
    with open(r"C:\Users\Jay\Desktop\LolFlavor\{}\Recommended\{}_aram_scrape.json".format(champ,champ),"w") as f:
        r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_aram_scrape.json".format(champ,champ))
        json.dump(r.json(),f)
There is no need to use a while loop to index the list; just iterate directly over the list items and use str.format to pass each champion name into the url:

import requests

Champions=["Aatrox","Ahri","Akali","Alistar","Amumu","Anivia","Annie","Ashe","Azir","Blitzcrank","Brand","Braum","Caitlyn","Cassiopeia","ChoGath","Corki","Darius","Diana","DrMundo","Draven","Elise","Evelynn","Ezreal","Fiddlesticks","Fiora","Fizz","Galio","Gangplank","Garen","Gnar","Gragas","Graves","Hecarim","Heimerdinger","Irelia","Janna","JarvanIV","Jax","Jayce","Jinx","Kalista","Karma","Karthus","Kassadin","Katarina","Kayle","Kennen","KhaZix","KogMaw","LeBlanc","LeeSin","Leona","Lissandra","Lucian","Lulu","Lux","Malphite","Malzahar","Maokai","MasterYi","MissFortune","Mordekaiser","Morgana","Nami","Nasus","Nautilus","Nidalee","Nocturne","Nunu","Olaf","Orianna","Pantheon","Poppy","Quinn","Rammus","RekSai","Renekton","Rengar","Riven","Rumble","Ryze","Sejuani","Shaco","Shen","Shyvana","Singed","Sion","Sivir","Skarner","Sona","Soraka","Swain","Syndra","Talon","Taric","Teemo","Thresh","Tristana","Trundle","Tryndamere","TwistedFate","Twitch","Udyr","Urgot","Varus","Vayne","Veigar","VelKoz","Vi","Viktor","Vladimir","Volibear","Warwick","Wukong","Xerath","XinZhao","Yasuo","Yorick","Zac","Zed","Ziggs","Zilean","Zyra"]

for champ in Champions:
    r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_lane_scrape.json".format(champ,champ))
    print(r.json())

If you want to save each one to a file, dump the json:

import json
import requests
import simplejson

for champ in Champions:
    with open(r"C:\Users\Jay\Desktop\LolFlavor\{}_lane_scrape.json".format(champ), "w") as f:
        try:
            r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_lane_scrape.json".format(champ, champ))
            json.dump(r.json(), f)
        except simplejson.scanner.JSONDecodeError:
            # the response body was not valid json, e.g. a 404 error page
            print(r.url)
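As a side note, requests only decodes with simplejson when that package happens to be installed; either way, the exception r.json() raises for an invalid body is a subclass of ValueError, so catching ValueError works without the extra import. A minimal sketch of the same loop written that way (same url and path as above):

import json
import requests

for champ in Champions:
    r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_lane_scrape.json".format(champ, champ))
    try:
        data = r.json()
    except ValueError:
        # not valid json, for example a 404 error page; note the url and move on
        print(r.url)
        continue
    with open(r"C:\Users\Jay\Desktop\LolFlavor\{}_lane_scrape.json".format(champ), "w") as f:
        json.dump(data, f)

Decoding before opening the file also avoids leaving an empty file behind when a download fails.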

The error comes from a 404 - File or directory not found: one of your calls fails, so there is no valid json to decode. The offending url is:

u'http://www.lolflavor.com/champions/Wukong/Recommended/Wukong_lane_scrape.json'

If you try it in your browser you also get a 404. That is because there is no champion page for Wukong, which you can confirm by opening http://www.lolflavor.com/champions/Wukong/ in your browser.
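If you would rather not rely on the exception at all, you can check the HTTP status code and skip anything that is not 200 before decoding. The sketch below is one way to put the pieces together; it reuses the Champions list from above, the four role files from the question's edit, and the same target folder, all of which you may want to adjust:

import json
import os
import requests

roles = ["lane", "jungle", "support", "aram"]
base = r"C:\Users\Jay\Desktop\LolFlavor"

for champ in Champions:
    folder = os.path.join(base, champ, "Recommended")
    os.makedirs(folder, exist_ok=True)
    for role in roles:
        url = "http://www.lolflavor.com/champions/{}/Recommended/{}_{}_scrape.json".format(champ, champ, role)
        r = requests.get(url)
        if r.status_code != 200:
            # e.g. the Wukong pages return 404; report and move on instead of crashing
            print("skipping", url)
            continue
        with open(os.path.join(folder, "{}_{}_scrape.json".format(champ, role)), "w") as f:
            json.dump(r.json(), f)

With this check in place the missing Wukong pages are simply reported and skipped, and every other champion still gets its four files.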

Also make sure to use a raw string (the r prefix) for file paths that contain backslashes, because backslashes have a special meaning in Python: they start escape sequences, so sequences like \n or \r inside your path will cause problems. Alternatively, use forward slashes, or escape each backslash as \\.
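To make the backslash point concrete, the spellings below all name the same Windows folder (the example path used above), and os.path.join builds the separators for you; this is just an illustration:

import os

p1 = r"C:\Users\Jay\Desktop\LolFlavor"     # raw string: backslashes stay literal
p2 = "C:\\Users\\Jay\\Desktop\\LolFlavor"  # every backslash escaped by hand
p3 = "C:/Users/Jay/Desktop/LolFlavor"      # forward slashes also work on Windows

print(p1 == p2)  # True

# os.path.join inserts the right separator for the operating system
print(os.path.join(p3, "Aatrox", "Aatrox_lane_scrape.json"))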