How can I extract data from a nested list into a CSV or table?
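I am currently developing a Pokémon database application, and I want to automate data entry so that I don't have to type in roughly 50,000 Pokémon <> Move links by hand.
I found a free dataset online that contains the Pokémon <> Move links, but it is in a nested list format.
I have copied and pasted part of the dataset here: http://pastebin.com/ADeRaBiu
In the end, I would like a table (preferably stored in a CSV/Excel-readable format) that looks like this: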
| pokemonname | move | movelearnmethod |
|-------------|---------|-----------------|
| bulbasaur | amnesia | 6E |
| bulbasaur | attract | 6M |
| bulbasaur | bind | 6T |
| bulbasaur | endure | 6E |
| bulbasaur | endure | 6T |
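I tried using Python's split() method to start splitting on delimiters, but there are several different delimiters and I don't know how to handle that.
Any help would be greatly appreciated! Thanks!
Update:
To clarify, I want to make sure that if one of a Pokémon's moves has multiple move learn methods, for example bulbasaur's endure, which has both "6E" and "6T", a second row is created for the second move learn method, as in the table above.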
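I don't understand what you mean by 'multiple delimiters'. Sure, commas are used in many places, but the colon or the closing bracket could serve as good delimiters.
Another approach is to use regular expressions, and hence Perl instead of Python.
Regards,
Alexis.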
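The regular-expression route mentioned in the comment does not require switching languages. As a rough sketch (not the commenter's code), Python's re module can pull out the rows directly. The shortened `sample` string and both patterns are assumptions based on the shape of the pasted excerpt, and they may need adjusting for the full dataset.

```python
import re

# Hypothetical shortened sample in the same shape as the dataset excerpt.
sample = '{bulbasaur:{learnset:{amnesia:["6E"],bind:["6T"],block:[],endure:["6E","6T"],"return":["6M"]}}}'

# Assumed layout: name:{learnset:{ ...moves... }}
pokemon_pattern = re.compile(r'(\w+):\{learnset:\{(.*?)\}\}')
# Each move looks like name:[...]; the optional quotes cover keys such as "return".
move_pattern = re.compile(r'"?(\w+)"?:\[([^\]]*)\]')
# Each learn method is a double-quoted string inside the brackets.
method_pattern = re.compile(r'"([^"]+)"')

rows = []
for pokemonname, learnset in pokemon_pattern.findall(sample):
    for move, methods in move_pattern.findall(learnset):
        # A move with an empty list contributes no rows.
        for method in method_pattern.findall(methods):
            rows.append((pokemonname, move, method))

for row in rows:
    print(row)
```

Here the colon and the closing bracket act as the anchors, as the comment suggests, while the commas inside the brackets are handled by the innermost pattern.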
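The sample data closely resembles a Python dictionary, but with unquoted keys. You can fix that with a regular expression and then evaluate the result as a Python dictionary, at which point parsing is straightforward.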
import re
import ast
data = """{bulbasaur:{learnset:{amnesia:["6E"],attract:["6M"],bind:["6T"],block:[],bodyslam:[],bulletseed:[],captivate:[],charm:["6E"],confide:["6M"],curse:["6E"],cut:["6M"],defensecurl:[],doubleedge:["6L027"],doubleteam:["6M"],echoedvoice:["6M"],endure:["6E","6T"],energyball:["6M"],facade:["6M"],falseswipe:[],flash:["6M"],frenzyplant:[],frustration:["6M"],furycutter:[],gigadrain:["6E","6T"],grassknot:["6M"],grasspledge:["6T"],grasswhistle:["6E"],grassyterrain:["6E"],growl:["6L003"],growth:["6L025"],headbutt:[],hiddenpower:["6M"],ingrain:["6E"],knockoff:["6T"],leafstorm:["6E"],leechseed:["6L007"],lightscreen:["6M"],magicalleaf:["6E"],mimic:[],mudslap:[],naturalgift:[],naturepower:["6E","6M"],petaldance:["6E"],poisonpowder:["6L013"],powerwhip:["6E"],protect:["6M"],razorleaf:["6L019"],rest:["6M"],"return":["6M"],rocksmash:["6M"],round:["6M"],safeguard:["6M"],secretpower:["6M"],seedbomb:["6L037","6T"],skullbash:["6E"],sleeppowder:["6L013"],sleeptalk:["6M"],sludge:["6E"],sludgebomb:["6M"],snore:["6T"],solarbeam:["6M"],strength:["6M"],stringshot:[],substitute:["6M"],sunnyday:["6M"],swagger:["6M"],sweetscent:["6L021"],swordsdance:["6M"],synthesis:["6L033","6T"],tackle:["6L001a"],takedown:["6L015"],toxic:["6M"],venoshock:["6M"],vinewhip:["6L009"],weatherball:[],worryseed:["6L031","6T"]}}}"""
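# Quote the bare keys so the text becomes a valid Python dict literal.
# Raw strings keep \w and the \1 backreference from being treated as string escapes.
dict_data = re.sub(r'(\w+):', r'"\1":', data)
move_data = ast.literal_eval(dict_data)
# Row order follows dict ordering, so it may differ from the listing below.
for pokemonname in move_data:
    learn_set = move_data[pokemonname]['learnset']
    for move in learn_set:
        # Moves with an empty list (e.g. block) produce no rows.
        for method in learn_set[move]:
            print('pokemonname: {0}, move: {1}, movelearnmethod: {2}'.format(pokemonname, move, method))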
pokemonname: bulbasaur, move: sludgebomb, movelearnmethod: 6M
pokemonname: bulbasaur, move: venoshock, movelearnmethod: 6M
pokemonname: bulbasaur, move: doubleteam, movelearnmethod: 6M
pokemonname: bulbasaur, move: confide, movelearnmethod: 6M
pokemonname: bulbasaur, move: rest, movelearnmethod: 6M
pokemonname: bulbasaur, move: sludge, movelearnmethod: 6E
pokemonname: bulbasaur, move: growth, movelearnmethod: 6L025
pokemonname: bulbasaur, move: grassknot, movelearnmethod: 6M
pokemonname: bulbasaur, move: facade, movelearnmethod: 6M
pokemonname: bulbasaur, move: return, movelearnmethod: 6M
pokemonname: bulbasaur, move: attract, movelearnmethod: 6M
pokemonname: bulbasaur, move: echoedvoice, movelearnmethod: 6M
pokemonname: bulbasaur, move: substitute, movelearnmethod: 6M
pokemonname: bulbasaur, move: growl, movelearnmethod: 6L003
pokemonname: bulbasaur, move: curse, movelearnmethod: 6E
pokemonname: bulbasaur, move: powerwhip, movelearnmethod: 6E
pokemonname: bulbasaur, move: ingrain, movelearnmethod: 6E
pokemonname: bulbasaur, move: gigadrain, movelearnmethod: 6E
pokemonname: bulbasaur, move: gigadrain, movelearnmethod: 6T
pokemonname: bulbasaur, move: worryseed, movelearnmethod: 6L031
pokemonname: bulbasaur, move: worryseed, movelearnmethod: 6T
pokemonname: bulbasaur, move: flash, movelearnmethod: 6M
pokemonname: bulbasaur, move: takedown, movelearnmethod: 6L015
...
Once you have this data, I suggest looking at Python's CSV writer: https://docs.python.org/2/library/csv.html#writer-objects. After creating a writer object, you can replace the print call above with a call to writerow.
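A minimal sketch of that last step, assuming Python 3. The output file name `pokemon_moves.csv`, the helper `write_moves_csv`, and the small hand-written `move_data` stand-in are all hypothetical; in practice `move_data` would be the dictionary parsed above.

```python
import csv

# Small stand-in with the same shape as the parsed dictionary (assumption).
move_data = {
    'bulbasaur': {
        'learnset': {'amnesia': ['6E'], 'block': [], 'endure': ['6E', '6T']},
    },
}

def write_moves_csv(move_data, path):
    """Write one row per (pokemon, move, learn method) combination."""
    # newline='' lets the csv module control line endings (Python 3).
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['pokemonname', 'move', 'movelearnmethod'])
        for pokemonname, info in move_data.items():
            for move, methods in info['learnset'].items():
                # A move with several methods yields several rows;
                # a move with an empty list yields none.
                for method in methods:
                    writer.writerow([pokemonname, move, method])

write_moves_csv(move_data, 'pokemon_moves.csv')
```

The resulting file opens directly in Excel, and a move such as endure appears once per learn method, which matches the table requested in the question.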