How to Pivot Columns of a Pandas DataFrame into Inner-most Level Index without Using df.iterrows()?
Original .csv file:
#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,FALSE
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,FALSE
3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,FALSE
My Python code using df.iterrows():
import pandas as pd
import os
df = pd.read_csv('pokemon_data.csv')
with open('output.txt', 'w') as f:
    for index, row in df.iterrows():
        row_i = str(index) + str(row)
        f.write(row_i)
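I have learned that we should avoid df.iterrows() because it becomes very slow on large data.
How can I pivot the columns of a Pandas DataFrame into the inner-most index level without using df.iterrows(), and get the result below?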
0 # 1
Name Bulbasaur
Type 1 Grass
Type 2 Poison
HP 45
Attack 49
Defense 49
Sp. Atk 65
Sp. Def 65
Speed 45
Generation 1
Legendary False
1 # 2
Name Ivysaur
Type 1 Grass
Type 2 Poison
HP 60
Attack 62
Defense 63
Sp. Atk 80
Sp. Def 80
Speed 60
Generation 1
Legendary False
2 # 3
Name Venusaur
Type 1 Grass
Type 2 Poison
HP 80
Attack 82
Defense 83
Sp. Atk 100
Sp. Def 100
Speed 80
Generation 1
Legendary False
With str() you can get the string representation of each row, then join the rows together with .str.cat:
>>> print(df.agg(str, axis='columns').str.cat(sep='\n\n'))
# 1
Name Bulbasaur
Type 1 Grass
Type 2 Poison
HP 45
Attack 49
Defense 49
Sp. Atk 65
Sp. Def 65
Speed 45
Generation 1
Legendary False
Name: 0, dtype: object
# 2
Name Ivysaur
Type 1 Grass
Type 2 Poison
HP 60
Attack 62
Defense 63
Sp. Atk 80
Sp. Def 80
Speed 60
Generation 1
Legendary False
Name: 1, dtype: object
# 3
Name Venusaur
Type 1 Grass
Type 2 Poison
HP 80
Attack 82
Defense 83
Sp. Atk 100
Sp. Def 100
Speed 80
Generation 1
Legendary False
Name: 2, dtype: object
If you want to keep the index number, you can use reset_index() and then adjust the string representation:
>>> print(df.reset_index().agg(str, axis='columns').str.replace(r'^index\s*', '', regex=True).str.cat(sep='\n\n'))
0
# 1
Name Bulbasaur
Type 1 Grass
Type 2 Poison
HP 45
Attack 49
Defense 49
Sp. Atk 65
Sp. Def 65
Speed 45
Generation 1
Legendary False
df.stack().to_string('output.txt')
Contents of output.txt:
0 # 1
Name Bulbasaur
Type 1 Grass
Type 2 Poison
HP 45
Attack 49
Defense 49
Sp. Atk 65
Sp. Def 65
Speed 45
Generation 1
Legendary False
1 # 2
Name Ivysaur
Type 1 Grass
Type 2 Poison
HP 60
Attack 62
Defense 63
Sp. Atk 80
Sp. Def 80
Speed 60
Generation 1
Legendary False
2 # 3
Name Venusaur
Type 1 Grass
Type 2 Poison
HP 80
Attack 82
Defense 83
Sp. Atk 100
Sp. Def 100
Speed 80
Generation 1
Legendary False
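To make it clearer what stack() is doing here, the following is a minimal sketch that builds the frame from an inline copy of the first two CSV rows (the csv_text variable is just an illustration, not part of the original file handling). It shows that the stacked result is a Series whose index has two levels: the original row index on the outside and the former column names on the inside. Depending on the pandas version, stack() may drop missing values, so a frame containing NaN could produce fewer lines than rows times columns.

```python
import io

import pandas as pd

# Inline copy of the first two rows of the CSV, used only for illustration.
csv_text = """#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,FALSE
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,FALSE
"""
df = pd.read_csv(io.StringIO(csv_text))

# stack() moves the column labels into a new inner-most index level,
# turning the DataFrame into a Series indexed by (row, column).
stacked = df.stack()

# Individual cells can be looked up by the (row, column) pair.
first_name = stacked.loc[(0, "Name")]

# With no path argument, to_string() returns the text instead of writing a file.
text = stacked.to_string()
```

Passing a file name, as in the one-liner above, writes the same text to disk.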
You can use df.apply(axis=1):
import pandas as pd
import os
df = pd.read_csv('pokemon_data.csv')
with open('output.txt', 'w') as f:
    def write_pokemon(pokemon):
        f.write('\n\n')
        f.write(pokemon.to_string())
    df.apply(write_pokemon, axis=1)
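A possible variation on the apply approach, sketched under the assumption that you also want the index number above each record: have the function return a string instead of writing inside it, then join the pieces and write once. The format_row helper and the inline csv_text are hypothetical names used only for this example. Note that apply(axis=1) still calls the function once per row, so this is tidier rather than necessarily faster.

```python
import io

import pandas as pd

# Inline copy of the first two rows of the CSV, used only for illustration.
csv_text = """#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,FALSE
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,FALSE
"""
df = pd.read_csv(io.StringIO(csv_text))


def format_row(row):
    # row.name is the row's index label; to_string() omits the
    # "Name: ..., dtype: object" trailer that str(row) would add.
    return f"{row.name}\n{row.to_string()}"


# apply(axis=1) returns a Series holding one formatted string per row.
blocks = df.apply(format_row, axis=1)

# Separate the records with a blank line.
output = "\n\n".join(blocks)
```

The resulting output string can then be written in a single call, for example with f.write(output) inside a with open(...) block.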