Transform data frame to a different form

This is my data frame:

Date Country Value
1/4/1971 Sweden 5.1643
1/5/1971 Sweden 5.1628
1/6/1971 Sweden 5.1614
1/7/1971 Sweden 5.1649
1/8/1971 Sweden 5.1631
1/4/1971 Canada 1.0109
1/5/1971 Canada 1.0102
1/6/1971 Canada 1.0106
1/7/1971 Canada 1.0148
1/8/1971 Canada 1.0154
1/4/1971 India 8.02
1/5/1971 India 8.00
1/6/1971 India 8.01
1/7/1971 India 8.00
1/8/1971 India 8.03

Using Python and pandas, I want to reshape it into a data frame like the one below:

Date Sweden Canada India
1/4/1971 5.1643 1.0109 8.02
1/5/1971 5.1628 1.0102 8
1/6/1971 5.1614 1.0106 8.01
1/7/1971 5.1649 1.0148 8
1/8/1971 5.1631 1.0154 8.03

Please help me. Thanks.

First, let's create your data frame here for testing:

import pandas as pd

arr = [['1/4/1971', 'Sweden', '5.1643'],
       ['1/5/1971', 'Sweden', '5.1628'],
       ['1/6/1971', 'Sweden', '5.1614'],
       ['1/7/1971', 'Sweden', '5.1649'],
       ['1/8/1971', 'Sweden', '5.1631'],
       ['1/4/1971', 'Canada', '1.0109'],
       ['1/5/1971', 'Canada', '1.0102'],
       ['1/6/1971', 'Canada', '1.0106'],
       ['1/7/1971', 'Canada', '1.0148'],
       ['1/8/1971', 'Canada', '1.0154'],
       ['1/4/1971', 'India', '8.02'],
       ['1/5/1971', 'India', '8.00'],
       ['1/6/1971', 'India', '8.01'],
       ['1/7/1971', 'India', '8.00'],
       ['1/8/1971', 'India', '8.03']]
df = pd.DataFrame(arr,columns=['Date','Country','Value'])
print('old form')
print(df)

The output should look like this:

old form
        Date Country   Value
0   1/4/1971  Sweden  5.1643
1   1/5/1971  Sweden  5.1628
2   1/6/1971  Sweden  5.1614
3   1/7/1971  Sweden  5.1649
4   1/8/1971  Sweden  5.1631
5   1/4/1971  Canada  1.0109
6   1/5/1971  Canada  1.0102
7   1/6/1971  Canada  1.0106
8   1/7/1971  Canada  1.0148
9   1/8/1971  Canada  1.0154
10  1/4/1971   India    8.02
11  1/5/1971   India    8.00
12  1/6/1971   India    8.01
13  1/7/1971   India    8.00
14  1/8/1971   India    8.03
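Note that `arr` stores every value as a quoted string (for example `'5.1643'`), so the `Value` column holds text rather than numbers. If you plan to do arithmetic on the reshaped table, a minimal sketch, assuming you want floating-point values, is to convert the column with `pd.to_numeric` before reshaping (only a few of the sample rows are used here):

```python
import pandas as pd

# A few rows from the sample data; values are strings, as in the answer's arr
arr = [['1/4/1971', 'Sweden', '5.1643'],
       ['1/4/1971', 'Canada', '1.0109'],
       ['1/4/1971', 'India', '8.02']]
df = pd.DataFrame(arr, columns=['Date', 'Country', 'Value'])

# Convert the text values to floats so they can be summed, averaged, plotted, etc.
df['Value'] = pd.to_numeric(df['Value'])
print(df['Value'].dtype)  # float64
```

The reshaping code works the same either way; the conversion only changes what you can do with the numbers afterwards.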

Now let's do the magic:

Note: this code is not optimized, but it works.

# Build a nested dict: {date: {country: value}}
table = {}
for date, country, value in df.values:
    table.setdefault(date, {})[country] = value

# Flatten it into one row per date, with one column per country
arr = []
for date, row in table.items():
    arr.append([date, row['Sweden'], row['Canada'], row['India']])

df2 = pd.DataFrame(arr, columns=['Date', 'Sweden', 'Canada', 'India'])
print('new form')
print(df2)

The final output should be:

new form
       Date  Sweden  Canada India
0  1/4/1971  5.1643  1.0109  8.02
1  1/5/1971  5.1628  1.0102  8.00
2  1/6/1971  5.1614  1.0106  8.01
3  1/7/1971  5.1649  1.0148  8.00
4  1/8/1971  5.1631  1.0154  8.03
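The nested-dict loop above groups the values by date and then spreads the countries across columns, which is what pandas' `set_index` followed by `unstack` does. A sketch of that swapped-in approach (not the answerer's code), assuming the same `Date`/`Country`/`Value` columns and using a trimmed copy of the sample data; unlike the loop, it does not hard-code the country names:

```python
import pandas as pd

# Trimmed copy of the sample data (two dates per country)
arr = [['1/4/1971', 'Sweden', '5.1643'],
       ['1/5/1971', 'Sweden', '5.1628'],
       ['1/4/1971', 'Canada', '1.0109'],
       ['1/5/1971', 'Canada', '1.0102'],
       ['1/4/1971', 'India', '8.02'],
       ['1/5/1971', 'India', '8.00']]
df = pd.DataFrame(arr, columns=['Date', 'Country', 'Value'])

# Make (Date, Country) the row index, then move the Country level into columns
wide = df.set_index(['Date', 'Country'])['Value'].unstack('Country')

# unstack sorts the new columns alphabetically; restore first-appearance order
countries = list(df['Country'].unique())
wide = wide[countries]

# Turn Date back into an ordinary column and drop the leftover 'Country' axis name
df2 = wide.reset_index().rename_axis(columns=None)
print(df2)
```

Because the column list comes from the data, new countries show up automatically instead of raising a `KeyError` as `row['Sweden']` would for a missing key.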

You can do this with the DataFrame's pivot method.

Code

The following code assumes the original data is in a file named test.csv.

import pandas as pd

df = pd.read_csv('test.csv')

print(df)

df = df.pivot(index='Date', columns='Country', values = 'Value').reset_index()

print(df)

Before

Date Country Value
1/4/1971 Sweden 5.1643
1/5/1971 Sweden 5.1628
1/6/1971 Sweden 5.1614
1/7/1971 Sweden 5.1649
1/8/1971 Sweden 5.1631
1/4/1971 Canada 1.0109
1/5/1971 Canada 1.0102
1/6/1971 Canada 1.0106
1/7/1971 Canada 1.0148
1/8/1971 Canada 1.0154
1/4/1971 India 8.02
1/5/1971 India 8
1/6/1971 India 8.01
1/7/1971 India 8
1/8/1971 India 8.03

After

Country Date Canada India Sweden
0 1/4/1971 1.0109 8.02 5.1643
1 1/5/1971 1.0102 8 5.1628
2 1/6/1971 1.0106 8.01 5.1614
3 1/7/1971 1.0148 8 5.1649
4 1/8/1971 1.0154 8.03 5.1631
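Two details of the "After" table differ from the layout asked for in the question: `pivot` orders the new columns alphabetically (Canada, India, Sweden instead of Sweden, Canada, India), and with `Date` stored as text the rows are sorted as strings, so a date like 1/10/1971 would sort before 1/4/1971. A minimal sketch that handles both, building a few sample rows inline instead of reading `test.csv`, and assuming the dates are month/day/year (the sample alone does not settle whether they are month/day or day/month):

```python
import pandas as pd

# A few sample rows built inline (the answer reads them from test.csv instead)
df = pd.DataFrame({
    'Date': ['1/4/1971', '1/5/1971'] * 3,
    'Country': ['Sweden'] * 2 + ['Canada'] * 2 + ['India'] * 2,
    'Value': [5.1643, 5.1628, 1.0109, 1.0102, 8.02, 8.00],
})

# Parse dates so rows sort chronologically rather than as text
# (assumes month/day/year; use '%d/%m/%Y' if the data is day/month/year)
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

wide = df.pivot(index='Date', columns='Country', values='Value')

# pivot orders columns alphabetically; restore first-appearance order
wide = wide[list(df['Country'].unique())]

# Move Date back to a column and drop the leftover 'Country' axis name
result = wide.reset_index().rename_axis(columns=None)
print(result)
```

If the same (Date, Country) pair can appear more than once, `pivot` raises a `ValueError`; `pivot_table(index='Date', columns='Country', values='Value', aggfunc='mean')` aggregates the duplicates instead.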