How do I convert data into the following format in Python?

Question

I have some data in a CSV file in the following format.

   Id   Category
    1   A
    2   B
    3   C
    4   B
    5   C
    6   d

I want to convert it to the format below and then save it as another CSV file.

Id  A   B   C   D   E
1   1   0   0   0   0
2   0   1   0   0   0
3   0   0   1   0   0
4   0   1   0   0   0
5   0   0   1   0   0
6   0   0   0   1   0

Answer 1

Try pd.get_dummies()

>> df = pd.read_csv(<path_to_file>, sep=',', encoding='utf-8', header=0)

>> df
   Id   Category
0   1          A
1   2          B
2   3          C
3   4          B
4   5          C
5   6          d

>> pd.get_dummies(df.Category)

This encodes Category and gives you the new columns:

A B C d

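However, it will not 'fix' d -> D, and it will not give you any column that cannot be derived from the values in Category (such as E in your desired output).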

I suggest you look at the solution posted in the earlier comments.
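One possible way to handle both limitations is sketched below. It assumes the lowercase d is simply a typo for D and that the full set of expected categories (A to E) is known in advance. Both are assumptions made for illustration, and the inline CSV text stands in for the real file.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file from the question (assumption for illustration).
csv_text = "Id,Category\n1,A\n2,B\n3,C\n4,B\n5,C\n6,d\n"
df = pd.read_csv(io.StringIO(csv_text))

# Assumed full list of categories; E never occurs in the data.
all_categories = ["A", "B", "C", "D", "E"]

# Normalise case (d -> D) and declare every expected category up front,
# so get_dummies also emits a column for categories that never occur.
df["Category"] = pd.Categorical(
    df["Category"].str.upper(), categories=all_categories
)

# Empty prefix and separator keep the plain category names as column names.
result = pd.get_dummies(
    df, columns=["Category"], prefix="", prefix_sep="", dtype=int
)
print(result)
```

Declaring the categories explicitly is what produces the all-zero E column; without it, get_dummies only creates columns for values it actually sees.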

Edit

# Load data from .CSV with pd.read_csv() as demonstrated above

In [13]: df
Out[13]: 
  Category  Id
0        A   1
1        B   2
2        C   3
3        B   4
4        C   5
5        D   6

## One-liner for hot-encoding, then concatenating to original dataframe 
## and finally dropping the old column 'Category', you can skip the 
## last part if you want to keep original column as well.
In [14]: df = pd.concat([df, pd.get_dummies(df.Category)], axis=1).drop('Category', axis=1)

In [15]: df
Out[15]: 
   Id    A    B    C    D
0   1  1.0  0.0  0.0  0.0
1   2  0.0  1.0  0.0  0.0
2   3  0.0  0.0  1.0  0.0
3   4  0.0  1.0  0.0  0.0
4   5  0.0  0.0  1.0  0.0
5   6  0.0  0.0  0.0  1.0

## Write to file
In [16]: df.to_csv(<output_path>, sep='\t', encoding='utf-8', index=None)

As you can see, this is not a transpose; it only adds the one-hot encoding of the Category column to each row.
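A quick way to convince yourself of this is sketched below, using a small hand-built frame in place of the CSV data (an assumption for illustration). The row count is unchanged, and every row has exactly one indicator set.

```python
import pandas as pd

# Hand-built stand-in for the loaded CSV data (assumption for illustration).
df = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6], "Category": list("ABCBCD")})

# Same one-liner as above, with integer indicators instead of floats.
encoded = pd.concat(
    [df, pd.get_dummies(df["Category"], dtype=int)], axis=1
).drop("Category", axis=1)

# The number of rows is preserved: nothing is transposed.
same_row_count = len(encoded) == len(df)

# Each row carries exactly one 1 across the indicator columns.
ones_per_row = encoded[["A", "B", "C", "D"]].sum(axis=1).tolist()

print(same_row_count, ones_per_row)
```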

Unfortunately, whether Excel accepts the final data is not something you can control from Pandas.

Hope this helps.

Answer 2

Use a pivot table (updated to include .csv read/write functionality):

import pandas as pd
path = 'the path to your file'
df = pd.read_csv(path)

# your original dataframe
# Category  Id
# 0        A   1
# 1        B   2
# 2        C   3
# 3        B   4
# 4        C   5
# 5        D   6

# pivot table
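result = df.pivot_table(index=['Id'], columns='Category', fill_value=0, aggfunc='size')

# save the pivoted table (not the original df) to file;
# the raw string keeps the backslashes in the Windows path literal
result.to_csv(r'path\filename.csv')  # e.g. r'C:\Users\you\Documents\filename.csv'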

Output:

Category  A  B  C  D
Id                  
1         1  0  0  0
2         0  1  0  0
3         0  0  1  0
4         0  1  0  0
5         0  0  1  0
6         0  0  0  1
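In the output above, Id is the index and "Category" is the name of the column axis. If a flat header row like the one in the question is wanted, something along these lines might work. Note that this sketch swaps in pd.crosstab for the counting step instead of pivot_table, and writes to an in-memory buffer rather than a real file; the sample frame is also an assumption for illustration.

```python
import io

import pandas as pd

# Hand-built stand-in for the loaded CSV data (assumption for illustration).
df = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6], "Category": list("ABCBCD")})

# Count occurrences of each Category per Id (swapped-in alternative to pivot_table).
table = pd.crosstab(df["Id"], df["Category"])

# Turn Id back into a regular column and drop the "Category" axis label.
flat = table.reset_index()
flat.columns.name = None

# Write comma-separated text without the row index; a file path would work the same way.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
print("\n".join(csv_lines))
```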
