Convert a column's data to enumerated dictionary key-values
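Is there a better way (in the sense of minimal code) to do the following: transform a column into enumerated numeric values? The steps should go like this:
- Get the group of distinct items in the column
- Make an enumerated dictionary of key-value pairs
- Swap the keys with the values
- Use the resulting key-value lookup in a new column instead of the original data.
This is what I came up with today. I am wondering if someone can show a classic way to do this, so that I can avoid writing the function get_color_val: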
import pandas as pd

cars = pd.DataFrame({
    "car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
    "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"],
})

# Enumerate the distinct colors, then invert the mapping to color -> number
color_dict = dict(enumerate(set(cars["color"])))
color_dict = dict((y, x) for x, y in color_dict.items())

def get_color_val(row):
    # Look up the number assigned to this row's color
    my_key = row["color"]
    my_value = color_dict.get(my_key)
    return my_value

cars["color_val"] = cars.apply(get_color_val, axis=1)
cars = cars.drop(columns="color")
print(cars)
Result

Before------------
  car_name  color
0      BMW    RED
1      BMW    RED
2   ACCURA    RED
3   ACCURA    RED
4   ACCURA  GREEN
5      BMW  BLACK
6      BMW   BLUE
7      BMW   BLUE

After------------
  car_name  color_val
0      BMW          3
1      BMW          3
2   ACCURA          3
3   ACCURA          3
4   ACCURA          2
5      BMW          1
6      BMW          0
7      BMW          0
In this case I would use pd.factorize():
In [8]: cars['color_val'] = pd.factorize(cars.color)[0]

In [9]: cars
Out[9]:
  car_name  color  color_val
0      BMW    RED          0
1      BMW    RED          0
2   ACCURA    RED          0
3   ACCURA    RED          0
4   ACCURA  GREEN          1
5      BMW  BLACK          2
6      BMW   BLUE          3
7      BMW   BLUE          3
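
If the dictionary itself is also needed, pd.factorize() returns the unique values along with the codes, so the lookup can be rebuilt from them. The following is only a sketch, assuming the same cars frame as in the question. It reuses the names color_dict and color_val from the question, while codes and uniques are names chosen here for illustration. Codes are numbered in order of first appearance, so the numbers differ from the set-based version above.

```python
import pandas as pd

cars = pd.DataFrame({
    "car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
    "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"],
})

# factorize returns a pair: integer codes per row, and the unique values
# in order of first appearance (uniques[code] gives back the original value)
codes, uniques = pd.factorize(cars["color"])

# Rebuild the color -> number dictionary from the unique values
color_dict = {color: code for code, color in enumerate(uniques)}

# Store the codes and drop the original column, as in the question
cars["color_val"] = codes
cars = cars.drop(columns="color")

print(color_dict)
print(cars)
```

Keeping uniques around also makes it possible to map the numbers back to color names later.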