Using .loc to categorize data and iterate over a dataframe using python to a CSV file

Question

I have a CSV file containing a list of items that I want to classify by a different set of categories instead of by color. The CSV file is:

     ITEM  PRICE  QUANTITY CATEGORY
0  Carrot      5        10   Orange
1  Potato      3         5    brown
2   Beans      2         6      red
3   Pizza      2         7      red
4   Salad      3         1    green
5  Burger      1         4    brown
6  Carrot      0         0   orange
7  Carrot      0         0   orange
8  Potato      0         0    brown
9   Beans      0         0      red

The code I wrote is:

import pandas as pd
path = 'C:\Users\[username]\.spyder-py3\TestFileCSV.csv'

df = pd.read_csv(path)

if df.loc[index, 'ITEM'] == 'Carrot':
    df.loc[index, 'CATEGORY'] == 'VEGETABLE'
elif df.loc[index, 'ITEM'] == 'Beans':
    df.loc[index, 'CATEGORY'] == 'Legumes'
else:
    df.loc[index, 'CATEGORY'] == 'Check'
df.to_csv('TestFileCSV1.csv')

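The result is that I get a new file, TestFileCSV1, containing exactly the same dataframe as the original TestFileCSV. None of the categories are VEGETABLE or Legumes.

Thanks!

Edit: To clarify, I want to iterate over the whole list rather than assign categories one at a time. The actual dataset I need to categorize has thousands of items. Thanks again!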

Answer 1

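First, you are using == (comparison) where you need = (assignment), so nothing is ever written back to the dataframe.

You can use this to assign a category based on the item value: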

df.loc[df['ITEM'] == 'Carrot', 'CATEGORY'] = 'VEGETABLE'
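To cover all three cases from the question, the same pattern can be repeated once per category. The following is a minimal sketch, not a definitive solution: the small in-memory DataFrame is a stand-in for the question's CSV, and setting the fallback value first and then overwriting the matching rows is one possible ordering.

```python
import pandas as pd

# Stand-in for pd.read_csv(path); rows taken from the question's sample data.
df = pd.DataFrame({
    'ITEM': ['Carrot', 'Potato', 'Beans', 'Pizza'],
    'PRICE': [5, 3, 2, 2],
    'QUANTITY': [10, 5, 6, 7],
    'CATEGORY': ['Orange', 'brown', 'red', 'red'],
})

# Give every row the fallback label first ...
df['CATEGORY'] = 'Check'
# ... then overwrite only the rows whose ITEM matches (note the single '=').
df.loc[df['ITEM'] == 'Carrot', 'CATEGORY'] = 'VEGETABLE'
df.loc[df['ITEM'] == 'Beans', 'CATEGORY'] = 'Legumes'

# With no path, to_csv returns the CSV text; pass a filename such as
# 'TestFileCSV1.csv' to write a file instead. index=False omits the row index.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Each .loc assignment works on the whole column at once, so no explicit loop over the rows is needed, even with thousands of items.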

Answer 2

Try this:

df['CATEGORY']= (df['ITEM'].apply(lambda x: 'VEGETABLE' if x=='Carrot' 
                                  else( 'Legumes' if x=='Beans' else 'Check')))

df

index	ITEM	PRICE	QUANTITY	CATEGORY
0	Carrot	5	10	VEGETABLE
1	Potato	3	5	Check
2	Beans	2	6	Legumes
3	Pizza	2	7	Check
4	Salad	3	1	Check
5	Burger	1	4	Check
6	Carrot	0	0	VEGETABLE
7	Carrot	0	0	VEGETABLE
8	Potato	0	0	Check
9	Beans	0	0	Legumes
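Nested conditional expressions get hard to read once there are more than two or three categories. One alternative, which is a different technique from the apply call above, is numpy.select: it takes a list of boolean conditions and a matching list of labels. The sketch below assumes a small stand-in DataFrame rather than the real CSV.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's data; only the ITEM column matters here.
df = pd.DataFrame({'ITEM': ['Carrot', 'Potato', 'Beans', 'Pizza']})

# One boolean condition per category, in the same order as the labels.
conditions = [df['ITEM'] == 'Carrot', df['ITEM'] == 'Beans']
labels = ['VEGETABLE', 'Legumes']

# Rows matching no condition receive the default label.
df['CATEGORY'] = np.select(conditions, labels, default='Check')
print(df)
```

Adding a category then means adding one condition and one label, rather than nesting another else branch.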

Answer 3

I think this way is cleaner:

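# Map each item to its category; items not in the dict fall back to "Check".
# The function is named map_category so it does not shadow the built-in map.
def map_category(item):
  mapping = {'Carrot': "Vegetable", 'Beans': "Legumes"}
  return mapping.get(item, "Check")


# apply with axis=1 calls the function once per row, passing that row's ITEM value
df['CATEGORY'] = df.apply(lambda x: map_category(item=x['ITEM']), axis=1)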
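The same dictionary lookup can also be done without a helper function by passing the dict straight to Series.map. This is a sketch under the assumption that every item missing from the dict should get the "Check" label; the DataFrame here is a small stand-in for the real data.

```python
import pandas as pd

# Stand-in for the question's data; only the ITEM column matters here.
df = pd.DataFrame({'ITEM': ['Carrot', 'Potato', 'Beans', 'Salad']})

# Lookup table: add one entry per known item.
mapping = {'Carrot': 'Vegetable', 'Beans': 'Legumes'}

# Series.map looks up each ITEM in the dict; items without an entry become NaN,
# which fillna then replaces with the fallback label.
df['CATEGORY'] = df['ITEM'].map(mapping).fillna('Check')
print(df)
```

Because the lookup runs on the whole column, growing the dataset only means growing the dict, and the result can then be written out with df.to_csv as in the question.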
