How can I filter a CSV file based on its columns in Python?

Question

I have a CSV file with more than 5,000,000 rows of data that looks like this (except that it is in Persian):

Contract Code,Contract Type,State,City,Property Type,Region,Usage Type,Area,Percentage,Price,Price per m2,Age,Frame Type,Contract Date,Postal Code
765720,Mobayee,East Azar,Kish,Apartment,,Residential,96,100,570000,5937.5,36,Metal,13890107,5169614658
766134,Mobayee,East Azar,Qeshm,Apartment,,Residential,144.5,100,1070000,7404.84,5,Concrete,13890108,5166884645
766140,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,100,1050000,7266.44,5,Concrete,13890108,5166884645
766146,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,100,700000,4844.29,5,Concrete,13890108,5166884645
766147,Mobayee,East Azar,Kish,Apartment,,Residential,144.5,100,1625000,11245.67,5,Concrete,13890108,5166884645
770822,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,50,500000,1730.1,5,Concrete,13890114,5166884645

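I want to write code that treats the first row as the header, then extracts the rows for two specific cities (Kish and Qeshm) and saves them to a new CSV file. Something like this: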

Contract Code,Contract Type,State,City,Property Type,Region,Usage Type,Area,Percentage,Price,Price per m2,Age,Frame Type,Contract Date,Postal Code
765720,Mobayee,East Azar,Kish,Apartment,,Residential,96,100,570000,5937.5,36,Metal,13890107,5169614658
766134,Mobayee,East Azar,Qeshm,Apartment,,Residential,144.5,100,1070000,7404.84,5,Concrete,13890108,5166884645
766147,Mobayee,East Azar,Kish,Apartment,,Residential,144.5,100,1625000,11245.67,5,Concrete,13890108,5166884645

It is worth mentioning that I am new to Python. I have written the following block to define the headers, but this is as far as I have gotten.

import pandas as pd

path = '/Users/Desktop/sample.csv'

df = pd.read_csv(path, header=[0])
df.head()

Answer 1

You don't need header=... because the default is to treat the first row as the header, so

df = pd.read_csv(path)

Then, keep only the rows that match the condition:

df2 = df[df['City'].isin(['Kish', 'Qeshm'])]
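To see what this filter does, here is a small self-contained sketch on a few rows from the question's sample. It assumes the data is read from an in-memory string via io.StringIO rather than from a file, and the name sample and the reduced set of columns are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of three columns from the question's sample.
sample = """Contract Code,City,Price
765720,Kish,570000
766134,Qeshm,1070000
766140,Tabriz,1050000
766147,Kish,1625000
"""

# The first row becomes the header by default.
df = pd.read_csv(io.StringIO(sample))

# isin() returns a boolean Series: True where City is one of the listed values.
mask = df['City'].isin(['Kish', 'Qeshm'])

# Indexing with the boolean Series keeps only the rows marked True.
df2 = df[mask]

print(df2)
```

The Tabriz row is dropped and the three Kish/Qeshm rows remain, with their original row labels.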

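You can save it with the following (index=False keeps pandas from writing its row index as an extra first column, so the output matches the layout you want):

df2.to_csv(another_path, index=False)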
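Since the file has more than 5,000,000 rows, loading it in one go may use a lot of memory. As one possible alternative, the sketch below reads the file in pieces using the chunksize parameter of pd.read_csv and appends the matching rows to the output as it goes. The function name filter_csv_in_chunks and the default chunk size of 100,000 rows are assumptions chosen for illustration, not part of the original answer.

```python
import pandas as pd

def filter_csv_in_chunks(src, dst, cities, chunksize=100_000):
    """Copy rows whose City is in `cities` from src to dst, one chunk at a time."""
    first = True
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
    for chunk in pd.read_csv(src, chunksize=chunksize):
        kept = chunk[chunk['City'].isin(cities)]
        # Write the header once, then append later chunks without it.
        kept.to_csv(dst, mode='w' if first else 'a', header=first, index=False)
        first = False
```

With the names used above, it could be called as filter_csv_in_chunks(path, another_path, ['Kish', 'Qeshm']).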

