In Python, I want to loop through multiple CSV files and remove specific rows
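I have 10 csv files, and in each file I want to remove the rows whose UID column contains one of the following numbers: 1002, 1007, 1008.
Please note that all 10 csv files have the same column names.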
# one of the csv files looks like this
import pandas as pd
df = {
'UID':[1001,1002,1003,1004,1005,1006,1007,1008,1009,1010],
'Name':['Ray','James','Juelz','Cam','Jim','Jones','Bleek','Shawn','Beanie','Amil'],
'Income':[100.22,199.10, 191.13,199.99,230.6,124.2,122.9,128.7,188.12,111.3],
'Age':[24,32,27,54,23,41,44,29,30,68]
}
df = pd.DataFrame(df)
df = df[['UID','Name','Age','Income']]
df
Attempt
# I know I need a for loop or glob to iterate through the folder and filter out the desired UIDs. My dilemma is I don't know how to incorporate steps II & III in I
# Step I: looping through the .csv files in the folder
import os
directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        print(os.path.join(directory, filename))
# Step II: UIDs to be removed - 1002,1007,1008
df2 = df[~(df.UID.isin([1002,1007,1008]))]
# Step III: Export the new dataframes as .csv files (10 csv files)
df2.to_csv(r'mypath\data.csv')
Thanks
You don't need a program for this, and certainly not pandas. If you have Linux tools:
grep -v -e 1002, -e 1007, -e 1008, incoming.csv > fixed.csv
Windows:
findstr /v /c:1002, /c:1007, /c:1008, incoming.csv > fixed.csv
So, in a batch file:
cd C:\Users\admin
mkdir fixed
for %%i in (*.csv) do findstr /v /c:1002, /c:1007, /c:1008, %%i > fixed\%%i
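One caveat with the text-matching commands above: a pattern such as `1002,` matches anywhere on a line, so a row could be dropped because of a value in another column or a longer UID ending in those digits. If that matters, a minimal sketch using only Python's standard `csv` module can compare the UID field itself. The function names and the `fixed` output subfolder are assumptions chosen for illustration, and the sketch assumes every file has a header row with a `UID` column.

```python
import csv
from pathlib import Path

# UIDs are compared as strings because the csv module yields text fields.
UIDS_TO_DROP = {"1002", "1007", "1008"}


def filter_csv(src, dst, uid_column="UID", drop=UIDS_TO_DROP):
    """Copy src to dst, skipping rows whose uid_column value is in drop."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        kept = 0
        for row in reader:
            if row[uid_column].strip() not in drop:
                writer.writerow(row)
                kept += 1
    return kept


def filter_folder(folder, out_name="fixed"):
    """Filter every .csv directly inside folder into the subfolder out_name."""
    folder = Path(folder)
    out_dir = folder / out_name
    out_dir.mkdir(exist_ok=True)
    # glob("*.csv") is not recursive, so files already in out_dir are not re-read.
    return {p.name: filter_csv(p, out_dir / p.name)
            for p in sorted(folder.glob("*.csv"))}
```

Calling `filter_folder(r'C:\Users\admin')` would return a dict mapping each file name to the number of rows kept, with the originals left untouched.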
Try this:
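import os
import pandas as pd

directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        filepath = os.path.join(directory, filename)
        df = pd.read_csv(filepath)
        # keep only the rows whose UID is not in the removal list
        df2 = df[~df['UID'].isin([1002,1007,1008])]
        # build the output name: <original name>_mod.csv
        stem, ext = filepath.rsplit('.', maxsplit=1)
        out_path = f'{stem}_mod.{ext}'
        # index=False avoids writing the DataFrame index as an extra column
        df2.to_csv(out_path, index=False)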
Note: @TimRoberts is right, pandas is a bit overkill here, but if you want to learn it, this is one possible solution.
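Since the question also mentions glob, here is a hedged variant of the same idea wrapped in a function and using `pathlib`. The function name `drop_uids` and the `filtered` output subfolder are assumptions for illustration; writing into a separate folder means a second run does not pick up the previously written files.

```python
from pathlib import Path

import pandas as pd


def drop_uids(directory, uids, out_subdir="filtered"):
    """Write a filtered copy of every .csv in directory and return the new paths."""
    directory = Path(directory)
    out_dir = directory / out_subdir
    out_dir.mkdir(exist_ok=True)
    written = []
    for path in sorted(directory.glob("*.csv")):
        df = pd.read_csv(path)
        # Keep only the rows whose UID is not in the removal list.
        kept = df[~df["UID"].isin(uids)]
        target = out_dir / path.name
        kept.to_csv(target, index=False)
        written.append(target)
    return written
```

For the folder in the question this would be called as `drop_uids(r'C:\Users\admin', [1002, 1007, 1008])`.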
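Sorry for my bad English
Step two:
If I understand correctly, you want to remove the values [1002,1007,1008] from the list [1001,1002,1003,1004,1005,1006,1007,1008,1009,1010] in the df dictionary. That's simple: you iterate over the dictionary's keys like this: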
values = [1002,1007,1008]
for key in df.keys():
Then check whether any of the values to remove appear in that key's values:
values = [1002,1007,1008]
for key in df.keys():
    for value in values:
        # this works on the plain dict of lists, before pd.DataFrame(df) is called
        if value in df[key]:
            df[key].remove(value)
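Note that the loop above removes a value only from the list that contains it, so with the sample data just the 'UID' list shrinks while 'Name', 'Age' and 'Income' keep all ten entries. A minimal sketch that drops the whole row from every column instead might look like this; `drop_rows` is a hypothetical helper name, and the shortened sample data is only for illustration.

```python
def drop_rows(columns, key, values):
    """Return a new dict of columns without the rows whose `key` entry is in values."""
    unwanted = set(values)
    # Positions of the rows to keep, decided by the key column only.
    keep = [i for i, v in enumerate(columns[key]) if v not in unwanted]
    # Apply the same positions to every column so the rows stay aligned.
    return {name: [col[i] for i in keep] for name, col in columns.items()}


data = {
    'UID': [1001, 1002, 1003, 1007, 1008],
    'Name': ['Ray', 'James', 'Juelz', 'Bleek', 'Shawn'],
}
filtered = drop_rows(data, 'UID', [1002, 1007, 1008])
print(filtered)  # {'UID': [1001, 1003], 'Name': ['Ray', 'Juelz']}
```

The original dict is left unchanged, which makes it easy to compare the data before and after filtering.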
Step three:
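import csv
# newline='' prevents blank lines between rows on Windows
with open('my_file.csv', mode='w', newline='') as file:
    file_writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    file_writer.writerow(df.keys())            # header row with the column names
    file_writer.writerows(zip(*df.values()))   # one line per record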