Drop rows from a dataframe with a non-numeric index
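I've been using pandas to do some interesting filtering on CSV files, but I've run into a roadblock. I'm trying to check my index column for garbled (non-integer) text and drop those rows. I've tried removing them with a condition at import time, and I've tried iterating over them, without success. Here is an example: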
df = pd.read_csv(file, encoding='cp1252').set_index("numbers")
results = df[df["columnA"].str.contains("search_data") & ~df["columnB"].isin(search_list)]
# I need to add to the statement above a check on the "numbers" column, which I have set
# as the index, to catch some expected garbled text and filter it out. Because it is
# an integer, I can't use str.contains, isdigit or isalnum. I've tried len(df["columns"]) < 20,
# df.index < 20, and a few other options as well.

# After bringing it in, I've also tried iterating through it:
for index, row in results.iterrows():
    if not isinstance(row["numbers"], int):
        print(str(row["numbers"]))
        # append whole row to new dataframe
# This also didn't work
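Two things likely work against the loop above: after set_index("numbers") the value lives in `index`, not in `row["numbers"]`; and once the column contains any text, pandas reads every entry in it as a string, so `isinstance(..., int)` is False for the good rows too. That second point also means string methods do work on the index in this situation. The sketch below adds a vectorized index check to the original filter instead of looping. The CSV contents and `search_list` values are hypothetical stand-ins, and the `[0-9]+` pattern assumes valid IDs are plain non-negative integers.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file; the second row
# mimics the garbled text from the question.
csv_text = """numbers,columnA,columnB
329381432,search_data here,x
äu$ÒÔ”5$ò,search_data here,x
123456789,search_data here,excluded
987654321,other,x
"""

search_list = ["excluded"]  # hypothetical stand-in for the question's list

df = pd.read_csv(io.StringIO(csv_text)).set_index("numbers")

# The mixed column is read as strings, so test each index value for "digits only".
# na=False treats any missing index values as non-numeric.
is_numeric_index = df.index.astype(str).str.fullmatch(r"[0-9]+", na=False)

results = df[
    df["columnA"].str.contains("search_data")
    & ~df["columnB"].isin(search_list)
    & is_numeric_index
]
print(results)
```

Only the 329381432 row survives: the garbled row fails the index check, and the other two fail the original column conditions. Note that the surviving index values are still strings here.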
Any ideas on what I can do?
Example data in the "numbers" column: 329381432
Example garbled text in the "numbers" column that I am trying to keep from importing: äu$ÒÔ”5$ò"Â$”äu$ÒÔ”5$ò
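As a side note, I had to change the encoding in the pd.read_csv call so that I can still read all the good data in the file when some non-UTF-8 data is present; otherwise it throws an error on import.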
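An alternative to switching to cp1252, assuming pandas 1.3 or newer, is the `encoding_errors` parameter of read_csv: with `"replace"`, undecodable bytes become U+FFFD instead of raising. This is a sketch with hypothetical byte data, not the asker's setup. One trade-off: genuine cp1252 characters in good rows would also be replaced, which cp1252 decoding would preserve.

```python
import io

import pandas as pd

# Hypothetical raw bytes: the second data row contains bytes (0xFF, 0xFE) that
# are never valid UTF-8, similar to the garbled rows described above.
raw = b"numbers,columnA\n329381432,a\n\xff\xfe5$,b\n"

# encoding_errors="replace" substitutes U+FFFD for undecodable bytes instead
# of raising, so the file still loads.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8", encoding_errors="replace")

# The replaced text is still non-numeric, so it can be coerced to NaN and dropped.
df["numbers"] = pd.to_numeric(df["numbers"], errors="coerce")
df = df.dropna(subset=["numbers"])
print(df)
```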
You can use pd.to_numeric to convert the numbers column to numeric values. All non-numeric entries will be coerced to NaN, and you can then drop those rows.
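import pandas as pd
df = pd.read_csv(file, encoding='cp1252')
# Non-numeric entries (e.g. garbled text) become NaN
df['numbers'] = pd.to_numeric(df['numbers'], errors='coerce')
# Drop rows whose "numbers" value could not be parsed, then index on the column
df = df.dropna(subset=['numbers']).set_index('numbers')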
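Below is a self-contained sketch of the same approach, using a hypothetical in-memory CSV in place of `file`. Because the NaNs force the column to float64, it also casts back to int64 after dropping them so the index holds integers rather than values like 329381432.0. This cast is an addition to the answer, not part of it.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for `file`; the middle row holds
# garbled text like the example in the question.
csv_text = """numbers,columnA
329381432,a
äu$ÒÔ”5$ò,b
123456789,c
"""

df = pd.read_csv(io.StringIO(csv_text))

# Garbled entries cannot be parsed as numbers, so errors="coerce" turns them into NaN.
df["numbers"] = pd.to_numeric(df["numbers"], errors="coerce")

# Drop the unparseable rows, cast the float64 column back to int64, then index on it.
df = (
    df.dropna(subset=["numbers"])
    .astype({"numbers": "int64"})
    .set_index("numbers")
)
print(df)
```

The round trip through float64 is exact only up to 2**53; IDs larger than that could be altered, in which case filtering on the string form first is safer.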