Finding the row number for the header row in a CSV file / Pandas Dataframe
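I am trying to get the index or row number of the row that contains the header in my CSV file.
The problem is that the header row can move up or down depending on the output of the report from our system (I have no control over changing it).
Code: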
ht = pd.read_csv(file.csv)
test = ht.get_loc('Code') #Code being header im using to locate the header row
csv1 = read_csv(file.csv, header=test)
df1 = df1.append(csv1) #Appending as have many files
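If I were to print test, I would expect a number around 4 or 5, and that is the value I pass into the second "read_csv" call.
The error I get is that it expects 1 header column, but I have 26 columns. I am just trying to use the first header string to get the row number.
Thanks :-)
Edit:
CSV format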
This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18
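You will see that the "the deadlines" lines all look alike; there can be 3 or 5 of them depending on the code ID, so the header row can move up or down.
I also did not write out all 26 column headers; not sure if that matters.
Desired DF format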
index | code | type | arrived_date | est_del_date
1 | a/wrwgwr12/001 | kids | 12-dec-18 | 17-dec-18
2 | aa/gjghgj35/030 | Pet | 15-dec-18 | 18-dec-18
Hope this makes sense..
Thanks,
You can use the csv module to find the first row that contains the delimiter, then pass that row's index as the skiprows argument to pd.read_csv:
from io import StringIO
import csv
import pandas as pd
x = """This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18"""
# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
    reader = csv.reader(fin)
    # index of the first row that splits into more than one field
    idx = next(idx for idx, row in enumerate(reader) if len(row) > 1)  # 4
# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), skiprows=idx)
print(df)
              code  type arrived_date est_del_date
0   a/wrwgwr12/001  kids    12-dec-18    17-dec-18
1  aa/gjghgj35/030   pet    15-dec-18    18-dec-18
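One caveat with the len(row) > 1 test: it would also match a preamble line that happens to contain a comma. Since the question mentions locating the header by its first column name, a variant could match on that name instead. The following is only a sketch under assumptions: the helper names find_header_row and read_report are made up for illustration, the first header field is assumed to be "code" as in the sample, and sample_b is invented data with a shorter preamble. It also uses pd.concat to combine the files, because DataFrame.append was removed in pandas 2.0.

```python
from io import StringIO

import pandas as pd


def find_header_row(lines, first_column="code"):
    """Return the 0-based index of the first line whose first field equals first_column.

    Hypothetical helper; assumes a comma delimiter and an unquoted header name.
    """
    for idx, line in enumerate(lines):
        first_field = line.split(",", 1)[0].strip().lower()
        if first_field == first_column.lower():
            return idx
    raise ValueError(f"no header row starting with {first_column!r}")


def read_report(text):
    """Parse one report, skipping however many preamble lines it has."""
    idx = find_header_row(StringIO(text))
    return pd.read_csv(StringIO(text), skiprows=idx)


# Sample from the question: four preamble lines before the header.
sample_a = """This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18"""

# Invented sample: shorter preamble, and a comma in the first preamble line.
sample_b = """This file contains the data around the volume of items, by code
the deadlines for delivery of items a - z is 5 days
code,type,arrived_date,est_del_date
aaa/xyz/002,pet,16-dec-18,17-dec-18"""

# Combine all reports into one frame with a fresh 0..n-1 index.
combined = pd.concat(
    [read_report(sample_a), read_report(sample_b)],
    ignore_index=True,
)
print(combined)
```

For real files, the StringIO(text) calls would be replaced by opening the file to scan it and then passing the path to pd.read_csv, as in the comments of the answer above.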