当列包含重复的 headers 时重新格式化 pandas table
Reformatting pandas table when column contains repeated headers
我有下面的 pandas DataFrame,我想对其进行排序,使得 ["File Name"、"File Start Time" 等] 是列 headers。我可以想象 运行 在行中循环寻找字符串,但也许有更简单的选择?
import pandas as pd
data = pd.read_csv(file_path + 'chb01-summary.txt',skiprows = 28, header=None, delimiter = ": ")
文件来源https://www.physionet.org/pn6/chbmit/chb01/chb01-summary.txt
您可以使用 read_csv
and reshape by unstack
:
url = 'https://www.physionet.org/pn6/chbmit/chb01/chb01-summary.txt'
df = pd.read_csv(url, skiprows=28, sep=':\s+', names=['a','b'], engine='python')
print (df.head())
a b
0 File Name chb01_01.edf
1 File Start Time 11:42:54
2 File End Time 12:42:54
3 Number of Seizures in File 0
4 File Name chb01_02.edf
df = df.set_index([df['a'].eq('File Name').cumsum(), 'a'])['b']
.unstack()
.reset_index(drop=True)
print (df.head())
a File End Time File Name File Start Time Number of Seizures in File \
0 12:42:54 chb01_01.edf 11:42:54 0
1 13:42:57 chb01_02.edf 12:42:57 0
2 14:43:04 chb01_03.edf 13:43:04 1
3 15:43:12 chb01_04.edf 14:43:12 1
4 16:43:19 chb01_05.edf 15:43:19 0
a Seizure End Time Seizure Start Time
0 None None
1 None None
2 3036 seconds 2996 seconds
3 1494 seconds 1467 seconds
4 None None
我有下面的 pandas DataFrame,我想对其进行排序,使得 ["File Name"、"File Start Time" 等] 是列 headers。我可以想象 运行 在行中循环寻找字符串,但也许有更简单的选择?
import pandas as pd
data = pd.read_csv(file_path + 'chb01-summary.txt',skiprows = 28, header=None, delimiter = ": ")
文件来源https://www.physionet.org/pn6/chbmit/chb01/chb01-summary.txt
您可以使用 read_csv
and reshape by unstack
:
url = 'https://www.physionet.org/pn6/chbmit/chb01/chb01-summary.txt'
df = pd.read_csv(url, skiprows=28, sep=':\s+', names=['a','b'], engine='python')
print (df.head())
a b
0 File Name chb01_01.edf
1 File Start Time 11:42:54
2 File End Time 12:42:54
3 Number of Seizures in File 0
4 File Name chb01_02.edf
df = df.set_index([df['a'].eq('File Name').cumsum(), 'a'])['b']
.unstack()
.reset_index(drop=True)
print (df.head())
a File End Time File Name File Start Time Number of Seizures in File \
0 12:42:54 chb01_01.edf 11:42:54 0
1 13:42:57 chb01_02.edf 12:42:57 0
2 14:43:04 chb01_03.edf 13:43:04 1
3 15:43:12 chb01_04.edf 14:43:12 1
4 16:43:19 chb01_05.edf 15:43:19 0
a Seizure End Time Seizure Start Time
0 None None
1 None None
2 3036 seconds 2996 seconds
3 1494 seconds 1467 seconds
4 None None