Drop All Rows After SECOND Occurrence of Column Value
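I want to drop all rows after the second occurrence of a column value in a DataFrame I built from a .txt file. In this case, the value is the separator "---".
The DataFrame looks like this: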
15 Leading Causes of Death 15 Code Deaths Population Crude Rate Crude Rate Lower 95% Confidence Interval Crude Rate Upper 95% Confidence Interval
#Accidents (unintentional injuries) (V01-X59,Y85-Y86) GR113-112 21 152430 13.8 8.5 21.1
#Intentional self-harm (suicide) (*U03,X60-X84,Y87.0) GR113-124 15 152430 Unreliable 5.5 16.2
---
Dataset: Underlying Cause of Death, 1999-2019
Query Parameters:
States: Marin County, CA (06041)
Ten-Year Age Groups: 25-34 years
Year/Month: 1999; 2000; 2001; 2002; 2003
Group By: 15 Leading Causes of Death
Show Totals: Disabled
Show Zero Values: Disabled
Show Suppressed: Disabled
Calculate Rates Per: 100,000
Rate Options: Default intercensal populations for years 2001-2009 (except Infant Age Groups)
---
Help: See http://wonder.cdc.gov/wonder/help/ucd.html for more information.
---
Query Date: Sep 23, 2021 6:51:59 PM
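I've seen plenty of solutions for doing this after the first occurrence of a column value (or the first NaN, etc.), but none for the second or nth occurrence...
Here is the simple code I currently use to read the file: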
import pandas as pd
dl = pd.read_csv('Underlying Cause of Death, 1999-2019(3).txt', sep = '\t')
dl.to_csv('test.csv', index = False)
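If you want to try the answers without the actual CDC WONDER export, here is a rough in-memory stand-in. The column names and the shortened footer are hypothetical. It assumes the file is tab-separated, as the `sep='\t'` call implies, so footer lines without tabs land in the first column and the remaining columns get NaN:

```python
import io

import pandas as pd

# Hypothetical, abbreviated stand-in for the export: tab-separated data rows
# followed by footer notes separated by '---' lines.
SAMPLE = (
    "Cause\tCode\tDeaths\n"
    "#Accidents (unintentional injuries)\tGR113-112\t21\n"
    "#Intentional self-harm (suicide)\tGR113-124\t15\n"
    "---\n"
    "Dataset: Underlying Cause of Death, 1999-2019\n"
    "Query Parameters:\n"
    "---\n"
    "Help: See http://wonder.cdc.gov/wonder/help/ucd.html for more information.\n"
    "---\n"
    "Query Date: Sep 23, 2021 6:51:59 PM\n"
)

# Same read_csv call as for the real file, just reading from a string buffer.
df = pd.read_csv(io.StringIO(SAMPLE), sep="\t")
print(df)
```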
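Find the rows that start with "---" and take a cumulative sum, then get the index of the first row where that sum equals 2 (the second separator row) and slice your DataFrame up to that index.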
>>> df.iloc[:df.iloc[:, 0].str.startswith('---').cumsum().eq(2).idxmax()]
0 #Accidents (unintentional injuries) (V01-X59,Y... GR113-112 21.0 152430.0 13.8 8.5 21.1
1 #Intentional self-harm (suicide) (*U03,X60-X84... GR113-124 15.0 152430.0 Unreliable 5.5 16.2
2 --- NaN NaN NaN NaN NaN NaN
3 Dataset: Underlying Cause of Death, 1999-2019 NaN NaN NaN NaN NaN NaN
4 Query Parameters: NaN NaN NaN NaN NaN NaN
5 States: Marin County, CA (06041) NaN NaN NaN NaN NaN NaN
6 Ten-Year Age Groups: 25-34 years NaN NaN NaN NaN NaN NaN
7 Year/Month: 1999 2000 2001.0 2002.0 2003 NaN NaN
8 Group By: 15 Leading Causes of Death NaN NaN NaN NaN NaN NaN
9 Show Totals: Disabled NaN NaN NaN NaN NaN NaN
10 Show Zero Values: Disabled NaN NaN NaN NaN NaN NaN
11 Show Suppressed: Disabled NaN NaN NaN NaN NaN NaN
12 Calculate Rates Per: 100,000 NaN NaN NaN NaN NaN NaN
13 Rate Options: Default intercensal populations ... NaN NaN NaN NaN NaN NaN
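A sketch of the same cumulative-sum idea generalized to the nth occurrence. The function name `rows_before_nth_marker` and the toy frame are hypothetical, and it assumes the marker lives in the first column, as in the answer above:

```python
import pandas as pd


def rows_before_nth_marker(df: pd.DataFrame, marker: str = "---", n: int = 2) -> pd.DataFrame:
    """Keep the rows above the n-th row whose first column starts with `marker`.

    If the marker occurs fewer than n times, the whole frame is returned.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # astype(str) turns NaN into "nan", so empty cells never match the marker.
    hits = df.iloc[:, 0].astype(str).str.startswith(marker)
    if hits.sum() < n:
        return df
    # Running count of markers; the first position where it equals n is the n-th marker row.
    stop = hits.cumsum().eq(n).to_numpy().argmax()
    # Slice by position, so this works with any index, not only 0..len-1.
    return df.iloc[:stop]


# Hypothetical toy frame: data rows, then '---'-separated notes, with a non-default index.
demo = pd.DataFrame(
    {"first": ["row A", "row B", "---", "note 1", "---", "note 2", "---", "note 3"],
     "value": range(8)},
    index=list("abcdefgh"),
)
print(rows_before_nth_marker(demo, n=2))
```

Two small differences from the one-liner, both design choices rather than fixes for this particular data: `idxmax()` returns an index label, which `iloc` then treats as a position, so the original only lines up with a default 0..n-1 index. Also, when the marker appears fewer than n times, `eq(2).idxmax()` falls back to the first label and the slice comes back empty, whereas this version returns the frame unchanged.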