在列值第二次出现后删除所有行

Question

我想将我已转换为数据框的 .txt 文件中的所有数据放在列值的第二个实例之后。在这种情况下，分隔符“---”。

Dataframe构造如下：

15 Leading Causes of Death  15                          Code        Deaths      Population          Crude Rate  Crude Rate Lower 95% Confidence Interval    Crude Rate Upper 95% Confidence Interval
#Accidents (unintentional injuries) (V01-X59,Y85-Y86)   GR113-112   21          152430              13.8        8.5                                         21.1
#Intentional self-harm (suicide) (*U03,X60-X84,Y87.0)   GR113-124   15          152430              Unreliable  5.5                                         16.2
---                     
Dataset: Underlying Cause of Death, 1999-2019                       
Query Parameters:                       
States: Marin County, CA (06041)                        
Ten-Year Age Groups: 25-34 years                        
Year/Month: 1999; 2000; 2001; 2002; 2003                        
Group By: 15 Leading Causes of Death                        
Show Totals: Disabled                       
Show Zero Values: Disabled                      
Show Suppressed: Disabled                       
Calculate Rates Per: 100,000                        
Rate Options: Default intercensal populations for years 2001-2009 (except Infant Age Groups)                        
---                     
Help: See http://wonder.cdc.gov/wonder/help/ucd.html for more information.                      
---                     
Query Date: Sep 23, 2021 6:51:59 PM

我已经看到很多关于如何在列值或 NaN 等的第一个实例之后执行此操作的解决方案，但对于第二个或第 n 个没有任何解决方案...

这是我目前在文件中阅读的简单代码。

import pandas as pd

dl = pd.read_csv('Underlying Cause of Death, 1999-2019(3).txt', sep = '\t')
dl.to_csv('test.csv', index = False)

Answer 1

查找以“---”开头的行并应用累加和，然后获取等于 2 的第一行的索引并将您的数据帧切片到该索引。

>>> df.iloc[:df.iloc[:, 0].str.startswith('---').cumsum().eq(2).idxmax()]

0   #Accidents (unintentional injuries) (V01-X59,Y...  GR113-112    21.0    152430.0        13.8                                       8.5                                      21.1
1   #Intentional self-harm (suicide) (*U03,X60-X84...  GR113-124    15.0    152430.0  Unreliable                                       5.5                                      16.2
2                                                 ---        NaN     NaN         NaN         NaN                                       NaN                                       NaN
3       Dataset: Underlying Cause of Death, 1999-2019        NaN     NaN         NaN         NaN                                       NaN                                       NaN
4                                   Query Parameters:        NaN     NaN         NaN         NaN                                       NaN                                       NaN
5                    States: Marin County, CA (06041)        NaN     NaN         NaN         NaN                                       NaN                                       NaN
6                    Ten-Year Age Groups: 25-34 years        NaN     NaN         NaN         NaN                                       NaN                                       NaN
7                                    Year/Month: 1999       2000  2001.0      2002.0        2003                                       NaN                                       NaN
8                Group By: 15 Leading Causes of Death        NaN     NaN         NaN         NaN                                       NaN                                       NaN
9                               Show Totals: Disabled        NaN     NaN         NaN         NaN                                       NaN                                       NaN
10                         Show Zero Values: Disabled        NaN     NaN         NaN         NaN                                       NaN                                       NaN
11                          Show Suppressed: Disabled        NaN     NaN         NaN         NaN                                       NaN                                       NaN
12                       Calculate Rates Per: 100,000        NaN     NaN         NaN         NaN                                       NaN                                       NaN
13  Rate Options: Default intercensal populations ...        NaN     NaN         NaN         NaN                                       NaN                                       NaN

在列值第二次出现后删除所有行

Drop All Rows After SECOND Occurrence of Column Value

python

csv

delimiter

dataframe

pandas