Pandas: how to select a row based on a string in the previous row - should be a simple solution

Question

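I have a csv file. How do I print the row that comes after a row containing a specific string? I need to print every row that contains "ixation", and then the row that follows it.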

Here is my current code:

import pandas as pd

df = pd.read_csv('locationof.csv')
# keep only the columns of interest (the original passed an undefined `data` here)
df = df[['Trial', 'Code', 'Time', 'Duration']]
list1 = ['100_1to3_start', 'fixation', 'Fixation', '66_1to3_start']
contain_values = df[df['Code'].str.contains('|'.join(list1), na=False)]

This is my current output...

2      1.0                fixation_dummy    50637.0   25086.0
4      2.0                fixation_dummy    75889.0   25086.0
7      3.0                fixation_dummy   101141.0   25086.0
9      4.0                fixation_dummy   126393.0   25086.0
13     6.0  100_1to3_start_2034_1_0_1060   151811.0   20268.0
23     9.0  100_1to3_start_2456_4_0_2054   216104.0   24587.0
33    12.0  100_1to3_start_1507_7_0_2446   283885.0   15118.0
43    15.0                      Fixation   332229.0  130081.0
55    17.0   66_1to3_start_2369_2_0_2352   484904.0   23590.0
76    23.0   66_1to3_start_1539_8_0_2518   615150.0   15285.0
82    25.0                      Fixation   654357.0  130081.0
123   35.0                      Fixation   996089.0  130081.0
164   45.0                      Fixation  1343635.0  130081.0
174   46.0   66_1to3_start_1884_1_0_2537  1473882.0   18773.0
197   53.0   66_1to3_start_1541_8_0_2545  1621074.0   15284.0
204   55.0                      Fixation  1662939.0  130080.0
213   56.0  100_1to3_start_2115_1_0_2528  1793186.0   21098.0
223   59.0  100_1to3_start_1892_4_0_2544  1859638.0   18939.0
233   62.0  100_1to3_start_2315_7_0_2537  1918282.0   23259.0

But I want...

2      1.0                fixation_dummy    50637.0   25086.0
4      2.0                fixation_dummy    75889.0   25086.0
7      3.0                fixation_dummy   101141.0   25086.0
9      4.0                fixation_dummy   126393.0   25086.0
13     6.0  100_1to3_start_2034_1_0_1060   151811.0   20268.0
43    15.0                      Fixation   332229.0  130081.0
55    17.0   66_1to3_start_2369_2_0_2352   484904.0   23590.0
82    25.0                      Fixation   654357.0  130081.0
123   35.0                      Fixation   996089.0  130081.0
164   45.0                      Fixation  1343635.0  130081.0
174   46.0   66_1to3_start_1884_1_0_2537  1473882.0   18773.0
204   55.0                      Fixation  1662939.0  130080.0
213   56.0  100_1to3_start_2115_1_0_2528  1793186.0   21098.0

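How can I print only the rows (66_1to3..., 100_1to3...) that come directly after a row containing "ixation"? This code will be run over a series of csv files, and the exact rows I need vary from file to file.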

Answer 1

Try boolean indexing with shift, since we only care about the rows that come right after "ixation":

list1 = ['100_1to3_start', '66_1to3_start']
# shift() leaves NaN in the first row, so na=False keeps the mask strictly boolean
df[df[2].str.contains('|'.join(list1), na=False) & df[2].shift().str.contains('ixation', na=False)]

      0     1                             2          3        4
4    13   6.0  100_1to3_start_2034_1_0_1060   151811.0  20268.0
8    55  17.0   66_1to3_start_2369_2_0_2352   484904.0  23590.0
13  174  46.0   66_1to3_start_1884_1_0_2537  1473882.0  18773.0
16  213  56.0  100_1to3_start_2115_1_0_2528  1793186.0  21098.0

Note that, based on your example, df[2] would be df['Code'].
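For reference, here is a minimal self-contained sketch of the same shift-based filter using the 'Code' column name. The sample frame is a hypothetical subset modelled on the filtered output in the question, and the names `start_codes`, `is_start`, `prev_is_fixation` and `after_fixation` are illustrative only.

```python
import pandas as pd

# Hypothetical sample modelled on the filtered output shown in the question
df = pd.DataFrame({
    "Trial": [4.0, 6.0, 9.0, 15.0, 17.0, 23.0],
    "Code": [
        "fixation_dummy",
        "100_1to3_start_2034_1_0_1060",
        "100_1to3_start_2456_4_0_2054",
        "Fixation",
        "66_1to3_start_2369_2_0_2352",
        "66_1to3_start_1539_8_0_2518",
    ],
})

start_codes = ["100_1to3_start", "66_1to3_start"]

# True where the row itself is one of the start codes
is_start = df["Code"].str.contains("|".join(start_codes), na=False)

# True where the previous row contains "ixation";
# shift() puts NaN in the first row, which na=False turns into False
prev_is_fixation = df["Code"].shift().str.contains("ixation", na=False)

# Keep only start rows that directly follow a fixation row
after_fixation = df[is_start & prev_is_fixation]
print(after_fixation)
```

Only the start rows that sit immediately below a fixation row survive; start rows that follow another start row are dropped.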

Answer 2

To answer this description: "I need to print every row that contains "ixation", and then the row that follows it", the solution is:

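# identify rows with "ixation" (na=False treats missing codes as non-matches)
mask = df['Code'].str.contains('ixation', na=False)

# select them and one row below; fill_value=False keeps the shifted mask boolean
out = df[mask | mask.shift(fill_value=False)]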
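A small runnable sketch of this approach follows. It assumes the mask is applied to the already filtered frame from the question (the one holding only fixation and start rows), since on the raw csv "one row below" would be whatever row happens to come next. The sample data is a hypothetical subset of that filtered output.

```python
import pandas as pd

# Hypothetical subset of the filtered frame from the question,
# keeping the original row labels as the index
filtered = pd.DataFrame(
    {
        "Trial": [4.0, 6.0, 9.0, 15.0, 17.0, 23.0],
        "Code": [
            "fixation_dummy",
            "100_1to3_start_2034_1_0_1060",
            "100_1to3_start_2456_4_0_2054",
            "Fixation",
            "66_1to3_start_2369_2_0_2352",
            "66_1to3_start_1539_8_0_2518",
        ],
    },
    index=[9, 13, 23, 43, 55, 76],
)

# Rows whose code contains "ixation"
mask = filtered["Code"].str.contains("ixation", na=False)

# Rows directly below a fixation row; fill_value=False avoids a NaN in row one
follows = mask.shift(fill_value=False)

# Fixation rows plus the single row after each of them
out = filtered[mask | follows]
print(out)
```

On this sample the rows labelled 23 and 76 are dropped, which matches the pattern in the desired output from the question.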

python

csv

pandas