I need to add specific rows to a pandas DataFrame at specific positions
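I am currently working on a project where I need to add specific rows around each tagged sentence. Whenever the 'N' column equals 1, a new sentence begins. I want to add two rows per sentence: one with 'Pos' = START at the beginning of the sentence, and one with 'Pos' = END at the end of the sentence.
This is what the DataFrame looks like: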
import pandas as pd

POSTAG = {
'N': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,10,11,1,2,3,4,5,6,7,8,9],
'Name': ['ἐρᾷ','μὲν','ἁγνὸς','οὐρανὸς','τρῶσαι','χθόνα',',','ἔρως','δὲ','γαῖαν','λαμβάνει','γάμου','τυχεῖν','.','ὄμβρος','δ̓','ἀπ̓','εὐνάοντος','οὐρανοῦ','πεσὼν','ἔκυσε','γαῖαν','.','ἡ','δὲ','τίκτεται','βροτοῖς','μήλων','τε','βοσκὰς','καὶ','βίον','Δημήτριον','.','δενδρῶτις','ὥρα','δ̓','ἐκ','νοτίζοντος','γάμου','τέλειος','ἐστί','.'],
'Pos': ['VERB','ADV','ADJ','NOUN','VERB','NOUN','PUNCT','NOUN','CCONJ','NOUN','VERB','NOUN','VERB','PUNCT','NOUN','ADV','ADP','ADJ','NOUN','VERB','VERB','NOUN','PUNCT','DET','ADV','VERB','NOUN','NOUN','ADV','NOUN','CCONJ','NOUN','ADJ','PUNCT','NOUN','NOUN','ADV','ADP','VERB','NOUN','ADJ','VERB','PUNCT']
}
df = pd.DataFrame(POSTAG, columns=['N', 'Name', 'Pos'])
print(df)
In this case, I need a [NaN, NaN, START] row at indices 0 and 15, and a [NaN, NaN, END] row at index 14. I need to do this for my whole df. How can I do that?
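Looking at your dataframe, I am assuming that you want to insert START before each value 1 in the N column, and END after the last value of each consecutive run in the N column. If so, you can do the following.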
First, create two dummy dataframes, start_df and end_df:
import numpy as np

start_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['->START']})
end_df = pd.DataFrame({'N': [np.nan], 'Name': [np.nan], 'Pos': ['END<-']})
Then split the dataframe into groups of consecutive values in the N column:
mask = ~df['N'].diff().fillna(0).eq(1)
gb = df.groupby(mask.cumsum())
groups = [gb.get_group(x) for x in gb.groups]
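To see what the mask is doing, here is a small sketch on a toy N column (the values are made up for illustration and are not taken from the question's data). Inside a run of consecutive numbers diff() is 1, so negating eq(1) flags the first row of each run, and cumsum() turns those flags into a group id. Since the question states that a sentence starts whenever N equals 1, n.eq(1).cumsum() should give the same ids under that assumption.

```python
import pandas as pd

# Toy N column: three "sentences" of length 3, 2 and 4 (illustrative only).
n = pd.Series([1, 2, 3, 1, 2, 1, 2, 3, 4])

# True on the first row of every run of consecutive values.
starts = ~n.diff().fillna(0).eq(1)

# Running count of run starts gives one id per run.
group_id = starts.cumsum()

# Alternative, assuming every sentence starts with N == 1.
alt_id = n.eq(1).cumsum()

print(group_id.tolist())  # [1, 1, 1, 2, 2, 3, 3, 3, 3]
print(alt_id.tolist())    # same ids under the N == 1 assumption
```

The diff-based mask is the more general of the two, because it also splits runs that do not restart at 1.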
Next, insert the dummy dataframes before and after each group:
res = []
for group in groups:
    res.append(start_df)
    res.append(group)
    res.append(end_df)
Finally, build the result by concatenating the dataframes in the list:
df_ = pd.concat(res).reset_index(drop=True)
# print(df_)
N Name Pos
0 NaN NaN ->START
1 1.0 ἐρᾷ VERB
2 2.0 μὲν ADV
3 3.0 ἁγνὸς ADJ
4 4.0 οὐρανὸς NOUN
5 5.0 τρῶσαι VERB
6 6.0 χθόνα NOUN
7 7.0 , PUNCT
8 8.0 ἔρως NOUN
9 9.0 δὲ CCONJ
10 10.0 γαῖαν NOUN
11 11.0 λαμβάνει VERB
12 12.0 γάμου NOUN
13 13.0 τυχεῖν VERB
14 14.0 . PUNCT
15 NaN NaN END<-
16 NaN NaN ->START
17 1.0 ὄμβρος NOUN
18 2.0 δ̓ ADV
19 3.0 ἀπ̓ ADP
20 4.0 εὐνάοντος ADJ
21 5.0 οὐρανοῦ NOUN
22 6.0 πεσὼν VERB
23 7.0 ἔκυσε VERB
24 8.0 γαῖαν NOUN
25 9.0 . PUNCT
26 NaN NaN END<-
27 NaN NaN ->START
28 1.0 ἡ DET
29 2.0 δὲ ADV
30 3.0 τίκτεται VERB
31 4.0 βροτοῖς NOUN
32 5.0 μήλων NOUN
33 6.0 τε ADV
34 7.0 βοσκὰς NOUN
35 8.0 καὶ CCONJ
36 9.0 βίον NOUN
37 10.0 Δημήτριον ADJ
38 11.0 . PUNCT
39 NaN NaN END<-
40 NaN NaN ->START
41 1.0 δενδρῶτις NOUN
42 2.0 ὥρα NOUN
43 3.0 δ̓ ADV
44 4.0 ἐκ ADP
45 5.0 νοτίζοντος VERB
46 6.0 γάμου NOUN
47 7.0 τέλειος ADJ
48 8.0 ἐστί VERB
49 9.0 . PUNCT
50 NaN NaN END<-
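If this has to be applied to several dataframes, the steps above could be wrapped in a small helper. The following is only a sketch: the function name add_sentence_markers and the toy data are hypothetical, and it assumes that every sentence begins at a row where N equals 1, as stated in the question.

```python
import numpy as np
import pandas as pd


def add_sentence_markers(df, start="->START", end="END<-"):
    """Return a new frame with a START row before and an END row after each sentence.

    Hypothetical helper; assumes a sentence begins wherever N == 1.
    """
    sentence_id = df["N"].eq(1).cumsum()
    pieces = []
    for _, group in df.groupby(sentence_id, sort=False):
        start_row = pd.DataFrame({"N": [np.nan], "Name": [np.nan], "Pos": [start]})
        end_row = pd.DataFrame({"N": [np.nan], "Name": [np.nan], "Pos": [end]})
        pieces.extend([start_row, group, end_row])
    # Concatenate the pieces and renumber the index from 0.
    return pd.concat(pieces).reset_index(drop=True)


# Toy data (illustrative only): two short "sentences".
toy = pd.DataFrame({
    "N": [1, 2, 3, 1, 2],
    "Name": ["a", "b", ".", "c", "."],
    "Pos": ["NOUN", "VERB", "PUNCT", "NOUN", "PUNCT"],
})
out = add_sentence_markers(toy)
print(out)
```

As in the output above, the N column becomes float in the result because the marker rows hold NaN there.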