pandas: comparing two data frames of different lengths and splitting certain rows in half
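I'm trying to get my head around how pandas works, and I'm struggling to manipulate and compare pandas data frames.
I have three data frames, from which I have extracted only the information I need: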
subjectDF:
Subject ID Subject Year Teaching Hours PW Facility Requirement
0 Mat13 Maths 13 5 N
1 FMat13 Further Mathematics 13 5 N
2 Eco13 Economics 13 5 N
3 Geo13 Geography 13 5 N
4 His13 History 13 4 N
5 EngLang13 English Language 13 4 N
6 EngLit13 English Literature 13 4 N
7 Ger13 German 13 4 N
8 Fre13 French 13 4 N
9 Spa13 Spanish 13 4 N
10 Bus13 Business 13 4 N
11 Film13 Film Studies 13 4 N
12 Psy13 Psychology 13 5 N
13 Lat13 Latin 13 4 N
14 Gre13 Greek 13 4 N
15 Cla13 Classical 13 4 N
16 Phil13 Philosophy 13 4 N
studentDF:
Subject Student ID Student Number
0 Art13 [S8, S19] 2
1 Bio13 [S1, S4, S12, S13, S18, S24, S25, S28, S29, S3... 17
2 Bus13 [S10, S30, S47] 3
3 Che13 [S1, S2, S3, S4, S12, S13, S14, S24, S25, S26,... 20
4 Cla13 [S9, S33, S35] 3
5 Com13 [S2, S3, S10, S14, S16, S19, S31, S45, S192, S... 10
6 Eco13 [S6, S15, S17, S20, S23, S30, S31, S36, S41, S... 13
7 EngLang13 [S9, S11, S21, S22, S47] 5
8 EngLit13 [S5, S9, S22, S28, S32, S37] 6
9 FMat13 [S7, S14, S27, S38, S45, S192] 6
10 Film13 [S8] 1
11 Fre13 [S5, S15, S18, S29, S37, S193] 6
12 Geo13 [S6, S11, S20, S23, S32, S34, S36, S41, S42, S43] 10
13 Ger13 [S17, S43, S195] 3
14 Gre13 [S33, S40] 2
15 His13 [S5, S11, S21, S22, S32, S35, S37, S41] 8
16 Lat13 [S33, S35] 2
17 Mat13 [S1, S2, S3, S4, S6, S7, S10, S12, S13, S14, S... 34
18 Phil13 [S15, S16, S21, S40, S42, S193, S194] 7
19 Phy13 [S1, S7, S26, S27, S38, S44, S48, S49, S50, S2... 12
20 Psy13 [S8, S46] 2
21 Spa13 [S18, S36, S47] 3
classroomDF:
Classroom ID Facility Capacity
0 C8 None 25
1 C9 None 30
2 C10 None 12
3 C11 None 10
4 C12 None 10
5 C13 None 10
6 C14 None 20
7 C15 None 15
8 C16 None 15
9 C17 None 22
10 C22 None 5
11 C23 None 5
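I am trying to compare 'Subject ID' in subjectDF with 'Subject' in studentDF, and if a row in 'Subject' is not listed in 'Subject ID', remove that row.
For example, since Bio13 in 'Subject' is not listed in 'Subject ID', I would like to remove Bio13 from studentDF.
So the expected output would be exactly the same as studentDF, but without the rows that are not in 'Subject ID'.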
studentDF:
Subject Student ID Student Number
0 Art13 [S8, S19] 2
1 Bus13 [S10, S30, S47] 3
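I have tried many different approaches, but most of the time I get the following error:
ValueError: Can only compare identically-labeled Series objects
I'm not sure whether this should be a separate question. I'll post it here for now, and if that is a problem I'll move it to another question.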
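After modifying studentDF, I would like to compare 'Student Number' in studentDF with 'Capacity' in classroomDF, and if 'Student Number' > 'Capacity', split the students and the subject in two. For example, Mat13 has 34 students, which exceeds the maximum capacity in classroomDF. So I would like to modify studentDF again as follows:
studentDF: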
Subject Student ID Student Number
16 ....
17 Mat13_1 [S1, S2, S3, S4, S6, S7, S10, S12, S13, S14, S... 17
18 Mat13_2 [S15, S16, S... 17
....
Any help with this would be greatly appreciated!
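IIUC, this is what you're looking for. The `~` negates the mask, so this expression returns the rows of studentDF whose 'Subject' is not listed in 'Subject ID', i.e. the rows to be removed; without the `~` you get the rows to keep.
studentDF[~studentDF['Subject'].isin(subjectDF['Subject ID'])]
Output (the Student ID column looks truncated here because of my Jupyter notebook display settings)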
Subject Student ID Student Number
0 Art13 [S8, S19] 2
1 Bio13 [S1, S4, S12, S13, S18, S24, S25, S28, S29, S3... 17
3 Che13 [S1, S2, S3, S4, S12, S13, S14, S24, S25, S26,... 20
5 Com13 [S2, S3, S10, S14, S16, S19, S31, S45, S192, S... 10
19 Phy13 [S1, S7, S26, S27, S38, S44, S48, S49, S50, S2... 12
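To make the filtering step concrete, here is a minimal sketch on a small subset of the rows shown in the question. The subset and the names `mask`, `kept`, `dropped` and `compare_error` are assumptions for illustration only. It also shows where the ValueError from the question most likely comes from: `==` between two Series requires identical index labels, which two columns of different lengths do not have, whereas `isin` tests membership value by value.

```python
import pandas as pd

# Small subset of the frames in the question (illustrative only).
subjectDF = pd.DataFrame(
    {"Subject ID": ["Mat13", "Bus13", "Cla13", "Film13", "Psy13"]}
)
studentDF = pd.DataFrame({
    "Subject": ["Art13", "Bus13", "Cla13", "Film13"],
    "Student ID": [["S8", "S19"], ["S10", "S30", "S47"],
                   ["S9", "S33", "S35"], ["S8"]],
    "Student Number": [2, 3, 3, 1],
})

# Element-wise == needs identically labelled Series; the lengths differ
# here (4 vs 5), so pandas raises a ValueError.
try:
    studentDF["Subject"] == subjectDF["Subject ID"]
    compare_error = None
except ValueError as exc:
    compare_error = str(exc)

# isin checks membership, so the two columns may have different lengths.
mask = studentDF["Subject"].isin(subjectDF["Subject ID"])

kept = studentDF[mask].reset_index(drop=True)  # subjects listed in subjectDF
dropped = studentDF[~mask]                     # subjects missing from subjectDF
```

Assigning `kept` back to `studentDF` gives the filtered frame the question asks for; `reset_index(drop=True)` is optional and only renumbers the rows.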
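The second part of the question (splitting a subject whose 'Student Number' exceeds the classroom capacity) is not covered by the expression above. One possible sketch follows. It assumes that the threshold is the largest value in classroomDF['Capacity'], that a single split in half is enough, and that 'Student ID' holds Python lists. The helper `split_if_too_big`, the name `splitDF` and the toy data are hypothetical and not from the original post.

```python
import pandas as pd

# Toy data with shortened student lists and small capacities (illustrative).
classroomDF = pd.DataFrame({"Classroom ID": ["C10", "C22"], "Capacity": [4, 5]})
studentDF = pd.DataFrame({
    "Subject": ["Bus13", "Mat13"],
    "Student ID": [["S10", "S30", "S47"],
                   ["S1", "S2", "S3", "S4", "S6", "S7", "S10"]],
})
studentDF["Student Number"] = studentDF["Student ID"].apply(len)


def split_if_too_big(row, max_capacity):
    """Return the row unchanged, or two rows with the student list halved."""
    students = row["Student ID"]
    if len(students) <= max_capacity:
        return [row.to_dict()]
    half = (len(students) + 1) // 2  # first half takes the extra student if odd
    parts = [students[:half], students[half:]]
    return [
        {
            "Subject": f"{row['Subject']}_{i}",
            "Student ID": part,
            "Student Number": len(part),
        }
        for i, part in enumerate(parts, start=1)
    ]


# Assumption: a subject is too big if it exceeds the largest classroom.
max_capacity = classroomDF["Capacity"].max()

rows = [
    new_row
    for _, row in studentDF.iterrows()
    for new_row in split_if_too_big(row, max_capacity)
]
splitDF = pd.DataFrame(rows, columns=studentDF.columns)
```

With the toy data, Bus13 fits and is left alone, while Mat13 becomes Mat13_1 and Mat13_2. If a half could still exceed the capacity, the helper would need to split into more than two parts.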