pandas: comparing two data frames of different lengths and splitting certain rows in half

I'm trying to get my head around how pandas works, and I'm struggling to manipulate and compare pandas data frames.

I have three data frames, each holding only the information I need:

subjectDF:
   Subject ID              Subject  Year  Teaching Hours PW Facility Requirement
0       Mat13                Maths    13                  5                    N
1      FMat13  Further Mathematics    13                  5                    N
2       Eco13            Economics    13                  5                    N
3       Geo13            Geography    13                  5                    N
4       His13              History    13                  4                    N
5   EngLang13     English Language    13                  4                    N
6    EngLit13   English Literature    13                  4                    N
7       Ger13               German    13                  4                    N
8       Fre13               French    13                  4                    N
9       Spa13              Spanish    13                  4                    N
10      Bus13             Business    13                  4                    N
11     Film13         Film Studies    13                  4                    N
12      Psy13           Psychology    13                  5                    N
13      Lat13                Latin    13                  4                    N
14      Gre13                Greek    13                  4                    N
15      Cla13            Classical    13                  4                    N
16     Phil13           Philosophy    13                  4                    N

studentDF:
      Subject                                         Student ID  Student Number
0       Art13                                          [S8, S19]               2
1       Bio13  [S1, S4, S12, S13, S18, S24, S25, S28, S29, S3...              17
2       Bus13                                    [S10, S30, S47]               3
3       Che13  [S1, S2, S3, S4, S12, S13, S14, S24, S25, S26,...              20
4       Cla13                                     [S9, S33, S35]               3
5       Com13  [S2, S3, S10, S14, S16, S19, S31, S45, S192, S...              10
6       Eco13  [S6, S15, S17, S20, S23, S30, S31, S36, S41, S...              13
7   EngLang13                           [S9, S11, S21, S22, S47]               5
8    EngLit13                       [S5, S9, S22, S28, S32, S37]               6
9      FMat13                     [S7, S14, S27, S38, S45, S192]               6
10     Film13                                               [S8]               1
11      Fre13                     [S5, S15, S18, S29, S37, S193]               6
12      Geo13  [S6, S11, S20, S23, S32, S34, S36, S41, S42, S43]              10
13      Ger13                                   [S17, S43, S195]               3
14      Gre13                                         [S33, S40]               2
15      His13            [S5, S11, S21, S22, S32, S35, S37, S41]               8
16      Lat13                                         [S33, S35]               2
17      Mat13  [S1, S2, S3, S4, S6, S7, S10, S12, S13, S14, S...              34
18     Phil13              [S15, S16, S21, S40, S42, S193, S194]               7
19      Phy13  [S1, S7, S26, S27, S38, S44, S48, S49, S50, S2...              12
20      Psy13                                          [S8, S46]               2
21      Spa13                                    [S18, S36, S47]               3

classroomDF:
  Classroom ID Facility  Capacity
0            C8     None        25
1            C9     None        30
2           C10     None        12
3           C11     None        10
4           C12     None        10
5           C13     None        10
6           C14     None        20
7           C15     None        15
8           C16     None        15
9           C17     None        22
10          C22     None         5
11          C23     None         5

I'm trying to compare 'Subject ID' in subjectDF with 'Subject' in studentDF, and if a value in 'Subject' is not listed in 'Subject ID', drop that row. For example, since Bio13 in 'Subject' is not listed in 'Subject ID', I want to drop Bio13 from studentDF.

So the expected output would be exactly the same as studentDF, but without the rows whose 'Subject' is not in 'Subject ID':
studentDF:
      Subject                                         Student ID  Student Number
0       Art13                                          [S8, S19]               2
1       Bus13                                    [S10, S30, S47]               3

I've tried many different approaches, but most of the time I get the following error:

ValueError: Can only compare identically-labeled Series objects
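This error typically comes from comparing the two columns element-wise with `==`: pandas first aligns the two Series on their index labels, and when the frames have different lengths (22 rows vs. 17 rows here) the labels differ, so it refuses. A minimal sketch, using a few made-up rows rather than the full frames, that reproduces the error and shows `Series.isin`, which tests membership against the other column's values and ignores index labels entirely:

```python
import pandas as pd

# Small stand-ins for the question's frames (only a few rows, for illustration)
subjectDF = pd.DataFrame({"Subject ID": ["Mat13", "Bus13", "Cla13"]})
studentDF = pd.DataFrame({
    "Subject": ["Art13", "Bio13", "Bus13", "Cla13"],
    "Student Number": [2, 17, 3, 3],
})

# Element-wise == aligns on index labels (0..3 vs 0..2), so this raises
try:
    studentDF["Subject"] == subjectDF["Subject ID"]
    err_msg = None
except ValueError as err:
    err_msg = str(err)
print(err_msg)

# isin checks each Subject against the *values* of Subject ID,
# so differing lengths and index labels do not matter
mask = studentDF["Subject"].isin(subjectDF["Subject ID"])
kept = studentDF[mask]
print(kept)
```

Because `isin` returns a boolean mask with the same index as `studentDF`, it can be used directly for boolean indexing.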

I'm not sure whether I should ask a second question here. I'll post it now, and if that's a problem I'll move it to a separate question.

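After modifying studentDF, I'd like to compare 'Student Number' in studentDF with 'Capacity' in classroomDF, and if 'Student Number' > 'Capacity', split the students and the subject in two. For example, Mat13 has 34 students, which exceeds the largest capacity in classroomDF (30). So I'd like to modify studentDF again as follows:

studentDF: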

        Subject                                         Student ID  Student Number
16       ....
17      Mat13_1  [S1, S2, S3, S4, S6, S7, S10, S12, S13, S14, S...              17
18      Mat13_2  [S15, S16, S...                                                17
         ....
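A minimal sketch of the split step, assuming the threshold is the largest room in classroomDF (as in the Mat13 example) and that students are divided in list order. The helper name `split_oversized` is hypothetical, and the 34 Mat13 IDs are placeholders (S1..S34) because the real list is truncated in the question. It generalizes "in half" slightly: it uses the fewest near-equal groups that each fit the threshold, which for 34 students and 30 seats is two groups of 17.

```python
import math

import numpy as np
import pandas as pd

# Stand-in studentDF; the Mat13 roster is a placeholder, not the real list
studentDF = pd.DataFrame({
    "Subject": ["Bus13", "Cla13", "Mat13"],
    "Student ID": [
        ["S10", "S30", "S47"],
        ["S9", "S33", "S35"],
        [f"S{i}" for i in range(1, 35)],
    ],
})
studentDF["Student Number"] = studentDF["Student ID"].apply(len)

# Capacities copied from the question's classroomDF
classroomDF = pd.DataFrame({
    "Classroom ID": ["C8", "C9", "C10", "C11", "C12", "C13",
                     "C14", "C15", "C16", "C17", "C22", "C23"],
    "Capacity": [25, 30, 12, 10, 10, 10, 20, 15, 15, 22, 5, 5],
})


def split_oversized(df: pd.DataFrame, max_capacity: int) -> pd.DataFrame:
    """Hypothetical helper: split any subject larger than max_capacity
    into near-equal groups named <Subject>_1, <Subject>_2, ..."""
    rows = []
    for _, row in df.iterrows():
        ids = list(row["Student ID"])
        if len(ids) <= max_capacity:
            rows.append({"Subject": row["Subject"], "Student ID": ids,
                         "Student Number": len(ids)})
            continue
        # Fewest groups so that no group exceeds max_capacity
        n_groups = math.ceil(len(ids) / max_capacity)
        chunks = np.array_split(np.array(ids, dtype=object), n_groups)
        for k, chunk in enumerate(chunks, start=1):
            rows.append({"Subject": f"{row['Subject']}_{k}",
                         "Student ID": chunk.tolist(),
                         "Student Number": len(chunk)})
    return pd.DataFrame(rows)


max_cap = int(classroomDF["Capacity"].max())  # largest room: 30
result = split_oversized(studentDF, max_cap)
print(result[["Subject", "Student Number"]])
```

`numpy.array_split` is used because, unlike `numpy.split`, it accepts group counts that do not divide the length evenly. If the split should be strictly into two halves regardless of size, `n_groups` can be fixed at 2.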

Any help with this would be much appreciated!

IIUC, this is what you're looking for:

studentDF[~studentDF['Subject'].isin(subjectDF['Subject ID'])]

Output (the Student ID column looks truncated here because of my Jupyter notebook display settings):

Subject     Student ID                                          Student Number
0   Art13   [S8, S19]                                           2
1   Bio13   [S1, S4, S12, S13, S18, S24, S25, S28, S29, S3...   17
3   Che13   [S1, S2, S3, S4, S12, S13, S14, S24, S25, S26,...   20
5   Com13   [S2, S3, S10, S14, S16, S19, S31, S45, S192, S...   10
19  Phy13   [S1, S7, S26, S27, S38, S44, S48, S49, S50, S2...   12
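Note that the `~` negates the mask, so the one-liner above keeps the subjects that are *not* in subjectDF (Art13, Bio13, Che13, Com13, Phy13), which is what the output shows. If the goal is the one stated in the question (drop those rows), the same mask without `~` does it. A sketch using the subject codes and student counts from the question (the Student ID column is left out because its lists are truncated):

```python
import pandas as pd

subjectDF = pd.DataFrame({"Subject ID": [
    "Mat13", "FMat13", "Eco13", "Geo13", "His13", "EngLang13", "EngLit13",
    "Ger13", "Fre13", "Spa13", "Bus13", "Film13", "Psy13", "Lat13",
    "Gre13", "Cla13", "Phil13",
]})
studentDF = pd.DataFrame({
    "Subject": ["Art13", "Bio13", "Bus13", "Che13", "Cla13", "Com13", "Eco13",
                "EngLang13", "EngLit13", "FMat13", "Film13", "Fre13", "Geo13",
                "Ger13", "Gre13", "His13", "Lat13", "Mat13", "Phil13", "Phy13",
                "Psy13", "Spa13"],
    "Student Number": [2, 17, 3, 20, 3, 10, 13, 5, 6, 6, 1, 6, 10, 3, 2, 8, 2,
                       34, 7, 12, 2, 3],
})

in_subjects = studentDF["Subject"].isin(subjectDF["Subject ID"])

# With ~: rows whose Subject is NOT in subjectDF (the answer's output)
not_listed = studentDF[~in_subjects]
# Without ~: rows whose Subject IS in subjectDF; reset_index renumbers 0..n-1
listed = studentDF[in_subjects].reset_index(drop=True)

print(not_listed["Subject"].tolist())
print(listed["Subject"].tolist())
```

`reset_index(drop=True)` is optional; it only renumbers the rows as in the question's expected output instead of keeping the original labels.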