Divide row values based on column criteria in a dataframe (Python)

Hi, I have a dataframe structured as follows:

              School  Grade Class
Date                             
2019-01-01  School A      2  Math
2019-02-01  School A      3  Math
2019-06-01  School A      1  Math
2019-01-01  School B      4  Math
2019-02-01  School B      5  Math
2019-06-01  School B      2  Math
2019-01-01  School C      6  Math
2019-02-01  School C      5  Math
2019-06-01  School C      6  Math

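I want to compute the ratio between schools for the same date and append it to the same dataframe, as shown below.

For example, for date 2019-01-01 the ratio is School A grade / School B grade = 2/4 = 0.5, and so on.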

Date        Type               Value    Class   
2019-01-01  School A           2        Math    
2019-02-01  School A           3        Math    
2019-06-01  School A           1        Math    
2019-01-01  School B           4        Math    
2019-02-01  School B           5        Math    
2019-06-01  School B           2        Math    
2019-01-01  School C           6        Math
2019-02-01  School C           5        Math
2019-06-01  School C           6        Math
2019-01-01  School A/School B  0.5      Math    
2019-02-01  School A/School B  0.6      Math    
2019-06-01  School A/School B  0.5      Math    

The code looks like this:

import pandas as pd

Input = {'Date': ['2019-01-01','2019-02-01','2019-06-01', '2019-01-01','2019-02-01','2019-06-01'],
         'School': ['School A', 'School A', 'School A', 'School B', 'School B', 'School B'],
         'Grade': [2, 3, 1, 4, 5, 2],
         'Class': ['Math', 'Math', 'Math', 'Math', 'Math', 'Math']
        }

df = pd.DataFrame(Input, columns = ['Date', 'School', 'Grade', 'Class'])
df['Date'] = pd.to_datetime(df.Date)
df = df.set_index('Date')

I'm not sure how to loop over the rows (or whether that is even needed) and divide the relevant numbers based on the condition.

The following should work:

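# Take an explicit copy so the assignments below do not raise SettingWithCopyWarning.
df2 = df[df['School'] == 'School A'].copy()
df2['School'] = 'School A/School B'
# Series division aligns on the Date index, so each date is divided by the same date.
df2['Grade'] = df2['Grade'] / df.loc[df['School'] == 'School B', 'Grade']
result = pd.concat([df, df2])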

print(result)

Output:

                       School  Grade Class
Date
2019-01-01           School A    2.0  Math
2019-02-01           School A    3.0  Math
2019-06-01           School A    1.0  Math
2019-01-01           School B    4.0  Math
2019-02-01           School B    5.0  Math
2019-06-01           School B    2.0  Math
2019-01-01  School A/School B    0.5  Math
2019-02-01  School A/School B    0.6  Math
2019-06-01  School A/School B    0.5  Math
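
No explicit loop over rows is needed here because pandas matches the two Series by index label (the date) before dividing. The sketch below illustrates this with the question's data, except that School B's rows are deliberately shuffled; the names `grade_a`, `grade_b` and `ratio` are only illustrative. It assumes each school has at most one row per date.

```python
import pandas as pd

# Question's sample data, with School B's rows in a different date order.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2019-01-01", "2019-02-01", "2019-06-01",
             "2019-06-01", "2019-01-01", "2019-02-01"]
        ),
        "School": ["School A"] * 3 + ["School B"] * 3,
        "Grade": [2, 3, 1, 2, 4, 5],
        "Class": ["Math"] * 6,
    }
).set_index("Date")

grade_a = df.loc[df["School"] == "School A", "Grade"]
grade_b = df.loc[df["School"] == "School B", "Grade"]

# The division pairs values by Date label, not by row position.
ratio = grade_a / grade_b
print(ratio)
```

A date that exists for only one of the two schools would come out as NaN rather than raising an error.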

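Try using groupby to avoid problems when the dates are not sorted. Note that reduce(truediv, ...) divides in row order within each date, so this relies on the School A row appearing before the School B row.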

from operator import truediv
from functools import reduce

schools = ["School A", "School B"]
df1 = df.loc[df.School.isin(schools)]
# For each date, divide the grades in the order the rows appear.
grades = df1.groupby(df1.index)["Grade"].agg(lambda s: reduce(truediv, s)).to_frame()
grades["School"] = "School A / School B"
grades["Class"] = "Math"

pd.concat([df1, grades])

                         School  Grade Class
Date                                        
2019-01-01             School A    2.0  Math
2019-02-01             School A    3.0  Math
2019-06-01             School A    1.0  Math
2019-01-01             School B    4.0  Math
2019-02-01             School B    5.0  Math
2019-06-01             School B    2.0  Math
2019-01-01  School A / School B    0.5  Math
2019-02-01  School A / School B    0.6  Math
2019-06-01  School A / School B    0.5  Math
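
If the row order within a date cannot be guaranteed, one possible variation is to reshape the grades into one column per school and name the numerator and denominator explicitly. This is only a sketch: `wide` and `ratio_rows` are illustrative names, the "Math" value is hardcoded just as in the answer above, and it assumes each (Date, School) pair occurs at most once. The sample data lists School B first and adds a 2018 date that only School A has.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2019-01-01", "2019-02-01", "2019-06-01", "2018-01-01",
             "2019-01-01", "2019-02-01", "2019-06-01"]
        ),
        "School": ["School B"] * 3 + ["School A"] * 4,
        "Grade": [4, 5, 2, 1, 2, 3, 1],
        "Class": ["Math"] * 7,
    }
).set_index("Date")

# One row per date, one column per school (NaN where a school has no row).
wide = df.set_index("School", append=True)["Grade"].unstack("School")

# Numerator and denominator are chosen by name, so row order does not matter.
# Dates missing for either school give NaN and are dropped.
ratio = (wide["School A"] / wide["School B"]).dropna()

ratio_rows = pd.DataFrame(
    {"School": "School A / School B", "Grade": ratio, "Class": "Math"}
)
result = pd.concat([df, ratio_rows])
print(result)
```

Swapping the two column names in the division gives the inverse ratio without touching the rest of the code.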

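My solution takes deep copies of the dataframe and selects the rows for each school from them. The two resulting dataframes can then be divided.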

import pandas as pd

Input = {'Date': ['2018-01-01', '2018-02-01', '2019-01-01', '2019-02-01', '2019-06-01', '2019-01-01', '2019-02-01', '2019-06-01', '2019-01-01', '2019-02-01', '2019-06-01'],
         'School': ['School A', 'School A', 'School A', 'School A', 'School A', 'School B', 'School B', 'School B', 'School C', 'School C', 'School C'],
         'Grade': [1, 6, 2, 3, 1, 4, 5, 2, 6, 5, 6],
         'Class': ['Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'Math']
        }

df = pd.DataFrame(Input, columns = ['Date', 'School', 'Grade', 'Class'])
df['Date'] = pd.to_datetime(df.Date)
df = df.set_index('Date')

df_copy_A = df.copy(deep=True)
df_copy_B = df.copy(deep=True)

df_copy_A = df_copy_A[(df_copy_A['School'] == 'School A')]
df_copy_B = df_copy_B[(df_copy_B['School'] == 'School B')]
df_copy_B['School'] = 'School A / School B'

# rdiv is reversed division: this computes School A's grade / School B's grade, aligned on Date.
df_copy_B['Grade'] = df_copy_B['Grade'].rdiv(df_copy_A['Grade'])


df = pd.concat([df, df_copy_B])
print(df)

This produces the expected output:

                         School  Grade Class
Date                                        
2018-01-01             School A    1.0  Math
2018-02-01             School A    6.0  Math
2019-01-01             School A    2.0  Math
2019-02-01             School A    3.0  Math
2019-06-01             School A    1.0  Math
2019-01-01             School B    4.0  Math
2019-02-01             School B    5.0  Math
2019-06-01             School B    2.0  Math
2019-01-01             School C    6.0  Math
2019-02-01             School C    5.0  Math
2019-06-01             School C    6.0  Math
2019-01-01  School A / School B    0.5  Math
2019-02-01  School A / School B    0.6  Math
2019-06-01  School A / School B    0.5  Math
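
The `rdiv` call can be easy to misread, so here is a small sketch of what it does on two Series shaped like the School A and School B grades from this answer (the names `grade_a` and `grade_b` are only illustrative). `b.rdiv(a)` computes `a / b`, and dates present in only one Series come out as NaN, which is why the 2018 rows do not produce ratio rows.

```python
import pandas as pd

grade_a = pd.Series(
    [1, 6, 2, 3, 1],
    index=pd.to_datetime(
        ["2018-01-01", "2018-02-01", "2019-01-01", "2019-02-01", "2019-06-01"]
    ),
)
grade_b = pd.Series(
    [4, 5, 2],
    index=pd.to_datetime(["2019-01-01", "2019-02-01", "2019-06-01"]),
)

# Reversed division: the argument is the numerator, the caller the denominator.
reversed_div = grade_b.rdiv(grade_a)
plain_div = grade_a / grade_b

print(reversed_div.equals(plain_div))
print(reversed_div)
```

When the result is assigned back to `df_copy_B['Grade']`, pandas aligns it to that dataframe's index, so only the dates School B actually has are kept.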