Divide row values based on column criteria in a DataFrame (Python)
Hi, I have a DataFrame structured as follows:
School Grade Class
Date
2019-01-01 School A 2 Math
2019-02-01 School A 3 Math
2019-06-01 School A 1 Math
2019-01-01 School B 4 Math
2019-02-01 School B 5 Math
2019-06-01 School B 2 Math
2019-01-01 School C 6 Math
2019-02-01 School C 5 Math
2019-06-01 School C 6 Math
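I want to compute ratios between schools for the same date and add them to the same DataFrame, as shown below.
For example, on 2019-01-01 the ratio is School A grade / School B grade = 2/4 = 0.5, and so on: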
Date Type Value Class
2019-01-01 School A 2 Math
2019-02-01 School A 3 Math
2019-06-01 School A 1 Math
2019-01-01 School B 4 Math
2019-02-01 School B 5 Math
2019-06-01 School B 2 Math
2019-01-01 School C 6 Math
2019-02-01 School C 5 Math
2019-06-01 School C 6 Math
2019-01-01 School A/School B 0.5 Math
2019-02-01 School A/School B 0.6 Math
2019-06-01 School A/School B 0.5 Math
The code looks like this:
import pandas as pd
Input = {'Date': ['2019-01-01','2019-02-01','2019-06-01', '2019-01-01','2019-02-01','2019-06-01'],
'School': ['School A', 'School A', 'School A', 'School B', 'School B', 'School B'],
'Grade': [2, 3, 1, 4, 5, 2],
'Class': ['Math', 'Math', 'Math', 'Math', 'Math', 'Math']
}
df = pd.DataFrame(Input, columns = ['Date', 'School', 'Grade', 'Class'])
df['Date'] = pd.to_datetime(df.Date)
df = df.set_index('Date')
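I'm not sure how to loop over the rows (or whether that is even necessary) and divide the specific numbers based on the condition.
The following should work:
# Copy the School A rows so the assignments below do not touch df
df2 = df[df['School'] == 'School A'].copy()
df2['School'] = 'School A/School B'
# Series division aligns on the Date index
df2['Grade'] = df2['Grade'] / df.loc[df['School'] == 'School B', 'Grade']
result = pd.concat([df, df2])
print(result)
Output: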
School Grade Class
Date
2019-01-01 School A 2.0 Math
2019-02-01 School A 3.0 Math
2019-06-01 School A 1.0 Math
2019-01-01 School B 4.0 Math
2019-02-01 School B 5.0 Math
2019-06-01 School B 2.0 Math
2019-01-01 School A/School B 0.5 Math
2019-02-01 School A/School B 0.6 Math
2019-06-01 School A/School B 0.5 Math
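A note on why the division above lines up the right rows: pandas aligns two Series on their index labels before dividing, so the result does not depend on row order, and a date that exists for only one school yields NaN. A minimal sketch with made-up values (the extra 2019-03-01 entry and the names `a` and `b` are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical grades: School B's rows are deliberately out of order,
# and 2019-03-01 exists only for School A.
a = pd.Series(
    [2, 3, 1, 7],
    index=pd.to_datetime(['2019-01-01', '2019-02-01', '2019-06-01', '2019-03-01']),
)
b = pd.Series(
    [2, 4, 5],
    index=pd.to_datetime(['2019-06-01', '2019-01-01', '2019-02-01']),
)

# Division matches values by date label, not by position
ratio = a / b
print(ratio)
```

If dates without a partner should not appear in the result, calling `ratio.dropna()` before building the new rows is one option.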
Try using groupby to avoid problems when the dates are not sorted.
from operator import truediv
from functools import reduce
schools = ["School A", "School B"]
df1 = df.loc[df.School.isin(schools)]
# For each date, divide the grades in row order (School A's row must come before School B's)
grades = df1.groupby(df1.index)["Grade"].agg(lambda s: reduce(truediv, s)).to_frame()
grades["School"] = "School A / School B"
grades["Class"] = "Math"
pd.concat([df1, grades])
School Grade Class
Date
2019-01-01 School A 2.0 Math
2019-02-01 School A 3.0 Math
2019-06-01 School A 1.0 Math
2019-01-01 School B 4.0 Math
2019-02-01 School B 5.0 Math
2019-06-01 School B 2.0 Math
2019-01-01 School A / School B 0.5 Math
2019-02-01 School A / School B 0.6 Math
2019-06-01 School A / School B 0.5 Math
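Note that `reduce(truediv, s)` divides the grades in whatever order they appear within each date, so it gives A/B only when School A's row comes first. As an alternative sketch (not the answerer's method), the data can be pivoted so that each school becomes a column and the ratio is written out explicitly. This assumes each Date/School pair occurs only once; the rows below are deliberately shuffled to show that order no longer matters.

```python
import pandas as pd

# Same grades as in the question, with School B listed before School A
df = pd.DataFrame({
    'Date': ['2019-01-01', '2019-02-01', '2019-06-01',
             '2019-06-01', '2019-02-01', '2019-01-01'],
    'School': ['School B', 'School B', 'School B',
               'School A', 'School A', 'School A'],
    'Grade': [4, 5, 2, 1, 3, 2],
    'Class': ['Math'] * 6,
})
df['Date'] = pd.to_datetime(df['Date'])

# One column per school, one row per date
wide = df.pivot(index='Date', columns='School', values='Grade')

# Numerator and denominator are named explicitly, so row order is irrelevant
ratio = (wide['School A'] / wide['School B']).rename('Grade').reset_index()
ratio['School'] = 'School A / School B'
ratio['Class'] = 'Math'

result = pd.concat([df, ratio], ignore_index=True).set_index('Date')
print(result)
```

`DataFrame.pivot` raises an error on duplicate Date/School pairs, which makes unexpected duplicates visible instead of silently folding them into the ratio.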
My solution makes deep copies of the DataFrame and then selects the rows for each school. The two DataFrames can then be divided.
import pandas as pd
Input = {'Date': ['2018-01-01', '2018-02-01', '2019-01-01', '2019-02-01', '2019-06-01', '2019-01-01', '2019-02-01', '2019-06-01', '2019-01-01', '2019-02-01', '2019-06-01'],
'School': ['School A', 'School A', 'School A', 'School A', 'School A', 'School B', 'School B', 'School B', 'School C', 'School C', 'School C'],
'Grade': [1, 6, 2, 3, 1, 4, 5, 2, 6, 5, 6],
'Class': ['Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'Math']
}
df = pd.DataFrame(Input, columns = ['Date', 'School', 'Grade', 'Class'])
df['Date'] = pd.to_datetime(df.Date)
df = df.set_index('Date')
# Deep-copy the rows for each school so the original df is left untouched
df_copy_A = df[df['School'] == 'School A'].copy(deep=True)
df_copy_B = df[df['School'] == 'School B'].copy(deep=True)
df_copy_B['School'] = 'School A / School B'
# rdiv computes other / self, i.e. School A grade / School B grade, aligned on Date
df_copy_B['Grade'] = df_copy_B['Grade'].rdiv(df_copy_A['Grade'])
df = pd.concat([df, df_copy_B])
print(df)
This produces the expected output:
School Grade Class
Date
2018-01-01 School A 1.0 Math
2018-02-01 School A 6.0 Math
2019-01-01 School A 2.0 Math
2019-02-01 School A 3.0 Math
2019-06-01 School A 1.0 Math
2019-01-01 School B 4.0 Math
2019-02-01 School B 5.0 Math
2019-06-01 School B 2.0 Math
2019-01-01 School C 6.0 Math
2019-02-01 School C 5.0 Math
2019-06-01 School C 6.0 Math
2019-01-01 School A / School B 0.5 Math
2019-02-01 School A / School B 0.6 Math
2019-06-01 School A / School B 0.5 Math
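Since the question asks for ratios "between schools" in general, the same idea can be wrapped in a small function that handles every pair. The sketch below is one possible generalisation, not part of any answer above: `add_ratio_rows` is a hypothetical helper, and it assumes a Date index named 'Date', the columns 'School', 'Grade' and 'Class', and at most one row per Date/Class/School. Dates where either school has no grade (such as the 2018 rows here) are dropped from the ratios.

```python
from itertools import combinations

import pandas as pd


def add_ratio_rows(df, pairs=None):
    """Append 'X / Y' ratio rows for each (numerator, denominator) pair.

    Hypothetical helper; by default every pair of schools is used.
    """
    # Rows: (Date, Class); columns: one per school
    wide = df.set_index(['Class', 'School'], append=True)['Grade'].unstack('School')
    if pairs is None:
        pairs = list(combinations(wide.columns, 2))
    pieces = [df]
    for num, den in pairs:
        # dropna() removes dates where one of the two schools has no grade
        ratio = (wide[num] / wide[den]).dropna().rename('Grade').reset_index()
        ratio['School'] = f'{num} / {den}'
        pieces.append(ratio.set_index('Date')[['School', 'Grade', 'Class']])
    return pd.concat(pieces)


# Sample data from the answer above
df = pd.DataFrame({
    'Date': ['2018-01-01', '2018-02-01', '2019-01-01', '2019-02-01', '2019-06-01',
             '2019-01-01', '2019-02-01', '2019-06-01',
             '2019-01-01', '2019-02-01', '2019-06-01'],
    'School': ['School A'] * 5 + ['School B'] * 3 + ['School C'] * 3,
    'Grade': [1, 6, 2, 3, 1, 4, 5, 2, 6, 5, 6],
    'Class': ['Math'] * 11,
})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

result = add_ratio_rows(df)
print(result)
```

Passing `pairs=[('School A', 'School B')]` restricts the output to the single ratio asked for in the question.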