Pandas: How can I preserve row ordering after subtraction?
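I have two data frames with three columns each and the same column names. I want to subtract the values in the third column wherever the values in the first and second columns match. I tried the following: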
import pandas as pd

# Common column names
columns = ["month", "category", "sum"]
# First data frame
data1 = [("jan", "j", 10), ("feb", "f", 20)]
df1 = pd.DataFrame.from_records(data1, columns=columns)
# Second data frame
data2 = [("jan", "j", 9.5), ("mar", "m", 30)]
df2 = pd.DataFrame.from_records(data2, columns=columns)
print(df1) # Observe order of `month`s: jan, feb
print(df2) # Observe order of `month`s: jan, mar
# Subtract `sum` where `month`, and `category` match:
df1.set_index(["month", "category"]).subtract(df2.set_index(["month", "category"])).reset_index()
This produces the following output. Observe that the rows are sorted alphabetically by `month`:
month category sum
0 feb f NaN
1 jan j 0.5
2 mar m NaN
How can I keep the row order of the left operand? That is, how do I get the following output (or something similar):
month category sum
1 jan j 0.5
0 feb f NaN
2 mar m NaN
You can make the column categorical and specify whatever order you see fit:
df1['month'] = pd.Categorical(df1['month'], categories=['jan', 'feb', 'mar'], ordered=True)
df2['month'] = pd.Categorical(df2['month'], categories=['jan', 'feb', 'mar'], ordered=True)
# Subtract `sum` where `month`, and `category` match:
res = df1.set_index(["month", "category"]).subtract(df2.set_index(["month", "category"])).reset_index()
print(res)
Output:
month category sum
0 jan j 0.5
1 feb f NaN
2 mar m NaN
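A variant of this answer, sketched under the assumption that the desired order is simply "first appearance" (months from `df1` first, then any new ones from `df2`): derive the categories from the data instead of hard-coding the list, and apply the ordering after the subtraction with `sort_values`, so the result does not depend on how `subtract` orders the aligned index. The names `keys` and `month_order` are illustrative.

```python
import pandas as pd

columns = ["month", "category", "sum"]
df1 = pd.DataFrame.from_records([("jan", "j", 10), ("feb", "f", 20)], columns=columns)
df2 = pd.DataFrame.from_records([("jan", "j", 9.5), ("mar", "m", 30)], columns=columns)

keys = ["month", "category"]
res = df1.set_index(keys).subtract(df2.set_index(keys)).reset_index()

# Category order = order of first appearance (assumption): df1's months, then df2's new ones.
month_order = pd.unique(pd.concat([df1["month"], df2["month"]]))
res["month"] = pd.Categorical(res["month"], categories=month_order, ordered=True)

# Sorting an ordered categorical uses category position, not alphabetical order.
res = res.sort_values("month")
print(res)
```

Unlike the answer's code, this leaves `df1` and `df2` unchanged and needs no manual list of months.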
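`pd.merge` preserved the order of the left operand in the pandas version this answer was written for, and you can then compute the difference between the two columns. (Since pandas 2.2, `merge` follows its documented sort behavior, under which `how="outer"` sorts the join keys lexicographically; only `how="left"` and `how="inner"` keep the order of the left keys.) For example, you could do: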
df3 = pd.merge(df1, df2, on=["month", "category"], how="outer")
df3.loc[:, "difference"] = df3["sum_x"] - df3["sum_y"]
Result for your data:
month category sum_x sum_y difference
0 jan j 10.0 9.5 0.5
1 feb f 20.0 NaN NaN
2 mar m NaN 30.0 NaN
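To keep the left operand's row order regardless of how a given pandas version orders an outer merge, one option is to record each row's position in a helper column before merging and sort on it afterwards. A minimal sketch; the `pos` column is a hypothetical helper name, not part of the original answer:

```python
import pandas as pd

columns = ["month", "category", "sum"]
df1 = pd.DataFrame.from_records([("jan", "j", 10), ("feb", "f", 20)], columns=columns)
df2 = pd.DataFrame.from_records([("jan", "j", 9.5), ("mar", "m", 30)], columns=columns)

keys = ["month", "category"]
# Hypothetical helper column "pos": left rows are numbered before right rows.
left = df1.assign(pos=range(len(df1)))
right = df2.assign(pos=range(len(df1), len(df1) + len(df2)))

df3 = pd.merge(left, right, on=keys, how="outer")
df3["difference"] = df3["sum_x"] - df3["sum_y"]

# Rows present in df1 take their df1 position; right-only rows follow in df2's order.
df3 = (
    df3.assign(pos=df3["pos_x"].fillna(df3["pos_y"]))
    .sort_values("pos")
    .drop(columns=["pos", "pos_x", "pos_y"])
    .reset_index(drop=True)
)
print(df3)
```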
Try this:
df1.sort_index(inplace=True)
This simply forces the data frame to be sorted by its index.
More documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_index.html
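`sort_index` also accepts a `key` argument (pandas 1.1.0 added `key` to both `sort_index` and `sort_values`), which makes it possible to sort the subtraction result itself rather than `df1`. A sketch, assuming a hand-written `order` dict that ranks the months; with a `MultiIndex`, passing `level` restricts the key function to that level:

```python
import pandas as pd

columns = ["month", "category", "sum"]
df1 = pd.DataFrame.from_records([("jan", "j", 10), ("feb", "f", 20)], columns=columns)
df2 = pd.DataFrame.from_records([("jan", "j", 9.5), ("mar", "m", 30)], columns=columns)

keys = ["month", "category"]
res = df1.set_index(keys).subtract(df2.set_index(keys))

# Hand-written month ranking (an assumption; adjust to the order you need).
order = {"jan": 0, "feb": 1, "mar": 2}

# level=0 is "month": the key receives that level's values as an Index
# and maps them to ranks, which are then used for sorting.
res = res.sort_index(level=0, key=lambda idx: idx.map(order)).reset_index()
print(res)
```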
Since pandas 1.1.0, `sort_values` accepts a `key` argument. You can use it to pass the desired order:
order = {"jan": 0, "feb": 1, "mar": 2}
df1.set_index(["month", "category"]).subtract(df2.set_index(["month", "category"])).reset_index().sort_values(by=['month'], key=lambda x: x.map(order))
Output:
month category sum
1 jan j 0.5
0 feb f NaN
2 mar m NaN
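If the goal is literally "keep the left operand's order" without listing the months by hand, another option is to build the key order from the data and `reindex` the result with it. A minimal sketch, assuming rows that exist only in `df2` should follow the `df1` rows in their `df2` order; `keys` and `row_order` are illustrative names:

```python
import pandas as pd

columns = ["month", "category", "sum"]
df1 = pd.DataFrame.from_records([("jan", "j", 10), ("feb", "f", 20)], columns=columns)
df2 = pd.DataFrame.from_records([("jan", "j", 9.5), ("mar", "m", 30)], columns=columns)

keys = ["month", "category"]
left = df1.set_index(keys)
right = df2.set_index(keys)

# All (month, category) keys in order of first appearance: left rows first, then right-only rows.
row_order = left.index.append(right.index).drop_duplicates()

# subtract() aligns the two indexes (possibly sorting them); reindex() restores row_order.
res = left.subtract(right).reindex(row_order).reset_index()
print(res)
```

This works on both key columns at once, so it also covers cases where the same month appears with several categories.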