Searching a value within range between columns in pandas (not date columns and no sql)
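Thanks in advance for your help. I have two data frames, shown below. I need to create a Category column in the sold frame based on the information in the size frame: for each sold row, it should find the size row for the same product whose Min Size to Max Size range contains the sold size, and return that row's category. Can this be done in pandas, without SQL? I don't think the merge and join methods work here.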
import pandas as pd
size = pd.DataFrame({"Min Size": [30, 41, 40],
                     "Max Size": [40, 60, 50],
                     "Category": ["small", "big", "medium"],
                     "Product": ["Apple", "Apple", "Peach"]})
sold = pd.DataFrame({"Purchase_date": ["20/01/2020", "18/02/2020", "01/06/2020"],
                     "Size": [35, 45, 42],
                     "Category": ["small", "big", "medium"],
                     "Product": ["Apple", "Peach", "Apple"]})
Join conditions in pandas must be exact matches. There is no equivalent of SQL's BETWEEN ... AND ... clause.
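As an alternative to numpy broadcasting, a common pure-pandas workaround is to join on the part of the condition that is an exact match (Product) and then filter the joined rows by the range condition with Series.between. This is a minimal sketch assuming the size and sold frames from the question; the output column name Category_from_size is hypothetical.

```python
import pandas as pd

size = pd.DataFrame({"Min Size": [30, 41, 40],
                     "Max Size": [40, 60, 50],
                     "Category": ["small", "big", "medium"],
                     "Product": ["Apple", "Apple", "Peach"]})
sold = pd.DataFrame({"Purchase_date": ["20/01/2020", "18/02/2020", "01/06/2020"],
                     "Size": [35, 45, 42],
                     "Category": ["small", "big", "medium"],
                     "Product": ["Apple", "Peach", "Apple"]})

# Exact-match join on Product; reset_index() keeps each sold row's position
# in an "index" column so matches can be mapped back afterwards
candidates = sold.reset_index().merge(
    size, on="Product", how="left", suffixes=("", "_size")
)

# Keep only pairs whose Size lies inside [Min Size, Max Size] (inclusive)
in_range = candidates["Size"].between(candidates["Min Size"], candidates["Max Size"])
matches = candidates[in_range]

# If several ranges match the same sold row, keep the first one
first_match = matches.drop_duplicates(subset="index").set_index("index")["Category_size"]

# Sold rows with no matching range end up as NaN
sold["Category_from_size"] = first_match.reindex(sold.index)
print(sold)
```

The intermediate frame holds every same-product pair, so this is simple but can grow large when a product has many size ranges.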
Instead, you can use numpy broadcasting to compare every row in sold with every row in size and filter for the matches:
import numpy as np

# Convert everything to numpy for the comparison.
# [:, None] turns each sold column into a column vector of shape (len(sold), 1)
sold_product = sold["Product"].to_numpy()[:, None]
sold_size = sold["Size"].to_numpy()[:, None]
product, min_size, max_size = size[["Product", "Min Size", "Max Size"]].T.to_numpy()

# Compare every row in `sold` to every row in `size`.
# `mask` is a len(sold) x len(size) boolean matrix whose value at (i, j)
# indicates whether row i in `sold` matches row j in `size`
mask = (sold_product == product) & (min_size <= sold_size) & (sold_size <= max_size)

# Get the (row in `sold`, row in `size`) position of every True entry.
# Note: this returns all matches, not only the first one per sold row
idx, join_key = mask.nonzero()
sold.loc[idx, "join_key"] = join_key  # assumes sold has the default RangeIndex

# Result: the matched category from `size` next to the original one
sold.merge(
    size[["Category"]],
    how="left",
    left_on="join_key",
    right_index=True,
    suffixes=("_Expected", "_Actual"),
)
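mask.nonzero() returns every matching (sold row, size row) pair rather than only the first match per sold row. If you want exactly one category per sold row, with a defined rule when ranges overlap, a variant of the same broadcasting idea can pick the first match with argmax and mark rows that match nothing. This is a sketch assuming the question's frames; Category_from_size is a hypothetical column name.

```python
import numpy as np
import pandas as pd

size = pd.DataFrame({"Min Size": [30, 41, 40],
                     "Max Size": [40, 60, 50],
                     "Category": ["small", "big", "medium"],
                     "Product": ["Apple", "Apple", "Peach"]})
sold = pd.DataFrame({"Purchase_date": ["20/01/2020", "18/02/2020", "01/06/2020"],
                     "Size": [35, 45, 42],
                     "Category": ["small", "big", "medium"],
                     "Product": ["Apple", "Peach", "Apple"]})

# Column vectors of shape (len(sold), 1) so they broadcast against size's rows
sold_product = sold["Product"].to_numpy()[:, None]
sold_size = sold["Size"].to_numpy()[:, None]

# Same len(sold) x len(size) boolean matrix as in the answer
mask = (
    (sold_product == size["Product"].to_numpy())
    & (size["Min Size"].to_numpy() <= sold_size)
    & (sold_size <= size["Max Size"].to_numpy())
)

# argmax on a boolean row gives the position of the first True,
# but also returns 0 for a row with no True at all, so check any() separately
has_match = mask.any(axis=1)
first = mask.argmax(axis=1)

categories = size["Category"].to_numpy()
sold["Category_from_size"] = np.where(has_match, categories[first], None)
print(sold)
```

This avoids the extra join_key column and merge, and keeps the "first matching range wins" rule explicit.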