Searching a value within range between columns in pandas (not date columns and no sql)

Question

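Thanks in advance for your help. I have two data frames, shown below. I need to create the Category column in the sold frame based on the information in the size frame. It should check whether the product's Size falls within the Min Size and Max Size range for that product and return the matching group. Is it possible to do this in pandas? Not SQL. I think the merge and join methods will not work here.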

import pandas as pd

size=pd.DataFrame({"Min Size":[30,41,40],
                   "Max Size":[40, 60, 50],
                   "Category":['small', 'big', "medium"],
                   "Product":['Apple', 'Apple', "Peach"]})
sold=pd.DataFrame({"Purchase_date":["20/01/2020", "18/02/2020", "01/06/2020"],
                   "Size":[35, 45, 42],
                   "Category":["small","big","medium"],
                   "Product":['Apple', 'Peach', "Apple"]})

Answer 1

Join conditions in pandas must be exact matches. It has no BETWEEN ... AND ... clause like SQL does.

You can use numpy broadcasting to compare every row in sold against every row in size and filter the matches:

# Converting everything to numpy for comparison
sold_product = sold["Product"].to_numpy()[:, None]
sold_size = sold["Size"].to_numpy()[:, None]

product, min_size, max_size = size[["Product", "Min Size", "Max Size"]].T.to_numpy()

# Compare every row in `sold` to every row in `size`.
# `mask` is a len(sold) x len(size) boolean matrix whose values
# indicate whether row i in `sold` matches row j in `size`
mask = (sold_product == product) & (min_size <= sold_size) & (sold_size <= max_size)

# `mask.nonzero()` returns the (row in `sold`, row in `size`) index pairs
# where `mask` is True; store the matching `size` row as a join key
idx, join_key = mask.nonzero()
sold.loc[idx, "join_key"] = join_key

# Result
sold.merge(
    size[["Category"]],
    how="left",
    left_on="join_key",
    right_index=True,
    suffixes=("_Expected", "_Actual"),
)
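
For comparison, here is a minimal sketch of a different approach, not the broadcasting method above: do an ordinary equality merge on Product first, then keep only the pairs whose Size lies between Min Size and Max Size using `Series.between`. It assumes the Category column in sold is dropped and rebuilt from size; the names `candidates`, `in_range` and `matched` are illustrative only. The intermediate frame holds one row per sale and per size band of the same product, so it may be smaller than the full len(sold) x len(size) mask when there are many products.

```python
import pandas as pd

size = pd.DataFrame({
    "Min Size": [30, 41, 40],
    "Max Size": [40, 60, 50],
    "Category": ["small", "big", "medium"],
    "Product": ["Apple", "Apple", "Peach"],
})
# Assumption: `sold` starts without a Category column; it is derived below.
sold = pd.DataFrame({
    "Purchase_date": ["20/01/2020", "18/02/2020", "01/06/2020"],
    "Size": [35, 45, 42],
    "Product": ["Apple", "Peach", "Apple"],
})

# Pair each sale with every size band of the same product (equality join).
# reset_index() keeps the original row label of `sold` in an "index" column.
candidates = sold.reset_index().merge(size, on="Product", how="left")

# Keep only the pairs whose Size falls inside [Min Size, Max Size] (inclusive).
in_range = candidates["Size"].between(candidates["Min Size"], candidates["Max Size"])
matched = candidates.loc[in_range, ["index", "Category"]].drop_duplicates("index")

# Map the matched category back onto `sold`; rows with no match become NaN.
sold["Category"] = sold.index.map(matched.set_index("index")["Category"])
print(sold)
```

With the sample data, each sale matches exactly one band. If bands for a product overlap, `drop_duplicates("index")` keeps the first matching band in the order of size.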
