Searching in a range between columns using sqlite3 in pandas
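I found a solution to my problem in one question.
I tried to adapt it to my case, but it did not work. In the code below,
I need df to show the start and end of the period, plus the category, for each product sold.
But it ignores dates that fall between the start and the end of a period.
For example, the Apple sale on 01/06/2020 falls within the period 26/03/2020 - 31/07/2020, but the result shows a different period.
How should I correct the SQL query?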
import pandas as pd
import sqlite3
dates_of_discount=pd.DataFrame({"Date_begining":['01/01/2021','01/02/2020','26/03/2020'],
"Date_ending":['31/12/2021', '25/02/2020', '31/07/2020'],
"Category":['Discount', 'Not Discount', "Discount"],
"d_Product":['Apple', 'Peach', "Apple"]})
purchase_dates=pd.DataFrame({"date":(["20/01/2020", "18/02/2020", "01/06/2020"]),
"Qty":[100, 200, 300],
"Price":[3.5,4, 20],
"p_Product":['Apple', 'Peach', "Apple"]})
conn = sqlite3.connect(':memory:')
dates_of_discount.to_sql('disc', conn, index=False)
purchase_dates.to_sql('purch', conn, index=False)
qry = '''
select
purch.date Sold,
purch.p_Product Prod,
purch.Qty,
purch.Price,
Date_begining Period_Start,
Date_ending Period_End,
Category Output
from
purch join disc on
date between Date_begining and Date_ending and
d_Product = p_Product
'''
df = pd.read_sql_query(qry, conn)
df
As noted in the comments, once the date strings are converted to actual datetimes in pandas with pd.to_datetime, the SQL join should return the expected result:
Input Data (with date conversion)
dates_of_discount = pd.DataFrame({
"Date_begining": pd.to_datetime(
['01/01/2021','01/02/2020','26/03/2020'],
format="%d/%m/%Y"
),
"Date_ending": pd.to_datetime(
['31/12/2021', '25/02/2020', '31/07/2020'],
format="%d/%m/%Y"
),
"Category": ['Discount', 'Not Discount', "Discount"],
"d_Product": ['Apple', 'Peach', "Apple"]
})
purchase_dates=pd.DataFrame({
"date": pd.to_datetime(
["20/01/2020", "18/02/2020", "01/06/2020"],
format="%d/%m/%Y"
),
"Qty":[100, 200, 300],
"Price":[3.5,4, 20],
"p_Product":['Apple', 'Peach', "Apple"]
})
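To see why the conversion matters: the original columns held plain strings, which end up as text in SQLite, and text is compared character by character. A minimal sketch of that effect in plain Python, using values from the question (the `parse` helper is only for illustration):

```python
from datetime import datetime

sold, start, end = "01/06/2020", "26/03/2020", "31/07/2020"

# As text, "01/..." sorts before "26/...", so the purchase appears to
# fall outside the period even though 1 June lies inside it.
print(start <= sold <= end)  # False


def parse(text):
    # Day/month/year, the same format string passed to pd.to_datetime above.
    return datetime.strptime(text, "%d/%m/%Y")


# Compared as real dates, the purchase is inside the period.
print(parse(start) <= parse(sold) <= parse(end))  # True

# The same text comparison wrongly puts a January 2020 sale in the 2021 period.
print("01/01/2021" <= "20/01/2020" <= "31/12/2021")  # True
```

This is why the original query matched the Apple purchases against the wrong period.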
SQLite Query
conn = sqlite3.connect(':memory:')
dates_of_discount.to_sql('disc', conn, index=False)
purchase_dates.to_sql('purch', conn, index=False)
qry = '''
select
purch.date as Sold,
purch.p_Product as Prod,
purch.Qty,
purch.Price,
disc.Date_begining as Period_Start,
disc.Date_ending as Period_End,
disc.Category as Output
from purch
join disc
on purch.date between disc.Date_begining and disc.Date_ending
and purch.p_Product = disc.d_Product
'''
merge_df = pd.read_sql_query(qry, conn)
merge_df
# Sold Prod Qty Price Period_Start Period_End Output
# 0 2020-02-18 00:00:00 Peach 200 4.0 2020-02-01 00:00:00 2020-02-25 00:00:00 Not Discount
# 1 2020-06-01 00:00:00 Apple 300 20.0 2020-03-26 00:00:00 2020-07-31 00:00:00 Discount
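If converting in pandas is not an option, one possible alternative is to rearrange the DD/MM/YYYY text into YYYY-MM-DD order inside the query with SQLite's substr, since text in that order sorts chronologically. This is only a sketch under that assumption: the `iso` helper is hypothetical, and the tables are rebuilt here with plain sqlite3 and a reduced set of columns so the block runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table purch (date text, p_Product text)")
conn.execute(
    "create table disc (Date_begining text, Date_ending text, "
    "Category text, d_Product text)"
)
conn.executemany(
    "insert into purch values (?, ?)",
    [("20/01/2020", "Apple"), ("18/02/2020", "Peach"), ("01/06/2020", "Apple")],
)
conn.executemany(
    "insert into disc values (?, ?, ?, ?)",
    [
        ("01/01/2021", "31/12/2021", "Discount", "Apple"),
        ("01/02/2020", "25/02/2020", "Not Discount", "Peach"),
        ("26/03/2020", "31/07/2020", "Discount", "Apple"),
    ],
)


def iso(col):
    # Hypothetical helper: SQL expression that turns DD/MM/YYYY text
    # into YYYY-MM-DD text (substr positions are 1-based).
    return (
        f"(substr({col}, 7, 4) || '-' || substr({col}, 4, 2)"
        f" || '-' || substr({col}, 1, 2))"
    )


qry = f"""
select purch.date, purch.p_Product,
       disc.Date_begining, disc.Date_ending, disc.Category
from purch
join disc
  on {iso('purch.date')} between {iso('disc.Date_begining')} and {iso('disc.Date_ending')}
 and purch.p_Product = disc.d_Product
order by {iso('purch.date')}
"""
rows = conn.execute(qry).fetchall()
for row in rows:
    print(row)
```

Converting once in pandas, as above, is usually simpler; the substr form mainly helps when the table already exists with text dates.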
By the way, pandas can run a similar operation: merge by product, then query to filter by date (reindex and set_axis subset and rename the columns):
merge_df_pd = (
    purchase_dates.merge(
        # pair each purchase with every period of the same product
        dates_of_discount, left_on="p_Product", right_on="d_Product"
    ).query(
        # keep only purchases dated inside the period (inclusive)
        "date >= Date_begining & date <= Date_ending"
    ).reset_index(drop=True)
    .reindex(
        ["date", "p_Product", "Qty", "Price", "Date_begining", "Date_ending", "Category"],
        axis="columns"
    ).set_axis(
        ["Sold", "Prod", "Qty", "Price", "Period_Start", "Period_End", "Output"],
        axis="columns"
    )
)
merge_df_pd
# Sold Prod Qty Price Period_Start Period_End Output
# 0 2020-06-01 Apple 300 20.0 2020-03-26 2020-07-31 Discount
# 1 2020-02-18 Peach 200 4.0 2020-02-01 2020-02-25 Not Discount
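Note that both the SQL and the pandas versions above are inner joins, so the Apple purchase on 20/01/2020, which falls in no period, drops out of the result. If every sale should appear, a LEFT JOIN in SQLite is one way to keep it. A rough sketch, assuming the dates are stored as ISO text and using a reduced set of columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table purch (date text, p_Product text, Qty integer);
create table disc (Date_begining text, Date_ending text,
                   Category text, d_Product text);
insert into purch values
  ('2020-01-20', 'Apple', 100),
  ('2020-02-18', 'Peach', 200),
  ('2020-06-01', 'Apple', 300);
insert into disc values
  ('2021-01-01', '2021-12-31', 'Discount', 'Apple'),
  ('2020-02-01', '2020-02-25', 'Not Discount', 'Peach'),
  ('2020-03-26', '2020-07-31', 'Discount', 'Apple');
""")

# LEFT JOIN keeps every row of purch; purchases with no matching
# period get NULL (None in Python) in the disc columns.
rows = conn.execute("""
select purch.date as Sold, purch.p_Product as Prod, purch.Qty,
       disc.Date_begining as Period_Start,
       disc.Date_ending as Period_End,
       disc.Category as Output
from purch
left join disc
  on purch.date between disc.Date_begining and disc.Date_ending
 and purch.p_Product = disc.d_Product
order by purch.date
""").fetchall()
for row in rows:
    print(row)
```

The range condition has to stay in the ON clause here; moving it to WHERE would filter the unmatched purchase out again.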
Finally, per your comment, the same logic should work in either SQL or pandas if you use numbers such as product size instead of dates:
select
purch.date as Sold,
purch.p_Product as Prod,
purch.Qty,
purch.Price,
disc.min_product_size,
disc.max_product_size,
disc.Category as Output
from purch
join disc
on purch.product_size between disc.min_product_size and disc.max_product_size
and purch.p_Product = disc.d_Product
# pandas equivalent
merge_df = (
    purchase_dates.merge(
        dates_of_discount, left_on="p_Product", right_on="d_Product"
    ).query(
        "product_size >= min_product_size & product_size <= max_product_size"
    ).reset_index(drop=True)
    .reindex(
        ["date", "p_Product", "Qty", "Price", "min_product_size", "max_product_size", "Category"],
        axis="columns"
    ).set_axis(
        ["Sold", "Prod", "Qty", "Price", "min_product_size", "max_product_size", "Output"],
        axis="columns"
    )
)
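The size-based variant can be tried end to end with a small made-up dataset. Everything below is assumed for illustration: the sizes, the 'Small' and 'Large' categories, and the reduced column set are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table purch (p_Product text, product_size real);
create table disc (d_Product text, min_product_size real,
                   max_product_size real, Category text);
insert into purch values ('Apple', 10.0), ('Apple', 12.0), ('Peach', 5.0);
insert into disc values
  ('Apple', 5.0, 10.0, 'Small'),
  ('Apple', 10.5, 15.0, 'Large'),
  ('Peach', 6.0, 9.0, 'Small');
""")

# BETWEEN is inclusive on both ends, so size 10.0 matches the 5.0-10.0 band.
rows = conn.execute("""
select purch.p_Product as Prod, purch.product_size,
       disc.min_product_size, disc.max_product_size,
       disc.Category as Output
from purch
join disc
  on purch.product_size between disc.min_product_size and disc.max_product_size
 and purch.p_Product = disc.d_Product
order by purch.p_Product, purch.product_size
""").fetchall()
for row in rows:
    print(row)
```

Numbers need no conversion step, because numeric columns already compare by value; the Peach row is absent because 5.0 falls outside its only band.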