SQLAlchemy - 日期范围差距标识,说明重叠的范围
SQLAlchemy - Date range Gap Identifications, account for ranges that overlap
我正在尝试对 table 执行自连接,并查找与集合(州、office_type、office_class、地区)匹配的所有行,在为了识别集合没有数据的日期范围。
我当前的查询:
term_alias = aliased(schema.Term, name='term_alias')
query = Session.query(schema.Term, term_alias).filter(schema.Term.office_type_id == term_alias.office_type_id).\
filter(schema.Term.state_id == term_alias.state_id).\
filter(schema.Term.office_class == term_alias.office_class).\
filter(schema.Term.term_end < term_alias.term_begin).\
filter(or_(schema.Term.district_id == term_alias.district_id,
schema.Term.district_id == None)).\
group_by(schema.Term.term_end).\
group_by(schema.Term.state_id).\
group_by(schema.Term.office_class).\
group_by(schema.Term.office_type_id).\
having(schema.Term.term_end < func.min(term_alias.term_begin)).\
having((term_alias.term_begin - schema.Term.term_end) > 1)
我得到的结果并不完全有效。
突出显示的行,不应该通过。
如您所见,这实际上并不是差距。第一项和第二项 重叠 因此不应作为间隙包含在内
我已经尝试了上述查询的一些变体,但我没有得到正确的结果。我的问题是,如何解释重叠数据?并确保只报告真正的差距,
RAW SQL 由 print(query)
打印
SELECT terms.id AS terms_id, terms.term_begin AS terms_term_begin, terms.term_en
d AS terms_term_end, terms.term_served AS terms_term_served, terms.office_type_i
d AS terms_office_type_id, terms.person_id AS terms_person_id, terms.state_id AS
terms_state_id, terms.district_id AS terms_district_id, terms.removal_reason_id
AS terms_removal_reason_id, terms.political_party_id AS terms_political_party_i
d, terms.is_elected AS terms_is_elected, terms.is_holdover AS terms_is_holdover,
terms.neat_race_id AS terms_neat_race_id, terms.office_class AS terms_office_cl
ass, terms.notes AS terms_notes, terms.is_vacant AS terms_is_vacant, term_alias.
id AS term_alias_id, term_alias.term_begin AS term_alias_term_begin, term_alias.
term_end AS term_alias_term_end, term_alias.term_served AS term_alias_term_serve
d, term_alias.office_type_id AS term_alias_office_type_id, term_alias.person_id
AS term_alias_person_id, term_alias.state_id AS term_alias_state_id, term_alias.
district_id AS term_alias_district_id, term_alias.removal_reason_id AS term_alia
s_removal_reason_id, term_alias.political_party_id AS term_alias_political_party
_id, term_alias.is_elected AS term_alias_is_elected, term_alias.is_holdover AS t
erm_alias_is_holdover, term_alias.neat_race_id AS term_alias_neat_race_id, term_
alias.office_class AS term_alias_office_class, term_alias.notes AS term_alias_no
tes, term_alias.is_vacant AS term_alias_is_vacant
FROM terms, terms AS term_alias
WHERE terms.office_type_id = term_alias.office_type_id AND terms.state_id = term
_alias.state_id AND terms.office_class = term_alias.office_class AND terms.term_
end < term_alias.term_begin AND (terms.district_id = term_alias.district_id OR t
erms.district_id IS NULL) GROUP BY terms.term_end, terms.state_id, terms.office_
class, terms.office_type_id
HAVING terms.term_end < min(term_alias.term_begin) AND term_alias.term_begin - t
erms.term_end > :param_1
SQLFiddle 的当前状态:
http://sqlfiddle.com/#!9/3e030/1
从简单比较可以看出,您的查询无法处理重叠。
查看您的数据,我将使用以下步骤执行搜索:
- 查找差距
begins
:通过查找 Term.term_end
后一天未被任何其他 覆盖 的所有此类 Term
12=](正确的office_type/office_class/state/district)
- 对于每个找到的间隙
begin
通过找到在间隙 begin
之后开始的最早 Term
实例来找到间隙结束。
我得出以下实现:
TE = aliased(Term, name='TE') # gap period start (end of last before gap)
TO = aliased(Term, name='TO') # other term for check of TE
TS = aliased(Term, name='TS') # start of the next period after gap
# condition for the existance of TE: no other periods overlaping with the next
# day after TE.term_end
# define an alias for the `TE.term_end + 1 day` (engine specific)
# gap_sdate = TE.term_end
gap_sdate = func.date(TE.term_end, text("'+1 days'")) # sqlite version
# gap_sdate = func.ADDDATE(TE.term_end, 1)) # mysql version
# subquery which checks if there are Terms covering next day after term_end
subq = (
session
.query(TO.id)
.filter(and_(
TE.office_type_id == TO.office_type_id,
TE.state_id == TO.state_id,
or_(TE.district_id == TO.district_id,
and_(TE.district_id == None, TO.district_id == None)
),
TE.office_class == TO.office_class,
))
.filter(TO.id != TE.id)
.filter(TO.term_begin <= gap_sdate)
.filter(or_(TO.term_end == None, TO.term_end >= gap_sdate))
)
# query to find the gaps
q = (
session
.query(
TE.office_type_id,
TE.state_id,
TE.district_id,
TE.office_class,
gap_sdate.label("vacant_from"),
func.date(func.min(TS.term_begin), text("'-1 days'")).label("vacant_till"), # sqlite version
# func.ADDDATE(func.min(TS.term_begin), -1).label("vacant_till"), # mysql version
)
.join(TS, and_(
TE.office_type_id == TS.office_type_id,
TE.state_id == TS.state_id,
or_(TE.district_id == TS.district_id,
and_(TE.district_id == None, TS.district_id == None)
),
TE.office_class == TS.office_class,
))
# filters for gap start
.filter(TE.term_end != None)
.filter(~subq.exists())
# filters for gap end
.filter(TE.id != TS.id)
.filter(TS.term_begin > gap_sdate)
.group_by(TE)
)
gaps = q.all()
for gap in gaps:
print(gap)
我正在尝试对 table 执行自连接,并查找与集合(州、office_type、office_class、地区)匹配的所有行,在为了识别集合没有数据的日期范围。
我当前的查询:
term_alias = aliased(schema.Term, name='term_alias')
query = Session.query(schema.Term, term_alias).filter(schema.Term.office_type_id == term_alias.office_type_id).\
filter(schema.Term.state_id == term_alias.state_id).\
filter(schema.Term.office_class == term_alias.office_class).\
filter(schema.Term.term_end < term_alias.term_begin).\
filter(or_(schema.Term.district_id == term_alias.district_id,
schema.Term.district_id == None)).\
group_by(schema.Term.term_end).\
group_by(schema.Term.state_id).\
group_by(schema.Term.office_class).\
group_by(schema.Term.office_type_id).\
having(schema.Term.term_end < func.min(term_alias.term_begin)).\
having((term_alias.term_begin - schema.Term.term_end) > 1)
我得到的结果并不完全有效。
突出显示的行,不应该通过。
如您所见,这实际上并不是差距。第一项和第二项 重叠 因此不应作为间隙包含在内
我已经尝试了上述查询的一些变体,但我没有得到正确的结果。我的问题是,如何解释重叠数据?并确保只报告真正的差距,
RAW SQL 由 print(query)
SELECT terms.id AS terms_id, terms.term_begin AS terms_term_begin, terms.term_en
d AS terms_term_end, terms.term_served AS terms_term_served, terms.office_type_i
d AS terms_office_type_id, terms.person_id AS terms_person_id, terms.state_id AS
terms_state_id, terms.district_id AS terms_district_id, terms.removal_reason_id
AS terms_removal_reason_id, terms.political_party_id AS terms_political_party_i
d, terms.is_elected AS terms_is_elected, terms.is_holdover AS terms_is_holdover,
terms.neat_race_id AS terms_neat_race_id, terms.office_class AS terms_office_cl
ass, terms.notes AS terms_notes, terms.is_vacant AS terms_is_vacant, term_alias.
id AS term_alias_id, term_alias.term_begin AS term_alias_term_begin, term_alias.
term_end AS term_alias_term_end, term_alias.term_served AS term_alias_term_serve
d, term_alias.office_type_id AS term_alias_office_type_id, term_alias.person_id
AS term_alias_person_id, term_alias.state_id AS term_alias_state_id, term_alias.
district_id AS term_alias_district_id, term_alias.removal_reason_id AS term_alia
s_removal_reason_id, term_alias.political_party_id AS term_alias_political_party
_id, term_alias.is_elected AS term_alias_is_elected, term_alias.is_holdover AS t
erm_alias_is_holdover, term_alias.neat_race_id AS term_alias_neat_race_id, term_
alias.office_class AS term_alias_office_class, term_alias.notes AS term_alias_no
tes, term_alias.is_vacant AS term_alias_is_vacant
FROM terms, terms AS term_alias
WHERE terms.office_type_id = term_alias.office_type_id AND terms.state_id = term
_alias.state_id AND terms.office_class = term_alias.office_class AND terms.term_
end < term_alias.term_begin AND (terms.district_id = term_alias.district_id OR t
erms.district_id IS NULL) GROUP BY terms.term_end, terms.state_id, terms.office_
class, terms.office_type_id
HAVING terms.term_end < min(term_alias.term_begin) AND term_alias.term_begin - t
erms.term_end > :param_1
SQLFiddle 的当前状态: http://sqlfiddle.com/#!9/3e030/1
从简单比较可以看出,您的查询无法处理重叠。
查看您的数据,我将使用以下步骤执行搜索:
- 查找差距
begins
:通过查找Term.term_end
后一天未被任何其他 覆盖 的所有此类Term
12=](正确的office_type/office_class/state/district) - 对于每个找到的间隙
begin
通过找到在间隙begin
之后开始的最早Term
实例来找到间隙结束。
我得出以下实现:
TE = aliased(Term, name='TE') # gap period start (end of last before gap)
TO = aliased(Term, name='TO') # other term for check of TE
TS = aliased(Term, name='TS') # start of the next period after gap
# condition for the existance of TE: no other periods overlaping with the next
# day after TE.term_end
# define an alias for the `TE.term_end + 1 day` (engine specific)
# gap_sdate = TE.term_end
gap_sdate = func.date(TE.term_end, text("'+1 days'")) # sqlite version
# gap_sdate = func.ADDDATE(TE.term_end, 1)) # mysql version
# subquery which checks if there are Terms covering next day after term_end
subq = (
session
.query(TO.id)
.filter(and_(
TE.office_type_id == TO.office_type_id,
TE.state_id == TO.state_id,
or_(TE.district_id == TO.district_id,
and_(TE.district_id == None, TO.district_id == None)
),
TE.office_class == TO.office_class,
))
.filter(TO.id != TE.id)
.filter(TO.term_begin <= gap_sdate)
.filter(or_(TO.term_end == None, TO.term_end >= gap_sdate))
)
# query to find the gaps
q = (
session
.query(
TE.office_type_id,
TE.state_id,
TE.district_id,
TE.office_class,
gap_sdate.label("vacant_from"),
func.date(func.min(TS.term_begin), text("'-1 days'")).label("vacant_till"), # sqlite version
# func.ADDDATE(func.min(TS.term_begin), -1).label("vacant_till"), # mysql version
)
.join(TS, and_(
TE.office_type_id == TS.office_type_id,
TE.state_id == TS.state_id,
or_(TE.district_id == TS.district_id,
and_(TE.district_id == None, TS.district_id == None)
),
TE.office_class == TS.office_class,
))
# filters for gap start
.filter(TE.term_end != None)
.filter(~subq.exists())
# filters for gap end
.filter(TE.id != TS.id)
.filter(TS.term_begin > gap_sdate)
.group_by(TE)
)
gaps = q.all()
for gap in gaps:
print(gap)