T-SQL Pivot or Estimate Model Replication

I'm trying to recreate in SQL a model that was developed in Excel, but I'm having a hard time because the model depends on the position of certain records.

It is used to estimate competitors' sales, in both units and dollar amounts.

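The only data available is our clients' data, so that is what the estimate is built on. In other words, if a record has no data, the model takes the data from the next closest record that does, and fills everything in between with the same value.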

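I'm trying to recreate this in SQL because I developed SSIS packages that pump the data into SQL Server, and from there I create views to automate reporting and push it to Tableau or Power BI. The current Excel workflow is a very manual process with a lot of room for error.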

As a short-term fix, I believe a specific pivot would help me recreate the model's logic. This is what I'm trying to pivot:

Parent ASIN Rank  Bounds  Groups
1                 1       1
5                 0       1
7                 1       2
10                0       2
12                1       3
14                0       3

However, I need a table that looks like this:

Groups  Lower_Bound  Upper_Bound
1       1            5
2       5            7
3       7            10
4       10           12
5       12           14
6       14           18
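A minimal sketch of one way to get from the first layout to the second, assuming each boundary rank is simply paired with the next boundary rank in sort order. The table variable `@Boundaries` and its rows are hypothetical stand-ins for the boundary ranks shown above. The final upper bound (18 in the desired output) is not among those sample ranks, so the last row returns NULL for Upper_Bound here.

```sql
-- Hypothetical sample: the boundary ranks from the first layout above.
DECLARE @Boundaries TABLE ([Parent ASIN Rank] int PRIMARY KEY);
INSERT INTO @Boundaries VALUES (1), (5), (7), (10), (12), (14);

-- Pair each boundary with the next boundary in rank order.
SELECT
    ROW_NUMBER() OVER (ORDER BY [Parent ASIN Rank])             AS Groups,
    [Parent ASIN Rank]                                          AS Lower_Bound,
    LEAD([Parent ASIN Rank]) OVER (ORDER BY [Parent ASIN Rank]) AS Upper_Bound
FROM @Boundaries
ORDER BY Lower_Bound;
```

This avoids PIVOT entirely: the odd/even Bounds flag is no longer needed, because LEAD already sees the following boundary.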

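I've tried using LAG and LEAD (e.g. select *, LAG([RB Units],1,0) over (order by [Parent ASIN Rank]) as test1, LEAD([RB Units],1,0) over (order by [Parent ASIN Rank]) as test2,), but the problem is that large numbers of items with no data are clustered together, so within each cluster only the first and last items can be assigned the sales (or units) data from the previous and/or next record.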

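For example, if row 10 has no sales data but rows 11 and 12 do, LAG and LEAD work. On the other hand, if rows 18, 19, 20, 21 and 22 have no data, the LAG and LEAD approach only works for rows 18 and 22. I'm not sure whether I can somehow partition LAG and LEAD so that the data from the next closest record is copied all the way across those large groups with no data.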
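One common pattern for carrying a value across a whole run of NULLs, sketched here under assumptions: `COUNT(column)` skips NULLs, so a running count only advances on rows that have data, and every blank row ends up in the same group as the nearest populated row before it (or after it, when counting in descending order). The table variable `@Ranked` and its rows are hypothetical; only `[Parent ASIN Rank]` and `[RB Units]` come from the scripts in this post. It requires window-frame support (SQL Server 2012 or later).

```sql
-- Hypothetical sample: data at ranks 1, 5 and 7, blanks in between.
DECLARE @Ranked TABLE ([Parent ASIN Rank] int PRIMARY KEY, [RB Units] int NULL);
INSERT INTO @Ranked VALUES
    (1, 500), (2, NULL), (3, NULL), (4, NULL), (5, 300), (6, NULL), (7, 100);

WITH grouped AS (
    SELECT
        [Parent ASIN Rank],
        [RB Units],
        -- Running count of non-NULL rows, top-down: groups blanks with the row above.
        COUNT([RB Units]) OVER (ORDER BY [Parent ASIN Rank]
                                ROWS UNBOUNDED PRECEDING)      AS PrevGrp,
        -- Same count, bottom-up: groups blanks with the row below.
        COUNT([RB Units]) OVER (ORDER BY [Parent ASIN Rank] DESC
                                ROWS UNBOUNDED PRECEDING)      AS NextGrp
    FROM @Ranked
)
SELECT
    [Parent ASIN Rank],
    [RB Units],
    -- Each group holds exactly one non-NULL value, so MAX just picks it up.
    MAX([RB Units]) OVER (PARTITION BY PrevGrp) AS PrevUnits,
    MAX([RB Units]) OVER (PARTITION BY NextGrp) AS NextUnits
FROM grouped
ORDER BY [Parent ASIN Rank];
```

With this sample, ranks 2 to 4 all get PrevUnits = 500 and NextUnits = 300, and rank 6 gets 300 and 100. Blank rows that come before the first populated row have nothing above them, so their PrevUnits stays NULL. The same two groups can carry the neighbouring rank as well, for instance with `MAX(CASE WHEN [RB Units] IS NOT NULL THEN [Parent ASIN Rank] END)`.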

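Here is an example of the model I'm trying to replicate:

Previous Units Value column formula: =IF(E10="",G9,E10)

Next Units Value column formula: =IF(E10="",H11,E10)

Previous Sales Rank Value column formula: =IF(E10="",I9,C10)

Next Sales Rank Value column formula: =IF(E10="",J11,C10)

Est. Total Units column formula: =IF(E10="",(G10-((G10-H10)/(J10-I10))*(C10-I10)),E10)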
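Read together, the formulas amount to a linear interpolation between the nearest populated rows, with rank as the x-axis. The sketch below restates the last formula in T-SQL, assuming the Excel columns map as the formulas suggest: E is units, C is rank, G and H are previous and next units, I and J are previous and next rank. The column names `PrevUnits`, `NextUnits`, `PrevRank` and `NextRank`, and the sample rows, are hypothetical.

```sql
-- Hypothetical rows: blanks at ranks 2-4 between rank 1 (500 units) and rank 5 (300 units).
SELECT
    v.[Parent ASIN Rank],
    v.[RB Units],
    CASE
        WHEN v.[RB Units] IS NULL THEN
            v.PrevUnits
              - (v.PrevUnits - v.NextUnits) * 1.0        -- * 1.0 avoids integer division
                / NULLIF(v.NextRank - v.PrevRank, 0)     -- guard against divide-by-zero
                * (v.[Parent ASIN Rank] - v.PrevRank)
        ELSE v.[RB Units]
    END AS EstTotalUnits
FROM (VALUES
    (1, 500,  500, 500, 1, 1),
    (2, NULL, 500, 300, 1, 5),
    (3, NULL, 500, 300, 1, 5),
    (4, NULL, 500, 300, 1, 5),
    (5, 300,  300, 300, 5, 5)
) AS v ([Parent ASIN Rank], [RB Units], PrevUnits, NextUnits, PrevRank, NextRank);
```

For these rows the estimate steps down by 50 units per rank: 450, 400 and 350 for ranks 2, 3 and 4. If either neighbour is missing (for example, blanks before the first populated row), the expression returns NULL rather than a number.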

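The pivot table is sorted by Grand Total. The Grand Total is effectively the sales rank, and the Parent ASIN Rank is derived from it; both are based on the average of "Sales Rank: 30 days avg.", which is provided to us.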

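This needs to be dynamic, because the ranks (i.e. the Grand Total column and the Parent ASIN Rank) change whenever the data is refreshed. As it stands, the model breaks if the first row is not a record with data.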

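I'm working with self-referencing CTEs, but I'm starting to question whether replicating a model with this structure in SQL is possible at all, or just extremely difficult. This may not be the best estimation method, but it's the one my company has been using (I'm fairly new and want to help them automate the process).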

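I need to be able to say "this record pulls from X rows above (the nearest record above that is not blank) and Y rows below (the nearest record below that is not blank)."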
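A correlated lookup can express that sentence almost literally. The sketch below uses OUTER APPLY with TOP (1) to find the nearest populated row on each side of every blank row, and reports the offsets X and Y as rank differences. It assumes the ranks are contiguous (as ROW_NUMBER produces), so a rank difference equals a row offset. The table variable `@Ranked` and its rows are hypothetical.

```sql
-- Hypothetical sample: data at ranks 1, 5 and 7, blanks in between.
DECLARE @Ranked TABLE ([Parent ASIN Rank] int PRIMARY KEY, [RB Units] int NULL);
INSERT INTO @Ranked VALUES
    (1, 500), (2, NULL), (3, NULL), (4, NULL), (5, 300), (6, NULL), (7, 100);

SELECT
    r.[Parent ASIN Rank],
    r.[Parent ASIN Rank] - p.[Parent ASIN Rank] AS RowsAbove,  -- X
    n.[Parent ASIN Rank] - r.[Parent ASIN Rank] AS RowsBelow,  -- Y
    p.[RB Units]                                AS PrevUnits,
    n.[RB Units]                                AS NextUnits
FROM @Ranked AS r
OUTER APPLY (
    -- Nearest populated row above the current one.
    SELECT TOP (1) x.[Parent ASIN Rank], x.[RB Units]
    FROM @Ranked AS x
    WHERE x.[RB Units] IS NOT NULL
      AND x.[Parent ASIN Rank] < r.[Parent ASIN Rank]
    ORDER BY x.[Parent ASIN Rank] DESC
) AS p
OUTER APPLY (
    -- Nearest populated row below the current one.
    SELECT TOP (1) x.[Parent ASIN Rank], x.[RB Units]
    FROM @Ranked AS x
    WHERE x.[RB Units] IS NOT NULL
      AND x.[Parent ASIN Rank] > r.[Parent ASIN Rank]
    ORDER BY x.[Parent ASIN Rank] ASC
) AS n
WHERE r.[RB Units] IS NULL
ORDER BY r.[Parent ASIN Rank];
```

For rank 3 in this sample, the query reports 2 rows above (rank 1, 500 units) and 2 rows below (rank 5, 300 units). OUTER APPLY, unlike CROSS APPLY, keeps blank rows that have no populated neighbour on one side and returns NULL for that side.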

I've tried using ROW_NUMBER() OVER (<order by X, Y, and Z>) to assign IDs in several different ways.

Here is one of my scripts:

--create view vw_PT_Parent_ASIN_Units
--as
with a1 as (
    select
        a.[Adjusted Parent ASIN],
        a.[Fixed Brand],
        AVG([Sales Rank: 30 days avg#]) as Total
    from
        vw_Keepa_IFCN_Phase1 as a
    where
        [Include ASIN in Analysis?] = 1
    group by
        a.[Adjusted Parent ASIN],a.[Fixed Brand]
),
a2 as
(
    select 
        [Parent ASIN], 
        sum([Ordered Units]) as [Ordered Units],
        sum([Ordered Revenue]) as [Ordered Revenue]
    from 
        vw_RB_Sales
    group by 
        [Parent ASIN]
),
a3 as
(
    select
        a1.*,
        a2.[Parent ASIN] as [RB Parent ASIN],
        [RB Units] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Units]
            end,
        [RB Sales] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Revenue]
            end
    from
        a1
    left join
        a2
    on
        a1.[Adjusted Parent ASIN] = a2.[Parent ASIN]
),
Est_Model as
(
--I need to create a list of ASINs that are sorted by the "Total" 
    select 
        [Adjusted Parent ASIN], 
        ROW_NUMBER() over (order by Total, [Adjusted Parent ASIN]) as [Parent ASIN Rank]
    from 
        a3
    where 
        Total is not null
),
a4 as
(
    select 
        a3.*,
        b.[Parent ASIN Rank]
    from 
        a3
    left join 
        Est_Model as b
    on 
        a3.[Adjusted Parent ASIN] = b.[Adjusted Parent ASIN]
),
a5 as
(
    select distinct 
        a4a.[Adjusted Parent ASIN],
        a4a.[RB Parent ASIN],
        a4a.[Parent ASIN Rank], 
        Bounds=(ROW_NUMBER() over (order by [Parent ASIN Rank]))%2
    from
        a4 as a4a
    where
        a4a.total is not null
    and 
        a4a.[RB Units] is null
)
, test as (
select *,Groups=row_number() over (partition by Bounds order by [parent asin rank]) from a5 --order by [Parent ASIN Rank]
)
select * from test order by [Parent ASIN Rank]


select distinct t1.Groups, t1.[Parent ASIN Rank] as l, t2.[Parent ASIN Rank] as h
from test as t1
inner join test as t2
on t1.[Adjusted Parent ASIN] = t2.[Adjusted Parent ASIN]
where t1.Bounds=1
and t2.bounds =0


select Groups, Bounds, [1] as Lower_Bound, [0] as Upper_Bound
from test
pivot
    (
    sum(Bounds) for [Parent ASIN Rank]
    IN ([1],[0])
    )
as pvt
order by [Parent ASIN Rank]
--select distinct parent asin rank (low),parent asin rank (high),sales,units,total
--but i need to be able to separate the current parent ASIN ranks that you see in the results right now and I should do this by grabbing even and odd numbers
    
    
    /*
    select 
        *,
        ROW_NUMBER() over (order by [Parent ASIN Rank]),
        --DENSE_RANK() over
        (ROW_NUMBER() over (order by [Parent ASIN Rank]))/2,
        (ROW_NUMBER() over (order by [Parent ASIN Rank]))%2
    from
        a4
    where
        total is not null
    and 
        [RB Units] is null
    order by 
        [Parent ASIN Rank]
        */

--pivot so that the ranges are in separate columns and then use that as CTE and say if rank is in between a range then use this units number
--Low | High | Units | Sales
-- 1  |   5  | 2892  |  90186
-- 5  |   7  | 5076  |  121394


--  a4a.*, Test1= case when a4a.[Parent ASIN Rank] <> 1 then row_number() over (partition by a4a.[rb units] order by a4a.[Parent ASIN Rank]) else 0 end
--  , Test2=case when a4a.[rb units] is null then 1 else 0 end
--  , Test3=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]+1 else 0 end
--  , Test4=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]-1 else 0 end


--need to get the range. For example, the blank record falls in-between these two records
--I need to have one column with the low units and one column with the high units, but all in the same row....so there needs to be 2 joins...one to a low table and one to a high table
--This means that for low and high they need to match on the parent ASIN rank and an adjusted parent ASIN rank that matches the ASIN rank of the NULL record
--For example, B01M0ZV2CU has a rank of 5 which means that ranks 4 and 6 need to match that 5


--Need to fix duplicates which probably goes back to Keepa view definitions and joins



--Currently, my method works well for when there is a single row with NULL values, but it does not work for when there are multiple rows with NULL values

Below is another version of the script. This one uses LAG and LEAD, which only works when there is a single NULL record:

--create view vw_PT_Parent_ASIN_Units
--as
with a1 as (
    select
        a.[Adjusted Parent ASIN],
        a.[Fixed Brand],
        AVG([Sales Rank: 30 days avg#]) as Total
    from
        vw_Keepa_IFCN_Phase1 as a
    where
        [Include ASIN in Analysis?] = 1
    group by
        a.[Adjusted Parent ASIN],a.[Fixed Brand]
),
a2 as
(
    select 
        [Parent ASIN], 
        sum([Ordered Units]) as [Ordered Units],
        sum([Ordered Revenue]) as [Ordered Revenue]
    from 
        vw_RB_Sales
    group by 
        [Parent ASIN]
),
a3 as
(
    select
        a1.*,
        a2.[Parent ASIN] as [RB Parent ASIN],
        [RB Units] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Units]
            end,
        [RB Sales] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Revenue]
            end
    from
        a1
    left join
        a2
    on
        a1.[Adjusted Parent ASIN] = a2.[Parent ASIN]
),
Est_Model as
(
--I need to create a list of ASINs that are sorted by the "Total" 
    select 
        [Adjusted Parent ASIN], 
        ROW_NUMBER() over (order by Total, [Adjusted Parent ASIN]) as [Parent ASIN Rank]
    from 
        a3
    where 
        Total is not null
),
a4 as
(
    select 
        a3.*,
        b.[Parent ASIN Rank]
    from 
        a3
    left join 
        Est_Model as b
    on 
        a3.[Adjusted Parent ASIN] = b.[Adjusted Parent ASIN]
)
    select 
        *, 
        LAG([RB Units],1,0) over (order by [Parent ASIN Rank]) as test1,
        LEAD([RB Units],1,0) over (order by [Parent ASIN Rank]) as test2,
        --LAG([RB Units],1,0) over (partition by [RB Units] order by [Parent ASIN Rank]) as test11,
        --LEAD([RB Units],1,0) over (partition by [RB Units] order by [Parent ASIN Rank]) as test22,
        test3=case when [rb units] is null then 1 else 0 end
    from
        a4
    where
        total is not null


--  a4a.*, Test1= case when a4a.[Parent ASIN Rank] <> 1 then row_number() over (partition by a4a.[rb units] order by a4a.[Parent ASIN Rank]) else 0 end
--  , Test2=case when a4a.[rb units] is null then 1 else 0 end
--  , Test3=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]+1 else 0 end
--  , Test4=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]-1 else 0 end


--need to get the range. For example, the blank record falls in-between these two records
--I need to have one column with the low units and one column with the high units, but all in the same row....so there needs to be 2 joins...one to a low table and one to a high table
--This means that for low and high they need to match on the parent ASIN rank and an adjusted parent ASIN rank that matches the ASIN rank of the NULL record
--For example, B01M0ZV2CU has a rank of 5 which means that ranks 4 and 6 need to match that 5


--Need to fix duplicates which probably goes back to Keepa view definitions and joins



--Currently, my method works well for when there is a single row with NULL values, but it does not work for when there are multiple rows with NULL values

Lastly, here is a screenshot of the output of the second script I posted. Hopefully this helps clarify things.

Any help is appreciated! Thanks in advance!

COALESCE turned out to be the answer, because it returns the first non-null value in the list.
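The post does not show how COALESCE was wired into the final query, so the following is only an illustration of the behaviour described: the arguments are checked left to right and the first one that is not NULL is returned. The column names `Actual`, `PrevUnits` and `NextUnits` and the sample values are hypothetical.

```sql
-- Hypothetical columns: a measured value plus two fallback values.
SELECT
    v.Actual,
    v.PrevUnits,
    v.NextUnits,
    COALESCE(v.Actual, v.PrevUnits, v.NextUnits) AS UnitsUsed
FROM (VALUES
    (250,  500,  300),   -- Actual is present            -> 250
    (NULL, 500,  300),   -- falls back to PrevUnits      -> 500
    (NULL, NULL, 300)    -- falls back to NextUnits      -> 300
) AS v (Actual, PrevUnits, NextUnits);
```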