Using arithmetic in SQL on my own columns to fill a third column where it is zero (complicated, only when certain criteria are met)

Question

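So here is my question. Brace yourself, because it takes some thinking just to wrap your head around what I am trying to do. I am working with Quarterly Census of Employment and Wages (QCEW) data. QCEW data has something called suppression codes. If a data denomination (overall, location quotient, or over the year, for each quarter of each year) is suppressed, then all the data for that denomination is zero. My table is set up as follows (showing only the columns relevant to the question):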

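a County_Id column,
an Industry_ID column,
a Year column,
a Qtr column,
a Suppressed column (0 for not suppressed, 1 for suppressed),
a Data_Category column (1 for overall, 2 for lq, 3 for over the year),
a Data_Denomination column (1-8 for the specific data being looked at in that category, e.g. monthly employment, taxable wages, and so on),
and a Value column (zero if the Data_Category is suppressed, because all of its data denomination values will be zero).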

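Now, if the overall data (category 1) for, say, 1990 quarter 1 is suppressed, but for quarter 1 of the next year both overall and over the year (categories 1 and 3) are not suppressed, then we can infer the value of the suppressed data in the first year, since OTY1991q1 = (Overall1991q1 - Overall1990q1). So to find the suppressed data, we just subtract the cat 1 (denom 1-8) values from the cat 3 (denom 1-8) values and use the result to replace the zeroes in the suppressed values from the year before. The math is easy to grasp; the difficulty is that there are millions of rows to check against these criteria. I am trying to write some kind of SQL query that does this for me: check that Overall for a given year and quarter is suppressed, then see whether the next year has unsuppressed values for both overall and oty (some kind of complicated CASE statement, perhaps?). Then, if those criteria are met, perform the arithmetic on the two Data_Cat/Data_Denom categories and replace the zero in the appropriate Cat/Denom value.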

Here is a simple example (with the irrelevant data_cats removed) that I hope helps to show what I am trying to do.

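+----------+------------+------+-----+------------+----------+------------+-------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value |
+----------+------------+------+-----+------------+----------+------------+-------+
|        5 |         10 | 1990 |   1 |          1 |        1 |          1 |     0 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          2 |     0 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          3 |     0 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          1 |     5 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          2 |    15 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          3 |    25 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          1 |    20 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          2 |    20 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          3 |    35 |
+----------+------------+------+-----+------------+----------+------------+-------+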

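So basically, what we would do here is take the 1991 overall value for each data_denom (I removed lq, data_cat 2, because it is not relevant, and for simplicity I have narrowed the 8 denoms down to 3), subtract it from the 1991 over-the-year value, and that gives the applicable cat_1 value for the previous year, 1990. So here data_cat 1 Data_denom 1 would be 15 (20-5), denom 2 would be 5 (20-15), and denom 3 would be 10 (35-25). (Oty 1991q1 - Overall 1991q1) = 1990q1. I hope this helps. Like I said, the problem is not the math; it is formulating a query that will check these criteria millions of times.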

Answer 1

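If you want to find suppressed data that has 2 rows of unsuppressed data for the next year and the same quarter, we could use cross apply() to do something like this: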

Test setup: http://rextester.com/ORNCFR23551
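In case the link is unavailable, a setup along these lines should reproduce the sample data from the question. This is only a sketch: the table name t matches the queries in this answer, but the int column types are assumptions and not necessarily what the linked test uses.

```sql
-- Hypothetical test table; column types are assumed, not taken from the link.
create table t (
    CountyID   int not null
  , IndustryID int not null
  , Year       int not null
  , Qtr        int not null
  , Suppressed int not null  -- 0 = not suppressed, 1 = suppressed
  , Data_Cat   int not null  -- 1 = overall, 3 = over the year
  , Data_Denom int not null
  , Value      int not null
);

-- Sample rows copied from the question.
insert into t (CountyID, IndustryID, Year, Qtr, Suppressed, Data_Cat, Data_Denom, Value) values
    (5, 10, 1990, 1, 1, 1, 1,  0)
  , (5, 10, 1990, 1, 1, 1, 2,  0)
  , (5, 10, 1990, 1, 1, 1, 3,  0)
  , (5, 10, 1991, 1, 0, 1, 1,  5)
  , (5, 10, 1991, 1, 0, 1, 2, 15)
  , (5, 10, 1991, 1, 0, 1, 3, 25)
  , (5, 10, 1991, 1, 0, 3, 1, 20)
  , (5, 10, 1991, 1, 0, 3, 2, 20)
  , (5, 10, 1991, 1, 0, 3, 3, 35);
```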

Using cross apply() to return rows with a valid derived value:

select t.*
  , NewValue = cat3.value - cat1.value
from t
  cross apply (
      select i.value 
      from t as i
      where i.CountyID   = t.CountyID
        and i.IndustryID = t.IndustryID
        and i.Data_Denom = t.Data_Denom 
        and i.Year       = t.Year +1
        and i.Qtr        = t.Qtr
        and i.Suppressed = 0
        and i.Data_Cat   = 1
  ) cat1
  cross apply (
      select i.value 
      from t as i
      where i.CountyID   = t.CountyID
        and i.IndustryID = t.IndustryID
        and i.Data_Denom = t.Data_Denom 
        and i.Year       = t.Year +1
        and i.Qtr        = t.Qtr 
        and i.Suppressed = 0
        and i.Data_Cat   = 3
  ) cat3
where t.Suppressed = 1
  and t.Data_Cat   = 1

returns:

+----------+------------+------+-----+------------+----------+------------+-------+----------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value | NewValue |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
|        5 |         10 | 1990 |   1 |          1 |        1 |          1 |     0 |       15 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          2 |     0 |        5 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          3 |     0 |       10 |
+----------+------------+------+-----+------------+----------+------------+-------+----------+

Using outer apply() to return all rows:

select t.*
  , NewValue = coalesce(nullif(t.value,0),cat3.value - cat1.value,0)
from t
  outer apply (
      select i.value 
      from t as i
      where i.CountyID   = t.CountyID
        and i.IndustryID = t.IndustryID
        and i.Data_Denom = t.Data_Denom 
        and i.Year       = t.Year +1
        and i.Qtr        = t.Qtr
        and i.Suppressed = 0
        and i.Data_Cat   = 1
  ) cat1
  outer apply (
      select i.value 
      from t as i
      where i.CountyID   = t.CountyID
        and i.IndustryID = t.IndustryID
        and i.Data_Denom = t.Data_Denom 
        and i.Year       = t.Year +1
        and i.Qtr        = t.Qtr
        and i.Suppressed = 0
        and i.Data_Cat   = 3
  ) cat3

returns:

+----------+------------+------+-----+------------+----------+------------+-------+----------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value | NewValue |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
|        5 |         10 | 1990 |   1 |          1 |        1 |          1 |     0 |       15 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          2 |     0 |        5 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          3 |     0 |       10 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          1 |     5 |        5 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          2 |    15 |       15 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          3 |    25 |       25 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          1 |    20 |       20 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          2 |    20 |       20 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          3 |    35 |       35 |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
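If the derived values should be written back to the table rather than just selected, the cross apply() query can probably be adapted into an update. The following is a sketch under the same assumptions as the queries in this answer (a table named t, and the cat3 - cat1 arithmetic from the question); it is worth running inside a transaction and checking the affected rows before committing.

```sql
-- Sketch: overwrite the zero in each suppressed overall row with the
-- value derived from next year's unsuppressed rows for the same quarter.
update t
set Value = cat3.Value - cat1.Value
from t
  cross apply (
      -- next year's unsuppressed overall value (Data_Cat 1)
      select i.Value
      from t as i
      where i.CountyID   = t.CountyID
        and i.IndustryID = t.IndustryID
        and i.Data_Denom = t.Data_Denom
        and i.Year       = t.Year + 1
        and i.Qtr        = t.Qtr
        and i.Suppressed = 0
        and i.Data_Cat   = 1
  ) cat1
  cross apply (
      -- next year's unsuppressed over-the-year value (Data_Cat 3)
      select i.Value
      from t as i
      where i.CountyID   = t.CountyID
        and i.IndustryID = t.IndustryID
        and i.Data_Denom = t.Data_Denom
        and i.Year       = t.Year + 1
        and i.Qtr        = t.Qtr
        and i.Suppressed = 0
        and i.Data_Cat   = 3
  ) cat3
where t.Suppressed = 1
  and t.Data_Cat   = 1;
```

Because cross apply() drops rows with no match, only suppressed rows that have both next-year values are touched; everything else keeps its current Value.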

Answer 2

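Update 1 - fixed some column names

Update 2 - improved the aliases in the second query

OK, I think I get it.

If this is the only inference you want to make, then the following may help. (If this is just the first of many inferences you want to make when filling gaps in the data, you may find that a different approach leads to a more efficient solution for doing both/all of the inferences, but I guess you can cross that bridge when you get there...)

While most of the basic logic stays the same, how you tweak it depends on whether you want a query that just supplies the values you would infer (e.g. to drive an UPDATE statement), or whether you want to use this logic inline in a larger query. For performance reasons I suspect the former makes more sense (especially if you can do the update once and then read the resulting dataset many times), so I will start by framing things that way and come back to the other later.

It sounds like you have a single table with all of these columns (I will call it QCEW). In that case, use joins to associate each suppressed overall data point (c_oa in the code below) with the corresponding overall and oty data points from one year later: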

SELECT c_oa.*, n_oa.value - n_oty.value inferred_value
  FROM           QCEW c_oa    --current yr/qtr overall
      inner join QCEW n_oa    --next yr (same qtr) overall
              on c_oa.countyId = n_oa.countyId
             and c_oa.industryId = n_oa.industryId
             and c_oa.year = n_oa.year - 1
             and c_oa.qtr = n_oa.qtr
             and c_oa.data_denom = n_oa.data_denom
      inner join QCEW n_oty   --next yr (same qtr) over-the-year
              on c_oa.countyId = n_oty.countyId
             and c_oa.industryId = n_oty.industryId
             and c_oa.year = n_oty.year - 1
             and c_oa.qtr = n_oty.qtr
             and c_oa.data_denom = n_oty.data_denom
 WHERE c_oa.SUPPRESSED = 1
   AND c_oa.DATA_CAT = 1
   AND n_oa.SUPPRESSED = 0
   AND n_oa.DATA_CAT = 1
   AND n_oty.SUPPRESSED = 0
   AND n_oty.DATA_CAT = 3

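Now, it sounds like the table is large, and we have just joined 3 instances of it; so for this to work well you will need a good physical design (appropriate indexes/stats on the join columns, etc.). That is why I suggest doing a one-time update based on the query above; sure, it may run long, but afterwards you can read the inferred values in no time.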
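As a rough sketch of what that could look like in SQL Server: the index name and column order below are hypothetical and untuned (the right design depends on the existing keys and workload), and the UPDATE simply reuses the joins and the n_oa.value - n_oty.value expression from the SELECT above.

```sql
-- Hypothetical supporting index for the self-joins; name and column
-- order are assumptions, not a tuned recommendation.
CREATE INDEX IX_QCEW_next_year_lookup
    ON QCEW (countyId, industryId, data_denom, qtr, data_cat, suppressed, year)
    INCLUDE (value);

-- One-time update driven by the same joins as the SELECT above.
UPDATE c_oa
   SET value = n_oa.value - n_oty.value
  FROM           QCEW c_oa    --current yr/qtr overall
      inner join QCEW n_oa    --next yr (same qtr) overall
              on c_oa.countyId = n_oa.countyId
             and c_oa.industryId = n_oa.industryId
             and c_oa.year = n_oa.year - 1
             and c_oa.qtr = n_oa.qtr
             and c_oa.data_denom = n_oa.data_denom
      inner join QCEW n_oty   --next yr (same qtr) over-the-year
              on c_oa.countyId = n_oty.countyId
             and c_oa.industryId = n_oty.industryId
             and c_oa.year = n_oty.year - 1
             and c_oa.qtr = n_oty.qtr
             and c_oa.data_denom = n_oty.data_denom
 WHERE c_oa.SUPPRESSED = 1
   AND c_oa.DATA_CAT = 1
   AND n_oa.SUPPRESSED = 0
   AND n_oa.DATA_CAT = 1
   AND n_oty.SUPPRESSED = 0
   AND n_oty.DATA_CAT = 3;
```

The update leaves SUPPRESSED at 1, so inferred rows remain distinguishable from published ones only by their non-zero value; whether to record the inference in a separate flag is a design choice left open here.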

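But if you really want to build this directly into your data queries, you can modify it a bit to show all values, with the inferred values mixed in. We need to switch to outer joins to do that, and I am going to do something slightly odd with the join conditions to make it fit: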

SELECT src.COUNTYID
     , src.INDUSTRYID
     , src.YEAR
     , src.QTR
     , case when (n_oa.value - n_oty.value) is null 
            then src.suppressed 
            else 2
       end as SUPPRESSED_CODE -- 0=NOT SUPPRESSED, 1=SUPPRESSED, 2=INFERRED
     , src.DATA_CAT
     , src.DATA_DENOM
     , coalesce(n_oa.value - n_oty.value, src.value) as VALUE
  FROM          QCEW src     --a source row from which we'll generate a record
      left join QCEW n_oa    --next yr (same qtr) overall (if src is suppressed/overall)
             on src.countyId = n_oa.countyId
            and src.industryId = n_oa.industryId
            and src.year = n_oa.year - 1
            and src.qtr = n_oa.qtr
            and src.data_denom = n_oa.data_denom
            and src.SUPPRESSED = 1 and n_oa.SUPPRESSED = 0
            and src.DATA_CAT = 1 and n_oa.DATA_CAT = 1
      left join QCEW n_oty   --next yr (same qtr) over-the-year (if src is suppressed/overall)
             on src.countyId = n_oty.countyId
            and src.industryId = n_oty.industryId
            and src.year = n_oty.year - 1
            and src.qtr = n_oty.qtr
            and src.data_denom = n_oty.data_denom
            and src.SUPPRESSED = 1 and n_oty.SUPPRESSED = 0
            and src.DATA_CAT = 1 and n_oty.DATA_CAT = 3
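The "slightly odd" part is that the filters on src sit in the ON clauses instead of a WHERE clause. A small contrast, using the same QCEW table and only the n_oa join for brevity, may make the reason clearer; the next_overall alias is just illustrative.

```sql
-- Filters in ON: every src row is returned. Only suppressed overall
-- rows can find a next-year match; all other rows get NULL from n_oa.
SELECT src.year, src.data_cat, src.suppressed, n_oa.value AS next_overall
  FROM          QCEW src
      left join QCEW n_oa
             on src.countyId = n_oa.countyId
            and src.industryId = n_oa.industryId
            and src.year = n_oa.year - 1
            and src.qtr = n_oa.qtr
            and src.data_denom = n_oa.data_denom
            and src.SUPPRESSED = 1 and n_oa.SUPPRESSED = 0
            and src.DATA_CAT = 1 and n_oa.DATA_CAT = 1;

-- Same filters in WHERE: unmatched rows have NULL in n_oa.SUPPRESSED,
-- fail the predicate, and are dropped, so the left join effectively
-- becomes an inner join and the unsuppressed rows disappear.
SELECT src.year, src.data_cat, src.suppressed, n_oa.value AS next_overall
  FROM          QCEW src
      left join QCEW n_oa
             on src.countyId = n_oa.countyId
            and src.industryId = n_oa.industryId
            and src.year = n_oa.year - 1
            and src.qtr = n_oa.qtr
            and src.data_denom = n_oa.data_denom
 WHERE src.SUPPRESSED = 1 and n_oa.SUPPRESSED = 0
   AND src.DATA_CAT = 1 and n_oa.DATA_CAT = 1;
```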

sql-server

math

if-statement

case

parameterized-query