Query and/or function that subsets the data on one field for all types and calcs on other field

Question

I am trying to go from this:

+------+------+------+------+
| fld1 | fld2 | fld3 | etc… |
+------+------+------+------+
| a    |    5 |    1 |      |
| b    |    5 |    0 |      |
| c    |    6 |    0 |      |
| b    |    2 |    5 |      |
| b    |    1 |    6 |      |
| c    |    0 |    6 |      |
| a    |    8 |    9 |      |
+------+------+------+------+

To this:

+--------+--------+-----------+-----+-----+------+
| Factor |  Agg   | CalcDate  | Sum | Avg | etc… |
+--------+--------+-----------+-----+-----+------+
| fld2   | fld1/a | 8/14/2015 |  13 | 6.5 |      |
| fld2   | fld1/b | 8/14/2015 |   8 | 2.7 |      |
| fld2   | fld1/c | 8/14/2015 |   6 | 3   |      |
| fld3   | fld1/a | 8/14/2015 |  10 | 5   |      |
| fld3   | fld1/b | 8/14/2015 |  11 | 3.7 |      |
| fld3   | fld1/c | 8/14/2015 |   6 | 3   |      |
+--------+--------+-----------+-----+-----+------+

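Notes:

Obviously this data is simplified a lot.
I have many fields I need to do this for.
I included simple aggregate calculations here so that someone could help me more easily. The exhaustive list is: NaPct, Mean, Sd, Low, Q1, Median, Q3, High, IQR, Kurt, Skew, Obs, where NaPct = percent that are NULL, Sd = standard deviation, Q1 = quartile 1, Q3 = quartile 3, IQR = interquartile range, Kurt = kurtosis, Skew = skewness, Obs = number of observations that are not NULL.
In reality, in the second table above, the Factor field will be FactorID, Agg will be AggID, and CalcDate will be CalcDateID, but I put the actual values there for illustration. It shouldn't matter for the question/answer though.
Speed is very important, as I have 1305 fields and several aggregates to calculate before the start of the work day.
Answers using only MS Access, SQL, and VBA, please. Sorry, business requirement. That said, an answer in SQL that works in MS Access would be simplest.
Below is code using a custom domain function (DCalcForQueries) and supporting functions I built, which returns one aggregate value for each field and selected aggregate calculation. In other words, not what I want. Maybe the code can be used for what I want, maybe not. It does have the calculations I want though, so hopefully it helps.
The message boxes are just how I debug while alpha testing: they are not needed.
To use the code, put all of it in a VBA module, change the table "tbl_DatedModel_2015_0702_0" to a table of yours in MS Access, change the field "Rk-IU Mkt Cap" to a field in your table, run the TestIt() sub, and you should get results in the Immediate window.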

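Don't worry so much about the calculations. I'll handle those. I just need to know the best way to get from the first table above to the second table above in a way that allows the calculations I want. Thanks!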

Sub TestIt()
Dim x
Set x = GetOrOpenAndGetExcel

Dim rst As DAO.Recordset
Dim sSql As String
Dim q As String
q = VBA.Chr(34)
sSql = "SELECT " & _
            "DCalcForQueries(" & q & "NaPct" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS NaPct ," & _
            "DCalcForQueries(" & q & "Mean" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Mean ," & _
            "DCalcForQueries(" & q & "Sd" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Sd ," & _
            "DCalcForQueries(" & q & "Low" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Low ," & _
            "DCalcForQueries(" & q & "Q1" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Q1 ," & _
            "DCalcForQueries(" & q & "Median" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Median ," & _
            "DCalcForQueries(" & q & "Q3" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Q3 ," & _
            "DCalcForQueries(" & q & "High" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS High ," & _
            "DCalcForQueries(" & q & "IQR" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS IQR ," & _
            "DCalcForQueries(" & q & "Kurt" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Kurt ," & _
            "DCalcForQueries(" & q & "Skew" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Skew ," & _
            "DCalcForQueries(" & q & "Obs" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Obs " & _
            "FROM tbl_DatedModel_2015_0702_0;"
Debug.Print sSql
Set rst = CurrentDb.OpenRecordset(sSql, dbOpenSnapshot)
rst.MoveFirst

Debug.Print rst.RecordCount
Debug.Print rst.Fields("NaPct")
Debug.Print rst.Fields("Mean")
Debug.Print rst.Fields("Sd")
Debug.Print rst.Fields("Low")
Debug.Print rst.Fields("Q1")
Debug.Print rst.Fields("Median")
Debug.Print rst.Fields("Q3")
Debug.Print rst.Fields("High")
Debug.Print rst.Fields("IQR")
Debug.Print rst.Fields("Kurt")
Debug.Print rst.Fields("Skew")
Debug.Print rst.Fields("Obs")


End Sub
Public Function DCalcForQueries(sCalc As String, Optional sTbl As String = "", Optional sMainFld As String = "", Optional sWhereClause As String = "", Optional k As Double) As Variant

Dim dblData() As Double
Dim oxl As Object
On Error Resume Next
Set oxl = GetObject(, "Excel.Application")
If Err.Number <> 0 Then
    MsgBox "Excel object must be opened by the calling sub of DCalcForQueries so it isn't opened over and over, which is very slow"
    GoTo cleanup
End If

Dim x As Integer

Dim aV() As Variant
Dim tmp
Dim lObsCnt As Long
Dim lNaCnt As Long
Dim i As Long
Dim vTmp As Variant
Dim lTtl As Long
Dim bDoCalc As Boolean

aV = a2dvGetSubsetFromQuery(sTbl, sMainFld, sWhereClause, "Numeric")
If aV(0, 0) = "Not Numeric" Then
    MsgBox "Data returned by query was not numeric. Press OK to Stop and debug."
    Stop
End If

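If sCalc = "Percentile" Or sCalc = "Q1" Or sCalc = "Q2" Or sCalc = "Q3" Or sCalc = "Q4" Then
    'TestIt() omits k, so fall back to the quartile's own percentile when k is 0
    If k = 0 And sCalc = "Q1" Then k = 0.25
    If k = 0 And sCalc = "Q2" Then k = 0.5
    If k = 0 And sCalc = "Q3" Then k = 0.75
    DCalcForQueries = oxl.WorksheetFunction.Percentile_Exc(aV, k)
ElseIf sCalc = "Median" Then
    DCalcForQueries = oxl.WorksheetFunction.Median(aV)
ElseIf sCalc = "Kurtosis" Or sCalc = "Kurt" Then
    DCalcForQueries = oxl.WorksheetFunction.Kurt(aV)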
ElseIf sCalc = "Minimum" Or sCalc = "Low" Then
    DCalcForQueries = oxl.WorksheetFunction.Min(aV)
ElseIf sCalc = "Maximum" Or sCalc = "High" Then
    DCalcForQueries = oxl.WorksheetFunction.Max(aV)
ElseIf sCalc = "IQR" Then
    DCalcForQueries = oxl.WorksheetFunction.Quartile_Exc(aV, 3) - oxl.WorksheetFunction.Quartile_Exc(aV, 1)
ElseIf sCalc = "Obs" Then
    lObsCnt = 0
    For Each tmp In aV
        If Not IsNull(tmp) Then
            lObsCnt = lObsCnt + 1
        End If
    Next
    DCalcForQueries = lObsCnt
ElseIf sCalc = "%NA" Or sCalc = "PctNa" Or sCalc = "NaPct" Or sCalc = "%Null" Or sCalc = "PctNull" Then
    lNaCnt = 0
    lTtl = UBound(aV, 2) + 1
    For Each tmp In aV
        If IsNull(tmp) Then
            lNaCnt = lNaCnt + 1
        End If
    Next
    DCalcForQueries = (lNaCnt / lTtl) * 100
ElseIf sCalc = "Skewness" Or sCalc = "Skew" Then
    DCalcForQueries = oxl.WorksheetFunction.Skew(aV)
ElseIf sCalc = "StDev" Or sCalc = "Sd" Then
    DCalcForQueries = oxl.WorksheetFunction.StDev_S(aV)
ElseIf sCalc = "Mean" Then
    DCalcForQueries = oxl.WorksheetFunction.Average(aV)
Else
    MsgBox "sCalc parameter not recognized: " & sCalc
End If

cleanup:


End Function

Function a2dvGetSubsetFromQuery(sTbl As String, sMainFld As String, sWhereClause As String, sTest As String) As Variant()
'sTest can be  "Numeric" or "None" ...will implement more as needed
Dim iFieldType As Integer
Dim rst As DAO.Recordset
Dim db As Database
Set db = CurrentDb
Dim sMainFldFull As String
Dim sSubSetFldFull As String
Dim sSql As String

sMainFldFull = "[" & sMainFld & "]"
sSubSetFldFull = ""
sSql = ""

sSql = "SELECT " & sMainFldFull & " FROM " & sTbl
If Len(sWhereClause) > 0 Then
    sSql = sSql & " WHERE " & sWhereClause
End If

Set rst = db.OpenRecordset(sSql, dbOpenSnapshot)

'make sure the data is the right type

iFieldType = rst(sMainFld).Type

If sTest = "Numeric" Then
    If iFieldType = dbByte Or _
        iFieldType = dbInteger Or _
        iFieldType = dbLong Or _
        iFieldType = dbCurrency Or _
        iFieldType = dbSingle Or _
        iFieldType = dbDouble _
        Then
        rst.MoveLast
        rst.MoveFirst

        a2dvGetSubsetFromQuery = rst.GetRows(rst.RecordCount)

    Else
        Dim aV(0 To 1, 0 To 1) As Variant
        aV(0, 0) = "Not Numeric"
        a2dvGetSubsetFromQuery = aV

    End If
ElseIf sTest = "None" Then
    'don't do any testing
    rst.MoveLast
    rst.MoveFirst

    a2dvGetSubsetFromQuery = rst.GetRows(rst.RecordCount)
Else
    MsgBox "Test type (sTest) can only be 'None' or 'Numeric'. It was: " & sTest
    Stop
End If

cleanup:
rst.Close
Set rst = Nothing
End Function
Public Function GetOrOpenAndGetExcel() As Object
'if excel is open it will return the excel object
'if excel is not open it will open excel and return the excel object
On Error GoTo 0
On Error Resume Next
Set GetOrOpenAndGetExcel = GetObject(, "Excel.Application")

If Err.Number <> 0 Then
    Set GetOrOpenAndGetExcel = CreateObject("Excel.Application")
End If

On Error GoTo 0
End Function

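Edit1: The code I provided above is only there to illustrate what I have tried and the calculations. I'm pretty sure it isn't directly relevant to a good answer, but I'm not 100% sure. If I use the above, it generates one record at a time and I have to add (INSERT INTO) each record one at a time, which is slow. My plan was to build a 2D array of results and add the records in bulk from that array, but I was told that can't be done without looping through the array and adding each record one at a time, which defeats the purpose. I'm pretty sure a solution that either loops through the fld1 types or uses one query with subqueries could do it in one step, and that this is the direction to take. What I have done to optimize so far: I pulled the creation of the Excel object out, so it is only created once, in the TestIt() Sub.

Edit2: I have 1305 fields that need calculating. They are not all in the same table; however, for the purposes of this question, I just need a working answer that can handle several fields at a time. I.e., your answer can assume all the fields are in the same table and, for simplicity, can include only 2 fields, and I can extend it from there. In the code above, I calculate 12 metrics on one field, "Rk-IU Mkt Cap", aggregated on one type, 'Consumer Discretionary' ([GICS Sector] = 'Consumer Discretionary'). What I have is not what I want.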

Answer 1

If you can use plain T-SQL, would something like this work?

1: Create the table and insert some sample data

CREATE TABLE [dbo].[FLD](
    [fld1] [nvarchar](2) NOT NULL,
    [fld2] [int] NULL,
    [fld3] [int] NULL
) ON [PRIMARY]

GO

INSERT FLD VALUES ('a', 5, 9)
INSERT FLD VALUES ('b', 1, 8)
INSERT FLD VALUES ('a', 3, 7)

2: Use a nested UNPIVOT to create the factors

SELECT t.factor,t.val + '/' + t.v  AS Agg, SUM(value) AS [Sum], AVG(value) AS [AVG]
FROM
(
    SELECT * from
    (
        select * from FLD f
        UNPIVOT
        (
            v
            for val in (fld1)
        ) piv
        ) f
    UNPIVOT 
    (
        value
        for factor in (fld2, fld3)
    ) s
) t 
group by t.v, t.factor, t.val

Answer 2

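This is going to be a heavy lift for the Access database engine, and it will only get worse as your data set grows. I'd suggest getting the free SQL Server Express and using Access only as a front-end interface. Then, as you grow, you can move all of your databases to SQL Server... it is a much more powerful database engine. You'll be glad you learned it now.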

SQL Server Express

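If you go this route, you can do all of this entirely in T-SQL with a fully set-based approach. The speed-up will be dramatic. I can't give you all the details here, but in general this is what you need to do. The online documentation and Google can help you with each step:

Install SQL Server Express
Create a database
Migrate your data from the Access tables to your database.
Create a stored procedure to update your aggregate table. (See below)
If you want an Access front end... I'd suggest creating a new ADP (Access Data Project file) and connecting it to your SQL Server database. You will be able to create forms and reports based on your SQL Server tables and run procedures. But you could also just use a standard Access project and use pass-through queries to get data or run procedures.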
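
The pass-through option in the last step can be sketched in VBA with a temporary DAO QueryDef. This is only a sketch under assumptions: the connection string, the database name MyDb, and the procedure name dbo.usp_UpdateAggregates are all hypothetical placeholders, not anything defined in this thread.

```vba
Option Explicit

' Sketch: run a SQL Server stored procedure from Access via a pass-through query.
' Assumptions (hypothetical): server .\SQLEXPRESS, database MyDb,
' procedure dbo.usp_UpdateAggregates.
Public Sub RunAggProcedure()
    Dim qdf As DAO.QueryDef

    ' An empty name creates a temporary QueryDef that is not saved in the database.
    Set qdf = CurrentDb.CreateQueryDef("")

    ' Setting Connect to an ODBC string makes this a pass-through query,
    ' so the SQL text is sent to the server untouched.
    qdf.Connect = "ODBC;Driver={SQL Server};Server=.\SQLEXPRESS;Database=MyDb;Trusted_Connection=Yes;"
    qdf.SQL = "EXEC dbo.usp_UpdateAggregates"

    ' The procedure only updates the aggregate table, so no rows come back.
    qdf.ReturnsRecords = False
    qdf.Execute dbFailOnError

    qdf.Close
    Set qdf = Nothing
End Sub
```

Because the aggregation runs on the server, Access only sends one command instead of pulling the source rows across.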

The procedure that inserts the data into the aggregate table would be easier if you changed your first table structure to look like this:

+------+------+------+
| fld1 |fname | fval |
+------+------+------+
| a    | fld2 |    5 |
| a    | fld2 |    8 |
| b    | fld2 |    5 |
| b    | fld2 |    2 |
| b    | fld2 |    1 |
| c    | fld2 |    6 |
| c    | fld2 |    0 |
| a    | fld3 |    1 |
| a    | fld3 |    9 |
| b    | fld3 |    0 |
| b    | fld3 |    5 |
| b    | fld3 |    6 |
| c    | fld3 |    0 |
| c    | fld3 |    6 |
+------+------+------+

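You may not want to change the underlying data table structure, though; if not, you can create a view as one big union query that outputs the data in this format: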

select fld1,
    'fld2' fname,
    fld2 fval
from OrigDataTable
union all
select fld1,
    'fld3' fname,
    fld3 fval
from OrigDataTable
union all
...etc
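
With many fields, typing that union by hand is impractical, so the query text could be generated instead. Below is a minimal VBA sketch; the function name BuildUnpivotSql and its parameters are hypothetical, and it assumes field names contain no apostrophes or square brackets. It is also an assumption that the engine will accept the resulting statement: a very large UNION may need to be built and run in batches.

```vba
Option Explicit

' Hypothetical helper: returns a UNION ALL query that stacks each field in
' vFlds into (key, fname, fval) rows, mirroring the hand-written union above.
' Square brackets allow field names with spaces or hyphens.
Public Function BuildUnpivotSql(sTbl As String, sKeyFld As String, vFlds As Variant) As String
    Dim i As Long
    Dim sSql As String

    For i = LBound(vFlds) To UBound(vFlds)
        ' Join the per-field SELECTs with UNION ALL (keeps duplicate rows).
        If Len(sSql) > 0 Then sSql = sSql & " UNION ALL "
        sSql = sSql & "SELECT [" & sKeyFld & "], '" & vFlds(i) & "' AS fname, [" & _
               vFlds(i) & "] AS fval FROM [" & sTbl & "]"
    Next i

    BuildUnpivotSql = sSql
End Function
```

For example, Debug.Print BuildUnpivotSql("OrigDataTable", "fld1", Array("fld2", "fld3")) prints a two-branch union that can be saved as a query or view.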

Then your query to insert the aggregate data would look something like this:

insert into AggreateTable
select Fname, 
    fld1,
    CONVERT(date, getdate()) CalcDate,
    SUM(fval) sum,
    AVG(Fval) avg,
    ...etc.
from DataTable
Group by Fname, fld1

Here are some links that help with building the aggregate function expressions:

If you want to try this approach with Access, these might help:

Calculating skewness of a data distribution in SQL in Access without an additional subquery
Median, Mode, Skewness, and Kurtosis in MS Access

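You may be able to get an approach like this working entirely in Access... but I really think it will be too much for Access to handle... if not today, then at some point.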

Answer 3

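What you need is a pivot table.

You have two options:

Migrate to SQL Server

This is the preferred approach; you can then use T-SQL as @Johnv2020 suggested.

To read more about PIVOT and UNPIVOT in SQL Server, click here

Access/Excel pivot table

I'm personally more familiar with Excel pivot tables, but Access seems to have the same concept (see here).

The expected result of your code is basically running several pivot tables with different aggregations (average, sum, ...), which can be done by automating pivot tables with a VBA macro.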

Answer 4

This doesn't seem so hard in MS Access. If I have the logic correct:

select "fld2" as factor, "fld1/"&fld1, #8/14/2015# as calcdate,
       sum(fld2), avg(fld2)
from table
group by fld1
union all
select "fld3" as factor, "fld1/"&fld1, #8/14/2015# as calcdate,
       sum(fld3), avg(fld3)
from table
group by fld1;
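
To extend this pattern beyond two fields and append all rows in one statement, the union could be generated in VBA and wrapped in an INSERT INTO. The following is a sketch, not a tested solution: the names BuildAggSelectSql, AppendAggregates, tbl_Source, tbl_Agg, SumVal, and AvgVal are hypothetical, it uses Date() rather than a fixed date, and it assumes field names contain no apostrophes or square brackets.

```vba
Option Explicit

' Hypothetical helper: builds one grouped SELECT per field, joined by UNION ALL,
' following the same shape as the query above.
Public Function BuildAggSelectSql(sTbl As String, sGroupFld As String, vFlds As Variant) As String
    Dim i As Long
    Dim sSql As String

    For i = LBound(vFlds) To UBound(vFlds)
        If Len(sSql) > 0 Then sSql = sSql & " UNION ALL "
        sSql = sSql & "SELECT '" & vFlds(i) & "' AS Factor, '" & sGroupFld & "/' & [" & sGroupFld & "] AS Agg, " & _
               "Date() AS CalcDate, Sum([" & vFlds(i) & "]) AS SumVal, Avg([" & vFlds(i) & "]) AS AvgVal " & _
               "FROM [" & sTbl & "] GROUP BY [" & sGroupFld & "]"
    Next i

    BuildAggSelectSql = sSql
End Function

' Appends every aggregate row with a single statement instead of one INSERT per row.
' The union is wrapped in a derived table because an append query cannot take
' a UNION directly as its source.
Public Sub AppendAggregates()
    Dim sSql As String

    sSql = "INSERT INTO tbl_Agg (Factor, Agg, CalcDate, SumVal, AvgVal) " & _
           "SELECT * FROM (" & _
           BuildAggSelectSql("tbl_Source", "fld1", Array("fld2", "fld3")) & _
           ") AS u"

    CurrentDb.Execute sSql, dbFailOnError
End Sub
```

Only Sum and Avg are shown because they are built-in SQL aggregates; statistics such as median or kurtosis have no built-in Access SQL aggregate and would need separate handling.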


sql

ms-access

vba

subquery

aggregate-functions
