How to get the top 5 records in Cassandra 2.2

Question

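I need help. I have a query that should return the top 5 records, grouped by date (the date only, not date + time) and ranked by the sum of the amounts.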

I wrote the following, but it returns all the records instead of only the top 5:

CREATE OR REPLACE FUNCTION state_groupbyandsum( state map<text, double>, datetime text, amount text )
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java AS '
    if (datetime == null || amount == null) return state;  // skip rows with missing values
    String date = datetime.substring(0, 10);               // keep only the date part, e.g. yyyy-MM-dd
    Double count = state.get(date);
    state.put(date, (count == null ? 0.0 : count) + Double.parseDouble(amount));
    return state;
';


CREATE OR REPLACE AGGREGATE groupbyandsum(text, text) 
SFUNC state_groupbyandsum
STYPE map<text, double>
INITCOND {};

select groupbyandsum(datetime, amount) from warehouse;

Can you help me get only the top 5 records?

Answer 1

Here's one way to do it. Your group-by state function (the SFUNC) could look like this:

CREATE FUNCTION state_group_and_total( state map<text, double>, type text, amount double )
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java AS '
     if (type == null || amount == null) return state;  // ignore rows with missing values
     Double count = (Double) state.get(type);
     if (count == null)
         count = amount;          // first row seen for this key
     else
         count = count + amount;  // add to the running total
     state.put(type, count);
     return state;
';

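This builds a map of the totals for all the rows selected by your query's WHERE clause. The tricky part is keeping only the top N. One way is to use a FINALFUNC, which is executed once after all the rows have been added to the map. Here is a function that uses a loop to find the largest value in the map and move it to a result map; to find the top N, it iterates over the map N times. (There are more efficient algorithms than this; it is just a quick and dirty example.)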

Here is an example that finds the top two (set topN to 5 to get the top five):

CREATE FUNCTION topFinal (state map<text, double>)
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java AS '
    java.util.Map<String, Double> inMap = new java.util.HashMap<String, Double>(),
                                  outMap = new java.util.HashMap<String, Double>();

    // work on a copy so the state map is left untouched
    inMap.putAll(state);

    int topN = 2;  // number of entries to keep
    for (int i = 1; i <= topN; i++) {
        // start below every possible value so negative totals are handled too
        double maxVal = Double.NEGATIVE_INFINITY;
        String moveKey = null;
        for (java.util.Map.Entry<String, Double> entry : inMap.entrySet()) {
            // skip null totals
            if (entry.getValue() != null && entry.getValue() > maxVal) {
                maxVal = entry.getValue();
                moveKey = entry.getKey();
            }
        }
        // move the largest remaining entry to the result map
        if (moveKey != null) {
            outMap.put(moveKey, maxVal);
            inMap.remove(moveKey);
        }
    }

    return outMap;
';
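As noted above, scanning the map N times is quick and dirty. A common, more efficient alternative (a swapped-in technique, not the author's code) is to keep a min-heap of at most N entries, which costs O(M log N) for M keys instead of O(M·N). The sketch below is plain Java so it can be tried outside Cassandra; the class `TopNHeap` and method `topN` are hypothetical names. Moving the same logic into the FINALFUNC body should be possible, but which Java language features a UDF body accepts may depend on your Cassandra version.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: selects the n entries with the largest values using a min-heap.
class TopNHeap {

    static Map<String, Double> topN(Map<String, Double> in, int n) {
        // Min-heap ordered by value: the smallest kept entry sits at the head.
        PriorityQueue<Map.Entry<String, Double>> heap =
                new PriorityQueue<>((a, b) -> Double.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Double> e : in.entrySet()) {
            if (e.getValue() == null || n <= 0) {
                continue; // skip null totals; nothing to keep when n <= 0
            }
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest so at most n entries remain
            }
        }
        // Like topFinal, return a plain (unordered) map of the survivors.
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : heap) {
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> totals = new HashMap<>();
        totals.put("2015", 99.1);
        totals.put("2016", 18.12);
        totals.put("2017", 44.889);
        System.out.println(topN(totals, 2)); // contains only the 2015 and 2017 entries
    }
}
```

The heap never holds more than N entries, so memory stays bounded by N rather than by the number of keys being scanned.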

Finally, you need to define the AGGREGATE that calls the two functions you defined:

CREATE OR REPLACE AGGREGATE group_and_total(text, double) 
     SFUNC state_group_and_total 
     STYPE map<text, double> 
     FINALFUNC topFinal
     INITCOND {};
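To make the call order concrete, here is a plain-Java simulation of what the aggregate does (an illustration, not Cassandra's actual implementation): start from INITCOND, call the SFUNC once per selected row, then call the FINALFUNC once on the final state. The class `AggregateSimulation` and its methods are hypothetical stand-ins for state_group_and_total and topFinal, and `main` feeds it the same three rows that are inserted into the test table in the next step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of group_and_total: INITCOND -> SFUNC per row -> FINALFUNC.
class AggregateSimulation {

    // Mirrors state_group_and_total: add the row's amount to its key's running total.
    static Map<String, Double> stateGroupAndTotal(Map<String, Double> state, String type, Double amount) {
        if (type == null || amount == null) {
            return state; // ignore rows with missing values
        }
        Double count = state.get(type);
        state.put(type, count == null ? amount : count + amount);
        return state;
    }

    // Mirrors topFinal: repeatedly move the largest remaining entry to the result.
    static Map<String, Double> topFinal(Map<String, Double> state, int topN) {
        Map<String, Double> inMap = new HashMap<>(state);
        Map<String, Double> outMap = new HashMap<>();
        for (int i = 1; i <= topN; i++) {
            double maxVal = Double.NEGATIVE_INFINITY;
            String moveKey = null;
            for (Map.Entry<String, Double> entry : inMap.entrySet()) {
                if (entry.getValue() != null && entry.getValue() > maxVal) {
                    maxVal = entry.getValue();
                    moveKey = entry.getKey();
                }
            }
            if (moveKey != null) {
                outMap.put(moveKey, maxVal);
                inMap.remove(moveKey);
            }
        }
        return outMap;
    }

    static Map<String, Double> groupAndTotal(String[] keys, double[] amounts, int topN) {
        Map<String, Double> state = new HashMap<>(); // INITCOND {}
        for (int i = 0; i < keys.length; i++) {
            state = stateGroupAndTotal(state, keys[i], amounts[i]); // SFUNC, once per row
        }
        return topFinal(state, topN); // FINALFUNC, once at the end
    }

    public static void main(String[] args) {
        String[] keys = {"2015", "2016", "2017"};
        double[] amounts = {99.1, 18.12, 44.889};
        System.out.println(groupAndTotal(keys, amounts, 2)); // keeps the 2015 and 2017 entries
    }
}
```

Because the SFUNC sums per key, rows that share a key are added together before the FINALFUNC picks the largest totals.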

Let's see if it works.

CREATE table test (partition int, clustering text, amount double, PRIMARY KEY (partition, clustering));
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2015', 99.1);
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2016', 18.12);
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2017', 44.889);
SELECT * from test;

 partition | clustering | amount
-----------+------------+--------
         1 |       2015 |   99.1
         1 |       2016 |  18.12
         1 |       2017 | 44.889

And now, drumroll...

SELECT group_and_total(clustering, amount) from test where partition=1;

 agg.group_and_total(clustering, amount)
-------------------------------------------
            {'2015': 99.1, '2017': 44.889}

So you can see it kept the top 2 rows by amount.

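Note that the keys won't be ordered by amount, because the result is a map, and I don't think we can control the key order of a map, so sorting in the FINALFUNC would be a waste of resources. If you need the results sorted, you can sort them in the client.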
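A minimal client-side sketch, assuming the aggregate result has already been read into a `java.util.Map<String, Double>` (how you read it depends on your driver, so that step is omitted). `ClientSort.sortByValueDesc` is a hypothetical helper: it sorts the entries by amount, largest first, and copies them into a `LinkedHashMap`, which preserves insertion order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper: orders the aggregate result by amount, largest first.
class ClientSort {

    // Assumes the map contains no null values.
    static LinkedHashMap<String, Double> sortByValueDesc(Map<String, Double> result) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(result.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue())); // descending
        LinkedHashMap<String, Double> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries) {
            sorted.put(e.getKey(), e.getValue()); // insertion order == sorted order
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Double> result = new HashMap<>();
        result.put("2017", 44.889);
        result.put("2015", 99.1);
        System.out.println(sortByValueDesc(result)); // prints {2015=99.1, 2017=44.889}
    }
}
```

With only N entries left after the FINALFUNC, this sort is cheap on the client.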

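I think you could do more work in the state_group_and_total function to remove items from the map as you go. That would do a better job of keeping the map from growing too large.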
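A hedged plain-Java sketch of that idea, assuming each key reaches the state function at most once per aggregation (as with the clustering column in the test above). If the same key can appear in several rows, pruning early is not safe: an entry evicted now could have grown into the top N later. The class `PruningState`, the method `statePruneTopN`, and its `topN` parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical state function that never lets the map grow beyond topN entries.
// Only correct when every key is seen at most once per aggregation.
class PruningState {

    static Map<String, Double> statePruneTopN(Map<String, Double> state, String key, Double amount, int topN) {
        if (key == null || amount == null) {
            return state; // ignore rows with missing values
        }
        state.put(key, amount);
        if (state.size() > topN) {
            // Find and evict the entry with the smallest amount.
            String minKey = null;
            double minVal = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, Double> e : state.entrySet()) {
                if (minKey == null || e.getValue() < minVal) {
                    minVal = e.getValue();
                    minKey = e.getKey();
                }
            }
            state.remove(minKey);
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, Double> state = new HashMap<>(); // INITCOND {}
        state = statePruneTopN(state, "2015", 99.1, 2);
        state = statePruneTopN(state, "2016", 18.12, 2);
        state = statePruneTopN(state, "2017", 44.889, 2);
        System.out.println(state); // holds only the 2015 and 2017 entries
    }
}
```

Under that assumption the map never holds more than topN entries, so a separate FINALFUNC is no longer needed to trim it.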
