Combine two FilterLists with MUST_PASS_ONE/ALL operator in a single Scan
Consider the result of scan 'table' in the hbase shell:
ROW COLUMN+CELL
000 column=F:Q, timestamp=1519299345645, value=a
001 column=F:Q, timestamp=1519299345645, value=b
010 column=F:Q, timestamp=1519299345645, value=c
011 column=F:Q, timestamp=1519299345645, value=b
100 column=F:Q, timestamp=1519299345645, value=a
110 column=F:Q, timestamp=1519299345645, value=c
200 column=F:Q, timestamp=1519299345645, value=b
210 column=F:Q, timestamp=1519299345645, value=a
I want the scan to return only the rows where:
- the row key starts with 0 or 1, and
- the value of column F:Q is a or b
For the example above, that is:
ROW COLUMN+CELL
000 column=F:Q, timestamp=1519299345645, value=a
001 column=F:Q, timestamp=1519299345645, value=b
011 column=F:Q, timestamp=1519299345645, value=b
100 column=F:Q, timestamp=1519299345645, value=a
In the hbase shell it would be (ignore all spaces and newlines; I added them only for readability):
scan 'table', {
  COLUMNS => 'F:Q',
  FILTER => "
    (
      PrefixFilter('0')
      OR
      PrefixFilter('1')
    )
    AND
    (
      SingleColumnValueFilter('F', 'Q', =, 'binary:a')
      OR
      SingleColumnValueFilter('F', 'Q', =, 'binary:b')
    )
  "
}
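If the two lists live in Java, the same filter-language string can be generated from them, which makes the mapping between parentheses in the string and nested lists explicit. The sketch below is a plain string builder; anyOf, allOf, prefixFilter and valueEquals are hypothetical helpers, not HBase API, and it assumes no argument contains a single quote (those would need escaping in the filter language). It emits everything on one line, so there is no whitespace to strip. A string like this is what HBase's ParseFilter (parseFilterString) consumes, but check that class against your HBase version before relying on it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterStringBuilder {
    // Hypothetical helper: wraps clauses as "(c1 OR c2 OR ...)"
    static String anyOf(List<String> clauses) {
        return clauses.stream().collect(Collectors.joining(" OR ", "(", ")"));
    }

    // Hypothetical helper: joins already-grouped clauses as "g1 AND g2 AND ..."
    static String allOf(List<String> groups) {
        return String.join(" AND ", groups);
    }

    // Filter-language form of PrefixFilter for one row-key prefix
    static String prefixFilter(String prefix) {
        return "PrefixFilter('" + prefix + "')";
    }

    // EQUAL comparison of family:qualifier against a binary comparator value.
    // Assumes the arguments contain no single quotes (those would need escaping).
    static String valueEquals(String family, String qualifier, String value) {
        return "SingleColumnValueFilter('" + family + "', '" + qualifier
                + "', =, 'binary:" + value + "')";
    }

    // (prefix 0 OR prefix 1) AND (F:Q = a OR F:Q = b), built from two lists
    static String exampleFilter() {
        return allOf(Arrays.asList(
                anyOf(Arrays.asList(prefixFilter("0"), prefixFilter("1"))),
                anyOf(Arrays.asList(valueEquals("F", "Q", "a"), valueEquals("F", "Q", "b")))));
    }

    public static void main(String[] args) {
        System.out.println(exampleFilter());
    }
}
```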
Given that I have two filter lists in Java:
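import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

List<Filter> prefixFilters = new ArrayList<>();
List<Filter> singleColumnValueFilters = new ArrayList<>();
// Row-key conditions: key starts with "1" or with "0"
PrefixFilter one = new PrefixFilter(Bytes.toBytes("1"));
PrefixFilter zero = new PrefixFilter(Bytes.toBytes("0"));
// Value conditions: F:Q equals "a" or equals "b"
SingleColumnValueFilter a = new SingleColumnValueFilter(
        Bytes.toBytes("F"),
        Bytes.toBytes("Q"),
        CompareFilter.CompareOp.EQUAL,
        Bytes.toBytes("a")
);
SingleColumnValueFilter b = new SingleColumnValueFilter(
        Bytes.toBytes("F"),
        Bytes.toBytes("Q"),
        CompareFilter.CompareOp.EQUAL,
        Bytes.toBytes("b")
);
prefixFilters.add(zero);
prefixFilters.add(one);
singleColumnValueFilters.add(a);
singleColumnValueFilters.add(b);
// MUST_PASS_ONE: each list passes a row if ANY of its filters passes (logical OR)
FilterList prefixFiltersList = new FilterList(FilterList.Operator.MUST_PASS_ONE, prefixFilters);
FilterList singleColumnValueFiltersList = new FilterList(FilterList.Operator.MUST_PASS_ONE, singleColumnValueFilters);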
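Question: How can I combine them into a single scan.setFilter() with an AND operator, as I did in the shell?
I was hoping for a dedicated FilterList constructor that takes a logical operator (AND / OR) and several List<Filter> arguments. Since there is none, I'm stuck.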
Finally, add:
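// Outer list: a row passes only if ALL children pass (logical AND)
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
// Each child is itself a MUST_PASS_ONE (logical OR) FilterList
filters.addFilter(prefixFiltersList);
filters.addFilter(singleColumnValueFiltersList);
scan.setFilter(filters);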
This ensures that both FilterLists are applied, and MUST_PASS_ALL acts as the AND condition between them.
Why does this work? According to the FilterList JavaDoc:
Since you can use Filter Lists as children of Filter Lists, you can create a hierarchy of filters to be evaluated.
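To see why nesting gives "(A OR B) AND (C OR D)", here is a small plain-Java model of such a hierarchy, with no HBase dependency. Op, Node, leaf and list are hypothetical names that only mimic FilterList.Operator and FilterList; the model treats each filter as a simple yes/no test on a (row key, F:Q value) pair and ignores HBase details such as per-cell evaluation and SingleColumnValueFilter's handling of rows missing the column. Run against the sample rows from the question, it keeps exactly the four expected rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

public class FilterTreeModel {
    // Simplified stand-in for FilterList.Operator; this is NOT the HBase API
    enum Op { MUST_PASS_ALL, MUST_PASS_ONE }

    // A node decides whether a (row key, F:Q value) pair passes
    interface Node {
        boolean accepts(String rowKey, String value);
    }

    // Leaf node: one condition, like a single PrefixFilter or SingleColumnValueFilter
    static Node leaf(BiPredicate<String, String> condition) {
        return condition::test;
    }

    // Inner node: combines its children the way a FilterList combines its filters
    static Node list(Op op, List<Node> children) {
        return (rowKey, value) -> op == Op.MUST_PASS_ALL
                ? children.stream().allMatch(child -> child.accepts(rowKey, value))
                : children.stream().anyMatch(child -> child.accepts(rowKey, value));
    }

    static Node prefix(String p) {
        return leaf((rowKey, value) -> rowKey.startsWith(p));
    }

    static Node valueEquals(String expected) {
        return leaf((rowKey, value) -> value.equals(expected));
    }

    // MUST_PASS_ALL( MUST_PASS_ONE(prefix 0, prefix 1), MUST_PASS_ONE(value a, value b) )
    static Node combined() {
        Node prefixes = list(Op.MUST_PASS_ONE, Arrays.asList(prefix("0"), prefix("1")));
        Node values = list(Op.MUST_PASS_ONE, Arrays.asList(valueEquals("a"), valueEquals("b")));
        return list(Op.MUST_PASS_ALL, Arrays.asList(prefixes, values));
    }

    // Sample rows from the question: {row key, value of F:Q}
    static final String[][] SAMPLE_ROWS = {
        {"000", "a"}, {"001", "b"}, {"010", "c"}, {"011", "b"},
        {"100", "a"}, {"110", "c"}, {"200", "b"}, {"210", "a"}
    };

    // Returns the row keys the filter tree accepts, in input order
    static List<String> acceptedRows(Node filter, String[][] rows) {
        List<String> accepted = new ArrayList<>();
        for (String[] row : rows) {
            if (filter.accepts(row[0], row[1])) {
                accepted.add(row[0]);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(acceptedRows(combined(), SAMPLE_ROWS)); // [000, 001, 011, 100]
    }
}
```

Because a list node accepts any Node as a child, including another list node, the outer MUST_PASS_ALL simply ANDs the results of the two inner MUST_PASS_ONE lists, which is the same shape as the answer's Java code.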