HBase - How to add a super column family?
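I'm trying to create a Java application that converts a MySQL database into a NoSQL HBase database. So far, it reads data from MySQL and inserts it into HBase correctly, but now I'm trying to handle the relationships between MySQL tables. I know that if there is a relationship, you should add one of the tables as a super column family. I looked through the Apache website documentation and couldn't find anything.
Any ideas?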
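Column families have nothing to do with relationships. Instead, you have to build inverted indexes properly through row key design, which can let you efficiently retrieve data from one table when you know the key from another table. Alternatively, to avoid joins, try to store all the related data in a single row. Any tool that provides a SQL interface on top of HBase generates jobs that take time to start and execute. HBase is fast when you perform Get operations or scan contiguous rows.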
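The row-key approach above can be illustrated without a cluster. The following is a minimal sketch, not the HBase client API: a Java `TreeMap` stands in for a table because HBase also keeps rows sorted by row key. The class name, the `#` separator, the `customerId#orderId` key layout, and the email index are all hypothetical choices for illustration. A second map plays the role of an inverted-index table, so finding a customer's orders becomes one index lookup plus one scan over contiguous rows instead of a join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory stand-in for two HBase tables. HBase keeps rows sorted
 * by row key, so a TreeMap mimics that ordering. This is NOT the HBase API.
 */
class RowKeyDesignSketch {

    // "orders" table: row key -> (column "family:qualifier" -> value).
    final NavigableMap<String, Map<String, String>> orders = new TreeMap<>();

    // Inverted-index table: indexed value (customer email) -> customer id.
    final NavigableMap<String, String> customerByEmail = new TreeMap<>();

    // Composite row key "<customerId>#<orderId>" keeps all orders of one customer
    // in contiguous rows. Assumes ids never contain the '#' separator.
    static String orderRowKey(String customerId, String orderId) {
        return customerId + "#" + orderId;
    }

    void putOrder(String customerId, String orderId, String product) {
        Map<String, String> row = new HashMap<>();
        row.put("d:product", product);
        orders.put(orderRowKey(customerId, orderId), row);
    }

    void indexCustomer(String email, String customerId) {
        customerByEmail.put(email, customerId);
    }

    // Mimics a scan bounded by start row "<id>#" (inclusive) and stop row "<id>$"
    // (exclusive). '$' is the character right after '#', so the range covers
    // exactly the row keys that begin with "<id>#".
    List<String> scanOrders(String customerId) {
        return new ArrayList<>(
            orders.subMap(customerId + "#", true, customerId + "$", false).keySet());
    }

    // The "join" becomes two cheap operations: one index lookup, one prefix scan.
    List<String> ordersForEmail(String email) {
        String customerId = customerByEmail.get(email);
        return customerId == null ? new ArrayList<>() : scanOrders(customerId);
    }

    public static void main(String[] args) {
        RowKeyDesignSketch db = new RowKeyDesignSketch();
        db.indexCustomer("ana@example.com", "c1");
        db.putOrder("c1", "o1", "book");
        db.putOrder("c1", "o2", "pen");
        db.putOrder("c2", "o1", "lamp");
        System.out.println(db.ordersForEmail("ana@example.com"));
    }
}
```

Running `main` prints `[c1#o1, c1#o2]`. Against a real cluster, the bounded range in `scanOrders` would correspond to a `Scan` with start and stop rows. The single-row alternative from the answer would instead store each order as a column of the customer's own row (for example a hypothetical `order:o1` column), so that one Get returns everything.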
Hope this helps.
Update
For more details about column families, check out the good book Architecting HBase Applications:
A column family is an HBase-specific concept that you will not find in other RDBMS applications. For the same region, different column families will store the data into different files and can be configured differently. Data with the same access pattern and the same format should be grouped into the same column family. As an example regarding the format, if you need to store a lot of textual metadata information for customer profiles in addition to image files for each customer’s profile photo, you might want to store them into two different column families: one compressed (where all the textual information will be stored), and one not compressed (where the image files will be stored). As an example regarding the access pattern, if some information is mostly read and almost never written, and some is mostly written and almost never read, you might want to separate them into two different column families. If the different columns you want to store have a similar format and access pattern, regroup them within the same column family.
The write cache memory area for a given RegionServer is shared by all the column families configured for all the regions hosted by the given host. Abusing column families will put pressure on the memstore, which will generate many small files, which in turn will generate a lot of compactions that might impact the performance. There is no technical limitation on the number of column families you can configure for a table. However, over the last three years, most of the use cases we had the chance to work on only required a single column family. Some required two column families, but each time we have seen more than two column families, it has been possible and recommended to reduce the number to improve efficiency. If your design includes more than three column families, you might want to take a deeper look at it and see if all those families are really required; most likely, they can be regrouped.
If you do not have any consistency constraints between your two column families and data will arrive into them at a different time, instead of creating two column families for a single table, you can also create two tables, each with a single column family. This strategy is useful when it comes time to decide the size of the regions. Indeed, while it was better to keep the two column families almost the same size, by splitting them across two different tables, it is now easier to let them grow independently.
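The excerpt's grouping rule (same format and same access pattern means same column family) can be sketched as a small planning step. This is a hypothetical helper, not something from the book or from HBase: the `Column` type, the two-value `Access` enum, and the `compressible` flag are simplifying assumptions. It only groups column names; in HBase 2.x each resulting group would then be declared as one column family (for example with `ColumnFamilyDescriptorBuilder`), with compression enabled only on the compressible group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical planner for the book's rule of thumb: columns sharing a format
 * (compressible or not) and an access pattern go into the same column family.
 * Nothing here is an HBase API; it only proposes a grouping.
 */
class ColumnFamilyPlanner {

    enum Access { READ_MOSTLY, WRITE_MOSTLY }

    static final class Column {
        final String name;
        final boolean compressible; // e.g. textual metadata yes, image files no
        final Access access;

        Column(String name, boolean compressible, Access access) {
            this.name = name;
            this.compressible = compressible;
            this.access = access;
        }
    }

    // Groups columns by (format, access pattern); each group is one candidate family.
    // LinkedHashMap keeps families in the order they were first encountered.
    static Map<String, List<String>> plan(List<Column> columns) {
        Map<String, List<String>> families = new LinkedHashMap<>();
        for (Column c : columns) {
            String key = (c.compressible ? "compressed" : "uncompressed") + "/" + c.access;
            families.computeIfAbsent(key, k -> new ArrayList<>()).add(c.name);
        }
        return families;
    }

    // The excerpt suggests that more than three families deserves a second look.
    static boolean needsReview(Map<String, List<String>> families) {
        return families.size() > 3;
    }

    public static void main(String[] args) {
        List<Column> cols = new ArrayList<>();
        cols.add(new Column("name", true, Access.READ_MOSTLY));
        cols.add(new Column("bio", true, Access.READ_MOSTLY));
        cols.add(new Column("photo", false, Access.READ_MOSTLY));
        Map<String, List<String>> families = plan(cols);
        System.out.println(families);
        System.out.println("review? " + needsReview(families));
    }
}
```

With the customer-profile example from the excerpt, `main` prints `{compressed/READ_MOSTLY=[name, bio], uncompressed/READ_MOSTLY=[photo]}` followed by `review? false`: two families, matching the book's compressed-text and uncompressed-image split.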