A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, a column key, and a timestamp; each value in the map is an uninterpreted array of bytes. HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so rows in the same table can have widely varying columns if the user likes. Unlike most map implementations, HBase and Bigtable keep key/value pairs in strict lexicographic (byte) order.
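To make the data model concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not how HBase actually stores data: the `SparseTable` class and its methods are hypothetical, columns are written as `family:qualifier` byte strings, and each cell keeps several timestamped versions. A row holds only the columns actually written to it, and row keys come back in sorted byte order.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional


class SparseTable:
    """Hypothetical model: row key -> b"family:qualifier" -> timestamp -> value."""

    def __init__(self) -> None:
        # Nested defaultdicts mean a row only stores the columns written to it.
        self._rows: Dict[bytes, Dict[bytes, Dict[int, bytes]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row: bytes, column: bytes, ts: int, value: bytes) -> None:
        self._rows[row][column][ts] = value

    def get(self, row: bytes, column: bytes, ts: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version at or before `ts` (newest overall if ts is None)."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def row_keys(self) -> Iterator[bytes]:
        # Row keys are returned in lexicographic byte order.
        return iter(sorted(self._rows))

    def columns(self, row: bytes) -> List[bytes]:
        # Only the columns present in this particular row.
        return sorted(self._rows.get(row, {}))


t = SparseTable()
t.put(b"user1", b"info:name", 1, b"Ann")
t.put(b"user1", b"info:name", 2, b"Anna")
t.put(b"user1", b"tweets:t1", 1, b"hello")
t.put(b"user2", b"info:email", 1, b"bob@example.com")

print(list(t.row_keys()))                   # [b'user1', b'user2']
print(t.get(b"user1", b"info:name"))        # b'Anna'
print(t.get(b"user1", b"info:name", ts=1))  # b'Ann'
print(t.columns(b"user2"))                  # [b'info:email']
```

Reading without a timestamp returns the newest version, which mirrors HBase's default of returning the latest cell version.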
HBase is a NoSQL database that runs on top of a Hadoop cluster and provides random, real-time read/write access to data. It can also be used to store JSON in cells.
To perform data modeling for HBase with Hackolade, you must first download the HBase plugin.
Hackolade was specially adapted to support the data modeling of HBase, including the concepts of Column Families and Column Qualifiers. The application closely follows the terminology of the database.
The data model in the picture below results from the modeling of an example from "Introduction to HBase Schema Design" by Amandeep Khurana.
A namespace is a logical grouping of tables, analogous to a database in relational database systems. This abstraction lays the groundwork for upcoming multi-tenancy-related features. A namespace can be created, removed, or altered. Namespace membership is determined during table creation.
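In HBase, a table's namespace is part of its name, written `namespace:table`; tables created without a namespace go into the built-in `default` namespace. The hypothetical Python helper below is a sketch, not an HBase API, and only shows how a qualified name splits into its two parts.

```python
from typing import Tuple

DEFAULT_NAMESPACE = "default"  # where HBase puts tables created without a namespace


def split_table_name(name: str) -> Tuple[str, str]:
    """Split a 'namespace:table' string into (namespace, table)."""
    namespace, sep, table = name.partition(":")
    if not sep:
        # Unqualified name: the table belongs to the default namespace.
        return DEFAULT_NAMESPACE, name
    if not namespace or not table:
        raise ValueError(f"malformed table name: {name!r}")
    return namespace, table


print(split_table_name("social:follows"))  # ('social', 'follows')
print(split_table_name("follows"))         # ('default', 'follows')
```

In the HBase shell itself, namespaces are managed with commands such as `create_namespace`, `alter_namespace`, and `drop_namespace`, and a table joins a namespace by being created under a qualified name, for example `create 'my_ns:users', 'info'`.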
Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop. Tables can also be used to store JSON. Tables are declared up front at schema definition time.
Row keys are uninterpreted bytes. Rows are sorted lexicographically, with the lowest-ordered key appearing first in a table. The empty byte array is used to denote both the start and the end of a table's row-key space.
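Because keys are compared byte by byte, numeric suffixes do not sort numerically. The sketch below uses a hypothetical `scan` function as a stand-in for an HBase scan and assumes the usual scan semantics: an inclusive start row, an exclusive stop row, and an empty byte string meaning "from the beginning" or "to the end" of the table.

```python
from typing import Dict, Iterator, Tuple


def scan(rows: Dict[bytes, bytes], start: bytes = b"", stop: bytes = b"") -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) pairs in byte order, start inclusive, stop exclusive.

    An empty `start` means the first row of the table; an empty `stop` means the last.
    """
    for key in sorted(rows):  # bytes compare lexicographically, byte by byte
        if key < start:
            continue
        if stop and key >= stop:
            break
        yield key, rows[key]


rows = {b"row2": b"b", b"row10": b"c", b"row1": b"a"}
print([k for k, _ in scan(rows)])                     # [b'row1', b'row10', b'row2']
print([k for k, _ in scan(rows, b"row10", b"row2")])  # [b'row10']
```

Here `row10` sorts before `row2`; zero-padding keys to a fixed width (`row02`, `row10`) is a common way to make byte order match numeric order.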
Attribute data types
HBase supports a "bytes-in/bytes-out" interface, so anything that can be converted to an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images, as long as they can be rendered as bytes.
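As a sketch of what "bytes-in/bytes-out" means for a client, the hypothetical `encode_value` function below picks one encoding per Python type. The integer encoding mirrors the 8-byte big-endian layout produced by the HBase Java client's `Bytes.toBytes(long)`; any other convention works equally well as long as writers and readers agree, because HBase itself never interprets the value.

```python
import json
import struct
from typing import Any


def encode_value(value: Any) -> bytes:
    """Illustrative encoder: HBase stores only bytes, so the client picks the encoding."""
    if isinstance(value, bytes):
        return value  # already raw bytes, e.g. an image
    if isinstance(value, str):
        return value.encode("utf-8")  # text as UTF-8
    if isinstance(value, bool):
        # bool is a subclass of int in Python; refuse it rather than guess a layout.
        raise TypeError("choose an explicit encoding for booleans")
    if isinstance(value, int):
        return struct.pack(">q", value)  # 8-byte big-endian signed long
    if isinstance(value, float):
        return struct.pack(">d", value)  # 8-byte big-endian IEEE 754 double
    if isinstance(value, (dict, list)):
        return json.dumps(value, separators=(",", ":")).encode("utf-8")  # JSON document
    raise TypeError(f"no encoding defined for {type(value).__name__}")


print(encode_value(42))                               # b'\x00\x00\x00\x00\x00\x00\x00*'
print(encode_value({"user": "ann", "tags": ["a"]}))   # b'{"user":"ann","tags":["a"]}'
```

Decoding is the reader's job: `struct.unpack(">q", raw)[0]` recovers the integer, and `json.loads(raw)` recovers a JSON document.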
Hackolade was specially adapted to support the data types and attributes behavior of HBase.
Hackolade generates the create statement for the namespace, table, and column family.
The script can also be exported to the file system via the menu Tools > Forward-Engineering, or via the Command-Line Interface.
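The following Python sketch illustrates the kind of HBase shell commands such a script contains. The `hbase_shell_script` function is hypothetical and its output is not Hackolade's actual generated script; it only shows that namespaces, tables, and column families are declared up front, while column qualifiers are not declared at all.

```python
from typing import Dict


def hbase_shell_script(namespace: str, table: str, families: Dict[str, Dict[str, str]]) -> str:
    """Hypothetical generator for HBase shell DDL.

    `families` maps each column family name to extra attributes, e.g. {"VERSIONS": "3"}.
    Attribute values are emitted verbatim, so string values must carry their own quotes,
    e.g. {"COMPRESSION": "'SNAPPY'"}.
    """
    lines = [f"create_namespace '{namespace}'"]
    specs = []
    for name, attrs in families.items():
        parts = [f"NAME => '{name}'"] + [f"{key} => {value}" for key, value in attrs.items()]
        specs.append("{" + ", ".join(parts) + "}")
    # Only families are declared; column qualifiers are created implicitly on write.
    lines.append(f"create '{namespace}:{table}', " + ", ".join(specs))
    return "\n".join(lines)


print(hbase_shell_script("social", "users", {"info": {"VERSIONS": "1"}, "follows": {}}))
# create_namespace 'social'
# create 'social:users', {NAME => 'info', VERSIONS => 1}, {NAME => 'follows'}
```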
You may define the connection parameters to your instance. Take a look at this page for more information.
The Hackolade reverse-engineering process for an HBase table queries a representative random sample of rows, then infers a schema from that sample. You can easily discover, document, and enrich the structure of your column families and qualifiers, and infer the structure of JSON documents you store in HBase.
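A greatly simplified version of sample-based inference can be sketched in Python. This is not Hackolade's algorithm: the `infer_schema` and `guess_type` functions are hypothetical, rows are assumed to be already fetched into a dict keyed by `family:qualifier`, and type detection is limited to raw bytes, UTF-8 strings, and JSON objects or arrays.

```python
import json
import random
from collections import defaultdict
from typing import Dict, Set

Row = Dict[str, bytes]  # "family:qualifier" -> raw cell value


def guess_type(raw: bytes) -> str:
    """Very rough type guess for an opaque HBase cell value."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "bytes"  # not valid UTF-8, e.g. an image
    try:
        parsed = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "string"
    if isinstance(parsed, dict):
        return "json object"
    if isinstance(parsed, list):
        return "json array"
    return "string"


def infer_schema(rows: Dict[bytes, Row], sample_size: int, seed: int = 0) -> Dict[str, Dict[str, Set[str]]]:
    """Infer family -> qualifier -> observed types from a random sample of rows."""
    rng = random.Random(seed)
    keys = sorted(rows)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    schema: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for key in sample:
        for column, raw in rows[key].items():
            family, _, qualifier = column.partition(":")
            schema[family][qualifier].add(guess_type(raw))
    return {family: dict(quals) for family, quals in schema.items()}


rows = {
    b"u1": {"info:name": b"Ann", "info:profile": b'{"age": 30}'},
    b"u2": {"info:name": b"Bob", "photo:thumb": b"\xff\xd8\xff"},
}
print(infer_schema(rows, sample_size=10))
```

A qualifier that shows several types across the sample would end up with more than one entry in its set, which is a signal worth documenting; a real tool would also descend into the JSON documents to infer their nested structure.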