What is HBase?
HBase is a column-oriented database management system that runs on top of HDFS (Hadoop Distributed File System). HBase is not a relational data store, and it does not support a structured query language like SQL.
In HBase, a master node manages the cluster, while region servers store portions of the tables and perform the work on the data.
When should you use HBase?
HBase should be used when the big data application has –
- A variable schema
- Data stored in the form of collections
- A need for key-based access when retrieving data
What are the key components of HBase?
Key components of HBase are –
Region: Contains an in-memory data store (MemStore) and HFiles.
Region Server: Hosts and serves one or more regions.
HBase Master: Responsible for monitoring the region servers.
ZooKeeper: Takes care of the coordination between the HBase Master component and the client.
Catalog Tables: The two important catalog tables are ROOT and META. The ROOT table tracks where the META table is, and the META table stores the location of all the regions in the system. (Since HBase 0.96 the ROOT table no longer exists; the location of META, now named hbase:meta, is kept in ZooKeeper.)
How many operational commands are there in HBase?
There are five main commands in HBase.
- Get
- Put
- Delete
- Scan
- Increment
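As a rough illustration of what these five operations do, the sketch below models a table as a sorted in-memory map. ToyTable and its methods are hypothetical names made up for this example and are not part of the HBase client API; a real client would use the Get, Put, Delete, Scan, and Increment classes against a table.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy in-memory table; illustrates the five operations, not the HBase client API. */
final class ToyTable {
    // Rows are kept sorted by row key, which is what makes range scans cheap.
    private final TreeMap<String, Long> rows = new TreeMap<>();

    /** Put: write (or overwrite) the value stored under a row key. */
    void put(String rowKey, long value) {
        rows.put(rowKey, value);
    }

    /** Get: read a single row by key; returns null if the row is absent. */
    Long get(String rowKey) {
        return rows.get(rowKey);
    }

    /** Delete: remove a row (a real store would write a tombstone instead). */
    void delete(String rowKey) {
        rows.remove(rowKey);
    }

    /** Scan: return rows with startKey <= key < stopKey, in key order. */
    SortedMap<String, Long> scan(String startKey, String stopKey) {
        return rows.subMap(startKey, stopKey);
    }

    /** Increment: add delta to the stored value (a missing row counts as 0). */
    long increment(String rowKey, long delta) {
        return rows.merge(rowKey, delta, Long::sum);
    }
}
```

Get, Put, Delete, and Increment all address a single row key, while Scan walks a contiguous key range.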
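How to open a connection in HBase?
If you are using the Java API, the following code opens a connection to the "users" table. It needs Configuration from org.apache.hadoop.conf, HBaseConfiguration from org.apache.hadoop.hbase, and HTable and HTableInterface from org.apache.hadoop.hbase.client.
// Older client API: since HBase 1.0 the HTable constructor is deprecated in favor of
// ConnectionFactory.createConnection(myConf).getTable(TableName.valueOf("users")).
Configuration myConf = HBaseConfiguration.create();
HTableInterface usersTable = new HTable(myConf, "users");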
What is the difference between HBase and Hadoop/HDFS?
HDFS is a distributed file system that is well suited for the storage of large files. Its documentation states that it is not, however, a general-purpose file system, and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion: HBase internally puts your data in indexed StoreFiles that exist on HDFS, and it is this indexing that makes high-speed lookups possible.
What does HBase consist of?
- HBase consists of a set of tables
- Each table contains rows and columns, like a traditional database
- Each table must have an element defined as a primary key (the row key)
- An HBase column denotes an attribute of an object
What are WAL and HLog in HBase?
WAL (Write-Ahead Log) is similar to the MySQL binary log: it records all the changes that occur in the data. HLog is the HBase class that implements the WAL. The log is a standard Hadoop SequenceFile that stores HLogKey instances. These keys consist of a sequence number as well as the actual data, and they are used to replay data that has not yet been persisted after a server crash. So, in case of a server failure, the WAL works as a lifeline and recovers the lost data.
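The log-then-apply idea can be sketched with a toy model. ToyWal, Entry, crash, and replay are hypothetical names for this illustration only; the real WAL is a file on HDFS, and the in-memory list here merely stands in for it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy write-ahead log: every edit is appended to the log before it is applied in memory. */
final class ToyWal {
    /** One log entry: a sequence number plus the edit itself. */
    static final class Entry {
        final long sequenceId;
        final String rowKey;
        final String value;

        Entry(long sequenceId, String rowKey, String value) {
            this.sequenceId = sequenceId;
            this.rowKey = rowKey;
            this.value = value;
        }
    }

    private final List<Entry> log = new ArrayList<>();            // stands in for the durable log
    private final Map<String, String> memstore = new HashMap<>(); // volatile state, lost on a crash
    private long nextSequenceId = 1;

    void put(String rowKey, String value) {
        log.add(new Entry(nextSequenceId++, rowKey, value)); // 1. record the change
        memstore.put(rowKey, value);                          // 2. apply it in memory
    }

    String get(String rowKey) {
        return memstore.get(rowKey);
    }

    /** Simulates a server crash: in-memory data is gone, the log survives. */
    void crash() {
        memstore.clear();
    }

    /** Replays the log in sequence order and returns the number of entries applied. */
    int replay() {
        int applied = 0;
        for (Entry e : log) {
            memstore.put(e.rowKey, e.value);
            applied++;
        }
        return applied;
    }
}
```

Because entries are replayed in sequence order, the last write to a row wins after recovery, just as it did before the crash.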
When should you use HBase?
Data size is huge: When you have many millions of records to operate on
Complete redesign: When you are moving from an RDBMS to HBase, you treat it as a complete redesign rather than a simple port
SQL-less commands: You can do without several RDBMS features such as transactions, inner joins, typed columns, etc.
Infrastructure investment: You need a large enough cluster for HBase to be really useful
Explain deletion in HBase.
When you delete a cell in HBase, the data is not actually deleted right away. Instead, a tombstone marker is set, which makes the deleted cells invisible. The deleted cells are physically removed during major compactions.
What are the three types of tombstone markers in HBase?
Version delete marker: Marks a single version of a column for deletion
Column delete marker: Marks all the versions of a column for deletion
Family delete marker: Marks all the columns of a column family for deletion
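The way tombstones hide cells at read time can be sketched as follows. TombstoneSketch and its nested types are hypothetical names for this toy model, not HBase classes. It assumes that a version marker hides only the exact timestamp, while column and family markers hide every version at or before the marker's timestamp.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of tombstone markers masking cells at read time (not the HBase API). */
final class TombstoneSketch {
    enum MarkerType { VERSION, COLUMN, FAMILY }

    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static final class Tombstone {
        final MarkerType type;
        final String family;
        final String qualifier; // ignored for FAMILY markers, may be null
        final long timestamp;

        Tombstone(MarkerType type, String family, String qualifier, long timestamp) {
            this.type = type;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Returns true if the tombstone hides the given cell. */
    static boolean masks(Tombstone t, Cell c) {
        if (!t.family.equals(c.family)) {
            return false;
        }
        switch (t.type) {
            case VERSION: // exactly one version of one column
                return t.qualifier.equals(c.qualifier) && t.timestamp == c.timestamp;
            case COLUMN:  // all versions of one column up to the marker's timestamp
                return t.qualifier.equals(c.qualifier) && c.timestamp <= t.timestamp;
            default:      // FAMILY: all columns of the family up to the marker's timestamp
                return c.timestamp <= t.timestamp;
        }
    }

    /** Returns the cells a reader would still see; the hidden cells are not removed here. */
    static List<Cell> visible(List<Cell> cells, List<Tombstone> tombstones) {
        List<Cell> result = new ArrayList<>();
        for (Cell c : cells) {
            boolean hidden = false;
            for (Tombstone t : tombstones) {
                if (masks(t, c)) {
                    hidden = true;
                    break;
                }
            }
            if (!hidden) {
                result.add(c);
            }
        }
        return result;
    }
}
```

Note that visible only filters what a reader sees; the masked cells stay in storage until a compaction drops them.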
What is compaction in HBase?
As more and more data is written to HBase, many HFiles are created. Compaction is the process of merging these HFiles into one file. Once the merged file has been created successfully, the old files are discarded.
What is a cell in HBase?
A cell is the smallest unit of an HBase table. It holds a piece of data and is identified by the tuple {row, column, version}.
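To make the {row, column, version} addressing concrete, here is a minimal sketch that keeps several versions per (row, column) pair, newest first. CellVersions and its methods are hypothetical names for this illustration and do not reflect how HBase stores cells internally.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy store in which each value is addressed by {row, column, version}. */
final class CellVersions {
    // (row, column) coordinate -> versions of that cell, sorted newest-first
    private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();

    private static String coordinate(String row, String column) {
        return row + "\u0000" + column; // separator that is unlikely to appear in keys
    }

    void put(String row, String column, long version, String value) {
        cells.computeIfAbsent(coordinate(row, column),
                k -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
             .put(version, value);
    }

    /** Returns the value stored at exactly this version, or null if there is none. */
    String get(String row, String column, long version) {
        TreeMap<Long, String> versions = cells.get(coordinate(row, column));
        return versions == null ? null : versions.get(version);
    }

    /** Returns the value with the highest version number, or null if the cell is absent. */
    String getLatest(String row, String column) {
        TreeMap<Long, String> versions = cells.get(coordinate(row, column));
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }
}
```

In HBase the version is by default a timestamp, so reading a cell without naming a version returns the most recent value.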
What is the maximum recommended cell size?
A rough rule of thumb, with little empirical validation, is to keep the data in HDFS and store pointers to the data in HBase if you expect the cell size to be consistently above 10 MB. If you do expect large cell values and you still plan to use HBase for the storage of cell contents, you'll want to increase the block size and the maximum region size for the table to keep the index size reasonable and the split frequency acceptable.
What are the different compaction types in HBase?
There are two types of compaction: major and minor compaction.
In a minor compaction, adjacent small HFiles are merged into a single HFile without removing the deleted cells. The files to be merged are selected automatically by HBase's compaction policy.
In a major compaction, all the HFiles of a column family are merged into a single HFile. The deleted cells are discarded, and it is generally triggered manually.
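The difference between the two compaction types can be sketched with sorted maps standing in for HFiles. CompactionSketch, merge, and the TOMBSTONE sentinel are hypothetical names for this toy model; real HFiles hold versioned cells, not one value per key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy compaction: merges sorted "files" into one; only a major compaction drops deletes. */
final class CompactionSketch {
    /** Sentinel value standing in for a tombstone marker (an assumption of this sketch). */
    static final String TOMBSTONE = "\u0000deleted";

    /**
     * Merges the given files into one sorted file.
     * Files must be ordered oldest-first so that newer entries override older ones.
     */
    static TreeMap<String, String> merge(List<? extends Map<String, String>> filesOldestFirst,
                                         boolean major) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (Map<String, String> file : filesOldestFirst) {
            merged.putAll(file); // later (newer) files overwrite earlier values
        }
        if (major) {
            // All files were merged, so no older copy can reappear: deletes can be dropped.
            merged.values().removeIf(TOMBSTONE::equals);
        }
        return merged;
    }
}
```

A minor compaction has to keep the tombstones because files outside the merged set may still hold older copies of the deleted data.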
What are the different types of filters used in HBase?
Filters are used to get specific data from an HBase table rather than all the records.
They are of the following types.
- Column value filters
- Column value comparators
- KeyValue metadata filters
- RowKey filters
What is TTL (Time to Live) in HBase?
TTL is a data retention technique by which a version of a cell is preserved only for a specified period of time. Once that period has elapsed, the version is removed.
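The expiry check behind TTL amounts to comparing a version's age with the configured lifetime. The sketch below assumes cell timestamps in milliseconds and a TTL expressed in seconds; TtlSketch and its methods are hypothetical names, not HBase API.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy TTL check: a version expires once its age exceeds the time to live. */
final class TtlSketch {
    /** Returns true if a cell version written at cellTimestampMs is older than ttlSeconds. */
    static boolean isExpired(long cellTimestampMs, long ttlSeconds, long nowMs) {
        return nowMs - cellTimestampMs > ttlSeconds * 1000L;
    }

    /** Returns only the version timestamps that are still within their TTL. */
    static List<Long> liveVersions(List<Long> versionTimestampsMs, long ttlSeconds, long nowMs) {
        List<Long> live = new ArrayList<>();
        for (long timestampMs : versionTimestampsMs) {
            if (!isExpired(timestampMs, ttlSeconds, nowMs)) {
                live.add(timestampMs);
            }
        }
        return live;
    }
}
```

Each version is judged on its own timestamp, so older versions of a cell can expire while newer ones remain readable.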
What is hotspotting in HBase?
Hotspotting is a situation in which a large amount of client traffic is directed at one node, or only a few nodes, of a cluster. This traffic may represent reads, writes, or other operations. It overwhelms the single machine responsible for hosting the affected region, causing performance degradation and potentially leading to region unavailability.
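The effect can be reproduced with a toy router that maps row keys to regions by key range. HotspotSketch, its split points, and the salted helper are hypothetical and for illustration only. Salting, meaning a hash-derived prefix added to the row key, is one commonly used mitigation that the answer above does not cover, and it makes range scans harder.

```java
import java.util.List;
import java.util.TreeMap;

/** Toy region router showing how sequential keys pile onto a single region. */
final class HotspotSketch {
    /** Builds region start keys "", "1-", "2-", ... (illustrative split points, count < 10). */
    static TreeMap<String, Integer> regions(int count) {
        TreeMap<String, Integer> startKeys = new TreeMap<>();
        startKeys.put("", 0);
        for (int i = 1; i < count; i++) {
            startKeys.put(i + "-", i);
        }
        return startKeys;
    }

    /** Returns the region whose start key is the greatest one not above rowKey. */
    static int regionFor(TreeMap<String, Integer> startKeys, String rowKey) {
        return startKeys.floorEntry(rowKey).getValue();
    }

    /** Prefixes the key with a bucket number derived from its hash (a "salt"). */
    static String salted(String rowKey, int buckets) {
        return Math.floorMod(rowKey.hashCode(), buckets) + "-" + rowKey;
    }

    /** Counts how many of the given row keys land on each region. */
    static int[] hitsPerRegion(TreeMap<String, Integer> startKeys, List<String> rowKeys) {
        int[] hits = new int[startKeys.size()];
        for (String rowKey : rowKeys) {
            hits[regionFor(startKeys, rowKey)]++;
        }
        return hits;
    }
}
```

With four regions, date-like keys that share a prefix such as "20240101-" all fall into the same key range and therefore hit one region, while their salted forms are spread across buckets.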