Experiment 2
Theory:
HBase is an open-source, sorted-map data store built on Hadoop. It is column oriented and horizontally
scalable.
It is modelled on Google's Bigtable. It holds a set of tables that keep data in key-value format. HBase is well
suited to sparse data sets, which are very common in big data use cases. HBase provides APIs enabling
development in practically any programming language. It is part of the Hadoop ecosystem and
provides random, real-time read/write access to data in the Hadoop Distributed File System (HDFS).
Limitations of RDBMS
An RDBMS becomes markedly slower as the data grows large.
It expects data to be highly structured, i.e. able to fit a well-defined schema.
Any change in schema might require downtime.
For sparse datasets, maintaining NULL values adds too much overhead.
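The NULL overhead for sparse data can be illustrated with a small Python sketch. The column names and the sample record are made up for illustration. The sketch compares a fixed-schema row, which reserves a slot for every column, with a sparse map that keeps only the cells that actually hold a value, which is the approach HBase takes.

```python
# Hypothetical schema and record, used only for illustration.
COLUMNS = ["name", "email", "phone", "fax", "twitter", "address"]
record = {"name": "Asha", "email": "asha@example.com"}

# Fixed-schema (RDBMS-style) row: one slot per column,
# with NULL (None) wherever the record has no value.
dense_row = [record.get(col) for col in COLUMNS]

# Sparse (HBase-style) row: only cells that have a value are stored.
sparse_row = dict(record)

null_slots = sum(1 for value in dense_row if value is None)
print(len(dense_row), null_slots, len(sparse_row))
```

With this sample record the fixed-schema row carries four empty slots, while the sparse row stores only the two cells that exist.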
Features of HBase
Horizontally scalable: Capacity grows by adding more nodes to the cluster, and new columns can be
added to a column family at any time.
Automatic failover: A facility that automatically switches data handling to a standby system
when a system is compromised, without manual intervention by the administrator.
Integration with the Map/Reduce framework: HBase tables can serve as the input and output of
Map/Reduce jobs, and HBase is built over the Hadoop Distributed File System (HDFS).
An HBase table is a sparse, distributed, persistent, multidimensional sorted map, which is indexed by
row key, column key, and timestamp.
It is often referred to as a key-value store, a column-family-oriented database, or a store of versioned
maps of maps.
Fundamentally, it is a platform for storing and retrieving data with random access.
It does not care about data types (an integer can be stored in one row and a string in another for
the same column).
It does not enforce relationships within your data.
It is designed to run on a cluster of computers built from commodity hardware.
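The "sorted map of versioned maps" description above can be sketched as a toy in-memory model in Python. This is an illustration of the data model only, not the HBase API: the class name TinyTable and its methods are hypothetical, and real HBase stores every key and value as raw bytes.

```python
from collections import defaultdict


class TinyTable:
    """Toy in-memory model of an HBase-style table (not the HBase API)."""

    def __init__(self):
        # row key -> column key ("family:qualifier") -> timestamp -> value
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        """Store one version of a cell under the given timestamp."""
        self._rows[row][column][ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if the cell is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Yield (row, column, newest value) in sorted row-key order."""
        for row in sorted(self._rows):
            for column in sorted(self._rows[row]):
                yield row, column, self.get(row, column)


table = TinyTable()
table.put("row2", "knowledge:topic", "hbase", ts=1)
table.put("row1", "knowledge:topic", 42, ts=1)        # an integer version
table.put("row1", "knowledge:topic", "hadoop", ts=2)  # a newer string version
rows = list(table.scan())
print(rows)
```

The scan returns row1 before row2 even though row2 was written first, because rows are kept in sorted key order, and row1 shows the value with the highest timestamp.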
HBase commands
Step 1: Open a terminal and run StartCDH.sh
Step 2: Run the jps command in the terminal to list the running Java processes
Step 5: hbase(main):001:0> version
The version command prints the version of HBase in use.
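Create Table Syntax
The create command takes the table name followed by at least one column family. The command below
creates a table named newtbl with a single column family named knowledge.
hbase(main):011:0> create 'newtbl','knowledge'
hbase(main):011:0> describe 'newtbl'
hbase(main):011:0> status
1 servers, 0 dead, 15.0000 average load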
Verification
After a table is disabled, it still appears in the output of the list and exists commands,
but it cannot be scanned. A scan attempt returns the following error.
hbase(main):028:0> scan 'newtbl'
ROW COLUMN + CELL
ERROR: newtbl is disabled.
is_disabled
This command is used to find whether a table is disabled. Its syntax is as follows.
hbase> is_disabled 'table name'
disable_all
This command is used to disable all the tables matching the given regex. The syntax for
disable_all command is given below.
hbase> disable_all 'r.*'
Suppose there are 5 tables in HBase, namely raja, rajani, rajendra, rajesh, and raju. The following command
will disable all the tables starting with raj.
hbase(main):002:0> disable_all 'raj.*'
raja
rajani
rajendra
rajesh
raju
Disable the above 5 tables (y/n)?
y
5 tables successfully disabled
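How the regex selects tables can be checked with a short Python sketch. It assumes the pattern must match the whole table name, and it adds two tables (emp and newtbl) that the pattern should leave alone. The helper name tables_matching is hypothetical and is not part of HBase.

```python
import re


def tables_matching(pattern, table_names):
    """Return the names that the regex matches in full, in sorted order."""
    regex = re.compile(pattern)
    return sorted(name for name in table_names if regex.fullmatch(name))


# The five tables from the example, plus two that should not be selected.
tables = ["emp", "newtbl", "raja", "rajani", "rajendra", "rajesh", "raju"]
selected = tables_matching("raj.*", tables)
print(selected)
```

Running a check like this before disable_all is a cheap way to confirm that a pattern does not select more tables than intended. The broader pattern 'r.*' selects the same five tables here only because no other table name starts with r.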
enable
Example
Given below is an example to enable a table.
hbase> enable 'newtbl'
Verification
After enabling the table, scan it. If the scan lists the table's rows (or an empty result) instead of
the "is disabled" error, your table is successfully enabled.
is_enabled
This command is used to find whether a table is enabled. Its syntax is as follows:
hbase> is_enabled 'table name'
The following command verifies whether the table named newtbl is enabled. If it is enabled, it will return true
and if not, it will return false.
hbase(main):031:0> is_enabled 'newtbl'
true
0 row(s) in 0.0440 seconds
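The disable and enable behaviour described in this section can be summarised with a toy state model in Python. This is a sketch of the rules only, not the HBase client API: the ToyCatalog class and its method names are hypothetical, and it assumes a newly created table starts out enabled.

```python
class TableDisabledError(Exception):
    """Raised when a disabled table is scanned."""


class ToyCatalog:
    """Toy model of table states (not the HBase API)."""

    def __init__(self):
        self._enabled = {}  # table name -> True (enabled) or False (disabled)

    def create(self, name):
        self._enabled[name] = True  # assumption: new tables start enabled

    def list(self):
        return sorted(self._enabled)

    def exists(self, name):
        return name in self._enabled

    def disable(self, name):
        self._enabled[name] = False

    def enable(self, name):
        self._enabled[name] = True

    def is_enabled(self, name):
        return self._enabled[name]

    def is_disabled(self, name):
        return not self._enabled[name]

    def scan(self, name):
        if not self._enabled[name]:
            raise TableDisabledError(f"{name} is disabled.")
        return []  # the toy table holds no rows


catalog = ToyCatalog()
catalog.create("newtbl")
catalog.disable("newtbl")
still_listed = catalog.exists("newtbl") and "newtbl" in catalog.list()
try:
    catalog.scan("newtbl")
    scan_error = None
except TableDisabledError as err:
    scan_error = str(err)
catalog.enable("newtbl")
rows_after_enable = catalog.scan("newtbl")
```

The model mirrors the two verification steps: a disabled table is still listed but cannot be scanned, and after enabling it the scan succeeds.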
describe
This command returns the description of the table. Its syntax is as follows:
hbase> describe 'table name'