Apache HBase
• Apache HBase is a non-relational (NoSQL) database.
• HBase was created for hosting very large tables with
billions of rows and millions of columns.
• Provides random, real-time data access.
• Allows table inserts, updates and deletes.
• Runs on top of the Hadoop Distributed File System (HDFS).
• HBase data is automatically replicated by HDFS for
higher availability.

HBase Architecture
• An HBase table is automatically distributed across a set of cluster nodes to increase scalability and performance. HBase can scale out to thousands of nodes.
• Each cluster node contains a portion of a table called a region. Each region contains some number of table rows.
• Each region is managed by a RegionServer service. RegionServers typically run on the same machines that run the HDFS DataNode service.
• RegionServers are managed by the HMaster master service.
HMaster functions include:
  • Coordinating database metadata changes.
  • Monitoring the RegionServer nodes.
  • Orchestrating load balancing across RegionServer nodes.
  • Orchestrating recovery from failed RegionServer nodes.
• A ZooKeeper cluster handles all configuration management. HBase client programs communicate with ZooKeeper first to find the RegionServer node that manages the data to be read.
• Clients access HBase through a Java API, a REST interface, a Thrift gateway, or the HBase shell command-line interface.

HBase Architecture: Interaction Between Daemons

Key-Value Mappings
• HBase contains maps of keys and their values: Key --> Value. If we know the key, we can retrieve the value.
• Keys are multi-part: (column family name, rowID, column qualifier, timestamp) --> value
• Column family name: determines storage properties.
• All data in the same column family is stored together on disk.
• rowID: used to access data and to divide table data into regions.
• Regions are maintained on separate RegionServer nodes.
• Column qualifier: the column name, which is just a label in the multi-part key.
• In any given row, one or more columns might or might not exist.
• Timestamp: used to version the data and support data updates.

Rows and Columns
• Rows and columns are implemented differently than in most relational databases.
• A multi-part key identifies a cell with a value.
• Because a table is just a set of key --> value mappings, a row is nothing more than a logical collection of values.

HBase is a Column-Oriented Database
• A column-oriented database stores column items together on disk.
• Column-oriented databases are well suited for:
  • Fast column operations, for example:
    • Calculating the sum or aggregate of an entire column of data.
    • Finding the 50 largest items in a column of 2 billion records.
  • Sparse datasets, which are common in big data use cases.

HBase Operations Overview
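A toy sketch may help make the operations and write path covered in this section concrete. The Python below is not the HBase client API or HBase's implementation: the class ToyTable, its flush_threshold parameter, the TOMBSTONE marker, and the logical clock are all hypothetical names invented for illustration. It models a table as a mapping from (column family, rowID, column qualifier, timestamp) to a value, logs each write before buffering it in a memstore, and flushes the memstore to an immutable "store file" once it holds enough cells.

```python
TOMBSTONE = object()  # hypothetical marker meaning "this cell was deleted"


class ToyTable:
    """In-memory toy model of HBase-style cells; an illustration only."""

    def __init__(self, flush_threshold=3):
        self.wal = []           # stand-in for the on-disk log of writes
        self.memstore = {}      # in-memory buffer of recent writes
        self.storefiles = []    # flushed, read-only snapshots of the memstore
        self.flush_threshold = flush_threshold
        self._clock = 0         # logical timestamps instead of wall-clock time

    def put(self, family, row, qualifier, value):
        """Write a new version of a cell and return its timestamp."""
        self._clock += 1
        key = (family, row, qualifier, self._clock)
        self.wal.append((key, value))   # log the write for durability
        self.memstore[key] = value      # then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()
        return self._clock

    def delete(self, family, row, qualifier):
        """Record a delete as a newer version that holds a tombstone."""
        return self.put(family, row, qualifier, TOMBSTONE)

    def flush(self):
        """Move the buffered cells into a new immutable store file."""
        if self.memstore:
            self.storefiles.append(dict(self.memstore))
            self.memstore.clear()

    def _cells(self):
        """Iterate over every stored cell version, flushed or not."""
        for store in self.storefiles:
            yield from store.items()
        yield from self.memstore.items()

    def get(self, family, row, qualifier):
        """Return the newest value of one cell, or None if absent or deleted."""
        versions = [
            (ts, value)
            for (f, r, q, ts), value in self._cells()
            if (f, r, q) == (family, row, qualifier)
        ]
        if not versions:
            return None
        _, value = max(versions, key=lambda pair: pair[0])
        return None if value is TOMBSTONE else value

    def scan(self, family, start_row, stop_row):
        """Yield (row, qualifier, value) for start_row <= row < stop_row."""
        latest = {}
        for (f, r, q, ts), value in self._cells():
            if f == family and start_row <= r < stop_row:
                if (r, q) not in latest or ts > latest[(r, q)][0]:
                    latest[(r, q)] = (ts, value)
        for (r, q) in sorted(latest):
            value = latest[(r, q)][1]
            if value is not TOMBSTONE:
                yield r, q, value
```

In this sketch an update is simply a put with a newer timestamp, and a row is nothing more than the set of cells that share a rowID, which mirrors the key-value description above. Real HBase exposes these operations through its Java API, REST interface, Thrift gateway, and shell rather than through anything resembling this class.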
• HBase operations include put, get, delete, and scan.
• There is no structured query language (SQL).
• Writes initially go to an in-memory memstore.
• Writes are immediately logged to disk for durability.
• Writes are regularly flushed from the memstore to a storefile on disk.

HBase vs RDBMS