Tutorialspoint HBase Pig
You can create a table using the create command. You must specify
the table name and at least one column family name.
The syntax to create a table in HBase shell is shown below.
Example
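As a minimal sketch of the create command, assume an emp table with two column families. The family names personal data and professional data are illustrative assumptions; the text does not name them.

```
# General form: create '<table name>', '<column family>' [, '<column family>' ...]
create 'emp', 'personal data', 'professional data'

# List the tables to confirm that emp now exists
list
```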
To insert data into a table, use the put command.
Let us insert the first row values into the emp table as shown
below.
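A sketch of inserting the first row with put, one cell per command. The row key 1, the column family names, and the values other than the name Raju are assumptions made for illustration.

```
# General form: put '<table name>', '<row key>', '<column family:column name>', '<value>'
put 'emp', '1', 'personal data:name', 'raju'
put 'emp', '1', 'personal data:city', 'hyderabad'
put 'emp', '1', 'professional data:designation', 'manager'
put 'emp', '1', 'professional data:salary', '50000'
```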
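You can update an existing cell value using the put command. The
newly given value is stored as the latest version of the cell and
replaces the existing value in subsequent reads, updating the row.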
Example
Suppose there is a table in HBase called emp with the following
data.
The following command will update the city value of the employee
named ‘Raju’ to Delhi.
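The update might look like the following, assuming Raju is stored under row key 1 and the city lives in a column family called personal data (both are assumptions, since the table contents are not reproduced here).

```
# Writing to an existing row/column stores a newer version of the cell
put 'emp', '1', 'personal data:city', 'Delhi'

# Scan the table to check that the city now reads Delhi
scan 'emp'
```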
Example
The following example shows how to use the get command. Let
us read the first row of the emp table.
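A minimal sketch of get, assuming the first row has the row key 1. The second command narrows the read to a single column; the column name used there is an assumption.

```
# General form: get '<table name>', '<row key>'
get 'emp', '1'

# Optionally restrict the read to one column
get 'emp', '1', {COLUMN => 'personal data:name'}
```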
Example
Here is an example of deleting a specific cell. In this case, we
are deleting the salary.
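Deleting the salary cell could be sketched as follows. The row key and the professional data column family are assumptions; substitute the names used in your own table.

```
# General form: delete '<table name>', '<row key>', '<column family:column name>'
delete 'emp', '1', 'professional data:salary'
```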
Example
Verify the table using the scan command. A snapshot of the table
after deleting the row is given below.
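The deletion being verified here is presumably done with deleteall, which removes every cell of a row. The command itself does not appear in the text, so treat this as an assumed reconstruction with an assumed row key of 1.

```
# Remove all cells of row 1 (assumed row key)
deleteall 'emp', '1'

# Verify: row 1 should no longer appear in the scan
scan 'emp'
```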
count
You can count the number of rows of a table using
the count command. Its syntax is as follows:
After deleting the first row, the emp table will have two rows.
Verify it as shown below.
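A minimal sketch of the count command on the emp table. If the table held three rows before the first one was deleted, as the text implies, the reported count should be two.

```
# General form: count '<table name>'
count 'emp'
```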
Syntax
Given below is the syntax of the GROUP operator.
Example
Assume that we have a file named student_details.txt in the HDFS
directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Apache Pig with the relation
name student_details as shown below.
grunt> student_details = LOAD
'hdfs://localhost:9000/pig_data/student_details.txt'
USING PigStorage(',')
as (id:int, firstname:chararray, lastname:chararray,
age:int, phone:chararray, city:chararray);
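With the relation loaded, the grouping step itself can be sketched as below. Grouping by age is an assumption, chosen because it is the only field in the sample data with repeated values worth grouping on; group_data is the relation name used in the text.

```pig
-- General form: <group_relation> = GROUP <relation> BY <key>;
group_data = GROUP student_details BY age;
```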
Verification
Verify the relation group_data using the DUMP operator as shown
below.
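The verification step is a single statement. Assuming the relation was grouped by age, each output tuple pairs one age with a bag of the student tuples having that age; for example, the tuple for age 21 would hold the records of Rajiv and Preethi. The order of tuples inside a bag is not guaranteed.

```pig
-- Print the grouped relation to the console
DUMP group_data;
```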
Output
Then you will get output displaying the contents of the relation
named group_data. Here you can observe that the resulting schema
has two columns: the grouping key (named group) and a bag that
contains the tuples sharing that key.
You can see the schema of the table after grouping the data using
the describe command as shown below.
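A sketch of the describe step. The schema it prints should show a first field named group with the type of the grouping key, followed by a bag named after the original relation (student_details) whose tuples keep the six loaded fields.

```pig
-- Show the schema of the grouped relation
DESCRIBE group_data;
```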
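The JOIN operator is used to combine records from two or more
relations. Joins can be of the following types:
• Self-join
• Inner-join
• Outer-join: left join, right join, and full join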
This chapter explains with examples how to use the join operator
in Pig Latin. Assume that we have two files
namely customers.txt and orders.txt in the /pig_data/ directory of
HDFS as shown below.
customers.txt
1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00
orders.txt
102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060
And we have loaded these two files into Pig with the
relations customers and orders as shown below.
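The two LOAD statements might look like this. The field names (id, name, age, address, salary, oid, date, customer_id, amount) are assumptions inferred from the sample rows, and the HDFS host and port follow the pattern of the student_details example.

```pig
-- customers.txt: one customer per line, comma separated
customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt'
    USING PigStorage(',')
    AS (id:int, name:chararray, age:int, address:chararray, salary:int);

-- orders.txt: the third field refers to a customer id
orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt'
    USING PigStorage(',')
    AS (oid:int, date:chararray, customer_id:int, amount:int);
```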
Self-join
Self-join is used to join a table with itself as if the table were two
relations, temporarily renaming at least one relation.
Syntax
Example
Verification
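In Pig, a self-join is generally done by loading the same data under two different relation names and joining them. The following sketch covers the syntax, the example, and the verification step; the relation names customers1, customers2, and customers3 and the field names are assumptions.

```pig
-- General form: <result> = JOIN <relation1> BY <key>, <relation2> BY <key>;

-- Load the same file twice under different names
customers1 = LOAD 'hdfs://localhost:9000/pig_data/customers.txt'
    USING PigStorage(',')
    AS (id:int, name:chararray, age:int, address:chararray, salary:int);
customers2 = LOAD 'hdfs://localhost:9000/pig_data/customers.txt'
    USING PigStorage(',')
    AS (id:int, name:chararray, age:int, address:chararray, salary:int);

-- Join the two copies on the customer id
customers3 = JOIN customers1 BY id, customers2 BY id;

-- Verify the result
DUMP customers3;
```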
Output
(1,Ramesh,32,Ahmedabad,2000,1,Ramesh,32,Ahmedabad,2000)
(2,Khilan,25,Delhi,1500,2,Khilan,25,Delhi,1500)
(3,kaushik,23,Kota,2000,3,kaushik,23,Kota,2000)
(4,Chaitali,25,Mumbai,6500,4,Chaitali,25,Mumbai,6500)
(5,Hardik,27,Bhopal,8500,5,Hardik,27,Bhopal,8500)
(6,Komal,22,MP,4500,6,Komal,22,MP,4500)
(7,Muffy,24,Indore,10000,7,Muffy,24,Indore,10000)
Inner Join
Inner Join is used quite frequently; it is also referred to as equijoin.
An inner join returns rows when there is a match in both tables.
Syntax
Example
Let us perform inner join operation on the two
relations customers and orders as shown below.
Verification
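The inner join and its verification can be sketched as follows. It assumes customers has a field named id and orders has a field named customer_id (assumed names); the result name coustomer_orders is spelled as it appears in the text.

```pig
-- General form: <result> = JOIN <relation1> BY <key>, <relation2> BY <key>;
-- Assumes customers (with field id) and orders (with field customer_id) are already loaded
coustomer_orders = JOIN customers BY id, orders BY customer_id;

-- Verify the result
DUMP coustomer_orders;
```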
Output
You will get the following output, which shows the contents of the
relation named coustomer_orders.
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
Outer Join
Unlike inner join, outer join returns all the rows from at least
one of the relations. An outer join operation is carried out in
three ways: left outer join, right outer join, and full outer join.
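Left Outer Join
The left outer join returns all the rows from the left relation,
even if there are no matches in the right relation.
Syntax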
Example
Verification
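A left outer join keeps every tuple of the left relation and fills the right-hand fields with nulls where no match exists, which fits the output below (every customer appears, with empty order fields for those without orders). A sketch, with outer_left as a hypothetical relation name and id and customer_id as assumed field names:

```pig
-- General form: <result> = JOIN <left> BY <key> LEFT OUTER, <right> BY <key>;
-- Assumes customers and orders are already loaded
outer_left = JOIN customers BY id LEFT OUTER, orders BY customer_id;

-- Verify the result
DUMP outer_left;
```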
Output
(1,Ramesh,32,Ahmedabad,2000,,,,)
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
(5,Hardik,27,Bhopal,8500,,,,)
(6,Komal,22,MP,4500,,,,)
(7,Muffy,24,Indore,10000,,,,)
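Right Outer Join
The right outer join returns all the rows from the right relation,
even if there are no matches in the left relation.
Syntax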
Example
Verification
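A right outer join keeps every tuple of the right relation. Since every order in the sample data refers to an existing customer, the output below contains only matched rows. A sketch, with outer_right as a hypothetical relation name and id and customer_id as assumed field names:

```pig
-- General form: <result> = JOIN <left> BY <key> RIGHT OUTER, <right> BY <key>;
-- Assumes customers and orders are already loaded
outer_right = JOIN customers BY id RIGHT OUTER, orders BY customer_id;

-- Verify the result
DUMP outer_right;
```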
Output
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
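Full Outer Join
The full outer join returns all the rows from both relations,
matching them where possible.
Syntax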
Example
Verification
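A full outer join keeps the tuples of both relations and fills in nulls on whichever side has no match. With the sample data this gives the same rows as the left outer join, because no order lacks a customer. A sketch, with outer_full as a hypothetical relation name and id and customer_id as assumed field names:

```pig
-- General form: <result> = JOIN <left> BY <key> FULL OUTER, <right> BY <key>;
-- Assumes customers and orders are already loaded
outer_full = JOIN customers BY id FULL OUTER, orders BY customer_id;

-- Verify the result
DUMP outer_full;
```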
Output
(1,Ramesh,32,Ahmedabad,2000,,,,)
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
(5,Hardik,27,Bhopal,8500,,,,)
(6,Komal,22,MP,4500,,,,)
(7,Muffy,24,Indore,10000,,,,)
Syntax
Given below is the syntax of the SPLIT operator.
Example
Assume that we have a file named student_details.txt in the HDFS
directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation
name student_details as shown below.
student_details = LOAD
'hdfs://localhost:9000/pig_data/student_details.txt'
USING PigStorage(',')
as (id:int, firstname:chararray, lastname:chararray,
age:int, phone:chararray, city:chararray);
Let us now split the relation into two, one listing the students of
age less than 23, and the other listing the students having the
age between 22 and 25.
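The split described above can be sketched as follows. The output relation names student_details1 and student_details2 are assumptions, and "between 22 and 25" is read here as strictly greater than 22 and strictly less than 25. Note that SPLIT evaluates each condition independently, so a tuple can land in both relations or in neither.

```pig
-- General form: SPLIT <relation> INTO <relation1> IF <condition1>, <relation2> IF <condition2>;
SPLIT student_details INTO
    student_details1 IF age < 23,
    student_details2 IF (age > 22 AND age < 25);

-- Verify both resulting relations
DUMP student_details1;
DUMP student_details2;
```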
Verification
Output
Syntax
Given below is the syntax of the FILTER operator.
Example
Assume that we have a file named student_details.txt in the HDFS
directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation
name student_details as shown below.
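A sketch of the load followed by the filter. The output shown for this example lists only students from Chennai, so the condition is inferred to be a test on city; filter_data is a hypothetical relation name.

```pig
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        age:int, phone:chararray, city:chararray);

-- General form: <result> = FILTER <relation> BY <condition>;
filter_data = FILTER student_details BY city == 'Chennai';

-- Verify the result
DUMP filter_data;
```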
Verification
Output
(6,Archana,Mishra,23,9848022335,Chennai)
(8,Bharathi,Nambiayar,24,9848022333,Chennai)
Syntax
Given below is the syntax of the DISTINCT operator.
Example
Assume that we have a file named student_details.txt in the HDFS
directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai
006,Archana,Mishra,9848022335,Chennai
And we have loaded this file into Pig with the relation
name student_details as shown below.
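A sketch of the load and the DISTINCT step. The file in this example has five fields (no age), so the schema below drops the age column; distinct_data is a hypothetical relation name.

```pig
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        phone:chararray, city:chararray);

-- General form: <result> = DISTINCT <relation>;
-- Removes tuples that are exact duplicates of another tuple
distinct_data = DISTINCT student_details;

-- Verify the result
DUMP distinct_data;
```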
Verification
Output
(1,Rajiv,Reddy,9848022337,Hyderabad)
(2,siddarth,Battacharya,9848022338,Kolkata)
(3,Rajesh,Khanna,9848022339,Delhi)
(4,Preethi,Agarwal,9848022330,Pune)
(5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar)
(6,Archana,Mishra,9848022335,Chennai)
Example
Assume that we have a file named student_details.txt in the HDFS
directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation
name student_details as shown below.
Let us now get the id, age, and city values of each student from
the relation student_details and store them in another relation
named foreach_data using the FOREACH operator as shown below.
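The projection can be sketched as below, including the LOAD statement with the same schema used for student_details elsewhere in this document.

```pig
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        age:int, phone:chararray, city:chararray);

-- General form: <result> = FOREACH <relation> GENERATE <expressions>;
foreach_data = FOREACH student_details GENERATE id, age, city;

-- Verify the result
DUMP foreach_data;
```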
Verification
Output
(1,21,Hyderabad)
(2,22,Kolkata)
(3,22,Delhi)
(4,21,Pune)
(5,23,Bhuwaneshwar)
(6,23,Chennai)
(7,24,trivendram)
(8,24,Chennai)
Syntax
Given below is the syntax of the ORDER BY operator.
Example
Assume that we have a file named student_details.txt in the HDFS
directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation
name student_details as shown below.
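A sketch of the load followed by the sort. The output of this example is ordered by age from highest to lowest, so a descending sort on age is assumed; order_by_data is a hypothetical relation name. The relative order of students with the same age is not guaranteed.

```pig
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        age:int, phone:chararray, city:chararray);

-- General form: <result> = ORDER <relation> BY <field> [ASC|DESC];
order_by_data = ORDER student_details BY age DESC;

-- Verify the result
DUMP order_by_data;
```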
Verification
Output
(8,Bharathi,Nambiayar,24,9848022333,Chennai)
(7,Komal,Nayak,24,9848022334,trivendram)
(6,Archana,Mishra,23,9848022335,Chennai)
(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar)
(3,Rajesh,Khanna,22,9848022339,Delhi)
(2,siddarth,Battacharya,22,9848022338,Kolkata)
(4,Preethi,Agarwal,21,9848022330,Pune)
(1,Rajiv,Reddy,21,9848022337,Hyderabad)
Syntax
Given below is the syntax of the LIMIT operator.
Example
Assume that we have a file named student_details.txt in the HDFS
directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation
name student_details as shown below.
Now, let’s get the first four tuples of the relation and store them
in another relation named limit_data using the LIMIT operator as
shown below.
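Given the relation name limit_data and the four tuples in the output, the statement here is presumably a LIMIT with a count of 4. A sketch, including the load; without a preceding ORDER BY, Pig does not guarantee which tuples are returned.

```pig
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        age:int, phone:chararray, city:chararray);

-- General form: <result> = LIMIT <relation> <number of tuples>;
limit_data = LIMIT student_details 4;

-- Verify the result
DUMP limit_data;
```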
Verification
(1,Rajiv,Reddy,21,9848022337,Hyderabad)
(2,siddarth,Battacharya,22,9848022338,Kolkata)
(3,Rajesh,Khanna,22,9848022339,Delhi)
(4,Preethi,Agarwal,21,9848022330,Pune)
Eval Functions
Given below is the list of eval functions provided by Apache Pig.
1. AVG(): To compute the average of the numerical values within a
bag.
2. BagToString(): To concatenate the elements of a bag into a
string. While concatenating, we can optionally place a delimiter
between these values.
3. CONCAT(): To concatenate two or more expressions of the same
type.
4. COUNT(): To get the number of tuples in a bag. Tuples whose
first field is null are not counted.
5. COUNT_STAR(): Similar to the COUNT() function. It gets the
number of tuples in a bag, but it also counts tuples with null
values.
6. DIFF(): To compare two bags (fields) in a tuple.
7. IsEmpty(): To check if a bag or map is empty.
8. MAX(): To calculate the highest value for a column (numeric
values or chararrays) in a single-column bag.
9. MIN(): To get the minimum (lowest) value (numeric or chararray)
for a certain column in a single-column bag.
10. PluckTuple(): Using the Pig Latin PluckTuple() function, we can
define a string prefix and filter the columns in a relation that
begin with the given prefix.
11. SIZE(): To compute the number of elements based on any Pig
data type.
12. SUBTRACT(): To subtract two bags. It takes two bags as inputs
and returns a bag which contains the tuples of the first bag that
are not in the second bag.
13. SUM(): To get the total of the numeric values of a column in a
single-column bag.
14. TOKENIZE(): To split a string (which contains a group of words)
in a single tuple and return a bag which contains the output of the
split operation.
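Eval functions such as COUNT(), AVG(), and MAX() operate on bags, so they are typically applied after a GROUP. As a hedged illustration, the sketch below groups the six-field student_details relation from the earlier examples into a single bag and computes three aggregates; the names student_group_all and student_stats are hypothetical. For the eight sample students this should come to a count of 8, an average age of 22.5, and a maximum age of 24.

```pig
-- Assumes student_details is loaded with the six-field schema shown earlier
-- GROUP ... ALL puts every tuple into one bag
student_group_all = GROUP student_details ALL;

-- Apply eval functions to the bag (or to one column of it)
student_stats = FOREACH student_group_all GENERATE
    COUNT(student_details),
    AVG(student_details.age),
    MAX(student_details.age);

DUMP student_stats;
```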