Hive Using HiveQL
1. Data Loading:
You can load data into Hive tables using the LOAD DATA statement.
LOAD DATA LOCAL INPATH '/path/to/data/file' INTO TABLE table_name;
Use LOCAL when loading from the local filesystem, or omit it when the file already resides in HDFS.
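For example, a file that is already in HDFS (the path below is illustrative) is loaded without LOCAL:
LOAD DATA INPATH '/hdfs/path/to/data/file' INTO TABLE table_name;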
2. Inserting Data:
Use the INSERT INTO statement to append data into a table.
INSERT INTO TABLE table_name VALUES ('value1', 'value2', ...);
Alternatively, INSERT OVERWRITE replaces the existing data in the table:
INSERT OVERWRITE TABLE table_name SELECT * FROM source_table;
3. Updating Data:
Hive supports updates in tables starting from Hive 0.14 when ACID properties are enabled.
UPDATE table_name SET column1 = 'value' WHERE condition;
Tables must be created as transactional tables with ORC format:
CREATE TABLE table_name (...) STORED AS ORC TBLPROPERTIES ("transactional"="true");
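In addition to the table properties, ACID operations typically require transaction support to be enabled for the session or server; a minimal sketch using standard Hive configuration keys:
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;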
4. Deleting Data:
Hive also supports the DELETE statement for transactional tables:
DELETE FROM table_name WHERE condition;
5. Merging Data:
Hive provides the MERGE statement (from Hive 2.2+) to update existing rows and insert new ones in a target table based on matches with a source table.
MERGE INTO target_table USING source_table
ON target_table.id = source_table.id
WHEN MATCHED THEN UPDATE SET column = source_table.column
WHEN NOT MATCHED THEN INSERT VALUES (source_table.id, source_table.column);
6. Data Retrieval (SELECT):
Use the SELECT statement to query data:
SELECT column1, column2 FROM table_name WHERE condition;
Complex queries using joins, aggregations, and subqueries are also supported.
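As an illustration (the customers and orders tables and their columns are hypothetical), a join combined with an aggregation might look like:
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c JOIN orders o ON c.id = o.customer_id
GROUP BY c.name;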
7. Partitioning and Bucketing:
Partitioning improves query performance by dividing a table into logical subsets based on the values of the partition columns (see the example after this item):
CREATE TABLE table_name (id INT, name STRING)
PARTITIONED BY (year INT, month STRING);
Bucketing distributes rows into a fixed number of buckets based on a hash of the bucketing column:
CREATE TABLE table_name (...) CLUSTERED BY (column) INTO 4 BUCKETS;
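For instance (values are illustrative), rows can be written into a specific partition of the table defined above, and a query that filters on the partition columns reads only that subset:
INSERT INTO TABLE table_name PARTITION (year=2023, month='01') VALUES (1, 'Alice');
SELECT id, name FROM table_name WHERE year = 2023 AND month = '01';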
8. Dynamic Partitioning:
Use SET hive.exec.dynamic.partition=true to enable dynamic partitioning during insert operations.
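A minimal sketch of a dynamic-partition insert (the sales tables are hypothetical); fully dynamic partitioning usually also requires the partition mode to be set to nonstrict:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE sales_partitioned PARTITION (year, month)
SELECT id, amount, year, month FROM sales_staging;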
9. Using Temporary Tables:
Temporary tables provide session-scoped storage for intermediate query results and are dropped automatically when the session ends.
CREATE TEMPORARY TABLE temp_table AS SELECT * FROM source_table;
10. Importing/Exporting Data:
Use the EXPORT and IMPORT commands to copy a table's data and metadata to or from an HDFS location.
EXPORT TABLE table_name TO 'hdfs_path';
IMPORT TABLE table_name FROM 'hdfs_path';
11. Handling NULL Values:
Hive supports functions to manage NULL values, such as COALESCE and NVL:
SELECT COALESCE(column, 'default_value') FROM table_name;
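NVL behaves similarly, returning a substitute value when the first argument is NULL:
SELECT NVL(column, 'default_value') FROM table_name;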
By leveraging these HiveQL data manipulation features, you can efficiently load, manage, and retrieve data for
analysis in a Hadoop ecosystem.