Hive Using HiveQL

Uploaded by

realmex7max5g

Key points on data manipulation in Hive using HiveQL:

1. Data Loading:
You can load data into Hive tables using the LOAD DATA statement.
LOAD DATA LOCAL INPATH '/path/to/data/file' INTO TABLE table_name;
Include LOCAL to copy a file from the local filesystem; omit it when the path is in HDFS, in which case the file is moved into the table's directory.
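As a minimal sketch of an end-to-end load, assuming a hypothetical comma-delimited file and a hypothetical table named employees (neither comes from the notes above):

```sql
-- Hypothetical table whose layout matches a comma-delimited text file.
CREATE TABLE IF NOT EXISTS employees (
  id INT,
  name STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy a file from the local filesystem and append it to the table.
LOAD DATA LOCAL INPATH '/tmp/employees.csv' INTO TABLE employees;

-- Move a file that already sits in HDFS, replacing the table's current contents.
LOAD DATA INPATH '/user/hive/staging/employees.csv' OVERWRITE INTO TABLE employees;
```

LOAD DATA only copies or moves files; it does not parse or validate them, so a mismatch between the file and the table definition typically shows up as NULL values at query time.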
2. Inserting Data:
Use the INSERT INTO statement to append data into a table.
INSERT INTO TABLE table_name VALUES ('value1', 'value2', ...);
Alternatively, INSERT OVERWRITE replaces the existing data in the table:
INSERT OVERWRITE TABLE table_name SELECT * FROM source_table;
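The two forms might be used together as follows. This is a sketch only: employees (id INT, name STRING, salary DOUBLE) and high_earners (same columns) are hypothetical tables assumed to exist already.

```sql
-- Append several rows in one statement.
INSERT INTO TABLE employees VALUES
  (1, 'Asha', 52000.0),
  (2, 'Ravi', 61000.0);

-- Replace whatever high_earners currently holds with a fresh query result.
INSERT OVERWRITE TABLE high_earners
SELECT id, name, salary
FROM employees
WHERE salary > 60000;
```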
3. Updating Data:
Hive supports UPDATE from version 0.14 onward, but only on tables with ACID transactions enabled.
UPDATE table_name SET column1 = 'value' WHERE condition;
The table must be created as a transactional table stored in ORC format:
CREATE TABLE table_name (...) STORED AS ORC TBLPROPERTIES ("transactional"="true");
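A fuller sketch of the setup, assuming a hypothetical accounts table. The two SET properties are the ones usually needed for ACID operations, but many clusters configure them in hive-site.xml instead, so treat them as an assumption about your environment:

```sql
-- Session settings commonly required for ACID operations.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical transactional table. Older releases (before Hive 3) require
-- ACID tables to be bucketed; on newer releases the CLUSTERED BY clause is optional.
CREATE TABLE accounts (
  id INT,
  owner STRING,
  balance DECIMAL(12,2)
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Row-level update; partition and bucket columns cannot be updated.
UPDATE accounts SET balance = balance + 100 WHERE id = 42;
```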
4. Deleting Data:
Hive also supports the DELETE statement for transactional tables:
DELETE FROM table_name WHERE condition;
5. Merging Data:
Hive provides the MERGE statement (from Hive 2.2+) to combine and update data in tables.
MERGE INTO target_table USING source_table
ON target_table.id = source_table.id
WHEN MATCHED THEN UPDATE SET column1 = source_table.column1
WHEN NOT MATCHED THEN INSERT VALUES (source_table.id, source_table.column1);
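A concrete version of the same pattern, as a sketch: customers (id INT, name STRING) is a hypothetical transactional target table and customer_updates (id INT, name STRING, is_deleted BOOLEAN) a hypothetical staging table.

```sql
MERGE INTO customers AS t
USING customer_updates AS s
ON t.id = s.id
-- When both DELETE and UPDATE clauses appear, the first one needs an extra condition.
WHEN MATCHED AND s.is_deleted = true THEN DELETE
-- The column on the left of SET is written without a table qualifier.
WHEN MATCHED THEN UPDATE SET name = s.name
-- Source rows with no match in the target are inserted as new rows.
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.name);
```

As with UPDATE and DELETE, the target of a MERGE has to be a transactional table.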
6. Data Retrieval (SELECT):
Use the SELECT statement to query data:
SELECT column1, column2 FROM table_name WHERE condition;
Complex queries using joins, aggregations, and subqueries are also supported.
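For instance, a join, an aggregation, and a subquery can be combined in one statement. The tables below, employees (id, name, dept_id, salary), departments (dept_id, dept_name), and active_badges (employee_id), are hypothetical and used only for illustration:

```sql
SELECT d.dept_name,
       COUNT(*)      AS headcount,
       AVG(e.salary) AS avg_salary
FROM employees e
JOIN departments d
  ON e.dept_id = d.dept_id
-- IN subquery: keep only employees that appear in active_badges.
WHERE e.id IN (SELECT employee_id FROM active_badges)
GROUP BY d.dept_name
-- HAVING filters on the aggregated result, not on individual rows.
HAVING COUNT(*) >= 5
ORDER BY avg_salary DESC;
```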
7. Partitioning and Bucketing:
Partitioning improves query performance by dividing data into logical subsets:
CREATE TABLE table_name (id INT, name STRING)
PARTITIONED BY (year INT, month STRING);
Bucketing hashes rows on a column and distributes them across a fixed number of buckets (files):
CREATE TABLE table_name (...) CLUSTERED BY (column) INTO 4 BUCKETS;
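To show how partitioning pays off at query time, here is a sketch with a hypothetical sales table and a hypothetical staging_sales source (order_id INT, amount DOUBLE, order_date STRING in yyyy-MM-dd form):

```sql
-- Partition columns are declared only in PARTITIONED BY, not in the column list.
CREATE TABLE sales (
  order_id INT,
  amount DOUBLE
)
PARTITIONED BY (sale_year INT, sale_month STRING);

-- Static partition insert: the partition values are written out explicitly.
INSERT INTO TABLE sales PARTITION (sale_year = 2024, sale_month = '01')
SELECT order_id, amount
FROM staging_sales
WHERE order_date LIKE '2024-01%';

-- Filtering on partition columns lets Hive read only the matching directories.
SELECT SUM(amount) AS january_total
FROM sales
WHERE sale_year = 2024 AND sale_month = '01';
```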
8. Dynamic Partitioning:
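Run SET hive.exec.dynamic.partition=true; to enable dynamic partitioning during insert operations. In strict mode at least one partition column must be given a static value; set hive.exec.dynamic.partition.mode=nonstrict to let all of them be dynamic.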
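A dynamic-partition insert could look like the sketch below. It assumes a hypothetical sales table partitioned by (sale_year INT, sale_month STRING) and a hypothetical staging_sales table whose order_date column is a string in yyyy-MM-dd form:

```sql
SET hive.exec.dynamic.partition=true;
-- nonstrict lets every partition column be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic partition columns come last in the SELECT list,
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE sales PARTITION (sale_year, sale_month)
SELECT order_id,
       amount,
       year(order_date)         AS sale_year,
       substr(order_date, 6, 2) AS sale_month
FROM staging_sales;
```

Hive creates one partition per distinct (sale_year, sale_month) pair found in the query result, so a single statement can populate many partitions.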
9. Using Temporary Tables:
Temporary tables allow intermediate storage of query results.
CREATE TEMPORARY TABLE temp_table AS SELECT * FROM source_table;
10. Importing/Exporting Data:
Use the EXPORT and IMPORT commands to copy a table's data and metadata to and from an HDFS directory.
EXPORT TABLE table_name TO 'hdfs_path';
IMPORT TABLE table_name FROM 'hdfs_path';
11. Handling NULL Values:
Hive supports functions to manage NULL values, such as COALESCE and NVL:
SELECT COALESCE(column, 'default_value') FROM table_name;
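The difference between the two functions, sketched on a hypothetical employees table with nullable nickname, name, salary, and dept_id columns:

```sql
SELECT id,
       -- COALESCE returns the first non-NULL argument and accepts any number of them.
       COALESCE(nickname, name, 'unknown') AS display_name,
       -- NVL takes exactly two arguments: the value and its fallback.
       NVL(salary, 0) AS salary_or_zero
FROM employees
-- Test for NULL with IS NULL / IS NOT NULL; a comparison such as dept_id = NULL never matches.
WHERE dept_id IS NOT NULL;
```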
By leveraging these HiveQL data manipulation features, you can efficiently load, manage, and retrieve data for
analysis in a Hadoop ecosystem.