BDA Module 3: Pig Latin

Big Data Analytics | Module 3, 6th Semester

Student.csv

Id  Name  Age  Branch  College
1   N1    18   Cse     C1
2   N2    18   Ise     C1
3   N3    19   Cse     C2
4   N4    20   aiml    C3
5   N5    19   Aiml    C2

Performance.csv

Id  Marks  Age
1   45     18
2   55     18
4   65     20
5   56     19
LOAD both files into relations A and B:

A = LOAD 'student.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int, branch:chararray, college:chararray);
DUMP A;
(1,N1,18,Cse,C1)
(2,N2,18,Ise,C1)
(3,N3,19,Cse,C2)
(4,N4,20,aiml,C3)
(5,N5,19,Aiml,C2)

B = LOAD 'performance.csv' USING PigStorage(',') AS (id:int, marks:int, age:int);
DUMP B;
(1,45,18)
(2,55,18)
(4,65,20)
(5,56,19)

GROUP by a single key:
C = GROUP A BY age;
DUMP C;
(18,{(1,N1,18,Cse,C1),(2,N2,18,Ise,C1)})
(19,{(3,N3,19,Cse,C2),(5,N5,19,Aiml,C2)})
(20,{(4,N4,20,aiml,C3)})
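To make the shape of the GROUP output concrete, here is a minimal plain-Python sketch (not Pig code) of what grouping by a key computes. The names `students` and `group_by` are illustrative assumptions. Pig itself does not guarantee the order of tuples inside a bag.

```python
from collections import defaultdict

# Rows of Student.csv as (id, name, age, branch, college) tuples.
students = [
    (1, "N1", 18, "Cse", "C1"),
    (2, "N2", 18, "Ise", "C1"),
    (3, "N3", 19, "Cse", "C2"),
    (4, "N4", 20, "aiml", "C3"),
    (5, "N5", 19, "Aiml", "C2"),
]

def group_by(rows, key):
    """Collect rows into one bag (a list) per key value, like GROUP ... BY."""
    bags = defaultdict(list)
    for row in rows:
        bags[key(row)].append(row)
    return dict(bags)

# Counterpart of: C = GROUP A BY age;
by_age = group_by(students, key=lambda row: row[2])

# A tuple-valued key gives the multi-field form: GROUP A BY (age, college);
by_age_college = group_by(students, key=lambda row: (row[2], row[4]))

for age, bag in sorted(by_age.items()):
    print(age, bag)
```

Each dictionary entry corresponds to one output tuple of GROUP: the key plays the role of the `group` field and the list plays the role of the bag.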

GROUP by multiple key fields:
D = GROUP A BY (age, college);
DUMP D;
((18,C1),{(1,N1,18,Cse,C1),(2,N2,18,Ise,C1)})
((19,C2),{(3,N3,19,Cse,C2),(5,N5,19,Aiml,C2)})
((20,C3),{(4,N4,20,aiml,C3)})

GROUP ALL (all tuples in a single group):
E = GROUP A ALL;
DUMP E;
(all,{(1,N1,18,Cse,C1),(2,N2,18,Ise,C1),(3,N3,19,Cse,C2),(4,N4,20,aiml,C3),(5,N5,19,Aiml,C2)})

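Aggregate functions on grouped data:
C = GROUP A BY college;
DUMP C;
(C1,{(1,N1,18,Cse,C1),(2,N2,18,Ise,C1)})
(C2,{(3,N3,19,Cse,C2),(5,N5,19,Aiml,C2)})
(C3,{(4,N4,20,aiml,C3)})
F = FOREACH C GENERATE group AS college, AVG(A.age);
DUMP F;
(C1,18.0)
(C2,19.0)
(C3,20.0)
SUM, MAX and MIN are applied in the same way as AVG. AVG returns a double, so the averages print with a decimal point.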

FILTER: select only the tuples that satisfy a condition
D = FILTER B BY marks >= 50;
DUMP D;
(2,55,18)
(4,65,20)
(5,56,19)

Problem Statement 1: Employee Salary Analysis

Objective: Analyze employee salary data to find the average salary by department, the highest
salary in each department, and the list of employees earning more than a certain threshold.

Tasks:
1. Calculate the average salary for each department.
2. Find the highest salary in each department.
3. List employees earning more than $70,000.


Employee.csv

Id  Name  Salary  Dept      Age
1   N1    56000   Testing   34
2   N2    50000   Analysis  30
3   N3    100000  ML        45
4   N4    75000   ML        40

A = LOAD 'employee.csv' USING PigStorage(',') AS (id:int, name:chararray, salary:int, department:chararray, age:int);

Calculate the average salary for each department.

B = GROUP A BY department;
DUMP B;
(Testing,{(1,N1,56000,Testing,34)})
(Analysis,{(2,N2,50000,Analysis,30)})
(ML,{(3,N3,100000,ML,45),(4,N4,75000,ML,40)})
C = FOREACH B GENERATE group AS dep, AVG(A.salary);
DUMP C;
(Testing,56000.0)
(Analysis,50000.0)
(ML,87500.0)

Find the highest salary in each department.

B = GROUP A BY department;
DUMP B;
(Testing,{(1,N1,56000,Testing,34)})
(Analysis,{(2,N2,50000,Analysis,30)})
(ML,{(3,N3,100000,ML,45),(4,N4,75000,ML,40)})
C = FOREACH B GENERATE group AS dep, MAX(A.salary);
DUMP C;
(Testing,56000)
(Analysis,50000)
(ML,100000)

List employees earning more than $70,000.

E = FILTER A BY salary > 70000;
DUMP E;
(3,N3,100000,ML,45)
(4,N4,75000,ML,40)
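As a cross-check on the three results above, the same analysis can be sketched in plain Python (not Pig code). The variable names are illustrative assumptions; the rows are the ones listed under Employee.csv.

```python
from collections import defaultdict

# Rows of Employee.csv as (id, name, salary, department, age) tuples.
employees = [
    (1, "N1", 56000, "Testing", 34),
    (2, "N2", 50000, "Analysis", 30),
    (3, "N3", 100000, "ML", 45),
    (4, "N4", 75000, "ML", 40),
]

# GROUP A BY department: collect the salaries of each department.
salaries = defaultdict(list)
for _id, _name, salary, department, _age in employees:
    salaries[department].append(salary)

# AVG(A.salary) and MAX(A.salary) per group.
avg_salary = {dept: sum(vals) / len(vals) for dept, vals in salaries.items()}
max_salary = {dept: max(vals) for dept, vals in salaries.items()}

# FILTER A BY salary > 70000.
high_earners = [row for row in employees if row[2] > 70000]

print(avg_salary)
print(max_salary)
print(high_earners)
```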

COGROUP: group tuples from multiple relations by a common key

C = COGROUP A BY age, B BY age;

Each output tuple has the form (age, {tuples from A}, {tuples from B}).

DUMP C;
(18,{(1,N1,18,Cse,C1),(2,N2,18,Ise,C1)},{(1,45,18),(2,55,18)})
(19,{(3,N3,19,Cse,C2),(5,N5,19,Aiml,C2)},{(5,56,19)})
(20,{(4,N4,20,aiml,C3)},{(4,65,20)})
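The difference between COGROUP and GROUP is that every key gets one bag per input relation. A minimal plain-Python sketch of that idea (not Pig code) follows; `cogroup` is a hypothetical helper written for illustration.

```python
# Rows of Student.csv (id, name, age, branch, college) and Performance.csv (id, marks, age).
students = [
    (1, "N1", 18, "Cse", "C1"),
    (2, "N2", 18, "Ise", "C1"),
    (3, "N3", 19, "Cse", "C2"),
    (4, "N4", 20, "aiml", "C3"),
    (5, "N5", 19, "Aiml", "C2"),
]
performance = [(1, 45, 18), (2, 55, 18), (4, 65, 20), (5, 56, 19)]

def cogroup(left, left_key, right, right_key):
    """Return (key, left_bag, right_bag) for every key seen in either input."""
    keys = sorted({left_key(r) for r in left} | {right_key(r) for r in right})
    return [
        (
            k,
            [r for r in left if left_key(r) == k],    # bag of tuples from the first relation
            [r for r in right if right_key(r) == k],  # bag of tuples from the second relation
        )
        for k in keys
    ]

# Counterpart of: C = COGROUP A BY age, B BY age;
result = cogroup(students, lambda r: r[2], performance, lambda r: r[2])
for row in result:
    print(row)
```

A key that appears in only one input would still produce a tuple, with an empty bag on the other side.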

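JOIN: combine records from relations on a key

Self join: a relation joined with itself
Inner join: two different relations; only records with matching keys are kept
Outer join: two different relations; unmatched records are also kept
    Left outer: all rows of the left relation
    Right outer: all rows of the right relation
    Full outer: all rows of both the left and the right relation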

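Self join (A1 and A2 are Student.csv loaded under two aliases, because Pig cannot join an alias with itself):

J = JOIN A1 BY id, A2 BY id;
DUMP J;
(1,N1,18,Cse,C1,1,N1,18,Cse,C1)
(2,N2,18,Ise,C1,2,N2,18,Ise,C1)
(3,N3,19,Cse,C2,3,N3,19,Cse,C2)
(4,N4,20,aiml,C3,4,N4,20,aiml,C3)
(5,N5,19,Aiml,C2,5,N5,19,Aiml,C2)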

Inner join:

IJ = JOIN A BY id, B BY id;
DUMP IJ;
(1,N1,18,Cse,C1,1,45,18)
(2,N2,18,Ise,C1,2,55,18)
(4,N4,20,aiml,C3,4,65,20)
(5,N5,19,Aiml,C2,5,56,19)

Left outer join:

LJ = JOIN A BY id LEFT OUTER, B BY id;
DUMP LJ;
(1,N1,18,Cse,C1,1,45,18)
(2,N2,18,Ise,C1,2,55,18)
(3,N3,19,Cse,C2,,,)
(4,N4,20,aiml,C3,4,65,20)
(5,N5,19,Aiml,C2,5,56,19)

Right outer join (every id in B also occurs in A, so the result equals the inner join):

RJ = JOIN A BY id RIGHT OUTER, B BY id;
DUMP RJ;
(1,N1,18,Cse,C1,1,45,18)
(2,N2,18,Ise,C1,2,55,18)
(4,N4,20,aiml,C3,4,65,20)
(5,N5,19,Aiml,C2,5,56,19)

Full outer join:

FJ = JOIN A BY id FULL OUTER, B BY id;
DUMP FJ;
(1,N1,18,Cse,C1,1,45,18)
(2,N2,18,Ise,C1,2,55,18)
(3,N3,19,Cse,C2,,,)
(4,N4,20,aiml,C3,4,65,20)
(5,N5,19,Aiml,C2,5,56,19)
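The four join variants differ only in what happens to rows without a partner. The following plain-Python sketch (not Pig code) shows that rule. The helper `join` and its `how` argument are hypothetical names chosen for illustration, it joins on the first field of each row, it assumes both inputs are non-empty, and `None` stands in for Pig's null.

```python
students = [
    (1, "N1", 18, "Cse", "C1"),
    (2, "N2", 18, "Ise", "C1"),
    (3, "N3", 19, "Cse", "C2"),
    (4, "N4", 20, "aiml", "C3"),
    (5, "N5", 19, "Aiml", "C2"),
]
performance = [(1, 45, 18), (2, 55, 18), (4, 65, 20), (5, 56, 19)]

def join(left, right, how="inner"):
    """Join two lists of tuples on their first field (the id)."""
    left_width, right_width = len(left[0]), len(right[0])
    left_ids = {lrow[0] for lrow in left}
    out = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[0] == lrow[0]]
        if matches:
            # Matching rows are concatenated in every join type.
            out.extend(lrow + rrow for rrow in matches)
        elif how in ("left", "full"):
            # Unmatched left row: pad the right side with nulls.
            out.append(lrow + (None,) * right_width)
    if how in ("right", "full"):
        # Unmatched right rows: pad the left side with nulls.
        out.extend((None,) * left_width + rrow
                   for rrow in right if rrow[0] not in left_ids)
    return out

for how in ("inner", "left", "right", "full"):
    print(how, len(join(students, performance, how)))
```

With this data only student 3 lacks a performance record, which is why the left and full outer joins have one extra row and the right outer join matches the inner join.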
Word Count in Pig Latin

Input.txt

We like BNMIT
We love BNMIT

A = LOAD 'input.txt' USING PigStorage(',') AS (line:chararray);
DUMP A;
(We like BNMIT)
(We love BNMIT)

The lines contain no commas, so each whole line is loaded into the single field line.

B = FOREACH A GENERATE TOKENIZE(line) AS word;
DUMP B;
({(We),(like),(BNMIT)})
({(We),(love),(BNMIT)})

C = FOREACH B GENERATE FLATTEN(word) AS word;
DUMP C;
(We)
(like)
(BNMIT)
(We)
(love)
(BNMIT)

D = GROUP C BY word;
DUMP D;
(We,{(We),(We)})
(like,{(like)})
(BNMIT,{(BNMIT),(BNMIT)})
(love,{(love)})

E = FOREACH D GENERATE group AS word, COUNT(C.word);
DUMP E;
(We,2)
(like,1)
(BNMIT,2)
(love,1)
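The word-count pipeline is a chain of tokenize, flatten, group and count steps. A plain-Python sketch of the same chain (not Pig code) is shown below. It uses `str.split()`, which splits on whitespace only and is therefore just an approximation of Pig's TOKENIZE, and the variable names are illustrative assumptions.

```python
from collections import Counter

# The two lines of the input file.
lines = ["We like BNMIT", "We love BNMIT"]

# TOKENIZE: one bag (list) of words per input line.
tokens = [line.split() for line in lines]

# FLATTEN: un-nest the bags so that each word becomes its own record.
words = [word for bag in tokens for word in bag]

# GROUP C BY word followed by COUNT: number of occurrences per distinct word.
counts = Counter(words)

for word, n in counts.items():
    print(word, n)
```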

Maximum Temperature per Year: Pig Latin Script

temp.csv

Year  Temp
2000  25
2000  18
2000  31
2024  40
2024  26

A = LOAD 'temp.csv' USING PigStorage(',') AS (year:int, temp:int);
DUMP A;
B = GROUP A BY year;
DUMP B;
(2000,{(2000,25),(2000,18),(2000,31)})
(2024,{(2024,40),(2024,26)})
C = FOREACH B GENERATE group AS year, MAX(A.temp);
DUMP C;
(2000,31)
(2024,40)
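The same group-then-MAX computation can be checked with a few lines of plain Python (not Pig code). The names `readings` and `max_temp` are illustrative assumptions.

```python
# (year, temp) rows of the temperature data.
readings = [(2000, 25), (2000, 18), (2000, 31), (2024, 40), (2024, 26)]

# Keep the running maximum temperature seen for each year.
max_temp = {}
for year, temp in readings:
    max_temp[year] = max(temp, max_temp.get(year, temp))

print(max_temp)
```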
