BDA Module 3: Pig Latin
Student.csv
Id Name Age Branch College
1 N1 18 Cse C1
2 N2 18 Ise C1
3 N3 19 Cse C2
4 N4 20 aiml C3
5 N5 19 Aiml C2
Performance.csv
Id Marks Age
1 45 18
2 55 18
4 65 20
5 56 19
A = LOAD 'Student.csv' USING PigStorage(',') AS
(id:int, name:chararray, age:int, branch:chararray, college:chararray);
DUMP A;
(1,N1,18,Cse,C1)
(2,N2,18,Ise,C1)
(3,N3,19,Cse,C2)
(4,N4,20,aiml,C3)
(5,N5,19,Aiml,C2)
GROUP by a key:
C = GROUP A BY age;
DUMP C;
(18,{(1,N1,18,Cse,C1),(2,N2,18,Ise,C1)})
(19,{(3,N3,19,Cse,C2),(5,N5,19,Aiml,C2)})
(20,{(4,N4,20,aiml,C3)})
GROUP ALL:
E = GROUP A ALL;
DUMP E;
(all,{(1,N1,18,Cse,C1),(2,N2,18,Ise,C1),(3,N3,19,Cse,C2),(4,N4,20,aiml,C3),(5,N5,19,Aiml,C2)})
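GROUP ALL is mostly useful as a first step towards an aggregate over the whole relation. The following is a minimal sketch, assuming Student.csv is comma-separated with the schema shown earlier; the alias F and the output field names total and avg_age are illustrative and do not appear in the original notes.

```pig
-- Load the student records (schema as in the notes).
A = LOAD 'Student.csv' USING PigStorage(',') AS
    (id:int, name:chararray, age:int, branch:chararray, college:chararray);
-- Put every record into one group whose key is the constant 'all'.
E = GROUP A ALL;
-- F (illustrative alias): count all students and average their ages.
F = FOREACH E GENERATE COUNT(A) AS total, AVG(A.age) AS avg_age;
DUMP F;
-- For the five sample rows this should print (5,18.8)
```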
Id Name Age Branch College
1 N1 18 Cse C1
2 N2 18 Ise C1
3 N3 19 Cse C2
4 N4 20 aiml C3
5 N5 19 Aiml C2
C = GROUP A BY college;
DUMP C;
(C1,{(1,N1,18,Cse,C1),(2,N2,18,Ise,C1)})
(C2,{(3,N3,19,Cse,C2),(5,N5,19,Aiml,C2)})
(C3,{(4,N4,20,aiml,C3)})
D = FOREACH C GENERATE group AS college, AVG(A.age);
-- SUM, MAX and MIN can be applied in the same way as AVG
DUMP D;
(C1,18.0)
(C2,19.0)
(C3,20.0)
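The other aggregate functions mentioned above (SUM, MAX, MIN) follow the same pattern as AVG. A small sketch, assuming the same Student.csv schema; the alias S and the output field names are illustrative.

```pig
-- Load the student records (schema as in the notes).
A = LOAD 'Student.csv' USING PigStorage(',') AS
    (id:int, name:chararray, age:int, branch:chararray, college:chararray);
-- One group per college.
C = GROUP A BY college;
-- S (illustrative alias): minimum, maximum and total age per college.
S = FOREACH C GENERATE group AS college,
        MIN(A.age) AS min_age,
        MAX(A.age) AS max_age,
        SUM(A.age) AS sum_age;
DUMP S;
-- For the sample rows, college C1 should yield (C1,18,18,36)
```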
COGROUP
Objective: Analyze employee salary data to find the average salary by department, the highest
salary in each department, and the list of employees earning more than a certain threshold.
Schema of A: (id:int, name:chararray, salary:int, department:chararray, age:int)
B = GROUP A BY department;
DUMP B;
(Testing,{(1,N1,56000,Testing,34)})
(Analysis,{(2,N2,50000,Analysis,30)})
(ML,{(3,N3,100000,ML,45),(4,N4,75000,ML,40)})
Average salary by department:
DUMP C;
(Testing,56000.0)
(Analysis,50000.0)
(ML,87500.0)
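The notes show only the output of C. One way the average per department could be produced is sketched below. The file name Employee.csv is hypothetical (the notes do not name the input file), and the comma delimiter is an assumption.

```pig
-- 'Employee.csv' is a hypothetical file name; the delimiter is assumed to be a comma.
A = LOAD 'Employee.csv' USING PigStorage(',') AS
    (id:int, name:chararray, salary:int, department:chararray, age:int);
-- One group per department.
B = GROUP A BY department;
-- AVG returns a double, so the averages print with a decimal point.
C = FOREACH B GENERATE group AS department, AVG(A.salary) AS avg_salary;
DUMP C;
```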
Highest salary in each department (using the same grouping B):
DUMP C;
(Testing,56000)
(Analysis,50000)
(ML,100000)
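A sketch of how the highest salary per department could be computed, under the same assumptions as before: Employee.csv is a hypothetical file name and the delimiter is assumed to be a comma.

```pig
-- 'Employee.csv' is a hypothetical file name; the delimiter is assumed to be a comma.
A = LOAD 'Employee.csv' USING PigStorage(',') AS
    (id:int, name:chararray, salary:int, department:chararray, age:int);
-- One group per department.
B = GROUP A BY department;
-- MAX over the salary column of each group's bag.
C = FOREACH B GENERATE group AS department, MAX(A.salary) AS max_salary;
DUMP C;
```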
Employees earning more than the threshold:
DUMP E;
(3,N3,100000,ML,45)
(4,N4,75000,ML,40)
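The threshold itself is not stated in the notes. The output above is consistent with any value from 56000 up to just below 75000, so the sketch below assumes 60000 purely for illustration. Employee.csv is again a hypothetical file name.

```pig
-- 'Employee.csv' is a hypothetical file name; the delimiter is assumed to be a comma.
A = LOAD 'Employee.csv' USING PigStorage(',') AS
    (id:int, name:chararray, salary:int, department:chararray, age:int);
-- 60000 is an assumed threshold; the notes do not give the actual value.
E = FILTER A BY salary > 60000;
DUMP E;
```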
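COGROUP of Student.csv and Performance.csv by age
Performance.csv
Id Marks Age
1 45 18
2 55 18
4 65 20
5 56 19
Result format: (age,{Student.csv tuples},{Performance.csv tuples})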
DUMP C;
(18,{(1,N1,18,Cse,C1),(2,N2,18,Ise,C1)},{(1,45,18),(2,55,18)})
(19,{(3,N3,19,Cse,C2),(5,N5,19,Aiml,C2)},{(5,56,19)})
(20,{(4,N4,20,aiml,C3)},{(4,65,20)})
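The COGROUP statement behind this output is not shown in the notes. A minimal sketch, assuming both files are comma-separated; the field names marks and age for Performance.csv are taken from its column headings.

```pig
-- Student records.
A = LOAD 'Student.csv' USING PigStorage(',') AS
    (id:int, name:chararray, age:int, branch:chararray, college:chararray);
-- Performance records.
B = LOAD 'Performance.csv' USING PigStorage(',') AS
    (id:int, marks:int, age:int);
-- COGROUP groups both relations on the same key and keeps one bag per relation.
C = COGROUP A BY age, B BY age;
DUMP C;
```

Unlike JOIN, COGROUP does not flatten the matches: each output tuple keeps the Student.csv records and the Performance.csv records for a given age in two separate bags.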
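JOIN: combines records from two relations (tables) on a matching key.
Outer join: also keeps records that have no match in the other relation, with the missing fields left null.
Example output in which each Student.csv record is paired with itself:
(2,N2,18,Ise,C1,2,N2,18,Ise,C1)
(3,N3,19,Cse,C2,3,N3,19,Cse,C2)
(4,N4,20,aiml,C3,4,N4,20,aiml,C3)
(5,N5,19,Aiml,C2,5,N5,19,Aiml,C2)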
Inner join (only ids present in both files):
DUMP IJ;
(1,N1,18,Cse,C1,1,45,18)
(2,N2,18,Ise,C1,2,55,18)
(4,N4,20,aiml,C3,4,65,20)
(5,N5,19,Aiml,C2,5,56,19)
Left outer join (all Student.csv records; id 3 has no match in Performance.csv):
DUMP LJ;
(1,N1,18,Cse,C1,1,45,18)
(2,N2,18,Ise,C1,2,55,18)
(3,N3,19,Cse,C2,,,)
(4,N4,20,aiml,C3,4,65,20)
(5,N5,19,Aiml,C2,5,56,19)
Right outer join (all Performance.csv records):
DUMP RJ;
(1,N1,18,Cse,C1,1,45,18)
(2,N2,18,Ise,C1,2,55,18)
(4,N4,20,aiml,C3,4,65,20)
(5,N5,19,Aiml,C2,5,56,19)
Full outer join (all records from both files):
DUMP FJ;
(1,N1,18,Cse,C1,1,45,18)
(2,N2,18,Ise,C1,2,55,18)
(3,N3,19,Cse,C2,,,)
(4,N4,20,aiml,C3,4,65,20)
(5,N5,19,Aiml,C2,5,56,19)
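The join statements that define IJ, LJ, RJ and FJ are not included in the notes. The sketch below shows one plausible set of definitions, assuming both files are comma-separated and are joined on id. The aliases match the DUMP statements above; the Performance.csv field names come from its column headings.

```pig
-- Student records.
A = LOAD 'Student.csv' USING PigStorage(',') AS
    (id:int, name:chararray, age:int, branch:chararray, college:chararray);
-- Performance records.
B = LOAD 'Performance.csv' USING PigStorage(',') AS
    (id:int, marks:int, age:int);
-- Inner join: only ids present in both relations.
IJ = JOIN A BY id, B BY id;
-- Left outer join: every record of A, nulls where B has no match.
LJ = JOIN A BY id LEFT OUTER, B BY id;
-- Right outer join: every record of B, nulls where A has no match.
RJ = JOIN A BY id RIGHT OUTER, B BY id;
-- Full outer join: every record of both relations.
FJ = JOIN A BY id FULL OUTER, B BY id;
DUMP FJ;
```

With this sample data, the right outer join gives the same rows as the inner join because every id in Performance.csv also exists in Student.csv, and the full outer join gives the same rows as the left outer join for the same reason.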
Word Count in Pig Latin
Input.txt
We like BNMIT
We love BNMIT
DUMP A;
(We like BNMIT)
(We love BNMIT)
DUMP B;
({(We),(like),(BNMIT)})
({(We),(love),(BNMIT)})
DUMP C;
(We)
(like)
(BNMIT)
(We)
(love)
(BNMIT)
D=GROUP C BY word;
DUMP D;
(We,{(We),(We)})
(like,{(like)})
(BNMIT,{(BNMIT),(BNMIT)})
(love,{(love)})
DUMP E;
(We,2)
(like,1)
(BNMIT,2)
(love,1)
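Only the GROUP step is written out in the notes. A sketch of the full word count pipeline consistent with the outputs of A to E is given below. It assumes Input.txt holds one sentence per line with no tab characters; the field names line, words, word and total are illustrative.

```pig
-- Each input line becomes one tuple with a single chararray field.
A = LOAD 'Input.txt' AS (line:chararray);
-- TOKENIZE splits a line into a bag of single-word tuples.
B = FOREACH A GENERATE TOKENIZE(line) AS words;
-- FLATTEN turns each bag into one output tuple per word.
C = FOREACH B GENERATE FLATTEN(words) AS word;
-- Collect identical words into one group.
D = GROUP C BY word;
-- Count the tuples in each group's bag.
E = FOREACH D GENERATE group AS word, COUNT(C) AS total;
DUMP E;
```

Grouping is case-sensitive, so "We" and "we" would be counted as different words unless the text is normalised first, for example with LOWER.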
2000 25
2000 18
2000 31
2024 40
2024 26
DUMP A;
B = GROUP A BY Year;
DUMP B;
(2000,{(2000,25),(2000,18),(2000,31)})
(2024,{(2024,40),(2024,26)})
DUMP C;
(2000,31)
(2024,40)
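The LOAD statement and the definition of C are missing from the notes. The sketch below shows how the maximum per year could be obtained. The file name Readings.txt, the space delimiter and the field name value are all assumptions, since the notes neither name the file nor label the second column.

```pig
-- 'Readings.txt', the space delimiter and the field name 'value' are assumptions.
A = LOAD 'Readings.txt' USING PigStorage(' ') AS (Year:int, value:int);
-- One group per year.
B = GROUP A BY Year;
-- Largest value recorded in each year.
C = FOREACH B GENERATE group AS Year, MAX(A.value) AS max_value;
DUMP C;
```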