Hands-On Exercise: Access HDFS with Command Line and Hue


Files and Data Used in This Exercise:

Data files (local):


$DEV1DATA/kb/*
$DEV1DATA/base_stations.tsv

 
In this exercise you will practice working with HDFS, the Hadoop Distributed
File System. You will use the HDFS command line tool and the Hue File Browser
web-based interface to manipulate files in HDFS.

Set Up Your Environment


1. Before starting the exercises, be sure you have run the course setup script in a
terminal window. (You only need to run this script once; if you ran it earlier,
you do not need to run it again.)

$ $DEV1/scripts/training_setup_dev1.sh

Explore the HDFS Command Line Interface


HDFS is already installed, configured, and running on your virtual machine.

The simplest way to interact with HDFS is by using the hdfs command. To execute
file system commands within HDFS, use the hdfs dfs command.

Copyright © 2015 Cloudera, Inc. All rights reserved. 6


Not to be reproduced without prior written consent.
1. Open a terminal window (if one is not already open) by double-clicking the
Terminal icon on the desktop.

2. Enter:

$ hdfs dfs -ls /

This shows you the contents of the root directory in HDFS. There will be
multiple entries, one of which is /user. Individual users have a “home”
directory under this directory, named after their username; your username in
this course is training, therefore your home directory is /user/training.

3. Try viewing the contents of the /user directory by running:

$ hdfs dfs -ls /user

You will see your home directory in the directory listing.

4. List the contents of your home directory by running:

$ hdfs dfs -ls /user/training

There are no files yet, so the command exits silently. This differs from running
hdfs dfs -ls /foo, which refers to a directory that doesn’t exist and therefore
displays an error message.

Note that the directory structure in HDFS has nothing to do with the directory
structure of the local filesystem; they are completely separate namespaces.
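
The difference between an empty directory and a missing one can also be checked from a script. The following is a minimal sketch, assuming the hdfs client is on your PATH. The function name hdfs_dir_state is hypothetical and not part of Hadoop; it relies on hdfs dfs -test -d, which exits with status 0 when the path is a directory.

```shell
#!/bin/sh
# Report whether an HDFS path is an existing directory.
# hdfs_dir_state is a hypothetical helper name, not part of Hadoop.
hdfs_dir_state() {
    # "hdfs dfs -test -d PATH" exits 0 if PATH is a directory in HDFS.
    if hdfs dfs -test -d "$1"; then
        echo "directory exists: $1"
    else
        echo "no such directory: $1"
    fi
}

# Only run the examples when the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
    hdfs_dir_state /user/training   # the home directory from step 4
    hdfs_dir_state /foo             # the missing directory mentioned above
fi
```

Because the check uses the exit status rather than the listing output, it gives the same answer whether or not the directory contains files.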

Upload Files to HDFS


Besides browsing the existing filesystem, another important thing you can do with
the HDFS command line interface is to upload new data into HDFS.

5. Start by creating a new top-level directory for exercises. You will use this
directory throughout the rest of the course.

$ hdfs dfs -mkdir /loudacre

6. Change directories to the local filesystem directory containing the sample data
we will be using in the course.

$ cd $DEV1DATA

If you perform a regular Linux ls command in this directory, you will see
several files and directories used in this class. One of the data directories is kb.
This directory holds Knowledge Base articles that are part of Loudacre’s
customer service website.

7. Insert this directory into HDFS:

$ hdfs dfs -put kb /loudacre/

This copies the local kb directory and its contents into a remote HDFS directory
named /loudacre/kb.

8. List the contents of the new HDFS directory now:

$ hdfs dfs -ls /loudacre/kb

You should see the KB articles that were in the local directory.

Relative paths

In HDFS, any relative (non-absolute) paths are considered relative to your home
directory. There is no concept of a “current” or “working” directory as there is in
Linux and similar file systems.
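
The rule above can be sketched in plain shell. This illustration does not call HDFS at all; it only mimics the resolution rule, assuming the default /user/<username> home directory layout described earlier. The function name resolve_hdfs_path and its variable names are hypothetical.

```shell
#!/bin/sh
# Mimic how HDFS resolves a path, assuming home directories live
# under /user/<username>.
# resolve_hdfs_path is a hypothetical helper, not a Hadoop command.
resolve_hdfs_path() {
    hdfs_path=$1
    hdfs_user=${2:-$(id -un)}   # default to the current Linux username
    case $hdfs_path in
        /*) echo "$hdfs_path" ;;                   # absolute: used as-is
        *)  echo "/user/$hdfs_user/$hdfs_path" ;;  # relative: under the home directory
    esac
}

resolve_hdfs_path kb training            # prints /user/training/kb
resolve_hdfs_path /loudacre/kb training  # prints /loudacre/kb
```

In other words, for the training user, hdfs dfs -ls kb refers to /user/training/kb no matter which local directory the terminal is in.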

9. Practice uploading a directory, then remove it, as it is not actually needed for
the exercises.

$ hdfs dfs -put $DEV1DATA/calllogs /loudacre/


$ hdfs dfs -rm -r /loudacre/calllogs
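
Repeating an upload of a directory that is already in HDFS typically fails with a "File exists" error. The sketch below shows one way to make an upload safe to re-run, assuming the hdfs client is on your PATH. The function name upload_once is hypothetical; the options it uses (-mkdir -p, -test -e, and -put) are standard hdfs dfs options.

```shell
#!/bin/sh
# Upload a local directory into HDFS only when it is not already there,
# so the script can be re-run safely.
# upload_once is a hypothetical helper name, not part of Hadoop.
upload_once() {
    local_dir=$1
    hdfs_parent=$2
    target="$hdfs_parent/$(basename "$local_dir")"

    # -mkdir -p creates missing parents and succeeds if the directory exists.
    hdfs dfs -mkdir -p "$hdfs_parent" || return 1

    # -test -e exits 0 when the path already exists in HDFS.
    if hdfs dfs -test -e "$target"; then
        echo "skipped: $target already exists"
    else
        hdfs dfs -put "$local_dir" "$hdfs_parent/" && echo "uploaded: $target"
    fi
}

# Only run the example when the hdfs client and course data are available.
if command -v hdfs >/dev/null 2>&1 && [ -d "${DEV1DATA:-}/kb" ]; then
    upload_once "$DEV1DATA/kb" /loudacre
fi
```

If you already completed step 7, the example reports that /loudacre/kb was skipped rather than uploading it again.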

View HDFS files


Now view some of the data you just copied into HDFS.

10. Enter:

$ hdfs dfs -cat /loudacre/kb/KBDOC-00289.html | tail -n 20

This prints the last 20 lines of the article to your terminal. This command is
handy for viewing HDFS data. An individual file is often very large, making it
inconvenient to view the entire file in the terminal. For this reason, it’s often a
good idea to pipe the output of the hdfs dfs -cat command into head, tail, more,
or less.
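
The same piping idea can be wrapped in a small helper. This is a minimal sketch, assuming the hdfs client is on your PATH; the function name hdfs_head is hypothetical. It also shows the built-in -tail option, which prints the last kilobyte of a file without a pipe.

```shell
#!/bin/sh
# Print only the first N lines of an HDFS file (default 10).
# hdfs_head is a hypothetical helper name, not part of Hadoop.
hdfs_head() {
    hdfs dfs -cat "$1" | head -n "${2:-10}"
}

# Only run the examples when the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
    # First 5 lines of the article used in step 10.
    hdfs_head /loudacre/kb/KBDOC-00289.html 5
    # The built-in -tail option prints the last kilobyte of the file.
    hdfs dfs -tail /loudacre/kb/KBDOC-00289.html
fi
```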

11. To download a file to work with on the local filesystem, use the hdfs dfs -get
command. This command takes two arguments: an HDFS path and a local
path. It copies the HDFS contents into the local filesystem:

$ hdfs dfs -get \
  /loudacre/kb/KBDOC-00289.html ~/article.html
$ less ~/article.html
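
If you want to confirm that a downloaded copy matches what is stored in HDFS, one option is to stream the HDFS file again and compare it with the local file. The sketch below assumes the hdfs client is on your PATH; the function name fetch_and_verify is hypothetical. Note that hdfs dfs -get fails if the local destination already exists, so the example downloads into a fresh temporary directory.

```shell
#!/bin/sh
# Download an HDFS file and confirm the local copy is byte-identical.
# fetch_and_verify is a hypothetical helper name, not part of Hadoop.
fetch_and_verify() {
    src=$1
    dest=$2
    hdfs dfs -get "$src" "$dest" || return 1
    # Stream the HDFS copy again and compare it with the local file;
    # cmp -s is silent and exits 0 only when the two inputs match.
    if hdfs dfs -cat "$src" | cmp -s - "$dest"; then
        echo "verified: $dest"
    else
        echo "mismatch: $dest"
        return 1
    fi
}

# Only run the example when the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
    tmp_dir=$(mktemp -d)
    fetch_and_verify /loudacre/kb/KBDOC-00289.html "$tmp_dir/article.html"
    rm -rf "$tmp_dir"
fi
```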

12. There are several other operations available with the hdfs dfs command to
perform most common filesystem manipulations: mv, cp, mkdir, etc.

In the terminal window, enter:

$ hdfs dfs

You see a help message describing all the file system commands provided by
HDFS.

Experiment with a few of these commands if you like.

Use the Hue File Browser to browse, view and manage files

13. Start Firefox on the VM (using the shortcut provided on your desktop or taskbar).

14. Click the Hue bookmark, or visit http://localhost:8888.

15. Because this is the first time anyone has logged in to Hue on this server, you will
be prompted to create a new user account. Enter username training and
password training, then click Create Account. (If prompted, you may click
“Remember Password”.)

• Note: When you first log in to Hue you may see a misconfiguration
warning. This is because not all the services Hue depends on are running
on the course VM. You can disregard the message.

16. Hue has many useful features, many of which will be covered later in the course.
For now, to access HDFS, click File Browser in the Hue menu bar. (The mouse-over
text is “Manage HDFS”.)

• Note: If your Firefox window is too small to display the full menu names,
you will see just the icons instead.

17. By default, the contents of your HDFS home directory (/user/training)
display. In the directory path name, click the leading slash (/) to view the HDFS
root directory.

18. The contents of the root directory display, including the loudacre directory you
created earlier. Click on that directory to see the contents.

19. Click on the name of the kb directory to see the knowledge base articles you
uploaded.

20. View one of the files by clicking on the name of any one of the articles.

21. In the file viewer, the contents of the file are displayed on the right. In this case,
the file is fairly small, but typical files in HDFS are very large, so rather than
displaying the entire contents on one screen, Hue provides buttons to move
between pages.

22. Return to the directory view by clicking View file location in the Action panel
on the left.

23. Click the up arrow to return to the /loudacre base directory.

24. To upload a file, click the Upload button. You can choose to upload a plain file,
or to upload a zipped file (which will be automatically unzipped after upload).
In this case, select Files, then click Select Files.

25. A Linux file browser appears. Browse to
/home/training/training_materials/data.

26. Choose base_stations.tsv and click the Open button.

27. When the file has uploaded, it will be displayed in the directory. Click the
checkbox next to the file’s icon, then click the Actions button to see a list of
actions that can be performed on the selected file(s).

28. Optional: explore the various file actions available. When you’ve finished, select
any unneeded files you have uploaded and click the Move to trash button to
delete.

This is the end of the Exercise
