Linux For Beginners Complete
How do I:
●
Connect to the server?
●
Transfer the files holding my data to the server?
●
Launch the program?
●
Specify where to find the files with the data it will process?
●
Specify where to write the files with the results it will generate?
●
Organize all these files in folders so as not to get lost later on?
●
Run several programs “simultaneously” (or several “instances” of the same
program with different parameters or input data)?
●
Stop an instance of the program if it runs for too long?
●
{Share|protect} my files containing input data or results {with my colleagues|
from my competitors}?
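[Figure: actions linking users, processes and files. Users and processes: Run, Stop, Interrupt/Resume. Users and files: Transfer, Read, Write, Remove, Organize, (Un)Share. Processes and files: Read, Write, Remove.]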
7 Processes
Linux for beginners
Linux for Beginners | Outline
Purpose of an Operating System
The Operating System (OS)
●
Open-source and free (as in beer)
– Its code can be freely copied, modified and redistributed
●
Offering a vast software catalogue :
– Office suites : LibreOffice
– Internet tools : browsers (Firefox, Chrome), e-mail programs (Thunderbird, Evolution)
– Multimedia : audio/video playback tools (VLC, Totem)
– Graphics : image manipulation (Gimp), 3D modeling (Blender)
– Software development : languages (Python, Java, C/C++…), environments (Eclipse,
IDLE, PyDev, DDD)
●
Scientific disciplines included :
– Bioinformatics : blast, emboss, phylip, mafft, clustal, trimal...
Connecting | The Terminal
The Terminal :
●
A means to “communicate” with a machine through a command line,
in the context of a session
File Transfer using FileZilla
Validating the connection
Enter password
Use the ordinary copy/paste/drag/drop between local files and remote files.
Structure of the Command Line
[stage01@slurm0 ~]$ head -n 20 insulin.fas #print the first 20 lines
[stage01@slurm0 ~]$ : the prompt. It displays the current user’s login (stage01), the host (or
machine) name (slurm0), and the current directory, or working directory (~).
head : the name of the program to run (first word following the prompt).
●
The space character is used to separate the various fields of the command line.
●
The character case (upper/lower) is important ( head is not the same as HEAD )
●
Each command has its own set of arguments and options
●
The Enter key is used to run the command.
To get more detailed documentation, use the man (i.e. manual) command, giving it as an
argument the name of the command whose documentation is to be displayed.
[stage01@slurm0 ~]$ man ls
LS(1) User Commands LS(1)
NAME
ls - list directory contents
SYNOPSIS
ls [OPTION]... [FILE]...
DESCRIPTION
List information about the FILEs (the current directory by default). Sort
entries alphabetically if none of -cftuvSUX nor --sort is specified.
The File System | Purpose & Basic Concepts
In most operating systems, data are stored in files (text, images, tables,
sequences, measurement series…). The number of files grows quickly, and
it becomes necessary to organize them to avoid getting lost. This is done by
grouping them in folders, or directories. Folders can be stored in other
folders, which in turn can be stored in other folders, and so on.
The Concept
Slash or root directory : the unique (top level) entry point for the
whole file system. A path leading to a file or a directory can always
be specified starting from the root directory
/tmp : a directory where anyone can read and write files (but only a file’s creator
has the right to remove it). Handy to share data with colleagues on the
same machine (but beware of the volume of data).
SBR Specifics
/shared/projects : top-level directory for project directories, destined to hold project
data that needs to be backed up (initial data sets, final results).
Command execution takes place in the context of a session, defining at all times a
current user (the user running the command) and a current directory or
working directory (the directory where the command has been typed in).
When starting a new session, the current directory is always the home directory.
What can be found “here” (what are the contents of the current directory) ?
[stage01@slurm0 ~]$ ls #list (contents of working directory)
Bureau Documents Images Modèles Musique Public
Téléchargements Vidéos
Navigating in the File System
To change the current directory (a.k.a “move around” in the directory tree)
[stage01@slurm0 ~]$ pwd # print working directory
/home/fr2424/stage/stage01
Absolute paths
Referring to files and directories located in the file system, as command arguments or options, is done
through paths.
Absolute paths are built starting from the root directory and adding the subdirectories one by one
separated by a slash character (/), until the desired file or directory is reached.
Examples :
/home/fr2424/stage/stage01
/home/fr2424/stage/stage03/images/logo.png
Relative paths
Relative paths are built with the current directory as starting point and traversing the directory
tree upwards or downwards, until the desired directory or file is reached. The successive path
components are separated with slash (/) characters, and :
- On each upward step in the tree, two dots (..) are added to the path.
- On each downward step in the tree, the name of the directory is added to the path.
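The two kinds of paths can be compared side by side. The sketch below is only an illustration: it builds a small, hypothetical tree (stage01, stage03/images/logo.png) inside a temporary directory rather than under /home, and the `-p` option of mkdir (create missing parent directories) is an addition not covered in this course.

```shell
# Build a small, hypothetical directory tree in a temporary location.
root=$(mktemp -d)
mkdir -p "$root/stage01" "$root/stage03/images"
touch "$root/stage03/images/logo.png"

# Start from stage01, as if it were the home directory.
cd "$root/stage01"

# Relative path: one step up (..), then down into stage03 and images.
ls ../stage03/images/logo.png

# Absolute path to the same file, built from the top of the tree.
ls "$root/stage03/images/logo.png"

# cd accepts both kinds of paths; pwd shows where we ended up.
cd ../stage03/images
here=$(pwd -P)
echo "$here"
```

Both ls commands refer to the same file; only the starting point of the path differs.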
A (provisional) conclusion
A shortcut to refer to the current directory :
- The dot (“.” character) always refers to the current directory
[stage01@slurm0 ~]$ pwd
/home/fr2424/stage/stage01
[stage01@slurm0 ~]$ cd . # ???
[stage01@slurm0 ~]$ pwd
/home/fr2424/stage/stage01
Use case : run a command file located in the current directory : ./mycommand
Teaser : cdmystuff is a so-called alias which we’ll learn to define at the end of the day.
Using the * character in an argument of ls restricts the list to the files and directories whose names
match the pattern formed by the argument:
- image* : all files starting with the letters image (image-001, images-des-vacances,
imagettes)
- *seq* : all files having the letters seq in their names (sequences, mes-sequences,
maiseqoidon)
- * : each and every file (no restriction)
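The patterns above can be tried safely on empty files. This is a minimal sketch: the file names are the ones quoted in the list plus a hypothetical notes.txt, all created in a temporary directory.

```shell
# Create empty files with the names used in the examples.
demo=$(mktemp -d)
cd "$demo"
touch image-001 images-des-vacances imagettes sequences mes-sequences notes.txt

ls image*     # names starting with "image"
ls *seq*      # names containing "seq"
ls *          # every file in the directory

# Count the matches (wc -l counts lines, tr removes padding spaces).
n_image=$(ls image* | wc -l | tr -d ' ')
n_seq=$(ls *seq* | wc -l | tr -d ' ')
echo "image*: $n_image files, *seq*: $n_seq files"
```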
Using autocompletion
To avoid typing long file names, use the [TAB] key. Pressing the [TAB] key once
launches a file (or directory) name lookup to determine which ones start with what has already been typed.
- If there is a single match, it is added to the command line.
- If there are several matches, another press of the [TAB] key lists them all.
●
Create the following directory structure in your home
directory:
Solution
$ cd
$ mkdir myproject
$ cd myproject
$ mkdir finalresult input script tmp
$ cd input
$ mkdir cmd fasta
●
Check it has been correctly created by displaying it :
Solution
$ tree
$ tree -L 1
$ tree -L 2
●
Change to the fasta directory
●
Create a parser directory in the script directory
Solution
$ mkdir ../../script/parser
Copying data
The cp (copy) command copies one or more files, and even whole directory trees. It is used as
follows : cp SRC DEST where SRC is the path to the existing data (the source) and DEST is
the path to the destination location.
Deleting data
The rm (remove) command deletes the file(s) whose path(s) are given as arguments.
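A short sketch of cp and rm on throwaway data may help before trying them on real files. All file and directory names below are hypothetical, and everything happens in a temporary directory. The -r (recursive) option is what lets cp and rm handle whole directory trees.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p src/sub dest
echo "ACGT" > src/seq.fas
echo "notes" > src/sub/notes.txt

cp src/seq.fas dest/            # copy one file into a directory
cp src/seq.fas dest/copy.fas    # copy one file under a new name
cp -r src backup                # copy a whole directory tree

rm dest/copy.fas                # delete one file
rm -r backup/sub                # delete a directory and all its contents

ls -R .                         # list what is left, recursively
```

Note that rm does not ask for confirmation by default and there is no recycle bin, so it is worth checking the path with ls before deleting.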
●
Copy the file to its destination directory
Solution
$ cp Linux-Initiation/insulin.fas myproject/input/fasta
●
Move the insulin.fas file from the
input/fasta directory to the tmp directory
Solution
$ cd
$ cd myproject/finalresult
$ mv ../input/fasta/insulin.fas ../tmp
●
Remove the tmp directory and all its contents
Solution
$ cd
$ rm -r myproject/tmp
Manipulating File Contents
Linux doesn’t put any requirements on file name extensions (.txt, .csv, .pdf, .html, etc.).
Any extension can be given to any type of file.
Linux uses other recipes to determine the nature of a file’s contents (cf. the file command).
The head command displays the first lines of a file (10 by default). head -n N displays the first N lines; head -2 is a shorthand for head -n 2.
[stage01@slurm0 ~]$ head -2 acteur.csv
First Name;Last Name;Age
Chuck;Norris;70
The tail command displays the last lines of a file (10 by default). tail -n N displays the last N lines; tail -2 is a shorthand for tail -n 2.
[stage01@slurm0 ~]$ tail -2 acteur.csv
Sylvester;Stallone;64
Steven;Seagal;59
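The same commands can be tried on a stand-in file. The sketch below writes four lines taken from the outputs above into a temporary file (the real acteur.csv may contain more lines) and uses the explicit -n form.

```shell
# A stand-in for acteur.csv, written to a temporary file.
csv=$(mktemp)
printf '%s\n' 'First Name;Last Name;Age' 'Chuck;Norris;70' \
    'Sylvester;Stallone;64' 'Steven;Seagal;59' > "$csv"

first=$(head -n 1 "$csv")       # first line only
last_two=$(tail -n 2 "$csv")    # last two lines

echo "$first"
echo "$last_two"
```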
The less command displays the contents of a file “one page at a time”. The spacebar moves
from the current page to the next, and “q” quits. The up and down arrow keys move back and
forth in the file.
[stage01@slurm0 ~]$ less insulin.fas
>gi|163659904|ref|NM_000618.3| Homo sapiens insulin-like growth factor 1 (somatomedin C) (IGF1), transcript variant 4, mRNA
TTTTGTAGATAAATGTGAGGATTTTCTCTAAATCCCTCTTCTGTTTGCTAAATCTCACTGTCACTGCTAA
(…)
insulin.fas
The cat command displays the entire contents of a file.
Both more and less can search for a text string in the file: type a slash (/)
followed by the text to search for, then press Enter. Ex : /variant
By default, grep is case sensitive. To override this behaviour, the -i (ignorecase) option can be used.
As is the case for almost every command, options for grep can be combined.
[stage01@slurm0 ~]$ grep -c -i TRANSCRIPT insulin.fas
5
grep can also display the lines not containing the pattern, thanks to the -v (invert-match) option.
grep can search for a pattern in all the files of a directory tree, with the -r (recursive)
option. In this configuration, the second argument has to be the name of a directory. Each line of
output is then prefixed with the name of the file it refers to.
[stage01@slurm0 ~]$ grep -r -c -i TRANSCRIPT .
./insulin_vs_nt.blast:144
./acteur.csv:0
./insulin.fas:5
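The options above (-c, -i, -v, -r) can be combined freely. Here is a small sketch on two hypothetical files, a.fas and b.csv, created in a temporary directory; the counts in the comments follow from the four lines written into a.fas.

```shell
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' '>seq1 transcript variant 1' 'ACGT' \
    '>seq2 Transcript variant 2' 'TTGA' > a.fas
printf '%s\n' 'First Name;Last Name;Age' 'Chuck;Norris;70' > b.csv

grep transcript a.fas       # case sensitive: only the seq1 line
grep -i transcript a.fas    # ignore case: seq1 and seq2 lines

n_sensitive=$(grep -c transcript a.fas)         # 1
n_insensitive=$(grep -c -i transcript a.fas)    # 2
n_inverted=$(grep -c -v -i transcript a.fas)    # 2 lines without the pattern

grep -r -c -i transcript .  # one count per file in the tree
```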
Find two ways of displaying the first line of the acteur.csv file (using
two different commands)
Solution
$ head -1 acteur.csv
$ grep First acteur.csv
Find two ways to display the last three lines of the acteur.csv file
(using two different commands) [Granted, one is quite tricky.]
Solution
$ tail -3 acteur.csv
$ grep -v First acteur.csv
The size of a file can be displayed with the -l option of the ls command (add -h for
human-readable units). The file size is displayed in the fifth column.
[stage01@slurm0 ~]$ ls -l insulin_vs_nt.blast
-rw-r--r-- 1 stage08 stage 30025889 May 03 22:42 insulin_vs_nt.blast
[stage01@slurm0 ~]$ ls -l -h insulin_vs_nt.blast
-rw-r--r-- 1 stage08 stage 29M May 03 22:42 insulin_vs_nt.blast
The wc (word count) command also displays information about the size of a file : line, word and
character count. Using the -l option restricts the output to the number of lines.
[stage01@slurm0 ~]$ wc insulin_vs_nt.blast
622756 2377511 30025889 insulin_vs_nt.blast
[stage01@slurm0 ~]$ wc -l insulin_vs_nt.blast
622756 insulin_vs_nt.blast
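To see how the numbers reported by ls -l and wc relate, the sketch below uses a tiny temporary file whose size can be checked by hand: two lines, three words, fourteen characters (newlines included). The -w and -c options of wc (words only, bytes only) are additions not shown above.

```shell
f=$(mktemp)
printf 'one two\nthree\n' > "$f"

ls -l "$f"    # the fifth column is the size in bytes
wc "$f"       # lines, words, characters, file name

# Reading from standard input (<) makes wc print the number alone.
lines=$(wc -l < "$f" | tr -d ' ')
words=$(wc -w < "$f" | tr -d ' ')
bytes=$(wc -c < "$f" | tr -d ' ')
echo "$lines lines, $words words, $bytes bytes"
```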
[stage01@slurm0 ~]$ du -h .
64.0K ./tmp
4.0K ./cours
29M .
Adding the -s (summary) option displays the total volume occupied by the directory (and its contents).
[stage01@slurm0 ~]$ du -s -h .
29M .
When specifying a directory as argument, df displays information about the mounted file system
containing the directory.
Decompression is achieved with gunzip (for .gz files) or bunzip2 (for .bz2 files).
The compression ratio depends on the file contents : better for text files, quite low for already
compressed data (images, sounds, videos).
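A round trip on a repetitive text file shows the effect. This is a sketch under assumptions: gzip is assumed to be the compression counterpart of gunzip on the machine, and seq.txt is a hypothetical file of 1000 identical lines created in a temporary directory.

```shell
work=$(mktemp -d)
cd "$work"

# Write 1000 identical lines of 20 letters (21 bytes each with the newline).
i=0
while [ "$i" -lt 1000 ]; do
    echo "ACGTACGTACGTACGTACGT"
    i=$((i + 1))
done > seq.txt

before=$(wc -c < seq.txt | tr -d ' ')
gzip seq.txt                       # seq.txt is replaced by seq.txt.gz
after=$(wc -c < seq.txt.gz | tr -d ' ')
gunzip seq.txt.gz                  # seq.txt.gz is replaced by seq.txt
restored=$(wc -c < seq.txt | tr -d ' ')

echo "original: $before bytes, compressed: $after bytes, restored: $restored bytes"
```

Very repetitive text like this compresses far better than real data usually does; the point is only that the restored file has the original size.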
Use case :
- One of my directories, allsequences, contains a myriad of sequence files related to various organisms.
- The file names hint at the organism to which the sequences they contain belong (human_seq*.fasta,
mouse_seq*.fasta, ecto_seq*.fasta, etc.)
- I wish to apply routines to these sequences whose parameters may depend on the organism. And I wish to
group the result files in organism-specific (a.k.a. separate) directories (process_human/,
process_mouse/, process_ecto/, etc.)
- But I want to avoid copying the sequence data in these process_*/ directories.
A solution relies on the creation, in the process_*/ directories, of shortcuts to each sequence file belonging
to a given organism, using the ln -s (link, symbolic) command.
Ex. 1 : Creating a symbolic link for a single file : the first argument is the name of the existing file (or
directory), and the second argument the name of the shortcut to create, or the directory in which to create a
shortcut with the same name.
[stage01@slurm0 ~]$ ln -s allsequences/ecto_seq_zAb3.fas process_ecto/
[stage01@slurm0 ~]$ ls -l process_ecto/
lrwxrwxrwx 1 stage01 stage 30 May 05 08:44 ecto_seq_zAb3.fas -> allsequences/ecto_seq_zAb3.fas
[Figure: the /home/fr2424/stage tree with allsequences/ecto_seq_zAb3.fas and the link process_ecto/ecto_seq_zAb3.fas. Second panel: the same tree with “????” in place of the file under allsequences.]
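One detail is easy to miss when the first argument of ln -s is a relative path: the text stored in the link is resolved relative to the directory that contains the link, not relative to the directory where ln was typed. The sketch below illustrates this with hypothetical names (demo.fas, process_demo) in a temporary directory; it is an illustration of standard symbolic link behaviour, not a transcript from the course machine.

```shell
work=$(mktemp -d)
cd "$work"
mkdir allsequences process_demo
echo ">seq" > allsequences/demo.fas

# Target written relative to the current directory: from inside
# process_demo/, "allsequences/demo.fas" does not exist.
ln -s allsequences/demo.fas process_demo/broken.fas

# Target written relative to the link's own directory.
ln -s ../allsequences/demo.fas process_demo/ok.fas

ls -l process_demo/

# [ -e ] follows the link and is false when the target is missing.
if [ -e process_demo/broken.fas ]; then broken=resolves; else broken=dangling; fi
if [ -e process_demo/ok.fas ]; then ok=resolves; else ok=dangling; fi
echo "broken.fas: $broken, ok.fas: $ok"
```

Using an absolute path as the first argument avoids the question altogether, at the cost of breaking the link if the whole tree is moved.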
Symbolic links can also be used to transparently manage software package updates : a command can “point”
to a specific version of a tool. When the tool is updated, a new version of the command can be installed
alongside the previous one, and the link to the command is adjusted to point to the latest version.
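[Figure: under /usr/local, the java directory holds jdk7 and jdk8, each with its own bin/java; the bin directory holds a java entry that links to one of them.]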
Exercises
●
Create a symbolic link in your home directory pointing to the script directory.
$ cd
$ ln -s myproject/script .
●
Copy the acteur.csv file to finalresult/test-TP.txt
$ cd
$ cp Linux-Initiation/acteur.csv myproject/finalresult/test-TP.txt
●
Create a symbolic link in your home directory pointing to the test-TP.txt file
located in the finalresult directory.
$ cd
$ ln -s myproject/finalresult/test-TP.txt .
●
Display the contents of the test-TP.txt file located in your home directory
$ cd
$ cat test-TP.txt
●
Delete the finalresult/test-TP.txt file
$ cdmystuff
$ rm -f myproject/finalresult/test-TP.txt
●
Conclusion ?
$ cd
$ cat test-TP.txt
cat: test-TP.txt: No such file or directory
The solution shows the commands used for the stage01 account.
The scp command is used to copy files to/from another Linux (UNIX) machine. To use it, it’s necessary
to have an account (user name and password) on the remote machine. scp is used like cp, but one of
its arguments (source or destination) includes information about the remote machine as follows:
username@hostname:
Ex1. : Copying a local file to a remote machine : information about the remote machine is found in the
destination argument of the command.
[stage01@slurm0 ~]$ scp Linux-Initiation.tar.gz stage01@sbr2:cours/
Ex2. : Copying a directory structure from a remote machine : the information about the remote machine
is in the source argument of the command.
[stage01@slurm0 ~]$ scp -r stage01@sbr2:cours/ .
Ex2. : Recursively (-r option) fetching contents from an HTML page with wget (handle with care !)
The remote directory structure is recreated in the directory where the wget command is run.
Users, Groups & Access Rights
The id command displays the information about the identity of the user running the command.
[stage01@slurm0 ~]$ id
uid=7069(stage01) gid=1013(stage) groups=1013(stage)
The ls -l command displays information about the ownership (which user/group) of files and
directories.
[stage01@slurm0 ~]$ ls -l acteur.csv
-rw-r--r-- 1 stage01 stage 84 May 13 21:19 acteur.csv
In this output, stage01 is the owner and stage is the group.
Every user is limited to what she has access to on the system: file read/write access, program
execution, allocated disk or memory space.
Third character : x (execution allowed) or - (execution forbidden)
Example
-rw-r--r--
Owner rw- Read and write access allowed, execution forbidden
The chmod command has an -R (recursive) option recursively applying the access rights to all files and
subdirectories of its destination argument.
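The effect of chmod can be observed with ls -l. The sketch below is an illustration on hypothetical files in a temporary directory. It uses the symbolic mode syntax of chmod (u, g, o for owner, group, others, combined with =, + or -), which is standard but is not described in this text, so treat that part as an assumption about what the course covers.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p project/data
touch project/notes.txt project/data/results.txt

# Owner: read + write; group and others: read only (i.e. -rw-r--r--).
chmod u=rw,g=r,o=r project/notes.txt project/data/results.txt
before=$(ls -l project/data/results.txt | cut -c1-10)

# -R applies the change to project and everything below it:
# remove all access for "others", leave owner and group untouched.
chmod -R o-rwx project
after=$(ls -l project/data/results.txt | cut -c1-10)

echo "before: $before"
echo "after:  $after"
```

The cut -c1-10 part keeps only the first ten characters of the ls -l line, which hold the file type and the three permission triplets.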
The chgrp command defines a new group for a file or directory. The user running the
command must be a member of that group.
[stage01@slurm0 ~]$ chgrp autregroupe acteur.csv
[stage01@slurm0 ~]$ ls -l acteur.csv
-r--r----- 1 stage01 autregroupe 84 May 13 21:19 acteur.csv
There is a command to change file ownership (chown), but its use is restricted to the system
administrator...
Processes
Some Definitions
A process is a currently running program. Each time a user issues a command (runs a program),
the operating system loads it into memory and starts its execution.
In order to run smoothly, a process needs memory and processor time (CPU). It is the duty of the
operating system to proceed to the optimal allocation of these resources among all the processes
running “simultaneously”. The load of a machine reflects the activity of all the active processes at any
given moment.
Like a file, a process has an owner (user) and a group, and associated rights or permissions.
The user has the ability, to some extent, to control the execution of her processes: she can stop
them “by force”, interrupt them to resume them later on, or modify their priority.
A process running in the current session can be (brutally) stopped by typing the Ctrl-C key
combination.
[stage01@slurm0 ~]$ gedit acteur.csv
^C
[stage01@slurm0 ~]$
A process killed by Ctrl-C frees all the resources (memory, open files) in its possession.
A process interrupted with Ctrl-Z is “frozen”, and keeps all the resources it was allocated (except for processor time).
It is assigned a job identifier (not to be confused with the process identifier, cf. following slides).
The jobs command lists all interrupted processes of the current session.
[stage01@slurm0 ~]$ gedit insulin.fas
^Z
[2]+ Stopped gedit insulin.fas
[stage01@slurm0 ~]$ jobs
[1]- Stopped gedit acteur.csv
[2]+ Stopped gedit insulin.fas
A process can be run directly in background mode by adding an ampersand (&) to the command line.
Ex. 2 : Getting the detailed list of the processes in the current session with the -f (full) option.
[stage01@slurm0 ~]$ ps -f
UID PID PPID C STIME TTY TIME CMD
stage01 16175 16174 0 17:21 pts/14 00:00:00 -bash
stage01 20693 1 0 17:37 pts/14 00:00:00 dbus-launch --autolaunch 9b7328b
stage01 26357 16175 0 17:58 pts/14 00:00:00 gedit insulin.fas
stage01 28638 16175 4 18:06 pts/14 00:00:00 ps -f
Additional information is shown about the user (UID), the PID of the parent process (PPID), the start
time (STIME), the execution time (TIME) and the complete command line (CMD).
Ex. 4 : Listing all the processes currently present on the machine with the -elf (every process, long, full) options.
[stage01@slurm0 ~]$ ps -elf
0 S uguyet 29185 29184 0 80 0 - 28353 n_tty_ Apr19 pts/35 00:00:00 /bin/bash
0 S lgueguen 30113 10845 0 80 0 - 26364 n_tty_ May12 pts/32 00:00:00 less macros.xml
4 S root 30980 5864 0 80 0 - 28926 unix_s May12 ? 00:00:00 sshd: nhenry [priv]
5 S nhenry 31153 30980 0 80 0 - 28961 poll_s May12 ? 00:00:01 sshd: nhenry@notty
0 S nhenry 31154 31153 0 80 0 - 14977 poll_s May12 ? 00:00:01 /usr/libexec/openssh/sft
0 S pmandon 31370 2698 0 80 0 - 27641 n_tty_ May12 pts/64 00:00:00 /bin/bash
0 S lberdjeb 31598 1 0 80 0 - 26526 wait 12:22 ? 00:00:00 /bin/sh /opt/sge/qlogin.
Ex. 1 : Brutally stopping a process using kill with the -KILL option
[stage01@slurm0 ~]$ gedit acteur.csv &
[1] 27908
[stage01@slurm0 ~]$ kill -KILL 27908
[stage01@slurm0 ~]$
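The same sequence can be scripted with a harmless stand-in process. In this sketch, sleep 30 plays the role of the program to stop, `$!` holds the PID of the last background process, and the -TERM signal (a polite request to terminate, the default for kill) is used instead of -KILL; these choices are illustrative assumptions rather than the course's own example.

```shell
# Start a stand-in process in the background and remember its PID.
sleep 30 &
pid=$!
echo "started sleep with PID $pid"

# Ask the process to terminate. kill -KILL "$pid" would stop it
# without giving it any chance to clean up.
kill -TERM "$pid"

# wait collects the exit status of the background process.
# In bash, a process ended by a signal reports 128 + the signal number.
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status: $status"
```

Trying -TERM first and keeping -KILL as a last resort gives well-behaved programs a chance to close their files properly.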
Every process is spawned from a parent process. For instance, opening a session yields a new process
which will be the parent process of all of the commands typed in on the command line.
The process with PID 1 is the process that, when the machine was started, gave rise to the system’s
process tree.
Halting a process can also bring down its child processes, for instance the commands started from a
session when that session ends (to keep in mind before using kill).
When a process ends, its parent process is notified. Until the parent process handles its child’s
termination, the latter stays in a zombie state. Zombie processes do not take up any system resources,
except if they proliferate in an uncontrollable way. Eliminating a zombie is done by “killing” its parent
process. It is then attached to the process with PID 1, which takes care of eliminating zombies.
It is frequently necessary to execute programs whose running time will exceed the duration of a
session. Thus, automatic killing of these processes must be avoided when the session is closed.
The nohup (no hang-up) command detaches a process from a session.
For a program already running (in background mode) and whose PID is known, the disown command
detaches it from the session.
[stage01@slurm0 ~]$ mon_long_programme_de_bioinformatique.pl &
[1] 29087
[stage01@slurm0 ~]$ disown 29087
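For nohup, a minimal sketch could look as follows. The long-running program is replaced by a hypothetical one-second job, and its output is redirected to a temporary log file chosen for the example (when the output is left on the terminal, nohup writes it to a file named nohup.out instead).

```shell
log=$(mktemp)

# Start the job through nohup, in the background, with its output
# (standard output and errors) sent to the log file.
nohup sh -c 'sleep 1; echo finished' > "$log" 2>&1 &
pid=$!
echo "job started with PID $pid"

# In real use the session could be closed here. For the demonstration,
# wait for the job and show what it wrote.
wait "$pid"
cat "$log"
```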
When starting a session (local or remote), the process handling what the user types
is called the command interpreter or shell. Its purpose is to wait for the user to
hit the Enter key at the end of a command line and to try to make sense of it, i.e. run
the commands whose name(s) were typed in.
To run a command not located in one of the PATH directories, its (absolute or relative) path must be given :
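The three ways of reaching a command can be sketched as follows. The script hello_demo and the temporary directory holding it are hypothetical, created only for the illustration; chmod u+x gives the owner the right to execute the file.

```shell
# Create a tiny command in a directory that is not listed in PATH.
bindir=$(mktemp -d)
cat > "$bindir/hello_demo" <<'EOF'
#!/bin/sh
echo "hello from hello_demo"
EOF
chmod u+x "$bindir/hello_demo"

# 1. Absolute path.
by_path=$("$bindir/hello_demo")

# 2. Relative path, from the directory containing the command.
cd "$bindir"
by_dot=$(./hello_demo)

# 3. After adding the directory to PATH, the name alone is enough.
PATH="$bindir:$PATH"
by_name=$(hello_demo)

echo "$by_path / $by_dot / $by_name"
```

The PATH change only lasts for the current session unless it is also written in a start-up file.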
Using Aliases
To avoid repeatedly typing commands with the same options & arguments, it is often handy to
define an alias for them. This can be done with the alias command.
There is however a ~/.bashrc (text) file whose contents are read whenever a new session is
opened. This file can contain any shell command, including alias definitions and
PATH modifications.
After modifying this file in the current session, the source command has to be issued
(source ~/.bashrc) for the new version of the file to be taken into account in this session.
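To experiment without touching the real ~/.bashrc, the sketch below writes two hypothetical definitions (an alias named ll and a variable named DEMO_GREETING) into a temporary file and reads it into the current session. It uses the dot command, which is the portable spelling of source.

```shell
# A stand-in for ~/.bashrc, written to a temporary file.
rc=$(mktemp)
cat > "$rc" <<'EOF'
alias ll='ls -l'
DEMO_GREETING="hello"
EOF

# Read the file into the current session.
# In bash, `source "$rc"` does exactly the same thing.
. "$rc"

# The definitions are now known to the session.
alias ll
ll_def=$(alias ll)
echo "$DEMO_GREETING"
```

At an interactive prompt, typing ll would then run ls -l; inside scripts bash does not expand aliases by default, which is why the example only displays the definition.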
The env command displays all the known environment variables with their values.
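A variable only shows up in the output of env once it has been exported. The sketch below assumes a hypothetical variable named DEMO_PROJECT; export is the standard shell command that turns a shell variable into an environment variable, though it is not described in this text.

```shell
# A plain shell variable: known to this session only.
DEMO_PROJECT="myproject"
if env | grep -q '^DEMO_PROJECT='; then before=yes; else before=no; fi

# After export, the variable is passed on to every command started
# from this session, env included.
export DEMO_PROJECT
after=$(env | grep '^DEMO_PROJECT=')

echo "listed by env before export: $before"
echo "listed by env after export:  $after"
```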