Lecture Notes - SAN - M1
VTU Syllabus:
Module-1:
A dedicated, fast network that gives storage devices network access is called a Storage
Area Network (SAN).
SANs are generally made up of several technologies, topologies, and protocols that are
used to connect hosts, switches, storage elements, and storage devices. SANs can cover several
locations.
Data transfer between the server and storage device is the primary goal of SAN.
Additionally, it makes data transmission across storage systems possible. Storage area
networks are primarily used to connect servers to storage devices including disk-based
storage and tape libraries.
SANs are primarily used to access storage devices, such as disk arrays and tape libraries, from servers so that the devices appear to the operating system as direct-attached storage.
Businesses use data to derive information that is critical to their day-to-day operations.
Storage is a repository that enables users to store and retrieve this digital data.
Data: Data is a collection of raw facts from which conclusions may be drawn.
The following is a list of some of the factors that have contributed to the growth of digital data:
Proliferation of applications and smart devices: Smart phones, tablets, and newer digital
devices, along with smart applications, have significantly contributed to the generation of
digital content.
Types of Data:
Data can be classified as structured or unstructured based on how it is stored and
managed.
Structured data: Structured data is organized in rows and columns in a rigidly defined
format so that applications can retrieve and process it efficiently.
Example: customer records organized in the rows and columns of a relational database table. Structured data is typically stored and managed using a database management system (DBMS).
Unstructured data:
Data is unstructured if its elements cannot be stored in rows and columns, and is
therefore difficult to query and retrieve by business applications.
Example: e-mail messages, business cards, or even digital format files such as .doc,
.txt, and .pdf.
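To make the distinction concrete, here is a small illustrative sketch in Python (the table, column names, and text contents below are invented for this example): structured data can be queried by column, whereas unstructured data must be searched or parsed as free text.

```python
# Minimal illustration: structured vs. unstructured data.
# Table and column names are made up for this sketch.
import sqlite3

# Structured data: rows and columns with a rigid, predefined schema,
# so applications can retrieve and process it efficiently.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Asha', 'Mysuru')")
conn.execute("INSERT INTO customers VALUES (2, 'Ravi', 'Belagavi')")
for row in conn.execute("SELECT name FROM customers WHERE city = 'Mysuru'"):
    print(row)          # ('Asha',)

# Unstructured data: free-form content (e.g., an e-mail body) with no
# rows/columns; retrieving a fact means scanning or parsing the text.
email_body = "Hi team, the order from Asha in Mysuru shipped yesterday. Regards."
print("Mysuru" in email_body)   # True, but only via a text search
```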
The evolution of open systems and the affordability and ease of deployment that they offer
made it possible for business units/departments to have their own servers and storage.
The rapid increase in the number of departmental servers in an enterprise resulted in unprotected,
unmanaged, fragmented islands of information and increased capital and operating expenses.
Information-centric architecture: To overcome these challenges, storage architecture evolved from a server-centric model to an information-centric model, in which storage is managed centrally, independent of individual servers, and shared across the enterprise.
Key Data Center Elements: Five core elements are essential for the basic functionality of a data center.
Application: An application is a computer program that provides the logic for computing operations, e.g., an order processing system.
Database: More commonly, a database management system (DBMS) provides a structured
way to store data in logically organized tables that are interrelated. A DBMS optimizes the
storage and retrieval of data.
Host or compute: A computing platform (hardware, firmware, and software) that runs applications and databases.
Network: A data path that facilitates communication among various networked devices.
Storage: A device that stores data persistently for subsequent use.
Uninterrupted operation of data centers is critical to the survival and success of a business.
It is necessary to have a reliable infrastructure that ensures data is accessible at all times.
While the following requirements are applicable to all elements of the data center infrastructure, our focus here is on storage systems.
1. Availability: All data center elements should be designed to ensure accessibility. The inability of users to access data can have a significant negative impact on a business.
2. Security: Policies, procedures, and proper integration of the data center core elements must be established to prevent unauthorized access to information. In addition to the security measures for client access, specific mechanisms must enable servers to access only their allocated resources on storage arrays.
3. Performance: All the core elements of the data center should be able to provide optimal performance and service all processing requests at high speed. The infrastructure should be able to support performance requirements.
4. Data integrity: Data integrity refers to mechanisms, such as error correction codes or parity bits, which ensure that data is written to disk exactly as it was received. Any variation in data during its retrieval implies corruption, which may affect the operations of the organization.
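As a rough illustration of the idea only (real drives and arrays rely on much stronger ECC and CRC schemes, not a single parity bit), the sketch below shows how redundancy written alongside the data lets the system detect that what was read differs from what was written.

```python
# Minimal sketch: a single (even) parity bit computed over a block of bytes.

def parity_bit(data: bytes) -> int:
    """Return 0 if the total number of 1-bits in data is even, else 1."""
    ones = sum(bin(b).count("1") for b in data)
    return ones % 2

block = bytearray(b"hello world")
stored_parity = parity_bit(block)        # written alongside the data

# Simulate corruption of a single bit during storage or retrieval.
block[0] ^= 0b00000100

if parity_bit(block) != stored_parity:
    print("Data integrity check failed: block was corrupted")
```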
Cloud Computing:
To be a cloud, NIST has determined it must have the following five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.
The cloud model is also classified on the basis of service models and deployment models.
Software as a Service (SaaS): The consumer can use the provider's applications running on a cloud infrastructure.
Private Cloud: The cloud infrastructure is provisioned for exclusive use by a single
organization comprising multiple consumers.
Public Cloud: The cloud infrastructure is provisioned for open use by the general
public. It may be owned, managed, and operated by a business, academic, or government
organization.
Hybrid Cloud: The cloud infrastructure is a composition of two or more distinct cloud
infrastructures (private, community, or public).
Simply put, virtualization can make one resource act like many, while cloud
computing lets different users access a single pool of resources.
With virtualization, a single physical server can become multiple virtual machines, which
are essentially isolated pieces of hardware with plenty of processing, memory, storage, and
network capacity.
For smaller companies, cloud computing is easier and more cost-effective to implement.
Resources are accessed via the Internet rather than added to the network.
Many small businesses are turning to the cloud for applications such as customer relationship management (CRM), hosted voice over IP (VoIP), or off-site storage. The cost of using the cloud is much lower than implementing virtualization. Cloud computing also offers easier installation of applications and hardware, and easier access to software.
In virtualization, each virtual machine can run independently while sharing the resources of a single host machine, because the virtual machines run on a hypervisor. Hypervisors, also known as the abstraction layer, are used to separate physical resources from their virtual environments.
Once resources are pooled together, they can be divided across many virtual environments as
needed.
Adding many guests to one house maximizes resources, which means the business
needs fewer servers. This cuts down on operational costs.
Fewer servers mean fewer people to look after and manage servers. This helps to
consolidate management, thereby reducing costs.
Virtualization also adds another layer of protection for business continuity, since each virtual machine limits any damage to itself.
Syllabus:
Data Center Environment: Application, Database Management System (DBMS), Host
(Compute), Connectivity, Storage, Disk Drive Components, Disk Drive Performance, Host
Access to Data, Direct-Attached Storage, Storage Design Based on Application
Host (Compute):
The computers on which applications run are referred to as hosts. Hosts can range from simple
laptops to complex clusters of servers.
Hosts can be physical or virtual machines. Compute virtualization software enables the creation of virtual machines on top of a physical compute infrastructure.
CPU: The CPU consists of four components: the Arithmetic Logic Unit (ALU), the control unit, registers, and cache.
Volume Manager:
In the early days, disk drives appeared to the operating system as a number of continuous
disk blocks. The entire disk drive would be allocated to the file system or other data entity used by
the operating system or application.
This approach lacked flexibility: when a disk drive ran out of space, there was no easy way to extend the file system's size. Also, as the storage capacity of disk drives increased, allocating an entire disk drive to a file system often resulted in underutilization of storage capacity.
Disk partitioning was introduced to improve the flexibility and utilization of disk drives. In partitioning, a disk drive is divided into logical containers called logical volumes (LVs).
Concatenation is the process of grouping several physical drives and presenting them to the
host as one big logical volume.
The basic LVM components are physical volumes, volume groups, and logical volumes.
Each physical disk connected to the host system is a physical volume (PV).
A volume group is created by grouping together one or more physical volumes.
Logical volumes are created within a given volume group. A logical volume can be thought of as a disk partition, whereas the volume group itself can be thought of as a disk.
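The following Python sketch is only a conceptual model of these relationships (the disk names and sizes are invented): physical volumes are pooled into a volume group, and a logical volume larger than any single disk is carved out of the pooled capacity, which is essentially what concatenation achieves.

```python
# Illustrative model of LVM concepts: PVs pooled into a VG, LVs carved from the pool.

class PhysicalVolume:
    def __init__(self, name, size_gb):
        self.name, self.size_gb = name, size_gb

class VolumeGroup:
    def __init__(self, name, pvs):
        self.name = name
        self.pvs = pvs                                   # pooled physical disks
        self.free_gb = sum(pv.size_gb for pv in pvs)     # total pooled capacity
        self.lvs = {}

    def create_lv(self, name, size_gb):
        if size_gb > self.free_gb:
            raise ValueError("not enough free space in volume group")
        self.free_gb -= size_gb
        self.lvs[name] = size_gb                         # LV appears to the host as one disk
        return name

# Two 100 GB disks concatenated into one 150 GB logical volume.
vg = VolumeGroup("vg01", [PhysicalVolume("disk0", 100), PhysicalVolume("disk1", 100)])
vg.create_lv("lv_data", 150)
print(vg.lvs, "free:", vg.free_gb, "GB")   # {'lv_data': 150} free: 50 GB
```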
Compute virtualization:
A virtual machine is a logical entity but appears like a physical host to the operating
system, with its own CPU, memory, network controller, and disks.
Connectivity:
Connectivity and communication between host and storage are enabled using:
Physical components
Interface protocols.
The physical components of connectivity are the hardware elements that connect the host to
storage.
Three physical components of connectivity between the host and storage are -
A host interface device or host adapter connects a host to other hosts and storage
devices
e.g. host bus adapter (HBA) and network interface card (NIC).
A port is a specialized outlet that enables connectivity between the host and external devices. An HBA may contain one or more ports to connect the host.
A cable connects the host to storage devices or to other hosts, using copper or optical fiber media.
Interface Protocols:
A protocol enables communication between the host and storage. Protocols are implemented using interface devices (or controllers) at both the source and the destination. The popular interface protocols used for host-to-storage communication are described below:
The latest version of the FC interface (16FC) allows transmission of data up to 16 Gb/s.
IP is a network protocol that has been traditionally used for host-to-host traffic. With the
emergence of new technologies, an IP network has become a viable option for host- to-storage
communication.
Storage:
Storage is a core component in a data center. A storage device uses magnetic, optical, or solid-state media. Disks, tapes, and diskettes use magnetic media; CDs and DVDs use optical media; and removable Flash memory (Flash drives) uses solid-state media.
Tapes:
In the past, tapes were the most popular storage option for backups because of their low
cost. Tapes have various limitations in terms of performance and management, as listed
below:
Data is stored on the tape linearly along the length of the tape.
Search and retrieval of data are done sequentially, and it invariably takes several
seconds to access the data.
Due to these limitations and availability of low-cost disk drives, tapes are no longer a
preferred choice as a backup destination for enterprise-class data centers.
Optical Disc Storage:
Optical discs are also used as a distribution medium for small applications, such as games, or as a means to transfer small amounts of data from one computer system to another.
The capability to write once and read many (WORM) is one advantage of optical disc storage, e.g., CD-ROM.
Collections of optical discs in an array, called a jukebox, are still used as a fixed-content
storage solution.
Other forms of optical discs include CD-RW, Blu-ray disc, and other variations of DVD .
Disk Drives:
Disk drives are the most popular storage medium used in modern computers for
storing and accessing data for performance-intensive, online applications.
Disks support rapid access to random data locations. Disks have large capacity. Disk
storage arrays are configured with multiple disks to provide increased capacity and
enhanced performance.
Disk drives are accessed through predefined protocols, such as ATA, SATA, SAS, and
FC. These protocols are implemented on the disk interface controllers.
Disk interface controllers were earlier implemented as separate cards, which were
connected to the motherboard.
Modern disk interface controllers are integrated with the disk drives; therefore, disk drives
are known by the protocol interface they support, for example SATA disk, FC disk, etc.
Advances in disk technology improve disk performance. These advances include increased rotational speed, faster seek times, and higher data rates. Other advances, such as increased disk density and total drive capacity, also impact performance.
Disk performance is measured by the total job completion time for a complex task involving a long sequence of disk I/Os. The time to service a single disk I/O has four components:
command overhead
seek time
rotational latency
data transfer time
Command overhead -- the time it takes for the disk controller to process and handle an I/O request -- depends on the type of interface (IDE or SCSI), the type of command (read/write), and the use of the drive's buffer. A typical value is 0.5 ms for a buffer miss and 0.1 ms for a buffer hit.
Seek time -- the time for the head to move from its current cylinder to the target cylinder. Settling time -- the time to position the head over the target track until the correct track identification is confirmed. A typical seek time is 10 ms.
Rotational latency -- in the past, disks spun at 3,600 rpm. Today the highest speed is 10,000 rpm, with 5,400 rpm being typical; at 5,400 rpm the average rotational latency (half a revolution) is about 5.6 ms.
Data transfer time -- depends on the data rate and the transfer size. There are two kinds of data rate: media rate and interface rate.
Media rate depends on recording density and rotational speed. For example, a disk rotating at 5,400 rpm with 111 sectors (512 bytes each) per track has a media rate of about 5 Mbytes per second. Interface rate is how fast data can be transferred between the host and the disk drive over its interface. SCSI drives can do up to 20 Mbytes per second over each 8-bit-wide transfer. IDE drives with the Ultra-ATA interface support up to 33.3 Mbytes per second.
Example: the typical average time to do a random 4-Kbyte disk I/O is overhead + seek + latency + transfer = 0.5 ms + 10 ms + 5.6 ms + 0.8 ms = 16.9 ms.
Locality of access -- most I/Os are not random; the effect is that the real seek time is about one third of the random seek time. Taking this into account, the average time in the above example becomes 0.5 ms + 3.3 ms + 5.6 ms + 0.8 ms ≈ 10.2 ms.
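The arithmetic above can be reproduced directly; the short Python sketch below recomputes the rotational latency, media rate, transfer time, and the 16.9 ms and 10.2 ms totals from the figures given in the text.

```python
# Reproduces the worked disk I/O example above, using the figures from the text.

rpm = 5400
sectors_per_track = 111
sector_bytes = 512
io_size_bytes = 4 * 1024          # 4 KB random I/O

# Rotational latency: on average half a revolution.
avg_rotational_latency_ms = 0.5 * (60_000 / rpm)                  # ~5.6 ms

# Media rate: bytes passing under the head per second.
media_rate_bps = (rpm / 60) * sectors_per_track * sector_bytes    # ~5 MB/s

transfer_ms = io_size_bytes / media_rate_bps * 1000               # ~0.8 ms
overhead_ms = 0.5                                                 # buffer miss
seek_ms = 10.0                                                    # typical random seek

random_io_ms = overhead_ms + seek_ms + avg_rotational_latency_ms + transfer_ms
print(f"Random 4 KB I/O: {random_io_ms:.1f} ms")                  # ~16.9 ms

# With locality of access, the effective seek time is about one third.
local_io_ms = overhead_ms + seek_ms / 3 + avg_rotational_latency_ms + transfer_ms
print(f"With locality:   {local_io_ms:.1f} ms")                   # ~10.2 ms
```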
File System:
A file is a collection of related records or data stored as a unit with a name. A file system is a hierarchical structure of files. A file system enables easy access to data files residing within a disk drive, a disk partition, or a logical volume.
It provides users with the functionality to create, modify, delete, and access files. A file
system organizes data in a structured hierarchical manner via the use of directories, which are
containers for storing pointers to multiple files.
All file systems maintain a pointer map to the directories, subdirectories, and files that are part of the file system.
The following list shows the process of mapping user files to the disk storage subsystem with an LVM:
1. Files are created and managed by users and applications.
2. These files reside in the file systems.
3. The file systems are mapped to file system blocks.
4. The file system blocks are mapped to logical extents of a logical volume.
5. These logical extents are in turn mapped to disk physical extents, either by the operating system or by the LVM.
6. These physical extents are mapped to disk sectors in the storage subsystem.
The file system tree starts with the root directory. The root directory has a number of subdirectories.
Nonjournaling file systems cause a potential loss of files because they use separate writes to
update their data and metadata. If the system crashes during the write process, the metadata or
data might be lost or corrupted. When the system reboots, the file system attempts to update the
metadata structures by examining and repairing them. This operation takes a long time on large
file systems.
Journaling File System uses a separate area called a log or journal. This journal might
contain all the data to be written (physical journal) or just the metadata to be updated (logical
journal). Before changes are made to the file system, they are written to this separate area. After
the journal has been updated, the operation on the file system can be performed. If the system
crashes during the operation, there is enough information in the log to “replay” the log record
and complete the operation. Nearly all file system implementations today use journaling.
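The sketch below is a highly simplified, in-memory illustration of this record-then-apply-then-replay idea; it is not how any real file system is implemented, and the "journal" and "metadata" structures here are just Python objects standing in for on-disk areas.

```python
# Conceptual sketch of journaling: record the intended change in a log first,
# then apply it; after a crash, replay any logged entries not marked complete.

journal = []          # stands in for the on-disk log/journal area
metadata = {}         # stands in for the file system's metadata structures

def journaled_update(key, value, crash_before_apply=False):
    entry = {"key": key, "value": value, "done": False}
    journal.append(entry)            # 1. write the intent to the journal first
    if crash_before_apply:
        return                       # simulated crash: metadata untouched, log survives
    metadata[key] = value            # 2. apply the change to the file system
    entry["done"] = True             # 3. mark the journal record complete

def replay_journal():
    """Run at mount time after a crash: finish any incomplete operations."""
    for entry in journal:
        if not entry["done"]:
            metadata[entry["key"]] = entry["value"]
            entry["done"] = True

journaled_update("fileA.size", 4096)
journaled_update("fileB.size", 1024, crash_before_apply=True)   # crash!
replay_journal()
print(metadata)   # both updates are present after the replay
```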
Advantages:
Journaling results in a quick file system check because it looks only at the active, most
recently accessed parts of a large file system.
Since information about the pending operation is saved, the risk of files being lost is
reduced.
Disadvantage:
Journaling file systems are slower than other file systems because of the extra operations that have to be performed on the journal each time the file system is changed. However, the advantages of shorter file system checks and better file system integrity far outweigh this disadvantage.
DAS stands for Direct Attached Storage. It is a digital storage device connected directly to a server, workstation, or personal computer via a cable. In Direct Attached Storage, applications use a block-level access protocol to access the data.
A DAS system is attached directly to the computer through a host bus adapter (HBA). Unlike NAS devices, it attaches directly to the server without a network. Modern DAS systems include integrated disk array controllers with advanced features.
It is a good choice for small businesses, workgroups, and departments that do not need to share data across the enterprise. It is used in environments with a small number of hosts and servers.
Types of DAS
Internal DAS
External DAS
Internal DAS:
Internal DAS is a DAS in which the storage device is attached internally to the server or PC by the HBA. In this type of DAS, the HBA is used for high-speed bus connectivity over a short distance.
External DAS
External DAS is a DAS in which the external storage device is directly connected to the server, without any intervening network device. In this type of DAS, FCP and SCSI are the protocols that act as the interface between the server and the storage device.
Differences between the Direct Attached Storage (DAS) and Storage Area Network (SAN):
Jan./Feb-2023
1.
a. What is a data center? Explain the key characteristics of a data center with a neat diagram. 05
b. Explain the evolution of storage architecture with neat diagram. 05
c. Describe volume manager and compute virtualization with neat diagram. 10
OR
July/August 2022:
1.
a. Explain the core elements of data centre along with key characteristics. 10
b. Discuss the process of host access to storage. 06
c. Write a short note on evolution of storage architecture. 04
OR