Management Information Systems Unit - 3 Notes-1
With effective data management, people across an organization can find and access trusted data
for their queries.
Visibility
Data management can increase the visibility of your organization’s data assets, making it easier
for people to quickly and confidently find the right data for their analysis.
Reliability
Data management helps minimize potential errors by establishing processes and policies for
usage and building trust in the data being used to make decisions across your organization.
With reliable, up-to-date data, companies can respond more efficiently to market changes and
customer needs.
Security
Data management protects your organization and its employees from data losses, thefts, and
breaches with authentication and encryption tools.
Scalability
Data management allows organizations to effectively scale data and usage occasions with
repeatable processes to keep data and metadata up to date.
Challenges of data management
It is estimated that 2.5 quintillion bytes of data are created every day. As a result,
organizations now face new challenges in obtaining, maintaining, and generating value from
data. A skilled team equipped with appropriate tools and technology can help control and
manage the exponential growth of data created or gathered.
Data redundancy is one of the most significant challenges that businesses confront. It occurs
when the same piece of data is stored in two or more separate places, and it is common in
many businesses.
3- Data quality
Data quality is one of the biggest obstacles confronting many companies today. Most businesses
use a database to update information; however, maintaining data quality becomes difficult
while processing or recording information.
When data is gathered from many sources, inconsistency in the data is unavoidable. Inadequate
data management processes and systems contribute to inaccurate data. When the data is
insufficient or inconsistent, it is of poor quality and does not meet the required criteria.
5- Data Integrity
Data integrity is a concept and process that ensures the accuracy, completeness, consistency,
and validity of an organization's data. Maintaining the integrity of data is a major challenge.
There is a severe lack of experienced specialists available for immediate recruitment. In reality,
these skilled professionals often have larger pay packages since they are required in any firm
that has to maintain strong control and management of their data.
7- Data security
It is a significant challenge that causes concern among organizations. Data security threats can
come from a variety of sources, including hackers, insider threats, natural disasters and human
error.
DATA INDEPENDENCE
o Logical data independence is the ability to change the conceptual schema without having
to change the external schema.
o If we make any changes to the conceptual view of the data, the user view of the data
is not affected.
o Physical data independence is the ability to change the internal schema without having
to change the conceptual schema or the external schema.
o If we make any changes to the physical storage of the database, such as its storage size,
the conceptual structure and the external level of the database are not affected.
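Logical data independence can be sketched with a view standing in for the external schema. This is a minimal illustration using Python's built-in sqlite3 module; the table layout, the sample row, and the added address column are assumptions made for the example, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the base table (layout is hypothetical).
conn.execute(
    "CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)
conn.execute("INSERT INTO student VALUES (1, 'Abhinav', 89)")

# External schema: the view is what the user sees.
conn.execute("CREATE VIEW student_marks AS SELECT rollno, marks FROM student")
before = conn.execute("SELECT * FROM student_marks").fetchall()

# Change the conceptual schema by adding a column to the base table.
conn.execute("ALTER TABLE student ADD COLUMN address TEXT")
after = conn.execute("SELECT * FROM student_marks").fetchall()

# The user view returns the same result before and after the change.
print(before, after)
```

Because the view names its columns explicitly, the new column in the base table does not leak into what the user sees.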
Data redundancy
It is also called data duplication. In a DBMS, data redundancy arises when the same data is
stored in different tables.
Sometimes it is done on purpose, for recovery or backup of data, faster access to data, or
easier updates. However, redundant data costs extra money, demands higher storage capacity,
and requires extra effort to keep all the files up to date.
Unintentional duplication of data can prevent the database from working properly, or make it
harder for the end user to access data. Redundant data occupies space in the database
unnecessarily by saving identical copies, and the resulting space constraints are one of the
major problems.
In the above example, there is a "Student" table that contains duplicate data of student Sonu.
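One way to spot such duplicates is to group identical rows and count them. The sketch below uses Python's built-in sqlite3 module; the column names and every row other than the student name Sonu are hypothetical, since the original table is not reproduced in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical "student" table; only the name Sonu comes from the notes.
conn.execute("CREATE TABLE student (name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("Sonu", "MBA"), ("Monu", "BBA"), ("Sonu", "MBA")],
)

# Rows that occur more than once are redundant copies.
duplicates = conn.execute(
    "SELECT name, course, COUNT(*) FROM student "
    "GROUP BY name, course HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)
```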
Data consistency:
Consistency also implies that any changes made to a single object in one table must be mirrored
in all other tables in which that object appears.
For example, if a student's home address changes, the change must appear in all tables where
the previous address existed. Data inconsistency occurs when one table has the old
address and the others have the updated address.
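The address example can be sketched as follows, using Python's built-in sqlite3 module. The two tables (student and library_member) and the address values are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables that both store the student's address.
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, address TEXT)")
conn.execute("CREATE TABLE library_member (rollno INTEGER PRIMARY KEY, address TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Old Street 1')")
conn.execute("INSERT INTO library_member VALUES (1, 'Old Street 1')")

# Query that lists students whose address differs between the two tables.
mismatch_query = (
    "SELECT s.rollno, s.address, l.address "
    "FROM student s JOIN library_member l ON s.rollno = l.rollno "
    "WHERE s.address <> l.address"
)

# The address is updated in only one table: the data is now inconsistent.
conn.execute("UPDATE student SET address = 'New Road 5' WHERE rollno = 1")
inconsistent = conn.execute(mismatch_query).fetchall()

# Mirroring the change in the other table restores consistency.
conn.execute("UPDATE library_member SET address = 'New Road 5' WHERE rollno = 1")
fixed = conn.execute(mismatch_query).fetchall()
print(inconsistent, fixed)
```

Storing the address in a single table and referring to it by rollno would avoid the problem altogether, which is why redundancy and inconsistency are usually discussed together.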
Data administration
Data administration is the process by which data is monitored, maintained and managed by a
data administrator and/or an organization. Data administration allows an organization to
control its data assets, as well as their processing and interactions with different applications
and business processes.
A database management system (DBMS)
A database management system (DBMS) is system software for creating and managing
databases.
A database management system consists of a collection of interrelated data and a set of
programs to access that data.
Operations that can be performed on the database are:
• Creation of table
• Insertion of data in the table
• Protection of data
• Updating of data
• Deletion of data
• Alteration of data definition.
Example:
STUDENT
Student_rollno Name Course Marks
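The operations listed above can be tried on the STUDENT table with Python's built-in sqlite3 module. This is a minimal sketch: the column types, the second row, and the added Address column are assumptions, and protection of data is not shown because SQLite has no user-level permissions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Creation of table (column types are assumed).
conn.execute(
    "CREATE TABLE STUDENT ("
    "Student_rollno INTEGER PRIMARY KEY, Name TEXT, Course TEXT, Marks INTEGER)"
)

# Insertion of data (the second row is hypothetical).
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [(1, "Abhinav", "MBA", 89), (2, "Student B", "BBA", 78)],
)

# Updating of data.
conn.execute("UPDATE STUDENT SET Marks = 91 WHERE Student_rollno = 1")

# Deletion of data.
conn.execute("DELETE FROM STUDENT WHERE Student_rollno = 2")

# Alteration of data definition: add a new column.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Address TEXT")

rows = conn.execute("SELECT * FROM STUDENT").fetchall()
columns = [info[1] for info in conn.execute("PRAGMA table_info(STUDENT)")]
print(columns, rows)
```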
2. A relational database contains multiple tables of data with rows and columns that
relate to each other through special key fields. These databases are more flexible than
flat file structures, and provide functionality for reading, creating, updating, and
deleting data. Relational databases use Structured Query Language (SQL).
EXAMPLE:
A COLLEGE DATABASE includes relations (tables) such as Department, Student, and Faculty.
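How tables relate through key fields can be sketched with the college example, using Python's built-in sqlite3 module. All column names, the dept_id key, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# dept_id is the key field that links the three hypothetical tables.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute(
    "CREATE TABLE faculty (faculty_id INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'MBA')")
conn.execute("INSERT INTO student VALUES (1, 'Abhinav', 10)")
conn.execute("INSERT INTO faculty VALUES (100, 'Faculty A', 10)")

# A join follows the key fields from one table to the others.
joined = conn.execute(
    "SELECT s.name, d.dept_name, f.name "
    "FROM student s "
    "JOIN department d ON s.dept_id = d.dept_id "
    "JOIN faculty f ON f.dept_id = d.dept_id"
).fetchall()

# A student that points at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO student VALUES (2, 'Student B', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(joined, rejected)
```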
Network database models also have a hierarchical structure. However, instead of using a
single-parent tree hierarchy, this model supports many to many relationships, as child tables
can have more than one parent.
Example: a network model with the nodes College, Library, MBA Department, and Student.
In object-oriented databases, the information is represented as objects, with different types
of relationships possible between two or more objects. Such databases use an object-oriented
programming language for development.
Relational databases contain multiple tables of data with rows and columns that relate to
each other.
2. Fields: Fields, also called attributes, define a table. They hold different types of
data, such as text, numbers, and dates. For example, when creating a student table we include
attributes related to the student, such as rollno, name, class, marks, and address.
3. A field value: Each record holds a value for each field. For example, 1, Abhinav, 89, MBA.
4. A record: Contains specific data, such as information about a particular employee or
student. The student table below has 7 records.
View
ROLLNO MARKS
1 89
2 78
3 80
4 90
5 87
6 79
7 85
A database Report
A database report is the formatted result of database queries and contains useful data
for decision-making and analysis.
For example, a report can show the details of all students who are doing an MBA.
QUERY:
The query can be defined as a request for data from the database. We use SQL queries
to create, select, and update data in the database.
Data warehouse
Data warehousing is the process of gathering, storing, and managing data from various sources
into one convenient repository.
Different departments of an organization can use it for analysis, reporting, and decision-
making.
It is designed to support the decision-making process by providing a centralized location for
all of an organization's data.
Bill Inmon's definition of a data warehouse is that it is a “subject-oriented, nonvolatile,
integrated, time-variant collection of data in support of management's decisions.”
Characteristics of a Data Warehouse
1. Integrated Data
One of the key characteristics of a data warehouse is that it contains integrated data. This
means that the data is collected from various sources, such as transactional systems, and then
cleaned, transformed, and consolidated into a single, unified view. This allows for easy access
and analysis of the data, as well as the ability to track data over time.
2. Subject-Oriented
A data warehouse is also subject-oriented, which means that the data is organized around
specific subjects, such as customers, products, and sales. This allows for easy access to the
data relevant to a specific subject, as well as the ability to track that data over time.
3. Non-Volatile
Another characteristic of a data warehouse is that it is non-volatile. This means that data
in the warehouse is not normally updated or deleted once loaded; new data is only added. This
is important because it preserves historical data, making it possible to track trends and
patterns over time.
4. Time-Variant
A data warehouse is also time-variant, which means that the data is stored with a time
dimension. This allows for easy access to data for specific time periods, such as the last quarter
or last year. This makes it possible to track trends and patterns over time.
Data Warehousing Tools
ETL (Extract, Transform, Load) Tools
ETL (Extract, Transform, Load) tools are among the key tools used in data warehousing. They
extract data from various sources, transform the data to fit the data warehouse schema, and
then load the data into the warehouse.
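The three ETL steps can be sketched in a few lines of Python using only the standard library. The CSV source, the cleaning rules, and the fact_marks warehouse table are all hypothetical; real ETL tools handle many sources and far larger volumes.

```python
import csv
import io
import sqlite3

# Hypothetical source data with untidy names.
SOURCE_CSV = "rollno,name,marks\n1, Abhinav ,89\n2,student b,78\n"


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and clean names to fit the warehouse schema."""
    return [
        (int(row["rollno"]), row["name"].strip().title(), int(row["marks"]))
        for row in rows
    ]


def load(conn, rows):
    """Load: append the cleaned rows to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_marks (rollno INTEGER, name TEXT, marks INTEGER)"
    )
    conn.executemany("INSERT INTO fact_marks VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute("SELECT * FROM fact_marks ORDER BY rollno").fetchall()
print(loaded)
```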
Data Mining Techniques
Data mining includes the utilization of refined data analysis tools to find previously unknown,
valid patterns and relationships in huge data sets. These tools can incorporate statistical models,
machine learning techniques, and mathematical algorithms, such as neural networks or decision
trees. Thus, data mining incorporates analysis and prediction.
In recent data mining projects, various major data mining techniques have been developed and
used, including association, classification, clustering, prediction, sequential patterns, and
regression.
1. Classification:
This technique is used to obtain important and relevant information about data and metadata.
This data mining technique helps to classify data into different classes.
Example:
Data can be classified by the type of data handled, for example multimedia, spatial data,
text data, time-series data, World Wide Web data, and so on.
2. Clustering:
Clustering analysis is a data mining technique used to identify similar data. The process of
grouping a set of abstract objects into classes of similar objects is known as clustering.
Each group is treated as one cluster of data objects.
For example, a business may collect information about its consumers and cluster them on that
information. Clustering also helps marketers find the distinct groups in their customer base
and characterize those groups by their purchasing patterns.
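As a rough illustration of grouping customers by purchasing patterns, the sketch below runs a very small k-means clustering on a single attribute. K-means is one common clustering algorithm and is chosen here as an assumption; the spending figures and the starting centroids are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid for a fixed number of rounds."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            # Assign each value to the index of its closest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer.
spend = [20, 22, 25, 200, 210, 220]
centroids, clusters = kmeans_1d(spend, centroids=[min(spend), max(spend)])
print(clusters)
```

With this data the customers fall into a low-spending group and a high-spending group, which a marketer could then treat differently.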
3. Regression:
Regression in data mining is a tool that helps predict numerical values in a given data set,
such as predicting temperature, cost or such values. Hence, regression techniques in data
mining are widely popular in business settings, most popularly in marketing, trend analysis
and varied kinds of financial forecasting.
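A minimal sketch of regression is a straight line fitted by least squares. Simple linear regression is only one of several regression techniques, and the advertising and sales figures below are hypothetical.

```python
def fit_line(xs, ys):
    """Return the slope and intercept of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend and the sales that followed.
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 5 + intercept  # predict sales for a spend of 5
print(slope, intercept, predicted_sales)
```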
4. Association Rules:
This data mining technique helps to discover a link between two or more items.
Association rules are if-then statements that show the probability of interactions
between data items within large data sets in different types of databases. Association rule
mining has several applications and is commonly used to find sales correlations in
transactional data or patterns in medical data sets.
The algorithm works on a set of transactions, for example a list of the grocery items
that you have been buying for the last six months. It calculates the percentage of
transactions in which items are purchased together.
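The percentage of items bought together is usually called support, and the strength of an if-then rule is called confidence. The sketch below computes both in plain Python; the grocery baskets are hypothetical.

```python
# Hypothetical grocery baskets, one set of items per shopping trip.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


bread_and_milk = support({"bread", "milk"})
if_bread_then_milk = confidence({"bread"}, {"milk"})
print(bread_and_milk, if_bread_then_milk)
```

Here bread and milk appear together in half of the baskets, and two out of every three baskets with bread also contain milk.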
5. Outlier detection:
This type of data mining technique relates to the observation of data items in the data set, which
do not match an expected pattern or expected behavior. It is also known as Outlier Analysis or
Outlier Mining. The outlier is a data point that diverges too much from the rest of the dataset.
For example, a dataset for credit card fraud detection contains transactional data of a bank's
customers who hold credit cards. If we consider the daily transaction amount of a customer
as one of the attributes, then a transaction with a very high amount compared to the normal
range of that individual's expenditure will be considered an outlier.
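One simple way to flag such a transaction is a modified z-score based on the median, which is less distorted by the outlier itself than a mean-based score. The choice of this method, the threshold of 3.5, and the transaction amounts are assumptions made for illustration.

```python
from statistics import median


def find_outliers(amounts, threshold=3.5):
    """Return amounts whose modified z-score (median and MAD based) exceeds the threshold."""
    med = median(amounts)
    mad = median([abs(amount - med) for amount in amounts])  # median absolute deviation
    if mad == 0:
        return []  # all values are (almost) identical, nothing to flag
    return [amount for amount in amounts if 0.6745 * abs(amount - med) / mad > threshold]


# Hypothetical daily transaction amounts for one card holder.
daily_amounts = [40, 55, 60, 45, 50, 65, 5000]
outliers = find_outliers(daily_amounts)
print(outliers)
```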
6. Sequential Patterns:
The sequential pattern is a data mining technique specialized for evaluating sequential data to
discover sequential patterns.
In other words, this technique of data mining helps to discover or recognize similar patterns in
transaction data over some time.
Example: Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital
camera, within 12 months.
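The shopping sequence above can be checked against customer histories with a short Python sketch. The customer histories and the month numbering are hypothetical, and real sequential pattern mining algorithms discover frequent sequences rather than test a single given one.

```python
def contains_sequence(purchases, pattern, max_months=12):
    """True if `pattern` occurs in order within `max_months`.

    `purchases` is a list of (month, item) pairs sorted by month.
    """
    for start, (start_month, item) in enumerate(purchases):
        if item != pattern[0]:
            continue
        position = 1  # index of the next pattern item to find
        end_month = start_month
        for month, later_item in purchases[start + 1:]:
            if position == len(pattern):
                break
            if later_item == pattern[position]:
                position += 1
                end_month = month
        if position == len(pattern) and end_month - start_month <= max_months:
            return True
    return False


pattern = ["computer", "cd-rom", "digital camera"]

# Hypothetical purchase histories: (month number, item).
histories = {
    "c1": [(1, "computer"), (3, "cd-rom"), (9, "digital camera")],
    "c2": [(1, "cd-rom"), (2, "computer"), (5, "digital camera")],
    "c3": [(1, "computer"), (6, "cd-rom"), (20, "digital camera")],
}

matching = [name for name, history in histories.items() if contains_sequence(history, pattern)]
pattern_support = len(matching) / len(histories)
print(matching, pattern_support)
```

Only the first customer follows the pattern: the second bought the items in a different order and the third took longer than 12 months.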
7. Prediction:
Prediction uses a combination of other data mining techniques such as trends, clustering,
classification, etc. It analyzes past events or instances in the right sequence to predict a
future event.
For example, a retailer can look through a customer database and predict future
transactions by looking at previous transactions. In other words, previous data may allow the
shopkeeper to forecast what will happen in the future, allowing businesspeople to plan
accordingly.
Business Intelligence
Business Intelligence is one of the most powerful tools many organizations use to understand
their customer base and market better. It describes the business methodology in which raw data
is transformed into useful information that helps in decision-making.
Business intelligence has broad applications. In the retail sector, for example, business
intelligence tools enable organizations to use data not only to assess current sales but also
to estimate future potential, patterns, and trends and to understand customer demand at a
deeper level.
CREATION OF TABLE
INSERTION OF DATA
CREATION OF VIEWS
create view student_marks as select rollno,marks from student;
CREATION OF REPORTS
Create a report to show rollno and class of all students.
select rollno,class from student;
Create a report that includes all the students who are doing MBA.
select * from student where class='MBA';
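The statements in this section can be run end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: the create table statement, the column types, and every student other than Abhinav are hypothetical, because the notes do not reproduce the table creation and insertion statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Creation of table (columns follow the field order 1, Abhinav, 89, MBA; types are assumed).
conn.execute(
    "create table student (rollno integer primary key, name text, marks integer, class text)"
)

# Insertion of data (rows 2 and 3 are hypothetical).
conn.executemany(
    "insert into student values (?, ?, ?, ?)",
    [(1, "Abhinav", 89, "MBA"), (2, "Student B", 78, "BBA"), (3, "Student C", 80, "MBA")],
)

# Creation of views: the view statement from the notes.
conn.execute("create view student_marks as select rollno,marks from student")
view_rows = conn.execute("select * from student_marks order by rollno").fetchall()

# Creation of reports: rollno and class of all students.
report_all = conn.execute("select rollno,class from student order by rollno").fetchall()

# Creation of reports: all students doing MBA, with the class passed as a parameter.
report_mba = conn.execute(
    "select * from student where class = ? order by rollno", ("MBA",)
).fetchall()
print(view_rows, report_all, report_mba)
```

Passing the class as a parameter (the ? placeholder) instead of pasting it into the SQL text avoids quoting problems such as curly quotes around MBA.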