Chapter 6-Foundation of BI
6.1 What are the problems of managing data resources in a traditional file environment?
→ An effective information system provides users with accurate, timely, and relevant information.
File organization terms and concepts
- A computer system organizes data in a hierarchy that starts with bits and bytes
Database: an organized collection of data stored centrally to serve various information system
applications. A group of related files makes up a database.
Database management system (DBMS): software that enables an organization to centralize data,
manage them efficiently, and provide access to the stored data by application programs.
- DBMS acts as an interface between application programs and the physical data files.
- Collects all of the organization's data (about customers, employees, etc.) and stores it in one place.
- Separates the logical view (data as they would be perceived by end users or business
specialists) from the physical view (how data are actually organized and structured on physical
storage media). This relieves the programmer of the task of understanding where and how
data are actually stored.
- Example: the DBMS sits between an application system (e.g. the accounting department's) and the
human resources database (where all the information is kept). Applications send requests to the
DBMS, and it retrieves only the information they want.
- Solves problems of the traditional file environment:
- Controls redundancy (the same data stored in different places) because there is one copy of the data
- Eliminates the inconsistencies that arise between duplicate copies; also more secure
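A minimal sketch of the logical vs. physical view idea, using Python's built-in sqlite3 module as a stand-in for a DBMS. The table, view names, and sample rows are made up for illustration. Two applications see different logical views of one stored table, and a single update shows up in both because there is only one copy of the data.

```python
import sqlite3

# In-memory SQLite database standing in for a central human resources database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, "
    "salary REAL, health_plan TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ana", 52000.0, "Plan A"), (2, "Ben", 61000.0, "Plan B")],
)

# Two logical views over the same physical table (hypothetical names).
conn.execute("CREATE VIEW payroll_view AS SELECT emp_id, name, salary FROM employee")
conn.execute("CREATE VIEW benefits_view AS SELECT emp_id, name, health_plan FROM employee")

payroll_rows = conn.execute("SELECT * FROM payroll_view ORDER BY emp_id").fetchall()
benefits_rows = conn.execute("SELECT * FROM benefits_view ORDER BY emp_id").fetchall()

# One update to the single stored copy is visible through every view.
conn.execute("UPDATE employee SET name = 'Benjamin' WHERE emp_id = 2")
updated_payroll = conn.execute("SELECT name FROM payroll_view WHERE emp_id = 2").fetchone()
updated_benefits = conn.execute("SELECT name FROM benefits_view WHERE emp_id = 2").fetchone()
```

Neither application needs to know how or where the employee table is physically stored; each one only queries its own view.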
Designing databases:
To create a database, you need to understand the relationships among the data, the type of data that
will be maintained in the database, and how the data will be used.
● Conceptual design: abstract model of the database from a business perspective. Describes how the
data elements in the database are to be grouped
● Entity-relationship diagram: methodology for documenting the data model by illustrating
relationships. Identifies relationships among data elements and the most efficient way of grouping
data elements together
- If the business doesn’t get its data model right, the system won't be able to serve the business well
● Normalization: process of creating small, stable data structures from complex groups of data
● Physical design: detailed description of how the data will actually be arranged and stored on
physical devices
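A small sketch of normalization, again using sqlite3. The part and supplier data are invented for illustration. A flat group of data that repeats supplier details on every row is split into two small tables, so each supplier is stored once, and a join rebuilds the wide rows when they are needed.

```python
import sqlite3

# Unnormalized rows: supplier details are repeated on every part (made-up data).
flat_rows = [
    ("P1", "Door latch", "S1", "Acme Supply"),
    ("P2", "Door seal", "S1", "Acme Supply"),
    ("P3", "Mirror", "S2", "Bright Glass"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (supplier_id TEXT PRIMARY KEY, supplier_name TEXT)")
conn.execute(
    "CREATE TABLE part (part_id TEXT PRIMARY KEY, part_name TEXT, "
    "supplier_id TEXT REFERENCES supplier(supplier_id))"
)

for part_id, part_name, supplier_id, supplier_name in flat_rows:
    # INSERT OR IGNORE keeps a single row per supplier.
    conn.execute("INSERT OR IGNORE INTO supplier VALUES (?, ?)", (supplier_id, supplier_name))
    conn.execute("INSERT INTO part VALUES (?, ?, ?)", (part_id, part_name, supplier_id))

supplier_count = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]

# A join rebuilds the original wide rows when needed.
rejoined = conn.execute(
    "SELECT p.part_id, p.part_name, s.supplier_id, s.supplier_name "
    "FROM part p JOIN supplier s ON p.supplier_id = s.supplier_id "
    "ORDER BY p.part_id"
).fetchall()
```

After the split, renaming a supplier means changing one row in the supplier table instead of every part row that mentions it.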
Non-relational databases and databases in the cloud
● Non-relational databases (“NoSQL”)
- Use a more flexible data model. Designed for managing large data sets across many distributed
machines and for easily scaling up or down
- Handle large volumes of unstructured and structured data
● Databases in the cloud
- Appeal to start-ups and smaller businesses. Example: Amazon Relational Database Service (Amazon RDS)
- Private clouds
→ A distributed database is one that is stored in multiple physical locations.
Blockchain (pg 226)
- A newer approach: rather than storing data in one place, data are stored in many places, and
participants are able to verify that data.
- Data storage is shared with other parties.
- A distributed database technology that enables firms and organizations to create and verify
transactions on a network nearly instantaneously without a central authority.
- The blockchain maintains a continuously growing list of records called blocks.
- Benefits to firms using blockchain databases: can reduce the cost of verifying users and
validating transactions, and the risks of storing and processing transaction information across
thousands of firms.
- Foundation of Bitcoin and other cryptocurrencies
- Used for financial transactions, supply chain and medical records
- Gives the responsibility for data storage and security to multiple parties at the same time
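A toy sketch of the "growing list of blocks" idea, assuming each block simply stores the SHA-256 hash of the block before it. All function names and transactions are hypothetical. It leaves out everything that makes a real blockchain distributed (networking, consensus, digital signatures) and only shows why a chained record can be verified and why tampering is detectable.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Return the SHA-256 hex digest of a block's contents."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Append a block linked to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
    return True


chain = []
add_block(chain, "Alice pays Bob 5")
add_block(chain, "Bob pays Carol 2")
valid_before = is_valid(chain)

# Tampering with an earlier record breaks verification.
chain[0]["data"] = "Alice pays Bob 500"
valid_after = is_valid(chain)
```

Anyone holding a copy of the chain can run the same check, which is the sense in which verification does not depend on a central authority.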
Business intelligence infrastructure
→ An array of tools for obtaining information from separate systems and from big data
● Data warehouse
- A database that stores current and historical data from many core operational transaction systems,
of potential interest to decision makers
- Data are available for anyone to access as needed, but the data cannot be altered. Provides
analysis and reporting tools
- Data marts: subsets of a data warehouse
- Summarized or highly focused portion of the firm's data for use by a specific population of users,
e.g. data about a specific product
- Typically focuses on a single subject or line of business
● Hadoop (pg 229)
- For handling unstructured and semi-structured data in vast quantities, as well as structured data,
organizations are using Hadoop.
- It breaks a big data problem down into subproblems, distributes them among up to thousands of
inexpensive computer processing nodes, and then combines the result into a smaller data set that is
easier to analyze.
- Examples: searching for directions on Google, connecting with a friend on Facebook
- Key services: Hadoop Distributed File System (HDFS), MapReduce, HBase
● In-memory computing
- Another way of facilitating big data analysis is to use in-memory computing, which relies primarily
on a computer’s main memory (RAM) for data storage to avoid delays in retrieving data
- Complex business calculations that used to take hours or days can be completed within seconds
● Analytics platforms
- High-speed platforms that use both relational and non-relational technology and are optimized
for analyzing large data sets.
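Going back to the Hadoop item above: a single-machine sketch of the MapReduce pattern (break the problem into subproblems, process each piece, combine the results into a smaller data set). This is plain Python, not the Hadoop API. The function names and the three text chunks are made up, and each chunk stands in for a block of data that would sit on a separate node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine each key's values into one total."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a data block that would sit on a separate node.
chunks = ["big data big", "data lake", "big lake"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
```

In a real cluster the map calls would run in parallel on the nodes holding each chunk; here they run one after another in a single process.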
Data lake: A data lake is a repository for raw unstructured data or structured data that for the most part has
not yet been analyzed, and the data can be accessed in many ways.
→ Tools for analyzing and providing access to vast amounts of data to help users make better
business decisions
- OLAP, data mining, text mining, web mining
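A rough sketch of what OLAP-style analysis looks like, assuming the usual description of OLAP as viewing the same data along multiple dimensions. The sales facts, dimension names, and the roll_up helper are all invented for illustration. The same facts are totaled by product alone and then by product and region.

```python
from collections import defaultdict

# Made-up sales facts: (product, region, month, units).
sales = [
    ("bolts", "East", "Jan", 10),
    ("bolts", "West", "Jan", 7),
    ("nuts", "East", "Jan", 4),
    ("bolts", "East", "Feb", 6),
    ("nuts", "West", "Feb", 9),
]

# Position of each dimension inside a fact tuple.
DIMENSIONS = {"product": 0, "region": 1, "month": 2}


def roll_up(facts, *dims):
    """Total units grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)


by_product = roll_up(sales, "product")
by_product_region = roll_up(sales, "product", "region")
```

Dedicated OLAP tools precompute and index these kinds of totals so that users can switch between views quickly; this sketch recomputes them from the raw facts each time.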