3.1-Data and Data Analysis


Data- refers to a collection of raw, unorganized facts and figures, which may be
in the form of numbers, letters, characters or images. Data is often composed of
facts and observations. Each item is raw material that has no meaning on its own;
its size is measured in bits and bytes.

Information- provides context for the data and is measured in different units.
Information may be based on questions such as 'who', 'what', 'where' and 'when'.

Knowledge- comes next and refers to when more meaning can be derived from
information, which is then applied to achieve a set goal.

Wisdom- follows on from knowledge and is when knowledge can be applied in
action. One may ask questions such as 'why' and use knowledge and insight to
make decisions, determine patterns and make predictions.

DIKW- The data, information, knowledge, wisdom (DIKW) pyramid is a diagram
that represents the relationship between data, information, knowledge and
wisdom. Each block builds on the previous block, answering different questions
about the initial data and how to add value to it.

Types of data
● Financial
● Medical
● Meteorological
● Geographical
● Scientific

Metadata- is a set of data that describes and gives information about other data.
For example, a document may store details such as the author, the size of the file
and the date it was created.
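As a minimal sketch of the idea, the Python below reads some of the metadata that the operating system keeps about a file. The helper name describe_file and the temporary sample file are assumptions made for illustration. The last-modified time is used because a creation date is not available in the same way on every system, and details such as the author are usually stored inside the document format rather than by the file system.

```python
import os
import tempfile
from datetime import datetime, timezone

def describe_file(path):
    """Return a small dictionary of metadata about the file at `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }

# Create a throwaway file so the example can run anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello")
    sample_path = handle.name

metadata = describe_file(sample_path)
print(metadata)

# Tidy up the throwaway file.
os.remove(sample_path)
```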

Data mining- is the term used to describe the process of finding patterns
and correlations, as well as anomalies, within large sets of data.
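One very simple way to look for anomalies is to flag values that lie far from the average. The sketch below assumes a made-up list of daily sales figures and a threshold of two standard deviations; real data mining tools use far larger data sets and more sophisticated methods.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - centre) / spread > threshold]

# Hypothetical figures: one day is clearly out of line with the rest.
daily_sales = [102, 98, 101, 97, 103, 99, 100, 250, 96, 104]
anomalies = find_anomalies(daily_sales)
print(anomalies)
```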

Data matching: The process of comparing two different sets of data with the
aim of finding data about the same entity.
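A small sketch of data matching, assuming two hypothetical data sets that both store an email address. The email is tidied up (spaces removed, lower case) and used as the shared field that identifies the same person in both sets. All names and addresses are invented.

```python
def normalise(email):
    """Tidy an email address so small formatting differences do not matter."""
    return email.strip().lower()

def match_records(set_a, set_b, key="email"):
    """Return pairs of records from the two sets that share the same key."""
    index = {normalise(record[key]): record for record in set_b}
    matches = []
    for record in set_a:
        other = index.get(normalise(record[key]))
        if other is not None:
            matches.append((record, other))
    return matches

library_members = [
    {"name": "A. Khan", "email": "a.khan@example.com"},
    {"name": "B. Lee", "email": "b.lee@example.com"},
]
school_register = [
    {"name": "Aisha Khan", "email": " A.Khan@Example.com "},
    {"name": "C. Diaz", "email": "c.diaz@example.com"},
]

matches = match_records(library_members, school_register)
for first, second in matches:
    print(first["name"], "matches", second["name"])
```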

Primary data: Original data collected for the first time for a specific purpose.

Secondary data: Data that has already been collected by someone else for a
different purpose.

Relational database: A database that stores data in more than one table, with
the tables linked to each other through shared key fields.

Validation: In databases, this means that only valid (suitable) data can be
entered.
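Validation rules can be sketched as a function that checks a record before it is stored. The fields (name, age) and the age range of 4 to 19 are assumptions chosen for a school example, not rules taken from any particular database.

```python
def validate_student(record):
    """Return a list of validation errors; an empty list means the record is valid."""
    errors = []

    # Presence check: the name must be text and must not be blank.
    name = record.get("name", "")
    if not isinstance(name, str) or not name.strip():
        errors.append("name must not be empty")

    # Type and range check: the age must be a whole number in a sensible range.
    age = record.get("age")
    if not isinstance(age, int) or not 4 <= age <= 19:
        errors.append("age must be a whole number from 4 to 19")

    return errors

print(validate_student({"name": "Sam", "age": 15}))
print(validate_student({"name": "", "age": 40}))
```

Note that validation only checks that the data is suitable; it cannot tell whether a valid-looking age is actually the correct one.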

Verification: In databases, these are checks that the data entered is the actual
data that you want, or that the data entered matches the original source of data.
Two common methods of data verification are:
● double entry (for example, being asked to enter a password twice when
registering a username for a new website)
● having a second person check the data visually
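Double entry can be sketched in a few lines: the value is only accepted when both entries are identical. The function name and the sample passwords are made up for illustration.

```python
def double_entry_matches(first_entry, second_entry):
    """Accept the value only if it was typed the same way both times."""
    return first_entry == second_entry

# The second attempt contains a typing mistake, so it should be rejected.
accepted = double_entry_matches("correct horse", "correct horse")
rejected = double_entry_matches("correct horse", "corect horse")
print(accepted, rejected)
```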

Data visualization- is the process by which large sets of data are converted
into charts, graphs or other visual presentations.

Encryption: The process of converting readable data into unreadable
characters to prevent unauthorized access.

Symmetric key encryption- is where the key to encode and decode the data
is the same. Both computers need to know the key to be able to communicate or
share data. This type of encryption is commonly used in wireless security,
security of archived data and security of databases.
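The key point, that the same key both encodes and decodes, can be shown with a toy example. The XOR cipher below is an assumption chosen only because it is short; it is not secure and is not the method used by real wireless or database security, which rely on standard algorithms such as AES.

```python
from itertools import cycle

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Combine each byte of data with the repeating key using XOR."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# Both computers must already know this shared key (hypothetical value).
shared_key = b"not-a-real-key"
message = b"Meet at noon"

ciphertext = xor_with_key(message, shared_key)    # sender encodes
recovered = xor_with_key(ciphertext, shared_key)  # receiver decodes with the same key
print(ciphertext, recovered)
```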

Public key (asymmetric) encryption- uses two different keys to encode and
decode the data. Each computer has a private key, which is known only to that
computer, and a public key, which it shares with any computer that it wishes to
communicate with. When sending data, the public key of the destination computer
is used to encode it. During transmission, this data cannot be understood
without the matching private key. Once received by the destination computer, its
private key is used to decode the data.
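The two-key idea can be illustrated with a toy version of the RSA method. This is a sketch under strong assumptions: the primes 61 and 53 are far too small to be secure, the message is a single number, and real systems use tested libraries with much larger keys. It needs Python 3.8 or later for the modular inverse.

```python
# Toy key pair belonging to the destination computer.
p, q = 61, 53                      # tiny primes, for illustration only
n = p * q                          # modulus shared by both keys
phi = (p - 1) * (q - 1)
public_exponent = 17
private_exponent = pow(public_exponent, -1, phi)  # modular inverse (Python 3.8+)

public_key = (public_exponent, n)    # shared with anyone
private_key = (private_exponent, n)  # kept secret by the destination computer

def apply_key(number, key):
    """Encode or decode a number with the given (exponent, modulus) key."""
    exponent, modulus = key
    return pow(number, exponent, modulus)

message = 65                                   # must be smaller than n
ciphertext = apply_key(message, public_key)    # sender uses the public key
recovered = apply_key(ciphertext, private_key) # receiver uses the private key
print(ciphertext, recovered)
```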

Secure Sockets Layer (SSL): is a protocol developed for sending information
securely over the Internet by using an encrypted link between a web server and a
browser.

Transport Layer Security (TLS): is an improved version of SSL and is a
protocol that provides security between client and server applications
communicating over the Internet.
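As a rough sketch of how a client program might prepare for a TLS connection, Python's standard ssl module can build the settings object shown below. No connection is made here; the choice of TLS 1.2 as the minimum version is an assumption for illustration.

```python
import ssl

# Default client settings: the server's certificate must be valid
# and must match the host name being contacted.
context = ssl.create_default_context()

# Refuse older protocol versions (assumed policy for this sketch).
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.check_hostname)
print(context.verify_mode == ssl.CERT_REQUIRED)
```

A program would then pass this context to its networking code, for example by wrapping a socket with context.wrap_socket and the server's host name.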

Data masking: The process of replacing confidential data with functional
fictitious data, ultimately anonymizing the data.

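A minimal sketch of data masking, assuming a small list of invented patient records. Each real name is swapped for a fictitious one, and the same real name always receives the same replacement, so the masked data still behaves like the original when it is used for testing or analysis.

```python
def mask_records(records, sensitive_field, fake_values):
    """Return copies of the records with the sensitive field replaced by fictitious values."""
    replacements = {}
    masked = []
    for record in records:
        real = record[sensitive_field]
        if real not in replacements:
            if len(replacements) >= len(fake_values):
                raise ValueError("not enough fictitious values")
            # Hand out the next unused fictitious value.
            replacements[real] = fake_values[len(replacements)]
        masked.append({**record, sensitive_field: replacements[real]})
    return masked

# Hypothetical data: the first and third records belong to the same person.
patients = [
    {"name": "Maria Lopez", "blood_type": "A+"},
    {"name": "John Smith", "blood_type": "O-"},
    {"name": "Maria Lopez", "blood_type": "A+"},
]
fake_names = ["Alex Reed", "Sam Patel", "Jo Winter"]

masked = mask_records(patients, "name", fake_names)
print(masked)
```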
Data erasure: The destruction of data at the end of the data life cycle.

Data deletion: Sending a file to the recycle bin, which removes the file icon
and the path to its location. The data itself remains on the storage device
until it is overwritten.
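The difference between the two can be sketched in code: deletion only removes the file's entry, while erasure tries to destroy the contents first. The function below is an illustration under assumptions, not a certified erasure method; on solid-state drives and some file systems, overwriting a file in place does not guarantee that the old data is gone, which is why dedicated erasure tools are used in practice.

```python
import os
import tempfile

def overwrite_and_remove(path):
    """Overwrite a file's contents with zero bytes, then remove the file."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        handle.write(b"\x00" * size)  # replace the old contents
        handle.flush()
        os.fsync(handle.fileno())     # ask the system to write the change to disk
    os.remove(path)

# Create a throwaway file holding made-up sensitive text.
with tempfile.NamedTemporaryFile("w", delete=False) as handle:
    handle.write("blood test results")
    record_path = handle.name

overwrite_and_remove(record_path)
print(os.path.exists(record_path))
```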

Blockchain: A digital ledger of transactions that is duplicated and distributed
across a network of computers.
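The sketch below shows one common way such a ledger is built: each block stores a hash of the block before it, so changing an old transaction breaks the chain. It is a simplified illustration with made-up transactions and function names; real blockchains add consensus rules and networking that are not shown here.

```python
import copy
import hashlib
import json

def block_hash(index, transactions, previous_hash):
    """Hash the contents of a block, including the hash of the block before it."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def add_block(chain, transactions):
    """Append a new block that points back to the current last block."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": block_hash(index, transactions, previous_hash),
    })

def is_valid(chain):
    """Check that every block's hash is correct and links to the block before it."""
    expected_previous = "0" * 64
    for index, block in enumerate(chain):
        if block["index"] != index or block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(index, block["transactions"], block["previous_hash"]):
            return False
        expected_previous = block["hash"]
    return True

ledger = []
add_block(ledger, ["Ana pays Ben 5"])
add_block(ledger, ["Ben pays Cy 2"])

# A second computer holds its own duplicate of the ledger.
duplicate = copy.deepcopy(ledger)

# Someone alters an old transaction on one copy.
tampered = copy.deepcopy(ledger)
tampered[0]["transactions"] = ["Ana pays Ben 500"]

print(is_valid(ledger), is_valid(duplicate), is_valid(tampered))
```

Because every computer can run the same check, an altered copy is easy to spot and reject.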

Big data: Term used to describe large volumes of data, which may be
structured or unstructured.

Big data can be characterized by the 4Vs: volume, velocity, variety and
veracity.

Volume: big data consists of very large volumes of data that are created every
day from a wide range of sources, whether it is a human interaction with social
media or the collection of data on an internet of things (IoT) network.
Velocity: the speed at which data is generated, collected and analysed.
Variety: data consists of a wide variety of data types and formats, such as
social media posts, videos, photos and PDF files.
Veracity: refers to the accuracy and quality of the data being collected.

Data privacy: The ability for individuals to control their personal information.

Data reliability: Refers to data that is complete and accurate.


Data integrity: Refers to the trustworthiness of the data and whether it
has been compromised.

Unreliable data
Biased data: This could be due to using biased data sets or bias by humans
when selecting the data.
Viruses and malware: Stored data can be vulnerable to these external threats.
Data can be changed, and therefore lose its integrity, or be corrupted and
ultimately lost.
Reliability and validity of sources: Data can be generated from a number of
online sources; if these sources have not been evaluated, this can lead to
unreliable data being used by the IT systems.
Outdated data: Many IT systems collect and store data that is changing; if data
is not updated it becomes unreliable. Consider the telephone numbers of parents
held by a school: if a parent does not inform the school of a change in number,
this data cannot be relied on to contact that parent.
Human error and lack of precision: Any form of manual data entry is prone
to human error. Automating data entry is crucial for reducing these types of
errors. It is also easy for users to accidentally delete files, move them or even
forget the name of the file and where it was saved. Effective file management
procedures are essential to reduce these types of errors.

Real life example


Geographical- Accessing location data without authorization: Australian Federal
Police (AFP)
According to the Australian Computer Society's Information Age, in 2021 the
Australian Federal Police (AFP) were being investigated for accessing location
data without gaining the correct authorization. The investigation covered a period
of five years from 2015 to 2020 in which there were 1700 instances of police
accessing location data, with compliance for only 100 of these.
DIKW- Citizen scientists by Wired
During 2020-21, there was a marked increase in bird watching, which
generated an increase in data. Many people were working from home during this
time due to the COVID-19 pandemic, and large numbers joined projects to collect
and share data about birds in the form of pictures, sound recordings and
observations. One such citizen-science project, Project Safe Flight, asked users
to record birds injured by flying into windows, while eBird allowed citizens to
update sightings of the different species of birds.
In many cases, the number of people registered to these projects doubled, and
so did the amount of data uploaded. From this data, scientists could see changes
in bird behavior, although it was not clear whether this could be attributed to the
increase in observations, or whether the birds were actually changing their
behavior.

Data analysis in employment


Data is collected widely by both people and communities. In employment, for
example, artificial intelligence can be used to analyze data generated by detailed
questionnaires to identify which employees would be suitable for new job
opportunities. In the health industry, data analysis can be used to determine
staffing levels. Too many staff can lead to overspending on labor costs, while
understaffing can create a stressful working environment and lower the quality of
medical care. Data can be used to solve this issue.

Data breaches from lack of data erasure by njb news


In 2010, some photocopiers that were used to copy sensitive medical information
were sent to be resold without wiping the hard drives. Three hundred pages of
individual medical records containing drug prescriptions and blood test results
were still on the hard drive of the copiers. The US Department of Health and
Human Services settled out of court with the original owner of the copiers for the
violation of the Health Insurance Portability and Accountability Act (HIPAA) for
US$1.2 million.
In 2015, a computer at Loyola University that contained names, social security
numbers and financial information for 5800 students was disposed of before the
hard drive was wiped.

Uses of blockchain
● Microsoft's Authenticator app for digital identity
● The healthcare industry is using blockchain technology for patient data
● Blockchain technology can provide a single unchangeable vote per person
in digital voting
● The US Government is using blockchain to track weapon and gun
ownership.

Big data in banking and finance by algorithimxlab
Big data is allowing banks to see customer behavior patterns and market trends.
American Express is using big data to get to know its customers using predictive
models to analyze customer transactions. It is also being used to monitor the
efficiency of internal processes to optimize performance and reduce costs. JP
Morgan has used historical data from billions of transactions to automate trading.
A third use of big data has been to improve cybersecurity and detect fraudulent
transactions. Citibank has developed a real-time machine learning and predictive
modeling system that uses data analysis to detect potentially fraudulent
transactions.

Big data in the sports industry


Bundesliga, Germany's professional association football league, introduced
Match Facts in 2021 to give match insights to its viewers. During a match, 24
cameras are positioned on the field to collect and stream data during the
90-minute game. This data is then converted into metadata and used with past
data to provide insights for the fans, such as which player is being most closely
defended or the likelihood of a goal being scored.

Bias in facial recognition by Harvard


In 2019 the National Institute of Standards and Technology (NIST) published a
report analyzing the performance of facial-recognition algorithms. Many of these
algorithms were less reliable in identifying the faces of black or East Asian
people, with American Indian faces being the most frequently misidentified. The
main factor was the non-diverse set of training images used.

Reliability and validity of COVID-19 data by the Guardian


In June 2020, the Guardian reported on a study that was published online about
the effect of the anti-parasite drug Ivermectin on COVID-19 patients. The data in
the study was obtained from the Surgisphere website using the QuartzClinical
database, which claimed to be monitoring real-time data from 1200 international
hospitals. However, as doctors around the world started using this data, they
soon became concerned regarding the number of anomalies they found. This
resulted in prestigious medical journals reviewing studies that were based on this
unreliable data and the World Health Organization stopping their research into
the potential COVID-19 treatment.
