3.1 Data and Data Analysis
Information: provides context for the data and is measured in different units.
Information may answer questions such as 'who', 'what', 'where' and 'when'.
Knowledge: comes next and refers to when more meaning can be derived from
information, which is then applied to achieve a set goal.
Metadata: a set of data that describes and gives information about other data.
For example, a document may store details such as the author, the size of the file
and the date it was created.
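As a rough illustration, here is a minimal Python sketch that reads a few items of metadata for a file. The filename report.docx is a placeholder; note that an author field usually lives inside the document format itself rather than in the file system, so only size and date are shown here:

```python
import os
from datetime import datetime

def file_metadata(path):
    """Return a small dictionary of metadata describing a file."""
    stats = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": stats.st_size,                         # size of the file
        "modified": datetime.fromtimestamp(stats.st_mtime),  # date last changed
    }

print(file_metadata("report.docx"))  # hypothetical file
```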
Data mining: the term used to describe the process of finding patterns
and correlations, as well as anomalies, within large sets of data.
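One very simple pattern-finding idea can be sketched in Python: flagging values that sit far from the mean of a data set. The sales figures are invented for illustration; real data mining uses far more sophisticated techniques:

```python
from statistics import mean, stdev

# Invented daily sales figures; the final value is an anomaly.
sales = [120, 115, 130, 125, 118, 122, 410]

avg, spread = mean(sales), stdev(sales)

# Flag anything more than two standard deviations from the mean.
anomalies = [x for x in sales if abs(x - avg) > 2 * spread]
print(anomalies)  # [410]
```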
Data matching: The process of comparing two different sets of data with the
aim of finding data about the same entity.
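A minimal sketch of the idea in Python, matching records from two hypothetical systems on a shared key (here an email address, purely as an assumption):

```python
# Two data sets about customers, held by different (hypothetical) systems.
crm_customers = [
    {"email": "amy@example.com", "name": "Amy Lee"},
    {"email": "ben@example.com", "name": "Ben Cho"},
]
billing_accounts = [
    {"email": "ben@example.com", "plan": "premium"},
    {"email": "cara@example.com", "plan": "basic"},
]

# Index one data set by the shared key, then look up the other.
billing_by_email = {rec["email"]: rec for rec in billing_accounts}
matches = [
    (cust, billing_by_email[cust["email"]])
    for cust in crm_customers
    if cust["email"] in billing_by_email
]
print(matches)  # records describing the same entity (Ben)
```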
Primary data: Original data collected for the first time for a specific purpose.
Secondary data: Data that has already been collected by someone else for a
different purpose.
Validation: In databases, this means that only valid (suitable) data can be
entered.
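A minimal sketch of a validation rule in Python, combining a type check and a range check (the field and its limits are invented for illustration):

```python
def validate_age(value):
    """Accept only whole-number ages between 0 and 120."""
    try:
        age = int(value)          # type check: must be a whole number
    except ValueError:
        return False
    return 0 <= age <= 120        # range check: must be a sensible age

print(validate_age("34"))   # True  -- suitable data
print(validate_age("abc"))  # False -- rejected before entry
print(validate_age("999"))  # False -- numeric but outside the range
```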
Verification: In databases, these are checks that the data entered is the actual
data that you want, or that the data entered matches the original source of data.
Two common methods of data verification include (see the sketch below):
● double entry (for example, being asked to enter a password twice when
registering a username for a new website)
● having a second person check the data visually.
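A minimal sketch of the double-entry method in Python, using the standard library's getpass to hide what is typed:

```python
from getpass import getpass

def double_entry_password():
    """Ask for a password twice; accept it only if both entries match."""
    first = getpass("Enter a password: ")
    second = getpass("Re-enter the password: ")
    if first != second:
        raise ValueError("Entries do not match -- data not verified")
    return first
```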
Data visualization: the process by which large sets of data are converted
into charts, graphs or other visual presentations.
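For example, a few lines of Python with the matplotlib library turn a small (invented) data set into a bar chart:

```python
import matplotlib.pyplot as plt

# Invented data: support tickets logged per weekday.
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
tickets = [34, 41, 29, 52, 38]

plt.bar(days, tickets)
plt.title("Support tickets per day")
plt.xlabel("Day")
plt.ylabel("Tickets")
plt.show()
```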
Symmetric key encryption: the key used to encode and decode the data is the
same. Both computers need to know the key to be able to communicate or
share data. This type of encryption is commonly used in wireless security,
the security of archived data and the security of databases.
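A minimal sketch in Python using the third-party cryptography package's Fernet recipe (one symmetric scheme among many; the message is invented):

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # the single shared key
cipher = Fernet(key)

# Both sides use the SAME key to encode and decode.
token = cipher.encrypt(b"account balance: 1,200")
print(cipher.decrypt(token))  # b'account balance: 1,200'
```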
Public key (asymmetric) encryption: uses two different keys to encode and
decode the data. The private key is kept secret by the destination computer,
while its matching public key is shared with any computer that wishes to
communicate with it. When sending data, the public key of the destination
computer is used to encode it; during transmission this data cannot be
understood without the private key. Once received by the destination
computer, the private key is used to decode the data.
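The same flow can be sketched with RSA from the cryptography package (an illustrative choice of algorithm and padding, not the only one; the message is invented):

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Key pair belonging to the destination computer.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# The sender encodes with the destination's PUBLIC key...
ciphertext = public_key.encrypt(b"meet at 10:00", oaep)

# ...and only the matching PRIVATE key can decode it.
print(private_key.decrypt(ciphertext, oaep))  # b'meet at 10:00'
```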
Data deletion: Sending a file to the recycle bin, which removes the file's
icon and the pathway to its location.
Big data: Term used to describe large volumes of data, which may be
structured or unstructured.
Volume — big data consists of very large volumes of data that are created every
day from a wide range of sources, whether from human interaction with social
media or the collection of data on an internet of things (IoT) network.
Velocity — the speed at which data is generated, collected and analysed.
Variety — data consists of a wide variety of data types and formats, such as
social media posts, videos, photos and PDF files.
Veracity — refers to the accuracy and quality of the data being collected.
Data privacy: The ability of individuals to control their personal information.
Unreliable data
Biased data: This could be due to the use of biased data sets or to human bias
when selecting the data.
Viruses and malware: Stored data can be vulnerable to these external threats.
Data can be changed, and therefore lose its integrity, or be corrupted and
ultimately lost.
Reliability and validity of sources: Data can be generated from a number of
online sources; if these sources have not been evaluated, this can lead to
unreliable data being used by the IT systems.
Outdated data: Many IT systems collect and store data that is changing; if data
is not updated it becomes unreliable. Consider the telephone numbers of
parents held by a school: if a parent does not inform the school of a change
of number, this data cannot be relied on to contact them.
Human error and lack of precision: Any form of manual data entry is prone
to human error. Automating data entry is crucial for reducing these types of
errors. It is also easy for users to accidentally delete files, move them or even
forget the name of the file and where it was saved. Effective file management
procedures are essential to reduce these types of errors.
Uses of blockchain
● Microsoft's Authenticator app for digital identity.
● The healthcare industry is using blockchain technology for patient data.
● Blockchain technology can provide a single unchangeable vote per person
in digital voting (see the sketch after this list).
● The US Government is using blockchain to track weapon and gun
ownership.
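The property that makes records such as votes 'unchangeable' is hash chaining: each block's hash covers the previous block's hash. A minimal Python sketch with invented vote records:

```python
import hashlib

def block_hash(previous_hash, record):
    """Hash a record together with the previous hash, chaining the blocks."""
    return hashlib.sha256((previous_hash + record).encode()).hexdigest()

chain, prev = [], "0" * 64  # genesis value
for vote in ["voter1:A", "voter2:B", "voter3:A"]:  # invented records
    prev = block_hash(prev, vote)
    chain.append((vote, prev))

# Altering any earlier record changes every hash after it,
# which makes tampering easy to detect.
for vote, digest in chain:
    print(vote, digest[:16])
```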
Big data in banking and finance (source: Algorithm-X Lab)
Big data is allowing banks to see customer behavior patterns and market trends.
American Express is using big data to get to know its customers, applying
predictive models to analyze customer transactions. Big data is also being used
to monitor the efficiency of internal processes, optimizing performance and
reducing costs. JP
Morgan has used historical data from billions of transactions to automate trading.
A third use of big data has been to improve cybersecurity and detect fraudulent
transactions. Citibank has developed a real-time machine learning and predictive
modeling system that uses data analysis to detect potentially fraudulent
transactions.