DS 1

Fundamentals of Data Science

Unit 1

Prepared By
Dr.P.Sasikumar
Associate Professor, AIML Dept.
Unit # 01
Introduction: What is Data Science?

• Big Data and Data Science hype

• Getting past the hype
• Why now?
• Datafication
• Current landscape of perspectives
• Data Science Jobs
• What is a data scientist?

-In Academia

-In Industry
Basic Terminologies

• Data: it can be
-generated
-collected
-retrieved
• Simulation
• Similarity Measures
• Data Structures
• Algorithms
• Data: raw facts that carry no meaning on their own.
• Information: what we learn from those facts.
• Knowledge: practical understanding of a subject.
• Understanding: the ability to absorb knowledge and learn to reason.
• Wisdom: the quality of having experience and good judgment; ability to think and foresee.
• Validity: ways to confirm truth.
The DIKW Pyramid

• Cross-sectional data: observations recorded without a time dimension.
• Temporal data: observations recorded over time (time series).
• Spatial data: takes location into account, e.g. coordinate determination on touchscreen phones.
• Spatio-temporal data (GIS): tracks how location-based data changes with the passage of time, e.g. population density.

Scales of Measurement

There are four scales of measurement:

• Nominal: classifies data into unordered categories, e.g. male/female.
• Ordinal: puts data in order; values can be numerical or non-numerical, e.g. time of day (dawn, morning, noon, afternoon, evening, night).
• Interval: ordered values with equal spacing between them but no true zero, e.g. temperature in degrees Celsius.
• Ratio: like interval, but with a true zero, so that ratios are meaningful, e.g. weight, height, number of children.
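The scales above map onto R data structures. The following is a minimal sketch, assuming made-up values; the variable names (gender, time_of_day, weight_kg) are illustrative and not part of the original slides.

```r
# Nominal: unordered categories
gender <- factor(c("male", "female", "female"))

# Ordinal: ordered categories, using the time-of-day levels from the text
time_of_day <- factor(
  c("noon", "dawn", "night"),
  levels = c("dawn", "morning", "noon", "afternoon", "evening", "night"),
  ordered = TRUE
)

is.ordered(gender)       # nominal data has no order
is.ordered(time_of_day)  # ordinal data does

# Order comparisons are only meaningful for ordered factors
time_of_day[2] < time_of_day[1]  # is "dawn" earlier than "noon"?

# Ratio: a true zero makes ratios meaningful
weight_kg <- c(60, 30)
weight_kg[1] / weight_kg[2]
```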
Chapter # 01

Introduction: What is Data Science?

Big Data and Data Science Hype:

Reasons to be skeptical about data science:
• Is data science only the stuff going on in companies like Google, Facebook, and other tech companies?
• There’s a distinct lack of respect for the researchers in academia and industry labs who have been working on this kind of stuff for years, and whose work builds on decades of earlier research.
• The hype is crazy. In general, hype masks reality and increases the noise-to-signal ratio.
• Statisticians already feel that they are studying and working on the “Science of Data.”
Getting Past the Hype
• Rachel’s experience going from getting a PhD in statistics to
working at Google. In her words:
Getting Past the Hype
We have a couple replies to this:

• Sure, there is a difference between industry and academia. But does it really have to be that way?
Why do many courses in school have to be so intrinsically out of touch with reality?
• Even so, the gap doesn’t represent simply a difference between industry statistics and academic
statistics.
• The general experience of data scientists is that, at their job, they have access to a larger body
of knowledge and methodology, as well as a process, which we now define as the data science
process, that has foundations in both statistics and computer science.
Around all the hype, in other words, there is a ring of truth: this is something new.
Why Now?
• We have massive amounts of data about many aspects of our lives. What people might not know is that the “datafication” of our offline behavior has started as well.
• On the Internet, this means Amazon recommendation systems.
• On Facebook, this means friend recommendations, film and music recommendations, and so on.
• In finance, this means credit ratings, trading algorithms, and models.
• In education, this is starting to mean dynamic personalized learning and assessments coming out of places like Knewton and Khan Academy.
• In government, this means policies based on data.
Datafication
• In the May/June 2013 issue of Foreign Affairs, Kenneth Neil Cukier and Viktor Mayer-Schoenberger wrote an article called “The Rise of Big Data”. In it they discuss the concept of datafication.

They define datafication as a process of “taking all aspects of life and turning them into data.”

• They follow up their definition in the article with a line that speaks volumes about their perspective:

Once we datafy things, we can transform their purpose and turn the information
into new forms of value.
Datafication
Examples:
• We quantify friendships with “likes”.
• Twitter (X) datafies stray thoughts.
• LinkedIn datafies professional networks.
• When we “like” someone or something online, we are intending to be datafied.
• When we browse the Web, we are datafied unintentionally through cookies.
• When we walk around in a store, or even on the street, we are being datafied via sensors, cameras, or Google Glass.
• This ranges from taking part in a social media experiment to all-out surveillance and stalking, but it is all datafication.

Current landscape of perspectives
For example,
• On Quora there’s a discussion from 2010 about “What is Data Science?” and here’s Metamarkets CEO Mike Driscoll’s answer:

Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.
• Driscoll then refers to Drew Conway’s Venn diagram of data science from 2010.
Current landscape of perspectives
• Nathan Yau’s 2009 post, “Rise of the Data Scientist”, lists these skills:
1. Statistics (traditional analysis you’re used to thinking about)
2. Data munging (parsing, scraping, and formatting data)
3. Visualization (graphs, tools, etc.)
• ASA President Nancy Geller’s 2011 Amstat News article, “Don’t shun the ‘S’ word”, in which she defends statistics.

• DJ Patil and Jeff Hammerbacher, then at LinkedIn and Facebook respectively, coined the term “data scientist” in 2008.
• Wikipedia finally gained an entry on data science in 2012.
Data Science Jobs
• For three years running, data science has been dubbed “the best job in America.” According to Stack Overflow, it is one of the highest paying jobs in the software sector.
• The GDPR increased the reliance companies have on data scientists due to the need for real-time analytics
and storing data responsibly.
• There are 465 job openings in New York City alone for data scientists.
• LinkedIn recently picked data scientist as its most promising career of 2019. One of the reasons it got the
top spot was that the average salary for people in the role is $130,000.
• The January report from Indeed, one of the top job sites, showed a 29% increase in demand for data
scientists year over year and a 344% increase since 2013 -- a dramatic upswing. But while demand -- in
the form of job postings -- continues to rise sharply, searches by job seekers skilled in data science grew at
a slower pace (14%), suggesting a gap between supply and demand.
The growth in data scientist job postings on Indeed, from December 2016 to December 2018
What Is a Data Scientist, Really?
Perhaps the most concrete approach is to define data science by its usage.

• In Academia
• An academic data scientist is a scientist, trained in anything from social science to biology, who works with large
amounts of data, and must deal with computational problems posed by the structure, size, messiness, and the complexity
and nature of the data, while simultaneously solving a real-world problem.

• In Industry
More generally, a data scientist is someone who knows how to:
• Design experiments.
• Collect, clean, and munge data. These skills are also necessary for understanding biases in the data and for debugging logging output from code.
• Carry out exploratory data analysis, which combines visualization and data sense.
• Find patterns and build models and algorithms.
• Use analyses for decision making.
What Is a Data Scientist
• Data engineers are the data professionals who prepare the “big data” infrastructure to be analyzed by data scientists.
• A data analyst is someone who merely curates meaningful insights from data.

A data scientist is a professional with the capabilities to gather large amounts of data to analyze and synthesize
the information into actionable plans for companies and other organizations.
Statistical Inference

• Statistical inference is the process of using a sample to infer the properties of a population. Statistical procedures use sample data to estimate the characteristics of the whole population from which the sample was drawn.
• When studying a phenomenon, such as the effects of a new medication or public opinion, the population is usually too large to measure fully.
• Consequently, researchers must use a manageable subset of that population to learn about it.
• By using procedures that can make statistical inferences, you can estimate the properties and processes of a population.
• More specifically, sample statistics can estimate population parameters.
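As an illustration of a sample statistic estimating a population parameter, here is a minimal sketch in R. The population is simulated (10,000 hypothetical heights in cm); the numbers and variable names are assumptions made for this example only.

```r
set.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in cm
population <- rnorm(10000, mean = 170, sd = 10)

# A manageable random subset of that population
smp <- sample(population, size = 200)

pop_mean <- mean(population)  # population parameter (normally unknown)
smp_mean <- mean(smp)         # sample statistic that estimates it

c(parameter = pop_mean, statistic = smp_mean)
```

The two values will be close but not identical; the gap between them is the sampling error that inferential methods have to account for.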

How to Make Statistical Inferences
• Making a statistical inference requires you to do the following:
• Draw a sample that adequately represents the population.
• Measure your variables of interest.
• Use appropriate statistical methodology to generalize your sample results to the population while accounting for
sampling error.

Common Inferential Methods

• Hypothesis Testing: Uses representative samples to assess two mutually exclusive hypotheses about a population.
Statistically significant results suggest that the sample effect or relationship exists in the population after accounting
for sampling error.
• Confidence Intervals: A range of values likely containing the population value. This procedure evaluates the
sampling error and adds a margin around the estimate, giving an idea of how wrong it might be.
• Margin of Error: Comparable to a confidence interval but usually for survey results.
• Regression Modeling: An estimate of the process that generates the outcomes in the population.
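Hypothesis testing, confidence intervals, and the margin of error can be sketched together with R's built-in t.test(). The sample below is simulated, and the null value of 100 is an assumption chosen for illustration.

```r
set.seed(1)

# Hypothetical sample of 50 measurements (simulated, not real data)
x <- rnorm(50, mean = 100, sd = 15)

# Hypothesis test: H0 says the population mean is 100
result <- t.test(x, mu = 100)

ci <- result$conf.int    # 95% confidence interval for the population mean
margin <- diff(ci) / 2   # half-width of the interval, i.e. the margin of error

ci
margin
result$p.value
```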

Example of Statistical Inference
• A real flu vaccine study serves as an example of making a statistical inference.

Treatment | Flu count | Group size | Percent infections
Placebo | 35 | 325 | 10.8%
Vaccine | 28 | 813 | 3.4%
Effect | | | 7.4%

Study Findings
• From the table above, 10.8% of the unvaccinated got the flu, while only 3.4% of the vaccinated caught it. The
apparent effect of the vaccine is 10.8% – 3.4% = 7.4%
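The figures in the table can be reproduced in R. This sketch uses only the counts shown above; choosing prop.test() to account for sampling error is an assumption of this example, not something the slides specify. Note that subtracting the unrounded rates gives roughly 7.3 percentage points, while the 7.4% on the slide comes from subtracting the rounded percentages.

```r
flu_count  <- c(placebo = 35, vaccine = 28)
group_size <- c(placebo = 325, vaccine = 813)

# Infection rate in each group
rate <- flu_count / group_size
round(100 * rate, 1)

# Apparent effect of the vaccine, from the unrounded rates
effect <- rate["placebo"] - rate["vaccine"]
round(100 * effect, 1)

# Two-sample test of equal proportions, which accounts for sampling error
test <- prop.test(x = flu_count, n = group_size)
test$p.value
test$conf.int
```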
Population and Sample
• In statistics and quantitative methodology, data are collected and selected from a statistical population using defined procedures. There are two types of data set: population and sample.

Population
• A population includes all the elements of a data set. A measurable characteristic of the population, such as the mean or standard deviation, is known as a parameter.
• For example, all the people living in India make up the population of India.

There are different types of population. They are:

• Finite Population
• Infinite Population
• Existent Population
• Hypothetical Population
• Let us discuss all the types one by one.

Types of Population
• Finite Population

A finite population, also known as a countable population, is one whose units can be counted. In other words, it is a population in which the number of individuals or objects is finite. For statistical analysis, a finite population is more advantageous than an infinite one. Examples of finite populations are the employees of a company and the potential consumers in a market.
• Infinite Population

An infinite population, also known as an uncountable population, is one in which counting the units is not possible. An example is the germs in a patient’s body, which are uncountable.
• Existent Population

An existent population is a population of concrete individuals. In other words, a population whose units are available in solid form is known as an existent population. Examples are books, students, etc.
• Hypothetical Population

A population whose units are not available in solid form is known as a hypothetical population. A population consists of sets of observations, objects, etc. that all have something in common, and in some situations the population is only hypothetical.

Examples are the outcomes of rolling a die or tossing a coin.

Differences between population and sample

Comparison | Population | Sample
Meaning | Collection of all the units or elements that possess common characteristics | A subgroup of the members of the population
Includes | Each and every element of a group | Only a handful of units of the population
Characteristics | Parameter | Statistic
Data Collection | Complete enumeration or census | Sampling or sample survey
Focus on | Identification of the characteristics | Making inferences about the population
Sample

• A sample includes one or more observations drawn from the population. A measurable characteristic of a sample is a statistic.
• Sampling is the process of selecting the sample from the population.
• For example, some of the people living in India form a sample of the population of India.

There are two basic types of sampling:

• Probability sampling
• Non-probability sampling

Probability Sampling
• In probability sampling, the population units cannot be selected at the discretion (choice) of the researcher.
• Instead, defined procedures are followed which ensure that every unit of the population has a fixed probability of being included in the sample.
• Such a method is also called random sampling.

• Some of the techniques used for probability sampling are:

- Simple random sampling
- Cluster sampling
- Multi-stage sampling
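The three techniques listed above can be sketched with base R's sample(). The population here is hypothetical (100 units spread over 10 clusters), and the sample sizes are arbitrary choices for illustration.

```r
set.seed(7)

# Hypothetical population: 100 units spread evenly over 10 clusters
population <- data.frame(id = 1:100, cluster = rep(1:10, each = 10))

# Simple random sampling: every unit has the same chance of selection
srs <- population[sample(nrow(population), size = 20), ]

# Cluster sampling: pick whole clusters at random and keep all of their units
chosen_clusters <- sample(unique(population$cluster), size = 2)
cluster_sample <- population[population$cluster %in% chosen_clusters, ]

# Multi-stage sampling: pick clusters first, then sample units within each one
stage1 <- sample(unique(population$cluster), size = 4)
multi_stage <- do.call(rbind, lapply(stage1, function(cl) {
  units <- population[population$cluster == cl, ]
  units[sample(nrow(units), size = 3), ]
}))

nrow(srs)
nrow(cluster_sample)
nrow(multi_stage)
```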

Non Probability Sampling
• In non-probability sampling, the population units can be selected at the discretion of the researcher.
• Such samples rely on human judgment to select units and have no theoretical basis for estimating the characteristics of the population.
• Some of the techniques used for non-probability sampling are:
- Quota sampling
- Judgment sampling
- Purposive sampling

Population and Sample Examples

• All the people who hold an ID proof are the population, and the group of people who hold only a voter ID is a sample.
• All the students in a class are the population, whereas the top 10 students in the class are a sample.
• All the members of parliament are the population, and the female members are a sample.

Statistical Modelling
• A statistical model is a type of mathematical model that comprises the assumptions made to describe the data generation process.
• The mathematical expressions will be general enough that they have to include parameters, but the values of these
parameters are not yet known.
• In mathematical expressions, the convention is to use Greek letters for parameters and Latin letters for data.
• So, for example, if you have two columns of data, x and y, and you think there’s a linear relationship, you’d write
down y = β0 +β1x.
• You don’t know what β0 and β1 are in terms of actual numbers yet, so they’re the parameters.
• Other people prefer pictures and will first draw a diagram of data flow, possibly with arrows, showing how things
affect other things or what happens over time.
• This gives them an abstract picture of the relationships before choosing equations to express them.
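To make the linear model y = β0 + β1x concrete, here is a minimal sketch that estimates the two parameters with R's lm(). The data are simulated, and the "true" values 2 and 3 are assumptions chosen for the example.

```r
set.seed(123)

# Simulated data; the true parameters (2 and 3) are assumptions of this sketch
x <- seq(0, 10, length.out = 100)
y <- 2 + 3 * x + rnorm(100, sd = 1)

# Fit the model: lm() finds the least-squares estimates of beta0 and beta1
fit <- lm(y ~ x)
beta <- coef(fit)
beta
```

The estimates in beta should land near 2 and 3; they are not exact because the data contain noise.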

Probability Distributions
What Is Probability?
• Probability denotes the possibility of something happening.
• It is a mathematical concept that predicts how likely events are to occur.
• The probability values are expressed between 0 and 1.
• The definition of probability is the degree to which something is likely to occur.
• This fundamental theory of probability is also applied to probability distributions.

What Is a Probability Distribution?
• A probability distribution is a statistical function that describes all the possible values of a random variable within a given range, together with their probabilities.
• The range is bounded by the minimum and maximum possible values, but where a possible value falls on the probability distribution is determined by a number of factors.

Probability Distribution
A probability distribution (function) is a list of the probabilities of the values (simple outcomes) of a random variable.
Ex: Number of heads in two tosses of a coin

For some experiments, the probability of a simple outcome can be easily calculated using a specific probability function. If $y$ is a simple outcome and $p(y)$ is its probability, then

$0 \le p(y) \le 1$

$\sum_{\text{all } y} p(y) = 1$
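For the coin example, the distribution of the number of heads in two tosses can be listed with R's dbinom(). This is a small sketch that assumes a fair coin (probability 0.5 of heads).

```r
# Possible values of the random variable: 0, 1, or 2 heads
y <- 0:2

# Binomial probabilities for two tosses of a fair coin
p <- dbinom(y, size = 2, prob = 0.5)
data.frame(heads = y, probability = p)

# Both properties of a probability function hold
all(p >= 0 & p <= 1)
sum(p)
```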

Fitting a model to data
• Many data mining procedures fall within this general framework.
• We illustrate with some of the most common, all of which are based on linear models.
• The fundamental concept of this chapter is fitting a model to data by finding “optimal” model parameters.

Classification via mathematical function

Overfitting
• Overfitting occurs when a machine learning model tries to cover every data point, or more data points than it needs to, in the given dataset.
• As a result, the model starts capturing the noise and inaccurate values present in the dataset, which reduces its efficiency and accuracy.
• The chance of overfitting increases the more we train the model.
• Example: overfitting can be understood from a graph of regression output in which the fitted curve passes through every point of the scatter plot.

In such a graph, the model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality it is not: the goal of a regression model is to find the best-fit line, and here no best fit has been found, so the model will generate prediction errors.
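The effect can be reproduced with a small simulation. In this sketch the true relationship is a straight line (an assumption), and a degree-10 polynomial stands in for a model that tries to cover every point; the helper names make_data and mse are hypothetical.

```r
set.seed(10)

# Simulated data whose true relationship is a straight line plus noise
make_data <- function(n) {
  x <- seq(0, 1, length.out = n)
  data.frame(x = x, y = 1 + 2 * x + rnorm(n, sd = 0.3))
}
train <- make_data(20)
test  <- make_data(20)  # fresh noise, same underlying line

simple   <- lm(y ~ x, data = train)            # best-fit line
flexible <- lm(y ~ poly(x, 10), data = train)  # curve that chases the noise

# Mean squared error of a model on a dataset
mse <- function(model, data) mean((data$y - predict(model, newdata = data))^2)

c(train_simple = mse(simple, train), train_flexible = mse(flexible, train))
c(test_simple  = mse(simple, test),  test_flexible  = mse(flexible, test))
```

The flexible model always fits the training data at least as well as the line, but on new data it will typically do worse, which is the signature of overfitting.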

How to avoid the Overfitting in Model

• Both overfitting and underfitting cause the degraded performance of the machine learning model. But the main
cause is overfitting, so there are some ways by which we can reduce the occurrence of overfitting in our
model.
• Cross-Validation
• Training with more data
• Removing features
• Early stopping the training
• Regularization
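Of the remedies listed above, cross-validation is easy to sketch in base R. The example below is a hand-rolled 5-fold cross-validation on simulated data; the data, the choice of k = 5, and the variable names are assumptions for illustration.

```r
set.seed(3)

# Simulated data with a linear relationship plus noise
n <- 60
d <- data.frame(x = runif(n))
d$y <- 1 + 2 * d$x + rnorm(n, sd = 0.3)

k <- 5
fold <- sample(rep(1:k, length.out = n))  # random fold label for each row

# For each fold: train on the other k - 1 folds, then score on the held-out fold
cv_mse <- sapply(1:k, function(i) {
  fit <- lm(y ~ x, data = d[fold != i, ])
  held_out <- d[fold == i, ]
  mean((held_out$y - predict(fit, newdata = held_out))^2)
})

cv_mse
mean(cv_mse)  # estimate of the error on unseen data
```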

Basic Terms for Overfitting
• Signal: the true underlying pattern of the data, which helps the machine learning model to learn from the data.
• Noise: unnecessary and irrelevant data that reduces the performance of the model.
• Bias: a prediction error introduced into the model by oversimplifying the machine learning algorithm. Equivalently, it is the difference between the predicted values and the actual values.
• Variance: if the machine learning model performs well with the training dataset but does not perform well with the test dataset, the model has high variance.

Basics of R
Introduction
• R is a popular programming language used for statistical computing.
• Its most common uses are analyzing and visualizing data, graphical representation, and reporting.
• R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Core Team.
• R is freely available under the GNU General Public License, and precompiled binary versions are provided for various operating systems such as Linux, Windows, and Mac.
• The language was named R after the first letter of the first names of its two authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of the Bell Labs language S.
• R allows integration with procedures written in C, C++, .NET, Python, or FORTRAN for efficiency.

Why Use R?

• It is a great resource for data analysis, data visualization, data science and machine learning
• It provides many statistical techniques (such as statistical tests, classification, clustering and data reduction)
• It is easy to draw graphs in R, like pie charts, histograms, box plot, scatter plot
• It works on different platforms (Windows, Mac, Linux)
• It is open-source and free
• It has large community support
• It has many packages (libraries of functions) that can be used to solve different problems

Features of R

• As stated earlier, R is a programming language and software environment for statistical analysis, graphics
representation and reporting.

The following are the important features of R

• R is a well-developed, simple, and effective programming language that includes conditionals, loops, and input and output facilities.
• R has an effective data handling and storage facility.
• R provides a suite (set) of operators for calculations on arrays, lists, vectors, and matrices.
• R provides a large and integrated collection of tools for data analysis.
• R provides graphical facilities for data analysis and display, either directly on the computer or printed on paper.

R - Environment Setup
1. Installation of R

In Linux: ( Through Terminal )

• Press Ctrl+Alt+T to open Terminal
• Then execute sudo apt-get update
• After that, sudo apt-get install r-base

In Windows:
Step – 1: Go to CRAN R project website. (Comprehensive R Archive Network )

Step – 2: Click on the Download R for Windows link. https://cran.r-project.org/bin/windows/base/

Step – 3: Click on the base subdirectory link or install R for the first time link.

Step – 4: Click Download R X.X.X for Windows (X.X.X stands for the latest version of R, e.g. 3.6.1) and save the executable .exe file.

Step – 5: Run the .exe file and follow the installation instructions.

5.a. Select the desired language and then click Next.

5.b. Read the license agreement and click Next.

5.c. Select the components you wish to install (it is recommended to install all the components). Click Next.

5.d. Enter/browse the folder/path you wish to install R into and then confirm by clicking Next.

5.e. Select additional tasks like creating desktop shortcuts etc. then click Next.

5.f. Wait for the installation process to complete.

5.g. Click on Finish to complete the installation.

Install RStudio on Windows
Step – 1: With R-base installed, let’s move on to installing RStudio.

To begin, go to download RStudio and click on the download button for RStudio desktop.

Step – 2: Click on the link for the windows version of RStudio and save the .exe file.

Step – 3: Run the .exe and follow the installation instructions.

3.a. Click Next on the welcome window.

3.b. Enter/browse the path to the installation folder and click Next to proceed.

3.c. Select the folder for the start menu shortcut or click on do not create shortcuts and then click Next.

3.d. Wait for the installation process to complete.

3.e. Click Finish to end the installation

Syntax
1.To output text in R, use single or double quotes:
• Example

"Hello World!"

2.To output numbers, just type the number (without quotes):

• Example

5
10
25

3. To do simple calculations, add numbers together:

Example

5+5

R Print Output
1. Print: Unlike many other programming languages, R lets you output a value without using a print function:

Example

"Hello World!"
• However, R does have a print() function available if you want to use it. This might be useful if you are familiar with other programming languages, such as Python, which often use a print() function to output values.

Example

print("Hello World!")
• And there are times you must use the print() function to output a value, for example when working with for loops.

Example
for (x in 1:10) {
  print(x)
}
• It is up to you whether you want to use the print() function. However, when your code is inside an R expression (e.g. inside curly braces {} as in the example above), use the print() function to output the result.
Comments
• Comments can be used to explain R code and to make it more readable. They can also be used to prevent execution when testing alternative code.

• Comments start with a #. When executing code, R ignores everything from the # to the end of the line.

• This example uses a comment before a line of code:

• Example

• # This is a comment
"Hello World!"

• This example uses a comment at the end of a line of code:

• Example

• "Hello World!" # This is a comment

• Comments do not have to be text that explains the code; they can also be used to prevent R from executing code:

• Example

• # "Good morning!"
"Good night!"

• Multiline comments: Unlike some other programming languages, such as Java, R has no syntax for multiline comments. However, we can insert a # at the start of each line to create multiline comments.
Creating Variables in R
• Variables are containers for storing data values.
• R does not have a command for declaring a variable.
• A variable is created the moment you first assign a value to it. To assign a value to a variable, use the <- sign. To
output (or print) the variable value, just type the variable name:
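A minimal sketch of creating and printing two variables; the names and values (name, age, "John", 40) are the ones this slide refers to.

```r
# Create two variables by assigning values with <-
name <- "John"
age <- 40

# Typing a variable name on its own line prints its value
name  # prints [1] "John"
age   # prints [1] 40
```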

• In the assignments name <- "John" and age <- 40, name and age are variables, while "John" and 40 are values.
• In other programming languages, it is common to use = as the assignment operator.
• In R, we can use both = and <- as assignment operators.
• However, <- is preferred in most cases because = cannot be used for assignment in some contexts, for example inside a function call, where it names an argument.

Print / Output Variables
• Compared to many other programming languages, you do not have to use a function to print/output variables in
R. You can just type the name of the variable:

• However, R does have a print() function available if you want to use it. This might be useful if you are familiar
with other programming languages, such as Python, which often use a print() function to output variables.

• And there are times you must use the print() function to output code, for example when working with for loops
(which you will learn more about in a later chapter):
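The three cases above can be sketched as follows; the variable name and its value are illustrative assumptions.

```r
name <- "John Doe"

# At the top level, typing the name prints the value automatically
name

# print() gives the same output explicitly
print(name)

# Inside a for loop nothing is auto-printed, so print() is required
for (x in 1:3) {
  print(x)
}
```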

Concatenate Elements
• You can concatenate, or join, two or more elements by using the paste() function.
• To combine text and a variable, separate them with a comma (,) inside paste():

• You can also use a comma inside paste() to join one variable to another:

• For numbers, the + character works as a mathematical operator:
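A short sketch of the three cases above; the variable names and values are illustrative assumptions.

```r
# Combine text and a variable with paste()
text <- "awesome"
paste("R is", text)   # "R is awesome"

# Join one variable to another
text1 <- "R is"
text2 <- "awesome"
paste(text1, text2)   # "R is awesome"

# For numbers, + adds the values
num1 <- 5
num2 <- 10
num1 + num2           # 15
```

By default paste() puts a single space between the elements it joins.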

Multiple Variables
• R allows you to assign the same value to multiple variables in one line:
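A minimal sketch of chained assignment; the variable names and the value "Orange" are illustrative assumptions.

```r
# Assign the same value to three variables in one line
var1 <- var2 <- var3 <- "Orange"

var1
var2
var3
```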

Variable Names
• A variable can have a short name (like x and y) or a more descriptive name (age, carname, total_volume). Rules for R variable names are:
• A variable name must start with a letter or a period (.) and can be a combination of letters, digits, periods (.), and underscores (_). If it starts with a period, the period cannot be followed by a digit.
• A variable name cannot start with a number or an underscore (_).
• Variable names are case-sensitive.

EX: (age, Age and AGE are three different variables)

• Reserved words cannot be used as variable names.

EX: (TRUE, FALSE, NULL, if...)
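These rules can be checked in R itself. The sketch below uses base R's make.names(), which returns a syntactically valid version of each name, so a candidate is valid exactly when it comes back unchanged. The candidate names are made up for illustration.

```r
# Case sensitivity: three different variables
age <- 30
Age <- 31
AGE <- 32
c(age, Age, AGE)

# Candidate names: two valid, then a leading digit, a leading underscore,
# and a reserved word
candidates <- c("total_volume", ".rate", "2nd_value", "_hidden", "if")

# A name is valid if make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates
data.frame(name = candidates, valid = is_valid)
```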

R - Data Types
• Generally, while doing programming in any programming language, you need to use various variables to store
various information.
• Variables are nothing but reserved memory locations to store values.
• This means that, when you create a variable you reserve some space in memory.
• You may like to store information of various data types like character, wide character, integer, floating point,
double floating point, Boolean etc. Based on the data type of a variable, the operating system allocates memory and
decides what can be stored in the reserved memory.
• In contrast to other programming languages such as C and Java, variables in R are not declared with a data type.
• Variables are assigned R objects, and the data type of the R object becomes the data type of the variable.

Data Types in R are:

• Each R data type requires a different amount of memory and has specific operations that can be performed on it.
• numeric – (3, 6.7, 121)
• integer – (2L, 42L; the suffix ‘L’ declares the value as an integer)
• logical – (TRUE, FALSE)
• complex – (7 + 5i; where ‘i’ is the imaginary unit)
• character – (“a”, “B”, “c is third”, “69”)
• raw – (as.raw(55); as.raw() converts a value to a raw byte)

Data types and the values that each data type can take.

Basic Data Types | Values | Examples
Numeric | Set of all real numbers | numeric_value <- 3.14
Integer | Set of all integers, Z | integer_value <- 42L
Logical | TRUE and FALSE | logical_value <- TRUE
Complex | Set of complex numbers | complex_value <- 1 + 2i
Character | “a”, “b”, “c”, …, “@”, “#”, “$”, …, “1”, “2”, … etc. | character_value <- "Hello Geeks"
Raw | as.raw() | single_raw <- as.raw(255)
Data Types
Data type | Example | Description
Logical | TRUE, FALSE | A special data type for data with only two possible values, which can be construed as true/false.
Numeric | 12, 32, 112, 5432 | Decimal values are called numeric in R; numeric is the default computational data type.
Integer | 3L, 66L, 2346L | Here, L tells R to store the value as an integer.
Complex | z = 1+2i, t = 7+3i | A complex value in R has an imaginary part, written with the suffix i.
Character | 'a', "good", "TRUE", '35.4' | In R programming, a character value represents a string. Objects can be converted into character values with the as.character() function.
Raw | | A raw data type is used to hold raw bytes.
Sample program
# numeric
x <- 10.5
class(x)

# integer
x <- 1000L
class(x)

# complex
x <- 9i + 3
class(x)

# character/string
x <- "R is exciting"
class(x)

# logical
x <- TRUE
class(x)

No ratings yet
EDS Unit 1?
15 pages
7 Step Ebook Guide
No ratings yet
7 Step Ebook Guide
69 pages
PDF Bell Ceramics Summer Project Report Gaurav 2011 - Compress
No ratings yet
PDF Bell Ceramics Summer Project Report Gaurav 2011 - Compress
75 pages
The Effect of Teachers Characteristics o
No ratings yet
The Effect of Teachers Characteristics o
58 pages
Mohammad Fazal-1
No ratings yet
Mohammad Fazal-1
81 pages
Impulsive Buying Behaviour - Big Bazaar
100% (1)
Impulsive Buying Behaviour - Big Bazaar
62 pages
HIES Final Report 2022 - English - 24dec23
No ratings yet
HIES Final Report 2022 - English - 24dec23
236 pages
DataScientist v2
No ratings yet
DataScientist v2
14 pages
Uavs Designing and Fabrication
No ratings yet
Uavs Designing and Fabrication
19 pages
Introduction To Data Science 5-13
No ratings yet
Introduction To Data Science 5-13
19 pages
Saintgits Institute of Management: Course Plan
No ratings yet
Saintgits Institute of Management: Course Plan
13 pages
Business Statistics Notes - Numericals
No ratings yet
Business Statistics Notes - Numericals
21 pages
Text2fa - Ir Exploring The Impact of Artificial Intelligence in Pers
No ratings yet
Text2fa - Ir Exploring The Impact of Artificial Intelligence in Pers
13 pages
Nursing and Research Statistics (Proposal)
No ratings yet
Nursing and Research Statistics (Proposal)
34 pages
Eng10 1st QT Week 2.1 Noting Important Information (Without Drill)
No ratings yet
Eng10 1st QT Week 2.1 Noting Important Information (Without Drill)
20 pages
Ch3 Numerical Descriptive Measures
No ratings yet
Ch3 Numerical Descriptive Measures
18 pages
308 - Unit 4 Research Methodology
No ratings yet
308 - Unit 4 Research Methodology
24 pages
AGREE Analytical GREEnness Metric Approach and Software
No ratings yet
AGREE Analytical GREEnness Metric Approach and Software
7 pages
Sunny Bradus Ambel - Audience Perception of Layi Wasabi Skit Contents On Social Media and Cultural Promotion in Nigeria
No ratings yet
Sunny Bradus Ambel - Audience Perception of Layi Wasabi Skit Contents On Social Media and Cultural Promotion in Nigeria
32 pages
45 160 1 PB
No ratings yet
45 160 1 PB
10 pages
Ajol File Journals - 333 - Articles - 243360 - Submission - Proof - 243360 3973 584939 1 10 20230312
No ratings yet
Ajol File Journals - 333 - Articles - 243360 - Submission - Proof - 243360 3973 584939 1 10 20230312
16 pages
Elliott, Raghunathan, & Schenker For Wiley StatsRef PDF
No ratings yet
Elliott, Raghunathan, & Schenker For Wiley StatsRef PDF
10 pages
Stat Cluster Sampling
No ratings yet
Stat Cluster Sampling
22 pages
Comparative Study of Metrobank and BDO
No ratings yet
Comparative Study of Metrobank and BDO
22 pages
7.SP.A Estimating The Mean State Area
No ratings yet
7.SP.A Estimating The Mean State Area
5 pages
MMPC015 Exam Notes Sushant
No ratings yet
MMPC015 Exam Notes Sushant
5 pages
GRMD2102 Homework 1 - With - Answer
No ratings yet
GRMD2102 Homework 1 - With - Answer
3 pages
Temali Template
No ratings yet
Temali Template
5 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

DS 1

Uploaded by

DS 1

Uploaded by

Big Data and Data Science Hype:

• All-out surveillance and stalking.

But it’s all datafication

Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-

Common Inferential Methods

Treatment   Flu count   Group size   Percent infections
Placebo     35          325          10.8%
Vaccine     28          813          3.4%
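The percentages in the table follow directly from the counts: percent infections = flu count / group size × 100. A minimal R sketch of that arithmetic, using only the numbers shown above (the variable names are illustrative, not from the slides):

```r
# Counts taken from the table above
flu_count  <- c(Placebo = 35, Vaccine = 28)
group_size <- c(Placebo = 325, Vaccine = 813)

# Percent infections = flu count / group size * 100, rounded to one decimal
percent_infections <- round(100 * flu_count / group_size, 1)
print(percent_infections)
```

Rounding to one decimal place reproduces the 10.8% and 3.4% reported for the placebo and vaccine groups.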

There are different types of population. They are:

Differences between population and sample

Comparison        Population                             Sample
Characteristics   Parameter                              Statistic
Data Collection   Complete enumeration or census         Sampling or sample survey
Focus on          Identification of the characteristics  Making inferences about the

Basically, there are two types of sampling. They are:

• Some of the techniques used for probability sampling are:

Population and Sample Examples

For some experiments, the probability of a simple outcome can be 0 ≤ p(y) ≤ 1

As we can see from the above

The following are the important features of R

In Linux (through Terminal):

Step – 2: Click on the Download R for Windows link. https://cran.r-project.org/bin/windows/base/

(e.g., 3.6.1) and save the executable .exe file.

5.a. Select the desired language and then click Next.

5.b. Read the license agreement and click Next.

5.f. Wait for the installation process to complete.

5.g. Click on Finish to complete the installation.

Step – 3: Run the .exe and follow the installation instructions.

3.a. Click Next on the welcome window.

3.d. Wait for the installation process to complete.

3.e. Click Finish to end the installation

2. To output numbers, just type the number (without quotes):

3. To do simple calculations, add numbers together:
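As a small illustration of the two points above, the following R sketch prints a number and then the result of an addition. The specific values and the name result are chosen here for illustration only:

```r
# Typing a number on its own line prints it
5

# Adding two numbers prints their sum
5 + 5

# The result can also be stored in a variable and printed later
result <- 5 + 5
print(result)
```

At the R console, each bare expression is evaluated and its value is printed automatically, which is why no print() call is needed for the first two lines.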

• This example uses a comment before a line of code:

• This example uses a comment at the end of a line of code:

• "Hello World!" # This is a comment

• You can also use , to add a variable to another variable:

• For numbers, the + character works as a mathematical operator:
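A short sketch of both cases, assuming the comma refers to separating arguments inside paste(), which is the usual way to join text in R. The variable names and values are illustrative:

```r
# Joining two text variables: the comma separates arguments to paste()
text1 <- "R is"
text2 <- "awesome"
combined <- paste(text1, text2)  # paste() inserts a space between arguments by default
print(combined)

# For numbers, + performs arithmetic addition
num1 <- 5
num2 <- 10
total <- num1 + num2
print(total)
```

Note that + does not join character strings in R; applying it to text raises an error, so paste() is used for text and + for numbers.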

EX: (age, Age and AGE are three different variables)
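Case sensitivity can be checked with a quick sketch in which the three names from the example hold different values (the values themselves are made up for illustration):

```r
# Three distinct variables that differ only in letter case
age <- 20
Age <- 30
AGE <- 40

# Each name refers to its own value
print(c(age, Age, AGE))
```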

EX: (TRUE, FALSE, NULL, if...)

numeric_value <- 3.14

integer_value <- 42L

logical_value <- TRUE

complex_value <- 1 + 2i

character_value <- "Hello Geeks"

single_raw <- as.raw(255)

Raw: A raw data type is used to hold raw bytes.
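One way to confirm the type of each value is the base R function class(). The sketch below redefines the six example values in a list (the list and its element names are introduced here for illustration) and reports the class of each:

```r
# One example value for each basic R data type
values <- list(
  numeric   = 3.14,
  integer   = 42L,            # the L suffix makes the literal an integer
  logical   = TRUE,
  complex   = 1 + 2i,
  character = "Hello Geeks",
  raw       = as.raw(255)     # a single byte, printed as ff
)

# class() returns the type name of each value
types <- sapply(values, class)
print(types)
```

Without the L suffix, 42 would be stored as numeric rather than integer, which is why the suffix matters in the integer example.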
