0% found this document useful (0 votes)

20 views30 pages

Ap Internship Last

The document outlines the field of data science, its processes, and its relevance in optimizing business decisions through data analysis. It describes the YBI Foundation, an online education platform that offers various training programs in data science and machine learning, aiming to empower individuals with industry-relevant skills. The document also details the training modules, objectives, and the impact of the foundation on students' career prospects.

Uploaded by

05 Abhishek Pandey CSE

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

20 views30 pages

Ap Internship Last

Uploaded by

05 Abhishek Pandey CSE

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 30

1.0 .

INTRODUCTION

Data science is the field of data analytics and data visualization in which raw data or the
unstructured data is cleaned and made ready for the analysis purpose. Data scientists use
this data to get the required information for the future purpose.” Data science uses many
processes and methods on the big data, the data may be structured or unstructured’’. Data
frames available on the internet is the raw data we get. It may be either in unstructured or
semi structured format. This data is further filtered, cleaned and then number of required
task are performed for the analysis with the use of the high programming language. This
data is further analyzed and then presented for our better understanding and evaluation.
So, I have 5 modules that cover:

Module 1: Introduction

Module 2: Python , Python Libraries for Data Science & Machine Learning.

Module 3: Data Science.

Module 4: Machine Learning.

Module 5: Fundamental practice projects .

1.1. Objective

 To explore, sort and analyse mega data from various sources to take advantage of them
and reach conclusions to optimize business processes and for decision support.
 Examples include machine maintenance, in fields of marketing and sales with sales
forecasting based on weather.
 Improve technical skill
 To get awareness about various job opportunities.
 Apply employability skills including fundamental skills and teamwork skills.
 Understand basic concept of Machine learning, Data science .

1
1.2. Scope

 With data being termed as the future oil for organizations, analytics have become an
engine that drives it to arrive at meaningful insights. The powerful combination of
both is what is driving the future scope of data science. All across the globe,
organizations are innovating multiple methods to harness data and use this powerful
tool to drive their businesses.
 The confluence of massive data influx and the need to harness this huge amount of
data has together built a big job market

2
2.0 Company Background & Structure

2.1. YBI Foundation

YBI Foundation Education Private Limited is an online education platform that enables
individuals to develop their professional potential in the most engaging learning environment.
Online education is a fundamental disruption to the traditional model and will be having a far-
reaching impact.

At YBI Foundation, we work towards transforming the online education wave into a tsunami!
At YBI Foundation, our objective is to help individual climb their Future career ladder and
transition smoothly into promising job profile . YBI Foundation is an online higher education
platform providing rigorous industryrelevant programs designed and delivered in collaboration
with world-class faculty and industry. Merging the latest technology, pedagogy, and services,
YBI Foundation is creating an immersive learning experience–anytime and anywhere.

Background:

 Established in October 2020 in Delhi, India, as a not-for-profit company.

 Focuses on education and technology, aiming to empower youth through training in

emerging technologies like data science, machine learning, and artificial intelligence.

 Offers free and paid educational programs, including online

courses, internships, bootcamps, and full-stack programs.

3
 Has partnered with various organizations and companies to provide industry-relevant
training and placement opportunities.

Vision
• Building Careers of Tomorrow

Mission
• To provide opportunities to advance your professional journey through rigorous online
programs that offer personalised support, developed in collaboration with best-in-class faculty
and industry professionals.

Founders
Alok Yadav Co-Founder
Arushi Co-Founder
Phalgun Kompalli (Co-Founder and operations)

Programs:

 Free Programs:

o One-month and two-month Data Science and Machine Learning internships

for beginners.

o Provide basic fundamentals and hands-on experience through projects.

4
YBI Foundation Free Programs

 Paid Programs:

o Data Science and Machine Learning program for comprehensive career

preparation Advance.
o Specialization in Web Scraping Certificate cum Internship.
o Full Stack Dual Certificate Program combining data science and web
development.

YBI Foundation Paid Programs

 Additional Offerings:

o Guaranteed Placement Assistance Program.

o Corporate training programs for organizations.

Impact:

5
 Trained over 10,000 students in data science and related fields.

 Helped students secure placements in top companies like Amazon, IBM, and Flipkart.

 Received positive feedback and testimonials from students and industry partners.

6
3.0 Weekly Jobs Summary

3.1 Weekly Records

Module 1: Introduction

Week1:

 Introduction to Internship.

 Instructions to Complete Internship.

 Introduction to mentor.

 Scope of Data Science .

 Upgrade Your Internship.

Module 2: Python , Python Libraries for Data Science & Machine Learning

week 1&week2:

 Introduction to Python.

 Introduction to Google Colab.

 Python libraries for Data science & Machine Learning.

 Read Data As DataFrame.

 Explore DataFrame.

7
Module 3 : Data science

Week 2:

 Introduction to Data science.

 Mathematics and Statistics.

 Data Exploration and Preprocessing.

 Data Visualization.

 Tools and Platforms.

Module 4: Machine Learning

Week 3&Week 4 :

 Introduction to Machine Learning.

 Types of machine learning (supervised, unsupervised, reinforcement

learning).

 Real-world Applications and Case Studies.

 Train Test Split.

 Linear Regression Model.

 Logistic Regression Models

8
Module 5 : Fundamental practice projects

Week 4:
Fundamental practice Projects

 Chance of Admission Prediction.

 Ice Cream Sales Revenue Prediction.

 Cancer Prediction.

 Fish Weight Prediction.

 Purchase Prediction And Micronumerosity.

 Credit Card Default Prediction.

9
3.2 About the Training

 The program content is highly interactive & developed by YBI Foundation. Employees
with decades of experienced in Data Science Field.
 This training in Data Science which taught me about basics of Python, Data Science,
Data Analytics, Data Visualization, Database, Python Libraries & Machine Learning.
 The training consist of 5 Modules.
 After each module there is a quiz.
 After completion of modules there is mini project & major project.
 To become eligible for getting the training certificate, it is mandatory for the trainee to
score at least 50% marks in the final quiz.

3.3 Training schedule & location

 Training schedule is 24/7 which means we watched recorded and live videos anytime

and from anywhere.

 It is an Remote internship we don’t need to travel anywhere we can do this on your

laptop, smartphone etc.

10
4.0 Technical contents

In this training program, I learned the following technologies :

4.1 Data Science

 Data Science as a multi-disciplinary subject that uses mathematics, statistics, and

computer science to study & evaluate data.
 The key objective of Data Science is to extract valuable information for use in strategic
decision making, product development, trend analysis, and forecasting. Data Science
concepts and process are mostly derived from data engineering, statistics,
programming, social engineering, data warehousing, machine learning, natural
language processing.
 The key techniques in use are data mining, big data analysis, data extraction, and data
retrieval. Data Science is the field of study that combines domain expertise,
programming skills, and knowledge of mathematics and statistics to extract meaningful
insights from data.
 Data Science practitioners apply Machine learning algorithms to numbers, text, images,
videos, audios, and more to produce Artificial Intelligence ( AI ) systems to perform
tasks that ordinarily require.

DATA SCIENCE PROCESS :

• The first step of this process is setting a research goal. The main purpose here is making
sure all the stakeholders understand the what, how, and why of the project.
• The second phase is data retrieval. You want to have data available for analysis, so this
steps includes finding suitable data and getting access to the data form the data owner. The
result is data in its raw from, which probably needs polishing and transformation before it
becomes usable.
• Now that you have the raw data, it’s time to prepare it. This includes transforming the data
from a raw form into data that’s directly usable in your models. To achieve this, you’ll
detect and correct different kinds of errors in the data, combine data from different sources,

11
and transform it. If you have successfully completed this step, you can progress to data
visualization and modelling.
• The fourth step is data exploration. The goal of this step is to gain a deep understanding of
the data. You’ll look for patterns, correlations, and deviations based on visual descriptive
techniques. The insights you gain from this phase will enable you to start modelling.
• Finally, we get to the sexiest part : model building ( often referred to as “ Data Modelling
” throughout this book ). It is now that you attempt to gain the insights insights or make the
predictions in your project charter. Now is the time to bring out the heavy guns, but
remember research has taught us that often ( but not always ) a combinations of simple
models tends to outperform one complicated model. If you’ve done this phase right, you’re
almost done.
• The last step of data science model is presenting your results and automating the analysis,
if needed. One goal of a project is tochange a process and/or make better decisions. You
may need to convince the business that your findings will indeed change the business
process as expected. This is where you can shine in your influencer role. The importance
of this step is more apparent in projects on a strategic and tactical level.

The core skills of a data scientist.

 Programming: Python is the most popular language for data science, but R, Java, and
Scala are also widely used.

 Statistics: Understanding statistical concepts like hypothesis testing, regression

analysis, and time series analysis is essential for data analysis.

 Mathematics: Linear algebra, calculus, and probability theory are all important for
building and understanding machine learning models.

 Domain knowledge: Data scientists need to have a deep understanding of the specific
problem or industry they're working in.

 Communication: Data scientists need to be able to effectively communicate their

findings to both technical and non-technical audiences.

Some of the applications of data science

12
Data science is used in a wide variety of industries, including:

 Finance: Predicting fraud, managing risk, and developing targeted marketing

campaigns.

 Healthcare: Diagnosing diseases, developing new drugs, and personalized medicine.

 Retail: Predicting customer behavior, optimizing inventory levels, and personalized

recommendations.

 Manufacturing: Predictive maintenance, quality control, and process optimization.

 Technology: Recommending products, improving search engines, and developing

new features.

13
4.2 Machine Learning:

Machine Learning is the science of getting computers to learn without being explicitly
programmed. It is closely related to computational statistics, which focuses on making
prediction using computer. In its application across business problems, machine learning is also
referred as predictive analysis. Machine Learning is closely related to computational statistics.
Machine Learning focuses on the development of computer programs that can access data and
use it to learn themselves. The process of learning begins with observations or data, such as
examples, direct experience, or instruction, in order to look for patterns in data and make better
decisions in the future based on the examples that we provide. The primary aim is to allow the
computers learn automatically without human intervention or assistance and adjust actions
accordingly.

History of Machine Learning

The name machine learning was coined in 1959 by Arthur Samuel. Tom M. Mitchell provided
a widely quoted, more formal definition of the algorithms studied in the machine learning field:
"A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P if its performance at tasks in T, as measured by P, improves with
experience E." This follows Alan Turing's proposal in his paper "Computing Machinery and
Intelligence", in which the question "Can machines think?" is replaced with the question "Can
machines do what we (as thinking entities) can do?". In Turing’s proposal the characteristics
that could be possessed by a thinking machine and the various implications in constructing one
are exposed.

4.2.1. Types of Machine Learning

The types of machine learning algorithms differ in their approach, the type of data they input
and output, and the type of task or problem that they are intended to solve. Broadly Machine
Learning can be categorized into four categories.

i. Supervised Learning

ii. Unsupervised Learning

14
iii. Reinforcement Learning

iv. Semi-supervised Learning

Machine learning enables analysis of massive quantities of data. While it generally delivers
faster, more accurate results in order to identify profitable opportunities or dangerous risks, it
may also require additional time and resources to train it properly.

Supervised Learning

Supervised Learning is a type of learning in which we are given a data set and we already
know what are correct output should look like, having the idea that there is a relationship
between the input and output. Basically, it is learning task of learning a function that maps an
input to an output based on example input output pairs. It infers a function from labeled training
data consisting of a set of training examples. Supervised learning problems are categorized.

Unsupervised Learning

Unsupervised Learning is a type of learning that allows us to approach problems with little or
no idea what our problem should look like. We can derive the structure by clustering the data
based on a relationship among the variables in data. With unsupervised learning there is no
feedback based on prediction result. Basically, it is a type of self-organized learning that helps
in finding previously unknown patterns in data set without pre existing label.

Reinforcement Learning

Reinforcement learning is a learning method that interacts with its environment by producing
actions and discovers errors or rewards. Trial and error search and delayed reward are the most
relevant characteristics of reinforcement learning. This method allows machines and software
agents to automatically determine the ideal behavior within a specific context in order to

15
maximize its performance. Simple reward feedback is required for the agent to learn which
action is best.

Semi-Supervised Learning

Semi-supervised learning fall somewhere in between supervised and unsupervised learning,

since they use both labeled and unlabeled data for training – typically a small amount of labeled
data and a large amount of unlabeled data. The systems that use this method are able to
considerably improve learning accuracy. Usually, semi-supervised learning is chosen when the
acquired labeled data requires skilled and relevant resources in order to train it / learn from it.
Otherwise, acquiring unlabeled data generally doesn’t require additional resources.

The Challenges Facing Machine Learning

While there has been much progress in machine learning, there are also challenges. For
example, the mainstream machine learning technologies are black-box approaches, making us
concerned about their potential risks. To tackle this challenge, we may want to make machine
learning more explainable and controllable. As another example, the computational complexity
of machine learning algorithms is usually very high and we may want to invent lightweight
algorithms or implementations. Furthermore, in many domains such as physics, chemistry,
biology, and social sciences, people usually seek elegantly simple equations (e.g., the
Schrödinger equation) to uncover the underlying laws behind various phenomena. Machine
learning takes much more time. You have to gather and prepare data, then train the algorithm.
There are much more uncertainties. That is why, while in traditional website or application
development an experienced team can estimate the time quite precisely, a machine learning
project used for example to provide product recommendations can take much less or much
more time than expected. Why? Because even the best machine learning engineers don’t know
how the deep learning networks will behave when analyzing different sets of data. It also means
that the machine learning engineers and data scientists cannot guarantee that the training
process of a model can be replicated.

Applications of Machine Learning

16
Machine learning is one of the most exciting technologies that one would have ever come
across. As it is evident from the name, it gives the computer that which makes it more similar
to humans: The ability to learn. Machine learning is actively being used today, perhaps in many
more places than one would expect. We probably use a learning algorithm dozen of time
without even knowing it. Applications of Machine Learning include:

 Web Search Engine: One of the reasons why search engines like google, bing etc work
so well is because the system has learnt how to rank pages through a complex learning
algorithm.

 Photo tagging Applications: Be it facebook or any other photo tagging application, the
ability to tag friends makes it even more happening. It is all possible because of a face
recognition algorithm that runs behind the application.

 Spam Detector: Our mail agent like Gmail or Hotmail does a lot of hard work for us in
classifying the mails and moving the spam mails to spam folder. This is again achieved
by a spam classifier running in the back end of mail application.

Future Scope

Future of Machine Learning is as vast as the limits of human mind. We can always keep
learning, and teaching the computers how to learn. And at the same time, wondering how some
of the most complex machine learning algorithms have been running in the back of our own
mind so effortlessly all the time. There is a bright future for machine learning. Companies like
Google, Quora, and Facebook hire people with machine learning. There is intense research in
machine learning at the top universities in the world. The global machine learning as a service
market is rising expeditiously mainly due to the Internet revolution. The process of connecting
the world virtually has generated vast amount of data which is boosting the adoption of machine
learning solutions. Considering all these applications and dramatic improvements that ML has
brought us, it doesn't take a genius to realize that in coming future we will definitely see more
advanced applications of ML, applications that will stretch the capabilities of machine learning
to an unimaginable level.

17
4.3. Python – The New Generation Language

Python is a widely used general-purpose, high level programming language. It was initially
designed by Guido van Rossum in 1991 and developed by Python Software Foundation. It was
mainly developed for an emphasis on code readability, and its syntax allows programmers to
express concepts in fewer lines of code. Python is dynamically typed and garbage-collected. It
supports multiple programming paradigms, including procedural, object-oriented, and
functional programming. Python is often described as a "batteries included" language due to
its comprehensive standard library.

Features

 Interpreted

In Python there is no separate compilation and execution steps like C/C++. It directly run the
program from the source code. Internally, Python converts the source code into an intermediate
form called bytecodes which is then translated into native language of specific computer to run
it.

 Platform Independent

Python programs can be developed and executed on the multiple operating system platform.
Python can be used on Linux, Windows, Macintosh, Solaris and many more.

 Multi- Paradigm

Python is a multi-paradigm programming language. Object-oriented programming and

structured programming are fully supported, and many of its features support functional
programming and aspect oriented programming .

 Simple

Python is a very simple language. It is a very easy to learn as it is closer to English language.
In python more emphasis is on the solution to the problem rather than the syntax.

 Rich Library Support Python standard library is very vast. It can help to do various things
involving regular expressions, documentation generation, unit testing, threading, databases,
web browsers, CGI, email, XML, HTML, WAV files, cryptography, GUI and many more.

 Free and Open Source

18
Firstly, Python is freely available. Secondly, it is open-source. This means that its source code
is available to the public. We can download it, change it, use it, and distribute it. This is called
FLOSS (Free/Libre and Open Source Software). As the Python community, we’re all headed
toward one goal- an ever-bettering Python.

Why Python Is a Perfect Language for Machine Learning?

1. A great library ecosystem - A great choice of libraries is one of the main reasons Python
is the most popular programming language used for AI. A library is a module or a group
of modules published by different sources which include a pre-written piece of code
that allows users to reach some functionality or perform different actions. Python
libraries provide base level items so developers don’t have to code them from the very
beginning every time. ML requires continuous data processing, and Python’s libraries
let us access, handle and transform data. These are some of the most widespread
libraries you can use for ML and AI:

o Scikit-learn for handling basic ML algorithms like clustering, linear and logistic
regressions, regression, classification, and others.

o Pandas for high-level data structures and analysis. It allows merging and filtering of
data, as well as gathering it from other external sources like Excel, for instance.

o Keras for deep learning. It allows fast calculations and prototyping, as it uses the GPU
in addition to the CPU of the computer.

o TensorFlow for working with deep learning by setting up, training, and utilizing
artificial neural networks with massive datasets.

o Matplotlib for creating 2D plots, histograms, charts, and other forms of visualization.
o NLTK for working with computational linguistics, natural language recognition, and
processing. o Scikit-image for image processing.

o PyBrain for neural networks, unsupervised and reinforcement learning.

o Caffe for deep learning that allows switching between the CPU and the GPU and
processing 60+ mln images a day using a single NVIDIA K40 GPU.

o StatsModels for statistical algorithms and data exploration.

19
In the PyPI repository, we can discover and compare more python libraries.

2. A low entry barrier –

Working in the ML and AI industry means dealing with a bunch of data that we need to
process in the most convenient and effective way. The low entry barrier allows more
data scientists to quickly pick up Python and start using it for AI development without
wasting too much effort into learning the language. In addition to this, there’s a lot of
documentation available, and Python’s community is always there to help out and give
advice.

3. Flexibility-

Python for machine learning is a great choice, as this language is very flexible:

o It offers an option to choose either to use OOPs or scripting.

o There’s also no need to recompile the source code, developers can

implement any changes and quickly see the results.

o Programmers can combine Python and other languages to reach their

goals.

4. Good Visualization Options-

For AI developers, it’s important to highlight that in artificial intelligence, deep

learning, and machine learning, it’s vital to be able to represent data in a human-
readable format. Libraries like Matplotlib allow data scientists to build charts,
histograms, and plots for better data comprehension, effective presentation, and
visualization. Different application programming interfaces also simplify the
visualization process and make it easier to create clear reports.

5. Community Support-

It’s always very helpful when there’s strong community support built around the
programming language. Python is an open-source language which means that there’s a
bunch of resources open for programmers starting from beginners and ending with pros.
A lot of Python documentation is available online as well as in Python communities
and forums, where programmers and machine learning developers discuss errors, solve

20
problems, and help each other out. Python programming language is absolutely free as
is the variety of useful libraries and tools.

6. Growing Popularity-

As a result of the advantages discussed above, Python is becoming more and more
popular among data scientists. According to StackOverflow, the popularity of Python
is predicted to grow until 2020, at least. This means it’s easier to search for developers
and replace team players if required. Also, the cost of their work maybe not as high as
when using a less popular programming language.

21
4.4.Data Preprocessing, Analysis & Visualization

Machine Learning algorithms don’t work so well with processing raw data. Before we can feed
such data to an ML algorithm, we must preprocess it. We must apply some transformations on
it. With data preprocessing, we convert raw data into a clean data set. To perform data this,
there are 7 techniques –

1. Rescaling Data –

For data with attributes of varying scales, we can rescale attributes to possess the same
scale. We rescale attributes into the range 0 to 1 and call it normalization. We use the
MinMaxScaler class from scikit- learn. This gives us values between 0 and 1.

2. Standardizing Data –

With standardizing, we can take attributes with a Gaussian distribution and different
means and standard deviations and transform them into a standard Gaussian distribution
with a mean of 0 and a standard deviation of 1.

3. Normalizing Data –

In this task, we rescale each observation to a length of 1 (a unit norm). For this, we use
the Normalizer class.

4. Binarizing Data –

Using a binary threshold, it is possible to transform our data by marking the values
above it 1 and those equal to or below it, 0. For this purpose, we use the Binarizer class.

4. Mean Removal-

We can remove the mean from each feature to center it on zero.

6. One Hot Encoding –

When dealing with few and scattered numerical values, we may not need to store these.
Then, we can perform One Hot Encoding. For k distinct values, we can transform the
feature into a k-dimensional vector with one value of 1 and 0 as the rest values.

7. Label Encoding –

22
Some labels can be words or numbers. Usually, training data is labelled with words to
make it readable. Label encoding converts word labels into numbers to let algorithms
work on them.

Machine Learning Algorithms

There are many types of Machine Learning Algorithms specific to different use cases.
As we work with datasets, a machine learning algorithm works in two stages. We
usually split the data around 20%-80% between testing and training stages. Under
supervised learning, we split a dataset into a training data and test data in Python ML.
Followings are the Algorithms of Python Machine Learning –

1. Linear Regression-

Linear regression is one of the supervised Machine learning algorithms in Python

that observes continuous features and predicts an outcome. Depending on whether
it runs on a single variable or on many features, we can call it simple linear
regression or multiple linear regression. This is one of the most popular Python ML
algorithms and often under-appreciated. It assigns optimal weights to variables to
create a line ax+b to predict the output. We often use linear regression to estimate
real values like a number of calls and costs of houses based on continuous variables.
The regression line is the best line that fits Y=a*X+b to denote a relationship
between independent and dependent variables.

This Photo by Unknown Author is licensed under CC BY-SA

2.0 Logistic Regression –

23
Logistic regression is a supervised classification is unique Machine Learning
algorithms in Python that finds its use in estimating discrete values like 0/1, yes/no,
and true/false. This is based on a given set of independent variables. We use a
logistic function to predict the probability of an event and this gives us an output
between 0 and 1. Although it says ‘regression’, this is actually a classification
algorithm. Logistic regression fits data into a logit function and is also called logit
regression.

This Photo by Unknown Author is licensed under CC BY

3. Decision Tree - A decision tree falls under supervised Machine Learning Algorithms
in Python and comes of use for both classification and regression- although mostly for
classification. This model takes an instance, traverses the tree, and compares important
features with a determined conditional statement. Whether it descends to the left child
branch or the right depends on the result. Usually, more important features are closer to
the root. Decision Tree, a Machine Learning algorithm in Python can work on both
categorical and continuous dependent variables. Here, we split a population into two or
more homogeneous sets. Tree models where the target variable can take a discrete set
of values are called classification trees; in these tree structures, leaves represent class
labels and branches represent conjunctions of features that lead to those class labels.
Decision trees where the target variable can take continuous values (typically real
numbers) are called regression trees.

4. Support Vector Machine (SVM)-

SVM is a supervised classification is one of the most important Machines Learning

algorithms in Python, that plots a line that divides different categories of your data. In
this ML algorithm, we calculate the vector to optimize the line. This is to ensure that

24
the closest point in each group lies farthest from each other. While you will almost
always find this to be a linear vector, it can be other than that. An SVM model is a
representation of the examples as points in space, mapped so that the examples of the
separate categories are divided by a clear gap that is as wide as possible. In addition to
performing linear classification, SVMs can efficiently perform a non-linear
classification using what is called the kernel trick, implicitly mapping their inputs into
high-dimensional feature spaces. When data are unlabeled, supervised learning is not
possible, and an unsupervised learning approach is required, which attempts to find
natural clustering of the data to groups, and then map new data to these formed groups.

5.Random Forest –

A random forest is an ensemble of decision trees. In order to classify every new object
based on its attributes, trees vote for class- each tree provides a classification. The
classification with the most votes wins in the forest. Random forests or random decision
forests are an ensemble learning method for classification, regression and other tasks
that operates by constructing a multitude of decision trees at training time and
outputting the class that is the mode of the classes (classification) or mean prediction
(regression) of the individual trees.

25
6. kNN Algorithm –

This is a Python Machine Learning algorithm for classification and regression- mostly
for classification. This is a supervised learning algorithm that considers different
centroids and uses a usually Euclidean function to compare distance. Then, it analyzes
the results and classifies each point to the group to optimize it to place with all closest
points to it. It classifies new cases using a majority vote of k of its neighbors. The case
it assigns to a class is the one most common among its K nearest neighbors. For this, it
uses a distance function. k-NN is a type of instance-based learning, or lazy learning,
where the function is only approximated locally and all computation is deferred until
classification. k-NN is a special case of a variable- bandwidth, kernel density "balloon"
estimator with a uniform kernel.

26
7.0.K-Means Algorithm –

k-Means is an unsupervised algorithm that solves the problem of clustering. It classifies

data using a number of clusters. The data points inside a class are homogeneous and
heterogeneous to peer groups. k-means clustering is a method of vector quantization,
originally from signal processing, that is popular for cluster analysis in data mining. k-
means clustering aims to partition n observations into k clusters in which each
observation belongs to the cluster with the nearest mean, serving as a prototype of the
cluster. k-means clustering is rather easy to apply to even large data sets, particularly
when using heuristics such as Lloyd's algorithm. It often is used as a preprocessing step
for other algorithms, for example to find a starting configuration. The problem is
computationally difficult (NP-hard). k-means originates from signal processing, and
still finds use in this domain. In cluster analysis, the k-means algorithm can be used to
partition the input data set into k partitions (clusters). k-means clustering has been used
as a feature learning (or dictionary learning) step, in either (semi-)supervised learning
or unsupervised learning.

27
5.0 LEARNING OUTCOMES AND WORK EXPERIENCE

This industrial training was conducted by YBI Foundation, and the content of this
industrial training is very helpful for me. As this industrial training was Online but the
task and assignments provided in the training were so engaging that I did not feel
disconnected.

5.1 Application of Theory and skills

The technical knowledge that i have gained in this training can be used in the various
day-to-day tasks and also be helpful for the conduction of crucial tasks which deals
with the users

For example, with this knowledge, I can make House Price Prediction applications
which predict the prices of houses.

I also have learnt various skills such as working better in a team, better and more
efficient communication skills, better analysis and critical thinking skills,
Understanding Human emotions and making designs accordingly. These skills not only
will help in the design field but also will be helpful in another part of my career.

28
6.0 CONCLUSION

Data science is well into its formative stages of development. It is envolving into a self

supporting and producing professionals with distinct and complementary skills in statistical

sciences. It is most popular technology used by professional world wide. I realise that I could

do more things than I thought like like learning new things myself. There are huge opportunity

available for the students who want to work in this field. Many private and public organization

heir data science engineer for their online work and analyzing the dataset with the rapid advent

of online industry, the demand of data science engineer is increasing days.

29
REFERENCE

1. Ybi Foundation - FREE Courses, Internship, Bootcamp, Full Stack, Guaranteed

2. https://www.greeksforgreeks.com/

3. https://www.programiz.com/

4. https://www.javapoint.com/

Internshala Summer Training Report On Data Science
77% (22)
Internshala Summer Training Report On Data Science
70 pages
TE Sem1 UNIT 1 (Data Science and Visualization) HONOURS - TE (SEM V)
No ratings yet
TE Sem1 UNIT 1 (Data Science and Visualization) HONOURS - TE (SEM V)
28 pages
Bca Ctis Sem-5 Introduction To Data Science
No ratings yet
Bca Ctis Sem-5 Introduction To Data Science
14 pages
Introduction To Data Science - Ii-I Course File 2025-26
No ratings yet
Introduction To Data Science - Ii-I Course File 2025-26
152 pages
21-07-24 - SR - Iit - Star Co-Sc (Model-A) - Jee Adv - 2021 (P-I) - Wat-55 - QP
No ratings yet
21-07-24 - SR - Iit - Star Co-Sc (Model-A) - Jee Adv - 2021 (P-I) - Wat-55 - QP
20 pages
Active Machine Learning with Python: Refine and elevate data quality over quantity with active learning
From Everand
Active Machine Learning with Python: Refine and elevate data quality over quantity with active learning
Margaux Masson-Forsythe
No ratings yet
My Internship Document
No ratings yet
My Internship Document
41 pages
5th Sem Internship Eport
No ratings yet
5th Sem Internship Eport
83 pages
Hedy HD700 Aug V4.1
No ratings yet
Hedy HD700 Aug V4.1
160 pages
Kadir
No ratings yet
Kadir
84 pages
CCA - Module 2 - CC Architecture
No ratings yet
CCA - Module 2 - CC Architecture
25 pages
Chapter 1
No ratings yet
Chapter 1
85 pages
Global Certificate in Data Science & AI 2024
No ratings yet
Global Certificate in Data Science & AI 2024
29 pages
Ids Unit 1,2,3,4 & 5
No ratings yet
Ids Unit 1,2,3,4 & 5
117 pages
Real Report
No ratings yet
Real Report
62 pages
Unit-1 IDS
No ratings yet
Unit-1 IDS
26 pages
Data Science CLP LED
No ratings yet
Data Science CLP LED
20 pages
Data Science Brochure
No ratings yet
Data Science Brochure
16 pages
Internshala Summer Training Report On Data Science
No ratings yet
Internshala Summer Training Report On Data Science
70 pages
Patriachal History Notes Students-1
No ratings yet
Patriachal History Notes Students-1
71 pages
Data-Science-Report - Priyesh
No ratings yet
Data-Science-Report - Priyesh
32 pages
TKR College of Engineering and Technology: (Autonomous & Accredited With 'A' Grade by NAAC)
No ratings yet
TKR College of Engineering and Technology: (Autonomous & Accredited With 'A' Grade by NAAC)
2 pages
PDF
No ratings yet
PDF
25 pages
It Report
No ratings yet
It Report
24 pages
DATA SCIENCE Course
No ratings yet
DATA SCIENCE Course
11 pages
Project File For Internship Report
No ratings yet
Project File For Internship Report
17 pages
Fundamentals of Data Science
100% (3)
Fundamentals of Data Science
62 pages
An Industrial Training Report On Data Science
No ratings yet
An Industrial Training Report On Data Science
36 pages
Data Science and Big Data by IBM CE Allsoft Summer Training Final Report
100% (1)
Data Science and Big Data by IBM CE Allsoft Summer Training Final Report
41 pages
Data Science Report
No ratings yet
Data Science Report
32 pages
Sushil 7th (1 PDF
No ratings yet
Sushil 7th (1 PDF
29 pages
Data Science Complete Course
No ratings yet
Data Science Complete Course
5 pages
Fazli Bipin
No ratings yet
Fazli Bipin
24 pages
Data Science Report
No ratings yet
Data Science Report
32 pages
Data Science
No ratings yet
Data Science
13 pages
Manoj Intern Data Science
No ratings yet
Manoj Intern Data Science
37 pages
Internship Report: T.J.Instituteoftechnology
No ratings yet
Internship Report: T.J.Instituteoftechnology
29 pages
Unit I
No ratings yet
Unit I
52 pages
Data Science Report
No ratings yet
Data Science Report
32 pages
5-Week Data Science Bootcamp Detailed Syllabus
No ratings yet
5-Week Data Science Bootcamp Detailed Syllabus
4 pages
Typing Keyboard Lmg-Arun
No ratings yet
Typing Keyboard Lmg-Arun
2 pages
Harsh Synopsis
No ratings yet
Harsh Synopsis
21 pages
Difficulties in Listening of The First Year Students at Tay Do University in Vietnam
No ratings yet
Difficulties in Listening of The First Year Students at Tay Do University in Vietnam
9 pages
Data Science Report - Compress
No ratings yet
Data Science Report - Compress
31 pages
Yoga Vasistha Part I
100% (4)
Yoga Vasistha Part I
636 pages
Praise and Worship Songbook
100% (2)
Praise and Worship Songbook
188 pages
Training Report On Data Sciencep
No ratings yet
Training Report On Data Sciencep
80 pages
Data Science Brochure 1425
No ratings yet
Data Science Brochure 1425
8 pages
Professor Scarlet's Notebook
No ratings yet
Professor Scarlet's Notebook
163 pages
TRAINING Report
No ratings yet
TRAINING Report
32 pages
Touchpad Information Technology Class 10: Skill Education Based on Windows & OpenOffice Code (402)
From Everand
Touchpad Information Technology Class 10: Skill Education Based on Windows & OpenOffice Code (402)
Dr. Sanjay Jain
No ratings yet
Data Science
No ratings yet
Data Science
16 pages
AR 25-50 - Prep & Manage Correspondence PDF
No ratings yet
AR 25-50 - Prep & Manage Correspondence PDF
121 pages
DSC Brochure
No ratings yet
DSC Brochure
14 pages
Learning English
No ratings yet
Learning English
2 pages
DS - Unit I
No ratings yet
DS - Unit I
3 pages
PDF Data Science
No ratings yet
PDF Data Science
7 pages
Final Industrial Report
No ratings yet
Final Industrial Report
34 pages
Data Science Training Report.
100% (1)
Data Science Training Report.
73 pages
Muh. Fawaz Salammutaqi - Summary of Describing Jobs
No ratings yet
Muh. Fawaz Salammutaqi - Summary of Describing Jobs
8 pages
BROCHURE - Data Science Learning Path - Board - Infinity
No ratings yet
BROCHURE - Data Science Learning Path - Board - Infinity
30 pages
EDS Unit 1?
No ratings yet
EDS Unit 1?
15 pages
Metis Bootcamp Curriculum 2020
No ratings yet
Metis Bootcamp Curriculum 2020
19 pages
File
No ratings yet
File
27 pages
Data Science - Curriculum Brochure
No ratings yet
Data Science - Curriculum Brochure
31 pages
Certification Course On Data Science Latest
No ratings yet
Certification Course On Data Science Latest
8 pages
Isomorphisms and Allomorphisms in The Morphemic Structure of English and Ukrainian Words
No ratings yet
Isomorphisms and Allomorphisms in The Morphemic Structure of English and Ukrainian Words
13 pages
Data Science Report
No ratings yet
Data Science Report
32 pages
Lab#7
No ratings yet
Lab#7
3 pages
Data Science and Analytics Reviewer
No ratings yet
Data Science and Analytics Reviewer
5 pages
Data Scientist Master Program v4
100% (1)
Data Scientist Master Program v4
28 pages
Hallowed Be Your Name
No ratings yet
Hallowed Be Your Name
23 pages
Data Science Bootcamp: Curriculum
No ratings yet
Data Science Bootcamp: Curriculum
19 pages
Mit Data Science Program
No ratings yet
Mit Data Science Program
13 pages
Skim To Determine Key Ideas and Author's Purpose
No ratings yet
Skim To Determine Key Ideas and Author's Purpose
20 pages
Speaking-Sample 2
No ratings yet
Speaking-Sample 2
3 pages
New Success Photocopiable 4
No ratings yet
New Success Photocopiable 4
2 pages
Lesson Plan Will Be Able To
0% (1)
Lesson Plan Will Be Able To
5 pages
GreyAtom FSDSE Brochure PDF
No ratings yet
GreyAtom FSDSE Brochure PDF
25 pages
Day 1 Handout SLRC Principles and Strategies of Teaching
No ratings yet
Day 1 Handout SLRC Principles and Strategies of Teaching
16 pages
Appendix 1: Lesson Plan (Template) : Lesson Plan Subject: English Trainee: Bashayer Abdul-Aziz Topic or Theme: Phonics
No ratings yet
Appendix 1: Lesson Plan (Template) : Lesson Plan Subject: English Trainee: Bashayer Abdul-Aziz Topic or Theme: Phonics
5 pages
Trackpad Information Technology for Class 10: CODE 402 | Skill Education, Based on Windows & OpenOffice
From Everand
Trackpad Information Technology for Class 10: CODE 402 | Skill Education, Based on Windows & OpenOffice
Shalini Harisukh
No ratings yet
Degree of Comparison
No ratings yet
Degree of Comparison
2 pages
Beck, Plato S Parmenides
No ratings yet
Beck, Plato S Parmenides
6 pages
Thy Will Be Done+ PDF
No ratings yet
Thy Will Be Done+ PDF
1 page
Synthesis Essay Outline
100% (2)
Synthesis Essay Outline
2 pages
Bla Power Pvt. LTD: Woodward 505 Governor Valve / Actuator Calibration &test
No ratings yet
Bla Power Pvt. LTD: Woodward 505 Governor Valve / Actuator Calibration &test
23 pages
A Bridge: - Verb (Used With Object), A Bridged, A Bridg Ing
No ratings yet
A Bridge: - Verb (Used With Object), A Bridged, A Bridg Ing
7 pages
Intertextuality Quiz
No ratings yet
Intertextuality Quiz
1 page
Score Part Copying Guide
No ratings yet
Score Part Copying Guide
2 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.