Internship Report
AT INTERNSHALA TRAININGS
Attended by
DIVYA DRISHTI
21105111017
BONAFIDE CERTIFICATE
CERTIFICATION
TABLE OF CONTENTS
1 Introduction
2 Agenda
3 Topics
3.1 Python History
3.2 Data Types
3.3 Machine Learning Introduction
3.4 Machine Learning Algorithms
3.4.1 KNN Algorithm
3.4.2 Linear Regression
3.4.3 Logistic Regression
3.4.4 Decision Tree
3.4.5 Clustering Modules
4 PROJECT
4.1 Project Explanation
4.2 Solution Approach
1. Introduction
In this internship, I learned the fundamentals of machine learning along with core
ML algorithms. The 56-day internship covered 32 topics and included mini
projects, with a final assessment at the end, on the 56th day.
2. AGENDA
3. Topics
3.1 Python History
Python is a widely used general-purpose, high-level programming language.
It was created by Guido van Rossum, first released in 1991, and is now
maintained by the Python Software Foundation. It was designed with an emphasis
on code readability, and its syntax allows programmers to express concepts in fewer
lines of code. The two most widely used versions are Python 2.x and 3.x;
each has a sizeable user base, and there has long been debate over which to
prefer.
The language is used for many purposes, such as application development,
scripting, code generation, and software testing. Due to its elegance and
simplicity, top technology organisations like Dropbox, Google, Quora, Mozilla,
Hewlett-Packard, Qualcomm, IBM, and Cisco have adopted Python.
3.2 Data Types in Python
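Python's core built-in data types can be illustrated with a short snippet; the values below are purely illustrative:

```python
x = 42                      # int
pi = 3.14                   # float
name = "Python"             # str
flag = True                 # bool
nums = [1, 2, 3]            # list (mutable sequence)
point = (4, 5)              # tuple (immutable sequence)
unique = {1, 2, 2, 3}       # set (duplicates removed automatically)
info = {"lang": "Python", "year": 1991}  # dict (key-value mapping)

print(type(x).__name__)     # int
print(unique)               # {1, 2, 3}
print(info["year"])         # 1991
```

Because Python is dynamically typed, no declarations are needed: the interpreter infers each type from the literal assigned.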
3.3 MACHINE LEARNING INTRODUCTION
Recommendation systems, powered by machine learning, suggest what movies or television shows to
watch next based on user preferences. Self-driving cars that rely on machine
learning to navigate may soon be available to consumers.
Classification of ML:
3.4 MACHINE LEARNING ALGORITHMS
3.4.1 KNN Algorithm:
The k-nearest neighbour algorithm is a pattern recognition model that can
be used for classification as well as regression. Often abbreviated as kNN,
the k in k-nearest neighbour is a positive integer, which is typically small.
In either classification or regression, the input will consist of the k closest
training examples within a space.
We will focus on k-NN classification. In this method, the output is class
membership. This will assign a new object to the class most common among
its k nearest neighbours. In the case of k = 1, the object is assigned to the
class of the single nearest neighbour.
When a new object is added to the space, in this case a green heart, we want
the machine learning algorithm to classify the heart into a certain class.
When we choose k = 3, the algorithm will find the three nearest neighbours
of the green heart in order to classify it to either the diamond class or the
star class.
In our diagram, the three nearest neighbours of the green heart are one
diamond and two stars. Therefore, the algorithm will classify the heart with
the star class.
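The voting procedure described above can be sketched in plain Python; the 2D points and the diamond/star labels below are illustrative, echoing the diagram:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((x, y), label) pairs.
    """
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda p: math.dist(p[0], query))
    # Take the labels of the k closest points and vote.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy data: diamonds clustered near (1, 1), stars near (6, 6).
train = [((1, 1), "diamond"), ((1, 2), "diamond"), ((2, 1), "diamond"),
         ((6, 6), "star"), ((6, 7), "star"), ((7, 6), "star")]

print(knn_classify(train, (6, 5), k=3))  # star
```

With k = 1 the query simply takes the class of its single nearest neighbour, as the text notes.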
3.4.2 LINEAR REGRESSION
Linear Regression is a machine learning algorithm based on supervised
learning. It performs a regression task. Regression models a target
prediction value based on independent variables. It is mostly used for
finding out the relationship between variables and for forecasting. Regression
models differ in the kind of relationship they assume between the dependent
and independent variables, and in the number of independent variables they
use.
Linear regression performs the task to predict a dependent variable value (y)
based on a given independent variable (x). So, this regression technique
finds out a linear relationship between x (input) and y(output). In the figure
above, X (input) is the work experience and Y (output) is the salary of a
person. The regression line is the best fit line for our model.
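The best-fit line can be computed in closed form by least squares; the experience/salary numbers below are made up for illustration:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years of work experience (x) vs. salary in thousands (y).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

slope, intercept = fit_line(experience, salary)
print(slope, intercept)       # 5.0 25.0
print(slope * 6 + intercept)  # predicted salary for 6 years: 55.0
```

The fitted slope and intercept define the regression line, which can then be used to predict y for unseen values of x.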
3.4.3 LOGISTIC REGRESSION
Logistic regression is used in statistical software to understand the relationship between the
dependent variable and one or more independent variables by estimating
probabilities using a logistic regression equation. This type of analysis can
help you predict the likelihood of an event happening or a choice being
made.
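The probability estimation works by passing a linear score through the logistic (sigmoid) function; the weight, bias, and hours-studied scenario below are hypothetical:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real-valued score to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_proba(x, weight, bias):
    """Probability of the positive class under a logistic regression model."""
    return sigmoid(weight * x + bias)

# Hypothetical coefficients: hours studied (x) vs. probability of passing.
w, b = 1.5, -4.0
print(predict_proba(1, w, b))  # low probability for 1 hour
print(predict_proba(5, w, b))  # high probability for 5 hours
```

The output is a probability, so a threshold (commonly 0.5) converts it into a yes/no classification of the event.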
3.4.4 DECISION TREES
For general use, decision trees are employed to visually represent decisions
and show or inform decision making. When working with machine learning
and data mining, decision trees are used as a predictive model. These models
map observations about data to conclusions about the data’s target value.
The goal of decision tree learning is to create a model that will predict the
value of a target based on input variables.
In the predictive model, the data’s attributes that are determined through
observation are represented by the branches, while the conclusions about the
data’s target value are represented in the leaves.
When “learning” a tree, the source data is divided into subsets based on an
attribute value test, which is repeated on each of the derived subsets
recursively. Once all examples in the subset at a node share the same target
value, the recursion is complete.
A true classification tree data set would have a lot more features than what
is outlined above, but relationships should be straightforward to determine.
When working with decision tree learning, several determinations need to
be made, including what features to choose, what conditions to use for
splitting, and understanding when the decision tree has reached a clear
ending.
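The "attribute value test" used for splitting can be sketched for a single numeric feature. The text does not name a splitting condition, so Gini impurity, a common choice, is assumed here, and the amounts and labels are illustrative:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue  # skip splits that leave one side empty
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical feature (e.g. a transaction amount) with binary labels.
amounts = [10, 12, 15, 200, 220, 250]
labels = ["ok", "ok", "ok", "fraud", "fraud", "fraud"]
print(best_split(amounts, labels))  # (15, 0.0): a perfectly pure split at 15
```

Tree learning repeats this search recursively on each resulting subset, stopping when a node becomes pure or another stopping condition is met.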
4. PROJECT
Data Dictionary
The dataset can be downloaded using this link.
The data set includes credit card transactions made by European cardholders
over a period of two days in September 2013. Out of a total of 284,807
transactions, 492 were fraudulent. This data set is highly unbalanced, with
the positive class (frauds) accounting for 0.172% of the total transactions.
The data set has also been modified with Principal Component Analysis
(PCA) to maintain confidentiality. Apart from ‘time’ and ‘amount’, all the
other features (V1, V2, V3, up to V28) are the principal components
obtained using PCA. The feature 'time' contains the seconds elapsed
between the first transaction in the data set and the subsequent transactions.
The feature 'amount' is the transaction amount. The feature 'class' represents
class labelling, and it takes the value 1 in cases of fraud and 0 in others.
• Univariate analysis
• Bivariate analysis
4. Prepare the data for modelling
• Check the skewness of the data and mitigate it for fair analysis
• Handle the data imbalance, since only 0.172% of the records are fraud transactions
5. Split the data into train and test sets
• Scale the data (normalization)
6. Model building
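The split-and-scale steps above can be sketched in plain Python; the amounts are illustrative, and in practice scikit-learn's utilities would typically be used, with the class imbalance additionally handled by resampling:

```python
import random

def train_test_split(rows, test_ratio=0.3, seed=42):
    """Shuffle the rows and split them into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def min_max_scale(values):
    """Normalize values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical transaction amounts standing in for the real dataset.
amounts = [5.0, 20.0, 50.0, 120.0, 300.0, 999.0]
train, test = train_test_split(amounts, test_ratio=0.3)
print(len(train), len(test))  # 4 2
print(min_max_scale(amounts))  # smallest amount maps to 0.0, largest to 1.0
```

Splitting before scaling (and fitting the scaler on the training set only) avoids leaking information from the test set into the model.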