
VIRTUAL INTERNSHIP REPORT ON DATA SCIENCE

MACHINE LEARNING, AI
A report submitted to the Department of
Computer Science And Engineering in partial fulfillment of the
requirements for the award of the Degree
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
By

Reddy Dhanunjaya Baskara Rao


20A21A05F3

Under the guidance of


Internal Guide: Mr. P. Srinivas Rao, M.Tech (Ph.D), Associate Professor, CSE Dept
External Guide: Mr. K. Sai Harsha, Data Valley Pvt. Ltd

(Duration: 12th Feb, 2024 to 16th April, 2024)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SWARNANDHRA COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous)
(Approved by A.I.C.T.E & Affiliated to JNTU Kakinada)
(Accredited by NAAC with A Grade in 2nd Cycle)

Seetharamapuram, Narsapur-534 280

April 2024

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SWARNANDHRA COLLEGE OF ENGINEERING & TECHNOLOGY


(Autonomous)
Seetharamapuram, Narsapur-534 280

CERTIFICATE

This is to certify that the internship entitled “VIRTUAL INTERNSHIP REPORT ON
DATA SCIENCE, MACHINE LEARNING, AI” submitted by Reddy Dhanunjaya
Baskara Rao (20A21A05F3) in the Department of Computer Science And Engineering,
Swarnandhra College of Engineering & Technology, for the award of the Degree of Bachelor
of Technology in Computer Science And Engineering is a bona fide internship carried out
under our supervision.

Project Supervisor: Mr. P. Srinivas Rao, M.Tech (Ph.D), Associate Professor, CSE Dept
Head of the Department: Dr. P. Srinivasulu, Professor & HOD, CSE Dept

External Examiner

ORGANIZATION CERTIFICATE

DECLARATION

I certify that

a. The internship work contained in this report is original and has been done by me under the
guidance of my supervisor.
b. The work has not been submitted to any other university for the award of any degree or
diploma.
c. The guidelines of the college have been followed in writing the internship report.

Date:

Reddy Dhanunjaya Baskara Rao


(20A21A05F3)

ACKNOWLEDGEMENT

I wholeheartedly and sincerely thank my guide, Mr. P. Srinivas Rao, M.Tech (Ph.D), Associate
Professor, Dept of CSE, and Dr. P. Srinivasulu, Professor & Head of the Department of
Computer Science And Engineering, Swarnandhra College of Engineering & Technology,
Seetharamapuram, for their valuable suggestions and encouragement during the preparation and
progress of my internship.

I express my heartfelt thanks to Dr. S. Suresh Kumar, Principal, Swarnandhra College of
Engineering & Technology, for giving me the opportunity to complete my degree successfully.

I express my sincere thanks to the Management of Swarnandhra College of Engineering &
Technology for making the necessary arrangements for completing the internship.
I express my earnest thanks to all the teaching and non-teaching staff of the Department of
Information Technology for their valuable guidance and support in completing my
internship.
Finally, it is a pleasure to thank all my family members and friends for their constant
encouragement and enormous support. Without them, I could not have completed my work.

Reddy Dhanunjaya Baskara Rao


(20A21A05F3)

INDEX

S.No.  CONTENTS

1      About Internship and Learning Objectives
2      Organization Profile
3      Introduction
4      Software Requirement Specifications
5      About Technologies
5.1    Python for Data Science
5.2    SQL
5.3    Statistics for Data Science
5.4    Machine Learning
6      Screenshots
7      Quiz Questions
8      Internship Registration Proofs
9      Conclusion
10     List of References

WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES


1st week
DATE        DAY         NAME OF THE TOPIC
12/02/2024  MONDAY      INTRODUCTION TO PROGRAM
13/02/2024  TUESDAY     INTRODUCTION TO DATA SCIENCE
14/02/2024  WEDNESDAY   REGRESSION, REGRESSION TABLE
15/02/2024  THURSDAY    REGRESSION COEFFICIENTS, REGRESSION P-VALUE
16/02/2024  FRIDAY      REGRESSION R-SQUARED, REGRESSION CASE
17/02/2024  SATURDAY    QUIZ

2nd week
DATE        DAY         NAME OF THE TOPIC
19/02/2024  MONDAY      INTRODUCTION TO PYTHON
20/02/2024  TUESDAY     PYTHON BASICS
21/02/2024  WEDNESDAY   OPERATIONS AND DATA TYPES IN PYTHON
22/02/2024  THURSDAY    PYTHON LISTS, DICTIONARIES
23/02/2024  FRIDAY      PYTHON ADVANCED MODULE
24/02/2024  SATURDAY    QUIZ

3rd week
DATE        DAY         NAME OF THE TOPIC
26/02/2024  MONDAY      INTRODUCTION TO SQL
27/02/2024  TUESDAY     TABLE CREATION
28/02/2024  WEDNESDAY   ABSTRACTION, ABSTRACT CLASS AND INTERFACE
29/02/2024  THURSDAY    STATIC KEYWORD, SQL CONSTRAINTS
01/03/2024  FRIDAY      RETRIEVING DATA FROM THE TABLE
02/03/2024  SATURDAY    QUIZ

4th week
DATE        DAY         NAME OF THE TOPIC
04/03/2024  MONDAY      MATHEMATICS FOR DATA SCIENCE
05/03/2024  TUESDAY     BASICS OF STATISTICS
06/03/2024  WEDNESDAY   PERCENTILES, STANDARD DEVIATION, VARIANCE
07/03/2024  THURSDAY    CORRELATION, CORRELATION MATRIX
08/03/2024  FRIDAY      CORRELATION VS CAUSALITY
09/03/2024  SATURDAY    QUIZ

5th week
DATE        DAY         NAME OF THE TOPIC
11/03/2024  MONDAY      INTRODUCTION TO MACHINE LEARNING
12/03/2024  TUESDAY     PREDICTIVE MODELLING
13/03/2024  WEDNESDAY   INTRODUCTION TO PREDICTIVE MODELLING
14/03/2024  THURSDAY    TYPES OF PREDICTIVE MODELS
15/03/2024  FRIDAY      STAGES OF PREDICTIVE MODELS
16/03/2024  SATURDAY    QUIZ

6th week
DATE        DAY         NAME OF THE TOPIC
18/03/2024  MONDAY      HYPOTHESIS GENERATION
19/03/2024  TUESDAY     DATA EXTRACTION
20/03/2024  WEDNESDAY   READING THE DATA
21/03/2024  THURSDAY    VARIABLE IDENTIFICATION
22/03/2024  FRIDAY      ML ALGORITHMS
23/03/2024  SATURDAY    QUIZ

7th week
DATE        DAY         NAME OF THE TOPIC
25/03/2024  MONDAY      INTRODUCTION TO DEEP LEARNING
26/03/2024  TUESDAY     DIFFERENCE BETWEEN MACHINE LEARNING AND DEEP LEARNING
27/03/2024  WEDNESDAY   NEURONS
28/03/2024  THURSDAY    NEURAL NETWORK
29/03/2024  FRIDAY      NEURAL NETWORK MODEL
30/03/2024  SATURDAY    QUIZ

8th week
DATE        DAY         NAME OF THE TOPIC
01/04/2024  MONDAY      PROJECT INTRODUCTION
02/04/2024  TUESDAY     OVERVIEW OF PROJECT
03/04/2024  WEDNESDAY   DESIGN AND ANALYSIS
04/04/2024  THURSDAY    PREPROCESSING DATA
05/04/2024  FRIDAY      APPLYING ALGORITHMS
06/04/2024  SATURDAY    QUIZ

1. LEARNING OBJECTIVES/INTERNSHIP OBJECTIVES

Internships are generally thought of as reserved for college students looking to gain
experience in a particular field. However, a wide array of people can benefit from training
internships to gain real-world experience and develop their skills.

An objective for this position should emphasize the skills you already possess in the area and your
interest in learning more.

Internships are used in a number of different career fields, including architecture, engineering,
healthcare, economics, advertising, and many more.

Some internships allow individuals to perform scientific research, while others are
specifically designed to let people gain first-hand work experience.

Internships are a great way to build your resume and develop skills that you can highlight when
applying for future jobs. When you apply for a training internship, make sure to highlight any
special skills or talents that set you apart from the other applicants so that you have a better
chance of landing the position.

2. ORGANIZATION PROFILE

Organization Information:

Datavalley.ai is a leading provider of top-notch training and consulting services in the cutting-
edge fields of Big Data, Data Engineering, Data Architecture, DevOps, Data Science, Machine
Learning, IoT, and Cloud Technologies.

Training:

Data Valley training programs, led by industry experts, are tailored to equip professionals and
organizations with the essential skills and knowledge needed to thrive in the rapidly evolving data
landscape. The organization emphasizes continuous learning and growth, and its commitment to
staying on top of emerging trends and technologies is meant to ensure that its clients receive
up-to-date training.

3. INTRODUCTION

Data Science Overview

Data science is the study of data. Just as the biological sciences study living things and the
physical sciences study physical phenomena, data science studies data. Data is real, it has real
properties, and we need to study those properties if we are going to work with it. Data science
involves data and, as the name suggests, science. It is a process, not an event: the process of
using data to understand many different things and, ultimately, the world.
For example, suppose you have a model or proposed explanation of a problem, and you try to
validate that explanation or model with your data. Data science is the skill of uncovering the
insights and trends hidden behind data. It means translating data into a story and using that
storytelling to generate insight. With these insights, you can make strategic choices for a
company or an institution.

Predictive modeling:
Predictive modeling is a technique used in artificial intelligence and statistics that applies data
mining and probability to forecast or estimate more granular, specific outcomes.
For example, predictive modeling could help identify customers who are likely to purchase
our new One AI software over the next 90 days.

Machine Learning:
Machine learning is a branch of artificial intelligence (AI) in which computers learn from data
and adapt to new data without being explicitly programmed to do so. Once trained, the computer
can act without step-by-step human instruction.

Forecasting:

Forecasting is the process of predicting or estimating future events based on past and present data,
most commonly by analyzing trends. "Guessing" doesn't cut it. A forecast, unlike a mere prediction,
must have logic to it; it must be defensible. This logic is what differentiates it from a Magic 8
Ball's lucky guess. After all, even a broken watch is right twice a day.
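To make a trend-based forecast concrete, here is a minimal Python sketch, assuming the simplest possible trend: a straight line fitted by ordinary least squares to evenly spaced past observations and then extended forward. The function names `fit_linear_trend` and `forecast` and the sales figures are hypothetical; real forecasts usually also account for seasonality and uncertainty.

```python
# Minimal trend-based forecast: fit y = intercept + slope * t to past observations
# (t = 0, 1, 2, ...) by ordinary least squares, then extend the line into the future.
# The sales history below is hypothetical.


def fit_linear_trend(values):
    """Return (intercept, slope) of the least-squares line through (t, value)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps_ahead):
    """Extrapolate the fitted trend `steps_ahead` periods past the last observation."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps_ahead)]


monthly_sales = [100, 110, 120, 130, 140]  # hypothetical history
print(forecast(monthly_sales, 3))  # → [150.0, 160.0, 170.0]
```

Because every forecast value follows from an explicit fitted line, the reasoning behind it can be inspected and defended, which is the distinction the paragraph draws between a forecast and a guess.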

4. SOFTWARE REQUIREMENT SPECIFICATIONS

For data science, you typically need a combination of software tools to perform various tasks
such as data manipulation, analysis, visualization, and machine learning. Here's a list of essential
software requirements for data science:

1. Programming Languages:
• Python: The most widely used language in data science, thanks to its extensive libraries for
data manipulation (e.g., Pandas), visualization (e.g., Matplotlib, Seaborn), and machine
learning (e.g., Scikit-learn, TensorFlow, PyTorch).
• R: Another popular language for statistical analysis and visualization, particularly in
academia.
2. Integrated Development Environments (IDEs):
• Jupyter Notebook: A web-based interactive computing environment for creating and sharing
documents that contain live code, equations, visualizations, and narrative text.
• Spyder: A powerful IDE for Python that provides a MATLAB-like interface for data analysis.
• RStudio: An integrated development environment (IDE) for R that makes data analysis
easier with its intuitive interface.
3. Data Manipulation and Analysis:
• Pandas: A Python library for data manipulation and analysis.
• NumPy: Provides support for large, multi-dimensional arrays and matrices, along with a
collection of mathematical functions that operate on these arrays.
4. Data Visualization:
• Matplotlib: A plotting library for Python that provides a MATLAB-like interface for creating
static, interactive, and animated visualizations.
• Seaborn: A Python visualization library based on Matplotlib that provides a high-level
interface for drawing attractive statistical graphics.
• ggplot2: A plotting system for R, based on the grammar of graphics, that offers a highly
customizable approach to data visualization.

5. Machine Learning:
• Scikit-learn: A simple and efficient tool for data mining and data analysis, built on NumPy,
SciPy, and Matplotlib.
• TensorFlow / Keras: TensorFlow is an open-source machine learning library developed by
Google. Keras is a high-level neural networks API that can run on top of TensorFlow.

6. Deep Learning (Optional):
• TensorFlow / Keras: Widely used for deep learning tasks because of their flexibility and
performance.
• PyTorch: Another popular choice for deep learning, known for its dynamic computational graph
and ease of use.

7. Data Storage and Management:
• SQL: An understanding of SQL is essential for querying databases and extracting relevant data.
• SQLite: A C-language library that implements a small, fast, self-contained, high-reliability,
full-featured SQL database engine.
• NoSQL Databases (Optional): Depending on the project requirements, familiarity with NoSQL
databases such as MongoDB or Cassandra may be necessary.

8. Version Control:
• Git: Essential for tracking changes in code and collaborating with other team members.
Platforms such as GitHub, GitLab, or Bitbucket are commonly used to host Git repositories.

9. Text Editors:
• VS Code: A lightweight and powerful source code editor that supports Python and many other
languages.
• Atom, Sublime Text, etc.: Other popular text editors with extensive support for various
programming languages.

5. TECHNOLOGIES

5.1 Python for Data Science :


Python is a high-level, general-purpose, and very popular programming language. It is used in
web development, machine learning applications, and many other cutting-edge areas of the software
industry. Python is well suited to beginners as well as to experienced programmers coming from
other languages such as C++ and Java.

PANDAS :
Pandas is one of the most popular Python libraries for data manipulation and analysis, and it was
designed specifically for these tasks.
Pandas provides features like:
• Dataset joining and merging
• Column deletion and insertion in data structures
• Data filtering
• Reshaping datasets
• DataFrame objects to manipulate data, and much more!
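The Pandas features listed above can be illustrated with a short sketch. It assumes Pandas is installed; the `students` and `scores` tables are hypothetical data, and the calls used (`merge`, `drop`, boolean filtering, `pivot`) are standard `DataFrame` operations.

```python
import pandas as pd

# Hypothetical tables for illustration.
students = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Kiran"]})
scores = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "subject": ["SQL", "Python", "SQL", "Python"],
    "score": [78, 85, 92, 64],
})

# Joining/merging: combine the two tables on the shared "id" column.
merged = students.merge(scores, on="id", how="inner")

# Column insertion and deletion.
merged["passed"] = merged["score"] >= 70
merged = merged.drop(columns=["id"])

# Filtering: keep only the rows where the student passed.
passed = merged[merged["passed"]]

# Reshaping: one row per student, one column per subject (missing pairs become NaN).
wide = merged.pivot(index="name", columns="subject", values="score")
print(wide)
```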
NUMPY :
NumPy, like Pandas, is an extremely popular Python library. NumPy adds support for large
multi-dimensional arrays and matrices, along with high-level mathematical functions that operate
on these arrays and matrices. NumPy is an open-source library with many contributors.
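A brief sketch of NumPy's array support, assuming NumPy is installed and using a hypothetical 3 x 2 array: column means with `mean(axis=0)`, broadcasting when those means are subtracted from every row, and a matrix product with the `@` operator.

```python
import numpy as np

# A 2-D array (matrix) of hypothetical measurements: 3 rows x 2 columns.
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

col_means = data.mean(axis=0)  # mean of each column
centered = data - col_means    # broadcasting subtracts the column means from every row
gram = data.T @ data           # matrix product: (2 x 3) @ (3 x 2) -> 2 x 2
print(col_means, centered.shape, gram)
```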

MATPLOTLIB :
Matplotlib is one of the most widely used data visualization libraries in Python. It allows us to
generate and build plots of all kinds. Along with Seaborn, it is my go-to library for exploring data visually.
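A minimal Matplotlib sketch, assuming Matplotlib is installed. It selects the non-interactive `Agg` backend so it runs without a display, and it saves the plot to an in-memory PNG instead of opening a window; the week and score values are hypothetical.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4]            # hypothetical x values
quiz_scores = [60, 72, 80, 85]  # hypothetical y values

fig, ax = plt.subplots()
ax.plot(weeks, quiz_scores, marker="o", label="quiz score")
ax.set_xlabel("Week")
ax.set_ylabel("Score")
ax.set_title("Quiz scores by week (hypothetical data)")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # write the figure to an in-memory PNG file
plt.close(fig)
png_bytes = buf.getvalue()
```

In an interactive session (for example, Jupyter Notebook), `plt.show()` would display the figure instead of saving it.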

5.2 SQL Database :

SQL (Structured Query Language) databases are a cornerstone of modern data storage, retrieval, and
management systems. They are designed to handle structured data efficiently and have been the
standard solution for relational data management for decades. Their key aspects are summarized
below.

Overview of SQL Databases

SQL databases use a structured approach to organize data into tables, which consist of rows and
columns. This tabular structure allows for easy querying, indexing, and relationships among data.
The relational model, proposed by Edgar F. Codd in 1970, underpins the design of SQL databases,
emphasizing the use of keys and relationships to maintain data integrity.

Key Characteristics of SQL Databases

Relational Structure: Data is stored in tables, with each table representing a specific entity.
Relationships among tables are established using primary keys (unique identifiers for each row) and
foreign keys (references to primary keys in other tables).

SQL Language: SQL is the standard language for interacting with relational databases. It provides
commands for querying, updating, and managing data, such as SELECT, INSERT, UPDATE, and
DELETE.
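The relational structure and the basic SQL commands described above can be tried with Python's built-in `sqlite3` module, using SQLite as the database engine. This is a sketch: the `department` and `employee` tables and their rows are hypothetical, and other database systems differ in details (SQLite, for example, enforces foreign keys only after `PRAGMA foreign_keys = ON`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

# Each table represents one entity; the primary and foreign keys link them.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)
    )""")

# INSERT rows into both tables.
conn.executemany("INSERT INTO department VALUES (?, ?)", [(1, "CSE"), (2, "IT")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(10, "Asha", 1), (11, "Ravi", 2), (12, "Kiran", 1)])

# UPDATE one row and DELETE another.
conn.execute("UPDATE employee SET dept_id = 1 WHERE emp_id = 11")
conn.execute("DELETE FROM employee WHERE emp_id = 12")

# SELECT with a JOIN that follows the foreign-key relationship.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id""").fetchall()
print(rows)  # [('Asha', 'CSE'), ('Ravi', 'CSE')]
conn.commit()
```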

Data Integrity and Constraints: SQL databases enforce data integrity through constraints like
primary keys, foreign keys, unique constraints, and check constraints. These rules ensure that the
data remains consistent and reliable.
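A sketch of constraints rejecting bad data, again assuming SQLite through `sqlite3`; the table and column names and the `try_insert` helper are hypothetical. Each violating insert raises `sqlite3.IntegrityError`, and the exact error wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to check the foreign key
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE enrolment (
        roll_no TEXT UNIQUE,                             -- unique constraint
        course  TEXT REFERENCES course(code),            -- foreign key
        marks   INTEGER CHECK (marks BETWEEN 0 AND 100)  -- check constraint
    )""")
conn.execute("INSERT INTO course VALUES ('DS101')")
conn.execute("INSERT INTO enrolment VALUES ('R1', 'DS101', 88)")


def try_insert(row):
    """Attempt an insert; return the constraint error message, or None if it succeeded."""
    try:
        conn.execute("INSERT INTO enrolment VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


errors = [
    try_insert(("R1", "DS101", 70)),   # duplicate roll_no: UNIQUE constraint fails
    try_insert(("R2", "XX999", 70)),   # unknown course: FOREIGN KEY constraint fails
    try_insert(("R3", "DS101", 150)),  # marks out of range: CHECK constraint fails
]
for message in errors:
    print(message)
```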

ACID Properties: SQL databases adhere to ACID properties—Atomicity, Consistency, Isolation, and
Durability. These properties ensure that transactions are processed reliably, even in the event of system
failures or concurrent access.

Normalization: SQL databases use normalization to minimize data redundancy and avoid data
anomalies. This process involves decomposing complex tables into simpler ones to maintain
consistency and reduce duplication.
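A minimal illustration of normalization, using plain Python structures as stand-ins for tables (the rows and names are hypothetical). A flat table that repeats each department name is decomposed into a department table and an employee table linked by `dept_id`, and joining the two back together reproduces the original rows.

```python
# Hypothetical denormalized rows: the department name is repeated for every employee,
# so renaming a department would mean updating many rows (an update anomaly).
flat_rows = [
    ("Asha", "D1", "Computer Science"),
    ("Ravi", "D2", "Information Technology"),
    ("Kiran", "D1", "Computer Science"),
]

# Decompose into two relations:
#   department(dept_id -> dept_name) stores each department name once;
#   employee(name, dept_id) refers to it through the dept_id key.
department = {}
employee = []
for name, dept_id, dept_name in flat_rows:
    department[dept_id] = dept_name
    employee.append((name, dept_id))

# Joining the two tables on dept_id gives back the original rows (a lossless decomposition).
rejoined = [(name, dept_id, department[dept_id]) for name, dept_id in employee]
print(department)
print(rejoined == flat_rows)
```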

Common SQL Database Systems

Several SQL database systems are widely used across different industries. Some of the most popular
ones include:

MySQL: An open-source SQL database known for its speed, reliability, and ease of use. It's
commonly used for web applications and small to medium-sized businesses.

5.3 Statistics for Data Science :

In its simplest sense, statistics means numerical data; as a field of mathematics, it deals with the
collection, tabulation, and interpretation of numerical data. It is a form of mathematical analysis
that uses quantitative models to analyze experimental data or real-life studies. It is an area of
applied mathematics concerned with data collection, analysis, interpretation, and presentation, and
it addresses how data can be used to solve complex problems. Some people consider statistics to be
a distinct mathematical science rather than a branch of mathematics. Statistics simplifies work and
provides a clear picture of the data you deal with on a regular basis.
Basic terminology of Statistics:
• Population – the complete collection of individuals, objects, or events whose properties are
to be analyzed.
• Sample – a subset of a population.
Measures of central tendency:
(i) Mean :
It is the average of all values in a sample set.
(ii) Median :
It is the central value of a sample set. The data set is ordered from the lowest to the highest
value, and the exact middle value is taken; when the number of values is even, the median is the
average of the two middle values.
(iii) Mode :
It is the value that occurs most frequently in the sample set.
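The three measures of central tendency can be computed with Python's standard `statistics` module. The sample values are hypothetical; note that `statistics.median` averages the two middle values when the count is even.

```python
import statistics

sample = [4, 8, 6, 5, 3, 8, 9]  # hypothetical sample

mean = statistics.mean(sample)      # sum / count = 43 / 7
median = statistics.median(sample)  # sorted: 3 4 5 6 8 8 9, so the middle value is 6
mode = statistics.mode(sample)      # most frequent value: 8
even_median = statistics.median([1, 2, 3, 4])  # even count: average of 2 and 3
print(round(mean, 2), median, mode, even_median)  # 6.14 6 8 2.5
```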
Understanding the spread of data

A measure of variability, also known as a measure of dispersion, describes how spread out the
values in a sample or population are.
In statistics, there are three common measures of variability, as shown below:

(i) Range:
It measures how far apart the values in a sample set or data set are spread.
Range = Maximum value - Minimum value
(ii) Variance :
It describes how much a random variable differs from its expected value; it is computed as the
average of the squared deviations from the mean.
(iii) Dispersion:
It is a measure of the dispersion of a set of data from its mean. Represe
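The measures of spread can likewise be computed with the `statistics` module on a hypothetical data set. Here `pvariance` and `pstdev` treat the data as a whole population (dividing by n), while `variance` treats it as a sample (dividing by n - 1). The standard deviation is the square root of the variance and is the usual single-number measure of dispersion around the mean.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set; its mean is 5

data_range = max(data) - min(data)            # 9 - 2 = 7
variance = statistics.pvariance(data)         # mean of squared deviations: 32 / 8 = 4
std_dev = statistics.pstdev(data)             # square root of the variance: 2
sample_variance = statistics.variance(data)   # divides by n - 1 instead: 32 / 7
print(data_range, variance, std_dev, round(sample_variance, 3))
```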

5.4 Machine Learning :

Predictive analytics processes existing data sets with the goal of identifying new trends and
patterns, which are then used to predict future outcomes and performance. It is also called
prognostic analysis, since the word prognostic means relating to prediction. Predictive analytics
uses data, statistical algorithms, and machine learning techniques to estimate the probability of
future outcomes based on historical data.

Understanding the types of Predictive Models

• Supervised learning:
As the name indicates, supervised learning implies the presence of a supervisor acting as a teacher.
In supervised learning, we teach or train the machine using well-labeled data, meaning each example
is already tagged with the correct answer. The machine is then given a new set of examples (data),
and the supervised learning algorithm, having analyzed the training data (the set of training
examples), predicts an outcome for them based on what it learned from the labeled data.

• Unsupervised learning:
Unsupervised learning is the training of a machine using information that is neither
classified nor labeled, allowing the algorithm to act on that information without guidance.
Here, the machine's task is to group unsorted information according to similarities, patterns, and
differences, without any prior training on labeled data.
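To contrast the two types, here is a small pure-Python sketch. The techniques are stand-ins chosen for brevity, not the methods used in the internship: a nearest-centroid classifier for supervised learning (it learns one average per label from labeled examples) and a one-dimensional k-means for unsupervised learning (it groups unlabeled values by similarity). All function names and data are hypothetical.

```python
def train_nearest_centroid(examples):
    """Supervised: learn the mean feature value for each label from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Assign x the label whose learned centroid is closest to it."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: group unlabeled 1-D values into k clusters; return sorted cluster centres."""
    step = max(1, len(values) // k)
    centers = sorted(values)[::step][:k]  # spread the initial centres across the sorted data
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it in place if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)


# Supervised: hours studied, each labeled with the known result.
labeled = [(2.0, "fail"), (3.0, "fail"), (7.0, "pass"), (8.0, "pass")]
model = train_nearest_centroid(labeled)
print(predict(model, 6.0), predict(model, 3.5))  # pass fail

# Unsupervised: similar values, but with no labels at all.
unlabeled = [1.0, 5.0, 1.2, 4.9, 0.8, 5.3]
print(kmeans_1d(unlabeled))  # two centres, one near 1.0 and one near 5.07
```

The supervised model can only predict the labels it was taught ("pass" or "fail"), whereas k-means discovers two groups without ever being told what they mean.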
Stages of Predictive Models :
Steps To Perform Predictive Analysis:

1. Define Problem Statement:
Define the project outcomes, the scope of the effort, and the objectives, and identify the data
sets that will be used.
2. Data Collection:
Data collection involves gathering the necessary details required for the analysis.
3. Data Cleaning:
Data cleaning is the process of refining our data sets. In this process, we remove unnecessary
and erroneous data, including redundant and duplicate records.
4. Data Analysis:
This stage involves exploring the data and analyzing it thoroughly to identify patterns or new
outcomes. Here we discover useful information and draw conclusions by identifying patterns or
trends.

5. Build Predictive Model:
In this stage of predictive analysis, we use various algorithms to build predictive models
based on the patterns observed. This requires knowledge of Python, R, statistics, MATLAB, and
so on. We also test our hypotheses using standard statistical models.
6. Validation:
This is a very important step in predictive analysis. In this step, we check how well our model
performs by running various tests, providing sample input sets to check its validity. The model's
accuracy must be evaluated at this stage.
7. Deployment:
In deployment, we make the model work in a real environment and make it available for use,
where it helps in everyday decision making.
8. Model Monitoring:
Regularly monitor the model to check its performance and ensure that it continues to produce
proper results. This means comparing the model's predictions against actual outcomes as new
data arrives.
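The eight stages can be traced end to end on a toy problem. This sketch is hypothetical throughout: it predicts an exam score from hours studied, uses a least-squares line as the predictive model, holds out every fourth cleaned row for validation, and treats a mean absolute error above an arbitrary tolerance as the signal to retrain during monitoring.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sxy / sxx
    return my - b * mx, b


def mean_abs_error(model, points):
    """Average absolute difference between the actual y and the model's prediction."""
    a, b = model
    return sum(abs(y - (a + b * x)) for x, y in points) / len(points)


# 1. Problem statement: predict an exam score (y) from hours studied (x).
# 2. Data collection: hypothetical records with one duplicate and one missing score.
raw = [(1, 52), (2, 55), (2, 55), (3, 61), (4, 64), (5, None),
       (6, 72), (7, 75), (8, 80), (9, 82)]

# 3. Data cleaning: drop rows with a missing score, then exact duplicates (order kept).
clean = list(dict.fromkeys(r for r in raw if r[1] is not None))

# 4. Data analysis: check the pattern the model will rely on (scores rise with hours).
scores_rise = all(y2 >= y1 for (_, y1), (_, y2) in zip(clean, clean[1:]))

# 5. Build the model on a training split; every 4th row is held out for validation.
test_rows = clean[::4]
train_rows = [r for i, r in enumerate(clean) if i % 4 != 0]
model = fit_line(train_rows)

# 6. Validation: measure the error on rows the model did not see during fitting.
mae = mean_abs_error(model, test_rows)


# 7. Deployment: expose the fitted model as a simple prediction function.
def predict_score(hours):
    a, b = model
    return a + b * hours


# 8. Monitoring: flag the model for retraining if new outcomes drift from its predictions.
def needs_retraining(new_points, tolerance=5.0):
    return mean_abs_error(model, new_points) > tolerance


print(len(clean), scores_rise, round(mae, 2))
print(round(predict_score(10), 1))
print(needs_retraining([(10, 86), (11, 89)]), needs_retraining([(10, 60)]))
```

A real project would replace each stage with far more thorough work (proper train/test splitting, richer models, statistical tests), but the order of the steps stays the same.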

6. SCREENSHOTS

7. QUIZ QUESTIONS

8. INTERNSHIP REGISTRATION PROOF

9. CONCLUSION
In summary, the convergence of data science, machine learning, and artificial intelligence
represents a transformative force in our digital landscape. This interdisciplinary fusion empowers
us to extract valuable insights from vast datasets, automate processes, and create intelligent
systems capable of learning and adapting. By leveraging advanced algorithms, statistical
techniques, and computational power, practitioners in these fields can tackle complex problems,
drive innovation, and enhance decision-making across diverse domains. As we continue to
advance, interdisciplinary collaboration and ongoing research will further propel the capabilities
of data science, machine learning, and AI, ushering in a new era of technological sophistication
and societal impact.

10. LIST OF REFERENCES

Schapire, R.E. (2003). The boosting approach to machine learning: An overview. In Nonlinear
Estimation and Classification, pp. 149–172. Springer.

Schapire, R.E., Freund, Y., Bartlett, P. and Lee, W.S. (1998). Boosting the margin: A new
explanation for the effectiveness of voting methods. Annals of Statistics 26(5):1651–1686.

Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.

Ragavan, H. and Rendell, L.A. (1993). Lookahead feature construction for learning hard concepts.
In Proceedings of the Tenth International Conference on Machine Learning (ICML 1993), pp. 252–259.
Morgan Kaufmann.

Rajnarayan, D.G. and Wolpert, D. (2010). Bias-variance trade-offs: Novel applications. In C. Sammut
and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 101–110. Springer.

