Ds Unit 3 Notes
Big Data refers to extremely large datasets that are complex and grow exponentially over
time. These datasets often exceed the capabilities of traditional data processing tools and
require specialized technologies to manage and analyze. The key traits of Big Data are often
described using the "5 Vs". These traits differentiate Big Data from traditional data, and
understanding them is crucial for managing and leveraging it effectively. Here is a detailed
breakdown of each trait with examples:
1. Volume
Definition:
Volume refers to the sheer amount of data that is generated, stored, and processed within a
system. When we talk about Volume, we’re essentially discussing the quantity of data, which
can be massive in scale.
Explanation:
In the context of Big Data, Volume is one of the key characteristics. Big Data is all about
handling enormous datasets that traditional data processing tools can't manage efficiently, so
the amount of data directly determines the storage and processing infrastructure an
organization needs.
Example: Consider a social media platform like Facebook. Every day, users upload billions
of photos, status updates, and comments. The sheer volume of this data is enormous and
continues to grow exponentially. For instance, Facebook users upload approximately 100
million photos daily. Storing and analyzing this massive volume of data requires scalable
storage solutions and powerful processing capabilities.
Implications and Challenges:
When dealing with large datasets, storing and analyzing them can be quite challenging.
Traditional databases might struggle with the sheer amount of data, leading to performance
issues and slow processing times. To manage this, organizations need to adopt advanced
database management systems that can efficiently handle large volumes of data. Additionally,
scalable storage solutions are necessary to ensure that the data can be stored without running
out of space or compromising on speed.
Technologies: Distributed storage and processing frameworks such as Hadoop HDFS and
scalable cloud object storage are commonly used to handle data at this volume.
2. Variety
Definition: The different forms that data can take. Think of it like different types of food:
some are solid (like a sandwich), some are liquid (like soup), and some are a mix of things
(like a salad). Similarly, data comes in different "flavors" or types:
Structured Data: highly organized data that fits neatly into rows and columns, such as records
in a relational (SQL) database or a spreadsheet.
Semi-Structured Data: data with some organizational markers but no rigid schema, such as
JSON, XML, or log files.
Unstructured Data: data with no predefined structure, such as free text, emails, images, audio,
and video.
The complexity arises from integrating and analyzing these diverse data types to gain
meaningful insights. For instance, combining purchase data with social media sentiment
analysis can provide a more comprehensive view of customer behavior.
Implications: Tools and techniques like data integration platforms, data lakes, and advanced
analytics are used to handle and extract value from this variety of data.
1. Data Integration Platforms
Data Integration Platforms are tools that help combine data from various sources into a
unified view. They allow you to:
• Collect Data: Aggregate data from multiple sources like databases, APIs, and files.
• Transform Data: Clean, filter, and format the data to make it suitable for analysis.
• Load Data: Place the cleaned data into a central repository for easier access and analysis.
Example: Imagine a company collects data from customer feedback forms, sales records, and
social media interactions. A data integration platform would help merge this data into a single
system where it can be analyzed collectively.
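As a rough sketch of the collect, transform, and load steps above, the following Python/Pandas
snippet combines two hypothetical sources; the file names and the customer_id column are made
up for illustration.

import pandas as pd

# Collect: load data from two hypothetical sources
feedback = pd.read_csv('feedback.csv')   # customer feedback forms
sales = pd.read_csv('sales.csv')         # sales records

# Transform: standardize the join key before combining
feedback['customer_id'] = feedback['customer_id'].astype(str).str.strip()
sales['customer_id'] = sales['customer_id'].astype(str).str.strip()

# Load: merge into one unified view and store it in a central file
unified = feedback.merge(sales, on='customer_id', how='outer')
unified.to_csv('unified_customer_view.csv', index=False)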
2. Data Lakes
A Data Lake is a storage repository that holds vast amounts of raw data in its native format
until needed. Unlike traditional databases, data lakes are designed to store structured, semi-
structured, and unstructured data. Here’s why they’re useful:
• Scalability: Data lakes can handle huge volumes of data, scaling up as needed.
• Flexibility: Since the data is stored in its raw form, you can perform different types of
analyses without predefined structures.
• Cost-Effective: Storing data in its native format can be more cost-effective than traditional
database systems.
Example: Suppose a company collects sensor data from manufacturing equipment, social
media posts, and text documents. A data lake can store all this diverse data until it’s needed
for analysis.
3. Advanced Analytics
Advanced Analytics involves using sophisticated techniques and tools to analyze complex
data sets. This includes:
• Machine Learning: Algorithms that can learn from data and make predictions or decisions.
For instance, predicting customer churn based on historical data.
• Data Mining: Discovering patterns and relationships within large datasets. For example,
identifying purchasing patterns among different customer segments.
• Statistical Analysis: Applying statistical methods to understand trends and make inferences.
For example, assessing the impact of a marketing campaign on sales.
Example: After aggregating data in a data lake, a company might use advanced analytics to
predict which products will be popular in the next season based on historical trends and social
media sentiment.
3. Velocity
Definition: Velocity, in the context of data, refers to how quickly data is generated,
processed, and shared. Imagine a stream of data coming from various sources, like social
media, sensors, or financial transactions. This data doesn't just arrive in bulk but flows in
real-time or near-real-time, meaning it's constantly being created and updated.
For example, think about the data generated by a social media platform. Every second, people
are posting updates, likes, comments, and photos. This constant influx of new data creates a
"high velocity" of information. To make sense of this data, it has to be processed and
analyzed quickly. If companies want to use this data to track trends or monitor user behavior,
they need to analyze it almost as soon as it's generated.
Example: working with a stock trading platform. This platform collects data on stock prices
and trading volumes all the time—every second, the data is updated as people buy and sell
stocks.
1. Continuous Data Generation: Every time a trade happens or a new piece of news is
announced about a company, the stock price and trading volume data change. For
instance, if Company X just released a groundbreaking product, its stock price might
start rising quickly as more people buy its shares.
2. Real-Time Analysis: Traders need to keep an eye on this constantly updating data to
make quick decisions. If they see the stock price of Company X rising due to the
news, they might decide to buy more shares before the price goes even higher.
3. Reacting to News: Let’s say there's unexpected news about Company Y—maybe
they’re facing a lawsuit. Traders need real-time data to see how this news impacts the
stock price. If the price starts to drop rapidly, they might decide to sell their shares to
avoid losing money.
Implications: Technologies like Apache Kafka and real-time analytics platforms are used to
handle high-velocity data streams, ensuring timely processing and decision-making.
Apache Kafka
1. What is Apache Kafka? Apache Kafka is a distributed event streaming platform designed
to handle high-throughput, low-latency data feeds. It’s like a messaging system that allows
different applications to send and receive large volumes of data efficiently.
2. Key Concepts:
• Topics: Kafka organizes data into topics, which act like channels where different types of
messages are sent. Each topic can have multiple partitions to help with scaling.
• Producers: These are the applications that send data to Kafka topics.
• Consumers: These are the applications that read data from Kafka topics.
• Brokers: Kafka runs on a cluster of servers called brokers. Each broker is responsible for
storing and managing a portion of the data.
3. How It Works: Producers publish messages to a topic, the brokers in the cluster store and
replicate the topic's partitions, and consumers subscribe to the topic and read the messages at
their own pace. This lets many applications exchange high-velocity data streams without being
directly connected to each other.
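A minimal sketch of this producer/consumer flow using the third-party kafka-python client is
shown below; the broker address localhost:9092 and the topic name stock-trades are
assumptions.

from kafka import KafkaProducer, KafkaConsumer

# Producer: send one trade event to a topic (broker address and topic are assumed)
producer = KafkaProducer(bootstrap_servers='localhost:9092')
producer.send('stock-trades', b'{"symbol": "X", "price": 101.5}')
producer.flush()

# Consumer: read events from the same topic as they arrive
consumer = KafkaConsumer('stock-trades', bootstrap_servers='localhost:9092')
for message in consumer:
    print(message.value)   # raw bytes of each trade event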
Real-Time Analytics Platforms
1. What are Real-Time Analytics Platforms? These platforms are designed to analyze data
as it arrives, rather than after it's been stored. They provide insights and decision-making
capabilities on-the-fly.
2. Key Features:
• Low Latency: They process data almost immediately after it’s generated. This is crucial for
applications where timely insights are needed.
• Scalability: They handle large volumes of data and can scale up to accommodate growing
data streams.
• Complex Event Processing: They can identify patterns and anomalies in data streams, which
helps in making quick decisions based on real-time information.
3. How They Work:
• Data Ingestion: Real-time analytics platforms ingest data from various sources, often
including Kafka topics.
• Stream Processing: The platform processes the data in real-time, applying algorithms and
rules to extract insights or trigger actions.
• Visualization: Results are often visualized in dashboards, providing users with live updates
and insights.
• Integration: They integrate with other systems to act on the insights, such as triggering alerts
or adjusting operational parameters.
4. Veracity
Definition: Veracity refers to the quality, accuracy, and reliability of data and data sources. It
involves dealing with data that may be incomplete, inconsistent, or uncertain.
Example: Consider a healthcare system where patient data is collected from various sources,
such as electronic health records, wearable devices, and patient surveys. Some data may be
incomplete or inaccurate, such as incorrect patient details or missing health records.
Implications: Ensuring data quality requires data cleaning, validation, and verification
processes. Tools and techniques for data quality management and governance help maintain
the accuracy and reliability of the data.
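As a small illustration of such checks, the sketch below flags missing and implausible values
in a hypothetical patient dataset; the file name and column names are assumptions.

import pandas as pd

patients = pd.read_csv('patient_records.csv')   # hypothetical file

# Completeness check: count missing values per column
print(patients.isnull().sum())

# Validity check: flag implausible ages
invalid_age = patients[(patients['age'] < 0) | (patients['age'] > 120)]
print('Rows with implausible ages:', len(invalid_age))

# Consistency check: remove exact duplicate records
patients = patients.drop_duplicates()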
5. Value
Definition: Value refers to the usefulness and potential insights that can be derived from
data. It focuses on extracting actionable insights that can drive business decisions and create
value.
Implications: Extracting value from data involves using advanced analytics, machine
learning models, and data visualization tools. The goal is to transform raw data into
actionable insights that drive business strategies and decisions.
Summary
• Volume: The amount of data generated and stored. For example, Facebook's daily uploads.
• Variety: The types of data (structured, semi-structured, unstructured). For example, data from
a retail company's transactions, reviews, and support logs.
• Velocity: The speed at which data is generated and processed. For example, real-time stock
price data in trading.
• Veracity: The quality and reliability of data. For example, ensuring accuracy in healthcare
records.
• Value: The insights and benefits derived from data. For example, personalized marketing
strategies based on customer data.
Understanding these traits helps in selecting the right tools and approaches for managing big
data effectively and deriving meaningful insights.
Web Scraping
Web scraping is the automated extraction of data from websites. It relies on two main
components:
1. Crawler:
o Definition: A crawler, also known as a spider or bot, is an algorithm that navigates
the web by following links from one page to another.
o Function: It collects URLs and gathers data from the pages it visits.
o Example: A crawler might start at a home page and follow links to product pages,
collecting URLs for all the products listed.
2. Scraper:
o Definition: A scraper is a tool or script that extracts specific information from a web
page.
o Function: It parses the HTML content of a page to retrieve data based on the user’s
requirements.
o Example: A scraper might extract product names, prices, and descriptions from a
product listing page.
A typical web scraping workflow involves the following steps:
1. Specify URLs:
o Process: You provide the URLs of the web pages you want to scrape.
o Example: If you want to scrape product information from an online bookstore, you
might provide URLs of product category pages.
2. Load HTML Content:
o Process: The scraper retrieves the HTML content of the provided URLs.
o Example: The scraper downloads the HTML code of a product category page, which
includes product listings in HTML format.
3. Extract Data:
o Process: The scraper parses the HTML to find and extract the required information.
o Example: The scraper identifies and extracts product names, prices, and descriptions
from the HTML.
4. Output Data:
o Process: The extracted data is saved in a structured format like CSV, JSON, or Excel.
o Example: The data is saved in a CSV file with columns for product names, prices,
and descriptions.
Let’s say you want to scrape product details from an online bookstore.
1. Request URL:
o Example URL: https://examplebookstore.com/category/fiction
2. Get HTML:
o Process: The scraper requests the HTML content of the page.
o Example HTML: The HTML might include tags like <div class="product">
containing product details.
3. Extract Data:
o Process: The scraper parses each <div class="product"> block in the HTML and pulls out the
product name, price, and description.
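A minimal sketch of this scraping flow using the requests and BeautifulSoup libraries is shown
below; the URL comes from the example above, while the <h2> and <span class="price"> tags are
assumptions about how the page is laid out.

import requests
from bs4 import BeautifulSoup

# Steps 1 and 2: request the URL and get its HTML
url = 'https://examplebookstore.com/category/fiction'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')

# Step 3: extract data from each product block (tag and class names are assumed)
for product in soup.find_all('div', class_='product'):
    name = product.find('h2').get_text(strip=True)
    price = product.find('span', class_='price').get_text(strip=True)
    print(name, price)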
Summary:
This structure helps visualize how the information is laid out on the webpage and how it can
be systematically extracted.
Common use cases of web scraping include:
1. Price Monitoring:
o Example: An e-commerce company monitors competitors’ prices to adjust its own
pricing strategy.
2. Market Research:
o Example: A company analyzes customer reviews from various websites to
understand market trends and customer preferences.
3. News Monitoring:
o Example: A news aggregator collects headlines and articles from multiple news
sources to provide comprehensive news coverage.
4. Sentiment Analysis:
o Example: A brand collects social media mentions to analyze customer sentiment
towards its products.
5. Email Marketing:
o Example: A marketing agency collects email addresses from industry-specific
websites to build a mailing list for promotional campaigns.
Analysis
Key Components:
1. Data Collection:
o Description: Gathering data from various sources relevant to the problem you are
trying to solve. This could involve surveys, sensors, databases, or public datasets.
o Example: If you’re studying the impact of study habits on student grades, you collect
data through a survey asking students about their study hours and their grades.
2. Data Cleaning:
o Description: Preparing the data for analysis by correcting or removing errors,
inconsistencies, or missing values. This ensures that the data is accurate and reliable.
o Example: If some students reported missing grades or study hours as zero, you either
correct these entries if possible or remove them from the dataset to avoid skewing
results.
3. Exploratory Data Analysis (EDA):
o Description: Investigating data sets to summarize their main characteristics and
discover patterns or anomalies. EDA often involves visualizations and basic statistical
analysis.
o Example: You create histograms to see the distribution of study hours and grades or
scatter plots to explore the relationship between study hours and grades.
4. Modeling:
o Description: Applying statistical or machine learning models to analyze data. This
involves selecting appropriate algorithms, training models on the data, and testing
their performance.
o Example: You use a linear regression model to understand how changes in study
hours affect grades, fitting the model to the collected data.
5. Interpretation:
o Description: Understanding the results from the models and analyses, and translating
these findings into meaningful insights or conclusions.
o Example: You interpret the results of your linear regression analysis to determine
that each additional hour of study is associated with a certain increase in grades.
• Statistical Analysis: Includes measures such as mean (average), median (middle value),
mode (most frequent value), and standard deviation (measure of variability).
• Machine Learning Models: Algorithms such as linear regression (predicts a value), decision
trees (classifies data), and clustering (groups similar data points).
• Visualization Tools: Software like Matplotlib or Seaborn in Python, or Excel for creating
charts such as scatter plots, bar charts, and heatmaps.
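As a quick illustration of the statistical measures and visualization tools listed above, here
is a small sketch with made-up study-hours and grade data:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical survey data: weekly study hours and grades
hours = np.array([2, 4, 5, 7, 9, 10])
grades = np.array([55, 62, 66, 74, 83, 88])

print('Mean hours:', np.mean(hours))
print('Median grade:', np.median(grades))
print('Std. dev. of grades:', np.std(grades))

# Scatter plot to explore the relationship
plt.scatter(hours, grades)
plt.xlabel('Study hours per week')
plt.ylabel('Grade')
plt.title('Study hours vs. grades')
plt.show()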
Reporting
Key Components:
1. Summary:
o Description: A brief overview of the main findings from the analysis. It highlights
the most important insights without going into too much detail.
o Example: You summarize that students who study more tend to have higher grades
based on the data analysis.
2. Visualization:
o Description: Creating charts, graphs, and tables to visually represent the data and
findings. Visualizations help in understanding complex information quickly and
clearly.
o Example: You create a bar chart showing the average grades for different ranges of
study hours, making it easy to see the trend.
3. Narrative:
o Description: Writing a clear and engaging explanation of the findings, including
context, methodology, and implications. This helps readers understand the
significance of the results.
• Reports and Dashboards: Tools like Microsoft Excel, Google Sheets, or business
intelligence software like Tableau for creating interactive dashboards.
• Presentations: Software like PowerPoint or Google Slides for presenting findings to an
audience.
• Documentation: Writing detailed reports or summaries that can be shared with stakeholders
or published.
1. Analysis:
• Data Collection: Collect survey data from 100 students about their weekly study hours and
their grades.
• Data Cleaning: Check for missing or inconsistent data. For example, if some entries have
study hours listed as negative values, correct or remove these entries.
• Exploratory Data Analysis: Create a scatter plot of study hours versus grades. Calculate
summary statistics like the mean and standard deviation of study hours and grades.
• Modeling: Apply a linear regression model to predict grades based on study hours. The
model might show that an increase in study hours is associated with an increase in grades.
• Interpretation: The regression results might indicate that for every additional hour spent
studying, the grade improves by 2 points on average.
2. Reporting:
• Summary: “Our analysis of 100 students shows that those who study more hours tend to
achieve higher grades.”
• Visualization: Present a bar chart showing average grades for different ranges of study hours.
Include a scatter plot with a regression line to illustrate the relationship.
• Narrative: “The data suggests a positive correlation between study hours and grades.
Students who study more tend to perform better academically.”
• Recommendations: “Students should aim to increase their study time to improve their
academic performance. Educators might also consider encouraging study habits and providing
resources to support extended study periods.”
1. Data Manipulation
Data manipulation in data science involves several key tasks to ensure that the data is in the
best possible shape for analysis. Just as you might organize your school notes into folders or
binders before studying, data manipulation organizes raw data into a clean, usable form. The
key tasks are:
1. Data Cleaning:
o Handling Missing Values: Missing data can be problematic. You might need
to fill in missing values, drop rows or columns with missing data, or use
algorithms that handle missing values gracefully.
o Removing Duplicates: Sometimes data can have duplicate entries. Identifying
and removing these ensures that your analysis is accurate and not skewed by
repetitive data.
o Correcting Errors: Data might contain errors like incorrect data types (e.g.,
numbers stored as text) or outliers. Correcting these ensures data integrity.
2. Data Transformation:
o Normalization and Scaling: Data often needs to be scaled or normalized to
ensure that different features contribute equally to analysis or modeling. For
example, normalizing test scores so that they fit within a 0 to 1 range.
o Data Aggregation: Summarizing data, such as computing the average score
for each subject, to provide meaningful insights.
o Reshaping Data: Changing the structure of data (e.g., pivoting from long to
wide format) to suit the requirements of specific analyses or visualizations.
3. Data Integration:
o Merging Data: Combining datasets from different sources. For example,
merging student grades with attendance records to get a comprehensive view
of performance.
o Concatenation: Stacking datasets together, which is useful when you have
data split across multiple files or tables.
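A minimal sketch of merging and concatenation with Pandas; the small DataFrames below are made
up for illustration.

import pandas as pd

grades = pd.DataFrame({'student_id': [1, 2, 3], 'grade': [85, 78, 92]})
attendance = pd.DataFrame({'student_id': [1, 2, 3], 'attendance_pct': [95, 80, 99]})

# Merging: combine the two datasets on a common key
combined = pd.merge(grades, attendance, on='student_id')

# Concatenation: stack another batch of the same kind of data on top
more_grades = pd.DataFrame({'student_id': [4, 5], 'grade': [70, 88]})
all_grades = pd.concat([grades, more_grades], ignore_index=True)

print(combined)
print(all_grades)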
Tools Used
1. Python:
o What It Is: Python is a versatile programming language that is widely used in
data science. It is known for its readability and simplicity, making it a popular
choice for both beginners and experienced programmers.
o Why It’s Useful: Python has a vast ecosystem of libraries that facilitate data
manipulation, analysis, and visualization. Its syntax is designed to be intuitive,
allowing you to write efficient code with fewer lines.
2. Pandas:
o What It Is: Pandas is a powerful Python library specifically designed for data
manipulation and analysis. It provides two primary data structures:
▪ Series: A one-dimensional array-like object that can hold any data
type.
▪ DataFrame: A two-dimensional, tabular data structure similar to a
spreadsheet or SQL table. It is particularly useful for handling large
datasets.
o Key Functions:
▪ read_csv() and read_excel(): Functions for loading data from CSV or
Excel files into DataFrames.
▪ dropna(): Method for removing missing values from your dataset.
▪ fillna(): Method for filling in missing values with a specified value or
method.
▪ merge() and concat(): Methods for combining multiple DataFrames.
▪ groupby(): Method for grouping data and performing aggregate
operations.
▪ pivot_table(): Method for creating pivot tables that summarize data.
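As a rough illustration, the sketch below strings several of these functions together; the file
students.csv and its class, gender, grade, and attendance columns are hypothetical.

import pandas as pd

df = pd.read_csv('students.csv')                  # load data (hypothetical file)
df = df.dropna(subset=['grade'])                  # drop rows with a missing grade
df['attendance'] = df['attendance'].fillna(df['attendance'].mean())  # fill gaps

# Group and summarize: average grade per class
print(df.groupby('class')['grade'].mean())

# Pivot table: average grade per class and gender
print(pd.pivot_table(df, values='grade', index='class', columns='gender', aggfunc='mean'))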
What is an IDE?
An IDE (Integrated Development Environment) is a program that combines a code editor, tools
for running code, and debugging features in one place, which makes writing and testing Python
much easier.
Here are some popular IDEs for Python that are user-friendly for beginners:
1. Download PyCharm:
o Go to the PyCharm website.
o Download the Community version (free) for your operating system (Windows,
macOS, or Linux).
2. Install PyCharm:
o Open the downloaded installer file and follow the on-screen instructions to install
PyCharm.
3. Open PyCharm:
o Launch PyCharm from your applications list.
4. Create a New Project:
o Click on "New Project."
o Choose a location for your project and click "Create."
5. Create a New Python File:
o Right-click on your project folder in the "Project" pane.
o Select "New" > "Python File."
o Name your file (e.g., my_script.py).
6. Write Your Code:
o Type your Python code into the new file. For example:
print("Hello, World!")
1. Download VS Code:
o Go to the Visual Studio Code website.
o Download the installer for your operating system.
2. Install and Open VS Code:
o Run the installer, launch VS Code, and install the Python extension from the Extensions view.
3. Create a New Python File and Write Your Code:
o Create a file such as my_script.py and type your code. For example:
print("Hello, World!")
Let’s say you have a CSV file with student grades and attendance records, and you want to
clean and analyze this data.
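A minimal cleaning pass along those lines might look like this; the file name student_data.csv
and the Grade and Attendance column names are assumptions.

import pandas as pd

data = pd.read_csv('student_data.csv')              # load the raw file

data = data.drop_duplicates()                       # remove duplicate rows
data['Grade'] = pd.to_numeric(data['Grade'], errors='coerce')  # fix numbers stored as text
data = data.dropna(subset=['Grade'])                # drop rows with no usable grade
data['Attendance'] = data['Attendance'].fillna(0)   # treat missing attendance as 0

data.to_csv('student_data_clean.csv', index=False)  # save the cleaned dataset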
This process transforms raw data into a structured, clean format, making it ready for analysis
or modeling.
2. Data Analysis
What It Means
Data analysis is about making sense of data to draw meaningful conclusions. After cleaning
and preparing your data, you use analysis to uncover insights, identify trends, and make data-
driven decisions. It's similar to analyzing your test results to understand which subjects you
excel in and which need improvement.
Tools Used
1. Python
Python is a versatile programming language used for a variety of tasks in data analysis. Its
extensive libraries and simple syntax make it ideal for performing complex calculations and
analyses.
• Why Python?
o Readable Syntax: Easy to write and understand code.
o Versatile Libraries: Rich ecosystem for data manipulation, analysis, and
visualization.
2. NumPy
NumPy is a Python library for fast numerical computation on arrays; Pandas is built on top of
it for working with tabular data.
Let's consider a dataset of student test scores to illustrate how Python, Pandas, and NumPy can
be used for data analysis:
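The code that the numbered steps below walk through might look like the following minimal
sketch; the file name student_scores.csv and the Name, Score, and Attendance columns are
assumptions about your data.

import pandas as pd
import matplotlib.pyplot as plt

# Load data (replace the path with the actual path to your CSV file)
data = pd.read_csv('student_scores.csv')

# Descriptive statistics
print(data.describe())

# Correlation analysis (numeric_only keeps just the Score and Attendance columns)
print(data.corr(numeric_only=True))

# Visualization: line graph of scores per student
plt.plot(data['Name'], data['Score'])
plt.xlabel('Student')
plt.ylabel('Score')
plt.title('Student Scores')
plt.show()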
1. Import Libraries:
o pandas is used to manipulate the CSV file data.
o matplotlib.pyplot is used to visualize the student scores.
2. Load Data:
o The code reads the CSV file using pd.read_csv(). Replace the file path with the actual
path to your CSV file.
3. Descriptive Statistics:
o data.describe() prints basic statistics about the scores and attendance.
4. Correlation Analysis:
o data.corr() shows how the variables (Score and Attendance) are related.
5. Visualization:
o A line graph is plotted using plt.plot() to visualize student names and scores.
3. Data Visualization
Data Visualization is the process of representing data visually through charts and graphs.
This helps make complex data more understandable and accessible. Just as you might draw a
graph to track your progress in various subjects, data scientists use visualizations to uncover
patterns, trends, and insights from data.
Tools Used
Matplotlib
• What It Is: Matplotlib is a powerful and flexible Python library used for creating
static, animated, and interactive visualizations. It’s like having a versatile set of tools
to draw any kind of graph you need.
• Key Features:
o Basic Plot Types: You can create line plots, bar charts, histograms, scatter plots, pie
charts, and more.
o Customization: You can adjust colors, labels, legends, and titles to make your graphs
clear and visually appealing.
o Integration: Works well with other libraries like NumPy and Pandas to visualize
data directly from data structures.
• Example Use Case:
import matplotlib.pyplot as plt
# Sample data (illustrative values)
subjects = ['Math', 'Science', 'English']
scores = [85, 90, 78]
plt.bar(subjects, scores)
plt.xlabel('Subjects')
plt.ylabel('Scores')
plt.title('Subject Scores')
plt.show()
• In this example, plt.bar() creates a bar chart showing scores in different subjects. The
plt.xlabel(), plt.ylabel(), and plt.title() functions add labels and a title to the chart.
Seaborn
• What It Is: Seaborn is built on top of Matplotlib and provides a high-level interface
for drawing attractive and informative statistical graphics. It’s like an upgraded
version of Matplotlib that makes it easier to create complex visualizations with less
code.
• Key Features:
o Predefined Themes: Comes with several built-in themes that make your plots look
good without needing much customization.
o Statistical Plots: Includes functions for creating complex visualizations like
heatmaps, violin plots, and pair plots, which are useful for statistical analysis.
o Ease of Use: Simplifies the process of creating plots with a more intuitive API
compared to Matplotlib.
• Install Seaborn in a Jupyter notebook with !pip install seaborn, then:

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data (illustrative values)
data = {'Subjects': ['Math', 'Science', 'English'], 'Scores': [85, 90, 78]}

sns.barplot(x=data['Subjects'], y=data['Scores'], palette='viridis')
plt.xlabel('Subjects')
plt.ylabel('Scores')
plt.title('Subject Scores')
plt.show()
In this example, sns.barplot() creates a bar plot with a color palette applied. The
palette='viridis' argument changes the colors of the bars.
4. Building Models
Building a model means training an algorithm on existing data so that it can make predictions
or decisions about new, unseen data.
For example, if you have historical data on how many hours students studied and their test
scores, you can build a model to predict a student's future test score based on their study
hours. The model learns from the past data and tries to generalize this knowledge to make
predictions about new, unseen data.
Tools Used
• Scikit-learn:
o Overview: Scikit-learn is a widely-used Python library that provides a range of tools
for building machine learning models.
o Features:
▪ Algorithms: Includes many algorithms for classification (e.g., logistic
regression, decision trees), regression (e.g., linear regression), clustering
(e.g., k-means), and more.
▪ Preprocessing: Tools for preparing data, such as normalization and encoding
categorical variables.
▪ Model Evaluation: Functions to assess model performance, including
metrics like accuracy and confusion matrices.
▪ Model Selection: Utilities for splitting data, cross-validation, and
hyperparameter tuning.
Let’s say you want to build a model to predict test scores based on study hours. Here’s a
simplified workflow using Scikit-learn:
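A simplified sketch of that workflow is shown below; the study-hours and score values are made
up for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: hours studied (feature) and test scores (target)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([52, 55, 61, 64, 70, 74, 79, 83])

# Split into training and test sets, then fit a linear regression model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate and predict
print('R^2 on test data:', model.score(X_test, y_test))
print('Predicted score for 9 hours of study:', model.predict([[9]])[0])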
To run Python code and use libraries like Matplotlib, NumPy, Scikit-learn, and NLTK, you
need Python installed on your computer.
1. Matplotlib
What It Is:
• Matplotlib is a tool used to create graphs and charts. It helps you visualize data so you can
understand patterns and trends.
• Before you can use Matplotlib, you need to install it. This is usually done with a command
like pip install matplotlib (but don't worry about this right now; it's often set up for you in
courses).
Explanation:
• Importing: You first import the matplotlib.pyplot module, which contains functions to create
plots.
• Data: Define what data you want to plot. In this case, x is the list of values for the x-axis, and
y is for the y-axis.
• Plotting: Use plt.plot(x, y) to create a line plot.
• Customizing: Add a title and axis labels to make the plot clear.
• Displaying: plt.show() displays the plot on your screen.
2. NumPy
What It Is:
• NumPy is a tool used for numerical calculations. It helps you perform mathematical
operations on large amounts of data quickly.
import numpy as np

# Sample data (illustrative values)
data = np.array([85, 90, 78, 92, 88])

# Perform operations
mean = np.mean(data)    # Calculate the mean
total = np.sum(data)    # Calculate the sum

print('Mean:', mean)
print('Sum:', total)

Output:
Mean: 86.6
Sum: 433
Explanation: np.array() turns a Python list into a NumPy array, and np.mean() and np.sum()
compute the average and total across the whole array at once, which is much faster than looping
over the values in plain Python.
3. Scikit-learn
What It Is:
• Scikit-learn is a toolkit for machine learning. It helps you build models that can make
predictions or classify data.
# 'model' is assumed to be a LinearRegression already fitted on study-hours
# vs. test-score data (see the workflow sketch earlier in these notes)
# Make a prediction
prediction = model.predict([[6]])
print('Predicted test score for 6 hours of study:', prediction[0])
Explanation:
• model.predict([[6]]) asks the trained model to estimate the test score for 6 hours of study.
The value is wrapped in a nested list because Scikit-learn expects a 2D array of samples.
4. NLTK
What It Is:
• NLTK is used for processing and analyzing text data. It helps with tasks like tokenizing
(breaking text into words) and finding patterns in language.
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Download the required NLTK resources (only needed the first time)
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

# Sample text
text = ("NLTK is a great library for performing text processing tasks, "
        "such as tokenization, stemming, and lemmatization.")

# Step 1: Tokenization
words = word_tokenize(text)

# Step 2: Remove stopwords
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word.lower() not in stop_words]

# Step 3: Stemming
stemmer = PorterStemmer()
stemmed_words = [stemmer.stem(word) for word in filtered_words]

# Step 4: Lemmatization
lemmatizer = WordNetLemmatizer()
lemmatized_words = [lemmatizer.lemmatize(word) for word in filtered_words]

# Results
print("Original Words:", words)
print("Filtered Words (no stopwords):", filtered_words)
print("Stemmed Words:", stemmed_words)
print("Lemmatized Words:", lemmatized_words)
1. Remove Stopwords
Stopwords are common words in a language that are often filtered out before processing text.
Examples include “the,” “is,” “in,” “and,” etc. These words are usually removed because they
don’t carry significant meaning and can clutter the analysis.
2. Stemming
Stemming is the process of reducing words to their base or root form. For example,
“running,” “runner,” and “ran” might all be reduced to “run.” The goal is to treat different
forms of a word as the same term. However, stemming can sometimes produce non-words
(e.g., "studies" becomes "studi").
3. Lemmatization
Lemmatization is similar to stemming but more sophisticated. It reduces words to their base
or dictionary form, known as a lemma. For example, “running” would be reduced to “run,”
and “better” would be reduced to “good.” Lemmatization ensures that the resulting words are
valid and meaningful.
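A tiny sketch contrasting the two; this assumes the NLTK wordnet resource from the earlier
example has been downloaded, and note that the lemmatizer needs a part-of-speech hint to map
"better" to "good".

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem('studies'))                    # 'studi'  (a non-word stem)
print(stemmer.stem('running'))                    # 'run'
print(lemmatizer.lemmatize('running', pos='v'))   # 'run'
print(lemmatizer.lemmatize('better', pos='a'))    # 'good'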
Summary