B3 Twitter Data
With the rise of social media platforms, Twitter has become a significant source of real-time
data. Analyzing Twitter data can provide valuable insights into public opinions, sentiments,
and trends. The use of Twitter data for analysis gained prominence with the growth of social
media platforms in recent years. Researchers and businesses recognized the potential of
Twitter data in understanding public sentiment, predicting trends, and conducting market
research. As a result, various methods and algorithms were developed to process and classify
Twitter data efficiently. Traditional systems often rely on basic text processing techniques
such as tokenization, stemming, and stop-word removal. While these techniques are useful,
they are not always sufficient for handling the unique characteristics of Twitter data, such as
hashtags, mentions, and emoticons. In addition, the unstructured and noisy nature of Twitter
data poses challenges for effective analysis. Therefore, the need for a comprehensive pre-
processing approach arises from the growing importance of Twitter data in decision-making
processes. Businesses, researchers, and organizations rely on Twitter data for sentiment
analysis, brand monitoring, and trend prediction. To extract meaningful insights from this
data, it is essential to preprocess it effectively, ensuring that irrelevant information and noise
are removed while preserving the context and nuances of social media language. Thus, this
research proposes the effective classification of Twitter data using machine learning
algorithms. This comprehensive pre-processing approach is significant for several reasons
such as improved accuracy, better understanding of public opinion, enhanced decision
making, and research advancements.
CHAPTER 1
INTRODUCTION
1.1 History
The history of analyzing Twitter data for insights traces back to the early 2000s when social
media platforms began to burgeon. With the inception of Twitter in 2006, a new avenue for
real-time data analysis emerged. Initially, researchers and businesses viewed Twitter as a
platform for social interaction. However, as its user base expanded exponentially, it became
evident that Twitter harbored a wealth of information beyond mere conversations.
Around 2010, the academic community and industry pioneers started recognizing Twitter's
potential as a goldmine for understanding public sentiment, predicting trends, and conducting
market research. This recognition marked the onset of a concerted effort to develop methods
and algorithms specifically tailored for processing and classifying Twitter data effectively.
The evolution of machine learning algorithms further propelled the analysis of Twitter data.
Researchers started experimenting with various models to extract insights from the vast pool
of tweets generated every second. This experimentation led to the development of novel
techniques aimed at improving the accuracy and efficiency of Twitter data classification.
Research Motivation:
Furthermore, the growing importance of Twitter data in shaping public opinion and driving
market trends accentuates the need for robust preprocessing techniques. By harnessing the
power of machine learning algorithms coupled with comprehensive preprocessing,
stakeholders can gain a deeper understanding of consumer behavior, market dynamics, and
societal trends.
Problem Statement:
The problem statement revolves around the inadequacy of traditional text processing
techniques in handling the unique characteristics of Twitter data. Conventional methods such
as tokenization, stemming, and stop-word removal fall short when confronted with hashtags,
mentions, and emoticons prevalent in tweets.
The unstructured and noisy nature of Twitter data poses significant challenges for effective
analysis. Without a comprehensive preprocessing approach, it becomes challenging to extract
meaningful insights while preserving the context and nuances of social media language.
Businesses, researchers, and organizations face the daunting task of navigating through the
vast sea of tweets to distill relevant information and derive actionable insights. Therefore,
there is an urgent need to develop a robust preprocessing framework that can effectively filter
out noise and irrelevant content while retaining the essence of Twitter discourse.
Applications:
The applications of the proposed comprehensive preprocessing approach are multifaceted and
span across various domains. In the realm of business, organizations can leverage Twitter
data for sentiment analysis to gauge customer satisfaction, identify emerging trends, and
monitor brand perception. By preprocessing the data effectively, businesses can extract
actionable insights to inform marketing strategies, product development, and customer
engagement initiatives.
In the field of academia, researchers can utilize Twitter data to study social phenomena,
conduct opinion polls, and analyze public discourse on diverse topics ranging from politics to
health. A robust preprocessing approach ensures the reliability and validity of research
findings by filtering out noise and irrelevant information inherent in Twitter data.
Moreover, government agencies can harness Twitter data for real-time monitoring of public
sentiment, crisis management, and disaster response. By preprocessing the data
comprehensively, policymakers can gain valuable insights into public opinion, identify areas
of concern, and formulate timely interventions to address societal issues.
CHAPTER 2
LITERATURE SURVEY
Sanjay et al. [8] conducted sentiment analysis on Twitter data related to the Indian farmer
protests to gain insights into global public sentiment. They employed machine learning algorithms to
analyze approximately twenty thousand tweets associated with the protests and assess the
sentiments expressed. The researchers compared the effectiveness of two popular
text representation techniques, BoW and TF-IDF, and discovered that BoW
outperformed TF-IDF in sentiment analysis accuracy. The study further involved the
application of various classifiers, including SVM, RF, DT, and NB, on the dataset. The
results revealed that the RF classifier achieved the highest accuracy among the
evaluated classifiers.
Behl et al. [9] gathered tweets related to various natural disasters and categorized
them into three groups based on their content: "resource availability," "resource
requirements," and "others." To accomplish this classification task, they employed a
Multi-Layer Perceptron (MLP) network with an optimizer. The proposed model
demonstrated an accuracy of 83%, indicating its effectiveness in accurately
classifying the tweets into the designated categories.
Tan et al. [10] introduced a model that combined BiLSTM, RoBERTa, and GRU
models. To further enhance the overall effectiveness of sentiment analysis, the
models' predictions were combined using majority voting. Addressing the
challenges posed by unbalanced datasets, the researchers enhanced the data by
utilizing GloVe pre-trained word embeddings. The experimental results
demonstrated that the proposed model surpassed state-of-the-art approaches, achieving
accuracy rates of 0.942, 0.892, and 0.9177 on the Sentiment140, US Airlines, and IMDB
datasets, respectively.
For aspect-level sentiment analysis, Lu et al. [11] presented IRAN (Interactive Rule
Attention Network). To simulate the operation of grammar at the sentence level, IRAN
includes a grammar rule encoder that normalizes the output of adjacent positions.
Furthermore, IRAN makes use of an attention network that interacts with its environment
to better understand the target and its surroundings. Experiments on the ACL 2014 Twitter
and SemEval 2014 datasets showed that IRAN learns informative features successfully and
beats baseline models. These results indicate that IRAN is an effective tool for aspect-level
sentiment analysis, which can lead to enhanced performance in the field.
He et al. [13] introduced LGCF, a multilingual learning paradigm that emphasized active
learning in both global and local contexts. Unlike its predecessors, LGCF
demonstrated the ability to effectively learn the connections between target aspects and
local contexts, along with the connections between target aspects and global contexts,
simultaneously. This innovative approach enables the model to capture and utilize both
local and global contextual information efficiently, enhancing its overall performance in
sentiment analysis tasks.
To better understand the state of the art in SA using DNNs and CNNs, Qurat et al. [15]
undertook a systematic literature review of current studies. Topics covered in their
investigation of sentiment analysis included text sentiment categorization, cross-lingual
analysis, and both textual and visual analysis. Datasets were culled from a wide range of
social media platforms. The authors presented the various stages of the successful
construction of DL models in emotion analysis and noted that many difficulties in this
field were efficiently solved with high accuracy using deep learning methodologies. With
their more complex structures, deep learning networks were able to extract and represent
features more accurately than traditional neural networks and SVMs. This study
demonstrates the benefits of using DL models for sentiment analysis, which can lead to
improved results in emotion analysis.
CHAPTER 3
EXISTING SYSTEM
Before the integration of AI and machine learning, the analysis of Twitter data primarily
relied on traditional text processing techniques. These methods were rudimentary, focusing
on basic text manipulation rather than understanding the underlying meaning or context of
the data. Techniques such as tokenization, which breaks down text into individual words or
tokens; stemming, which reduces words to their base or root form; and stop-word removal,
which eliminates common but uninformative words, were commonly used. While these
techniques allowed for some level of text processing, they were limited in their ability to
handle the unique features of Twitter data.
Twitter, as a platform, presents several challenges for traditional text processing. The
presence of hashtags, mentions, and emoticons adds layers of complexity that traditional
methods struggle to manage. For example, hashtags and mentions often carry significant
contextual information, and their proper interpretation is crucial for accurate sentiment
analysis and trend prediction. Traditional systems also faced difficulties with the unstructured
and noisy nature of Twitter data, where informal language, abbreviations, and slang are
prevalent. As a result, the insights derived from such data were often shallow and lacked
depth, limiting their usefulness in decision-making processes.
1. Informal Language:
o Twitter users often employ informal language, abbreviations, and slang, which
traditional text processing techniques struggle to interpret accurately. This
leads to a loss of contextual meaning, reducing the effectiveness of analysis.
2. Contextual Understanding:
o Traditional methods lack the ability to grasp the nuanced meanings behind
words or phrases, particularly in the context of hashtags, mentions, and
emoticons. These elements are often critical for understanding the sentiment
and intent behind a tweet.
3. Noise in Data:
4. Scalability Issues:
5. Limited Feature Extraction:
o The basic text processing techniques used in traditional systems fail to extract
meaningful features from Twitter data, such as user interactions, tweet
metadata, and temporal patterns. This results in a loss of valuable information
that could enhance analysis.
CHAPTER 4
PROPOSED SYSTEM
4.1 Overview
The proposed system is implemented as a Python script that provides a comprehensive pipeline
for preprocessing and classifying Twitter data using various machine learning algorithms.
Importing Libraries:
The script begins by importing necessary libraries, including NumPy, Pandas,
Matplotlib, Seaborn, NLTK, and warnings.
Loading Data:
The training and testing datasets are loaded from CSV files using Pandas.
The shape of the datasets is printed to provide an overview of the data size.
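A minimal sketch of this loading step is shown below; the CSV file names are taken from the source code section later in this report, and the column layout is assumed to match that dataset.
import pandas as pd

# Load the training and testing datasets and report their sizes
train = pd.read_csv('train_E6oV3lV.csv')
test = pd.read_csv('test.csv')
print('Training set:', train.shape)
print('Testing set:', test.shape)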
Exploratory Data Analysis (EDA):
Displaying the first few rows of the training and testing datasets to inspect the
structure of the data. Checking for missing values in both datasets. Exploring positive
and negative comments in the training set. Visualizing the distribution of tweet
lengths in both training and testing datasets. Creating a new column to represent the
length of each tweet. Grouping the data by label (positive or negative) and analyzing
statistics.
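The sketch below illustrates these EDA steps, continuing from the loading sketch above; it assumes the raw text column is named 'tweet' and the sentiment column 'label', which may differ from the actual files.
# Inspect the structure and check for missing values
print(train.head())
print(train.isnull().sum())
print(test.isnull().sum())

# Add a column with the length of each tweet and compare statistics by label
train['len'] = train['tweet'].astype(str).apply(len)
test['len'] = test['tweet'].astype(str).apply(len)
print(train.groupby('label')['len'].describe())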
Data Visualization:
Creating count plots and histograms to visualize the distribution of tweet lengths,
label frequencies, and hashtag frequencies. Generating word clouds to display the
most frequent words in the overall vocabulary, neutral words, and negative words.
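The following illustrative snippet (not the exact project code) reproduces two of these visualizations, a label count plot and a word cloud, assuming matplotlib, seaborn, and the wordcloud package are available.
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud

# Count plot of the sentiment labels
sns.countplot(data=train, x='label')
plt.title('Label frequencies')
plt.show()

# Word cloud of the most frequent words in the training tweets
all_text = ' '.join(train['tweet'].astype(str))
wc = WordCloud(width=800, height=400, background_color='white').generate(all_text)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()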
Hashtag Analysis:
Extracting hashtags from both positive and negative tweets. Creating frequency
distributions and bar plots to display the most common hashtags in each category.
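A small sketch of this hashtag analysis follows; it assumes label 0 marks regular (positive or neutral) tweets and label 1 marks negative tweets, as in the dataset used later.
import re
import nltk

def extract_hashtags(tweets):
    # Collect every term that follows a '#' across an iterable of tweets
    tags = []
    for tweet in tweets:
        tags.extend(re.findall(r'#(\w+)', str(tweet)))
    return tags

ht_regular = extract_hashtags(train['tweet'][train['label'] == 0])
ht_negative = extract_hashtags(train['tweet'][train['label'] == 1])

# Ten most common hashtags in each category
print(nltk.FreqDist(ht_regular).most_common(10))
print(nltk.FreqDist(ht_negative).most_common(10))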
Word Embeddings with Word2Vec:
Using Gensim to train a Word2Vec model on tokenized tweets. Demonstrating word
similarities for certain words using the trained Word2Vec model.
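A minimal Word2Vec sketch with Gensim is given below; the tokenizer and hyperparameters are illustrative and not necessarily those used in the project.
from gensim.models import Word2Vec
from nltk.tokenize import TweetTokenizer

tk = TweetTokenizer()
tokenized = train['tweet'].astype(str).apply(tk.tokenize)

# Train a small Word2Vec model on the tokenized tweets
w2v = Word2Vec(sentences=tokenized, vector_size=100, window=5, min_count=2, workers=4)

# Words most similar to a chosen token (the word must exist in the learned vocabulary)
print(w2v.wv.most_similar('happy', topn=5))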
Text Preprocessing:
Removing unwanted patterns, converting text to lowercase, and stemming words
using NLTK. Creating bag-of-words representations for both the training and testing
datasets.
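The sketch below shows a simplified version of this cleaning and bag-of-words step (the full cleaning function appears in the source code section); the 3000-feature limit mirrors the CountVectorizer setting used there.
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

ps = PorterStemmer()

def clean(text):
    # Keep letters only, lowercase the text, and stem each word
    text = re.sub(r'[^a-zA-Z\s]', ' ', str(text)).lower()
    return ' '.join(ps.stem(w) for w in text.split())

train['content'] = train['tweet'].apply(clean)
test['content'] = test['tweet'].apply(clean)

# Bag-of-words representation limited to the 3000 most frequent terms
vectorizer = CountVectorizer(max_features=3000, stop_words=stopwords.words('english'))
X = vectorizer.fit_transform(train['content']).toarray()
X_unseen = vectorizer.transform(test['content']).toarray()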
Model Training:
Splitting the training dataset into training and validation sets.
Standardizing the data using StandardScaler.
Training machine learning models including RandomForestClassifier,
LogisticRegression
Evaluating the models on the validation set, calculating training and validation
accuracy, F1 score, and generating confusion matrices.
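A condensed, illustrative version of this training and evaluation loop is shown below; it uses the feature matrix X built in the previous sketch, trains two of the models named above, and reports accuracy, F1 score, and the confusion matrix.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# Split the training data into training and validation sets
X_train, X_valid, y_train, y_valid = train_test_split(X, train['label'], test_size=0.3, random_state=42)

# Standardize the bag-of-words features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)

for name, model in [('Random Forest', RandomForestClassifier()),
                    ('Logistic Regression', LogisticRegression(max_iter=1000))]:
    model.fit(X_train, y_train)
    pred = model.predict(X_valid)
    print(name, 'accuracy:', accuracy_score(y_valid, pred), 'F1:', f1_score(y_valid, pred))
    print(confusion_matrix(y_valid, pred))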
The script covers a wide range of tasks from data loading and exploration to text
preprocessing, visualization, and training various machine learning models for sentiment
analysis on Twitter data.
4.2 Data Pre-processing
Data pre-processing is the process of preparing raw data and making it suitable for a
machine learning model. It is the first and most crucial step in creating a machine learning
model. When building a machine learning project, we rarely come across clean and
well-formatted data, so before performing any operation on the data it must be cleaned and
put into a structured form; this is the purpose of the data pre-processing stage. Real-world
data generally contains noise and missing values and may be in an unusable format that
cannot be fed directly to machine learning models. Data pre-processing cleans the data and
makes it suitable for a machine learning model, which also increases the accuracy and
efficiency of the model.
Dataset Splitting
In machine learning data pre-processing, we divide our dataset into a training set and test set.
This is one of the crucial steps of data pre-processing as by doing this, we can enhance the
performance of our machine learning model. Suppose we train our machine learning model
on one dataset and then test it on a completely different dataset; the model will have
difficulty capturing the correlations in the data. Similarly, if we train our model very well
and its training accuracy is very high, but its performance drops when we provide a new
dataset, the model has not generalized. So we always try to build a machine learning model
that performs well on the training set and also on the test dataset. Here, we can define these
datasets as:
Training Set: A subset of dataset to train the machine learning model, and we already know
the output.
Test set: A subset of dataset to test the machine learning model, and by using the test set,
model predicts the output.
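A minimal illustration of this split, assuming a feature matrix X and label vector y have already been prepared; the 70/30 ratio matches the split used in the source code.
from sklearn.model_selection import train_test_split

# Hold out 30% of the data as an unseen test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)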
4.3 ML Module
The machine learning model building process for Twitter sentiment analysis begins with
preprocessing the text data. The text is cleaned by removing unwanted characters, stopwords,
and stemming words to their root forms. Following this, a Bag of Words (BoW)
representation is created using CountVectorizer, which transforms the cleaned tweets into a
structured format suitable for model input.
The dataset is then split into training and validation sets using train_test_split, ensuring that
the model is tested on unseen data. Standardization is applied using StandardScaler to
normalize the features, improving the performance of the machine learning models.
Various models, including Random Forest, Logistic Regression, Decision Tree, Support
Vector Machine (SVM), and XGBoost, are trained on the processed data. Each model is
evaluated based on its training and validation accuracy, along with the F1 score, which
balances precision and recall.
The model's performance is further analyzed using confusion matrices, which provide
insights into the classification errors. Among the models tested, the one with the highest
validation accuracy and F1 score is selected as the final model for predicting sentiment on the
test dataset.
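The model-selection step described above can be sketched as follows; trained_models, X_valid, y_valid, and X_unseen are placeholders for the fitted models and data splits created earlier, not names taken from the project code.
from sklearn.metrics import f1_score

# Record the validation F1 score of every trained model and keep the best one
scores = {name: f1_score(y_valid, model.predict(X_valid))
          for name, model in trained_models.items()}
best_name = max(scores, key=scores.get)
print('Selected model:', best_name, 'with F1 =', scores[best_name])

# Use the selected model to predict sentiment on the unseen test features
final_predictions = trained_models[best_name].predict(X_unseen)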
Extra Trees (Extremely Randomized Trees) is an ensemble learning method that
aggregates multiple decision trees to make a prediction; in this project its classifier form
(ExtraTreesClassifier) is used. It is similar to Random Forests but introduces additional
randomness when constructing the trees. In contrast to traditional decision trees, Extra Trees
do not rely on bootstrapping and instead randomly select both features and thresholds when
splitting nodes.
How It Works:
Random Feature Splits: For each decision tree, the algorithm selects random subsets
of features, but unlike random forests, Extra Trees further randomize by selecting
thresholds for splits randomly rather than choosing the best possible split.
Tree Construction: Each tree is fully grown without pruning; the extra randomness in
the splits slightly increases bias, while averaging over many trees reduces variance.
Ensemble Averaging: The model aggregates the predictions from multiple trees by
averaging (for regression tasks) or taking the majority vote (for classification).
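A brief usage sketch of the scikit-learn implementation of this method follows; the data splits are assumed to be those created in the earlier training sketch.
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Each tree uses random feature subsets and random split thresholds
etc = ExtraTreesClassifier(n_estimators=100, random_state=42)
etc.fit(X_train, y_train)
print('Validation accuracy:', accuracy_score(y_valid, etc.predict(X_valid)))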
Architecture:
1. Randomness in Node Splitting: Extra Trees splits nodes by selecting random
features and thresholds, unlike traditional decision trees, which select the best split
based on criteria like Gini impurity or entropy.
Disadvantages:
Higher Bias: Due to the randomness in splits, Extra Trees tend to have a higher bias
than models like Random Forests.
Sensitivity to Noisy Data: The additional randomness can lead to overfitting if the
dataset is small or contains significant noise.
Interpretability: Like other ensemble methods, the model is not easy to interpret
because it involves many decision trees.
Gradient Boosting Classifier (GBC) is a powerful machine learning algorithm that builds an
ensemble of weak learners, usually decision trees, and combines them sequentially to
minimize a loss function. It is a boosting technique where each new tree corrects the errors
made by the previous ones.
How It Works:
Gradient Descent: The model uses gradient descent to minimize the loss function by
updating the weights of misclassified or poorly predicted instances.
Additive Model: Trees are added in sequence, and each tree's contribution is
weighted. The final prediction is the weighted sum of all trees’ outputs.
Architecture:
1. Loss Function: The model optimizes a loss function, which could be binary cross-
entropy (for classification) or mean squared error (for regression).
2. Weak Learners: It typically uses shallow decision trees (stumps) as weak learners.
Each tree corrects the mistakes of the previous ones by focusing on instances with
higher residual errors.
3. Gradient Updates: After each iteration, the gradient of the loss function is calculated
to update the model parameters.
4. Shrinkage: To prevent overfitting, a learning rate is applied to the weights of the trees
to control the contribution of each tree.
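A short scikit-learn sketch of this boosting setup, showing the shallow trees and the learning-rate (shrinkage) parameter described above; the parameter values are illustrative only.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Sequentially added shallow trees, each scaled by the learning rate (shrinkage)
gbc = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3, random_state=42)
gbc.fit(X_train, y_train)
print('Validation accuracy:', accuracy_score(y_valid, gbc.predict(X_valid)))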
Advantages:
High Accuracy: GBC often outperforms other ensemble models in terms of accuracy,
especially on structured/tabular data.
Handles Imbalanced Data Well: GBC can be tailored to handle imbalanced datasets
by adjusting the loss function or class weights.
Disadvantages:
UML DIAGRAMS
UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard is
managed, and was created by, the Object Management Group. The goal is for UML to
become a common language for creating models of object-oriented computer software. In its
current form, UML comprises two major components: a meta-model and a notation. In
the future, some form of method or process may also be added to, or associated with, UML.
GOALS: The Primary goals in the design of the UML are as follows:
Provide users a ready-to-use, expressive visual modeling Language so that they can
develop and exchange meaningful models.
Provide extendibility and specialization mechanisms to extend the core concepts.
Be independent of particular programming languages and development processes.
Provide a formal basis for understanding the modeling language.
Encourage the growth of OO tools market.
Support higher level development concepts such as collaborations, frameworks,
patterns and components.
Integrate best practices.
Class Diagram
The class diagram is used to refine the use case diagram and define a detailed design of the
system. The class diagram classifies the actors defined in the use case diagram into a set of
interrelated classes. The relationship or association between the classes can be either an “is-a”
or “has-a” relationship. Each class in the class diagram may be capable of providing certain
functionalities. These functionalities provided by the class are termed “methods” of the class.
Apart from this, each class may have certain “attributes” that uniquely identify the class.
The purpose of a use case diagram is to capture the dynamic aspects of a system.
DATA FLOW DIAGRAM
A Data Flow Diagram (DFD) is a visual representation of the flow of data within a system or
process. It is a structured technique that focuses on how data moves through different
processes and data stores within an organization or a system. DFDs are commonly used in
system analysis and design to understand, document, and communicate data flow and
processing
SEQUENCE DIAGRAM
The sequence diagram is another important UML diagram used to describe the dynamic aspects
of the system; it shows how objects interact with one another over time through an ordered
sequence of messages.
Component Diagram
In UML (Unified Modeling Language), system architecture refers to the high-level structure
of a software system, capturing the organization of its components, their relationships, and
interactions. It provides a blueprint that describes how different parts of the system fit
together to fulfill the requirements and objectives. System architecture in UML is often
represented through diagrams like component diagrams, deployment diagrams, and class
diagrams, which visually depict the system's structural and behavioral aspects. This helps in
understanding, designing, and communicating the system's framework effectively to
stakeholders.
CHAPTER 6
SOFTWARE ENVIRONMENT
What is Python?
Python is a high-level, general-purpose, interpreted programming language known for its
simple and readable syntax. Its biggest strength is its huge collection of standard and
third-party libraries, which can be used for the following:
Machine Learning
GUI Applications (like Kivy, Tkinter, PyQt, etc.)
Web frameworks like Django (used by YouTube, Instagram, Dropbox)
Image processing (like Opencv, Pillow)
Web scraping (like Scrapy, BeautifulSoup, Selenium)
Test frameworks
Multimedia
Advantages of Python
1. Extensive Libraries
Python ships with an extensive library containing code for various purposes like
regular expressions, documentation generation, unit testing, web browsers, threading,
databases, CGI, email, image manipulation, and more. So, we don't have to write the
complete code for that manually.
2. Extensible
As we have seen earlier, Python can be extended to other languages. You can write some of
your code in languages like C++ or C. This comes in handy, especially in projects.
3. Embeddable
Complimentary to extensibility, Python is embeddable as well. You can put your Python code
in your source code of a different language, like C++. This lets us add scripting capabilities to
our code in the other language.
4. Improved Productivity
The language's simplicity and extensive libraries make programmers more productive than
languages like Java and C++ do: you need to write less code to get more done.
5. IoT Opportunities
Since Python forms the basis of new platforms like the Raspberry Pi, its future looks bright
for the Internet of Things. It is a way to connect the language with the real world.
6. Simple and Easy to Learn
When working with Java, you may have to create a class just to print 'Hello World'. In
Python, a single print statement will do. Python is also quite easy to learn, understand, and
code. This is why, when people pick up Python, they sometimes have a hard time adjusting
to more verbose languages like Java.
7. Readable
Because it is not such a verbose language, reading Python is much like reading English. This
is the reason why it is so easy to learn, understand, and code. It also does not need curly
braces to define blocks, and indentation is mandatory. This further aids the readability of the
code.
8. Object-Oriented
This language supports both the procedural and object-oriented programming paradigms.
While functions help us with code reusability, classes and objects let us model the real world.
A class allows the encapsulation of data and functions into one.
9. Free and Open-Source
Like we said earlier, Python is freely available. But not only can you download Python for
free, but you can also download its source code, make changes to it, and even distribute it. It
downloads with an extensive collection of libraries to help you with your tasks.
10. Portable
When you code your project in a language like C++, you may need to make some changes to
it if you want to run it on another platform. But it isn’t the same with Python. Here, you need
to code only once, and you can run it anywhere. This is called Write Once Run Anywhere
(WORA). However, you need to be careful enough not to include any system-dependent
features.
11.Interpreted
Lastly, we will say that it is an interpreted language. Since statements are executed one by
one, debugging is easier than in compiled languages.
1. Less Coding
Almost all tasks done in Python require less code than the same tasks in other languages.
Python also has awesome standard library support, so you don't have to search for
third-party libraries to get your job done. This is the reason many people suggest learning
Python to beginners.
2. Affordable
Python is free therefore individuals, small companies or big organizations can leverage the
free available resources to build applications. Python is popular and widely used so it gives
you better community support.
The 2019 GitHub annual survey showed us that Python has overtaken Java in the most
popular programming language category.
3. Python is for Everyone
Python code can run on any machine whether it is Linux, Mac or Windows. Programmers
need to learn different languages for different jobs but with Python, you can professionally
build web apps, perform data analysis and machine learning, automate things, do web
scraping and also build games and powerful visualizations. It is an all-rounder programming
language.
Disadvantages of Python
So far, we’ve seen why Python is a great choice for your project. But if you choose it, you
should be aware of its consequences as well. Let’s now see the downsides of choosing Python
over another language.
1. Slow Speed
We have seen that Python code is executed line by line. Because Python is interpreted, this
often results in slow execution. This, however, isn't a problem unless speed is a focal point
for the project. In other words, unless high speed is a requirement, the benefits offered by
Python are enough to outweigh its speed limitations.
2. Weak on the Client Side
While it serves as an excellent server-side language, Python is rarely seen on the client side.
Beyond that, it is rarely used to implement smartphone-based applications; one such
application is called Carbonnelle. The reason it is not widely used in the browser, despite the
existence of Brython, is that it isn't that secure.
3. Design Restrictions
As you know, Python is dynamically typed. This means that you don’t need to declare the
type of variable while writing the code. It uses duck-typing. But wait, what’s that? Well, it
just means that if it looks like a duck, it must be a duck. While this is easy on the
programmers during coding, it can raise run-time errors.
5. Simple
No, we're not kidding. Python's simplicity can indeed be a problem: developers who grow
used to its concise syntax often find the verbosity of languages like Java unnecessary and
harder to adjust to.
This was all about the Advantages and Disadvantages of Python Programming Language.
NumPy
It is the fundamental package for scientific computing with Python. It contains various
features, including these important ones: a powerful N-dimensional array object,
sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code, and
useful linear algebra, Fourier transform, and random number capabilities.
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data types can be defined, which allows NumPy to
seamlessly and speedily integrate with a wide variety of databases.
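A small illustration of the array features mentioned above:
import numpy as np

# N-dimensional array with broadcasting: scale every element without an explicit loop
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
print(a * 10)

# Structured (arbitrary) dtype acting as an efficient container of generic data
records = np.array([(1, 'positive'), (2, 'negative')], dtype=[('id', 'i4'), ('label', 'U8')])
print(records['label'])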
Pandas
Pandas is an open-source library providing high-performance, easy-to-use data structures
(most notably the DataFrame) and data analysis tools for Python; it is used here for loading
and exploring the tweet datasets.
Matplotlib
For simple plotting, the pyplot module provides a MATLAB-like interface, particularly when
combined with IPython. For the power user, you have full control of line styles, font
properties, axes properties, etc., via an object-oriented interface or via a set of functions
familiar to MATLAB users.
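A minimal example of the MATLAB-like pyplot interface described above, using made-up tweet lengths:
import matplotlib.pyplot as plt

# Each pyplot call operates on the current figure, much like MATLAB
lengths = [12, 35, 48, 27, 90, 63, 54, 71]
plt.hist(lengths, bins=5)
plt.xlabel('Tweet length')
plt.ylabel('Frequency')
plt.title('Distribution of tweet lengths')
plt.show()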
Scikit-learn
Scikit-learn is a free machine learning library for Python built on NumPy and SciPy. It
provides simple and efficient tools for classification, regression, clustering, and
preprocessing, and supplies the classifiers, vectorizer, scaler, and evaluation metrics used in
this project.
How to Install Python
There have been several updates to Python over the years. The question is how to install
Python. It might be confusing for a beginner who is willing to start learning Python, but this
tutorial will solve that query. The latest or the newest version of Python is version 3.7.4, or
in other words, it is Python 3.
Note: Python version 3.7.4 cannot be used on Windows XP or earlier devices.
Before you start with the installation process of Python, you first need to know your system
requirements. You must download the Python version that matches your system type, i.e.,
your operating system and processor. The system used here is a Windows 64-bit operating
system, so the steps below are to install Python version 3.7.4 (Python 3) on a Windows 7
device. The steps on how to install Python on Windows 10, 8, and 7 are divided into four
parts to help understand better.
Step 1: Go to the official site (https://www.python.org) using Google Chrome or any other
web browser to download and install Python.
Now, check for the latest and the correct version for your operating system.
Step 4: Scroll down the page until you find the Files option.
Step 5: Here you see a different version of python along with the operating system.
To download Windows 32-bit python, you can select any one from the three options:
Windows x86 embeddable zip file, Windows x86 executable installer or Windows x86
web-based installer.
To download Windows 64-bit python, you can select any one from the three options:
Windows x86-64 embeddable zip file, Windows x86-64 executable installer or
Windows x86-64 web-based installer.
Here we will use the Windows x86-64 web-based installer. This completes the first part,
choosing which version of Python to download. Now we move ahead with the second part,
i.e., installation.
Note: To know the changes or updates that are made in the version you can click on the
Release Note Option.
Installation of Python
Step 1: Go to Download and Open the downloaded python version to carry out the
installation process.
Step 2: Before you click on Install Now, Make sure to put a tick on Add Python 3.7 to PATH.
Step 3: Click on Install Now. After the installation is successful, click on Close.
With these above three steps on python installation, you have successfully and correctly
installed Python. Now is the time to verify the installation.
Step 4: Let us test whether Python is correctly installed. Type python -V and press Enter.
Step 3: Click on IDLE (Python 3.7 64-bit) and launch the program
Step 4: To go ahead with working in IDLE you must first save the file. Click on File > Click
on Save
Step 5: Name the file and save as type should be Python files. Click on SAVE. Here I have
named the files as Hey World.
Step 6: Now for e.g. enter print (“Hey World”) and Press Enter.
You will see that the command given is launched. With this, we end our tutorial on how to
install Python. You have learned how to download python for windows into your respective
operating system.
Note: Unlike Java, Python does not need semicolons at the end of its statements.
CHAPTER 7
SYSTEM REQUIREMENTS
Software Requirements
The functional requirements or the overall description documents include the product
perspective and features, operating system and operating environment, graphics requirements,
design constraints and user documentation.
The appropriation of requirements and implementation constraints gives the general overview
of the project in regard to what the areas of strength and deficit are and how to tackle them.
Minimum hardware requirements are very dependent on the particular software being
developed by a given Enthought Python / Canopy / VS Code user. Applications that need to
store large arrays/objects in memory will require more RAM, whereas applications that need
to perform numerous calculations or tasks more quickly will require a faster processor.
CHAPTER 8
FUNCTIONAL REQUIREMENTS
OUTPUT DESIGN
Outputs from computer systems are required primarily to communicate the results of
processing to users. They are also used to provide a permanent copy of the results for later
consultation. The various types of outputs in general are:
Output Definition
Input Design
Input design is a part of overall system design. The main objective during the input design is
as given below:
Input Stages
Data recording
Data transcription
Data conversion
Data verification
Data control
Data transmission
Data validation
Data correction
Input Types
It is necessary to determine the various types of inputs. Inputs can be categorized as follows:
Input Media
At this stage choice has to be made about the input media. To conclude about the input media
consideration has to be given to;
Type of input
Flexibility of format
Speed
Accuracy
Verification methods
Rejection rates
Ease of correction
Storage and handling requirements
Security
Easy to use
Portability
Keeping in view the above description of the input types and input media, it can be said that
most of the inputs are internal and interactive. As input data is keyed in directly by the user,
the keyboard can be considered the most suitable input device.
Error Avoidance
At this stage, care is taken to ensure that input data remains accurate from the stage at which
it is recorded up to the stage at which it is accepted by the system. This can be achieved only
by careful control each time the data is handled.
Error Detection
Even though every effort is made to avoid the occurrence of errors, a small proportion of
errors is still likely to occur. These errors can be discovered by using validations to check
the input data.
Data Validation
Procedures are designed to detect errors in data at a lower level of detail. Data validations
have been included in the system in almost every area where there is a possibility for the user
to commit errors. The system will not accept invalid data. Whenever invalid data is keyed
in, the system immediately prompts the user, who must key in the data again; the system
accepts the data only if it is correct. Validations have been included where necessary.
The system is designed to be user-friendly; in other words, the system has been designed to
communicate effectively with the user. The system has been designed with
popup menus.
It is essential to consult the system users and discuss their needs while designing the user
interface:
User-initiated interfaces: the user is in charge, controlling the progress of the
user/computer dialogue. In the computer-initiated interface, the computer selects the
next stage in the interaction.
Computer initiated interfaces
In computer-initiated interfaces, the computer guides the progress of the user/computer
dialogue. Information is displayed and, based on the user's response, the computer takes
action or displays further information.
Command driven interfaces: In this type of interface the user inputs commands or
queries which are interpreted by the computer.
Forms oriented interface: The user calls up an image of the form to his/her screen and
fills in the form. The forms-oriented interface is chosen for this system as it best suits its data-entry tasks.
Computer-Initiated Interfaces
Right from the start the system is going to be menu driven, the opening menu displays the
available options. Choosing one option gives another popup menu with more options. In this
way every option leads the users to data entry form where the user can key in the data.
The design of error messages is an important part of the user interface design. As the user is
bound to commit some error or other while using the system, the system should be designed
to be helpful, providing the user with information regarding the error he/she has committed.
This application must be able to produce output at different modules for different inputs.
Performance Requirements
The requirement specification for any system can be broadly stated as given below:
SOURCE CODE
import numpy as np
import pandas as pd
import re, os, joblib
import matplotlib.pyplot as plt
import seaborn as sns
import wordcloud
import warnings
warnings.filterwarnings('ignore')
import nltk
nltk.download('wordnet'); nltk.download('stopwords')
# Imports needed by the later cells (missing from the original listing)
from nltk.tokenize import TweetTokenizer
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix
# In[2]:
df=pd.read_csv('train_E6oV3lV.csv')
df
# In[3]:
df.head()
# In[4]:
random_sample=df.sample(n=100, random_state=1)
random_sample=random_sample.drop('label',axis=1)
random_sample_test_path="test.csv"
random_sample.to_csv(random_sample_test_path,index=False)
# In[5]:
df.isnull().sum()
# In[6]:
df.shape
# In[7]:
df.columns
# In[8]:
df.nunique()
# In[9]:
df.describe()
# In[10]:
df.info()
# In[11]:
tk = TweetTokenizer()
ps = PorterStemmer()
lem = WordNetLemmatizer()
def cleaning(s):
    # Normalize a raw tweet: lowercase, drop digits, stray symbols and extra whitespace, then re-tokenize
    s = str(s)
    s = s.lower()
    s = re.sub(r'\s\W', ' ', s)
    s = re.sub(r'\W,\s', ' ', s)
    s = re.sub(r"\d+", "", s)
    s = re.sub(r'\s+', ' ', s)
    s = re.sub('[!@#$_ðâJó¾ãº½çæåä¹³ìà¹ëêµéà³à²ùø]', '', s)  # strip punctuation and stray mis-encoded characters
    s = s.replace("co", "")
    s = s.replace("https", "")
    s = s.replace(",", "")
    s = s.replace("[\w*", " ")
    s = tk.tokenize(s)
    s = ' '.join(s)
    return s
# In[12]:
# The cell that builds the cleaned 'content' column is not shown in this listing;
# it presumably applies cleaning() to the raw tweet text, e.g.:
# df['content'] = df['tweet'].apply(cleaning)
df['content']
# all_words is built in an omitted cell (typically by joining all cleaned tweet text)
all_words
# In[14]:
# Count plot of the label column; category_order came from an omitted cell, and the
# bar-annotation call below is a reconstruction of the truncated original
plt.figure(figsize=(10, 6))
ax = sns.countplot(data=df, x='label')
plt.xlabel('label')
plt.ylabel('Count')
plt.title('Count of label')
for p in ax.patches:
    ax.annotate(f'{int(p.get_height())}',
                (p.get_x() + p.get_width() / 2, p.get_height()),
                ha='center', va='bottom', xytext=(0, 3),
                textcoords='offset points')
plt.show()
# In[15]:
normal_words
# In[16]:
def hashtag_extract(x):
    # Collect hashtags (terms following '#') from every tweet in x
    hashtags = []
    for i in x:
        ht = re.findall(r'#(\w+)', i)
        hashtags.append(ht)
    return hashtags
# In[17]:
# The cells defining these lists are omitted; they presumably extract hashtags per class, e.g.:
# HT_regular = hashtag_extract(df['content'][df['label'] == 0])
# HT_negative = hashtag_extract(df['content'][df['label'] == 1])
HT_regular
# In[18]:
HT_negative
# In[19]:
HT_regular = sum(HT_regular,[])
HT_regular
# In[20]:
HT_negative = sum(HT_negative,[])
HT_negative
# In[21]:
a = nltk.FreqDist(HT_regular)
# In[22]:
# 'd' is built from the frequency distribution in an omitted step, e.g.:
d = pd.DataFrame({'Hashtag': list(a.keys()), 'Count': list(a.values())})
d = d.nlargest(columns="Count", n=10)
plt.figure(figsize=(16, 5))
ax = sns.barplot(data=d, x="Hashtag", y="Count")  # reconstructed plotting call
ax.set(ylabel='Count')
plt.show()
# In[23]:
b = nltk.FreqDist(HT_negative)
# In[24]:
# Same reconstruction as above for the negative hashtags
e = pd.DataFrame({'Hashtag': list(b.keys()), 'Count': list(b.values())}).nlargest(columns="Count", n=10)
plt.figure(figsize=(16, 5))
ax = sns.barplot(data=e, x="Hashtag", y="Count")
ax.set(ylabel='Count')
plt.show()
# In[25]:
vectorizer = CountVectorizer(max_features=3000, stop_words=stopwords.words('english')).fit(df['content'])
vectorizer
# In[26]:
X=vectorizer.transform(df['content']).toarray()
# In[27]:
y=df.iloc[:,1]
# In[28]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=42)
# In[29]:
X_train.shape
# In[30]:
y_train.shape
# In[31]:
labels=['Negative', 'Positive']
# In[32]:
precision = []
recall = []
fscore = []
accuracy = []
# In[33]:
def performance_evaluation(testY, predict):
    # Reconstructed evaluation helper: the original cell was truncated in this listing
    testY = testY.astype('int')
    predict = predict.astype('int')
    a = accuracy_score(testY, predict) * 100
    p = precision_score(testY, predict, average='macro') * 100
    r = recall_score(testY, predict, average='macro') * 100
    f = f1_score(testY, predict, average='macro') * 100
    accuracy.append(a); precision.append(p); recall.append(r); fscore.append(f)
    report = classification_report(predict, testY, target_names=labels)
    print(report)
    # Confusion matrix heatmap
    ax = sns.heatmap(confusion_matrix(testY, predict), annot=True, fmt='d',
                     xticklabels=labels, yticklabels=labels)
    ax.set_ylim([0, len(labels)])
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.show()
# In[34]:
# ExtraTreesClassifier: load a saved model if available, otherwise train and persist it
if os.path.exists('ExtraTreesClassifier.pkl'):
    ETC = joblib.load('ExtraTreesClassifier.pkl')
    predict = ETC.predict(X_test)
else:
    ETC = ExtraTreesClassifier()
    ETC.fit(X_train, y_train)
    joblib.dump(ETC, 'ExtraTreesClassifier.pkl')
    predict = ETC.predict(X_test)
performance_evaluation(y_test, predict)  # evaluation call reconstructed; not shown in the original listing

# GradientBoostingClassifier: same load-or-train pattern
if os.path.exists('GradientBoostingClassifier.pkl'):
    GBC = joblib.load('GradientBoostingClassifier.pkl')
    predict = GBC.predict(X_test)
else:
    GBC = GradientBoostingClassifier()
    GBC.fit(X_train, y_train)
    joblib.dump(GBC, 'GradientBoostingClassifier.pkl')
    predict = GBC.predict(X_test)
performance_evaluation(y_test, predict)  # evaluation call reconstructed; not shown in the original listing
# In[36]:
test=pd.read_csv('test.csv')
test
# In[37]:
tk = TweetTokenizer()
ps = PorterStemmer()
lem = WordNetLemmatizer()
def cleaning(s):
    # Same cleaning routine as above, re-declared for the held-out test file
    s = str(s)
    s = s.lower()
    s = re.sub(r'\s\W', ' ', s)
    s = re.sub(r'\W,\s', ' ', s)
    s = re.sub(r"\d+", "", s)
    s = re.sub(r'\s+', ' ', s)
    s = re.sub('[!@#$_ðâJó¾ãº½çæåä¹³ìà¹ëêµéà³à²ùø]', '', s)
    s = s.replace("co", "")
    s = s.replace("https", "")
    s = s.replace(",", "")
    s = s.replace("[\w*", " ")
    s = tk.tokenize(s)
    s = ' '.join(s)
    return s
# In[38]:
# As with the training data, the cleaned column is presumably created in an omitted step, e.g.:
# test['content'] = test['tweet'].apply(cleaning)
test['content']
# In[43]:
# Note: a fresh StandardScaler is fitted on the unseen data here; ideally the scaler fitted
# on the training features would be reused with transform() only.
tran = StandardScaler()
X_test = vectorizer.transform(test['content']).toarray()
X_test = tran.fit_transform(X_test)
# In[45]:
test['Predict_label'] = GBC.predict(X_test)
test['Predict_label']
test
CHAPTER 10
RESULTS AND DISCUSSION
Data Loading and Exploration: Loaded training and test datasets using Pandas. Checked
the shape of the datasets and displayed the first few rows. Checked for missing values in both
datasets.
Exploratory Data Analysis (EDA): Explored negative and positive comments in the training
set. Visualized the distribution of tweet lengths for both training and test sets. Created a bar
plot to show the distribution of sentiment labels. Analyzed the variation in tweet length with
respect to sentiment labels.
Text Tokenization and Labeling: Tokenized words in the training set using NLTK. Utilized
Gensim to create a Word2Vec model. Labeled each tweet for further processing.
Model Building: Applied various machine learning models including Random Forest,
Logistic Regression, Decision Tree, Support Vector Machine (SVM), and XGBoost.
Standardized the data using StandardScaler.
Model Evaluation:
Evaluated models on training and validation sets using accuracy, F1 score, and confusion
matrices. Provided results for each model, including training accuracy, validation accuracy,
F1 score, and confusion matrix.
The provided dataset appears to be related to Twitter data, containing information such as
tweet IDs, labels, and the content of the tweets. Here's a detailed description:
It's important to note that without additional context, the specific criteria for labeling tweets
as dysfunctional or the context behind the labeling are not clear. The dataset comprises a
diverse range of tweets, suggesting it may be used for sentiment analysis, classification, or
related natural language processing tasks.
Figure 1: Data frame used for Twitter data analysis. This figure likely represents the structure
and content of the data frame used for Twitter data analysis. It might include information such
as tweet text, sentiment labels, and other relevant features.
Figure 2: Count plot of the target column. This figure is a visual representation, likely in the
form of a bar chart, showing the distribution or count of the different classes in the target
column. In the context of Twitter data analysis, the target column might represent sentiment
labels such as positive, negative, or neutral.
Figure 3 shows the following evaluation metrics:
Accuracy: 92.87%. This indicates that the model correctly predicted 92.87% of the data
points in the test set. It is a general measure of overall performance.
Precision: 50.0% This metric measures the proportion of positive predictions that were
actually correct. In other words, out of all the instances the model predicted as positive, 50%
were truly positive.
Recall: 46.43% This metric measures the proportion of actual positive instances that were
correctly predicted. It indicates how well the model was able to identify all the positive cases.
F1-score: 48.15% This metric combines precision and recall into a single value. It provides a
balanced measure of both metrics, considering both the model's ability to correctly predict
positive instances and its ability to avoid false positives.
The confusion matrix for the ExtraTreesClassifier model shows the distribution of predicted
and actual classes. The diagonal elements represent correct predictions (e.g., 8905 Negative
instances were correctly predicted as Negative). Off-diagonal elements indicate incorrect
predictions (e.g., 684 Positive instances were incorrectly predicted as Negative). The color
intensity of each cell corresponds to the number of instances in that category, with darker
colors indicating larger quantities.
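To make the relationship between these confusion-matrix counts and the reported metrics explicit, the sketch below computes accuracy, precision, recall, and F1 from generic counts; the values used here are placeholders, not the project's exact figures.
# Placeholder confusion-matrix counts for the positive class
tp, tn, fp, fn = 900, 8000, 450, 200

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f'accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}')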
Accuracy: This is the overall correctness of the model. It's calculated as the number of
correct predictions divided by the total number of predictions. In this case, the accuracy is
94.69%, which means the model correctly predicted 94.69% of the samples.
Precision: This measures how many of the positive predictions made by the model were
actually correct. It's calculated as the number of true positives divided by the sum of true
positives and false positives. In this case, the precision is 65.22%, which means that out of all
the samples the model predicted as positive, only 65.22% were truly positive.
Recall: This measures how many of the actual positive samples the model correctly
identified. It's calculated as the number of true positives divided by the sum of true positives
and false negatives. In this case, the recall is 90.18%, which means that the model correctly
identified 90.18% of the positive samples.
F1-score: This is a harmonic mean of precision and recall. It provides a balance between
precision and recall. A higher F1-score indicates better overall performance. In this case, the
F1-score is 71.27%, which is a good balance between precision and recall.
Classification report: This table provides a more detailed breakdown of the model's
performance for each class (negative and positive). It includes precision, recall, F1-score, and
support for each class.
Overall, the model achieved high accuracy (94.69%) and high recall (90.18%), but its
precision (65.22%) was comparatively limited. The F1-score of 71.27% indicates a
reasonable balance between precision and recall, and the classification report provides
further insight into the model's performance for each class.
Numerical Explanation:
TP: 8869, TN: 36, FP: 473, FN: 211
11.1 Conclusion
With the advancement of web technology and its growth, there is a huge volume of data
present on the web for internet users and a lot of data is generated too. The Internet has
become a platform for online learning, exchanging ideas and sharing opinions. Social
networking sites like Twitter, Facebook, Google+ are rapidly gaining popularity as they allow
people to share and express their views about topics, have discussions with different
communities, or post messages across the world. Therefore, this project implemented
sentiment analysis of a Twitter dataset for opinion mining using NLP, AI, and lexicon-based
approaches, together with evaluation metrics. Using machine learning algorithms such as
Naive Bayes and logistic regression, this work carried out research on Twitter data streams.
In addition, this project has also discussed the general challenges and applications of
sentiment analysis on Twitter.
The increasing prevalence of social media platforms, particularly Twitter, has significantly
heightened the importance of real-time data analysis. Twitter data has emerged as a crucial
source for gaining valuable insights into public opinions, sentiments, and ongoing trends.
This surge in significance has been particularly notable in recent years with the exponential
growth of social media platforms. Researchers and businesses alike have come to
acknowledge the vast potential of Twitter data, leveraging it for understanding public
sentiment, predicting trends, and conducting insightful market research. As the utilization of
Twitter data became more widespread, various methods and algorithms were developed to
efficiently process and classify this unique form of social media content. Traditional systems
often relied on basic text processing techniques, including tokenization, stemming, and stop-
word removal. While these techniques have proven useful, they may not suffice for handling
the distinctive characteristics of Twitter data, such as hashtags, mentions, and emoticons. The
unstructured and noisy nature of Twitter data further complicates matters, presenting
challenges for effective analysis. Consequently, there has been a growing recognition of the
need for a comprehensive pre-processing approach to address these challenges.
Businesses, researchers, and organizations have increasingly come to rely on Twitter data for
tasks such as sentiment analysis, brand monitoring, and trend prediction. To extract
meaningful insights from this rich source of information, effective pre-processing is essential.
This involves the removal of irrelevant information and noise while preserving the context
and nuances inherent in social media language. Therefore, this research advocates for an
efficient classification of Twitter data through the application of machine learning algorithms,
supported by a comprehensive pre-processing approach.
REFERENCES
[1] Neogi, A. S., Garg, K. A., Mishra, R. K., & Dwivedi, Y. K. (2021). Sentiment
analysis and classification of Indian farmers’ protest using twitter data.
International Journal of Information Management Data Insights, 1(2),
100019. https://doi.org/10.1016/j.jjimei.2021.100019.
[2] Behl, S., Rao, A., Aggarwal, S., Chadha, S., & Pannu, H. (2021). Twitter for
disaster relief through sentiment analysis for COVID-19 and natural hazard
crises. International Journal of Disaster Risk Reduction, 55, 102101.
https://doi.org/10.1016/j.ijdrr.2021.102101.
[3] Tan, K. L., Lee, C. P., Lim, K. M., & Anbananthen, K. S. M. (2022). Sentiment
Analysis With Ensemble Hybrid Deep Learning Model. IEEE Access, 10,
103694–103704. https://doi.org/10.1109/access.2022.3210182.
[4] Lu, Q., Zhu, Z., Zhang, D., Wu, W., & Guo, Q. (2020). Interactive Rule
Attention Network for Aspect-Level Sentiment Analysis. IEEE Access, 8, 52505–52516.
https://doi.org/10.1109/ACCESS.2020.2981139.
[5] Mehta, K & Panda, S. (2019). A Comparative Analysis Of Sentiment analysis
In Big Data. International Journal of Computer Science and Information
Security, 17, 31-40.
[6] He, J., Wumaier, A., Kadeer, Z., Sun, W., Xin, X., & Zheng, L. (2022). A Local
and Global Context Focus Multilingual Learning Model for Aspect-Based
Sentiment Analysis. IEEE Access, 10, 84135–84146.
https://doi.org/10.1109/access.2022.3197218.
[7] E. Psomakelis, K. Tserpes, D. Anagnostopoulos, and T. Varvarigou, “Comparing
methods for twitter sentiment analysis,” KDIR 2014 -Proceedings of the Int. Conf.
on Knowledge Discovery and Information Retrieval, pp. 225-232, 2014.
[8] Qurat Tul Ain, Mubashir Ali, Amna Riaz, Amna Noureen, Muhammad Kamran,
Babar Hayat, and A. Rehman, "Sentiment Analysis Using Deep Learning
Techniques: A Review," International Journal of Advanced Computer Science
and Applications, Vol. 8, No. 6, 2017.
[9] A. Lopez-Chau, D. Valle-Cruz, and R. Sandoval-Almazán, "Sentiment
Analysis of Twitter Data Through Machine Learning Techniques," Software
Engineering in the Era of Cloud Computing, pp. 185–209, 2020. Publisher:
Springer, Cham.
[10] P. Kalaivani and D. Dinesh, "Machine Learning Approach to
Analyze Classification Result for Twitter Sentiment," in 2020 International
Conference on Smart Electronics and Communication (ICOSEC), (Trichy, India),
pp. 107–112, IEEE, Sept. 2020.
[11] A. B. S, R. D. B, R. K. M, and N. M, "Real Time Twitter Sentiment
Analysis using Natural Language Processing," International Journal of
Engineering Research & Technology, vol. 9, July 2020; J. Ranganathan and
A. Tzacheva, "Emotion Mining in Social Media Data," Procedia Computer
Science, vol. 159, pp. 58–66, Jan. 2019; S. Xiong, H. Lv, W. Zhao, and D. Ji,
"Towards Twitter sentiment classification by multi-level sentiment-enriched
word embeddings," Neurocomputing, vol. 275, pp. 2459–2466, Jan. 2018.
[12] S. Aloufi and A. E. Saddik, "Sentiment Identification in Football-Specific
Tweets," in IEEE Access, vol. 6, pp. 78609-78621, 2018, doi:
10.1109/ACCESS.2018.2885117.
[13] S. A. El Rahman, F. A. AlOtaibi and W. A. AlShehri, "Sentiment Analysis of
Twitter Data," 2019 International Conference on Computer and Information Sciences
(ICCIS), 2019, pp. 1-4, doi: 10.1109/ICCISci.2019.8716464.
[14] Arora, M., Kansal, V. Character level embedding with deep convolutional
neural network for text normalization of unstructured data for Twitter sentiment
analysis. Soc. Netw. Anal. Min. 9, 12 (2019). https://doi.org/10.1007/s13278-019-
0557-y
[15] L. Wang, J. Niu and S. Yu, "SentiDiff: Combining Textual Information and
Sentiment Diffusion Patterns for Twitter Sentiment Analysis," in IEEE Transactions
on Knowledge and Data Engineering, vol. 32, no. 10, pp. 2026-2039, 1 Oct. 2020,
doi: 10.1109/TKDE.2019.2913641.