
Clean and Analyse Social Media Data

This document discusses analyzing simulated Twitter-style social media data with Python to understand user engagement trends across different categories. It generates random social media data in which each post is assigned a category, such as Food, Travel, or Music, and a random number of likes. The data is loaded into a pandas DataFrame and cleaned. Visualization and statistical techniques are then used to analyze the distribution of likes across categories and to identify the most popular ones. The conclusions are that Fitness had the most likes and Family the fewest, that Culture and Health had similar engagement levels, and that Music had the highest number of days with engagement.

Uploaded by

Varshini S
Copyright © All Rights Reserved

Clean and analyse social media data (Twitter) using Python

Introduction
Social media has become a ubiquitous part of modern life, with platforms such as Instagram,
Twitter, and Facebook serving as essential communication channels. Social media data sets are
vast and complex, making analysis a challenging task for businesses and researchers alike. In
this project, we explore a simulated social media data set (for example, tweets) to understand
trends in likes across different categories.

Project Scope
The objective of this project is to analyze tweets (or other social media data) and gain insights
into user engagement. We will explore the data set using visualization techniques to understand
the distribution of likes across different categories. Finally, we will analyze the data to draw
conclusions about the most popular categories and the overall engagement on the platform.

Importing required libraries


Importing the following required libraries:
• pandas for creating the DataFrame
• numpy for generating random numbers within a range
• matplotlib.pyplot for displaying graphs
• seaborn for plotting the data
• random for picking a random item from a list
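A minimal sketch of these imports follows. The aliases `pd`, `np`, `plt`, and `sns` are the usual community conventions. The original does not specify them.

```python
# Standard library: random.choice picks one category from a list
import random

# Third-party libraries used throughout the project
import matplotlib.pyplot as plt  # displaying graphs
import numpy as np               # random numbers within a range
import pandas as pd              # building and cleaning the DataFrame
import seaborn as sns            # statistical plots such as box plots
```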

Generating random social media data


Defining a list of categories for the social media experiment, namely: Food, Travel, Fashion, Fitness,
Music, Culture, Family, Movies and Health.

Generating a Python dictionary with the fields Date, Category, and Likes (number of likes), all filled
with random data.
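A minimal sketch of this generation step follows. The original does not state the number of posts (500), the start date, the like range (0 to 9,999), or the random seeds, so these are assumptions. Dates are deliberately stored as strings so that the later cleaning step has something to convert.

```python
import random

import numpy as np
import pandas as pd

# The nine categories named in the text
categories = ["Food", "Travel", "Fashion", "Fitness", "Music",
              "Culture", "Family", "Movies", "Health"]

n = 500             # assumed number of simulated posts
random.seed(42)     # fixed seeds (assumed) so the run is reproducible
np.random.seed(42)

data = {
    # One date per post from an assumed start date, kept as "YYYY-MM-DD"
    # strings so they can be converted to datetimes during cleaning
    "Date": list(pd.date_range("2021-01-01", periods=n).strftime("%Y-%m-%d")),
    # A randomly chosen category for each post
    "Category": [random.choice(categories) for _ in range(n)],
    # A random like count in the assumed range 0..9999
    "Likes": np.random.randint(0, 10000, size=n),
}

# Peek at the first three values of each field
print({key: values[:3] for key, values in data.items()})
```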

Loading the data into a pandas DataFrame and exploring the data
Loading the randomly generated data into a pandas DataFrame and printing it.
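The sketch below rebuilds the same kind of simulated data so that it runs on its own. The 500 posts and seed 42 are assumptions. It then loads the data into a DataFrame and prints a few standard pandas summaries.

```python
import random

import numpy as np
import pandas as pd

# Rebuild the simulated data described above (assumed: 500 posts, seed 42)
random.seed(42)
np.random.seed(42)
categories = ["Food", "Travel", "Fashion", "Fitness", "Music",
              "Culture", "Family", "Movies", "Health"]
n = 500
data = {
    "Date": list(pd.date_range("2021-01-01", periods=n).strftime("%Y-%m-%d")),
    "Category": [random.choice(categories) for _ in range(n)],
    "Likes": np.random.randint(0, 10000, size=n),
}

# Load the dictionary into a DataFrame: one row per post
df = pd.DataFrame(data)

print(df.head())                       # first five rows
df.info()                              # column dtypes and non-null counts
print(df.describe())                   # summary statistics of the Likes column
print(df["Category"].value_counts())   # number of posts per category
```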
Cleaning the data
Removing all rows with null values using the DataFrame dropna() method.

Next, removing all duplicate rows using the drop_duplicates() method.

Converting the Date column from string to datetime format.
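The cleaning steps could look like this. The sketch assumes the string dates and simulated data described above, and its data-rebuild settings are assumptions. Freshly generated random data has no nulls, and with one date per row it has no duplicate rows. Here `dropna()` and `drop_duplicates()` therefore act as safeguards and do not change the row count.

```python
import random

import numpy as np
import pandas as pd

# Rebuild the simulated data (assumed: 500 posts, seed 42, string dates)
random.seed(42)
np.random.seed(42)
categories = ["Food", "Travel", "Fashion", "Fitness", "Music",
              "Culture", "Family", "Movies", "Health"]
n = 500
df = pd.DataFrame({
    "Date": list(pd.date_range("2021-01-01", periods=n).strftime("%Y-%m-%d")),
    "Category": [random.choice(categories) for _ in range(n)],
    "Likes": np.random.randint(0, 10000, size=n),
})

df = (
    df.dropna()               # drop any row that contains a missing value
      .drop_duplicates()      # drop exact duplicate rows
      .reset_index(drop=True) # renumber the remaining rows 0..N-1
)

# Parse the "YYYY-MM-DD" strings into datetime64 values
df["Date"] = pd.to_datetime(df["Date"])
# Store likes as integers
df["Likes"] = df["Likes"].astype(int)

df.info()
```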

Visualizing and analysing the data


Analysing the data using statistical functions such as min, max, sum, and mean.
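One way to compute these statistics per category is `groupby` followed by `agg`. The data rebuild at the top keeps the example self-contained, and its settings are assumptions: 500 posts, seed 42, and dates already parsed as they would be after cleaning.

```python
import random

import numpy as np
import pandas as pd

# Rebuild the simulated data (assumed: 500 posts, seed 42, dates already parsed)
random.seed(42)
np.random.seed(42)
categories = ["Food", "Travel", "Fashion", "Fitness", "Music",
              "Culture", "Family", "Movies", "Health"]
n = 500
df = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=n),
    "Category": [random.choice(categories) for _ in range(n)],
    "Likes": np.random.randint(0, 10000, size=n),
})

# Per-category min, max, sum, and mean of likes
stats = df.groupby("Category")["Likes"].agg(["min", "max", "sum", "mean"])
print(stats.sort_values("sum", ascending=False))

# Overall figures for the whole platform
print("Total likes:", df["Likes"].sum())
print("Mean likes per post:", round(df["Likes"].mean(), 1))
```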

Visualizing the data using a pie chart, a bar graph, and a box plot.
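The three chart types could be drawn side by side roughly as follows. The figure layout, the titles, the headless `Agg` backend, and the output file name `likes_overview.png` are illustrative choices and do not come from the original. To view the charts interactively, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving.

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend (assumed); remove it to display windows with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Rebuild the simulated data (assumed: 500 posts, seed 42)
random.seed(42)
np.random.seed(42)
categories = ["Food", "Travel", "Fashion", "Fitness", "Music",
              "Culture", "Family", "Movies", "Health"]
n = 500
df = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=n),
    "Category": [random.choice(categories) for _ in range(n)],
    "Likes": np.random.randint(0, 10000, size=n),
})

likes_by_cat = df.groupby("Category")["Likes"].sum()

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Pie chart: each category's share of all likes
axes[0].pie(likes_by_cat, labels=likes_by_cat.index, autopct="%1.1f%%")
axes[0].set_title("Share of total likes")

# Bar graph: total likes per category
axes[1].bar(likes_by_cat.index, likes_by_cat.values)
axes[1].set_title("Total likes per category")
axes[1].tick_params(axis="x", labelrotation=45)

# Box plot: spread of likes per post within each category
sns.boxplot(x="Category", y="Likes", data=df, ax=axes[2])
axes[2].set_title("Distribution of likes per post")
axes[2].tick_params(axis="x", labelrotation=45)

fig.tight_layout()
fig.savefig("likes_overview.png")  # hypothetical output file name
```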

Conclusion
1. The Fitness category has the maximum number of likes.
2. The Family category has the minimum number of likes.
3. Culture and Health have almost the same percentage of likes.
4. Music has the highest number of days with engagement.
5. The average number of likes is highest in the Fitness category and lowest in the Movies
category.
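The figures behind these conclusions can be computed with a few `groupby` aggregations. The sketch below uses the same assumed simulation settings (500 posts, seed 42). The likes are random, so a different seed or sample size will generally produce a different ranking. This code is therefore not expected to reproduce exactly the categories listed above.

```python
import random

import numpy as np
import pandas as pd

# Rebuild the simulated data (assumed: 500 posts, seed 42, one post per day)
random.seed(42)
np.random.seed(42)
categories = ["Food", "Travel", "Fashion", "Fitness", "Music",
              "Culture", "Family", "Movies", "Health"]
n = 500
df = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=n),
    "Category": [random.choice(categories) for _ in range(n)],
    "Likes": np.random.randint(0, 10000, size=n),
})

by_cat = df.groupby("Category")
total = by_cat["Likes"].sum()            # total likes per category
share = total / total.sum() * 100        # percentage of all likes
days = by_cat["Date"].nunique()          # distinct days with a post in the category
avg = by_cat["Likes"].mean()             # average likes per post

print("Most likes:", total.idxmax())
print("Fewest likes:", total.idxmin())
print("Share of likes (%):")
print(share.round(1).sort_values(ascending=False))
print("Most days with engagement:", days.idxmax())
print("Highest average likes:", avg.idxmax())
print("Lowest average likes:", avg.idxmin())
```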
