Big Data Analytics A Spotify Case Study
https://doi.org/10.22214/ijraset.2021.38702
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.429
Volume 9 Issue X Oct 2021- Available at www.ijraset.com
Abstract: Big data has opened up many possibilities for building customer loyalty and growing commercial business. It helps firms
develop products that are in line with consumer needs, anticipate their profitability, and engage customers proactively with offers
streamlined across all customer touch points. This paper discusses how firms can use big data to determine the best, most efficient
ways to engage and interact with their customers. It also provides an insight into how Spotify intends to give music lovers
additional ways to find their favourite songs, interact with artists, and receive better recommendations.
Keywords: Big Data, Data Analytics, Customer Satisfaction, Exploratory Data Analysis
I. INTRODUCTION
Big data is becoming increasingly important in global firms' new product development (NPD) endeavours. Firms rely on valuable
knowledge obtained from big data to stay competitive in today's fast-changing business climate (Barton and Court, 2012) [1];
(Salehan and Kim, 2015) [2]. As a result, companies are turning to big data to a) better understand their customers, b) produce
better products, and c) provide more personalised services. One of the most significant potential benefits of collecting big data,
according to Davenport (2012) [3], is its application in the development of new products and services. However, only a few studies
have looked into how firms may improve their service using big data. This article explains how firms may use big data to reduce
time to market, increase user adoption, and cut costs while developing new products.
Big data enables companies to have a better understanding of their products, customers, and markets, which is essential for
consumer loyalty. Businesses' main challenge is figuring out how to use big data to improve customer intelligence. Almost
everyone, including marketers, corporate managers, researchers, and policymakers, has faced the same question as a result of the
"big data" phenomenon: how can big data be used to benefit marketing, management, and policy-making? While 63 percent of
organisations see big data analytics as a competitive advantage, 80 percent of marketers say they don't know how to turn data into
action, and 95 percent of data within organisations remains unused, according to several academic and industry reports
(Kiron et al., 2011) [4]; (Rogers and Sexton, 2012) [5]; (Monetate, 2014a,b) [6]. Even more baffling, according to one survey
(Allen et al., 2005) [7], while 80% of CEOs believe they provide exceptional customer service, just 8% of customers concur. To
demonstrate how these principles can be used to get additional benefit from big data, we use a case study of Spotify, a commercial
music streaming service. The implications for practitioners and academia are examined, and conclusions are offered at the end.
However, since the id column won’t be that useful for our analysis, we can drop it.
Here is a brief glance at the refined dataset being analysed –
a) We first perform some basic data cleaning methods such as removing any duplicate entries or empty rows before we start our
main analysis. Now, as part of our main data exploration and analysis, we look at the distribution of these tracks by popularity
in Fig. 1:
b) From the graph in Fig. 1, it is evident that more than 45k songs sit in a popularity graveyard, and most of the remaining songs
are distributed between approximately 1 and 40 points of popularity. This indicates that the music market is highly competitive
in this range. We now analyse past trends to see whether it was always this competitive and whether this trend has been
increasing over the years.
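The cleaning steps and the popularity distribution described above could be sketched as follows. This is a minimal plain-Python illustration rather than the code used for the analysis: the field names (id, name, popularity), the sample rows, the 10-point bin width, and the choice to treat any row with a missing value as empty are all assumptions.

```python
from collections import Counter

# Hypothetical sample rows; the field names are assumptions, not the paper's schema.
tracks = [
    {"id": "a1", "name": "Song A", "popularity": 0},
    {"id": "a1", "name": "Song A", "popularity": 0},   # exact duplicate
    {"id": "b2", "name": "Song B", "popularity": 35},
    {"id": "c3", "name": None, "popularity": None},    # empty row
    {"id": "d4", "name": "Song D", "popularity": 82},
]


def clean(rows):
    """Remove duplicate rows and rows with missing values, then drop the id field."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate of an earlier row
        seen.add(key)
        record = {k: v for k, v in row.items() if k != "id"}
        if any(v is None for v in record.values()):
            continue  # assumption: a row with any missing value counts as empty
        cleaned.append(record)
    return cleaned


def popularity_histogram(rows, bin_width=10):
    """Count tracks per popularity bin, keyed by the lower edge of the bin."""
    return Counter((row["popularity"] // bin_width) * bin_width for row in rows)


cleaned = clean(tracks)
histogram = popularity_histogram(cleaned)
for lower_edge in sorted(histogram):
    print(lower_edge, histogram[lower_edge])
```

Duplicates are detected on the full row before the id field is dropped, so two different tracks that happen to share a name and a popularity score are not merged.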
Based on the graph in Fig. 2, the market seems to be getting more and more competitive year after year, with nearly 140k songs
produced in 2020. This could be attributed to the fact that in 2020 many people had a lot of free time at home, which led to a boom
in music production.
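A per-year count of this kind could be computed as in the sketch below. It assumes each track carries a release date string that starts with a four-digit year; the sample dates and the function name tracks_per_year are hypothetical.

```python
from collections import Counter

# Hypothetical release dates; real data may mix full dates and bare years.
release_dates = ["2019-11-01", "2020-03-15", "2020", "2020-07", "1985-06-21"]


def tracks_per_year(dates):
    """Count tracks per release year, taking the first four characters as the year."""
    return Counter(int(date[:4]) for date in dates)


counts = tracks_per_year(release_dates)
for year in sorted(counts):
    print(year, counts[year])
```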
If we explore the overall metrics in our dataset, we get the following observation:
From the table in Fig. 3, it is evident that there is a big leap between the 90th and 99th percentiles of the popularity variable,
compared to the previous ones. So, it seems that a few great hits are close to scoring 100 in popularity. This means that there is a
select group of tracks that are highly popular on Spotify.
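The percentile comparison can be reproduced with the standard library, as in the following sketch. The popularity scores are made-up values chosen only to show a long right tail, and the percentile levels and the "inclusive" interpolation method are assumptions.

```python
import statistics

# Hypothetical popularity scores with a few very popular outliers.
popularity = [0, 0, 1, 3, 5, 8, 12, 15, 18, 20, 22, 25, 27, 30, 33, 36, 40, 45, 62, 97]


def percentile_table(values, levels=(50, 75, 90, 99)):
    """Return the requested percentiles of values as a {level: value} mapping."""
    # quantiles(n=100) returns the 99 cut points between the 1st and 99th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {level: cuts[level - 1] for level in levels}


for level, value in percentile_table(popularity).items():
    print(f"{level}th percentile: {value:.1f}")
```

A large gap between the last two rows, relative to the gaps between earlier rows, is the pattern the text describes.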
We can therefore ask whether it is possible to get there by putting the right chords and rhythm into a song. We plot a correlation
chart (heatmap) to find out:
Based on the heatmap in Fig. 4, we can see that there are no significant correlations between popularity and the track's attributes.
Still, it is worth taking a closer look at the three attributes that showed a positive correlation (danceability, energy and loudness)
in Fig. 5.
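Each cell of such a heatmap is a pairwise correlation coefficient. The sketch below assumes Pearson correlation, which the text does not state, and uses made-up attribute values to show how popularity could be compared against each attribute.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical columns; loudness is in decibels, so values are negative.
columns = {
    "popularity": [5, 20, 35, 50, 80, 90],
    "danceability": [0.40, 0.55, 0.35, 0.60, 0.70, 0.75],
    "energy": [0.30, 0.65, 0.50, 0.45, 0.80, 0.70],
    "loudness": [-14.0, -9.5, -11.0, -8.0, -5.5, -6.0],
}

for attribute in ("danceability", "energy", "loudness"):
    r = pearson(columns["popularity"], columns[attribute])
    print(f"popularity vs {attribute}: {r:.2f}")
```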
This confirms what we first saw in the correlation heatmap in Fig. 4, and it reveals something quite interesting for our analysis:
most of the high-popularity outliers are found within the highest ranges of the three attributes, especially loudness. This might
be a major step towards answering our main question.
It would be best to subset our data to the most popular songs, so that we can see how prominent these attributes are.
We therefore gather a subset of all songs that have a popularity score greater than or equal to 80.
For this new subset, we can see in Fig. 6 what the songs have in common by plotting the mean of each attribute:
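The subsetting and averaging step might look like the sketch below. The threshold of 80 comes from the text; the row layout, the attribute names, and the sample values are assumptions.

```python
from statistics import mean

# Hypothetical rows; only two of them pass the popularity threshold.
rows = [
    {"popularity": 85, "danceability": 0.75, "energy": 0.50, "loudness": -5.0},
    {"popularity": 90, "danceability": 0.50, "energy": 0.75, "loudness": -6.0},
    {"popularity": 20, "danceability": 0.25, "energy": 0.25, "loudness": -15.0},
]


def mean_attributes(tracks, attributes, min_popularity=80):
    """Average each attribute over the tracks at or above the popularity threshold."""
    hits = [track for track in tracks if track["popularity"] >= min_popularity]
    # statistics.mean raises StatisticsError if no track passes the threshold.
    return {name: mean(track[name] for track in hits) for name in attributes}


print(mean_attributes(rows, ("danceability", "energy", "loudness")))
```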
From the graph, danceability appears to be a strong contender, as it is a major attribute in most popular songs, as seen in
Fig. 7. We can also take a look at how these attributes have featured in the most popular songs over time:
Popular songs appear to have been consistently energetic, danceable and loud over the last 50 years, as well as happy. This
looks like a clear indicator of people's music taste. Let's take a look at the top 25 artists with songs that are currently popular and
compare them with a random sample:
['Taylor Swift'] 14
['Billie Eilish'] 10
['BTS'] 9
['Bad bunny'] 9
['Ariana Grande'] 9
['XXXTENTACION'] 8
['Ed Sheeran'] 6
['Lewis Capaldi'] 6
['Harry Styles'] 6
['Pop Smoke'] 5
['One Direction'] 5
['Dua Lipa'] 5
['Imagine Dragons'] 5
['Sam Smith'] 4
['Morgan Wallen'] 4
['Travis Scott'] 4
['Post Malone'] 4
['The Kid LAROI'] 4
['Bruno Mars'] 4
['Justin Bieber'] 4
['Arctic Monkeys'] 4
['Miley Cyrus'] 3
['Shawn Mendes'] 3
['Coldplay'] 3
Taylor Swift and Billie Eilish lead with 14 and 10 current hit tracks respectively, followed by BTS, Bad bunny and Ariana Grande
with 9 each. However, all of the artists in the top 25 list are already very well known, as are many of the ones in the random sample.
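A ranking like the one above can be produced by counting how often each artist appears among the popular tracks. In this sketch the artists field is assumed to be stored as a string such as "['Artist A']", matching the printed format of the list; the rows, the threshold of 80, and the function name top_artists are hypothetical.

```python
from collections import Counter

# Hypothetical rows with placeholder artist names.
rows = [
    {"artists": "['Artist A']", "popularity": 91},
    {"artists": "['Artist B']", "popularity": 84},
    {"artists": "['Artist A']", "popularity": 88},
    {"artists": "['Artist C']", "popularity": 42},
]


def top_artists(tracks, min_popularity=80, limit=25):
    """Return (artist, hit_count) pairs for popular tracks, most frequent first."""
    hits = (track["artists"] for track in tracks if track["popularity"] >= min_popularity)
    return Counter(hits).most_common(limit)


for artist, hit_count in top_artists(rows):
    print(artist, hit_count)
```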
From the analysis of this data, it can be concluded that:
Producing a hit track does not depend only on how happy, energetic, danceable or loud a song is; it is more likely to be related
to the artist's current popularity. However, in order to increase the probability of generating a relatively popular song, the
analysis suggests that it would be a good idea to emphasise the attributes associated with high popularity, like the ones shown
in the analysis above.
IV. LIMITATIONS
A. The dataset used in the current analysis is not large enough to capture significant trends at the root level. It is suitable for a
high-level exploratory analysis but not big enough to derive business insights from the data.
B. User data is inherently biased as it only captures the data of a specific section/group of people. Hence the analysis performed
may be affected and the insights captured might possess a certain level of bias that corrupts the derivations.
C. Some parameters used to analyse data may not be significant factors in deriving the required conclusions and insights.
Considering them might skew the results by some margin.
D. Most of the big data captured from users is prone to noise. A single outlier may have a negative effect on the whole dataset.
E. Analysing large quantities of data may slow down systems and also have a significant cost associated with it. Sometimes this
cost outweighs the gains achieved by conducting a thorough analysis.
F. Big data analysis is prone to security vulnerabilities. It can act as a potential point of failure or attack in cases of sensitive data.
G. Big data analytics cannot be conducted conclusively for data which has privacy restrictions.
V. CONCLUSION
In today's environment, when streaming music has surpassed purchased music, the music industry has been forced to shift its focus
away from record sales and toward collecting data with the purpose of determining the impact a particular song, artist, or album has
on the general population. Because the data also provides a deeper understanding of listening trends, audience markets, and other
areas, it is a never-ending revolution for those in the industry.
Because listening on Spotify is a social and sharing experience, users promote their engagement of their own accord. By combining
its application of data with a robust user experience custom made for social media, Spotify becomes an inadvertently
self-marketable platform. Spotify would not have turned out the way it did if it hadn't been for big data. With a rising presence
in numerous countries and a growing audience, more data will be generated in the coming years. More data will result in better
suggestions, better predictions, more users, and, as a result, more compensation to the rights holders. Spotify was able to
completely transform the music industry thanks to big data.
REFERENCES
[1] Barton, D. and Court, D. (2012), “Making Advanced Analytics Work For You”, Harvard Business Review, Vol. 90 No. 10, pp. 78-84.
[2] Salehan, M. and Kim, D. J. (2016), “Predicting the performance of online consumer reviews: A sentiment mining approach to big data analytics”, Decision
Support Systems, Vol. 81, pp. 30-40.
[3] Davenport, T. H. (2014), Big Data at Work: Dispelling the Myths, Uncovering the Opportunities, Boston, Harvard Business Review Press.
[4] D. Kiron, R. Shockley, N. Kruschwitz, G. Finch, and M. Haydock. Analytics: The widening divide. MIT Sloan Management Review, 53(3):1–22, 2011.
[5] Rogers and D. Sexton. Marketing roi in the era of big data: The 2012 brite and nyama marketing in transition study. Technical report, Columbia Business
School, http://www.iab.net/media/file/2012-BRITE-NYAMA-Marketing-ROI-Study.pdf, 2012.
[6] Monetate. Connecting data to action. Technical report, Monetate, 2014b.
[7] Allen, F. F. Reichheld, B. Hamilton, and R. Markey. Closing the delivery gap. Technical report, Bain and Company, 2005.