Sales Analysis of E-Commerce Websites Using Data M
Anurag Bejju
Birla Institute of Technology and Science Pilani
All content following this page was uploaded by Anurag Bejju on 05 March 2016.
International Journal of Computer Applications (0975 – 8887)
Volume 133 – No.5, January 2016
The most successful approach towards reducing user-perceived latency has been the extraction of path traversal patterns from past users' buying history to predict future user buying behavior and to fetch the required resources [6]. Vallamkondu & Gruenwald (2003) describe an approach to predict user behavior in e-commerce sites. The core of their approach involves extracting knowledge from integrated data of purchase and path traversal patterns of past users to develop a pricing model which focuses on profits as well as customer satisfaction [7]. Web sites are often used to establish a company's image, to promote and sell goods and to provide customer support. The success of a web site directly affects the success of the company in an electronic market.

3.1 Product Strategy
A product is anything that can be offered to a market for attention, acquisition, use, or consumption that might satisfy a want or need (Kotler, 2001). In an e-commerce marketing strategy, it is important to remember that information is now its own viable product. In the physical world, a shopper who wants to buy something has to manually sift through millions of choices. A complete search of all offerings would be extremely expensive, time-consuming and practically impossible. Instead, consumers rely on product suppliers and retailers to aid them in the search. This allows the suppliers and providers to use the consumers' cost of search as a competitive advantage. However, on the Internet, consumers can search much more comprehensively and at virtually no cost [11].

The Internet will lead to increased price competition and the standardization of prices. Also, the ability to compare prices across all suppliers using the Internet and online shopping services will lead to increased price competition. Finally, the price of providing Internet-based services often contains little or no marginal cost. Organizations will have to employ new pricing models when selling over the Internet. A negative correlation of -0.17907 was found on applying it to a training set of 1185 instances. It was also found that

3.3 Productivity Concept
Business profits can be increased by increasing revenue through stronger sales and/or by decreasing the costs associated with constant sales. One of the major factors in customer satisfaction is the availability and timeliness of the delivery of products. If a customer has to wait to receive a product, it can be detrimental to their feeling of satisfaction [13]. With that in mind, avoiding back orders should be a major goal of any business.

Table 3. Correlation Between Productivity and Online Rating
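Correlation figures like those reported in this section (for example, the -0.17907 value quoted for the 1185-instance training set) could be computed with Pearson's product-moment coefficient. The sketch below is illustrative only: the paper does not state which coefficient was used, and the `prices` and `ratings` values are made-up sample data, not the paper's data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two attributes around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample: product prices and their online ratings.
prices = [500, 1200, 3000, 8000, 14000]
ratings = [4, 5, 3, 4, 2]

r = pearson(prices, ratings)
print(f"r = {r:.5f}")  # a negative r means ratings tend to fall as price rises
```

A value close to zero, such as the one quoted in the text, indicates only a weak linear relationship, so it is best read alongside the other attributes rather than on its own.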
Fig 1.1. Application of ID3 algorithm on data extracted from e-commerce website
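ID3, the algorithm named in Fig 1.1, grows a decision tree by choosing at each node the attribute with the highest information gain. The following is a minimal sketch of that selection step, not the author's implementation; the toy records and the attribute names `price_bin` and `type` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy of the partitions produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical, already-discretized records (not the paper's data).
rows = [
    {"price_bin": "low", "type": "laptop", "rating": 3},
    {"price_bin": "low", "type": "phone", "rating": 3},
    {"price_bin": "high", "type": "laptop", "rating": 4},
    {"price_bin": "high", "type": "phone", "rating": 4},
]

gains = {a: information_gain(rows, a, "rating") for a in ("price_bin", "type")}
best = max(gains, key=gains.get)
print(gains, "-> split on", best)
```

In this toy data the rating is fully determined by `price_bin`, so that attribute would be chosen for the root split.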
The product name, product price, quantity, type and online rating are mined from the flipkart.com website. Unique attribute values are deleted and continuous attributes like price and quantity are discretized to get effective results. The equal-width binning technique divides the range of possible values into N sub-ranges of the same size [14].

5. EXTRACT CLASSIFICATION RULES
Data classification is an important data mining task that tries to identify common characteristics in a set of N objects contained in a database and to categorize them into different groups. We extract classification IF-THEN rules from those equivalence classes. For equivalence class {}, the rule IF (a is greater than or equal to 500 AND a is less than 14600) THEN IF (b is equal to "laptop") THEN IF (c is greater than or equal to 25 AND c is less than 650) THEN x = 3 can be pruned by following a path in this tree. Here x is the rating which a product can get. Using these rules, a web application using JavaScript, HTML and CSS is developed [15].

6. MODEL EVALUATION
The confusion matrix is a useful tool for analyzing how well a classifier can recognize tuples of different classes. TP and TN tell us when the classifier is getting things right, while FP and FN tell us when the classifier is getting things wrong (mislabeling). Given m classes (where m ≥ 2), a confusion matrix is a table of at least size m by m. An entry $CM_{i,j}$ in the first m rows and m columns indicates the number of tuples of class i that were labelled by the classifier as class j. For a classifier to have good accuracy, ideally most of the tuples would be represented along the diagonal of the confusion matrix, from entry $CM_{1,1}$ to entry $CM_{m,m}$, with the rest of the entries being zero or close to zero. In this case m = 4, so we have a 4 × 4 matrix. After applying the ID3 algorithm, this model has 86.4780% accuracy (i.e., out of every 100 test cases it has correctly predicted about 86) [16].

7. CONCLUSION
In this paper, a detailed study based on data mining techniques was conducted in order to extract knowledge from a data set with information about users' history associated with an e-commerce website. These datasets are directly mined from Flipkart.com using online software which converts HTML documents to data tables. The main purpose of web mining the data is to apply a set of descriptive data mining techniques to induce rules that allow data analysts working at e-commerce companies to make strategic decisions to boost their sales as well as provide effective customer service. The techniques used to discover patterns are web mining and decision tree algorithms. In the future, this study can be used to analyze e-commerce websites and obtain interesting knowledge to further the companies' profits.

Many of the e-commerce strategy frameworks offer a unique contribution to strategic planning but with limited solutions. This model based on web mining integrates McCarthy's 4Ps to provide a complete analysis of e-business strategies. Thus managers can use an organized and precise process to make more successful and effective decisions. Aggressive competition has been observed in the market space among the companies, thus accelerating the consumer dynamics. E-commerce will lead to increased price competition and this web application will provide an efficient way to price a particular product. It was found that price, product and production had an impact on online customer ratings. This model considers these three attributes, which are correlated to customer satisfaction, and helps the marketer make an informed decision.

8. REFERENCES
[1] Han, Jiawei, Micheline Kamber, and Jian Pei. "Data mining: concepts and techniques," Morgan Kaufmann, 2006.
[2] Quinlan, J. R. (1986). "Induction of decision trees," Machine Learning, Vol. 1-1, pp. 81-106.
[3] J. R. Quinlan, "C4.5: Programs for Machine Learning," Morgan Kaufmann Publishers, Inc., 1993.
[4] Ding Xiang-wu and Wang Bin, "An Improved Pre-pruning Algorithm Based on ID3," Jisuanji Yuxiandaihua, Vol. 9, pp. 47, 2008.
[5] Ming Fan, Xiaofeng Meng translated, "Data mining techniques and concepts," Machinery Industry Press, Beijing, pp. 136-145, Feb. 2004.
[6] N R Srinivasa Raghavan, "Data mining in e-commerce: A survey," Sadhana, Vol. 30, Parts 2 & 3, April/June 2005, pp. 275–289.
[7] B. Schafer, J.A. Konstan, and J. Riedl, "E-Commerce Recommendation Applications," Data Mining and Knowledge Discovery, Kluwer Academic, 2001, pp. 115-153.
[8] P. Resnick et al., "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," Proc. ACM 1994 Conf. Computer Supported Cooperative Work, ACM Press, 1994, pp. 175-186.
[9] J. Breese, D. Heckerman, and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," Proc. 14th Conf. Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1998, pp. 43-52.
[10] Baesens, B., Verstreeten, G., Poel, D. (2004). Bayesian network classifiers for identifying the slope of the customer lifecycle of long-life customers. European Journal of Operational Research, 156, 508-523.
[11] Cheng, C.H., Chen, Y.S. (2009). Classifying the segmentation of customer value via RFM model and RS theory. Expert Systems with Applications, 36, 3, 4176-4184.
[12] Hwang, H., Jung, T., Suh, E. (2004). An LTV model and customer segmentation based on customer value
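The equal-width binning used to discretize price and quantity (dividing the range of values into N sub-ranges of the same size) can be sketched as follows. This is an illustrative sketch, not the tool used in the paper; the function name `equal_width_bins`, the sample prices and the choice of N = 3 are assumptions.

```python
def equal_width_bins(values, n_bins):
    """Return (edges, bin_indices) for equal-width binning of `values`."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # n_bins + 1 edges delimit n_bins sub-ranges of identical width.
    edges = [lo + i * width for i in range(n_bins + 1)]
    if width == 0:
        # All values identical: everything falls into the first bin.
        return edges, [0] * len(values)
    # The maximum value is clamped into the last bin instead of opening a new one.
    indices = [min(int((v - lo) / width), n_bins - 1) for v in values]
    return edges, indices


# Hypothetical prices (not the paper's data), discretized into N = 3 bins.
prices = [500, 2000, 5000, 9500, 14600]
edges, bins = equal_width_bins(prices, 3)
print(edges)
print(bins)
```

Equal-width bins are simple to compute but can leave some bins nearly empty when the values are skewed, which is worth checking before the binned attribute is fed to a decision tree.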
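The IF-THEN rule quoted in Section 5 maps directly onto nested conditionals. In the sketch below the meanings of a, b and c (taken here to be price, product type and quantity) are an assumption, since the text does not define them, and `predict_rating` is a hypothetical helper rather than code from the paper's web application. It returns `None` when the rule does not fire.

```python
from typing import Optional


def predict_rating(a: float, b: str, c: float) -> Optional[int]:
    """Apply the single rule from Section 5; return rating x or None.

    Assumed meanings (not stated in the paper): a = price,
    b = product type, c = quantity.
    """
    if 500 <= a < 14600:          # a >= 500 AND a < 14600
        if b == "laptop":         # b equals "laptop"
            if 25 <= c < 650:     # c >= 25 AND c < 650
                return 3          # x = 3
    return None                   # rule does not apply to this product


print(predict_rating(1000, "laptop", 100))   # rule fires
print(predict_rating(1000, "phone", 100))    # wrong type, rule does not fire
```

A full rule set would contain one such path per leaf of the decision tree; only the one rule given in the text is shown here.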
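The accuracy discussed in Section 6 follows from the confusion matrix as the sum of the diagonal entries divided by the total number of tuples. The sketch below builds a 4 × 4 matrix from hypothetical labels; the paper's own matrix is not reproduced here, and the class values 1 to 4 and the helper names are assumptions.

```python
def confusion_matrix(actual, predicted, classes):
    """cm[i][j] = number of tuples of class i labelled as class j."""
    index = {c: i for i, c in enumerate(classes)}
    m = len(classes)
    cm = [[0] * m for _ in range(m)]
    for a, p in zip(actual, predicted):
        cm[index[a]][index[p]] += 1
    return cm


def accuracy(cm):
    """Fraction of tuples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


# Hypothetical test labels for m = 4 classes.
classes = [1, 2, 3, 4]
actual = [1, 2, 3, 4, 3, 2, 1, 4]
predicted = [1, 2, 3, 4, 3, 3, 1, 4]

cm = confusion_matrix(actual, predicted, classes)
for row in cm:
    print(row)
print(f"accuracy = {accuracy(cm):.4f}")
```

Off-diagonal entries show which classes are confused with each other, information that a single accuracy figure hides.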
IJCATM : www.ijcaonline.org