CSCI 720 - Project
1) Data management is useful for storing and querying data and for keeping the data
separate from the analysis. It is typically handled by a database management system (DBMS).
2) As part of the data management component, a base schema was designed using the
attributes from the combined dataset.
3) Data management was done in MySQL using SQL Workbench and Python.
4) A representation of the data management component is shown in the figure that follows.
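To make items 2) and 3) more concrete, here is a minimal sketch of a base schema and a query driven from Python. It is not the figure referred to above, and it is not the project's actual schema: the table name, the column names, and the sample rows are all hypothetical. It also uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server.

```python
import sqlite3

# Hypothetical base schema. The real attributes come from the combined
# dataset, which is not listed here, so these columns are illustrative only.
SCHEMA = """
CREATE TABLE reviews (
    review_id   INTEGER PRIMARY KEY,
    product     TEXT NOT NULL,
    category    TEXT NOT NULL,
    rating      INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
    review_text TEXT NOT NULL
)
"""


def build_demo_db():
    """Create an in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        (1, "phone-a", "electronics", 5, "Great battery life"),
        (2, "phone-a", "electronics", 3, "Average camera"),
        (3, "kettle-b", "kitchen", 4, "Boils quickly"),
    ]
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def average_rating_by_category(conn):
    """Run an aggregate query and return {category: average rating}."""
    cur = conn.execute(
        "SELECT category, AVG(rating) FROM reviews "
        "GROUP BY category ORDER BY category"
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    print(average_rating_by_category(build_demo_db()))
```

The point of the sketch is the separation of concerns: the data lives in the database, and the analysis code only sees query results.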
Data Mining
1) Why data mining?
-> Data management is useful in web applications and query-based environments. It can
execute complex queries; however, it does not by itself yield insights, and visualizations are
difficult to produce. Thus, data mining is needed for predicting, modeling, and visualizing data.
2) Customer reviews can be mined to identify trends as well as to analyse past history in order to
improve future recommendations.
3) The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely used process model for
data mining tasks. It consists of the following steps:
a) Business Understanding
b) Data Understanding
c) Data Preparation
d) Modeling
e) Evaluation
f) Deployment
Exploratory analysis and Visualization
We performed visualizations in Tableau to explore relationships between attributes and to
determine timelines and trends in the attributes.
Models Used:
Classification:
SVM
Clustering:
Agglomerative (Birch)
SVM Results
Accuracy: 68%
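As a rough illustration of how an SVM accuracy figure of this kind can be obtained, the sketch below trains a linear SVM on TF-IDF features with scikit-learn and scores it on held-out texts. The features, the kernel, the labels, and the train/test split used in the project are not stated here, so everything in this example (the toy review snippets, the "positive"/"negative" labels, the choice of TfidfVectorizer and LinearSVC) is an assumption, and it does not reproduce the 68% result.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up review snippets and labels, for illustration only.
train_texts = [
    "great product works well",
    "excellent quality very happy",
    "love it great value",
    "terrible quality broke quickly",
    "very disappointed poor product",
    "bad value stopped working",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

test_texts = ["great quality very happy", "poor product broke quickly"]
test_labels = ["positive", "negative"]


def train_and_score(train_x, train_y, test_x, test_y):
    """Fit TF-IDF + linear SVM, then return predictions and accuracy."""
    model = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
    model.fit(train_x, train_y)
    predictions = model.predict(test_x)
    # Accuracy = correctly classified test items / all test items.
    return predictions, accuracy_score(test_y, predictions)


if __name__ == "__main__":
    preds, acc = train_and_score(train_texts, train_labels,
                                 test_texts, test_labels)
    print(list(preds), f"accuracy={acc:.0%}")
```

Accuracy here is simply the fraction of held-out reviews whose predicted label matches the true label, so it should always be computed on data the model did not see during training.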
Birch results for a specific topic cluster
Birch results for a Generic topic cluster
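For readers unfamiliar with Birch, the following sketch shows one way review text could be grouped into topic clusters with scikit-learn's Birch estimator. It is an illustration under assumptions, not the project's pipeline: the toy snippets, the TF-IDF representation, and the threshold and n_clusters values are all chosen for the example.

```python
from sklearn.cluster import Birch
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up snippets covering two topics (battery, shipping), for illustration.
snippets = [
    "battery life great",
    "great battery life overall",
    "battery life poor",
    "shipping slow",
    "slow shipping again",
    "shipping slow late",
]


def cluster_snippets(texts, n_clusters=2, threshold=0.3):
    """Vectorize texts with TF-IDF and cluster them with Birch.

    Birch first compresses the data into small subclusters (controlled by
    `threshold`), then groups the subcluster centroids into `n_clusters`
    final clusters using agglomerative clustering.
    """
    vectors = TfidfVectorizer().fit_transform(texts).toarray()
    model = Birch(threshold=threshold, n_clusters=n_clusters)
    return model.fit_predict(vectors)


if __name__ == "__main__":
    for label, text in zip(cluster_snippets(snippets), snippets):
        print(label, text)
```

A lower threshold produces more, tighter subclusters before the final grouping step. The cluster numbers themselves are arbitrary; only which snippets share a label is meaningful.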
Conclusion / Future Work
Future Work
● More categories
● User identification
Tools used