Data Mining and Machine Learning Tools
ICRTCSDA ‘24
ISBN-978-93-91977-50-4-CiiT Publications
Second International Conference on “Recent Trends in Computer Science and Data Analytics” -2024
KEEL is presented as an open-source DM (data mining) tool with three features. Its main feature is
the KEEL dataset repository, which contains partitions of datasets in the KEEL format together
with the results obtained by several algorithms on these datasets. The tool also provides
guidelines for incorporating new algorithms. Because KEEL does not depend on any particular
operating system, anyone can use it. In this study, the Hider method was found to perform best
among the methods compared. RapidMiner is a DM tool; the study describes some of the feature
extraction operators and individual operations provided by its extension. Formerly known as
YALE, it is available for data analysis as a standalone application and can also be integrated
into other applications as a DM engine. As a flexible, free, and open-source platform, it runs on
all major operating systems. Mikut and Reischl reviewed the historical development of DM tools
and presented a wide range of current DM tools and related tools for supporting the
decision-making process.
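Publishing fixed partitions, as KEEL's repository does, lets different algorithms be compared on identical train/test splits. Such partitions are commonly k-fold cross-validation splits; the exact scheme is not stated here, so the following minimal sketch simply assumes plain k-fold partitioning with a fixed random seed. The functions `k_fold_partitions` and `train_test_splits` are hypothetical helpers written for illustration and are not part of KEEL.

```python
import random


def k_fold_partitions(n_samples, k, seed=0):
    """Split indices 0..n_samples-1 into k disjoint folds of near-equal size.

    A fixed seed makes the partitions reproducible, so every algorithm
    can be evaluated on exactly the same splits.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = k_fold_partitions(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


for train, test in train_test_splits(10, 5, seed=42):
    print(len(train), len(test))
```

Because the seed is fixed, rerunning the function reproduces the same folds, which is the property that makes shared partitions useful for benchmarking.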
In their work, they presented nine different types of tools: BIS, DMS, INT, MAT, RES, EXT, LIB,
SOL, and SPEC. These tools differ in characteristics such as user groups, data structures,
implemented tasks and methods, interaction styles, import and export options, license policies,
and supported platforms. Current tools can manage large datasets with single features,
unstructured text, and time series, but comprehensive and robust DM tools for multi-dimensional
datasets such as images and videos are still lacking. Another study discussed various clustering
algorithms in DM using the WEKA tool. Its main aims were to compare the different clustering
algorithms available in WEKA and to identify the best algorithm for users. The work covered only
clustering algorithms in WEKA, which was chosen because it can be used without a deep
understanding of DM techniques.
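To make the WEKA clustering comparison more concrete, the sketch below implements k-means (Lloyd's algorithm), one of the clustering algorithms WEKA offers as SimpleKMeans. It is written in plain Python rather than against WEKA's Java API, and the function name `k_means` and the six toy points are illustrative assumptions, not code or data from the study.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Cluster points into k groups with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    # Initialise the centroids with k points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = k_means(points, k=2)
print(labels)
```

The numeric cluster ids depend on the random initialisation, but the two well-separated groups of points always end up in different clusters.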
MATLAB and TANAGRA have been used for a comparative study of classification methods. The
performance of various classification techniques is examined on a set of data. Clinical diagnosis
is an important factor in determining the key parameters of a disease, since without a diagnosis
it is difficult to identify those parameters. Tests are conducted that involve clustering and
classification techniques. However, many tests can complicate the diagnostic process and make it
hard to obtain results; ML tools are therefore used to overcome this problem. Extracting data from
large datasets and relating the attributes within a dataset helps in analyzing the results. A fuzzy
proposition establishes a relationship between an input and an output fuzzy set using fuzzy logic.
Decision rules are applied to the output value and the input parameters to determine the result for
a diabetic person. The outcome may be negative or positive.
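The fuzzy-logic step described above, in which an input is mapped through fuzzy sets and decision rules to a positive or negative outcome, can be sketched as follows. This is a toy illustration only: the single input (a fasting glucose reading in mg/dL), the membership breakpoints 100 and 126, the two rules, and the max-membership decision (used here instead of full defuzzification) are all assumptions, not the study's rules and not clinically validated thresholds.

```python
def ramp_down(x, a, b):
    """Membership 1 at or below a, 0 at or above b, linear in between ("low" set)."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def ramp_up(x, a, b):
    """Membership 0 at or below a, 1 at or above b, linear in between ("high" set)."""
    return 1.0 - ramp_down(x, a, b)


def fuzzy_diagnosis(glucose):
    """Apply two hypothetical rules and return (label, rule firing strengths).

    Rule 1: IF glucose is LOW  THEN outcome is NEGATIVE
    Rule 2: IF glucose is HIGH THEN outcome is POSITIVE
    The breakpoints 100 and 126 are illustrative, not clinical guidance.
    """
    low = ramp_down(glucose, 100, 126)
    high = ramp_up(glucose, 100, 126)
    strengths = {"negative": low, "positive": high}
    # Crisp decision: choose the consequent whose rule fires most strongly.
    label = max(strengths, key=strengths.get)
    return label, strengths


for reading in (90, 110, 120, 140):
    print(reading, fuzzy_diagnosis(reading))
```

A reading between the two breakpoints belongs partly to both fuzzy sets, which is what distinguishes this from a single hard threshold.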
An approach has been proposed for the design and implementation of crime detection and the
identification of criminals in Indian cities using DM techniques [15]. Their technique
is divided into six modules: Data Extraction (DE), Clustering, Data Preprocessing (DP), Google
Map representation, Classification, and WEKA implementation. DE extracts the unstructured and
ambiguous crime dataset from various crime Web sources. DP cleans, integrates, and reduces the
extracted crime data into a structured set of crime instances, which are represented using 35
predefined crime attributes. The remaining four modules support crime detection, criminal
identification and prediction, and crime verification, respectively. Criminal identification and
prediction were performed using KNN classification, and crime verification was carried out using
the results produced by WEKA. The proposed approach benefits the public by helping authorities
detect crimes and identify criminals, thereby reducing crime rates.
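The KNN classification used for criminal identification can be illustrated with a minimal k-nearest-neighbour classifier. The sketch assumes numeric feature vectors, Euclidean distance, and majority voting; the function name `knn_predict` and the two-feature toy records are hypothetical and do not correspond to the 35 crime attributes or the data used in the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    if not 0 < k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Sort the training rows by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(
        zip(train_X, train_y), key=lambda row: math.dist(row[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common(1) returns [(label, count)] for the label with the most votes.
    return votes.most_common(1)[0][0]


# Hypothetical two-feature records with class labels "A" and "B".
train_X = [(1, 2), (2, 1), (2, 3), (7, 8), (8, 7), (8, 9)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (1.5, 2.0), k=3))  # → A
print(knn_predict(train_X, train_y, (7.5, 8.0), k=3))  # → B
```

Choosing an odd k avoids voting ties when there are only two classes.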
Based on alloying elements, the constituents of the microstructure of compacted graphite iron have
been identified in order to determine the thickness and its effect. This study used linear
regression models, segmented regression models with the MARS (Multivariate Adaptive Regression
Splines) algorithm, Artificial Neural Networks (ANN), and Classification and Regression Trees
(CART). According to Siddique and Ahmad, almost everything is being computerized in the current
software era, and developing standard software on time and within estimated costs is a difficult
task for software organizations. DM is crucial when mining software repositories with tools such
as Apfel, Chianti, Dynamine, Hipikat, Kenyon, and Softchange. Such tools should be
language-dependent, efficient, informative, interactive, and materialistic.
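Of the models listed for the compacted graphite iron study, linear regression is the simplest. The sketch below fits a one-variable ordinary least-squares line using the closed-form slope and intercept. The function name `fit_line` and the five observations are made up purely to show the calculation and are not data from that study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx and Sxy: sums of squared / cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up observations lying close to y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
intercept, slope = fit_line(xs, ys)
print(round(intercept, 2), round(slope, 2))
```

The segmented, MARS, ANN, and CART models mentioned above relax the assumption of a single straight-line relationship that this fit makes.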
A review has examined articles published from 2007 to 2017 under the headings "Classification
and Prediction," "Knowledge Representation," "Computational Intelligence," and "DM Fundamental
Concepts." In a report, Kodati and Vivekanandam discussed the use of the Orange and WEKA DM
tools to analyze heart disease.
OVERVIEW OF THE TOOLS
The D2K (Data to Knowledge) [2] toolkit offers numerous modules designed to link it with other
standard packages, along with a visual programming environment. It provides packages for image
and text mining and, in addition, an external framework of evolutionary techniques for building
key genetic algorithms. KNIME (Konstanz Information Miner) is an easy-to-use, open-source
platform for data integration, processing, analysis, and exploration. KNIME includes tools for
preprocessing, transformation, clustering, association rules, and more. An advantage of the tool
is that it allows several operators to work together, and it can be extended with WEKA to
broaden its capabilities.
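Association-rule mining, one of the capabilities listed for KNIME, is usually driven by two standard measures: support (how often an itemset occurs) and confidence (how often the consequent occurs when the antecedent does). The following sketch computes both for a single rule over a handful of invented transactions; the helper names and item names are hypothetical and unrelated to KNIME's own node interfaces.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) as support(both) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, set(antecedent) | set(consequent)) / base


# Invented transactions for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))  # → 0.5
print(confidence(transactions, {"bread"}, {"milk"}))
```

Algorithms such as Apriori use a minimum-support threshold to prune candidate itemsets before rules are scored by confidence.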
WEKA (Waikato Environment for Knowledge Analysis) is an ML tool consisting of a collection of
ML algorithms that are used to solve real-world application problems.
RapidMiner (RM, formerly YALE) [6,7] is a flexible, free, open-source Java program. It is a tool
for business analytics, image processing, DM, and machine learning. Additionally, a dataset
repository for applying regression and classification techniques is available.
scikit-learn [9] is a Python library built on NumPy and SciPy. It is used for DM computations and
for plotting results.
DECISION TREES
Another popular technique for inductive inference is decision tree learning, which has been
successfully applied to a wide range of real-world problems. A decision tree gives a visual
representation of the choices that need to be made, the attributes that should be selected, and
the outcomes that follow from combining particular choices and attributes. Decision trees are
composed of nodes and branches: nodes are the points where decisions must be made, and
branches represent possible courses of action or alternatives. The root node is chosen as the
point where the decision-making process starts. The accompanying figure illustrates a decision
tree in action: there are two paths leading to accreditation and five paths leading to
non-accreditation.
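The attribute placed at the root node, and at each later node, is often chosen by information gain, as in ID3-style learners. The passage above does not state which criterion was used, so this is one common option rather than the method behind the accreditation tree. The sketch computes Shannon entropy and the information gain of each attribute; the records, attribute names, and helper functions are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting `rows` on `attribute`."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical records: each row maps attribute names to values.
rows = [
    {"faculty": "high", "funding": "yes"},
    {"faculty": "high", "funding": "no"},
    {"faculty": "low", "funding": "yes"},
    {"faculty": "low", "funding": "no"},
]
labels = ["accredited", "accredited", "not accredited", "not accredited"]
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best)  # → faculty
```

Here splitting on `faculty` separates the two classes perfectly (gain of 1 bit), while `funding` leaves both subsets mixed (gain of 0), so `faculty` would become the root node.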
CONCLUSION
In recent years, data mining and machine learning have been applied in academic settings, where
they have improved the effectiveness of students' performance throughout their education,
improved the allocation of human and material resources, and helped manage student performance.
Using some of the most popular data mining and machine learning techniques, including artificial
neural networks, k-nearest neighbors, Bayesian learning, and decision trees, the methodologies
presented in this paper demonstrate many approaches to solving a range of interesting academic
problems.
Future research could focus on developing automated techniques for a variety of tasks, such as
forecasting student performance, enhancing decision-making, assigning tutors based on specific
criteria, improving the admissions process, and detecting students' ability levels. Many other
machine learning techniques, such as methods for handling imbalanced data, can also be applied.
REFERENCES
1. U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to knowledge discovery in
databases,” in AI Magazine, 1996, vol. 17, no. 3, pp. 37–54.
2. J. Alcala-Fdez et al., “KEEL: a software tool to assess evolutionary algorithms for data mining
problems,” in Soft Computing, 2009, vol. 13, pp. 307-318.
3. R. Mikut and M. Reischl, “Data mining tools,” in Wiley Interdisciplinary Reviews: Data Mining and
Knowledge Discovery, 2011, vol. 1, pp. 431-443.
4. A. M. Hirudkar and S. S. Sherekar, “Comparative analysis of data mining tools and techniques for
evaluating performance of database system,” in International Journal of Computer Science Applications,
2013, vol. 6, pp. 232-237.
5. N. Sharma, A. Bajpai, and R. Litoriya, “Comparison the various clustering algorithms of weka tools,”
in facilities, 2012, vol. 4, pp. 78-80.
6. M. Yas, A. A. Zaidan, B. B. Zaidan, B. Rahmatullah, and H. A. Karim, “Comprehensive insights into
evaluation and benchmarking of real-time skin detectors: Review, open issues & challenges, and
recommended solutions,” in Measurement, 2018, vol. 114, pp. 243-260.
7. A. Jović, K. Brkić, and N. Bogunović, “An overview of free software tools for general data mining,” in
IEEE 37th International Convention on Information and Communication
8. N. Borude, C. Maher, V. Sarda, and A. Santra, “Generic binary classifier tool for diagnosis of patients
suffering from brain disorders in R,” in International Conference on Computing, Analytics and Security
Trends (CAST), IEEE, 2016, pp. 173-178.
9. R. Burget, J. Karásek, Z. Smékal, V. Uher, and O. Dostál, “Rapidminer image processing extension: A
platform for collaborative research,” in 33rd International Conference on Telecommunication and Signal
Processing (TSP), 2010, pp. 114-118.
10. R. M. Rahman and F. Afroz, “Comparison of various classification techniques using different data
mining tools for diabetes diagnosis,” in Journal of Software Engineering and Applications, 2013, vol. 6,
p. 85.
11. H. Solanki, “Comparative study of data mining tools and analysis with unified data mining theory,” in
International Journal of Computer Applications, 2013, vol. 75, pp. 23-28.
12. K. Rangra and K. L. Bansal, “Comparative study of data mining tools,” in International Journal of
Advanced Research in Computer Science and Software Engineering, 2014, vol. 4.
13. X. Wu, X. Zhu, G. Q. Wu, and W. Ding, “Data mining with big data,” in IEEE Transactions on
Knowledge and Data Engineering, 2013, vol. 26, pp. 97-107.
14. D. K. Tayal, A. Jain, S. Arora, S. Agarwal, T. Gupta, and N. Tyagi, “Crime detection and criminal
identification in India using data mining techniques,” in AI & Society, 2015, vol. 30, pp. 117-127.
15. R. Alcalá, M. J. Gacto, and J. Alcalá-Fdez, “Evolutionary data mining and applications: A revision on
the most cited papers from the last 10 years (2007–2017),” in Wiley
16. S. Yefimenko, “Advances in GMDH-based Predictive Analytics Tools for Business Intelligence
Systems,” in International Conference Proceedings, ACIT, 2018, pp. 254-257.