Sample Project Report
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
By
It is certified that the work contained in the project report titled "PREDICTING
THE FARMLAND FOR AGRICULTURE FROM SOIL FEATURES USING
MACHINE LEARNING" by STUDENT NAME-1 (Reg No-1), STUDENT
NAME-2 (Reg No-2), STUDENT NAME-3 (Reg No-3), STUDENT NAME-4 (Reg
No-4) has been carried out under my/our supervision and that this work has not
been submitted elsewhere for a degree.
Signature of Supervisor
Ms. N. Sivaranjani
Assistant Professor
Computer Science & Engineering
School of Computing
Bharath Institute of Higher Education and Research
March, 2021
DECLARATION
We declare that this written submission represents our ideas in our own words and,
where others' ideas or words have been included, we have adequately cited and
referenced the original sources. We also declare that we have adhered to all
principles of academic honesty and integrity and have not misrepresented or
fabricated or falsified any idea/data/fact/source in our submission. We understand
that any violation of the above will be cause for disciplinary action by the Institute
and can also evoke penal action from the sources which have thus not been properly
cited or from whom proper permission has not been taken when needed.
(Signature)
(STUDENT NAME-1)
Date: / /
(Signature)
(STUDENT NAME-2)
Date: / /
(Signature)
(STUDENT NAME-3)
Date: / /
(Signature)
(STUDENT NAME-4)
Date: / /
APPROVAL SHEET
Examiners Supervisor
Date: / /
Place:
ACKNOWLEDGEMENT
First, we wish to thank the almighty who gave us good health and success throughout our project
work.
We express our deepest gratitude to our beloved President Dr. J. Sundeep Aanand and
Managing Director Dr. E. Swetha Sundeep Aanand for providing us the necessary facilities for the
completion of our project.
We take great pleasure in expressing sincere thanks to Vice Chancellor Dr. K. Vijaya Baskar
Raju, Pro Vice Chancellor (Academic) Dr. M. Sundararajan, Registrar Dr. S. Bhuminathan and
Additional Registrar Dr. R. Hari Prakash for backing us in this project.
We thank our Dean Engineering Dr. J. Hameed Hussain for providing sufficient facilities for
the completion of this project.
We thank our Dean, School of Computing Dr. S. Neduncheliyan for his encouragement and the
valuable guidance throughout the project.
A special thanks to our Project Coordinator Mr. N. Nithiyanandam for his valuable guidance
and support throughout the course of the project.
We also take this opportunity to express a deep sense of gratitude to our Internal Supervisor
Ms. N. Sivaranjani for her cordial support, valuable information, and guidance; she helped us in
completing this project through its various stages.
We thank our department faculty, supporting staff and friends for their help and guidance to
complete this project.
M. USHA SRI (U17CN011)
I. LAKSHMI PRASANNA (U17CN014)
E. HEMA (U17CN023)
O. DHANA LAKSHMI (U17CN028)
ABSTRACT
Agriculture has been a crucial part of human society, and there is a need to
explore innovative methods to improve the selection of crops suited to a soil
depending on its texture, type, moisture, temperature, humidity, erosion, and
slope. Machine learning can help us learn from the data, analyse its fields, and
predict the future. For predicting farmland, supervised machine learning
techniques are used. Data on India's agricultural land across all its states has
been used to predict whether a plot is farmland or not using machine learning
algorithms.
LIST OF FIGURES
LIST OF ACRONYMS AND ABBREVIATIONS
AI Artificial Intelligence
DNN Deep Neural Network
FN False Negative
FP False Positive
LDA Linear Discriminant Analysis
ML Machine Learning
RF Random Forest
TN True Negative
TP True Positive
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
1 INTRODUCTION
1.1 Introduction
1.2 Aim of the Project
1.3 Project Domain
1.4 Problem Statement
1.5 Scope of the Project
1.6 Methodology
2 LITERATURE REVIEW
3 PROJECT DESCRIPTION
3.1 Existing System
3.2 Proposed System
3.3 Feasibility Study
3.3.1 Economic Feasibility
3.3.2 Technical Feasibility
3.3.3 Social Feasibility
3.4 System Specification
3.4.1 Hardware Specification
3.4.2 Software Specification
3.4.3 Standards and Policies
4 MODULE DESCRIPTION
4.1 General Architecture
4.2 Design Phase
4.2.1 Data Flow Diagram
4.2.2 UML Diagram
4.2.3 Use Case Diagram
4.2.4 Collaboration Diagram
4.2.5 Sequence Diagram
4.3 Module Description
4.3.1 Data Preprocessing
4.3.2 Data Visualization and Descriptive Statistics
4.3.3 Data Splitting
4.3.4 Predicting Farmland
4.3.5 The Evaluation of Model Performance
5 IMPLEMENTATION AND TESTING
6 RESULTS AND DISCUSSIONS
References
Chapter 1
INTRODUCTION
1.1 Introduction
Agriculture has long been one of the leading sectors practised in India. Ancient people farmed
crops on their own land, so natural crops were cultivated and used by human beings, animals,
and birds. With the invention of new technologies in the field of agriculture, traditional
farming is slowly fading. Owing to the many inventions in the agriculture sector, people
concentrate more on cultivating hybrid products, which ultimately leads to an unhealthy life and
a lack of nutrition. Nowadays, many people have little awareness of farming crops. Changes in
the farming seasons, climatic conditions, and soil features such as water availability and soil
erosion affect cultivation techniques.
Even after examining all these issues and problems, such as temperature and weather, there is
no complete solution or technology to overcome the situations faced by farmers. In India, the
economic growth of agriculture can be increased in many ways, and there are different ways to
increase and improve both the crop yield and the quality of the crop. Machine learning
algorithms can be used to predict crop yield based on soil features. A supervised learning
classification technique is used for the prediction. In supervised learning, both input and
output are given to the system by splitting the data into training and testing sets: the
training data is used to train the system, and the testing data is used to measure the
accuracy. For predicting the accuracy, three algorithms are used and their accuracies are
compared:
1) DNN - Deep Neural Network
2) Random Forest
3) LDA - Linear Discriminant Analysis
Practical implementation of the models in decision-support tools would provide a snapshot of
areas and classify whether agriculture can be practised there or not. Along with the proposed
algorithms, Random Forest will be implemented to predict the data and to improve accuracy.
1.2 Aim of the project
The aim of the project is to use machine learning to find whether a particular land is
suitable for agriculture, with the right conditions for farming, to improve the accuracy
of the algorithms, and to compare them to find the best among the three.
1.5 Scope of the Project
The scope of the project is to identify farmlands based on the agricultural conditions of all
types of soil features. It requires many inputs, such as temperature, soil conditions, erosion
conditions, and humidity, and the outputs are collected as a dataset that includes all the
information about agricultural lands needed for the project. The agriculture sector must
balance supply as the population and demand increase day by day. Agriculture amounts to the
management of land parcels and the raising of their productivity; it serves not only today's
society but also our future generations. The project aims to give users valuable information
about profit-yielding farmlands based on soil features. Hence the need to explore innovative
methods to improve the accuracy of predicting crop growth on farmland from the soil, depending
on its texture, type, and moisture. Machine learning can help in learning from, analysing, and
predicting the data.
1.6 Methodology
The data is imported, cleaned, and standardized. In the preprocessing stage, label encoding and
one-hot encoding are done to convert the data into machine-readable form. Then the data is split
into training data and testing data. After splitting, standard scaling is applied to the
training and testing data. The data is then passed to the classifiers to predict the accuracy
and thereby obtain the confusion matrix. Different accuracies are obtained for the different
algorithms.
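A minimal sketch of this pipeline, using a tiny made-up table in place of the soil dataset (the column names here are purely illustrative, not the project's real ones), might look like:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# Tiny made-up table standing in for the soil dataset; the real columns differ.
df = pd.DataFrame({
    'soil_type': ['clay', 'loam', 'sandy', 'loam', 'clay', 'sandy'],
    'moisture':  [30.0, 55.0, 20.0, 60.0, 35.0, 15.0],
    'farmland':  [0, 1, 0, 1, 1, 0],
})

# Label encoding: categorical text -> machine-readable integers.
df['soil_type'] = LabelEncoder().fit_transform(df['soil_type'])

X = df[['soil_type', 'moisture']].values
y = df['farmland'].values

# Split into training and testing data (80/20 here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Standard scaling: fit on the training data, reuse that fit on the test data.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

The scaled training and testing arrays are then ready to be handed to a classifier.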
Chapter 2
LITERATURE REVIEW
This chapter gives an overview of the literature survey and presents some of the relevant
work done by researchers. Many existing techniques for the agriculture prediction problem,
built using Python and Spyder, have been studied; we review some of them below.
The author in [1] presents that machine learning deals with problems where the relationship
between input and output variables is unknown or hard to obtain. This characteristic is useful
for modelling sophisticated non-linear behaviour, such as the function needed for crop yield
prediction, and machine learning techniques have been most successfully applied to Crop Yield
Prediction (CYP). The training process continues until the model achieves the desired level of
accuracy on the training data. Examples of supervised learning methods are Regression, Decision
Tree, Random Forest, KNN, and Logistic Regression. This can help farmers grow the right crops
on the available land and understand the precipitation and the maximum and minimum temperature
of that area. The paper concentrates on predicting the most profitable crop that can be grown
on agricultural land using machine learning techniques, and includes an Android application
that offers real-time crop analysis using various lookout reports and soil quality, so that
farmers can grow the most profitable crop in the most appropriate months.
The author in paper [2] discusses developing accurate models for crop yield estimation using
Information and Communication Technologies, which may help farmers and other stakeholders
improve decision-making regarding national food imports/exports and food security. The paper
examines the performance of the RF and MLC methods.
The author in [3] describes how recent agricultural information can be used for the future
prediction of crops and yield. The work also advises farmers on what kind of crop can be grown
using weather-station information and provides appropriate guidance, such as the correct season
for quality farming. The data processing techniques are discussed thoroughly.
The author in [4] describes the vital role played by data mining methods in the agricultural
field, presenting various ML algorithms such as Random Forest, SVM, and ANN. The crops were
predicted primarily based on climatic features, which gave an accuracy score of about 95%
with the C4.5 algorithm.
The main objective of paper [5] is the analysis of the main soil properties, such as organic
matter, essential plant nutrients, and micronutrients, that affect the growth of crops. A BPN
(Back Propagation Network) can find and suggest the proper correlation percentage among those
properties. The machine learning system is split into three steps: the first is sampling, the
second is the Back Propagation Algorithm, and the third is weight updating. The performance of
the Back Propagation neural network model is evaluated using a test data set.
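As a rough illustration of those three steps for a single sigmoid neuron (the input values, target, and learning rate below are invented for the sketch, not taken from [5]):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: sampling -- one toy training example (illustrative values).
x = np.array([0.5, -1.0])
t = 1.0          # target output
w = np.zeros(2)  # weights
b = 0.0          # bias
lr = 0.1         # learning rate

losses = []
for _ in range(100):
    y = sigmoid(w @ x + b)            # forward pass
    losses.append(0.5 * (y - t) ** 2)
    # Step 2: Back Propagation -- gradient of the squared error through the sigmoid.
    dz = (y - t) * y * (1.0 - y)
    # Step 3: weight updating.
    w -= lr * dz * x
    b -= lr * dz
```

Repeating the three steps drives the loss down, which is what the evaluation on a test set then measures.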
The author in paper [6] notes that Latent Dirichlet Allocation (LDA) topic models are
increasingly being employed in communication analysis, yet questions relating to the
reliability and validity of the approach have received little attention so far. In applying
LDA to textual data, researchers must tackle at least four major challenges that affect these
criteria: (a) appropriate preprocessing of the text collection; (b) adequate choice of model
parameters, including the number of topics to be generated; (c) evaluation of the model's
reliability; and (d) the process of validly interpreting the resulting topics. The authors
review the research literature dealing with these questions, propose a methodology that
addresses these challenges, develop a brief hands-on user guide for applying LDA topic
modelling, and demonstrate the value of the approach with empirical data from an ongoing
research project.
The innovation of paper [7] lies in combining CNN-based learning methods for producing
geo-objects with tree-based learning methods for mapping soil properties. To improve the
precision of predicting soil properties across a geographic space, the author developed a
novel geo-object-based soil property mapping procedure using machine learning algorithms.
The author in paper [8] gives information about the potential of wireless sensors and IoT in
agriculture, as well as the challenges expected when integrating this technology with normal
farming practices. The paper identifies the current and future trends of IoT in agriculture
and highlights the potential research challenges.
The author in paper [9] states the requirements and planning needed for developing a software
model for precision farming, and studies the fundamentals of precision farming in depth. The
main objective of the model is to deliver direct advisory services to even the smallest farmer,
at the level of his or her smallest plot of crop, using the most accessible technologies such
as SMS and email. The model has been designed for the scenario in Kerala State, where the
typical holding size is much smaller than in most of India; hence the model can be deployed
elsewhere in India with only some modifications.
The author in paper [10] states that soil is a crucial ingredient of agriculture. There are
many kinds of soil, each with different features, and different types of crops grow on
different types of soil. Users would like to know the features and characteristics of the
various soil types to understand which crops grow better in each. Machine learning techniques
can be helpful here and have progressed greatly in recent years, though machine learning
remains an emerging and challenging research field in agricultural data analysis. The author
has proposed a model that can easily suggest suitable crops. Several machine learning
algorithms, such as weighted k-Nearest Neighbour (k-NN), logistic regression, and DNN-based
Support Vector Machines (SVM), are used for the soil classification methods.
Chapter 3
PROJECT DESCRIPTION
2. Technical Feasibility.
3. Social Feasibility.
3.3.1 Economic Feasibility
This study is carried out to ensure that the system is economically feasible. As it uses the
Keras library, it costs little to complete the entire system, and the system is built using
open-source tools such as Anaconda Navigator, Spyder, and Python.
• MONITOR: 15” LCD
• HARD DISK: 120 GB
a community-based development model, as do nearly all of Python's other implementations.
Python and CPython are maintained by the non-profit Python Software Foundation. Python is a
multi-paradigm programming language: object-oriented programming and structured programming
are fully supported, and some of its features support functional programming.
Python uses dynamic typing and, for memory management, a mix of reference counting and a
cycle-detecting garbage collector. It also features dynamic name resolution, which binds
method and variable names during program execution. The standard library has modules that
implement functional tools borrowed from Haskell and Standard ML. Machine Learning, also
termed ML, is a subset of Artificial Intelligence; it deals with algorithms that can look at
the data required for a working model, learn from it, and make predictions.
Standard used: 3.7.1
Spyder
Spyder is an open-source cross-platform IDE for scientific programming in the Python
programming language and has been used in this project. Spyder integrates with a range of
prominent packages in the scientific Python stack, including NumPy, SciPy, Matplotlib, pandas,
IPython, SymPy, and Cython, as well as other open-source software. Spyder is maintained and
continuously improved by a team of Python developers.
Spyder uses Qt for its Graphical User Interface (GUI) and is designed to use either of the
PyQt or PySide Python bindings. QtPy, a thin abstraction layer developed by the Spyder project
and later adopted by several other packages, provides the flexibility to use either backend.
It is available cross-platform through Anaconda, and on the Windows platform with the
WinPython distribution.
Standard used: 3.3.2
Keras
Keras is an open-source neural network library written in the Python programming language
that runs on top of Theano or TensorFlow. It is designed to be modular, fast, and simple to
use, and was developed by François Chollet, a Google engineer. Keras does not handle the
low-level computation itself; instead it uses another library, known as the "backend", to do
it. Keras is thus a high-level API wrapper for the low-level API, capable of running on top of
TensorFlow, CNTK, or Theano.
The Keras high-level API handles the way we make models, define layers, and set up multiple
input-output models. At this level Keras also compiles the model with loss and optimizer
functions and runs the training process with the fit function. It does not handle low-level
operations such as building the computational graph or creating the tensors and other
variables, because these are handled by the backend engine.
Standard used: 2.3.0
Chapter 4
MODULE DESCRIPTION
After the installation of the required software tools, the soil dataset is loaded into the
Spyder software to preprocess the data using label encoding, one-hot encoding, and scaling.
Label encoding converts each row to computer-readable values, i.e., 0s and 1s. After the data
preprocessing, feature extraction is done. Then the data is split into training data and
testing data; the percentages have to be chosen, preferably 80 percent training data and 20
percent testing data. On the training and testing data, the classification techniques DNN,
Random Forest, and LDA are applied. Data can be trained over N iterations, called epochs,
where N is an integer. Finally, the confusion matrix and accuracy of each classifier are
displayed as output, and the best of the three is identified.
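A sketch of this flow, using synthetic stand-in data and only the scikit-learn classifiers (the DNN is left out here to keep the sketch self-contained; names and data are illustrative, not the project's), could be:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the preprocessed soil dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)   # 80/20 split

scores = {}
for name, clf in [('LDA', LinearDiscriminantAnalysis()),
                  ('Random Forest', RandomForestClassifier(random_state=0))]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # Confusion matrix and accuracy reported per classifier.
    print(name, confusion_matrix(y_test, y_pred), accuracy_score(y_test, y_pred))
    scores[name] = accuracy_score(y_test, y_pred)

best = max(scores, key=scores.get)  # the best of the classifiers compared
```

The same loop structure extends naturally to a third (DNN) classifier.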
4.2 Design Phase
4.2.1 Data Flow Diagram
Figure 4.3: Data Flow Diagram of Level-1
4.2.2 UML Diagram
4.2.3 Use Case Diagram
4.2.4 Collaboration Diagram
4.2.5 Sequence Diagram
4.3 Module Description
4.3.1 Data Preprocessing
Before building the model, data preprocessing techniques convert the raw data into clean
data. There are different techniques for data preprocessing: first, import the necessary
libraries and the dataset, then handle any missing data in the dataset by applying a
data-cleaning method. The data has to be cleaned before loading into the neural classifier,
which requires handling of categorical data and feature scaling. These are implemented as
follows.
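For example, a data-cleaning step for missing values might look like this (the column names and readings are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical raw fragment with missing readings.
raw = pd.DataFrame({'moisture': [30.0, np.nan, 20.0, 50.0],
                    'ph':       [6.5, 7.0, np.nan, 6.8]})

# One common cleaning choice: fill numeric gaps with the column mean.
clean = raw.fillna(raw.mean())
```

Dropping incomplete rows with `dropna()` is the other common choice, at the cost of losing samples.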
Label Encoding
The data should be made ready for the model before using it. To convert categorical text data
into model-understandable numerical data, this project uses the LabelEncoder class; the first
step is to label-encode the first column. The scikit-learn library contains LabelEncoder:
its fit() and transform() methods are applied to the data, and the existing data is then
replaced with the newly transformed data.
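A minimal example of this, with made-up soil values:

```python
from sklearn.preprocessing import LabelEncoder

soil = ['clay', 'loam', 'sandy', 'loam']  # illustrative column values
le = LabelEncoder()
encoded = le.fit_transform(soil)  # classes are sorted, then numbered from 0
```

Here `clay`, `loam`, and `sandy` become 0, 1, and 2 respectively, so the column is numeric and model-ready.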
One hot Encoding
One-hot encoding converts categorical variables into a form that can be provided to machine
learning algorithms so that they do a better job in prediction. The reason for using a one-hot
encoder is to perform "binarization" of the category and make it a feature to guide the model.
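The same binarization can be seen compactly with pandas' get_dummies (the project itself uses scikit-learn's OneHotEncoder; this is just an illustration of the result, with made-up values):

```python
import pandas as pd

df = pd.DataFrame({'soil_type': ['clay', 'loam', 'sandy', 'loam']})
onehot = pd.get_dummies(df, columns=['soil_type'])  # one binary column per category
```

Each row now has exactly one 1 among the three new binary columns, so no spurious ordering is imposed on the categories.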
Standard Scaler
The scikit-learn package provides StandardScaler. This work fits and transforms the
StandardScaler on the training data; to keep the scaling consistent, the same fitted scaler
is then used to transform/scale the test data. The independent features present in the data
can thus be standardized using feature scaling.
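A small sketch of fitting on the training data and reusing that fit on the test data (the numbers are invented):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[10.0], [20.0], [30.0]])  # illustrative training feature
X_test = np.array([[25.0]])

sc = StandardScaler()
X_train_s = sc.fit_transform(X_train)  # statistics learned from the train set only
X_test_s = sc.transform(X_test)        # the same fitted statistics reused here
```

Fitting only on the training data avoids leaking test-set statistics into the model.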
Figure 4.9: Module 2
training data. It means the model could not be generalized into new data. It could also happen when
the system fits a linear model to data that is not linear. It almost goes without saying that this model
will have a low predictive ability on training data and cannot be generalized to other data.
4.3.4 Predicting Farmland
This project has chosen the DNN (Deep Neural Network), LDA, and Random Forest algorithms for
prediction. Neural networks use randomness intentionally to ensure they effectively learn the
function being approximated. Randomness is employed because this class of machine learning
algorithms performs better with it than without; the most common kind of randomness used in
neural networks is the random initialization of the network weights.
A DNN can be implemented as a sequential classifier by adding layers: the add and Dense
methods are used to add layers. The compile method configures the model with the adam
optimizer, binary cross-entropy loss, and accuracy as the metric. The fit method passes the
input to the model, and the epoch count sets the number of iterations. Similarly, the LDA
and Random Forest algorithms are implemented, and the respective confusion matrix and
accuracy of each algorithm are produced.
4.3.5 The Evaluation of Model Performance
Evaluation of the performance of the machine learning model is done by measuring the accuracy
of the algorithms under supervised learning. Another method is the confusion matrix, which
compares the actual data and the predicted data: it contains the predicted outcomes for the
Y label compared with the actual outputs in the Y test data.
There are various ways to check the performance of a machine learning model. 1) Confusion
matrix - for simplicity, things are mostly discussed in terms of a binary classification
problem; some common terms to be clear about are: True positives (TP): predicted positive and
actually positive. False positives (FP): predicted positive and actually negative. True
negatives (TN): predicted negative and actually negative. False negatives (FN): predicted
negative and actually positive. A confusion matrix is just a representation of the above
parameters in matrix format, and better visualization is always good. 2) Accuracy - the most
commonly used metric to judge a model, though not always a clear indicator of performance.
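A small worked example of both metrics, with invented labels:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Toy actual vs. predicted labels (illustrative only).
y_test = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

cm = confusion_matrix(y_test, y_pred)  # rows: actual class, columns: predicted
# For binary labels {0, 1} the layout is:
# [[TN, FP],
#  [FN, TP]]
acc = accuracy_score(y_test, y_pred)   # (TP + TN) / total
```

Here TN=2, FP=1, FN=1, TP=2, so the accuracy is 4/6, which illustrates why accuracy alone can hide which kind of error dominates.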
Chapter 5
IMPLEMENTATION AND TESTING
In the implementation stage of the project, the design is turned into a complete working
system; it is therefore the key stage in achieving the complete system. This stage involves
planning and examining the existing system and its limitations. It is a very crucial stage,
when the theoretical design of the model is turned into a working system, and the most
important stage in making the new system effective and efficient.
a general dataset alongside a testing set. As a rule, the better the training data, the better
the algorithm or classifier performs.
5.2 Testing
The purpose of testing is to find mistakes. It gives an approach to check the usefulness of
modules and the performance of the entire system. Testing is used for finding the model
accuracy or model performance, and it identifies the errors present in the entire system.
import numpy as np
import pandas as pd
import matplotlib
import theano
import tensorflow
import keras
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
dataset = pd.read_csv('C://Users//Prasad//Desktop//Agri//Final//SoilDataset.csv')
X = dataset.iloc[:, 3:32].values
y = dataset.iloc[:, 32].values
Test result
Module interface testing checks whether or not the data is properly flowing into this system
unit and properly out of it.
Input
# (earlier lines of this listing are lost in the source)
import pandas as pd
import matplotlib
import theano
import tensorflow
import keras
dataset = pd.read_csv('C://Users//Prasad//Desktop//Agri//Final//SoilDataset.csv')
X = dataset.iloc[:, 3:32].values
y = dataset.iloc[:, 32].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_0 = LabelEncoder()
X[:, 0] = labelencoder_X_0.fit_transform(X[:, 0])
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
labelencoder_X_3 = LabelEncoder()
X[:, 3] = labelencoder_X_3.fit_transform(X[:, 3])
labelencoder_X_4 = LabelEncoder()
X[:, 4] = labelencoder_X_4.fit_transform(X[:, 4])
Test result
If the dataset is loaded properly, label encoding is done for every column in the dataset. If
the dataset is not loaded properly, an error will occur, failing the integration between the
two tasks, i.e., loading the dataset and label encoding.
5.3.3 Functional testing
Input
Test Result
It checks whether the information passed to the function as an argument works properly. In the
code below, train_test_split is the function used; it will throw an error if the information
is not passed properly.
import keras
from keras.models import Sequential
from keras.layers import Dense
classifier = Sequential()
classifier.add(Dense(output_dim=6, init='uniform', activation='relu', input_dim=1422))
classifier.add(Dense(output_dim=6, init='uniform', activation='relu'))
classifier.add(Dense(output_dim=1, init='uniform', activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, batch_size=10, nb_epoch=100)
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
cm = confusion_matrix(y_test, y_pred)
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
X = X.astype(int)
chi2_features = SelectKBest(chi2, k=2)
X_kbest_features = chi2_features.fit_transform(X, y)
print('Original feature number:', X.shape[1])
print('Reduced feature number:', X_kbest_features.shape[1])
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components=1)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)
y_pred = lda.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy for LDA ' + str(accuracy_score(y_test, y_pred)))
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(max_depth=2, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy for Random Forest ' + str(accuracy_score(y_test, y_pred)))
5.3.6 Test Result
From the results, LDA performs best, with higher accuracy than the Random Forest
algorithm.
Chapter 6
RESULTS AND DISCUSSIONS
1 i m p o r t numpy as np
2 i m p o r t panda s as pd
3 import matpl ot lib
4
5 import theano
6 import tensorflow
7 import keras
8
12
13
14
15
16
17
18
19
20
21
22
23
24
25
31
26 p l t . show ( )
27 ’’’
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Label-encode each of the 29 categorical columns, keeping one encoder
# per column so the integer codes can be mapped back to labels later
label_encoders = {}
for col in range(29):
    label_encoders[col] = LabelEncoder()
    X[:, col] = label_encoders[col].fit_transform(X[:, col])
from keras.models import Sequential
from keras.layers import Dense

# Feed-forward network: two hidden ReLU layers and a sigmoid output
# for the binary prediction
classifier = Sequential()
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu',
                     input_dim=1422))
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy',
                   metrics=['accuracy'])
classifier.fit(X_train, y_train, batch_size=10, epochs=100)
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy for ANN ' + str(accuracy_score(y_test, y_pred)))
from sklearn.feature_selection import SelectKBest, chi2

# Keep the two features with the highest chi-squared scores
X = X.astype(int)
chi2_features = SelectKBest(chi2, k=2)
X_kbest = chi2_features.fit_transform(X, y)
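To make the effect of SelectKBest concrete, here is a tiny worked example on hypothetical count data (the 4x3 matrix below is invented for illustration): chi2 scores each column against the labels, and k=2 keeps the two columns most dependent on the class.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical non-negative count features; column 2 carries no class signal
X_toy = np.array([[1, 9, 0],
                  [2, 8, 1],
                  [8, 1, 0],
                  [9, 2, 1]])
y_toy = np.array([0, 0, 1, 1])

selector = SelectKBest(chi2, k=2)
X_toy_kbest = selector.fit_transform(X_toy, y_toy)

print(selector.get_support())  # which columns survive
print(X_toy_kbest.shape)
```

Columns 0 and 1 are kept; the uninformative column 2 scores zero and is dropped.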
cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy for LDA ' + str(accuracy_score(y_test, y_pred)))

cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy for Random Forest ' + str(accuracy_score(y_test, y_pred)))
Output
Figure 6.2: Final epochs of training the data
The network is trained for N iterations (epochs) over the data; in our case, N = 100.
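What one epoch means can be sketched without Keras: the loop below (a hand-rolled logistic-regression fit on invented data, not the report's network) makes one full pass over the training set per iteration and repeats it N = 100 times.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)

w = np.zeros(3)
b = 0.0
lr = 0.1
for epoch in range(100):              # N = 100 passes over the data
    z = X_demo @ w + b
    p = 1.0 / (1.0 + np.exp(-z))      # sigmoid, as in the output layer
    w -= lr * X_demo.T @ (p - y_demo) / len(y_demo)
    b -= lr * np.mean(p - y_demo)

pred = (1.0 / (1.0 + np.exp(-(X_demo @ w + b))) > 0.5)
acc = float(np.mean(pred == y_demo))
print('training accuracy after 100 epochs:', acc)
```

Each epoch nudges the weights along the gradient; stacking 100 of them is what the Keras `fit` call with `epochs=100` does, batch by batch.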
Figure 6.3: Accuracy and Confusion Matrix
Figure 6.3 shows the reduced feature count, accuracy, and confusion matrix for LDA and
Random Forest. The accuracy of LDA is higher than that of Random Forest.
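How the two reported numbers relate can be seen on a toy label vector (the eight labels below are invented): the confusion matrix counts each true/predicted pair, and accuracy is the sum of the diagonal divided by the total.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_hat  = [1, 0, 0, 1, 0, 1, 1, 0]

cm_demo = confusion_matrix(y_true, y_hat)
acc_demo = accuracy_score(y_true, y_hat)
print(cm_demo)                 # rows: true class, columns: predicted class
print('Accuracy:', acc_demo)   # (3 + 3) correct out of 8
```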
Chapter 7
7.1 Conclusion
The dataset was preprocessed and the models were trained on it. The predictions on the
training data were checked against the training labels, and the full pipeline was then
verified again on the held-out test inputs and outputs. The accuracy of the DNN was also
verified; it works well for this dataset and predicts the outcome accurately.