
PREDICTING THE FARMLAND FOR
AGRICULTURE FROM SOIL FEATURES USING
MACHINE LEARNING

Project report submitted
in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
By

STUDENT NAME-1 (Reg No-1)
STUDENT NAME-2 (Reg No-2)
STUDENT NAME-3 (Reg No-3)
STUDENT NAME-4 (Reg No-4)

Under the guidance of


Ms. N. Sivaranjani, B.E., M.E.,
ASSISTANT PROFESSOR

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SCHOOL OF COMPUTING
BHARATH INSTITUTE OF HIGHER EDUCATION AND RESEARCH
(Deemed to be University Estd u/s 3 of UGC Act, 1956)
CHENNAI 600 073, TAMILNADU, INDIA
March, 2021
CERTIFICATE

It is certified that the work contained in the project report titled “PREDICTING
THE FARMLAND FOR AGRICULTURE FROM SOIL FEATURES USING
MACHINE LEARNING” by STUDENT NAME-1 (Reg No-1), STUDENT
NAME-2 (Reg No-2), STUDENT NAME-3 (Reg No-3), STUDENT NAME-4 (Reg
No-4) has been carried out under my/our supervision and that this work has not
been submitted elsewhere for a degree.

Signature of Supervisor
Ms. N. Sivaranjani
Assistant Professor
Computer Science & Engineering
School of Computing
Bharath Institute of Higher Education and Research
March, 2021

Signature of Head of the Department

Dr. B. Persis Urbana Ivy


Professor & Head
Computer Science & Engineering
School of Computing
Bharath Institute of Higher Education and Research
March, 2021

DECLARATION

We declare that this written submission represents our ideas in our own words and
where others’ ideas or words have been included, we have adequately cited and
referenced the original sources. We also declare that we have adhered to all
principles of academic honesty and integrity and have not misrepresented or
fabricated or falsified any idea/data/fact/source in our submission. We understand
that any violation of the above will be cause for disciplinary action by the Institute
and can also evoke penal action from the sources which have thus not been properly
cited or from whom proper permission has not been taken when needed.

(Signature)
(STUDENT NAME-1)
Date: / /

(Signature)
(STUDENT NAME-2)
Date: / /

(Signature)
(STUDENT NAME-3)
Date: / /

(Signature)
(STUDENT NAME-4)
Date: / /

APPROVAL SHEET

This project report entitled “PREDICTING THE FARMLAND FOR AGRICULTURE
FROM SOIL FEATURES USING MACHINE LEARNING” by STUDENT NAME-1 (Reg No-1),
STUDENT NAME-2 (Reg No-2), STUDENT NAME-3 (Reg No-3), STUDENT NAME-4
(Reg No-4) is approved for the degree of B. Tech in Computer Science & Engineering.

Examiners Supervisor

Ms. N. SIVARANJANI, B.E., M.E.,

Date: / /
Place:

ACKNOWLEDGEMENT

First, we wish to thank the almighty who gave us good health and success throughout our project
work.

We express our deepest gratitude to our beloved President Dr. J. Sundeep Aanand and
Managing Director Dr. E. Swetha Sundeep Aanand for providing us the necessary facilities for the
completion of our project.

We take great pleasure in expressing sincere thanks to Vice Chancellor Dr. K. Vijaya Baskar
Raju, Pro Vice Chancellor (Academic) Dr. M. Sundararajan, Registrar Dr. S. Bhuminathan and
Additional Registrar Dr. R. Hari Prakash for backing us in this project.

We thank our Dean Engineering Dr. J. Hameed Hussain for providing sufficient facilities for
the completion of this project.

We thank our Dean, School of Computing, Dr. S. Neduncheliyan for his encouragement and
valuable guidance throughout the project.

We record our indebtedness to our Head, Department of Computer Science and Engineering,
Dr. B. Persis Urbana Ivy, for the immense care and encouragement shown towards us throughout
the course of this project.

A special thanks to our Project Coordinator Mr. N. Nithiyanandam for his valuable guidance
and support throughout the course of the project.

We also take this opportunity to express a deep sense of gratitude to our Internal Supervisor
Ms. N. Sivaranjani for her cordial support, valuable information, and guidance, which helped us
complete this project through its various stages.

We thank our department faculty, supporting staff and friends for their help and guidance to
complete this project.
M. USHA SRI (U17CN011)
I. LAKSHMI PRASANNA (U17CN014)
E. HEMA (U17CN023)
O. DHANA LAKSHMI (U17CN028)

ABSTRACT

Agriculture has been a crucial part of human society because the growth of civilization
depends directly on it. Advances in agriculture are necessary to raise food production as the
population increases day by day. We need to explore innovative methods for improving crop
selection based on the soil's texture, type, moisture, temperature, humidity, erosion, and slope.
Machine learning can help us learn from the data, analyze its fields, and predict future outcomes.
Supervised machine learning techniques are used to predict farmland: data on India's agricultural
land across all its states is used with machine learning algorithms to predict whether a given
piece of land is farmland or not, so that resources can be made available beforehand.

Keywords: Agriculture, Food Production, Supervised Machine Learning, Machine Learning Algorithms.

LIST OF FIGURES

4.1 General Architecture Diagram
4.2 Data Flow Diagram of Level-0
4.3 Data Flow Diagram of Level-1
4.4 UML Diagram
4.5 Use Case Diagram
4.6 Collaboration Diagram
4.7 Sequence Diagram
4.8 Module 1
4.9 Module 2
4.10 Module 3
4.11 Module 4
4.12 Module 5
5.1 Test Image
6.1 Epochs Starting of Training the data
6.2 Epochs Ending of Training the data
6.3 Accuracy and Confusion Matrix
6.4 Variable Explorer of data

LIST OF ACRONYMS AND ABBREVIATIONS

AI Artificial Intelligence

ANN Artificial Neural Network

CNN Convolutional Neural Network

CSV Comma Separated Values

CYP Crop Yield Prediction

DFD Data Flow Diagram

DNN Deep Neural Network

FN False Negative

FP False Positive

IoT Internet of Things

KNN K-Nearest Neighbors

LDA Linear Discriminant Analysis

ML Machine Learning

RF Random Forest

SVM Support Vector Machine

TP True Positive

TN True Negative

TABLE OF CONTENTS

ABSTRACT

LIST OF FIGURES

LIST OF ACRONYMS AND ABBREVIATIONS

1 INTRODUCTION
1.1 Introduction
1.2 Aim of the project
1.3 Project Domain
1.4 Problem Statement
1.5 Scope of the Project
1.6 Methodology

2 LITERATURE REVIEW

3 PROJECT DESCRIPTION
3.1 Existing System
3.2 Proposed System
3.3 Feasibility Study
3.3.1 Economic Feasibility
3.3.2 Technical Feasibility
3.3.3 Social Feasibility
3.4 System Specification
3.4.1 Hardware Specification
3.4.2 Software Specification
3.4.3 Standards and Policies

4 MODULE DESCRIPTION
4.1 General Architecture
4.2 Design Phase
4.2.1 Data Flow Diagram
4.2.2 UML Diagram
4.2.3 Use Case Diagram
4.2.4 Collaboration Diagram
4.2.5 Sequence Diagram
4.3 Module Description
4.3.1 Data Preprocessing
4.3.2 Data Visualization and descriptive statistics
4.3.3 Data Splitting
4.3.4 Predicting Admissions
4.3.5 The Evaluation of Model Performance

5 IMPLEMENTATION AND TESTING
5.1 Input and Output
5.1.1 Input Design
5.1.2 Output Design
5.2 Testing
5.3 Types of Testing
5.3.1 Unit testing
5.3.2 Integration testing
5.3.3 Functional testing
5.3.4 White Box Testing
5.3.5 Black Box Testing
5.3.6 Test Result
5.4 Testing Strategy

6 RESULTS AND DISCUSSIONS
6.1 Efficiency of the Proposed System
6.2 Comparison of Existing and Proposed System
6.3 Advantages of the Proposed System
6.4 Sample Code

7 CONCLUSION AND FUTURE ENHANCEMENTS
7.1 Conclusion
7.2 Future Enhancements

References
Chapter 1

INTRODUCTION

1.1 Introduction
Agriculture has long been one of the leading sectors in India. In the past, people farmed crops
on their own land, and these naturally grown crops were consumed by humans, animals, and birds.
The introduction of new technologies in agriculture is slowly degrading these traditional
practices: with so many innovations in the sector, people concentrate more on cultivating hybrid
products, which ultimately leads to unhealthy lifestyles and poor nutrition. Many people today
are also unaware of how crops are farmed. Changes in farming seasons, climatic conditions, and
soil features such as water availability and soil erosion all affect cultivation techniques.
Even after examining all these issues, such as temperature, weather, and several other factors,
there is still no complete solution or technology to overcome the situation faced by farmers. In
India, the economic growth of agriculture can be increased in many ways, and there are different
ways to increase crop yield and improve crop quality. Machine learning algorithms can be used to
predict crop yield based on soil features. A supervised classification technique is used here, and
its accuracy is measured. In supervised learning, the system is given both the inputs and their
known outputs, and the data is split into training and testing sets: the training data is used to
train the system, and the testing data is used to measure its accuracy. Three algorithms are used,
and their accuracies are compared:
1) DNN - Deep Neural Network

2) Random Forest

3) Linear Discriminant Analysis
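The train/test workflow described above can be sketched in plain Python. This is a minimal, dependency-free illustration, not the report's code: `train_test_split`, `accuracy`, and `best_model` are hypothetical helpers (scikit-learn provides `train_test_split` and `accuracy_score` for the same jobs), and two fixed threshold rules on a made-up `[moisture, slope]` table stand in for the DNN, Random Forest, and LDA models, which are not reproduced here.

```python
import random


def train_test_split(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle sample indices with a fixed seed, then split rows and labels."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_model(scores):
    """Return the name of the model with the highest test accuracy."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy data: [moisture, slope]; label 1 = farmland, 0 = not.
    X = [[0.8, 2], [0.7, 3], [0.2, 15], [0.1, 20], [0.9, 1],
         [0.3, 12], [0.6, 4], [0.2, 18], [0.75, 2], [0.15, 16]]
    y = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.3)

    # Hypothetical fixed rules standing in for trained models.
    models = {
        "moisture_rule": lambda row: int(row[0] >= 0.5),
        "slope_rule": lambda row: int(row[1] <= 10),
    }
    scores = {name: accuracy(y_test, [m(r) for r in X_test])
              for name, m in models.items()}
    print(scores, "->", best_model(scores))
```

The fixed seed makes the split reproducible, so every model is scored on exactly the same test rows, which is what makes the accuracies comparable.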

The models developed could be implemented in decision support tools that provide a snapshot of
an area and classify whether agriculture can be practiced there. Random Forest is implemented
alongside the other proposed algorithms to predict the data and improve accuracy.

1.2 Aim of the project
The aim of the project is to use machine learning to determine whether a particular piece of land
has suitable conditions for agriculture, to improve the accuracy of the algorithms, and to compare
the three algorithms to find the best among them.

1.3 Project Domain


Machine learning gives computers the ability to learn without being explicitly programmed, and it
is one of the most exciting technologies one could come across. In general, machine learning is a
type of artificial intelligence that uses algorithms to extract patterns from unprocessed raw data.
Its key idea is to let the computer learn from past experience without being explicitly programmed
and without human involvement; in effect, computers program themselves. If conventional
programming is automation, machine learning automates the process of automation. In conventional
programming, data and a program are run on the computer to produce the required output, whereas
in machine learning, data and the expected output are run on the computer to create a program (the
trained model), which can then be used like a conventionally written program. Machine learning is
subdivided into three types:
i. Supervised Learning.
ii. Unsupervised Learning.
iii. Reinforcement Learning.
Supervised learning can be thought of as learning guided by a teacher. A labelled dataset acts as
the teacher, and its task is to train the model. Once training is complete, the model is ready to
make predictions.
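The "data plus output in, program out" idea can be made concrete with a tiny supervised learner. The sketch below is an illustrative assumption, not the report's method: `fit_stump` is a hypothetical helper that learns a single-feature threshold rule (a decision stump, i.e. a depth-one decision tree) from labelled rows, and the returned `predict` function plays the role of the learned "program". The `[moisture, slope]` rows are made up.

```python
def fit_stump(rows, labels):
    """Learn a one-feature threshold rule (decision stump) from labelled rows.

    Every feature, every observed value as a threshold, and both directions
    are tried; the rule with the fewest training errors is kept.
    """
    best = None  # (errors, feature, threshold, above_is_positive)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            for above_is_positive in (True, False):
                errors = sum(
                    int((r[f] >= threshold) == above_is_positive) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, above_is_positive)

    _, f, threshold, above_is_positive = best

    def predict(row):
        # 1 = suitable (farmland), 0 = not suitable
        return int((row[f] >= threshold) == above_is_positive)

    return predict, (f, threshold, above_is_positive)


# Hypothetical labelled "teacher" data: [moisture, slope] -> farmland label.
rows = [[0.8, 2], [0.7, 3], [0.2, 15], [0.1, 20]]
labels = [1, 1, 0, 0]
predict, rule = fit_stump(rows, labels)
print("learned rule (feature, threshold, above_is_positive):", rule)
print(predict([0.9, 1]), predict([0.3, 12]))
```

Nobody wrote the rule "moisture >= 0.7 means farmland" by hand; it was derived from the inputs and their labels, which is the contrast with conventional programming drawn above. Tree ensembles such as Random Forest combine many deeper trees built from splits of this kind.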

1.4 Problem Statement


Because of illiteracy among farmers, it is difficult for them to select profit-yielding crops based
on soil features. This ultimately wastes the investor's money, manpower, and time. Advances in
agriculture are therefore necessary to balance demand and supply as the population increases day
by day. Innovative methods need to be explored to improve the accuracy of deciding which crops can
be grown on a given farmland, based on the soil's texture, type, and moisture. Machine learning can
help in learning from, analyzing, and predicting the data for the future.

1.5 Scope of the Project
The scope of the project is to identify farmlands based on the agricultural conditions of all
types of soil features. It requires many inputs, such as temperature, soil conditions, erosion
conditions, and humidity; these inputs and the corresponding outputs are collected into a dataset
that contains all the information about agricultural lands needed for the project. The agriculture
sector is needed to balance supply as population and demand increase day by day. Agriculture
involves managing land parcels and raising their productivity, and it serves not only today's
society but also future generations. The project aims to give users valuable information about
profit-yielding farmlands based on soil features.

1.6 Methodology
The data is imported, cleaned, and standardized. In the preprocessing stage, label encoding and
one-hot encoding are applied to convert the data into a machine-readable form. The data is then
split into training and testing sets, and standard scaling is applied to both. Next, the data is
passed to the classifiers, whose accuracy is measured and whose confusion matrices are obtained.
Different algorithms yield different accuracies.
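The preprocessing steps named above (label encoding, one-hot encoding, standard scaling) can be sketched with the standard library alone. This is an illustrative assumption, not the report's code: the column values are made up, and `label_encode`, `one_hot`, and `SimpleStandardScaler` are hypothetical helpers (scikit-learn's `LabelEncoder`, `OneHotEncoder`, and `StandardScaler` cover the same ground). The scaler is fit on the training split only and then applied to both splits, a common way to keep test-set statistics out of training.

```python
from statistics import mean, pstdev


def label_encode(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot(codes, n_categories):
    """Expand integer codes into 0/1 indicator vectors of length n_categories."""
    return [[1 if i == c else 0 for i in range(n_categories)] for c in codes]


class SimpleStandardScaler:
    """Scale each column to zero mean and unit variance using training statistics."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [mean(col) for col in columns]
        # Population standard deviation; a constant column gets 1.0 to avoid /0.
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        return [[(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
                for row in rows]


# Hypothetical soil-texture column and numeric [moisture, slope] columns.
texture = ["clay", "loam", "sandy", "loam"]
codes, mapping = label_encode(texture)
texture_onehot = one_hot(codes, len(mapping))

train_numeric = [[1.0, 10.0], [3.0, 10.0]]
test_numeric = [[2.0, 12.0]]
scaler = SimpleStandardScaler().fit(train_numeric)  # statistics from training data only
train_scaled = scaler.transform(train_numeric)
test_scaled = scaler.transform(test_numeric)
print(codes, texture_onehot, train_scaled, test_scaled)
```

One-hot vectors are used instead of the raw integer codes so that a model does not read an order ("sandy" > "loam" > "clay") into categories that have none.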

Chapter 2
LITERATURE REVIEW

This chapter gives an overview of the literature survey and presents some of the relevant work done
by researchers. Researchers have studied many existing techniques for the agriculture prediction
problem; we review some of them below.
The author in [1] explains that machine learning deals with problems where the relationship between
input and output variables is unknown or hard to obtain. This characteristic makes it useful for
modelling complex non-linear behaviour, such as the function underlying crop yield prediction, and
machine learning techniques have been applied most successfully to Crop Yield Prediction (CYP).
Training continues until the model achieves a desired level of accuracy on the training data.
Examples of supervised learning include regression, decision trees, Random Forest, and KNN. This
helps farmers grow the right crops on the required land and understand the precipitation and the
maximum and minimum temperatures of that area. The paper concentrates on predicting the most
profitable crop that can be grown on agricultural land using machine learning techniques. It also
includes an Android system that offers farmers real-time crop analysis using various weather reports
and soil quality, so that farmers can grow the most profitable crop in the most appropriate months.
The author in paper [2] discusses developing accurate models for crop yield estimation using
Information and Communication Technologies, which may help farmers and other stakeholders improve
decision-making regarding national food imports/exports and food security. The paper also examines
the performance of the RF and MLC methods.
The author in [3] describes how recent agricultural data can be used for long-run prediction of
crops and yield. The work also advises farmers on what type of crop can be grown using weather-station
data and provides appropriate information on the correct season for good farming. The data mining
techniques are discussed thoroughly.
The author in [4] describes the vital role played by data mining techniques in the agricultural field
and presents various ML algorithms such as Random Forest, SVM, and ANN. Crops were predicted mainly
based on climatic features, giving an accuracy score of about 95% with the C4.5 algorithm.

The main objective of paper [5] is to analyze the main soil properties, such as organic matter,
essential plant nutrients, and micronutrients, that affect the growth of crops. A Back Propagation
Network (BPN) finds and suggests the correct correlation percentage among those properties. The
machine learning system is split into three steps: sampling, the Back Propagation algorithm, and
weight updating. The performance of the Back Propagation Neural Network model is evaluated using a
test data set.

The author in paper [6] notes that Latent Dirichlet Allocation (LDA) topic models are increasingly
being used in communication research. Yet questions about the reliability and validity of the
approach have so far received little attention. In applying LDA to text data, researchers need to
tackle at least four major challenges that affect these criteria: (a) appropriate preprocessing of
the text collection; (b) adequate selection of model parameters, including the number of topics to
be generated; (c) evaluation of the model's reliability; and (d) the process of validly interpreting
the resulting topics. The author reviews the research literature dealing with these questions,
proposes a method that addresses these challenges, and develops a brief hands-on user guide for
applying LDA topic modeling. The value of the approach is demonstrated with empirical data from an
ongoing research project.
The innovation of paper [7] lies in combining CNN-based learning methods for producing geo-objects
with tree-based learning methods for mapping soil properties. To improve the precision of predicting
soil properties in a geographic space, the author developed a novel geo-object-based soil property
mapping procedure using machine learning algorithms.

The author in paper [8] discusses the potential of wireless sensors and IoT in agriculture, as well
as the challenges expected when integrating this technology with conventional farming practices. The
paper identifies the current and future trends of IoT in agriculture and highlights potential
research challenges.

The author in paper [9] states the requirements and planning needed to develop a software model for
precision farming, and studies the fundamentals of precision farming in depth. The main objective of
the model is to deliver direct advisory services to even the smallest farmer, down to his/her
smallest plot of crop, using the most accessible technologies such as SMS and email. The model has
been designed for the scenario in Kerala State, where the average holding size is much smaller than
in most of India; hence the model can be deployed elsewhere in India with only some modifications.
The author in paper [10] states that soil is a crucial ingredient of agriculture. There are many
kinds of soil, each with different features, and different types of crops grow on different types
of soil. Users need to know the features and characteristics of various soil types to understand
which crops grow better in them. Machine learning techniques can help here; the field has progressed
a great deal in recent years, yet it remains an emerging and challenging research area in
agricultural data analysis. The author proposes a model that can easily suggest suitable crops.
Several machine learning algorithms, such as weighted k-Nearest Neighbor (k-NN), logistic regression,
and DNN-based Support Vector Machines (SVM), are used for soil classification.

Chapter 3
PROJECT DESCRIPTION

3.1 Existing System


The existing system is based on crop yield prediction using data mining association rules. Data
mining is used to extract information from a huge dataset. The input is given in the form of a
dataset with the different fields required for prediction. The dataset is first preprocessed to
eliminate unwanted data. After preprocessing, the data is clustered using the k-means clustering
algorithm. The clustered data is converted to 0s and 1s and then used for association rule mining,
where rules are created by frequent pattern mining with the Apriori algorithm. This system predicts
crop yield based on past data, but it does not compare different algorithms.
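The association-rule step of the existing system can be sketched as follows. This is a minimal Apriori implementation written for illustration, not the existing system's actual code; it assumes the k-means clustering and 0/1 conversion have already produced, for each record, the set of attributes whose value is 1. The attribute names and records are hypothetical, and `apriori` and `association_rules` are illustrative helper names.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update((c, support(c)) for c in current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical binarized records: attributes whose value was 1 after clustering.
records = [{"irrigated", "loamy", "high_yield"}, {"irrigated", "loamy"},
           {"irrigated", "high_yield"}, {"loamy", "high_yield"},
           {"irrigated", "loamy", "high_yield"}]
freq = apriori(records, min_support=0.5)
for lhs, rhs, conf in association_rules(freq, min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

The prune step relies on the Apriori property: if an itemset is infrequent, every superset is too, so candidates with an infrequent subset can be discarded without counting them.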

3.2 Proposed System


The proposed system predicts whether the land is suitable for farming or not and measures the
accuracy of that prediction; land predicted to be suitable can be used to grow crops. The system is
loaded with soil datasets containing fields such as area, texture, irrigation facilities, rotation,
yield, soil erosion, wind erosion, slope, and removal. Feature extraction, chi-square feature
selection, and scaling are applied, which reduces the noisy features in the dataset and optimizes the
features for the system to process. Accuracy is then developed and improved with algorithms such as
DNN, Random Forest, and Linear Discriminant Analysis, and the best of the three is identified.
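The chi-square feature-selection step can be sketched as below. This sketch assumes non-negative feature values (such as counts or encoded categories) and follows the same observed-versus-expected formulation as scikit-learn's `chi2` scorer; whether the project used that exact scorer is not stated, and `chi2_scores` and `select_k_best` are hypothetical helper names applied to made-up columns.

```python
def chi2_scores(rows, labels):
    """Chi-square statistic of each non-negative feature against the class label.

    For feature j and class c:
      observed = sum of feature j over the samples of class c
      expected = (share of samples in class c) * (sum of feature j over all samples)
    score_j = sum over classes of (observed - expected) ** 2 / expected
    """
    n = len(rows)
    classes = sorted(set(labels))
    share = {c: labels.count(c) / n for c in classes}
    scores = []
    for j in range(len(rows[0])):
        total = sum(r[j] for r in rows)
        score = 0.0
        for c in classes:
            observed = sum(r[j] for r, y in zip(rows, labels) if y == c)
            expected = share[c] * total
            if expected > 0:  # an all-zero feature carries no information
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(rows, labels, k):
    """Keep the k highest-scoring feature columns (in their original order)."""
    scores = chi2_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[r[j] for j in keep] for r in rows]


# Hypothetical non-negative features: [irrigation_available, constant_column].
rows = [[1, 5], [1, 5], [0, 5], [0, 5]]
labels = [1, 1, 0, 0]
scores = chi2_scores(rows, labels)
kept, reduced = select_k_best(rows, labels, k=1)
print(scores, kept)
```

A feature that takes the same value in every class (the constant column) matches its expected counts exactly and scores zero, which is how chi-square selection discards noisy or uninformative features.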
Advantages
Using the Random Forest, Support Vector Machine, and Deep Neural Network algorithms, we compared the
accuracy and performance of these models using a confusion matrix and random simulation.
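Because the comparison rests on confusion matrices, a short sketch of how one is tallied for the binary farmland label may help. The helper names are hypothetical and the labels are made up; the 2×2 layout (rows = actual class, columns = predicted class) matches the convention of scikit-learn's `confusion_matrix` for labels 0 and 1, and the counts use the TP, FP, FN, and TN terms from the acronym list.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary (farmland / not farmland) task."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1
        else:
            counts["FN" if t == positive else "TN"] += 1
    return counts


def as_matrix(c):
    """2x2 layout with rows = actual class (0, 1) and columns = predicted class."""
    return [[c["TN"], c["FP"]], [c["FN"], c["TP"]]]


def summary(c):
    """Accuracy, precision and recall derived from the confusion counts."""
    total = sum(c.values())
    predicted_pos = c["TP"] + c["FP"]
    actual_pos = c["TP"] + c["FN"]
    return {
        "accuracy": (c["TP"] + c["TN"]) / total,
        "precision": c["TP"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["TP"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical test labels and predictions (1 = farmland, 0 = not farmland).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(as_matrix(cm), summary(cm))
```

Accuracy alone can hide which kind of mistake a model makes; the separate FP and FN counts show whether unsuitable land is being labelled as farmland or the reverse.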

3.3 Feasibility Study


The feasibility of the project is analyzed in this section, and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis, a
feasibility study of the proposed system is carried out to ensure that the proposed system is not a
burden to the company or any institution. Feasibility analysis requires some understanding of the
major requirements of the system. Three key considerations are involved in the feasibility analysis
for the project:
1. Economic Feasibility.
2. Technical Feasibility.
3. Social Feasibility.
3.3.1 Economic Feasibility
This study checks that the system is economically feasible. The system uses the Keras library and is
built with open-source tools such as Anaconda Navigator, Spyder, and Python, so completing it requires
little cost.

3.3.2 Technical Feasibility


This study checks that the system is technically feasible: provided a dataset is available, a person
with technical knowledge can use the system to make predictions and measure their accuracy.
3.3.3 Social Feasibility
This study concerns the input provided to the system. The dataset is gathered from various sources,
such as the web and sensor data from IoT devices, which helps ensure that the data is correct.

3.4 System Specification


System requirements are one of the most important parts of the analysis phase of the project. The
analyst must properly analyze the hardware and software requirements; otherwise, the project designer
will face more trouble with the hardware and software later on. The hardware and software
specifications used in the project are given below.

3.4.1 Hardware Specification


The hardware requirements may serve as the basis for a contract for the implementation of the system
and should therefore be a complete and consistent specification of the whole system. Software
engineers use them as the starting point for system design. They show what the system does, not how
it should be implemented.
• SYSTEM: INTEL CORE
• RAM: 4 GB
• MONITOR: 15” LCD
• HARD DISK: 120 GB

3.4.2 Software Specification


The software requirements document is the software specification of the system. It should include
both a definition and a specification of the requirements for the project. It states what the system
should do rather than how it should do it. The software specification provides a basis for creating
the software requirements specification for the project, and it is useful for estimating cost,
planning team activities, performing project tasks, and tracking the team's progress throughout
development.
• OPERATING SYSTEM: WINDOWS 10
• PROGRAMMING LANGUAGE: PYTHON
• TOOLS: ANACONDA, SPYDER
• LIBRARY: KERAS

3.4.3 Standards and Policies


Anaconda
Anaconda is a Python-based data processing and scientific computing platform with many useful
third-party libraries built in. Installing Anaconda automatically installs Python and some commonly
used libraries such as NumPy, Pandas, SciPy, and Matplotlib, which makes installation easier than a
regular Python installation. Conda works at the command-line interface, for example the Anaconda
Prompt on Windows and the terminal on macOS and Linux. Navigator is a desktop graphical user
interface that lets users launch applications and easily manage conda packages, environments, and
channels without using command-line commands. Either conda or Navigator can be used to manage
packages and environments, and users can even switch between them.
Standard used: 2018.12
Python
Python is a programming language that helps integrate systems more effectively. Its features include
a dynamic type system and automatic memory management. It supports multiple programming paradigms,
including object-oriented, functional, and procedural programming, and it has a large and
comprehensive standard library. Python interpreters are available for many operating systems.
CPython, the reference implementation of Python, is open-source software and has a community-based
development model, as do nearly all of Python's other implementations. Python and CPython are managed
by the non-profit Python Software Foundation. Python is a multi-paradigm programming language:
object-oriented programming and structured programming are fully supported, and many of its features
support functional programming.
Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage
collector for memory management. It also features dynamic name resolution (late binding), which
binds method and variable names during program execution. The standard library has modules that
implement functional tools borrowed from Haskell and Standard ML.
Python for Machine Learning: machine learning, also termed ML, is a subset of artificial
intelligence. It deals with algorithms that examine the data required for a working model, learn
from it, and make predictions.
Standard used: 3.7.1
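The short sketch below illustrates three of the language features named above using only the standard library: dynamic typing, late binding of names, and the functional tools `functools.reduce` and `itertools.accumulate`. The function names are made up for the example and are not part of the project code.

```python
from functools import reduce
from itertools import accumulate
import operator


def describe(value):
    # Dynamic typing: the function accepts a value of any type;
    # the type is only inspected at run time.
    return f"{value!r} is a {type(value).__name__}"


def total(values):
    # reduce() folds a sequence into a single value, like foldl in Haskell or ML.
    return reduce(operator.add, values, 0)


def running_totals(values):
    # accumulate() yields every intermediate result of the same fold, like scanl.
    return list(accumulate(values))


def make_greeter():
    # Late binding: `greeting` is looked up when greet() is called,
    # so it may be assigned after greet() has been defined.
    def greet():
        return greeting + ", farmer"

    greeting = "Hello"
    return greet
```

For example, `running_totals([1, 2, 3])` returns `[1, 3, 6]`, and `make_greeter()()` returns `"Hello, farmer"`.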
Spyder
Spyder is an open-source, cross-platform integrated development environment (IDE) for scientific programming in the Python language, and it was used in this project. Spyder integrates with a number of prominent packages in the scientific Python stack, including NumPy, SciPy, Matplotlib, pandas, IPython, SymPy, and Cython, as well as other open-source software. Spyder is maintained and continuously improved by a team of Python developers.
Spyder uses Qt for its graphical user interface (GUI) and is designed to use either the PyQt or the PySide Python bindings. QtPy, a thin abstraction layer developed by the Spyder project and later adopted by several other packages, provides the flexibility to use either backend. Spyder is available cross-platform through Anaconda, and on Windows through WinPython.
Version used: 3.3.2
Keras
Keras is an open-source neural network library written in Python that runs on top of Theano or TensorFlow. It is designed to be modular, fast, and easy to use. It was developed by François Chollet, a Google engineer. Keras does not handle low-level computation itself; instead, it uses another library, known as the "backend", to do so. Keras is therefore a high-level API wrapper around a low-level API, capable of running on top of TensorFlow, CNTK, or Theano.
The Keras high-level API is used to build models, define layers, and set up multiple-input, multiple-output models. At this level, Keras also compiles the model with loss and optimizer functions and runs the training process with the fit function. It does not handle low-level operations such as building the computational graph or creating tensors and other variables, because these are handled by the backend engine.
Version used: 2.3.0

Chapter 4
MODULE DESCRIPTION

4.1 General Architecture

Figure 4.1: General Architecture Diagram

Description of the Architecture diagram

After the required software tools are installed, the soil dataset is loaded into Spyder and preprocessed using label encoding, one-hot encoding, and scaling. Label encoding converts the categorical values in each column into computer-readable numeric codes (0, 1, 2, ...). After preprocessing, feature extraction is performed and the data is split into training and testing data. The proportion of training and testing data has to be chosen; preferably 80 percent training data and 20 percent testing data. The classification techniques DNN, Random Forest, and LDA are then applied to the training and testing data. The neural network is trained for N iterations called epochs, where N is an integer. Finally, the confusion matrix and accuracy of each classifier are displayed as output, and the three classifiers are compared to find the best one.

4.2 Design Phase
4.2.1 Data Flow Diagram

Figure 4.2: Data Flow Diagram of Level-0

Description of Data Flow Diagram Level-0


The Level-0 DFD gives an abstract view of the entire project. It outlines the flow of input into the system and output from the system, indicated by the arrows; the direction of each arrow shows the flow of data from input to output.

Figure 4.3: Data Flow Diagram of Level-1

Description of Data Flow Diagram Level-1


In the Level-1 DFD, each sub-process is explained in detail by showing its input and output, as in the diagram above. In module 1, label encoding and one-hot encoding are done. In module 2, the data is scaled and transformed. In module 3, the data is split into training and testing data. In the next module, features are extracted and the data is classified to predict accuracy. In the final module, the confusion matrix and accuracy of the different classifiers are obtained.

4.2.2 UML Diagram

Figure 4.4: UML Diagram

Description of UML Diagram


UML diagrams are used to understand, build, and document the requirements of the system under development. UML combines techniques from data modeling, workflow modeling, object modeling, and component modeling, and it is used throughout the different stages of the software development life cycle.

4.2.3 Use Case Diagram

Figure 4.5: Use Case Diagram

Description of Use Case Diagram


The use case diagram describes the functional requirements of the system and its interaction with external agents. It represents how the system can be used and gives a high-level view of what the system does without going into implementation details. The diagram above shows the interaction between the modules.

4.2.4 Collaboration Diagram

Figure 4.6: Collaboration Diagram


Description of Collaboration Diagram
This diagram shows the relationships between the objects of the system and defines the roles of the objects that carry out a flow of events.

4.2.5 Sequence Diagram

Figure 4.7: Sequence Diagram

Description of Sequence Diagram


The sequence diagram shows the interactions that take place in each submodule. It is also called an event diagram because it describes the events that occur in each submodule. After the dataset is fetched, preprocessing and feature extraction are performed. The data is then split into training and testing data, and accuracy is predicted using the selected algorithms.

4.3 Module Description
4.3.1 Data Preprocessing
Before the model is built, data preprocessing converts the raw data into clean data. Preprocessing involves several steps: first, import the necessary libraries and the dataset, then handle any missing data by applying data cleaning methods. Before the data is loaded into the neural classifier, categorical data must be handled and feature scaling applied. These steps are implemented using the following techniques.
Label Encoding
The data must be prepared before it is used by the model. To convert categorical text data into numerical data that the model can understand, this project uses the LabelEncoder class to label-encode the categorical columns, starting with the first column. LabelEncoder is provided by the scikit-learn (sklearn) library: its fit() and transform() methods (combined as fit_transform()) are applied to a column, and the existing data is then replaced with the newly transformed data.
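To make the idea concrete, the minimal sketch below reproduces what LabelEncoder does for a single column, using plain Python instead of scikit-learn: the distinct categories are sorted and numbered 0, 1, 2, and so on. The function name `label_encode` and the soil-texture values in the usage note are hypothetical, chosen only for illustration.

```python
def label_encode(column):
    """Map each distinct category in a column to an integer code.

    Like sklearn's LabelEncoder, the classes are sorted and numbered 0..k-1.
    Returns the encoded column and the list of classes (index = code).
    """
    classes = sorted(set(column))
    code_of = {label: code for code, label in enumerate(classes)}
    return [code_of[value] for value in column], classes
```

For example, `label_encode(["loamy", "clay", "sandy", "clay"])` returns the codes `[1, 0, 2, 0]` together with the classes `["clay", "loamy", "sandy"]`, so each code can be mapped back to its original category.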
One-Hot Encoding
One-hot encoding converts categorical variables into a form that can be provided to machine learning algorithms so that they do a better job in prediction. Label-encoded integers imply an order between categories (for example, 2 > 1) that usually does not exist; a one-hot encoder removes this false order by "binarizing" each category into its own 0/1 feature that guides the model.
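A small plain-Python sketch of one-hot encoding for a single label-encoded column follows; `one_hot` and its `drop_first` flag are hypothetical names. Setting `drop_first=True` removes the first indicator column, which corresponds to the `X = X[:, 1:]` step applied after one-hot encoding in the project code and avoids a redundant (dummy-variable-trap) column.

```python
def one_hot(codes, n_categories, drop_first=False):
    """Turn integer category codes (0..n_categories-1) into 0/1 indicator rows."""
    rows = []
    for code in codes:
        row = [0] * n_categories
        row[code] = 1  # exactly one position is "hot"
        rows.append(row[1:] if drop_first else row)
    return rows
```

For example, `one_hot([1, 0, 2], 3)` returns `[[0, 1, 0], [1, 0, 0], [0, 0, 1]]`; with `drop_first=True`, category 0 becomes the all-zero row `[0, 0]`.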

Standard Scaler
The scikit-learn package provides the StandardScaler class. This work fits the scaler on the training data and transforms it; to keep the scaling consistent, the same fitted scaler is then used to transform the test data. Feature scaling standardizes the independent features in the data to zero mean and unit variance, using z = (x - mean) / standard deviation, where the mean and standard deviation are computed from the training data.
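The sketch below, assuming one numeric column stored as a Python list, shows the fit-on-train, transform-on-test pattern that StandardScaler follows. Like StandardScaler, it uses the population standard deviation and leaves constant columns unscaled; the helper names `fit_scaler` and `apply_scaler` are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(train_column):
    """Learn the mean and population standard deviation (ddof=0) of a training column."""
    mu = mean(train_column)
    sigma = pstdev(train_column)
    if sigma == 0:
        sigma = 1.0  # constant column: centre it but do not divide by zero
    return mu, sigma


def apply_scaler(column, params):
    """Standardise any column with parameters learned from the training data."""
    mu, sigma = params
    return [(x - mu) / sigma for x in column]


# The test data is scaled with the *training* statistics, never its own.
params = fit_scaler([2, 4, 6])
scaled_test = apply_scaler([8], params)
```

Reusing the training statistics for the test data keeps information about the test set out of the preprocessing step, which is why the project calls `sc.fit_transform` on the training data but only `sc.transform` on the test data.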

4.3.2 Data Visualization and descriptive statistics


Finally, to learn more about the data, it can be visualized as plots and charts using pandas together with the Matplotlib library. There are two styles of plots:
1. Univariate - used to examine one variable at a time.
2. Multivariate - used to examine two or more variables together.
Since a histogram groups data into bins and shows how many observations each bin holds, it is a good way to visualize data for ML. A density plot looks like an abstracted histogram, with a smooth curve drawn through the top of each bin.
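The counting behind a histogram can be sketched in a few lines of plain Python; pandas (`Series.hist`) and Matplotlib (`plt.hist`) perform the same binning and then draw the bars. The function below is a hypothetical illustration that uses equal-width bins over a chosen range, with the right edge included in the last bin and values outside the range ignored.

```python
def histogram(values, n_bins, lo, hi):
    """Count the observations falling into each of n_bins equal-width bins over [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if not lo <= v <= hi:
            continue  # outside the plotted range
        i = int((v - lo) / width)
        counts[min(i, n_bins - 1)] += 1  # v == hi belongs to the last bin
    return counts
```

For example, `histogram([10, 12, 25, 31, 40], n_bins=3, lo=10, hi=40)` returns `[2, 1, 2]`: two values in [10, 20), one in [20, 30), and two in [30, 40].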

Figure 4.9: Module 2

4.3.3 Data Splitting


In statistics and machine learning, the data is usually split into two subsets, training data and testing data, and sometimes into three: train, validation, and test. The model is fitted on the training data in order to make predictions on the test data. When this is done, one of two things can go wrong: the model may overfit or underfit.
Overfitting
Overfitting means that the model has been trained "too well" and now fits the training dataset too closely. It usually happens when the model is too complex, for example when there are too many features or variables compared to the number of observations in the dataset. Such a model is very accurate on the training data but is likely to be inaccurate on new, unseen data. This is because the model is not generalized: it cannot make inferences about other data, which is ultimately what the system is trying to do. If the model learns the noise in the training data, the actual relationships between the variables will not be extracted properly.
Underfitting
In contrast to overfitting, an underfitted model does not fit even the training data, and therefore it cannot generalize to new data either. This can happen, for example, when a linear model is fitted to data that is not linear. Such a model has low predictive ability on the training data and cannot be generalized to other data.
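The underfitting case described above can be shown with a tiny, made-up example: fitting a straight line by ordinary least squares to points that follow y = x² leaves a large error even on the training data. The data and the helper names `fit_line` and `mse` are illustrative assumptions, not part of the project.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # a clearly non-linear (quadratic) relationship
a, b = fit_line(xs, ys)
train_error = mse(ys, [a + b * x for x in xs])
```

Here the fitted line is flat (slope 0, intercept 2) and the training mean squared error is 2.8, whereas a quadratic model would fit these five points exactly.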

Train / Test Split


The data is typically split into training data and test data. The training set contains known outputs, and the model learns from this data so that it can later generalize to other data. The test dataset (or subset) is used to check the model's predictions. In this project, under supervised learning, the data is split roughly 80/20 between the training and testing stages.
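A plain-Python sketch of an 80/20 split, assuming the features and labels are parallel lists, is shown below. It follows the same convention as scikit-learn's `train_test_split` (shuffle, reserve ceil(test_size * n) rows for testing, and return X_train, X_test, y_train, y_test), but `split_train_test` itself is a hypothetical helper.

```python
import math
import random


def split_train_test(rows, labels, test_size=0.2, seed=None):
    """Shuffle the rows and hold out ceil(test_size * n) of them as the test set."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # a fixed seed makes the split reproducible
    n_test = math.ceil(test_size * len(rows))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def take(seq, idx):
        return [seq[i] for i in idx]

    return (take(rows, train_idx), take(rows, test_idx),
            take(labels, train_idx), take(labels, test_idx))
```

Shuffling before splitting matters when the CSV rows are ordered (for example, grouped by state), since otherwise the test set would not be representative of the training data.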

Figure 4.10: Module 3

4.3.4 Predicting the Farmland
This project uses the DNN (Deep Neural Network), LDA (Linear Discriminant Analysis), and Random Forest algorithms for prediction. Neural networks use randomness deliberately to ensure that they effectively learn the function being approximated for the problem. Randomness is used because this class of machine learning algorithm performs better with it than without it. The most common kind of randomness used in neural networks is the random initialization of the network weights.
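The random initialization of weights can be illustrated with the sketch below, which draws each weight uniformly from [-0.05, 0.05]; that range is the default of Keras's `RandomUniform` initializer, while the helper `init_weights` is hypothetical. Passing a fixed seed makes the random starting point reproducible, which is useful when comparing runs.

```python
import random


def init_weights(n_inputs, n_units, limit=0.05, seed=None):
    """Return an n_inputs x n_units weight matrix drawn uniformly from [-limit, limit]."""
    rng = random.Random(seed)  # independent generator, so the seed only affects this call
    return [[rng.uniform(-limit, limit) for _ in range(n_units)] for _ in range(n_inputs)]
```

Calling `init_weights(4, 6, seed=1)` twice gives identical matrices, while a different seed gives a different starting point for training.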
The DNN is implemented with the Keras Sequential model by adding layers one at a time; the add() method is used to add Dense (fully connected) layers. The compile() method configures the model with the Adam optimizer, binary cross-entropy loss, and accuracy as the metric. The fit() method passes the training data to the model, and the number of epochs sets the number of training iterations. Similarly, the LDA and Random Forest algorithms are implemented, and the confusion matrix and accuracy of each algorithm are computed.
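To show what the three Dense layers compute once trained, the sketch below runs a forward pass in plain Python: each layer multiplies its inputs by a weight matrix, adds a bias, and applies ReLU or sigmoid, and the final probability is thresholded at 0.5 as in `y_pred > 0.5`. It also shows the binary cross-entropy loss given to compile(). Training itself (the Adam optimizer and backpropagation) is left to Keras and is not shown; all function names here are hypothetical.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(sum_i inputs[i] * weights[i][j] + biases[j])."""
    return [activation(sum(x * w[j] for x, w in zip(inputs, weights)) + biases[j])
            for j in range(len(biases))]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for one example with label 0/1 and predicted probability p."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def predict(x, layers):
    """Forward pass through (weights, biases, activation) layers.

    Returns the output probability and the class obtained with a 0.5 threshold.
    """
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    prob = x[0]
    return prob, prob > 0.5
```

With the project's architecture, `layers` would hold a 6-unit ReLU layer, another 6-unit ReLU layer, and a 1-unit sigmoid layer, with weights learned by `fit()`.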

Figure 4.11: Module 4

4.3.5 The Evaluation of Model Performance
The performance of the machine learning model is evaluated by measuring the accuracy of the algorithms under supervised learning. Another method is the confusion matrix, which compares the actual data with the predicted data: it contains the predicted outcomes for the Y labels compared with the actual outputs of the Y test data.
There are various ways to check the performance of a machine learning model:
1) Confusion matrix - for simplicity, this is discussed in terms of a binary classification problem. Some common terms to be clear about are:
True positives (TP): predicted positive and actually positive.
False positives (FP): predicted positive and actually negative.
True negatives (TN): predicted negative and actually negative.
False negatives (FN): predicted negative and actually positive.
A confusion matrix is simply a representation of these four counts in matrix form, which makes them easier to visualize.
2) Accuracy - the proportion of correct predictions, (TP + TN) / (TP + TN + FP + FN). It is the most commonly used metric for judging a model, but on its own it is not always a clear indicator of performance, for example when one class is much more frequent than the other.
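The sketch below computes the binary confusion matrix and accuracy from scratch, using the same [[TN, FP], [FN, TP]] layout that `sklearn.metrics.confusion_matrix` returns for labels 0 and 1; the function names are hypothetical. The made-up example at the end shows why accuracy alone can mislead: with nine negatives and one positive, a model that always predicts "not suitable" still scores 90 percent while finding no true positives.

```python
def confusion_matrix_binary(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for 0/1 (or False/True) labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif t:
            fn += 1  # actually positive, predicted negative
        elif p:
            fp += 1  # actually negative, predicted positive
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


def accuracy(y_true, y_pred):
    """Fraction of correct predictions: (TP + TN) / total."""
    (tn, fp), (fn, tp) = confusion_matrix_binary(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


# Why accuracy alone can mislead: 9 negatives, 1 positive,
# and a model that always predicts 0 ("not suitable").
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(confusion_matrix_binary(y_true, y_pred))  # [[9, 0], [1, 0]]
print(accuracy(y_true, y_pred))                 # 0.9
```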

Figure 4.12: Module 5

Chapter 5
IMPLEMENTATION AND TESTING

In the implementation stage of the project, the design is turned into a complete working system, which makes it the key stage in achieving the complete system. This stage involves planning and examining the existing system and its limitations. It is a crucial stage, in which the theoretical design of the model is turned into a working system, and the most important one for making the new system effective and efficient.

5.1 Input and Output


5.1.1 Input Design
Input design plays a key role in the development of the system, since the system receives its input as a Comma Separated Values (CSV) file. The CSV dataset contains all the data required for the prediction, such as soil erosion, wind erosion, texture, area, moisture content, humidity, yield, rotation, temperature, season, slope, and water erosion. This dataset of more than 34,000 rows is provided as input to the system. After the dataset is loaded, all the required columns are extracted, and label encoding and one-hot encoding are applied. Feature extraction is then performed and the data is split into training and testing data, which are used to predict the suitability of the farmland and to measure the accuracy of each classifier.
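A standard-library sketch of the column extraction is given below, assuming the CSV layout implied by the project code: columns 3 to 31 hold the 29 features and column 32 holds the label. The project itself uses `pandas.read_csv` with `iloc`; `load_features` is a hypothetical stand-in that keeps every value as a string, ready for label encoding.

```python
import csv


def load_features(lines, first_feature=3, label_col=32):
    """Split CSV rows into feature lists (columns first_feature..label_col-1) and labels.

    The defaults mirror dataset.iloc[:, 3:32] and dataset.iloc[:, 32] in the project code.
    The first row is treated as a header and skipped, as pandas.read_csv does by default.
    `lines` may be an open file or any iterable of CSV text lines.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    X, y = [], []
    for row in reader:
        X.append(row[first_feature:label_col])
        y.append(row[label_col])
    return X, y
```

Usage with the project's file would look like `with open("SoilDataset.csv", newline="") as f: X, y = load_features(f)`.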

5.1.2 Output Design


The output depends on the input provided to the system; the accuracy may differ slightly if the number of rows changes. The neural network is trained for N iterations called epochs. The final output is a value representing the accuracy, together with a confusion matrix that shows how many rows of the test data are classified as useful and not useful. Since the system uses three different classifiers, three accuracies and three confusion matrices are displayed as output, and the best of the three algorithms is identified from the accuracy.
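Choosing the best algorithm from the three accuracies can be sketched as follows. The dictionary is assumed to map each classifier's name to the fraction returned by `accuracy_score` for it; both helper names are hypothetical.

```python
def best_classifier(accuracies):
    """Return the (name, accuracy) pair with the highest accuracy; ties go to the first name."""
    name = max(accuracies, key=accuracies.get)
    return name, accuracies[name]


def format_accuracy(acc):
    """Render an accuracy_score-style fraction (0.0 to 1.0) as a percentage string."""
    return f"{acc:.2%}"
```

In the project the dictionary would have the keys "DNN", "LDA", and "Random Forest"; for instance, `format_accuracy(0.75)` gives `"75.00%"`.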
Test Data
Test data is data that has been specifically set aside for use in tests, typically of a computer program. Some data may be used in a confirmatory way, typically to verify that a given set of inputs to a given function produces an expected result.
Train Data
Training data is used to train an algorithm. Generally, the training data is a certain percentage of the overall dataset, with the remainder forming the testing set. As a rule, the better the training data, the better the algorithm or classifier performs.

5.2 Testing
The purpose of testing is to find mistakes. Testing provides a way to check the functionality of the modules and the performance of the entire system, including the model's accuracy, and it identifies errors present in the system.

5.3 Types of Testing


5.3.1 Unit testing
Input

```python
import numpy as np
import pandas as pd
import matplotlib
import theano
import tensorflow
import keras
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

dataset = pd.read_csv('C://Users//Prasad//Desktop//Agri//Final//SoilDataset.csv')
X = dataset.iloc[:, 3:32].values  # 29 feature columns (3 to 31)
y = dataset.iloc[:, 32].values    # label column
```

Test result
Module interface testing is used to check whether data flows properly into and out of this unit.
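A module interface check of this kind could be written as a small function, sketched below under the assumption that X has 29 feature columns (from `iloc[:, 3:32]`) and y has one label per row. `check_interface` is a hypothetical helper, not part of the project code; it only uses `len()`, so it works on lists and NumPy arrays alike.

```python
def check_interface(X, y, n_features=29):
    """Raise ValueError if the loaded data does not have the shape later modules expect.

    n_features=29 corresponds to dataset.iloc[:, 3:32] (columns 3 to 31).
    """
    if len(X) != len(y):
        raise ValueError(f"{len(X)} feature rows but {len(y)} labels")
    wrong = [i for i, row in enumerate(X) if len(row) != n_features]
    if wrong:
        raise ValueError(f"rows with the wrong number of features: {wrong[:5]}")
    return True
```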

5.3.2 Integration testing

Input

```python
import pandas as pd
import matplotlib
import theano
import tensorflow
import keras
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

dataset = pd.read_csv('C://Users//Prasad//Desktop//Agri//Final//SoilDataset.csv')
X = dataset.iloc[:, 3:32].values
y = dataset.iloc[:, 32].values

# Label-encode each feature column with its own LabelEncoder,
# keeping the encoders so the codes can be mapped back later
labelencoders = {}
for col in range(X.shape[1]):
    labelencoders[col] = LabelEncoder()
    X[:, col] = labelencoders[col].fit_transform(X[:, col])
```

Test result
If the dataset is loaded properly, label encoding is applied to every column of the dataset. If the dataset is not loaded properly, an error occurs, showing that the integration between the two tasks, loading the dataset and label encoding, has failed.

5.3.3 Functional testing
Input

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# One-hot encode column 1 and pass the other columns through unchanged
# (newer scikit-learn no longer accepts OneHotEncoder(categorical_features=[1]))
onehotencoder = ColumnTransformer([('onehot', OneHotEncoder(), [1])],
                                  remainder='passthrough', sparse_threshold=0)
X = onehotencoder.fit_transform(X).astype(float)
X = X[:, 1:]  # drop one dummy column to avoid the dummy-variable trap

# 80/20 split between training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit the scaler on the training data only, then apply the same scaling to the test data
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

Test Result
This test checks whether the information passed to a function as an argument is handled properly. In the code above, train_test_split is the function under test; it throws an error if the information is not passed properly.

5.3.4 White Box Testing

```python
import keras
from keras.models import Sequential
from keras.layers import Dense

# Three Dense layers: 6 ReLU units, 6 ReLU units, 1 sigmoid output
classifier = Sequential()
classifier.add(Dense(units=6, kernel_initializer='random_uniform', activation='relu',
                     input_dim=X_train.shape[1]))  # 1422 features for this dataset
classifier.add(Dense(units=6, kernel_initializer='random_uniform', activation='relu'))
classifier.add(Dense(units=1, kernel_initializer='random_uniform', activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, batch_size=10, epochs=100)
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)  # threshold the sigmoid output at 0.5
cm = confusion_matrix(y_test, y_pred)

# Chi-squared feature selection (requires non-negative feature values)
from sklearn.feature_selection import SelectKBest, chi2

X = X.astype(int)
chi2_features = SelectKBest(chi2, k=2)  # keep the 2 highest-scoring features
X_kbest_features = chi2_features.fit_transform(X, y)
print('Original feature number:', X.shape[1])
print('Reduced feature number:', X_kbest_features.shape[1])
```

5.3.5 Black Box Testing

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

lda = LDA(n_components=1)
lda.fit(X_train, y_train)
# Predict on the original features; predicting on the 1-D projection would raise a shape error
y_pred = lda.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy for LDA ' + str(accuracy_score(y_test, y_pred)))

# Project the data onto the single LDA component; the Random Forest is trained on this projection
X_train = lda.transform(X_train)
X_test = lda.transform(X_test)

from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(max_depth=2, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy for Random Forest ' + str(accuracy_score(y_test, y_pred)))
```

5.3.6 Test Result

Figure 5.1: Test Image

The results show that LDA gives higher accuracy than the Random Forest algorithm.

5.4 Testing Strategy


A strategy for system testing integrates test-case design methods into a well-planned series of steps that results in the successful construction of the system. The testing strategy must incorporate test planning, test-case design, test execution, and the collection and evaluation of the resulting data. A software testing strategy must accommodate low-level tests, which verify that a small source code segment has been correctly implemented, as well as high-level tests, which validate major system functions against user requirements. Testing represents an interesting anomaly for the developer, since its aim is to find faults in the system that has just been built. Therefore, a series of tests is performed on the proposed system before it is ready for user acceptance testing.

Chapter 6
RESULTS AND DISCUSSIONS

6.1 Efficiency of the Proposed System


This work uses Random Forest, LDA, and DNN. Neural networks use randomness by design to ensure that they effectively learn the function being approximated for the problem; randomness is used because this class of machine learning algorithm performs better with it than without it. Deep neural networks use sophisticated mathematical modeling to process data in complex ways. Because different classifiers are used to predict the accuracy, the differences between them can be clearly observed.

6.2 Comparison of Existing and Proposed System


The existing system predicts crop yield, whereas the proposed system predicts whether the farmland is suitable for growing crops and reports the accuracy of that prediction. The proposed system uses three different algorithms, chooses the best of the three, and measures the difference in accuracy among them. The input dataset contains many different fields that are useful for prediction.

6.3 Advantages of the Proposed System


• Reduces losses to farmers, since the suitability of the land for growing crops is predicted.
• Farmers' time is used effectively.
• Compares three different algorithms.
• Higher probability of a good yield.

6.4 Sample Code

```python
import numpy as np
import pandas as pd
import matplotlib
import theano
import tensorflow
import keras

from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

dataset = pd.read_csv('C://Users//Prasad//Desktop//Agri//Final//SoilDataset.csv')
# Features and label, as selected in the unit-test listing
X = dataset.iloc[:, 3:32].values
y = dataset.iloc[:, 32].values
# ... (commented-out visualisation block, ending with plt.show()) ...

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# Label-encode all 29 feature columns (0-28), one encoder per column
labelencoders = {}
for col in range(X.shape[1]):
    labelencoders[col] = LabelEncoder()
    X[:, col] = labelencoders[col].fit_transform(X[:, col])

# One-hot encode column 1; the other columns are passed through unchanged
onehotencoder = ColumnTransformer([('onehot', OneHotEncoder(), [1])],
                                  remainder='passthrough', sparse_threshold=0)
X = onehotencoder.fit_transform(X).astype(float)
X = X[:, 1:]  # drop one dummy column to avoid the dummy-variable trap

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit on training data only
X_test = sc.transform(X_test)

# Deep neural network: 6 ReLU units, 6 ReLU units, 1 sigmoid output
from keras.models import Sequential
from keras.layers import Dense

classifier = Sequential()
classifier.add(Dense(units=6, kernel_initializer='random_uniform', activation='relu',
                     input_dim=X_train.shape[1]))  # 1422 features for this dataset
classifier.add(Dense(units=6, kernel_initializer='random_uniform', activation='relu'))
classifier.add(Dense(units=1, kernel_initializer='random_uniform', activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, batch_size=10, epochs=100)
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy for DNN ' + str(accuracy_score(y_test, y_pred)))

# Chi-squared feature selection: keep the 2 best features
from sklearn.feature_selection import SelectKBest, chi2

X = X.astype(int)
chi2_features = SelectKBest(chi2, k=2)
X_kbest_features = chi2_features.fit_transform(X, y)
print('Original feature number:', X.shape[1])
print('Reduced feature number:', X_kbest_features.shape[1])

# Linear Discriminant Analysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

lda = LDA(n_components=1)
lda.fit(X_train, y_train)
y_pred = lda.predict(X_test)  # predict in the original feature space
cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy for LDA ' + str(accuracy_score(y_test, y_pred)))

# Project onto the single LDA component; the Random Forest below uses this projection
X_train = lda.transform(X_train)
X_test = lda.transform(X_test)

# Random Forest
from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(max_depth=2, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy for Random Forest ' + str(accuracy_score(y_test, y_pred)))
```

Output

Figure 6.1: Epochs Starting of Training the data

Figure 6.2: Epochs Ending of Training the data

The data is trained for N iterations (epochs); in this case, N = 100.

Figure 6.3: Accuracy and Confusion Matrix

The figure shows the reduced feature number, together with the accuracy and confusion matrix of LDA and Random Forest. LDA achieves a higher accuracy than Random Forest.
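The accuracy reported alongside each confusion matrix is the fraction of test samples on the matrix diagonal. A small pure-Python sketch of that relationship, using the same convention as scikit-learn's `confusion_matrix` (rows are true labels, columns are predicted labels); the helper names and the label lists are hypothetical, invented for illustration:

```python
def build_confusion_matrix(y_true, y_pred, labels):
    """Rows index the true label, columns the predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm

def accuracy_from_cm(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Hypothetical labels for 8 test samples
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
cm = build_confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)                    # [[3, 1], [1, 3]]
print(accuracy_from_cm(cm))  # 0.75
```

Comparing two classifiers by accuracy therefore amounts to comparing how much of each confusion matrix lies on its diagonal.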

Figure 6.4: Variable Explorer of data

Chapter 7

CONCLUSION AND FUTURE ENHANCEMENTS

7.1 Conclusion
The dataset has been preprocessed and used to train the models. The predictions on the training
data were checked against the training labels, and the complete pipeline was then checked again
using the test inputs and test outputs. The accuracy of the DNN has also been verified: it works
well on this dataset and predicts the outcome accurately.
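The check described above (first against the training labels, then against held-out test data) can be sketched in NumPy. This is a minimal illustration under stated assumptions: two classes, synthetic data, and Fisher's linear discriminant, which follows the same idea as the one-component `LinearDiscriminantAnalysis` used earlier but is not scikit-learn's exact implementation. The functions `fit_fisher_lda` and `predict` and the 150/50 split are hypothetical.

```python
import numpy as np

def fit_fisher_lda(X, y):
    """Two-class Fisher discriminant: w = Sw^-1 (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)
    # Threshold at the projected midpoint of the class means (assumes equal class priors)
    threshold = 0.5 * (mu0 + mu1) @ w
    return w, threshold

def predict(X, w, threshold):
    """Class 1 if the projection onto w exceeds the threshold, else class 0."""
    return (X @ w > threshold).astype(int)

# Hypothetical data: 100 samples per class, 3 features
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(2.0, 1.0, (100, 3))])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

# Hold-out split: shuffle, keep 50 samples (25%) for testing
order = rng.permutation(len(y))
test_idx, train_idx = order[:50], order[50:]
w, t = fit_fisher_lda(X[train_idx], y[train_idx])

train_acc = np.mean(predict(X[train_idx], w, t) == y[train_idx])
test_acc = np.mean(predict(X[test_idx], w, t) == y[test_idx])
print(f"train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

Reporting both numbers matters: a model that scores well on the training data but much worse on the test data is overfitting.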

7.2 Future Enhancements


In the future, the system can be developed into a website where users can get information about
different crops, the fertilizers used for a particular crop, and upcoming workshops. Each user can
maintain a separate profile, upload their own dataset if one is available, and check the prediction
accuracy on it. This approach can help farmers improve their net profit through better organic
farming practices, and can help reduce both farmer poverty and agricultural runoff.
Information technology now plays a key role in the agriculture industry. All the details will be
stored in the cloud for easy access and will serve as a reference for other users, delivered as
notifications by e-mail and as text messages to mobile phones.

References
[1] P. Priya, U. Muthaiah, and M. Balamurugan, "Predicting yield of the crop using machine
learning algorithm," International Journal of Engineering Sciences Research Technology, vol. 7,
no. 1, pp. 1-7, 2018.
[2] V. Narasimhamurthy and P. Kumar, "Rice crop yield forecasting using random forest algorithm,"
International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2017.
[3] R. Sujatha and P. Isakki, "A study on crop yield forecasting using classification techniques,"
978-1-4673-8437-7/16/$31.00 © 2016 IEEE.
[4] S. Veenadhari, B. Misra, and D. Singh, "Data mining techniques for predicting crop
productivity: A review article," International Journal of Computer Science and Technology
(IJCST), March 2011.
[5] S. Ghosh and S. Koley, "Machine learning for soil fertility and plant nutrient management
using back propagation neural networks," IJRITCC, vol. 2, no. 2, pp. 292-297, 2014.
[6] D. Maier et al., "Applying LDA topic modeling in communication research: Toward a valid
and reliable methodology," Communication Methods and Measures, vol. 12, no. 2-3, pp. 93-118, 2018.
[7] T. Wu et al., "Geo-object-based soil organic matter mapping using machine learning
algorithms with multi-source geo-spatial data," IEEE Journal of Selected Topics in Applied Earth
Observations and Remote Sensing, vol. 12, no. 4, pp. 1091-1106, 2019.
[8] M. Ayaz et al., "Internet-of-Things (IoT)-based smart agriculture: Toward making the fields
talk," IEEE Access, vol. 7, pp. 129551-129583, 2019.
[9] D. S. Zingade et al., "Crop prediction system using machine learning," International Journal
of Advance Engineering and Research Development, Special Issue on Recent Trends in Data
Engineering, vol. 4, no. 5, pp. 1-6, 2017.
[10] S. A. Rahman, K. C. Mitra, and S. M. Islam, "Soil classification using machine learning
methods and crop suggestion based on soil series," in 2018 21st International Conference of
Computer and Information Technology (ICCIT), IEEE, Dec. 2018, pp. 1-4.
[11] E. Acar, M. S. Ozerdem, and B. B. Ustundag, "Machine learning based regression model for
prediction of soil surface humidity over moderately vegetated fields," in 2019 8th International
Conference on Agro-Geoinformatics (Agro-Geoinformatics), IEEE, 2019.
[12] S. R. Rajeswari, P. Khunteta, S. Kumar, A. R. Singh, and V. Pandey, "Smart farming
prediction using machine learning," International Journal of Innovative Technology and
Exploring Engineering (IJITEE), 2019.
[13] R. Shrestha, "Regression based corn yield assessment using MODIS based daily NDVI in Iowa
state," in IEEE Fifth International Conference on Agro-Geoinformatics (Agro-Geoinformatics),
vol. 24, no. 12, pp. 7-9, 2016.
[14] C. Y. Ji, "Delineating agricultural field boundaries from TM imagery using dyadic wavelet
transforms," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 51, no. 6, pp. 268-283, 1996.
[15] E. V. White and D. P. Roy, "A contemporary decennial examination of changing agricultural
field sizes using Landsat time series data," IEEE Geographics Environment, vol. 12, no. 5,
pp. 33-65, 2015.
