QABooklet
Table of Contents
1 Testing Types and Techniques
1.1 Unit Testing
1.2 Smoke Testing and Sanity Testing
1.3 System Testing / End to End Testing
1.4 What is GUI Testing?
1.5 Retesting and Regression Testing
1.6 Integration Testing
1.7 Interface Testing
1.8 Functional vs. Non-Functional Testing
1.9 Performance Testing
1.10 Load Testing
1.11 Stress Testing
1.12 Recovery Testing
1.13 Security Testing
1.14 Compatibility Testing
1.15 Exploratory Testing
1.16 Monkey Testing
1.17 Ad hoc Testing
1.18 Accessibility Testing
1.19 Usability Testing
1.20 Acceptance Testing
1.21 Alpha and Beta Testing
1.22 Positive and Negative Testing
1.23 Dynamic and Static Testing
1.24 Black Box, White Box and Gray Box Testing
1.25 Equivalence Partitioning and Boundary Value Analysis
1.26 Database Testing
1.27 Penetration Testing
1.28 Experience-Based Testing
1.29 Web vs. Desktop Testing
2 Quality Assurance and Quality Control
2.1 Quality Assurance
2.2 Quality Control
2.3 Software Quality
2.4 Software Quality Parameters
2.5 Software Testing
2.6 Verification and Validation
2.7 Requirement Traceability Matrix
2.8 Defect Lifecycle
2.9 Difference between Severity and Priority of a Defect
2.10 Difference between Test Scenario, Test Case and a Test Script
2.11 What is a Test Suite?
2.12 What is Test Coverage?
2.13 What is a Test Bed?
2.14 Difference between Build and Release
2.15 Seven Principles of Software Testing
3 Software Development Models
3.1 What is SDLC
3.1.1 SDLC Phases
3.2 SDLC Models
Waterfall Model
V-Shaped Model
Prototype Model
Spiral Model
Iterative Incremental Model
Agile Model
3.3 Scrum Framework
4 Software Testing Life Cycle (STLC)
4.1 What is STLC?
4.2 Phases of STLC
4.3 Difference between Test Plan and Test Strategy
4.4 Reviews
4.4.1 Types of Review
5 Automation Testing
5.1 Introduction to Automation Testing
5.2 Introduction to Selenium
5.2.1 What is Selenium
5.2.2 Selenium Components
6 Basic Database Concepts
6.1 Relational Database Basics
6.2 SQL Statements
Writing my first query
6.3 SQL JOIN
7 Basic Programming Concepts
7.1 OOPs Concepts in Java
7.1.1 What is an Object
7.1.2 What is a Class in OOPs Concepts
7.1.3 Object Oriented Programming Features
8 References
Smoke Testing vs. Sanity Testing

Smoke Testing:
• Performed to ascertain that the critical functionalities of the program are working fine.
• The objective is to verify the "stability" of the system in order to proceed with more rigorous testing.
• Performed by developers or testers.
• Usually documented or scripted.
• A subset of Acceptance Testing.
• Exercises the entire system from end to end.
• Like a general health check-up.

Sanity Testing:
• Done to check that new functionality works and that bugs have been fixed.
• The objective is to verify the "rationality" of the system in order to proceed with more rigorous testing.
• Usually performed by testers.
• Usually not documented and unscripted.
• A subset of Regression Testing.
• Exercises only a particular component of the entire system.
• Like a specialized health check-up.
1.3 System testing / End to End Testing
System Testing is the testing of a complete and fully integrated software product. Usually, software
is only one element of a larger computer-based system. Ultimately, software is interfaced with
other software/hardware systems. System Testing is actually a series of different tests whose sole
purpose is to exercise the full computer-based system.
Two Categories of Software Testing
• Black Box Testing
• White Box Testing
System Testing involves testing the software code for the following:
• Testing the fully integrated applications including external peripherals in order to check how
components interact with one another and with the system as a whole. This is also called
End to End testing scenario.
• Verify thorough testing of every input in the application to check for desired outputs.
• Testing of the user's experience with the application.
Regression Testing vs. Retesting

Regression Testing:
• Ensures that new code changes do not have side effects on existing functionality.
• Defect verification is not part of regression testing.
• Can be automated; manual regression testing can be expensive and time-consuming.
• Done for passed test cases.
• Checks for unexpected side effects.

Retesting:
• Done on the basis of defect fixes.
• Defect verification is part of retesting.
• Test cases for retesting cannot be automated.
• Done only for failed test cases.
• Makes sure that the original fault has been corrected.
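Since regression testing is a prime candidate for automation, a minimal automated regression check might look like the sketch below. The discount function and its expected values are invented for illustration, not taken from any real project:

```python
# Invented illustration: a previously passed test kept in an automated
# regression suite, re-run after every code change to catch side effects.

def apply_discount(price, percent):
    # Returns the price after applying a percentage discount.
    return round(price * (1 - percent / 100), 2)

def test_apply_discount_regression():
    # These cases passed in an earlier release; re-running them after each
    # change guards the existing behaviour (regression testing).
    assert apply_discount(100, 10) == 90.0
    assert apply_discount(59.99, 0) == 59.99

test_apply_discount_regression()
print("regression suite passed")  # prints only if no assertion fails
```

In practice such checks would live in a test runner (e.g. a unit-test framework) and run on every build, which is what makes regression testing cheap to repeat.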
1.6 Integration testing
Testing of all integrated modules to verify the combined functionality after integration is termed as
Integration Testing. Modules are typically code modules, individual applications, client and server
applications on a network, etc. This type of testing is especially relevant to client/server and
distributed systems. Integration testing is the process of testing the interface between two
software units or modules. It focuses on determining the correctness of the interface. The purpose
of the integration testing is to expose faults in the interaction between integrated units. Once all
the modules have been unit tested, integration testing is performed.
What is Top-Down Approach?
Testing takes place from top to bottom. High-level modules are tested first and then low-level
modules and finally integrating the low-level modules to a high level to ensure the system is
working as intended. Stubs are used as a temporary module if a module is not ready for
integration testing.
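The role of a stub in top-down integration can be sketched as below. The order and payment modules here are hypothetical, made up purely to illustrate a high-level module being tested before its low-level dependency is ready:

```python
# Hypothetical example: top-down integration of an order module whose
# payment module is not yet ready, so a stub stands in for it.

class PaymentStub:
    """Temporary stand-in for the real payment module (not yet integrated)."""
    def charge(self, amount):
        # Always approves, so the high-level module can be tested now.
        return {"status": "approved", "amount": amount}

class OrderModule:
    """High-level module under test; depends on a payment component."""
    def __init__(self, payment):
        self.payment = payment

    def place_order(self, amount):
        result = self.payment.charge(amount)
        return result["status"] == "approved"

# Integration test of the high-level module against the stub.
order = OrderModule(PaymentStub())
print(order.place_order(100))  # True
```

Once the real payment module is ready, it replaces the stub and the same test exercises the genuine interface between the two modules.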
Static Testing vs. Dynamic Testing

Static Testing:
• Does the verification process.
• Is about prevention of defects.
• Gives an assessment of code and documentation.
• The cost of finding and fixing defects is low.
• Return on investment is high, as the process is involved at an early stage.
• More review comments are highly recommended for good quality.
• Does not involve executing the program.

Dynamic Testing:
• Does the validation process.
• Is about finding and fixing defects.
• Reveals bugs/bottlenecks in the software system.
• The cost of finding and fixing defects is high.
• Return on investment is low, as the process comes in after the development phase.
• Finding more defects is highly recommended for good quality.
• Always involves executing the program.

Black Box Testing vs. White Box Testing

Black Box Testing:
• The main focus is on the validation of your functional requirements.
• Gives abstraction from code and focuses the testing effort on the software system's behavior.
• Facilitates testing communication amongst modules.

White Box Testing:
• (Unit Testing) validates the internal structure and working of your software code.
• Requires knowledge of the underlying programming language; present-day software systems use a variety of programming languages and technologies, and it is not possible to know all of them.
• Does not facilitate testing communication amongst modules.
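To make the white-box perspective concrete, a unit test written with knowledge of the code's internal branches chooses one input per path. The function below is a hypothetical illustration, not from the booklet:

```python
# Hypothetical example: white-box testing exercises every internal branch.

def classify_age(age):
    if age < 0:
        raise ValueError("age cannot be negative")
    elif age < 18:
        return "minor"
    else:
        return "adult"

# One test input per branch, chosen by reading the code (white box);
# a black-box tester would instead derive inputs from requirements alone.
assert classify_age(5) == "minor"    # branch: 0 <= age < 18
assert classify_age(30) == "adult"   # branch: age >= 18
try:
    classify_age(-1)                 # branch: age < 0
except ValueError:
    pass
print("all branches covered")
```

A black-box test of the same function might miss the negative-age branch entirely if the requirement never mentions it; seeing the code is what reveals the path.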
Quality Assurance vs. Quality Control

Quality Assurance (QA):
• The procedure to create the deliverables.
• Involved in the full software development life cycle.
• Defines standards and methodologies in order to meet the customer requirements.
• Its main motive is to prevent defects in the system; it is a less time-consuming activity.
• Ensures that everything is executed in the right way, which is why it falls under verification activity.
• Requires the involvement of the whole team.

Quality Control (QC):
• The procedure to verify the deliverables.
• Involved in the full software testing life cycle.
• Confirms that the standards are followed while working on the product.
• Its main motive is to identify defects or bugs in the system; it is a more time-consuming activity.
• Ensures that whatever we have done is as per the requirement, which is why it falls under validation activity.
• Requires the involvement of the testing team.
Verification:
• Definition: The process of evaluating work-products (not the actual final product) of a development phase to determine whether they meet the specified requirements for that phase.
• Objective: To ensure that the product is being built according to the requirements and design specifications; in other words, to ensure that work products meet their specified requirements.
• Question: Are we building the product right?

Validation:
• Definition: The process of evaluating software during or at the end of the development process to determine whether it satisfies specified business requirements.
• Objective: To ensure that the product actually meets the user's needs and that the specifications were correct in the first place; in other words, to demonstrate that the product fulfills its intended use when placed in its intended environment.
• Question: Are we building the right product?
• Walkthroughs
• Inspections
2.10 Difference between Test scenario, Test case and a Test script
Test Case VS Test Scenario
A test case is a set of conditions for evaluating a particular feature of a software product to determine its
compliance with the business requirements.
A Test Case is a set of actions executed to verify a particular feature or functionality of your software
application. A Test Case has a set of test data, preconditions, and expected and actual results developed for a
specific test scenario to verify a requirement.
A test case includes specific variables or conditions, using which a test engineer can determine whether a
software product is functioning as per the requirements of the client or the customer.
A test scenario, on the other hand, is generally a one-line statement describing a feature of the application to be
tested. It is used for end-to-end testing of a feature and is generally derived from the use cases.
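The distinction can be made concrete by representing a test case as structured data while the scenario stays a one-line statement. The login feature, IDs, and field values below are a hypothetical illustration:

```python
# Hypothetical illustration: a one-line test scenario vs. a structured
# test case derived from it. All IDs and values are made up.

test_scenario = "Verify that a registered user can log in"

test_case = {
    "id": "TC-001",
    "scenario": test_scenario,
    "precondition": "User account exists and is active",
    "test_data": {"username": "jdoe", "password": "secret"},
    "steps": [
        "Open the login page",
        "Enter username and password",
        "Click the Login button",
    ],
    "expected_result": "User is redirected to the dashboard",
}

# A single scenario typically expands into several such test cases
# (valid login, wrong password, locked account, ...).
print(test_case["id"], "-", test_case["scenario"])
```

A test script would then be the automated version of those steps, executed by a tool instead of a person.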
Use Case vs. Test Case

Use Case:
• Definition: A sequence of actions used to describe the interaction between a role and the system in order to achieve a specified objective.
• Goal: Follow all the sequential operations to reach the final operation.
• Iteration: Follows different paths.
• Dependency: Depends on the requirements.
• Requirement: Documents and research are required.
• Completion: Complete once all steps have been followed.
• Interaction: User.
• Working: Works by following the step-by-step functionality of the software.

Test Case:
• Definition: A group of test inputs, conditions and variables by which the characteristics of the software are defined.
• Goal: Validate whether the software is working fine or not.
• Iteration: A single test case is tested at a time.
• Dependency: Depends on the use case.
• Requirement: Test inputs and scripts; each test script completes one step.
• Completion: The testing is done again and again, and then finishes.
• Interaction: Results.
• Working: Works with the help of testers to validate the software.
Waterfall Model
Waterfall model is the very first model that is used in SDLC. It is also known as the linear
sequential model.
In this model, the outcome of one phase is the input for the next phase. Development of the next
phase starts only when the previous phase is complete.
• First, Requirement gathering and analysis is done. Only once the requirements are frozen can
the System Design start. The SRS document created here is the output of the
Requirement phase and acts as an input for the System Design.
• In the System Design phase, the software architecture and design documents, which act as input
for the next phase (Implementation and coding), are created.
• In the Implementation phase, coding is done and the software developed is the input for the
next phase i.e. testing.
• In the testing phase, the developed code is tested thoroughly to detect the defects in the
software. Defects are logged into the defect tracking tool and are retested once fixed. Bug
logging, Retest, Regression testing goes on until the time the software is in go-live state.
• In the Deployment phase, the developed code is moved into production after the sign off is
given by the customer.
• Any issues in the production environment are resolved by the developers which come under
maintenance.
Advantages of the Waterfall Model:
• The Waterfall model is a simple model which can be easily understood, and one in which
all the phases are done step by step.
• Deliverables of each phase are well defined; this avoids complexity and makes the
project easily manageable.
Disadvantages of Waterfall model:
• The Waterfall model is time-consuming and cannot be used in short-duration projects, as
a new phase cannot be started until the ongoing phase is completed.
• The Waterfall model cannot be used for projects with uncertain or changing requirements,
as it expects the requirements to be clear in the requirement gathering and analysis
phase itself; any change in the later stages costs more, since the change would be
required in all the phases.
V-Shaped Model
The V-Model is also known as the Verification and Validation Model. In this model, Verification and Validation
go hand in hand, i.e. development and testing proceed in parallel. The V-Model and the Waterfall model are
the same, except that test planning and testing start at an early stage in the V-Model.
a) Verification Phase:
(i) Requirement Analysis:
In this phase, all the required information is gathered & analyzed. Verification activities include
reviewing the requirements.
(ii) System Design:
Once the requirement is clear, a system is designed i.e. architecture, components of the product
are created and documented in a design document.
(iii) High-Level Design:
High-level design defines the architecture/design of modules. It defines the functionality between
the two modules.
(iv) Low-Level Design:
Low-level Design defines the architecture/design of individual components.
(v) Coding:
Code development is done in this phase.
b) Validation Phase:
(i) Unit Testing:
Unit testing is performed using the unit test cases designed during the Low-Level Design
phase. It is performed by the developers themselves, on individual components,
which leads to early defect detection.
(ii) Integration Testing:
Integration testing is performed using the integration test cases designed during the High-Level
Design phase. It is testing done on integrated modules, and it is performed by testers.
(iii) System Testing:
System testing uses the test cases designed during the System Design phase. In this phase, the
complete system is tested, i.e. the entire system functionality is tested.
(iv) Acceptance Testing:
Acceptance testing is associated with the Requirement Analysis phase and is done in the
customer’s environment.
Advantages of V – Model:
• It is a simple and easily understandable model.
• The V-Model approach is good for smaller projects wherein the requirements are defined and
frozen at an early stage.
• It is a systematic and disciplined model which results in a high-quality product.
Disadvantages of V-Model:
• V-shaped model is not good for ongoing projects.
• Requirement change at the later stage would cost too high.
Prototype Model
The prototype model is a model in which the prototype is developed prior to the actual software.
Prototype models have limited functional capabilities and inefficient performance when compared
to the actual software. Dummy functions are used to create prototypes. This is a valuable
mechanism for understanding the customers’ needs.
Software prototypes are built prior to the actual software to get valuable feedback from the customer.
Feedback is incorporated, and the prototype is again reviewed by the customer for
any change. This process goes on until the model is accepted by the customer.
Once the requirement gathering is done, a quick design is created and the prototype that is
presented to the customer for evaluation is built.
Customer feedback and the refined requirements are used to modify the prototype, which is again
presented to the customer for evaluation. Once the customer approves the prototype, it is used as
a requirement for building the actual software. The actual software is built using the Waterfall
model approach.
Advantages of Prototype Model:
• Prototype model reduces the cost and time of development as the defects are found much
earlier.
• Missing feature or functionality or a change in requirement can be identified in the evaluation
phase and can be implemented in the refined prototype.
• Involvement of a customer from the initial stage reduces any confusion in the requirement or
understanding of any functionality.
Disadvantages of Prototype Model:
• Since the customer is involved in every phase, the customer can change the requirements of
the end product, which increases the complexity of the scope and may increase the delivery
time of the product.
Spiral Model
The Spiral Model combines the iterative and prototype approaches.
The Spiral model phases are followed in iterations. The loops in the model represent the phases of
the SDLC process: the innermost loop is requirement gathering and analysis, which is followed by
Planning, Risk Analysis, development, and evaluation. The next loop is designing, followed by
Implementation and then testing.
Spiral Model has four phases:
• Planning
• Risk Analysis
• Engineering
• Evaluation
(i) Planning:
The planning phase includes requirement gathering wherein all the required information is
gathered from the customer and is documented. Software requirement specification document is
created for the next phase.
(ii) Risk Analysis:
In this phase, the best solution is selected for the risks involved and analysis is done by building
the prototype.
For Example, the risk involved in accessing the data from a remote database can be that the data
access rate might be too slow. The risk can be resolved by building a prototype of the data access
subsystem.
(iii) Engineering:
Once the risk analysis is done, coding and testing are done.
(iv) Evaluation:
Customer evaluates the developed system and plans for the next iteration.
Advantages of Spiral Model:
• Risk Analysis is done extensively using the prototype models.
• Any enhancement or change in the functionality can be done in the next iteration.
Disadvantages of Spiral Model:
• The spiral model is best suited for large projects only.
• The cost can be high, as a large number of iterations may be needed, which can mean a long
time to reach the final product.
Agile Model
The Agile Model is a combination of the iterative and incremental models. This model focuses more on
flexibility while developing a product than on the requirements.
In Agile, a product is broken into small incremental builds. It is not developed as a complete
product in one go. Each build increments in terms of features. The next build is built on previous
functionality.
In Agile, iterations are termed sprints. Each sprint lasts for 2-4 weeks. At the end of each sprint,
the product owner verifies the product, and after his approval it is delivered to the customer.
Customer feedback is taken for improvement, and his suggestions and enhancements are worked
on in the next sprint. Testing is done in each sprint to minimize the risk of failures.
Advantages of Agile Model:
• It allows more flexibility to adapt to the changes.
• The new feature can be added easily.
• Customer satisfaction as the feedback and suggestions are taken at every stage.
Disadvantages:
• Lack of documentation.
• Agile needs experienced and highly skilled resources.
• If a customer is not clear about how exactly they want the product to be, then the project
would fail.
Conclusion
Adherence to a suitable life cycle is very important, for the successful completion of the Project.
This, in turn, makes the management easier.
Different Software Development Life Cycle models have their own Pros and Cons. The best model
for any Project can be determined by the factors like Requirement (whether it is clear or unclear),
System Complexity, Size of the Project, Cost, Skill limitation, etc.
Example, in case of an unclear requirement, Spiral and Agile models are best to be used as the
required change can be accommodated easily at any stage.
Waterfall model is a basic model and all the other SDLC models are based on that only.
Agile vs. Scrum

Agile:
• Agile software development has been widely seen as highly suited to environments that have a small but expert project development team.
• In the Agile process, leadership plays a vital role.
• The Agile method needs frequent delivery to the end user for their feedback.
• The project head takes care of all the tasks in the Agile method.
• Deliver and update the software on a regular basis.
• In the Agile method, the priority is always to satisfy the customer by providing continuous delivery of valuable software.

Scrum:
• Scrum is ideally used in projects where the requirements are rapidly changing.
• Scrum fosters a self-organizing, cross-functional team.
• In Scrum, after each sprint, a build is delivered to the client for their feedback.
• There is no team leader, so the entire team addresses issues or problems.
• When the team is done with the current sprint activities, the next sprint can be planned.
• Empirical process control is a core philosophy of the Scrum-based process.
A data-flow diagram (DFD) is a way of representing a flow of a data of a process or a system (usually an
information system). The DFD also provides information about the outputs and inputs of each entity and the
process itself. A data-flow diagram has no control flow, there are no decision rules and no loops.
Context Diagram
A context diagram, sometimes called a level 0 data-flow diagram, is drawn in order to define and clarify the boundaries
of the software system. It identifies the flows of information between the system and external entities. The entire software
system is shown as a single process.
Activity Diagram
An activity diagram is a behavioral diagram i.e. it depicts the behavior of a system. An activity
diagram portrays the control flow from a start point to a finish point showing the various decision paths that
exist while the activity is being executed.
1. Requirement Analysis
During this phase, the test team studies the requirements from a testing point of view to identify
the testable requirements.
The QA team may interact with various stakeholders (Client, Business Analyst, Technical Leads,
and System Architects etc.) to understand the requirements in detail.
Requirements could be either Functional (defining what the software must do) or Non-Functional
(defining system performance/security/availability).
Activities
• Identify types of tests to be performed.
• Gather details about testing priorities and focus.
• Prepare Requirement Traceability Matrix (RTM).
• Identify test environment details where testing is supposed to be carried out.
• Automation feasibility analysis (if required).
Deliverables
• RTM
• Automation feasibility report. (if applicable)
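An RTM is essentially a requirement-to-test-case mapping, so the deliverable above can be sketched as a small data structure. The requirement and test-case IDs below are made up for illustration:

```python
# Minimal sketch of a Requirement Traceability Matrix (all IDs are made up).
# Each requirement maps to the test cases that cover it; an empty list
# immediately exposes a coverage gap.

rtm = {
    "REQ-001": ["TC-001", "TC-002"],
    "REQ-002": ["TC-003"],
    "REQ-003": [],  # not yet covered by any test case
}

uncovered = [req for req, cases in rtm.items() if not cases]
print("Requirements without test coverage:", uncovered)  # ['REQ-003']
```

During Test Execution the same matrix is updated with defects and execution status, which is why the completed RTM reappears as a deliverable of that later phase.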
2. Test Planning
Typically, in this stage, a Senior QA manager will determine effort and cost estimates for the
project and would prepare and finalize the Test Plan. In this phase, Test Strategy is also
determined.
Activities
• Preparation of test plan/strategy document for various types of testing
• Test tool selection
• Test effort estimation
• Resource planning and determining roles and responsibilities.
• Training requirement
Deliverables
• Test plan /strategy document.
• Effort estimation document.
3. Test Case Development
This phase involves the creation, verification and rework of test cases and test scripts. Test data is
identified/created, reviewed, and then reworked as well.
Activities
• Create test cases, automation scripts (if applicable)
• Review and baseline test cases and scripts
• Create test data (If Test Environment is available)
Deliverables
• Test cases/scripts
• Test data
5. Test Execution
During this phase, the testers will carry out the testing based on the test plans and the test cases
prepared. Bugs will be reported back to the development team for correction and retesting will be
performed.
Activities
• Execute tests as per plan
• Document test results, and log defects for failed cases
• Map defects to test cases in RTM
• Retest the Defect fixes
• Track the defects to closure
Deliverables
• Completed RTM with the execution status
• Test cases updated with results
• Defect reports
4.4 Reviews
Reviews are a form of static testing. In software reviews, people analyze the work products of
projects, such as the requirements document, design document, test strategy, and test plan, in order
to find defects in those documents.
Software reviews, if done properly, are the biggest and most cost-effective contributor to product
quality.
Reviews provide a powerful way to improve the quality and productivity of software development
by helping people recognize and fix their own defects early in the software development process.
Advantages of Reviews:-
1. Types of defects that can be found during static testing are: deviations from standards, missing
requirements, design defects, non-maintainable code and inconsistent interface specifications.
2. Since static testing can start early in the life cycle, early feedback on quality issues can be
established, e.g. an early validation of user requirements and not just late in the life cycle during
acceptance testing.
3. By detecting defects at an early stage, rework costs are relatively low and thus a relatively
cheap improvement of the quality of software products can be achieved.
4. The feedback and suggestions document from the static testing process allows for process
improvement, which supports the avoidance of similar errors being made in the future.
Roles and Responsibilities in a Review
There are various roles and responsibilities defined for a review process. Within a review team,
five types of participants can be distinguished: moderator, author, recorder, reviewer and
manager. Let's discuss their roles one by one:
1. The moderator: - The moderator (or review leader) leads the review process. His role is to
determine the type of review, approach and the composition of the review team. The moderator
also schedules the meeting, disseminates documents before the meeting, coaches other team
members, paces the meeting, leads possible discussions and stores the data that is collected.
2. The author: - As the writer of the ‘document under review’, the author’s basic goal should be to
learn as much as possible with regard to improving the quality of the document. The author’s
task is to illuminate unclear areas and to understand the defects found.
3. The recorder: – The scribe (or recorder) has to record each defect found and any suggestions
or feedback given in the meeting for process improvement.
4. The reviewer: - The role of the reviewers is to check defects and further improvements in
accordance to the business specifications, standards and domain knowledge.
5. The manager :- Manager is involved in the reviews as he or she decides on the execution of
reviews, allocates time in project schedules and determines whether review process objectives
have been met or not.
Phases of a Formal Review (Phases of Inspection)
A formal review proceeds step by step through six main phases. Let's discuss
these phases one by one.
1. Planning
The review process for a particular review begins with a ‘request for review’ by the author to the
moderator (or inspection leader). A moderator is often assigned to take care of the
scheduling (dates, time, place and invitation) of the review. The project planning needs to allow
time for review and rework activities, thus providing engineers with time to thoroughly participate
in reviews. An entry check is performed on the documents, and it is decided which
documents are to be considered. The document size, pages to be checked, composition
of the review team, roles of each participant, and strategic approach are decided in the planning phase.
2. Kick-Off
The goal of this meeting is to get everybody on the same page regarding the document under
review. Also the result of the entry and exit criteria is discussed. Basically, during the kick-off
meeting, the reviewers receive a short introduction on the objectives of the review and the
documents. Role assignments, checking rate, the pages to be checked, process changes and
possible other questions are also discussed during this meeting. Also, the distribution of the
document under review, source documents and other related documentation, can also be done
during the kick-off.
3. Preparation
In this phase, participants work individually on the document under review using the related
documents, procedures, rules and checklists provided. The individual participants identify
defects, questions and comments, according to their understanding of the document and role.
Spelling mistakes are recorded on the document under review but not mentioned during the
meeting. The annotated document will be given to the author at the end of the logging meeting.
Using checklists during this phase can make reviews more effective and efficient.
4. Review Meeting
This meeting typically consists of the following elements:-
-logging phase
-discussion phase
-decision phase.
During the logging phase the issues, e.g. defects, that have been identified during the preparation
are mentioned page by page, reviewer by reviewer and are logged either by the author or by a
scribe. This phase is just for jotting down all the issues, not for discussing them in detail. If an issue needs
discussion, the item is logged and then handled in the discussion phase. A detailed discussion on
whether or not an issue is a defect is not very meaningful, as it is much more efficient to simply log
it and proceed to the next one.
The issues classified as discussion items will be handled during discussion phase. Participants
can take part in the discussion by bringing forward their comments and reasoning. The moderator
also paces this part of the meeting and ensures that all discussed items either have an outcome
by the end of the meeting, or are defined as an action point if a discussion cannot be solved
during the meeting. The outcome of discussions is documented for future reference.
At the end of the meeting, a decision on the document under review has to be made by the
participants, sometimes based on formal exit criteria. The most important exit criterion is the
average number of critical and major defects found per page. If the number of defects found per
page exceeds a certain level, the document must be reviewed again, after it has been reworked. If
the document complies with the exit criteria, the document will be checked during follow-up by the
moderator or one or more participants. Subsequently, the document can leave or exit the review
process.
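The defects-per-page exit criterion described above can be sketched as a simple check. The threshold of 3 defects per page is an arbitrary assumption for illustration; real teams set their own limit:

```python
# Sketch of the formal-review exit criterion described above: if the average
# number of critical/major defects per page exceeds a threshold, the document
# must be reworked and reviewed again. The threshold of 3 is an assumption.

def passes_exit_criteria(critical_major_defects, pages, threshold=3.0):
    defects_per_page = critical_major_defects / pages
    return defects_per_page <= threshold

print(passes_exit_criteria(10, 5))  # 2.0 defects/page -> True (exit review)
print(passes_exit_criteria(20, 5))  # 4.0 defects/page -> False (rework needed)
```

Keeping the criterion this explicit is what makes the review "formal": the decision to exit is a measurement, not a feeling.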
5. Rework
Based on the defects detected and improvements suggested in the review meeting, the author
improves the document under review. In this phase the author does all the rework to
ensure that detected defects are fixed and corrections are properly applied. Changes
made to the document should be easy to identify during follow-up; therefore the author
has to indicate where changes are made.
6. Follow-Up
After the rework, the moderator should ensure that satisfactory actions have been taken on all
logged defects, improvement suggestions and change requests. If it is decided that all
participants will check the updated document, the moderator takes care of the distribution and
collects the feedback. In order to control and optimize the review process, a number of
measurements are collected by the moderator at each step of the process. Examples of such
measurements include number of defects found, number of defects found per page, time spent
checking per page, total review effort, etc. It is the responsibility of the moderator to ensure that
the information is correct and stored for future analysis.
Disadvantages of Selenium
• Selenium needs good technical expertise; the resource should have good programming skills.
• Selenium only supports web-based applications and does not support desktop (Windows-based) applications.
• It is difficult to test image-based applications.
• Selenium needs external support for report generation and execution, such as TestNG or Jenkins.
• Selenium does not provide a built-in IDE for script generation, so another IDE such as Eclipse is needed for
writing scripts.
• Selenium script creation time is comparatively high.
Relational Databases
A relational database at its simplest is a set of tables used for storing data. Each table has a
unique name and may relate to one or more other tables in the database through common values.
Tables
A table in a database is a collection of rows and columns. Tables are also known as entities or
relations.
Rows
A row contains data pertaining to a single item or record in a table. Rows are also known as
records or tuples.
Columns
A column contains data representing a specific characteristic of the records in the table. Columns
are also known as fields or attributes.
Relationships
A relationship is a link between two tables (i.e. relations). Relationships make it possible to find
data in one table that pertains to a specific record in another table.
Datatypes
Each of a table's columns has a defined datatype that specifies the type of data that can exist in
that column. For example, the FirstName column might be defined as varchar (20), indicating that
it can contain a string of up to 20 characters. Unfortunately, datatypes vary widely between
databases.
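As a sketch of what such a definition looks like (the table and the other column names here are hypothetical), a table declaration assigns a datatype to each column:

```sql
-- Hypothetical table: each column is given an explicit datatype.
CREATE TABLE Employees (
    EmployeeID INT,            -- whole numbers
    FirstName  VARCHAR(20),    -- string of up to 20 characters
    HireDate   DATE,           -- calendar date
    Salary     DECIMAL(10, 2)  -- exact numeric with 2 decimal places
);
```

As noted above, the exact datatype names and behavior vary between database systems.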
Candidate Key
A candidate key is an attribute or set of attributes that uniquely identifies a record. Among the set
of candidate keys, one is chosen as the primary key. So a table can have multiple candidate
keys, but each table can have at most one primary key.
Primary Keys
Most tables have a column or group of columns that can be used to identify records. For example,
an Employees table might have a column called EmployeeID that is unique for every row. This
makes it easy to keep track of a record over time and to associate a record with records in other
tables.
Foreign Keys
Foreign key columns are columns that link to primary key columns in other tables, thereby
creating a relationship. For example, the Customers table might have a foreign key column
called SalesRep that links to EmployeeID, the primary key in the Employees table.
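A sketch of how this primary key/foreign key relationship could be declared, reusing the table and column names from the example above (exact syntax may vary by database):

```sql
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,   -- uniquely identifies each employee
    FirstName  VARCHAR(20)
);

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    SalesRep   INT,
    -- foreign key linking each customer to an employee
    FOREIGN KEY (SalesRep) REFERENCES Employees (EmployeeID)
);
```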
Alternate Key
Alternate keys are candidate keys that were not selected as the primary key. An alternate key
could also serve as a primary key. An alternate key is also called a "secondary key".
Unique Key
A unique key is a set of one or more attributes that can be used to uniquely identify the records in
a table. A unique key is similar to a primary key, but a unique key field can contain a "NULL" value
whereas a primary key does not allow "NULL" values. Another difference (in SQL Server) is that a
primary key creates a clustered index by default, while a unique key creates a non-clustered index.
Composite Key
A composite key is a combination of more than one attribute that can be used to uniquely identify
each record. It is also known as a "compound" key. A composite key may be a candidate or primary
key.
Super Key
A super key is a set of one or more attributes whose combined values uniquely identify a record
in a table. Every primary key, alternate key, and unique key is also a super key; in other words,
primary keys, unique keys, and alternate keys are subsets of the set of super keys.
Surrogate Key
A surrogate key is an artificial key that is used to uniquely identify the records in a table. For
example, SQL Server and Sybase provide an artificial key known as an "Identity" column.
Surrogate keys are usually just simple sequential numbers, and they are used only to act as a
primary key.
Difference between primary key and unique constraints?
A primary key cannot have a NULL value; unique constraints can have NULL values. There is only one
primary key in a table, but there can be multiple unique constraints.
Database Normalization
It is a process of analyzing the given relation schemas based on their functional dependencies
and primary keys to achieve the following desirable properties:
1) Minimizing Redundancy
2) Minimizing the Insertion, Deletion, And Update Anomalies
Relation schemas that do not meet the properties are decomposed into smaller relation schemas
that could meet desirable properties.
[Figure caption] An update anomaly: Employee 519 is shown as having different addresses on different records.
[Figure caption] An insertion anomaly: until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his or her details cannot be recorded.
[Figure caption] A deletion anomaly: all information about Dr. Giddens is lost if he or she temporarily ceases to be assigned to any courses.
When an attempt is made to modify (update, insert into, or delete from) a relation, the following undesirable
side-effects may arise in relations that have not been sufficiently normalized:
• Update anomaly. The same information can be expressed on multiple rows; therefore updates to the relation may result
in logical inconsistencies. For example, each record in an "Employees' Skills" relation might contain an Employee ID,
Employee Address, and Skill; thus a change of address for a particular employee may need to be applied to multiple
records (one for each skill). If the update is only partially successful – the employee's address is updated on some
records but not others – then the relation is left in an inconsistent state. Specifically, the relation provides conflicting
answers to the question of what this particular employee's address is. This phenomenon is known as an update
anomaly.
• Insertion anomaly. There are circumstances in which certain facts cannot be recorded at all. For example, each record
in a "Faculty and Their Courses" relation might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course
Code. Therefore, we can record the details of any faculty member who teaches at least one course, but we cannot
record a newly hired faculty member who has not yet been assigned to teach any courses, except by setting the Course
Code to null. This phenomenon is known as an insertion anomaly.
• Deletion anomaly. Under certain circumstances, deletion of data representing certain facts necessitates deletion of data
representing completely different facts. The "Faculty and Their Courses" relation described in the previous example
suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must
delete the last of the records on which that faculty member appears, effectively also deleting the faculty member,
unless we set the Course Code to null. This phenomenon is known as a deletion anomaly.
1st Normal Form
• Each table cell should contain a single value.
• Each record needs to be unique.
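A sketch of what this means in practice (the table here is hypothetical): a cell holding a comma-separated list of phone numbers violates 1NF; splitting it into one row per value restores it.

```sql
-- Violates 1NF: the Phone column holds multiple values in one cell.
-- CustomerID | Phone
--     1      | '555-1234, 555-9876'

-- 1NF-compliant design: one value per cell, one row per phone number.
CREATE TABLE CustomerPhones (
    CustomerID INT,
    Phone      VARCHAR(20),
    PRIMARY KEY (CustomerID, Phone)  -- makes each record unique
);
```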
Database Trigger
A trigger is code associated with insert, update, or delete operations on a table. The code is
executed automatically whenever the associated operation is performed on the table. Triggers can
be useful for maintaining integrity in a database.
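A minimal sketch, using MySQL-style syntax (other databases differ, and the table names here are hypothetical): every insert into Employees automatically writes a row to a log table.

```sql
-- Hypothetical audit trigger: fires after each INSERT on Employees.
CREATE TRIGGER log_new_employee
AFTER INSERT ON Employees
FOR EACH ROW
INSERT INTO EmployeeLog (EmployeeID, Action)
VALUES (NEW.EmployeeID, 'inserted');
```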
Stored Procedure
A stored procedure is like a function that contains a set of operations compiled together. It
contains a set of operations that are commonly used in an application to do some common
database tasks.
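A sketch of a stored procedure in MySQL-style syntax (the procedure, table, and column names are hypothetical): a commonly used lookup is compiled once and then callable by name from the application.

```sql
DELIMITER //
CREATE PROCEDURE GetEmployeesByCity(IN cityName VARCHAR(50))
BEGIN
    -- common lookup packaged as a reusable database operation
    SELECT EmployeeID, FirstName
    FROM Employees
    WHERE City = cityName;
END //
DELIMITER ;

-- Invoked from the application or console as:
-- CALL GetEmployeesByCity('Boston');
```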
Database Indexes
A database index is a data structure that improves the speed of data retrieval operations on a
database table at the cost of additional writes and the use of more storage space to maintain the
extra copy of data.
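A sketch of creating such an index (hypothetical table and column names): lookups by LastName become faster, at the cost of extra storage and slightly slower writes.

```sql
CREATE INDEX idx_employees_lastname
ON Employees (LastName);
```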
Relational Database Management System
A Relational Database Management System (RDBMS), commonly (but incorrectly) called a
database, is software for creating, manipulating, and administering a database. For simplicity, we
will often refer to RDBMSs as databases.
Characteristics of Database Management System
A database management system has following characteristics:
Data stored in tables: Data is never stored directly in the database; it is stored in tables
created inside the database. A DBMS also allows us to define relationships between tables, which
makes the data more meaningful and connected. You can easily understand what type of data is
stored where by looking at all the tables created in a database.
Reduced Redundancy: In the modern world hard drives are very cheap, but earlier when hard
drives were too expensive, unnecessary repetition of data in database was a big problem. But
DBMS follows Normalization which divides the data in such a way that repetition is minimum.
Data Consistency: On Live data, i.e. data that is being continuously updated and added,
maintaining the consistency of data can become a challenge. But DBMS handles it all by itself.
Support for multiple users and concurrent access: A DBMS allows multiple users to work on it (update,
insert, delete data) at the same time and still manages to maintain data consistency.
Query Language: DBMS provides users with a simple Query language, using which data can be
easily fetched, inserted, deleted and updated in a database.
Security: The DBMS also takes care of the security of data, protecting the data from unauthorized
access. In a typical DBMS, we can create user accounts with different access permissions, using
which we can easily secure our data by restricting user access.
DBMS supports transactions, which allows us to better handle and manage data integrity in real
world applications where multi-threading is extensively used.
The TRUNCATE command is used to delete all the rows from the table and free the space containing the
table.
TRUNCATE TABLE table_name;
The SQL DROP command is used to remove an object from the database. If you drop a table,
all the rows in the table are deleted and the table structure is removed from the database. Once a
table is dropped it cannot be recovered, so be careful when using the DROP command. When a table
is dropped, all references to the table become invalid.
DROP TABLE table_name;
If a table is dropped, all the relationships with other tables will no longer be valid, the integrity constraints
are dropped, and grant or access privileges on the table are also dropped; if you want to use the table again,
it has to be recreated, and the integrity constraints, access privileges, and relationships with other tables
must be established again. But if a table is truncated, the table structure remains the same, so none of the
above problems arise.
We have capitalized the words SELECT and FROM because they are SQL keywords. SQL is case insensitive, but it helps for
readability, and is good style.
If we want more information, we can just add a new column to the list of fields, right after SELECT:
SELECT year, month, day
FROM surveys;
Or we can select all of the columns in a table using the wildcard *
SELECT *
FROM surveys;
Unique values
If we want only the unique values so that we can quickly see what species have been sampled we use DISTINCT
SELECT DISTINCT species_id
FROM surveys;
If we select more than one column, then the distinct pairs of values are returned
SELECT DISTINCT year, species_id
FROM surveys;
Calculated values
We can also do calculations with the values in a query. For example, if we wanted to look at the mass of each individual on different
dates, but we needed it in kg instead of g we would use
SELECT year, month, day, weight /1000.0
FROM surveys;
When we run the query, the expression weight / 1000.0 is evaluated for each row and appended to that row, in a new column.
Expressions can use any fields, any arithmetic operators (+, -, *, and /) and a variety of built-in functions. For example, we could round
the values to make them easier to read.
SELECT plot_id, species_id, sex, weight, ROUND(weight / 1000.0, 2)
FROM surveys;
Challenge
Write a query that returns the year, month, day, species_id and weight in mg
SOLUTION
SELECT day, month, year, species_id, weight * 1000
FROM surveys;
Filtering
Databases can also filter data – selecting only the data meeting certain criteria. For example, let’s say we only want data for the
species Dipodomys merriami, which has a species code of DM. We need to add a WHERE clause to our query:
SELECT *
FROM surveys
WHERE species_id='DM';
We can do the same thing with numbers. Here, we only want the data since 2000:
SELECT * FROM surveys
WHERE year >= 2000;
We can use more sophisticated conditions by combining tests with AND and OR. For example, suppose we want the data
on Dipodomys merriami starting in the year 2000:
SELECT *
FROM surveys
WHERE (year >= 2000) AND (species_id = 'DM');
Note that the parentheses are not needed, but again, they help with readability. They also ensure that the computer
combines AND and OR in the way that we intend.
If we wanted to get data for any of the Dipodomys species, which have species codes DM, DO, and DS, we could combine the tests
using OR:
SELECT *
FROM surveys
WHERE (species_id = 'DM') OR (species_id = 'DO') OR (species_id = 'DS');
Challenge
Write a query that returns the day, month, year, species_id, and weight (in kg) for individuals caught on Plot 1 that weigh more than 75
g
SOLUTION
SELECT day, month, year, species_id, weight / 1000.0
FROM surveys
WHERE plot_id = 1
AND weight > 75;
A more compact way to combine several tests on the same field is the IN operator:
SELECT *
FROM surveys
WHERE (year >= 2000) AND (species_id IN ('DM', 'DO', 'DS'));
We started with something simple, then added more clauses one by one, testing their effects as we went along. For complex queries,
this is a good strategy, to make sure you are getting what you want. Sometimes it might help to take a subset of the data that you can
easily see in a temporary database to practice your queries on before working on a larger or more complicated database.
When the queries become more complex, it can be useful to add comments. In SQL, comments are started by --, and end at the end of
the line. For example, a commented version of the above query can be written as:
-- Get post 2000 data on Dipodomys' species
-- These are in the surveys table, and we are interested in all columns
SELECT * FROM surveys
-- Sampling year is in the column `year`, and we want to include 2000
WHERE (year >= 2000)
-- Dipodomys' species have the `species_id` DM, DO, and DS
AND (species_id IN ('DM', 'DO', 'DS'));
Although SQL queries often read like plain English, it is always useful to add comments; this is especially true of more complex
queries.
Sorting
We can also sort the results of our queries by using ORDER BY. For simplicity, let’s go back to the species table and alphabetize it by
taxa.
First, let’s look at what’s in the species table. It’s a table of the species_id and the full genus, species and taxa information for each
species_id. Having this in a separate table is nice, because we didn’t need to include all this information in our main surveys table.
SELECT *
FROM species;
Now let’s order it by taxa.
SELECT *
FROM species
ORDER BY taxa ASC;
The keyword ASC tells us to order it in Ascending order. We could alternately use DESC to get descending order.
SELECT *
FROM species
ORDER BY taxa DESC;
ASC is the default.
We can also sort on several fields at once. To truly be alphabetical, we might want to order by genus then species.
SELECT *
FROM species
ORDER BY genus ASC, species ASC;
Challenge
Write a query that returns year, species_id, and weight in kg from the surveys table, sorted with the largest weights at the top.
SOLUTION
SELECT year, species_id, weight / 1000.0
FROM surveys ORDER BY weight DESC;
Order of execution
Another note for ordering. We don’t actually have to display a column to sort by it. For example, let’s say we want to order the
birds by their species ID, but we only want to see genus and species.
SELECT genus, species
FROM species
WHERE taxa = 'Bird'
ORDER BY species_id ASC;
We can do this because sorting occurs earlier in the computational pipeline than field selection.
The computer is basically doing this:
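In outline (a sketch of the logical evaluation order, not literal SQL — the numbered comments describe the pipeline for the query above):

```sql
-- 1. FROM species             -- take the table
-- 2. WHERE taxa = 'Bird'      -- filter the rows
-- 3. ORDER BY species_id ASC  -- sort the remaining rows
-- 4. SELECT genus, species    -- keep only the requested columns
SELECT genus, species
FROM species
WHERE taxa = 'Bird'
ORDER BY species_id ASC;
```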
Challenge
Let’s try to combine what we’ve learned so far in a single query. Using the surveys table write a query to display the three date
fields, species_id, and weight in kilograms (rounded to two decimal places), for individuals captured in 1999, ordered alphabetically by
the species_id. Write the query as a single line, then put each clause on its own line, and see how more legible the query becomes!
SOLUTION
SELECT year, month, day, species_id, ROUND(weight / 1000.0, 2)
FROM surveys
WHERE year = 1999
ORDER BY species_id;
GROUP BY
The GROUP BY clause is a SQL command that is used to group rows that have the same values. The GROUP
BY clause is used in the SELECT statement, optionally in conjunction with aggregate functions, to
produce summary reports from the database.
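For example, using the surveys table from the earlier queries, we could count how many records exist for each species (COUNT is an aggregate function; this is a sketch against that table):

```sql
-- One output row per species_id, with the number of survey records for each.
SELECT species_id, COUNT(*) AS record_count
FROM surveys
GROUP BY species_id;
```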
Difference between HAVING and WHERE clauses
HAVING: is used to check conditions after the aggregation takes place.
WHERE: is used to check conditions before the aggregation takes place.
This code:
SELECT City, COUNT(ContactAdd) AS AddressCount FROM Address WHERE State = 'MA' GROUP BY City HAVING COUNT(ContactAdd) > 5
gives you a table of cities in MA with more than 5 addresses and the number of addresses in each city.
Joins
A join combines rows from two or more tables based on a related column. The types of joins are:
• Inner
• Outer
• Left
• Right
Consider the class table,
ID NAME
1 abhi
2 adam
3 alex
4 anu
and the class_info table,
ID Address
1 DELHI
2 MUMBAI
3 CHENNAI
The Inner JOIN query will be:
SELECT * FROM class INNER JOIN class_info ON class.id = class_info.id;
OUTER JOIN
An outer join returns both matched and unmatched data. Outer joins subdivide further into Left,
Right, and Full outer joins. Suppose the class_info table also contains rows with no matching id
in the class table:
ID Address
3 CHENNAI
7 NOIDA
8 PANIPAT
Full Outer Join query will be like,
SELECT * FROM class FULL OUTER JOIN class_info ON (class.id = class_info.id);
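The Left and Right variants work the same way but keep unmatched rows from only one side. A sketch using the same tables:

```sql
-- Left outer join: every row of class is returned; class_info columns
-- are NULL where there is no matching id.
SELECT *
FROM class LEFT OUTER JOIN class_info
ON class.id = class_info.id;
```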
Example 2:
Let’s take another example.
Object: Car
State: Color, Brand, Weight, Model
Behavior: Brake, Accelerate, Slow Down, and Gear Change.
Note: As we have seen above, the states and behaviors of an object, can be represented by
variables and methods in the class respectively.
Characteristics of Objects:
1. Abstraction
2. Encapsulation
3. Message passing
If you find it hard to understand Abstraction and Encapsulation, do not worry, as I have covered
these topics in detail with examples in the next section of this guide.
Abstraction: Abstraction is a process where you show only “relevant” data and “hide” unnecessary
details of an object from the user.
Message passing
A single object by itself may not be very useful. An application contains many objects. One object
interacts with another object by invoking methods on that object; this is also referred to as method
invocation.
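A minimal sketch of message passing (the Car and Driver classes here are hypothetical, chosen only for illustration): the Driver object sends a "message" to the Car object by invoking its method.

```java
// One object invoking a method on another object = message passing.
class Car {
    String start() {
        return "Engine started";
    }
}

class Driver {
    // The Driver interacts with the Car by invoking its start() method.
    String drive(Car car) {
        return car.start() + ", driving away";
    }
}

public class MessagePassingDemo {
    public static void main(String[] args) {
        Car car = new Car();
        Driver driver = new Driver();
        System.out.println(driver.drive(car)); // prints: Engine started, driving away
    }
}
```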
11.1.2 What is a Class in OOPs Concepts
A class can be considered as a blueprint using which you can create as many objects as you like.
For example, here we have a class Website that has two data members (also known as fields,
instance variables and object states). This is just a blueprint, it does not represent any website,
however using this we can create Website objects (or instances) that represents the websites. We
have created two objects, while creating objects we provided separate properties to the objects
using constructor.
public class Website {
//fields (or instance variables)
String webName;
int webAge;
// constructor
Website(String name, int age){
this.webName = name;
this.webAge = age;
}
public static void main(String args[]){
//Creating objects
Website obj1 = new Website("beginners book", 5);
Website obj2 = new Website("google", 18);
System.out.println(obj1.webName + " " + obj1.webAge);
System.out.println(obj2.webName + " " + obj2.webAge);
}
}
What is a Constructor?
A constructor looks like a method, but it is in fact not a method. Its name is the same as the class
name and it does not return any value. You must have seen this statement in almost all the programs
shared above:
MyClass obj = new MyClass();
If you look at the right side of this statement, we are calling the default constructor of
class MyClass to create a new object (or instance).
We can also have parameters in the constructor, such constructors are known as parameterized
constructors.
Example of constructor
class ConstructorExample {
int age;
String name;
//Default constructor
ConstructorExample(){
this.name = "Chaitanya";
this.age = 30;
}
//Parameterized constructor
ConstructorExample(String n, int a){
this.name = n;
this.age = a;
}
public static void main(String args[]){
ConstructorExample obj1 = new ConstructorExample();
ConstructorExample obj2 = new ConstructorExample("Steve", 56);
System.out.println(obj1.name + " " + obj1.age);
System.out.println(obj2.name + " " + obj2.age);
}
}
Output:
Chaitanya 30
Steve 56
These four features are the main OOPs Concepts that you must learn to understand the Object
Oriented Programming in Java
Abstraction
Abstraction is the concept of hiding the internal details and describing things in simple terms. There
are many ways to achieve abstraction in object oriented programming, such as encapsulation and
inheritance.
Abstraction is a process where you show only "relevant" data and "hide" unnecessary details of an
object from the user. For example, when you log in to your bank account online, you enter your
user_id and password and press login; what happens when you press login, how the input data is sent
to the server, and how it gets verified is all abstracted away from you.
Encapsulation
Encapsulation is the technique used to implement abstraction in object oriented programming.
Encapsulation is used for access restriction to a class members and methods.
Access modifier keywords are used for encapsulation in object oriented programming. For example,
encapsulation in java is achieved using private, protected and public keywords.
private restricts access to the declaring class itself; public exposes a member to all classes; protected
allows access from subclasses, and in Java it also makes the member accessible from the whole package.
(A member with no modifier has default, package-private access.)
Encapsulation simply means binding object state(fields) and behavior(methods) together. If you are
creating class, you are doing encapsulation.
Encapsulation example in Java
How to
1) Make the instance variables private so that they cannot be accessed directly from outside the
class. You can only set and get values of these variables through the methods of the class.
2) Have getter and setter methods in the class to set and get the values of the fields.
class EmployeeCount
{
private int numOfEmployees = 0;
public void setNoOfEmployees (int count)
{
numOfEmployees = count;
}
public int getNoOfEmployees ()
{
return numOfEmployees;
}
}
public class EncapsulationExample
{
public static void main(String args[])
{
EmployeeCount obj = new EmployeeCount();
obj.setNoOfEmployees(5613);
System.out.println("No Of Employees: " + obj.getNoOfEmployees());
}
}
Output:
No Of Employees: 5613
The class EncapsulationExample that uses the object of class EmployeeCount will not be able to access
numOfEmployees directly. It has to use the setter and getter methods of the EmployeeCount class to set
and get the value.
So what is the benefit of encapsulation in Java programming?
Well, at some point of time, if you want to change the implementation details of the class
EmployeeCount, you can freely do so without affecting the classes that are using it.
Inheritance
The process by which one class acquires the properties and functionalities of another class is
called inheritance. Inheritance provides the idea of reusability of code and each sub class
defines only those features that are unique to it, rest of the features can be inherited from the
parent class.
1. Inheritance is a process of defining a new class based on an existing class by extending its
common data members and methods.
2. Inheritance allows us to reuse code; it improves reusability in your Java application.
3. The parent class is called the base class or super class. The child class that extends the base class
is called the derived class or sub class or child class.
Note: The biggest advantage of Inheritance is that the code in base class need not be
rewritten in the child class.
The variables and methods of the base class can be used in the child class as well.
Syntax: Inheritance in Java
To inherit a class we use extends keyword. Here class A is child class and class B is parent
class.
class A extends B
{
}
Inheritance Example
In this example, we have a parent class Teacher and a child class MathTeacher. In
the MathTeacher class we need not write the code that is already present in the
parent class. Here we have the college name, designation, and the does() method, which are common
to all teachers; thus the MathTeacher class does not need to repeat this code, because the common
data members and methods can be inherited from the Teacher class.
class Teacher {
String designation = "Teacher";
String college = "Beginnersbook";
void does(){
System.out.println("Teaching");
}
}
public class MathTeacher extends Teacher{
String mainSubject = "Maths";
public static void main(String args[]){
MathTeacher obj = new MathTeacher();
System.out.println(obj.college);
System.out.println(obj.designation);
System.out.println(obj.mainSubject);
obj.does();
}
}
Output:
Beginnersbook
Teacher
Maths
Teaching
Note: Multi-level inheritance is allowed in Java, but multiple inheritance (of classes) is not.
Types of Inheritance:
Single Inheritance: refers to a child and parent class relationship where a class extends the
another class.
Multilevel inheritance: refers to a child and parent class relationship where a class extends
the child class. For example class A extends class B and class B extends class C.
Hierarchical inheritance: refers to a child and parent class relationship where more than one
classes extends the same class. For example, class B extends class A and class C extends
class A.
Multiple Inheritance: refers to the concept of one class extending more than one classes,
which means a child class has two parent classes. Java doesn’t support multiple inheritance,
read more about it here.
Most newer OO languages, like Smalltalk, Java, and C#, do not support multiple inheritance.
Multiple inheritance is supported in C++.
Polymorphism
Polymorphism is the concept where an object behaves differently in different situations. In java, we
use method overloading and method overriding to achieve polymorphism.
Polymorphism is an object oriented programming feature that allows us to perform a single
action in different ways. For example, let's say we have a class Animal that has a
method animalSound(); here we cannot give an implementation to this method, as we do not know
which classes will extend Animal. So, we make this method abstract.
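A minimal sketch of what that looks like (Animal and animalSound() come from the example above; the Dog and Cat subclasses are illustrative): the abstract method has no body in Animal, and each subclass supplies its own implementation.

```java
// Animal declares the behavior; subclasses decide how it sounds.
abstract class Animal {
    abstract String animalSound(); // no implementation here
}

class Dog extends Animal {
    String animalSound() { return "Woof"; }
}

class Cat extends Animal {
    String animalSound() { return "Meow"; }
}

public class AbstractDemo {
    public static void main(String[] args) {
        Animal a = new Dog();              // variable type is Animal
        System.out.println(a.animalSound()); // prints: Woof
        a = new Cat();
        System.out.println(a.animalSound()); // prints: Meow
    }
}
```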
Types of Polymorphism
1) Static Polymorphism
2) Dynamic Polymorphism
Static Polymorphism:
Polymorphism that is resolved during compiler time is known as static polymorphism. Method
overloading can be considered as static polymorphism example.
Method Overloading: This allows us to have more than one methods with same name in a
class that differs in signature.
class DisplayOverloading
{
public void disp( char c)
{
System.out.println(c);
}
public void disp(char c, int num)
{
System.out.println(c + " "+num);
}
}
public class ExampleOverloading
{
public static void main(String args[])
{
DisplayOverloading obj = new DisplayOverloading();
obj.disp('a');
obj.disp('a',10);
}
}
Output:
a
a 10
When I say method signature I am not talking about return type of the method, for example if
two methods have same name, same parameters and have different return type, then this is
not a valid method overloading example. This will throw compilation error.
Dynamic Polymorphism
It is also known as Dynamic Method Dispatch. Dynamic polymorphism is a process in which a
call to an overridden method (method overriding means a method in the child class having the
same name and parameters as in the parent class, but a different implementation) is resolved at
run time rather than compile time; that is why it is called runtime polymorphism.
Example
class Animal{
public void animalSound(){
System.out.println("Default Sound");
}
}
public class Dog extends Animal{
@Override
public void animalSound(){
System.out.println("Woof");
}
public static void main(String args[]){
Animal obj = new Dog();
obj.animalSound();
}
}
Output:
Woof
Since both classes, the child class and the parent class, have the same method animalSound, which
of the methods will be called is determined at runtime by the JVM, based on the actual object type.
Few more overriding examples:
Animal obj = new Animal();
obj.animalSound();
// This would call the Animal class method