
Development of Simultaneous localization and mapping framework for autonomous robotics - A comprehensive review

Sabita Pal*1, Smriti Gupta1, Niva Das1, and Kuntal Ghosh2

1 Department of Electronics and Communication Engineering, ITER, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
2 Department of Electronics and Communication Engineering, Asansol Engineering College, India
pal.sabita@gmail.com, smriti3009@gmail.com, nivadas@soa.ac.in, kuntalgh24@gmail.com
* corresponding author

Abstract. Autonomous robotics is an emerging field in the present scenario, where current industrial demands increasingly depend on human-machine interaction. In that process, machine intelligence plays a dominant role in decision making within the operational state-space. Primarily, this decision making and execution of control rely on sensing and actuation. Simultaneous localization and mapping (SLAM) is the most advanced technique that facilitates both sensing and actuation to achieve autonomy for robots. This work aims to collate multi-dimensional aspects of the simultaneous localization and mapping technique, primarily in the purview of both deterministic and probabilistic frameworks. This classification of SLAM techniques is further elaborated into other categories such as feature-based SLAM and optimization-based SLAM. This survey also encompasses the chronological evolution of the SLAM technique to ensure comprehensive understanding among the concerned researchers.

Keywords: SLAM, Deterministic SLAM, Probabilistic SLAM, Feature based SLAM, Optimization based SLAM, ORB-SLAM, BA, Factor graph Optimization.

Table 1: List of abbreviations used in this article.

Abbreviation Explanation
AI Artificial Intelligence
ANN Artificial neural network
AUV Autonomous Underwater Vehicle
BA Bundle Adjustment
BRIEF Binary Robust Independent Elementary Features
CML Concurrent Localization and Mapping
DOF Degree Of Freedom
DoG Difference-of-Gaussian
EKF Extended Kalman Filter
FAST Features from Accelerated Segment Test
g2o General Graph Optimization
GNC Guidance, Navigation and Control
GPS Global Positioning System
iLBA Incremental Light Bundle Adjustment
IMU Inertial Measurement Unit
iSAM Incremental Smoothing and Mapping
KF Kalman Filter
LAGO Linear Approximation for pose Graph Optimization
LBA Light Bundle Adjustment
LIDAR Light Detection and Ranging
MAV Micro Aerial Vehicles
MLR Multi-level relaxation
MSCKF Multi-State Constraint Kalman Filter
ORB-SLAM Oriented FAST and Rotated BRIEF
RANSAC Random sample consensus
RGB-D Red Green Blue Depth
RIEKF-VINS Right Invariant Error Extended Kalman Filter for Visual Inertial Navigation Systems
SFM Structure from Motion
SGD Stochastic gradient descent
SIFT Scale Invariant Feature Transform
SLAM Simultaneous Localization and Mapping
SPA Sparse Pose Adjustment
SURF Speeded Up Robust Feature
UAV Un-manned Autonomous Vehicle
UKF Unscented Kalman Filter
VINS Visual Inertial Navigation Systems
VO Visual odometry
vSLAM Visual Simultaneous Localization and Mapping

1 Introduction

Robotics is an emerging field of science which aims to reduce human dependency. The implementation of robotic technology provides assistance to various technical fields like guidance, navigation and control, collaborative work, etc. Recently, the integration of artificial intelligence (AI) technology with robotics has given robots the autonomy to make decisions. This phenomenon assists robots in predicting the uncertainties of the unknown world. The perception of a robot mainly develops through single or multiple sensor data [1]. Using these sensor data, the robot perceives the surrounding environment, and this execution largely depends on sensor data processing. In the field of navigation, sensor data track the position and orientation of the robot body and ensure the appropriate controls for navigation. The mechanism of sensor fusion poses multiple challenges for real-time application, which has led to several modeling algorithms [2]. Based on its sensing methodology, a robot can be classified as autonomous in real time or not. In most cases, sensing works according to a deterministic algorithm in a known environment (prior information). Autonomy generates plans based on sensor data, and an appropriate sequence of actions is adopted depending on the dynamically changing environment. Non-deterministic algorithms try to overcome the probability of error in difficult modeling processes. Ideally, a machine or software system should be capable of directly replacing a human being: a human being can plan a path to proceed and can also solve problems along the way, whereas a machine typically works under strict human supervision. With sufficient intelligent capabilities, monitoring and planning in an unmanned ground vehicle can be achieved without human interference and without risk of failure [3].
Autonomous robotics faces major challenges in the fields of path planning (moving from point A to point B), mapping (modeling the unknown world by integrating the measurements taken at estimated positions and rebuilding it), and localization (estimating the robot's exact location with respect to the world model from sensor data). It has also made advancements in the SLAM technique, which is a combination of the above-mentioned challenges. The robotic system works on the principle of sense, plan and act.

Robot primitive   Input                               Output
Sense             Sensor data                         Sensed information
Plan              Sensed or cognitive information     Directives
Act               Sensed information or directives    Actuator commands

Fig. 1. Interfaces of Robotics System: This table depicts different sub-systems of autonomous robots

Primarily, robots are used for multiple purposes like surveillance, production, agriculture, surgery, monitoring, etc. They can replace humans in dangerous assignments, assist the aging population in their daily chores, provide entertainment, and enable humans to project themselves properly from a distance in real time by sensing, planning and acting accordingly. Military drones guide soldiers to sense (see) and shoot targets from beyond visual range. Automated surgery robots help surgeons work with precision. Robots are also widely applied in disaster management and resource allocation [3]. Gradually, autonomous robots are occupying space in the vast and fast evolution of advanced applications with multi-dimensional operational challenges. Under those circumstances, robots require fail-safe guidance, navigation and control (GNC) enabled localization for monitoring their pose in the unknown environment. Pose estimation generally consists of the integrated information of both position and orientation of the moving robot. Further, this localization in each iteration helps in constructing a map of the unknown environment (terrestrial, aerial, or underwater) [4] [5]. This integrated process, comprising localization and mapping, is known as simultaneous localization and mapping (SLAM) [6].
The simultaneous localization and mapping technique can also be applied to artificially intelligent mobile robots for self-exploration in different geographical environments. It solves the computational problem of an autonomous mobile robot which is placed in an unknown location and has to construct and update a map while simultaneously tracking its own position. Localization defines the exact or accurate current pose of the robot in an unknown environment [7].
The minimum requirement for the SLAM technique is that the robot should be mounted with one or more sensors like camera, LIDAR, sonar, IMU, GPS, etc. [2]. Internal sensors include the accelerometer (measures translational motion) and the gyroscope (measures rotational motion and changes in orientation). External sensors include cameras (capture real-time structures of landmarks), bump sensors (detect and avoid obstacles), force-torque sensors (measure the 6 DOF forces for movement), and spectrometers. Sensors collect information about the appearance of particular locations along with specific landmarks, thus building a map. Superimposing the sensor information over the map so built can enhance the detail of the unknown environment [8].
The main objective is to self-explore and avoid collisions with obstacles by implementing estimation techniques under a probabilistic framework. The autonomous robot, or more scientifically the artificially intelligent robot, can 'think' the way most animals or we human beings do while making a decision, and then act accordingly, based on the instructions issued after the decision making, as its response [9]. Intensive relevant research has been done in the field of automatic navigation and control of autonomous mobile robots.

2 Simultaneous Localization and Mapping

In the initial years, 1985-1990, mapping and localization were performed concurrently, and hence the term used was concurrent localization and mapping (CML) [10, 11]. A few years later, the term SLAM was introduced by [6].
It is a technique to achieve autonomous control of robots. The robot gathers information about the unknown environment and continuously updates it [12]. The objective is to self-explore the environment while avoiding obstacles within it [13]. Several challenges are faced during self-exploration: avoiding obstacles and landmarks while navigating, estimating its own location in an unknown environment, generating a map of the environment, and, based on that, making decisions without human interference [14]. To build the map, the robot depends on inputs from two types of sensors, i.e., proprioceptive and exteroceptive.
Proprioceptive sensors such as the gyroscope and accelerometer measure internal values of the system like velocity, wheel load, position change, and accelerations, and estimate the position of the entity using the dead reckoning navigation method. Due to inherent noise and cumulative error, however, position estimation of the entity is challenging. On the other hand, exteroceptive sensors, for instance sonar, laser, cameras, and GPS, acquire information from the outside world. Sonar and laser provide precise information, but their cost and bulkiness make them unsuitable for highly cluttered environments, restricting their use in airborne robots and humanoids. GPS sensors, in turn, are inefficient and do not work well indoors and underwater.
Over the last decades it has become evident that cameras are the most widely used exteroceptive sensor, with the ability to retrieve the environment's appearance, color, and texture. They are less expensive, lightweight, and have low power consumption. They help the robot in the detection and recognition of landmarks. SLAM using cameras as the only sensor is referred to as visual SLAM. The visual SLAM technique in an autonomous robot uses a camera that captures video of the surrounding environment as visual information. The camera measurements and motion can then be estimated. The configuration of the camera as the sensor is simple and is widely applicable in the field of computer vision, in un-manned autonomous vehicles (UAV), and in augmented reality. The camera pose, which includes position and orientation, is estimated and a 2D-3D structure of the surroundings is constructed simultaneously.
The basic SLAM characteristics [15] to be satisfied for autonomous robots may be summarized as:
i) Accuracy - the localization of the robot in the local and global coordinate systems of the map; the localization error is always maintained below a threshold.
ii) Scalability - the capability to work in constant time and with constant memory to load mapped data for easy accessibility.
iii) Availability - an adequately accurate SLAM algorithm for localization can be used with the existing map.
iv) Recovery - the ability of the robot to localize itself in a large map; it also helps to recover from tracking failures.
v) Updatability - changes in the current observation are automatically made in the existing map; these changes are not permanent and may be altered in subsequent steps.
vi) Dynamicity - the ability to handle changes in a dynamic environment. The presence of obstacles causing collisions can affect SLAM approaches, and climatic conditions may also have an adverse impact on the localization process.
Visual SLAM can be further classified into three categories: feature-based, direct, and RGB-D camera-based. In a feature-based approach, vSLAM uses a monocular camera to extract feature points for tracking and mapping.
In the process of initialization, the features or points in the global coordinate system are defined for the unknown environment, for camera pose estimation and 3D map reconstruction.

Fig. 2. Visual SLAM: initialization, tracking, and mapping modules

So in the process of initialization, a part of the


environment is reconstructed as a map using the points. In tracking, the recon-
structed map from the features extracted is superimposed in the image for the
estimation of the camera pose. In mapping, for an unknown region, the map
is expanded to form a 3D structure. To handle textureless environments, the direct vSLAM approach was proposed, which works directly on the whole image for tracking and mapping without detecting feature points [16].
Cameras are easy to install but pose greater technical difficulties, as vSLAM acquires less visual input compared to 360-degree laser sensing. Some works exist which use multiple cameras, wide-angle cameras or omnidirectional cameras to increase the visual input [17]. Siva et al. [8] proposed an approach that
incorporates observations acquired from multiple sensors from different viewing
angles in the omnidirectional observation to recognize multi-directional places.
This approach is also valid when the robot approaches the place from various
directions.
However, visual SLAM fails in conditions like insufficient resolution, abrupt illumination changes, dynamic environments, blurred images due to erratic movement of the vehicle, or partial occlusion. Dharmasiri et al. [18] presented a multi-object, real-time visual SLAM system where duplicate instances of objects can be detected at run time, which helps the system in optimizing the map. Visual SLAM, when combined with information from proprioceptive sensors, is termed visual-inertial SLAM. Manderson et al. [19] presented a gaze-based SLAM that assimilates visual and inertial information, directing the stereo camera towards feature-rich scenes. Stereo configurations can easily and accurately calculate the 3D location of the landmarks using the method of triangulation. Kochanov et al. [20] work with static as well as dynamic objects in the map and develop an approach that fuses stereo frame observations into temporally consistent semantic 3D maps, due to which object discovery is less vulnerable to occlusions and noisy observations.
RGB-D SLAM uses a color image for feature extraction and matches the features from the previous image frame. It then evaluates the depth image at the location of the feature points, creating a point-wise 3D correspondence between any two frames [21]. Belter et al. [22] apply a factor graph optimization technique to measure the gains coming from maintaining a map of RGB-D features. They also used an anisotropic model of uncertainty to improve the accuracy of feature-based RGB-D SLAM. Conventional RGB-D SLAM uses a very limited depth measurement range and ignores regions too close to or too far from the sensor. Ataer et al. [23] proposed a system that uses 2D and 3D point features of an RGB image as invalid and valid depth values. The 2D-3D correspondence acts as a hybrid correspondence and can be utilized both in online SLAM, which focuses more on speed for real-time operation, and in offline processing, which generates corresponding frames with higher accuracy. On the other hand, Kendall et al. [24] use a Bayesian convolutional neural network representation to regress the full 3D camera pose and need no additional method for graph optimization. Neubert et al. [25] address the problem of combining 3D distance information from the map and the visual image from the camera to navigate within the map, where the similarity between the two images is used to evaluate pose hypotheses. On the contrary, Cieslewski et al. [26] rely on recent, machine-learned full-image descriptors and k-means clustering for visual place recognition. It involves sending a lightweight query to a single robot, and if the place is matched a second query can be sent to the robot that has just observed the match. Cabrera et al. [27] address the problem of the autonomous landing of an MAV on a platform, utilizing template-based matching in an image pyramid scheme in combination with a Canny edge detector.
Vision-based state estimation, or visual SLAM, can be broadly categorized into two approaches, which are shown in Fig. 3. The filtering method finds the probability distribution of the information collected over the time domain by summarizing it, and also marginalizes out the previous poses. The keyframe method [28], in contrast, computationally selects only a few past frames for processing by the global bundle adjustment optimization technique [29].
The first is the probabilistic or filter-based approach and the second is the global optimization or machine learning approach. The probabilistic approach combines the visual features from the camera with the auxiliary sensor data from the inertial measurement unit (IMU) and the global positioning system, if any. The Kalman filter (KF), extended Kalman filter (EKF) and unscented Kalman filter belong to this category. The state vector contains all the current information of the map and camera pose, and it is updated with every prediction step under an assumed constant Gaussian linear motion model. Kalman et al. [30] describe the approaches related to the filtering techniques and their update by prediction to work as a linear filter. From the sensor data, the filter finds the relationship between new observations and the map in order to resolve the constraints [15].
The global optimization approach includes visual odometry (VO) by Engel et
al. [31] [32] and structure from motion (SFM) techniques [33] [34]. Features are
extracted by detecting and tracking the estimated pose, thereby minimizing the
re-projection error (Fig. 3).
Fig. 3. Approaches of vSLAM: SLAM is divided into deterministic SLAM (LASER based (LIDAR) and VISION based (vSLAM)), probabilistic SLAM (Kalman Filter (KF), Extended Kalman Filter (EKF), Unscented Kalman Filter (UKF)), feature based SLAM (Scale Invariant Feature Transform (SIFT), Speeded Up Robust Feature (SURF), Features from Accelerated Segment Test (FAST), Binary Robust Independent Elementary Features (BRIEF), Oriented FAST and Rotated BRIEF (ORB-SLAM)), and optimization based SLAM (Bundle Adjustment (BA) and Pose Graph SLAM).

Li et al. [35] explained different approaches by collecting data from various indoor and outdoor environments, using an underwater robot with cameras, flying vehicles as well as terrestrial vehicles, comparing them by testing the datasets gathered over the years in the laboratory; finally, the performance is evaluated. Jones et al. [36] estimated motion from an unknown gravity vector and the transform between monocular visual camera coordinates and inertial measurement units. The gravity vector denotes the global orientation reference obtained from visual features using filters.
Nowadays a third type of method, based on machine learning and neural networks, is becoming popular, but the main disadvantage of this approach is that it requires preliminary model training. This method is computationally expensive and requires a larger number of model parameters without increasing performance significantly [24] [4].

3 SLAM technique using Probabilistic Framework

The majority of SLAM techniques use a probabilistic or Bayesian approach to ensure fail-safe conditions. This approach depends on multi-sensor data fusion based on a camera sensor for visual information and IMU sensors for auxiliary data. It uses a sequence of imperfect actions and noisy observations of the robot while estimating the robot position and mapping the environment [37]. According to the Bayesian approach, the sensor data retrieved by the robot are expressed as a conditional probability distribution. These distributions are then used as input, and two steps, i.e., prediction and correction, are performed to solve the problem [38]. SLAM is of two types: the first is full SLAM, in which the robot estimates the entire path and the map, and the second is online SLAM, in which the robot estimates only the most recent pose and the map. The Kalman filter (KF), extended Kalman filter (EKF), unscented Kalman filter (UKF), particle filter, etc. belong to this category.
Garcia et al. [39] investigate an indoor SLAM technique that provides a scale-aware tracking and mapping system. It incorporates visual information and measurements from an inertial measurement unit (IMU) for 6 DOF pose estimation using the extended Kalman filter. Sayre et al. [40] demonstrate repeated agile maneuvering with closed-loop vision-based perception and control algorithms. Wu et al. [41] proposed a novel RIEKF-VINS algorithm that preserves the invariance property; it integrates RIEKF-VINS into the MSCKF framework so that the resulting algorithm obtains a consistent state estimator.

3.1 SLAM Principle

A robot solving a SLAM problem requires at minimum a sensor that acquires information about the environment in which it is moving. The SLAM problem can be defined in two ways: online SLAM, using equation (1), and full SLAM, using equation (2):
P (xk , m|Z1:k , U1:k ) (1)
P (x1:k , m|Z1:k , U1:k ) (2)
The only difference between the two is that in full SLAM the posteriors are calculated over the entire path x_{1:k}. Here x represents the location and orientation of the robot, m contains the feature (landmark) locations, U_k is the control vector and Z_k is the observation taken by the sensor.
Applying Bayes' theorem, with the assumptions that the landmarks are time-invariant in the environment and that the robot follows a Markov motion model, the SLAM formula can be described by the recursive expression given by equation (3) [38] [42]:

P(x_k, m \mid Z_{1:k}, U_{1:k}) = \eta \, P(Z_k \mid x_k, m) \int P(x_k \mid x_{k-1}, U_k) \, P(x_{k-1}, m \mid Z_{1:k-1}, U_{1:k-1}) \, dx_{k-1}    (3)

where \eta is the normalizing constant, P(x_k \mid x_{k-1}, U_k) is the motion model and P(Z_k \mid x_k, m) is the observation model of the robot, respectively.
In general single robot SLAM is implemented in three steps:-

1. Robot’s current location is predicted by the motion model which is obtained


from the robot’s dynamic equation.

xk = f (xk−1 , Uk ) + wk (4)

Basically the predicted location xk depends on the previous location xk−1 ,


control signal Uk and the zero mean Gaussian noise wk ∼ N (0, Qk ).
2. The robot then gathers information about the environment and this stage is
described by the observation model which in turn depends on sensor readings.

Zk = h(xk , m) + νk (5)

The sensor measurement are function of the robot’s location xk , landmark


location m and the zero mean Gaussian noise νk ∼ N (0, Rk ).
3. Finally robot updates its location and map. The estimation of map and
robot’s location is described by the recursive equation given by equation
(3) [38] [42]
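To make the motion and observation models of equations (4) and (5) concrete, the following minimal Python sketch assumes a planar robot with pose (x, y, θ), a velocity/turn-rate control input, and range-bearing observations of point landmarks; the function names and noise parameters are illustrative, not prescribed by the surveyed papers.

```python
import numpy as np

def motion_model(x_prev, u, dt, Q):
    """Equation (4): x_k = f(x_{k-1}, U_k) + w_k for a planar robot.
    x_prev = [x, y, theta], u = [v, omega] (forward velocity, turn rate)."""
    x, y, theta = x_prev
    v, omega = u
    x_pred = np.array([x + v * dt * np.cos(theta),
                       y + v * dt * np.sin(theta),
                       theta + omega * dt])
    w = np.random.multivariate_normal(np.zeros(3), Q)   # zero-mean Gaussian noise w_k
    return x_pred + w

def observation_model(x, landmark, R):
    """Equation (5): Z_k = h(x_k, m) + v_k, a range-bearing reading of one landmark."""
    dx, dy = landmark[0] - x[0], landmark[1] - x[1]
    z_pred = np.array([np.hypot(dx, dy),                 # range to the landmark
                       np.arctan2(dy, dx) - x[2]])       # bearing relative to heading
    v = np.random.multivariate_normal(np.zeros(2), R)    # zero-mean Gaussian noise v_k
    return z_pred + v
```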

3.1.1 KF based SLAM


The Kalman filter can easily handle uncertainty. It uses an iterative mathematical process employing a set of equations and consecutive data inputs to quickly estimate the true value, position, velocity, etc. of the object being measured, even when the measured values contain unpredicted or random error, uncertainty or variation. The Kalman filter operates on past outputs: it predicts the new state and its uncertainty (from the motion model), and the corrected value is then deduced from the new observation measurement [30].
The flowchart of the Kalman filter consists of three main calculations. First is the calculation of the Kalman gain: the Kalman gain (KG) depends on the error in the estimate and the error in the data measurement. Second is the calculation of the current estimate: the Kalman gain feeds into the calculation of the current estimate, together with the previous estimate and the data input. Third is the calculation of the new error in the estimate: based on the Kalman gain and the current estimate, the error in the new estimate is calculated and fed to the next iteration. The current estimate is used to update the state estimate. This process gives better results in tracking [43].
Fig. 4. Block diagram of Kalman Filter
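The three calculations described above can be condensed into a few lines of code. The sketch below is a minimal one-dimensional example written for illustration only (the state, noise variances and measurement values are invented); it shows the Kalman gain, the current-estimate update, and the update of the error in the estimate.

```python
def kalman_1d(measurements, x0, p0, q, r):
    """Minimal 1-D Kalman filter: x = state estimate, p = error in the estimate,
    q = process noise variance, r = measurement noise variance."""
    x, p = x0, p0
    for z in measurements:
        # Prediction: constant-state model, uncertainty grows by the process noise
        p = p + q
        # 1) Kalman gain from the error in the estimate and the measurement error
        kg = p / (p + r)
        # 2) Current estimate from the previous estimate, the gain and the new data input
        x = x + kg * (z - x)
        # 3) New error in the estimate, fed to the next iteration
        p = (1.0 - kg) * p
    return x, p

# Example: noisy readings of a roughly constant true value
print(kalman_1d([75.0, 71.0, 70.0, 74.0], x0=68.0, p0=2.0, q=0.01, r=4.0))
```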

3.1.2 EKF based SLAM


The Kalman filter based SLAM approach is meant only for linear motion, but most realistic problems involve nonlinear functions. EKF based SLAM is the extension of the Kalman filter (KF) to the nonlinear case. The EKF based approach estimates and stores the robot pose and landmarks in a state vector, and the uncertainties of the estimates in an error covariance matrix [44]. The main objective is to find the posterior distribution, whose mean and covariance are given by equations (6) and (7) respectively.

\bar{x} = \mu_k = \begin{bmatrix} \bar{x}_{k|k} \\ \hat{m}_k \end{bmatrix}    (6)

P_{k|k} = \begin{bmatrix} P_{xx} & P_{xm} \\ P_{xm}^T & P_{mm} \end{bmatrix}    (7)
In the prediction step, the robot motion model as expressed by equation (4) and the mean and covariance at time k − 1 are used to predict the mean and covariance by using equations (8) and (9) respectively.

x̂k|k−1 = f (x̂k−1|k−1 , Uk ) (8)

Pxx,k|k−1 = ∇f Pxx,k−1|k−1 ∇f T + Qk (9)


where ∇f is the jacobian of nonlinear function f calculated at time k − 1 . It is
the most significant part of EKF as it converts the nonlinear function into linear
and helps in tracking nonlinear motion.
The correction stage refines the prediction via the observed measurement and the observation model represented by equation (5). The innovation and its covariance are then calculated from the real observation and the predicted values using equations (10) and (11) respectively.

Vk = Zk − h(x̂k|k−1 , m̂k−1 ) (10)

Sk = ∇hPk|k−1 ∇hT + Rk (11)


where ∇h is the jacobian of h.
Finally, the estimated state is given by (12) and (13); the state is corrected and the same steps are repeated for the next time steps.

\begin{bmatrix} \hat{x}_{k|k} \\ \hat{m}_{k|k} \end{bmatrix} = \begin{bmatrix} \hat{x}_{k|k-1} \\ \hat{m}_{k|k-1} \end{bmatrix} + K_k V_k    (12)

P_{k|k} = P_{k|k-1} - K_k S_k K_k^T    (13)


where K_k is the Kalman gain given by equation (14) [38].

K_k = P_{k|k-1} \nabla h^T S_k^{-1}    (14)
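A compact sketch of the prediction and correction steps of equations (8)-(14) is given below. It is an illustrative skeleton, not the implementation used by any of the cited systems: the motion model f, the observation model h and their Jacobians are assumed to be supplied by the caller, and the full EKF-SLAM state (robot pose plus landmarks) is treated as one vector.

```python
import numpy as np

def ekf_predict(x, P, u, f, F_jac, Q):
    """Equations (8)-(9): propagate mean and covariance through the motion model."""
    x_pred = f(x, u)                     # x_hat_{k|k-1} = f(x_hat_{k-1|k-1}, U_k)
    F = F_jac(x, u)                      # Jacobian of f at the previous estimate
    P_pred = F @ P @ F.T + Q             # P_{k|k-1}
    return x_pred, P_pred

def ekf_update(x_pred, P_pred, z, h, H_jac, R):
    """Equations (10)-(14): correct the prediction with the real observation."""
    H = H_jac(x_pred)                    # Jacobian of h at the predicted state
    v = z - h(x_pred)                    # innovation V_k, eq. (10)
    S = H @ P_pred @ H.T + R             # innovation covariance S_k, eq. (11)
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain K_k, eq. (14)
    x_upd = x_pred + K @ v               # corrected state, eq. (12)
    P_upd = P_pred - K @ S @ K.T         # corrected covariance, eq. (13)
    return x_upd, P_upd
```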

If the number of features increases, the computational cost becomes high, which is one of the key problems of full EKF based SLAM. This complexity arises because even if a single landmark is observed the whole matrix has to be updated [45] [46]. Another problem is data association, which arises due to similar-looking features present in the environment [44]. There is also an approximation issue in EKF based SLAM, as it approximates the optimal terms only to the first order. These approximations can induce large errors and can lead to sub-optimal performance [47]. Garcia et al. [39] describe, for indoor environment navigation, micro aerial vehicles (MAV) using a SLAM technique that relies on the fusion of measurements obtained from a monocular camera and onboard sensors (IMU) to autonomously navigate in unknown, hostile and GPS-denied environments such as ruined or semi-demolished buildings, with pose estimation implemented by an Extended Kalman Filter (EKF). It obtains a better estimation of the 6 DOF pose. This method addresses scaling and occlusion changes.

3.1.3 UKF based SLAM


The Unscented Kalman filter (UKF) is a recursive state estimator that approximates the mean and covariance of a random variable. It is based on approximating the probability function using the unscented transformation [48]. The Unscented Kalman filter based SLAM addresses the approximation issue of EKF based SLAM. The UKF was proposed by Julier and Uhlmann [48] and is based on deterministic sampling and the unscented transform (UT). In the UKF, the noise probability distribution is expressed by a set of variables, called sigma points, and these are not limited to the Gaussian distribution only [45]. In place of linearizing a nonlinear function by Taylor expansion and computing Jacobians, it uses 2n + 1 sigma points drawn from the Gaussian. UKF based SLAM can approximate any nonlinearity accurately up to third order (of the Taylor series expansion). The recursive estimation of x_k is expressed as

\hat{x}_k = (\text{prediction of } x_k) + K_k \cdot [y_k - (\text{prediction of } y_k)]    (15)

where the prior estimate \hat{x}_{k-1} and the current observation y_k are assumed to be Gaussian random variables.
UKF Algorithm
Initialize with:

\hat{x}_0 = E[x_0]    (16)

P_0 = E[(x_0 - \hat{x}_0)(x_0 - \hat{x}_0)^T]    (17)

\hat{x}_0^a = E[x^a] = [\hat{x}_0^T \;\; 0 \;\; 0]^T    (18)

P_0^a = E[(x_0^a - \hat{x}_0^a)(x_0^a - \hat{x}_0^a)^T] = \begin{bmatrix} P_0 & 0 & 0 \\ 0 & P_v & 0 \\ 0 & 0 & P_n \end{bmatrix}    (19)

For k \in \{1, \dots, \infty\}, calculate the sigma points:

\mathcal{X}_{k-1}^a = \left[ \hat{x}_{k-1}^a \;\;\; \hat{x}_{k-1}^a \pm \sqrt{(L + \lambda) P_{k-1}^a} \right]    (20)

Time update:

\mathcal{X}_{k|k-1}^x = F(\mathcal{X}_{k-1}^x, \mathcal{X}_{k-1}^v)    (21)

\hat{x}_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{X}_{i,k|k-1}^x    (22)

P_k^- = \sum_{i=0}^{2L} W_i^{(c)} \left[\mathcal{X}_{i,k|k-1}^x - \hat{x}_k^-\right]\left[\mathcal{X}_{i,k|k-1}^x - \hat{x}_k^-\right]^T    (23)

\mathcal{Y}_{k|k-1} = H(\mathcal{X}_{k|k-1}^x, \mathcal{X}_{k-1}^n)    (24)

\hat{y}_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{Y}_{i,k|k-1}    (25)

Measurement update equations:

P_{\tilde{y}_k \tilde{y}_k} = \sum_{i=0}^{2L} W_i^{(c)} \left[\mathcal{Y}_{i,k|k-1} - \hat{y}_k^-\right]\left[\mathcal{Y}_{i,k|k-1} - \hat{y}_k^-\right]^T    (26)

P_{x_k y_k} = \sum_{i=0}^{2L} W_i^{(c)} \left[\mathcal{X}_{i,k|k-1} - \hat{x}_k^-\right]\left[\mathcal{Y}_{i,k|k-1} - \hat{y}_k^-\right]^T    (27)

K = P_{x_k y_k} P_{\tilde{y}_k \tilde{y}_k}^{-1}    (28)

\hat{x}_k = \hat{x}_k^- + K (y_k - \hat{y}_k^-)    (29)

P_k = P_k^- - K P_{\tilde{y}_k \tilde{y}_k} K^T    (30)

where x^a = [x^T \; v^T \; n^T]^T, \mathcal{X}^a = [(\mathcal{X}^x)^T \; (\mathcal{X}^v)^T \; (\mathcal{X}^n)^T]^T, \lambda = composite scaling parameter, L = dimension of the augmented state, P_v = process noise covariance, P_n = measurement noise covariance, and W_i = weights as calculated.
In comparison to the EKF computation, the UKF does not need the calculation of Jacobians and Hessians in the algorithm. In noisy state estimation problems the UKF has superior performance.
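The core of the algorithm is the generation of the 2L+1 sigma points and their weighted recombination after passing through the nonlinearity. The sketch below shows this unscented transform for an arbitrary nonlinear function f; the weight formulas follow the standard Wan and van der Merwe formulation, and the parameter defaults are illustrative rather than values recommended by the surveyed papers.

```python
import numpy as np

def unscented_transform(x_mean, P, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate mean and covariance through a nonlinear function f
    using 2L+1 sigma points instead of a Jacobian-based linearization."""
    L = x_mean.size
    lam = alpha**2 * (L + kappa) - L                  # composite scaling parameter
    S = np.linalg.cholesky((L + lam) * P)             # matrix square root of (L+lam)P
    sigmas = np.vstack([x_mean,
                        x_mean + S.T,                 # columns of the square root
                        x_mean - S.T])                # 2L+1 sigma points in total
    Wm = np.full(2 * L + 1, 0.5 / (L + lam))          # mean weights W_i^(m)
    Wc = Wm.copy()                                    # covariance weights W_i^(c)
    Wm[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1 - alpha**2 + beta)
    Y = np.array([f(s) for s in sigmas])              # push the points through f
    y_mean = Wm @ Y                                   # recombined mean
    d = Y - y_mean
    P_y = d.T @ (Wc[:, None] * d)                     # recombined covariance
    return y_mean, P_y
```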

Method  Pros                                                              Cons
KF      Optimal closed-form solution to the tracking problem;            Not applicable if the system or measurement model
        high convergence; handles uncertainty.                           is non-linear; Gaussian assumption; slow in high
                                                                          dimensional maps.
EKF     Able to deal with non-linear models; can provide a good          Problematic when dealing with large maps, which
        loop closure reference; less runtime; high accuracy.             leads to numerous issues; calculation of Jacobians.
UKF     Better approximation than EKF for non-linear models;             More runtime; UKF is practically slightly slower;
        no need to calculate Jacobians (derivative free);                not a good solution for small amounts of data.
        better linearization; highly efficient.

Fig. 5. Comparison of probabilistic methods

3.2 Drawbacks of Probabilistic Approach

– The computational cost rises quadratically with the number of landmarks kept in the map.
– Due to the assumption that wk and vk are zero-mean, white and Gaussian, the linearization used in the EKF can cause intolerable errors in the estimation.
– Recursive computation is performed for each update of the landmarks, which is computationally complex.
4 Optimization approach for Vision Based SLAM

Advancements in the machine learning domain and improved computation have led learning approaches into feature extraction, detection, and matching methods [49] [50]. They collect attributes from local training features and extract features using that information in real time [12] [51] [9]. Data association is an important part of feature based SLAM as it allows the keypoints, i.e., features extracted from the image, to be re-observed in the sequence of images, and also in loop closing. The interest point detector helps in feature extraction and description from the neighborhood of the local points by adding a distinguishing identity [52]. The feature detection method is the computation of a representation of the image information and the making of a local decision at every image point to verify whether an image feature of the given type exists at that point. The interest point detector creates maps and detects the features. Image matching applications continue to grow in various fields: indoor, outdoor, aerial, and underwater [53]. A feature detection technique should be robust to image transformations (scale, rotation, illumination, noise). The features must be unique, and the matching probability of a single feature must be high [54].

5 Feature Detection And Matching

Feature detection and matching is an important task in many computer vision


applications, such as structure-from-motion, image retrieval, object detection,
and more. A feature consists of the basic information of an image which is sig-
nificant in solving various computational problems in the application field of
structure from motion [33], object detection, automatic object tracking, stereo
calibration, motion-based segmentation, 3D object recognition and reconstruc-
tion, robot navigation.
When feature detection is applied to an image, the feature may be of any structure like points, lines, edges, blobs, or objects. The selection of features depends on the environment in which the robot moves and localizes itself. In point detection, an isolated point has a grey level extensively different from its background in a homogeneous region; the mask is superimposed on the required image and convolved to calculate its response. In line detection, four different types of masks are used for tracking the response. In edge detection [55] [56], the outline of the object is determined. An edge consists of a set of connected pixel points that lie on the boundary between two regions of different grey values. 'Edges are unique in space' signifies that their position and orientation remain the same when viewed from various points [57].
The main constituents of feature detection and matching are detection, description, and matching. These are evaluated based on scale invariance, variation in illumination, and viewpoint changes by rotation. They have multiple applications in vSLAM, which implements techniques like SIFT (Scale Invariant Feature Transform), SURF (Speeded Up Robust Feature), FAST (Features from Accelerated Segment Test), BRIEF (Binary Robust Independent Elementary Features), and ORB-SLAM (Oriented FAST and Rotated BRIEF). The point detectors and descriptors are applicable in ground, underwater as well as aerial environments [58].

Fig. 6. Main components of feature detection and matching under the SLAM framework: detector, descriptor, and matching

The detection technique traces the interest points of an object. In the case of an edge, the boundary direction changes abruptly, while a corner is the intersection of edge points. Mono-SLAM emphasizes edge segments for the construction of maps, and visual SLAM emphasizes corners for locating landmarks [59]. Klein et al. [28] explain that edges are less affected by a sudden movement of the camera, which can blur the features; an edge is generally constant under changes in illumination, orientation, brightness, and scale.

Feature detection regions can again be categorized as:

. Flat region - no gradient change is observed in any direction.
. Edge region - no gradient change is observed along the edge direction, but there is an abrupt change in the image intensities of the pixels across it (discontinuous on different sides).
. Corner region - a significant gradient change in all directions. At this point, two contour lines intersect in the edge's local neighborhood.
Harris et al. [55] use the Harris corner detection technique to rectify rotational invariance and sensitivity issues. It shifts a small window in all 8 directions followed by auto-correlation, and the point of largest variation in intensity is detected as an interest point. It is invariant to sampling quantization and scale. It overcomes the drawbacks of the Moravec operator, which calculates edge strength along 4 directions by applying a threshold, and of the Canny edge detector, which applies the derivative of a Gaussian as an operator, thereby generating a new edge-enhanced kernel [60].
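A typical way to obtain such a corner response map in practice is via the OpenCV Python bindings, as in the hedged sketch below (the image path, window size, Sobel aperture, Harris parameter k, and threshold are placeholder choices).

```python
import cv2
import numpy as np

img = cv2.imread("scene.png")                          # placeholder image path
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# Harris response map: arguments are neighbourhood block size, Sobel aperture,
# and the Harris free parameter k
response = cv2.cornerHarris(gray, 2, 3, 0.04)

# Keep only points whose variation in intensity is large in all directions
corners = np.argwhere(response > 0.01 * response.max())
print(f"{len(corners)} Harris corners detected")
```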
Descriptors are the local vector values surrounding the feature interest points, describing the image. They are applicable in areas of image registration, object detection and recognition, and 3D reconstruction in computer vision, and they are invariant to changes [61].

Fig. 7. (a) Flat region (b) Edge region (c) Corner region

The interest point along with its descriptor is a local feature point. In matching, the descriptors of image 1 [(P_i, Q_i)] are compared for similar features with the matching features of image 2 [(P'_i, Q'_i)].

(P_i, Q_i) \Leftrightarrow (P'_i, Q'_i)    (31)

The descriptor represents, as a binary string, the intensity differences of certain pairs of pixels around the interest point [62]. In the algorithm of a feature descriptor, an image is taken as input and all distinct keypoints are found. The region around each keypoint is described, extracted, and normalized. From the normalized region a local descriptor is encoded as a number series which can be differentiated easily. It is then matched against similar features along the image. The important characteristics of some visual SLAM systems are summarized in this paper in Table 2.

6 Approaches in feature detection and matching


The five approaches are:
1. SIFT (Scale Invariant Feature Transform)
2. SURF (Speeded Up Robust Feature)
3. FAST (Features from Accelerated Segment Test)
4. BRIEF (Binary Robust Independent Elementary Features)
5. ORB (Oriented FAST and Rotated BRIEF)

6.1 SIFT(Scale Invariant Feature Transform)


In 2004, Lowe et al. [63] proposed the SIFT process to solve problems related to image rotation and scaling; it is robust to occlusion and clutter as well as to changes in noise and illumination. The SIFT algorithm can be categorized into the following steps:
1. Scale Space Extrema (Peak selection): potential locations for finding features. The initial image is repeatedly convolved with Gaussians to produce the set of scale-space images

   L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)    (32)

   where G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-(x^2 + y^2)/2\sigma^2\right) is the Gaussian function and I is the image.

   Fig. 8. Feature detection and description method

   The first stage is to identify the locations and scales of keypoints using scale-space extrema in the DoG (Difference-of-Gaussian) function with different values of \sigma; the DoG function is computed from images in scale space separated by a constant factor k, as in the following equation:

   D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y)    (33)

   Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian (DoG) images. The DoG function gives the locations and scales, i.e., the interest points of the detected keypoints; multiple DoG images are obtained to get robust and accurate keypoints:

   D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)    (34)

   The difference of Gaussian (DoG) is thus the blurring of an image with two different values of \sigma, namely \sigma and k\sigma. This process is done for different octaves of the image in the Gaussian pyramid.
2. Keypoint Localization: accurately locating the feature keypoints. A local image gradient is computed at each feature keypoint location to define its location along with its scale. Lower-contrast keypoints are discarded.

   m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}    (35)

   Fig. 9. Difference of Gaussian

   \theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)    (36)

3. Orientation Assignment: assigning an orientation to the keypoints. It assigns an orientation to each keypoint and calculates the gradient magnitude and direction, enabling it to be invariant to rotation. The gradient magnitude and orientation of the image patch for a particular scale are represented as

   m(x, y) = \sqrt{(I(x+1, y) - I(x-1, y))^2 + (I(x, y+1) - I(x, y-1))^2}

   \theta(x, y) = \tan^{-1}\left(\frac{I(x, y+1) - I(x, y-1)}{I(x+1, y) - I(x-1, y)}\right)

4. Keypoint Descriptor: describing the keypoints as a high dimensional vector. It describes the neighborhood of each keypoint, i.e., the local structure. It compares intensity features, making the descriptor invariant to illumination and other changes in viewpoint. The keypoint is at the center of a 16x16 window containing 128 bin values along 4x4x8 directions, with 4x4 sub-regions generally used over the 16x16 sample array. Here the keypoint descriptor is represented as a feature vector.
5. Keypoint Matching: keypoints between two images are matched by identifying their nearest neighbors. Due to noise, in some cases the second-closest match may be nearly as close as the first. The nearest neighbors are identified and the keypoints of the two separate images are matched, as illustrated in the sketch that follows.
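Steps 1-5 are available as an off-the-shelf detector/descriptor. The sketch below uses the OpenCV Python bindings (SIFT_create is present in recent OpenCV releases; older builds exposed SIFT through the contrib xfeatures2d module) to detect keypoints, compute 128-dimensional descriptors, and match them with the nearest-neighbour ratio test of step 5; the image paths and the 0.75 ratio are placeholders.

```python
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)   # placeholder image paths
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)            # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Match each descriptor to its two nearest neighbours and apply the ratio test:
# keep a match only if the best neighbour is clearly better than the second best.
bf = cv2.BFMatcher(cv2.NORM_L2)
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} matches kept after the ratio test")
```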

Zhou et al. [64] explain SIFT using a mean shift algorithm for tracking objects by similarity searching. An expectation-maximization algorithm evaluates the probability estimation for the similarity check. This mutually supporting method is extremely helpful for rectifying unstable measurements in tracking.

Fig. 10. The 16 x 16 window around a keypoint yields a 128-dimensional vector

Fig. 11. Keypoint descriptor

Lindeberg et al. [65] focus on all the constituent salient steps involved in the SIFT approach. Panchal et al. [66] survey the differences between SIFT (Scale Invariant Feature Transform) and SURF (Speeded Up Robust Feature), the two feature detection techniques.
Se et al. [67] experiment using a Triclops stereo vision system in a dynamic environment. Kalman filters track the already localized landmarks, considering other factors like occlusion and viewpoint variation. SIFT features are scale- and orientation-invariant, which helps in robot pose estimation and 3D map building of the tracked landmarks. Using the SIFT features, the robot ego-motion is estimated for the matched landmarks.

6.2 SURF(Speeded Up Robust Feature)


SURF adds a number of features to improve the speed at every step [68] [69]. It is about 3 times faster than SIFT while its performance is comparable to SIFT. SURF is good at handling images with blurring and rotation, but not good at handling viewpoint change and illumination change [70]. The two steps in SURF are interest point detection and finding the local maxima. In the interest point detection process, the integral image is computed and a second derivative approximation is applied to filter the image, followed by non-maximal suppression. In finding the local maxima in (x, y, s) space, quadratic interpolation is applied first, and fast computation of box-type convolution filters is provided. The integral image is convolved with a box filter, which is an approximation of the Gaussian filter. The Hessian matrix H(X, \sigma), where X = (x, y), of an image I at scale \sigma is defined as follows:

H(X, \sigma) = \begin{bmatrix} L_{xx}(X, \sigma) & L_{xy}(X, \sigma) \\ L_{xy}(X, \sigma) & L_{yy}(X, \sigma) \end{bmatrix}    (37)
Panchal et al. [66] made a comparative study of SIFT and SURF considering
all the dependent factors.
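To illustrate the Hessian-based detection of equation (37) without relying on the SURF implementation itself (which is patented and lives in OpenCV's contrib module), the sketch below approximates the entries of H(X, σ) with Gaussian second-derivative filters from SciPy and thresholds the determinant response; the 0.9 weight mirrors SURF's correction for its box-filter approximation, and the image and threshold are placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_response(image, sigma):
    """Determinant of the Hessian at scale sigma (SURF-style blob response)."""
    Lxx = gaussian_filter(image, sigma, order=(0, 2))   # second derivative along x
    Lyy = gaussian_filter(image, sigma, order=(2, 0))   # second derivative along y
    Lxy = gaussian_filter(image, sigma, order=(1, 1))   # mixed derivative
    # 0.9 accounts for the approximation loss of SURF's box filters
    return Lxx * Lyy - (0.9 * Lxy) ** 2

img = np.random.rand(256, 256)                          # placeholder image
resp = hessian_response(img, sigma=2.0)
points = np.argwhere(resp > resp.mean() + 3 * resp.std())
print(f"{len(points)} candidate interest points")
```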

6.3 FAST (Features from Accelerated Segment Test)


FAST is a corner detection method, developed in 2006 by Edward Rosten and Tom Drummond [71], used mainly in the computer vision field to extract feature points for tracking and mapping objects. It is an efficient machine learning technique with respect to computational latency and real-time resource utilization. Rosten et al. [71] explain a machine learning technique used for the feature detector. The detectors used by other feature extraction techniques like the Harris detector, SIFT and SURF are not suitable for real-time implementation, despite being high-speed detectors. The machine learning approach makes FAST faster than other corner detector approaches. Convolution being linear and optimal matched filtering in the presence of additive Gaussian noise makes it robust in the presence of noise, but it fails at high noise levels.
Li et al. [72] emphasize a combined algorithm of the SURF descriptor with FAST feature points, which is 50% more efficient than other algorithms like SIFT and SURF. It has a better matching effect and high index feature points for real-time corner and object detection, and it limits the problem of slow speed due to large computation. Viswanathan et al. [73] detail a preview of the FAST algorithm for locating interest points. Here the interest point is defined by a pixel position, making the detection procedure robust for real-time SLAM applications in autonomous mobile robots. There are two approaches in FAST: feature detection and the machine learning approach.

1. Feature Detection using FAST [56]: It considers a center pixel p, with pixel intensity Ip, and a threshold t. The arc is represented by the discontinuous line passing through 12 contiguous pixels brighter than the fixed threshold. First, the selected pixel is examined to decide whether it qualifies as an interest point. The segment test condition selects the 16 pixels surrounding the pixel p (a Bresenham circle) to test for suitability as a corner. If all the surrounding pixels are brighter than the sum of the pixel's intensity and the threshold (Ip + t), or all are darker than Ip − t, we can denote p as a corner.

   Fig. 12. Interest point under test along with the 16 pixels on the circle around the central pixel

   More emphasis is put on the four pixels 1, 5, 9, 13 (i.e., the four compass directions). Pixel p is a corner if at least three out of these four satisfy the threshold criterion for the existence of an interest point: each must be brighter than Ip + t or darker than Ip − t, otherwise p is not a corner. The full segment test over all 16 pixels is then applied to the remaining candidates, giving high output performance. The limitations are the speed loss due to the ordering used to examine the 16 pixels, and the fact that detecting a maximal number of interest points for N < 12 slows the process.
2. Machine Learning Approach: The limitation of the above method is solved to a great extent by this method. It follows two steps. First, a corner detector is constructed from a set of images chosen from the training set. Second, the FAST algorithm is applied to the images to find the feature points using the segment test criterion with the earlier threshold. The 16 surrounding pixels are tested for each feature point to obtain the feature vector p. Each individual pixel x in the circle, x ∈ {1, 2, ..., 16}, relative to the feature vector p (denoted p → x), belongs to one of the states below.

   S_{p \to x} = \begin{cases} \text{darker} & \text{if } I_{p \to x} \le I_p - t \\ \text{similar} & \text{if } I_p - t < I_{p \to x} < I_p + t \\ \text{brighter} & \text{if } I_p + t \le I_{p \to x} \end{cases}    (38)

   After computing S_{p \to x} for the desired values of x from the above three states, the feature vector obtained can be classified into subsets Pdark, Psimilar, Pbright. A variable kp defines whether p qualifies as an interest point or not.
   FAST also lacks the ability to detect corners perfectly aligned along the x and y coordinate axes. To include such edges among the corners, rings of contrasting lighter or darker pixel values are considered [74] [75]. An OpenCV-based sketch of the detector follows.
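The segment test is exposed directly by OpenCV; the hedged sketch below runs the FAST detector with non-maximal suppression enabled, using placeholder values for the threshold t and the image path.

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)       # placeholder path

# threshold = t of the segment test; nonmaxSuppression removes adjacent responses
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(img, None)
print(f"{len(keypoints)} FAST corners, threshold = {fast.getThreshold()}")

out = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
cv2.imwrite("fast_corners.png", out)
```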

6.4 BRIEF (Binary Robust Independent Elementary Features)


In the vastly developing application areas of computer vision, like object recognition followed by 3D reconstruction, camera localization, and image retrieval, a major role is played by the feature point descriptor. In BRIEF, the feature point descriptor uses binary strings of data for efficient, fast computation and matching. In comparison to other descriptor approaches, it is faster and has a high recognition rate. After detection of the keypoints, the feature descriptors are used for encoding the information into a number series that differentiates each keypoint from the others. In this approach, keypoints or pixels are surrounded by a square, known as a patch, with some pixel dimensions. These image patches are transformed by BRIEF into a binary feature vector, or binary feature descriptor, to represent an object. The patches are pre-smoothed to increase the stability and repeatability of the noise-sensitive descriptors; for this purpose, BRIEF uses a Gaussian kernel. This enhances performance because smoothing eases the problem of matching. A binary feature vector is created from the ζ binary test responses over the patch, where ζ is defined as

\zeta(p; x, y) = \begin{cases} 1 & p(x) < p(y) \\ 0 & p(x) \ge p(y) \end{cases}    (39)

Here x and y are a random pair of locations inside the patch, with intensities p(x) and p(y). The binary test response chooses n (the length of the binary feature vector) such (x, y) pairs denoting locations.
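A toy NumPy version of the test ζ of equation (39) is given below; the patch size, the number of bits, and the random sampling of (x, y) pairs are illustrative choices (real BRIEF implementations smooth the patch and reuse one fixed, pre-computed sampling pattern, done here with a single shared PAIRS array). Matching then reduces to a Hamming distance between the binary vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
PATCH = 31                                            # patch size S (illustrative)
N_BITS = 256                                          # length of the binary feature vector

# One fixed sampling pattern of (x, y) point pairs, shared by every descriptor
PAIRS = rng.integers(0, PATCH, size=(N_BITS, 4))      # columns: x_row, x_col, y_row, y_col

def brief_descriptor(patch):
    """zeta(p; x, y) = 1 if p(x) < p(y), else 0, for every sampled pair."""
    p_x = patch[PAIRS[:, 0], PAIRS[:, 1]]
    p_y = patch[PAIRS[:, 2], PAIRS[:, 3]]
    return (p_x < p_y).astype(np.uint8)

def hamming(d1, d2):
    """Matching cost between two binary descriptors."""
    return int(np.count_nonzero(d1 != d2))

patch_a = rng.random((PATCH, PATCH))                  # placeholder smoothed patches
patch_b = rng.random((PATCH, PATCH))
print(hamming(brief_descriptor(patch_a), brief_descriptor(patch_b)))
```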
Approaches for selecting test locations (xi, yi) are shown in Table 2.

Table 2. BRIEF test location selection approaches

Sl. No.  Approach                  Position                                   Spread
1        Uniform (G I)             x and y pixels in the random pair          S/2 around the keypoint
2        Gaussian (G II)           x and y pixels in the random pair          0.04*S^2 around the keypoint
3        Gaussian (G III)          both x and y pixels in the random pair     0.04*S^2 for the first pixel (x), 0.01*S^2 for the second (y)
4        Coarse Polar Grid (G IV)  x and y pixels in the random pair          discrete locations of a coarse polar grid
5        Coarse Polar Grid (G V)   x fixed at (0,0), y in the random pair     all possible locations on the grid

Calonder et al. [76] propose BRIEF, a highly efficient feature descriptor which uses intensity difference tests on a few data bits for computation. It represents an image patch as a binary string, and the Hamming distance allows a fast evaluation of recognition performance. The construction and matching of the BRIEF descriptor are faster than in the SIFT and SURF approaches.
Heinly et al. [77] study the various factors that have an impact on the BRIEF descriptor's efficiency and robustness. Although BRIEF is not invariant to scale and orientation, it exhibited better performance for non-geometric transforms as well as perspective transforms; these factors influence the binary string descriptor.
Li et al. [78] survey the evolution, working, applications, strengths, and drawbacks of interest point detectors, focusing on the selection approaches of feature extraction methods.

6.5 ORB (Oriented FAST and Rotated BRIEF) SLAM

FAST (Features from Accelerated Segment Test) is a keypoint detector. It compares, against an intensity threshold, the center pixel and those in a circular ring around it. BRIEF (Binary Robust Independent Elementary Features) uses binary strings as a feature point descriptor; it performs a relatively small number of intensity difference tests to represent an image patch as a binary string. FAST does not produce multi-scale features and has no orientation component [79].
The limitation of missing rotation invariance in BRIEF, and of missing orientation and multi-scale feature output in FAST, is sorted out in the ORB (Oriented FAST and Rotated BRIEF) approach proposed by Rublee et al. [80]. It is experimentally shown that ORB is twice as fast as SIFT, while being invariant to rotation and resistant to noise. It is a combination of oriented FAST keypoints and rotated BRIEF features per image, which is resistant to Gaussian image noise. It is also more efficient in real-time applications for object recognition and SfM. It applies an orientation component by analyzing the variance under rotation and the correlation, for improved efficiency in the nearest neighbor implementation [81]. The major advantage of ORB-SLAM is that the same features can be used for mapping and tracking. The frame relocalization rate and loop closing after detection are also deduced from the earlier obtained features. Mur-Artal et al. [82] explain that for all tasks like tracking, mapping and loop closing, the same features are used by ORB in real-time applications in large environments. It is invariant to viewpoint and illumination, and the latency is low [83] [84].
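In practice the oriented-FAST detector and rotated-BRIEF descriptor are used together with a Hamming-distance brute-force matcher, as in the hedged OpenCV sketch below; the number of features, the image paths, and the cross-check option are illustrative choices and not values prescribed by the ORB-SLAM authors.

```python
import cv2

img1 = cv2.imread("keyframe.png", cv2.IMREAD_GRAYSCALE)    # placeholder paths
img2 = cv2.imread("current.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)                        # oriented FAST + rotated BRIEF
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors are compared with the Hamming distance
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} putative matches; best distance = {matches[0].distance}")
```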
Tracking controls new keyframe insertion through continuous localization of the camera pose with each frame. Then, feature matching and optimization are performed with respect to the earlier frame. If this fails, due to unexpected camera movement, relocalization occurs through the place recognition technique [85]. The covisibility graph of keyframes helps to retrieve a local map, previously estimated from camera pose and feature matching estimations. The decision to add a new keyframe is taken by the tracking thread. By reprojection, a matching search is done over the local map along with camera pose optimization [86]. The extract-ORB step deals with the extraction of at least 5 FAST corners per cell. From these corner cells, orientations and descriptors are computed for the ORB descriptor to match features. Whether tracking is successful or not decides whether map point searching and camera pose prediction, by estimating keyframes from the previous frames, are carried on using the constant velocity motion model. In case of unsuccessful tracking, for global relocalization, RANSAC iterations are applied to trace map points in each keyframe. The camera pose is also tracked and later optimized.
Tracking of the local map is done by searching for more map points by superimposing the map on the frame. Local maps are adopted to avoid complexity.
Fig. 13. ORB SLAM flowchart (representing three threads: tracking, local mapping, loop closing)

Sets of keyframes K1 and K2 share map points with the current frame and with the neighbors of K1 respectively, the reference keyframe being a subset of K1. New keyframes are inserted to ensure the robustness of the tracking process. Redundant keyframes that do not fulfill the criterion of a minimum of 50 points passing from an earlier global relocalization or an earlier keyframe insertion are discarded. In local mapping, upon keyframe insertion the covisibility graph is updated by adding edges, and the spanning tree is updated with the keyframes sharing the most points. Map points undergo different tests over the first three keyframes in order to be retained. After the test, map points can be removed by the culling of keyframes. In new map point creation, through ORB triangulation, new map points are created in the covisibility graph from connected keyframes. The ORB pair triangulation checks are the acceptance of new points, the reprojection error, and the positive depth in both cameras. Map points are focused on the connected keyframes. In local bundle adjustment, the map points connected to the currently processed keyframe and those present in the covisibility graph are optimized by BA. To minimize the complexity, the local map detects and removes redundant keyframes, known as local keyframe culling. Unless the scene changes, the visual feature content is fixed; keyframes whose map points have 90% similarity with those of the following three keyframes in the covisibility graph are discarded.
Fig. 14. Local mapping stages before loop closing

Mur-Artal et al. [85] consider, for loop detection and closing, the currently processed keyframe and the last keyframe processed by local mapping. The consistency of three consecutive loop candidates is checked in the covisibility graph. To satisfy the geometrical validation of a loop and to close it, a similarity transformation is computed. The repeated map points, i.e., the matched keypoints, are fused [87]. New edges are inserted into the covisibility graph, which is repeatedly updated to attach the loop closure; this helps in loop correction. Pose graph optimization is conducted over the essential graph for effective loop closure. Each resulting map point is individually transformed according to one of its observing keyframes. A toy example of such a pose graph optimization is sketched below.
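The following sketch illustrates the idea on 2D poses (x, y, θ): four poses around a unit square are connected by odometry edges plus one loop closure edge and refined by nonlinear least squares with SciPy. The graph, noise values, and solver are invented for illustration; production systems such as ORB-SLAM perform this optimization over SE(3)/Sim(3) with dedicated solvers like g2o.

```python
import numpy as np
from scipy.optimize import least_squares

# Edges: (i, j, [dx, dy, dtheta]) = measured pose of node j expressed in node i's frame.
# Four odometry edges around a unit square; the last edge (3 -> 0) is the loop closure.
edges = [(0, 1, [1.0, 0.0, np.pi / 2]),
         (1, 2, [1.0, 0.0, np.pi / 2]),
         (2, 3, [1.0, 0.0, np.pi / 2]),
         (3, 0, [1.0, 0.0, np.pi / 2])]

def relative(xi, xj):
    """Pose of node j in node i's frame (2D rigid transform)."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    c, s = np.cos(xi[2]), np.sin(xi[2])
    return np.array([c * dx + s * dy, -s * dx + c * dy, xj[2] - xi[2]])

def residuals(flat_poses):
    poses = flat_poses.reshape(-1, 3)
    res = [poses[0]]                                   # anchor pose 0 at the origin
    for i, j, meas in edges:
        r = relative(poses[i], poses[j]) - np.asarray(meas)
        r[2] = (r[2] + np.pi) % (2 * np.pi) - np.pi    # wrap the angular residual
        res.append(r)
    return np.concatenate(res)

# Initial guess: drifted odometry, as it would look before the loop is closed
x0 = np.array([[0.00, 0.00, 0.00],
               [1.05, 0.05, 1.62],
               [1.00, 1.10, 3.20],
               [-0.10, 1.15, 4.80]]).ravel()
sol = least_squares(residuals, x0)
print(sol.x.reshape(-1, 3))                            # corrected poses after loop closure
```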

6.5.1 Loop closing


Han et al. [88] analyse Multiple Index Hashing for searching binary features in loop closure detection in the ANN field, addressing feature classification and the accuracy of place-similarity measurement. Negre et al. [89] experimentally use an Autonomous Underwater Vehicle (AUV) to obtain stable and robust features in marine environments; keypoint clustering is employed to match keyframes for loop closing detection. Place recognition is an important building block for loop closure, as it allows errors to be corrected in the previously mapped region. It also supports camera relocalization after an occlusion causes tracking failure [90] [85].
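The role of binary descriptors in place recognition can be made concrete with the toy numpy sketch below, which scores stored keyframes by the Hamming distance between packed descriptors and returns the most similar one as a loop closure candidate. Practical systems rely on bags of binary words or hashing indices [88] [90]; the brute-force scoring here is only illustrative, and all data are synthetic.

import numpy as np

rng = np.random.default_rng(0)

def hamming(a, b):
    # Hamming distance between two packed 256-bit descriptors (32 uint8 values each).
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Database: keyframe id -> (N, 32) array of packed binary descriptors (toy data).
database = {k: rng.integers(0, 256, size=(50, 32), dtype=np.uint8) for k in range(5)}

# Query: descriptors of keyframe 3 with a little bit-noise, so keyframe 3 should be recognized.
query = database[3] ^ (rng.integers(0, 256, size=(50, 32), dtype=np.uint8) & 1)

def score(query_desc, kf_desc):
    # Average distance from each query descriptor to its nearest database descriptor.
    return np.mean([min(hamming(q, d) for d in kf_desc) for q in query_desc])

best = min(database, key=lambda k: score(query, database[k]))
print("loop closure candidate keyframe:", best)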
ORB SLAM2 is a SLAM system for monocular, stereo, and RGB-D cameras which includes map reuse, loop closing, and relocalization capabilities [85] [79]. It is applicable to indoor as well as outdoor environments. Mur-Artal et al. [91] explain ORB-SLAM2 and its advantage of loop closing based on Bundle Adjustment (BA) for accurate trajectory estimation with monocular and stereo observations at metric scale. Visual SLAM can be performed with a monocular camera, the cheapest and smallest sensor setup, but then the depth, the scale of the map, and the scale of the estimated trajectory are unknown [92]. Bootstrapping is also harder, since the map cannot be initialized by triangulation from the first frame alone (in the stereo case, points whose depth is less than approximately 40 times the stereo baseline can be triangulated directly). The drawbacks of scale drift and failure under pure rotation during exploration can be resolved by a stereo or an RGB-D camera [93] [94]. Caldato et al. [95] improve the accuracy of the visual tracking odometry and trajectory estimation results of ORB-SLAM2; their approach overcomes tracking loss by inserting graph constraints between keyframes.

Fig. 15. ORB SLAM2 flowchart (representing the three threads tracking, local mapping, and loop closing, along with a fourth thread, Bundle Adjustment) [Mur-Artal et al. [91]]
ORB SLAM2 for stereo and RGB-D cameras builds on the earlier feature-based ORB SLAM [82] [96] and, in addition to its three threads [80], runs a fourth one. First, tracking localizes the camera in every frame by finding feature matches to the local map and minimizing the reprojection error with motion-only Bundle Adjustment. Second, local mapping manages the local map and sequentially optimizes it by local BA [58]. Third, loop closing detects large loops and corrects the accumulated drift by pose graph optimization [97]. Last, a full Bundle Adjustment is performed after the pose graph optimization to compute the optimal structure and motion solution.
Tracking failures caused by occlusions in the view are handled by an embedded place recognition scheme, which allows earlier mapped mappoints to be reused and loops to be closed smoothly. Keyframes sharing common keypoints are linked in the covisibility graph and, likewise, in a spanning tree [98]. Lv et al. [99] propose the use of an RGB-D sensor to reconstruct a 3D point cloud map and a compact octree map; from these, occupancy probabilities are estimated, extending the mapping technique of ORB-SLAM. The problems of obstacle avoidance and navigation faced with a sparse map are thereby avoided. For accurate camera pose tracking and completion of the octomap, the method is applied to a handheld Kinect 2.0. Li et al. [100] present a fast, efficient and accurate Kinect V2 based robust mapping system to automate a photovoltaic panel cleaning robot, designed to cope with vibration, occlusion, and loop closure. After loop detection, a globally consistent 3D map is created and transformed into an octree-based structure, making the data easy to store. However, some low-texture scenes yield few ORB features.
Pumarola et al. [101] explain PL (point-line) SLAM, which handles point and line features simultaneously. Advantages of ORB SLAM2 include robustness to scale and rotation changes and invariance to illumination changes, which make the extraction and matching process faster. In feature-based ORB-SLAM2, the input is preprocessed independently of whether it comes from stereo cameras or an RGB-D sensor [96]. Karami et al. [102] and Chien et al. [103] describe image matching using the feature detection approaches SIFT, SURF, BRIEF, and ORB. Leutenegger et al. [104] compare stereo camera vision with monocular vision, implemented with a sliding window filter in inertial odometry.

6.5.2 Bundle Adjustment


The Bundle Adjustment method originates from photogrammetry and is widely used in computer vision; it tracks the trajectory of a calibrated camera in 3D [105]. Given the relative motion of the camera acquiring image projections of 3D points from different viewpoints, Bundle Adjustment refines the 3D coordinate geometry of the scene [106]. It is applied to the detected feature points after feature extraction, and the optimization is carried out over the 3D points arranged into a defined structure. In a real-time camera tracking system, bundle adjustment is performed whenever a new frame is added. It decreases the rate of gross failures by improving accuracy and suppressing error build-up in camera tracking, as explained by Engels et al. [107]. Redundant keyframes are culled from the derived map, which reduces the computational cost and increases reliability and long-duration stability.
Bundle Adjustment simultaneously refines the 3D structure and the camera parameters. Here X denotes a 3D scene point, x its corresponding 2D projection in an image, and P the camera matrix. Bundle Adjustment thus jointly refines the 3D model and the camera matrices estimated from the 2D projections x of the points X observed from the various positions of the moving camera. Mouragnon et al. [109] presented a real-time application of visual odometry using Bundle Adjustment: it takes video data as input, tracks interest points by matching frames at video rate, and estimates the camera motion in real time by keyframe selection together with fast 3D feature reconstruction of large sequences and BA. The bundle adjustment technique of Zhang et al. [110] experimentally solves the error drift problem by correlating road traffic sign images. Grisetti et al. [111] optimize the nonlinear least-squares problems of a hypergraph in BA.
M. Chojnacki et al. [112] and V. Indelman et al. [113] emphasize simultaneous estimation of the robot's ego-motion in GPS-denied environments, while also tracking a dynamic target using the bundle adjustment optimization technique. The advanced light bundle adjustment optimization can exclude the 3D points from the online reconstruction, thereby reducing the computational time and increasing the accuracy.

Fig. 16. Local Bundle Adjustment stages before loop closing [108]
Bundle Adjustment optimizes the solution by minimizing the reprojection error between the observed image points and the reprojected 3D points in the camera frames. BA-based SLAM systems face the shortcomings of initialization, estimation, and timely maintenance of the 3D map, which complicate the processing [114]; they also struggle during rotational as well as slow motion, as discussed by Bustos et al. [115]. To overcome these issues, SLAM using incremental optimization over the camera orientations has been proposed; from the orientations, the camera positions and the 3D model points are then estimated [116]. Let Z_{i,j} denote the coordinates of the i-th scene point as seen in the j-th image Z_j. SfM aims to estimate the coordinates X = {X_i} of the scene points and the poses (R_j, t_j) of the images Z_j. The BA formulation is
\min_{\{X_i\},\{(R_j,t_j)\}} \; \sum_{i,j} \left\| Z_{i,j} - f(X_i \mid R_j, t_j) \right\|_2^2   (40)

where f(X_i \mid R_j, t_j) is the function projecting point X_i onto image Z_j, and {Z_j} is the set of images provided as input to BA, together with the input observations U_{i,j} and the visibility matrix.
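A compact sketch of this formulation is given below, assuming a calibrated pinhole camera with identity intrinsics, an axis-angle parameterization of the rotations, and synthetic data; SciPy's trust-region least-squares solver minimizes the stacked reprojection residuals of Eq. (40) over all poses and points.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(1)
n_cams, n_pts = 3, 20
X_true = rng.uniform(-1, 1, (n_pts, 3)) + np.array([0, 0, 5])          # scene points
rvecs_true = 0.05 * rng.standard_normal((n_cams, 3))                   # axis-angle rotations
tvecs_true = np.column_stack([np.linspace(0, 1, n_cams),
                              np.zeros(n_cams), np.zeros(n_cams)])     # translations

def project(X, rvec, tvec):
    # Pinhole projection f(X | R, t) with identity intrinsics.
    Xc = Rotation.from_rotvec(rvec).apply(X) + tvec
    return Xc[:, :2] / Xc[:, 2:3]

# Observations Z_{i,j}: every point is seen in every image, with small pixel noise.
Z = np.stack([project(X_true, r, t) for r, t in zip(rvecs_true, tvecs_true)])
Z += 0.001 * rng.standard_normal(Z.shape)

def residuals(params):
    rvecs = params[:n_cams * 3].reshape(n_cams, 3)
    tvecs = params[n_cams * 3:n_cams * 6].reshape(n_cams, 3)
    X = params[n_cams * 6:].reshape(n_pts, 3)
    return np.concatenate([Z[j] - project(X, rvecs[j], tvecs[j])
                           for j in range(n_cams)]).ravel()

x0 = np.concatenate([(rvecs_true + 0.01).ravel(), (tvecs_true + 0.01).ravel(),
                     (X_true + 0.05).ravel()])                          # perturbed initial guess
sol = least_squares(residuals, x0)        # minimizes the sum of squared reprojection errors
print("final RMS reprojection error:", np.sqrt(np.mean(sol.fun ** 2)))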
The core optimization algorithm of BA in the SLAM technique proceeds as follows. New image points are produced by initializing n new points. The camera trajectory and the 3D map are estimated using camera tracking and local mapping. For accurate pose initialization, pose graph optimization is applied [117] [118]. Re-optimization of the variable points and redistribution of the error occur in the subsequent steps [119].
Fig. 17. Pseudo code of BA-SLAM Algorithm

BA-SLAM estimates the variables that update the keyframes and the mappoint selection, discarding redundant points subject to the loop constraints. This makes the system complex and can lead to failure of the optimization. Problems also arise under pure rotational motion [120] and slow camera motion [121]. With rotation averaging, the camera orientations can be estimated in a simpler way, from which the 3D map model and the camera positions are then obtained [122] [123] [124]. This is implemented by an advanced optimization method in visual SLAM named L-infinity SLAM.

In the L-infinity SLAM algorithm, the camera poses and the map are derived via the known rotation problem, using previously estimated orientations [125]. L-infinity SLAM first estimates the orientations; the positions and the map are then deduced from a global optimization [126]. The simplicity of the technique enables it to handle pure rotational motion [127]. The rotation averaging formulation is

\min_{\{R_j\}} \sum_{j,k \in \mathcal{N}} \left\| R_{j,k} - R_k R_j^{-1} \right\|_F^2   (41)

where R_{j,k} are the measured relative rotations.
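The chordal cost of Eq. (41) can be evaluated directly, as in the numpy/SciPy sketch below: given measured relative rotations R_{j,k} and candidate absolute rotations R_j, the residual is the Frobenius norm of R_{j,k} - R_k R_j^{-1} summed over the measurement graph. Dedicated rotation averaging solvers [122] [124] optimize this or a related cost globally; here the cost is only evaluated.

import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(2)

# Candidate absolute orientations R_j and noise-free relative measurements R_{j,k}.
R_abs = [Rotation.from_rotvec(rng.standard_normal(3)) for _ in range(4)]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
R_rel = {(j, k): R_abs[k] * R_abs[j].inv() for j, k in edges}

def chordal_cost(R_est):
    # Sum over edges of || R_{j,k} - R_k R_j^{-1} ||_F^2, as in Eq. (41).
    cost = 0.0
    for (j, k), Rjk in R_rel.items():
        diff = Rjk.as_matrix() - (R_est[k] * R_est[j].inv()).as_matrix()
        cost += np.linalg.norm(diff, "fro") ** 2
    return cost

print("cost at the true rotations :", chordal_cost(R_abs))      # ~0
perturbed = [r * Rotation.from_rotvec(0.1 * rng.standard_normal(3)) for r in R_abs]
print("cost at perturbed rotations:", chordal_cost(perturbed))  # > 0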


Bustos et al. [115] highlight the basic advantages of L-infinity SLAM over BA-SLAM. First, BA can fail to close a sequence when the camera track is disconnected, whereas L-infinity SLAM does not need to follow multiple tracks [128]. Second, to track the camera movement BA depends on map initialization, which is not required in L-infinity SLAM. Third, the rotation averaging in L-infinity SLAM is less time-consuming than BA. In Global Bundle Adjustment, the complete map is ultimately optimized using the pose graph optimization method. This optimization is similar to the local bundle adjustment that optimizes the keyframes and map points, and the local mapping then proceeds as explained in Mur-Artal et al. [83].
Fig. 18. Pseudo code of L-INFINITY SLAM Algorithm

6.6 Standard Bundle Adjustment


BA is defined as an iterative, non-linear optimization framework generally implemented for estimating camera poses and observed landmarks; here it is used as the combined process of SLAM and tracking of a moving object [129] [121]. A pinhole camera model is considered, and the re-projection error between the measured and predicted image coordinates is minimized.

Fig. 19. Re-projection error between a measured image point p and the corresponding reprojected point p̂

The method involves an optimization over three sets of states: the camera's motion states, the target's navigation states, and the observed structure (3D landmarks).
Bundle adjustment deduces the camera poses and the stationary 3D structure being tracked. Here X_k denotes all such states up to time t_k, L_k the n landmarks observed by time t_k, and Z_k the corresponding sensor observations. The term z_i collects all image observations obtained at time t_i, and z_i^j is an observation of the j-th landmark at time t_i. Using a probabilistic representation, the BA problem for a static environment can be expressed by the joint pdf

P(X_k, L_k \mid Z_k) \propto \text{priors} \cdot \prod_{i=1}^{k} \prod_{j \in M_i} p\left(z_i^{j} \mid x_i, l_j\right)   (42)

where x_k represents the camera pose (i.e. 6-DOF position and orientation) at time step t_k, M_i is the set of landmarks representing the 3D scene points observed at time index i, and priors represent prior information on the estimated variables.
The computational complexity of bundle adjustment depends directly on factors such as the number of images captured, the number of 3D points, and the number of actual image observations, which makes the optimization problem large. The steps of solving the BA problem are as follows.
Step 1: Form the joint pdf and seek its maximizer,

X_k^{*}, L_k^{*} = \arg\max_{X_k, L_k} P(X_k, L_k \mid Z_k)   (43)

Step 2: The maximum a posteriori (MAP) estimation is carried out each time a new image is added; incremental optimization therefore requires a good initialization. The MAP estimate over the joint pdf is

X_k^{*}, L_k^{*} = \arg\max_{X_k, L_k} P(X_k, L_k \mid Z_k) = \arg\min_{X_k, L_k} \left(-\log P(X_k, L_k \mid Z_k)\right)   (44)

Step 3: Calculating the MAP estimate is equivalent to minimizing a non-linear least-squares cost function (for clarity, priors are discarded):

J_{BA}(X_k, L_k) = \sum_{i} \sum_{j \in M_i} \left\| z_i^{j} - \mathrm{proj}(x_i, l_j) \right\|_{\Sigma}^{2}   (45)

6.7 Light Bundle Adjustment (LBA)


LBA is an efficient optimization method that reduces the number of variables involved in the optimization compared to standard BA, while continuously estimating a consistent motion.
It eliminates the landmarks, and the optimization is performed only over the camera poses. A residual function results from eliminating a landmark from its observations in two or three cameras. LBA uses this algebraic elimination, based on multi-view constraints, to define the probability density function P_{LBA}(X | Z) over the camera poses [112]. Assuming Gaussian noise over the measurement residuals h(x, z) → R obtained from the multi-view constraints, and ignoring priors, the pdf is represented as

P_{LBA}(X \mid Z) \propto \prod_{i=1}^{N_h} \exp\left( -\tfrac{1}{2} \left\| h_i\left(\bar{X}_i, \bar{Z}_i\right) \right\|_{\Sigma_i}^{2} \right)   (46)

where \bar{X}_i \subset X is the i-th subset of camera poses and \bar{Z}_i \subset Z is the i-th subset of image observations.

Fig. 20. Three view constraints

When a landmark is observed from more than three views, algebraic independence exists only between up to three cameras at a time, so the observations are represented by a set of two- and three-view constraints between different cameras. The LBA residual function is deduced from the reprojection relations between landmarks observed by two or three cameras. Considering a 3D point l1 commonly observed from several camera poses, projection equations are derived; from these equations the point l1 is algebraically eliminated, giving rise to constraints between the poses. For three views these constraints are formulated in [113]. The constraints that relate three poses (k, l, m) while eliminating the landmark are two two-view constraints g2v between two pairs of poses (e.g. (k, l) and (l, m)) and one three-view constraint g3v involving (k, l, m).
The resulting constraint equations can be written as
– g2v(x_k, x_l, z_k, z_l) = q_k · (t_{k→l} × q_l)
– g2v(x_l, x_m, z_l, z_m) = q_l · (t_{l→m} × q_m)
– g3v(x_k, x_l, x_m, z_k, z_l, z_m) = (q_l × q_k) · (q_m × t_{l→m}) − (q_k × t_{k→l}) · (q_m × q_l)
A small numerical check of the two-view constraint is given below.
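The following toy numpy check illustrates the two-view constraint under the simplifying assumption of identity camera rotations, so that the viewing rays q_k and q_l can be written directly in the world frame: the scalar triple product q_k · (t_{k→l} × q_l) vanishes when both rays point at the same landmark and grows otherwise. The three-view constraint chains two such pairs in the same way.

import numpy as np

def g2v(c_k, c_l, q_k, q_l):
    # Two-view constraint q_k . (t_{k->l} x q_l); zero when both rays meet at one landmark.
    t_kl = c_l - c_k
    return float(np.dot(q_k, np.cross(t_kl, q_l)))

landmark = np.array([1.0, 2.0, 8.0])
c_k, c_l = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])      # camera centres

q_k = (landmark - c_k) / np.linalg.norm(landmark - c_k)               # viewing rays (world frame)
q_l = (landmark - c_l) / np.linalg.norm(landmark - c_l)

print("consistent rays  :", g2v(c_k, c_l, q_k, q_l))                  # ~0
q_bad = q_l + np.array([0.05, 0.0, 0.0])                              # ray to a different point
print("inconsistent rays:", g2v(c_k, c_l, q_k, q_bad / np.linalg.norm(q_bad)))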

6.8 Factor Graph representation


The factorization of a joint probability density function P(X) defined over variables X can be represented in a graphical model called a factor graph [130]. Its main purpose here is highly efficient incremental inference. The joint pdf is represented as a product of smaller probability distributions, each involving only a small subset of the variables [131].
The factorization of the joint pdf can be represented using a factor graph as

P(X_k, Y_k, L_k \mid Z_k) \propto \text{priors} \times \prod_{i=1}^{k} \left( f_{mm}(y_i, y_{i-1}) \, f_{proj}(x_i, y_i) \prod_{j \in M_i} f_{proj}(x_i, l_j) \right)   (47)
Fig. 21. Factor graph representation (a) For BA (b) For LBA

Here,
f_{mm}(y_i, y_{i-1}) = \exp\left(-\tfrac{1}{2}\left\| y_i - \Phi_{i-1} y_{i-1} \right\|_{\Sigma_{mm}}^{2}\right) corresponds to the target motion model,
f_{proj}(x_i, l_j) = \exp\left(-\tfrac{1}{2}\left\| z_i^{j} - \mathrm{proj}(x_i, l_j) \right\|_{\Sigma_v}^{2}\right) corresponds to the landmark observation model, and
f_{proj}(x_i, y_i) = \exp\left(-\tfrac{1}{2}\left\| z_i^{y_i} - \mathrm{proj}(x_i, y_i) \right\|_{\Sigma_v}^{2}\right) corresponds to the target observation model.
The joint pdf of LBA is given by

P_{LBA}(X \mid Z) \propto \prod_{i=1}^{N_h} f_{2v/3v}(X_i)   (48)

where f_{2v/3v} represents the involved two- and three-view factors. Here f_{2v} and f_{3v} are the likelihood factors of the two- and three-view constraints:
f_{2v}(x_k, x_l) = \exp\left(-\tfrac{1}{2}\left\| g_{2v}(x_k, x_l, z_k, z_l) \right\|_{\Sigma_{2v}}^{2}\right)
f_{3v}(x_k, x_l, x_m) = \exp\left(-\tfrac{1}{2}\left\| g_{3v}(x_k, x_l, x_m, z_k, z_l, z_m) \right\|_{\Sigma_{3v}}^{2}\right)
In robotics applications, the joint pdf includes the two- and three-view factors along with factors representing the measurement likelihoods of additional sensors as required (information fusion).
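As a concrete, simplified illustration of such a factor graph, the sketch below builds a small 2D pose graph with the GTSAM library associated with [130], assuming that its Python bindings are available. The prior and between factors play the role of the f(.) factors above, information fusion amounts to adding further factor types to the same graph, and the use of Pose2 odometry factors instead of full BA/LBA factors is purely for brevity.

import numpy as np
import gtsam

# Each factor depends on a small subset of the variables; the joint pdf is their product.
graph = gtsam.NonlinearFactorGraph()
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))

graph.add(gtsam.PriorFactorPose2(1, gtsam.Pose2(0.0, 0.0, 0.0), prior_noise))      # prior factor
graph.add(gtsam.BetweenFactorPose2(1, 2, gtsam.Pose2(1.0, 0.0, 0.0), odom_noise))  # motion factors
graph.add(gtsam.BetweenFactorPose2(2, 3, gtsam.Pose2(1.0, 0.0, 0.0), odom_noise))

initial = gtsam.Values()
for i, x in zip((1, 2, 3), (0.0, 1.1, 1.9)):
    initial.insert(i, gtsam.Pose2(x, 0.1, 0.0))      # rough initial guesses

# The MAP estimate minimizes the sum of the negative log-factors.
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose2(3))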

6.9 Incremental Light Bundle Adjustment (iLBA)


In robotics, the information is updated incrementally as new variables and measurements arrive from the camera and other sensors within the specified framework. The incremental smoothing method iSAM2 efficiently optimizes the iLBA formulation. The primary aim of incremental inference with this method is to efficiently calculate the MAP estimate of the LBA pdf at time t_{k+1},

X_{k+1} = \arg\min_{X_{k+1}} \left(-\log p_{LBA}(X_{k+1} \mid Z_{k+1})\right)   (49)

iSAM2 applies graph algorithms to incrementally update a factor graph representation of the optimization problem. Indelman et al. [132] focus on the optimization efficiency and reduced computational complexity of incremental light bundle adjustment (iLBA). The technique is faster than the bundle adjustment technique while remaining accurate over comparable noise levels.
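For the incremental setting, a hypothetical sketch using the iSAM2 interface of the same library is given below: the factors arriving with each new frame are passed to the incremental solver, which re-linearizes and re-solves only the affected part of the underlying Bayes tree. In an actual iLBA implementation the odometry factors used here would be replaced by the two- and three-view factors introduced above.

import numpy as np
import gtsam

isam = gtsam.ISAM2()
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))

# First update: a prior on the initial pose.
graph, values = gtsam.NonlinearFactorGraph(), gtsam.Values()
graph.add(gtsam.PriorFactorPose2(1, gtsam.Pose2(0.0, 0.0, 0.0), prior_noise))
values.insert(1, gtsam.Pose2(0.0, 0.0, 0.0))
isam.update(graph, values)

# Each new frame contributes only its new factor and initial guess; iSAM2 updates incrementally.
for k in range(2, 6):
    graph, values = gtsam.NonlinearFactorGraph(), gtsam.Values()
    graph.add(gtsam.BetweenFactorPose2(k - 1, k, gtsam.Pose2(1.0, 0.0, 0.0), odom_noise))
    values.insert(k, gtsam.Pose2(float(k - 1) + 0.1, 0.0, 0.0))
    isam.update(graph, values)

print(isam.calculateEstimate().atPose2(5))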

Graph Optimization Run time is an important factor for categorizing SLAM into i) online (or incremental) SLAM and ii) offline (or batch) SLAM. Sensor measurements are never exact, which in turn leads to data association problems. Pose graph optimization is one of the effective formulations of batch SLAM; it utilizes all the available measurements to estimate the position of the robot in an environment. Nowadays SLAM techniques based on pose graphs are becoming very popular because of their computational efficiency. In general, the graph is composed of nodes, which can be either robot poses or world features, and the measurements that connect them are termed constraints. The main goal of all approaches is to optimize the robot poses so as to minimize the errors introduced by the constraints [133]. This computes the maximum likelihood estimate of the robot poses, which leads to an optimization with multiple local minima [134] [135]. Bundle adjustment is one alternative for the optimization, which uses a nonlinear optimizer called Levenberg-Marquardt (LM) [133].
In the last decades, interest in SLAM has increased remarkably, leading to the evolution of several algorithms that are more efficient than the traditional EKF. Thrun et al. [136] present GraphSLAM, which builds a sparse graph of nonlinear constraints that can be linearized while solving the problem. It utilizes a greedy algorithm for data association, can handle more than 10^8 features when generating maps, and can also assimilate GPS information during map building. According to Frese et al. [135], map quality, storage space, and computation time are the three important criteria for an ideal SLAM algorithm; while map quality connects the map with the real world, the other two concern efficiency. In their work, an iterative process, Gauss-Seidel relaxation, is applied to SLAM for solving the linear equations of the system. A new algorithm, multi-level relaxation (MLR), was introduced which is suitable for online real-time SLAM and handles the closing of large loops. The MLR algorithm is very useful for the identification of observed environmental features.
Data association and loop closure detection are among the most fundamental challenges of SLAM, and several works address these problems. Error rates have been reduced for single-robot systems, but multi-robot systems still suffer from such errors. Olson et al. [137] proposed a model based on an incremental pose parametrization that allows maximum likelihood inference on the factor graph network and is not limited to Gaussian distributions. A max-mixture model was incorporated into the existing graph, enabling fast computation by reducing the time spent computing Jacobians and residuals. The approach of Grisetti et al. [117] can be seen as an extension of Olson's approach. It also considers yaw, roll, and pitch errors, which can be appropriately distributed over the sequence of poses in the 3D mapping problem; stochastic gradient descent (SGD) is applied to define and update local regions in each iteration. Wang et al. [138] study a pose graph with a special structure in which every edge is connected to one of two particular nodes, hence it is known as a two-anchor pose graph. They show that, under the assumption of spherical (identity) covariance matrices, optimizing a two-anchor pose graph is equivalent to a one-dimensional optimization. As at most three local minima are present in this one-dimensional problem, the globally optimal pose configuration of the pose graph can easily be found using the bisection method. Konolige et al. [133] suggested a method called Sparse Pose Adjustment (SPA), which is fully nonlinear and equally efficient for online and offline SLAM; the sparse matrix of the graph is constructed efficiently and solved using a sparse linear solver. Kummerle et al. [118] propose g2o (general graph optimization) for batch optimization, which optimizes nonlinear error functions over the constraint graph. g2o is applicable to bundle adjustment as well as to several variants of SLAM, such as 2D, 3D, pose-only, and with landmarks.
Kaess et al. [139] proposed a novel data structure, the Bayes tree, which combines the advantages of graphical models and sparse linear algebra. Using the Bayes tree, a new method, iSAM2, was suggested that can efficiently solve nonlinear estimation problems in incremental, real-time applications. Carlone et al. [134] designed an approximation named LAGO (Linear Approximation for pose Graph Optimization), which does not require an initial guess for robot pose estimation and at the same time reduces the risk of being trapped in local minima. This paper extends their previous work [140] [141] [142] and improves it by including an analytical assessment of the suboptimality gap of the approach. Comparisons were made with respect to state-of-the-art approaches such as Toro, g2o, and iSAM2; the results showed that the accuracy of LAGO is comparable to g2o and iSAM2, that Toro is the least accurate, and that LAGO is about two times faster than iSAM2.
As can be seen, many approaches have been proposed in the last decade for robot pose estimation using graph optimization, but the main objective of all of them is to optimize the robot poses such that the error introduced by the constraints is minimized [46].

Table 3: Comprehensive summary of different SLAM techniques

Survey paper | Core of the Solution | Type of Sensing device | Type of map | Feature Extraction scheme
Ataer- RGB-D SLAM RGB-D sensor Metric SURF
Cansizoglu E et
al. (2016) [23]
Belter D et al. PUT-SLAM Monocular Metric SURF+ORB
(2016) [22] camera
Kendall A et al. Deep learning Monocular Metric SURF
(2016) [24] camera
Dharmasiri T et MO-SLAM Stereo camera Metric FAST
al. (2016) [18]
Gawel A et al. ORB-SLAM2 LIDAR and sparse vision visual inertial
(2016) [143] vision pipeline map and odometry
dense
LIDAR map
Manderson T et SLAM with Stereo camera Metric Local binary
al.(2016) [19] active gaze +IMU patterns +
selection harris keypoint
Siva S et al. Fusion of Omnidirectional Topological GIST + HOG
(2018) [8] Omnidirectional camera + GPS + metric + LBP + CNN
Multisensory data for ground
Perception truth
(FOMP)
Wang C et al. Graph Inertial sensor axonometric visual inertial
(2017) [144] optimization + depth camera map odometry
based
framework
combining
UWB and
visual inertial
odometry
Kochanov D et Discovering Stereo camera Semantic semantic and
al.(2016) [20] objects in mapping motion cues
dynamic
semantic maps
Neubert P et al. Monte Carlo visual camera metric depth
(2017) [25] Localization information
based on from 3D map
synthesizing
depth images
Cieslewski T et Decentralized visual camera metric NetVLAD
al. (2017) [26] visual place feature vector
recognition +k
means
clustering
Cabrera-Ponce template Bebop 2 camera metric canny detection
AA et al. (2017) matching + + Ground
[27] ORB SLAM2 control station
(GCS)
Liang X et al. LRGC SLAM LASER + metric ORB features
(2016) [145] and SPA Camera sensor and Bags of
optimization words
Evers C et al. GEneralized acoustic metric Global
(2017) [146] Mo- microphone descriptors
tion(GEM)SLAM arrays
Xin GX et al. ORB SLAM + RGB-D sensor Topological FAST
(2019) [147] PROSAC
Sayre-McCord EKF extroceptive metric visual inertial
T et al. (2018) sensor odometry
[40]
Triggs et al. ORB-SLAM camera sensor metric visual inertial
(1999) [105] odometry
Aulinas et ORB SLAM2 camera metric SURF
al.(2011) [58] on-board
SPARUS AUV
Wang X et al. RANSAC based odometry + Topological inverse
(2017) [148] method for pose monocular projection
optimization camera function
and loop closure
detection
Guerrero-Font Two parallel IMU + metric —–
E et al. (2016) EKF Doppller
[149] velocity log
(DVL) +
Pressure sensor
+ visual tracker
+GPS
Qiu K et al. EKF - based IMU + Aerial SFM
(2017) [150] sensor fusion for Monocular
closed-loop Fisheye camera
control
Li J et al. Pose Graph DVL +IMU + Underwater saliency scores
(2018) [151] SLAM for SONAR
underwater
robot
Luo J et al. FastSLAM GPS + INS Metric strong tracking
(2018) [152] combining square root
improved box central
particle filter difference
(IBPF) and particle filter
extended (STSRCDPF)
interval Kalman
filter (EIKF)
Concha A et al. tightly coupled IMU + semidense superpixel
(2016) [153] visual inertial Monocular map mapping
SLAM camera
Fischer T et al. S-PTAM Stereo Camera Metric GFTT and
(2016) [154] BRIEF
et al. (2017) Probabilistic visual odometry metric PSM
[155] Surfel Map + monocular
(PSM) for dense camera
visual SLAM
Rameau F et al. RANSAC-less Stereo camera metric FLANN
(2016) [156] visual odometry + visual
odometry
Liu H et al. RKSLAM, a IMU + metric FAST corner +
(2016) [157] robust monocular global image
keyframe-based camera alignment
monocular
SLAM
Sjanic Z et al. EM-SLAM IMU + metric SIFT
(2017) [158] (Expectation monocular
-Minimization camera
SLAM)
Zhou H et al. SFM algorithm gyroscope metric SIFT
(2016) [159] with EKF
Cieslewski T et distributed Monocular metric image patches
al. (2017) [26] inverted index camera
+ bag of words
Qin T et al. visual inertial IMU + metric DBoW2 +
(2018) [160] odometry + monocular BRIEF
loop detection camera descriptors
+ pose graph
optimization
Zienkiewicz J et ORB SLAM Monocular metric image patches
al. (2016) [161] camera
Tateno K et al. Deep neural Camera metric CNN
(2017) [162] network
Wang Z et al. Fusion of 2D monocular metric Horizontal
(2018) [163] battery-free camera + RFID Dilution of
RFID and + MVO Precision
monocular (HDOP) value
visual odometry
Carlone L et al. visual inertial Onboard metric minEig +
(2018) [164] navigation camera + logDet
inertial sensor
Schmuck P et ORB SLAM-2 Monocular metric visual odometry
al. (2017) [165] camera
Wang S et al. deep Recurrent monocular VO metric CNN
(2017) [166] Convolutional
Neural
Networks
(RCNNs)
Teixeira L et al. Keyframe based RGB- D sensor metric BRISK
(2016) [167] SLAM + IMU
Aulinas et ORB SLAM2 camera metric SURF
al.(2011) [58] on-board
SPARUS AUV
Rublee et al. ORB extroceptive metric SIFT, SURF
(2011) [80] sensor
Lowe et al. difference-of- extroceptive metric SIFT
(2004) [63] Gaussian sensor
function
Lajoie et al. difference-of- extroceptive metric SIFT
(2020) [168] Gaussian sensor
function
Rosten et al. feature detector extroceptive metric harris detector,
(2006) [56] sensor SIFT, SURF
Viswanathan et threshold extroceptive metric FAST
al. (2009) [73] intensity sensor
Calonder et al. Hamming extroceptive metric BRIEF
(2010) [76] distance sensor
Li et al. (2017) interest point extroceptive metric FAST+SURF
[72] detectors sensor
Mur-Artal et al. Visual Stereo or metric ORB-SLAM
(2018) [82] odometry monocular
camera
Caldato et al. visual tracking monocular and metric ORB-SLAM
(2018) [95] odometry stereo
observations
Li et al. (2018) Pose-graph extroceptive metric ORB-SLAM
[151] SLAM sensor
Pumarola et al. Visual monocular and metric PL-SLAM
(2017) [101] odometry stereo
observations
Triggs et al. Visual 3D calibrated metric ORB-SLAM+
(1999) [105] odometry camera BA
Zhang et al. monocular monocular and metric ORB-SLAM+
(2019) [110] visual odometry stereo BA
observations
Bustos et al. pose graph Stereo camera metric BA using
(2019) [115] optimization optimization
Bagchi et al. Hough Stereo camera metric BA using
(2020) [123] Transforms optimization
Huang et al. Stereo VIO metric Kalman filters
(2020) [169] Visual-Inertial
Odometry

7 Conclusion

Simultaneous localization and mapping is the most advanced technique that facilitates autonomy for robots. It also enables multi-sensor data fusion for decision making by robots. This article summarizes different SLAM techniques and their corresponding technical aspects for a comprehensive understanding of SLAM. The families of both deterministic and probabilistic SLAM are explored to give a comprehensive view of the framework.
Deterministic SLAM directly uses the sensor data for tracking and mapping. However, estimation of the probable poses and features is generally not required in a featureless environment. In laser-based (LIDAR) SLAM, the laser sensors are paired with an IMU; the information, in the form of scans, is processed to build the pose graph. It is fast and accurate because the laser measures with high precision, the main problem being occlusion. Vision-based SLAM (vSLAM) uses cameras paired with an IMU. The camera data keep track of the changes in position of the robot in motion, and the 3D location of features can be triangulated from successive camera frames. The dominant error in this approach is the reprojection error.
Probabilistic SLAM adopts state estimation techniques for perception and probable action. The next state of the system is predicted from the previous state (using the Bayesian formula), and the motion model and the observation model are formulated to derive the state of the system. This approach uses an iterative mathematical process that applies a set of equations to consecutive data inputs to quickly estimate the true value, position, velocity, etc. of the object being measured, even when the measured values contain unpredicted or random error, uncertainty, or variation. The Kalman filter (KF) uses linearized motion and observation models; it operates on past outputs, predicts the new state and its uncertainty from the motion model, and deduces the corrected value from the new observation measurement. The orientation in the robot pose introduces non-linearities in the robot motion model and the feature observation model. EKF-SLAM linearizes both the motion and observation models using first-order Taylor series expansions around a working point, which generates the current state estimate; the Jacobians cause linearization errors. UKF-SLAM processes the non-linear model directly and addresses the approximation issue of the EKF: UKF-based SLAM can accurately approximate any nonlinearity up to the third order of its Taylor series expansion. The factor graph performs information fusion, combining the joint pdf of two- and three-view factors with the measurement likelihoods of additional sensors.
The feature point-based SLAM approach includes feature detectors and descriptors to ensure stable estimation results. In this process, the camera motion is estimated from the captured frames, the map is estimated from the feature points, and the 3D pose is obtained by triangulation. The technical aspects are categorized based on different image matching techniques, which depend on transformation factors such as scaling, rotation, noise, and distortion.
a) SIFT features are scale- and orientation-invariant, which helps robot pose estimation and 3D map building; SIFT features can be used to estimate the ego-motion of a robot.
b) SURF additionally handles images with blurring and rotation, but fails at handling viewpoint and illumination changes.
c) FAST is a corner detection method that extracts feature points for tracking and mapping objects; however, FAST is weak at detecting corners perfectly aligned along the x and y coordinate axes.
d) BRIEF is an efficient feature descriptor which uses intensity difference tests on a few data bits for computation. It represents an image patch as a binary string, and the Hamming distance allows fast evaluation of recognition performance.
e) Oriented FAST and Rotated BRIEF (ORB) SLAM combines the FAST and BRIEF methods. FAST (Features from Accelerated Segment Test) is a keypoint detector; it compares the intensity of the centre pixel with those in a circular ring around it against a threshold. BRIEF (Binary Robust Independent Elementary Features) uses binary strings as a feature point descriptor; it performs a relatively small number of intensity difference tests to represent an image patch as a binary string. Comparing SIFT, SURF, FAST, BRIEF, and ORB, we observe that ORB is the fastest algorithm, that it obtains efficient keypoints, and that ORB-SLAM estimates rotational movement accurately. Real-time loop closing based on localization in an unknown environment remains an open challenge in ORB-SLAM. L-infinity SLAM is a simpler alternative to SLAM systems based on bundle adjustment: there is no need to maintain an accurate map and camera motions at keyframe rate as demanded by systems based on bundle adjustment.
Optimization-based SLAM is used to optimize the camera poses by suppressing the accumulated error. The camera poses are represented as a graph, and enforcing a consistent graph suppresses the error in the optimization. Bundle adjustment (BA) is used to minimize the reprojection error of the map by optimizing both the map and the camera poses.

References
1. Arimoto, S., Kawamura, S., Miyazaki, F.: Bettering operation of robots by learn-
ing. Journal of Robotic systems 1(2), 123–140 (1984)
2. Durrant-Whyte, H., Henderson, T.C.: Multisensor data fusion. In: Springer hand-
book of robotics, pp. 867–896. Springer (2016)
3. Murphy, R.R.: Introduction to AI robotics. MIT press (2019)
4. Liu, Q., Li, R., Hu, H., Gu, D.: Extracting semantic information from visual data:
A survey. Robotics 5(1), 8 (2016)
5. Yang, G.Z., Dario, P., Kragic, D.: Social robotics—trust, learning, and social
interaction (2018)
6. Durrant-Whyte, H., Bailey, T.: Simultaneous localization and mapping: part i.
IEEE robotics & automation magazine 13(2), 99–110 (2006)
7. Morad, S.D.: The spinning projectile extreme environment robot (2019)
8. Siva, S., Zhang, H.: Omnidirectional multisensory perception fusion for long-term
place recognition. In: 2018 IEEE International Conference on Robotics and Au-
tomation (ICRA). pp. 1–9. IEEE (2018)
9. Lu, Y., Xue, Z., Xia, G.S., Zhang, L.: A survey on vision-based uav navigation.
Geo-spatial information science 21(1), 21–32 (2018)
10. Chatila, R., Laumond, J.P.: Position referencing and consistent world modeling for
mobile robots. In: Proceedings. 1985 IEEE International Conference on Robotics
and Automation. vol. 2, pp. 138–145. IEEE (1985)
11. Smith, R., Self, M., Cheeseman, P.: Estimating uncertain spatial relationships in
robotics. In: Autonomous robot vehicles, pp. 167–193. Springer (1990)
12. Clabaugh, C., Matarić, M.J.: Robots for the people, by the people: Personalizing
human-machine interaction. Science robotics 3(21), eaat7451 (2018)
13. Leonard, J.J., Durrant-Whyte, H.F.: Simultaneous map building and localization
for an autonomous mobile robot. In: IROS. vol. 3, pp. 1442–1447 (1991)
14. Dissanayake, G., Huang, S., Wang, Z., Ranasinghe, R.: A review of recent de-
velopments in simultaneous localization and mapping. In: 2011 6th International
Conference on Industrial and Information Systems. pp. 477–482. IEEE (2011)
15. Bresson, G., Alsayed, Z., Yu, L., Glaser, S.: Simultaneous localization and map-
ping: A survey of current trends in autonomous driving. IEEE Transactions on
Intelligent Vehicles 2(3), 194–220 (2017)
16. Taketomi, T., Uchiyama, H., Ikeda, S.: Visual slam algorithms: a survey from
2010 to 2016. IPSJ Transactions on Computer Vision and Applications 9(1), 16
(2017)
17. Fuentes-Pacheco, J., Ruiz-Ascencio, J., Rendón-Mancha, J.M.: Visual simultane-
ous localization and mapping: a survey. Artificial intelligence review 43(1), 55–81
(2015)
18. Dharmasiri, T., Lui, V., Drummond, T.: Mo-slam: Multi object slam with run-
time object discovery through duplicates. In: 2016 IEEE/RSJ International Con-
ference on Intelligent Robots and Systems (IROS). pp. 1214–1221. IEEE (2016)
19. Manderson, T., Shkurti, F., Dudek, G.: Texture-aware slam using stereo imagery
and inertial information. In: 2016 13th Conference on Computer and Robot Vision
(CRV). pp. 456–463. IEEE (2016)
20. Kochanov, D., Ošep, A., Stückler, J., Leibe, B.: Scene flow propagation for seman-
tic mapping and object discovery in dynamic street scenes. In: 2016 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS). pp. 1785–
1792. IEEE (2016)
21. Endres, F., Hess, J., Engelhard, N., Sturm, J., Cremers, D., Burgard, W.: An
evaluation of the rgb-d slam system. In: 2012 IEEE International Conference on
Robotics and Automation. pp. 1691–1696. IEEE (2012)
22. Belter, D., Nowicki, M., Skrzypczyński, P.: Improving accuracy of feature-based
rgb-d slam by modeling spatial uncertainty of point features. In: 2016 IEEE in-
ternational conference on robotics and automation (ICRA). pp. 1279–1284. IEEE
(2016)
23. Ataer-Cansizoglu, E., Taguchi, Y., Ramalingam, S.: Pinpoint slam: A hybrid of
2d and 3d simultaneous localization and mapping for rgb-d sensors. In: 2016
IEEE international conference on robotics and automation (ICRA). pp. 1300–
1307. IEEE (2016)
24. Kendall, A., Cipolla, R.: Modelling uncertainty in deep learning for camera relo-
calization. In: 2016 IEEE international conference on Robotics and Automation
(ICRA). pp. 4762–4769. IEEE (2016)
25. Neubert, P., Schubert, S., Protzel, P.: Sampling-based methods for visual nav-
igation in 3d maps by synthesizing depth images. In: 2017 IEEE/RSJ Interna-
tional Conference on Intelligent Robots and Systems (IROS). pp. 2492–2498.
IEEE (2017)
26. Cieslewski, T., Scaramuzza, D.: Efficient decentralized visual place recognition
from full-image descriptors. In: 2017 International Symposium on Multi-Robot
and Multi-Agent Systems (MRS). pp. 78–82. IEEE (2017)
27. Cabrera-Ponce, A.A., Martinez-Carranza, J.: A vision-based approach for au-
tonomous landing. In: 2017 Workshop on Research, Education and Development
of Unmanned Aerial Systems (RED-UAS). pp. 126–131. IEEE (2017)
28. Klein, G., Murray, D.: Improving the agility of keyframe-based slam. In: European
conference on computer vision. pp. 802–815. Springer (2008)
29. Ferrera, M., Moras, J., Trouvé-Peloux, P., Creuze, V.: Real-time monocular visual
odometry for turbid and dynamic underwater environments. Sensors 19(3), 687
(2019)
30. Kalman, R.E.: A new approach to linear filtering and prediction problems (1960)
31. Engel, J., Koltun, V., Cremers, D.: Direct sparse odometry. IEEE transactions on
pattern analysis and machine intelligence 40(3), 611–625 (2017)
32. Aqel, M.O., Marhaban, M.H., Saripan, M.I., Ismail, N.B.: Review of visual odom-
etry: types, approaches, challenges, and applications. SpringerPlus 5(1), 1897
(2016)
33. Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4104–
4113 (2016)
34. Ozyesil, O., Voroninski, V., Basri, R., Singer, A.: A survey of structure from
motion. arXiv preprint arXiv:1701.08493 (2017)
35. Li, A.Q., Coskun, A., Doherty, S.M., Ghasemlou, S., Jagtap, A.S., Modasshir, M.,
Rahman, S., Singh, A., Xanthidis, M., O’Kane, J.M., et al.: Experimental com-
parison of open source vision-based state estimation algorithms. In: International
Symposium on Experimental Robotics. pp. 775–786. Springer (2016)
36. Jones, E.S., Soatto, S.: Visual-inertial navigation, mapping and localization: A
scalable real-time causal approach. The International Journal of Robotics Re-
search 30(4), 407–430 (2011)
37. Blanco, J.L., González, J., Fernández-Madrigal, J.A.: A pure probabilistic ap-
proach to range-only slam. In: 2008 IEEE International Conference on Robotics
and Automation. pp. 1436–1441. IEEE (2008)
38. Korkmaz, M., Yılmaz, N., Durdu, A.: Comparison of the slam algorithms: Hangar
experiments. In: MATEC Web of Conferences. vol. 42, p. 03009. EDP Sciences
(2016)
39. Garcı́a, S., López, M.E., Barea, R., Bergasa, L.M., Gómez, A., Molinos, E.J.:
Indoor slam for micro aerial vehicles control using monocular camera and sen-
sor fusion. In: 2016 international conference on autonomous robot systems and
competitions (ICARSC). pp. 205–210. IEEE (2016)
40. Sayre-McCord, T., Guerra, W., Antonini, A., Arneberg, J., Brown, A., Cavalheiro,
G., Fang, Y., Gorodetsky, A., McCoy, D., Quilter, S., et al.: Visual-inertial navi-
gation algorithm development using photorealistic camera simulation in the loop.
In: 2018 IEEE International Conference on Robotics and Automation (ICRA).
pp. 2566–2573. IEEE (2018)
41. Wu, K., Zhang, T., Su, D., Huang, S., Dissanayake, G.: An invariant-ekf vins
algorithm for improving consistency. In: 2017 IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS). pp. 1578–1585. IEEE (2017)
42. Kuzmin, M.: Classification and comparison of the existing slam methods for
groups of robots. In: 2018 22nd Conference of Open Innovations Association
(FRUCT). pp. 115–120. IEEE (2018)
43. Brown, R.G., Hwang, P.Y.: Introduction to random signals and applied kalman
filtering: with matlab exercises and solutions. Introduction to random signals and
applied Kalman filtering: with MATLAB exercises and solutions (1997)
44. Chatterjee, A., Rakshit, A., Singh, N.N.: Simultaneous localization and mapping
(slam) in mobile robots. In: Vision Based Autonomous Robot Navigation, pp.
167–206. Springer (2013)
45. Li, S., Ni, P.: Square-root unscented kalman filter based simultaneous localization
and mapping. In: The 2010 IEEE International Conference on Information and
Automation. pp. 2384–2388. IEEE (2010)
46. Konolige, K., Agrawal, M.: Frameslam: From bundle adjustment to real-time vi-
sual mapping. IEEE Transactions on Robotics 24(5), 1066–1077 (2008)
47. Wan, E.A., Van Der Merwe, R.: The unscented kalman filter for nonlinear estima-
tion. In: Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing,
Communications, and Control Symposium (Cat. No. 00EX373). pp. 153–158. Ieee
(2000)
48. Julier, S.J., Uhlmann, J.K.: New extension of the kalman filter to nonlinear sys-
tems. In: Signal processing, sensor fusion, and target recognition VI. vol. 3068,
pp. 182–193. International Society for Optics and Photonics (1997)
49. Hassaballah, M., Abdelmgeid, A.A., Alshazly, H.A.: Image features detection,
description and matching. In: Image Feature Detectors and Descriptors, pp. 11–
45. Springer (2016)
50. Baltrušaitis, T., Ahuja, C., Morency, L.P.: Multimodal machine learning: A survey
and taxonomy. IEEE transactions on pattern analysis and machine intelligence
41(2), 423–443 (2018)
51. Baltrušaitis, T., Ahuja, C., Morency, L.P.: Challenges and applications in mul-
timodal machine learning. In: The Handbook of Multimodal-Multisensor Inter-
faces: Signal Processing, Architectures, and Detection of Emotion and Cognition-
Volume 2, pp. 17–48 (2018)
52. Gil, A., Mozos, O.M., Ballesta, M., Reinoso, O.: A comparative evaluation of
interest point detectors and local descriptors for visual slam. Machine Vision and
Applications 21(6), 905–920 (2010)
53. Salahat, E., Qasaimeh, M.: Recent advances in features extraction and description
algorithms: A comprehensive survey. In: 2017 IEEE international conference on
industrial technology (ICIT). pp. 1059–1063. IEEE (2017)
54. Přibyl, B., Chalmers, A., Zemčı́k, P., Hooberman, L., Čadı́k, M.: Evaluation of
feature point detection in high dynamic range imagery. Journal of Visual Com-
munication and Image Representation 38, 141–160 (2016)
55. Harris, C.G., Stephens, M., et al.: A combined corner and edge detector. In: Alvey
vision conference. vol. 15, pp. 10–5244. Citeseer (1988)
56. Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In:
European conference on computer vision. pp. 430–443. Springer (2006)
57. Tuytelaars, T., Mikolajczyk, K.: Local invariant feature detectors: a survey. Now
Publishers Inc (2008)
58. Aulinas, J., Carreras, M., Llado, X., Salvi, J., Garcia, R., Prados, R., Petillot,
Y.R.: Feature extraction for underwater visual slam. In: OCEANS 2011 IEEE-
Spain. pp. 1–7. IEEE (2011)
59. Eade, E., Drummond, T.: Edge landmarks in monocular slam. In: In Proc. British
Machine Vision Conf. Citeseer (2006)
60. Harris, C.G., Pike, J.: 3d positional integration from image sequences. Image and
Vision Computing 6(2), 87–90 (1988)
61. Ballesta, M., Gil, A., Martinez Mozos, O., Reinoso, O., et al.: Local descriptors
for visual slam (2007)
62. Muñoz-Salinas, R., Marı́n-Jimenez, M.J., Yeguas-Bolivar, E., Medina-Carnicer,
R.: Mapping and localization from planar markers. Pattern Recognition 73, 158–
171 (2018)
63. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Interna-
tional journal of computer vision 60(2), 91–110 (2004)
64. Zhou, H., Yuan, Y., Shi, C.: Object tracking using sift features and mean shift.
Computer vision and image understanding 113(3), 345–352 (2009)
65. Lindeberg, T.: Scale invariant feature transform (2012)
66. Panchal, P., Panchal, S., Shah, S.: A comparison of sift and surf. International
Journal of Innovative Research in Computer and Communication Engineering
1(2), 323–327 (2013)
67. Se, S., Lowe, D., Little, J.: Mobile robot localization and mapping with un-
certainty using scale-invariant visual landmarks. The international Journal of
robotics Research 21(8), 735–758 (2002)
68. Bay, H., Tuytelaars, T., Van Gool, L.: Surf: Speeded up robust features. In: Eu-
ropean conference on computer vision. pp. 404–417. Springer (2006)
69. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (surf).
Computer vision and image understanding 110(3), 346–359 (2008)
70. Kole, S., Agarwal, C., Gupta, T., Singh, S.: Surf and ransac: A conglomerative
approach to object recognition. International Journal of Computer Applications
109(4) (2015)
71. Rosten, E., Porter, R., Drummond, T.: Faster and better: A machine learning
approach to corner detection. IEEE transactions on pattern analysis and machine
intelligence 32(1), 105–119 (2008)
72. Li, A., Jiang, W., Yuan, W., Dai, D., Zhang, S., Wei, Z.: An improved fast+ surf
fast matching algorithm. Procedia Computer Science 107, 306–312 (2017)
73. Viswanathan, D.G.: Features from accelerated segment test (fast). Homepages.
Inf. Ed. Ac. Uk (2009)
74. Schmid, C., Mohr, R., Bauckhage, C.: Evaluation of interest point detectors. In-
ternational Journal of computer vision 37(2), 151–172 (2000)
75. Canny, J.F.: Finding edges and lines in images. Tech. rep., MASSACHUSETTS
INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB (1983)
76. Calonder, M., Lepetit, V., Strecha, C., Fua, P.: Brief: Binary robust independent
elementary features. In: European conference on computer vision. pp. 778–792.
Springer (2010)
77. Heinly, J., Dunn, E., Frahm, J.M.: Comparative evaluation of binary features. In:
European Conference on Computer Vision. pp. 759–773. Springer (2012)
78. Li, Y., Wang, S., Tian, Q., Ding, X.: A survey of recent advances in visual feature
detection. Neurocomputing 149, 736–751 (2015)
79. Paz, L.M., Piniés, P., Tardós, J.D., Neira, J.: Large-scale 6-dof slam with stereo-
in-hand. IEEE transactions on robotics 24(5), 946–957 (2008)
80. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: Orb: An efficient alternative to
sift or surf. In: 2011 International conference on computer vision. pp. 2564–2571.
Ieee (2011)
81. Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe lsh: efficient
indexing for high-dimensional similarity search. In: Proceedings of the 33rd inter-
national conference on Very large data bases. pp. 950–961 (2007)
82. Mur-Artal, R., Montiel, J.M.M., Tardos, J.D.: Orb-slam: a versatile and accurate
monocular slam system. IEEE transactions on robotics 31(5), 1147–1163 (2015)
83. Mur-Artal, R., Tardós, J.D.: Orb-slam: tracking and mapping recognizable fea-
tures. In: Workshop on Multi View Geometry in Robotics (MVIGRO)-RSS. vol.
2014, p. 2 (2014)
84. Mur-Artal, R., Tardós, J.D.: Visual-inertial monocular slam with map reuse. IEEE
Robotics and Automation Letters 2(2), 796–803 (2017)
85. Mur-Artal, R., Tardós, J.D.: Fast relocalisation and loop closing in keyframe-
based slam. In: 2014 IEEE International Conference on Robotics and Automation
(ICRA). pp. 846–853. IEEE (2014)
86. Fujimoto, S., Hu, Z., Chapuis, R., Aufrère, R.: Orb-slam map initialization im-
provement using depth. In: 2016 IEEE International Conference on Image Pro-
cessing (ICIP). pp. 261–265. IEEE (2016)
87. Majdik, A.L., Verda, D., Albers-Schoenberg, Y., Scaramuzza, D.: Air-ground
matching: Appearance-based gps-denied urban localization of micro aerial ve-
hicles. Journal of Field Robotics 32(7), 1015–1039 (2015)
88. Han, L., Zhou, G., Xu, L., Fang, L.: Beyond sift using binary features in loop clo-
sure detection. In: 2017 IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS). pp. 4057–4063. IEEE (2017)
89. Negre, P.L., Bonin-Font, F., Oliver, G.: Cluster-based loop closing detection for
underwater slam in feature-poor regions. In: 2016 IEEE International Conference
on Robotics and Automation (ICRA). pp. 2589–2595. IEEE (2016)
90. Gálvez-López, D., Tardos, J.D.: Bags of binary words for fast place recognition
in image sequences. IEEE Transactions on Robotics 28(5), 1188–1197 (2012)
91. Mur-Artal, R., Tardós, J.D.: Orb-slam2: An open-source slam system for monoc-
ular, stereo, and rgb-d cameras. IEEE Transactions on Robotics 33(5), 1255–1262
(2017)
92. Civera, J., Davison, A.J., Montiel, J.M.: Inverse depth parametrization for monoc-
ular slam. IEEE transactions on robotics 24(5), 932–945 (2008)
93. Engel, J., Stückler, J., Cremers, D.: Large-scale direct slam with stereo cameras.
In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS). pp. 1935–1942. IEEE (2015)
94. Endres, F., Hess, J., Sturm, J., Cremers, D., Burgard, W.: 3-d mapping with an
rgb-d camera. IEEE transactions on robotics 30(1), 177–187 (2013)
95. Caldato, B.A., Achilles Filho, R., Castanho, J.E.C.: Orb-odom: Stereo and odome-
ter sensor fusion for simultaneous localization and mapping. In: 2017 latin Amer-
ican robotics symposium (LARS) and 2017 Brazilian symposium on robotics
(SBR). pp. 1–5. Ieee (2017)
96. Kerl, C., Sturm, J., Cremers, D.: Dense visual slam for rgb-d cameras. In: 2013
IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 2100–
2106. IEEE (2013)
97. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: The kitti
dataset. The International Journal of Robotics Research 32(11), 1231–1237 (2013)
98. Strasdat, H., Davison, A.J., Montiel, J.M., Konolige, K.: Double window optimisa-
tion for constant time visual slam. In: 2011 international conference on computer
vision. pp. 2352–2359. IEEE (2011)
99. Lv, Q., Lin, H., Wang, G., Wei, H., Wang, Y.: Orb-slam-based tracing and 3d
reconstruction for robot using kinect 2.0. In: 2017 29th Chinese Control And
Decision Conference (CCDC). pp. 3319–3324. IEEE (2017)
100. Li, M., Zhang, M., Fu, Y., Guo, W., Zhong, X., Wang, X., Chen, F.: Fast
and robust mapping with low-cost kinect v2 for photovoltaic panel cleaning
robot. In: 2016 International Conference on Advanced Robotics and Mechatronics
(ICARM). pp. 95–100. IEEE (2016)
101. Pumarola, A., Vakhitov, A., Agudo, A., Sanfeliu, A., Moreno-Noguer, F.: Pl-
slam: Real-time monocular visual slam with points and lines. In: 2017 IEEE in-
ternational conference on robotics and automation (ICRA). pp. 4503–4508. IEEE
(2017)
102. Karami, E., Prasad, S., Shehata, M.: Image matching using sift, surf, brief and orb:
performance comparison for distorted images. arXiv preprint arXiv:1710.02726
(2017)
103. Chien, H.J., Chuang, C.C., Chen, C.Y., Klette, R.: When to use what feature? sift,
surf, orb, or a-kaze features for monocular visual odometry. In: 2016 International
Conference on Image and Vision Computing New Zealand (IVCNZ). pp. 1–6.
IEEE (2016)
104. Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R., Furgale, P.: Keyframe-based
visual–inertial odometry using nonlinear optimization. The International Journal
of Robotics Research 34(3), 314–334 (2015)
105. Triggs, B., McLauchlan, P.F., Hartley, R.I., Fitzgibbon, A.W.: Bundle adjust-
ment—a modern synthesis. In: International workshop on vision algorithms. pp.
298–372. Springer (1999)
106. Dellaert, F.: Visual slam tutorial: Bundle adjustment (2014)
107. Engels, C., Stewénius, H., Nistér, D.: Bundle adjustment rules. Photogrammetric
computer vision 2(32) (2006)
108. Sweeney, C., Sattler, T., Hollerer, T., Turk, M., Pollefeys, M.: Optimizing the
viewing graph for structure-from-motion. In: Proceedings of the IEEE Interna-
tional Conference on Computer Vision. pp. 801–809 (2015)
109. Mouragnon, E., Lhuillier, M., Dhome, M., Dekeyser, F., Sayd, P.: Real time lo-
calization and 3d reconstruction. In: 2006 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR’06). vol. 1, pp. 363–370. IEEE
(2006)
110. Zhang, Y., Yang, J., Zhang, H., Hwang, J.N.: Bundle adjustment for monocular
visual odometry based on detected traffic sign features. In: 2019 IEEE Interna-
tional Conference on Image Processing (ICIP). pp. 4350–4354. IEEE (2019)
111. Grisetti, G., Kümmerle, R., Strasdat, H., Konolige, K.: g2o: A general framework
for (hyper) graph optimization. In: Proceedings of the IEEE International Con-
ference on Robotics and Automation (ICRA), Shanghai, China. pp. 9–13 (2011)
112. Chojnacki, M., Indelman, V.: Vision-based dynamic target trajectory and ego-
motion estimation using incremental light bundle adjustment. International Jour-
nal of Micro Air Vehicles, Special Issue on Estimation and Control for MAV Nav-
igation in GPS-denied Cluttered Environments 10(2), 157–170 (2018)
113. Indelman, V., Roberts, R., Dellaert, F.: Incremental light bundle adjust-
ment for structure from motion and robotics. Robotics and Autonomous Sys-
tems 70, 63–82 (2015), http://www.sciencedirect.com/science/article/pii/
S0921889015000810
114. Demoulin, Q., Lefebvre-Albaret, F., Basarab, A., Kouamé, D., Tourneret, J.Y.:
Constrained bundle adjustment applied to wing 3d reconstruction with mechani-
cal limitations
115. Bustos, Á.P., Chin, T.J., Eriksson, A., Reid, I.: Visual slam: Why bundle adjust?
In: 2019 International Conference on Robotics and Automation (ICRA). pp. 2385–
2391. IEEE (2019)
116. Li, X., Ling, H.: Hybrid camera pose estimation with online partitioning. arXiv
preprint arXiv:1908.01797 (2019)
117. Grisetti, G., Stachniss, C., Burgard, W.: Nonlinear constraint network optimiza-
tion for efficient map learning. IEEE Transactions on Intelligent Transportation
Systems 10(3), 428–439 (2009)
118. Kümmerle, R., Grisetti, G., Strasdat, H., Konolige, K., Burgard, W.: g 2 o: A gen-
eral framework for graph optimization. In: 2011 IEEE International Conference
on Robotics and Automation. pp. 3607–3613. IEEE (2011)
119. Engel, J.: Tutorial on geometric and semantic 3d reconstruction, cvpr 2017
120. Pirchheim, C., Schmalstieg, D., Reitmayr, G.: Handling pure camera rotation
in keyframe-based slam. In: 2013 IEEE international symposium on mixed and
augmented reality (ISMAR). pp. 229–238. IEEE (2013)
121. Indelman, V.: Bundle adjustment without iterative structure estimation and its
application to navigation. In: Proceedings of the 2012 IEEE/ION Position, Loca-
tion and Navigation Symposium. pp. 748–756. IEEE (2012)
122. Hartley, R., Trumpf, J., Dai, Y., Li, H.: Rotation averaging. International journal
of computer vision 103(3), 267–305 (2013)
123. Bagchi, S., Chin, T.J.: Event-based star tracking via multiresolution progressive
hough transforms. In: The IEEE Winter Conference on Applications of Computer
Vision. pp. 2143–2152 (2020)
124. Eriksson, A., Olsson, C., Kahl, F., Chin, T.J.: Rotation averaging and strong
duality. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. pp. 127–135 (2018)
125. Sim, K., Hartley, R.: Recovering camera motion using l\infty minimization. In:
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recog-
nition (CVPR’06). vol. 1, pp. 1230–1237 (2006)
126. Kneip, L., Li, H.: Efficient computation of relative pose for multi-camera sys-
tems. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. pp. 446–453 (2014)
127. Khosravian, A., Chin, T.J., Reid, I., Mahony, R.: A discrete-time attitude observer
on so (3) for vision and gps fusion. In: 2017 IEEE International Conference on
Robotics and Automation (ICRA). pp. 5688–5695. IEEE (2017)
128. Liu, L.Y.: Towards Observable Urban Visual SLAM. Ph.D. thesis (2020)
129. Ovechkin, V., Indelman, V.: Bafs: Bundle adjustment with feature scale con-
straints for enhanced estimation accuracy. IEEE Robotics and Automation Let-
ters (RA-L) 3(2), 804–810 (2018)
130. Dellaert, F., Kaess, M., et al.: Factor graphs for robot perception. Foundations
and Trends® in Robotics 6(1-2), 1–139 (2017)
131. Kschischang, F.R., Frey, B.J., Loeliger, H.A.: Factor graphs and the sum-product
algorithm. IEEE Transactions on information theory 47(2), 498–519 (2001)
132. Indelman, V., Roberts, R., Dellaert, F.: Incremental light bundle adjustment for
structure from motion and robotics. Robotics and Autonomous Systems 70, 63–82
(2015)
133. Konolige, K., Grisetti, G., Kümmerle, R., Burgard, W., Limketkai, B., Vincent,
R.: Efficient sparse pose adjustment for 2d mapping. In: 2010 IEEE/RSJ Inter-
national Conference on Intelligent Robots and Systems. pp. 22–29. IEEE (2010)
134. Carlone, L., Aragues, R., Castellanos, J.A., Bona, B.: A fast and accurate approxi-
mation for planar pose graph optimization. The International Journal of Robotics
Research 33(7), 965–987 (2014)
135. Frese, U., Larsson, P., Duckett, T.: A multilevel relaxation algorithm for simulta-
neous localization and mapping. IEEE Transactions on Robotics 21(2), 196–207
(2005)
136. Thrun, S., Montemerlo, M.: The graph slam algorithm with applications to large-
scale mapping of urban structures. The International Journal of Robotics Re-
search 25(5-6), 403–429 (2006)
137. Olson, E., Agarwal, P.: Inference on networks of mixtures for robust robot map-
ping. The International Journal of Robotics Research 32(7), 826–840 (2013)
138. Wang, H., Hu, G., Huang, S., Dissanayake, G.: On the structure of nonlinearities
in pose graph slam. In: Robotics: Science and Systems VIII. pp. 425–433 (2013)
139. Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J.J., Dellaert, F.: isam2:
Incremental smoothing and mapping using the bayes tree. The International Jour-
nal of Robotics Research 31(2), 216–235 (2012)
140. Carlone, L., Aragues, R., Castellanos, J.A., Bona, B.: A first-order solution to
simultaneous localization and mapping with graphical models. In: 2011 IEEE In-
ternational Conference on Robotics and Automation. pp. 1764–1771. IEEE (2011)
141. Carlone, L., Aragues, R., Castellanos, J.A., Bona, B.: A linear approximation
for graph-based simultaneous localization and mapping. Robotics: Science and
Systems VII pp. 41–48 (2012)
142. Carlone, L., Aragues, R., Castellanos, J., Bona, B.: A fast and accurate approxi-
mation for pose graph optimization. The International Journal of Robotics Research (2012)
143. Gawel, A., Cieslewski, T., Dubé, R., Bosse, M., Siegwart, R., Nieto, J.: Structure-
based vision-laser matching. In: 2016 IEEE/RSJ International Conference on In-
telligent Robots and Systems (IROS). pp. 182–188. IEEE (2016)
144. Wang, C., Zhang, H., Nguyen, T.M., Xie, L.: Ultra-wideband aided fast local-
ization and mapping system. In: 2017 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS). pp. 1602–1609. IEEE (2017)
145. Liang, X., Chen, H., Li, Y., Liu, Y.: Visual laser-slam in large-scale indoor envi-
ronments. In: 2016 IEEE International Conference on Robotics and Biomimetics
(ROBIO). pp. 19–24. IEEE (2016)
146. Evers, C., Naylor, P.A.: Optimized self-localization for slam in dynamic scenes us-
ing probability hypothesis density filters. IEEE Transactions on Signal Processing
66(4), 863–878 (2017)
147. Xin, G.X., Zhang, X.T., Wang, X., Song, J.: A rgbd slam algorithm combining orb
with prosac for indoor mobile robot. In: 2015 4th International Conference on
Computer Science and Network Technology (ICCSNT). vol. 1, pp. 71–74. IEEE
(2015)
148. Wang, X., Chen, H., Li, Y.: Online calibration for monocular vision and odometry
fusion. In: 2017 IEEE International Conference on Unmanned Systems (ICUS).
pp. 602–607. IEEE (2017)
149. Guerrero-Font, E., Massot-Campos, M., Negre, P.L., Bonin-Font, F., Codina,
G.O.: An usbl-aided multisensor navigation system for field auvs. In: 2016 IEEE
International Conference on Multisensor Fusion and Integration for Intelligent
Systems (MFI). pp. 430–435. IEEE (2016)
150. Qiu, K., Liu, T., Shen, S.: Model-based global localization for aerial robots using
edge alignment. IEEE Robotics and Automation Letters 2(3), 1256–1263 (2017)
151. Li, J., Kaess, M., Eustice, R.M., Johnson-Roberson, M.: Pose-graph slam using
forward-looking sonar. IEEE Robotics and Automation Letters 3(3), 2330–2337
(2018)
152. Luo, J., Qin, S.: A fast algorithm of slam based on combinatorial interval filters.
IEEE Access 6, 28174–28192 (2018)
153. Concha, A., Loianno, G., Kumar, V., Civera, J.: Visual-inertial direct slam. In:
2016 IEEE international conference on robotics and automation (ICRA). pp.
1331–1338. IEEE (2016)
154. Fischer, T., Pire, T., Čížek, P., De Cristóforis, P., Faigl, J.: Stereo vision-based
localization for hexapod walking robots operating in rough terrains. In: 2016
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
pp. 2492–2497. IEEE (2016)
155. Yan, Z., Ye, M., Ren, L.: Dense visual slam with probabilistic surfel map. IEEE
transactions on visualization and computer graphics 23(11), 2389–2398 (2017)
156. Rameau, F., Ha, H., Joo, K., Choi, J., Park, K., Kweon, I.S.: A real-time aug-
mented reality system to see-through cars. IEEE transactions on visualization
and computer graphics 22(11), 2395–2404 (2016)
157. Liu, H., Zhang, G., Bao, H.: Robust keyframe-based monocular slam for aug-
mented reality. In: 2016 IEEE International Symposium on Mixed and Augmented
Reality (ISMAR). pp. 1–10. IEEE (2016)
158. Sjanic, Z., Skoglund, M.A., Gustafsson, F.: Em-slam with inertial/visual appli-
cations. IEEE Transactions on Aerospace and Electronic Systems 53(1), 273–285
(2017)
159. Zhou, H., Ni, K., Zhou, Q., Zhang, T.: An sfm algorithm with good convergence
that addresses outliers for realizing mono-slam. IEEE Transactions on Industrial
Informatics 12(2), 515–523 (2016)
160. Qin, T., Li, P., Shen, S.: Vins-mono: A robust and versatile monocular visual-
inertial state estimator. IEEE Transactions on Robotics 34(4), 1004–1020 (2018)
161. Zienkiewicz, J., Tsiotsios, A., Davison, A., Leutenegger, S.: Monocular, real-time
surface reconstruction using dynamic level of detail. In: 2016 Fourth International
Conference on 3D Vision (3DV). pp. 37–46. IEEE (2016)
162. Tateno, K., Tombari, F., Laina, I., Navab, N.: Cnn-slam: Real-time dense monoc-
ular slam with learned depth prediction. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition. pp. 6243–6252 (2017)
163. Wang, Z., Xu, M., Ye, N., Wang, R., Huang, H.: Rf-mvo: Simultaneous 3d object
localization and camera trajectory recovery using rfid devices and a 2d monocular
camera. In: 2018 IEEE 38th International Conference on Distributed Computing
Systems (ICDCS). pp. 534–544. IEEE (2018)
164. Carlone, L., Karaman, S.: Attention and anticipation in fast visual-inertial navi-
gation. IEEE Transactions on Robotics 35(1), 1–20 (2018)
165. Schmuck, P., Chli, M.: Multi-uav collaborative monocular slam. In: 2017 IEEE
International Conference on Robotics and Automation (ICRA). pp. 3863–3870.
IEEE (2017)
166. Wang, S., Clark, R., Wen, H., Trigoni, N.: Deepvo: Towards end-to-end visual
odometry with deep recurrent convolutional neural networks. In: 2017 IEEE Inter-
national Conference on Robotics and Automation (ICRA). pp. 2043–2050. IEEE
(2017)
167. Teixeira, L., Chli, M.: Real-time mesh-based scene estimation for aerial inspection.
In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS). pp. 4863–4869. IEEE (2016)
168. Lajoie, P.Y., Ramtoula, B., Chang, Y., Carlone, L., Beltrame, G.: Door-slam:
Distributed, online, and outlier resilient slam for robotic teams. IEEE Robotics
and Automation Letters 5(2), 1656–1663 (2020)
169. Huang, W., Liu, H., Wan, W.: An online initialization and self-calibration method
for stereo visual-inertial odometry. IEEE Transactions on Robotics (2020)