A Tutorial on Visual Servo Control
Abstract-This article provides a tutorial introduction to visual servo control of robotic manipulators. Since the topic spans many disciplines our goal is limited to providing a basic conceptual framework. We begin by reviewing the prerequisite topics from robotics and computer vision, including a brief review of coordinate transformations, velocity representation, and a description of the geometric aspects of the image formation process. We then present a taxonomy of visual servo control systems. The two major classes of systems, position-based and image-based systems, are then discussed in detail. Since any visual servo system must be capable of tracking image features in a sequence of images, we also include an overview of feature-based and correlation-based methods for tracking. We conclude the tutorial with a number of observations on the current directions of the research field of visual servo control.

I. INTRODUCTION

... the resulting operation depends directly on the accuracy of the visual sensor and the robot end-effector.

An alternative to increasing the accuracy of these subsystems is to use a visual-feedback control loop that will increase the overall accuracy of the system, a principal concern in most applications. Taken to the extreme, machine vision can provide closed-loop position control for a robot end-effector; this is referred to as visual servoing. This term appears to have been first introduced by Hill and Park [2] in 1979 to distinguish their approach from earlier "blocks world" experiments where the system alternated between picture taking and moving. Prior to the introduction of this term, the less specific term visual feedback was generally used. For the purposes of this article, the task in visual servoing is to use visual information to control the pose of the robot's end-effector.
To assist newcomers to the field we will describe techniques which require only simple vision hardware (just a digitizer), freely available vision software [4], and which make few assumptions about the robot and its control system. This is sufficient to commence investigation of many applications where high control and/or vision performance are not required.

One of the difficulties in writing such an article is that the topic spans many disciplines that cannot be adequately addressed in a single article. For example, the underlying control problem is fundamentally nonlinear, and visual recognition, tracking, and reconstruction are fields unto themselves. Therefore we have concentrated on certain basic aspects of each discipline, and have provided an extensive bibliography to assist the reader who seeks greater detail than can be provided here. Our preference is always to present those ideas and techniques that we have found to function well in practice and that have some generic applicability. Another difficulty is the current rapid growth in the vision-based motion control literature, which contains solutions and promising approaches to many of the theoretical and technical problems involved. Again we have presented what we consider to be the most fundamental concepts, and again refer the reader to the bibliography.

The remainder of this article is structured as follows. Section II reviews the relevant fundamentals of coordinate transformations, pose representation, and image formation. In Section III, we present a taxonomy of visual servo control systems (adapted from [5]). The two major classes of systems, position-based visual servo systems and image-based visual servo systems, are then discussed in Sections IV and V respectively. Since any visual servo system must be capable of tracking image features in a sequence of images, Section VI describes some approaches to visual tracking that have found wide applicability and can be implemented using a minimum of special-purpose hardware. Finally, Section VII presents a number of observations regarding the current directions of the research field of visual servo control.

II. BACKGROUND AND DEFINITIONS

In this section we provide a very brief overview of some topics from robotics and computer vision that are relevant to visual servo control. We begin by defining the terminology and notation required to represent coordinate transformations and the velocity of a rigid object moving through the workspace (Sections II-A and II-B). Following this, we briefly discuss several issues related to image formation (Sections II-C and II-D), and possible camera/robot configurations (Section II-E). The reader who is familiar with these topics may wish to proceed directly to Section III.

A. Coordinate Transformations

In this paper, the task space of the robot, represented by T, is the set of positions and orientations that the robot tool can attain. Since the task space is merely the configuration space of the robot tool, the task space is a smooth m-manifold (see, e.g., [6]). If the tool is a single rigid body moving arbitrarily in a three-dimensional workspace, then T = SE^3 = R^3 x SO^3, and m = 6. In some applications, the task space may be restricted to a subspace of SE^3. For example, for pick and place, we may consider pure translations (T = R^3, for which m = 3), while for tracking an object and keeping it in view we might consider only rotations (T = SO^3, for which m = 3).

Typically, robotic tasks are specified with respect to one or more coordinate frames. For example, a camera may supply information about the location of an object with respect to a camera frame, while the configuration used to grasp the object may be specified with respect to a coordinate frame attached to the object. We represent the coordinates of point P with respect to coordinate frame x by the notation ^xP. Given two frames, x and y, the rotation matrix that represents the orientation of frame y with respect to frame x is denoted by ^xR_y. The location of the origin of frame y with respect to frame x is denoted by the vector ^xt_y. Together, the position and orientation of a frame specify a pose, which we denote by ^xx_y.¹ If the leading superscript, x, is not specified, the world coordinate frame is assumed.

We may also use a pose to specify a coordinate transformation. We use function application to denote applying a change of coordinates to a point. In particular, if we are given ^yP (the coordinates of point P relative to frame y), we obtain the coordinates of P with respect to frame x by applying the coordinate transformation rule

    ^xP = ^xx_y(^yP) = ^xR_y ^yP + ^xt_y.    (1)

In the sequel, we will use the notation ^xx_y to refer either to a coordinate transformation, or to a pose that is specified by a rotation matrix and translation, ^xR_y and ^xt_y, respectively. Likewise, we will use the terms pose and coordinate transformation interchangeably. In general, there should be no ambiguity between the two interpretations of ^xx_y.

Often, we must compose multiple coordinate transformations to obtain a desired change of coordinates. For example, suppose that we are given poses ^xx_y and ^yx_z. If we are given ^zP and wish to compute ^xP, we may use the composition of coordinate transformations

    ^xP = ^xx_y(^yx_z(^zP)) = ^xR_y(^yR_z ^zP + ^yt_z) + ^xt_y.

As seen here, we represent the composition of coordinate transformations by ^xx_z = ^xx_y ∘ ^yx_z, and the corresponding coordinate transformation of the point ^zP by (^xx_y ∘ ^yx_z)(^zP). The corresponding rotation matrix and translation are given by

    ^xR_z = ^xR_y ^yR_z,    ^xt_z = ^xR_y ^yt_z + ^xt_y.

¹ We have not used more common notations based on homogeneous transforms because over-parameterizing points makes it difficult to develop some of the machinery needed for control.
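As a concrete illustration of rule (1) and the composition formulas (this sketch is ours, not part of the tutorial; the NumPy helper names are invented for the example):

    import numpy as np

    # A pose x = (R, t) acts on a point as x(P) = R @ P + t, as in (1).
    def apply_pose(R, t, P):
        return R @ P + t

    # Composition x_xz = x_xy o x_yz follows the rotation/translation rules above.
    def compose(R_xy, t_xy, R_yz, t_yz):
        return R_xy @ R_yz, R_xy @ t_yz + t_xy

    # Example: a 90-degree rotation about z followed by a translation.
    R_xy = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
    t_xy = np.array([1.0, 0.0, 0.0])
    R_yz, t_yz = np.eye(3), np.array([0.0, 2.0, 0.0])

    R_xz, t_xz = compose(R_xy, t_xy, R_yz, t_yz)
    P_z = np.array([0.5, 0.0, 0.0])
    assert np.allclose(apply_pose(R_xz, t_xz, P_z),
                       apply_pose(R_xy, t_xy, apply_pose(R_yz, t_yz, P_z)))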
Some coordinate frames that will be needed frequently are referred to by the following superscripts/subscripts: ...

... Together, T and R define what is known in the robotics literature as a velocity screw.
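In standard robotics notation, a velocity screw stacks the translational velocity T and the angular velocity into a single six-vector; this is the convention behind the six-dimensional screws u and r-dot used throughout the remainder of the article:

    r-dot = [ T ; Omega ] = [T_x, T_y, T_z, ω_x, ω_y, ω_z]^T.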
where s is a fixed scale factor. Orthographic projection models are valid for scenes where the relative depth of the points in the scene is small compared to the distance from the camera to the scene, for example, an airplane flying over the earth, or a camera with a long focal length lens placed several meters from the workspace.

3) Affine Projection: Another linear approximation to perspective projection is known as affine projection. In this case, the image coordinates for the projection of a point ^cP are given by

    [u, v]^T = A ^cP + c    (18)

where A is an arbitrary 2 x 3 matrix and c is an arbitrary 2-vector. Note that scaled orthographic projection is a special case of affine projection. Affine projection does not correspond to any specific imaging situation. Its primary advantage is that it is a good local approximation to perspective projection that accounts for both the external geometry of the camera (i.e., its position in space) and the internal geometry of the lens and CCD (i.e., the focal length, and scaling and offset to pixel coordinates). Since the model is purely linear, A and c are easily computed using linear regression techniques [9], and the camera calibration problem is greatly simplified.

    F : T -> F.    (19)

For example, if F ⊆ R^2 is the space of (u, v) image plane coordinates for the projection of some point P onto the image plane, then, assuming perspective projection, f = [u, v]^T, where u and v are given by (16). The exact form of (19) will depend in part on the relative configuration of the camera and end-effector, as discussed in the next section.

E. Camera Configuration

Visual servo systems typically use one of two camera configurations: end-effector mounted, or fixed in the workspace. The first, often called an eye-in-hand configuration, has the camera mounted on the robot's end-effector. Here, there exists a known, often constant, relationship between the pose of the camera(s) and the pose of the end-effector. We represent this relationship by the pose ^ex_c. The pose of the target³ relative to the camera frame is represented by ^cx_t. The relationship between these poses is shown in Fig. 2.

² Jang [13] provides a formal definition of what we term feature parameters as image functionals.
³ The word target will be used to refer to the object of interest, that is, the object that will be tracked.
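The remark that A and c can be recovered by linear regression is easy to illustrate. The NumPy sketch below is ours, not an example from the paper; the focal length and the synthetic, shallow point cloud are assumptions made for illustration:

    import numpy as np

    lam = 600.0  # assumed focal length in pixels (illustrative value)

    def perspective(P):
        """Perspective projection of P = [x, y, z]: u = lam*x/z, v = lam*y/z."""
        x, y, z = P
        return np.array([lam * x / z, lam * y / z])

    # Sample a small cluster of 3-D points and their perspective images.
    rng = np.random.default_rng(0)
    pts = rng.uniform([-0.1, -0.1, 1.9], [0.1, 0.1, 2.1], size=(50, 3))
    img = np.array([perspective(P) for P in pts])

    # Fit the affine model [u, v]^T = A P + c by linear least squares:
    # stack [P 1] and solve for the 2 x 4 matrix [A | c].
    X = np.hstack([pts, np.ones((len(pts), 1))])
    M, *_ = np.linalg.lstsq(X, img, rcond=None)
    A, c = M[:3].T, M[3]

    print("max residual (pixels):", np.abs(X @ M - img).max())

Because the depth range of the cluster is small relative to its distance from the camera, the residual is tiny, which is exactly the "good local approximation" property claimed above.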
Fig. 2. Relevant coordinate frames (world, end-effector, camera and target) for end-effector mounted, and fixed, camera configurations.
The second configuration has the camera(s) fixed in the workspace. In this case, the camera(s) are related to the base coordinate system of the robot by ^0x_c and to the object by ^cx_t. In this case, the camera image of the target is, of course, independent of the robot motion (unless the target is the end-effector itself). A variant of this is for the camera to be agile, mounted on another robot or pan/tilt head in order to observe the visually controlled robot from the best vantage [25].

For either choice of camera configuration, prior to the execution of visual servo tasks, camera calibration must be performed in order to determine the intrinsic camera parameters such as focal length, pixel pitch and the principal point. A fixed camera's pose, ^0x_c, with respect to the world coordinate system must be established, and is encapsulated in the extrinsic parameters determined by a camera calibration procedure. For the eye-in-hand case the relative pose, ^ex_c, must be determined and this is known as the hand/eye calibration problem. Calibration is a long standing research issue in the computer vision community (good solutions to the calibration problem can be found in a number of references, e.g., [26]-[28]).

III. SERVOING ARCHITECTURES

In 1980, Sanderson and Weiss [5] introduced a taxonomy of visual servo systems, into which all subsequent visual servo systems can be categorized. Their scheme essentially poses two questions:

1) Is the control structure hierarchical, with the vision system providing set-points as input to the robot's joint-level controller, or does the visual controller directly compute the joint-level inputs?
2) Is the error signal defined in 3D (task space) coordinates, or directly in terms of image features?

The resulting taxonomy, thus, has four major categories, which we now describe. These fundamental structures are shown schematically in Figs. 3-6.

If the control architecture is hierarchical and uses the vision system to provide set-point inputs to the joint-level controller, thus making use of joint feedback to internally stabilize the robot, it is referred to as a dynamic look-and-move system. In contrast, direct visual servo⁴ eliminates the robot controller entirely, replacing it with a visual servo controller that directly computes joint inputs, thus using vision alone to stabilize the mechanism.

For several reasons, nearly all implemented systems adopt the dynamic look-and-move approach. Firstly, the relatively low sampling rates available from vision make direct control of a robot end-effector with complex, nonlinear dynamics an extremely challenging control problem. Using internal feedback with a high sampling rate generally presents the visual controller with idealized axis dynamics [29]. Secondly, many robots already have an interface for accepting Cartesian velocity or incremental position commands. This simplifies the construction of the visual servo system, and also makes the methods more portable.

⁴ Sanderson and Weiss used the term "visual servo" for this type of system, but since then this term has come to be accepted as a generic description for any type of visual control of a robotic system. Here we use the term "direct visual servo" to avoid confusion.
Thirdly, look-and-move separates the kinematic singularities of the mechanism from the visual controller, allowing the robot to be considered as an ideal Cartesian motion device. Since many resolved rate [30] controllers have specialized mechanisms for dealing with kinematic singularities [31], the system design is again greatly simplified. In this article, we will utilize the look-and-move model exclusively.

The second major classification of systems distinguishes position-based control from image-based control. In position-based control, features are extracted from the image and used in conjunction with a geometric model of the target and the known camera model to estimate the pose of the target with respect to the camera. Feedback is computed by reducing errors in estimated pose space. In image-based servoing, control values are computed on the basis of image features directly. The image-based approach may reduce computational delay, eliminate the necessity for image interpretation and eliminate errors due to sensor modeling and camera calibration. However, it does present a significant challenge to controller design since the plant is nonlinear and highly coupled.

One of the typical applications of visual servoing is to position an end-effector relative to a target. For example, many authors use an end-effector mounted camera to position a robot arm for grasping. In most cases, the control algorithm is expressed in terms of moving the camera to a pose defined in terms of the image of the object to be grasped. The position of the end-effector relative to the object is determined only indirectly by its known kinematic relationship with the camera. Errors in this kinematic relationship lead to positioning errors which cannot be observed by the system. Observing the end-effector directly makes it possible to sense and correct for such errors.
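To make the dynamic look-and-move structure described above concrete, here is a skeleton of the outer vision loop (ours; `vision.estimate_pose_error` and `robot.set_cartesian_velocity` are hypothetical stand-ins for a pose-estimation module and a robot Cartesian velocity interface, neither of which the paper specifies):

    import time

    def look_and_move(vision, robot, k=0.5, dt=0.1):
        """Outer position-based loop: estimate the task-space error, command a
        Cartesian velocity toward the goal, and let the joint-level controller
        track that set-point at a much higher internal rate."""
        while True:
            pose_error = vision.estimate_pose_error()    # hypothetical: 6-vector in task space
            if pose_error is None:
                robot.set_cartesian_velocity([0.0] * 6)  # lost target: stop
            else:
                u = [-k * e for e in pose_error]         # proportional regulation of the error
                robot.set_cartesian_velocity(u)          # set-point to the internal joint controller
            time.sleep(dt)                               # vision runs at a low sample rate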
In general, there is no guarantee on the positioning accuracy of the system unless control points on both the end-effector and target can be observed [9], [32], [33]. To emphasize this distinction, we refer to systems that only observe the target object as endpoint open-loop (EOL) systems, and systems that observe both the target object and the robot end-effector as endpoint closed-loop (ECL) systems. The differences between EOL and ECL systems will be made more precise in subsequent discussions.

It is usually possible to transform an EOL system to an ECL system simply by including direct observation of the end-effector or other task-related control points. Thus, from a theoretical perspective, it would appear that ECL systems would always be preferable to EOL systems. However, since ECL systems must track the end-effector as well as the target object, the implementation of an ECL controller often requires solution of a more demanding vision problem and places field-of-view constraints on the system that cannot always be satisfied.

IV. POSITION-BASED VISUAL SERVO CONTROL

We begin our discussion of visual servoing methods with position-based visual servoing. As described in the previous section, in position-based visual servoing, features are extracted from the image and used to estimate the pose of the target with respect to the camera. Using these values, an error between the current and the desired pose of the robot is defined in the task space. In this way, position-based control neatly separates the control issues, namely the computation of the feedback signal, from the estimation problems involved in computing position or pose from visual data.

We now formalize the notion of a positioning task as follows:

Definition 4.1: A positioning task is represented by a function E : T -> R^m. This function is referred to as the kinematic error function. A positioning task is fulfilled with the end-effector in pose x_e if E(x_e) = 0.

If we consider a general pose x_e for which the task is fulfilled, the error function will constrain some number, d <= m, degrees of freedom of the manipulator. The value d will be referred to as the degree of the constraint. As noted by Espiau et al. [11], [34], the kinematic error function can be thought of as representing a virtual kinematic constraint between the end-effector and the target.

Once a suitable kinematic error function has been defined and the parameters of the functions are instantiated from visual data, a regulator is defined that reduces the estimated value of the kinematic error function to zero. This regulator produces at every time instant a desired end-effector velocity screw u ∈ R^6 that is sent to the robot control subsystem. For the purposes of this article, we use simple proportional control methods for linear and linearized systems to compute u [35]. Although there are formalized methods for developing such control laws, since the kinematic error functions are defined in Cartesian space, for most problems it is possible to develop a regulator through geometric insight. The process is to first determine the relative motion that would fulfill the task, and then to write a control law that would produce that motion.

The remainder of the section presents various example problems that we have chosen to provide some insight into ways of thinking about position-based control, and that will also provide useful comparisons when we consider image-based control in the next section. Section IV-A introduces several simple positioning primitives, based on directly observable feature points, which can be compounded to achieve more complex positioning tasks. Next, Section IV-B describes positioning tasks based on the explicit estimation of the target object's pose. Finally, in Section IV-C, we briefly describe how point position and object pose can be computed using visual information from one or more cameras, the visual reconstruction problem.

A. Point-Feature Based Motions

We begin by considering a positioning task in which some point on the robot with end-effector coordinates, ^eP, is to be brought to a fixed stationing point, S, visible in the scene. We refer to this as point-to-point positioning. In the case where the camera is fixed, the kinematic error function may be defined in base coordinates as

    E_pp(x_e; S, ^eP) = x_e(^eP) - S.    (20)

Here, as in the sequel, the argument before the semicolon is the value to be controlled (in all cases, manipulator position) and the values after the semicolon parameterize the positioning task.

E_pp defines a three degree of freedom kinematic constraint on the robot end-effector position. If the robot workspace is restricted to T = R^3, this task can be thought of as a rigid link that fully constrains the pose of the end-effector relative to the target. When T = SE^3, the constraint defines a virtual spherical joint between the object and the robot end-effector.

Let T = R^3. We first consider the case in which one or more cameras calibrated to the robot base frame furnish an estimate, ^cŜ, of the stationing point coordinates with respect to a camera coordinate frame. Using the estimate of the camera pose in base coordinates, x̂_c, from off-line calibration and (1), we have Ŝ = x̂_c(^cŜ).

Since T = R^3, the control input to be computed is the desired robot translational velocity, which we denote by u_3 to distinguish it from the more general end-effector screw. Since (20) is linear in x_e, it is well known that in the absence of outside disturbances, the proportional control law

    u_3 = -k E_pp(x_e; Ŝ, ^eP) = -k (x_e(^eP) - Ŝ)    (21)

will drive the system to an equilibrium state in which the value of the error function is zero [35]. The value k > 0 is a proportional feedback gain. Note that we have written Ŝ in the feedback law to emphasize the fact that this value is also subject to errors.

The expression (21) is equivalent to open-loop positioning of the manipulator using vision-based estimates of geometry. Variations on this scheme are used by [36], [37].
In our simplified dynamics, the manipulator is stationary when u_3 = 0. Since the right hand side of the equation includes estimated quantities, it follows that errors in x̂_e, x̂_c, or ^cŜ (robot kinematics, camera calibration and visual reconstruction respectively) can lead to positioning errors of the end-effector.

Now, consider the situation when the cameras are mounted on the robot and calibrated to the end-effector. In this case, we can express (20) in end-effector coordinates

    ^eE_pp(x_e; S, ^eP) = ^eP - ^ex_0(S).    (22)

The camera(s) furnish an estimate of the stationing point, ^cŜ, which can be combined with information from the camera calibration and robot kinematics to produce Ŝ = (x̂_e ∘ ^ex̂_c)(^cŜ). We now compute

    ^eu_3 = -k ^eE_pp(x_e; (x̂_e ∘ ^ex̂_c)(^cŜ), ^eP)
          = -k (^eP - (^ex_0 ∘ ^0x̂_e ∘ ^ex̂_c)(^cŜ))
          = -k (^eP - ^ex̂_c(^cŜ)).    (23)

Notice that the terms involving x̂_e have dropped out. Thus (23) is not only simpler, but positioning accuracy is also independent of the accuracy of the robot kinematics, a fundamental benefit of visual servoing.

All of the above formulations presume prior knowledge of ^eP and are therefore EOL systems. To convert them to ECL systems, we suppose that ^eP is directly observed and estimated by the camera system. In this case, (21) and (23) can be written

    u_3 = -k E_pp(x_e; x̂_c(^cŜ), x̂_c(^cP̂)) = -k x̂_c(^cP̂ - ^cŜ)    (24)
    ^eu_3 = -k ^eE_pp(x_e; ^ex̂_c(^cŜ), ^ex̂_c(^cP̂)) = -k ^ex̂_c(^cP̂ - ^cŜ)    (25)

respectively. We now see that u_3 (respectively ^eu_3) does not depend on x̂_e, and is homogeneous in x̂_c (respectively ^ex̂_c). Hence, if ^cŜ = ^cP̂, then u_3 = 0, independent of errors in the robot kinematics or the camera calibration. This is an important advantage for systems where a precise camera/end-effector relationship is difficult or impossible to determine off-line.

Consider now the full Cartesian problem where T = SE^3, and the control input is the complete velocity screw u ∈ R^6. Since the error functions presented above only constrain 3 degrees of freedom, the problem of computing u from the estimated error is under-determined. One way of proceeding is as follows. Consider the case of free-standing cameras. Then in base coordinates we know that Ṗ = u_3. Using (14), we can relate this to the end-effector velocity screw as follows:

    Ṗ = u_3 = A(P) u.    (26)

Thus, if we could "solve for" u in the above equation, we could effectively use the three-dimensional solution to arrive at the full Cartesian solution. Unfortunately, A is not square and therefore cannot be inverted to solve for u. However, recall that the matrix right inverse for an m x n matrix M, n > m, is defined as M+ = M^T (M M^T)^-1. The right inverse computes the minimum norm vector which solves the original system of equations. Hence, we have

    u = A(P)+ u_3    (27)

for free-standing cameras. Similar manipulations yield

    ^eu = A(^eP)+ ^eu_3    (28)

for end-effector mounted cameras. Substituting the appropriate expression for u_3 or ^eu_3 from the previous discussion leads to a form of proportional regulation for the Cartesian problem.

As a second example of feature-based positioning, consider that some point on the end-effector, ^eP, is to be brought to the line joining two fixed points S_1 and S_2 in the world. The shortest path for performing this task is to move ^eP toward the line joining S_1 and S_2 along the perpendicular to the line. The error function describing this trajectory in base coordinates is:

    E_pl(x_e; S_1, S_2, ^eP) = (S_2 - S_1) x ((x_e(^eP) - S_1) x (S_2 - S_1)).    (29)

Notice that although E_pl is a mapping from T to R^3, placing a point on a line is a constraint of degree 2. From the geometry of the problem and the previous discussion, we see that defining

    u = -k A(x_e(^eP))+ E_pl(x_e; Ŝ_1, Ŝ_2, ^eP)

is a proportional feedback law for this problem.

Suppose that now we apply this constraint to two points ^eP_1 and ^eP_2 on the end-effector. E_ppl now defines a four degree of freedom positioning constraint that aligns the points on the end-effector with those in target coordinates, and again no unique motion satisfies this kinematic error function. A geometrically straightforward solution is to compute a translation, T, which moves ^eP_1 to the line through S_1 and S_2. Simultaneously, we can choose a rotation, R, which rotates ^eP_2 about ^eP_1 so that the line through ^eP_1 and ^eP_2 becomes parallel to that through S_1 and S_2.

In order to compute a velocity screw u = (T, R), we first note that the end-effector rotation matrix R_e can be represented as a rotation through an angle θ about an axis defined by a unit vector k [7]. In this case, the axis of rotation is

    k = (S_2 - S_1)‾ x [R_e(^eP_2 - ^eP_1)]‾

where the bar over expressions on the right denotes normalization to a unit vector. Hence, a natural feedback law for the rotational portion of the velocity screw is

    R = -k_1 k.    (30)

Note the expression on the right hand side is the zero vector if the lines joining associated points are parallel, as we desire.

The only complication to computing the translation portion of the vector is to realize that rotation introduces translation of points attached to the end-effector. Hence, we need to move ^eP_1 toward the goal line while compensating for the motion introduced by rotation.
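The rotational law (30) can be written in a few lines. The sketch below is ours, with illustrative points; the two end-effector points are given directly in base coordinates, so their difference already equals R_e(^eP_2 - ^eP_1):

    import numpy as np

    def unit(v):
        return v / np.linalg.norm(v)

    def rotation_command(S1, S2, P1_base, P2_base, k1=1.0):
        """Rotational part of the point-to-line alignment law (30): the commanded
        angular velocity lies along the cross product of the two normalized line
        directions and vanishes when the lines are already parallel."""
        k_axis = np.cross(unit(S2 - S1), unit(P2_base - P1_base))
        return -k1 * k_axis

    # Illustrative values: the end-effector segment is tilted relative to the target line.
    S1, S2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
    P1, P2 = np.array([0.2, 0.3, 0.0]), np.array([0.9, 0.5, 0.0])
    print(rotation_command(S1, S2, P1, P2))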
Based on the discussion above, we know the former is given by -E_pl(x_e; S_1, S_2, ^eP_1), while from (12) the latter is simply R x x_e(^eP_1). Combining these two expressions, we have

    T = -k E_pl(x_e; Ŝ_1, Ŝ_2, ^eP_1) - R x x_e(^eP_1).

Note that we are still free to choose translations along the line joining S_1 and S_2 as well as rotations about it. Full six degree-of-freedom positioning can be attained by enforcing another point-to-line constraint using an additional point on the end-effector and an additional point in the world. Similar geometric arguments can be used to define a proportional feedback law.

These formulations can be adjusted for end-effector mounted cameras and can be implemented as ECL or EOL systems. We leave these modifications as an exercise for the reader.

B. Pose-Based Motion

In the previous section, positioning was defined in terms of directly observable point features. When working with a priori known objects, it is possible to recover the pose of the object, x_t, and to define stationing points with respect to object pose. The methods of the previous section can be easily applied when object pose is available. For example, suppose ^tS is an arbitrary stationing point in a target object's coordinate system, and that we can compute ^cx̂_t using end-effector mounted camera(s). Then using (1) we can compute ^cŜ = ^cx̂_t(^tS). This estimate can be used in any of the end-effector based feedback methods of the previous section in both ECL and EOL configurations. Similar remarks hold for systems utilizing free-standing cameras.

Given an object pose, it is possible to directly define positioning tasks in terms of that object pose. Let ^tx*_e be a desired stationing pose (rather than a point as in the previous section) for the end-effector, and suppose the system employs free-standing cameras. We can define a positioning error

    ...    (32)

(Note that in order for this error function to be in accord with our definition of kinematic error we must select a parameterization of rotations which is 0 when the end-effector is in the desired position.)

Using feature information and the camera calibration, we can directly estimate x̂_t = x̂_c ∘ ^cx̂_t. If we again represent the rotation in terms of a unit vector ^tk_e and rotation angle ^tθ_e, we can define

    ...

where t_e is the origin of the end-effector frame in base coordinates.

If we can also observe the end-effector and estimate its pose, ^cx̂_e, we can rewrite (32) as follows:

    ...

Once again we see that for an ECL system, both the robot kinematic chain and the camera pose relative to the base coordinate system have dropped out of the error equation. Hence, these factors do not affect the positioning accuracy of the system.

The modifications of pose-based methods to end-effector based systems are completely straightforward and are left for the reader.

C. Estimation

A key issue in position-based visual servo is the estimation of the quantities used to parameterize the feedback. In this regard, position-based visual servoing is closely related to the problem of recovering scene geometry from one or more camera images. This encompasses problems including structure from motion, exterior orientation, stereo reconstruction, and absolute orientation. Unfortunately, space does not permit a complete coverage of these topics here and we have opted to provide pointers to the literature, except in the case of point estimation for two cameras, which has a straightforward solution. A comprehensive discussion of these topics can be found in a recent review article [38].

1) Estimation with a Single Camera: As noted previously, it follows from (16) that a point in a single camera image corresponds to a line in space. Although it is possible to perform geometric reconstruction using a single moving camera, the equations governing this process are often ill-conditioned, leading to stability problems [38]. Better results can be achieved if target image features have some internal structure, or the image features come from a known object. Below, we briefly describe methods for performing both point estimation and pose estimation with a single camera assuming such information is available.

a) Single points: Clearly, extra information is needed in order to reconstruct the Cartesian coordinates of a point in space from a single camera projection. This may come from additional measurable attributes, for example, in the case of a circular opening with known diameter d the image will be an ellipse. The ellipse can be described by five image feature parameters from which can be derived the distance to the opening, and the orientation of the plane containing the hole.

b) Object pose: Object pose can be estimated if the vision system observes multiple point features on a known object. This is referred to as the pose estimation problem in the vision literature, and numerous methods for its solution have been proposed. These can be broadly divided into analytic solutions and least-squares solutions. Analytic solutions for three and four points are given by [39]-[43], and unique solutions exist for four coplanar, but not collinear, points. Least-squares solutions can be found in [44]-[50]. Six or more points always yield unique solutions and allow the camera calibration matrix to be computed. This can then be decomposed [48] to yield the target's pose.

The general least-squares solution is a nonlinear optimization problem which has no known closed-form solution. Instead, iterative optimization techniques are generally employed. These techniques iteratively refine a nominal pose value using observed data (see [51] for a recent review).
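In practice the object pose estimation step described above is often delegated to a packaged solver. The sketch below uses OpenCV's solvePnP; the choice of OpenCV is ours (not the paper's), and the intrinsics and point measurements are illustrative:

    import numpy as np
    import cv2

    # Known object: four coplanar, non-collinear points in the target frame (meters).
    object_points = np.array([[0.0, 0.0, 0.0],
                              [0.1, 0.0, 0.0],
                              [0.1, 0.1, 0.0],
                              [0.0, 0.1, 0.0]], dtype=np.float64)

    # Their measured image projections (pixels) and an assumed intrinsic matrix.
    image_points = np.array([[320.0, 240.0],
                             [380.0, 242.0],
                             [378.0, 300.0],
                             [318.0, 298.0]], dtype=np.float64)
    K = np.array([[600.0, 0.0, 320.0],
                  [0.0, 600.0, 240.0],
                  [0.0, 0.0, 1.0]])

    # solvePnP returns the target pose relative to the camera as a rotation
    # vector (axis-angle) and a translation vector, i.e., an estimate of cx_t.
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
    R, _ = cv2.Rodrigues(rvec)
    print(ok, R, tvec)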
Because of the sensitivity of the reconstruction process to noise, it is often a good idea to incorporate some type of smoothing or averaging of the computed pose parameters, at the cost of some delay in response to changes in target pose. A particularly elegant formulation of this updating procedure results by application of statistical techniques such as the extended Kalman filter [52]. This approach has been recently demonstrated by Wilson [53] for six DOF control of end-effector pose. A similar approach was recently reported in [54].

2) Estimation with Multiple Cameras: Multiple cameras greatly simplify the reconstruction process and many systems utilizing position-based control with stereo vision from free-standing cameras have been demonstrated. For example, Allen [36] shows a system that can grasp a toy train using stereo vision. Rizzi [37] demonstrates a system which can bounce a ping-pong ball. All of these systems are EOL. Cipolla [9] describes an ECL system using free-standing stereo cameras. One novel feature of this system is the use of the affine projection model (Section II-C) for the imaging geometry. This leads to linear calibration and control at the cost of some system performance. The development of a position-based stereo eye-in-hand servoing system has also been reported [55].

a) Single points: Let ^ax_1 represent the pose of a camera relative to an arbitrary base coordinate frame a. By inverting this transformation and combining (1) and (16) for a point ^aP = [x, y, z]^T we have

    ...    (35)

where x, y and z are the rows of ^1R_a, and ^1t_a = [t_x, t_y, t_z]^T. Multiplying through by the denominator of the right-hand side, we have

    A_1(p_1) ^aP = b_1(p_1)    (36)

where A_1 and b_1 collect the resulting linear coefficients. Given a second camera at location ^ax_2, we can compute A_2(p_2) and b_2(p_2) analogously. Stacking these together results in a matrix equation

    ...

... the only unknown in the system. The corresponding least-squares problem can either be solved explicitly for rotation (see [56]-[58]), or solved incrementally using linearization. Given an estimate for rotation, the computation of translation is a standard linear least squares problem.

D. Discussion

The principal advantage of position-based control is that it is possible to describe tasks in terms of Cartesian pose, as is common in robotics. Its primary disadvantage is that feedback is computed using estimated quantities that are a function of the system calibration parameters. Hence, in some situations, position-based control can become extremely sensitive to calibration error. Endpoint closed-loop systems are demonstrably less sensitive to calibration. However, particularly in stereo systems, small errors in computing the orientation of the cameras can still lead to reconstruction errors that impact the positioning accuracy of the system.

Pose-based methods for visual servoing seem to be the most generic approach to the problem, as they support arbitrary relative positioning with respect to the object. An often cited disadvantage of pose-based methods is the computation time required to solve the relative orientation problem. However, recent results show that solutions can be computed in only a few milliseconds even using iteration [51] or Kalman filtering [53]. In general, given the rapid advances in microprocessor technology, computational considerations are becoming less of an issue in the design of visual servoing systems. Another disadvantage of pose-based approaches is the fact that they inherently depend on having an accurate model of the target object, a form of calibration. Hence, feature-based approaches tend to be more appropriate to tasks where there is no prior model of the geometry of the task, for example in teleoperation applications [59]. Generally speaking, since feature-based methods rely on less prior information (which may be in error), they can be expected to perform more robustly on comparable tasks.

Another approach to position-based visual servoing which has not been discussed here is to use an active 3D sensor. For example, active 3D sensors based on structured lighting are now compact and fast enough to use for visual servoing. If the sensor is small and mounted on the robot, the depth and orientation information can be used directly for position-based visual servoing [60]-[62].
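The two-camera point estimation sketched above reduces to stacking two pairs of linear equations of the form (36) and solving by least squares. The NumPy sketch below is ours; the camera poses, focal length, and the point are illustrative, and the coefficient rows follow directly from multiplying the perspective equations through by their denominator:

    import numpy as np

    lam = 600.0  # assumed focal length (pixels)

    def point_equations(R, t, u, v):
        """Two rows of A and b for one camera: (u*z_row - lam*x_row) P = lam*t_x - u*t_z, etc."""
        xr, yr, zr = R          # rows of the rotation from frame a to the camera frame
        tx, ty, tz = t
        A = np.array([u * zr - lam * xr,
                      v * zr - lam * yr])
        b = np.array([lam * tx - u * tz,
                      lam * ty - v * tz])
        return A, b

    def project(R, t, P):
        x, y, z = R @ P + t
        return lam * x / z, lam * y / z

    # Two calibrated cameras (illustrative poses) observing the same point.
    R1, t1 = np.eye(3), np.array([0.0, 0.0, 2.0])
    R2, t2 = np.eye(3), np.array([-0.5, 0.0, 2.0])   # second camera displaced along x
    P_true = np.array([0.1, -0.05, 0.3])

    A1, b1 = point_equations(R1, t1, *project(R1, t1, P_true))
    A2, b2 = point_equations(R2, t2, *project(R2, t2, P_true))

    A = np.vstack([A1, A2])
    b = np.hstack([b1, b2])
    P_est, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(P_est)   # recovers P_true up to numerical error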
V. IMAGE-BASED VISUAL SERVO CONTROL

... observed by the vision system. Thus, the specification of an image-based visual servo task involves determining an appropriate error function e, such that when the task is achieved, e = 0. This can be done by directly using the projection equations (16), or via a "teach by showing" approach in which the robot is moved to a goal position and the corresponding image is used to compute a vector of desired image feature parameters, f_d. If the task is defined with respect to a moving object, the error, e, will be a function, not only of the pose of the end-effector, but also of the pose of the moving object.

Although the error, e, is defined on the image parameter space, the manipulator control input is typically defined either in joint coordinates or in task space coordinates. Therefore, it is necessary to relate changes in the image feature parameters to changes in the position of the robot. The image Jacobian, introduced in Section V-A, captures these relationships. We present an example image Jacobian in Section V-B. In Section V-C, we describe methods that can be used to "invert" the image Jacobian, to derive the robot velocity that will produce the desired change in the image. Finally, in Sections V-D and V-E we describe how controllers can be designed for image-based systems.

A. The Image Jacobian

Let r represent coordinates of the end-effector in some parameterization of the task space T and ṙ represent the corresponding end-effector velocity (note, ṙ is a velocity screw, as defined in Section II-B). Let f represent a vector of image feature parameters and ḟ the corresponding vector of image feature parameter rates of change.⁵ The image Jacobian, J_v, is a linear transformation from the tangent space of T at r to the tangent space of F at f. In particular,

    ḟ = J_v(r) ṙ    (37)

where J_v ∈ R^(k x m), and

    J_v(r) = [∂F/∂r] = [ ∂F_1(r)/∂r_1  ...  ∂F_1(r)/∂r_m ]
                       [      :                   :      ]
                       [ ∂F_k(r)/∂r_1  ...  ∂F_k(r)/∂r_m ].    (38)

Recall that m is the dimension of the task space, T. Thus the number of columns in the image Jacobian will vary depending on the task.

The image Jacobian was first introduced by Weiss et al. [21], who referred to it as the feature sensitivity matrix. It is also referred to as the interaction matrix [11] and the B matrix [16], [17]. Other applications of the image Jacobian include [10], [14], [15], [24].

The relationship given by (37) describes how image feature parameters change with respect to changing manipulator pose. In visual servoing we are interested in determining the manipulator velocity, ṙ, required to achieve some desired value of ḟ. This requires solving the system given by (37). We will discuss this problem in Section V-C, but first we present an example image Jacobian.

⁵ If the image feature parameters are point coordinates, these rates are image plane point velocities.

B. An Example Image Jacobian

Suppose that the end-effector is moving with angular velocity ^cΩ_e = [ω_x, ω_y, ω_z] and translational velocity ^cT_e = [T_x, T_y, T_z] (as described in Section II-B), both with respect to the camera frame in a fixed camera system. Let P be a point rigidly attached to the end-effector. The velocity of the point P, expressed relative to the camera frame, is given by

    ^cṖ = ^cΩ_e x ^cP + ^cT_e.    (39)

To simplify notation, let ^cP = [x, y, z]^T. Substituting the perspective projection equations (16) into (10) and (11), we can write the derivatives of the coordinates of ^cP in terms of the image feature parameters u, v as

    ẋ = z ω_y - (vz/λ) ω_z + T_x    (40)
    ẏ = (uz/λ) ω_z - z ω_x + T_y    (41)
    ż = (z/λ)(v ω_x - u ω_y) + T_z.    (42)

Now, let f = [u, v]^T, as above. Using the quotient rule,

    u̇ = λ (ẋ z - x ż) / z^2    (43)
    v̇ = λ (ẏ z - y ż) / z^2.    (44)

Substituting from (40)-(42) gives

    u̇ = (λ/z) T_x - (u/z) T_z - (uv/λ) ω_x + ((λ^2 + u^2)/λ) ω_y - v ω_z.    (45)

Similarly,

    v̇ = (λ/z) T_y - (v/z) T_z - ((λ^2 + v^2)/λ) ω_x + (uv/λ) ω_y + u ω_z.    (46)

Finally, we may rewrite these two equations in matrix form to obtain

    [u̇]   [ λ/z   0     -u/z   -uv/λ          (λ^2+u^2)/λ   -v ]
    [v̇] = [ 0     λ/z   -v/z   -(λ^2+v^2)/λ   uv/λ           u ] [T_x, T_y, T_z, ω_x, ω_y, ω_z]^T    (47)

which relates image-plane velocity of a point to the relative velocity of the point with respect to the camera. Alternative derivations for this example can be found in a number of references including [63], [64].

It is straightforward to extend this result to the general case of using k/2 image points for the visual control by simply stacking the Jacobians for each pair of image point coordinates; see (48):

    [u̇_1  ]   [ λ/z_1    0        -u_1/z_1      -u_1 v_1/λ        (λ^2+u_1^2)/λ    -v_1   ]
    [v̇_1  ]   [ 0        λ/z_1    -v_1/z_1      -(λ^2+v_1^2)/λ    u_1 v_1/λ         u_1   ]
    [  :   ] = [ :        :         :              :                 :                :    ] ṙ    (48)
    [u̇_k/2]   [ λ/z_k/2  0        -u_k/2/z_k/2  -u_k/2 v_k/2/λ    (λ^2+u_k/2^2)/λ  -v_k/2 ]
    [v̇_k/2]   [ 0        λ/z_k/2  -v_k/2/z_k/2  -(λ^2+v_k/2^2)/λ  u_k/2 v_k/2/λ     u_k/2 ]

Finally, note that the Jacobian matrices given in (47) and (48) are functions of z, the distance to the point being imaged. For a fixed camera system, when the target is the end-effector, these z values can be computed using the forward kinematics of the robot and the camera calibration information. For an eye-in-hand system, determining z can be more difficult, and this problem is discussed further in Section V-F.
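Equations (47) and (48) translate directly into code; a small NumPy sketch (ours):

    import numpy as np

    def interaction_matrix(u, v, z, lam):
        """2 x 6 image Jacobian of (47) for a single point feature: maps
        [Tx, Ty, Tz, wx, wy, wz] to the image-plane velocity [u_dot, v_dot]."""
        return np.array([
            [lam / z, 0.0, -u / z, -u * v / lam, (lam**2 + u**2) / lam, -v],
            [0.0, lam / z, -v / z, -(lam**2 + v**2) / lam, u * v / lam, u],
        ])

    def stacked_jacobian(features, depths, lam):
        """Stack per-point Jacobians as in (48). features: list of (u, v); depths: matching z values."""
        return np.vstack([interaction_matrix(u, v, z, lam)
                          for (u, v), z in zip(features, depths)])

    J = stacked_jacobian([(10.0, -5.0), (-20.0, 15.0)], [1.5, 2.0], lam=600.0)
    print(J.shape)   # (4, 6)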
C. Using the Image Jacobian to Compute End-Effector Velocity

The results of the previous sections show how to relate robot end-effector motion to perceived motion in a camera image. However, visual servo control applications typically require the reverse: computation of ṙ given ḟ as input. There are three cases that must be considered: k = m, k < m, and k > m. We now discuss each of these.

When k = m and J_v is nonsingular, J_v^-1 exists. Therefore, in this case, ṙ = J_v^-1 ḟ. Such an approach has been used by Feddema [20], who also describes an automated approach to image feature selection in order to minimize the condition number of J_v.

When k ≠ m, J_v^-1 does not exist. In this case, assuming that J_v is full rank (i.e., rank(J_v) = min(k, m)), we can compute a least squares solution, which, in general, is given by

    ṙ = J_v^+ ḟ + (I - J_v^+ J_v) b    (49)

where J_v^+ is a suitable pseudoinverse for J_v, and b is an arbitrary vector of the appropriate dimension. The least squares solution gives a value for ṙ that minimizes the norm ||ḟ - J_v ṙ||.

We first consider the case k > m; that is, there are more feature parameters than task degrees of freedom. By the implicit function theorem [65], if, in some neighborhood of r, m <= k and rank(J_v) = m (i.e., J_v is full rank), we can express the coordinates f_(m+1), ..., f_k as smooth functions of f_1, ..., f_m. From this, we deduce that there are k - m redundant visual features. Typically, this will result in a set of inconsistent equations (since the k visual features will be obtained from a computer vision system and are likely to be noisy). In this case, the appropriate pseudoinverse is given by

    J_v^+ = (J_v^T J_v)^-1 J_v^T.    (50)

Here, we have (I - J_v^+ J_v) = 0 (the dimension of the null space of J_v is 0, since the dimension of the column space of J_v, m, equals rank(J_v)). Therefore, the solution can be written more concisely as

    ṙ = J_v^+ ḟ.    (51)

We now consider the case k < m, i.e., there are certain components of the object motion that cannot be observed. In this case, the appropriate pseudoinverse is given by

    J_v^+ = J_v^T (J_v J_v^T)^-1.

In general, for k < m, (I - J_v^+ J_v) ≠ 0, and all vectors of the form (I - J_v^+ J_v) b lie in the null space of J_v, and correspond to those components of the object velocity that are unobservable. In this case, the solution is given by (49). For example, as shown in [64], the null space of the image Jacobian given in (47) is spanned by four vectors:

    ...

In some instances, there is a physical interpretation for the vectors that span the null space of the image Jacobian. For example, the vector [u, v, λ, 0, 0, 0]^T reflects that the motion of a point along a projection ray cannot be observed. The vector [0, 0, 0, u, v, λ]^T reflects the fact that rotation of a point on a projection ray about that projection ray cannot be observed. Unfortunately, not all basis vectors for the null space have such an obvious physical interpretation. The null space of the image Jacobian plays a significant role in hybrid methods, in which some degrees of freedom are controlled using visual servo, while the remaining degrees of freedom are controlled using some other modality [14].

D. Resolved-Rate Methods

The earliest approaches to image-based visual servo control [10], [21] were based on resolved-rate motion control [30], which we will briefly describe here. Suppose that the goal of a particular task is to reach a desired image feature parameter vector, f_d. If the control input is defined as in Section IV to be an end-effector velocity, then we have u = ṙ, and assuming for the moment that the image Jacobian is square and nonsingular,

    u = K J_v^-1(r) (f_d - f)

where K is a constant gain matrix of the appropriate dimension.
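Combining the proportional image error of the resolved-rate scheme with the pseudoinverse solutions of Section V-C gives a one-line control step. The sketch below (ours, with assumed feature measurements) is one common way to assemble it, not a transcription of the controllers in [10], [21]:

    import numpy as np

    def interaction(u, v, z, lam):
        # 2 x 6 point-feature Jacobian of (47)
        return np.array([[lam/z, 0, -u/z, -u*v/lam, (lam**2 + u**2)/lam, -v],
                         [0, lam/z, -v/z, -(lam**2 + v**2)/lam, u*v/lam, u]])

    def resolved_rate_step(J, f, f_d, K):
        """One image-based step: u = J^+ K (f_d - f); the pseudoinverse handles
        the nonsquare cases discussed in Section V-C."""
        return np.linalg.pinv(J) @ (K @ (f_d - f))

    # Two tracked points (assumed measurements) driven toward desired image locations.
    lam = 600.0
    feats = [(10.0, -5.0, 1.5), (-20.0, 15.0, 2.0)]        # (u, v, z) per point
    J = np.vstack([interaction(u, v, z, lam) for u, v, z in feats])
    f = np.array([10.0, -5.0, -20.0, 15.0])
    f_d = np.zeros(4)
    u_cmd = resolved_rate_step(J, f, f_d, K=0.5 * np.eye(4))
    print(u_cmd)    # commanded velocity screw [Tx, Ty, Tz, wx, wy, wz]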
For the case of a nonsquare image Jacobian, the techniques described in Section V-C would be used to compute u. Similar results have been presented in [14], [15]. More advanced techniques based on optimal control are discussed in [16].

E. Example Servoing Tasks

In this section, we revisit some of the problems introduced in Section IV-A and describe image-based solutions for these problems. In all cases, we assume two fixed cameras are observing the scene.

1) Point to Point Positioning: Consider the task of bringing some point P on the manipulator to a desired stationing point S. The kinematic error function was given in (20). If two cameras are viewing the scene, a necessary and sufficient condition for P and S to coincide in the workspace is that the projections of P and S coincide in each image.

If we let [u^l, v^l]^T and [u^r, v^r]^T be the image coordinates for the projection of P in the left and right images, respectively, then we may take f = [u^l, v^l, u^r, v^r]^T. If we let T = R^3, then in (19), F is a mapping from T to R^4.

Let the projection of S have coordinates [u_s^l, v_s^l] and [u_s^r, v_s^r] in the left and right images. We then define the desired feature vector to be f_d = [u_s^l, v_s^l, u_s^r, v_s^r]^T, yielding

    ...

... The proof proceeds as follows. The origin of the coordinate frame for the left camera, together with the projections of S_1 and S_2 onto the left image, forms a plane. Likewise, the origin of the coordinate frame for the right camera, together with the projections of S_1 and S_2 onto the right image, forms a plane. The intersection of these two planes is exactly the line joining S_1 and S_2 in the workspace. When P lies on this line, it must lie simultaneously in both of these planes, and therefore, must be collinear with the projections of the points S_1 and S_2 in both images.

We now turn to conditions that determine when the projection of P is collinear with the projections of the points S_1 and S_2, and will use the knowledge that three vectors are coplanar if and only if their scalar triple product is zero. For the left image, let the projection of S_1 have image coordinates [u_1^l, v_1^l], the projection of S_2 have image coordinates [u_2^l, v_2^l], and the projection of P have image coordinates [u^l, v^l]. If the three vectors from the origin of the left camera to these image points are coplanar, then the three image points are collinear. Thus, we construct the scalar triple product

    [u_1^l, v_1^l, λ] · ([u_2^l, v_2^l, λ] x [u^l, v^l, λ])

and proceeding in a similar fashion for the right image, derive

    [u_1^r, v_1^r, λ] · ([u_2^r, v_2^r, λ] x [u^r, v^r, λ])

from which we construct the error function ...
... the fact that by construction, when the image error function is zero, the kinematic error must also be zero. Even if the hand-eye system is miscalibrated, if the feedback system is asymptotically stable, the image error will tend to zero, and hence so will the kinematic error. This is not the case with the position-based system described in Section IV [68]. Thus, one of the chief advantages of image-based control over position-based control is that the positioning accuracy of the system is less sensitive to camera calibration errors.

There are also often computational advantages to image-based control, particularly in ECL configurations. For example, a position-based relative pose solution for an ECL single-camera system must perform two nonlinear least squares optimizations in order to compute the error function. The comparable image-based system must only compute a simple image error function, an inverse Jacobian solution, and possibly a single position or pose calculation to parameterize the Jacobian. In practice, as described in Section V-B, the unknown parameter for Jacobian calculation is distance from the camera. Some recent papers present adaptive approaches for estimating this depth value [16], or develop feedback methods which do not use depth in the feedback formulation [69].

One disadvantage of image-based methods compared to position-based methods is the presence of singularities in the feature mapping function, which reflect themselves as unstable points in the inverse Jacobian control law. These instabilities are often less prevalent in the equivalent position-based scheme. Returning again to the point-to-line example, the Jacobian calculation becomes singular when the two stationing points are coplanar with the optical centers of both cameras. In this configuration, rotations and translations of the setpoints in the plane are not observable. This singular configuration does not exist for the position-based solution.

In the above discussion we have referred to f_d as the desired feature parameter vector, and implied that it is a constant. If it is a constant then the robot will move to the desired pose with respect to the target. If the target is moving, the system will endeavor to track the target and maintain relative pose, but the tracking performance will be a function of the system dynamics, as discussed below in Section VII. However, many tasks can be described in terms of the motion of image features, for instance by aligning visual cues within the scene. Jang et al. [66] describe a generalized approach to servoing on image features, with trajectories specified in feature space, which results in trajectories (tasks) that are independent of target geometry. Feddema [10] also uses a feature space trajectory generator to interpolate feature parameter values due to the low update rate of the vision system used. Skaar et al. [18] describe the example of a 1 DOF robot catching a ball by observing visual cues such as the ball, the arm's pivot point, and another point on the arm. The interception task can then be specified, even if the relationship between camera and arm is not known a priori.

VI. IMAGE FEATURE EXTRACTION AND TRACKING

Irrespective of the control approach used, a vision system is required to extract the information needed to perform the servoing task. Hence, visual servoing pre-supposes the solution to a set of potentially difficult static and dynamic vision problems. To this end many reported implementations contrive the vision problem to be simple: e.g. painting objects white, using artificial targets, and so forth [10], [14], [37], [70]. Other authors use extremely task-specific clues: e.g. Allen [36] uses motion detection for locating a moving object to be grasped, and a fruit picking system looks for the characteristic fruit color. A review of tracking approaches used by researchers in this field is given in [3].

In less structured situations, vision has typically relied on the extraction of sharp contrast changes, referred to as "corners" or "edges", to indicate the presence of object boundaries or surface markings in an image. Processing the entire image to extract these features necessitates the use of extremely high-speed hardware in order to work with a sequence of images at camera rate. However, not all pixels in the image are of interest, and computation time can be greatly reduced if only a small region around each image feature is processed. Thus, a promising technique for making vision cheap and tractable is to use window-based tracking techniques [4], [37], [71]. Window-based methods have several advantages, among them: computational simplicity, little requirement for special hardware, and easy reconfiguration for different applications. We note, however, that initial positioning of each window typically presupposes an automated or human-supplied solution to a potentially complex vision problem.

This section describes a window-based approach to tracking features in an image. The methods are capable of tracking a number of point or edge features at frame rate on a workstation computer, require only a framestore and no specialized image processing hardware, and have been incorporated into a publicly available software "toolkit" [4]. A discussion of methods which use specialized hardware combined with temporal and geometric constraints can be found in [67]. The remainder of this section is organized as follows. Section VI-A describes how window-based methods can be used to implement fast detection of edge segments, a common low-level primitive for vision applications. Section VI-B describes an approach based on temporally correlating image regions over time. Section VI-C describes some general issues related to the use of temporal and geometric constraints, and Section VI-D briefly summarizes some of the issues surrounding the choice of a feature extraction method for tracking.

A. Feature-Based Methods

In this section, we illustrate how window-based processing techniques can be used to perform fast detection of isolated straight edge segments of fixed length. Edge segments are intrinsic to applications where man-made parts contain corners or other patterns formed from physical edges.

Images are comprised of pixels organized into a two-dimensional coordinate system. We adopt the notation I(x, t) to denote the pixel at location x = [u, v]^T in an image captured at time t. A window can be thought of as a two-dimensional array of pixels related to a larger image by an invertible mapping from window coordinates to image coordinates.
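The idea behind the region-correlation tracking of Section VI-B can be illustrated with a toy sum-of-squared-differences search (ours; a real implementation would add subpixel interpolation and bounds handling):

    import numpy as np

    def ssd_track(image, template, center, search=10):
        """Locate the displacement of a reference window by minimizing the
        sum-of-squared-differences over a small search region around `center`."""
        h, w = template.shape
        cu, cv = center
        best, best_off = None, (0, 0)
        for dv in range(-search, search + 1):
            for du in range(-search, search + 1):
                r0, c0 = cv + dv - h // 2, cu + du - w // 2
                patch = image[r0:r0 + h, c0:c0 + w]
                score = np.sum((patch.astype(float) - template) ** 2)
                if best is None or score < best:
                    best, best_off = score, (du, dv)
        return best_off

    # Toy example: a bright square shifted by (3, -2) pixels between frames.
    frame0 = np.zeros((100, 100)); frame0[40:50, 40:50] = 255
    frame1 = np.zeros((100, 100)); frame1[38:48, 43:53] = 255
    template = frame0[40:50, 40:50].astype(float)
    print(ssd_track(frame1, template, center=(45, 45)))   # -> (3, -2)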
We consider rigid transformations consisting of a translation vector c = [u_c, v_c]ᵀ and a rotation θ. A pixel value at location x = [u, v]ᵀ in window coordinates is related to the larger image by

    W(x; c, θ, t) = I(c + R(θ)x, t)                                    (60)

where R(θ) is a two-dimensional rotation matrix. We adopt the conventions that x = 0 is the center of the window and that the set X represents the set of all values of x.
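As a rough illustration of (60), the following Python/NumPy sketch acquires the rectangular array of pixel values corresponding to a translated, rotated window, using nearest-neighbor sampling. The function name, argument conventions, and border handling are illustrative choices of ours and are not taken from the toolkit of [4].

    import numpy as np

    def acquire_window(image, c, theta, half_u, half_v):
        """Sample W(x; c, theta, t) = I(c + R(theta) x, t) on a regular grid.

        image : 2-D gray-level array indexed as image[v, u] (row = v, column = u)
        c     : (u_c, v_c), the window center in image coordinates
        theta : window orientation in radians
        """
        us = np.arange(-half_u, half_u + 1)
        vs = np.arange(-half_v, half_v + 1)
        xu, xv = np.meshgrid(us, vs)                 # window coordinates x = [u, v], x = 0 at the center
        ct, st = np.cos(theta), np.sin(theta)
        iu = c[0] + ct * xu - st * xv                # image coordinates c + R(theta) x
        iv = c[1] + st * xu + ct * xv
        iu = np.clip(np.rint(iu).astype(int), 0, image.shape[1] - 1)   # nearest-neighbor lookup,
        iv = np.clip(np.rint(iv).astype(int), 0, image.shape[0] - 1)   # clamped at the image border
        return image[iv, iu]                         # rectangular array of pixel values

In a real implementation, the line-drawing and region-fill techniques mentioned below (or bilinear interpolation) would replace this direct per-pixel lookup.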
Window-based tracking algorithms typically operate in two stages. In the first stage, one or more windows are acquired using a nominal set of window parameters. The pixel values for all x ∈ X are copied into a two-dimensional array that is subsequently treated as a rectangular image. Such acquisitions can be implemented extremely efficiently using line-drawing and region-fill algorithms commonly developed for graphics applications [72]. In the second stage, the windows are processed to locate image features, and from their parameters a new set of window parameters, c and θ, is computed. These parameters may be modified using external geometric constraints or temporal prediction, and the cycle repeats.

We consider an edge segment to be characterized by three parameters in the image plane: the u and v coordinates of the center of the segment, and the orientation of the segment relative to the image plane coordinate system. These values correspond directly to the parameters of the acquisition window used for edge detection. Let us first assume we have correct prior values c⁻ = (u⁻, v⁻) and θ⁻ for an edge segment. A window, W⁻(x) = W(x; c⁻, θ⁻, t), extracted with these parameters would then have a vertical edge segment within it.

Isolated step edges can be localized by determining the location of the maximum of the first derivative of the signal [64], [67], [73]. Let e be a one-dimensional edge-detection kernel arranged as a single row. The convolution W₁(x) = (W⁻ ⋆ e)(x) will have a response curve in each row which peaks at the location of the edge. Summing each column of W₁ superimposes the peaks and yields a one-dimensional response curve. If the estimated orientation, θ⁻, was correct, the maximum of this response curve determines the offset of the edge in window coordinates. By interpolating the response curve about the maximum value, sub-pixel localization of the edge can be achieved. Here, e is taken to be a one-dimensional Prewitt operator [64] which, although not optimal from a signal-processing point of view, is extremely fast to execute on simple hardware.

If θ⁻ was incorrect, the response curves in W₁ will deviate slightly from one another, and the superposition of these curves will form a lower and less sharp aggregate curve. Thus, maximizing the maximum value of the aggregate response curve is a way to determine edge orientation. This can be approximated by performing the detection operation on windows acquired at θ⁻ as well as at two bracketing angles θ⁻ ± a and performing quadratic interpolation on the maxima of the corresponding aggregate response curves. Computing the three oriented edge detectors is particularly simple if the range of angles is small. In this case, a single window is processed with the initial convolution yielding W₁. Three aggregate response curves are computed by summing along the columns of W₁ and along diagonals corresponding to angles of ±a. The maxima of all three curves are located and interpolated to yield edge orientation and position. Thus, for the price of one window acquisition, one complete one-dimensional convolution, and three column sums, the vertical offset δv and the orientation offset δθ can be computed. Once these two values are determined, the state variables of the acquisition window are updated as

    θ⁺ = θ⁻ + δθ
    u⁺ = u⁻ − δv sin(θ⁺)
    v⁺ = v⁻ + δv cos(θ⁺).
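The sketch below strings these steps together for one edge segment. It acquires windows at θ⁻ and at two bracketing angles (the simpler approximation described above, rather than the single-convolution diagonal-sum optimization), localizes the edge with a Prewitt-style derivative and parabolic peak interpolation, and applies the state update just given. For this sketch we orient the acquisition window so that a correctly predicted edge runs along the window's u (row) axis, so the measured offset δv lies along the window's v (column) axis and the update equations apply directly; the paper's own row/column conventions may differ, and acquire_window is the illustrative routine from the earlier sketch.

    import numpy as np

    def edge_offset(window, kernel=np.array([-1.0, 0.0, 1.0])):
        """Localize a step edge running along the window's u (row) axis.

        Each column is convolved with a 1-D derivative kernel; summing the absolute
        responses along each row gives an aggregate curve over the window's v axis,
        whose interpolated peak is the edge offset from the window center."""
        resp = np.abs(np.apply_along_axis(np.convolve, 0, window, kernel, mode='same'))
        curve = resp.sum(axis=1)
        i = int(np.argmax(curve))
        off = 0.0
        if 0 < i < len(curve) - 1:                      # parabolic (quadratic) interpolation
            y0, y1, y2 = curve[i - 1], curve[i], curve[i + 1]
            denom = y0 - 2.0 * y1 + y2
            off = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
        return (i + off) - (window.shape[0] - 1) / 2.0, float(curve[i])

    def track_edge_step(image, u, v, theta, half=10, bracket=np.radians(10.0)):
        """One localization-and-update cycle for an edge segment with state (u, v, theta)."""
        best = None
        for dtheta in (-bracket, 0.0, bracket):         # theta- and two bracketing angles
            w = acquire_window(image, (u, v), theta + dtheta, half, half)
            dv, height = edge_offset(w)
            if best is None or height > best[0]:
                best = (height, dtheta, dv)             # keep the sharpest aggregate response
        _, dtheta, dv = best
        theta = theta + dtheta                          # theta+ = theta- + d(theta)
        u = u - dv * np.sin(theta)                      # u+ = u- - dv sin(theta+)
        v = v + dv * np.cos(theta)                      # v+ = v- + dv cos(theta+)
        return u, v, theta

For simplicity this sketch picks the best of the three bracket angles rather than interpolating the orientation between them.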
An implementation of this method [4] has shown that localizing a 20-pixel-long edge using a Prewitt-style mask 15 pixels wide, searching ±10 pixels and ±15 degrees, takes 1.5 ms on a Sun Sparc II workstation. At this rate, 22 edge segments can be tracked simultaneously at 30 Hz, the video frame rate used. Longer edges can be tracked at comparable speeds by sub-sampling along the edge.

Clearly, this edge-detection scheme is susceptible to mistracking caused by background or foreground occluding edges. Large acquisition windows increase the range of motions that can be tracked, but reduce the tracking speed and increase the likelihood that a distracting edge will disrupt tracking. Likewise, large orientation brackets reduce the accuracy of the estimated orientation and make it more susceptible to edges that are not closely oriented to the underlying edge.

There are several ways of increasing the robustness of edge tracking. One is to include some type of additional information about the edges being tracked, such as the sign or absolute value of the edge response. For more complex edge-based detection, collections of such oriented edge detectors can be combined to verify the location and position of the entire feature. Some general ideas in this direction are discussed in Section VI-C.

B. Area-Based Methods

Edge-based methods tend to work well in environments in which man-made objects are to be tracked. If, however, the desired feature is a specific pattern, then tracking can be based on matching the appearance of the feature (in terms of its spatial pattern of gray-values) in a series of images, and on exploiting its temporal consistency: the observation that the appearance of a small region in an image sequence changes little. Such techniques are well described in the image registration literature and have been applied to other computer vision problems such as stereo matching and optical flow.

Consider only windows that differ in the location of their center. We assume some reference window was acquired at time t at location c. Some small time interval, τ, later, a candidate window of the same size is acquired at location c + d. The correspondence between these two images is given by some similarity measure

    O(d) = Σ_{x∈X} f(W(x; c, t) − W(x; c + d, t + τ)) w(x),   τ > 0          (61)

where w(·) is a weighting function over the window.
Taking f(u) = u² gives the familiar sum-of-squared-differences (SSD) measure, and the displacement d that minimizes O(d) is the estimated inter-frame motion of the window. When d and τ are small, W(x; c + d, t + τ) can be expanded in a Taylor series about (c, t) and truncated after the linear terms. Writing Wx, Wy, and Wt for the horizontal, vertical, and temporal derivatives of the window, and d = [dx, dy]ᵀ, substituting into (61) yields

    O(d) = Σ_{x∈X} (Wx(x) dx + Wy(x) dy + Wt(x) τ)² w(x).                    (62)

Define g(x) = [Wx(x), Wy(x)]ᵀ and h(x) = Wt(x). Expression (62) can now be written more concisely as

    O(d) = Σ_{x∈X} (g(x)ᵀ d + h(x) τ)² w(x).

Notice that O is now a quadratic function of d. Computing the derivatives of O with respect to the components of d, setting the result equal to zero, and rearranging yields a linear system of equations

    ( Σ_{x∈X} g(x) g(x)ᵀ w(x) ) d = −τ Σ_{x∈X} g(x) h(x) w(x)

whose solution is the displacement that minimizes the linearized similarity measure.
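This is the standard least-squares (optical-flow style) displacement estimate. A minimal sketch follows, assuming the spatial and temporal derivatives are approximated by finite differences between two windows acquired at the same nominal location one frame apart; the function name and the use of NumPy's gradient routine are our own choices.

    import numpy as np

    def ssd_displacement(w_ref, w_cur, weights=None):
        """Solve (sum g g^T w) d = -tau (sum g h w) for the window displacement d.

        w_ref : window W(x; c, t) acquired at the reference time
        w_cur : window W(x; c, t + tau) acquired one inter-frame interval later
        The temporal term h(x) * tau is approximated directly by w_cur - w_ref,
        so the returned d = (dx, dy) is already in pixels."""
        w_ref = np.asarray(w_ref, dtype=float)
        w_cur = np.asarray(w_cur, dtype=float)
        if weights is None:
            weights = np.ones_like(w_ref)
        wy, wx = np.gradient(w_ref)                         # spatial derivatives: axis 0 ~ v, axis 1 ~ u
        g = np.stack([wx.ravel(), wy.ravel()], axis=1)      # g(x) = [Wx(x), Wy(x)]^T
        ht = (w_cur - w_ref).ravel()                        # ~ Wt(x) * tau
        wts = weights.ravel()
        A = (g * wts[:, None]).T @ g                        # sum over X of g g^T w
        b = -(g * wts[:, None]).T @ ht                      # -sum over X of g (Wt tau) w
        return np.linalg.solve(A, b)                        # fails if the window lacks texture (A singular)

Iterating this estimate, re-acquiring the candidate window at the updated location, handles displacements larger than the linearization allows.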
C. Feature Prediction

Window-based tracking implicitly assumes that the inter-frame motions of the tracked feature do not exceed the size of the search window or, in the case of continuous optimization, a few pixels from the expected location of the image region. In the simplest case, the previous location of the image feature can be used as a predictor of its current location. Unfortunately, as feature velocity increases, the search window must be enlarged, which adversely affects computation time.

The robustness and speed of tracking can be significantly increased with knowledge about the motion of the observed features, which may be due to the camera and/or the target moving. For example, given knowledge of the image feature location f_t at time t, the Jacobian J_v, the end-effector velocity u_t, and the inter-frame time τ, the expected location of the search windows can be computed, assuming no target motion, by the prediction

    f_{t+τ} = f_t + τ J_v u_t.
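A minimal sketch of this prediction step, assuming the Jacobian J_v and the commanded end-effector velocity are already available as NumPy arrays; the names below are illustrative.

    import numpy as np

    def predict_features(f_t, J_v, u_t, tau):
        """Predict feature parameters one inter-frame interval ahead, assuming a
        static target:  f(t + tau) ~ f(t) + tau * J_v * u(t)."""
        return np.asarray(f_t) + tau * (np.asarray(J_v) @ np.asarray(u_t))

    # e.g. re-center the search windows on the predicted feature locations:
    # f_pred = predict_features(f_t, J_v, u_t, tau=1.0 / 30.0)   # 30 Hz video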
Likewise, if the dynamics of a moving object are known, then it is possible to use this information to enhance tracking performance. For example, Rizzi [37] describes the use of a Newtonian flight dynamics model to make it possible to track a ping-pong ball during flight. Predictors based on α-β tracking filters and Kalman filters have also been used [36], [53], [67].

D. Discussion

Prior to executing or planning visually controlled motions, a specific set of visual features must be chosen. Discussion of the issues related to feature selection for visual servo control applications can be found in [20], [21]. The "right" image feature tracking method to use is extremely application dependent. For example, if the goal is to track a single special pattern or surface marking that is approximately planar and moving at slow to moderate speeds, then area-based tracking is appropriate. It does not require special image structure (e.g., straight lines), is robust to a large set of image distortions, and for small motions can be implemented to run at frame rates.

In comparison to the edge-detection methods described above, area-based tracking is sensitive to occlusions and background changes (if the template includes any background pixels). Thus, if a task requires tracking several occluding contours of an object with a changing background, edge-based methods are clearly faster and more robust.

In many realistic cases, neither of these approaches by itself yields the robustness and performance desired. For example, tracking occluding edges in an extremely cluttered environment is sure to distract edge tracking as "better" edges invade the search window, while the changing background would ruin the SSD match for the region. Such situations call for the use of more global task constraints (e.g., the geometry of several edges), more global tracking (e.g., extended contours or snakes [77]), or improved or specialized detection methods.

To illustrate these tradeoffs, suppose a visual servoing task relies on tracking the image of a circular opening over time. In general, the opening will project to an ellipse in the camera. There are several candidate algorithms for detecting this ellipse and recovering its parameters:

1) If the contrast between the interior of the opening and the area around it is high, then binary thresholding followed by a calculation of the first and second central moments can be used to localize the feature [37] (a sketch of this option follows the list).

2) If the ambient illumination changes greatly over time, but the brightness of the opening and the brightness of the surrounding region are roughly constant, a circular template could be localized using SSD methods augmented with brightness and contrast parameters. In this case, (61) must also include parameters for scaling and aspect ratio [4].

3) The opening could be selected in an initial image and subsequently located using SSD methods. This differs from the previous method in that this calculation does not compute the center of the opening, only its correlation with the starting image. Although useful for servoing a camera to maintain the opening within the field of view, this approach is probably not useful for manipulation tasks that need to attain a position relative to the center of the opening.

4) If the contrast and background are changing, the opening could be tracked by performing edge detection and fitting an ellipse to the edge locations. In particular, short edge segments could be located using the techniques described in Section VI-A. Once the segments have been fit to an ellipse, the orientation and location of the segments would be adjusted for the subsequent tracking cycle using the geometry of the ellipse.
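As a sketch of option 1) above, the following routine localizes a bright, roughly elliptical region by binary thresholding followed by first and second central moments; the threshold and the orientation formula are standard moment analysis rather than the particular implementation of [37].

    import numpy as np

    def ellipse_from_moments(image, threshold):
        """Centroid and major-axis orientation of the region brighter than threshold."""
        mask = (np.asarray(image, dtype=float) > threshold).astype(float)
        m00 = mask.sum()                               # zeroth moment: region area in pixels
        if m00 == 0:
            return None                                # nothing above threshold
        vs, us = np.indices(mask.shape)                # vs: row (v) index, us: column (u) index
        u_c = (us * mask).sum() / m00                  # first moments give the centroid
        v_c = (vs * mask).sum() / m00
        mu20 = (((us - u_c) ** 2) * mask).sum() / m00      # second central moments
        mu02 = (((vs - v_c) ** 2) * mask).sum() / m00
        mu11 = (((us - u_c) * (vs - v_c)) * mask).sum() / m00
        angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)  # orientation of the fitted ellipse
        return u_c, v_c, angle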
During task execution, other problems arise. The two most common problems are occlusion of features and visual singularities. Solutions to the former include intelligent observers that note the disappearance of features and continue to predict their locations based on previously observed motion [37], or redundant feature specifications that can perform even with some loss of information. Solutions to the latter require some combination of intelligent path planning and/or intelligent acquisition and focus-of-attention to maintain the controllability of the system.

It is probably safe to say that fast and robust image processing presents the greatest challenge to general-purpose hand-eye coordination. As an effort to help overcome this obstacle, the methods described above and other related methods have been incorporated into a publicly available software "toolkit." The interested reader is referred to [4] for details.

VII. DISCUSSION

This paper has presented a tutorial introduction to robotic visual servo control, focusing on the relevant fundamentals of coordinate transformations, image formation, feedback algorithms, and visual tracking. In the interests of space and clarity, we have concentrated on presenting methods that are well represented in the literature and that can be implemented using relatively straightforward techniques. The reader interested in a broader overview of the field, or in acquiring more detail on a particular area, is invited to consult the references we have provided. Another goal has been to establish a consistent nomenclature and to summarize important results using that notation.

Many aspects of the more general problem of vision-based control of motion have necessarily been omitted or abbreviated to a great degree. One important issue is the choice between using an image-based or a position-based system. Many systems based on image-based and position-based architectures have been demonstrated, and the computational costs of the two approaches seem to be comparable and are easily within the capability of modern computers. In many cases the motion of a target, for example an object on a conveyor, is most naturally expressed in a Cartesian reference frame. For this reason, most systems dealing with moving objects ([36], [37]) have used position-based methods. Although there has been recent progress in understanding image plane dynamics [22], the design of stable, robust image-based servoing systems for capturing moving objects has not been fully explored.
In general, the accuracy of image-based methods for static positioning is less sensitive to calibration than that of comparable position-based methods; however, image-based methods require online computation of the image Jacobian. Unfortunately, this quantity inherently depends on the distance from the camera to the target, which, particularly in a monocular system, is difficult to compute. Many systems utilize a constant image Jacobian, which is computationally efficient but valid only over a small region of the task space.⁶ Other systems have resorted to performing a partial pose estimation [10], adaptive depth estimation [16], or image Jacobian estimation [78]. However, these add significantly to the complexity of the system design as well as introducing an additional computational load.

⁶However, recent results indicate that a visual servo system will converge despite quite significant image Jacobian errors.

This issue is further complicated when dynamics are introduced into the problem. Even when the target object is not moving, it is important to realize that a visual servo system is a closed-loop discrete-time dynamical system. The sampling rate in such a system is limited by the frame rate of the camera, though many reported systems operate at a sub-multiple of the camera frame rate due to limited computational ability. Negative feedback is applied to a plant that generally includes time delays due to charge integration within the camera, serial pixel transport from the camera to the vision system, and computation time for feature parameter extraction. In addition, most reported visual servo systems employ a relatively low-bandwidth communications link between the vision system and the robot controller, which introduces further latency. Some robot controllers operate with a sample interval that is not related to the sample rate of the vision system, and this introduces still further delay. A good example of this is the common Unimate Puma robot, whose position loops operate at a sample interval of 14 or 28 ms while vision systems operate at sample intervals of 33 or 40 ms for RS-170 or CCIR video, respectively [29]. It is well known that a feedback system including delay will become unstable as the loop gain is increased. Many visual closed-loop systems are tuned empirically, increasing the loop gain until overshoot or oscillation becomes intolerable.
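The effect of delay on the usable loop gain can be seen in a toy discrete-time simulation such as the one below, where a proportional controller drives an integrating plant toward a set point but only sees measurements that are several samples old; the plant model, delay, and gains are illustrative only and do not model any particular system discussed here.

    def simulate_visual_loop(gain, delay_samples, steps=120, target=1.0):
        """Proportional visual loop with a pure measurement delay.
        Returns the position history; as gain or delay grows the response first
        overshoots and oscillates, then diverges."""
        pos = 0.0
        history = [pos]
        pipeline = [0.0] * (delay_samples + 1)       # stale measurements in transit
        for _ in range(steps):
            measured = pipeline.pop(0)               # the controller sees delayed feedback
            pipeline.append(pos)
            pos += gain * (target - measured)        # proportional correction each sample
            history.append(pos)
        return history

    # e.g. compare the tails of the responses for increasing gain:
    # for k in (0.2, 0.6, 1.2):
    #     print(k, ["%.2f" % p for p in simulate_visual_loop(k, delay_samples=3)[-4:]])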
Simple proportional controllers are commonly used and can be shown to drive the steady-state error to zero. However, this implies nothing about performance when tracking a moving object, which will typically exhibit pronounced image plane error and tracking lag. If the target motion is constant, then prediction (based upon some assumption of target motion) can be used to compensate for the latency, and predictors based on autoregressive models, Kalman filters, and α-β and α-β-γ tracking filters have been demonstrated for visual servoing. However, when combined with a low sample rate, predictors can result in poor disturbance rejection and long reaction time to unmodeled target motion. In order for a visual servo system to provide good tracking performance for moving targets, considerable attention must be paid to modeling the dynamics of the robot, the target, and the vision system, and to designing an appropriate control system. Other issues for consideration include whether or not the vision system should "close the loop" around robot axes, which can be position, velocity, or torque controlled. A detailed discussion of these dynamic issues in visual servo systems is given by Corke [29], [79].

In addition to these "low-level" considerations, other issues that merit consideration are vision-based path planning and visual recognition. In the case of the former, although path planning is a well-established discipline, the idea of combining image space feature path planning with visual feedback has not been adequately explored. For a simple example of visual servoing with obstacle avoidance, see [78]. Visual recognition or interpretation is also important for any visual servoing system that is to operate without constant human intervention. These are but two of the many issues that the designer of an autonomous system that is to operate in unstructured environments must confront.

It is appropriate to note that, despite the long history and intuitive appeal of using vision to guide robotic systems, the applications of this technology remain limited. To some degree this has been due to the high costs of the specialized hardware and the diverse engineering skills required to construct an integrated visually controlled robot system. Fortunately, the costs of key elements such as cameras, framestores, image processing hardware, and computers in general continue to fall and appear set to do so for some time to come. Cameras are now becoming available with performance characteristics such as frame rate and image resolution beyond the limiting broadcast television standards which have constrained them for so long.

In conclusion, we hope that this paper has shown that visual servoing is both useful and achievable using technology that is readily available today. In conjunction with the cost trends noted above, we believe that the future for visual servoing is bright and that it will become an important and common control modality for robot systems.

ACKNOWLEDGMENT

The authors are grateful to R. Kelly and to the anonymous reviewers for their helpful comments on an earlier version of this paper.

REFERENCES

[1] Y. Shirai and H. Inoue, "Guiding a robot by visual feedback in assembling tasks," Pattern Recognit., vol. 5, pp. 99-108, 1973.
[2] J. Hill and W. T. Park, "Real time control of a robot with a mobile camera," in Proc. 9th ISIR, Washington, D.C., Mar. 1979, pp. 233-246.
[3] P. Corke, "Visual control of robot manipulators - A review," in Visual Servoing, K. Hashimoto, Ed. Singapore: World Scientific, 1993, pp. 1-31 (vol. 7 of Robotics and Automated Systems).
[4] G. D. Hager, "The 'X-vision' system: A general purpose substrate for real-time vision-based robotics," in Proc. Workshop on Vision for Robots, 1995, pp. 56-63. Also available as Yale CS-RR-1078.
[5] A. C. Sanderson and L. E. Weiss, "Image-based visual servo control using relational graph error signals," Proc. IEEE, pp. 1074-1077, 1980.
[6] J. C. Latombe, Robot Motion Planning. Boston: Kluwer, 1991.
[7] J. J. Craig, Introduction to Robotics. Menlo Park: Addison-Wesley, 2nd ed., 1986.
[8] B. K. P. Horn, Robot Vision. Cambridge, MA: MIT Press, 1986.
[9] N. Hollinghurst and R. Cipolla, "Uncalibrated stereo hand-eye coordination," Image and Vision Computing, vol. 12, no. 3, pp. 187-192, 1994.
[10] J. Feddema and O. Mitchell, "Vision-guided servoing with feature-based trajectory generation," IEEE Trans. Robot. Automat., vol. 5, pp. 691-700, Oct. 1989.
[11] B. Espiau, F. Chaumette, and P. Rives, "A new approach to visual servoing in robotics," IEEE Trans. Robot. Automat., vol. 8, pp. 313-326, 1992.
[12] M. L. Cyros, "Datacube at the space shuttle's launch pad," Datacube World Review, vol. 2, pp. 1-3, Sept. 1988. Datacube Inc., 4 Dearborn Road, Peabody, MA.
[13] W. Jang, K. Kim, M. Chung, and Z. Bien, "Concepts of augmented image space and transformed feature space for efficient visual servoing of an 'eye-in-hand robot'," Robotica, vol. 9, pp. 203-212, 1991.
[14] A. Castano and S. A. Hutchinson, "Visual compliance: Task-directed visual servo control," IEEE Trans. Robot. Automat., vol. 10, pp. 334-342, June 1994.
[15] K. Hashimoto, T. Kimoto, T. Ebine, and H. Kimura, "Manipulator control with image-based visual servo," in Proc. IEEE Int. Conf. on Robotics and Automation, 1991, pp. 2267-2272.
[16] N. P. Papanikolopoulos and P. K. Khosla, "Adaptive robot visual tracking: Theory and experiments," IEEE Trans. Automat. Contr., vol. 38, no. 3, pp. 429-445, 1993.
[17] N. P. Papanikolopoulos, P. K. Khosla, and T. Kanade, "Visual tracking of a moving target by a camera mounted on a robot: A combination of vision and control," IEEE Trans. Robot. Automat., vol. 9, no. 1, pp. 14-35, 1993.
[18] S. Skaar, W. Brockman, and R. Hanson, "Camera-space manipulation," Int. J. Robot. Res., vol. 6, no. 4, pp. 20-32, 1987.
[19] S. B. Skaar, W. H. Brockman, and W. S. Jang, "Three-dimensional camera space manipulation," Int. J. Robot. Res., vol. 9, no. 4, pp. 22-39, 1990.
[20] J. T. Feddema, C. S. G. Lee, and O. R. Mitchell, "Weighted selection of image features for resolved rate visual feedback control," IEEE Trans. Robot. Automat., vol. 7, pp. 31-47, Feb. 1991.
[21] A. C. Sanderson, L. E. Weiss, and C. P. Neuman, "Dynamic sensor-based control of robots with visual feedback," IEEE Trans. Robot. Automat., vol. RA-3, pp. 404-417, Oct. 1987.
[22] M. Lei and B. K. Ghosh, "Visually-guided robotic motion tracking," in Proc. Thirtieth Annu. Allerton Conf. on Communication, Control, and Computing, 1992, pp. 712-721.
[23] R. L. Andersson, A Robot Ping-Pong Player: Experiment in Real-Time Intelligent Control. Cambridge, MA: MIT Press, 1988.
[24] B. Yoshimi and P. K. Allen, "Active, uncalibrated visual servoing," in Proc. IEEE Int. Conf. on Robotics and Automation, San Diego, CA, May 1994, pp. 156-161.
[25] B. Nelson and P. K. Khosla, "Integrating sensor placement and visual tracking strategies," in Proc. IEEE Int. Conf. on Robotics and Automation, 1994, pp. 1351-1356.
[26] I. E. Sutherland, "Three-dimensional data input by tablet," Proc. IEEE, vol. 62, pp. 453-461, Apr. 1974.
[27] R. Tsai and R. Lenz, "A new technique for fully autonomous and efficient 3D robotics hand/eye calibration," IEEE Trans. Robot. Automat., vol. 5, pp. 345-358, June 1989.
[28] R. Tsai, "A versatile camera calibration technique for high accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE Trans. Robot. Automat., vol. RA-3, pp. 323-344, Aug. 1987.
[29] P. I. Corke, High-Performance Visual Closed-Loop Robot Control. Ph.D. dissertation, University of Melbourne, Dept. Mechanical and Manufacturing Engineering, July 1994.
[30] D. E. Whitney, "The mathematics of coordinated control of prosthetic arms and manipulators," J. Dyn. Syst., Meas., Control, vol. 122, pp. 303-309, Dec. 1972.
[31] S. Chiaverini, L. Sciavicco, and B. Siciliano, "Control of robotic systems through singularities," in Proc. Int. Workshop on Nonlinear and Adaptive Control: Issues in Robotics, C. C. de Wit, Ed. Berlin: Springer-Verlag, 1991.
[32] S. Wijesoma, D. Wolfe, and R. Richards, "Eye-to-hand coordination for vision-guided robot control applications," Int. J. Robot. Res., vol. 12, no. 1, pp. 65-78, 1993.
[33] G. D. Hager, W.-C. Chang, and A. S. Morse, "Robot hand-eye coordination based on stereo vision," IEEE Control Syst. Mag., vol. 15, pp. 30-39, Feb. 1995.
[34] C. Samson, M. Le Borgne, and B. Espiau, Robot Control: The Task Function Approach. Oxford, England: Clarendon, 1992.
[35] G. Franklin, J. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems. Boston, MA: Addison-Wesley, 2nd ed., 1991.
[36] P. K. Allen, A. Timcenko, B. Yoshimi, and P. Michelman, "Automated tracking and grasping of a moving object with a robotic hand-eye system," IEEE Trans. Robot. Automat., vol. 9, no. 2, pp. 152-165, 1993.
[37] A. Rizzi and D. Koditschek, "An active visual estimator for dexterous manipulation," in Proc. IEEE Int. Conf. on Robotics and Automation, 1994.
[38] T. S. Huang and A. N. Netravali, "Motion and structure from feature correspondences: A review," Proc. IEEE, vol. 82, no. 2, pp. 252-268, 1994.
[39] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, pp. 381-395, June 1981.
[40] R. M. Haralick, C. Lee, K. Ottenberg, and M. Nolle, "Analysis and solutions of the three point perspective pose estimation problem," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1991, pp. 592-598.
[41] D. DeMenthon and L. S. Davis, "Exact and approximate solutions of the perspective-three-point problem," IEEE Trans. Pattern Anal. Machine Intell., no. 11, pp. 1100-1105, 1992.
[42] R. Horaud, B. Conio, and O. Leboulleux, "An analytic solution for the perspective 4-point problem," Comput. Vision, Graphics, Image Process., no. 1, pp. 33-44, 1989.
[43] M. Dhome, M. Richetin, J. Lapresté, and G. Rives, "Determination of the attitude of 3-D objects from a single perspective view," IEEE Trans. Pattern Anal. Machine Intell., no. 12, pp. 1265-1278, 1989.
[44] G. H. Rosenfield, "The problem of exterior orientation in photogrammetry," Photogrammetric Eng., pp. 536-553, 1959.
[45] D. G. Lowe, "Fitting parametrized three-dimensional models to images," IEEE Trans. Pattern Anal. Machine Intell., no. 5, pp. 441-450, 1991.
[46] R. Goldberg, "Constrained pose refinement of parametric objects," Int. J. Comput. Vision, no. 2, pp. 181-211, 1994.
[47] R. Kumar, "Robust methods for estimating pose and a sensitivity analysis," CVGIP: Image Understanding, no. 3, pp. 313-342, 1994.
[48] S. Ganapathy, "Decomposition of transformation matrices for robot vision," Pattern Recog. Lett., pp. 401-412, 1989.
[49] M. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting and automatic cartography," Commun. ACM, no. 6, pp. 381-395, 1981.
[50] Y. Liu, T. S. Huang, and O. D. Faugeras, "Determination of camera location from 2-D to 3-D line and point correspondences," IEEE Trans. Pattern Anal. Machine Intell., no. 1, pp. 28-37, 1990.
[51] C. Lu, E. J. Mjolsness, and G. D. Hager, "Online computation of exterior orientation with application to hand-eye calibration," DCS RR-1046, Yale University, New Haven, CT, Aug. 1994; to appear in Mathematical and Computer Modeling.
[52] A. Gelb, Ed., Applied Optimal Estimation. Cambridge, MA: MIT Press, 1974.
[53] W. Wilson, "Visual servo control of robots using Kalman filter estimates of robot pose relative to work-pieces," in Visual Servoing, K. Hashimoto, Ed. Singapore: World Scientific, 1994, pp. 71-104.
[54] C. Fagerer, D. Dickmanns, and E. Dickmanns, "Visual grasping with long delay time of a free floating object in orbit," Auton. Robots, vol. 1, no. 1, 1994.
[55] J. Pretlove and G. Parker, "The development of a real-time stereo-vision system to aid robot guidance in carrying out a typical manufacturing task," in Proc. 22nd ISRR, Detroit, MI, 1991, pp. 21.1-21.23.
[56] B. K. P. Horn, H. M. Hilden, and S. Negahdaripour, "Closed-form solution of absolute orientation using orthonormal matrices," J. Opt. Soc. Amer., vol. A-5, pp. 1127-1135, 1988.
[57] K. S. Arun, T. S. Huang, and S. D. Blostein, "Least-squares fitting of two 3-D point sets," IEEE Trans. Pattern Anal. Machine Intell., vol. 9, pp. 698-700, 1987.
[58] B. K. P. Horn, "Closed-form solution of absolute orientation using unit quaternions," J. Opt. Soc. Amer., vol. A-4, pp. 629-642, 1987.
[59] G. D. Hager, G. Grunwald, and G. Hirzinger, "Feature-based visual servoing and its application to telerobotics," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Jan. 1994, pp. 164-171.
[60] G. Agin, "Calibration and use of a light stripe range sensor mounted on the hand of a robot," in Proc. IEEE Int. Conf. on Robotics and Automation, 1985, pp. 680-685.
[61] S. Venkatesan and C. Archibald, "Realtime tracking in five degrees of freedom using two wrist-mounted laser range finders," in Proc. IEEE Int. Conf. on Robotics and Automation, 1990.
[62] J. Dietrich, G. Hirzinger, B. Gombert, et al., "A concept for a new generation of light-weight robots," in Experimental Robotics I, V. Hayward and O. Khatib, Eds. Berlin, Germany: Springer-Verlag, 1989, pp. 287-295 (vol. 139 of Lecture Notes in Control and Information Sciences).
[63] J. Aloimonos and D. P. Tsakiris, "On the mathematics of visual tracking," Image and Vision Computing, vol. 9, pp. 235-251, Aug. 1991.
[64] R. M. Haralick and L. G. Shapiro, Computer and Robot Vision. Reading, MA: Addison-Wesley, 1993.
[65] F. W. Warner, Foundations of Differentiable Manifolds and Lie Groups. New York: Springer-Verlag, 1983.
[66] W. Jang and Z. Bien, "Feature-based visual servoing of an eye-in-hand robot with improved tracking performance," in Proc. IEEE Int. Conf. on Robotics and Automation, 1991, pp. 2254-2260.
[67] O. Faugeras, Three-Dimensional Computer Vision. Cambridge, MA: MIT Press, 1993.
[68] G. D. Hager, "Calibration-free visual control using projective invariance," in Proc. ICCV, 1995, pp. 1009-1015. Also available as Yale CS-RR-1046.
[69] D. Kim, A. Rizzi, G. D. Hager, and D. Koditschek, "A 'robust' convergent visual servoing system," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 1995, vol. I, pp. 348-353.
[70] R. L. Anderson, "Dynamic sensing in a ping-pong playing robot," IEEE Trans. Robot. Automat., vol. 5, no. 6, pp. 723-739, 1989.

Seth Hutchinson (S'85-M'88), for a photograph and biography, see p. 650 of this issue.

Gregory D. Hager (S'85-M'88), for a photograph and biography, see p. 650 of this issue.