A Tutorial on Visual Servo Control
Abstract-This article provides a tutorial introduction to visual servo control of robotic manipulators. Since the topic spans many disciplines our goal is limited to providing a basic conceptual framework. We begin by reviewing the prerequisite topics from robotics and computer vision, including a brief review of coordinate transformations, velocity representation, and a description of the geometric aspects of the image formation process. We then present a taxonomy of visual servo control systems. The two major classes of systems, position-based and image-based systems, are then discussed in detail. Since any visual servo system must be capable of tracking image features in a sequence of images, we also include an overview of feature-based and correlation-based methods for tracking. We conclude the tutorial with a number of observations on the current directions of the research field of visual servo control.

I. INTRODUCTION

... the resulting operation depends directly on the accuracy of the visual sensor and the robot end-effector.

An alternative to increasing the accuracy of these subsystems is to use a visual-feedback control loop that will increase the overall accuracy of the system, a principal concern in most applications. Taken to the extreme, machine vision can provide closed-loop position control for a robot end-effector; this is referred to as visual servoing. This term appears to have been first introduced by Hill and Park [2] in 1979 to distinguish their approach from earlier "blocks world" experiments where the system alternated between picture taking and moving. Prior to the introduction of this term, the less specific term visual feedback was generally used. For the purposes of this article, the task in visual servoing is to use visual information to control the pose of the robot's end-effector.
To assist newcomers to the field we will describe techniques which require only simple vision hardware (just a digitizer), freely available vision software [4], and which make few assumptions about the robot and its control system. This is sufficient to commence investigation of many applications where high control and/or vision performance are not required.

One of the difficulties in writing such an article is that the topic spans many disciplines that cannot be adequately addressed in a single article. For example, the underlying control problem is fundamentally nonlinear, and visual recognition, tracking, and reconstruction are fields unto themselves. Therefore we have concentrated on certain basic aspects of each discipline, and have provided an extensive bibliography to assist the reader who seeks greater detail than can be provided here. Our preference is always to present those ideas and techniques that we have found to function well in practice and that have some generic applicability. Another difficulty is the current rapid growth in the vision-based motion control literature, which contains solutions and promising approaches to many of the theoretical and technical problems involved. Again we have presented what we consider to be the most fundamental concepts, and again refer the reader to the bibliography.

The remainder of this article is structured as follows. Section II reviews the relevant fundamentals of coordinate transformations, pose representation, and image formation. In Section III, we present a taxonomy of visual servo control systems (adapted from [5]). The two major classes of systems, position-based visual servo systems and image-based visual servo systems, are then discussed in Sections IV and V respectively. Since any visual servo system must be capable of tracking image features in a sequence of images, Section VI describes some approaches to visual tracking that have found wide applicability and can be implemented using a minimum of special-purpose hardware. Finally, Section VII presents a number of observations regarding the current directions of the research field of visual servo control.

II. BACKGROUND AND DEFINITIONS

In this section we provide a very brief overview of some topics from robotics and computer vision that are relevant to visual servo control. We begin by defining the terminology and notation required to represent coordinate transformations and the velocity of a rigid object moving through the workspace (Sections II-A and II-B). Following this, we briefly discuss several issues related to image formation (Sections II-C and II-D), and possible camera/robot configurations (Section II-E). The reader who is familiar with these topics may wish to proceed directly to Section III.

A. Coordinate Transformations

In this paper, the task space of the robot, represented by T, is the set of positions and orientations that the robot tool can attain. Since the task space is merely the configuration space of the robot tool, the task space is a smooth m-manifold (see, e.g., [6]). If the tool is a single rigid body moving arbitrarily in a three-dimensional workspace, then T = SE^3 = R^3 x SO^3, and m = 6. In some applications, the task space may be restricted to a subspace of SE^3. For example, for pick and place, we may consider pure translations (T = R^3, for which m = 3), while for tracking an object and keeping it in view we might consider only rotations (T = SO^3, for which m = 3).

Typically, robotic tasks are specified with respect to one or more coordinate frames. For example, a camera may supply information about the location of an object with respect to a camera frame, while the configuration used to grasp the object may be specified with respect to a coordinate frame attached to the object. We represent the coordinates of point P with respect to coordinate frame x by the notation ^xP. Given two frames, x and y, the rotation matrix that represents the orientation of frame y with respect to frame x is denoted by ^xR_y. The location of the origin of frame y with respect to frame x is denoted by the vector ^xt_y. Together, the position and orientation of a frame specify a pose, which we denote by ^xx_y.¹ If the leading superscript, x, is not specified, the world coordinate frame is assumed.

We may also use a pose to specify a coordinate transformation. We use function application to denote applying a change of coordinates to a point. In particular, if we are given ^yP (the coordinates of point P relative to frame y), we obtain the coordinates of P with respect to frame x by applying the coordinate transformation rule

    ^xP = ^xx_y(^yP) = ^xR_y ^yP + ^xt_y.    (1)

In the sequel, we will use the notation ^xx_y to refer either to a coordinate transformation, or to a pose that is specified by a rotation matrix and translation, ^xR_y and ^xt_y, respectively. Likewise, we will use the terms pose and coordinate transformation interchangeably. In general, there should be no ambiguity between the two interpretations of ^xx_y.

Often, we must compose multiple coordinate transformations to obtain a desired change of coordinates. For example, suppose that we are given poses ^xx_y and ^yx_z. If we are given ^zP and wish to compute ^xP, we may use the composition of coordinate transformations

    ^xP = ^xx_y(^yx_z(^zP)) = ^xR_y(^yR_z ^zP + ^yt_z) + ^xt_y.

As seen here, we represent the composition of coordinate transformations by ^xx_z = ^xx_y ∘ ^yx_z, and the corresponding coordinate transformation of the point ^zP by (^xx_y ∘ ^yx_z)(^zP). The corresponding rotation matrix and translation are given by

    ^xR_z = ^xR_y ^yR_z,    ^xt_z = ^xR_y ^yt_z + ^xt_y.

¹ We have not used more common notations based on homogeneous transforms because over-parameterizing points makes it difficult to develop some of the machinery needed for control.
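As a concrete illustration of rule (1) and the composition formulas (this sketch is ours, not part of the tutorial; the NumPy helper names are invented for the example):

    import numpy as np

    # A pose x = (R, t) acts on a point as x(P) = R @ P + t, as in (1).
    def apply_pose(R, t, P):
        return R @ P + t

    # Composition x_xz = x_xy o x_yz follows the rotation/translation rules above.
    def compose(R_xy, t_xy, R_yz, t_yz):
        return R_xy @ R_yz, R_xy @ t_yz + t_xy

    # Example: a 90-degree rotation about z followed by a translation.
    R_xy = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
    t_xy = np.array([1.0, 0.0, 0.0])
    R_yz, t_yz = np.eye(3), np.array([0.0, 2.0, 0.0])

    R_xz, t_xz = compose(R_xy, t_xy, R_yz, t_yz)
    P_z = np.array([0.5, 0.0, 0.0])
    assert np.allclose(apply_pose(R_xz, t_xz, P_z),
                       apply_pose(R_xy, t_xy, apply_pose(R_yz, t_yz, P_z)))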
Some coordinate frames that will be needed frequently are referred to by the following superscripts/subscripts: ...

... Together, T and R define what is known in the robotics literature as a velocity screw.
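In standard robotics notation, a velocity screw stacks the translational velocity T and the angular velocity into a single six-vector; this is the convention behind the six-dimensional screws u and r-dot used throughout the remainder of the article:

    r-dot = [ T ; Omega ] = [T_x, T_y, T_z, ω_x, ω_y, ω_z]^T.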
where s is a fixed scale factor. Orthographic projection models are valid for scenes where the relative depth of the points in the scene is small compared to the distance from the camera to the scene, for example, an airplane flying over the earth, or a camera with a long focal length lens placed several meters from the workspace.

3) Affine Projection: Another linear approximation to perspective projection is known as affine projection. In this case, the image coordinates for the projection of a point ^cP are given by

    [u, v]^T = A ^cP + c    (18)

where A is an arbitrary 2 x 3 matrix and c is an arbitrary 2-vector. Note that scaled orthographic projection is a special case of affine projection. Affine projection does not correspond to any specific imaging situation. Its primary advantage is that it is a good local approximation to perspective projection that accounts for both the external geometry of the camera (i.e., its position in space) and the internal geometry of the lens and CCD (i.e., the focal length, and scaling and offset to pixel coordinates). Since the model is purely linear, A and c are easily computed using linear regression techniques [9], and the camera calibration problem is greatly simplified.

    F : T -> F.    (19)

For example, if F ⊆ R^2 is the space of (u, v) image plane coordinates for the projection of some point P onto the image plane, then, assuming perspective projection, f = [u, v]^T, where u and v are given by (16). The exact form of (19) will depend in part on the relative configuration of the camera and end-effector, as discussed in the next section.

E. Camera Configuration

Visual servo systems typically use one of two camera configurations: end-effector mounted, or fixed in the workspace. The first, often called an eye-in-hand configuration, has the camera mounted on the robot's end-effector. Here, there exists a known, often constant, relationship between the pose of the camera(s) and the pose of the end-effector. We represent this relationship by the pose ^ex_c. The pose of the target³ relative to the camera frame is represented by ^cx_t. The relationship between these poses is shown in Fig. 2.

² Jang [13] provides a formal definition of what we term feature parameters as image functionals.
³ The word target will be used to refer to the object of interest, that is, the object that will be tracked.
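The remark that A and c can be recovered by linear regression is easy to illustrate. The NumPy sketch below is ours, not an example from the paper; the focal length and the synthetic, shallow point cloud are assumptions made for illustration:

    import numpy as np

    lam = 600.0  # assumed focal length in pixels (illustrative value)

    def perspective(P):
        """Perspective projection of P = [x, y, z]: u = lam*x/z, v = lam*y/z."""
        x, y, z = P
        return np.array([lam * x / z, lam * y / z])

    # Sample a small cluster of 3-D points and their perspective images.
    rng = np.random.default_rng(0)
    pts = rng.uniform([-0.1, -0.1, 1.9], [0.1, 0.1, 2.1], size=(50, 3))
    img = np.array([perspective(P) for P in pts])

    # Fit the affine model [u, v]^T = A P + c by linear least squares:
    # stack [P 1] and solve for the 2 x 4 matrix [A | c].
    X = np.hstack([pts, np.ones((len(pts), 1))])
    M, *_ = np.linalg.lstsq(X, img, rcond=None)
    A, c = M[:3].T, M[3]

    print("max residual (pixels):", np.abs(X @ M - img).max())

Because the depth range of the cluster is small relative to its distance from the camera, the residual is tiny, which is exactly the "good local approximation" property claimed above.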
Fig. 2. Relevant coordinate frames (world, end-effector, camera and target) for end-effector mounted, and fixed, camera configurations.
The second configuration has the camera(s) fixed in the workspace. In this case, the camera(s) are related to the base coordinate system of the robot by ^0x_c and to the object by ^cx_t. In this case, the camera image of the target is, of course, independent of the robot motion (unless the target is the end-effector itself). A variant of this is for the camera to be agile, mounted on another robot or pan/tilt head in order to observe the visually controlled robot from the best vantage [25].

For either choice of camera configuration, prior to the execution of visual servo tasks, camera calibration must be performed in order to determine the intrinsic camera parameters such as focal length, pixel pitch and the principal point. A fixed camera's pose, ^0x_c, with respect to the world coordinate system must be established, and is encapsulated in the extrinsic parameters determined by a camera calibration procedure. For the eye-in-hand case the relative pose, ^ex_c, must be determined and this is known as the hand/eye calibration problem. Calibration is a long standing research issue in the computer vision community (good solutions to the calibration problem can be found in a number of references, e.g., [26]-[28]).

III. SERVOING ARCHITECTURES

In 1980, Sanderson and Weiss [5] introduced a taxonomy of visual servo systems, into which all subsequent visual servo systems can be categorized. Their scheme essentially poses two questions:

1) Is the control structure hierarchical, with the vision system providing set-points as input to the robot's joint-level controller, or does the visual controller directly compute the joint-level inputs?
2) Is the error signal defined in 3D (task space) coordinates, or directly in terms of image features?

The resulting taxonomy, thus, has four major categories, which we now describe. These fundamental structures are shown schematically in Figs. 3-6.

If the control architecture is hierarchical and uses the vision system to provide set-point inputs to the joint-level controller, thus making use of joint feedback to internally stabilize the robot, it is referred to as a dynamic look-and-move system. In contrast, direct visual servo⁴ eliminates the robot controller entirely, replacing it with a visual servo controller that directly computes joint inputs, thus using vision alone to stabilize the mechanism.

For several reasons, nearly all implemented systems adopt the dynamic look-and-move approach. Firstly, the relatively low sampling rates available from vision make direct control of a robot end-effector with complex, nonlinear dynamics an extremely challenging control problem. Using internal feedback with a high sampling rate generally presents the visual controller with idealized axis dynamics [29]. Secondly, many robots already have an interface for accepting Cartesian velocity or incremental position commands. This simplifies the construction of the visual servo system, and also makes the methods more portable.

⁴ Sanderson and Weiss used the term "visual servo" for this type of system, but since then this term has come to be accepted as a generic description for any type of visual control of a robotic system. Here we use the term "direct visual servo" to avoid confusion.
Thirdly, look-and-move separates the kinematic singularities of the mechanism from the visual controller, allowing the robot to be considered as an ideal Cartesian motion device. Since many resolved rate [30] controllers have specialized mechanisms for dealing with kinematic singularities [31], the system design is again greatly simplified. In this article, we will utilize the look-and-move model exclusively.

The second major classification of systems distinguishes position-based control from image-based control. In position-based control, features are extracted from the image and used in conjunction with a geometric model of the target and the known camera model to estimate the pose of the target with respect to the camera. Feedback is computed by reducing errors in estimated pose space. In image-based servoing, control values are computed on the basis of image features directly. The image-based approach may reduce computational delay, eliminate the necessity for image interpretation and eliminate errors due to sensor modeling and camera calibration. However, it does present a significant challenge to controller design since the plant is nonlinear and highly coupled.

One of the typical applications of visual servoing is to position an end-effector relative to a target. For example, many authors use an end-effector mounted camera to position a robot arm for grasping. In most cases, the control algorithm is expressed in terms of moving the camera to a pose defined in terms of the image of the object to be grasped. The position of the end-effector relative to the object is determined only indirectly by its known kinematic relationship with the camera. Errors in this kinematic relationship lead to positioning errors which cannot be observed by the system. Observing the end-effector directly makes it possible to sense and correct for such errors.
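To make the dynamic look-and-move structure described above concrete, here is a skeleton of the outer vision loop (ours; `vision.estimate_pose_error` and `robot.set_cartesian_velocity` are hypothetical stand-ins for a pose-estimation module and a robot Cartesian velocity interface, neither of which the paper specifies):

    import time

    def look_and_move(vision, robot, k=0.5, dt=0.1):
        """Outer position-based loop: estimate the task-space error, command a
        Cartesian velocity toward the goal, and let the joint-level controller
        track that set-point at a much higher internal rate."""
        while True:
            pose_error = vision.estimate_pose_error()    # hypothetical: 6-vector in task space
            if pose_error is None:
                robot.set_cartesian_velocity([0.0] * 6)  # lost target: stop
            else:
                u = [-k * e for e in pose_error]         # proportional regulation of the error
                robot.set_cartesian_velocity(u)          # set-point to the internal joint controller
            time.sleep(dt)                               # vision runs at a low sample rate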
In general, there is no guarantee on the positioning accuracy of the system unless control points on both the end-effector and target can be observed [9], [32], [33]. To emphasize this distinction, we refer to systems that only observe the target object as endpoint open-loop (EOL) systems, and systems that observe both the target object and the robot end-effector as endpoint closed-loop (ECL) systems. The differences between EOL and ECL systems will be made more precise in subsequent discussions.

It is usually possible to transform an EOL system to an ECL system simply by including direct observation of the end-effector or other task-related control points. Thus, from a theoretical perspective, it would appear that ECL systems would always be preferable to EOL systems. However, since ECL systems must track the end-effector as well as the target object, the implementation of an ECL controller often requires solution of a more demanding vision problem and places field-of-view constraints on the system that cannot always be satisfied.

IV. POSITION-BASED VISUAL SERVO CONTROL

We begin our discussion of visual servoing methods with position-based visual servoing. As described in the previous section, in position-based visual servoing, features are extracted from the image and used to estimate the pose of the target with respect to the camera. Using these values, an error between the current and the desired pose of the robot is defined in the task space. In this way, position-based control neatly separates the control issues, namely the computation of the feedback signal, from the estimation problems involved in computing position or pose from visual data.

We now formalize the notion of a positioning task as follows:

Definition 4.1: A positioning task is represented by a function E : T -> R^m. This function is referred to as the kinematic error function. A positioning task is fulfilled with the end-effector in pose x_e if E(x_e) = 0.

If we consider a general pose x_e for which the task is fulfilled, the error function will constrain some number, d <= m, degrees of freedom of the manipulator. The value d will be referred to as the degree of the constraint. As noted by Espiau et al. [11], [34], the kinematic error function can be thought of as representing a virtual kinematic constraint between the end-effector and the target.

Once a suitable kinematic error function has been defined and the parameters of the functions are instantiated from visual data, a regulator is defined that reduces the estimated value of the kinematic error function to zero. This regulator produces at every time instant a desired end-effector velocity screw u ∈ R^6 that is sent to the robot control subsystem. For the purposes of this article, we use simple proportional control methods for linear and linearized systems to compute u [35]. Although there are formalized methods for developing such control laws, since the kinematic error functions are defined in Cartesian space, for most problems it is possible to develop a regulator through geometric insight. The process is to first determine the relative motion that would fulfill the task, and then to write a control law that would produce that motion.

The remainder of the section presents various example problems that we have chosen to provide some insight into ways of thinking about position-based control, and that will also provide useful comparisons when we consider image-based control in the next section. Section IV-A introduces several simple positioning primitives, based on directly observable feature points, which can be compounded to achieve more complex positioning tasks. Next, Section IV-B describes positioning tasks based on the explicit estimation of the target object's pose. Finally, in Section IV-C, we briefly describe how point position and object pose can be computed using visual information from one or more cameras, the visual reconstruction problem.

A. Point-Feature Based Motions

We begin by considering a positioning task in which some point on the robot with end-effector coordinates, ^eP, is to be brought to a fixed stationing point, S, visible in the scene. We refer to this as point-to-point positioning. In the case where the camera is fixed, the kinematic error function may be defined in base coordinates as

    E_pp(x_e; S, ^eP) = x_e(^eP) - S.    (20)

Here, as in the sequel, the argument before the semicolon is the value to be controlled (in all cases, manipulator position) and the values after the semicolon parameterize the positioning task.

E_pp defines a three degree of freedom kinematic constraint on the robot end-effector position. If the robot workspace is restricted to T = R^3, this task can be thought of as a rigid link that fully constrains the pose of the end-effector relative to the target. When T = SE^3, the constraint defines a virtual spherical joint between the object and the robot end-effector.

Let T = R^3. We first consider the case in which one or more cameras calibrated to the robot base frame furnish an estimate, ^cŜ, of the stationing point coordinates with respect to a camera coordinate frame. Using the estimate of the camera pose in base coordinates, x̂_c, from off-line calibration and (1), we have Ŝ = x̂_c(^cŜ).

Since T = R^3, the control input to be computed is the desired robot translational velocity, which we denote by u_3 to distinguish it from the more general end-effector screw. Since (20) is linear in x_e, it is well known that in the absence of outside disturbances, the proportional control law

    u_3 = -k E_pp(x_e; Ŝ, ^eP) = -k (x_e(^eP) - Ŝ)    (21)

will drive the system to an equilibrium state in which the value of the error function is zero [35]. The value k > 0 is a proportional feedback gain. Note that we have written Ŝ in the feedback law to emphasize the fact that this value is also subject to errors.

The expression (21) is equivalent to open-loop positioning of the manipulator using vision-based estimates of geometry. Variations on this scheme are used by [36], [37].
In our simplified dynamics, the manipulator is stationary when u_3 = 0. Since the right hand side of the equation includes estimated quantities, it follows that errors in x̂_e, x̂_c, or ^cŜ (robot kinematics, camera calibration and visual reconstruction respectively) can lead to positioning errors of the end-effector.

Now, consider the situation when the cameras are mounted on the robot and calibrated to the end-effector. In this case, we can express (20) in end-effector coordinates

    ^eE_pp(x_e; S, ^eP) = ^eP - ^ex_0(S).    (22)

The camera(s) furnish an estimate of the stationing point, ^cŜ, which can be combined with information from the camera calibration and robot kinematics to produce Ŝ = (x̂_e ∘ ^ex̂_c)(^cŜ). We now compute

    ^eu_3 = -k ^eE_pp(x_e; (x̂_e ∘ ^ex̂_c)(^cŜ), ^eP)
          = -k (^eP - (^ex_0 ∘ ^0x̂_e ∘ ^ex̂_c)(^cŜ))
          = -k (^eP - ^ex̂_c(^cŜ)).    (23)

Notice that the terms involving x̂_e have dropped out. Thus (23) is not only simpler, but positioning accuracy is also independent of the accuracy of the robot kinematics, a fundamental benefit of visual servoing.

All of the above formulations presume prior knowledge of ^eP and are therefore EOL systems. To convert them to ECL systems, we suppose that ^eP is directly observed and estimated by the camera system. In this case, (21) and (23) can be written

    u_3 = -k E_pp(x_e; x̂_c(^cŜ), x̂_c(^cP̂)) = -k x̂_c(^cP̂ - ^cŜ)    (24)
    ^eu_3 = -k ^eE_pp(x_e; ^ex̂_c(^cŜ), ^ex̂_c(^cP̂)) = -k ^ex̂_c(^cP̂ - ^cŜ)    (25)

respectively. We now see that u_3 (respectively ^eu_3) does not depend on x̂_e, and is homogeneous in x̂_c (respectively ^ex̂_c). Hence, if ^cŜ = ^cP̂, then u_3 = 0, independent of errors in the robot kinematics or the camera calibration. This is an important advantage for systems where a precise camera/end-effector relationship is difficult or impossible to determine off-line.

Consider now the full Cartesian problem where T = SE^3, and the control input is the complete velocity screw u ∈ R^6. Since the error functions presented above only constrain 3 degrees of freedom, the problem of computing u from the estimated error is under-determined. One way of proceeding is as follows. Consider the case of free-standing cameras. Then in base coordinates we know that Ṗ = u_3. Using (14), we can relate this to the end-effector velocity screw as follows:

    Ṗ = u_3 = A(P) u.    (26)

Thus, if we could "solve for" u in the above equation, we could effectively use the three-dimensional solution to arrive at the full Cartesian solution. Unfortunately, A is not square and therefore cannot be inverted to solve for u. However, recall that the matrix right inverse for an m x n matrix M, n > m, is defined as M+ = M^T (M M^T)^-1. The right inverse computes the minimum norm vector which solves the original system of equations. Hence, we have

    u = A(P)+ u_3    (27)

for free-standing cameras. Similar manipulations yield

    ^eu = A(^eP)+ ^eu_3    (28)

for end-effector mounted cameras. Substituting the appropriate expression for u_3 or ^eu_3 from the previous discussion leads to a form of proportional regulation for the Cartesian problem.

As a second example of feature-based positioning, consider that some point on the end-effector, ^eP, is to be brought to the line joining two fixed points S_1 and S_2 in the world. The shortest path for performing this task is to move ^eP toward the line joining S_1 and S_2 along the perpendicular to the line. The error function describing this trajectory in base coordinates is:

    E_pl(x_e; S_1, S_2, ^eP) = (S_2 - S_1) x ((x_e(^eP) - S_1) x (S_2 - S_1)).    (29)

Notice that although E_pl is a mapping from T to R^3, placing a point on a line is a constraint of degree 2. From the geometry of the problem and the previous discussion, we see that defining

    u = -k A(x_e(^eP))+ E_pl(x_e; Ŝ_1, Ŝ_2, ^eP)

is a proportional feedback law for this problem.

Suppose that now we apply this constraint to two points ^eP_1 and ^eP_2 on the end-effector. E_ppl now defines a four degree of freedom positioning constraint that aligns the points on the end-effector with those in target coordinates, and again no unique motion satisfies this kinematic error function. A geometrically straightforward solution is to compute a translation, T, which moves ^eP_1 to the line through S_1 and S_2. Simultaneously, we can choose a rotation, R, which rotates ^eP_2 about ^eP_1 so that the line through ^eP_1 and ^eP_2 becomes parallel to that through S_1 and S_2.

In order to compute a velocity screw u = (T, R), we first note that the end-effector rotation matrix R_e can be represented as a rotation through an angle θ about an axis defined by a unit vector k [7]. In this case, the axis of rotation is

    k = (S_2 - S_1)‾ x [R_e(^eP_2 - ^eP_1)]‾

where the bar over expressions on the right denotes normalization to a unit vector. Hence, a natural feedback law for the rotational portion of the velocity screw is

    R = -k_1 k.    (30)

Note the expression on the right hand side is the zero vector if the lines joining associated points are parallel, as we desire.

The only complication to computing the translation portion of the vector is to realize that rotation introduces translation of points attached to the end-effector. Hence, we need to move ^eP_1 toward the goal line while compensating for the motion introduced by rotation.
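The rotational law (30) can be written in a few lines. The sketch below is ours, with illustrative points; the two end-effector points are given directly in base coordinates, so their difference already equals R_e(^eP_2 - ^eP_1):

    import numpy as np

    def unit(v):
        return v / np.linalg.norm(v)

    def rotation_command(S1, S2, P1_base, P2_base, k1=1.0):
        """Rotational part of the point-to-line alignment law (30): the commanded
        angular velocity lies along the cross product of the two normalized line
        directions and vanishes when the lines are already parallel."""
        k_axis = np.cross(unit(S2 - S1), unit(P2_base - P1_base))
        return -k1 * k_axis

    # Illustrative values: the end-effector segment is tilted relative to the target line.
    S1, S2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
    P1, P2 = np.array([0.2, 0.3, 0.0]), np.array([0.9, 0.5, 0.0])
    print(rotation_command(S1, S2, P1, P2))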
Based on the discussion above, we know the former is given by -E_pl(x_e; S_1, S_2, ^eP_1), while from (12) the latter is simply R x x_e(^eP_1). Combining these two expressions, we have

    T = -k E_pl(x_e; Ŝ_1, Ŝ_2, ^eP_1) - R x x_e(^eP_1).

Note that we are still free to choose translations along the line joining S_1 and S_2 as well as rotations about it. Full six degree-of-freedom positioning can be attained by enforcing another point-to-line constraint using an additional point on the end-effector and an additional point in the world. Similar geometric arguments can be used to define a proportional feedback law.

These formulations can be adjusted for end-effector mounted cameras and can be implemented as ECL or EOL systems. We leave these modifications as an exercise for the reader.

B. Pose-Based Motion

In the previous section, positioning was defined in terms of directly observable point features. When working with a priori known objects, it is possible to recover the pose of the object, x_t, and to define stationing points with respect to object pose. The methods of the previous section can be easily applied when object pose is available. For example, suppose ^tS is an arbitrary stationing point in a target object's coordinate system, and that we can compute ^cx̂_t using end-effector mounted camera(s). Then using (1) we can compute ^cŜ = ^cx̂_t(^tS). This estimate can be used in any of the end-effector based feedback methods of the previous section in both ECL and EOL configurations. Similar remarks hold for systems utilizing free-standing cameras.

Given an object pose, it is possible to directly define positioning tasks in terms of that object pose. Let ^tx*_e be a desired stationing pose (rather than a point as in the previous section) for the end-effector, and suppose the system employs free-standing cameras. We can define a positioning error

    ...    (32)

(Note that in order for this error function to be in accord with our definition of kinematic error we must select a parameterization of rotations which is 0 when the end-effector is in the desired position.)

Using feature information and the camera calibration, we can directly estimate x̂_t = x̂_c ∘ ^cx̂_t. If we again represent the rotation in terms of a unit vector ^tk_e and rotation angle ^tθ_e, we can define

    ...

where t_e is the origin of the end-effector frame in base coordinates.

If we can also observe the end-effector and estimate its pose, ^cx̂_e, we can rewrite (32) as follows:

    ...

Once again we see that for an ECL system, both the robot kinematic chain and the camera pose relative to the base coordinate system have dropped out of the error equation. Hence, these factors do not affect the positioning accuracy of the system.

The modifications of pose-based methods to end-effector based systems are completely straightforward and are left for the reader.

C. Estimation

A key issue in position-based visual servo is the estimation of the quantities used to parameterize the feedback. In this regard, position-based visual servoing is closely related to the problem of recovering scene geometry from one or more camera images. This encompasses problems including structure from motion, exterior orientation, stereo reconstruction, and absolute orientation. Unfortunately, space does not permit a complete coverage of these topics here and we have opted to provide pointers to the literature, except in the case of point estimation for two cameras, which has a straightforward solution. A comprehensive discussion of these topics can be found in a recent review article [38].

1) Estimation with a Single Camera: As noted previously, it follows from (16) that a point in a single camera image corresponds to a line in space. Although it is possible to perform geometric reconstruction using a single moving camera, the equations governing this process are often ill-conditioned, leading to stability problems [38]. Better results can be achieved if target image features have some internal structure, or the image features come from a known object. Below, we briefly describe methods for performing both point estimation and pose estimation with a single camera assuming such information is available.

a) Single points: Clearly, extra information is needed in order to reconstruct the Cartesian coordinates of a point in space from a single camera projection. This may come from additional measurable attributes, for example, in the case of a circular opening with known diameter d the image will be an ellipse. The ellipse can be described by five image feature parameters from which can be derived the distance to the opening, and the orientation of the plane containing the hole.

b) Object pose: Object pose can be estimated if the vision system observes multiple point features on a known object. This is referred to as the pose estimation problem in the vision literature, and numerous methods for its solution have been proposed. These can be broadly divided into analytic solutions and least-squares solutions. Analytic solutions for three and four points are given by [39]-[43], and unique solutions exist for four coplanar, but not collinear, points. Least-squares solutions can be found in [44]-[50]. Six or more points always yield unique solutions and allow the camera calibration matrix to be computed. This can then be decomposed [48] to yield the target's pose.

The general least-squares solution is a nonlinear optimization problem which has no known closed-form solution. Instead, iterative optimization techniques are generally employed. These techniques iteratively refine a nominal pose value using observed data (see [51] for a recent review).
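In practice the object pose estimation step described above is often delegated to a packaged solver. The sketch below uses OpenCV's solvePnP; the choice of OpenCV is ours (not the paper's), and the intrinsics and point measurements are illustrative:

    import numpy as np
    import cv2

    # Known object: four coplanar, non-collinear points in the target frame (meters).
    object_points = np.array([[0.0, 0.0, 0.0],
                              [0.1, 0.0, 0.0],
                              [0.1, 0.1, 0.0],
                              [0.0, 0.1, 0.0]], dtype=np.float64)

    # Their measured image projections (pixels) and an assumed intrinsic matrix.
    image_points = np.array([[320.0, 240.0],
                             [380.0, 242.0],
                             [378.0, 300.0],
                             [318.0, 298.0]], dtype=np.float64)
    K = np.array([[600.0, 0.0, 320.0],
                  [0.0, 600.0, 240.0],
                  [0.0, 0.0, 1.0]])

    # solvePnP returns the target pose relative to the camera as a rotation
    # vector (axis-angle) and a translation vector, i.e., an estimate of cx_t.
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
    R, _ = cv2.Rodrigues(rvec)
    print(ok, R, tvec)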
Because of the sensitivity of the reconstruction process to noise, it is often a good idea to incorporate some type of smoothing or averaging of the computed pose parameters, at the cost of some delay in response to changes in target pose. A particularly elegant formulation of this updating procedure results by application of statistical techniques such as the extended Kalman filter [52]. This approach has been recently demonstrated by Wilson [53] for six DOF control of end-effector pose. A similar approach was recently reported in [54].

2) Estimation with Multiple Cameras: Multiple cameras greatly simplify the reconstruction process and many systems utilizing position-based control with stereo vision from free-standing cameras have been demonstrated. For example, Allen [36] shows a system that can grasp a toy train using stereo vision. Rizzi [37] demonstrates a system which can bounce a ping-pong ball. All of these systems are EOL. Cipolla [9] describes an ECL system using free-standing stereo cameras. One novel feature of this system is the use of the affine projection model (Section II-C) for the imaging geometry. This leads to linear calibration and control at the cost of some system performance. The development of a position-based stereo eye-in-hand servoing system has also been reported [55].

a) Single points: Let ^ax_1 represent the pose of a camera relative to an arbitrary base coordinate frame a. By inverting this transformation and combining (1) and (16) for a point ^aP = [x, y, z]^T we have

    ...    (35)

where x, y and z are the rows of ^1R_a, and ^1t_a = [t_x, t_y, t_z]^T. Multiplying through by the denominator of the right-hand side, we have

    A_1(p_1) ^aP = b_1(p_1)    (36)

where A_1 and b_1 collect the resulting linear coefficients. Given a second camera at location ^ax_2, we can compute A_2(p_2) and b_2(p_2) analogously. Stacking these together results in a matrix equation

    ...

... the only unknown in the system. The corresponding least-squares problem can either be solved explicitly for rotation (see [56]-[58]), or solved incrementally using linearization. Given an estimate for rotation, the computation of translation is a standard linear least squares problem.

D. Discussion

The principal advantage of position-based control is that it is possible to describe tasks in terms of Cartesian pose, as is common in robotics. Its primary disadvantage is that feedback is computed using estimated quantities that are a function of the system calibration parameters. Hence, in some situations, position-based control can become extremely sensitive to calibration error. Endpoint closed-loop systems are demonstrably less sensitive to calibration. However, particularly in stereo systems, small errors in computing the orientation of the cameras can still lead to reconstruction errors that impact the positioning accuracy of the system.

Pose-based methods for visual servoing seem to be the most generic approach to the problem, as they support arbitrary relative positioning with respect to the object. An often cited disadvantage of pose-based methods is the computation time required to solve the relative orientation problem. However, recent results show that solutions can be computed in only a few milliseconds even using iteration [51] or Kalman filtering [53]. In general, given the rapid advances in microprocessor technology, computational considerations are becoming less of an issue in the design of visual servoing systems. Another disadvantage of pose-based approaches is the fact that they inherently depend on having an accurate model of the target object, a form of calibration. Hence, feature-based approaches tend to be more appropriate to tasks where there is no prior model of the geometry of the task, for example in teleoperation applications [59]. Generally speaking, since feature-based methods rely on less prior information (which may be in error), they can be expected to perform more robustly on comparable tasks.

Another approach to position-based visual servoing which has not been discussed here is to use an active 3D sensor. For example, active 3D sensors based on structured lighting are now compact and fast enough to use for visual servoing. If the sensor is small and mounted on the robot, the depth and orientation information can be used directly for position-based visual servoing [60]-[62].
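The two-camera point estimation sketched above reduces to stacking two pairs of linear equations of the form (36) and solving by least squares. The NumPy sketch below is ours; the camera poses, focal length, and the point are illustrative, and the coefficient rows follow directly from multiplying the perspective equations through by their denominator:

    import numpy as np

    lam = 600.0  # assumed focal length (pixels)

    def point_equations(R, t, u, v):
        """Two rows of A and b for one camera: (u*z_row - lam*x_row) P = lam*t_x - u*t_z, etc."""
        xr, yr, zr = R          # rows of the rotation from frame a to the camera frame
        tx, ty, tz = t
        A = np.array([u * zr - lam * xr,
                      v * zr - lam * yr])
        b = np.array([lam * tx - u * tz,
                      lam * ty - v * tz])
        return A, b

    def project(R, t, P):
        x, y, z = R @ P + t
        return lam * x / z, lam * y / z

    # Two calibrated cameras (illustrative poses) observing the same point.
    R1, t1 = np.eye(3), np.array([0.0, 0.0, 2.0])
    R2, t2 = np.eye(3), np.array([-0.5, 0.0, 2.0])   # second camera displaced along x
    P_true = np.array([0.1, -0.05, 0.3])

    A1, b1 = point_equations(R1, t1, *project(R1, t1, P_true))
    A2, b2 = point_equations(R2, t2, *project(R2, t2, P_true))

    A = np.vstack([A1, A2])
    b = np.hstack([b1, b2])
    P_est, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(P_est)   # recovers P_true up to numerical error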
V. IMAGE-BASED VISUAL SERVO CONTROL

... observed by the vision system. Thus, the specification of an image-based visual servo task involves determining an appropriate error function e, such that when the task is achieved, e = 0. This can be done by directly using the projection equations (16), or via a "teach by showing" approach in which the robot is moved to a goal position and the corresponding image is used to compute a vector of desired image feature parameters, f_d. If the task is defined with respect to a moving object, the error, e, will be a function, not only of the pose of the end-effector, but also of the pose of the moving object.

Although the error, e, is defined on the image parameter space, the manipulator control input is typically defined either in joint coordinates or in task space coordinates. Therefore, it is necessary to relate changes in the image feature parameters to changes in the position of the robot. The image Jacobian, introduced in Section V-A, captures these relationships. We present an example image Jacobian in Section V-B. In Section V-C, we describe methods that can be used to "invert" the image Jacobian, to derive the robot velocity that will produce the desired change in the image. Finally, in Sections V-D and V-E we describe how controllers can be designed for image-based systems.

A. The Image Jacobian

Let r represent coordinates of the end-effector in some parameterization of the task space T and ṙ represent the corresponding end-effector velocity (note, ṙ is a velocity screw, as defined in Section II-B). Let f represent a vector of image feature parameters and ḟ the corresponding vector of image feature parameter rates of change.⁵ The image Jacobian, J_v, is a linear transformation from the tangent space of T at r to the tangent space of F at f. In particular,

    ḟ = J_v(r) ṙ    (37)

where J_v ∈ R^(k x m), and

    J_v(r) = [∂F/∂r] = [ ∂F_1(r)/∂r_1  ...  ∂F_1(r)/∂r_m ]
                       [      :                   :      ]
                       [ ∂F_k(r)/∂r_1  ...  ∂F_k(r)/∂r_m ].    (38)

Recall that m is the dimension of the task space, T. Thus the number of columns in the image Jacobian will vary depending on the task.

The image Jacobian was first introduced by Weiss et al. [21], who referred to it as the feature sensitivity matrix. It is also referred to as the interaction matrix [11] and the B matrix [16], [17]. Other applications of the image Jacobian include [10], [14], [15], [24].

The relationship given by (37) describes how image feature parameters change with respect to changing manipulator pose. In visual servoing we are interested in determining the manipulator velocity, ṙ, required to achieve some desired value of ḟ. This requires solving the system given by (37). We will discuss this problem in Section V-C, but first we present an example image Jacobian.

⁵ If the image feature parameters are point coordinates, these rates are image plane point velocities.

B. An Example Image Jacobian

Suppose that the end-effector is moving with angular velocity ^cΩ_e = [ω_x, ω_y, ω_z] and translational velocity ^cT_e = [T_x, T_y, T_z] (as described in Section II-B), both with respect to the camera frame in a fixed camera system. Let P be a point rigidly attached to the end-effector. The velocity of the point P, expressed relative to the camera frame, is given by

    ^cṖ = ^cΩ_e x ^cP + ^cT_e.    (39)

To simplify notation, let ^cP = [x, y, z]^T. Substituting the perspective projection equations (16) into (10) and (11), we can write the derivatives of the coordinates of ^cP in terms of the image feature parameters u, v as

    ẋ = z ω_y - (vz/λ) ω_z + T_x    (40)
    ẏ = (uz/λ) ω_z - z ω_x + T_y    (41)
    ż = (z/λ)(v ω_x - u ω_y) + T_z.    (42)

Now, let f = [u, v]^T, as above. Using the quotient rule,

    u̇ = λ (ẋ z - x ż) / z^2    (43)
    v̇ = λ (ẏ z - y ż) / z^2.    (44)

Substituting from (40)-(42) gives

    u̇ = (λ/z) T_x - (u/z) T_z - (uv/λ) ω_x + ((λ^2 + u^2)/λ) ω_y - v ω_z.    (45)

Similarly,

    v̇ = (λ/z) T_y - (v/z) T_z - ((λ^2 + v^2)/λ) ω_x + (uv/λ) ω_y + u ω_z.    (46)

Finally, we may rewrite these two equations in matrix form to obtain

    [u̇]   [ λ/z   0     -u/z   -uv/λ          (λ^2+u^2)/λ   -v ]
    [v̇] = [ 0     λ/z   -v/z   -(λ^2+v^2)/λ   uv/λ           u ] [T_x, T_y, T_z, ω_x, ω_y, ω_z]^T    (47)

which relates image-plane velocity of a point to the relative velocity of the point with respect to the camera. Alternative derivations for this example can be found in a number of references including [63], [64].

It is straightforward to extend this result to the general case of using k/2 image points for the visual control by simply stacking the Jacobians for each pair of image point coordinates; see (48):

    [u̇_1  ]   [ λ/z_1    0        -u_1/z_1      -u_1 v_1/λ        (λ^2+u_1^2)/λ    -v_1   ]
    [v̇_1  ]   [ 0        λ/z_1    -v_1/z_1      -(λ^2+v_1^2)/λ    u_1 v_1/λ         u_1   ]
    [  :   ] = [ :        :         :              :                 :                :    ] ṙ    (48)
    [u̇_k/2]   [ λ/z_k/2  0        -u_k/2/z_k/2  -u_k/2 v_k/2/λ    (λ^2+u_k/2^2)/λ  -v_k/2 ]
    [v̇_k/2]   [ 0        λ/z_k/2  -v_k/2/z_k/2  -(λ^2+v_k/2^2)/λ  u_k/2 v_k/2/λ     u_k/2 ]

Finally, note that the Jacobian matrices given in (47) and (48) are functions of z, the distance to the point being imaged. For a fixed camera system, when the target is the end-effector, these z values can be computed using the forward kinematics of the robot and the camera calibration information. For an eye-in-hand system, determining z can be more difficult, and this problem is discussed further in Section V-F.
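Equations (47) and (48) translate directly into code; a small NumPy sketch (ours):

    import numpy as np

    def interaction_matrix(u, v, z, lam):
        """2 x 6 image Jacobian of (47) for a single point feature: maps
        [Tx, Ty, Tz, wx, wy, wz] to the image-plane velocity [u_dot, v_dot]."""
        return np.array([
            [lam / z, 0.0, -u / z, -u * v / lam, (lam**2 + u**2) / lam, -v],
            [0.0, lam / z, -v / z, -(lam**2 + v**2) / lam, u * v / lam, u],
        ])

    def stacked_jacobian(features, depths, lam):
        """Stack per-point Jacobians as in (48). features: list of (u, v); depths: matching z values."""
        return np.vstack([interaction_matrix(u, v, z, lam)
                          for (u, v), z in zip(features, depths)])

    J = stacked_jacobian([(10.0, -5.0), (-20.0, 15.0)], [1.5, 2.0], lam=600.0)
    print(J.shape)   # (4, 6)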
C. Using the Image Jacobian to Compute End-Effector Velocity

The results of the previous sections show how to relate robot end-effector motion to perceived motion in a camera image. However, visual servo control applications typically require the reverse: computation of ṙ given ḟ as input. There are three cases that must be considered: k = m, k < m, and k > m. We now discuss each of these.

When k = m and J_v is nonsingular, J_v^-1 exists. Therefore, in this case, ṙ = J_v^-1 ḟ. Such an approach has been used by Feddema [20], who also describes an automated approach to image feature selection in order to minimize the condition number of J_v.

When k ≠ m, J_v^-1 does not exist. In this case, assuming that J_v is full rank (i.e., rank(J_v) = min(k, m)), we can compute a least squares solution, which, in general, is given by

    ṙ = J_v^+ ḟ + (I - J_v^+ J_v) b    (49)

where J_v^+ is a suitable pseudoinverse for J_v, and b is an arbitrary vector of the appropriate dimension. The least squares solution gives a value for ṙ that minimizes the norm ||ḟ - J_v ṙ||.

We first consider the case k > m; that is, there are more feature parameters than task degrees of freedom. By the implicit function theorem [65], if, in some neighborhood of r, m <= k and rank(J_v) = m (i.e., J_v is full rank), we can express the coordinates f_(m+1), ..., f_k as smooth functions of f_1, ..., f_m. From this, we deduce that there are k - m redundant visual features. Typically, this will result in a set of inconsistent equations (since the k visual features will be obtained from a computer vision system and are likely to be noisy). In this case, the appropriate pseudoinverse is given by

    J_v^+ = (J_v^T J_v)^-1 J_v^T.    (50)

Here, we have (I - J_v^+ J_v) = 0 (the dimension of the null space of J_v is 0, since the dimension of the column space of J_v, m, equals rank(J_v)). Therefore, the solution can be written more concisely as

    ṙ = J_v^+ ḟ.    (51)

We now consider the case k < m, i.e., there are certain components of the object motion that cannot be observed. In this case, the appropriate pseudoinverse is given by

    J_v^+ = J_v^T (J_v J_v^T)^-1.

In general, for k < m, (I - J_v^+ J_v) ≠ 0, and all vectors of the form (I - J_v^+ J_v) b lie in the null space of J_v, and correspond to those components of the object velocity that are unobservable. In this case, the solution is given by (49). For example, as shown in [64], the null space of the image Jacobian given in (47) is spanned by four vectors:

    ...

In some instances, there is a physical interpretation for the vectors that span the null space of the image Jacobian. For example, the vector [u, v, λ, 0, 0, 0]^T reflects that the motion of a point along a projection ray cannot be observed. The vector [0, 0, 0, u, v, λ]^T reflects the fact that rotation of a point on a projection ray about that projection ray cannot be observed. Unfortunately, not all basis vectors for the null space have such an obvious physical interpretation. The null space of the image Jacobian plays a significant role in hybrid methods, in which some degrees of freedom are controlled using visual servo, while the remaining degrees of freedom are controlled using some other modality [14].

D. Resolved-Rate Methods

The earliest approaches to image-based visual servo control [10], [21] were based on resolved-rate motion control [30], which we will briefly describe here. Suppose that the goal of a particular task is to reach a desired image feature parameter vector, f_d. If the control input is defined as in Section IV to be an end-effector velocity, then we have u = ṙ, and assuming for the moment that the image Jacobian is square and nonsingular,

    u = K J_v^-1(r) (f_d - f)

where K is a constant gain matrix of the appropriate dimension.
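Combining the proportional image error of the resolved-rate scheme with the pseudoinverse solutions of Section V-C gives a one-line control step. The sketch below (ours, with assumed feature measurements) is one common way to assemble it, not a transcription of the controllers in [10], [21]:

    import numpy as np

    def interaction(u, v, z, lam):
        # 2 x 6 point-feature Jacobian of (47)
        return np.array([[lam/z, 0, -u/z, -u*v/lam, (lam**2 + u**2)/lam, -v],
                         [0, lam/z, -v/z, -(lam**2 + v**2)/lam, u*v/lam, u]])

    def resolved_rate_step(J, f, f_d, K):
        """One image-based step: u = J^+ K (f_d - f); the pseudoinverse handles
        the nonsquare cases discussed in Section V-C."""
        return np.linalg.pinv(J) @ (K @ (f_d - f))

    # Two tracked points (assumed measurements) driven toward desired image locations.
    lam = 600.0
    feats = [(10.0, -5.0, 1.5), (-20.0, 15.0, 2.0)]        # (u, v, z) per point
    J = np.vstack([interaction(u, v, z, lam) for u, v, z in feats])
    f = np.array([10.0, -5.0, -20.0, 15.0])
    f_d = np.zeros(4)
    u_cmd = resolved_rate_step(J, f, f_d, K=0.5 * np.eye(4))
    print(u_cmd)    # commanded velocity screw [Tx, Ty, Tz, wx, wy, wz]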
For the case of a nonsquare image Jacobian, the techniques described in Section V-C would be used to compute u. Similar results have been presented in [14], [15]. More advanced techniques based on optimal control are discussed in [16].

E. Example Servoing Tasks

In this section, we revisit some of the problems introduced in Section IV-A and describe image-based solutions for these problems. In all cases, we assume two fixed cameras are observing the scene.

1) Point to Point Positioning: Consider the task of bringing some point P on the manipulator to a desired stationing point S. The kinematic error function was given in (20). If two cameras are viewing the scene, a necessary and sufficient condition for P and S to coincide in the workspace is that the projections of P and S coincide in each image.

If we let [u^l, v^l]^T and [u^r, v^r]^T be the image coordinates for the projection of P in the left and right images, respectively, then we may take f = [u^l, v^l, u^r, v^r]^T. If we let T = R^3, then in (19), F is a mapping from T to R^4.

Let the projection of S have coordinates [u_s^l, v_s^l] and [u_s^r, v_s^r] in the left and right images. We then define the desired feature vector to be f_d = [u_s^l, v_s^l, u_s^r, v_s^r]^T, yielding

    ...

... The proof proceeds as follows. The origin of the coordinate frame for the left camera, together with the projections of S_1 and S_2 onto the left image, forms a plane. Likewise, the origin of the coordinate frame for the right camera, together with the projections of S_1 and S_2 onto the right image, forms a plane. The intersection of these two planes is exactly the line joining S_1 and S_2 in the workspace. When P lies on this line, it must lie simultaneously in both of these planes, and therefore, must be collinear with the projections of the points S_1 and S_2 in both images.

We now turn to conditions that determine when the projection of P is collinear with the projections of the points S_1 and S_2, and will use the knowledge that three vectors are coplanar if and only if their scalar triple product is zero. For the left image, let the projection of S_1 have image coordinates [u_1^l, v_1^l], the projection of S_2 have image coordinates [u_2^l, v_2^l], and the projection of P have image coordinates [u^l, v^l]. If the three vectors from the origin of the left camera to these image points are coplanar, then the three image points are collinear. Thus, we construct the scalar triple product

    [u_1^l, v_1^l, λ] · ([u_2^l, v_2^l, λ] x [u^l, v^l, λ])

and proceeding in a similar fashion for the right image, derive

    [u_1^r, v_1^r, λ] · ([u_2^r, v_2^r, λ] x [u^r, v^r, λ])

from which we construct the error function ...
... the fact that by construction, when the image error function is zero, the kinematic error must also be zero. Even if the hand-eye system is miscalibrated, if the feedback system is asymptotically stable, the image error will tend to zero, and hence so will the kinematic error. This is not the case with the position-based system described in Section IV [68]. Thus, one of the chief advantages of image-based control over position-based control is that the positioning accuracy of the system is less sensitive to camera calibration errors.

There are also often computational advantages to image-based control, particularly in ECL configurations. For example, a position-based relative pose solution for an ECL single-camera system must perform two nonlinear least squares optimizations in order to compute the error function. The comparable image-based system must only compute a simple image error function, an inverse Jacobian solution, and possibly a single position or pose calculation to parameterize the Jacobian. In practice, as described in Section V-B, the unknown parameter for Jacobian calculation is distance from the camera. Some recent papers present adaptive approaches for estimating this depth value [16], or develop feedback methods which do not use depth in the feedback formulation [69].

One disadvantage of image-based methods compared to position-based methods is the presence of singularities in the feature mapping function, which reflect themselves as unstable points in the inverse Jacobian control law. These instabilities are often less prevalent in the equivalent position-based scheme. Returning again to the point-to-line example, the Jacobian calculation becomes singular when the two stationing points are coplanar with the optical centers of both cameras. In this configuration, rotations and translations of the setpoints in the plane are not observable. This singular configuration does not exist for the position-based solution.

In the above discussion we have referred to f_d as the desired feature parameter vector, and implied that it is a constant. If it is a constant then the robot will move to the desired pose with respect to the target. If the target is moving, the system will endeavor to track the target and maintain relative pose, but the tracking performance will be a function of the system dynamics, as discussed below in Section VII. However, many tasks can be described in terms of the motion of image features, for instance by aligning visual cues within the scene. Jang et al. [66] describe a generalized approach to servoing on image features, with trajectories specified in feature space, which results in trajectories (tasks) that are independent of target geometry. Feddema [10] also uses a feature space trajectory generator to interpolate feature parameter values due to the low update rate of the vision system used. Skaar et al. [18] describe the example of a 1 DOF robot catching a ball by observing visual cues such as the ball, the arm's pivot point, and another point on the arm. The interception task can then be specified, even if the relationship between camera and arm is not known a priori.

VI. IMAGE FEATURE EXTRACTION AND TRACKING

Irrespective of the control approach used, a vision system is required to extract the information needed to perform the servoing task. Hence, visual servoing pre-supposes the solution to a set of potentially difficult static and dynamic vision problems. To this end many reported implementations contrive the vision problem to be simple: e.g. painting objects white, using artificial targets, and so forth [10], [14], [37], [70]. Other authors use extremely task-specific clues: e.g. Allen [36] uses motion detection for locating a moving object to be grasped, and a fruit picking system looks for the characteristic fruit color. A review of tracking approaches used by researchers in this field is given in [3].

In less structured situations, vision has typically relied on the extraction of sharp contrast changes, referred to as "corners" or "edges", to indicate the presence of object boundaries or surface markings in an image. Processing the entire image to extract these features necessitates the use of extremely high-speed hardware in order to work with a sequence of images at camera rate. However, not all pixels in the image are of interest, and computation time can be greatly reduced if only a small region around each image feature is processed. Thus, a promising technique for making vision cheap and tractable is to use window-based tracking techniques [4], [37], [71]. Window-based methods have several advantages, among them: computational simplicity, little requirement for special hardware, and easy reconfiguration for different applications. We note, however, that initial positioning of each window typically presupposes an automated or human-supplied solution to a potentially complex vision problem.

This section describes a window-based approach to tracking features in an image. The methods are capable of tracking a number of point or edge features at frame rate on a workstation computer, require only a framestore and no specialized image processing hardware, and have been incorporated into a publicly available software "toolkit" [4]. A discussion of methods which use specialized hardware combined with temporal and geometric constraints can be found in [67]. The remainder of this section is organized as follows. Section VI-A describes how window-based methods can be used to implement fast detection of edge segments, a common low-level primitive for vision applications. Section VI-B describes an approach based on temporally correlating image regions over time. Section VI-C describes some general issues related to the use of temporal and geometric constraints, and Section VI-D briefly summarizes some of the issues surrounding the choice of a feature extraction method for tracking.

A. Feature-Based Methods

In this section, we illustrate how window-based processing techniques can be used to perform fast detection of isolated straight edge segments of fixed length. Edge segments are intrinsic to applications where man-made parts contain corners or other patterns formed from physical edges.

Images are comprised of pixels organized into a two-dimensional coordinate system. We adopt the notation I(x, t) to denote the pixel at location x = [u, v]^T in an image captured at time t. A window can be thought of as a two-dimensional array of pixels related to a larger image by an invertible mapping from window coordinates to image coordinates.
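The idea behind the region-correlation tracking of Section VI-B can be illustrated with a toy sum-of-squared-differences search (ours; a real implementation would add subpixel interpolation and bounds handling):

    import numpy as np

    def ssd_track(image, template, center, search=10):
        """Locate the displacement of a reference window by minimizing the
        sum-of-squared-differences over a small search region around `center`."""
        h, w = template.shape
        cu, cv = center
        best, best_off = None, (0, 0)
        for dv in range(-search, search + 1):
            for du in range(-search, search + 1):
                r0, c0 = cv + dv - h // 2, cu + du - w // 2
                patch = image[r0:r0 + h, c0:c0 + w]
                score = np.sum((patch.astype(float) - template) ** 2)
                if best is None or score < best:
                    best, best_off = score, (du, dv)
        return best_off

    # Toy example: a bright square shifted by (3, -2) pixels between frames.
    frame0 = np.zeros((100, 100)); frame0[40:50, 40:50] = 255
    frame1 = np.zeros((100, 100)); frame1[38:48, 43:53] = 255
    template = frame0[40:50, 40:50].astype(float)
    print(ssd_track(frame1, template, center=(45, 45)))   # -> (3, -2)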
We consider rigid transformations consisting of a translation vector c = [u_c, v_c]ᵀ and a rotation θ. A pixel value at location x = [u, v]ᵀ in window coordinates is related to the larger image by

    W(x; c, θ, t) = I(c + R(θ)x, t)                                    (60)

where R(θ) is a two-dimensional rotation matrix. We adopt the conventions that x = 0 is the center of the window and that the set X represents the set of all values of x.
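As a rough illustration of (60), the following Python/NumPy sketch acquires the rectangular array of pixel values corresponding to a translated, rotated window, using nearest-neighbor sampling. The function name, argument conventions, and border handling are illustrative choices of ours and are not taken from the toolkit of [4].

    import numpy as np

    def acquire_window(image, c, theta, half_u, half_v):
        """Sample W(x; c, theta, t) = I(c + R(theta) x, t) on a regular grid.

        image : 2-D gray-level array indexed as image[v, u] (row = v, column = u)
        c     : (u_c, v_c), the window center in image coordinates
        theta : window orientation in radians
        """
        us = np.arange(-half_u, half_u + 1)
        vs = np.arange(-half_v, half_v + 1)
        xu, xv = np.meshgrid(us, vs)                 # window coordinates x = [u, v], x = 0 at the center
        ct, st = np.cos(theta), np.sin(theta)
        iu = c[0] + ct * xu - st * xv                # image coordinates c + R(theta) x
        iv = c[1] + st * xu + ct * xv
        iu = np.clip(np.rint(iu).astype(int), 0, image.shape[1] - 1)   # nearest-neighbor lookup,
        iv = np.clip(np.rint(iv).astype(int), 0, image.shape[0] - 1)   # clamped at the image border
        return image[iv, iu]                         # rectangular array of pixel values

In a real implementation, the line-drawing and region-fill techniques mentioned below (or bilinear interpolation) would replace this direct per-pixel lookup.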
Window-based tracking algorithms typically operate in two stages. In the first stage, one or more windows are acquired using a nominal set of window parameters. The pixel values for all x ∈ X are copied into a two-dimensional array that is subsequently treated as a rectangular image. Such acquisitions can be implemented extremely efficiently using line-drawing and region-fill algorithms commonly developed for graphics applications [72]. In the second stage, the windows are processed to locate image features, and from their parameters a new set of window parameters, c and θ, is computed. These parameters may be modified using external geometric constraints or temporal prediction, and the cycle repeats.

We consider an edge segment to be characterized by three parameters in the image plane: the u and v coordinates of the center of the segment, and the orientation of the segment relative to the image plane coordinate system. These values correspond directly to the parameters of the acquisition window used for edge detection. Let us first assume we have correct prior values c⁻ = (u⁻, v⁻) and θ⁻ for an edge segment. A window, W⁻(x) = W(x; c⁻, θ⁻, t), extracted with these parameters would then have a vertical edge segment within it.

Isolated step edges can be localized by determining the location of the maximum of the first derivative of the signal [64], [67], [73]. Let e be a one-dimensional edge-detection kernel arranged as a single row. The convolution W₁(x) = (W⁻ ⋆ e)(x) will have a response curve in each row which peaks at the location of the edge. Summing each column of W₁ superimposes the peaks and yields a one-dimensional response curve. If the estimated orientation, θ⁻, was correct, the maximum of this response curve determines the offset of the edge in window coordinates. By interpolating the response curve about the maximum value, sub-pixel localization of the edge can be achieved. Here, e is taken to be a one-dimensional Prewitt operator [64] which, although not optimal from a signal-processing point of view, is extremely fast to execute on simple hardware.

If θ⁻ was incorrect, the response curves in W₁ will deviate slightly from one another, and the superposition of these curves will form a lower and less sharp aggregate curve. Thus, maximizing the maximum value of the aggregate response curve is a way to determine edge orientation. This can be approximated by performing the detection operation on windows acquired at θ⁻ as well as at two bracketing angles θ⁻ ± a and performing quadratic interpolation on the maxima of the corresponding aggregate response curves. Computing the three oriented edge detectors is particularly simple if the range of angles is small. In this case, a single window is processed with the initial convolution yielding W₁. Three aggregate response curves are computed by summing along the columns of W₁ and along diagonals corresponding to angles of ±a. The maxima of all three curves are located and interpolated to yield edge orientation and position. Thus, for the price of one window acquisition, one complete one-dimensional convolution, and three column sums, the vertical offset δv and the orientation offset δθ can be computed. Once these two values are determined, the state variables of the acquisition window are updated as

    θ⁺ = θ⁻ + δθ
    u⁺ = u⁻ − δv sin(θ⁺)
    v⁺ = v⁻ + δv cos(θ⁺).
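The sketch below strings these steps together for one edge segment. It acquires windows at θ⁻ and at two bracketing angles (the simpler approximation described above, rather than the single-convolution diagonal-sum optimization), localizes the edge with a Prewitt-style derivative and parabolic peak interpolation, and applies the state update just given. For this sketch we orient the acquisition window so that a correctly predicted edge runs along the window's u (row) axis, so the measured offset δv lies along the window's v (column) axis and the update equations apply directly; the paper's own row/column conventions may differ, and acquire_window is the illustrative routine from the earlier sketch.

    import numpy as np

    def edge_offset(window, kernel=np.array([-1.0, 0.0, 1.0])):
        """Localize a step edge running along the window's u (row) axis.

        Each column is convolved with a 1-D derivative kernel; summing the absolute
        responses along each row gives an aggregate curve over the window's v axis,
        whose interpolated peak is the edge offset from the window center."""
        resp = np.abs(np.apply_along_axis(np.convolve, 0, window, kernel, mode='same'))
        curve = resp.sum(axis=1)
        i = int(np.argmax(curve))
        off = 0.0
        if 0 < i < len(curve) - 1:                      # parabolic (quadratic) interpolation
            y0, y1, y2 = curve[i - 1], curve[i], curve[i + 1]
            denom = y0 - 2.0 * y1 + y2
            off = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
        return (i + off) - (window.shape[0] - 1) / 2.0, float(curve[i])

    def track_edge_step(image, u, v, theta, half=10, bracket=np.radians(10.0)):
        """One localization-and-update cycle for an edge segment with state (u, v, theta)."""
        best = None
        for dtheta in (-bracket, 0.0, bracket):         # theta- and two bracketing angles
            w = acquire_window(image, (u, v), theta + dtheta, half, half)
            dv, height = edge_offset(w)
            if best is None or height > best[0]:
                best = (height, dtheta, dv)             # keep the sharpest aggregate response
        _, dtheta, dv = best
        theta = theta + dtheta                          # theta+ = theta- + d(theta)
        u = u - dv * np.sin(theta)                      # u+ = u- - dv sin(theta+)
        v = v + dv * np.cos(theta)                      # v+ = v- + dv cos(theta+)
        return u, v, theta

For simplicity this sketch picks the best of the three bracket angles rather than interpolating the orientation between them.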
An implementation of this method [4] has shown that localizing a 20-pixel-long edge using a Prewitt-style mask 15 pixels wide, searching ±10 pixels and ±15 degrees, takes 1.5 ms on a Sun Sparc II workstation. At this rate, 22 edge segments can be tracked simultaneously at 30 Hz, the video frame rate used. Longer edges can be tracked at comparable speeds by sub-sampling along the edge.

Clearly, this edge-detection scheme is susceptible to mistracking caused by background or foreground occluding edges. Large acquisition windows increase the range of motions that can be tracked, but reduce the tracking speed and increase the likelihood that a distracting edge will disrupt tracking. Likewise, large orientation brackets reduce the accuracy of the estimated orientation and make it more susceptible to edges that are not closely oriented to the underlying edge.

There are several ways of increasing the robustness of edge tracking. One is to include some type of additional information about the edges being tracked, such as the sign or absolute value of the edge response. For more complex edge-based detection, collections of such oriented edge detectors can be combined to verify the location and position of the entire feature. Some general ideas in this direction are discussed in Section VI-C.

B. Area-Based Methods

Edge-based methods tend to work well in environments in which man-made objects are to be tracked. If, however, the desired feature is a specific pattern, then tracking can be based on matching the appearance of the feature (in terms of its spatial pattern of gray-values) in a series of images, and on exploiting its temporal consistency: the observation that the appearance of a small region in an image sequence changes little. Such techniques are well described in the image registration literature and have been applied to other computer vision problems such as stereo matching and optical flow.

Consider only windows that differ in the location of their center. We assume some reference window was acquired at time t at location c. Some small time interval, τ, later, a candidate window of the same size is acquired at location c + d. The correspondence between these two images is given by some similarity measure

    O(d) = Σ_{x∈X} f(W(x; c, t) − W(x; c + d, t + τ)) w(x),   τ > 0          (61)

where w(·) is a weighting function over the window.
Taking f(u) = u² gives the familiar sum-of-squared-differences (SSD) measure, and the displacement d that minimizes O(d) is the estimated inter-frame motion of the window. When d and τ are small, W(x; c + d, t + τ) can be expanded in a Taylor series about (c, t) and truncated after the linear terms. Writing Wx, Wy, and Wt for the horizontal, vertical, and temporal derivatives of the window, and d = [dx, dy]ᵀ, substituting into (61) yields

    O(d) = Σ_{x∈X} (Wx(x) dx + Wy(x) dy + Wt(x) τ)² w(x).                    (62)

Define g(x) = [Wx(x), Wy(x)]ᵀ and h(x) = Wt(x). Expression (62) can now be written more concisely as

    O(d) = Σ_{x∈X} (g(x)ᵀ d + h(x) τ)² w(x).

Notice that O is now a quadratic function of d. Computing the derivatives of O with respect to the components of d, setting the result equal to zero, and rearranging yields a linear system of equations

    ( Σ_{x∈X} g(x) g(x)ᵀ w(x) ) d = −τ Σ_{x∈X} g(x) h(x) w(x)

whose solution is the displacement that minimizes the linearized similarity measure.
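This is the standard least-squares (optical-flow style) displacement estimate. A minimal sketch follows, assuming the spatial and temporal derivatives are approximated by finite differences between two windows acquired at the same nominal location one frame apart; the function name and the use of NumPy's gradient routine are our own choices.

    import numpy as np

    def ssd_displacement(w_ref, w_cur, weights=None):
        """Solve (sum g g^T w) d = -tau (sum g h w) for the window displacement d.

        w_ref : window W(x; c, t) acquired at the reference time
        w_cur : window W(x; c, t + tau) acquired one inter-frame interval later
        The temporal term h(x) * tau is approximated directly by w_cur - w_ref,
        so the returned d = (dx, dy) is already in pixels."""
        w_ref = np.asarray(w_ref, dtype=float)
        w_cur = np.asarray(w_cur, dtype=float)
        if weights is None:
            weights = np.ones_like(w_ref)
        wy, wx = np.gradient(w_ref)                         # spatial derivatives: axis 0 ~ v, axis 1 ~ u
        g = np.stack([wx.ravel(), wy.ravel()], axis=1)      # g(x) = [Wx(x), Wy(x)]^T
        ht = (w_cur - w_ref).ravel()                        # ~ Wt(x) * tau
        wts = weights.ravel()
        A = (g * wts[:, None]).T @ g                        # sum over X of g g^T w
        b = -(g * wts[:, None]).T @ ht                      # -sum over X of g (Wt tau) w
        return np.linalg.solve(A, b)                        # fails if the window lacks texture (A singular)

Iterating this estimate, re-acquiring the candidate window at the updated location, handles displacements larger than the linearization allows.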
C. Feature Prediction

Window-based tracking implicitly assumes that the inter-frame motions of the tracked feature do not exceed the size of the search window or, in the case of continuous optimization, a few pixels from the expected location of the image region. In the simplest case, the previous location of the image feature can be used as a predictor of its current location. Unfortunately, as feature velocity increases, the search window must be enlarged, which adversely affects computation time.

The robustness and speed of tracking can be significantly increased with knowledge about the motion of the observed features, which may be due to the camera and/or the target moving. For example, given knowledge of the image feature location f_t at time t, the Jacobian J_v, the end-effector velocity u_t, and the inter-frame time τ, the expected location of the search windows can be computed, assuming no target motion, by the prediction

    f_{t+τ} = f_t + τ J_v u_t.
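A minimal sketch of this prediction step, assuming the Jacobian J_v and the commanded end-effector velocity are already available as NumPy arrays; the names below are illustrative.

    import numpy as np

    def predict_features(f_t, J_v, u_t, tau):
        """Predict feature parameters one inter-frame interval ahead, assuming a
        static target:  f(t + tau) ~ f(t) + tau * J_v * u(t)."""
        return np.asarray(f_t) + tau * (np.asarray(J_v) @ np.asarray(u_t))

    # e.g. re-center the search windows on the predicted feature locations:
    # f_pred = predict_features(f_t, J_v, u_t, tau=1.0 / 30.0)   # 30 Hz video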
Likewise, if the dynamics of a moving object are known, then it is possible to use this information to enhance tracking performance. For example, Rizzi [37] describes the use of a Newtonian flight dynamics model to make it possible to track a ping-pong ball during flight. Predictors based on α-β tracking filters and Kalman filters have also been used [36], [53], [67].

D. Discussion

Prior to executing or planning visually controlled motions, a specific set of visual features must be chosen. Discussion of the issues related to feature selection for visual servo control applications can be found in [20], [21]. The "right" image feature tracking method to use is extremely application dependent. For example, if the goal is to track a single special pattern or surface marking that is approximately planar and moving at slow to moderate speeds, then area-based tracking is appropriate. It does not require special image structure (e.g., straight lines), is robust to a large set of image distortions, and for small motions can be implemented to run at frame rates.

In comparison to the edge-detection methods described above, area-based tracking is sensitive to occlusions and background changes (if the template includes any background pixels). Thus, if a task requires tracking several occluding contours of an object with a changing background, edge-based methods are clearly faster and more robust.

In many realistic cases, neither of these approaches by itself yields the robustness and performance desired. For example, tracking occluding edges in an extremely cluttered environment is sure to distract edge tracking as "better" edges invade the search window, while the changing background would ruin the SSD match for the region. Such situations call for the use of more global task constraints (e.g., the geometry of several edges), more global tracking (e.g., extended contours or snakes [77]), or improved or specialized detection methods.

To illustrate these tradeoffs, suppose a visual servoing task relies on tracking the image of a circular opening over time. In general, the opening will project to an ellipse in the camera. There are several candidate algorithms for detecting this ellipse and recovering its parameters:

1) If the contrast between the interior of the opening and the area around it is high, then binary thresholding followed by a calculation of the first and second central moments can be used to localize the feature [37] (a sketch of this option follows the list).

2) If the ambient illumination changes greatly over time, but the brightness of the opening and the brightness of the surrounding region are roughly constant, a circular template could be localized using SSD methods augmented with brightness and contrast parameters. In this case, (61) must also include parameters for scaling and aspect ratio [4].

3) The opening could be selected in an initial image and subsequently located using SSD methods. This differs from the previous method in that this calculation does not compute the center of the opening, only its correlation with the starting image. Although useful for servoing a camera to maintain the opening within the field of view, this approach is probably not useful for manipulation tasks that need to attain a position relative to the center of the opening.

4) If the contrast and background are changing, the opening could be tracked by performing edge detection and fitting an ellipse to the edge locations. In particular, short edge segments could be located using the techniques described in Section VI-A. Once the segments have been fit to an ellipse, the orientation and location of the segments would be adjusted for the subsequent tracking cycle using the geometry of the ellipse.
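As a sketch of option 1) above, the following routine localizes a bright, roughly elliptical region by binary thresholding followed by first and second central moments; the threshold and the orientation formula are standard moment analysis rather than the particular implementation of [37].

    import numpy as np

    def ellipse_from_moments(image, threshold):
        """Centroid and major-axis orientation of the region brighter than threshold."""
        mask = (np.asarray(image, dtype=float) > threshold).astype(float)
        m00 = mask.sum()                               # zeroth moment: region area in pixels
        if m00 == 0:
            return None                                # nothing above threshold
        vs, us = np.indices(mask.shape)                # vs: row (v) index, us: column (u) index
        u_c = (us * mask).sum() / m00                  # first moments give the centroid
        v_c = (vs * mask).sum() / m00
        mu20 = (((us - u_c) ** 2) * mask).sum() / m00      # second central moments
        mu02 = (((vs - v_c) ** 2) * mask).sum() / m00
        mu11 = (((us - u_c) * (vs - v_c)) * mask).sum() / m00
        angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)  # orientation of the fitted ellipse
        return u_c, v_c, angle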
During task execution, other problems arise. The two most common problems are occlusion of features and visual singularities. Solutions to the former include intelligent observers that note the disappearance of features and continue to predict their locations based on previously observed motion [37], or redundant feature specifications that can perform even with some loss of information. Solutions to the latter require some combination of intelligent path planning and/or intelligent acquisition and focus-of-attention to maintain the controllability of the system.

It is probably safe to say that fast and robust image processing presents the greatest challenge to general-purpose hand-eye coordination. As an effort to help overcome this obstacle, the methods described above and other related methods have been incorporated into a publicly available software "toolkit." The interested reader is referred to [4] for details.

VII. DISCUSSION

This paper has presented a tutorial introduction to robotic visual servo control, focusing on the relevant fundamentals of coordinate transformations, image formation, feedback algorithms, and visual tracking. In the interests of space and clarity, we have concentrated on presenting methods that are well represented in the literature and that can be implemented using relatively straightforward techniques. The reader interested in a broader overview of the field, or in acquiring more detail on a particular area, is invited to consult the references we have provided. Another goal has been to establish a consistent nomenclature and to summarize important results using that notation.

Many aspects of the more general problem of vision-based control of motion have necessarily been omitted or abbreviated to a great degree. One important issue is the choice between using an image-based or a position-based system. Many systems based on image-based and position-based architectures have been demonstrated, and the computational costs of the two approaches seem to be comparable and are easily within the capability of modern computers. In many cases the motion of a target, for example an object on a conveyor, is most naturally expressed in a Cartesian reference frame. For this reason, most systems dealing with moving objects ([36], [37]) have used position-based methods. Although there has been recent progress in understanding image plane dynamics [22], the design of stable, robust image-based servoing systems for capturing moving objects has not been fully explored.
In general, the accuracy of image-based methods for static positioning is less sensitive to calibration than that of comparable position-based methods; however, image-based methods require online computation of the image Jacobian. Unfortunately, this quantity inherently depends on the distance from the camera to the target, which, particularly in a monocular system, is difficult to compute. Many systems utilize a constant image Jacobian, which is computationally efficient but valid only over a small region of the task space.⁶ Other systems have resorted to performing a partial pose estimation [10], adaptive depth estimation [16], or image Jacobian estimation [78]. However, these add significantly to the complexity of the system design as well as introducing an additional computational load.

⁶However, recent results indicate that a visual servo system will converge despite quite significant image Jacobian errors.

This issue is further complicated when dynamics are introduced into the problem. Even when the target object is not moving, it is important to realize that a visual servo system is a closed-loop discrete-time dynamical system. The sampling rate in such a system is limited by the frame rate of the camera, though many reported systems operate at a sub-multiple of the camera frame rate due to limited computational ability. Negative feedback is applied to a plant that generally includes time delays due to charge integration within the camera, serial pixel transport from the camera to the vision system, and computation time for feature parameter extraction. In addition, most reported visual servo systems employ a relatively low-bandwidth communications link between the vision system and the robot controller, which introduces further latency. Some robot controllers operate with a sample interval that is not related to the sample rate of the vision system, and this introduces still further delay. A good example of this is the common Unimate Puma robot, whose position loops operate at a sample interval of 14 or 28 ms while vision systems operate at sample intervals of 33 or 40 ms for RS-170 or CCIR video, respectively [29]. It is well known that a feedback system including delay will become unstable as the loop gain is increased. Many visual closed-loop systems are tuned empirically, increasing the loop gain until overshoot or oscillation becomes intolerable.
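The effect of delay on the usable loop gain can be seen in a toy discrete-time simulation such as the one below, where a proportional controller drives an integrating plant toward a set point but only sees measurements that are several samples old; the plant model, delay, and gains are illustrative only and do not model any particular system discussed here.

    def simulate_visual_loop(gain, delay_samples, steps=120, target=1.0):
        """Proportional visual loop with a pure measurement delay.
        Returns the position history; as gain or delay grows the response first
        overshoots and oscillates, then diverges."""
        pos = 0.0
        history = [pos]
        pipeline = [0.0] * (delay_samples + 1)       # stale measurements in transit
        for _ in range(steps):
            measured = pipeline.pop(0)               # the controller sees delayed feedback
            pipeline.append(pos)
            pos += gain * (target - measured)        # proportional correction each sample
            history.append(pos)
        return history

    # e.g. compare the tails of the responses for increasing gain:
    # for k in (0.2, 0.6, 1.2):
    #     print(k, ["%.2f" % p for p in simulate_visual_loop(k, delay_samples=3)[-4:]])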
Simple proportional controllers are commonly used and can be shown to drive the steady-state error to zero. However, this implies nothing about performance when tracking a moving object, which will typically exhibit pronounced image plane error and tracking lag. If the target motion is constant, then prediction (based upon some assumption of target motion) can be used to compensate for the latency, and predictors based on autoregressive models, Kalman filters, and α-β and α-β-γ tracking filters have been demonstrated for visual servoing. However, when combined with a low sample rate, predictors can result in poor disturbance rejection and long reaction time to unmodeled target motion. In order for a visual servo system to provide good tracking performance for moving targets, considerable attention must be paid to modeling the dynamics of the robot, the target, and the vision system, and to designing an appropriate control system. Other issues for consideration include whether or not the vision system should "close the loop" around robot axes, which can be position, velocity, or torque controlled. A detailed discussion of these dynamic issues in visual servo systems is given by Corke [29], [79].

In addition to these "low-level" considerations, other issues that merit consideration are vision-based path planning and visual recognition. In the case of the former, although path planning is a well-established discipline, the idea of combining image space feature path planning with visual feedback has not been adequately explored. For a simple example of visual servoing with obstacle avoidance, see [78]. Visual recognition or interpretation is also important for any visual servoing system that is to operate without constant human intervention. These are but two of the many issues that the designer of an autonomous system that is to operate in unstructured environments must confront.

It is appropriate to note that, despite the long history and intuitive appeal of using vision to guide robotic systems, the applications of this technology remain limited. To some degree this has been due to the high costs of the specialized hardware and the diverse engineering skills required to construct an integrated visually controlled robot system. Fortunately, the costs of key elements such as cameras, framestores, image processing hardware, and computers in general continue to fall and appear set to do so for some time to come. Cameras are now becoming available with performance characteristics such as frame rate and image resolution beyond the limiting broadcast television standards which have constrained them for so long.

In conclusion, we hope that this paper has shown that visual servoing is both useful and achievable using technology that is readily available today. In conjunction with the cost trends noted above, we believe that the future for visual servoing is bright and that it will become an important and common control modality for robot systems.

ACKNOWLEDGMENT

The authors are grateful to R. Kelly and to the anonymous reviewers for their helpful comments on an earlier version of this paper.

REFERENCES

[1] Y. Shirai and H. Inoue, "Guiding a robot by visual feedback in assembling tasks," Pattern Recognit., vol. 5, pp. 99-108, 1973.
[2] J. Hill and W. T. Park, "Real time control of a robot with a mobile camera," in Proc. 9th ISIR, Washington, D.C., Mar. 1979, pp. 233-246.
[3] P. Corke, "Visual control of robot manipulators - A review," in Visual Servoing, K. Hashimoto, Ed. Singapore: World Scientific, 1993, pp. 1-31 (vol. 7 of Robotics and Automated Systems).
[4] G. D. Hager, "The 'X-vision' system: A general purpose substrate for real-time vision-based robotics," in Proc. Workshop on Vision for Robots, 1995, pp. 56-63. Also available as Yale CS-RR-1078.
[5] A. C. Sanderson and L. E. Weiss, "Image-based visual servo control using relational graph error signals," Proc. IEEE, pp. 1074-1077, 1980.
[6] J. C. Latombe, Robot Motion Planning. Boston: Kluwer, 1991.
[7] J. J. Craig, Introduction to Robotics. Menlo Park: Addison-Wesley, 2nd ed., 1986.
[8] B. K. P. Horn, Robot Vision. Cambridge, MA: MIT Press, 1986.
[9] N. Hollinghurst and R. Cipolla, "Uncalibrated stereo hand-eye coordination," Image and Vision Computing, vol. 12, no. 3, pp. 187-192, 1994.
[10] J. Feddema and O. Mitchell, "Vision-guided servoing with feature-based trajectory generation," IEEE Trans. Robot. Automat., vol. 5, pp. 691-700, Oct. 1989.
[11] B. Espiau, F. Chaumette, and P. Rives, "A new approach to visual servoing in robotics," IEEE Trans. Robot. Automat., vol. 8, pp. 313-326, 1992.
[12] M. L. Cyros, "Datacube at the space shuttle's launch pad," Datacube World Review, vol. 2, pp. 1-3, Sept. 1988. Datacube Inc., 4 Dearborn Road, Peabody, MA.
[13] W. Jang, K. Kim, M. Chung, and Z. Bien, "Concepts of augmented image space and transformed feature space for efficient visual servoing of an 'eye-in-hand robot'," Robotica, vol. 9, pp. 203-212, 1991.
[14] A. Castano and S. A. Hutchinson, "Visual compliance: Task-directed visual servo control," IEEE Trans. Robot. Automat., vol. 10, pp. 334-342, June 1994.
[15] K. Hashimoto, T. Kimoto, T. Ebine, and H. Kimura, "Manipulator control with image-based visual servo," in Proc. IEEE Int. Conf. on Robotics and Automation, 1991, pp. 2267-2272.
[16] N. P. Papanikolopoulos and P. K. Khosla, "Adaptive robot visual tracking: Theory and experiments," IEEE Trans. Automat. Contr., vol. 38, no. 3, pp. 429-445, 1993.
[17] N. P. Papanikolopoulos, P. K. Khosla, and T. Kanade, "Visual tracking of a moving target by a camera mounted on a robot: A combination of vision and control," IEEE Trans. Robot. Automat., vol. 9, no. 1, pp. 14-35, 1993.
[18] S. Skaar, W. Brockman, and R. Hanson, "Camera-space manipulation," Int. J. Robot. Res., vol. 6, no. 4, pp. 20-32, 1987.
[19] S. B. Skaar, W. H. Brockman, and W. S. Jang, "Three-dimensional camera space manipulation," Int. J. Robot. Res., vol. 9, no. 4, pp. 22-39, 1990.
[20] J. T. Feddema, C. S. G. Lee, and O. R. Mitchell, "Weighted selection of image features for resolved rate visual feedback control," IEEE Trans. Robot. Automat., vol. 7, pp. 31-47, Feb. 1991.
[21] A. C. Sanderson, L. E. Weiss, and C. P. Neuman, "Dynamic sensor-based control of robots with visual feedback," IEEE Trans. Robot. Automat., vol. RA-3, pp. 404-417, Oct. 1987.
[22] M. Lei and B. K. Ghosh, "Visually-guided robotic motion tracking," in Proc. Thirtieth Annu. Allerton Conf. on Communication, Control, and Computing, 1992, pp. 712-721.
[23] R. L. Andersson, A Robot Ping-Pong Player: Experiment in Real-Time Intelligent Control. Cambridge, MA: MIT Press, 1988.
[24] B. Yoshimi and P. K. Allen, "Active, uncalibrated visual servoing," in Proc. IEEE Int. Conf. on Robotics and Automation, San Diego, CA, May 1994, pp. 156-161.
[25] B. Nelson and P. K. Khosla, "Integrating sensor placement and visual tracking strategies," in Proc. IEEE Int. Conf. on Robotics and Automation, 1994, pp. 1351-1356.
[26] I. E. Sutherland, "Three-dimensional data input by tablet," Proc. IEEE, vol. 62, pp. 453-461, Apr. 1974.
[27] R. Tsai and R. Lenz, "A new technique for fully autonomous and efficient 3D robotics hand/eye calibration," IEEE Trans. Robot. Automat., vol. 5, pp. 345-358, June 1989.
[28] R. Tsai, "A versatile camera calibration technique for high accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE Trans. Robot. Automat., vol. RA-3, pp. 323-344, Aug. 1987.
[29] P. I. Corke, High-Performance Visual Closed-Loop Robot Control. Ph.D. dissertation, University of Melbourne, Dept. Mechanical and Manufacturing Engineering, July 1994.
[30] D. E. Whitney, "The mathematics of coordinated control of prosthetic arms and manipulators," J. Dyn. Syst., Meas., Control, vol. 122, pp. 303-309, Dec. 1972.
[31] S. Chiaverini, L. Sciavicco, and B. Siciliano, "Control of robotic systems through singularities," in Proc. Int. Workshop on Nonlinear and Adaptive Control: Issues in Robotics, C. C. de Wit, Ed. Berlin: Springer-Verlag, 1991.
[32] S. Wijesoma, D. Wolfe, and R. Richards, "Eye-to-hand coordination for vision-guided robot control applications," Int. J. Robot. Res., vol. 12, no. 1, pp. 65-78, 1993.
[33] G. D. Hager, W.-C. Chang, and A. S. Morse, "Robot hand-eye coordination based on stereo vision," IEEE Control Syst. Mag., vol. 15, pp. 30-39, Feb. 1995.
[34] C. Samson, M. Le Borgne, and B. Espiau, Robot Control: The Task Function Approach. Oxford, England: Clarendon, 1992.
[35] G. Franklin, J. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems. Boston, MA: Addison-Wesley, 2nd ed., 1991.
[36] P. K. Allen, A. Timcenko, B. Yoshimi, and P. Michelman, "Automated tracking and grasping of a moving object with a robotic hand-eye system," IEEE Trans. Robot. Automat., vol. 9, no. 2, pp. 152-165, 1993.
[37] A. Rizzi and D. Koditschek, "An active visual estimator for dexterous manipulation," in Proc. IEEE Int. Conf. on Robotics and Automation, 1994.
[38] T. S. Huang and A. N. Netravali, "Motion and structure from feature correspondences: A review," Proc. IEEE, vol. 82, no. 2, pp. 252-268, 1994.
[39] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, pp. 381-395, June 1981.
[40] R. M. Haralick, C. Lee, K. Ottenberg, and M. Nolle, "Analysis and solutions of the three point perspective pose estimation problem," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1991, pp. 592-598.
[41] D. DeMenthon and L. S. Davis, "Exact and approximate solutions of the perspective-three-point problem," IEEE Trans. Pattern Anal. Machine Intell., no. 11, pp. 1100-1105, 1992.
[42] R. Horaud, B. Conio, and O. Leboulleux, "An analytic solution for the perspective 4-point problem," Comput. Vision, Graphics, Image Process., no. 1, pp. 33-44, 1989.
[43] M. Dhome, M. Richetin, J. Lapresté, and G. Rives, "Determination of the attitude of 3-D objects from a single perspective view," IEEE Trans. Pattern Anal. Machine Intell., no. 12, pp. 1265-1278, 1989.
[44] G. H. Rosenfield, "The problem of exterior orientation in photogrammetry," Photogrammetric Eng., pp. 536-553, 1959.
[45] D. G. Lowe, "Fitting parametrized three-dimensional models to images," IEEE Trans. Pattern Anal. Machine Intell., no. 5, pp. 441-450, 1991.
[46] R. Goldberg, "Constrained pose refinement of parametric objects," Int. J. Comput. Vision, no. 2, pp. 181-211, 1994.
[47] R. Kumar, "Robust methods for estimating pose and a sensitivity analysis," CVGIP: Image Understanding, no. 3, pp. 313-342, 1994.
[48] S. Ganapathy, "Decomposition of transformation matrices for robot vision," Pattern Recog. Lett., pp. 401-412, 1989.
[49] M. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting and automatic cartography," Commun. ACM, no. 6, pp. 381-395, 1981.
[50] Y. Liu, T. S. Huang, and O. D. Faugeras, "Determination of camera location from 2-D to 3-D line and point correspondences," IEEE Trans. Pattern Anal. Machine Intell., no. 1, pp. 28-37, 1990.
[51] C. Lu, E. J. Mjolsness, and G. D. Hager, "Online computation of exterior orientation with application to hand-eye calibration," DCS RR-1046, Yale University, New Haven, CT, Aug. 1994; to appear in Mathematical and Computer Modeling.
[52] A. Gelb, Ed., Applied Optimal Estimation. Cambridge, MA: MIT Press, 1974.
[53] W. Wilson, "Visual servo control of robots using Kalman filter estimates of robot pose relative to work-pieces," in Visual Servoing, K. Hashimoto, Ed. Singapore: World Scientific, 1994, pp. 71-104.
[54] C. Fagerer, D. Dickmanns, and E. Dickmanns, "Visual grasping with long delay time of a free floating object in orbit," Auton. Robots, vol. 1, no. 1, 1994.
[55] J. Pretlove and G. Parker, "The development of a real-time stereo-vision system to aid robot guidance in carrying out a typical manufacturing task," in Proc. 22nd ISRR, Detroit, MI, 1991, pp. 21.1-21.23.
[56] B. K. P. Horn, H. M. Hilden, and S. Negahdaripour, "Closed-form solution of absolute orientation using orthonormal matrices," J. Opt. Soc. Amer., vol. A-5, pp. 1127-1135, 1988.
[57] K. S. Arun, T. S. Huang, and S. D. Blostein, "Least-squares fitting of two 3-D point sets," IEEE Trans. Pattern Anal. Machine Intell., vol. 9, pp. 698-700, 1987.
[58] B. K. P. Horn, "Closed-form solution of absolute orientation using unit quaternions," J. Opt. Soc. Amer., vol. A-4, pp. 629-642, 1987.
[59] G. D. Hager, G. Grunwald, and G. Hirzinger, "Feature-based visual servoing and its application to telerobotics," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Jan. 1994, pp. 164-171.
[60] G. Agin, "Calibration and use of a light stripe range sensor mounted on the hand of a robot," in Proc. IEEE Int. Conf. on Robotics and Automation, 1985, pp. 680-685.
[61] S. Venkatesan and C. Archibald, "Realtime tracking in five degrees of freedom using two wrist-mounted laser range finders," in Proc. IEEE Int. Conf. on Robotics and Automation, 1990.
[62] J. Dietrich, G. Hirzinger, B. Gombert, et al., "A concept for a new generation of light-weight robots," in Experimental Robotics I, V. Hayward and O. Khatib, Eds. Berlin, Germany: Springer-Verlag, 1989, pp. 287-295 (vol. 139 of Lecture Notes in Control and Information Sciences).
[63] J. Aloimonos and D. P. Tsakiris, "On the mathematics of visual tracking," Image and Vision Computing, vol. 9, pp. 235-251, Aug. 1991.
[64] R. M. Haralick and L. G. Shapiro, Computer and Robot Vision. Reading, MA: Addison-Wesley, 1993.
[65] F. W. Warner, Foundations of Differentiable Manifolds and Lie Groups. New York: Springer-Verlag, 1983.
[66] W. Jang and Z. Bien, "Feature-based visual servoing of an eye-in-hand robot with improved tracking performance," in Proc. IEEE Int. Conf. on Robotics and Automation, 1991, pp. 2254-2260.
[67] O. Faugeras, Three-Dimensional Computer Vision. Cambridge, MA: MIT Press, 1993.
[68] G. D. Hager, "Calibration-free visual control using projective invariance," in Proc. ICCV, 1995, pp. 1009-1015. Also available as Yale CS-RR-1046.
[69] D. Kim, A. Rizzi, G. D. Hager, and D. Koditschek, "A 'robust' convergent visual servoing system," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 1995, vol. I, pp. 348-353.
[70] R. L. Anderson, "Dynamic sensing in a ping-pong playing robot," IEEE Trans. Robot. Automat., vol. 5, no. 6, pp. 723-739, 1989.

Seth Hutchinson (S'85-M'88), for a photograph and biography, see p. 650 of this issue.

Gregory D. Hager (S'85-M'88), for a photograph and biography, see p. 650 of this issue.