Roboscan: A Combined 2D and 3D Vision System For Improved Speed and Flexibility in Pick-And-Place Operation
DOI 10.1007/s00170-013-5138-z
ORIGINAL ARTICLE
Received: 31 December 2012 / Accepted: 17 June 2013 / Published online: 11 July 2013
© Springer-Verlag London 2013
principle, reduces the amount of activity required to properly position the items before the manipulation area.

Vision systems can be broadly subdivided into two major categories: 2D vision and 3D vision (today, 4D vision systems—where the fourth "dimension" can be a spectral property, colour, or any other measurable property—are also investigated, but their application to Robotics is still premature). 2D vision makes use of a single camera (with either a line or matrix organization). At present, 2D vision has reached a considerable degree of maturity in manufacturing and product inspection, and its application to Robots to monitor the scene and to allow the manipulator to adapt to varying scenarios is state of the art.

3D vision is accomplished by using either two cameras "seeing" the scene from two different angles (passive stereo vision), or a single camera with a special "information-carrying illuminator", in general a laser stripe or a set of fringes, which illuminates the scene at an angle with respect to that of the camera (active stereo vision) [5–9]. All 3D systems share, in relation to their use in Robotic picking and placing, the ability to give the Robot depth information that is obviously absent in 2D systems.

In general, handling 2D data sets (organized in a bi-dimensional data matrix) is straightforward, and the extraction of information about pose and orientation from the matrix is rather easy: a number of libraries for edge extraction, template matching, etc. are available in combination with standard cameras. "Intelligent" cameras (equipped with on-board processing capabilities) are able to perform a large number of operations in "quasi" real time [10].

With 3D systems, the elaboration of the data is more complex: the data set is the so-called three-dimensional point cloud, and segmentation of the object from the scene, object contouring, pose estimation and template matching (e.g. with respect to a CAD description of the object) can be time-consuming. 3D libraries to perform these operations are also available, and a number of novel techniques, including neuro-fuzzy functions, have been proposed. However, 3D vision techniques alone do not allow fast Robot pick-and-place operations, due to the time-consuming information retrieval [11–16].

A simple but very effective alternative solution, which could combine the ease and speed of operation of 2D systems with the completeness of information of 3D ones, is a suitable combination of the two techniques. A 3D system inherently includes a 2D system: in fact, it is equipped with at least one camera. If this camera is suitably oriented with respect to the object to be sampled, it can very effectively be used alone, to build an additional 2D system together with its libraries. A number of operations can be performed by the 2D system, and the results can be passed to the 3D system to facilitate the information retrieval from the point cloud. This, in general, also increases the measurement accuracy, since present 2D elaboration techniques are more accurate than 3D ones. Therefore, "tandem" 2D and 3D operation of the vision system could be beneficial in terms of the general efficacy of a pick-and-place operation.

To prove the above concept, in this paper we describe Roboscan, a Robot-guiding vision system that combines 2D and 3D vision techniques and integrates them with an anthropomorphic manipulator, for the optimization of pick-and-place operations. We demonstrate that the use of 2D vision to perform information retrieval greatly simplifies the extraction of some features of the scene that would be difficult to treat with 3D techniques alone. The system is composed of a 3D laser slit sensor mounted on the robot arm. The measurement is based on optical triangulation: the 3D information about the scene is acquired by scanning the working area [6]. Segmentation of objects in the 3D point cloud is performed by means of suitable 2D information, collected by the camera used in a "stand-alone" mode. Suitable geometrical pattern matching is applied to the 2D scene, to count the objects and to estimate their positions. This information is exploited to identify each object in the scene in a very flexible way: the robot is able to pick up objects randomly placed in the scene, without particular constraints on the typology of the objects.

In this application, the LabVIEW platform has been used to develop the whole software architecture, using the "Vision Development Module" for the image processing algorithms and the "LabVIEW Robotics Library for DENSO" for the robot motion [17].

The system has been tested to evaluate its effectiveness and flexibility, with special attention to the measurement performance of the vision system and to the accuracy of object segmentation. To this aim, special scenarios have been prepared to evaluate the system's capability to interact with real situations, using some common objects to simulate pick-and-place operations.

The paper is organized as follows. In Section 2, the Roboscan system is presented. In Section 3, the system workflow is detailed, together with the basic features of both the 2D and 3D vision procedures. Section 4 shows the experimental results.

2 Description of the system components

The Roboscan system is presented in Fig. 1. It is composed of two subsystems: the Robot Subsystem and the Vision Subsystem.

The Robot Subsystem is composed of an anthropomorphic, 6-DOF manipulator (DENSO VP-6556G). The end effector is a pneumatic vacuum gripper, using a VSN1420 tool equipped with a silicone bellows vacuum cup (GIMATIC S.p.A., Italy). The robot is cabled to a DENSO RC7M controller,
The contact tool shown in Fig. 3 is used to calibrate the robot. By means of the teach pendant, it is moved to the centre of the framed marker, then to the centre of one marker along the row and to the centre of one marker along the column that intersect at point O on the master. The corresponding positions are learned and define origin O and axes X and Y, respectively. Z is taken perpendicular to plane XY and oriented as shown in the figure. This procedure is simple and very fast, and the positioning precision of the contact tool is not critical.
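In practice, the three taught positions define GRS through a simple orthonormalization step. The following minimal sketch (Python/NumPy; the function and variable names are ours, not part of the Roboscan software) shows one way to build the corresponding homogeneous transform:

```python
import numpy as np

def frame_from_taught_points(p_origin, p_on_x, p_on_y):
    """Build the GRS pose (a 4x4 homogeneous matrix in robot base coordinates)
    from the three taught positions: origin O, one marker along the row (X)
    and one marker along the column (Y)."""
    o, px, py = (np.asarray(p, float) for p in (p_origin, p_on_x, p_on_y))
    x = px - o
    x /= np.linalg.norm(x)            # unit X axis
    z = np.cross(x, py - o)
    z /= np.linalg.norm(z)            # Z taken perpendicular to plane XY
    y = np.cross(z, x)                # re-orthogonalized Y axis
    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2], T[:3, 3] = x, y, z, o
    return T
```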
The calibration of the 3DLS sensor is aimed at estimating the pose and the orientation of the optical pair (i.e. camera M_CAM and laser LSR) with respect to GRS. To this aim, 3DLS is oriented as shown in Fig. 3, with LSR perpendicular to the calibration plate and M_CAM at 55° with respect to it. The system baseline is at Z0 = 350 mm from the plate. Firstly, two sets of images are acquired: the former is obtained by moving the sensor along −Z from Z0 in steps of 5 mm, N = 10 times, and by acquiring the image of the calibration plate at each height Zh (h = 1,…,N).
The latter results from the acquisition of the laser blade projected on the calibration plate in correspondence with each Zh position. Hence, the whole range along Z is equal to 45 mm. An example of the images from the two sets is shown in Fig. 4.
Secondly, M_CAM is calibrated. The well-known pinhole camera model has been chosen to estimate the camera parameters. These are three rotation parameters, three translation parameters, the focal length, one scale factor and the image coordinates of the central image point [19]. We chose this model as the most appropriate in this application for its simplicity and accuracy. It describes the camera behaviour by means of a system of two linear equations (the so-called perspective equations), which combine the 11 unknown camera parameters with the pixel coordinates of each imaged point. The unknowns are calculated by over-determining the equation system. This is done, on one side, by feeding it with the centroid coordinates from the images in the first set as the known terms and, on the other side, by using the a priori known coordinates of the corresponding markers on the calibration plate as the coefficients of the unknown camera parameters. A simple maximum likelihood algorithm solves for the unknowns.
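The paper does not detail the solver; one standard way to assemble and solve the over-determined perspective equations is the direct linear transform sketched below (Python/NumPy, with the scale fixed by setting the last projection entry to 1 so that exactly 11 unknowns remain; the array names are illustrative, not taken from the paper):

```python
import numpy as np

def calibrate_pinhole(uv, xyz):
    """Linear estimate of the 11 pinhole parameters.

    uv  : (M, 2) image centroids of the plate markers (pixels)
    xyz : (M, 3) corresponding marker coordinates in GRS (mm)
    Returns the 3x4 projection matrix with its last entry fixed to 1.
    """
    A, b = [], []
    for (u, v), (X, Y, Z) in zip(uv, xyz):
        # two perspective equations per point, linear in the 11 unknowns;
        # the measured centroids u, v appear as the known terms
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z]); b.append(u)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z]); b.append(v)
    p, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(p, 1.0).reshape(3, 4)
```

With ten images of the whole marker grid, the system is heavily over-determined, which is what makes a simple least-squares solve adequate here.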
The last step is the calibration of laser LSR. It basically consists in the estimation of the coefficients of the equation of the plane of light projected by the device. To this aim, the laser blades acquired in the second image set are elaborated to map the intensity values into image pixel coordinates, forming the so-called signal of Centres of Gravity (signal CG) [19]. These values are then used in the camera perspective equations, which are solved for coordinates X and Y, with Z equal to the values Zh of the images. The resulting point cloud represents the light plane in GRS: it is fitted to an equation of the following type:

z = ax + by + c          (1)

In Eq. 1, coefficients a, b and c express the orientation of LSR in GRS. The result of the calibration is that the 3DLS sensor is able to measure the coordinates of each point illuminated by the laser blade in the X, Y, Z reference.
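Fitting the light plane of Eq. 1 to the triangulated blade points is a linear least-squares problem; a minimal sketch (assuming the points are already expressed in GRS) is:

```python
import numpy as np

def fit_light_plane(points):
    """Least-squares fit of z = a*x + b*y + c (Eq. 1) to the
    triangulated laser-blade points, given as an (N, 3) array in GRS."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x, y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
    return a, b, c
```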
Although the whole process may seem a little complex, it is not: in the experimental practice, it only requires the acquisition of the two sets of images, which is automatically carried out by the robot in a very short time.
3.3 3D scanning
In this task, the robot scans the ROS area along X. Figure 6a shows an example of the deformation induced by the object shape on the laser blade. The pattern is acquired by M_CAM

Fig. 6 3D scanning procedure. a Deformation of the laser blade acquired by M_CAM and signal of the centres of gravity; b resulting 3D point cloud
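The per-column centre-of-gravity extraction that turns the deformed blade of Fig. 6a into the signal CG can be sketched as follows (Python/NumPy; the background threshold is an assumption, not a value taken from the paper):

```python
import numpy as np

def centres_of_gravity(img, threshold=20):
    """Sub-pixel stripe position for every image column.

    img: 2D grey-level image of the deformed laser blade (rows = v, cols = u).
    Returns cg[u], the intensity-weighted row coordinate of the stripe in
    column u (NaN where no stripe is visible).
    """
    img = img.astype(float)
    img[img < threshold] = 0.0                 # suppress background pixels
    rows = np.arange(img.shape[0])[:, None]
    weight = img.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        cg = (img * rows).sum(axis=0) / weight # intensity-weighted centroid
    cg[weight == 0] = np.nan
    return cg
```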
templates in acquired images [21]. The technique is well known: it is based on the convolution between a template and the image. The template must retain information about the geometry (shape) of the searched element. Every time the template is found in the image, the corresponding element is framed by a bounding box. The coordinates of its centre point, the scale factor, the dimensions of the box and information about its orientation are also provided.
Geometric template matching has been preferred to grey-level template matching, to decrease the dependence of the detection on both the colour of the objects and the environmental illumination. To further increase the robustness of the geometric matching, the templates have been defined by means of a suitably developed image processing procedure. The procedure is based on (1) brightness and contrast adjustment and (2) Laplacian edge detection. Task (1) is carried out to maximize the grey-level dynamics. Task (2) is aimed at retaining only the object contours. A FIR high-pass filter is implemented by convolving the image with the following bi-dimensional kernel:
Laplacian = [ −1  −1  −1  −1  −1  −1  −1
              −1  −1  −1  −1  −1  −1  −1
              −1  −1  −1  −1  −1  −1  −1
              −1  −1  −1  48  −1  −1  −1          (3)
              −1  −1  −1  −1  −1  −1  −1
              −1  −1  −1  −1  −1  −1  −1
              −1  −1  −1  −1  −1  −1  −1 ]
The kernel coefficients are chosen so that image regions presenting both constant and slow-to-medium variations of the grey levels are set to zero, and only the boundaries corresponding to the contours of the objects are retained. To prevent image noise amplification, a FIR low-pass filter is applied beforehand. It is based on the following kernel:
Gaussian = [ 1   2   4   2   1
             2   4   8   4   2
             4   8  16   8   4          (4)
             2   4   8   4   2
             1   2   4   2   1 ]
The kernel coefficients have been determined by sampling the Gaussian bi-dimensional impulse response, to minimize the Gibbs phenomenon [22]. Dedicated VIs (LabVIEW Virtual Instruments), belonging to the IMAQ Vision Development module, have been used to implement the described procedure [17]: the IMAQ BCGLookup.VI for Task (1), and the IMAQ Convolute.VI for the low-pass and Laplacian convolutions of Task (2).
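For readers without access to the IMAQ VIs, the same template-definition pipeline (contrast stretching, low-pass smoothing with the kernel of Eq. 4, then high-pass edge extraction with the kernel of Eq. 3) can be sketched with NumPy/SciPy; the Gaussian normalization and the border handling are our assumptions:

```python
import numpy as np
from scipy.ndimage import convolve

# 7x7 Laplacian high-pass kernel of Eq. 3 (all -1, centre 48: zero-sum)
lap = -np.ones((7, 7))
lap[3, 3] = 48.0

# 5x5 sampled-Gaussian low-pass kernel of Eq. 4
gauss = np.array([[1, 2, 4, 2, 1],
                  [2, 4, 8, 4, 2],
                  [4, 8, 16, 8, 4],
                  [2, 4, 8, 4, 2],
                  [1, 2, 4, 2, 1]], float)
gauss /= gauss.sum()                      # keep the mean grey level (assumption)

def template_edges(img, low=0.0, high=255.0):
    """Contrast-stretch the template image, smooth it, then keep only the
    Laplacian edge response used to define the geometric template."""
    img = img.astype(float)
    stretched = (img - img.min()) / max(np.ptp(img), 1e-9) * (high - low) + low
    smoothed = convolve(stretched, gauss, mode="nearest")
    return convolve(smoothed, lap, mode="nearest")
```

Because the kernel of Eq. 3 sums to zero, constant and slowly varying grey-level regions are mapped to zero and only the contours survive, as stated above.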
As an example, Fig. 7a shows the effect of this processing: the template obtained from the object at left is defined only by the object contour, and is not influenced by the surface reflectance. Figure 7b shows the performance of GTM on the image in Fig. 5a: the algorithm is able to detect all the objects, except for those labelled 12 and 13, since they are almost completely occluded. It is worth noting that objects 6, 9 and 11 are also detected, despite the fact that they are not in the foreground. In the figure, the centre of each box is denoted by point C. The coordinates of each centre are also visible. In the following, they will be denoted by Xc, Yc.
3.5 3D cloud segmentation

The segmentation of the whole 3D cloud is performed in two steps. In the former, the order of the segmentation is set in a segmentation list: objects are sorted by decreasing values of the scale factor parameter. This operation allows us to segment fully imaged objects first (i.e. objects that are not occluded), and thereafter increasingly occluded objects. In the latter step, sub-clouds are extracted from the whole point cloud (segmentation). This operation is performed starting from the first element in the segmentation list: the grid of X, Y coordinates that corresponds to this element is easily calculated from both the coordinates (Xc, Yc) of point C and the dimensions of the bounding box. Then, the points in the whole 3D point cloud whose X, Y coordinates correspond to those in the grid are saved in a new file and eliminated from the original cloud. This process loops until the segmentation list is empty. As a result, a number of sub-clouds are obtained, corresponding to the detected elements in the scene. The original point cloud becomes smaller at each loop and retains only the 3D points of unrecognized objects: these will be segmented afterwards, when a new, inherently smaller point cloud is acquired (see Section 3).
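A compact sketch of this two-step segmentation, assuming axis-aligned bounding boxes and a plain list of detections coming from the 2D matching step (the dictionary keys are hypothetical), could be:

```python
import numpy as np

def segment_cloud(cloud, detections):
    """Two-step segmentation sketch.

    cloud: (N, 3) array of X, Y, Z points from the scan.
    detections: list of dicts from the 2D matching step, e.g.
        {"centre": (Xc, Yc), "size": (w, h), "scale": s}.
    Returns one sub-cloud per detection, in segmentation/picking order.
    """
    # Step 1: segmentation list ordered by decreasing scale factor
    order = sorted(detections, key=lambda d: d["scale"], reverse=True)

    sub_clouds = []
    remaining = cloud.copy()
    for det in order:
        (xc, yc), (w, h) = det["centre"], det["size"]
        # Step 2: points whose X, Y fall inside the bounding box around C
        inside = ((np.abs(remaining[:, 0] - xc) <= w / 2) &
                  (np.abs(remaining[:, 1] - yc) <= h / 2))
        sub_clouds.append(remaining[inside])
        remaining = remaining[~inside]      # shrink the original cloud
    return sub_clouds, remaining            # leftovers = unrecognized objects
```

The leftover points correspond to the occluded objects that are re-scanned and segmented in a later cycle, as described above.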
3.6 3D object description

The aim of this procedure is to provide information about the approaching direction and the final position that the robot end effector must have to correctly pick objects up. Firstly, for each sub-cloud, a surface fitting algorithm is run, in order to find the equation of the surface that best represents the object. The system is able to handle objects with different types of surface (planar, spherical, cylindrical or pyramidal), by means of a dedicated procedure that fits the following surface:

z = f(x, y) = Dx² + Ey² + Fx + Gy + Hxy + J          (5)

where parameters D, E, F, G, H and J are the unknowns. The Levenberg–Marquardt algorithm [23] is used to iteratively estimate these parameters in a maximum likelihood sense.
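A minimal sketch of the fit of Eq. 5, using the Levenberg–Marquardt option of SciPy in place of the authors' LabVIEW implementation, is:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_surface(sub_cloud):
    """Fit z = D*x^2 + E*y^2 + F*x + G*y + H*x*y + J (Eq. 5) to one sub-cloud.

    sub_cloud: (N, 3) points of a single segmented object.
    Returns the parameter vector [D, E, F, G, H, J].
    """
    x, y, z = sub_cloud[:, 0], sub_cloud[:, 1], sub_cloud[:, 2]

    def residuals(p):
        D, E, F, G, H, J = p
        return D*x**2 + E*y**2 + F*x + G*y + H*x*y + J - z

    # Levenberg-Marquardt, as in the paper; the zero start point is an assumption
    return least_squares(residuals, x0=np.zeros(6), method="lm").x
```

Since Eq. 5 is linear in its parameters, an ordinary linear least-squares solve would reach the same optimum; the iterative solver simply mirrors the description in the text.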
Secondly, the equation of the plane tangent to the surface at point C(Xc, Yc) is calculated, and the vector normal to the plane is determined. The vector orientation is defined in GRS by using the cosine values of angles α, β and γ, as shown in Fig. 8. These are the director cosines that allow the end effector to correctly approach the object.

Finally, coordinate Zc along Z of the fitted surface is calculated by solving Eq. 5 for X = Xc and Y = Yc. Value Zc tells the robot the final position of the end effector. Figure 9 shows the performance of this procedure on the point cloud in Fig. 6b. Objects have different colours, indicating that they have been segmented, and red arrows graphically represent the normal vectors. Table 1 lists, for each object, the values of the origin coordinates (Xc, Yc, Zc) and the corresponding values of angles α, β and γ.

Object picking is carried out following the order set in the segmentation list. In this way, collisions are avoided.
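From the fitted parameters, the tangent plane at C, its unit normal (and hence the director cosines) and coordinate Zc follow directly from Eq. 5 and its partial derivatives; a sketch:

```python
import numpy as np

def approach_info(params, xc, yc):
    """Tangent-plane normal, director cosines and Zc at point C(Xc, Yc)
    for the fitted surface of Eq. 5."""
    D, E, F, G, H, J = params
    zc = D*xc**2 + E*yc**2 + F*xc + G*yc + H*xc*yc + J   # Eq. 5 evaluated at C
    # partial derivatives of f at C define the tangent plane
    fx = 2*D*xc + F + H*yc
    fy = 2*E*yc + G + H*xc
    n = np.array([-fx, -fy, 1.0])
    n /= np.linalg.norm(n)                 # unit normal in GRS
    alpha, beta, gamma = np.degrees(np.arccos(n))
    return zc, n, (alpha, beta, gamma)
```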
4 Experimental results
3DLS sensor. Then, specific tests have been performed to assess the robustness of the vision algorithms, as well as their

area could look like. There are three yoghurt jars, four plastic disks and four soap bars. Objects are tilted, placed upside-down and overlapped. In addition, most of them are adjacent to each other.

In order to handle such a situation, the templates in Fig. 12b have been defined. Yoghurt jars are detected by means of two templates (i.e. standing and tilted orientations), while disks and soap bars each require the definition of a single template.

Figure 12c shows how the elements in the scene are detected by GTM. Objects from '1' to '10' are framed by their bounding boxes, the coordinates of their centres are measured and their orientation is correctly detected. The only object that has not been detected is the one labelled '11' in Fig. 12a: this is by no means surprising, since this element is almost completely occluded by disk '10'.

Another interesting example of this class of tests deals with the scene in Fig. 11a. The templates defined in this situation are shown in Fig. 13a, while the performance of GTM is presented in Fig. 13b. Here, although object '5' is partially occluded by object '4', the matching is positive, since most of the object shape is well visible in the image.

Fig. 14 3D analysis of the scene. a 3D point cloud of the scene in Fig. 12a; b visualization of both tangent planes and normal vectors
limitation, since the video camera is a very low-cost device, which could be replaced by a faster model at a very reasonable cost.

4.5 Director cosines accuracy evaluation

The evaluation of the accuracy in the determination of angles α, β and γ has been thought mandatory, in order to assess the quality of the procedure described in Section 3.6. It has been carried out by comparing the values estimated by our procedure with those calculated by the PolyWorks IMEdit software (InnovMetric, Ottawa, Canada), commercial software specifically designed for creating and elaborating 3D point clouds for CAD, rapid prototyping and gauging applications [24]. Among the tools available in this software, we chose the fitting plane procedure, which is designed to estimate the plane tangent to the 3D input cloud at a specific point: this point, hereafter called point C', is manually selected over the point cloud by the operator. The fitting plane tool outputs the director cosines of angles α', β' and γ' of the vector normal to the fitted plane. To provide a direct comparison between the values of the angles output by the PolyWorks software and those estimated by our procedure, we paid attention to avoiding errors that could originate from inaccuracy in establishing the correspondence between point C' in the PolyWorks cloud and point C in our segmented point cloud. In practice, even a small inaccuracy in the location of point C' would result in markedly different tangent planes in the two point clouds, especially in correspondence with high local surface curvatures. In such cases, the evaluation of the differences (α−α'), (β−β') and (γ−γ') would become meaningless.
To overcome this problem, we set up a scene using eight planar objects like those in Fig. 5a, each one tilted by a certain angle. The scene has been acquired by the 3DLS device: the corresponding 3D point cloud is shown in Fig. 16a.
This cloud was segmented, and angles α, β and γ were estimated for each single sub-cloud using the procedure in Section 3.6. Then, we imported the sub-clouds into the PolyWorks environment and applied the fitting plane tool to them: selection of point C' was a minor concern, since each surface was a plane in itself. Figure 16b presents the sub-clouds in the PolyWorks reference system: fitted planes and their normal vectors are overlaid onto each point cloud. Corresponding elements in Fig. 16a and Fig. 16b are identified by the same label.

Table 2 shows the results. The first column identifies the objects. Angles α, β and γ are shown in the even columns; the corresponding values α', β' and γ' are listed in the odd columns. The errors (Δα = α−α'), (Δβ = β−β') and (Δγ = γ−γ') are plotted in Fig. 16c: the mean absolute values of Δα, Δβ and Δγ are 0.295°, 0.193° and 0.388°, respectively. These values can be considered negligible as far as the pick-and-place application is concerned.
5 Conclusions

In this work, we presented an alternative solution to fully 3D procedures, with the aim of obtaining effective approaches to pick-and-place applications while keeping the complexity at a low level. In particular, the interaction between 2D and 3D image processing tasks has been studied and characterized to assess the feasibility of exploiting the system in real industrial environments.

The method shows good performance in the presence of objects characterized by simple shapes, like planes, cylinders, cones and spheres, and by free-form shapes with smooth local slopes. A great deal of work has been focused on allowing the system to handle object occlusions and to correctly estimate the pose of objects presenting different shapes, colours and textures in the work area. The use of a video camera in the visible range and of red laser illumination limits the elaboration to non-transparent objects.

We are aware that this system cannot be considered a final solution for bin-picking situations, but we also think that it is a quite simple and smart way to handle a rather large variety of situations. In order to appreciate the work performed by Roboscan, a video has been prepared; it can be downloaded from the website of our laboratory.
Acknowledgments The authors are grateful to Dr. Yosuke Sawada and to Mr. Gabriele Coffetti for their continuous support during the development of this project.

References

1. Brogardh T (2007) Present and future robot control development—an industrial perspective. Annu Rev Control 31:69–79
2. Xiong Y, Quek F (2002) Machine vision for 3D mechanical part recognition in intelligent manufacturing environments. In: Proceedings of the 3rd International Workshop on Robot Motion and Control (RoMoCo'02), pp 441–446
3. Sakakibara S (2006) The robot cell as a re-configurable machining system. In: Dashchenko AI (ed) Reconfigurable manufacturing systems and transformable factories. Springer, Berlin, pp 259–272
4. Tudorie CR (2010) Different approaches in feeding of a flexible manufacturing cell. In: Simulation, modeling, and programming for autonomous robots. Springer-Verlag, Heidelberg, pp 509–520
5. Steger C, Ulrich M, Wiedemann C (2008) Machine vision algorithms and applications. Wiley, Weinheim
6. Blais F (2004) A review of 20 years of range sensor development. J Electron Imaging 13(1):231–240
7. Sumi Y, Kawai Y, Yoshimi T, Tomita F (2002) 3D object recognition in cluttered environments by segment-based stereo vision. Int J Comput Vision 46(1):5–23
8. Rossi NV, Savino C (2010) A new real-time shape acquisition with a laser scanner: first test results. Robot Comput Integr Manuf 26:543–550
9. Rahayem M, Kjellander JAP (2011) Quadric segmentation and fitting of data captured by a laser profile scanner mounted on an industrial robot. Int J Adv Manuf Technol 52:155–169
10. Parker JR (2010) Algorithms for image processing and computer vision. Wiley, New York
11. Aqsense SAL3D, http://www.aqsense.com/products/sal3d. Accessed 1 July 2013
12. Rusu RB, Cousins S (2011) 3D is here: Point Cloud Library (PCL). Proceedings of the IEEE International Conference on Robotics and Automation 2011:305–309
13. MVTec Software GmbH (2009) Halcon—the power of machine vision: HDevelop user's guide. München, pp 185–188
14. Zhao D, Li S (2005) A 3D image processing method for manufacturing process automation. Comput Ind 56:975–985
15. Biegelbauer G, Vincze M, Wohlkinger W (2010) Model-based 3D object detection: efficient approach using superquadrics. Mach Vision Appl 21:497–516
16. Richtsfeld M, Vincze M (2009) Robotic grasping of unknown objects. In: Rodić AD (ed) Contemporary robotics—challenges and solutions. InTech. ISBN 978-953-307-038-4. doi:10.5772/7805. Available from: http://www.intechopen.com/books/contemporary-robotics-challenges-and-solutions/robotic-grasping-of-unknown-objects. Accessed 1 July 2013
17. Klinger T (2003) Image processing with LabVIEW and IMAQ Vision. Prentice-Hall, USA
18. Gruen A, Huang TS (2001) Calibration and orientation of cameras in computer vision. Springer, Berlin
19. Sansoni G, Bellandi P, Docchio F (2011) Design and development of a 3D system for the measurement of tube eccentricity. Meas Sci Technol 22:075302. doi:10.1088/0957-0233/22/7/075302
20. Moeslund TB (2012) Introduction to video and image processing. Springer, London
21. Sibiryakov A et al (2008) Statistical template matching under geometric transformations. In: Coeurjolly D (ed) Discrete geometry for computer imagery. Springer, Berlin, pp 225–237
22. Trucco E, Verri A (1998) Introductory techniques for 3-D computer vision. Prentice-Hall. ISBN 0-13-261108-2
23. Marquardt D (1963) An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math 11:431–441
24. InnovMetric Software (2011) PolyWorks modeler & inspector—user's guide. Ste-Foy, Québec