
2004 IEEE International Conference on Systems, Man and Cybernetics

A New Approach Using The Viterbi Algorithm In Stereo Correspondence Problem *


Tran T. Son and Seiichi Mita
Department of Electronics and Information, Toyota Institute of Technology, Nagoya, Japan
ttson,smita@toyota-ti.ac.jp

Abstract - This paper presents an approach that uses the Viterbi algorithm in a stereo correspondence problem. We propose a matching process which is visualized as a trellis diagram to find the maximum a posteriori result. The matching process is divided into two parts: matching the left scene to the right scene and matching the right scene to the left scene. The result of the stereo problem is selected by a comparison between the results of the two matching processes. This makes stereo matching possible without explicitly detecting occlusions. Moreover, this stereo matching algorithm can improve the accuracy of the disparity image, and it has an acceptable running time for practical applications since it uses a trellis diagram iteratively and bi-directionally. The complexity of our proposed method is approximately $O(6 \times N \times P)$, in which $N$ is the number of disparities and $P$ is the length of the epipolar line in both the left and right images. Our proposed method has proved robust when applied to several samples of stereo images such as random dot and Tsukuba images. It provides a 94.5 percent accuracy for the Tsukuba images.

Keywords: the Viterbi algorithm, stereo correspondence image, occlusion region, disparity, trellis diagram, epipolar line.

1 Introduction
Stereo correspondence is one of the fundamental and classical problems in computer vision. It plays an important role in many fields such as optical flow detection and image segmentation for object-based video coding, and it is also a key factor in virtual image synthesis. The stereo correspondence problem consists of detecting the disparity between two corresponding pixels positioned in the same row of stereo images. In most papers, it has been assumed that the correspondence between two pixels in the left and right images is unique. However, there are some regions in stereo images where we cannot find uniquely corresponding points between a pair of stereo images. Those regions are called occlusion regions, and a pixel of an occlusion region is called an occlusion point. The existence of occlusion regions causes disparity between the right and left images and also causes discontinuities in depth for stereo images. The difficulties of the stereo correspondence problem are described in the following three observations:
1) Many images have large regions of low texture in which sufficient information for pattern matching cannot be obtained. This obstacle induces large errors in the window-based correlation method.
2) When the occlusion regions are large, the estimation of occlusion regions makes finding an exact disparity a highly time-consuming process. This drawback is a major problem in many methods based on dynamic programming.
3) When the gray level of occlusion regions is too similar to that of the background, it is difficult to distinguish occlusion edges from background level edges.

A number of studies on disparity estimation have been conducted since the pioneering work by Marr and Poggio [8] in 1976. There are many existing techniques for disparity estimation such as window matching [7], disparity-space based techniques [1, 3, 6], graph-based methods [2], 3-D maximum surface techniques [9], etc. However, there is still scope for improvement in obtaining a high-accuracy result with a low-complexity calculation. Based on the asymmetry of occlusion regions in two stereo images, we propose a method using the Viterbi algorithm (dynamic programming) iteratively and bi-directionally to overcome two of the difficulties listed above without explicitly detecting occlusion regions in stereo images. In this paper, we seek a method that achieves a highly accurate disparity image at lower computational complexity within the framework of the monotonic constraint and the Bayesian relation proposed in [1, 6].

The rest of the paper is organized as follows: Section 2 presents the stereo correspondence problem and our approach to this problem. Our proposed method is described in Section 3. Section 4 consists of simulation results and concluding remarks, and Section 5 presents our conclusion.

Figure 1: Relation between depth and disparity

* 0-7803-8566-7/04/$20.00 © 2004 IEEE.

2 Problem statement and approach analysis


2.1 Problem statement
A simple geometry of a two-camera system [4] is shown in Fig. 1. In the relative coordinate systems of the two cameras, let $b$ denote the baseline connecting the two focal points $C_l$ and $C_r$ of the camera system, let $f$ denote the focal length, let $p_l$ and $p_r$ denote the projections of $p$ onto the left and the right image planes, and let $x_l$ and $x_r$ denote the coordinates of $p_l$ and $p_r$ in each image plane, respectively. We assume that the optical axes of the two cameras are parallel to one another and perpendicular to the baseline $b$ connecting the two camera centers. Moreover, the focal lengths of the two cameras are equal. Disparity is defined as the difference $d = x_r - x_l$. As in Fig. 1, the disparity $d$ is related to the 3-D coordinate $(x', y', z')$ of the point $p$, where $p$ is a point on the surface of a real 3-D object:

$$ d = x_r - x_l = \frac{b \times f}{z'} \qquad (1) $$

Let $I_L(x, y)$ and $I_R(x, y)$ denote the two functions of the left and the right image, respectively. The solution to the correspondence problem of $I_L(x, y)$ and $I_R(x, y)$ is a disparity field $d(x, y)$. The disparity $d(x, y)$ is defined as the difference in position between one pixel in the left image and its corresponding pixel in the same row of the right image (or vice versa) such that

$$ I_R(x, y) = I_L(x + d(x, y), y) \qquad (2) $$

The matching space $M$ is a 2-D space whose axes are given by the epipolar lines at row $y_j$ of the left and right images. Each element $M(l, r)$ of the matching space is a measure to determine whether a pixel in row $y_j$ of the left image at position $l$ matches a pixel of the right image at position $r$. The solution of the matching problem is a path in the matching space that satisfies an optimal criterion of a cost function.

$$ M(l, r) = | I_L(x_l, y_j) - I_R(x_r, y_j) | \qquad (3) $$

where $l, r = 1, \ldots, P$, and $P$ is the length of the epipolar line in the left and right images. Figure 2 presents an example of a matching space. The parallelogram ABCD is the matching space considered in our method. A set of diagonal lines $D = \{D_0, D_1, \ldots, D_N\}$ is defined by the following equation

$$ D_i = \{ M(l, r) \mid r = l + i \} \qquad (4) $$

where $l = 1, \ldots, P - i$, $r = i + 1, \ldots, P$, and $i = 0, \ldots, N$. Note that $N$ is the maximum index of disparity.

Figure 2: Example of a matching space.

The monotonicity constraint requires that for any matched pair $(r, l)$ at least one of the neighboring points at $r + 1$ (in the right image) or $l + 1$ (in the left image) must also be matched to a point located at a position which is larger than $l$ (or $r$). The monotonicity constraint implies that a discontinuity expressed by a vertical or horizontal jump in the matching space cannot occur simultaneously at the same position in the epipolar lines of the left and right images.

In most studies, the dynamic programming algorithm has been used to match the left and right scan lines in only one direction: left-scan line (one row in the left image) to right-scan line (one row in the right image) or vice versa. For instance, in Fig. 3, if we try to find a correspondence for each position on the left-scan line from the right-scan line, then for any matched pair $(x_l, x_r)$ in the process of matching the left-scan line to the right-scan line, $x_r$ is chosen from the $N + 1$ neighboring points of the same position as $x_l$ in the right image. We can easily obtain a matched pair $(x_l, x_r)$ when $x_l$ is positioned outside region R (for example, in region S). On the other hand, if $x_l$ is positioned in R, one $x_r$ corresponds to many $x_l$, which violates the uniqueness constraint. To solve this problem using dynamic programming, for each $x_l$ positioned on R, it is necessary to check all the possible candidates for all the positions on R. As a result, the number of states in dynamic programming increases to $w \times N$ states, in which $w$ is equal to the maximum length of occlusion regions. Namely, the maximum size of the search space for the dynamic programming is determined by area R. Therefore, conventional dynamic programming requires a computational complexity of at least $O(w \times N^2 \times P)$. In our proposed method, the computational complexity is reduced to $O(6 \times N \times P)$ by reducing the links of each state and using the Viterbi algorithm bi-directionally.

[Figure 3 panels: 1) matching space $M_{\text{left-to-right}}$; 2) matching space $M_{\text{right-to-left}}$. Legend: violation of monotonicity; occlusion region.]

Figure 3: The matching process comprises two parts: 1) matching the left-scan line to the right-scan line, and 2) matching the right-scan line to the left-scan line.
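As a minimal sketch of the matching space in eq. (3), the following Python fragment builds $M$ for a single pair of scan lines and reads off one diagonal line of the band. The use of NumPy, the function names, and the toy intensity values are assumptions made for illustration; they are not taken from the authors' Matlab program.

```python
import numpy as np

def matching_space(left_row, right_row):
    """M[l, r] = |I_L(x_l) - I_R(x_r)| for one pair of epipolar rows."""
    left_row = np.asarray(left_row, dtype=float)
    right_row = np.asarray(right_row, dtype=float)
    return np.abs(left_row[:, None] - right_row[None, :])

def diagonal(M, i):
    """Costs along the line r = l + i (a constant offset of i pixels)."""
    return np.diagonal(M, offset=i)

# Toy scan lines: the right row is the left row shifted by two pixels.
left = np.array([5.0, 1.0, 9.0, 4.0, 7.0, 2.0])
right = np.roll(left, 2)          # right[l + 2] == left[l] for l <= 3
M = matching_space(left, right)
print(diagonal(M, 2))             # [0. 0. 0. 0.]
```

Only the diagonals for offsets 0 to N lie inside the band, which is why one state per offset is enough at each position. Swapping the two arguments of `matching_space` returns the transposed array.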

2.2 Reduction in the number of states
The concept of our method is presented in Fig. 3. There are two matching spaces: $M_{\text{left-to-right}}$ and $M_{\text{right-to-left}}$. Let A be the lower-left corner and B be the upper-right corner in the matching space $M_{\text{left-to-right}}$. Then, we go from A to B to find a correct path corresponding to the distribution of disparity shown as a solid line in Fig. 3, and the matching space $M_{\text{right-to-left}}$ in the second part is used to define the correct path from B to A. The occlusion regions in both the stereo images are asymmetric if the monotonic constraint is satisfied. As a result, if we match the right-scan line to the left-scan line, the error caused by the occlusion regions in the left camera's image is always larger than the error caused by the non-occlusion regions in the right camera's image at the same position on the disparity image, and likewise for matching the left-scan line to the right-scan line. In our approach, if an occlusion appears on the left scene, then we cross over it by matching the right-scan line to the left-scan line. Likewise, if the occlusion appears on the right scene, then we can use the left-scan line to match to the right-scan line to avoid unmatched occlusion regions.

The matching process is divided into two parts: (1) matching the left-scan line to the right-scan line by scanning in the left-to-right direction along the epipolar lines, and (2) matching the right-scan line to the left-scan line by scanning in the right-to-left direction. The last result is attained by selecting candidates of minimum error from the two results of processes (1) and (2). It can be easily determined that the matching space of (1) is just the transpose of the matching space of (2):

$$ M_{\text{left-to-right}} = M^{T}_{\text{right-to-left}} \qquad (5) $$

We apply the Viterbi algorithm to detect the disparity in the two directions and select the last result from a comparison of the two obtained results based on the selection of minimum error. We observed that after passing through the occlusion regions the Viterbi detector requires that the ACS (add, compare, and select) be run at least $N$ times in order for the paths on the trellis to converge. Once the paths merge, the Viterbi detector resumes normal operation. These $N$ matching losses may be neglected in most pictures. This method has two advantages: 1) in the matching process, there is no need to detect occlusion regions, and 2) the number of states is equal to $N + 1$, the number of disparities.
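The selection of minimum error between the two passes can be sketched as follows. This is an illustration under stated assumptions: the error is taken as the absolute difference $|I_L(x) - I_R(x + d)|$, column indices that fall outside the image are clipped to the border, and the names `matching_error` and `select_min_error` are hypothetical.

```python
import numpy as np

def matching_error(left, right, disp):
    """Per-pixel |I_L(x, y) - I_R(x + d, y)|; out-of-range columns are clipped (assumption)."""
    height, width = left.shape
    rows = np.arange(height)[:, None]
    cols = np.clip(np.arange(width)[None, :] + disp, 0, width - 1)
    return np.abs(left - right[rows, cols])

def select_min_error(left, right, disp_a, disp_b):
    """Keep, at every pixel, the candidate disparity with the smaller matching error."""
    err_a = matching_error(left, right, disp_a)
    err_b = matching_error(left, right, disp_b)
    return np.where(err_a <= err_b, disp_a, disp_b)

# Toy pair with a true shift of two pixels; each candidate map is wrong in places.
left = np.arange(12, dtype=float).reshape(2, 6)
right = np.roll(left, 2, axis=1)
disp_a = np.full((2, 6), 2)
disp_a[:, 2] = 0                       # wrong in column 2
disp_b = np.zeros((2, 6), dtype=int)
disp_b[:, 2] = 2                       # right only in column 2
fused = select_min_error(left, right, disp_a, disp_b)
```

In this toy case each candidate map is wrong somewhere, yet the fused map takes the correct value of 2 at every pixel.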

2.3 Reduction in the number of links

In general, if we attempt to find the best solution among all the combinations in this matching space, the problem becomes an NP problem. Therefore, calculation should be confined to the high-probability parts of a stereo pair in image space in order to reduce computational complexity. We assume a Gibbs distribution [5] for the cost function. By continuity, the difference in disparity between neighboring pixels must be small except when the pixel is at the edge of a stereo image. Thus, all pixels have a link to their neighborhood within a radius of $\pm 1$. This implies that the number of links for each state is equal to 3. To express discontinuity in disparity, an extra link between two states that are not neighbors is formed. This means that a state at the $i$th position is connected to the state at the $(i \pm T)$th position. By matching in the two directions and relaxing the links of each state, the complexity of computation is reduced from $O(v \times N^2 \times P)$ to $O(2 \times 3N \times P)$, where $N$ is the number of states, $P$ is the length of the epipolar line, and $v$ is the maximum length of occlusion regions.

Figure 4: (a) The trellis code for matching the left-scan line to the right-scan line by scanning from left to right on the epipolar line with the extra step $T$ equal to 3; (b) The trellis code when matching the right-scan line to the left-scan line by scanning from right to left on the epipolar line with $T = 3$.
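The link structure of the trellis can be illustrated with a short Python sketch. It assumes that each state is linked to itself, to its two immediate neighbors, and to the states $T$ steps away in either direction, and that links leaving the valid range are simply dropped; the function name `predecessors` is hypothetical.

```python
def predecessors(state, num_states, extra_step):
    """States allowed to precede `state`: itself, its +/-1 neighbours, and +/-T jumps."""
    candidates = {state - 1, state, state + 1,
                  state - extra_step, state + extra_step}
    return sorted(s for s in candidates if 0 <= s < num_states)

N, T = 15, 3                      # values used in the experiments of the paper
links = {s: predecessors(s, N + 1, T) for s in range(N + 1)}
print(links[0])                   # [0, 1, 3]
print(links[7])                   # [4, 6, 7, 8, 10]
```

The number of links per state stays bounded by a small constant, so the work per trellis column grows linearly with the number of states instead of quadratically.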

3 The Viterbi algorithm in the correspondence problem


3.1 A priori information
We define this undetermined problem under the assumption of a Markov process while searching for the path of minimum error in the matching space. In addition, a priori information $A$ is also defined for maintaining the smoothness of the reconstructed surfaces. $A$ is defined in disparity field estimation as follows

$$ \arg\max_{S_j} \mathrm{Prob}(S_j \mid O_j, A) \qquad (6) $$

where $S_j$ is the set of finite states corresponding to pixels in row $y_j$. They are defined by $S_j = \{s(0), s(x_1, y_j), s(x_2, y_j), \ldots, s(x_P, y_j)\}$, where $s(0)$ is a virtual initial state of the trellis diagram, and $s(x_i, y_j) \in S$. $O_j$ is a sequence of $\{x_i\}_{i=1,\ldots,P}$ in the scanning line $y_j$, and is written as $O_j = \{(x_1, y_j), (x_2, y_j), \ldots, (x_P, y_j)\}$. Then, if $I_L(x_i, y_j)$ of the scanning line is given, a cost error $\| I_L(x_i, y_j) - I_R(x_i + d, y_j) \|$ is induced by $O_j$, and the disparity $d$ is calculated as the values of $s(x_i, y_j)_{i=1,\ldots,P}$ at which the cost error is minimum. Equation (6) can be rewritten according to Bayes' theorem as

[Equation (7): not recoverable from the source text]

in which $Z_1$ is a normalization constant and $\mathrm{Prob}(S_j)$ is the a priori probability defined by a priori information of the two stereo images. We apply the vertical information, i.e., the definite states in the upper row $y_{j-1}$ and in the lower row $y_{j+1}$ of the two stereo images, for the definition of a priori information.

3.2 Calculation process
The trellis diagrams in Figs. 4(a) and 4(b) are reflected in the probability equation of a Markov process, in which equations (7) and (8) have been implemented as follows. In the case of a single Markov process, the first term of the numerator on the right-hand side of eq. (7) is given as the following maximum likelihood problem:

[Equations (8)-(11): not recoverable from the source text; the last one ends with the case "0, otherwise"]

where $T$ is the size of the extra step. Assuming a Gibbs distribution [5] for the cost function, the probability in (9) can be expressed as

$$ \mathrm{Prob}\big( \| I_L(x_k, y_j) - I_R(x_k + s(x_k, y_j), y_j) \| + A \times \| s(x_k, y_j) - s(x_{k-1}, y_j) \| \big) = \exp\big( -( \| I_L(x_k, y_j) - I_R(x_k + s(x_k, y_j), y_j) \| + A \times \| s(x_k, y_j) - s(x_{k-1}, y_j) \| ) \big) \qquad (12) $$
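Taking the negative logarithm of eq. (12) turns the product of probabilities along a path into a sum of costs, so the most probable path is the minimum-cost path through the trellis. The sketch below runs one left-to-right Viterbi pass on a single scan line under several assumptions that are not taken from the paper: the a priori term from the neighboring rows is omitted, the right-image index $x_k + s$ is clipped at the border, transitions are allowed only for jumps of at most 1 or exactly $T$ states, and the names (`viterbi_scanline`, `weight` for $A$) are hypothetical.

```python
import numpy as np

def viterbi_scanline(left_row, right_row, num_states, extra_step, weight):
    """Minimum-cost disparity path along one row.

    Data cost:   |I_L(x_k) - I_R(x_k + s)|
    Smooth cost: weight * |s_k - s_{k-1}|   (weight plays the role of A)
    """
    left_row = np.asarray(left_row, dtype=float)
    right_row = np.asarray(right_row, dtype=float)
    P = left_row.size
    states = np.arange(num_states)

    # data[k, s]: matching cost of state s at position k (border clipped).
    cols = np.clip(np.arange(P)[:, None] + states[None, :], 0, P - 1)
    data = np.abs(left_row[:, None] - right_row[cols])

    # trans[prev, cur]: finite only for |jump| <= 1 or |jump| == extra_step.
    jump = np.abs(states[:, None] - states[None, :])
    allowed = (jump <= 1) | (jump == extra_step)
    trans = np.where(allowed, weight * jump, np.inf)

    cost = data[0].copy()
    back = np.zeros((P, num_states), dtype=int)
    for k in range(1, P):
        total = cost[:, None] + trans                 # add
        back[k] = np.argmin(total, axis=0)            # compare
        cost = total[back[k], states] + data[k]       # select

    # Trace the surviving path back from the cheapest final state.
    path = np.empty(P, dtype=int)
    path[-1] = int(np.argmin(cost))
    for k in range(P - 1, 0, -1):
        path[k - 1] = back[k, path[k]]
    return path

left = (np.arange(60) * 37 % 60) / 60.0     # 60 distinct intensities
right = np.roll(left, 4)                    # right[x + 4] == left[x]
path = viterbi_scanline(left, right, num_states=16, extra_step=3, weight=10.0)
print(np.unique(path))                      # [4]
```

The large `weight` in this toy run makes any change of state more expensive than the border mismatch, so the recovered path stays at the true shift of 4. A right-to-left pass can reuse the same function on reversed rows with the roles of the two images exchanged.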

For each state, the a priori probability is set up based on the city-block metric between the left-scan and the right-scan lines corresponding to the epipolar lines. Let the two cameras be denoted by L and R, and let $I$ be the set of illuminated intensities recorded by the left and right cameras. As mentioned above, there are $N + 1$ states of disparity in the set $S = \{s_0, s_1, \ldots, s_N\}$. The a priori probability of each state for pixel $k$ in the scan line is defined as

$$ \mathrm{Prob}(S_j) = \mathrm{Prob}(s(x_1, y_j)) \times \mathrm{Prob}(s(x_2, y_j)) \times \cdots \times \mathrm{Prob}(s(x_P, y_j)) \qquad (13) $$

where

$$ \mathrm{Prob}(s(x_n, y_j)) = \mathrm{Prob}(s(x_n, y_{j-1})) \times \mathrm{Prob}(s(x_n, y_{j+1})) \qquad (14) $$

where $Z_2$ is a normalization constant and $n = 1, \ldots, P$.

$S_j$ is calculated iteratively. Let $S_j^{m-1}$ and $S_j^{m}$ denote $S_j$ at the $(m-1)$th and $m$th iteration, respectively, and let $s^{m-1}(x_i, y_j)$ and $s^{m}(x_i, y_j)$ denote a definite state of $S_j^{m-1}$ and $S_j^{m}$ at the position $(x_i, y_j)$ on the epipolar line, respectively. The disparity field $D(x_i, y_j)$ is given as

$$ D(x_i, y_j) = \begin{cases} s^{m}(x_i, y_j), & \text{if } \| I_L(x_i, y_j) - I_R(x_i + s^{m}(x_i, y_j), y_j) \| \le \| I_L(x_i, y_j) - I_R(x_i + s^{m-1}(x_i, y_j), y_j) \| \\ s^{m-1}(x_i, y_j), & \text{otherwise} \end{cases} \qquad (15) $$

In practice, after calculating the initial probability $\mathrm{Prob}(S_j)$, we run the matching process only twice ($m = 2$): once for matching the left-scan line to the right-scan line and once for matching the right-scan line to the left-scan line. The disparity at each position on the epipolar line is selected based on the comparison between the two results as described in eq. (15).

4 Simulation results
In the simulation, we apply the proposed method to essential stereo pairs such as random images and Tsukuba images. Based on the structure of the image's background, the value of $A$ is changed to control the matching process. If there are flat surfaces in a pair of images, $A$ is set to a large value, whereas if there are no flat surfaces, $A$ is set to a small value or 0. The program is run on a PC with a 2.2 GHz Pentium CPU using Matlab.

Two pairs of random images are generated as follows: an arbitrary function is designed to define a value in the range of 0 to 1 for each pixel located in a pattern image of 100 x 375. Similarly, an arbitrary window of 50 x 100 is generated. We paste one arbitrary window onto the pattern image at the location $(x, y)$ corresponding to its upper-left corner to generate the left image and at the location $(x + 10, y)$ to generate the right image. In this experiment, the value of $A$ is set to 0. The trellis diagram has 15 states corresponding to the number of disparities, and the size of the jumping step is set to 3. The length to define a valid value of path memory is set to 30. Figures 5(a) and 5(b) present the left image and the last result of our proposed method, respectively. It takes about 47 s for each process.

Figure 5: (a) The left image of the random picture; (b) The last result.
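A random-dot pair of the kind used in this experiment can be generated with a few lines of Python. The sketch assumes that "100 x 375" and "50 x 100" are rows by columns, uses NumPy's uniform generator as the arbitrary function, and picks an illustrative window position; `random_dot_pair` and its parameters `top`, `left_col`, and `seed` are hypothetical.

```python
import numpy as np

def random_dot_pair(height=100, width=375, win_h=50, win_w=100,
                    top=25, left_col=100, shift=10, seed=0):
    """Return (left, right) images sharing a random background.

    The same random window is pasted at column `left_col` in the left image
    and at column `left_col + shift` in the right image.
    """
    rng = np.random.default_rng(seed)
    background = rng.random((height, width))      # values in [0, 1)
    window = rng.random((win_h, win_w))
    left = background.copy()
    right = background.copy()
    left[top:top + win_h, left_col:left_col + win_w] = window
    right[top:top + win_h, left_col + shift:left_col + shift + win_w] = window
    return left, right

left, right = random_dot_pair()
```

Inside the window the two images differ by a horizontal shift of 10 pixels, while the background has zero disparity.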

Tsukuba images from the University of Tsukuba are well-known sample images used in academia for studying the stereo correspondence problem. Figure 6(a) presents the left image (288 x 384) selected from the University of Tsukuba's database of multiview images. In the simulation, the number of disparities is equal to 15 ($N = 15$). Therefore, the states corresponding to the disparity are $\{1, 2, \ldots, 15\}$. The value of $T$, the extra step, is set to 3. The value of $T$ can be chosen arbitrarily, but it should be large enough to link two states that do not lie in each other's neighborhood. The value of $A$ is equal to 10. The length to define a valid value of path memory in this experiment is set to 40. Figure 6(b) shows the result of the window-based correlation method with a window size of 11 x 11. Figure 6(c) shows the last result of our method after combining the two results from the first and second processes, and Fig. 6(d) shows the ground truth of the Tsukuba stereo image. When compared with the ground truth, the last result has an accuracy of 94.5 percent.

In the last result, we can see that there are some areas in which errors still occur, such as the left side of the lamp and the head of the statue. The reason is that the proposed method cannot detect the disparity in an occlusion area whose gray level is similar to that of the background. In such a case, our method treats this gap like a simple edge in the image instead of an occlusion area. It is also noted that the vertical edges of the reconstructed object in our method are sometimes jagged because the continuity constraint between different epipolar lines is not effective in the matching process: it uses only the weights of one upper and one lower epipolar line, as described in eq. (14). To eliminate the jagged edges, we need more a priori information about the reconstructed object to define the number of neighboring epipolar lines. This information is based on the structure of the object and the lighting conditions in the two stereo images. In this paper we have not considered this a priori information because the error caused by jagged edges is small, and a very large computation is necessary to obtain such a priori information. Therefore, it is not part of our proposed method and was not used to produce the last results in our simulations. The time to calculate each process in our proposed method is 128 s. Meanwhile, the time to finish the window-based correlation method is 363 s in the same programming environment.
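The text does not state how the accuracy figure is computed. One plausible reading, shown here purely as an assumption, is the percentage of pixels whose estimated disparity agrees with the ground truth within a tolerance; the function name and the `tolerance` parameter are hypothetical.

```python
import numpy as np

def disparity_accuracy(estimate, ground_truth, tolerance=0):
    """Percentage of pixels with |estimate - ground_truth| <= tolerance."""
    estimate = np.asarray(estimate)
    ground_truth = np.asarray(ground_truth)
    correct = np.abs(estimate - ground_truth) <= tolerance
    return 100.0 * correct.mean()

estimate = np.array([[1, 2], [3, 4]])
truth = np.array([[1, 2], [3, 5]])
print(disparity_accuracy(estimate, truth))       # 75.0
```

A different convention, for example excluding occluded or border pixels from the count, would give a different figure for the same disparity map.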

Figure 6: (a) The left image of the Tsukuba picture; (b) The result of the window-based correlation method (11 x 11); (c) The last result synthesized from the two processes; (d) The ground truth.

5 Conclusions
In this paper we propose a method to solve the stereo correspondence problem based on Bayesian probability. The two given stereo images are assumed to satisfy the monotonic constraint. Our approach utilizes the Viterbi algorithm to treat the stereo matching process as a simplified single Markov process. To overcome occlusion regions, a two-direction matching method is proposed. The efficiency of our proposed method is presented in terms of both the computational complexity and the accuracy. The computational complexity is reduced considerably from $O(v \times N^2 \times P)$ to $O(6N \times P)$ when a trellis diagram is applied iteratively and bi-directionally. Our research shows that the proposed method is stable and robust. The experimental results show that the computational time of our method is shorter than that of the classical methods based on window correlation. Furthermore, the robustness of our method is improved when compared with other approaches based on dynamic programming, since in our method it is easy to find a suitable parameter for each experiment.

References
[1] P. Belhumeur, D. Mumford, A Bayesian Treatment of the Stereo Correspondence Problem Using Half-Occluded Regions, Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 1992.
[2] Y. Boykov, O. Veksler, R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23(11), pp. 1222-1239, 2001.
[3] I.J. Cox, S.L. Hingorani, S.B. Rao, B.M. Maggs, A Maximum Likelihood Stereo Algorithm, Computer Vision and Image Understanding, vol. 63(3), pp. 542-567, 1996.
[4] O. Faugeras, Three-Dimensional Computer Vision, MIT Press, Cambridge, Mass., 1981.
[5] S. Geman, D. Geman, Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721-741, 1984.
[6] D. Geiger, B. Ladendorf, A. Yuille, Occlusions and Binocular Stereo, International Journal of Computer Vision, vol. 14, pp. 211-226, 1995.
[7] T. Kanade, M. Okutomi, A Stereo Matching Algorithm with An Adaptive Window: Theory and Experiment, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16(9), pp. 920-932, 1994.
[8] D. Marr, T. Poggio, Cooperative Computation of Stereo Disparity, Science, vol. 194, pp. 283-287, 1976.
[9] C. Sun, Fast Stereo Matching Using Rectangular Subregioning and 3D Maximum-Surface Techniques, International Journal of Computer Vision, vol. 47, no. 1/2/3, pp. 99-117, May 2002.
