A New Approach Using The Viterbi Algorithm in Stereo Correspondence Problem
Abstract - This paper presents an approach that uses the Viterbi algorithm in the stereo correspondence problem. We propose a matching process, visualized as a trellis diagram, to find the maximum a posteriori result. The matching process is divided into two parts: matching the left scene to the right scene and matching the right scene to the left scene. The result of the stereo problem is selected by a comparison between the results of the two matching processes. This makes stereo matching possible without explicitly detecting occlusions. Moreover, this stereo matching algorithm can improve the accuracy of the disparity image, and it has an acceptable running time for practical applications since it uses a trellis diagram iteratively and bi-directionally. The complexity of our proposed method is approximately $O(6 \times N \times P)$, in which $N$ is the number of disparities and $P$ is the length of the epipolar line in both the left and right images. Our proposed method has proved robust when applied to several samples of stereo images such as random-dot and Tsukuba images. It provides 94.5 percent accuracy for the Tsukuba images.
Keywords: Viterbi algorithm, stereo correspondence image, occlusion region, disparity, trellis diagram, epipolar line.
1 Introduction
Stereo correspondence is one of the fundamental and classical problems in computer vision. It plays an important role in many fields such as optical flow detection and image segmentation for object-based video coding, and it is also a key factor in virtual image synthesis. The stereo correspondence problem is to detect the disparity between two corresponding pixels positioned in the same row of a pair of stereo images. Most papers assume that the correspondence between two pixels in the left and right images is unique. However, there are some regions in stereo images where we cannot find uniquely corresponding points between the pair of images. Those regions are called occlusion regions, and a pixel of an occlusion region is called an occlusion point. The existence of occlusion regions causes disparity between the right and left images and also causes discontinuities in depth for stereo images.
The difficulties of the stereo correspondence problem are described in three observations as follows:
1) Many images have large regions of low texture in which sufficient information for pattern matching cannot be obtained. This obstacle induces large errors in the window-based correlation method.
2) When the occlusion regions are large, the estimation of occlusion regions makes finding an exact disparity a highly time-consuming process. This drawback is a major problem in many methods based on dynamic programming.
3) When the gray level of occlusion regions is too similar to that of the background, it is difficult to distinguish occlusion edges from background-level edges.
A number of studies on disparity estimation have been conducted since the pioneering work by Marr and Poggio [8] in 1976. There are many existing techniques for disparity estimation such as window matching [7], disparity-space based techniques [1, 3, 6], graph-based methods [2], 3-D maximum-surface techniques [9], etc. However, there is still scope for improvement in obtaining a high-accuracy result with a low-complexity calculation. Based on the asymmetry of occlusion regions in two stereo images, we propose a method using the Viterbi algorithm (dynamic programming) iteratively and bi-directionally to overcome two difficulties listed above without explicitly detecting occlusion regions in stereo images. In this paper, we seek a method that achieves a highly accurate disparity image with lower computational complexity within the framework of the monotonicity constraint and the Bayesian relation proposed in [1, 6].
The rest of the paper is organized as follows: Section 2 presents the stereo correspondence problem and our approach to this problem. Our proposed method is described in Section 3. Section 4 consists of simulation results and concluding remarks, and Section 5 presents our conclusion.
0-7803-8566-7/04/$20.00 © 2004 IEEE.
$$d = x_r - x_l = \frac{b f}{z}, \quad \ldots \qquad (1)$$

Let $I_L(x, y)$ and $I_R(x, y)$ denote the two functions of the left and the right image, respectively. The solution to the correspondence problem of $I_L(x, y)$ and $I_R(x, y)$ is a disparity field $d(x, y)$. The disparity $d(x, y)$ is defined as the difference in position between one pixel in the left image and its corresponding pixel in the same row of the right image (or vice versa) such that

$$I_L(x, y) = I_R(x + d(x, y), y). \qquad (2)$$

The matching space $M$ is a 2-D space whose axes are given by the epipolar lines at row $y_j$ of the left and right images. Each element $M(l, r)$ of the matching space is a measure used to determine whether a pixel in row $y_j$ of the left image at position $l$ matches a pixel of the right image at position $r$. The solution of the matching problem is a path in the matching space that satisfies an optimal criterion of a cost function.

$$M(l, r) = \lVert I_L(x_l, y_j) - I_R(x_r, y_j) \rVert, \qquad (3)$$

where $l, r = 1, \ldots, P$, and $P$ is the length of the epipolar line in the left and right images. Figure 2 presents an example of a matching space. The parallelogram ABCD is the matching space considered in our method. A set of diagonal lines $D = \{D_0, D_1, \ldots, D_N\}$ is defined by the following equation

$$D_i = \{ M(l, r) \mid r = l + i \}, \qquad (4)$$

where $l = 1, \ldots, P - i$, $r = i + 1, \ldots, P$, and $i = 0, \ldots, N$. Note that $N$ is the maximum index of disparity.

Figure 2: Example of a matching space.

The monotonicity constraint requires that for any matched pair $(r, l)$ at least one of the neighboring points at $r + 1$ (in the right image) or $l + 1$ (in the left image) must also be matched to a point located at a position which is larger than $l$ (or $r$). The monotonicity constraint implies that a discontinuity expressed by a vertical or horizontal jump in the matching space cannot occur simultaneously at the same position in the epipolar lines of the left and right images.

In most studies, the dynamic programming algorithm has been used to match the left and right scan lines in only one direction: left-scan line (one row in the left image) to right-scan line (one row in the right image) or vice versa. For instance, in Fig. 3, if we try to find a correspondence for each position on the left-scan line from the right-scan line, then for any matched pair $(x_l, x_r)$ in the process of matching the left-scan line to the right-scan line, $x_r$ is chosen from the $N + 1$ neighboring points of the same position as $x_l$ in the right image. We can easily obtain a matched pair $(x_l, x_r)$ when $x_l$ is positioned in P, Q, and S. On the other hand, if $x_l$ is positioned in R, one $x_r$ corresponds to many $x_l$, which violates the uniqueness constraint. To solve this problem using dynamic programming, for each $x_l$ positioned on R, it is necessary to check all the possible candidates for all the positions on R. As a result, the number of states in dynamic programming increases to $v \times N$ states, in which $v$ is equal to the maximum length of the occlusion regions. Namely, the maximum size of the search space for dynamic programming is determined by area R. Therefore, conventional dynamic programming requires a computational complexity of at least $O(v \times N^2 \times P)$. In our proposed method, the computational complexity is reduced to $O(6 \times N \times P)$ by reducing the links of each state and using the Viterbi algorithm bi-directionally.

Figure 3: The matching process comprises two parts: 1) matching the left-scan line to the right-scan line, and 2) matching the right-scan line to the left-scan line. Legend entries: violation of monotonicity; occlusion region; matching space $M_{\text{left-to-right}}$.
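As a rough illustration of the restricted matching space, the sketch below builds only the $N + 1$ diagonals $D_0, \ldots, D_N$ for one pair of scan lines instead of the full $P \times P$ space. It assumes 0-based pixel indices, scalar gray levels, and the absolute difference as the norm; the function name `matching_band` is hypothetical and does not come from the paper.

```python
def matching_band(left_row, right_row, max_disparity):
    """Return band[i][l] = |I_L(x_l) - I_R(x_{l+i})| for i = 0..max_disparity.

    band[i] plays the role of the diagonal D_i (r = l + i) and therefore
    has P - i entries, where P is the scan-line length.
    Assumption: both rows have the same length P and use 0-based indices.
    """
    P = len(left_row)
    if len(right_row) != P:
        raise ValueError("scan lines must have the same length")
    band = []
    for i in range(max_disparity + 1):
        # Diagonal D_i pairs left position l with right position l + i.
        band.append([abs(left_row[l] - right_row[l + i]) for l in range(P - i)])
    return band


if __name__ == "__main__":
    left = [10, 20, 30, 40]
    right = [0, 10, 20, 30]   # left row shifted by one pixel
    for i, diagonal in enumerate(matching_band(left, right, 2)):
        print(i, diagonal)
```

The band stores at most $(N + 1) \times P$ costs per scan line, which is the search region a per-pixel disparity state in $\{0, \ldots, N\}$ can reach.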
space of (2).

$$M_{\text{left-to-right}} = M_{\text{right-to-left}}. \qquad (5)$$
We apply the Viterbi algorithm to detect the disparity in the two directions and select the final result from a comparison of the two obtained results based on the selection of the minimum error. We observed that after passing through an occlusion region, the Viterbi detector requires the ACS (add, compare, and select) operation to be run at least $N$ times for the paths on the trellis to converge. Once the paths merge, the Viterbi detector operates correctly again. These $N$ matching losses may be neglected in most pictures. This method has two advantages: 1) in the matching process, there is no need to detect occlusion regions, and 2) the number of states is equal to $N + 1$, the number of disparities.
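The minimum-error selection between the two directional results can be sketched per pixel as follows. This is only an illustration under stated assumptions: both disparity rows are expressed in left-image coordinates (so that $I_L(x)$ is compared with $I_R(x + d)$), gray levels are scalars, ties keep the first result, and the names `matching_error` and `select_disparity` are hypothetical.

```python
def matching_error(left_row, right_row, x, d):
    """Absolute gray-level error |I_L(x) - I_R(x + d)|; infinite if x + d is out of range."""
    xr = x + d
    if xr < 0 or xr >= len(right_row):
        return float("inf")
    return abs(left_row[x] - right_row[xr])


def select_disparity(left_row, right_row, d_first, d_second):
    """Keep, at every pixel, the disparity whose matching error is smaller.

    d_first and d_second are the disparity rows produced by the two
    matching processes. Assumption: on a tie the first result is kept.
    """
    selected = []
    for x, (d1, d2) in enumerate(zip(d_first, d_second)):
        e1 = matching_error(left_row, right_row, x, d1)
        e2 = matching_error(left_row, right_row, x, d2)
        selected.append(d2 if e2 < e1 else d1)
    return selected


if __name__ == "__main__":
    left = [5, 9, 7, 1]
    right = [1, 5, 9, 7]          # true disparity is 1 for the first three pixels
    print(select_disparity(left, right, [1, 0, 1, 0], [0, 1, 1, 0]))
```

Because each pixel is decided independently, the selection adds only a linear pass over the scan line on top of the two Viterbi passes.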
23 .
In general, if we attempt to find the best solution among all the combinations in this matching space, the problem becomes an NP problem. Therefore, calculation should be confined to the high-probability parts of a stereo pair in image space in order to reduce computational complexity. We assume a Gibbs distribution [5] for the cost function. By continuity, the difference in disparity between neighboring pixels must be small except when the pixel is at an edge in the stereo image. Thus, all pixels have a link to their neighborhood within a radius of $\pm 1$. This implies that the number of links for each state is equal to 3. To express a discontinuity in disparity, an extra link is formed between two states that are not neighbors: the state at the $i$th position is connected to the state at the $(i \pm T)$th position. By matching in the two directions and relaxing the links of each state, the complexity of computation is reduced from $O(v \times N^2 \times P)$ to $O(2 \times 3N \times P)$, where $N$ is the number of states, $P$ is the length of the epipolar line, and $v$ is the maximum length of occlusion regions.

Figure 4: (a) The trellis code for matching the left-scan line to the right-scan line by scanning from left to right on the epipolar line with the extra step $T$ equal to 3; (b) the trellis code when matching the right-scan line to the left-scan line by scanning from right to left on the epipolar line with $T = 3$.
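To make the reduced trellis concrete, here is a minimal sketch of one Viterbi pass over a single scan line in which each disparity state is linked only to itself, its $\pm 1$ neighbors, and the states $\pm T$ away. It is not the authors' implementation: it works with additive costs (as for the negative logarithm of a Gibbs probability), uses $|I_L(x) - I_R(x + d)|$ as the data cost and `weight * |jump|` as the transition cost, and performs a full traceback instead of a finite path memory. All names and these cost choices are assumptions.

```python
def viterbi_scanline(left_row, right_row, num_states, extra_step=3, weight=0.0):
    """Return one disparity per left pixel by min-cost Viterbi on the reduced trellis."""
    INF = float("inf")
    P = len(left_row)

    def data_cost(x, d):
        # Out-of-range matches are forbidden by giving them infinite cost.
        xr = x + d
        return abs(left_row[x] - right_row[xr]) if xr < len(right_row) else INF

    # Links per state: stay, +/-1 neighbor, +/-T jump (T = extra_step).
    offsets = (0, -1, 1, -extra_step, extra_step)
    cost = [data_cost(0, d) for d in range(num_states)]
    back = []                                   # back[x-1][d] = best predecessor of d at x
    for x in range(1, P):
        new_cost, pointers = [], []
        for d in range(num_states):
            best, best_prev = INF, d
            for off in offsets:                 # add, compare, select
                p = d + off
                if 0 <= p < num_states:
                    c = cost[p] + weight * abs(off)
                    if c < best:
                        best, best_prev = c, p
            new_cost.append(best + data_cost(x, d))
            pointers.append(best_prev)
        back.append(pointers)
        cost = new_cost

    # Trace the surviving path back from the cheapest final state.
    d = min(range(num_states), key=cost.__getitem__)
    path = [d]
    for pointers in reversed(back):
        d = pointers[d]
        path.append(d)
    path.reverse()
    return path


if __name__ == "__main__":
    left = [3, 8, 1, 9, 4, 7, 2, 6]
    right = [5, 5] + left                       # right row is the left row shifted by 2
    print(viterbi_scanline(left, right, num_states=4))
```

Each pixel costs a constant number of link evaluations per state, so one pass is linear in both the number of states and the scan-line length; a second pass in the opposite direction doubles that, in line with the bi-directional scheme described above.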
where $S_j$ is the set of finite states corresponding to pixels in row $y_j$. They are defined by $S_j = \{s(0), s(x_1, y_j), s(x_2, y_j), \ldots, s(x_P, y_j)\}$, where $s(0)$ is a virtual initial state of the trellis diagram, and $s(x_i, y_j) \in S$. $O_j$ is a sequence of $\{x_i\}_{i=1,\ldots,P}$ in the scanning line $y_j$, and is written as $O_j = \{(x_1, y_j), (x_2, y_j), \ldots, (x_P, y_j)\}$. Then, if $I_L(x_i, y_j)$ of the scanning line is given, a cost error $\lVert I_L(x_i, y_j) - I_R(x_i + d, y_j) \rVert$ is induced by $O_j$, and the disparity $d$ is calculated as the values of $s(x_i, y_j)_{i=1,\ldots,P}$ at which the cost error is minimum. Equation (6) can be rewritten according to Bayes' theorem as

$$\mathrm{Prob}(S_j \mid O_j) = \frac{\mathrm{Prob}(O_j \mid S_j)\,\mathrm{Prob}(S_j)}{Z_1}, \qquad (7)$$

in which $Z_1$ is a normalization constant, and $\mathrm{Prob}(S_j)$ is the a priori probability defined by a priori information of the two stereo images. We apply the vertical information, i.e., the definite states in the upper row $y_{j-1}$ and in the lower row $y_{j+1}$ of the two stereo images, for the definition of the a priori information.

In the case of a single Markov process, the first term of the numerator on the RHS of eq. (7) is given as the following maximum likelihood problem:

The trellis diagrams in Figs. 4(a) and 4(b) are reflected in the probability equation of a Markov process, in which equations (7) and (8) have been implemented as follows:

… 0, otherwise,

where $T$ is the size of the extra step. Assuming a Gibbs distribution [5] for the cost function, the probability in (9) can be expressed as

$$\ldots\; A \times \lVert s(x_i, y_j) - \ldots \rVert \;\ldots \qquad (12)$$
For each state, the a priori probability is set up based on the city-block metric between the left- and right-scan lines corresponding to the epipolar lines. Let the two cameras be denoted by L and R, and let $I$ be the set of illuminated intensities recorded by the left and right cameras. As mentioned above, there are $N + 1$ states of disparity in the set $S = \{s_0, s_1, \ldots, s_N\}$. The a priori probability of each state for pixel $k$ in the scan line is defined as

$$\mathrm{Prob}(S_j) = \mathrm{Prob}(s(x_1, y_j)) \times \mathrm{Prob}(s(x_2, y_j)) \times \cdots \times \mathrm{Prob}(s(x_P, y_j)), \qquad (13)$$

where

$$\mathrm{Prob}(s(x_n, y_j)) = \frac{1}{Z_2}\,\mathrm{Prob}(s(x_n, y_{j-1})) \times \mathrm{Prob}(s(x_n, y_{j+1})), \qquad (14)$$

where $Z_2$ is a normalization constant and $n = 1, \ldots, P$. $S_j$ is calculated iteratively. Let $S_j^{m-1}$ and $S_j^{m}$ denote $S_j$ at the $(m-1)$th and $m$th iterations, respectively, and let $s^{m-1}(x_i, y_j)$ and $s^{m}(x_i, y_j)$ denote a definite state of $S_j^{m-1}$ and $S_j^{m}$ at the position $(x_i, y_j)$ on the epipolar line, respectively. The disparity field $D(x_i, y_j)$ is given as

$$D(x_i, y_j) = \begin{cases} s^{m}(x_i, y_j), & \text{if } \lVert I_L(x_i, y_j) - I_R(x_i + s^{m}(x_i, y_j), y_j) \rVert < \lVert I_L(x_i, y_j) - I_R(x_i + s^{m-1}(x_i, y_j), y_j) \rVert, \\ s^{m-1}(x_i, y_j), & \text{otherwise.} \end{cases} \qquad (15)$$

In practice, after calculating the initial probability $\mathrm{Prob}(S_j)$, we run the matching process only twice ($m = 2$): once for matching the left-scan line to the right-scan line and once for matching the right-scan line to the left-scan line. The disparity at each position on the epipolar line is selected based on the comparison between the two results as described in eq. (15).

4 Simulation results
In the simulation, we apply the proposed method to representative stereo pairs such as random images and Tsukuba images. Based on the structure of the image's background, the value of $A$ is changed to control the matching process. If there are flat surfaces in a pair of images, $A$ is set to a large value, whereas if there are no flat surfaces, $A$ is set to a small value or 0. The program is run in Matlab on a PC with a 2.2 GHz Pentium CPU.

Two pairs of random images are generated as follows: an arbitrary function is designed to define a value in the range of 0 to 1 for each pixel located in a pattern image of 100 x 375. Similarly, an arbitrary window of 50 x 100 is generated. We paste the window onto the pattern image at location $(x, y)$, corresponding to its upper-left corner, to generate the left image and at location $(x + 10, y)$ to generate the right image. In this experiment, the value of $A$ is set to 0. The trellis diagram has 15 states, corresponding to the number of disparities, and the size of the jumping step is set to 3. The length used to define a valid value of path memory is set to 30. Figures 5(a) and 5(b) present the left image and the final result of our proposed method, respectively. It takes about 47 s for each process.

Figure 5: (a) The left image of the random picture; (b) The final result.
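The random stereo pair used in the first experiment (a 100 x 375 random pattern with a 50 x 100 random window pasted at $(x, y)$ in the left image and at $(x + 10, y)$ in the right image) can be generated along the following lines. The paste position, the seed, and the function name `make_random_stereo_pair` are hypothetical choices for illustration; the paper does not state them, and a uniform random generator stands in for the "arbitrary function".

```python
import random


def make_random_stereo_pair(height=100, width=375, win_h=50, win_w=100,
                            x=100, y=25, shift=10, seed=0):
    """Return (left, right) images as lists of rows with values in [0, 1).

    The same random window is pasted at column x in the left image and at
    column x + shift in the right image; x, y and seed are assumed values.
    """
    rng = random.Random(seed)
    background = [[rng.random() for _ in range(width)] for _ in range(height)]
    window = [[rng.random() for _ in range(win_w)] for _ in range(win_h)]

    def paste(x0):
        image = [row[:] for row in background]      # copy the shared pattern
        for r in range(win_h):
            image[y + r][x0:x0 + win_w] = window[r]  # overwrite with the window
        return image

    return paste(x), paste(x + shift)


if __name__ == "__main__":
    left, right = make_random_stereo_pair()
    print(len(left), len(left[0]))                   # image size
    print(left[25][100:200] == right[25][110:210])   # window is shifted by 10 pixels
```

Rows outside the pasted window are identical in both images (disparity 0), while rows crossing the window contain a 10-pixel shift plus the occluded strips on either side of it.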
Tsukuba images from the University of Tsukuba are well-known sample images used in academia for studying the stereo correspondence problem. Figure 6(a) presents the left image (288 x 384) selected from the University of Tsukuba's database of multiview images. In the simulation, the number of disparities is equal to 15 ($N = 15$). Therefore, the states corresponding to the disparity are $\{1, 2, \ldots, 15\}$. The value of $T$, the extra step, is set to 3. The value of $T$ can be chosen arbitrarily, but it should be large enough to link two states that do not lie in each other's neighborhood. The value of $A$ is equal to 10. The length used to define a valid value of path memory in this experiment is set to 40. Figure 6(b) shows the result of the window-based correlation method with a window size of 11 x 11. Figure 6(c) shows the ground truth of the Tsukuba stereo image and Fig. 6(d) shows the final result of our method after combining the two results from the first and second processes. When compared with the ground truth, the final result has an accuracy of 94.5 percent. In the final result, we can see that there are some areas in which errors still occur, such as the left side of the lamp and the head of the statue. The reason is that the proposed method cannot detect the disparity in an occlusion area where the gray level of the occlusion area is similar to that of the background. In such a case, our method treats this gap as a simple edge in the image instead of an occlusion area. It is also noted that the vertical edges of the reconstructed object in our method are sometimes jagged because the continuity constraint between different epipolar lines is not effective in the matching process: only the weights of one upper and one lower epipolar line are used, as described in eq. (14). To eliminate the jagged edges, we need more a priori information about the reconstructed object to define the number of neighboring epipolar lines. This information depends on the structure of the object and the lighting conditions in the two stereo images. In this paper we have not considered this a priori information because the error caused by jagged edges is small, and a very large computation is necessary to obtain such a priori information. Therefore, it is not included in our proposed method and was not used to produce the final results in our simulations. The time to calculate each process in our proposed method is 128 s. Meanwhile, the window-based correlation method takes 363 s in the same programming environment.
5 Conclusions
In this paper we propose a method to solve the stereo correspondence problem based on Bayesian probability. The two given stereo images are assumed to satisfy the monotonicity constraint. Our approach utilizes the Viterbi algorithm to treat the stereo matching process as a simplified single Markov process. To overcome occlusion regions, a two-direction matching method is proposed. The efficiency of our proposed method is presented in terms of both the computational complexity and the accuracy ratio. The computational complexity is reduced considerably from $O(v \times N^2 \times P)$ to $O(6N \times P)$ when a trellis diagram is applied iteratively and bi-directionally. Our research shows that the proposed method is stable and robust. The experimental results show that the computational time of our method is lower than that of the classical methods based on window correlation. Furthermore, the robustness of our method is improved compared with other approaches based on dynamic programming, since in our method it is easy to find a suitable parameter for each experiment.

Figure 6: (a) The left image of the Tsukuba picture; (b) The result of the window-based correlation method (11 x 11); (c) The last result synthesized from the two processes; (d) The ground truth.
References
[1] P. Belhumeur, D. Mumford, A Bayesian Treatment of the Stereo Correspondence Problem Using Half-Occluded Regions, Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 1992.
[2] Y. Boykov, O. Veksler, R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23(11), pp. 1222-1239, 2001.
[3] I.J. Cox, S.L. Hingorani, S.B. Rao, B.M. Maggs, A Maximum Likelihood Stereo Algorithm, Computer Vision and Image Understanding, vol. 63(3), pp. 542-567, 1996.
[4] … MIT Press, Cambridge, Mass., 1981.
[5] S. Geman, D. Geman, Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721-741, 1984.
[6] D. Geiger, B. Ladendorf, A. Yuille, Occlusions and Binocular Stereo, International Journal of Computer Vision, vol. 14, pp. 211-226, 1995.
[7] T. Kanade, M. Okutomi, A Stereo Matching Algorithm with An Adaptive Window: Theory and Experiment, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16(9), pp. 920-932, 1994.
[8] D. Marr, T. Poggio, Cooperative Computation of Stereo Disparity, Science, vol. 194, pp. 283-287, 1976.
[9] C. Sun, Fast Stereo Matching Using Rectangular Subregioning and 3D Maximum-Surface Techniques, International Journal of Computer Vision, vol. 47, no. 1/2/3, pp. 99-117, May 2002.