Abstract
Fully unsupervised video object segmentation remains a challenge in computer vision, and segmenting an object shared across a set of clips is harder still. In this paper, we propose an unsupervised, on-line method that efficiently segments common objects from a set of video clips. Our approach rests on the hypothesis that common or similar objects in multiple video clips are salient and share similar features. We first locate the regions in every clip that are both salient and similar in features, using a new co-saliency scheme based on superpixels. The most salient superpixels are then chosen as the initial object marker superpixels. Starting from these markers, we merge neighboring, similar regions to segment out the final object parts. Experimental results demonstrate that the proposed method efficiently segments the common objects from a group of video clips, with a generally lower error rate than some state-of-the-art video co-segmentation methods.
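The marker selection and region merging steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the co-saliency scores have already been computed per superpixel, and the names `select_markers`, `grow_object`, `top_k`, and `max_dist`, the Euclidean distance on mean-color features, and the toy data are all hypothetical choices made for illustration.

```python
from collections import deque


def select_markers(saliency, top_k=1):
    """Pick the top_k most salient superpixel ids as object markers.

    `saliency` maps superpixel id -> co-saliency score (assumed precomputed).
    The top-k rule is an illustrative assumption, not the paper's criterion.
    """
    ranked = sorted(saliency, key=saliency.get, reverse=True)
    return set(ranked[:top_k])


def grow_object(features, adjacency, markers, max_dist=0.25):
    """Grow the object from the markers by merging similar neighbors.

    A neighbor is merged when the Euclidean distance between its feature
    vector and that of an adjacent object superpixel is at most max_dist
    (a hypothetical similarity test).
    """
    obj = set(markers)
    queue = deque(markers)
    while queue:
        cur = queue.popleft()
        for nb in adjacency[cur]:
            if nb in obj:
                continue
            dist = sum((a - b) ** 2
                       for a, b in zip(features[cur], features[nb])) ** 0.5
            if dist <= max_dist:
                obj.add(nb)
                queue.append(nb)
    return obj


# Toy frame: five superpixels in a chain 0-1-2-3-4 with mean-color features.
features = {
    0: (0.90, 0.10, 0.10),
    1: (0.85, 0.15, 0.10),
    2: (0.80, 0.20, 0.10),
    3: (0.10, 0.80, 0.20),
    4: (0.10, 0.90, 0.20),
}
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
saliency = {0: 0.90, 1: 0.60, 2: 0.50, 3: 0.10, 4: 0.05}

markers = select_markers(saliency, top_k=1)
obj = grow_object(features, adjacency, markers)
print(sorted(obj))
```

In this toy frame the reddish superpixels 0, 1, and 2 are merged into the object, while the greenish superpixels 3 and 4 stay in the background because the color jump between 2 and 3 exceeds `max_dist`.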
















Cite this article
Huang, G., Pun, CM. & Lin, C. Unsupervised video co-segmentation based on superpixel co-saliency and region merging. Multimed Tools Appl 76, 12941–12964 (2017). https://doi.org/10.1007/s11042-016-3709-3
DOI: https://doi.org/10.1007/s11042-016-3709-3