SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture Libraries
Abstract: The need for efficient content-based image retrieval has increased tremendously in many application areas such as biomedicine, military, commerce, education, and Web image classification and searching. We present here SIMPLIcity (Semantics-sensitive Integrated Matching for Picture LIbraries), an image retrieval system which uses semantics classification methods, a wavelet-based approach for feature extraction, and integrated region matching based upon image segmentation. As in other region-based retrieval systems, an image is represented by a set of regions, roughly corresponding to objects, which are characterized by color, texture, shape, and location. The system classifies images into semantic categories, such as textured-nontextured and graph-photograph. Potentially, the categorization enhances retrieval by permitting semantically-adaptive searching methods and narrowing down the searching range in a database. A measure for the overall similarity between images is developed using a region-matching scheme that integrates properties of all the regions in the images. Compared with retrieval based on individual regions, the overall similarity approach 1) reduces the adverse effect of inaccurate segmentation, 2) helps to clarify the semantics of a particular region, and 3) enables a simple querying interface for region-based image retrieval systems. The application of SIMPLIcity to several databases, including a database of about 200,000 general-purpose images, has demonstrated that our system performs significantly better and faster than existing ones. The system is fairly robust to image alterations.

Index Terms: Content-based image retrieval, image classification, image segmentation, integrated region matching, clustering, robustness.
1 INTRODUCTION
a more important reason for using the signature is to gain an improved correlation between image representation and semantics. Actually, the main task of designing a signature is to bridge the gap between image semantics and the pixel representation, that is, to create a better correlation with image semantics.

Existing general-purpose CBIR systems roughly fall into three categories depending on the approach used to extract signatures: histogram, color layout, and region-based search. We will briefly review the three methods in this section. There are also systems that combine retrieval results from individual algorithms by a weighted sum matching metric [7], [4], or other merging schemes [19].

After extracting signatures, the next step is to determine a comparison rule, including a querying scheme and the definition of a similarity measure between images. For most image retrieval systems, a query is specified by an image to be matched. We refer to this as global search since similarity is based on the overall properties of images. By contrast, there are also "partial search" querying systems that retrieve based on a particular region in an image [11], [2].

1.1.1 Histogram Search
Histogram search algorithms [4], [18] characterize an image by its color distribution, or histogram. Many distances have been used to define the similarity of two color histogram representations. Euclidean distance and its variations are the most commonly used [4]. Rubner et al. of Stanford University proposed the earth mover's distance (EMD) [18], using linear programming for matching histograms.

The drawback of a global histogram representation is that information about object location, shape, and texture [10] is discarded. Color histogram search is sensitive to intensity variation, color distortions, and cropping.

1.1.2 Color Layout Search
The "color layout" approach attempts to overcome the drawback of histogram search. In simple color layout indexing [4], images are partitioned into blocks and the average color of each block is stored. Thus, the color layout is essentially a low resolution representation of the original image. A relatively recent system, WBIIS [28], uses significant Daubechies' wavelet coefficients instead of averaging. By adjusting block sizes or the levels of wavelet transforms, the coarseness of a color layout representation can be tuned. The finest color layout, using a single pixel per block, is the original pixel representation. Hence, we can view a color layout representation as an opposite extreme of a histogram. At proper resolutions, the color layout representation naturally retains shape, location, and texture information. However, as with the pixel representation, although information such as shape is preserved in the color layout representation, the retrieval system cannot perceive it directly. Color layout search is sensitive to shifting, cropping, scaling, and rotation because images are described by a set of local properties [28].

The approach taken by the recent WALRUS system [14] to reduce the shifting and scaling sensitivity of color layout search is to exhaustively reproduce many subimages based on an original image. The subimages are formed by sliding windows of various sizes, and a color layout signature is computed for every subimage. The similarity between images is then determined by comparing the signatures of subimages. An obvious drawback of the system is the sharply increased computational complexity and the increased size of the search space due to the exhaustive generation of subimages. Furthermore, texture and shape information is discarded in the signatures because every subimage is partitioned into four blocks and only the average colors of the blocks are used as features. This system is also limited to intensity-level image representations.

1.1.3 Region-Based Search
Region-based retrieval systems attempt to overcome the deficiencies of color layout search by representing images at the object level. A region-based retrieval system applies image segmentation [20], [27] to decompose an image into regions, which correspond to objects if the decomposition is ideal. The object-level representation is intended to be close to the perception of the human visual system (HVS). However, image segmentation is nearly as difficult as image understanding because the images are 2D projections of 3D objects, and computers are not trained in the 3D world the way human beings are.

Since the retrieval system has identified what objects are in the image, it is easier for the system to recognize similar objects at different locations and with different orientations and sizes. Region-based retrieval systems include the NeTra system [11], the Blobworld system [2], and the query system with color region templates [22].

The NeTra and the Blobworld systems compare images based on individual regions. Although querying based on a limited number of regions is allowed, the query is performed by merging single-region query results. The motivation is to shift part of the comparison task to the users. To query an image, a user is provided with the segmented regions of the image and is required to select the regions to be matched and also the attributes, e.g., color and texture, of the regions to be used for evaluating similarity. Such querying systems provide more control to the user. However, the user's semantic understanding of an image is at a higher level than the region representation. For objects without discerning attributes, such as special texture, it is not obvious to the user how to select a query from the large variety of choices. Thus, such a querying scheme may add burdens on users without significant reward. On the other hand, because of the great difficulty of achieving accurate segmentation, the systems in [11], [2] often partition one object into several regions, with none of them being representative of the object, especially for images without distinctive objects and scenes.

Not much attention has been paid to developing similarity measures that combine information from all of the regions. One effort in this direction is the querying system developed by Smith and Li [22]. Their system decomposes an image into regions with characterizations predefined in a finite pattern library. With every pattern labeled by a symbol, images are then represented by region strings. Region strings are converted to composite region template (CRT) descriptor matrices that provide the relative ordering of symbols. Similarity between images is measured by the closeness between the CRT descriptor matrices. This measure is sensitive to object shifting since a CRT matrix is determined solely by the ordering of symbols. The measure also lacks robustness to scaling and rotation.
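For contrast with the region-based systems just reviewed, the global histogram search of Section 1.1.1 is simple to state in code. The sketch below is our own illustration, not code from any cited system: it bins RGB pixels into a normalized 3D color histogram and compares two histograms with the Euclidean distance noted above as the most common choice.

```python
import numpy as np

def color_histogram(pixels, bins=8):
    # Quantize each RGB channel into `bins` levels and count pixels in the
    # resulting bins**3 cells; normalize so the histogram sums to 1.
    q = np.clip((pixels * bins) // 256, 0, bins - 1).astype(int)
    cells = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    h = np.bincount(cells, minlength=bins ** 3).astype(float)
    return h / h.sum()

def l2_distance(h1, h2):
    # Euclidean distance between two histogram vectors.
    return float(np.sqrt(((h1 - h2) ** 2).sum()))

# Toy usage: two nearly identical red images are closer than red vs. blue.
red  = np.tile([[250, 10, 10]], (100, 1))
red2 = np.tile([[240, 20, 15]], (100, 1))
blue = np.tile([[10, 10, 250]], (100, 1))
d_same = l2_distance(color_histogram(red), color_histogram(red2))
d_diff = l2_distance(color_histogram(red), color_histogram(blue))
```

The earth mover's distance [18] would replace `l2_distance` with a linear-programming transport cost between the bin masses, which is what makes it more tolerant of small color shifts.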
WANG ET AL.: SIMPLICITY: SEMANTICS-SENSITIVE INTEGRATED MATCHING FOR PICTURE LIBRARIES 949
Because the definition of the CRT descriptor matrix relies on the pattern library, the system performance depends critically on the library. The performance degrades if region types in an image are not represented by patterns in the library. The system uses a CRT library with patterns described only by color. In particular, the patterns are obtained by quantizing the color space. If texture and shape features are also used to distinguish patterns, the number of patterns in the library will increase dramatically, roughly exponentially in the number of features if patterns are obtained by uniformly quantizing features.

1.2 Related Work in Semantic Classification
The underlying assumption of CBIR is that semantically-relevant images have similar visual characteristics, or features. Consequently, a CBIR system is not necessarily capable of understanding image semantics. Image semantic classification, on the other hand, is a technique for classifying images based on their semantics. While image semantics classification is a limited form of image understanding, the goal of image classification is not to understand images the way human beings do, but merely to assign the image to a semantic class. We argue that image class membership can assist retrieval.

Minka and Picard [12] introduced a learning component in their CBIR system. The system internally generated many segmentations or groupings of each image's regions based on different combinations of features; it then learned which combinations best represented the semantic categories given as exemplars by the user. The system requires supervised training on various parts of the image.

Although region-based systems aim at decomposing images into constituent objects, a representation composed of pictorial properties of regions is only indirectly related to its semantics. There is no clear mapping from a set of pictorial properties to semantics. An approximately round brown region might be a flower, an apple, a face, or part of a sunset sky. Moreover, pictorial properties such as the color, shape, and texture of an object vary dramatically in different images. If a system understood the semantics of images and could determine which features of an object are significant, it would be capable of fast and accurate search. However, due to the great difficulty of recognizing and classifying images, not much success has been achieved in identifying high-level semantics for the purpose of image retrieval. Therefore, most systems are confined to matching images with low-level pictorial properties.

Despite the fact that it is currently impossible to reliably recognize objects in general-purpose images, there are methods to distinguish certain semantic types of images. Any information about semantic types is helpful, since a system can constrict the search to images with a particular semantic type. More importantly, the semantic classification schemes can improve retrieval by using various matching schemes tuned to the semantic class of the query image.

One example of semantic classification is the identification of natural photographs versus artificial graphs generated by computer tools [29]. The classifier divides an image into blocks and classifies every block into one of the two classes. If the percentage of blocks classified as photograph is higher than a threshold, the image is marked as a photograph; otherwise, as text.

Other examples include the WIPE system to detect objectionable images, developed by Wang et al. [29] and motivated by an earlier system by Fleck et al. [5] of the University of California at Berkeley. WIPE uses training images and CBIR to determine whether a given image is closer to the set of objectionable training images or to the set of benign training images. The system developed by Fleck et al., however, is more deterministic and involves a skin filter and a human figure grouper.

Szummer and Picard [24] have developed a system to classify indoor and outdoor scenes. Classification is performed over low-level image features such as the color histogram and DCT coefficients. A 90 percent accuracy rate has been reported over a database of 1,300 images from Kodak.

Other examples of image semantic classification include city versus landscape [26] and face detection [1]. Wang and Fischler [30] have shown that rough, but accurate, semantic understanding can be very helpful in computer vision tasks such as image stereo matching.

1.3 Overview of the SIMPLIcity System
CBIR is a complex and challenging problem spanning diverse disciplines, including computer vision, color perception, image processing, image classification, statistical clustering, psychology, human-computer interaction (HCI), and specific application-domain-dependent criteria. While we are not claiming to be able to solve all the problems related to CBIR, we have made some advances towards the final goal: close-to-human-level automatic image understanding and retrieval performance.

In this paper, we discuss issues related to the design and implementation of a semantics-sensitive CBIR system for picture libraries. An experimental system, the SIMPLIcity (Semantics-sensitive Integrated Matching for Picture LIbraries) system, has been developed to validate the methods. We summarize the main contributions as follows.

1.3.1 Semantics-Sensitive Image Retrieval
The capability of existing CBIR systems is limited in large part by fixing the set of features used for retrieval. Apparently, different image features are suitable for the retrieval of images of different semantic types. For example, a color layout indexing method may be good for outdoor pictures, while a region-based indexing approach is much better for indoor pictures. Similarly, global texture matching is suitable only for textured pictures.

We propose a semantics-sensitive approach to the problem of searching general-purpose image databases. Semantic classification methods are used to categorize images so that semantically-adaptive searching methods applicable to each category can be applied. At the same time, the system can narrow down the searching range to a subset of the original database to facilitate fast retrieval. For example, automatic classification methods can be used to categorize a general-purpose picture library into semantic classes including "graph," "photograph," "textured," "nontextured," "benign," "objectionable," "indoor," "outdoor," "city," "landscape," "with people," and "without people." In our experiments, we used textured-nontextured and graph-photograph classification methods. We apply a suitable feature extraction method and a corresponding matching metric to each of the semantic classes.
950 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 23, NO. 9, SEPTEMBER 2001
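The semantics-sensitive strategy just described, classify first and then search with class-specific features and metrics, amounts to a dispatch table. Everything in the sketch below (the toy classifier, features, and data) is hypothetical and only illustrates the control flow:

```python
def classify(image):
    # Stand-in for a semantic classifier such as the textured/nontextured
    # test of Section 4.1; here a single made-up attribute decides.
    return "textured" if image["variance"] > 0.5 else "nontextured"

def retrieve(query, database, pipelines):
    # Pick the feature extractor and metric for the query's semantic class,
    # then rank only the signatures stored for that class.
    label = classify(query)
    extract, metric = pipelines[label]
    sig = extract(query)
    ranked = sorted(database[label], key=lambda item: metric(sig, item[1]))
    return [name for name, _ in ranked]

# Textured images skip shape-like features, as suggested in the text.
pipelines = {
    "textured":    (lambda im: im["color"], lambda a, b: abs(a - b)),
    "nontextured": (lambda im: (im["color"], im["shape"]),
                    lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])),
}
database = {
    "textured":    [("fabric", 0.2), ("grass", 0.9)],
    "nontextured": [("dog", (0.5, 0.1)), ("boat", (0.1, 0.8))],
}
result = retrieve({"variance": 0.8, "color": 0.85}, database, pipelines)
```

Because signatures are stored per class, only the "textured" portion of the database is searched for this query.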
When more classification methods are utilized, the current semantic classification architecture may need to be improved.

In our current system, the set of features for a particular image category is determined empirically, based on the perception of the developers. For example, shape-related features are not used for textured images. Automatic derivation of optimal features is a challenging and important issue in its own right. A major difficulty in feature selection is the lack of information about whether any two images in the database match each other. The only reliable way to obtain this information is through manual assessment, which is formidable for a database of even moderate size. Furthermore, human evaluation is hard to keep consistent from person to person. To explore feature selection, primitive studies can be carried out with relatively small databases. A database can be formed from several distinctive groups of images, among which only images from the same group are considered matched. A search algorithm can be developed to select a subset of candidate features that provides optimal retrieval according to an objective performance measure. Although such studies are likely to be seriously biased, insights regarding which features are most useful for a certain image category may be obtained.

1.3.2 Image Classification
For the purpose of searching picture libraries such as those on the Web or in a patient digital library, we are initially focusing on techniques to classify images into the classes "textured" versus "nontextured" and "graph" versus "photograph." Several other classification methods have been previously developed elsewhere, including "city" versus "landscape" [26] and "with people" versus "without people" [1]. In this paper, we report on several classification methods we have developed and their performance.

1.3.3 Integrated Region Matching (IRM) Similarity Measure
Besides using semantics classification, another strategy of SIMPLIcity for better capturing image semantics is to define a robust region-based similarity measure, the Integrated Region Matching (IRM) metric. It incorporates the properties of all the segmented regions so that information about an image can be fully used to gain robustness against inaccurate segmentation. Image segmentation is an extremely difficult process and is still an open problem in computer vision. For example, an image segmentation algorithm may segment an image of a dog into two regions: the dog and the background. The same algorithm may segment another image of a dog into six regions: the body of the dog, the front leg(s) of the dog, the rear leg(s) of the dog, the eye(s), the background grass, and the sky.

Traditionally, region-based matching is performed on individual regions [2], [11]. The IRM metric we have developed has the following major advantages:

1. Compared with retrieval based on individual regions, the overall "soft similarity" approach in IRM reduces the adverse effect of inaccurate segmentation, an important property lacking in previous systems.
2. In many cases, knowing that one object usually appears with another helps to clarify the semantics of a particular region. For example, flowers typically appear with green leaves, and boats usually appear with water.
3. By defining an overall image-to-image similarity measure, the SIMPLIcity system provides users with a simple querying interface. To complete a query, a user only needs to specify the query image. If desired, the system can be extended with a function allowing users to query based on a specific region or a few regions.

1.4 Outline of the Paper
The remainder of the paper is organized as follows: The semantics-sensitive architecture is further introduced in Section 2. The image segmentation algorithm is described in Section 3. Classification methods are presented in Section 4. The IRM similarity measure based on segmentation is defined in Section 5. In Section 6, experiments and results are described. We conclude and suggest future research in Section 7.

2 SEMANTICS-SENSITIVE ARCHITECTURE
The architecture of the SIMPLIcity retrieval system is presented in Fig. 1. During indexing, the system partitions an image into 4 × 4 pixel blocks and extracts a feature vector for each block. A statistical clustering algorithm [8] is then used to quickly segment the image into regions. The segmentation result is fed into a classifier that decides the semantic type of the image. An image is currently classified as one of n manually-defined, mutually exclusive, and collectively exhaustive semantic classes. The system can be extended to one that classifies an image softly into multiple classes with probability assignments. Examples of semantic types are indoor-outdoor, objectionable-benign, textured-nontextured, city-landscape, with-without people, and graph-photograph images. Features reflecting color, texture, shape, and location information are then extracted for each region in the image. The features selected depend on the semantic type of the image. The signature of an image is the collection of features for all of its regions. Signatures of images with various semantic types are stored in separate databases.

In the querying process, if the query image is not in the database, as indicated by the user interface, it is first passed through the same feature extraction process as was used during indexing. For an image in the database, its semantic type is first checked and then its signature is extracted from the corresponding database. Once the signature of the query image is obtained, similarity scores between the query image and images in the database with the same semantic type are computed and sorted to provide the list of images that appear to have the closest semantics.

3 THE IMAGE SEGMENTATION METHOD
In this section, we describe the image segmentation procedure, which is based on the k-means algorithm [8] using color and spatial variation features. For general-purpose images, such as the images in a photo library or on the World Wide Web (WWW), automatic image segmentation is almost as difficult as automatic image semantic understanding.
Fig. 1. The architecture of feature indexing process. The heavy lines show a sample indexing path of an image.
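The indexing path highlighted in Fig. 1 starts by cutting the image into 4 × 4 blocks and computing one feature vector per block. A minimal sketch of that first stage (our own illustration; the wavelet features and the clustering of Section 3 are omitted):

```python
import numpy as np

def block_features(img):
    # Partition the image into 4x4 pixel blocks and compute one feature
    # vector per block. Here we keep only the three average color
    # components; the three wavelet energies are added in Section 3.
    h, w, _ = img.shape
    feats = []
    for r in range(0, h - h % 4, 4):
        for c in range(0, w - w % 4, 4):
            block = img[r:r + 4, c:c + 4].reshape(-1, 3)
            feats.append(block.mean(axis=0))
    return np.array(feats)

img = np.random.default_rng(0).integers(0, 256, size=(64, 96, 3))
feats = block_features(img)   # (64/4) * (96/4) = 384 feature vectors
```

These per-block vectors are what the k-means clustering of Section 3 groups into regions.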
The segmentation accuracy of our system is not crucial because an integrated region-matching (IRM) scheme is used to provide robustness against inaccurate segmentation.

To segment an image, SIMPLIcity partitions the image into blocks of 4 × 4 pixels and extracts a feature vector for each block. The k-means algorithm is used to cluster the feature vectors into several classes, with every class corresponding to one region in the segmented image. Since the block size is small and boundary blockiness has little effect on retrieval, we choose blockwise segmentation rather than pixelwise segmentation to lower the computational cost significantly.

Suppose the observations are $\{x_i : i = 1, \ldots, L\}$. The goal of the k-means algorithm is to partition the observations into $k$ groups with means $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_k$ such that

$$D(k) = \sum_{i=1}^{L} \min_{1 \le j \le k} (x_i - \hat{x}_j)^2 \qquad (1)$$

is minimized. The k-means algorithm does not specify how many clusters to choose. We adaptively choose the number of clusters $k$ by gradually increasing $k$ and stopping when a criterion is met. We start with $k = 2$ and stop increasing $k$ if one of the following conditions is satisfied:

1. The distortion $D(k)$ is below a threshold. A low $D(k)$ indicates high purity in the clustering process. The threshold is not critical because the IRM measure is not sensitive to $k$.
2. The first derivative of the distortion with respect to $k$, $D(k) - D(k-1)$, is below a threshold in comparison with the average derivative at $k = 2, 3$. A low $D(k) - D(k-1)$ indicates convergence in the clustering process. The threshold determines the overall time to segment images and needs to be set to a near-zero value. It is critical to the speed, but not the quality, of the final image segmentation. The threshold can be adjusted according to the experimental runtime.
3. The number $k$ exceeds an upper bound. We allow an image to be segmented into a maximum of 16 segments. That is, we assume an image has fewer than 16 distinct types of objects. Usually, the segmentation process generates a much smaller number of segments in an image. The threshold is rarely met.

Six features are used for segmentation. Three of them are the average color components in a 4 × 4 block. The other three represent the energy in the high-frequency bands of wavelet transforms [3], that is, the square root of the second-order moment of the wavelet coefficients in high-frequency bands. We use the well-known LUV color space, where L encodes luminance and U and V encode color information (chrominance). The LUV color space has good perceptual correlation properties. The block size is chosen to be 4 × 4 as a compromise between texture detail and computation time.

To obtain the other three features, we apply either the Daubechies-4 wavelet transform or the Haar transform to the L component of the image. We use these two wavelet transforms because they have better localization properties and require less computation compared to Daubechies' wavelets with longer filters. After a one-level wavelet transform, a 4 × 4 block is decomposed into four frequency bands, as shown in Fig. 2. Each band contains 2 × 2 coefficients. Without loss of generality, suppose the coefficients in the HL band are $\{c_{k,l}, c_{k,l+1}, c_{k+1,l}, c_{k+1,l+1}\}$. One feature is then computed as

$$f = \left(\frac{1}{4} \sum_{i=0}^{1} \sum_{j=0}^{1} c_{k+i,\,l+j}^2\right)^{\frac{1}{2}}.$$

The other two features are computed similarly from the LH and HH bands. The motivation for using these features is their reflection of texture properties. Moments of wavelet coefficients in various frequency bands have proven effective for discerning texture [25]. The intuition behind this is that coefficients in different frequency bands signal variations in different directions.

Fig. 2. Decomposition of images into frequency bands by wavelet transforms.
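The high-frequency features above can be computed with a one-level Haar transform applied to each 4 × 4 block. The sketch below is ours; the excerpt does not fix the filter normalization, so the averaging convention (a+b)/2, (a-b)/2 is an assumption:

```python
import numpy as np

def haar_band_features(block):
    # One-level 2D Haar transform of a 4x4 luminance block, then the square
    # root of the mean squared coefficient in each high-frequency band
    # (HL, LH, HH), as in the feature equation above.
    b = np.asarray(block, dtype=float).reshape(4, 4)
    # Transform rows: left half = horizontal low-pass, right = high-pass.
    lo = (b[:, 0::2] + b[:, 1::2]) / 2.0
    hi = (b[:, 0::2] - b[:, 1::2]) / 2.0
    row = np.hstack([lo, hi])
    # Transform columns: top half = vertical low-pass, bottom = high-pass.
    lo = (row[0::2, :] + row[1::2, :]) / 2.0
    hi = (row[0::2, :] - row[1::2, :]) / 2.0
    hl = lo[:, 2:]   # high-pass horizontally, low-pass vertically
    lh = hi[:, :2]   # low-pass horizontally, high-pass vertically
    hh = hi[:, 2:]   # high-pass in both directions
    def energy(band):
        return float(np.sqrt((band ** 2).mean()))
    return energy(hl), energy(lh), energy(hh)

# A block of vertical stripes varies only horizontally.
stripes = np.tile([10.0, 0.0, 10.0, 0.0], (4, 1))
hl, lh, hh = haar_band_features(stripes)
```

For this vertical-stripe block, only the HL band carries energy, which is the directional behavior the text attributes to the HL band.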
Fig. 3. Segmentation results by the k-means clustering algorithm: First row: original images. Second row: regions of the images. Results for other
images in the database can be found online.
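The adaptive choice of the number of clusters, start at k = 2 and grow until one of the three conditions fires, can be sketched as below. This is our simplified illustration: the thresholds are placeholders rather than the system's values, and condition 2 is reduced to a relative-decrease test.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    # Plain Lloyd iterations; returns the distortion D(k) of (1).
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(dists.min(axis=1).sum())

def adaptive_k(x, d_thresh=1.0, slope_frac=1e-3, k_max=16):
    # Grow k until D(k) is small (condition 1), the decrease in D stalls
    # (a simplified condition 2), or the cap of 16 is reached (condition 3).
    prev = kmeans(x, 2)
    if prev < d_thresh:
        return 2
    for k in range(3, k_max + 1):
        cur = kmeans(x, k)
        if cur < d_thresh or prev - cur < slope_frac * prev:
            return k
        prev = cur
    return k_max

# Two exact point clusters: k = 2 already drives the distortion to zero.
x = np.array([[0.0, 0.0]] * 10 + [[10.0, 10.0]] * 10)
k_chosen = adaptive_k(x)
```

Stopping early on easy images is what keeps the block-level segmentation fast in practice.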
For example, the HL band shows activities in the horizontal direction. An image with vertical stripes thus has high energy in the HL band and low energy in the LH band. This texture feature is a good compromise between computational complexity and effectiveness.

Examples of segmentation results for both textured and nontextured images are shown in Fig. 3. Segmented regions are shown in their representative colors. It takes about one second on average to segment a 384 × 256 image on a Pentium Pro 450 MHz PC running the Linux operating system. We do not apply postprocessing to smooth region boundaries or to delete small isolated regions, because these errors rarely cause degradation in the performance of our retrieval system, which is designed to tolerate inaccurate segmentation. Additionally, postprocessing usually costs a large amount of computation.

4 THE IMAGE CLASSIFICATION METHODS
The image classification methods described in this section have been developed mainly for searching picture libraries such as Web images. We are initially interested in classifying images into the classes textured versus nontextured, graph versus photograph, and objectionable versus benign. Karu et al. provided an overview of texture-related research [10]. Other classification methods, such as city versus landscape [26] and with people versus without people [1], were developed elsewhere.

4.1 Textured versus Nontextured Classification
In this section, we describe the algorithm to classify images into the semantic classes textured or nontextured. A textured image is defined as an image of a surface, a pattern of similarly-shaped objects, or an essential element of an object. For example, the structure formed by the threads of a fabric is a textured image. Fig. 4 shows some sample textured images. As textured images do not contain isolated objects or object clusters, the perception of such images focuses on color and texture, but not shape, which is critical for understanding nontextured images. Thus, an efficient retrieval system should use different features to depict these two types of images. To our knowledge, the problem of distinguishing textured images from nontextured images has not been explored in the literature.

For textured images, color and texture are much more important perceptually than shape, since there are no clustered objects. As shown by the segmentation results in Fig. 3, regions in textured images tend to scatter over the entire image, whereas nontextured images are usually partitioned into clumped regions. A mathematical description of how evenly a region scatters in an image is the goodness of match between the distribution of the region and a uniform distribution. The goodness of fit is measured by the $\chi^2$ statistic.

We partition an image evenly into 16 zones, $\{Z_1, Z_2, \ldots, Z_{16}\}$. Suppose the image is segmented into regions $\{r_i : i = 1, \ldots, m\}$. For each region $r_i$, its percentage in zone $Z_j$ is $p_{i,j}$, with $\sum_{j=1}^{16} p_{i,j} = 1$, $i = 1, \ldots, m$. The uniform distribution over the zones has probability mass function $q_j = 1/16$, $j = 1, \ldots, 16$. The $\chi^2$ statistic for region $i$, $\chi_i^2$, is computed by

$$\chi_i^2 = \sum_{j=1}^{16} \frac{(p_{i,j} - q_j)^2}{q_j} = 16 \sum_{j=1}^{16} \left(p_{i,j} - \frac{1}{16}\right)^2.$$

The classification of a textured or nontextured image is performed by thresholding the average $\chi^2$ statistic over all the regions in the image, $\bar{\chi}^2 = \frac{1}{m} \sum_{i=1}^{m} \chi_i^2$. If $\bar{\chi}^2 < 0.32$, the image is labeled as textured; otherwise, nontextured. We randomly chose 100 textured images and 100 nontextured images and computed $\bar{\chi}^2$ for them. The histograms of $\bar{\chi}^2$ for the two types of images are shown in Fig. 5. It is shown that
Fig. 4. Sample textured images. (a) Surface texture. (b) Fabric texture. (c) Artificial texture. (d) Pattern of similarly-shaped objects.
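The $\chi^2$ scatter measure and the 0.32 threshold above translate directly into code. A sketch, assuming the per-region zone percentages $p_{i,j}$ have already been computed from the segmentation:

```python
import numpy as np

def chi_square_scatter(zone_pct):
    # zone_pct: m x 16 array; row i holds region i's percentages over the
    # 16 zones (each row sums to 1). Returns per-region chi^2 values and
    # their average, per the formula above.
    zone_pct = np.asarray(zone_pct, dtype=float)
    chi2 = 16.0 * ((zone_pct - 1.0 / 16.0) ** 2).sum(axis=1)
    return chi2, float(chi2.mean())

def is_textured(zone_pct, threshold=0.32):
    return chi_square_scatter(zone_pct)[1] < threshold

# A region spread evenly over all 16 zones scores 0 (perfectly scattered);
# a region confined to a single zone scores 15, far above the threshold.
even = np.full((1, 16), 1.0 / 16.0)
clumped = np.zeros((1, 16))
clumped[0, 0] = 1.0
```

Low average scatter means the regions tile the whole image, the behavior the text associates with textured images.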
Fig. 6. Integrated Region Matching (IRM) is potentially robust to poor image segmentation.
Fig. 7. Region-to-region matching results are incorporated in the Integrated Region Matching (IRM) metric. A 3D feature space is shown to illustrate
the concept.
We call this distance the integrated region matching (IRM) distance.

The problem of defining the distance between region sets is then converted to choosing the significance matrix $S$. A natural issue to raise is what constraints should be put on $s_{i,j}$ so that the admissible matching yields a good similarity measure. In other words, what properties do we expect an

$$M = \{(i, j) : i = 1, \ldots, m;\ j = 1, \ldots, n\}.$$

2. Choose the minimum $d_{i,j}$ for $(i, j) \in M - L$. Label the corresponding $(i, j)$ as $(i', j')$.
3. $\min(p_{i'}, p'_{j'}) \rightarrow s_{i',j'}$.
4. If $p_{i'} < p'_{j'}$, set $s_{i',j} = 0$ for $j \neq j'$; otherwise, set $s_{i,j'} = 0$ for $i \neq i'$.
otherwise, stop.
Consider an example of applying the integrated region matching algorithm. Assume that m = 2 and n = 3. The values of p_i and p'_j are: p_1 = 0.4, p_2 = 0.6, p'_1 = 0.2, p'_2 = 0.3, p'_3 = 0.5. The region distance matrix {d_{i,j}}, i = 1, 2, j = 1, 2, 3, is

    0.5  1.2  0.1
    1.0  1.6  2.0

The sorted d_{i,j} is

    (i, j):   (1,3)  (1,1)  (2,1)  (1,2)  (2,2)  (2,3)
    d_{i,j}:   0.1    0.5    1.0    1.2    1.6    2.0.                (11)

The first two regions matched are regions 1 and 3. As the significance of region 1, p_1, is fulfilled by the matching, region 1 in Image 1 is no longer in consideration. The second pair of regions matched is then regions 2 and 1. The region pairs are listed below in the order of being matched:

    region pairs:  (1,3)  (2,1)  (2,2)  (2,3)
    significance:   0.4    0.2    0.3    0.1.                         (12)

The significance matrix is

    0.0  0.0  0.4
    0.2  0.3  0.1

Now, we come to the issue of choosing p_i. The value of p_i is chosen to reflect the significance of region i in the image. If we assume that every region is equally important, then p_i = 1/m, where m is the number of regions. In the case that Image 1 and Image 2 have the same number of regions, a region in Image 1 is matched exclusively to one region in Image 2. Another choice of p_i is the percentage of the image covered by region i, based on the view that important objects in an image tend to occupy larger areas. We refer to this assignment of p_i as the area percentage scheme. This scheme is less sensitive to inaccurate segmentation than the uniform scheme. If one object is partitioned into several regions, the uniform scheme raises its significance improperly, whereas the area percentage scheme retains its significance. On the other hand, if objects are merged into one region, the area percentage scheme assigns relatively high significance to the region. The SIMPLIcity system uses the area percentage scheme.

The scheme of assigning significance credits can also take region location into consideration. For example, higher significance may be assigned to regions in the center of an image than to those around boundaries. Another way to count location in the similarity measure is to generalize the definition of the IRM distance to

    d(R_1, R_2) = Σ_{i,j} s_{i,j} w_{i,j} d_{i,j}.                    (13)

Fig. 9. Feature extraction in the SIMPLIcity system. (* The computation of shape features is omitted for textured images.)

The parameter w_{i,j} is chosen to adjust the effect of regions i and j on the similarity measure. In the SIMPLIcity system, regions around boundaries are slightly down-weighted by using this generalized IRM distance.

5.2 Distance between Regions

Now, we discuss the definition of the distance between a region pair, d(r, r'). The SIMPLIcity system characterizes a region by color, texture, and shape. The feature extraction process is shown in Fig. 9. We have described the features used by the k-means algorithm for segmentation. The mean values of these features in one cluster are used to represent color and texture in the corresponding region. These features are denoted as: f_1, f_2, and f_3 for the averages in the L, U, and V components of color, respectively; f_4, f_5, and f_6 for the square roots of the second-order moments of wavelet coefficients in the HL band, the LH band, and the HH band, respectively.

To describe shape, normalized inertia [6] of order 1 to 3 is used. For a region H in the k-dimensional Euclidean space R^k, its normalized inertia of order γ is

    l(H, γ) = [ ∫_H ||x - x̂||^γ dx ] / V(H)^{1 + γ/k},              (14)

where x̂ is the centroid of H and V(H) is the volume of H. Since an image is specified by pixels on a grid, the discrete form of the normalized inertia is used, that is,

    l(H, γ) = [ Σ_{x : x ∈ H} ||x - x̂||^γ ] / V(H)^{1 + γ/k},       (15)

where V(H) is the number of pixels in region H. The normalized inertia is invariant to scaling and rotation. The minimum normalized inertia is achieved by spheres. Denote the γth-order normalized inertia of spheres as L_γ. We define shape features as l(H, γ) normalized by L_γ:

    f_7 = l(H, 1)/L_1,  f_8 = l(H, 2)/L_2,  f_9 = l(H, 3)/L_3.        (16)

The computation of shape features is skipped for textured images because, in this case, region shape is not perceptually important. The region distance d(r, r') is defined as

    d(r, r') = Σ_{i=1}^{6} w_i (f_i - f'_i)^2.                        (17)

For nontextured images, d(r, r') is defined as

    d(r, r') = g(d_s(r, r')) d_t(r, r'),                              (18)
Fig. 10. The empirical pdf and cdf of the IRM distance.
where d_s(r, r') is the shape distance computed by

    d_s(r, r') = Σ_{i=7}^{9} w_i (f_i - f'_i)^2                       (19)

and d_t(r, r') is the color and texture distance, defined identically to the distance between textured image regions, i.e.,

    d_t(r, r') = Σ_{i=1}^{6} w_i (f_i - f'_i)^2.                      (20)

The function g(d_s(r, r')) is a converting function that ensures a proper influence of the shape distance on the total distance. In our system, it is defined as

    g(d) = 1,     d ≥ 0.5
           0.85,  0.2 < d < 0.5                                       (21)
           0.5,   d ≤ 0.2.

It is observed that, when d_s(r, r') ≥ 0.5, the two regions bear little resemblance and, hence, distinguishing the extent of similarity by d_s(r, r') is not meaningful. Thus, we set g(d) = 1 for d greater than the threshold 0.5. When d_s(r, r') is very small, we intend to keep the influence of color and texture. Therefore, g(d) is bounded away from zero. We define g(d) as a piecewise constant function instead of a smooth function for simplicity. Because rather simple shape features are used in our system, we emphasize color and texture more than shape. As demonstrated by the definition of d(r, r'), the shape distance serves as a "bonus": if two regions match very well in shape, their color and texture distance is attenuated by a smaller weight to produce the final distance.

5.3 Characteristics of IRM

To study the characteristics of the IRM distance, we performed 100 random queries on our COREL photograph data set. Based on the 5.6 million IRM distances obtained, we estimated the distribution of the IRM distance. The empirical mean of the IRM distance is 44.30, with a 95 percent confidence interval of (44.28, 44.32). The standard deviation is 21.07. Fig. 10 shows the empirical probability distribution function (pdf) and the empirical cumulative distribution function (cdf).

Based on this empirical distribution, we may give the end user more intuitive similarity distances than the raw distances themselves, using the similarity percentile. As shown in the empirical cumulative distribution function, an IRM distance of 15 represents approximately 1 percent of the images in the database. We may notify the user that two images are considered to be very close when the IRM distance between them is less than 15. Likewise, we may advise the user that two images are considerably different when the IRM distance between them is greater than 50.

6 EXPERIMENTS

The SIMPLIcity system has been implemented with a general-purpose image database including about 200,000 pictures, which are stored in JPEG format with size 384 x 256 or 256 x 384. The system uses no textual information in the matching process because we try to explore the possible advances of CBIR. In a real-world application, however, textual information is often used as a helpful addition to CBIR systems. Two classification methods, graph-photograph and textured-nontextured, have been used in our experiments. Adding more classification methods to the system may degrade the accuracy of the retrieval.

For each image, the features, locations, and areas of all its regions are stored. Images of different semantic classes are stored in separate databases. Because the EMD-based color histogram system [18] and the WBIIS system are the only other systems we have access to, we compare the accuracy of the SIMPLIcity system to these systems using the same COREL database. WBIIS had been compared with the original IBM QBIC system and found to perform better [28]. It is difficult to design a fair comparison with existing region-based searching algorithms, such as the Blobworld system and the NeTra system, which depend on additional information to be provided by the user during the process. As future work, we will try to compare our system with other existing systems such as the VisualSEEk system developed by Columbia University.

With the Web, online demonstration has become a popular way of letting users evaluate CBIR systems. An online demonstration is provided.1 Readers are encouraged to compare the performance of SIMPLIcity with other systems. A list of online image retrieval demonstration Web sites can be found on our site.

1. URL: http://wang.ist.psu.edu.
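As an illustration of (15) and (17)-(21), the region features and distance can be sketched in Python. This is our own sketch, not the authors' code: the uniform weights w_i, the pixel-list representation of a region, and the interval endpoints we read into the (partly garbled) definition of (21) are assumptions for illustration:

```python
def normalized_inertia(pixels, gamma, k=2):
    """Discrete normalized inertia l(H, gamma) of (15): the sum of
    ||x - centroid||^gamma over the pixels of region H, divided by
    V(H)^(1 + gamma/k), where V(H) is the number of pixels."""
    v = len(pixels)
    cx = sum(x for x, _ in pixels) / v
    cy = sum(y for _, y in pixels) / v
    total = sum(((x - cx) ** 2 + (y - cy) ** 2) ** (gamma / 2.0)
                for x, y in pixels)
    return total / v ** (1.0 + gamma / k)

def g(d):
    """Piecewise-constant converting function of (21); the interval
    endpoints are as we read them from the definition."""
    if d >= 0.5:
        return 1.0
    if d > 0.2:
        return 0.85
    return 0.5

def region_distance(f, fp, textured, w=None):
    """Distance between two regions with 9-dim feature vectors f, fp
    (f1..f6 color/texture, f7..f9 shape); uniform weights assumed."""
    if w is None:
        w = [1.0] * 9
    dt = sum(w[i] * (f[i] - fp[i]) ** 2 for i in range(6))     # (17)/(20)
    if textured:
        return dt                                              # (17)
    ds = sum(w[i] * (f[i] - fp[i]) ** 2 for i in range(6, 9))  # (19)
    return g(ds) * dt                                          # (18)
```

On this sketch, a 20 x 20 and a 40 x 40 square region give nearly the same normalized inertia (scale invariance, up to discretization), while an elongated strip scores much higher than a compact square; and for two nontextured regions with identical shape features, the color and texture distance is halved, the "bonus" effect described above.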
The current implementation of the SIMPLIcity system provides several query interfaces: a CGI-based Web access interface, a JAVA-based drawing interface, and a CGI-based Web interface for submitting a query image of any format from anywhere on the Internet.

6.1 Accuracy

We evaluated the accuracy of the system in two ways. First, we used a 200,000-image COREL database to compare with existing systems such as the EMD-based color histogram and WBIIS. Then, we designed systematic evaluation methods to judge the performance statistically. The SIMPLIcity system has demonstrated much improved accuracy over the other systems.

6.2 Query Comparison

We compare the SIMPLIcity system with the WBIIS (Wavelet-Based Image Indexing and Searching) system [28] on the same image database. In this section, we show the comparison results using query examples. Due to the limitation of space, we show only two rows of images with the top 11 matches to each query. At the same time, we provide the number of related images in the top 29 matches (i.e., the first screenful) for each query. We chose the numbers "11" and "29" before viewing the results. In the next section, we provide numerical evaluation results by systematically comparing several systems.

For each query example, we manually examine the precision of the query results. The relevance of image semantics depends on the point of view of the reader. We use our judgments here to determine the relevance of images. In each query, we decide the relevance to the query image before viewing the query results. We admit that our relevance criteria, specified in the caption of Fig. 11, may be very different from the criteria used by a user of the system.

As WBIIS forms image signatures using wavelet coefficients in the lower frequency bands, it performs well with relatively smooth images, such as most landscape images. For images with details crucial to semantics, such as pictures with people, the performance of WBIIS degrades. In general, SIMPLIcity performs as well as WBIIS for smooth landscape images. One example is shown in Fig. 11a. The query image is the image at the upper-left corner. The underlined numbers below the pictures are the ID numbers of the images in the database. The other two numbers are the value of the similarity measure between the query image and the matched image, and the number of regions in the image. To view the images better or to see more matched images, users can visit the demonstration Web site and use the query image ID to repeat the retrieval.

SIMPLIcity also gives higher precision within the best 11 or 29 matches for images composed of fine details. Retrieval results with a photo of a hamburger as the query are shown in Fig. 11b. The SIMPLIcity system retrieves 10 images with food out of the first 11 matched images. The WBIIS system, however, does not retrieve any image with food in the first 11 matches. It is often impossible to define the relevance between two given images. For example, the user may be interested in finding other hamburger images and not food images in general; returning food images is then not likely to be more helpful to the user than returning other images. The top match made by SIMPLIcity is also a photo of a hamburger, which is perceptually very close to the query image. WBIIS misses this image because the query image contains important fine details which are smoothed out by the multilevel wavelet transform in the system. The smoothing also causes a textured image (the third match) to be matched. Such errors are observed with many other image queries. The SIMPLIcity system, however, classifies images first and tries to prevent images classified as textured from being matched to images classified as nontextured. The method relies on highly accurate classifiers. In practice, a classifier can give wrong classification results, which lead to wrong retrieval.

Another three query examples are compared in Figs. 11c, 11d, and 11e. The query images in Figs. 11c and 11d are difficult to match because objects in the images are not distinctive from the background. Moreover, the color contrast for both images is small. It can be seen that the SIMPLIcity system achieves better retrieval, based on the relevance criteria we have used. For the query in Fig. 11c, only the third matched image is not a picture of a person. A few images, the first, fourth, seventh, and eighth matches, depict a similar topic as well, probably about life in Africa. The query in Fig. 11e also shows the advantages of SIMPLIcity. The system finds photos of similar flowers with different sizes and orientations. Only the ninth match does not have flowers in it.

For textured images, SIMPLIcity and WBIIS often perform equally well. However, SIMPLIcity captures high-frequency texture information better. An example of a textured image search is shown in Fig. 12. The granular surface in the query image is matched more accurately by the SIMPLIcity system. We performed another test on this query using the SIMPLIcity system without the image classification component. As shown in Fig. 12, the degraded system found several nontextured pictures (e.g., sunset scenes) for this textured query picture.

Typical CBIR systems do not perform well when the image databases contain both photographs and graphs. Graphs, such as clip art pictures and image maps, appear frequently on the Web. The semantics of clip art pictures are typically more abstract and significantly different from photos with similar low-level visual features, such as the color histogram. For image maps on the Web, an indexing method based on Optical Character Recognition (OCR) may be more efficient than CBIR systems based on visual features. SIMPLIcity classifies picture libraries into graphs and photographs using image segmentation and statistical hypothesis testing before the feature indexing step. Fig. 13 shows the result of a clip art query. All of the best 11 matches in this 200,000-picture database are clip art pictures, many with similar semantics.

6.3 Systematic Evaluation

6.3.1 Performance on Image Queries

To provide numerical results, we tested 27 sample images chosen randomly from nine categories, each category contributing three of the images. Image matching is performed on the COREL database of 200,000 images. A retrieved image is considered a match if it belongs to the same category as the query image. The categories of images tested are listed in Table 1a. Most categories simply include images containing the specified objects. Images in the "sports and public events" class contain people in a game or public event, such as a festival. Portraits are not included in this category.
Fig. 11. Comparison of SIMPLIcity and WBIIS. The query image is the upper-left corner image of each block of images. Due to the limitation of
space, we show only two rows of images with the top 11 matches to each query. More matches can be viewed from the online demonstration site.
(a) Natural outdoor scene, (b) food, (c) people, (d) portrait, and (e) flower.
The "landscape with buildings" class refers to outdoor scenes featuring man-made constructions such as buildings and sculptures. The "beach" class refers to scenery at coasts or river banks. For the "portrait" class, an image has to show people as the main feature. A scene with human beings as a minor part is not included.

Precision was computed for both SIMPLIcity and WBIIS. Recall was not calculated because the database is large and it is difficult to estimate the total number of images in one category, even approximately. In the future, we will develop a large-scale sharable test database to evaluate the recall.

To account for the ranks of matched images, the average of the precision values within the first k retrieved images, k = 1, ..., 100, is computed. That is,

    p = (1/100) Σ_{k=1}^{100} n_k / k,

where n_k is the number of matches in the first k retrieved images. This average precision is called the "weighted precision"
Fig. 12. SIMPLIcity gives better results than the same system without the classification component. The query image is a textured image.
because it is equivalent to a weighted percentage of matched images, with a larger weight assigned to an image retrieved at a higher rank. For instance, a relevant image appearing earlier in the list of retrieved images enhances the weighted precision more than it would by appearing later in the list.

For each of the nine image categories, the average precision and weighted precision based on the three sample images are plotted in Fig. 14. The image category identification number is indicated in Table 1a. Except for the tools and toys category, in which case the two systems perform about equally well, SIMPLIcity has achieved better results than WBIIS measured in both ways. For the two categories of landscape with buildings and vehicle, the difference between the two systems is quite significant. On average, the precision and the weighted precision of SIMPLIcity are higher than those of WBIIS by 0.227 and 0.273, respectively.

6.3.2 Performance on Image Categorization

The SIMPLIcity system was also evaluated on a subset of the COREL database formed by 10 image categories (shown in Table 1b), each containing 100 pictures. Within this database, it is known whether any two images are of the same category. In particular, a retrieved image is considered a match if and only if it is in the same category as the query. This assumption is reasonable since the 10 categories were chosen so that each depicts a distinct semantic topic. Every image in the subdatabase was tested as a query, and the retrieval ranks of all the remaining images were recorded. Three statistics were computed for each query: 1) the precision within the first 100 retrieved images, 2) the mean rank of all the matched images, and 3) the standard deviation of the ranks of matched images.

The recall within the first 100 retrieved images is identical to the precision in this special case because the total number of semantically related images for each query is fixed to be 100. The average performance for each image category is computed in terms of the three statistics: p (precision), r (the mean rank of matched images), and σ (the standard deviation of the ranks of matched images). For a system that ranks images randomly, the average p is about 0.1 and the average r is about 500. An ideal CBIR system should demonstrate an average p of 1 and an average r of 50.

Similar evaluation tests were carried out for the state-of-the-art EMD-based color histogram match. We used the LUV color space and a matching metric similar to the EMD described in [18] to extract color histogram features and match in the categorized image database. Two different color bin sizes, with an average of 13.1 and 42.6 filled color bins per image, were evaluated. We call the one with fewer filled color bins the Color Histogram 1 system and the other the Color Histogram 2 system. Fig. 15 shows the performance compared to the SIMPLIcity system. Clearly, both color histogram-based matching systems perform much worse than the SIMPLIcity region-based CBIR system in almost all image categories. The performance of the Color Histogram 2 system is better than that of the Color Histogram 1 system due to the more detailed color separation obtained with more filled bins. However, the Color Histogram 2 system is so slow that it is practically impossible to obtain matches on databases with more than 50,000 images. For this reason, we cannot evaluate this system using the COREL database of 200,000 images and the 27 sample query images described in the previous section. SIMPLIcity runs at about twice the speed of the relatively fast Color Histogram 1 system and still provides much better searching accuracy than the extremely slow Color Histogram 2 system.

Fig. 13. SIMPLIcity does not mix clip art pictures with photographs. A graph-photograph classification method using image segmentation and statistical hypothesis testing is used. The query image is a clip art picture.

6.4 Robustness

We have performed extensive experiments on the robustness of the system. Figs. 17 and 18 summarize the results. The graphs in the first row show the changes in ranking of the
TABLE 1
COREL Categories of Images Tested
target image as we increase the significance of image alterations. The graphs in the second row show the changes in the IRM distance between the altered image and the target image as we increase the significance of image alterations.

The system is fairly robust to image alterations such as intensity variation, sharpness variation, intentional color distortions, other intentional distortions, cropping, shifting, and rotation. Fig. 16 shows some query examples, using the 200,000-image COREL database. On average, the system is robust to approximately 10 percent brightening, 8 percent darkening, blurring with a 15 x 15 Gaussian filter, 70 percent sharpening, 20 percent more saturation, 10 percent less saturation, random spread by 30 pixels, and pixelization by 25 pixels. These capabilities are important to biomedical image databases because the visual features of a query image are usually not identical to those of the semantically relevant images in the database, owing to problems such as occlusion, difference in intensity, and difference in focus.

6.4.1 Speed

The algorithm has been implemented on a Pentium III 450 MHz PC using the Linux operating system. Computing the feature vectors for the 200,000 color images of size 384 x 256 in our general-purpose image database requires approximately 60 hours. On average, one second is needed to segment an image and to compute the features of all its regions. This speed is much faster than that of other region-based methods. Fast indexing has provided us with the capability of handling external queries and sketch queries in real time.

The matching speed is very fast. When the query image is in the database, it takes about 1.5 seconds of CPU time on average to sort all the images in the 200,000-image database using the IRM similarity measure. If the query image is not already in the database, one extra second of CPU time is spent to extract the features of the query image.

7 CONCLUSIONS AND FUTURE WORK

In this work, we experimented with the idea that images can be classified into global semantic classes, such as textured or nontextured and graph or photograph, and that much can be gained if the feature extraction scheme is tailored to best suit each class. For the purpose of searching general-purpose image databases, we have developed a series of statistical image classification methods, including the graph-photograph and textured-nontextured classifiers. We have explored the application of advanced wavelets in feature extraction. We have developed an image region segmentation algorithm using wavelet-based feature extraction and the k-means statistical clustering algorithm. Finally, we have developed a measure for the overall similarity between images, i.e., the Integrated Region Matching (IRM) measure, defined based on a region-matching scheme that integrates properties of all the regions in the images, resulting in a simple querying interface. The advantage of using such a soft matching is the improved robustness against poor segmentation, an important property overlooked in previous work.

The application of SIMPLIcity to a database of about 200,000 general-purpose images shows more accurate and much faster retrieval compared with the existing algorithms. An important feature of the algorithms implemented in SIMPLIcity is that they are fairly robust to intensity variations, sharpness variations, color distortions, other distortions, cropping, scaling, shifting, and rotation. The system is also easier to use than other region-based retrieval systems.

The system has several limitations:
Fig. 15. Comparing SIMPLIcity with color histogram methods on average precision p, average rank of matched images r, and the standard deviation of the ranks of matched images σ. Lower numbers indicate better results for the last two plots (i.e., the r plot and the σ plot). Color Histogram 1 gives an average of 13.1 filled color bins per image, while Color Histogram 2 gives an average of 42.6 filled color bins per image. SIMPLIcity partitions an image into an average of only 4.3 regions.
IRM distance should be computed after merging the matched regions.

3. The statistical semantic classification methods do not distinguish images in different classes perfectly. Furthermore, an image may fall into several semantic classes simultaneously.

4. The querying interfaces are not powerful enough to allow users to formulate their queries freely. For different user domains, the query interfaces should ideally provide different sets of functions.

A limitation of our current evaluation results is that they are based mainly on precision or variations of precision. In practice, a system with a high overall precision may have a low overall recall; precision and recall often trade off against each other. It is extremely time-consuming to manually create detailed descriptions for all the images in our database in order to obtain numerical comparisons on recall. The COREL database provides us rough semantic labels on the images. Typically, an image is associated with
Fig. 16. The robustness of the system to image alterations. Due to space, only the best five matches are shown. The first image in each example is
the query image. Database size: 200,000 images.
Fig. 17. The robustness of the system compared to image alterations. Six query images were randomly selected from the database. Each curve
represents the robustness on one of the six images.
one keyword about the main subject of the image. For example, a group of images may be labeled as "flower" and another group of images may be labeled as "Kyoto, Japan." If we use descriptions such as "flower" and "Kyoto, Japan" as definitions of relevance to evaluate CBIR systems, it is unlikely that we can obtain a consistent performance evaluation. A system may perform very well on one query (such as the flower query), but very poorly on another (such as the Kyoto query). Until this limitation is thoroughly investigated, the evaluation results reported in the comparisons should be interpreted cautiously.

A statistical soft classification architecture can be developed to allow an image to be classified based on its probability of belonging to a certain semantic class. We need to design more high-level classifiers. The speed can be improved significantly by adopting a feature clustering scheme or using a parallel query processing scheme. We need to continue our effort in designing simple but capable graphical user interfaces. We are planning to build a sharable testbed for the statistical evaluation of different CBIR systems. Experiments with a WWW image database or a video database could be another interesting study.

ACKNOWLEDGMENTS

This work was supported in part by the US National Science Foundation under grant IIS-9817511. Research was performed while J.Z. Wang and J. Li were at Stanford University. The authors would like to thank Shih-Fu Chang, Oscar Firschein, Martin A. Fischler, Hector Garcia-Molina, Yoshinori Hara, Kyoji Hirata, Quang-Tuan Luong, Wayne Niblack, and Dragutin Petkovic for valuable discussions on content-based image retrieval, image understanding, and photography. They would also like to acknowledge the comments and constructive suggestions from the anonymous reviewers and the associate editor. Finally, they thank Thomas P. Minka for providing them with the source code of the MIT Photobook.
REFERENCES [28] J.Z. Wang, G. Wiederhold, O. Firschein, and X.W. Sha, ªContent-
Based Image Indexing and Searching Using Daubechies' Wave-
[1] M.C. Burl, M. Weber, and P. Perona, ªA Probabilistic Approach to lets,º Int'l J. Digital Libraries, vol. 1, no. 4, pp. 311-328, 1998.
Object Recognition Using Local Photometry and Global Geome- [29] J.Z. Wang, J. Li, G. Wiederhold, and O. Firschein, ªSystem for
try,º Proc. European Conf. Computer Vision, pp. 628-641, June 1998. Screening Objectionable Images,º Computer Comm., vol. 21, no. 15,
[2] C. Carson, M. Thomas, S. Belongie, J.M. Hellerstein, and J. Malik, pp. 1355-1360, 1998.
ªBlobworld: A System for Region-Based Image Indexing and [30] J.Z. Wang and M.A. Fischler, ªVisual Similarity, Judgmental
Retrieval,º Proc. Visual Information Systems, pp. 509-516, June 1999. Certainty and Stereo Correspondence,º Proc. DARPA Image
[3] I. Daubechies, Ten Lectures on Wavelets. Philadelphia: SIAM, 1992. Uunderstanding Workshop, 1998.
[4] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom
et al. ªQuery by Image and Video Content: The QBIC System,º James Z. Wang received the Summa Cum
IEEE Computer, vol. 28, no. 9, 1995. Laude bachelor's degree in mathematics and
[5] M. Fleck, D.A. Forsyth, and C. Bregler, ªFinding Naked People,º computer science from University of Minnesota
Proc. European Conf. Computer Vision, vol. 2, pp. 593-602, 1996. (1994), the MSc degree in mathematics and the
[6] A. Gersho, ªAsymptotically Optimum Block Quantization,º IEEE MSc degree in computer science, both from
Trans. Information Theory, vol. 25, no. 4, pp. 373-380, July 1979. Stanford University (1997), and the PhD degree
[7] A. Gupta and R. Jain, ªVisual Information Retrieval,º Comm. from Stanford University Biomedical Informatics
ACM, vol. 40, no. 5, pp. 70-79, May 1997. Program and Computer Science Database
[8] J.A. Hartigan and M.A. Wong, ªAlgorithm AS136: A k-means Group (2000). He is the holder of the PNC
Clustering Algorithm,º Applied Statistics, vol. 28, pp. 100-108, 1979. Technologies Career Development Endowed
Professorship at the School of Information Sciences and Technology
[9] R. Jain, S.N.J. Murthy, P.L.-J. Chen, and S. Chatterjee, ªSimilarity
and the Department of Computer Science and Engineering at The
Measures for Image Databases,º Proc. SPIE, vol. 2420, pp. 58-65,
Pennsylvania State University. He has been a visiting scholar at
Feb. 1995.
Uppsala University in Sweden, SRI International, IBM Almaden
[10] K. Karu, A.K. Jain, and R.M. Bolle, ªIs There any Texture in the
Research Center, and NEC Computer and Communications Research
Image?º Pattern Recognition, vol. 29, pp. 1437-1446, 1996.
Lab. He is a member of the IEEE.
[11] W.Y. Ma and B. Manjunath, ªNaTra: A Toolbox for Navigating
Large Image Databases,º Proc. IEEE Int'l Conf. Image Processing,
Jia Li received the BS degree in electrical
pp. 568-571, 1997.
engineering from Xi'an JiaoTong University,
[12] T.P. Minka and R.W. Picard, ªInteractive Learning Using a Society China, in 1993, the MSc degree in electrical
of Models,º Pattern Recognition, vol. 30, no. 3, p. 565, 1997. engineering in 1995, the MSc degree in statistics
[13] S. Mukherjea, K. Hirata, and Y. Hara, ªAMORE: A World Wide in 1998, and the PhD degree in electrical
Web Image Retrieval Wngine,º Proc. World Wide Web, vol. 2, no. 3, engineering in 1999, all from Stanford Univer-
pp. 115-132, 1999. sity. She is an assistant professor of statistics at
[14] A. Natsev, R. Rastogi, and K. Shim, ªWALRUS: A Similarity The Pennsylvania State University. In 1999, she
Retrieval Algorithm for Image Databases,º SIGMOD Record, worked as a research associate in the Computer
vol. 28, no. 2, pp. 395-406, 1999. Science Department at Stanford University. She
[15] A. Pentland, R.W. Picard, and S. Sclaroff, ªPhotobook: Tools for was a researcher at the Xerox Palo Alto Research Center from 1999 to
Content-Based Manipulation of Image Databases,º Proc. SPIE, 2000. Her research interests include statistical classification and
vol. 2185, pp. 34-47, Feb. 1994. modeling, data mining, image processing, and image retrieval. She is
[16] E.G.M. Petrakis and A. Faloutsos, ªSimilarity Searching in a member of the IEEE.
Gio Wiederhold received a degree in aeronautical engineering in Holland in 1957 and the PhD degree in medical information science from the University of California at San Francisco in 1976. He is a professor of computer science at Stanford University with courtesy appointments in medicine and electrical engineering. He has supervised 30 PhD theses and published more than 350 books, papers, and reports. He has been elected fellow of the ACMI, the IEEE, and the ACM. His current research includes privacy protection in collaborative settings, software composition, access to simulations to augment information systems, and developing an algebra over ontologies. Prior to his academic career, he spent 16 years in the software industry. His Web page is http://www-db.stanford.edu/people/gio.html.