
1 Introduction

Data visualization is becoming increasingly important for understanding information extracted from the web with data mining technology. In many research areas, the visualized data are treated as objective [13]; however, subjective data can also be used to investigate consumers' evaluations of products or companies and to understand the brand images that companies hope to convey. In this study, we extracted quantitative features of brand images from the websites of selected home interior companies and visualized them. Specifically, we extracted color features from the images on these sites and used them to compare the brands.

1.1 Background

Brand image is a component of brand value, which is one of a company's important assets and contributes to driving sales, growing market share, and building shareholder value [4]. In this study, we define brand image as the value and impression communicated between a company and its customers. This communication takes place through shopping, advertising, and other market interactions, both in the real world and on the Internet. We therefore regard brand images communicated via the Internet as subjective information on the web and attempt to extract them by web image analysis. However, because brand image is subjective, a single brand may give rise to various images. In such cases, the company does not communicate effectively with its customers, because the image the company intends to create differs from the images people actually form. This paper considers the brand images intended by companies, because our final goal is to clarify the brand image a company desires and to evaluate how well the company communicates it to its customers.

1.2 Related Works

In data mining, data are divided into two types: structured and unstructured. Text and images are mostly unstructured. Extracting knowledge from web images is classified as web content mining or web image mining.

The main focus of knowledge mining from images on the web is the acquisition and application of visual concepts that correspond to the text. The mining of visual concepts generally requires large numbers of tagged images. In this regard, image databases such as ImageNet [5], Google Image Search [6, 7], and photo-sharing services [6, 8] are exploited.

Researchers who have studied the automatic extraction of color features from image data have proposed techniques such as color histograms and higher-order local autocorrelation coefficients. Because the meaning of each variable in these features is unclear, they are difficult to apply to design and marketing. We therefore propose a method that uses a small number of representative colors in the image as features and applies them to estimating impressions.

Wang et al. [9] extracted a single, perceptually central color of the image as its dominant color. Their approach discards all colors except the dominant one, which is reasonable in terms of perceptual weight but insufficient for estimating the impression of a photograph viewed as a whole. Niwa and Kato [10] also evaluated the impression created by products, extracting features only from the product area of the product photograph. However, we consider that an effective analysis of a brand image must also include the background area of the photograph.

We consider color features sufficient to represent the brand image without information about composition, objects, or shapes. In this study, we analyze the brand image conveyed by photographs using their color features.

2 Extraction of Color Feature from Image Photograph

We analyzed web images by using representative colors, that is, the main colors of a photograph. In this section, we describe the types of images that are useful for understanding the brand image and the method we used to extract the representative colors.

2.1 Image Photographs

Our approach focuses on the brand image conveyed by images on the websites of home interior businesses. We collected images from each company's website and analyzed their features for comparison. These images are mainly of three types: image photographs, product photographs, and parts of the web page layout. Image photographs show not only products but also the floor, walls, or other furniture in the background. Product photographs, in contrast, show only the product, usually against a single-color background such as white. Parts of the page layout are also images, e.g., small pictures of buttons to click. Product photographs are useful for analyzing product features because their background is easy to remove. However, we focus on image photographs rather than product photographs, because image photographs show example combinations of items and express the concept of the product and the brand image, whereas product photographs serve an explanatory role on catalog pages (Fig. 1).

Fig. 1. Examples of an image photograph (left) and product photograph (right)

2.2 Extraction of Representative Colors

Representative colors are one way to express image features, and a combination of representative colors forms a color theme. The appropriate number of representative colors differs from one image photograph to another. However, many studies of color themes assume a fixed number of colors so that color themes can be compared and dissimilarities between images can be calculated. Visualizing affective data requires color features that match human perception. Therefore, we employed hierarchical clustering to extract representative colors without specifying the number of colors.

Extracting representative colors from image photographs requires determining the area covered by each color. We calculated this area for each color cluster obtained by agglomerative hierarchical clustering with the single-linkage method. Agglomerative hierarchical clustering iteratively merges the two most similar clusters. The algorithm requires a stopping criterion as a parameter, and this threshold corresponds to the limit of perceptual distinction between colors. We cut the dendrogram at a height of 3.0, a perceptual threshold obtained in a preliminary experiment.
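The clustering step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each pixel color is a CIELAB triple and that the 3.0 cut height is a Euclidean distance in that space, since the text does not name the color space. It relies on the fact that cutting a single-linkage dendrogram at height t yields the connected components of the graph linking every pair of colors at distance at most t, so a union-find over all pairs gives the same partition without building the dendrogram. The function name `single_link_clusters` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Color = Tuple[float, float, float]  # assumed CIELAB (L*, a*, b*) triple


def single_link_clusters(colors: Sequence[Color], cut_height: float = 3.0) -> List[int]:
    """Label each color with a cluster id from single-linkage clustering cut at cut_height.

    Equivalent to the connected components of the graph that joins every pair
    of colors whose Euclidean distance is <= cut_height.
    """
    n = len(colors)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # O(n^2) pair loop: fine for a sketch, slow for every pixel of a photo.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(colors[i], colors[j]) <= cut_height:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

For example, `single_link_clusters([(0, 0, 0), (2.5, 0, 0), (5, 0, 0)])` puts all three colors in one cluster even though the outer two are 5.0 apart, which is the chaining behavior of single linkage. For full-size photographs one would first reduce the image to its distinct or quantized colors, or use a library routine such as SciPy's `linkage(..., method="single")` followed by `fcluster(..., criterion="distance")`.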

Clusters covering only one pixel were excluded, and the remaining clusters were extracted as the representative colors of the image photograph. The reason for excluding one-pixel clusters is as follows. The clusters obtained by hierarchical clustering fall into three groups: large clusters, small clusters, and very small clusters. In general, large clusters have low saturation, whereas high-saturation colors appear in small clusters. In home interiors, vivid colors are not used on walls and floors; instead, they are added through small furniture items and accessories as accents. In other words, large clusters covering many pixels (e.g., in Fig. 2, the white or gray in the lower-left corner of the images, or the black floor in the lower-right corner) correspond to walls or floors, whereas small clusters (the blue-green in the lower-left corner or the yellow in the lower-right corner) correspond to other furniture. Hence, the combination of large and small clusters is considered to contain the information needed to analyze interior images. In contrast, very small clusters covering only one pixel, such as those on the lower right-hand side of Fig. 2, are considered to originate from small amounts of visual noise caused by lighting. Although such colors are numerous, we do not consider them important.
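Given per-pixel cluster labels, the filtering described above can be sketched as below. This is an illustrative assumption, not the authors' code: the text does not say how a cluster's representative color is computed, so the sketch uses the mean color, and it measures each cluster's area as its share of all pixels in the image. The names `representative_colors` and `min_pixels` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Color = Tuple[float, float, float]


def representative_colors(pixels: Sequence[Color], labels: Sequence[int],
                          min_pixels: int = 2) -> List[Tuple[Color, float]]:
    """Return (mean color, area share) for each cluster with >= min_pixels pixels.

    One-pixel clusters (treated as lighting noise) are dropped. The area share
    is the cluster's pixel count divided by the total number of pixels.
    """
    members: Dict[int, List[Color]] = defaultdict(list)
    for color, label in zip(pixels, labels):
        members[label].append(color)

    total = len(pixels)
    result: List[Tuple[Color, float]] = []
    for label in sorted(members):
        group = members[label]
        if len(group) < min_pixels:
            continue  # very small cluster: excluded
        mean = tuple(sum(c[k] for c in group) / len(group) for k in range(3))
        result.append((mean, len(group) / total))

    # Largest clusters (e.g. walls, floors) first, accent colors after.
    result.sort(key=lambda item: item[1], reverse=True)
    return result
```

Dropping singletons while keeping small multi-pixel clusters is what preserves accent colors such as a yellow chair, which a fixed "top k colors" rule could discard.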

Fig. 2. Examples of extracted representative colors (Color figure online)

2.3 Materials

The target brands are typical interior brands of eight companies based locally and abroad: arflex (608), Cassina (471), Carl Hansen and Son (360), IKEA (2247), Herman Miller (154), Ralph Lauren Home (615), Karimoku (3985), and Nitori (188). The number in parentheses is the number of images downloaded from each brand's website. These numbers vary so widely that a meaningful comparison of image photographs would not be possible, so we randomly sampled 150 images per brand for analysis.
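A reproducible version of the per-brand sampling might look like the sketch below. The image counts come from the text; the file names and the seed are hypothetical placeholders, and the paper does not state how its random sample was drawn. Note that 150 is just below the smallest count (Herman Miller, 154), so every brand can supply the same number of images.

```python
import random
from typing import Dict, List


def sample_per_brand(images_by_brand: Dict[str, List[str]], k: int = 150,
                     seed: int = 0) -> Dict[str, List[str]]:
    """Draw k images per brand without replacement, reproducibly."""
    rng = random.Random(seed)
    sampled: Dict[str, List[str]] = {}
    for brand in sorted(images_by_brand):  # fixed order -> reproducible draws
        images = images_by_brand[brand]
        if len(images) < k:
            raise ValueError(f"{brand} has only {len(images)} images; need {k}")
        sampled[brand] = rng.sample(images, k)
    return sampled


# Image counts from Sect. 2.3; the file names are placeholders.
counts = {"arflex": 608, "Cassina": 471, "Carl Hansen and Son": 360, "IKEA": 2247,
          "Herman Miller": 154, "Ralph Lauren Home": 615, "Karimoku": 3985, "Nitori": 188}
images_by_brand = {b: [f"{b}_{i:04d}.jpg" for i in range(n)] for b, n in counts.items()}
subset = sample_per_brand(images_by_brand)
```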

2.4 Result of Extraction

We first describe the analysis method. Because the number of representative colors extracted from each image photograph is not fixed, it is difficult to compare representative colors picture by picture. We therefore analyzed all extracted representative colors together, as plotted in Fig. 3.

Fig. 3. Scatter plot of hue and saturation extracted from image photographs of each brand

Figure 3 plots the representative colors on the hue-saturation plane. The distribution is centered on the achromatic colors and spreads in the orange and blue directions. Such a data structure is well expressed by hue. Hue is an angular value, which makes it difficult to treat quantitatively; for example, an ordinary arithmetic mean of hues is not meaningful. However, hue is one of the scales that form psychological color spaces, and its value is intuitively easy to relate to a color. For this reason, hue is effective for representing the features of color coordination and design. As the figure shows, there are two major peaks, with spreads around them, at orange (hue = 30°) and blue (hue = −150°). These hues are typical of interior photographs, which is useful for modeling and designing color features.
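The polar layout of Fig. 3 can be reproduced with a small conversion. The paper does not state which color model its hue and saturation come from; this sketch assumes HSV from Python's standard `colorsys` module, wraps hue into the range (−180°, 180°] so that blue appears near −150° as in the text, and places each color at (S cos H, S sin H) so that achromatic colors sit at the origin. The function name is hypothetical.

```python
import colorsys
import math
from typing import Tuple


def hue_saturation_point(rgb: Tuple[int, int, int]) -> Tuple[float, float, float, float]:
    """Map an 8-bit RGB color to (hue_deg, saturation, x, y) on the hue-saturation plane.

    Assumes HSV hue/saturation. Hue is wrapped to (-180, 180]; (x, y) is the
    polar point (S cos H, S sin H), so grays land at the origin.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)  # h in [0, 1), s in [0, 1]
    hue = h * 360.0
    if hue > 180.0:
        hue -= 360.0
    rad = math.radians(hue)
    return hue, s, s * math.cos(rad), s * math.sin(rad)
```

Under this assumption, an orange such as (255, 128, 0) maps to a hue of about 30°, a sky blue such as (0, 128, 255) to about −150°, and any gray to the origin.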

The distribution of each brand also had distinct characteristics. For example, IKEA's colors are widely distributed in several directions. Karimoku's colors extend less in the cyan-blue (CB) direction than in the red (R) or orange (O) direction. Herman Miller's and Nitori's colors are concentrated at low saturation.

3 Clustering and Scaling Color Feature

3.1 Characteristic of Each Brand (Qualitative Analysis)

The hue analysis in Sect. 2 shows that the use of representative colors in interior image photographs is characterized by the hue distribution, which has two main directions: orange and blue. Parameters that describe the color distribution along these directions are therefore needed to characterize the brand image, and we expected that such parameters, representing the characteristics of each brand, could be used to model the brand image.

Comparing the color distributions on the chromaticity diagram with the interior photographs, we roughly observed the following relationships:

  • All brands show large protrusions in the orange and blue directions. Orange is used in large pieces of furniture, such as tables or sofas, and in wallpaper. Blue is used in small articles and cushions as an accent color.

  • Brands whose colors are distributed in the orange direction mainly use wood, warm colors, and natural materials. Wallpaper as a background color and wooden products correspond to this case.

  • Brands whose colors are widely distributed in every direction use colorful color schemes.

  • Vivid green is rarely used. Because bright purple and green do not appear in most interior photographs, there is no need to adopt them as parameters.

3.2 Quantitative Scale of Hue Parameters

To quantify the characteristics described in the previous section, we describe a method for statistically classifying hues and for determining the attribution degree, that is, the degree to which a representative color belongs to a cluster.

The hue distribution in Fig. 3 has two peaks, but two colors are too few to express the characteristics of image photographs. To subdivide these colors further, we used Ward clustering, whose dendrogram is shown in Fig. 4. Because hue is an angle, quantities such as averages are difficult to calculate directly; we therefore defined the distance between two hues as the smaller of the two angular differences between them. Agglomerative hierarchical clustering was employed because it requires only a definition of the distance between data points. Colors with saturation below 5 % were not used for clustering, because the hue of an achromatic color is undefined and the hue of a low-saturation color tends to have a large error. The dendrogram shows two large clusters; in this study, however, we divided the data into four clusters. Table 1 shows the size and center of each cluster. The low-saturation colors, which were excluded from the hierarchical clustering, were grouped as cluster 0.
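The circular hue distance and the 5 % saturation split described above can be sketched as follows. This covers only the inputs to clustering; the Ward merging step itself is not reproduced here. Colors are assumed to be (hue in degrees, saturation in [0, 1]) pairs, and the function names are hypothetical.

```python
from typing import List, Tuple

SAT_THRESHOLD = 0.05  # colors below 5 % saturation go straight to cluster 0

HueSat = Tuple[float, float]  # (hue in degrees, saturation in [0, 1])


def hue_distance(h1: float, h2: float) -> float:
    """Smaller of the two angular differences between two hues, in [0, 180] degrees."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


def split_by_saturation(colors: List[HueSat]) -> Tuple[List[HueSat], List[HueSat]]:
    """Return (chromatic colors to cluster, low-saturation colors for cluster 0)."""
    chromatic = [c for c in colors if c[1] >= SAT_THRESHOLD]
    achromatic = [c for c in colors if c[1] < SAT_THRESHOLD]
    return chromatic, achromatic


def distance_matrix(hues: List[float]) -> List[List[float]]:
    """Pairwise hue distances, the input an agglomerative clustering step needs."""
    return [[hue_distance(a, b) for b in hues] for a in hues]
```

With this definition, hues of 350° and 10° are 20° apart rather than 340°, and orange (30°) and blue (−150°) are maximally distant at 180°.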

Fig. 4. Dendrogram of Ward clustering

Table 1. Number of colors and center hue of each cluster obtained by Ward clustering

The hue center \( \bar{H}_{i} \) of each cluster \( C_{i} \) was determined by the following equation.

$$ \bar{H}_{i} = \operatorname{atan2}\left( \frac{1}{|C_{i}|}\sum_{r \in C_{i}} \cos H_{r},\; \frac{1}{|C_{i}|}\sum_{r \in C_{i}} \sin H_{r} \right) \tag{1} $$

where \( H_{r} \) is the hue of color \( r \) belonging to cluster \( C_{i} \), and \( \operatorname{atan2}(x, y) \) returns the angle of the point \( (x, y) \) measured from the x-axis. In other words, Eq. (1) averages the hues in Cartesian coordinates and then converts the mean back to polar coordinates. Because the hues within a cluster span a limited range, this calculation introduces no wrap-around error.
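Eq. (1) is the standard circular mean, which a few lines of Python can illustrate. One detail to watch: Python's `math.atan2` takes its arguments as (y, x), so the sine term goes first, whereas Eq. (1) writes the cosine term first under its own atan2(x, y) convention. The function name is hypothetical; this is a sketch of the formula, not the authors' code.

```python
import math
from typing import Sequence


def circular_mean_hue(hues_deg: Sequence[float]) -> float:
    """Eq. (1): average hues as unit vectors, then convert back to an angle.

    Returns degrees in [-180, 180].
    """
    n = len(hues_deg)
    mean_cos = sum(math.cos(math.radians(h)) for h in hues_deg) / n
    mean_sin = sum(math.sin(math.radians(h)) for h in hues_deg) / n
    # math.atan2 expects (y, x): sine component first, cosine component second.
    return math.degrees(math.atan2(mean_sin, mean_cos))
```

For hues of 350° and 10°, the naive arithmetic mean is 180° (cyan-ish), while `circular_mean_hue([350.0, 10.0])` returns approximately 0° (red), which is the intended center.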

Next, we describe how the clustering result is used to express the color features of interior photographs. In outline, each representative color extracted from an image belongs to one of the five clusters in Table 1 (the achromatic cluster 0 and four hue clusters), and we calculate the attribution degree of each representative color to its cluster to measure how much each color is used.

The attribution degree \( A_{i}(r) \) of representative color \( r \) to cluster \( C_{i} \) is determined as follows. For \( i = 0 \), i.e., the achromatic cluster, the attribution degree is

$$ A_{0}(r) = \begin{cases} 1 - S_{r} & S_{r} < 0.05 \\ 0 & \text{otherwise} \end{cases} \tag{2} $$

where \( S_{r} \) is the saturation of color \( r \). The closer a low-saturation color is to achromatic, the higher its attribution degree; colors with a saturation of 5 % or more have an attribution degree of zero. For \( i \ne 0 \), i.e., the chromatic clusters, the attribution degree is

$$ A_{i}(r) = \begin{cases} b_{i}(r)\, S_{r} \cos\left( H_{r} - \bar{H}_{i} \right) & S_{r} \ge 0.05 \\ 0 & \text{otherwise} \end{cases} \tag{3} $$

Here \( H_{r} \) is the hue of representative color \( r \), and \( b_{i}(r) \) is a binary function indicating whether \( r \) belongs to cluster \( C_{i} \). As Eq. (3) shows, the attribution degree increases as the hue of \( r \) approaches the cluster center and as its saturation increases. The attribution is exclusive: the attribution degrees of \( r \) to all clusters except the one nearest to \( r \) are zero. Likewise, the attribution degree of a low-saturation representative color to any chromatic cluster is zero.
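Eqs. (2) and (3) can be combined into one function that returns the attribution degrees of a single representative color to every cluster. This sketch assumes that the membership function \( b_{i}(r) \) assigns a color to the cluster with the nearest hue center, consistent with the exclusivity described above; the centers are passed in as a parameter because Table 1's values are not reproduced here (Sect. 4 gives 14.88° and 34.23° for clusters 1 and 2). The function names are hypothetical.

```python
import math
from typing import List, Sequence

SAT_THRESHOLD = 0.05  # 5 % saturation boundary shared by Eqs. (2) and (3)


def hue_distance(h1: float, h2: float) -> float:
    """Smaller of the two angular differences between two hues, in degrees."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


def attribution_degrees(hue_deg: float, sat: float, centers: Sequence[float]) -> List[float]:
    """Return [A_0(r), A_1(r), ..., A_k(r)] for one representative color r.

    centers[i - 1] is the hue center (degrees) of chromatic cluster C_i.
    Assumption: b_i(r) = 1 only for the cluster with the nearest center.
    """
    degrees = [0.0] * (len(centers) + 1)
    if sat < SAT_THRESHOLD:
        degrees[0] = 1.0 - sat  # Eq. (2): near-achromatic colors
        return degrees
    nearest = min(range(len(centers)), key=lambda k: hue_distance(hue_deg, centers[k]))
    # Eq. (3); cos() handles wrap-around, e.g. cos(340 deg) == cos(20 deg).
    degrees[nearest + 1] = sat * math.cos(math.radians(hue_deg - centers[nearest]))
    return degrees
```

For example, a color exactly at the 34.23° center with saturation 0.5 gets an attribution degree of 0.5 to that cluster and 0 elsewhere, while a gray with saturation 0.02 gets 0.98 to cluster 0.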

The color feature vector of each image is then formed from these attribution degrees. The image feature \( f = (f_{0}, f_{1}, f_{2}, f_{3}, f_{4}) \) is defined as follows.

$$ f_{i} = \frac{1}{|\{r\}|} \sum_{r} A_{i}(r) \tag{4} $$

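where \( \{r\} \) is the set of representative colors extracted from the image photograph. \( f_{i} \) is the share of the image's representative colors that belong to cluster \( C_{i} \), weighted by their attribution degrees. Because each representative color has a nonzero attribution degree to at most one cluster and that degree never exceeds 1, \( \sum_{i} f_{i} \) does not exceed 1.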
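Eq. (4) reduces to averaging attribution rows over an image's representative colors. The sketch below takes rows of the form \( [A_{0}(r), \ldots, A_{4}(r)] \), such as those produced by Eqs. (2) and (3); the function name is hypothetical.

```python
from typing import List, Sequence


def feature_vector(attributions: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (4): average the attribution degrees over an image's representative colors.

    Each row is [A_0(r), A_1(r), ..., A_k(r)] for one representative color r.
    """
    if not attributions:
        raise ValueError("image has no representative colors")
    n = len(attributions)
    width = len(attributions[0])
    # Each row has at most one non-zero entry, and it is at most 1,
    # so the components of the result sum to at most 1.
    return [sum(row[i] for row in attributions) / n for i in range(width)]
```

For an image with one near-gray color (row `[0.98, 0, 0]`) and one orange color (row `[0, 0, 0.5]`), the feature is `[0.49, 0.0, 0.25]`.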

4 Classification of Brand Images Using Color Features

The proposed image feature \( f \) has five parameters and can be used to compare brands or to search images across brands. In this study, we show a visualization of brand images as an application: two variables are selected from the feature vector, and the images are mapped onto the plane they span.

Fig. 6 shows the pictures of each brand mapped onto the plane of \( f_{1} \) (\( \bar{H}_{1} = 14.88^{\circ} \), i.e., red) and \( f_{2} \) (\( \bar{H}_{2} = 34.23^{\circ} \), i.e., orange). As the figure shows, pictures containing vivid red objects are mapped to the lower right and those containing orange or yellow objects to the upper left, indicating that the proposed \( f_{i} \) effectively represents the corresponding color features. Photos that are neither orange nor red are mapped to the lower left. Because most photos do not contain bright colors, the distribution is concentrated near the origin. The differences between brands reflect the analysis in Sect. 3. For example, IKEA uses little orange but vivid red, whereas Karimoku and Nitori use orange frequently. Using other pairs of color variables, the relationship between brand image and color features can be visualized in the same way.

5 Discussion

5.1 Trend of the Representative Colors in Image Photographs of Home Interior Brands

In Sect. 2, we described the color extraction method and gave an overview of the distribution of the extracted colors: a concentration of achromatic colors, and hue spreads toward orange and toward the region between cyan and blue. In this section, we discuss how each of these colors relates to the interior photographs.

Orange and yellow correspond to the colors of wooden floors and walls. Cyan and blue are opposite (complementary) colors of orange, so they appear to be used for color harmony. Because wall colors cannot easily be changed in an interior, white or low-saturation colors, which are easy to match with any other color, are preferred in many cases. Furthermore, walls occupy a large area of the picture, so their colors are likely to be extracted as representative colors. Hence, low-saturation colors are considered to have been extracted as representative colors in many image photographs.

Red and the colors between cyan and blue are both considered to have been used as harmony colors with respect to achromatic colors and orange. Cyan and blue, which lie opposite orange, appear in many of the photographs.

The use of red differs by brand. In arflex and Cassina, red is used for bright red sofas and chairs set against low-saturation backgrounds. In Ralph Lauren and Karimoku, on the other hand, the redness comes from the color of solid wood.

In addition, wood colors appear frequently in interior photographs because of the nature of buildings and furniture. In some cases, other furniture had the same or a similar color so as to harmonize with the wood. Wood comes in many types and colors, ranging from low-saturation colors to vivid red or orange. Its colors therefore have a wide distribution rather than concentrating at a particular hue, which we consider to produce the spread in the orange direction in Fig. 3.


5.2 Application of Classification of Brand Image

In Sects. 3 and 4, we classified the representative colors into four hue clusters to construct a feature space and then observed commonalities and differences between brands. These commonalities and differences were discussed qualitatively; automatic classification of brand images would additionally require a method for learning the relationship between features and brands, which remains a future task. As noted in the previous discussion, a representative color may be used for a particular material or element, or to support the color harmony of the image as a whole. In this section, we discuss the differences in color features between brands based on Fig. 5.

Fig. 5. Visualization of color features of all target brands (Color figure online)

Common to all brands is a concentration of the distribution around the origin and in low-saturation orange. These appear to be basic colors of interior photographs regardless of brand. Regarding differences between brands, the shapes of the distributions in Fig. 5 differ. However, classifying individual image photographs by brand is very difficult, because most pictures have low saturation and their brand identity is not easy to discern. The purpose of this study is to understand the characteristics of brand images, not to build a system for classifying or searching pictures by brand image. Such a service would have to be designed on a different concept from the present study.

6 Conclusion

In this study, we proposed a method for designing features to visualize brand images using color features. Whereas traditional approaches extract generic image features, the proposed method clusters representative colors so as to reflect the characteristics of the color schemes of the target content. We analyzed eight interior brands, constructed a feature vector from four hue clusters and an achromatic cluster, and used it to visualize color features. The same procedure is expected to apply to other brands and content.


Quantitatively determining which object each extracted color comes from requires object recognition technology. Because we used only color characteristics, our analysis of how the data arise was only qualitative. For brand image estimation and image search, as mentioned above, such detailed analysis is considered essential. Designing features that achieve this is future work.

In addition, because the representative colors used in image photographs vary with seasons and events, trends in the data may differ significantly depending on when the images are collected. In this study, we chose home interiors, which are relatively unaffected by season. For other content, the method could be applied to image data of the same brand collected for each season and for various events. Analyzing seasonal changes in brand image, as well as a universal brand image that does not depend on particular seasons or events, is an interesting theme for future research.