Abstract
The virtual restoration of historic murals holds immense importance in the realm of cultural heritage preservation. Currently, there are three primary technical issues. First, the precise locations where a mural requires restoration must be delineated. Second, the original colors of the mural have changed over time, so they differ from its current appearance. Third, while methods based on convolutional neural networks are effective in restoring small defaced areas of murals, their effectiveness diminishes significantly when applied to larger areas. The main contributions of this paper are as follows: (1) To determine the large and small areas to be restored, the authors employ hyperspectral super-pixel segmentation and support vector machine-Markov random field (SVM-MRF) classification. (2) The authors transform the hyperspectral mural images into more realistic and accurate red-green-blue (RGB) images using the Commission Internationale de l’Eclairage (CIE) standard colorimetric system. (3) The authors restore the images using convolutional neural network and matching image block-based approaches, respectively, depending on the size of the areas to be mended. The proposed method enhances the image quality assessment (IQA) in terms of both color quality and restoration effects. In contrast to the pseudo-color fusion method, the color optimization algorithm described in this research improves the multi-scale image quality (MUSIQ) score by 8.42%. The suggested technique improves MUSIQ by 2.41% when compared with a convolutional neural network-based image inpainting algorithm.
Introduction
The mural is an ancient form of painting practiced by early humans. Ancient civilizations such as Egypt, India, Babylon, and China have left numerous murals that have endured. The paintings portray individuals, animals, and objects, providing visual representations of the societal hierarchy, professional distinctions, and lifestyle prevalent during their era. Nevertheless, murals are commonly found in distinctive environments such as caves or burial chambers, where they are prone to deterioration caused by moisture, fluctuations in temperature, illumination, and other variables. In addition, they may also face threats from human activities. The mural paintings in China, which have a long history, have suffered varying degrees of damage, and their preservation currently faces significant obstacles. The Kizil Grottoes in Xinjiang are subject to ongoing degradation and destruction as a result of the delicate nature of the rock mass and frescoes, deteriorating environmental conditions, natural erosion, and unforeseen disasters. The Qianling Mausoleum in Shaanxi Province was plundered by individuals who excavated tunnels, resulting in the tomb being inundated with mud and water and the murals being harmed. The murals in the Tiantishan Grottoes in Gansu Province have suffered varying degrees of damage due to past earthquakes and the flooding of a nearby reservoir. Figure 1 illustrates that a multitude of factors contribute to fractures, stains, fading, missing sections, mildew, and other forms of degradation on the murals, each varying in severity. Given the dire state of the murals, it is imperative that they be conserved to safeguard human cultural heritage and promote sustainable cultural development.
Conventional restoration methods often consider aspects such as environmental impact, color degradation, and surface microbe growth [1,2,3]. Restorers replicate the process of deterioration through controlled laboratory experiments, ascertain the initial condition of the murals, and then reconstruct the murals directly on the basis of this information. At present, conventional restoration approaches face challenges, chiefly their irreversibility; the significant amount of time required and the strict professional standards demanded of restorers are further concerns. With the swift advancement of computer technology, conventional restoration methods are struggling to meet contemporary requirements and are being supplanted by intelligent, digital procedures. An artificial intelligence system trained with digital image restoration techniques first formulates hypotheses about the content of the damaged areas and then fills in the missing parts of the original images [4].
The majority of virtual mural restoration relies on image inpainting techniques, which in computer vision denote methods for inferring and reconstructing absent or concealed segments of an image. Traditional image inpainting methods are based on texture and structure, since many pictures have consistent internal patterns. Techniques based on statistics and machine learning are frequently used to predict and fill missing or damaged sections by extracting texture and structural information from mural images [5, 6]. Pen et al. [7] introduced a priority-based approach for mural image inpainting that utilized a genetic algorithm to connect the structural data of the damaged area, with the texture then tailored to fill the space, resulting in the most effective restoration of the murals in the Mogao Grottoes. Wang et al. [8] developed a mural image restoration algorithm that utilized line drawings as a guide and employed a linear combination of multiple candidate patches to create the desired patch, completing the inpainting of the mural paintings in the Mogao Grottoes. Cao et al. [9] devised a method for local search and adaptive sample block selection using the Criminisi algorithm, in which data items were modified and a novel priority function was developed to enhance the order in which images were filled, based on the feature values of the mural painting components. Zhou et al. [10] utilized a structure-oriented feature refinement module to enhance the characteristics of the missing parts and used the inferred color correlation in the structural information to achieve the digital restoration of the paintings in the Mogao Grottoes. Wei et al. [11] proposed a technique that restored mural images of the Mogao Grottoes by employing color space for access decomposition and the total variation (TV) noise model for image decomposition. Xu et al. [12] presented a virtual and real fresco restoration display approach based on the visual attention mechanism to improve the semantic significance of virtual fresco restoration and the immersion of visual perception. Jaidilert et al. [13] introduced a semi-automatic scratch identification system that utilizes the region growth method and numerous variational interpolation techniques to pixel-fill and color-repair the missing parts of Thai artworks.
With the rise of deep learning technology, a large number of image inpainting methods based on convolutional neural networks (CNN) and generative adversarial networks (GAN) have emerged, and some of these networks, which achieved excellent results on other datasets, have been applied to mural image inpainting [14]. General CNN image completion involves the following steps. First, a training set containing pairs of damaged and complete images is created to establish the correlation. Then, network architectures such as U-Net [15] and DeepFill [16] are constructed, in which encoders compress visual characteristics into a low-dimensional representation while decoders reconstruct the completed image. After training on the supplied training set, the CNN model learns to estimate the missing pixel values. The trained CNN model then receives the damaged images and generates the missing pixel values based on the known information to complete them. Liu et al. [17] developed a model made up of a multi-stage progressive reasoning network (MPRNet) and a multi-scale feature aggregation (MFA) module to achieve the inpainting of the Mogao Grottoes paintings. Schmidt [18] suggested a highly noise-resistant inpainting technique by combining a content adaptive resampler (CAR) and a hierarchical information extraction network (HiNet) and produced great results on the mural paintings in the Mogao Grottoes. Chen et al. [19] completed the digital restoration of high-resolution frescoes and presented an image inpainting strategy based on partial convolution and a sliding window technique. Wang et al. [20] proposed a novel sparse representation strategy with elastic net regularization based on similarity-preserving overcomplete dictionaries to deal with the problem of unclear or completely missing structural information over large areas. Ciortan et al. [21] presented an image inpainting approach based on a generative adversarial network with two generators, one of which is used to generate edges and the other to generate colors, to restore the murals in the Mogao Grottoes. Xu et al. [22] suggested a digital restoration technique that combines a deformable convolution network (DCN), an efficient channel attention network (ECANet), a residual network (ResNet), and CycleGAN to better capture high-frequency aspects of images and prevent network deterioration and gradient disappearance. Li et al. [23] suggested a generator-discriminator network model based on an artificial intelligence algorithm to improve the texture details of the images produced by the generator by focusing this portion of the loss in the loss function. These methods have achieved excellent results in the task of completing murals, and the supplemented areas have perceptual similarities to the original images.
The image inpainting algorithm faces challenges when solving the practical problem of mural image inpainting. One challenging problem is manually finding the pixels that need to be filled, that is, annotating the masks at the prediction step of image inpainting. Murals contain numerous stained pixels with an extremely wide distribution, and manual annotation takes massive amounts of time. The spectral imaging technique is extensively utilized in the field of cultural relics preservation. Spectral imaging is an optical method that simultaneously captures both images and the spectral reflectance of objects. This technique allows for the characterization of item properties across distinct wavelength bands, so automatically labeling the masks of mural images with spectral information is a viable possibility. Moreover, spectral imaging offers non-destructive, non-contact examination: unlike chemical analysis it requires no sampling, and unlike Raman spectroscopy it does not expose the artifacts to high-intensity laser beams, making it more suitable for cultural artifacts. Initially, hyperspectral imaging was employed in the domain of pigment identification and analysis, as well as in the extraction of concealed information. Some experts have recently gathered hyperspectral images of mural paintings as a substitute for RGB images in the virtual restoration of murals. Sun et al. [24] employed a pre-trained model with a Butterworth high-pass filter in a three-state domain conversion network and eliminated virtual scratches using principal component analysis and inverse principal component analysis to create a mural image with excellent visual effects and high-frequency information. Li et al. [25] employed a hyperspectral imaging technique to create a comprehensive database of 44 pigments for murals, which is then utilized to guide the virtual restoration of mural images by searching for the most similar reference samples in a specifically designed feature space. Zhou et al. [26] proposed a method to select feature bands from hyperspectral painting and calligraphy images to generate red-green-blue (RGB) images by establishing rules and then reconstruct the stained polluted areas of the RGB images through color-constrained Poisson editing to effectively remove or dilute the stains and restore the original colors of the artworks. Hou et al. [27] proposed a virtual smudge recovery method based on the maximum noise fraction (MNF) transformation of hyperspectral images, which concentrates the main features of ancient paintings into the several topmost principal components, identifies the components containing the bulk of the stain’s spectral information, and applies the inverse MNF transformation to the remaining top components to reduce the impact of the stain on the images. Building on this established application of spectral imaging in heritage protection, this research employs super-pixel segmentation to extend the technology to the automated division of masks.
This research utilizes hyperspectral images for mask extraction, but the virtual restoration itself operates on visible images, raising the issue of image registration. This work addresses the problem by reconstructing visible images from the hyperspectral images, rather than acquiring hyperspectral and visible images separately. The conventional approach to color reconstruction extracts three representative bands from the hyperspectral images and feeds them into the three primary color channels to generate pseudo-color images. The method described in this research instead achieves color reconstruction from spectral imaging using the CIE standard colorimetric system. This approach not only eliminates the need for image registration, but also incorporates more comprehensive spectral data, resulting in a more authentic reconstruction of color, which partially resolves the issue of color fading.
Another problem is that inpainting algorithms based on convolutional neural networks require fixed-size inputs, so the image is typically broken into smaller blocks, such as 512\(\times\)512-pixel squares, which are restored and then spliced together. Unlike defacement, missing areas in mural paintings are often large holes, so this method faces the problem that most, or even all, of a block may belong to the area to be inpainted. With a limited training set, it is also challenging for convolutional neural networks to effectively inpaint large holes. Thus, by adjusting the super-pixel parameters during mask extraction, sections of varying sizes to be patched can be extracted. Convolutional neural networks serve as the primary restoration technique, while image block matching is employed as a secondary approach for the challenging large regions.
The main contributions of this paper are as follows:

(1) A hyperspectral image segmentation method based on simple linear iterative clustering (SLIC) is proposed, which automatically extracts the locations of the missing and defaced points to be repaired on the mural images according to the spectral-spatial information.

(2) The fusion of hyperspectral images into color images is achieved through the CIE standard colorimetric system, which enhances the realism of the colors and partially resolves the issue of fading.

(3) A strategy in which the Criminisi algorithm and a partial convolutional neural network are adopted for the missing points and the defaced points, respectively, is proposed to solve the problem that the large missing areas are difficult to inpaint with a traditional convolutional neural network.
Methods
The diagram illustrating the sequence of steps in this work is depicted in Fig. 2. The hyperspectral mural images were acquired using a high-resolution device specifically designed for imaging cultural treasures. The main task is image inpainting, namely the rebuilding of the areas on the mural images presumed to be stained. Prior to commencing the inpainting process, it is important to ascertain the specific regions that require inpainting, in other words, to identify the masks of the images in inpainting tasks [28]. SLIC is not only applicable to color images but also compatible with multi-channel hyperspectral images. Given that an inpainted area is often a contiguous region, the super-pixels produced by SLIC are small and tidy, and the local characteristics are simple to represent. Furthermore, SLIC is highly advantageous in terms of operational speed, compactness of the generated super-pixels, and contour preservation. The mask acquisition method described in this research comprises two distinct phases.
Initially, the number of super-pixels is set small, so that each super-pixel covers a larger area. This phase mainly detects absent components, areas of the wall exhibiting flaking, and extensive regions with compromised portions. The SLIC algorithm produces super-pixels, which are subsequently categorized using the SVM-MRF classifier. The classification outcome identifies the large regions requiring inpainting, and the collection of these regions forms mask 1. Subsequent efforts target smaller stains: the spectral values of the mask 1 region are set to the maximum value, thereby excluding the influence of the previously found masks. Thereafter, the hyperspectral image is partitioned into numerous smaller squares in the spatial dimension, and a greater number of super-pixels is set for segmentation, so that each super-pixel encompasses a smaller region. The classification is then executed using SVM-MRF, revealing smaller contaminated regions within the small super-pixels [29]. Mask 2 comprises all the smaller stained regions. The classification rests on the distinctive spatial-spectral characteristics of the contaminated and normal regions: SVM serves as a spectral classifier for the initial classification, and MRF is subsequently applied for post-processing. One issue requiring attention is the difficulty of discerning between the sections requiring restoration and the sections partially coated with black paint, given the similarity of the spectral properties of the two regions. Hence, the approach of this research is to treat these two categories of regions as a single entity in the classification step and subsequently differentiate them using spectral analysis.
Given that the target images for the virtual restoration task are in RGB format, it is necessary to compress the hyperspectral images into three color channels: red, green, and blue. Obtaining hyperspectral images and RGB images simultaneously and at the same location with image acquisition devices is difficult, so capturing additional RGB images presents challenges of registration and calibration. A popular approach for creating pseudo-color images is to synthesize three selected bands into the three channels. However, this method only considers the information from those three bands, making inadequate use of the spectral information, so the image’s color differs from true perception. To address the issue, the hyperspectral mural images are transformed into RGB images using color theory and the CIE standard colorimetric system, by establishing a mapping from the spectrum to the RGB channels. With this color reconstruction system, the images draw on a large quantity of spectral information, resulting in color that is more in line with the principles of optical imaging. Furthermore, the spectral curve encapsulates the intrinsic properties of the substance, hence exhibiting resilience against issues such as fading.
Finally, the Criminisi algorithm and a convolutional neural network based on partial convolution are integrated to perform the image restoration. The former supplements the latter in repairing vast damaged areas, while the latter better comprehends semantic information during the inpainting process. The precise approach is to first use the convolutional neural network to restore the small defaced regions, and then employ the Criminisi method to restore the large damaged sections. The restoration of the small areas is intentionally prioritized over the large portions to prevent the Criminisi algorithm from matching pixels that contain defaced information. Since the input of the convolutional neural network is a square image block, the images are first cut into blocks that conform to the input size. Accordingly, after the convolutional neural network restores the blocks, an additional stitching procedure is performed; this is not needed for the Criminisi algorithm, which directly processes images of varying dimensions. A high-level sketch of this pipeline is given below.
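To make the order of operations concrete, the following minimal sketch outlines the pipeline in Python. The four stage functions are hypothetical placeholders for the procedures detailed in the subsections below, and the super-pixel counts are illustrative only.

```python
# A minimal sketch of the overall pipeline; every stage function passed
# in is a hypothetical placeholder for the procedure detailed in its
# subsection, and the super-pixel counts are illustrative values.
import numpy as np

def restore_mural(cube, extract_mask, cube_to_srgb, cnn_inpaint, criminisi_inpaint):
    """cube: hyperspectral image of shape (H, W, B)."""
    # Phase 1: coarse super-pixels expose large losses (mask 1).
    mask1 = extract_mask(cube, n_superpixels=200, exclude=None)
    # Phase 2: fine super-pixels over sub-blocks expose small stains
    # (mask 2); mask-1 pixels are saturated so they are ignored.
    mask2 = extract_mask(cube, n_superpixels=5000, exclude=mask1)
    # Color reconstruction: spectra -> CIE XYZ -> sRGB.
    rgb = cube_to_srgb(cube)
    # Small stains first, so the later patch matching never copies
    # contaminated pixels; the large holes are filled afterwards.
    rgb = cnn_inpaint(rgb, mask2)         # partial-convolution network
    return criminisi_inpaint(rgb, mask1)  # exemplar-based matching
```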
Automatic mask acquisition based on SLIC and SVM-MRF
Before finalizing the virtual mural image restoration, it is necessary to locate the areas requiring restoration. In this phase, the images are segmented into super-pixels using spatial-spectral information. Subsequently, methods based on SVM and MRF are employed to classify the super-pixels and detect contaminated or missing regions. To address the potential confusion caused by the black paint areas, the outcomes are further refined through spectral binary coding. Larger and smaller areas to be restored are extracted by adjusting specific super-pixel parameters. This process generates the masks that guide the ensuing inpainting steps.
Spatial-spectral super-pixel segmentation based on SLIC
The fundamental concept of SLIC is to apply the k-means method within a local region of an image and obtain a segmentation outcome by clustering. Let M represent the original hyperspectral image, which consists of N pixels in B bands. Each pixel is denoted as \(m_{i}\in \mathbb {R}^{B}\), \(i=1,2,\ldots ,N\).
Multiple clustering centroids are chosen, and each pixel is assigned to the centroid that is closest to it based on a predetermined distance metric. This process results in the creation of clusters. Next, the average vectors of the clusters are calculated, and the clustering centroids are then modified to reflect these new positions. Through the process of iterating, the clustering centroids typically reach a stable state.
Pixels clustered within the same super-pixel are regarded as having analogous spectra, indicating similarity in the spectral dimension; meanwhile, they typically occupy nearby regions, signifying similarity in the spatial dimension. Consequently, a spatial-spectral distance is defined for the clustering process. The spectral-spatial distance between two pixels \(m_{i}\) and \(m_{j}\) in hyperspectral images is set as follows:

\(D\left( m_{i},m_{j}\right) =D_{spectral}\left( m_{i},m_{j}\right) +k\cdot D_{spatial}\left( m_{i},m_{j}\right)\)
in which k is a parameter that balances the spectral information and the spatial information, while the spectral distance term \(D_{spectral}\) and the spatial distance term \(D_{spatial}\) are respectively

\(D_{spectral}=\sqrt{\sum _{b=1}^{B}\left( m_{i,b}-m_{j,b}\right) ^{2}},\qquad D_{spatial}=\sqrt{\left( u_{i}-u_{j}\right) ^{2}+\left( v_{i}-v_{j}\right) ^{2}}\)
In which \(\left( u_{i},v_{i}\right)\) and \(\left( u_{j},v_{j}\right)\) respectively represent the position of \(m_{i}\) and \(m_{j}\).
Following the aforementioned stages, the initial hyperspectral images undergo segmentation, resulting in the acquisition of super-pixel images. The original pixels are replaced with the average vector of the pixels in each super-pixel region to achieve the fusion of spatial and spectral data.
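The following compact sketch illustrates this segmentation step, assuming the linear spectral-spatial distance given above; the grid initialization, search-window size, and iteration count are illustrative choices rather than the exact settings used in this work.

```python
# A sketch of spectral-spatial SLIC on a (H, W, B) hyperspectral cube.
import numpy as np

def slic_hs(cube, n_segments=200, k=0.5, n_iter=10):
    """Returns per-pixel labels and the fused image in which every pixel
    is replaced by the mean spectrum of its super-pixel."""
    H, W, B = cube.shape
    cube = cube.astype(float)
    step = max(1, int(np.sqrt(H * W / n_segments)))        # grid spacing
    gy, gx = np.mgrid[step // 2:H:step, step // 2:W:step]  # seed positions
    pos = np.stack([gy.ravel(), gx.ravel()], axis=1).astype(float)
    spec = cube[gy.ravel(), gx.ravel()]                    # seed spectra
    uu, vv = np.mgrid[0:H, 0:W]
    labels = np.zeros((H, W), dtype=int)
    for _ in range(n_iter):
        dist = np.full((H, W), np.inf)
        for c in range(len(pos)):                          # local k-means
            cy, cx = pos[c]
            y0, y1 = max(0, int(cy) - step), min(H, int(cy) + step + 1)
            x0, x1 = max(0, int(cx) - step), min(W, int(cx) + step + 1)
            d_spec = np.linalg.norm(cube[y0:y1, x0:x1] - spec[c], axis=2)
            d_spat = np.hypot(uu[y0:y1, x0:x1] - cy, vv[y0:y1, x0:x1] - cx)
            d = d_spec + k * d_spat                        # assumed linear mix
            win = dist[y0:y1, x0:x1]
            closer = d < win
            win[closer] = d[closer]
            labels[y0:y1, x0:x1][closer] = c
        for c in range(len(pos)):                          # move centroids
            m = labels == c
            if m.any():
                pos[c] = [uu[m].mean(), vv[m].mean()]
                spec[c] = cube[m].mean(axis=0)
    fused = cube.copy()                                    # spatial-spectral fusion
    for c in range(len(pos)):
        m = labels == c
        if m.any():
            fused[m] = spec[c]
    return labels, fused
```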
Super-pixel classification based on SVM-MRF
Subsequently, the super-pixel images undergo classification using SVM-MRF. SVM is a classification model that separates data into two classes; its fundamental form is a linear classifier that maximizes the margin in the feature space. The learning strategy aims to maximize the margin and is transformed into a convex quadratic programming problem. Suppose the training samples are \({\varvec{{x}}}_{i}\) while \(y_{i} \in \left\{ +1,-1 \right\}\) are the labels, in which \(i=1,2,...,n\). The classification hyperplane is defined as \(f\left( {\varvec{{x}}} \right) ={\varvec{{w}}}^{T}{\varvec{{x}}}+b\), in which \({\varvec{{w}}}\) is the coefficient while b is the bias. Then the solution to the optimal classification surface is expressed as

\(\min _{{\varvec{{w}}},b}\ \frac{1}{2}\left\| {\varvec{{w}}}\right\| ^{2}+C\sum _{i=1}^{n}\xi _{i},\quad \mathrm {s.t.}\ y_{i}\left( {\varvec{{w}}}^{T}{\varvec{{x}}}_{i}+b\right) \ge 1-\xi _{i},\ \xi _{i}\ge 0\)
The purpose of the radial basis function (RBF) kernel is to transform the data into a feature space with an infinite number of dimensions, resulting in improved performance for nonlinear classification. RBF can capture detailed feature information, making it suitable for complicated nonlinear classification tasks in hyperspectral image classification. The radial basis function for the samples \({\varvec{{x}}}_{i}\) and \({\varvec{{x}}}_{j}\) is expressed as follows:

\({\varvec{{K}}}\left( {\varvec{{x}}}_{i},{\varvec{{x}}}_{j}\right) =\exp \left( -\gamma \left\| {\varvec{{x}}}_{i}-{\varvec{{x}}}_{j}\right\| ^{2}\right)\)
The kernel SVM can be solved by formulating a Lagrange function, and the ultimate objective function is represented as follows,

\(\max _{\alpha }\ \sum _{i=1}^{n}\alpha _{i}-\frac{1}{2}\sum _{i=1}^{n}\sum _{j=1}^{n}\alpha _{i}\alpha _{j}y_{i}y_{j}\,\varphi \left( {\varvec{{x}}}_{i}\right) ^{T}\varphi \left( {\varvec{{x}}}_{j}\right) ,\quad \mathrm {s.t.}\ \sum _{i=1}^{n}\alpha _{i}y_{i}=0,\ 0\le \alpha _{i}\le C\)
In which \(\varphi \left( {\varvec{{x}}}\right)\) is the linear transformation that defines the kernel function \({\varvec{{K}}}\left( {\varvec{{x}}}_{i}, {\varvec{{x}}}_{j} \right) =\varphi \left( {\varvec{{x}}}_{i} \right) ^{T} \varphi \left( {\varvec{{x}}}_{j} \right)\).
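As an illustration, this spectral classification stage can be sketched with scikit-learn's RBF-kernel SVM. The random stand-in data and hyperparameter values below are assumptions; probability=True yields the Platt-scaled posteriors consumed by the MRF post-processing described next.

```python
# A minimal sketch of RBF-kernel SVM classification of super-pixel spectra.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((200, 120))           # mean super-pixel spectra, shape (n, B)
y_train = rng.choice([-1, 1], size=200)    # +1 damaged / -1 normal (stand-in labels)
X_test = rng.random((50, 120))

svm = SVC(kernel="rbf", gamma="scale", C=10.0, probability=True)
svm.fit(X_train, y_train)

posterior = svm.predict_proba(X_test)      # Platt-scaled P(class | spectrum)
labels = svm.predict(X_test)
```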
Following the utilization of SVM as a spectral classifier for the initial classification, the MRF algorithm is employed for subsequent post-processing. Based on the spatial characteristic information, the SVM result is refined to obtain a comprehensive classification that combines the spatial and spectral information. The spectral energy term serves as the foundation for feature classification, whereas the spatial energy term encompasses super-pixel contextual features, and the likelihood of each category is adjusted accordingly. The objective function of MRF is mathematically defined as

\(U\left( {\varvec{{x}}}_{i}\right) =U_{spectral}\left( {\varvec{{x}}}_{i}\right) +\beta \,U_{spatial}\left( {\varvec{{x}}}_{i}\right)\)
In which \(U_{spectral}\left( {\varvec{{x}}}_{i} \right)\) and \(U_{spatial}\left( {\varvec{{x}}}_{i} \right)\) are respectively the spectral energy term and the spatial energy term, and \(\beta\) is a parameter that balances the importance of the two terms. The posterior probability based on the SVM pixel classification can be obtained by calculation, which is

\(P\left( y_{i}\mid f\left( {\varvec{{x}}}_{i}\right) \right) =\frac{1}{1+\exp \left( A\,f\left( {\varvec{{x}}}_{i}\right) +B\right) }\)
In which A and B are parameters that can be calculated by minimizing the cross-entropy error function. According to the Ising model for the Gibbs distribution, the spatial energy function is established as

\(U_{spatial}\left( {\varvec{{x}}}_{i}\right) =\sum _{j\in {\varvec{{N}}}_{i}}\left( 1-\delta \left( L_{i},L_{j}\right) \right)\)
In which \({\varvec{{N}}}_{i}\) represents the neighborhoods of \({\varvec{{x}}}_{i}\), and \(L_{i}\) is the class label of \({\varvec{{x}}}_{i}\). \(\delta \left( L_{i},L_{j} \right)\) is the Kronecker delta function,

\(\delta \left( L_{i},L_{j}\right) ={\left\{ \begin{array}{ll}1,&{}L_{i}=L_{j}\\ 0,&{}L_{i}\ne L_{j}\end{array}\right. }\)
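A minimal sketch of this post-processing follows, here realized with iterated conditional modes (ICM) as one common way to minimize such an energy; the super-pixel adjacency list, \(\beta\), and the iteration count are assumptions.

```python
# ICM refinement of SVM posteriors with an Ising/Potts spatial prior.
import numpy as np

def mrf_refine(post, adj, beta=1.5, n_iter=10):
    """post: (n, n_classes) SVM posteriors; adj: neighbor lists per node."""
    labels = post.argmax(axis=1)
    u_spec = -np.log(post + 1e-10)                 # spectral energy term
    for _ in range(n_iter):
        for i in range(len(labels)):
            # spatial energy: neighbors whose label disagrees with class c
            u_spat = np.array([sum(labels[j] != c for j in adj[i])
                               for c in range(post.shape[1])])
            labels[i] = np.argmin(u_spec[i] + beta * u_spat)
    return labels

# toy usage: three super-pixels in a chain; the uncertain middle node
# flips to agree with its confident neighbors
post = np.array([[0.9, 0.1], [0.45, 0.55], [0.8, 0.2]])
adj = [[1], [0, 2], [1]]
print(mrf_refine(post, adj))                       # -> [0 0 0]
```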
Correction of segmentation based on spectral binary coding
Following the aforementioned procedures, the contaminated sections have been identified, albeit still conflated with the portions containing black pigment. Thus, spectral feature quantization is employed for further correction. Spectral binary coding, which uses 0-1 sequences to represent the spectrum, facilitates the rapid identification of targets in the spectrum library [30]. The most basic approach to binary coding can be described as follows:

\(h\left( n\right) ={\left\{ \begin{array}{ll}1,&{}x_{n}\ge T\\ 0,&{}x_{n}<T\end{array}\right. },\quad n=1,2,\ldots ,N\)
In which the function \(h\left( n \right)\) represents the encoding of a pixel at a certain wavelength n out of a total of N bands, the value of the pixel is denoted by \(x_{n}\), and T represents the threshold. This study encodes the first-order difference between bands to emphasize the variation trend of pixels across distinct bands,

\(h\left( n\right) ={\left\{ \begin{array}{ll}1,&{}x_{n+1}-x_{n}\ge 0\\ 0,&{}x_{n+1}-x_{n}<0\end{array}\right. },\quad n=1,2,\ldots ,N-1\)
The standard code for the black pigment is determined from the distribution of the black pigment, which has a smaller variance than the stained points: for each band, the numbers of 0s and 1s among the samples are counted, and the code takes the majority value. A pixel is classified as black pigment when the Hamming distance between its code word and the standard code word is below a specific threshold.
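The coding and matching steps can be sketched as follows; the majority-vote construction of the standard code follows the description above, while the Hamming threshold value is illustrative.

```python
# First-order-difference binary coding and Hamming-distance matching.
import numpy as np

def encode(spectrum: np.ndarray) -> np.ndarray:
    # 1 where the spectrum rises from band n to band n+1, else 0
    return (np.diff(spectrum) >= 0).astype(np.uint8)

def black_pigment_standard_code(samples: np.ndarray) -> np.ndarray:
    # majority vote per band over labeled black-pigment spectra, shape (n, B)
    codes = np.stack([encode(s) for s in samples])
    return (codes.mean(axis=0) >= 0.5).astype(np.uint8)

def is_black_pigment(spectrum, std_code, max_hamming=10) -> bool:
    hamming = int(np.sum(encode(spectrum) != std_code))
    return hamming < max_hamming
```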
Color optimization based on the CIE colorimetric system
To begin the CIE chrominance computation, the first step involves determining the stimulus values X, Y, and Z based on the spectral reflection curve of the objects’ surfaces. The color stimulus value \(\varphi \left( \lambda \right)\) is determined using the following formula, which takes into account the spectral reflectance \(\rho \left( \lambda \right)\) of the item and the relative spectral energy distribution \(S \left( \lambda \right)\) of the light source,

\(\varphi \left( \lambda \right) =\rho \left( \lambda \right) S\left( \lambda \right)\)
Once the color stimulus is acquired, the stimulus values of the three colors X, Y, and Z are determined,

\(X=K\int \varphi \left( \lambda \right) \overline{x}\left( \lambda \right) d\lambda ,\quad Y=K\int \varphi \left( \lambda \right) \overline{y}\left( \lambda \right) d\lambda ,\quad Z=K\int \varphi \left( \lambda \right) \overline{z}\left( \lambda \right) d\lambda\)
in which the values of \(\overline{x} \left( \lambda \right)\), \(\overline{y} \left( \lambda \right)\), and \(\overline{z} \left( \lambda \right)\) correspond to the color-matching functions. These constitute a series of empirical functions readily applicable in practice through reference to the corresponding tables. The normalization coefficient K is given by

\(K=\frac{100}{\int S\left( \lambda \right) \overline{y}\left( \lambda \right) d\lambda }\)
The conversion from CIE XYZ stimulus values to the digital control values RGB for the D65 illuminant can be achieved using the following empirical formula. The X, Y, and Z data are initially transformed into linear RGB values, denoted as r, g, and b, using the following formula:

\(\begin{pmatrix}r\\ g\\ b\end{pmatrix}=\begin{pmatrix}3.2406 &{} -1.5372 &{} -0.4986\\ -0.9689 &{} 1.8758 &{} 0.0415\\ 0.0557 &{} -0.2040 &{} 1.0570\end{pmatrix}\begin{pmatrix}X\\ Y\\ Z\end{pmatrix}\)
Afterwards, the r, g, and b values are converted into \(R'\), \(G'\), and \(B'\) values by a non-linear transformation. When \(r,g,b\le 0.0031308\),

\(R'=12.92r,\quad G'=12.92g,\quad B'=12.92b,\)
while when \(r,g,b> 0.0031308\),

\(R'=1.055r^{1/2.4}-0.055,\quad G'=1.055g^{1/2.4}-0.055,\quad B'=1.055b^{1/2.4}-0.055,\)
where the values of \(R'\), \(G'\), and \(B'\) range from 0 to 1. For 8-bit encoding, the values of R, G, and B range from 0 to 255 as integers,

\(R=\mathrm {round}\left( 255R'\right) ,\quad G=\mathrm {round}\left( 255G'\right) ,\quad B=\mathrm {round}\left( 255B'\right)\)
where the operator round indicates the rounding operation.
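The full chain from reflectance spectra to 8-bit sRGB can be sketched as below. The CIE color-matching functions and the D65 spectral power distribution must be taken from published CIE tables and resampled to the sensor's B bands; their provision as precomputed inputs here is an assumption.

```python
# Reflectance cube -> CIE XYZ -> 8-bit sRGB, following the equations above.
import numpy as np

def cube_to_srgb(cube, cmf, illum):
    """cube: (H, W, B) reflectance in [0, 1]; cmf: (B, 3) CIE x̄, ȳ, z̄
    sampled at the band centers; illum: (B,) D65 power distribution."""
    phi = cube * illum[None, None, :]                  # color stimulus ρ(λ)S(λ)
    K = 100.0 / np.sum(illum * cmf[:, 1])              # normalization constant
    XYZ = K * np.einsum('hwb,bc->hwc', phi, cmf) / 100.0   # scale Y of white to 1
    M = np.array([[ 3.2406, -1.5372, -0.4986],         # sRGB / D65 matrix
                  [-0.9689,  1.8758,  0.0415],
                  [ 0.0557, -0.2040,  1.0570]])
    rgb_lin = np.clip(np.einsum('hwc,rc->hwr', XYZ, M), 0.0, 1.0)
    # sRGB non-linear transformation (gamma companding)
    srgb = np.where(rgb_lin <= 0.0031308,
                    12.92 * rgb_lin,
                    1.055 * rgb_lin ** (1 / 2.4) - 0.055)
    return np.round(255 * srgb).astype(np.uint8)       # 8-bit encoding
```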
Mural image inpainting algorithms
The mural image restoration is performed on the RGB images following color reconstruction. The large and small areas to be restored were obtained beforehand by adjusting the parameters of the super-pixel segmentation step. Convolutional neural networks employing partial convolution are trained to restore the small-sized sections. Considering the limited effectiveness of convolutional neural networks in repairing large-sized regions, a method based on image block matching is employed for those areas. Figure 3 exhibits the schematic diagrams of the partial convolutional neural network and the Criminisi algorithm.
Inpainting for small areas based on convolutional neural network
The image inpainting techniques that rely on convolutional neural networks take the damaged image as the input and the complete image as the label for the learning process. Partial convolution is a technique that applies masks in the convolutional operations to improve operational efficiency and to sharpen the distinction between damaged and undamaged pixels, hence enhancing sensitivity to the former [31]. A partial convolution layer is defined in the following manner,

\(x'={\left\{ \begin{array}{ll}{\varvec{{W}}}^{T}\left( {\varvec{{X}}}\odot {\varvec{{M}}}\right) \dfrac{\mathrm {sum}\left( \varvec{1}\right) }{\mathrm {sum}\left( {\varvec{{M}}}\right) }+b,&{}\mathrm {sum}\left( {\varvec{{M}}}\right) >0\\ 0,&{}\text {otherwise}\end{array}\right. }\)
In which the variables \({\varvec{{W}}}\) and b respectively indicate the weight and bias. \({\varvec{{X}}}\) represents the feature value of the window, while \({\varvec{{M}}}\) represents the matching mask. The output of the layer is denoted by \(x'\). The symbol \(\odot\) represents the operation of element-wise multiplication.
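A sketch of such a layer in PyTorch is given below, following the rule above: the convolution sees only the valid pixels \({\varvec{{X}}}\odot {\varvec{{M}}}\), the result is re-scaled by sum(1)/sum(M), and the mask is updated for the next layer. Passing the mask with the same channel count as the input, and the layer sizes in the toy usage, are implementation assumptions.

```python
# A minimal partial convolution layer in PyTorch.
import torch
import torch.nn.functional as F

class PartialConv2d(torch.nn.Module):
    def __init__(self, c_in, c_out, k=3, stride=1, pad=1):
        super().__init__()
        self.conv = torch.nn.Conv2d(c_in, c_out, k, stride, pad, bias=True)
        # fixed all-ones kernel counts the valid pixels under each window
        self.register_buffer("ones", torch.ones(1, c_in, k, k))
        self.stride, self.pad = stride, pad

    def forward(self, x, mask):                        # mask: 1 valid, 0 hole
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones,
                             stride=self.stride, padding=self.pad)  # sum(M)
        out = self.conv(x * mask)                      # W^T (X ⊙ M) + b
        bias = self.conv.bias.view(1, -1, 1, 1)
        scale = self.ones.sum() / valid.clamp(min=1e-8)              # sum(1)/sum(M)
        out = (out - bias) * scale + bias              # re-normalize, keep bias
        out = out * (valid > 0)                        # zero where sum(M) = 0
        new_mask = (valid > 0).float().expand(-1, out.shape[1], -1, -1)
        return out, new_mask

# toy usage: one encoder layer applied to an image with a rectangular hole
x = torch.rand(1, 3, 256, 256)
m = torch.ones(1, 3, 256, 256)
m[:, :, 100:160, 80:200] = 0
y, m2 = PartialConv2d(3, 64)(x, m)
```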
As displayed in Fig. 3, the network adopts a U-Net architecture that employs partial convolution in the encoder and nearest-neighbor up-sampling in the decoder. The training datasets consist of the acquired mural hyperspectral images transformed into RGB images and divided into blocks of 256\(\times\)256 pixels. The NVIDIA Irregular Mask Dataset was utilized for the masks.
The loss function chosen is a mixed function that combines the multi-scale structural similarity (MS-SSIM) loss \(L_{MS-SSIM}\) and the L1 norm loss \(L_{l_{1}}\),

\(L=\alpha \,L_{MS-SSIM}+\left( 1-\alpha \right) G_{\sigma _{G}^{M}}\cdot L_{l_{1}}\)
In which \(G_{\sigma _{G}^{M} }\) is the Gaussian kernel for calculating the M-th scale in MS-SSIM and \(\alpha\) is a parameter that determines the proportion of the structural similarity (SSIM) loss function and L1 norm loss function.
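A sketch of this mixed loss is shown below, built on the third-party pytorch_msssim package (an assumption; any MS-SSIM implementation with a similar interface would serve) and omitting the Gaussian weighting of the L1 term for brevity.

```python
# Mixed MS-SSIM + L1 loss (simplified: no Gaussian weighting of the L1 term).
import torch
import torch.nn.functional as F
from pytorch_msssim import ms_ssim    # pip install pytorch-msssim (assumed)

def mixed_loss(pred: torch.Tensor, target: torch.Tensor, alpha: float = 0.8):
    """pred, target: (N, 3, H, W) tensors with values in [0, 1]."""
    loss_msssim = 1.0 - ms_ssim(pred, target, data_range=1.0)
    loss_l1 = F.l1_loss(pred, target)
    return alpha * loss_msssim + (1.0 - alpha) * loss_l1
```

The value alpha = 0.8 mirrors the setting found best in the ablation study reported later in this paper.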
Inpainting for large areas based on the Criminisi algorithm
The Criminisi algorithm is an image restoration method that effectively handles damaged images with complex texture and intricate structure [32]. The schematic diagram of the Criminisi algorithm is shown in Fig. 3.
The algorithm comprises the following steps. First, the large regions requiring restoration must be identified, a task accomplished by the super-pixel segmentation technique.
Next, priorities are established. The priority order of sample block restoration is determined by evaluating the priority of all sample blocks located at the boundary of the region using the priority function \(P\left( p \right)\). The priority function is expressed as,

\(P\left( p\right) =C\left( p\right) D\left( p\right) ,\quad D\left( p\right) =\frac{\left| \nabla I_{p}^{\bot }\cdot n_{p}\right| }{\alpha }\)
In which \(C\left( p \right)\) represents the proportion of original image pixels in the sample block \(\psi _{p}\) with p as the center; \(D\left( p \right)\) is the product of the normal vector \(n_{p}\) of the tangent line at point p and the iso-illuminance line \(\nabla I_{p}^{\bot }\); \(\alpha\) is the normalization factor. The priority function of all contour boundary sample blocks is calculated, and the sample block with the highest value is chosen for restoration. The sum of squared differences (SSD) is used as the matching criterion, and the best matching block is selected from the complete-information region \(\Phi\) through a global search,

\(\psi _{\hat{q}}=\mathop {\arg \min }\limits _{\psi _{q}\in \Phi }d\left( \psi _{p},\psi _{q}\right)\)
In which \(d\left( \psi _{p}, \psi _{q}\right)\) represents the sum of squared differences between corresponding pixels of the sample block \(\psi _{p}\) to be restored and the matching block \(\psi _{q}\).
The value of \(d\left( \psi _{p}, \psi _{q}\right)\) is determined using the following formula,

\(d\left( \psi _{p},\psi _{q}\right) =\sum \left( \Delta I_{R}^{2}+\Delta I_{G}^{2}+\Delta I_{B}^{2}\right)\)
In which \(\Delta I_{R}^{2}\), \(\Delta I_{G}^{2}\), and \(\Delta I_{B}^{2}\) respectively represent the squared pixel differences between the sample block and the matching block in the red, green, and blue channels. The restoration process selects the pixel with the highest priority value among all pixels to be restored, then searches the undamaged part of the image for the optimal matching block. Ultimately, the data contained within the optimal matching block \(\psi _{q}\) is copied to the corresponding locations within the impaired block \(\psi _{p}\). After inpainting \(\psi _{p}\), the confidence value of each pixel in \(\psi _{p}\) is updated,

\(C\left( q\right) =C\left( p\right) ,\quad \forall q\in \psi _{p}\cap \Omega\)

in which \(\Omega\) denotes the region to be restored.
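One fill step of this procedure can be sketched as follows. The priority computation that selects p is omitted, the patch is assumed to lie away from the image border, and the exhaustive global search is written for clarity rather than speed.

```python
# One Criminisi fill step: SSD match, copy, and confidence update.
import numpy as np

def fill_patch(img, mask, conf, p, psz=9):
    """img: (H, W, 3) float array; mask: True where missing; conf: C values;
    p = (y, x): highest-priority boundary point, at least psz//2 from the border."""
    h = psz // 2
    y, x = p
    tgt = img[y-h:y+h+1, x-h:x+h+1]
    hole = mask[y-h:y+h+1, x-h:x+h+1]
    known = ~hole
    H, W = mask.shape
    best_d, best_q = np.inf, None
    for qy in range(h, H - h):                     # exhaustive global search
        for qx in range(h, W - h):
            if mask[qy-h:qy+h+1, qx-h:qx+h+1].any():
                continue                           # source patches must be intact
            src = img[qy-h:qy+h+1, qx-h:qx+h+1]
            # SSD over known pixels, summed across the R, G, B channels
            d = np.sum(((tgt - src) ** 2)[known])
            if d < best_d:
                best_d, best_q = d, (qy, qx)
    qy, qx = best_q
    # copy the matched data into the missing pixels of psi_p
    img[y-h:y+h+1, x-h:x+h+1][hole] = img[qy-h:qy+h+1, qx-h:qx+h+1][hole]
    # propagate the patch confidence to the newly filled pixels
    conf[y-h:y+h+1, x-h:x+h+1][hole] = conf[y-h:y+h+1, x-h:x+h+1][known].mean()
    mask[y-h:y+h+1, x-h:x+h+1] = False
```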
Results and discussions
The results section of this study comprises the following components. A detailed description of the experimental setup, including the imaging equipment used for the mural collection and the computational resources, is provided first. Then the findings of the mask extraction, the procedure for constructing the color system, and a detailed explanation of the network training are presented. Finally, the effectiveness of the method is further validated by examining several image quality assessment metrics.
Experiment conditions
The mural images in this manuscript were obtained from a tomb located in Baiyangzhai Village, Chang’an District, Xi’an, Shaanxi Province. The tomb’s owner is believed to have been a eunuch from the Tang Dynasty. The vault features a gracefully curved roof and a rectangular hollow, exuding an atmosphere of refinement and solemnity. It stands at a height of approximately 2.7 yards, with a corridor measuring 16 yards in length and 1.8 yards in width. The mural paintings are predominantly located on the eastern and western sides of the tomb corridor. The paintings on the western side are mostly well-preserved; however, some of the paintings on the eastern side have deteriorated and fallen off. In addition, other murals are scattered around the wall near the northern entrance. The murals depict guards, chariots, musicians, and maids. The murals have generally been well conserved, although they exhibit varying levels of cracking, detachment, and contamination by sludge.
The data acquisition site is exhibited in Fig. 4. The hyperspectral images are obtained using a motor-driven frame in a push-broom configuration [33]. The Aphis 2.0-I spectrometer serves as the device for acquiring spectral data. The imaging spectrometer captures hyperspectral images of cultural treasures by gathering spectral reflectance on a pixel-by-pixel basis, ensuring non-contact and non-destructive qualities. Besides, the computer program simultaneously regulates the motor automatically via serial communication to facilitate data collection, resulting in a significant efficiency enhancement. To prevent damage to the mural paintings, thermometers, hygrometers, and soil acidity meters continuously monitored environmental changes in the tomb during the collection process, with data collection personnel adhering to strict regulations under the supervision of professional archaeologists from the Shaanxi Provincial Institute of Archaeology.
The computational process is performed on an H3C UniServer R5300 G3 computer equipped with an NVIDIA GeForce RTX 3090 graphics card. The computer runs on a 64-bit Linux operating system and is equipped with software such as ENVI 5.3, Pytorch 1.12.0 package in Anaconda 5.3.0 (Python 3.7.0), and Matlab R2018b.
Mask acquisition based on spatial-spectral information
The complete procedure of automatic mask acquisition is illustrated in Fig. 5. Initially, the hyperspectral images of the murals are segmented into many super-pixels. Through SVM-MRF classification, the larger damaged areas and the black pigment portions are then identified as a unified entity, after which spectral analysis is used to differentiate between them. The spectral curves of black pigment pixels are derived from artificially labeled regions, and the coding rules for the black pigment regions are determined according to the coding principles outlined in Eq. 13. Next, the super-pixels are classified according to whether they belong to the damaged areas or the black pigment areas. The spectral codes of four sample super-pixels in the areas to be restored are depicted on the right of Fig. 5: blue stems signify that the coding of the band matches that of the black pigment sections, whereas red stems show that they are distinct. Thus, mask 1, which represents the areas requiring inpainting on a larger scale, is acquired. The subsequent stage obtains mask 2, which targets the smaller stained regions. During this stage, the pixels in the black pigment sections and the major damaged areas are assigned saturation values in each band, ensuring that these regions are excluded from the classification process. The hyperspectral images undergo spatial division into smaller spectral image blocks, followed by a repetition of the super-pixel segmentation and classification. The masks acquired from the individual blocks are ultimately combined to create the overall mask 2.
Before the image restoration stage, the hyperspectral images are converted into RGB images using the CIE standard colorimetric method. The data from each band of the hyperspectral images are combined into XYZ space through integration and then transformed into RGB space using matrix operations. Figure 6 illustrates the contrast between the proposed technique and the pseudo-color fusion method. The method employed in this research produces images that exhibit a greater balance among the three fundamental hues, as well as more realistic brightness and saturation. For verification purposes, two non-reference image quality assessments (NR-IQA), namely the naturalness image quality evaluator (NIQE) and MUSIQ [34], are utilized. In contrast to full-reference image quality assessments (FR-IQA), they measure image quality without the need for reference images. MUSIQ quantifies the contrast, brightness, and color balance of an image, while NIQE facilitates the precise evaluation of the image’s naturalness. The evaluation results are displayed in Table 1. The indicators demonstrate that the color produced by the proposed method is more authentic and harmonious.
Network training
For training the network, a dataset comprising 26,016 mural images with dimensions of 256\(\times\)256 pixels was employed; 23,415 images were used for training and 2601 for testing. The NVIDIA Irregular Mask Dataset was employed as the mask dataset, with each mask resized to 256\(\times\)256. A learning rate of 0.00001 was utilized, along with a batch size of 32 and a total of 500 epochs. In the prediction phase, the images and masks to be restored are partitioned into 256\(\times\)256-pixel blocks, since the mural images in the actual application scenario do not match the network’s input size. If the dimensions of an image are not evenly divisible by 256 pixels, the borders are padded with a homogeneous color until the dimensions are a multiple of 256, and the blocks are then restored. Ultimately, the small restored blocks are seamlessly stitched together to obtain the initially restored images, and any border area previously padded with the homogeneous color is removed.
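The padding, tiling, and re-stitching procedure can be sketched as follows, where inpaint_block stands in for the trained network's forward pass and is a hypothetical placeholder.

```python
# Pad to a multiple of the tile size, restore tile by tile, then crop back.
import numpy as np

def restore_tiled(img, mask, inpaint_block, tile=256):
    """img: (H, W, 3) uint8; mask: (H, W) bool, True where damaged."""
    H, W = img.shape[:2]
    ph, pw = (-H) % tile, (-W) % tile                # padding to a multiple
    img_p = np.pad(img, ((0, ph), (0, pw), (0, 0)), constant_values=128)
    mask_p = np.pad(mask, ((0, ph), (0, pw)), constant_values=False)
    out = img_p.copy()
    for y in range(0, img_p.shape[0], tile):
        for x in range(0, img_p.shape[1], tile):
            blk = img_p[y:y+tile, x:x+tile]
            mk = mask_p[y:y+tile, x:x+tile]
            if mk.any():                             # skip undamaged tiles
                out[y:y+tile, x:x+tile] = inpaint_block(blk, mk)
    return out[:H, :W]                               # drop the filler border
```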
The large areas were subsequently restored utilizing the Criminisi technique. The approach is unsupervised and does not impose any constraints on the image sizes, allowing for a direct execution of the process.
Evaluation
The evaluation approach of this paper comprises three components: visual representations of inpainting effects, comparison of metrics between methodologies, and ablation studies. In visual representations, the effects of various strategies are displayed directly as images for subjective assessment. In comparison of metrics between methodologies, the index comparison among approaches is employed to assess the various procedures objectively. Ablation studies address the configuration of specific hyperparameters for network architecture.
Visual representations of inpainting effects
Figure 7 illustrates the visual outcomes of six image inpainting techniques in the virtual restoration of mural artworks. Four mural paintings characterized by intricate patterns and evident defacement were chosen for the experimental study. In Fig. 7, the black and red regions represent the masks in the images; the red sections are the areas restored specifically with the Criminisi algorithm. From the perspective of visual effects, the restoration of polluted areas by LBAM [35] and DFNet [36] is unsatisfactory, and several areas with poor visual effects remain in the images. MAT [37] introduced some noise while restoring, which makes the images appear quite intrusive. The effect of CR-Fill [38] is more significant, but the contours of the edges of the restored areas are excessively obvious. The primary benefit of the proposed approach is the enhancement of image smoothness while addressing the grainy texture resulting from localized noise. Deep learning methodologies frequently struggle when confronted with extensive regions of loss; samples 2 and 4 provide the most evident evidence, with artifacts manifesting in the regions indicated by mask 1. The Criminisi algorithm incorporates both the original texture information of the images and the output of the initial correction performed by the network, and its repair efficacy is nearly unparalleled in the context of substantial area loss. Consequently, this research develops an effective convolutional neural network utilizing partial convolution to prevent incomplete restoration and blurring in small noisy areas, while the Criminisi algorithm addresses the limitation of neural networks, whose predictions may become disordered and produce artifacts under large masks.
Comparison of metrics between methodologies
Alongside visual comparisons of the approaches, various FR-IQAs and NR-IQAs were employed for objective numerical assessment of the images. Because the original appearance of the murals as painted by the ancients is unknown, this restoration task lacks a ground truth. FR-IQAs are therefore only applicable to assessing the neural networks trained on the mural image datasets, as they require a reference. The peak signal-to-noise ratio (PSNR) and SSIM of each network at varying mask proportions are compared in Fig. 8. SSIM quantifies the similarity between the original image and the image restored after masks are added, whereas PSNR is derived from the mean squared error between the two images. The proposed technique demonstrates excellent performance in both PSNR and SSIM for masks of various sizes, indicating strong generalization to mural images during training and an excellent resulting model on the training set. The network presented in this research performs best in the training phase and is especially effective under small-area masks. Evidently, as the mask size increases, the performance of all the networks, including the one presented here, declines significantly. This phenomenon further substantiates the necessity of the Criminisi algorithm as a means of rectifying the network outcomes. Although the Criminisi method does not involve supervised training, so FR-IQA cannot directly quantify its impact, its role can be observed through NR-IQA.
On the other hand, NR-IQAs are capable of evaluating image quality with a high level of precision without relying on reference data. The image quality evaluations of the networks, encompassing the blind/referenceless image spatial quality evaluator (BRISQUE), MUSIQ, NIQE, and the non-reference quality metric (NRQM), are displayed in Table 2. BRISQUE autonomously evaluates the quality of images without depending on reference images. NIQE quantifies the degree of naturalness or realism exhibited by images. MUSIQ utilizes local features, global data, multi-scale information, and perceptual models to assess the quality of an image. NRQM assesses the quality of an image by examining statistical characteristics, structural details, color dispersion, and contrast. In the NR-IQA evaluation, the outcomes of each approach are acceptable. In comparison to alternative methodologies, our approach exhibits a significant advantage in BRISQUE and MUSIQ, indicating that the images restored in this study align well with human sensory attributes. The seemingly abrupt textures in sample 2 may offer more extensive multi-scale information, which is why the PConv technique demonstrates superior performance there according to the MUSIQ index. According to the NRQM assessment, the DFNet algorithm’s repair of sample 4 is deemed superior to our approach; yet its inpainting results visibly exhibit artifacts. The superiority of the method presented in this study over PConv can be attributed to the inclusion of an additional matching-based restoration step, as opposed to relying solely on the network. Overall, the IQA evaluation findings indicate that our method produces the most favorable outcomes.
Ablation studies
The primary factors influencing network performance are the choice of convolution and the loss function. Partial convolution serves as the padding scheme and is compared to standard convolution in an ablation study. Certain approaches frequently employed in classification problems, such as reflection and repetition padding, are not addressed here, as they are theoretically inapplicable to the irregular masks present in this work. Figure 9 illustrates the efficacy of partial convolution in this task: both the PSNR and SSIM metrics indicate that partial convolution outperforms standard convolution across different mask scales.
The prevalent loss functions for this task are the L1 loss and the SSIM loss. The L1 loss function alone accounts for pixel-wise differences and does not incorporate global information. MS-SSIM considers brightness, contrast, and structure, aligning with human visual perception and mitigating the detail deficiency of the L1 loss, while the L1 term prevents the color bias associated with the MS-SSIM loss. Figure 10 illustrates the ablation experiment on the loss function. The NR-IQA of the test-set images serves as the metric for investigating the contribution of the two loss components. The NR-IQA is standardized into a score index, with higher scores indicating superior image quality. The six columns of the bar chart illustrate the variation of the \(\alpha\) value from 0 to 1 in increments of 0.2. When \(\alpha =0\), the MS-SSIM component is dropped, and when \(\alpha =1\), the L1 component is dropped. The ablation investigations indicate that the restored test-set images exhibit optimal quality at an \(\alpha\) value of 0.8.
Conclusion
Research is conducted on the virtual restoration of Tang Dynasty tomb murals using hyperspectral images. This research utilizes super-pixel segmentation and SVM-MRF classification techniques to automatically identify the pixels in hyperspectral images of mural paintings that require restoration. The hyperspectral image is segmented and classified at the super-pixel level, and the restoration area is identified to establish the necessary conditions for the subsequent procedures. The CIE standard colorimetric system was utilized to convert the hyperspectral mural images to red-green-blue (RGB) images, resulting in an 8.42% improvement in the average MUSIQ compared to the pseudo-color fusion algorithm. Smaller regions are restored using convolutional neural networks based on partial convolutions, and the large areas are restored by the matching image block-based method. The ablation experiments validated the efficacy of employing partial convolution as a padding approach, and the weighted combination of the MS-SSIM and L1 losses was deemed best for the loss function. In contrast to the traditional CNN approach, the mean MUSIQ improvement amounted to 2.41%. Future work will focus on:
(1) The development of a comprehensive dataset containing spectral information for black pigments. With more effective correction techniques during the mask acquisition process, improved classification accuracy can be achieved.
(2) The augmentation of the training dataset for the convolutional neural networks. The efficacy of this work has so far only been validated on the virtual restoration of the Tang Dynasty murals in China, and there is potential for further expansion of its application scenarios.
Availability of data and materials
Relevant researchers may acquire the data and materials that substantiate the conclusions of this study by contacting the corresponding author if they are required for scientific research.
References
Zhang H, Guo Q, Wang Y, Xia Y, Tang S, Zhao L, et al. Analysis of cracking behavior of murals in Mogao grottoes under environmental humidity change. J Cult Herit. 2024;67:183–93.
Ogura D, Hase T, Nakata Y, Mikayama A, Hokoi S, Takabayashi H, Okada K, Su B, Xue P. Influence of environmental factors on deterioration of mural paintings in Mogao cave 285, Dunhuang. Case Stud Build Rehabil. 2021;105–59.
Hamburger C. Identification of microorganisms dwelling on the 19th century lanna mural paintings from northern Thailand using culture-dependent and-independent approaches. Biology. 2022;11(2):228.
Xiang H, Zou Q, Nawaz MA, Huang X, Zhang F, Yu H. Deep learning for image inpainting: a survey. Pattern Recogn. 2023;134: 109046.
Chen L, Yuan C, Qin X, Sun W, Zhu X. Contrastive structure and texture fusion for image inpainting. Neurocomputing. 2023;536:1–12.
Sari IN, Du W. Structure-texture consistent painting completion for artworks. IEEE Access. 2023;11:27369–81.
Pen H, Wang S, Zhang Z. Mural image shedding diseases inpainting algorithm based on structure priority. In: Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022), vol. 12610, 2023;347–352. SPIE.
Wang H, Li Q, Jia S. A global and local feature weighted method for ancient murals inpainting. Int J Mach Learn Cybern. 2020;11:1197–216.
Cao J, Li Y, Zhang Q, Cui H. Restoration of an ancient temple mural by a local search algorithm of an adaptive sample block. Herit Sci. 2019;7(1):39.
Zhou Z, Liu X, Shang J, Huang J, Li Z, Jia H. Inpainting digital Dunhuang murals with structure-guided deep network. ACM J Comput Cult Herit. 2022;15(4):1–25.
Wei H, Shuwen W. Dunhuang murals inpainting based on image decomposition. In: 2010 3rd International Conference on Computer Science and Information Technology, 2010;2:397–400. IEEE
Xu H, Zhang Y, Zhang J. Frescoes restoration via virtual-real fusion: method and practice. J Cult Herit. 2024;66:68–75.
Jaidilert S, Farooque G. Crack detection and images inpainting method for Thai mural painting images. In: 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), 2018;143–148. IEEE.
Zhang X, Zhai D, Li T, Zhou Y, Lin Y. Image inpainting based on deep learning: a review. Inf Fusion. 2023;90:74–94.
Qiu S, Jin Y, Feng S, Zhou T, Li Y. Dwarfism computer-aided diagnosis algorithm based on multimodal pyradiomics. Inf Fusion. 2022;80:137–45.
Vatolin D. Deep two-stage high-resolution image inpainting. 2020.
Liu W, Shi Y, Li J, Wang J, Du S. Multi-stage progressive reasoning for Dunhuang murals inpainting. In: 2023 IEEE 4th International Conference on Pattern Recognition and Machine Learning (PRML), 2023;211–217. IEEE
Schmidt A, Madhu P, Maier A, Christlein V, Kosti R. Arin: adaptive resampling and instance normalization for robust blind inpainting of dunhuang cave paintings. In: 2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA), 2022;1–6. IEEE
Chen M, Zhao X, Xu D. Image inpainting for digital Dunhuang murals using partial convolutions and sliding window method. In: Journal of Physics: Conference Series, 2019;1302:032040. IOP Publishing.
Wang H, Li Q, Zou Q. Inpainting of Dunhuang murals by sparsely modeling the texture similarity and structure continuity. J Comput Cult Herit (JOCCH). 2019;12(3):1–21.
Ciortan I-M, George S, Hardeberg JY. Colour-balanced edge-guided digital inpainting: applications on artworks. Sensors. 2021;21(6):2091.
Xu Z, Zhang C, Wu Y. Digital inpainting of mural images based on dc-cyclegan. Herit Sci. 2023;11(1):169.
Li J, Wang H, Deng Z, Pan M, Chen H. Restoration of non-structural damaged murals in Shenzhen Bao’an based on a generator-discriminator network. Herit Sci. 2021;9:1–14.
Sun P, Hou M, Lyu S, Wang W, Li S, Mao J, Li S. Enhancement and restoration of scratched murals based on hyperspectral imaging-a case study of murals in the Baoguang hall of Qutan temple, Qinghai, China. Sensors. 2022;22(24):9780.
Li J, Xie D, Li M, Liu S, Wei C. Pigment identification of ancient wall paintings based on a visible spectral image. J Spectrosc. 2020;2020(1):3695801.
Zhou P, Hou M, Lv S, Zhao X, Wu W. Virtual restoration of stained Chinese paintings using patch-based color constrained Poisson editing with selected hyperspectral feature bands. Remote Sens. 2019;11(11):1384.
Hou M, Zhou P, Lv S, Hu Y, Zhao X, Wu W, He H, Li S, Tan L. Virtual restoration of stains on ancient paintings with maximum noise fraction transformation based on the hyperspectral imaging. J Cult Herit. 2018;34:136–44.
Qiu S, Ye H, Liao X. Coastal zone extraction algorithm based on multilayer depth features for hyperspectral images. IEEE Trans Geosci Remote Sens. 2023.
Xie C, Zhang X, Zhuang L, Han W, Zheng Y, Chen K. Classification of polarimetric sar imagery based on improved mrf model using wishart distance and category confidence-degree. In: 2023 IEEE International Radar Conference (RADAR), 2023;1–4. IEEE.
Meng Q, Zhang J, Li X, Li Y, Shen X, Li Z, Xu M, Yao C, Chu P, Cui Y-J, et al. Asap-ms combined with mass spectrum similarity and binary code for rapid and intelligent authentication of 78 edible flowers. Food Chem. 2024;436: 137776.
Liu G, Reda FA, Shih KJ, Wang T-C, Tao A, Catanzaro B. Image inpainting for irregular holes using partial convolutions. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018;85–100.
Sun X, Jia J, Xu P, Ni J, Shi W, Li B. Structure-guided virtual restoration for defective silk cultural relics. J Cult Herit. 2023;62:78–89.
Qiu S, Zhang P, Li S, Hu B. Extraction and analysis algorithms for Sanxingdui cultural relics based on hyperspectral imaging. Comput Electr Eng. 2023;111: 108982.
Ke J, Wang Q, Wang Y, Milanfar P, Yang F. Musiq: Multi-scale image quality transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021;5148–5157
Xie C, Liu S, Li C, Cheng M-M, Zuo W, Liu X, Wen S, Ding E. Image inpainting with learnable bidirectional attention maps. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019;8858–8867.
Vatolin D. Deep two-stage high-resolution image inpainting. 2020.
Li W, Lin Z, Zhou K, Qi L, Wang Y, Jia J. Mat: mask-aware transformer for large hole image inpainting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022;10758–10768.
Zeng Y, Lin Z, Lu H, Patel VM. Cr-fill: generative image inpainting with auxiliary contextual reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021;14164–14173.
Acknowledgements
The authors convey their utmost gratitude to the Shaanxi Academy of Archaeology for providing comprehensive assistance at the archaeological site.
Funding
This work is supported by the Shaanxi key research and development plan (No. 2018ZDXM-SF-093 and No. 2024SF-YBXM-678), the Shaanxi Province key industrial innovation chain (No. S2022-YF-ZDCXL-ZDLGY-0093 and No. 2023-ZDLGY-45), the Light of West China program (No. XAB2022YN10), and the China Postdoctoral Science Foundation (No. 2023M740760).
Author information
Contributions
Zimu Zeng was in charge of conducting algorithmic research and performing experimental analysis. Qiu Shi modified the algorithm’s settings. Pengchang Zhang and Xingjia Tang gathered the necessary data. Siyuan Li, Xuebin Liu, and Bingliang Hu prepared and refitted the equipment for acquiring hyperspectral data. All authors were involved in the conceptualization and design of the research, as well as in the analysis and interpretation of the data. They also contributed to the writing of the work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.