1 Introduction

Skin scarring is a physiological response to skin injury accompanied by a three-stage healing process: inflammation, new tissue formation, and extracellular matrix reconstruction [1, 2]. Collagen fibers play a key role in the restoration of the skin structure. The assessment of collagen fiber morphology and structure is crucial for differentiating scars from normal tissue, and this assessment often relies on microscopic images of collagen fibers stained with Sirius Red. DNA microarray technology has revealed the expression of thousands of genes in scar tissues [3, 4]. Gene expression analysis is important for identifying scar tissue. By comparing the gene expression patterns of scar tissues with those of normal tissues, reliable differentiation markers can be identified, which can help establish an accurate classification model and is expected to provide theoretical support for preventing or alleviating scar formation. Furthermore, the use of association analysis algorithms to explore the association between collagen fiber micrographs and gene expression is critical when studying the mechanisms of scar formation. This imaging genetic approach can provide insights into the process of scar formation, reveal its underlying mechanisms, and help identify potential therapeutic targets.

For research in the area of disease classification using collagen fiber micrographs, Fomovsky et al. developed the Matfiber algorithm to quantify the anisotropy of collagen fibers in scar tissue [5]; it measures the orientation of collagen fiber structures in a finite subregion of an image using an intensity gradient detection algorithm. This method can extract the specific physical features of collagen fibers and can be used as a basis for scar tissue determination. However, feature extraction and discrimination based on conventional machine-learning methods have significant limitations, and very few features are extracted. Collagen fibers have a rich hierarchy of textural features that require a greater depth of feature extraction. Pham et al. first proposed the use of deep learning techniques to quantify and characterize collagen fiber features [6]. Their study introduced a Universal CNN (UCNN) based on the VGG-16 implementation, which can be used for burn scar tissue image classification and detailed characterization of collagen fiber tissues, with an accuracy of 97% in scar discrimination. However, VGG-16 is relatively shallow, does not extract collagen fiber texture features well, and may have limitations in terms of parameter efficiency and the handling of large amounts of data. Razia et al. proposed a lightweight deep convolutional neural network model, S-MobileNet [7], and fine-tuned it using the ReLU and Mish activation functions, achieving a discrimination accuracy of 98%. Hekler et al. used a deep learning approach to train a single CNN and combined two independently determined diagnoses into a new classifier based on gradient boosting [8], which ultimately classified skin lesions into five classes. The algorithm uses an end-to-end learning approach and can learn the features directly from raw data, which simplifies the process and improves efficiency, achieving a classification accuracy of 82%. However, the model complexity may lead to overfitting, and the dependence of gradient boosting on data distribution and feature selection must be handled with care.

For research in the area of disease classification using gene expression data, Hilal et al. proposed a novel feature subset selection and optimal adaptive neuro-fuzzy inference system (FSS-OANFIS) [9]. It uses an improved grey wolf optimizer-based feature selection (IGWO-FS) model to derive the optimal feature subset and the OANFIS model for gene classification, achieving a discrimination accuracy of 89.47% on the colon cancer dataset. Because microarray data usually contain a large number of genes and a small number of samples, regularization is often used to select information-rich genes and improve discriminatory accuracy. Lavanya et al. demonstrated that logistic regression with L1/2 regularization yields a higher classification accuracy and is an effective technique for gene selection in practical classification problems [10]. Based on this, Alharthi et al. proposed an adaptive penalized logistic regression (APLR) method, a regularization technique implemented using the least absolute shrinkage and selection operator (LASSO), which achieved the highest discrimination accuracy of 93.53% on a prostate gene expression dataset. Elbashir et al. employed a lightweight CNN model to classify breast cancer by converting gene expression data into a 2D heat map matrix [11]. Their results showed that this method achieved a discrimination accuracy of 98.76% and an area under the curve (AUC) value of 0.99. Despite its significant advantage in accuracy, the general applicability of the method to different datasets is low. This may be due to the specificity of the dataset and the limitations of the heat map matrix transformation.

For research in the area of disease classification using multimodal data, considering the problem of insufficient feature representation of unimodal data, Ghoniem et al. established a hybrid evolutionary deep learning model using multimodal data, and the established multimodal fusion framework fused the genetic and histopathological image modalities. Based on the features of different modal data, they established a deep feature extraction network [12]. The constructed model achieved 98% accuracy in ovarian cancer staging prediction. Cai et al. proposed a staged multimodal multiscale attention model that extracts image and gene features by training feature extractors of different modalities and sends the multimodal features together to the feature fusion module for multimodal feature fusion to achieve classification judgment [13]. This idea of training different feature extraction networks can realize the effective extraction of multimodal data features and achieve a staging prediction accuracy of 88.51% on the TCGA lung cancer dataset.

For research in the area of imaging genetics analysis, Wang et al. proposed a multi-constrained uncertainty-aware adaptive sparse multi-view canonical correlation analysis (MC-unAdaSMCCA) method to explore the associations between SNPs, gene expression data, and sMRI by applying orthogonal constraints to multimodal data via linear programming [14]. Deng et al. proposed a multi-constrained joint non-negative matrix factorization (MCJNMF) method for correlation analysis of genomic and image data [15]. This method projects the two data matrices onto a common feature space, so that heterogeneous variables with large coefficients in the same projection direction form a common module. This approach effectively identified common disease-related modules. However, to the best of our knowledge, no researchers have utilized the MCJNMF algorithm for association and bioinformatic analyses of scarring. In this study, association analysis was expected to provide a deeper and more precise understanding of the mechanism of scar formation, providing important insights and new ideas for scar treatment and prevention.

Currently, no unified platform has been established for scar tissue discrimination, in either unimodal or multimodal fusion discrimination, that can adapt to the needs of scar discrimination under different input conditions. In addition, most current studies on both unimodal and multimodal fusion discrimination are limited to the technical application level and fail to explore the mechanism of scar formation from the perspective of bioinformatics. To address these problems, this study designed a multi-functional scar tissue discrimination platform that can perform both unimodal discrimination of histopathological images or gene expression data and the fusion of the two modalities to achieve multimodal scar tissue discrimination. In unimodal discrimination, a CNN model based on residual networks is proposed to discriminate collagen fiber micrographs. The convolutional block based on the residual network structure has advantages in image feature extraction and discrimination. This network structure can capture the textural features of collagen fibers more finely and alleviate the vanishing-gradient problem in deep learning. In addition, a logistic regression model with L1 regularization was designed to extract important gene features, which were then fed into a sigmoid classifier for binary discrimination. In multimodal discrimination, the trained image and gene feature extraction networks are used for unimodal feature extraction, and the multimodal features are fused by weighted average linear aggregation and then fed into the sigmoid classifier for final classification. In addition, a multimodal imaging genetics correlation analysis algorithm was applied to scar tissue images and gene expression data to gain insight into the causes of scar formation and identify potential targets for scar treatment. The contributions of this study are as follows:

  • Accurate discrimination of histopathological images and gene expression data of scar tissue using a residual-network-based CNN model and an L1-regularized logistic regression model.

  • Feature extraction networks were constructed for each data modality, and a feature fusion module was designed to fuse the multimodal features and improve the objectivity of scar tissue discrimination.

  • Using the MCJNMF algorithm to correlate collagen fiber features and gene expression, we mined potential pathological mechanisms of scar tissue formation and identified possible therapeutic targets for scarring.

2 Materials and methods

2.1 Workflow of this study

The research content of this study was divided into three tasks, as shown in Fig. 1: Task 1 is the design and implementation of the unimodal discriminative models, Task 2 is the design and implementation of a multimodal discriminative model, and Task 3 is the investigation of the biological mechanism of scar tissue formation. These three tasks are described as follows.

In Task 1, for the modal discrimination of collagen fiber micrographs, the images were input into the proposed CNN model for collagen fiber feature extraction. The extracted features were flattened into one-dimensional features and passed through the fully connected layer, which was then connected to a Sigmoid classifier to achieve unimodal discrimination of the collagen fiber micrographs. For gene expression modal discrimination, the L1 regularized logistic regression model was used to extract gene features, and a Sigmoid classifier was connected to the model to obtain the final discrimination results. In Task 2, in the feature extraction layer, the image and gene discrimination models trained in Task 1 were used as the feature extraction networks for the image and gene modalities. In the feature fusion layer, a linear weighting network was used to fuse the image and gene features extracted by these networks. Finally, the fused features were input into the Sigmoid classifier to achieve multimodal discrimination. In Task 3, to explore the causes of scar tissue formation more deeply, we performed a bioinformatics analysis of the scar tissue at the macroscopic and image genetics levels. For the macroscopic characterization of collagen fibers, scar tissue images and normal tissue images were input into the image discrimination model. The image of the 32nd channel of conv1 was extracted, and the density and anisotropy parameters of the collagen fibers were extracted using the Matfiber algorithm. The density and alignment of collagen fibers in scar tissue and normal tissue were characterized, and the differences between the two tissue types were analyzed. In the image genetics level analysis, the extracted collagen fiber features and gene features were correlated using the MCJNMF algorithm to obtain the co-expression module. The genes in the co-expression module were intersected with the differential genes of scar tissue and normal tissue to obtain the intersecting genes related to the formation of collagen fibers in scar tissue, and enrichment analysis was then performed on the intersecting genes to explore the pathogenesis related to collagen fiber formation in scar tissue. In addition, receiver operating characteristic (ROC) curves of the intersecting genes were plotted to obtain abnormally expressed genes with specific correspondences to scar formation and biological mechanisms and to identify potential targets for disease treatment.

Fig. 1 Workflow of multi-functional discriminatory platform and bioinformatics analysis of scar tissue at macro- and micro-levels

2.2 Image discrimination model

Figure 1(a)–(c) shows a block diagram of the CNN model used for the discrimination of collagen fiber micrographs. First, each image (training and test sets) was resized to the input size of the model (224 × 224 pixels) using the resize method in transforms, and the images were normalized using the normalization method such that the distribution of the pixel values in each channel was close to zero mean and unit variance. The proposed CNN model (Fig. 1(a)) uses the structure and weights of Stage1-Stage2 of ResNet-50 pre-trained on ImageNet and freezes the parameters. After the pre-trained block, four cascaded trainable convolutional layers (out_channels = 256, kernel_size = 3, stride = 1, and padding = 1) were added, and the parameters of the convolutional layers were initialized using the Kaiming uniform initializer. The first of these trainable convolutional layers was used for channel number shrinkage (the number of channels was reduced from 512 to 256) to reduce the model complexity. The remaining three cascaded convolutional layers increase the nonlinear representation capability of the network, enlarge the receptive field, and extract high-level features of the image. Feature activation is then achieved using the ReLU activation function, followed by a global average pooling (GAP) layer for dimensionality reduction of the feature maps. This improves the computational efficiency and generalization ability of the model while enhancing its translation invariance, which suits the image classification task. After the global average pooling layer, a flattening layer flattens the obtained feature maps, and the resulting one-dimensional feature vectors are input into the fully connected layer. A Sigmoid classifier was used in the last layer to classify normal and scar tissues (Fig. 1(b)), and its output scores were in the range [0,1] (Fig. 1(c)). The pseudocode implemented in this model is provided in Online Resource 1. Note that we optimized the learning rate and training batch size of the model using a grid search algorithm and a cross-validation method to obtain the optimal hyperparameter configuration.
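The classification head described above (ReLU, global average pooling, flattening, a fully connected layer, and a Sigmoid output) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, which is given in Online Resource 1; the 28 × 28 spatial size, the random untrained weights, and the helper name `classification_head` are assumptions made only for illustration.

```python
import numpy as np

def classification_head(feature_map, fc_weight, fc_bias):
    """Map a (C, H, W) feature map to a score in (0, 1).

    Hypothetical helper: follows the ReLU -> GAP -> flatten -> FC -> Sigmoid
    order described in the text, using untrained weights.
    """
    activated = np.maximum(feature_map, 0.0)     # ReLU
    pooled = activated.mean(axis=(1, 2))         # global average pooling -> (C,)
    flat = pooled.reshape(-1)                    # flatten to a 1-D vector
    logit = float(flat @ fc_weight + fc_bias)    # fully connected layer, one output
    return 1.0 / (1.0 + np.exp(-logit))          # Sigmoid score

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((256, 28, 28))  # assumed shape after the last conv layer
fc_weight = rng.standard_normal(256) * 0.01       # assumed, untrained
score = classification_head(feature_map, fc_weight, fc_bias=0.0)

# GAP ignores where a pattern sits in the map: a circular shift leaves the score unchanged.
shifted_score = classification_head(np.roll(feature_map, 3, axis=2), fc_weight, 0.0)
print(f"score = {score:.3f}, shifted = {shifted_score:.3f}")
```

Because the pooling step averages over all spatial positions, the head has only 256 weights and one bias regardless of the input resolution, which is one reason GAP keeps the parameter count low.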

2.3 Gene chip discrimination model

Figure 1(d)–(f) shows a block diagram of the unimodal discriminative model for gene expression data. In this block diagram, we use the L1 regularized logistic regression model for gene modality discrimination (Fig. 1(d)). First, the data were preprocessed; that is, the gene expression values were normalized to ensure that each gene feature contributed equally to the training process of the model. A logistic regression model was chosen to implement the binary classification task, and it was trained on the training set with L1 regularization. The strength of the L1 regularization was controlled by a specified parameter (λ), and we used the LogisticRegression method in the sklearn library to achieve this. L1 regularization is a penalty term attached to the loss function: the sum of the absolute values of the parameters is added to the loss to penalize model complexity, and the sum of the loss and the regularization term is minimized, which reduces overfitting and induces sparsity in the model parameters. Therefore, the objective function can be expressed as follows:

$$\begin{aligned} J\left( w \right)\, & = - \frac{1}{m}\sum _{i = 1}^m\left[ {{y^{\left( i \right)}}{\text{log}}\left( {{h_w}\left( {{x^{\left( i \right)}}} \right)} \right) + \left( {1 - {y^{\left( i \right)}}} \right){\text{log}}\left( {1 - {h_w}\left( {{x^{\left( i \right)}}} \right)} \right)} \right] \\ & \quad + \lambda \parallel w{\parallel _1} \\ \end{aligned}$$
(1)

The loss function \(J\left(w\right)\) consists of a cross-entropy loss term \(-\frac{1}{m}{\sum }_{i=1}^{m}\left[{y}^{\left(i\right)}\text{log}\left({h}_{w}\left({x}^{\left(i\right)}\right)\right)+\left(1-{y}^{\left(i\right)}\right)\text{log}\left(1-{h}_{w}\left({x}^{\left(i\right)}\right)\right)\right]\) and an L1 regularization term \(\lambda {\parallel w \parallel}_{1}\), where \(w\) is the parameter vector of the model, \(m\) is the number of samples, \({y}^{\left(i\right)}\) is the true label of the \(i\)th sample, \({h}_{w}\left({x}^{\left(i\right)}\right)\) is the predicted value of the model for the \(i\)th sample, \(\lambda\) is the regularization parameter that controls the strength of the regularization, and \({\parallel w\parallel}_{1}\) denotes the L1 norm of the parameter vector \(w\), that is, the sum of the absolute values of the parameters. The ultimate goal of model training is to minimize the sum of the loss function and the regularization term to obtain a model that performs well on the training data and has fewer nonzero parameters.
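As a worked illustration of Eq. (1), the following pure-Python sketch evaluates the objective on a toy dataset. The data values, the omission of a bias term, and the function name `l1_logistic_loss` are assumptions for illustration; this is not the sklearn implementation used in the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def l1_logistic_loss(w, X, y, lam):
    """Eq. (1): mean cross-entropy plus lam * ||w||_1 (bias omitted for brevity)."""
    m = len(X)
    cross_entropy = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)))
        cross_entropy -= y_i * math.log(h) + (1 - y_i) * math.log(1 - h)
    l1_norm = sum(abs(w_j) for w_j in w)
    return cross_entropy / m + lam * l1_norm

# Toy data (made up): 4 samples, 3 "genes".
X = [[1.0, 0.2, -0.5], [0.8, -0.1, 0.3], [-1.2, 0.4, 0.1], [-0.9, -0.3, -0.2]]
y = [1, 1, 0, 0]
w = [1.0, 0.05, -0.05]

loss_without_penalty = l1_logistic_loss(w, X, y, lam=0.0)
loss_with_penalty = l1_logistic_loss(w, X, y, lam=1.0)
# The gap between the two equals ||w||_1 = 1.0 + 0.05 + 0.05.
print(loss_with_penalty - loss_without_penalty)
```

With all weights at zero, every prediction is 0.5 and the objective reduces to log 2 for any \(\lambda\), which is a convenient sanity check for an implementation.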

Finally, a Sigmoid classifier was appended to the L1 regularized logistic regression model to classify the gene expression data of normal and scar tissues (Fig. 1(e)), and its output scores were in the range of [0,1] (Fig. 1(f)). The pseudocode implemented in this model is provided in Online Resource 2. Note that we optimized the parameters of the LogisticRegression function, including the solver and regularization coefficients, using a grid search algorithm to obtain the best hyperparameter configuration.

2.4 Multimodal data fusion discriminant model

Figure 1(g)–(j) shows block diagrams of the multimodal discriminative models for collagen fiber micrographs and gene expression data. The image and gene modal discriminant models trained in Task 1 were used as the image feature extractor and gene feature extractor, respectively. First, we loaded the image discriminative model using PyTorch and set it to evaluation mode, so that only the forward propagation process of the model was used to extract high-level feature representations of the input image. Simultaneously, we loaded the gene discrimination model using the joblib library and called the weights of this model to extract the corresponding important gene features (Fig. 1(g)). After acquiring the image and gene features, a weighted average linear aggregation network was used to fuse the two modalities to obtain the fused features (Fig. 1(h)). The specific realization process is shown in Fig. 2. The basic principle of weighted average linear fusion is to weight and average the outputs of multiple features or models, where the weight of each feature or model is determined using methods such as a priori knowledge, experience, or cross-validation. Typically, weights depend on the performance and contribution of each feature or model, and features or models with better performance may be assigned higher weights. In this experiment, the weights of the image and gene features were obtained by evaluating the performance metrics (F1 scores) of the two feature extraction networks on the validation set and normalizing them. The weighted average linear fusion result \({F}_{ensemble}\) can be expressed as follows:

$${F}_{ensemble}= {\sum }_{i=1}^{N}\frac{{S}_{i}}{\sum _{j=1}^{N}{S}_{j}}{F}_{i}$$
(2)

where \({S}_{i}\) is the performance metric of each feature extraction network, \(N\) is the total number of feature extraction networks, \(\frac{{S}_{i}}{{\sum }_{j=1}^{N}{S}_{j}}\) is the corresponding weight of each feature extraction network, and \({F}_{i}\) is the output of each feature extraction network. The pseudocode implemented in this model is provided in Online Resource 3.

The advantage of this approach is the automated determination of weights based on performance, which allows better-performing features or models to influence the final fusion results, thus improving the overall model performance.
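Equation (2) can be sketched in a few lines of pure Python. The feature values and F1 scores below are made up, the helper name `fuse_features` is hypothetical, and the sketch assumes that the two feature vectors have already been brought to the same length, a step the text does not detail.

```python
def fuse_features(features, scores):
    """Eq. (2): weight each feature vector by its normalized performance score."""
    total = sum(scores)
    weights = [s / total for s in scores]
    length = len(features[0])
    fused = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(length)]
    return fused, weights

image_features = [0.9, 0.1, 0.4]   # made-up image feature vector
gene_features = [0.5, 0.3, 0.8]    # made-up gene feature vector
f1_scores = [0.98, 0.90]           # hypothetical validation F1 scores

fused, weights = fuse_features([image_features, gene_features], f1_scores)
print(weights)  # the better-scoring extractor receives the larger weight
print(fused)
```

Normalizing the scores makes the weights sum to one, so each fused value stays between the corresponding values of the two modalities.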

Fig. 2 Feature fusion layer framework

2.5 Imaging genetics correlation analysis algorithm

In this study, we used the MCJNMF algorithm to model the associations between macro- and micro-level data. This approach integrates both genomic and image data and helps identify common modules associated with diseases.

For the collagen fiber micrographs, 29 texture features were extracted from the images using the MatFiber and Haralick algorithms. Within the context of our investigation, we address two distinct data matrices: \({X}_{1}\), which represents the feature matrix derived from the microscopic image, and \({X}_{2}\), which represents the gene expression matrix. To reveal the shared underlying patterns within both datasets, we utilized a framework that decomposes the original matrices into a common base matrix, denoted as \(W\). This process is accompanied by distinct coefficient matrices, namely \({H}_{I}(I=\text{1,2})\), which are associated with each dataset [16]:

$${X}_{I}\approx W{H}_{I}, W\ge 0, {H}_{I}\ge 0, I=\text{1,2}$$
(3)

The absolute values of the Pearson correlation coefficients between the image features and the gene expression matrix data were then computed, and the matrix of correlation coefficients was defined as the a priori knowledge matrix \(A\), which can be encoded by the following objective functions:

$$\sum _{ij}{a}_{ij}{\left({h}_{i}^{1}\right)}^{T}{h}_{j}^{2}=Tr\left({H}_{1}A{H}_{2}^{T}\right)$$
(4)

where \({a}_{ij}\) is an element of the a priori knowledge matrix \(A\), and its value indicates the degree of relevance between the \(i\)th image feature and the \(j\)th gene.
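The construction of \(A\) and the trace identity in Eq. (4) can be checked numerically with a short NumPy sketch. The toy matrix sizes and random data are assumptions, and the sketch reads \({h}_{i}^{1}\) and \({h}_{j}^{2}\) as the \(i\)th and \(j\)th columns of \({H}_{1}\) and \({H}_{2}\) combined by an inner product, which is an interpretation of the notation rather than something the text states explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_img, n_gene, k = 12, 5, 8, 3      # toy sizes (assumed)
X1 = rng.random((n_samples, n_img))            # image features, samples x features
X2 = rng.random((n_samples, n_gene))           # gene expression, samples x genes

# A[i, j] = |Pearson r| between image feature i and gene j.
full = np.corrcoef(X1, X2, rowvar=False)       # square matrix over all 13 variables
A = np.abs(full[:n_img, n_img:])               # cross block, shape (n_img, n_gene)

# Check Eq. (4) on random non-negative coefficient matrices.
H1 = rng.random((k, n_img))
H2 = rng.random((k, n_gene))
lhs = sum(A[i, j] * (H1[:, i] @ H2[:, j])
          for i in range(n_img) for j in range(n_gene))
rhs = np.trace(H1 @ A @ H2.T)
print(lhs, rhs)
```

The trace form is what makes the prior usable inside a matrix factorization objective: it rewards assigning strongly correlated image features and genes to the same component.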

In addition, orthogonal constraints are added to \(H\) using linear programming, and the resulting objective function can be defined as follows:

$$\begin{aligned}\Gamma \left( {W,{H_1},{H_2}} \right) & = \sum\limits_{I = {\text{1,2}}} {\parallel {X_I} - W{H_I}\parallel _F^2} + \lambda Tr\left( {{H_1}A{H_2^T}} \right) \\ & \quad - {\gamma _1}\parallel W\parallel _F^2 + \sum\limits_{I = {\text{1,2}}} {\left[ {{\gamma _2}\parallel {H_I}{\parallel _1}\left( {\parallel {H_1}H_I^T\parallel _F^2} \right)} \right]} \\ \end{aligned}$$
(5)

where the parameter \(\lambda\) is the weight for the must-link constraint defined in \(A\), \({\gamma }_{1}\) is used to limit the growth of \(W\), and \({\gamma }_{2}\) is used to constrain \(H\).

The pseudocode implemented in this algorithm is provided in Online Resource 4.
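To make the shared-basis idea of Eq. (3) concrete, the sketch below factorizes two toy matrices with a common \(W\) using standard multiplicative updates. It is a deliberately simplified stand-in: it minimizes only the reconstruction error and omits the prior-knowledge, sparsity, and orthogonality terms of Eq. (5), so it is not the MCJNMF algorithm itself. The function name `joint_nmf`, the toy sizes, and the iteration count are assumptions.

```python
import numpy as np

def joint_nmf(X1, X2, k, n_iter=200, eps=1e-9, seed=0):
    """Plain joint NMF: X1 ~ W @ H1 and X2 ~ W @ H2 with a shared, non-negative W."""
    rng = np.random.default_rng(seed)
    W = rng.random((X1.shape[0], k))
    H1 = rng.random((k, X1.shape[1]))
    H2 = rng.random((k, X2.shape[1]))
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        W *= (X1 @ H1.T + X2 @ H2.T) / (W @ (H1 @ H1.T + H2 @ H2.T) + eps)
        WtW = W.T @ W
        H1 *= (W.T @ X1) / (WtW @ H1 + eps)
        H2 *= (W.T @ X2) / (WtW @ H2 + eps)
        errors.append(np.linalg.norm(X1 - W @ H1) ** 2
                      + np.linalg.norm(X2 - W @ H2) ** 2)
    return W, H1, H2, errors

rng = np.random.default_rng(1)
X1 = rng.random((12, 5))    # toy image-feature matrix (samples x features)
X2 = rng.random((12, 8))    # toy gene-expression matrix (samples x genes)
W, H1, H2, errors = joint_nmf(X1, X2, k=3)
print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

In a full implementation, the rows of \({H}_{1}\) and \({H}_{2}\) would then be inspected for large coefficients to assemble the common modules described in the text.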

3 Results

3.1 Implementation details and evaluation metrics

The model was trained and tested on a workstation with an Intel® Core™ i9-13900K CPU @ 3.0 GHz, an NVIDIA RTX A5000 GPU, Python 3.8.7, PyTorch 2.1.0, and the Windows 11 operating system. To evaluate the classification performance of different models, the accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve, and area under the curve (AUC) were measured.
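For reference, the threshold-based metrics listed above follow directly from the confusion-matrix counts. The sketch below uses made-up labels and a hypothetical helper `classification_metrics`; the ROC curve and AUC, which require continuous scores, are not covered.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = scar, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # made-up ground truth
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]   # made-up predictions
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```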

3.2 Data source and preprocessing

Picrosirius Red staining is a tissue-staining method commonly used to observe and analyze collagen fibers. Picrosirius Red-stained collagen fiber tissues appear green to red in polarized light, and through color deconvolution and normalization, the tissue image can be decomposed into red and green channel images, where the red channel image represents mature collagen fibers and the green channel image represents immature collagen fibers. Combining the red and green channel images merges the information on mature and immature collagen fibers and yields more comprehensive collagen fiber characteristics. The histopathological images used in this study were derived from a database of Sirius red-stained skin collagen fiber micrographs created by Mascharak et al. [17], which included 1048 red-channel images and 1048 green-channel images. The images cover normal skin and skin tissue images at week 2, month 1, and month 3 after intervention with PBS and verteporfin. In this experiment, we selected 246 microscopic images at specific time points after PBS intervention and 240 microscopic images at specific time points after verteporfin intervention as the scar group images, and 306 images of uninjured skin as the normal group images (including red and green channel images). The raw TIF images were converted to PNG for computer processing. Using the OpenCV addWeighted method, the red and green channel images of each sample were linearly combined with the same weight (0.5) to produce a merged image. After this process, we obtained a new dataset containing 273 micrographs of collagen fibers from normal skin and 123 micrographs from scarred skin. To address the training bias that may result from an insufficient data volume, a data augmentation strategy was employed that included flipping the images vertically and horizontally and rotating them by 90°. The resulting augmented dataset included 492 scar tissue images and 1092 normal tissue images. Subsequently, all images were normalized to 500 × 500 pixels. Specific image data information is listed in Table 1.
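The channel merging and augmentation steps can be sketched with NumPy alone. The blend below mirrors an equal-weight call such as `cv2.addWeighted(red, 0.5, green, 0.5, 0)` without requiring OpenCV. The random 8 × 8 images and the helper names are assumptions, as is the reading that each image contributes the original plus three transformed copies, which is consistent with the fourfold increase reported (123 to 492 and 273 to 1092).

```python
import numpy as np

def merge_channels(red_img, green_img, weight=0.5):
    """Equal-weight linear blend of the red- and green-channel images."""
    blended = (weight * red_img.astype(np.float64)
               + (1.0 - weight) * green_img.astype(np.float64))
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)

def augment(img):
    """Original, vertical flip, horizontal flip, and 90-degree rotation."""
    return [img, np.flipud(img), np.fliplr(img), np.rot90(img)]

rng = np.random.default_rng(0)
red = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)    # stand-in image
green = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)  # stand-in image

merged = merge_channels(red, green)
augmented = augment(merged)

n_scar = 123 * len(augmented)     # 492
n_normal = 273 * len(augmented)   # 1092
print(n_scar, n_normal)
```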

Table 1 Number of images of normal skin samples and scarred skin samples in the original and augmented databases

Gene expression data were obtained from the Gene Expression Omnibus (GEO) database, a public database created and maintained by the National Center for Biotechnology Information (NCBI) of the U.S. National Institutes of Health (NIH) that contains millions of gene expression samples from around the world. Researchers can access publicly available gene expression data from the GEO database using accession numbers. In this experiment, all the samples were from the GPL570 platform; therefore, the number of gene features contained in each sample was the same (23,521). Information about the source and number of samples in the scarred and normal groups is listed in Table 2. First, the gene expression profiles of the samples were loaded by accession number. The profiles were then filtered, and negative expression levels or obviously noisy data were set as missing values. Next, the missing values were filled in using the mean value method. The data were then log-transformed to approximately follow a normal distribution. Finally, the data were standardized to remove systematic errors and ensure the reliability of later data analysis. After the above pre-processing, the constructed gene expression matrix had dimensions of 42 × 23,521 (42 samples × 23,521 genes).
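The preprocessing sequence described above (filtering, mean imputation, log transformation, and standardization) can be sketched as follows. The toy matrix, the helper name `preprocess_expression`, the base-2 logarithm with a +1 offset, and the use of per-gene z-scores are assumptions; the sketch treats only negative values as missing because the criterion for "obviously noisy" data is not specified.

```python
import numpy as np

def preprocess_expression(X):
    """Filter, impute, log-transform, and standardize a samples x genes matrix."""
    X = np.asarray(X, dtype=np.float64).copy()
    X[X < 0] = np.nan                                  # negative levels -> missing
    gene_means = np.nanmean(X, axis=0)                 # per-gene mean, ignoring missing
    missing = np.isnan(X)
    X[missing] = np.take(gene_means, np.nonzero(missing)[1])  # mean imputation
    X = np.log2(X + 1.0)                               # log transform (+1 offset assumed)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                                # guard against constant genes
    return (X - mean) / std

# Toy matrix (made up): 4 samples x 3 genes, with two invalid negative entries.
X_raw = np.array([[5.0, 120.0, 3.0],
                  [7.0,  -1.0, 4.0],
                  [6.0, 100.0, -2.0],
                  [8.0, 110.0, 5.0]])
Z = preprocess_expression(X_raw)
print(Z.round(3))
```

After this step every gene has zero mean and unit variance across samples, which is what allows each gene feature to contribute on a comparable scale during model training.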

Table 2 Datasets for gene expression data

3.3 Unimodal discriminant model

For image unimodal discrimination, a grid search with cross-validation showed that the proposed model performed best with a learning rate of 0.0001 and a batch size of 64. We therefore trained the proposed CNN model with the Adam optimizer, chosen for its convergence speed and stability, using a learning rate of 0.0001, a batch size of 64, and 50 epochs, and we chose CrossEntropyLoss as the loss function to improve the convergence speed and performance of the model. Three cascaded 3 × 3 convolutional layers were used for feature extraction, which introduced more nonlinear transformations so that the network could better capture the complex patterns and features in the input data. Ablation experiments were designed based on the model structure to investigate the contribution of the added cascaded convolutional layers to the discriminative model. We compared the proposed CNN model with the three structural fine-tuning models listed in Table 3. ResNet_FT1 removed Conv4 and Conv5; ResNet_FT2 removed Conv3, Conv4, and Conv5; and ResNet_FT3 removed all four cascaded convolutional layers and used only Stage1-Stage2 of ResNet-50. The hyperparameter configuration of the structural fine-tuning models was the same as that of the proposed image-discrimination model. Our experiments used 492 scar images and 1092 normal images as the image dataset, of which 70% were used for training and 30% for testing.

The loss rate variation, accuracy variation, and ROC curves for the test set during training are shown in Fig. 3. The performance metrics of the comparison models are listed in Table 3. The experimental results show that the proposed CNN model outperforms the three structural fine-tuning models. Compared with the model without the four cascaded convolutional layers (ResNet_FT3), it improves the AUC by approximately 30% and the accuracy by 19.83%, which shows that the convolutional blocks we incorporated are effective. Compared to the model with only one channel shrinkage layer (ResNet_FT2), the proposed CNN model improves the AUC by approximately 15% and the accuracy by 9.81%. Compared to the model with only one channel shrinkage layer and one feature extraction layer (ResNet_FT1), the proposed CNN model improves the AUC by approximately 7% and the accuracy by 3.92%. The proposed CNN model achieves the highest precision, recall, and F1 score, which indicates that it can better discriminate scar tissue images and reduce the false positive rate. Compared with the other three models, the proposed model has the shortest training time and the lowest time cost.

Table 3 Comparison of classification performance of proposed CNN and three models with different structure
Fig. 3 Training process of unimodal discriminative models with different fine-tuning structures. (a) loss rate curve, (b) accuracy curve, and (c) ROC curve

In addition, we compared the proposed model with the full transfer learning ResNet-50 (ResNet_TL), VGG16 (VGG_TL), and AlexNet (AlexNet_TL) models and the fine-tuned ResNet (ResNet_FT) model to evaluate its classification and feature extraction performance on scar tissue images. During the model training, we used the hyperparameters listed in Table 4. The loss rate variation, accuracy variation, and ROC curves for the test set during training are shown in Fig. 4. The performance metrics of the compared models are listed in Table 5. The experimental results show that the proposed CNN model has the best classification performance among the compared pre-trained large models. Compared with the original ResNet-50 model (ResNet_TL) and the fine-tuned ResNet-50 model (ResNet_FT), the proposed CNN model improves the accuracy by 3.49% and 0.44%, and the AUC values by approximately 7% and 3%, respectively. The model sizes of ResNet_TL and ResNet_FT are 90 M, whereas the size of the proposed model is 16.8 M, which indicates that the proposed CNN model greatly reduces the computational cost while improving the discriminative accuracy. Compared with VGG_TL and AlexNet_TL, the AUC value of the proposed CNN model is 5% and 4% higher, respectively. In addition, the F1 score reaches the highest value of 98.27%, which indicates that the proposed CNN model can effectively discriminate scar tissue while reducing the computing cost and is therefore of greater practical value when computing resources are limited. In terms of the time cost of model training, the training time of AlexNet_TL was slightly lower than that of the proposed CNN model; however, the proposed CNN model achieved the best performance in all other metrics.

Table 4 Hyperparameter configuration of the training process
Table 5 Comparison of classification performance of proposed CNN and four models
Fig. 4 Training process of unimodal discriminative models with different pre-trained macromodels. (a) loss rate curve, (b) accuracy curve, and (c) ROC curve

For the genetic model for scar tissue discrimination, a grid search over the solver and regularization ratio showed that performance was optimal with the liblinear solver and a regularization ratio of 0.5. Therefore, in this study, the penalty parameter of the logistic regression model was set to l1, and the model was optimized using the liblinear solver with the L1 regularization ratio set to 0.5. Adding the L1 regularization term to the logistic regression model produces a sparse solution, which serves as feature selection by shrinking the coefficients of unimportant features to zero. To verify the performance of the proposed logistic regression model with L1 regularization, we compared it with three fine-tuned models: LogisticRegression_FT1 used the l2 penalty with a regularization ratio of 0.5; LogisticRegression_FT2 used the l1 penalty with a regularization ratio of 0.1; and LogisticRegression_FT3 used no regularization term. In this experiment, 19 scar and 23 normal samples were used as the genetic dataset, of which 60% were used for training and 40% for testing. Table 6 shows that the proposed gene discrimination model outperformed the three fine-tuned models. Compared with L2 regularization (LogisticRegression_FT1), the proposed model improves the accuracy by 17.65% and the AUC by approximately 5%, which demonstrates the effectiveness of the L1 regularization term on the constructed gene expression dataset. When the regularization strength is increased (LogisticRegression_FT2), the accuracy, precision, F1 score, and AUC all decrease relative to the proposed model, which indicates that the hyperparameters configured in the proposed model give better discriminative performance. Compared with the logistic regression model without a regularization term (LogisticRegression_FT3), the proposed model improves the discrimination accuracy by 17.65% and the AUC by approximately 6%, which indicates that including the L1 regularization term improves discrimination performance. The proposed model also has the shortest training time of the four models.
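To illustrate why an L1 penalty yields sparse coefficients, the sketch below fits a logistic regression with proximal gradient descent (soft-thresholding). This is a swapped-in technique chosen to keep the example dependency-free: it is not the liblinear solver used in the study, and `l1_strength`, the synthetic data, and all function names are assumptions that do not correspond to the regularization ratio reported above.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within [-threshold, threshold] becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(features, labels, l1_strength, learning_rate=0.1, n_iter=300):
    """Proximal gradient descent for L1-penalized logistic regression (no intercept)."""
    n_samples, n_features = len(features), len(features[0])
    weights = [0.0] * n_features
    for _ in range(n_iter):
        gradient = [0.0] * n_features
        for row, label in zip(features, labels):
            z = sum(w * x for w, x in zip(weights, row))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            prob = 1.0 / (1.0 + math.exp(-z))
            for j, x in enumerate(row):
                gradient[j] += (prob - label) * x / n_samples
        # Gradient step on the log-loss, then the L1 proximal (shrinkage) step.
        weights = [soft_threshold(w - learning_rate * g, learning_rate * l1_strength)
                   for w, g in zip(weights, gradient)]
    return weights


# Synthetic data (hypothetical): only feature 0 carries the class signal.
rng = random.Random(0)
features = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(40)]
labels = [1 if row[0] > 0 else 0 for row in features]

dense = fit_l1_logistic(features, labels, l1_strength=0.0)
sparse = fit_l1_logistic(features, labels, l1_strength=0.05)
print("zero coefficients without L1:", sum(w == 0.0 for w in dense))
print("zero coefficients with L1:   ", sum(w == 0.0 for w in sparse))
```

The shrinkage step is what sets coefficients exactly to zero rather than merely making them small; with a sufficiently large penalty every coefficient is driven to zero, which is why the penalty strength has to be tuned, for example by grid search.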

Table 6 Comparison of classification performance of proposed model and three models

3.4 Multimodal discriminant model

In terms of multimodal discrimination, the performance index of the image feature extraction network on the constructed image dataset (Table 1) was 98.27%, and that of the gene feature extraction network on the constructed gene expression dataset (Table 2) was 100%. After normalization, we set the weight of the image feature extraction network to 0.49 and the weight of the gene feature extraction network to 0.51 in the weighted average. The image feature extraction network produced 256 features and the gene feature extraction network produced 33 features, so the final multimodal fusion feature vector obtained after weighted average fusion had 289 dimensions. During training, CrossEntropyLoss was used as the loss function. Considering the convergence speed and stability of the model, we used the Adam optimizer with a learning rate of 0.001. A sigmoid classifier was used to classify normal and scar tissues, with output scores in the range [0, 1]. Pairing the image data with the gene data yielded 9348 paired samples (19 scar gene samples × 492 scar image samples) in the scar group and 25,118 paired samples (23 normal gene samples × 1092 normal image samples) in the normal group. The two groups of paired samples formed the multimodal dataset, of which 70% was used for training and 30% for testing.
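The pairing and fusion steps can be sketched as follows. This is a hedged illustration only: it assumes that the weighted average fusion scales each modality by its weight and concatenates the results (which is consistent with 256 + 33 = 289 features), and the function names and placeholder feature values are hypothetical.

```python
from itertools import product


def pair_samples(gene_samples, image_samples):
    """Every (gene sample, image sample) combination within one tissue group."""
    return list(product(gene_samples, image_samples))


def weighted_fusion(image_features, gene_features, s1=0.49, s2=0.51):
    """Scale each modality by its weight, then concatenate into one vector.

    s1 and s2 are the modality weights quoted in the text; treating the fusion
    as a weighted concatenation is an assumption of this sketch.
    """
    return [s1 * v for v in image_features] + [s2 * v for v in gene_features]


# Scar group: 19 gene samples paired with 492 image samples.
scar_pairs = pair_samples(range(19), range(492))

# Placeholder feature vectors with the dimensions given in the text.
fused = weighted_fusion([1.0] * 256, [1.0] * 33)
print(len(scar_pairs), len(fused))
```

Note that exhaustive pairing means the same gene sample appears in many pairs, so a random split of pairs into training and test sets can place the same underlying gene or image sample on both sides.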

To verify the contribution of the constructed unimodal feature extraction network to the multimodal discriminative model, we designed an ablation experiment based on the structure of the image feature extraction network. We compared the proposed multimodal discriminative model with the three structural fine-tuning models listed in Table 6: Fusion_FT1 removes Conv4 and Conv5 of the image feature extraction network; Fusion_FT2 removes Conv3, Conv4, and Conv5 of the image feature extraction network; Fusion_FT3 removes the four cascaded convolutional layers and uses only Stage1-Stage2 of the image feature extraction network; Fusion_FT4 changes the model weight coefficients \({S}_{1}\) and \({S}_{2}\) of the weighted aggregation network, setting \({S}_{1}\) to 0.6 and \({S}_{2}\) to 0.4; and Fusion_FT5 omits the weighted average aggregation network and directly concatenates the obtained multimodal features. The hyperparameter configuration of the structural fine-tuning models is the same as that of the image discrimination model proposed in this study.

The loss rate, accuracy, and ROC curves for the test set during training are shown in Fig. 5, and the performance metrics of the compared models are listed in Table 7. The experimental results show that the proposed multimodal discriminative model exhibits the best classification performance among the structurally fine-tuned multimodal discriminative models. Compared with Fusion_FT1, the proposed model improved the AUC by approximately 2% and the accuracy by 1.47%, which indicates that the trained feature extraction network extracts features effectively. Compared with Fusion_FT2 and Fusion_FT3, the proposed model has the highest precision and recall, which indicates that the full feature extraction network captures high-level image features that benefit multimodal feature fusion discrimination. Compared with Fusion_FT4 and Fusion_FT5, the proposed model has an AUC of 0.97; in terms of accuracy, precision, recall, and F1 score, however, it achieves the highest values, which supports the effectiveness of the weighted average linear network in feature fusion.

Table 7 Comparison of classification performance of proposed fusion model and three fusion models
Fig. 5 Multimodal discriminant model training results. (a) loss rate curve, (b) accuracy curve, and (c) ROC curve

3.5 Results of Imaging Genetics Association Analysis

For the imaging genetics association analysis of scar tissue, we used the Matfiber and Haralick algorithms to extract 29 textural features from the collagen fiber micrographs. These features describe the microstructure of skin tissue and allow the differences between skin tissue types to be examined from multiple perspectives. The gene expression data comprised 42 samples, each containing 23,521 genes. First, we applied a log2 transformation to the expression values in all samples to make the data more centrally distributed and to facilitate subsequent calculations. Next, the ComBat method was used to remove batch effects across all samples, yielding the final preprocessed gene expression data. The image feature data were then combined with the preprocessed gene expression data using the MCJNMF algorithm. With the parameters L1 = 0.001, r1 = 1, r2 = 1, K = 7, and a = 0.001 [15], seven common modules were extracted from the combined dataset. The feature information of each module is listed in Table 8. This integrated approach links image features with gene expression features and thereby provides a joint view of the structural and molecular diversity of skin tissues.
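As an illustration of what a Haralick-type texture feature measures, the sketch below builds a gray-level co-occurrence matrix (GLCM) for a single pixel offset and derives two classical features from it, contrast and energy (angular second moment). It does not reproduce the 29 features used in the study; the function names, the offset, and the tiny example images are assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count both directions (symmetric GLCM)
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def haralick_contrast(p):
    """Large when neighboring pixels differ strongly in gray level."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def haralick_energy(p):
    """Angular second moment: large for uniform or highly regular textures."""
    return sum(v * v for row in p for v in row)


# Two tiny 2-level images (hypothetical): a flat patch and a checkerboard.
flat = [[1, 1], [1, 1]]
checker = [[0, 1], [1, 0]]
for name, img in (("flat", flat), ("checker", checker)):
    p = glcm(img, levels=2)
    print(name, haralick_contrast(p), haralick_energy(p))
```

In practice the image would first be quantized to a fixed number of gray levels, and features are usually averaged over several offsets and directions to reduce sensitivity to orientation.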

Table 8 Common module features

4 Discussion

This study demonstrated that the designed multi-functional scar tissue discrimination platform can accurately classify unimodal and multimodal input data, achieving objective scar tissue discrimination. To explore the mechanism of scar tissue formation in more detail, we further examined the representation of collagen fibers in scar tissue and normal tissue at the macroscopic and imaging genetics levels. In the macroscopic analysis, we characterized the density and orientation of collagen fibers and chose the channel 32 image of the conv1 layer in the proposed CNN model as the feature extraction channel image (a choice that stems from the model architecture and domain knowledge, as the conv1 layer enhances the responsiveness to texture features to some extent). Figure 6 shows the density and orientation characterization maps of the collagen fibers in scar tissue and normal tissue. In addition, the density, circular standard deviation, and angular deviation of collagen fibers in the two groups were statistically analyzed (Fig. 7), which shows that these texture features differ distinctly between scar tissue and normal skin tissue. The collagen fibers in scarred skin were significantly denser and more tightly arranged than those in normal skin. This is consistent with the biological changes that occur during the healing of scar tissue and provides a direct biological basis for the constructed scar tissue discrimination platform. The statistical analyses also quantified the textural differences in collagen fiber characteristics in terms of alignment strength and angular deviation. Scarred skin had a smaller circular standard deviation and angular deviation, indicating that the collagen fibers tended to be concentrated around one direction and aligned. In contrast, collagen fibers in normal skin were dispersed in multiple directions. These statistical analyses provided objective quantitative evidence of textural differences between scar tissue and normal skin.
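The circular standard deviation and angular deviation can be computed from the mean resultant length R of the fiber orientations. The sketch below uses the textbook definitions, circular standard deviation sqrt(-2 ln R) and angular deviation sqrt(2(1 - R)), and assumes that fiber orientations are axial (an angle and the same angle plus 180° describe the same fiber), so angles are doubled before averaging and the dispersion is halved afterwards. Whether the study used exactly this convention is not stated, and the sample angles are hypothetical.

```python
import math


def orientation_dispersion(angles_deg):
    """Return (R, circular std in degrees, angular deviation in degrees).

    Angles are treated as axial data: they are doubled before averaging and
    the resulting dispersion values are halved to return to the original scale.
    """
    doubled = [math.radians(2.0 * a) for a in angles_deg]
    mean_cos = sum(math.cos(t) for t in doubled) / len(doubled)
    mean_sin = sum(math.sin(t) for t in doubled) / len(doubled)
    r = min(1.0, math.hypot(mean_cos, mean_sin))  # mean resultant length in [0, 1]

    circular_std = math.sqrt(-2.0 * math.log(r)) if r > 0.0 else float("inf")
    angular_dev = math.sqrt(2.0 * (1.0 - r))
    return r, math.degrees(circular_std) / 2.0, math.degrees(angular_dev) / 2.0


# Hypothetical orientation samples in degrees.
aligned = [28.0, 30.0, 32.0, 31.0, 29.0]    # fibers clustered around one direction
dispersed = [0.0, 45.0, 90.0, 135.0, 20.0]  # fibers spread over many directions

for name, angles in (("aligned", aligned), ("dispersed", dispersed)):
    r, circ_std, ang_dev = orientation_dispersion(angles)
    print(f"{name}: R={r:.3f}, circular std={circ_std:.1f}, angular dev={ang_dev:.1f}")
```

A value of R close to 1 corresponds to strongly aligned fibers and small dispersion values, which is the pattern described above for scarred skin; R close to 0 corresponds to fibers spread over many directions.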

Fig. 6 Density characterization and arrangement characterization of collagen fibers in scarred and normal groups

Fig. 7 Collagen fiber characterization results. (a) Statistical analysis of collagen fiber density. (b) Statistical analysis of collagen fiber circular standard deviation. (c) Statistical analysis of collagen fiber angular deviation

DO enrichment analysis of the genes in the seven common modules revealed that the genes in module 4 were associated with collagen diseases, rheumatism, and systemic scleroderma (Fig. 8(a)). Such conditions stem from aberrant collagen fiber synthesis, organization, or perturbations in collagen fiber-associated cellular signaling. This finding underscores the potential significance of module 4 in disease-linked biological processes, offering insights into the mechanisms underlying skin scarring. This unveils the pivotal role that genes within module 4 play in the context of disease-affected scarred skin and reorganized collagen fibers in normal skin, signifying their potential involvement in disease pathogenesis. This lends credence to the notion that our gene set is intrinsically linked to collagen fiber-associated biological processes and diseases. Further examination via GO enrichment analysis revealed associations with terms like “blood vessel diameter maintenance,” “regulation of tube size,” “vascular process in the circulatory system,” and “regulation of vasoconstriction,” underscoring these genes’ involvement in regulating vascular and tubular structures, thus maintaining circulatory system functionality (Fig. 8(b)). This hints at the potential role genes in module 4 play in controlling collagen fiber density and arrangement, ultimately contributing to skin tissue function and homeostasis. This closely aligns with the distribution and characteristics of collagen fibers in scarred and normal skin. KEGG enrichment analysis demonstrated pathways linked to collagen fibers, including the “Calcium signaling pathway” and “MAPK signaling pathway” (Fig. 8(c)). The Calcium signaling pathway is pivotal in extracellular matrix synthesis, tissue structure upkeep, and cellular signaling of collagen fibers [19, 20]. 
Similarly, the MAPK signaling pathway regulates collagen fiber synthesis, catabolism, cell proliferation, and apoptosis, all of which influence collagen fiber-associated functions and morphological attributes in scarred and normal skin [17, 18, 21]. These findings provide vital clues for understanding the molecular mechanisms underlying collagen fiber synthesis and tissue regulation.

Fig. 8 Results of enrichment analysis of co-expressed genes from module 4. (a) DO enrichment results. (b) GO enrichment results. (c) KEGG enrichment results

We identified potential biomarker genes associated with scar tissue through a sequence of differential expression and co-expression module analyses. Differential analysis first identified 417 differentially expressed genes. These differences in gene expression between scar and normal tissues are potentially linked to physiological and pathological collagen fiber-related processes. Using MCJNMF-based multimodal data association analysis, we identified seven co-expression modules, of which module 4 was associated with collagen diseases and skin disorders. This was reinforced by enrichment analysis of the 1212 genes within module 4. Intersecting the 417 differentially expressed genes with the module 4 genes yielded 19 potential biomarker genes (Fig. 9(a)). Refining this selection through ROC curve analysis, we identified 11 potential marker genes with AUC values exceeding 0.5 (Fig. 9(b)). A biological assessment of these marker genes revealed their diverse involvement in processes related to scarred skin. For instance, TRIM59-encoded proteins may modulate the cell cycle and apoptosis, potentially affecting collagen fiber production and repair [22]. TBC1D9-encoded proteins, which are involved in intracellular membrane trafficking, may regulate collagen fiber synthesis and distribution. These findings hint at their roles in biological processes linked to scarred skin.
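The screening logic described above (intersect the differentially expressed genes with the module genes, then keep genes whose single-gene AUC exceeds 0.5) can be sketched as follows. The gene names and expression values are placeholders, the function names are hypothetical, and the AUC here assumes that higher expression indicates scar tissue, which the study does not specify.

```python
def auc_from_scores(scar_values, normal_values):
    """AUC as the fraction of (scar, normal) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for s in scar_values:
        for n in normal_values:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(scar_values) * len(normal_values))


def select_marker_genes(differential_genes, module_genes, expression, threshold=0.5):
    """Intersect two gene lists, then keep genes whose AUC exceeds the threshold.

    expression maps gene name -> (scar expression values, normal expression values).
    """
    candidates = sorted(set(differential_genes) & set(module_genes))
    return [g for g in candidates if auc_from_scores(*expression[g]) > threshold]


# Placeholder data (hypothetical gene names and values).
differential_genes = ["GENE_A", "GENE_B", "GENE_C"]
module_genes = ["GENE_B", "GENE_C", "GENE_D"]
expression = {
    "GENE_B": ([5.1, 6.0, 5.8], [4.0, 4.2, 5.5]),
    "GENE_C": ([3.0, 3.1], [3.5, 3.6]),
}
selected = select_marker_genes(differential_genes, module_genes, expression)
print(selected)
```

An AUC of 0.5 corresponds to chance-level separation, so a threshold of 0.5 is a permissive filter; a gene that is consistently lower in scar tissue would score below 0.5 under this direction convention even though it separates the groups well.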

Fig. 9 (a) Volcano plot of intersecting genes in module 4 and differentially expressed genes. (b) ROC curves for potential marker genes

In summary, these potential marker genes play roles in diverse cellular processes and pathways encompassing the cell cycle, intracellular membrane transport, immune regulation, and cell signaling. These processes are intricately linked to collagen fiber generation, repair, and regulation. This highlights the substantial involvement of these potential marker genes in the biological progression of scarred skin and provides invaluable insights into the molecular mechanisms underlying collagen fiber-related disorders.

5 Conclusion

In this study, we established a versatile discriminative platform for identifying scar samples that integrates a residual network-based CNN model, a logistic regression model with L1 regularization, and a multimodal feature fusion technique with a weighted average aggregation network, supporting both unimodal and multimodal data inputs. In addition, the characterization of collagen fiber features extracted from the channel 32 image of the conv1 layer in the proposed CNN model revealed significant changes in the density and arrangement of collagen fibers in scarred skin. This change suggests that the microstructural properties of collagen fibers are altered depending on the disease state, providing insights into the biological properties of these fibers. DO, GO, and KEGG enrichment analyses played key roles in identifying genes closely associated with collagen fibers. The DO enrichment analysis highlighted the close association of module 4 with diseases involving irregular collagen fibers, such as collagen disease, rheumatism, and systemic scleroderma. These findings corroborate the results of our gene screening and reinforce the biological significance of the identified genes and their relevance to the disease. The GO enrichment results highlighted the contribution of these genes to regulating vascular and tubular structures, maintaining circulatory system function, and other vital biological processes. The KEGG enrichment results highlighted the roles of these genes in collagen fiber synthesis, extracellular matrix regulation, and cellular signaling, underscoring their involvement in scarring and in regulating collagen fiber changes in scarred and normal skin. Furthermore, the potential biomarker genes related to collagen fibers in scar tissue, derived from module 4 and the differentially expressed genes, showed a diverse array of biological functions. These genes participate in multiple biological processes, such as cell cycle regulation, intracellular membrane transport, immune regulation, and cell signaling, which are closely connected to the formation, repair, and regulation of collagen fibers. This not only offers cues for further study of the molecular mechanisms of diseases related to scarred skin but also provides promising molecular targets for future therapeutic strategies.

In conclusion, this study establishes a versatile platform for scar tissue discrimination and makes an important contribution to unraveling the molecular basis of collagen fiber-related diseases. We strongly believe that these findings will provide new approaches for the treatment, diagnosis, and prevention of skin scarring and valuable references for broader biomedical research efforts.

Here we also present the limitations of our current work. First, the proposed multimodal discriminant model has currently only been validated on customized multimodal datasets, but has not yet fully considered the matching of multimodal data. To optimize this, future research should focus on the matching of image data and gene expression data. It is crucial to construct more complete and reliable datasets to ensure the reliability and validity of the platform in clinical applications. Second, in addition to further research in the field of computer-aided diagnosis, future work will focus on the bio-experimental validation of the pathogenic mechanisms of scarring. The 11 potential targets of scar pathogenesis identified in this study will be important for future research. Relevant biological experiments will help to validate the exact roles and mechanisms of these targets in the process of scar formation. To this end, cellular and animal models of scar tissue will be established to simulate the biological process of scar formation to provide a reliable experimental platform for validation. In-depth study of the functions of these biomarker genes will explore their roles and regulatory mechanisms in the process of scar formation, providing new theoretical and practical support for scar treatment. In addition, combining the results of biological experimental validation with clinical practice will advance the clinical translation of research results and provide more effective treatment and management programs for scar patients.