Abstract
In Lie group convolutional neural networks (LG-CNNs), the calculation and storage of Lie group distances have quadratic space complexity. To improve the memory efficiency of LG-CNNs, a novel Lie group convolutional neural network called LieCConv is proposed. LieCConv combines an innovative sampling algorithm with a linear-space-complexity scheme for calculating and storing Lie group distances, substantially enhancing network memory efficiency. First, LieCConv employs a novel sampling algorithm called array-neighborhood sampling (ANS) in the downsampling stage. ANS requires only neighborhood information to obtain a high-quality sample set, giving it a low threshold of use, and the sample set it generates reflects the distribution of the original set. Second, LieCConv adopts a batch calculation and storage scheme for Lie group distances, which reduces the space complexity of calculating and storing Lie group distances from quadratic to linear, lowering memory consumption during training. Finally, a comparison between ANS and farthest point sampling demonstrates that ANS better captures the distribution characteristics of the original set; a comparison of the memory usage of LieCConv and LieConv shows that LieCConv reduces the memory used for calculating and storing Lie group distances to less than 500 MB; and an evaluation on RotMNIST, RotFashionMNIST and TT100K validates that LieCConv is universal and effective.
1 Introduction
Enabling computers to tackle increasingly complex tasks has long been a central goal of research in computer science and artificial intelligence. Convolutional neural networks (CNNs) are one of the resulting achievements. They exhibit powerful learning capabilities and are applied to various complex tasks, such as natural language processing and image recognition. CNNs automatically extract task-relevant information by learning features and patterns, which enables them to tackle complex problems, and they have consequently been employed in numerous domains. Image classification, which divides images into different categories, is an important research direction for CNNs. In recent years, the development of CNNs has led to significant achievements in image classification: convolutional models autonomously extract and learn features from images, substantially improving classification accuracy. This high accuracy has made classification models applicable to numerous real-world challenges, including medical image classification and content moderation for social media images. The success of CNNs relies heavily on the translation equivariance of convolution, which ensures that translating the input translates the output in the same way, providing an inductive bias for tasks with translation symmetry. However, traditional CNNs do not necessarily possess equivariance with respect to transformations such as rotation and scaling.
With the collective efforts of researchers, CNNs are developing in diverse directions, and research is no longer limited to traditional convolutional approaches: a growing number of variant convolutional schemes and novel architectures have appeared. Group convolution is one such new scheme. Group theory provides a rational explanation for the translation equivariance and symmetry properties of convolution, laying the foundation for convolution on Lie groups. Furthermore, with an appropriately selected Lie group, CNNs based on Lie groups possess equivariance to rotation, scaling, and other transformations. Many scholars have recognized this and conducted extensive research on group convolutions, but most of the work is specific to certain classes of Lie groups and is difficult to transfer conveniently to others. Finzi et al. [9] designed a group convolutional scheme called LieConv that addresses this limitation: it is applicable to non-homogeneous and arbitrary spatial data and significantly reduces the computational cost of achieving equivariance on new Lie groups. However, the computation and storage of distances between elements on Lie groups have a high space complexity of \({O \left( n^2 \right) }\), resulting in significant memory consumption. Designing an efficient and reliable scheme for calculating and storing Lie group distances to reduce this memory consumption is a challenging task.
A Lie group convolutional scheme called LieCConv is proposed. LieCConv employs a novel sampling algorithm together with a linear-space-complexity approach to calculating and storing Lie group distances, significantly improving memory utilization efficiency. Specifically, LieCConv uses a novel sampling algorithm called array-neighborhood sampling (ANS). ANS requires only the neighborhood information of elements to acquire a sample set that covers all the information of the original set, giving it a lower usage threshold than farthest point sampling (FPS). Additionally, the sample set obtained by ANS preserves the distribution of the original set, a property that distinguishes it from uniform sampling. LieCConv also adopts a batch calculation and storage scheme (BCSS), which reduces the space complexity of calculating and storing Lie group distances from quadratic to linear, lowering the memory cost of model training and enabling larger batch sizes and image sizes. The main contributions are summarized as follows:
(1) To address the high usage threshold of existing point-data sampling algorithms, a novel sampling algorithm, array-neighborhood sampling, is proposed. ANS requires only the neighborhood information of elements to obtain a sample set, lowering the sampling requirements. In addition, the sample set generated by ANS conforms to the distribution of the original set.
(2) To improve memory utilization efficiency, a batch calculation and storage scheme is designed. The scheme calculates Lie group distances in batches and stores only the effective Lie group distances, reducing the space complexity of Lie group distances from quadratic to linear and significantly improving memory utilization efficiency.
(3) ANS is contrasted with FPS, the memory usage of LieCConv and LieConv is compared, and the performance of LieCConv is evaluated on RotMNIST, RotFashionMNIST and Tsinghua-Tencent 100K [35] (TT100K).
2 Related Works
CNNs are a popular and effective machine learning method, widely used in many fields such as object recognition [13, 23, 32], image classification and segmentation [1, 12, 27, 34], natural language processing [3, 24], audio [28, 30] and video [11, 36]. As real-world problems become increasingly complex, CNNs are advancing rapidly. Liu et al. [21] explored how ResNet performs when redesigned following Vision Transformer networks and discovered that purely convolutional models can achieve results competitive with Transformers. To address the lack of scalability in self-attention mechanisms, Tu et al. [26] proposed an efficient and scalable attention model consisting of blocked local attention and dilated global attention. Ding et al. [6] examined the influence of large convolutional kernels on CNNs and showed that large-kernel networks have larger effective receptive fields and higher shape bias, achieving better performance.
Image classification is an important research direction in CNNs; it has achieved solid results and has been applied to practical problems. Janani et al. [15] detected and categorized the nitrogen nutrient level of groundnut leaves using a convolutional neural network (CNN). Mahjoubi et al. [22] used a hierarchical deep convolutional neural network (DCNN) to classify exfoliated graphene flakes into six categories. Liao et al. [18] combined the feature extraction capability of CNNs with the biological interpretability of spiking neural networks (SNNs) to propose a spiking neural network for motor imagery signal classification. Daoud et al. [4] designed a CNN for classifying and detecting fire in video surveillance scenes. Hassanzadeh et al. [14] proposed a block-based evolutionary DCNN for image classification that generates variable-length networks with high accuracy while using less computation. CNNs play a critical role in solving practical image classification problems. However, traditional CNNs are usually only translation equivariant and are not necessarily equivariant to transformations such as rotation and scaling.
Convolution on Lie groups is a novel convolutional architecture that merits in-depth research. By selecting an appropriate Lie group, the constructed convolutional network is not only translation equivariant but also equivariant with respect to rotation, scaling, and other transformations. The mathematical theory of Lie groups and Lie algebras has undergone significant development over the years and now possesses a well-established framework [7, 20, 33]. In practical applications, researchers in robotics have long recognized this theory and successfully applied it to robot control [17, 19, 25]. In computer vision, the theory of Lie groups and Lie algebras has also been applied to machine learning. Esteves et al. [8] developed an exact convolutional scheme on the sphere in the spherical harmonic domain and used it to construct a CNN addressing 3D rotation equivariance. Bekkers [2] proposed a modular framework for building Lie group convolutional neural networks (LG-CNNs) for arbitrary Lie groups, breaking through the limitation to discrete groups or continuous compact groups. Finzi et al. [9] designed a general convolutional layer capable of accommodating any specified Lie group transformations; by means of exponential and logarithmic mappings, they effectively incorporated equivariance to a new group, enabling simple and convenient rapid prototyping. Exploiting the equivariance of SE(3) under continuous 3D roto-translations, Fuchs et al. [10] introduced a variant of the 3D point cloud self-attention module that guarantees SE(3) equivariance. To address the discretization of continuous groups in Lie group convolutions, Dehmamy et al. [5] described a Lie algebra convolutional network that automatically discovers symmetry. Yang et al. [31] applied Lie algebra theory to face recognition, showed that the variation of a face in image space is determined by rotation, and designed a Lie algebra residual network for pose-robust face recognition. Knigge et al. [16] designed a separable convolutional kernel and improved the performance and computational efficiency of convolution through weight sharing. However, the memory utilization efficiency of LG-CNNs still has significant room for improvement; this paper proposes a Lie group convolutional scheme called LieCConv with high memory utilization efficiency.
3 LieCConv
LieConv is a convolutional scheme that performs equivariant operations on Lie groups and successfully extends the application of Lie groups to continuous data spaces. The scheme is universal and convenient and has shown good performance on RotMNIST and QM9. However, the Lie group distances between elements have a space complexity of \({O \left( n^2 \right) }\), so calculating and storing them consumes a large amount of memory. In practice, this high space complexity often forces the batch size to be limited to avoid memory overflow. In response to this problem, a novel scheme for calculating and storing the Lie group distances is designed, which reduces the space complexity from \({O \left( n^2 \right) }\) to \({O \left( s \cdot n \right) }\), where \( s < n \) is the neighborhood size. In addition, a novel sampling algorithm called array-neighborhood sampling is proposed, which is better suited to this calculation and storage scheme; compared with FPS, ANS has lower application requirements. Based on these two methods, a novel convolutional scheme called LieCConv is proposed.
Section 3.1 introduces the convolution on Lie groups. Section 3.2 introduces BCSS for Lie group distances. Section 3.3 introduces ANS, which has a wider application prospect than FPS.
3.1 Convolution on Lie Groups
LG-CNNs learn task-relevant information by inducing feature representations of the data on Lie groups. Therefore, the input data must first be transformed from a Euclidean space X to a Lie group G, a step defined as lifting, \( Lift\left( x\right) = g \) for \( x \in X, g \in G \). While completing this spatial transformation, lifting should preserve the characteristics of the elements. Exponential and logarithmic mappings are commonly used to transform elements between Euclidean spaces and Lie groups. For the rotation and translation group, the exponential map from Euclidean space to the Lie group is the matrix exponential

$$\exp \left( A\right) = \sum _{k=0}^{\infty } \frac{A^{k}}{k!},$$
where \( A \) is an element of the Lie algebra \( \mathfrak {g} \) associated with the group \( G \).
The logarithmic map from the Lie group to the Euclidean space is the matrix logarithm

$$\log \left( g\right) = \sum _{k=1}^{\infty } \frac{\left( -1\right) ^{k+1}}{k} \left( g - I\right) ^{k}.$$
Neighborhoods of elements on Lie groups are not intuitively available. After lifting elements to a Lie group, the Lie group distances are calculated first, and the neighborhoods of the elements are then generated from these distances. The distance between elements u and v on the Lie group is calculated using the Lie group distance formula

$$d\left( u, v\right) = \left\| \log \left( u^{-1} v\right) \right\| _{F}.$$
By substituting the exponential and logarithmic maps into the distance formula, the distance between elements on the Lie group is calculated. The k elements closest to an element x are selected to form the neighborhood of x, and convolution is then performed on the neighborhood to learn data features. The continuous and discrete definitions of convolution on Lie groups are as follows [9]:
Continuous definition:

$$h\left( u\right) = \int _{G} k_{\theta }\left( \log \left( v^{-1} u\right) \right) f\left( v\right) \, d\mu \left( v\right) $$

Discrete definition:

$$h\left( u_{i}\right) = \frac{V}{n} \sum _{j \in nbhd\left( i\right) } k_{\theta }\left( \log \left( v_{j}^{-1} u_{i}\right) \right) f\left( v_{j}\right) $$
where \(k_\theta \) is the convolutional filter, f is the input feature map, V is the volume of the space, and n is the number of element points. The filter \(k_\theta \) has independent parameters for each position j in the neighborhood of x.
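To make these operations concrete, the following minimal sketch (an illustration under our own naming, not the LieConv implementation) lifts rotation angles to SO(2) via the matrix exponential, evaluates the Lie group distance with the matrix logarithm, and builds k-nearest neighborhoods:

```python
import numpy as np
from scipy.linalg import expm, logm

def lift_so2(theta):
    """Lift a rotation angle to an SO(2) element via the matrix exponential."""
    A = np.array([[0.0, -theta],
                  [theta, 0.0]])       # element of the Lie algebra so(2)
    return expm(A)

def lie_distance(u, v):
    """Lie group distance d(u, v) = ||log(u^{-1} v)||_F."""
    return np.linalg.norm(logm(np.linalg.inv(u) @ v), ord="fro")

# Lift a few elements, compute pairwise Lie group distances, and select
# the k closest elements as each element's neighborhood.
angles = [0.0, 0.3, 0.8, 1.4, 2.1, 2.9]
elems = [lift_so2(t) for t in angles]
k = 3
D = np.array([[lie_distance(u, v) for v in elems] for u in elems])
nbhd = np.argsort(D, axis=1)[:, :k]    # neighborhood indices per element
```

A full implementation would additionally evaluate \(k_\theta \) on the logarithms of the neighborhood offsets and sum them, as in the discrete definition above.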
3.2 Batch Calculation and Storage Scheme
By using an appropriate Lie group, the constructed LG-CNN simultaneously satisfies equivariance to translation and other transformations, enhancing the performance and robustness of the model. However, before performing convolution on Lie groups, the distances between elements on the Lie group must first be calculated. The calculation and storage of Lie group distances have a space complexity of \({O \left( n^2 \right) }\), which requires a large amount of storage resources.
A batch calculation and storage scheme for the Lie group distances is proposed. The Lie group distances are usually calculated and stored all at once, which inevitably produces data with a space complexity of \({O \left( n^2 \right) }\). This paper notes that convolution only involves the elements surrounding an element, that is, the neighborhood of the element; elements outside the neighborhood need not be preserved. Based on this observation, only the Lie group distances from each element to its neighborhood elements are stored, reducing the space complexity from \({ O\left( n^2\right) }\) to \({ O\left( n \cdot s\right) }\), where s is the neighborhood size. Generally, the neighborhood size is much smaller than the number of elements, with \( s \ll n \), so \( n \cdot s \ll n^2 \).
In order to avoid the high space complexity caused by calculating Lie group distances at once, a batch calculation scheme is adopted. The scheme divides Lie group distances into t batches for calculation. After calculating each batch of Lie group distances, the distances from each element to elements outside its neighborhood are discarded.
Formally, for the calculation and storage of Lie group distances, the matrix D for storing Lie group distances is obtained by multiplying the inverse position matrix V and the position matrix U of the elements in the Lie group. The process of discarding the Lie group distances from each element to elements outside its neighborhood is defined as Cut.
According to the rule of block matrix multiplication, the multiplication of V and U is performed in two steps. Firstly, divide U into t blocks \( U_i\left( 1 \le i \le t\right) \). Then perform the multiplication of V with each block \( U_i \). The formula is

$$VU = V \begin{bmatrix} U_{1}&U_{2}&\cdots&U_{t} \end{bmatrix} = \begin{bmatrix} VU_{1}&VU_{2}&\cdots&VU_{t} \end{bmatrix}.$$
Apply Cut to the block multiplication of V and U, performing the Cut on each block result before saving it. The formula is

$$Z = \begin{bmatrix} Cut\left( VU_{1}\right)&Cut\left( VU_{2}\right)&\cdots&Cut\left( VU_{t}\right) \end{bmatrix}.$$
Finally, the space complexity of Z is \({ O \left( n \cdot s \right) }\). During the process, the peak memory usage is determined by \( VU_i \) and Z, whose space complexities are \({O \left( n^2 / t \right) }\) and \({O \left( n \cdot s \right) }\), respectively. Therefore, the space complexity of the peak memory usage is \({O \left( \max \left( n / t, s\right) \cdot n \right) }\).
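The following NumPy sketch mirrors this scheme; it treats the entries of \( VU \) as the Lie group distances, as in the formulation above, and all names are illustrative rather than taken from the LieCConv code:

```python
import numpy as np

def bcss_distances(V, U, s, t):
    """Compute Cut(V @ U) in t column blocks.

    V: (n, d) inverse position matrix; U: (d, n) position matrix, so that
    D = V @ U is the full n x n distance matrix. Each block V @ U_i is
    reduced immediately (Cut), keeping only the s smallest distances per
    row, so peak memory is O(max(n / t, s) * n) instead of O(n^2).
    """
    n = U.shape[1]
    best_d = np.full((n, s), np.inf)      # kept distances per element
    best_i = np.zeros((n, s), dtype=int)  # column indices of kept distances
    col = 0
    for Ui in np.array_split(U, t, axis=1):
        Di = V @ Ui                                   # one block of D
        cols = np.arange(col, col + Di.shape[1])
        col += Di.shape[1]
        # Cut: merge the block into the running s smallest distances per row.
        merged_d = np.hstack([best_d, Di])
        merged_i = np.hstack([best_i, np.broadcast_to(cols, Di.shape)])
        order = np.argpartition(merged_d, s - 1, axis=1)[:, :s]
        best_d = np.take_along_axis(merged_d, order, axis=1)
        best_i = np.take_along_axis(merged_i, order, axis=1)
    return best_d, best_i                 # Z: O(n * s) storage
```

Running the loop block by block keeps only an \( n \times (s + n/t) \) buffer alive at any time, which matches the peak memory analysis above.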
Through the above method, the memory cost during training is successfully reduced without reducing the convolutional effectiveness. Figure 1 is an abstract schematic diagram of the storage schemes in LieConv and LieCConv. In the figure, the red dots represent neighborhood elements, and the directed arrows represent Lie group distances that need to be stored. It is observed that the number of arrows in the right image is significantly less than the number of arrows in the left image. It is easily inferred that the larger the difference between the number of elements and the size of the neighborhood, the more memory storage space is saved.
3.3 Array-Neighborhood Sampling
FPS is a uniform sampling algorithm that achieves uniform sampling by repeatedly selecting the element farthest from the current sample set. The formula is

$$u_{k+1} = \mathop {\arg \max }\limits _{x \in X \setminus S_{k}} \min _{y \in S_{k}} d\left( x, y\right) ,$$

where \( S_k \) denotes the sample set after k selections and \( d\left( x, y\right) \) is the distance between elements.
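For reference, a minimal FPS sketch is given below; note that it assumes the full \( n \times n \) pairwise distance matrix D is available up front (function and variable names are illustrative):

```python
import numpy as np

def farthest_point_sampling(D, m, start=0):
    """Select m samples by repeatedly taking the element farthest from the
    current sample set, measured by its minimum distance to the set."""
    chosen = [start]
    min_d = D[start].copy()          # distance from each element to the set
    for _ in range(m - 1):
        nxt = int(np.argmax(min_d))  # arg max of the min-distances
        chosen.append(nxt)
        min_d = np.minimum(min_d, D[nxt])
    return np.array(chosen)
```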
However, FPS requires the distances from each element to all other elements, so it is not suitable for sampling sets with only neighborhood information. A novel sampling algorithm suited to this case is proposed, called array-neighborhood sampling. ANS obtains a sample set that reflects the original set well using only the neighborhood information of elements. In addition, whereas the sample set generated by FPS is uniformly distributed, the sample set acquired by ANS retains the distribution of the original set.
Fig. 2 (caption): \( \left( i\right) \) denotes the current element, \( i \in [1,9] \). Triangles denote elements inside the neighborhood; circles denote elements outside the neighborhood. No. is the element number; Times is the number of times the corresponding element has been selected. For the sake of clarity, random sampling within the neighborhood is not shown.
The design idea of ANS is to select sample elements according to the importance of the feature information they contain. Firstly, elements are sorted by this importance. Convolution acts on an element and its neighborhood, so this paper considers an element to contain the feature information of itself and of the elements in its neighborhood. The more neighborhoods an element belongs to, the denser the feature information around it and the more important the feature information it contains. Therefore, the importance of an element is measured by the number of neighborhoods that contain it. For each element, its neighborhood is randomly sampled according to the sampling ratio, and the total number of times each element is sampled is recorded; an example is given in Fig. 2. Sorting the elements by their sampled counts yields an ordered queue \( sort\_idx \), in which elements closer to the head have denser surrounding feature information and contain more important features. The rationale behind random sampling within the neighborhood is to increase the randomness of the sampling process, so that ANS produces different results on each run.
Subsequently, samples are selected based on \( sort\_idx \). Sampling takes a portion of the original set to represent it, so the sample set should contain as much information as possible. Therefore, when sampling from \( sort\_idx \), elements are selected whose feature information does not duplicate that already contained in the sample set. Sampling proceeds sequentially from the head of \( sort\_idx \); after an element u is added to the sample set, the neighborhood elements of u in \( sort\_idx \) are set to a state in which they cannot be sampled. This operation minimizes duplicated feature information in the sample set and ensures that elements at sparse locations are not missed when the number of samples is sufficient to traverse the queue once. After the first traversal, the sample set contains almost all the feature information of the original set. Before the next traversal, all elements not yet in the sample set are reset to the sampleable state, and the operation of the first traversal is repeated, selecting elements that are not in the sample set from the head of \( sort\_idx \). This is repeated until sufficient samples are obtained.
Continuing with the example in Fig. 2, the process of selecting five samples from nine elements is shown in Fig. 3. The elements that can be sampled are selected as samples in the order of \( sort\_idx \), such as (1), (2), (3), and (4). When none of the elements can be sampled, the elements not present in the sample set are designated as able to be sampled, as shown in (5). Continue sampling until five samples are obtained, as shown in (6).
From the second traversal onwards, most of the feature information contained in the newly sampled elements \( y_i^\prime \) has already been included in the sample set \( y_i \) obtained from the first traversal, so \( y_i^\prime \) complements and reinforces \( y_i \). In addition, since sampling proceeds from the head of \( sort\_idx \), the elements at the front of the queue are always sampled first. As a result, the sample set contains more elements from dense regions than from sparse regions, reflecting the distribution of the original set and focusing attention on the more important, denser feature information. ANS is described in Algorithm 1. To ensure that the sample set always contains all the feature information of the original set, this paper recommends that the sample set size \( sample\_num \), neighborhood size \( nb\_size \), and original set size n satisfy \( sample\_num \cdot nb\_size \ge n \); otherwise, the sampling results lose the feature information contained at the tail of \( sort\_idx \).
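A compact sketch of ANS following the description above (the actual Algorithm 1 may differ in detail; names and tie-breaking behavior here are illustrative):

```python
import numpy as np

def array_neighborhood_sampling(nbhd, sample_num, ratio=0.5, rng=None):
    """ANS: sample using only neighborhood information.

    nbhd: (n, nb_size) array; row i holds the indices of the neighbors of
    element i. No pairwise distance matrix is required.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, nb_size = nbhd.shape

    # Step 1: randomly sample within each neighborhood at the given ratio and
    # count how often each element is drawn; frequently drawn elements lie in
    # dense feature regions and carry more important feature information.
    times = np.zeros(n, dtype=int)
    k = max(1, int(round(ratio * nb_size)))
    for i in range(n):
        np.add.at(times, rng.choice(nbhd[i], size=k, replace=False), 1)
    sort_idx = np.argsort(-times)         # head of the queue = densest regions

    # Step 2: traverse sort_idx from the head, selecting sampleable elements
    # and suppressing their neighbors to avoid duplicated feature information;
    # reset the suppression whenever the whole queue has been exhausted.
    in_sample = np.zeros(n, dtype=bool)
    sampleable = np.ones(n, dtype=bool)
    samples = []
    while len(samples) < sample_num:
        progressed = False
        for u in sort_idx:
            if len(samples) == sample_num:
                break
            if sampleable[u] and not in_sample[u]:
                samples.append(u)
                in_sample[u] = True
                sampleable[nbhd[u]] = False
                progressed = True
        sampleable = ~in_sample           # next traversal: reset states
        if not progressed and not sampleable.any():
            break                         # the original set is exhausted
    return np.array(samples)
```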
4 Experiments
In this section, the memory usage of LieCConv and LieConv is compared and ANS is contrasted with FPS; the results indicate that BCSS and ANS are advantageous. The performance of LieCConv is then evaluated on RotMNIST, RotFashionMNIST and TT100K.
4.1 Algorithm Comparison
4.1.1 Memory Usage
The calculation and storage scheme for Lie group distances in LieConv inevitably generates data with a space complexity of \({O \left( n^2 \right) }\), resulting in inefficient memory usage during model learning. To improve memory usage efficiency, a batch calculation and storage scheme is designed. For the calculation, a batch scheme is adopted that trades the quadratic space complexity for a negligible amount of extra computation time. For the storage, only the Lie group distances from each element to its neighborhood elements are stored, decreasing the space complexity of storing Lie group distances from \({O \left( n^2 \right) }\) to \({O \left( n \cdot s \right) }\). A set of comparative experiments was then conducted.
The runtime memory usage of LieCConv and LieConv was compared on a computer with an Intel(R) Core(TM) i7-10700 CPU and 16 GB of RAM. Three sets of data were fed into a 2-layer convolutional model. Each set had a batch size of 50 and 32 channels, with image sizes of \( 30 \times 30 \), \( 40 \times 40 \), and \( 50 \times 50 \), respectively. The model had 32 input channels and 64 output channels. In LieCConv, the neighborhood size was 100, and the distance matrix was divided into 7 blocks when calculating Lie group distances. The experimental results are shown in Fig. 4. As the figure shows, the training process divides into two stages: lifting, where memory is mainly consumed by the calculation and storage of Lie group distances, and convolution, where memory is mainly consumed by the model.
Combining Fig. 4a and b, it is evident that memory usage rose rapidly as the input data size grew; notably, LieConv exhibited a substantial surge in memory consumption during the lifting phase. Fig. 4c shows that the memory required by LieConv during lifting exceeded the physical memory capacity, severely impacting performance, and larger input sizes even led to memory overflow errors in LieConv. LieCConv effectively reduces the memory used for calculating and storing Lie group distances by leveraging ANS and BCSS, which renders the memory usage during lifting negligible and decreases the memory usage during convolution. LieCConv therefore consumes less memory during training and can accept larger input image sizes.
In summary, LieCConv alleviates the limitations caused by the computation and storage of Lie group distances, significantly improving memory usage efficiency.
4.1.2 Sampling Algorithm
FPS provides a uniformly distributed sample set, but it has a demanding usage requirement: the distances from each element to all other elements. To address this, array-neighborhood sampling is proposed. ANS requires only the neighborhood information of each element, giving it a lower usage threshold and wider application scenarios. The sample set obtained by ANS contains all the information of the original set and retains its distribution.
ANS controls the amount of feature information contained in each element by setting the neighborhood size. Generally, the smaller the neighborhood, the less feature information each element contains; the larger the neighborhood, the more feature information each element contains, but the more ambiguous that information becomes. Therefore, this paper does not recommend extreme neighborhood sizes, such as 1 or n, which lead to poor sampling. The sampling results of ANS and FPS are compared in Fig. 5, where ds at the left denotes the downsampling ratio, n at the top denotes the original set size, and mc at the upper left corner of each subplot denotes the neighborhood size. The experiments used 42 as a fixed random seed.
To reflect the differences between ANS and FPS intuitively, both were executed on two sets of 100 randomly generated elements; the sampling results are shown in Fig. 6. The downsampling ratio was 0.5 and the neighborhood size for ANS was 16. The figure shows that both sample sets reflect the original set well: the distribution of the sample set obtained by ANS is similar to that of the original set, while the sample set obtained by FPS is uniformly distributed.
To further verify the universality and robustness of ANS, ANS was conducted on three sets of random 3D data, as shown in Fig. 7. To better display the sampling results, the transparency of the points in the figure was set based on the sum of the three coordinate values. The larger the sum of coordinate values, the more opaque the point is; the smaller the sum of coordinate values, the more transparent the point is. From the figure, it is seen that ANS still performs well on 3D data.
4.2 RotMNIST
RotMNIST is a dataset obtained by randomly rotating each image in MNIST, where the random rotations are sampled uniformly from SO(2). To demonstrate the advantages of LieCConv, it was compared with LieConv and the baselines used in [9] on RotMNIST without changing network settings or parameters. The results are shown in Table 1: LieCConv matches or exceeds the performance of LieConv and is competitive with the other networks.
4.3 RotFashionMNIST
LieCConv was evaluated on RotFashionMNIST, a dataset modeled after RotMNIST and obtained by randomly rotating each image in FashionMNIST [29], with the random rotations sampled uniformly from SO(2). A series of experiments was conducted to examine the impact of parameters, model depth and feature dimension on the results. The cross-entropy loss was used to optimize the model:

$$L = -\frac{1}{N} \sum _{i=1}^{N} \sum _{c=1}^{C} y_{i,c} \log \hat{y}_{i,c},$$

where N is the number of samples, C is the number of classes, \( y_{i,c} \) is the one-hot ground-truth label and \( \hat{y}_{i,c} \) is the predicted probability of sample i belonging to class c.
The performance of LieCConv with different neighborhood sizes was compared. The experiments used a LieCConv model with a 6-layer network structure and the group SE(2), a learning rate of 0.001, a neighborhood size of 100 retained in lifting, and Lie group distances calculated in 10 batches; the model eventually produces a feature of dimension 128. The input images had 1 channel, size \( 28 \times 28 \) and batch size 50. In the experiments, the neighborhood size mc during convolution was set to 9, 16 and 25. The error rate of the model is calculated as

$$Error\ rate = \frac{N_{wrong}}{N_{total}} \times 100\%,$$

where \( N_{wrong} \) is the number of misclassified samples and \( N_{total} \) is the total number of samples.
The model network structures are shown in Table 2, where size denotes the output size of each layer and dimension its output dimension. The error rate comparison is shown in Fig. 8a. From the figure, it is inferred that the larger the neighborhood size used in the model, the better its performance. However, increasing the neighborhood size leads to larger memory usage and more computation, raising the hardware requirements.
Based on the above parameters, the neighborhood size was set to 25, and models with different network structures were trained. Fig. 8b compares a 6-layer network with a feature dimension of 128, a 10-layer network with a feature dimension of 512, and a 12-layer network with a feature dimension of 1024. As expected, performance improved as the number of layers and the feature dimension increased. In addition, the models with feature dimensions of 512 and 1024 achieved similar final results, so the model reaches its optimal performance at a feature dimension of 512. The confusion matrix of the 10-layer network model is shown in Table 3.
4.4 TT100K
TT100K is a publicly available traffic sign dataset compiled jointly by Tsinghua University and Tencent, consisting of 100,000 road images and 30,000 traffic signs. The images were captured by high-resolution wide-angle digital single-lens reflex cameras, cover traffic signs under various lighting and weather conditions, and span a relatively comprehensive set of categories that reflect real road conditions well. The experiments used the 45 categories with more than 100 signs for training and testing.
Accuracy, precision, recall and F1-score are used to measure the performance of LieCConv. Accuracy is the percentage of all samples that are predicted correctly. Precision is the proportion of samples predicted as a category that actually belong to it. Recall is the proportion of samples of a category that are predicted correctly. F1-score is the harmonic mean of precision and recall. The four indicators are computed as

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN},$$

$$Precision = \frac{TP}{TP + FP},$$

$$Recall = \frac{TP}{TP + FN},$$

$$F1\text{-}Score = \frac{2 \times Precision \times Recall}{Precision + Recall},$$
where TP is the number of samples for which both the prediction and the ground truth are positive, FP is the number of samples that are actually negative but predicted to be positive, FN is the number of samples that are actually positive but predicted to be negative, and TN is the number of samples for which both the prediction and the ground truth are negative.
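As an illustrative sketch, the four indicators can be computed per class from a confusion matrix; since a multi-class averaging scheme is not specified above, macro-averaging is assumed here:

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                 # rows: ground truth, columns: prediction
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as the class, actually not
    fn = cm.sum(axis=1) - tp          # of the class, predicted otherwise
    precision = tp / np.maximum(tp + fp, 1.0)
    recall = tp / np.maximum(tp + fn, 1.0)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return tp.sum() / cm.sum(), precision.mean(), recall.mean(), f1.mean()
```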
Table 4 shows the classification performance of LieConv and LieCConv on TT100K. The accuracy of LieCConv is 97.8%, close to that of LieConv, indicating that LieCConv has good performance and robustness in the traffic sign classification task.
5 Conclusion
Simplifying and improving the convolution process through more efficient methods is a fundamental objective in machine learning. To improve the memory utilization efficiency of Lie group convolution, a memory-efficient Lie group convolutional neural network called LieCConv is proposed. Firstly, a novel sampling algorithm called array-neighborhood sampling is proposed; ANS requires only the neighborhood information of elements to obtain a sample set that reflects the original set well and shares its distribution, providing point-data sampling with a new characteristic and a low usage threshold. In addition, a batch calculation and storage scheme is designed, which reduces the memory cost of programs involving large matrix multiplications; in this paper, the scheme reduces the space complexity of calculating and storing Lie group distances from quadratic to linear. Finally, a series of experiments evaluated the performance of LieCConv. In future research, we hope ANS and BCSS can be applied to the processing of point clouds, graphs and other data, improving the computational and storage efficiency of data processing.
Data Availability
The RotMNIST dataset, FashionMNIST dataset and TT100K dataset used in the current study are only used for research and can be available at http://www.iro.umontreal.ca/~lisa/icml2007data/mnist_rotation_new.zip, https://github.com/zalandoresearch/fashion-mnist and https://cg.cs.tsinghua.edu.cn/traffic-sign/ respectively.
References
Arora S, Suman HK, Mathur T et al (2023) Fractional derivative based weighted skip connections for satellite image road segmentation. Neural Netw 161:142–153. https://doi.org/10.1016/j.neunet.2023.01.031
Bekkers EJ (2020) B-spline cnns on lie groups. In: 8th international conference on learning representations, ICLR 2020, Addis Ababa. OpenReview.net, https://openreview.net/forum?id=H1gBhkBFDH
Chauhan S, Saxena S, Daniel P (2022) Improved unsupervised neural machine translation with semantically weighted back translation for morphologically rich and low resource languages. Neural Process Lett 54(3):1707–1726. https://doi.org/10.1007/s11063-021-10702-8
Daoud Z, Hamida AB, Amar CB (2023) Fireclassnet: a deep convolutional neural network approach for PJF fire images classification. Neural Comput Appl 35(26):19069–19085. https://doi.org/10.1007/s00521-023-08750-3
Dehmamy N, Walters R, Liu Y et al (2021) Automatic symmetry discovery with lie algebra convolutional network. In: Ranzato M, Beygelzimer A, Dauphin YN, et al (eds) Advances in neural information processing systems 34: annual conference on neural information processing systems 2021, NeurIPS 2021, pp 2503–2515, https://proceedings.neurips.cc/paper/2021/hash/148148d62be67e0916a833931bd32b26-Abstract.html
Ding X, Zhang X, Han J et al (2022) Scaling up your kernels to 31x31: revisiting large kernel design in CNNS. In: IEEE/CVF conference on computer vision and pattern recognition, CVPR 2022, New Orleans. IEEE, pp 11953–11965, https://doi.org/10.1109/CVPR52688.2022.01166
Dorodnitsyn VA, Kaptsov EI, Meleshko SV (2023) Lie group symmetry analysis and invariant difference schemes of the two-dimensional shallow water equations in lagrangian coordinates. Commun Nonlinear Sci Numer Simul 119:107119. https://doi.org/10.1016/j.cnsns.2023.107119
Esteves C, Allen-Blanchette C, Makadia A et al (2020) Learning SO(3) equivariant representations with spherical CNNS. Int J Comput Vis 128(3):588–600. https://doi.org/10.1007/s11263-019-01220-1
Finzi M, Stanton S, Izmailov P et al (2020) Generalizing convolutional neural networks for equivariance to lie groups on arbitrary continuous data. In: Proceedings of the 37th international conference on machine learning, ICML 2020, Proceedings of Machine Learning Research, vol 119. PMLR, pp 3165–3176, http://proceedings.mlr.press/v119/finzi20a.html
Fuchs F, Worrall DE, Fischer V et al (2020) Se(3)-transformers: 3d roto-translation equivariant attention networks. In: Larochelle H, Ranzato M, Hadsell R, et al (eds) Advances in neural information processing systems 33: annual conference on neural information processing systems 2020, NeurIPS 2020, vol 33. Curran Associates, Inc., pp 1970–1981, https://proceedings.neurips.cc/paper/2020/hash/15231a7ce4ba789d13b722cc5c955834-Abstract.html
Gao G, Liu Z, Zhang G et al (2023) Danet: semi-supervised differentiated auxiliaries guided network for video action recognition. Neural Netw 158:121–131. https://doi.org/10.1016/j.neunet.2022.11.009
Gupta KD, Sharma DK, Ahmed S et al (2023) A novel lightweight deep learning-based histopathological image classification model for IoMT. Neural Process Lett 55(1):205–228. https://doi.org/10.1007/s11063-021-10555-1
Han X, Huang X, Sun S et al (2022) 3ddacnn: 3d dense attention convolutional neural network for point cloud based object recognition. Artif Intell Rev 55(8):6655–6671. https://doi.org/10.1007/s10462-022-10165-w
Hassanzadeh T, Essam D, Sarker RA (2022) Evodcnn: an evolutionary deep convolutional neural network for image classification. Neurocomputing 488:271–283. https://doi.org/10.1016/j.neucom.2022.02.003
Janani M, Jebakumar R (2023) Detection and classification of groundnut leaf nutrient level extraction in RGB images. Adv Eng Softw 175:103320. https://doi.org/10.1016/j.advengsoft.2022.103320
Knigge DM, Romero DW, Bekkers EJ (2022) Exploiting redundancy: Separable group convolutional networks on lie groups. In: Chaudhuri K, Jegelka S, Song L, et al (eds) International conference on machine learning, ICML 2022, Baltimore, Proceedings of machine learning research, vol 162. PMLR, pp 11359–11386, https://proceedings.mlr.press/v162/knigge22a.html
Kwon J, Kim S, Park FC (2022) Physically consistent lie group mesh models for robot design and motion co-optimization. IEEE Robot Autom Lett 7(4):9501–9508. https://doi.org/10.1109/LRA.2022.3189806
Liao X, Wu Y, Wang Z et al (2023) A convolutional spiking neural network with adaptive coding for motor imagery classification. Neurocomputing 549:126470. https://doi.org/10.1016/j.neucom.2023.126470
Lippiello V, Cacace J (2022) Robust visual localization of a UAV over a pipe-rack based on the lie group SE(3). IEEE Robot Autom Lett 7(1):295–302. https://doi.org/10.1109/LRA.2021.3125039
Liu F, Gao Y (2022) Lie group analysis for a higher-order boussinesq-burgers system. Appl Math Lett 132:108094. https://doi.org/10.1016/j.aml.2022.108094
Liu Z, Mao H, Wu C et al (2022) A convnet for the 2020s. In: IEEE/CVF conference on computer vision and pattern recognition, CVPR 2022, New Orleans, 2022. IEEE, pp. 11966–11976, https://doi.org/10.1109/CVPR52688.2022.01167
Mahjoubi S, Ye F, Bao Y et al (2023) Identification and classification of exfoliated graphene flakes from microscopy images using a hierarchical deep convolutional neural network. Eng Appl Artif Intell 119:105743. https://doi.org/10.1016/j.engappai.2022.105743
Ren J, Xiong Y, Xie X et al (2023) Learning transferable feature representation with swin transformer for object recognition. Neural Process Lett 55(3):2211–2223. https://doi.org/10.1007/s11063-022-11004-3
Rogers A, Gardner M, Augenstein I (2023) QA dataset explosion: a taxonomy of NLP resources for question answering and reading comprehension. ACM Comput Surv 55(10):197:1-197:45. https://doi.org/10.1145/3560260
Teng S, Chen D, Clark WA et al (2022) An error-state model predictive control on connected matrix lie groups for legged robot control. In: IEEE/RSJ international conference on intelligent robots and systems, IROS 2022, Kyoto, 2022. IEEE, pp. 8850–8857, https://doi.org/10.1109/IROS47612.2022.9981282
Tu Z, Talebi H, Zhang H et al (2022) Maxvit: Multi-axis vision transformer. In: Avidan S, Brostow GJ, Cissé M, et al (eds) Computer Vision - ECCV 2022: 17th European Conference, Tel Aviv, Israel, Proceedings, Part XXIV, Lecture Notes in Computer Science, vol 13684. Springer, pp 459–479, https://doi.org/10.1007/978-3-031-20053-3_27
Vacher J, Launay C, Cagli RC (2022) Flexibly regularized mixture models and application to image segmentation. Neural Netw 149:107–123. https://doi.org/10.1016/j.neunet.2022.02.010
Wu Y, Hu R, Wang X et al (2022) High parameter frequency resolution encoding scheme for spatial audio objects using stacked sparse autoencoder. Neural Process Lett 54(2):817–833. https://doi.org/10.1007/s11063-021-10659-8
Xiao H, Rasul K, Vollgraf R (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. [dataset], arXiv:1708.07747
Yang W, Zhou X, Chen Z et al (2023) Avoid-df: audio-visual joint learning for detecting deepfake. IEEE Trans Inf Forensics Secur 18:2015–2029. https://doi.org/10.1109/TIFS.2023.3262148
Yang X, Jia X, Gong D et al (2021) Larnet: Lie algebra residual network for face recognition. In: Meila M, Zhang T (eds) Proceedings of the 38th international conference on machine learning, ICML 2021, Proceedings of Machine Learning Research, vol 139. PMLR, pp 11738–11750, http://proceedings.mlr.press/v139/yang21d.html
Yaseen MU, Anjum A, Fortino G et al (2022) Cloud based scalable object recognition from video streams using orientation fusion and convolutional neural networks. Pattern Recognit 121:108207. https://doi.org/10.1016/j.patcog.2021.108207
Yu X, Gao Y, Bennamoun M et al (2023) A lie algebra representation for efficient 2d shape classification. Pattern Recognit 134:109078. https://doi.org/10.1016/j.patcog.2022.109078
Zhao P, Li Y, Tang B et al (2023) Feature relocation network for fine-grained image classification. Neural Netw 161:306–317. https://doi.org/10.1016/j.neunet.2023.01.050
Zhu Z, Liang D, Zhang S et al (2016) Traffic-sign detection and classification in the wild. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas. IEEE Computer Society, pp 2110–2118, https://doi.org/10.1109/CVPR.2016.232
Zhuang X, Liu F, Hou J et al (2022) Transformer-based interactive multi-modal attention network for video sentiment detection. Neural Process Lett 54(3):1943–1960. https://doi.org/10.1007/s11063-021-10713-5
Funding
This work was partially supported by National Natural Science Foundation of China(Grant Nos. 61972454, 62072215 and 62072291), National Key Research and Development Program of China (Grant No. 2021ZD0112802), and the Key-Area Research and Development Program of Guangdong Province (Grant Nos. 2020B0101090004 and 2020B0101360001).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by YZ, XL, CT, BQ, AY and FC. The first draft of the manuscript was written by YZ and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
We declare that we have no commercial or associative interest that represents a conflict of interest in connection with the submitted work.
Ethics Approval
Ethics approval was not required for this research. The research in this article does not involve human participants or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, Y., Luo, X., Tao, C. et al. LieCConv: An Image Classification Algorithm Based on Lie Group Convolutional Neural Network. Neural Process Lett 57, 22 (2025). https://doi.org/10.1007/s11063-024-11691-0