1 Introduction

In recent years, interest in artificial intelligence (AI) has grown steadily. The need for smart applications has motivated researchers to develop machine learning-based and then deep learning-based algorithms and programs. Inspired by the human brain’s remarkable ability to learn, they aimed to mimic its learning, thinking, and decision-making abilities in these algorithms so that machines could behave intelligently. As a result, various fields flourished once supported by AI, such as computer vision [1], where computers are trained to identify objects in images and videos. Various deep learning (DL) applications appeared, such as smart healthcare [2], gaming [3], and social media, where AI helps predict user preferences accurately [4]. DL has also been applied to health systems, where a computer can identify the presence of diseases such as breast cancer and COVID-19 from MRI and CT images [5]. Using deep learning for medical purposes is a promising and worthwhile investment because of the benefits it brings to humanity. It can improve the accuracy and efficiency of diagnosis and speed up the detection of diseases. It is widely known that early detection and intervention by medical staff increase the chances of survival, especially in critical cases. Thoroughly training computers on medical knowledge can greatly assist doctors. However, this requires transferring their knowledge and expertise to the computers in an accurate, organized, and systematic way.

The spread of COVID-19 caused disastrous losses, with over 252,780,044 cases of infection and over 5,098,342 deaths reported worldwide at the time of writing, according to statistics provided by the WHO [6], in addition to the drastic negative impact on the economy and the financial losses recorded worldwide. It also changed our daily lives through the obligatory wearing of masks, the need to carry sanitizers everywhere, travel bans and restrictions, and the need for health passports and proof of vaccination. Furthermore, there is so far no accurate prediction of when health safety will prevail or when this virus will completely disappear, especially with new waves recorded from time to time and the appearance of new variants. This situation pushed researchers to look for solutions, either by finding therapeutic drugs and vaccines or by supporting early detection through the use of DL.

The researchers in [7] proposed a framework that combines a mass surveillance system and a multi-model diagnostic system with B5G (beyond 5G) and deep learning techniques, as well as edge and cloud computing, to efficiently detect and diagnose COVID-19 and similar infectious diseases. The need for a prompt response during the pandemic, for real-time awareness of the spread of infection, and for exchanging knowledge in a secure and privacy-preserving manner led to the use of multiple technologies such as the Internet of Vehicles (IoV) [8], unmanned aerial vehicles (UAVs) or drones, blockchains, deep learning and computer vision, edge computing, cloud computing, and big data. All in all, fast internet and secure, confidential communications were further emphasized as essential needs [9]. Furthermore, the pandemic has once more highlighted the necessity of deploying smart healthcare systems powered by the Internet of Medical Things and computer-aided diagnostics to ensure the safety of healthcare workers, efficient treatment, fast response, and limited spread of the virus [10].

The researchers in [11] used AI techniques to identify SARS-CoV-2 main protease inhibitors among FDA-approved drugs; they detected 13 positive drugs among the 20 potential drugs tested, 6 of which were identified for the first time in their study. The authors of [12] used a pretrained deep learning drug-target interaction model to find, among commercially available drugs, those that could act on SARS-CoV-2 viral proteins and inhibit them. They found multiple potential inhibitor drugs and strongly recommended that they be clinically tested for further confirmation. Similarly, the authors of [13] used molecular deep learning to identify potential SARS-CoV-2 Mpro inhibitors. The authors of [14] used generative deep learning models inspired by generative chemistry to design new drug-like inhibitors of 2019-nCoV. Multiple works, such as [15], aimed to use DL to find and repurpose commercially available FDA-approved drugs to treat COVID-19. Since deep learning relies on accurate datasets, researchers have also focused on collecting and providing related molecular databases such as CoronaDB-AI [16].

After reviewing the efforts made by scientists from multiple perspectives to build an effective healthcare system against COVID-19, we drew Fig. 1 to summarize them. The first thing we noticed is the persistent emphasis on the need for a stable, fast, high-bandwidth communication medium for exchanging medical data, represented by the use of 4G, 5G, B5G, etc. The second major requirement is secure, lightweight communication, access control over data, and patient privacy, which supports the use of edge computing. Thirdly, the massive amount of exchanged data and the necessity of sharing information between medical staff worldwide call for the use of cloud computing, blockchains, and big data. Fourthly, the need for fast and accurate diagnosis and for processing massive data in the search for therapeutic drugs or vaccines pushed for the use of deep learning and advanced AI techniques. Lastly, to prevent the spread of the virus effectively and safely, IoT technology backed by (infrared) cameras, drones, and sensors is recommended. To recap, these efforts are directed toward four areas:

  1. Accurately and efficiently diagnosing COVID-19,

  2. Finding effective COVID-19 drugs and vaccines,

  3. Preventing COVID-19 from spreading through mass surveillance systems,

  4. Ensuring public safety and preventing viruses from crossing borders (health passports).

Fig. 1 COVID-19 Smart Healthcare System

Motivated by the same reasons, we also aim in this paper to use DL to discover inhibitors of key COVID-19 enzymes. Unlike the works mentioned above, our proposed application is novel: it uses DL techniques to identify inhibitors from natural resources such as aliments, plants, and herbs. These are directly edible and can strengthen the immune system during COVID-19 infection and prevent virus replication, which underlines the importance of a healthy diet and of relying on food as a remedy. Therefore, we trained the computer to know which aliments/herbs can be eaten to reduce the negative impact of the COVID-19 virus on the body of an infected person and to reduce its reproduction rate, helping the body defend itself. Our app was trained on works published by scientists on inhibitors of key enzymes of the COVID-19 virus. We first selected these inhibitors and then searched for natural aliments, plants, herbs, and seeds rich in them. Lastly, we fed these data to our DL web app. The paper is organized as follows: Sect. 2 explains the deep learning concept and how it works. Section 3 reviews the COVID-19 treatment mechanisms. Section 4 describes the proposed application, its role, the dataset collection, and the preparation of training. Section 5 presents the testing dataset, phases, and results. Section 6 concludes the paper and highlights our future perspectives.

2 Technical background

Deep learning is a subfield of machine learning that models high-level data abstractions using multiple layers of neurons consisting of complex structures or non-linear transformations [17]. Note that in DL the features in these layers are not designed by humans; rather, they are learned from data through general-purpose learning procedures [18]. In DL, high-level features are extracted from low-level features. DL uses neural networks with multiple hidden layers, where the input is processed through the layers to compute the output. Various DL architectures exist, each suited to a certain type of training data, such as the convolutional neural network (CNN) for images [19], the recurrent neural network (RNN) for sequential data, and its extension for variable-length data, the long short-term memory (LSTM) network [17, 20, 21]. In our app, we used LSTM for training because the length of our data is variable. The LSTM can learn and remember long-term dependencies. It is composed of layers and memory blocks called cells (see Fig. 2) [22]. Data can be added to or removed from a cell through sigmoid gates, where a gate is a layer, or a sequence of matrix operations, containing various weights. The first step in an LSTM network is to discard useless data through the sigmoid (forget) gate. This process is repeated at each inner layer to eliminate unnecessary data and store the important data in the cells until the final output is calculated [23]. In our application, we used an LSTM network trained on our textual data. The data is a set of aliments/plants/herbs/seeds in which inhibitors of key COVID-19 enzymes are present or absent. It was stored as a JSON file fed to our web app, which uses LSTM neural networks for training. The web application is developed using HTML, CSS, and JavaScript. For the deep learning functions and networks, we used brainjs [24], a GPU-accelerated neural network implementation in JavaScript for use in browsers and via Node.js [25].
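As a rough illustration of the gate mechanism described above, the following sketch computes one step of a scalar LSTM cell in plain JavaScript. It follows the standard textbook formulation and is not taken from the brainjs internals; the function name `lstmStep`, the parameter names, and all weight values are hypothetical and chosen only for illustration.

```javascript
// Minimal scalar LSTM cell step (standard textbook formulation).
// Assumption: input, hidden state, and cell state are single numbers,
// so every "matrix" reduces to one weight. Real networks use vectors.
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

function lstmStep(x, hPrev, cPrev, p) {
  // Forget gate: how much of the previous cell state to keep (0..1).
  const f = sigmoid(p.wf * x + p.uf * hPrev + p.bf);
  // Input gate: how much of the new candidate to write (0..1).
  const i = sigmoid(p.wi * x + p.ui * hPrev + p.bi);
  // Candidate value proposed for the cell state (-1..1).
  const g = Math.tanh(p.wg * x + p.ug * hPrev + p.bg);
  // Output gate: how much of the cell state to expose (0..1).
  const o = sigmoid(p.wo * x + p.uo * hPrev + p.bo);

  const c = f * cPrev + i * g; // updated cell (memory) state
  const h = o * Math.tanh(c);  // updated hidden state / output
  return { h, c, gates: { f, i, g, o } };
}

// Hypothetical weights, for illustration only.
const params = {
  wf: 0.5, uf: 0.1, bf: 0.0,
  wi: 0.4, ui: 0.2, bi: 0.0,
  wg: 0.9, ug: 0.3, bg: 0.0,
  wo: 0.6, uo: 0.1, bo: 0.0,
};

// Process a short, variable-length sequence one element at a time.
let state = { h: 0, c: 0 };
for (const x of [0.5, -1.0, 0.25]) {
  state = lstmStep(x, state.h, state.c, params);
}
console.log(state.h, state.c);
```

Because the same step is applied once per element, the loop works for sequences of any length, which is the property that motivates the choice of LSTM for variable-length names.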

Fig. 2 LSTM Cell and its operations

3 COVID-19 treatments review

We screened different databases for the number of works published up to the time of writing (26 January 2022) on COVID-19 vaccines. We found more than 342,000 published papers in Google Scholar, about 19,793 papers in the ScienceDirect database [26], and 2,875 papers in the Nature database [27]. COVID-19 currently has several variants, the most widespread being Omicron, followed by Delta; we recorded 106 published papers in the Nature database on Omicron alone, and this number is higher in other databases. This reflects the serious international interest in curing this disease. For natural SARS-CoV-2 inhibitors obtained from plants, we found more than 50,400 papers on Google Scholar, nearly 9,200 papers on ScienceDirect, and 529 papers in the Nature database. We focused on selecting inhibitors abundant in our daily food intake to make it easier to balance our diet during infection with this virus and to boost our immune system. We collected these data according to the highly potent inhibitors published in previous works with in vitro, in vivo, and in silico studies.

Scientists around the world have inspected common drugs developed for other viruses, such as SARS-CoV-1, MERS-CoV, and HIV, as well as antimalarial drugs, for activity against SARS-CoV-2 [28]. These include Remdesivir (designed for the Ebola virus), Lopinavir/Ritonavir (designed for HIV), which has inhibitory activity on the SARS-CoV-2 main protease (Mpro) [29], chloroquine and hydroxychloroquine (designed for malaria), and Tocilizumab (designed for rheumatoid arthritis); other potential drugs from existing antiviral agents have also been proposed [30, 31]. PAXLOVID™ (PF-07321332), a one-pill treatment against COVID-19, is the most recent drug developed by Pfizer; it is an inhibitor of the SARS-CoV-2 main protease (3CLpro) that binds covalently to the catalytic amino acid Cys145. In November 2021, Pfizer reported positive Phase 2/3 results, with an 89% reduction in hospitalizations if administered within three days of symptom onset [32, 33, 34].

China was the first country to use medicinal plants for COVID-19 treatment, and more than 23,600 papers have been published on the most used medicinal plants, especially in China. A well-known mixture from Chinese folk medicine was used to treat COVID-19 patients [35], as described in the published reviews [36, 37]. According to these reviews, the top ten Chinese herbal medicines used were Maxing Shigan Tang, Lianhua Qingwen granule/capsule, Xuebijing injection, Dayuanyin, Shufeng Jiedu capsule, Qingfei Paidu Tang, Xiaochaihu Tang, Ganlu Xiaodu Dan, Liujunzi Tang and Toujie Quwen granule [36].

Numerous studies identified different proteins of SARS-CoV-2 as key to its development and replication. The 3C-like proteinase (3CLpro) of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was the first drug target because of its role in the replication of the virus [38]. Subsequent studies identified further drug targets among these enzymes, known as non-structural proteins (NSPs), which include NSP1-NSP16. NSP3 (papain-like protease), NSP5 (3-chymotrypsin-like protease “3-CL-protease”), NSP12 (RdRp: RNA-dependent RNA polymerase), NSP13 (helicase), NSP14 (N7-methyltransferase), and NSP16 (2′-O-methyl transferase) are important viral enzymes, while NSP7-NSP10 are regulatory proteins [39]; more than 1600 3D structures have been published in the PDB [40]. A meta-analysis was carried out to determine the different published PDB structures of the non-structural proteins. Table 1 groups the most potent natural compounds identified so far through clinical trials as potential inhibitors of 3CLpro and describes their possible sources. The results of this meta-analysis served as the data for training the DL application.

Table 1 Natural chemical compounds as SARS-CoV-2 3CL-pro potential inhibitors

4 Proposed application

The implementation of our application followed multiple steps, which are illustrated in Figs. 3 and 4. The most tedious and time-consuming task was searching for COVID-19 inhibitors: given the huge number of papers published about COVID-19 (see Sect. 3), filtering them, selecting those about inhibitors, and then looking for their natural sources, such as aliments, plants, seeds, and herbs, demanded considerable effort. After collecting all this data (Fig. 3, Steps 1 and 2), we had to decide how to represent it and feed it to the neural network. The training dataset is represented as a JSON file composed of multiple entries, each with two fields: the first represents the natural source, and the second indicates whether or not this natural source contains inhibitors (Fig. 3, Step 3). The next step was to select the appropriate learning model, which is LSTM (Fig. 3, Step 4).
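The two-field representation described above could look roughly like the following sketch. The field names `input` and `output`, the `yes`/`no` labels, the helper `toTrainingEntry`, and the name normalization are all assumptions made for illustration; the entries are placeholders, not rows from the actual dataset.

```javascript
// Hypothetical raw records: placeholder names and labels only,
// not taken from the paper's dataset.
const rawEntries = [
  { source: '  Example Herb A ', containsInhibitor: true },
  { source: 'Example   Seed B', containsInhibitor: true },
  { source: 'example food c', containsInhibitor: false },
];

// Convert one raw record into a two-field training entry.
// Assumption: names are trimmed, lower-cased, and inner whitespace is
// collapsed so that the same source is always spelled the same way.
function toTrainingEntry(entry) {
  const input = entry.source.trim().toLowerCase().replace(/\s+/g, ' ');
  if (input.length === 0) {
    throw new Error('A training entry needs a non-empty source name.');
  }
  return { input, output: entry.containsInhibitor ? 'yes' : 'no' };
}

const trainingData = rawEntries.map(toTrainingEntry);

// Serialized form, as it could be stored in a JSON file.
console.log(JSON.stringify(trainingData, null, 2));
```

Keeping each entry to one text field and one label keeps the file easy to extend by hand when new sources are added.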

Fig. 3 Application building steps

Fig. 4 Application building process

Steps 5 and 6 in Fig. 3 were to implement the backend, which contains the brainjs LSTM code that trains the network on the training data, composed of 175 aliments/plants containing 13 inhibitors of COVID-19 enzymes. The LSTM parameters are summarized in Table 2. Note that the learning rate defines the training speed, which is slow if the rate is close to 0 and quick if it is close to 1. We checked these parameters against the brainjs documentation [24]. The training was done on a quad-core i7 with an NVIDIA GeForce GT 750M graphics card. After confirming that the backend worked, we built the front-end, using HTML and CSS for the interface and its styling and JavaScript to implement its behavior (Fig. 3, Step 7). The front-end is simple: in the current version, the user (or researcher) provides the name of a natural source, and the application checks whether it contains one of the inhibitors of key COVID-19 enzymes that it was trained to recognize. To run the backend functionality from the browser, we used the browserify tool [76], as in [77] (Fig. 3, Step 8). As illustrated in Fig. 4, we refined the implementation and the training parameters multiple times, after several rounds of testing and result analysis. The refinement aimed to improve the identification accuracy of the app before finally providing it to users. The testing details and parameters and their impact on the results are presented in the next section.
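A backend of this kind might be sketched as follows. This is not the authors' code: it only assumes that the library is the `brain.js` package and uses its `recurrent.LSTM` class with the `train` and `run` methods. The helper names, the placeholder entries, and the option values are hypothetical and do not reproduce Table 2.

```javascript
// Normalize a name the same way at training time and at query time
// (an assumption of this sketch, not a documented step of the app).
function normalizeSourceName(name) {
  return name.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Train an LSTM on { input, output } entries and return a query function.
// `brain` is expected to be the module returned by require('brain.js').
function trainInhibitorNet(brain, entries, options) {
  const net = new brain.recurrent.LSTM();
  net.train(entries, {
    iterations: options.iterations,     // maximum number of training passes
    errorThresh: options.errorThresh,   // stop once the error drops below this
    learningRate: options.learningRate, // step size of each weight update
  });
  return (sourceName) => net.run(normalizeSourceName(sourceName));
}

// Demo, run only when brain.js is actually installed.
let brain = null;
try {
  brain = require('brain.js');
} catch (err) {
  console.log('brain.js is not available; skipping the training demo.');
}

if (brain) {
  // Placeholder entries, not rows from the paper's dataset.
  const demoEntries = [
    { input: 'example herb a', output: 'yes' },
    { input: 'example food c', output: 'no' },
  ];
  // Hypothetical option values chosen to keep the demo short.
  const classify = trainInhibitorNet(brain, demoEntries, {
    iterations: 100,
    errorThresh: 0.05,
    learningRate: 0.01,
  });
  console.log(classify('Example Herb A'));
}
```

Passing the library in as an argument keeps the training function usable both in Node.js and in a browser bundle, where the module is typically made available by a tool such as browserify.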

Table 2 LSTM parameters

5 Testing and analysis

To test our application, we tried various sources and aliments to check whether or not they inhibit the COVID-19 enzymes. The sizes of the training and testing datasets are given in Table 2. We tested the application multiple times, changing the training parameters (number of iterations and errorThresh) to get more refined and accurate results. The accuracy of our application on the testing dataset in each test is recapped in Table 3. We chose to evaluate our app by calculating the accuracy using Eq. 1 and the error rate using Eq. 2. The average accuracy obtained varied from 54% in Test 1 to 74% in Test 4. Correspondingly, the error rate varied from 36% to 26%. Figure 5 depicts the average results obtained in each test. Overall, the refinement process, which required tuning the training parameters, helped improve the accuracy and lower the error rate.

Table 3 Testing results recap

Fig. 5 Overall obtained results in each test

While evaluating the accuracy and error rates, we decided to further investigate the types of errors made by our trained application. We classified the errors into false positives and false negatives. A false positive is the mistake our app made when it classified an element containing COVID-19 inhibitors as not containing them. Its rate is calculated using Eq. 3. A false negative is the mistake our app made when it classified an element that does not contain COVID-19 inhibitors as containing them. Its rate is calculated using Eq. 4.

Table 3 shows that, as we refined the results, we reduced the false-positive rate from 73% in Test 1 to 42% in Test 4. In contrast, the false-negative rate increased from 27% in Test 1 to 58% in Test 4. Note that Test 2 gave the lowest false-negative rate (25%), although its false-positive rate was the highest (75%). Figures 6, 7, 8 and 9 detail the identification accuracy for each inhibitor as well as the error rates (false positive and false negative) in each test. The errors were due to the nature of the dataset, which requires further processing. Currently, some of the aliments in the dataset are given by their scientific names and some by simple names, while others contain acronyms or special characters or are composed of multiple words. This variable length and non-uniform format cause the moderate accuracy and the errors.

Fig. 6 Detailed Results of Test 1

Fig. 7 Detailed Results of Test 2

Fig. 8 Detailed Results of Test 3

Fig. 9 Detailed Results of Test 4

$$ \text{Accuracy} = \frac{\text{Correct identifications}}{\text{Size of test data set}} $$
(1)
$$ \text{Error Rate} = \frac{\text{Incorrect identifications}}{\text{Size of test data set}} $$
(2)
$$ \text{False Positive} = \frac{NRC}{\text{Incorrect identifications}} $$
(3)

where \(NRC\) is the number of rejected correct values.

$$ \text{False Negative} = \frac{NAW}{\text{Incorrect identifications}} $$
(4)

where \(NAW\) is the number of accepted wrong values.
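Equations 1 to 4 can be computed with a short routine such as the sketch below. It keeps the naming used in this paper, where a rejected correct value counts toward the false-positive rate and an accepted wrong value toward the false-negative rate, which differs from the more common convention. The function name `evaluatePredictions`, the record shape, and the sample results are hypothetical and are not the values reported in Table 3.

```javascript
// Compute Eqs. 1-4 from a list of { expected, predicted } records,
// where both fields are 'yes' or 'no'.
function evaluatePredictions(results) {
  if (results.length === 0) {
    throw new Error('The test data set must not be empty.');
  }
  let correct = 0;
  let nrc = 0; // rejected correct values: source has inhibitors, app said otherwise
  let naw = 0; // accepted wrong values: source has none, app did not answer 'no'
  for (const r of results) {
    if (r.predicted === r.expected) {
      correct += 1;
    } else if (r.expected === 'yes') {
      nrc += 1;
    } else {
      naw += 1;
    }
  }
  const size = results.length;
  const incorrect = size - correct;
  return {
    accuracy: correct / size,                            // Eq. 1
    errorRate: incorrect / size,                         // Eq. 2
    falsePositive: incorrect > 0 ? nrc / incorrect : 0,  // Eq. 3
    falseNegative: incorrect > 0 ? naw / incorrect : 0,  // Eq. 4
  };
}

// Hypothetical results: 7 correct, 2 rejected correct, 1 accepted wrong.
const sampleResults = [
  ...Array(4).fill({ expected: 'yes', predicted: 'yes' }),
  ...Array(3).fill({ expected: 'no', predicted: 'no' }),
  ...Array(2).fill({ expected: 'yes', predicted: 'no' }),
  { expected: 'no', predicted: 'yes' },
];

const metrics = evaluatePredictions(sampleResults);
console.log(metrics);
```

Since both error types are divided by the number of incorrect identifications, the two rates always sum to 1 whenever at least one error occurs, which matches the complementary percentages reported for each test.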

6 Conclusion and future perspectives

In this paper, we explained the approach we followed to create a DL-based web application that checks whether natural sources contain inhibitors of key COVID-19 enzymes. As explained earlier, the current application replies only with yes or no, based on what it was trained to know. The application aims to draw the attention of the scientific community worldwide to this approach so that more data can be exchanged to help train the app. It could also serve as a large database for scientists working on COVID-19 enzyme inhibition to check for more potential aliments rich in inhibitors before lab tests. Moreover, the same plants and aliments have different names depending on the region and country, and some plants, although different, may have the same key active compounds. Therefore, we aim to train the app to recognize potential sources of COVID-19 enzyme inhibitors by their composition, not only by their names. Succeeding in this depends on the availability of data on the sources rich in these inhibitors and on the key compounds of these sources. Currently, we are gathering, filtering, and sorting this dataset for the future release of the second version of this application.