Machine Learning Predicts the Level of Disease Spread

The aim of this research is a predictive analysis of the spread of disease. The variables analyzed are the population level in a region and the total number of disease events detected in the community. These variables indicate the accuracy and certainty of the resulting status analysis. Machine learning analysis is proposed to extend previous analysis models. The methods used are the K-Means cluster, Naïve Bayes, and Decision Tree (DT). The analysis has two stages: pre-processing and classification. K-Means pre-processing provides a classification analysis pattern, which is passed on to classification with Naïve Bayes and DT. Naïve Bayes achieves an accuracy rate of 83.33%. DT presents the resulting information and knowledge in the form of decision trees and achieves an accuracy rate of 91.76%, so these results can be used as a consideration in decision making. The resulting information and knowledge can guide policy making for handling health in the community.


Introduction
The public health management approach uses the degree of illness spread as one of its tools (Carroll et al., 2014). According to Wiesinga et al. (2020), this category of disease typically includes tuberculosis, diarrhea, hypertension, influenza, and other illnesses. A rather high share of these illnesses, 79.93%, results in suffering, paralysis, or even death. Because these diseases are widespread across the country, their rate of spread plays a significant role in present health issues (Bertozzi et al., 2020). The best alternative can be found by using predictive analysis to actively build a model (Sghir et al., 2023; Lepenioti et al., 2020).
Predictive analysis has been developed extensively for a variety of problems (Lopes et al., 2020), and the resulting models employ a variety of techniques (Toma & Wei, 2023; Shipe et al., 2019). Machine learning (ML) incorporates such analytical models. According to Petropoulos et al. (2022), this approach has proven quite effective in carrying out prediction. When machine learning operates at its best, it produces reasonably accurate output. Growing research on machine learning has produced important results that address global health issues, particularly through identification and prediction methods. Here, machine learning is also used to conduct a predictive analysis of how types of illness propagate through society (Santangelo et al., 2023; Tuli et al., 2020).
K-Means clusters, Naïve Bayes, and Decision Tree (DT) are the techniques employed in the machine learning-based prediction analysis (Elbasi et al., 2023). This approach has the potential to deliver the intended outcomes more successfully. The K-Means cluster is a technique for grouping data based on mathematical computations (Wang et al., 2022; Li et al., 2023) and is popular in big data work (Mussabayev et al., 2023; Gul & Rehman, 2023). Here it serves as the pre-processing step that generates prediction patterns, and the patterns found are useful for completing the predictive analysis. The outcomes of the K-Means cluster pre-processing are then used in the Naïve Bayes analysis.
This study offers an up-to-date prediction analysis method in the form of an analysis model. The model was created from pre-processing steps and a prediction method. Furthermore, a two-way strategy for updating the analytical model is described, based on the population distribution and the number of disease cases in the community. The model can be updated depending on the results produced. In general, the findings quantify the degree of accuracy and error in order to evaluate sensitivity and performance. In this way, the parties concerned can take the output of the analysis into account when managing public health.

Method
Pre-processing and prediction analysis are the two phases of the machine learning-based prediction analysis process. El-Hussein et al. (2021) state that the K-Means cluster, Naïve Bayes, and Decision Tree (DT) are the techniques and algorithms employed. The research stages are described in Figure 1.

Figure 1. Research stages
The prediction analysis procedure outlined in Figure 1 starts with data analysis based on infection cases and population size. To obtain patterns for classification analysis, the K-Means cluster technique is used for pre-processing before moving on to the classification step. Naïve Bayes then carries out the prediction process on the resulting analysis pattern, with the aim of maximizing the number of correct predictions. Finally, the Decision Tree (DT) approach is used to extract information and knowledge as the prediction analysis stage progresses.

Data Collection
Disease infection data for 2019 to 2021 were obtained from the Pesisir Selatan District Health Service. The indicators for variable testing are the population and the number of disease cases (Table 1).
In Equation 1 (K-Means cluster): K is the cluster index, J is the number of clusters, and Xi is the cluster data.

Naïve Bayes
The Naïve Bayes method, which is based on Bayes' theorem, forecasts future possibilities based on past experience. The primary feature of the Naïve Bayes classifier is its very strong (naïve) assumption that each condition is independent of the others, which makes a comprehensive analysis feasible. By putting this idea into practice, predictive analysis can learn more effectively and produce better results (Herodotou et al., 2019). Equation 2 displays the Naïve Bayes algorithm.

Decision Tree (DT)
DT uses data filters in making decisions and testing validation. In the form of decision trees, DT helps to handle complex data and to generate insights and information. DT works with mathematical principles in making decisions (Parra et al., 2023). The DT equation can be seen in Equation 3.
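As an illustration of the mathematical principle behind a decision tree split, the following minimal sketch computes entropy and information gain, the criterion used by ID3. This criterion is an assumption made for illustration and is not necessarily the Equation 3 used in this study. The feature names (`age_group`, `cases`), the labels, and the six rows are hypothetical toy values, not the study data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting the rows on one feature."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (not taken from the study).
rows = [
    {"age_group": ">45", "cases": "many"},
    {"age_group": ">45", "cases": "few"},
    {"age_group": "31-45", "cases": "many"},
    {"age_group": "31-45", "cases": "few"},
    {"age_group": "<30", "cases": "many"},
    {"age_group": "<30", "cases": "few"},
]
labels = ["high", "high", "moderate", "moderate", "moderate", "moderate"]

# The feature with the largest gain would be chosen for the root split.
best_feature = max(["age_group", "cases"],
                   key=lambda f: information_gain(rows, labels, f))
```

In this toy set, splitting on `age_group` separates the classes completely, so it would be selected as the root node, while `cases` carries no information about the label.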

Result and Discussion
In machine learning, the Naïve Bayes algorithm is a popular concept. According to Taye (2023) and Sarker (2021a), it is a supervised learning approach that can yield outcomes with a respectable level of accuracy. Alzubaidi et al. (2021) and Ciaburro et al. (2022) define training data as pre-existing data, also known as original data, and test data as data that will be evaluated against the pre-existing data to produce output. Naïve Bayes performs its analysis based on these two types of data. The concept is still evolving because it has solved many problems effectively. Naïve Bayes is expected to perform well when predicting disease transmission. The learning process is optimized by making the most of the training and testing phases (Li et al., 2022). The performance of Naïve Bayes is shown by the degree of accuracy and error in the output. The output then undergoes additional analysis to furnish the necessary information and understanding (Ahmed et al., 2023).
The Decision Tree (DT) concept is employed in the subsequent stage of the analysis (Lin & Fan, 2019; Purwanto et al., 2022). Machine learning, which conducts analysis in a variety of ways, is expected to offer a structured process and to produce output that is precise and correct (Javaid et al., 2022; Aldoseri et al., 2023).
Machine learning can be used to create predictive models in physical science, such as predicting the path of particles in nuclear experiments (Huang et al., 2023). Machine learning has become an important part of many industries and sectors, including the nuclear field. In today's digital era, a lot of data is generated by various nuclear systems and equipment, and machine learning can help in collecting, analyzing, and utilizing that data to optimize nuclear operations and improve safety. In the nuclear field, machine learning can be used to develop nuclear accident prediction models, incorporate radiation levels, optimize the use of nuclear fuel, and even support the development of new technologies for nuclear reactors (Sandhu et al., 2023).

Pre-Processing Analysis
The goal of the pre-processing analysis stage is to optimize the prediction process. This technique yields a better organized analysis and, in turn, better output. The K-Means cluster technique is employed in this pre-processing analysis; it groups data according to how closely related the data are to one another. Table 2 displays the findings of the pre-processing analysis performed with K-Means clusters.
According to Table 2, the cluster results offer a pattern for classification analysis based on data groups at the status level of infection rates by illness type. The cluster results show a distribution with 8 items in high status (C1), 2 items in medium status, and 5 items in low status. Table 2 thus shows three status categories (high, medium, and low) for the transmission of infectious diseases. A classification procedure will be applied to the preprocessed data to determine the distribution of infection cases by disease type.
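The grouping step described above can be sketched with a small pure-Python version of Lloyd's K-Means iteration. This is a minimal sketch under stated assumptions, not the tool or configuration used in the study: the nine (population, cases) points are hypothetical toy values, the deterministic starting centroids are a simplification chosen for repeatability, and the mapping of clusters to low, medium, and high status by mean case count is an illustrative choice.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    ordered = sorted(points)
    n = len(ordered)
    # Assumed initialization: k points evenly spaced through the sorted data.
    centroids = [ordered[round(i * (n - 1) / max(k - 1, 1))] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances (the K-Means objective)."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))


# Hypothetical (population, cases) pairs in arbitrary units, not the study data.
points = [(10, 1), (12, 2), (11, 1), (30, 8), (32, 9), (31, 7),
          (60, 20), (62, 22), (61, 21)]
labels, centroids = kmeans(points, k=3)

# Rank clusters by mean case count and name them low / medium / high.
order = sorted(range(len(centroids)), key=lambda c: centroids[c][1])
status_of = {c: name for c, name in zip(order, ["low", "medium", "high"])}
statuses = [status_of[l] for l in labels]
```

Because the two features are on different scales in real data, standardizing them before clustering would normally be worth considering; the sketch omits this for brevity.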

Prediction Analysis
The prediction process seeks to determine the degree of spread of each disease type according to the number of cases (Ahmad et al., 2021; Keshavamurthy et al., 2022). In this instance, the Naïve Bayes method, which can learn with good output results, starts the analytic process with training and test data. Using the same learning concept, Naïve Bayes can also be applied to disease prediction, with a reasonable level of accuracy. The approach first learns the patterns present in the training data and is then tested on the test data. The goal of this learning is to obtain the optimal choices for the prediction analysis procedure. Figure 2 illustrates the steps of the Naïve Bayes prediction method. Training data is connected to the process, which is then connected to an applied model that already contains the test data, allowing the method's performance to be assessed. Figure 3 shows the outcomes of the Naïve Bayes procedure.
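The train-then-test procedure described above can be sketched as a small categorical Naïve Bayes classifier. This is a minimal sketch, not the implementation used in the study: the feature values (`large`, `many`, and so on), the status labels, and all rows are hypothetical toy data, and the add-one (Laplace) smoothing is an assumption added so that unseen feature values do not produce zero probabilities.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)           # feature index -> distinct values seen
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            feat_counts[(j, c)][v] += 1
            values[j].add(v)
    return class_counts, feat_counts, values


def predict_nb(model, row):
    """Return the class with the largest P(c) * product of P(x_j | c)."""
    class_counts, feat_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for j, v in enumerate(row):
            # Likelihood P(x_j | c) with add-one smoothing (an assumption).
            score *= (feat_counts[(j, c)][v] + 1) / (n_c + len(values[j]))
        scores[c] = score  # P(x) is the same for every class, so it is omitted
    return max(scores, key=scores.get)


# Hypothetical training data: (population level, case level) -> spread status.
train_rows = [("large", "many"), ("large", "many"), ("large", "some"),
              ("medium", "some"), ("medium", "some"), ("small", "some"),
              ("small", "few"), ("small", "few"), ("medium", "few")]
train_labels = ["high", "high", "high", "medium", "medium", "medium",
                "low", "low", "low"]

# Hypothetical test data with known labels, used only to measure accuracy.
test_rows = [("large", "many"), ("medium", "some"),
             ("small", "few"), ("small", "some")]
test_labels = ["high", "medium", "low", "low"]

model = train_nb(train_rows, train_labels)
predictions = [predict_nb(model, r) for r in test_rows]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
```

Accuracy here is simply the share of test rows whose predicted status matches the known status, which is the same kind of figure as the accuracy rate reported for the method.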

Figure 3. Naïve Bayes results
Figure 3 shows the output of the Naïve Bayes prediction analysis, which produced fairly good results. The accuracy of 83.33% indicates that the Naïve Bayes method performs well in carrying out the prediction process. A plot of the distribution of the processed data can also be used to show the effectiveness of the Naïve Bayes output. These findings indicate that Naïve Bayes is capable of performing a predictive analysis of the rate of spread of the various disease types. The next step in the analysis is to investigate information derived from the prediction patterns analyzed by Naïve Bayes. The Decision Tree (DT) approach can present output in a knowledge-based format. By its concept, DT analyzes data to uncover knowledge and hidden information. DT is applied to a classification analysis that concentrates on two dimensions: population level and the distribution of case numbers (Rupp et al., 2024; Ishaque et al., 2023). This two-way analysis aims to identify information and knowledge as seen from several angles.

Figure 4. Results of decision tree analysis based on population level
In Figure 4, DT takes the form of a decision tree that produces knowledge and information. Residents aged > 45 years have a greater chance of contracting the disease, while the probability is moderate for residents aged < 30 years and for those aged 31-45 years. Analytical techniques are used to estimate the level of disease spread (Figure 4). An illustration of the analysis of the findings can be seen in Figure 5.

Figure 5. Results of decision tree analysis based on the rate of spread of disease types
Figure 5 shows the results, using information consistent with the findings of the analysis. These outcomes are based on the number of infection cases reported. In this instance, machine learning-based predictive analysis can characterize the distribution status of each disease. The ID3 and C4.5 algorithms are Decision Tree (DT) methods that have also been used to determine electrical power in physics (Tanjung et al., 2016).

Conclusion
Predictive analysis using machine learning concepts provides fairly good results in presenting information about the spread of cases by disease type. The results of the analysis are presented in two ways: classification based on the distribution of the population and classification based on the number of infected cases. These results were obtained through initial pre-processing with K-Means clusters, which produces classification analysis patterns that then serve as the input for Naïve Bayes learning. On this basis, Naïve Bayes achieves an accuracy level of 83.33%. The prediction results can also be described in the form of a decision tree using the Decision Tree method, with an average accuracy level of 91.76%. The resulting decision tree contains a knowledge base that can be used as a control medium for handling spikes in the number of cases of disease spread. In this way, the analysis results can inform the choices made in decision-making.
$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \qquad (2)$$
Where: c is a specific class; P(c|x) is the probability of the hypothesis given the conditions (posterior probability); P(c) is the probability of the hypothesis (prior probability); P(x|c) is the probability of the conditions given the hypothesis; P(x) is the probability of the conditions x.

Figure 2. Naïve Bayes process

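Table 1. Variable Analysis of the Infectious Disease Spread Classification Status

K-Means Cluster
K-Means Cluster works by using information and knowledge to identify data patterns. The K-Means Cluster objective used in machine learning is given in Equation 1, where $\mu_K$ denotes the centroid (mean) of cluster $C_K$.
$$\sum_{K=1}^{J} \sum_{X_i \in C_K} \lVert X_i - \mu_K \rVert^2 \qquad (1)$$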

Table 2. K-Means Cluster Preprocessing Results