A machine learning screening model for identifying the risk of high-frequency hearing impairment in a general population

Wang, Yi; Yao, Xinmeng; Wang, Dahui; Ye, Chengyin; Xu, Liangwen

doi:10.1186/s12889-024-18636-1

Research
Open access
Published: 25 April 2024

A machine learning screening model for identifying the risk of high-frequency hearing impairment in a general population

Yi Wang^1,2^na1,
Xinmeng Yao¹^na1,
Dahui Wang³,
Chengyin Ye³ &
…
Liangwen Xu¹

BMC Public Health volume 24, Article number: 1160 (2024) Cite this article

300 Accesses
Metrics details

Abstract

Background

Hearing impairment (HI) has become a major public health issue in China. Currently, due to the limitations of primary health care, the gold standard for HI diagnosis (pure-tone hearing test) is not suitable for large-scale use in community settings. Therefore, the purpose of this study was to develop a cost-effective HI screening model for the general population using machine learning (ML) methods and data gathered from community-based scenarios, aiming to help improve the hearing-related health outcomes of community residents.

Methods

This study recruited 3371 community residents from 7 health centres in Zhejiang, China. Sixty-eight indicators derived from questionnaire surveys and routine haematological tests were delivered and used for modelling. Seven commonly used ML models (the naive Bayes (NB), K-nearest neighbours (KNN), support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), boosting, and least absolute shrinkage and selection operator (LASSO regression)) were adopted and compared to develop the final high-frequency hearing impairment (HFHI) screening model for community residents. The model was constructed with a nomogram to obtain the risk score of the probability of individuals suffering from HFHI. According to the risk score, the population was divided into three risk stratifications (low, medium and high) and the risk factor characteristics of each dimension under different risk stratifications were identified.

Results

Among all the algorithms used, the LASSO-based model achieved the best performance on the validation set by attaining an area under the curve (AUC) of 0.868 (95% confidence interval (CI): 0.847–0.889) and reaching precision, specificity and F-score values all greater than 80%. Five demographic indicators, 7 disease-related features, 5 behavioural factors, 2 environmental exposures, 2 hearing cognitive factors, and 13 blood test indicators were identified in the final screening model. A total of 91.42% (1235/1129) of the subjects in the high-risk group were confirmed to have HI by audiometry, which was 3.99 times greater than that in the low-risk group (22.91%, 301/1314). The high-risk population was mainly characterized as older, low-income and low-educated males, especially those with multiple chronic conditions, noise exposure, poor lifestyle, abnormal blood indices (e.g., red cell distribution width (RDW) and platelet distribution width (PDW)) and liver function indicators (e.g., triglyceride (TG), indirect bilirubin (IBIL), aspartate aminotransferase (AST) and low-density lipoprotein (LDL)). An HFHI nomogram was further generated to improve the operability of the screening model for community applications.

Conclusions

The HFHI risk screening model developed based on ML algorithms can more accurately identify residents with HFHI by categorizing them into the high-risk groups, which can further help to identify modifiable and immutable risk factors for residents at high risk of HI and promote their personalized HI prevention or intervention.

Peer Review reports

Introduction

Hearing impairment (HI) is the most common sensory impairment and has become a public health concern worldwide [1, 2]. HI is often considered an invisible disability. It is characterized by reduced hearing sensitivity and loss of speech understanding caused by degeneration of the cochlea, the auditory nerves or both [3]. This disorder has caused substantial social and economic burdens on adults in China. In 2011, 2013, and 2015, the direct costs attributable to HI of middle-aged and older people aged 45 and above in China were $50.699 billion, $81.783 billion and $106.777 billion, accounting for 3.43, 4.54 and 5.54% of the overall healthcare costs in the same year, respectively [4]. Currently, pure-tone hearing tests are the gold standard for identifying HI, which requires expensive audiological equipment and trained professionals, leading to limitation the feasibility of utilizing pure-tone hearing tests for mass screening of HI at the community level. Moreover, relevant studies have shown that due to the lack of professional hearing health care personnel and insufficient allocation of HI detection equipment in China's grassroots communities, residents generally lack routine HI detection services, resulting in timely detection of early HI lesions and high-risk group [5, 6]. Therefore, there exists a pressing need to establish a practical and accessible screening tool tailored for residents with HI, with the aim of mitigating the progression to clinically significant HI [7]. Many HIs, defined as audiometric losses, usually start from higher-frequency deterioration, then develop slowly and evolve gradually to lower-frequency or speech-frequency dysfunction [8]. Primary screening for high-frequency hearing impairment (HFHI) could be the key to starting early prevention and intervention.

Currently, well-known HI risk factors in the general population include age [9], genetics, behavioural factors (e.g., smoking and exercise) [10], environmental exposures (e.g., noise exposure), health care utilization factors (e.g., immunization and antibiotics), and chronic disorders (e.g., hypertension and diabetes) [11]. Furthermore, various biomarkers involved in inflammation (such as increased white blood cell (WBC) counts, neutrophil (NE) counts, monocyte (MO) counts, and lymphocyte (LY) counts) and metabolic parameters (such as low-density lipoprotein (LDL) and high-density lipoprotein (HDL)) are also recognized as risk indicators of HI [12]. Chronic changes in inflammatory status that accompany the ageing process, known as inflammation, may cause or accelerate long-term damage to the hearing system [13]. Red blood cell distribution width (RDW), a parameter used to classify anaemia, was recently reported to be associated with inflammation and microcirculation disorders [14]. HDL and LDL have been reported to affect blood supply and thus may potentially influence sudden sensorineural HI [12]. While several studies have attempted to investigate the relationships between HI and blood inflammation and metabolic parameters, limited studies have used these parameters to predict HI. Notably, in recent years, Chinese community residents have regular annual physical examinations, and blood tests (blood routine and biochemical) during physical examinations are routine tests, and doctors usually extract only a small part of the information hidden in the results of routine blood tests [15]. Thus, in the era of big data, conducting more comprehensive analyses by integrating existing community blood test data with questionnaire data holds significant promise for elucidating HI-related biomarkers, identifying individuals with HI, and proactively alerting community residents about their risk of HI.

In the area of HI assessment, various tools, such as screening scales, apps, tablet-based and computer-based devices, and internet-based platforms, have been developed by researchers. While these products are widely utilized in patients with HI, the existing assessment scales primarily focus on evaluating the functional aspects and consequences of HI, especially in the elderly population [16]. Despite the availability of low-cost automated hearing tests for remote screening, their accuracy and accessibility are constrained by factors such as environmental noise, equipment reliability, and data security [17].

Machine learning (ML), situated at the intersection of statistics and computer science, is a scientific field dedicated to understanding how computers learn from data. ML algorithms, which are capable of capturing complex and unpredictable patterns in human physiology, have shown promise in audiology [18]. To date, most HI prediction models based on ML algorithms have been built for noise-related workers, and their predictors are usually noise exposure level and exposure time, which are not suitable for the general population [19]. The existing HI prediction models for the general population are limited. One study established a simple screening tool for moderate-to-severe HI in Shanghai communities, and the accuracy of the testing set reached 91.92%, but the study subjects were mainly elderly people [20]. Another study predicted HI based on a decision tree-based algorithm and obtained an area under the curve (AUC) value of 0.870, but this study mainly focused on speech-frequency HI and limited the early prevention of HI [21].

Therefore, this study aimed to develop an HFHI ML risk screening model for the general population by using routine physical examination data (blood biomarkers and liver function indicators) and demographic and lifestyle risk factor data from community residents. The performances of the seven most commonly used ML algorithms for screening HFHI in the general population were further compared to determine the best algorithm. We envision that this model can be used directly in community settings to screen populations for HFHI risk in combination with electronic case information to provide guidance and assistance for personalized HFHI prevention or intervention in the community.

Materials and methods

Study design and dataset

To develop the optimal HI screening model, a multistage stratified cluster sampling method was used to conduct a population-based cross-sectional survey of 7 health centres in Hangzhou, Jiaxing, Huzhou, Quzhou and Lishui from 2016 to 2018 based on the geographical distribution of Zhejiang Province and the stratification of city size, city and county. Considering the sample effective rate and the prevalence rate of HFHI, the sample size was calculated to be at least 1436 people by using the current study formula, and 4010 people were ultimately investigated. After deleting and filling in the missing data, 3371 people were included. During the specified timeframe, all individuals undergoing physical examinations at these health centres were asked to ascertain their willingness to participate in our study. Those who agreed to participate were included in the study and asked to sign an informed consent form. After completing the informed consent process, all participants completed a comprehensive assessment, including a questionnaire survey, an audiometric assessment and a blood test, during their annual physical examination (Fig. 1). Inclusion criteria are as follows: (1) Residents from Zhejiang Province who have lived in the area for more than one year and are over 18 years old; (2) No self-reported occupational noise exposure; (3) No previous medical history of otitis media, craniocerebral injury, or ototoxic drug use or detonation deafness; (4) No history of inflammation or fever within 30 days before audiometry test. The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of Hangzhou Normal University Ethics Committee (grant number: 2017LL107), and all personal privacy information was well protected and removed during the process of analysis and publication.

Audiometric assessment

All pure-tone air-conduction hearing thresholds were measured by trained researchers using audiometers (AT235; Interacoustics AS, Assens, Denmark) with supra-aural headphones (TDH-39; Telephonic Corporation, Farmingdale, USA). Subjects were advised to stay away from noisy conditions for more than 12 h prior to the hearing test to improve test accuracy. Pure-tone air conduction hearing thresholds were tested in both ears at frequencies of 3 Kilohertz (kHz), 4 kHz, 6 kHz and 8 kHz over an intensity range of -10 to 110 decibel (dB(A)). In the audiometric examination, participants who did not respond at least once were considered nonresponsive. To measure the reliability of participants' responses, the 1 kHz frequency was tested twice in each ear. An unreliable response was considered if the results differed by more than 10 dB(A), and then the assessment was performed again. Pure tone averages were computed across fixed frequency bands (3 kHz, 4 kHz, 6 kHz and 8 kHz) or across affected frequencies, and the result of > 25 dB(A) with a worse-hearing ear was diagnosed as HFHI [22].

Questionnaire survey

The questionnaire survey and audiometry test were conducted on the same day. The original questionnaire was developed and revised through expert consultation and included questions about demographics, symptom and disease histories, behavioural factors, environmental exposure and cognitive parameters (Additional file 1). Then, a pre-participation survey involving 926 participants was administered. The results showed that the developed questionnaire reached a Cronbach's α coefficient of 0.753 and a Kaiser–Meyer–Olkin (KMO) value of 0.794 [23,24,25].

Blood sample collection and laboratory testing methods

First, research participants adhered to fasting conditions before morning blood collection, during which medical personnel obtained peripheral venous blood samples (the same day as the audiometric assessment and questionnaire survey). Each sample received a unique serial number corresponding to the participant's questionnaire. On-site investigators utilized sterile syringes to carefully extract 1 ml of daily collected EDTA-anticoagulated whole blood. The collected blood was subsequently distributed into sterile Eppendorf tubes, sealed, and subjected to a meticulous verification process involving cross-checking the participant's name and serial number. Finally, each appropriately labelled sample was securely stored at -80 °C for subsequent analysis.

Candidate indicators

To develop HFHI risk screening models, a total of 68 variables were collected from the questionnaire survey and blood test for modelling purposes. Demographics, symptom and disease histories, behavioural factors, environmental exposure, and hearing cognition were included in the structured questionnaire. The data were collected and measured under standardized conditions following uniform procedures. The details of the candidate indicators are shown in Table 1.

Table 1 Variables included in this study

Full size table

Statistical analyses

Data pre-processing and feature selection

Epidata V.3.1 was used for survey data entry, checking and error correction. All the statistical analyses were carried out using R software V.4.3.2. For data pre-processing, we first excluded participants for whom more than 5% of the haematological data were missing. Second, for participants who had less than 5% missing data for haematology features, we imputed the missing data within each health centre using values generated by multiple interpolation without changing the distribution of the observed data [26]. Overall, 3371 participants were included in the cohort. Third, since the normal reference value ranges varied in haematology tests of different health centres (Additional file 2), for the purpose of future generalizability of the screening model, we recoded the haematological test indicators into three-level categorical values (i.e., low, normal and high) according to their corresponding reference values. Finally, a total of 3371 subjects and 68 covariates were included in the subsequent analysis. Univariate comparisons were conducted between HFHI patients and controls. Categorical variables are shown as n (%). Differences among groups at baseline were analysed by the chi-square test (for categorical variables). All p values were two-tailed, and p < 0.05 was considered to indicate statistical significance at this stage.

Model development and validation

After performing univariant feature selection, 58 indicators were selected for subsequent model construction. To both construct and evaluate the models, 2/3 (n = 2218) of the study population was randomly generated as the training set, and the remaining 1/3 (n = 1153) was used as the validation set (Fig. 1). Seven different ML algorithms were adopted in the training set to construct 7 distinct predictive models. These algorithms include naive Bayes (NB) [27], K-nearest neighbours (KNN) [28], support vector machine (SVM) [29], random forest (RF) [30], eXtreme Gradient Boosting (XGBoost) [31], Boosting [32], least absolute shrinkage and selection operator (LASSO regression) [33] (packages “randomForest”, “e1071”, “FNN”, “gbm”, “xgboost”, “caret”, and “glmnet” in R statistical analysis software). In addition, to evaluate the validity of each model, we used fivefold and tenfold cross-validation techniques on the entire dataset. Specifically, for fivefold cross-validation, the available dataset was divided into five roughly equal-sized subsets. Four of them were applied to fit the model, and the remaining one was used to estimate the accuracy of the model. Similarly, for the tenfold cross-validation, we performed 10 cross-validations by randomly dividing the entire dataset into 10 parts for 10 iterations. In each iteration, we selected 9 parts as training data and 1 part as the test set. The average result was 10% of the test data unused for each model.

The evaluation of the model was completed in the validation set. For two class-predictions these are typically the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The performance of the developed models was assessed by six metrics: AUC, accuracy ($\frac{TP+TN}{TP+TN+FP+FN}$), precision ($\frac{TP}{TP+FP}$), recall ($\frac{TP}{TP+FN}$), specificity ($\frac{TN}{TN+FP}$), and F-score ($\frac{2TP}{2TP+FP+FN}$). The AUC value was used as a reasonable summary of the overall diagnostic accuracy of the test, and the model with the largest AUC value was considered to have the best overall performance. Accuracy was defined as the percentage of true outcomes out of all prediction results. Precision was the ratio of true positives among all positive results [34]. Recall was the ratio of true positives out of true cases. The specificity reflects the percentage of true negatives among all negative results identified by the HFHI screening model. The F-score was the harmonic average of the precision and recall, allowing for comparison of the models' performance in identifying true positives when cases and controls were unbalanced in the dataset. By convention, the threshold for the AUC of a good model was set to 0.70 [35]. After that, the model with the best performance in terms of the six accuracy measurements was selected as the final HFHI screening model.

Finally, based on the variables that survived in the final model, we used the "rms" and "nomogram" R packages to construct a nomogram. The nomogram was based on proportionally converting each regression coefficient in multivariate logistic regression to a 0- to 100-point scale [36]. The effect of the variable with the highest β coefficient (absolute value) was set to 100 points. The points were added across variables to derive total points and then converted to the nomogram score, representing a certain individual's model-based probability of having HFHI.

Results

Subject characteristics and prevalence of HFHI

The present study included 3371 community participants, which consisted of 1730 males (51.3%) and 1641 females (48.7%), aged between 18 and 98 years, with a mean of 50.39 ± 15.23 years. Among them, 57.3% (1930/3371) were diagnosed with HFHI. Compared to those without HFHI, those with HFHI were usually older; more likely to be male; had lower education levels and incomes; and were more likely to be diagnosed with hypertension, diabetes, otitis media and chronic heart disease. The univariate analysis results of the behavioural factors, environmental exposures, symptoms and disease conditions, routine blood indices and hepatic function indices are summarized in Additional file 3. As a result, a total of 58 candidate indicators were identified for the next process of model construction (p < 0.05).

Performance evaluation of the HFHI screening models

In this study, we used 7 ML algorithms to construct the HFHI classification models in the training set and evaluated the performance of these models in the validation set using AUC values, accuracy, precision, recall, specificity and F-score measurements. Finally, the model with the best discriminative ability in the validation stage was selected as the final model to distinguish HFHI patients from community residents. We first compared the performance of each model by using commonly used receiver operating characteristic (ROC) curves and AUC values. The validated ROC curves and the validated AUC values for all models are shown in Fig. 2 and Table 2. Among the adopted algorithms, the LASSO algorithm achieved the best validated AUC of 0.868 (95% confidence interval (CI): 0.847–0.889) in the validation cohort, indicating that it had the best overall discriminative ability compared to the other models. In addition, KNN, Boosting and XGBoost achieved relatively good overall discriminative ability, with validated AUC values of 0.866 (95% CI: 0.845–0.887), 0.858 (95% CI: 0.837–0.880) and 0.854 (95% CI: 0.833–0.876), respectively.

Table 2 Comparisons on accuracy, precision, recall, specificity and F-score of classification among different machine learning approaches

Full size table

These ML models were further evaluated and compared with respect to other performance-related properties, including accuracy, precision, recall, specificity and F-score. The results are summarized in Table 2. In detail, the best accuracy was attained by the RF model (80.57%). In terms of precision, all 7 models achieved relatively good performance by attaining a precision of more than 80%, of which the best model was the NB model (93.52%). For the measurement of recall, the RF models attained the best value of 81.74% among the 7 models. The two best models in terms of specificity were the NB model (96.62%) and the LASSO model (81.01%). Finally, we compared the F-scores among these models, where 6 out of 7 models achieved values above 80%. The RF model (83.21%), the SVM model (82.94%), the KNN model (81.97%), the LASSO model (81.78%), the Boosting model (80.52%), and the XGBoost model (80.38%) are ranked from highest to lowest.

After performing these comprehensive comparisons, we found that the overall performances of the SVM model, RF model, KNN model and LASSO regression model were relatively better than those of the other models, but the AUC values of the SVM model (0.805), RF model (0.803) and KNN model (0.866) were lower than those of the LASSO regression model. Furthermore, we used fivefold and tenfold internal cross-validation methods to evaluate the performance of different algorithms. The AUC values of the models from cross-validation were compared with those of the original model. Similar to the original model, LASSO and KNN consistently outperformed the other algorithms. Specifically, in the fivefold cross-validation, the mean AUC values of the LASSO- and KNN-based models were both 0.857. Notably, when comparing the 95% CI, the KNN model showed a slightly narrower range (0.844–0.869) than did the LASSO model (0.845–0.870). In the tenfold cross-validation model, KNN achieved a slightly better performance than LASSO in terms of the AUC mean (Additional file 4). However, from the perspective of model interpretation and application, the LASSO-based model has unique strengths. First, in terms of model application, LASSO is capable of selecting impactful variables, thereby simplifying the model and facilitating its application value. KNN lacks a variable filtration step, necessitating the inclusion of all variables for subsequent application. Second, in terms of model interpretation, the important variables selected by LASSO can be used to design personalized intervention programs, while the KNN-based model does not have such capability since all involved features contribute to the model without a clear distinction of importance, limiting interpretability and customization potential. Consequently, we finally adopted the LASSO regression algorithm to construct the final HFHI screening model.

By adopting LASSO regression, our screening model ultimately selected 34 variables as indicators of HFHI risk and reached a fitted AUC of 0.866 (95% CI: 0.852–0.881) in the training cohort and a predicted AUC of 0.868 (95% CI: 0.847–0.889) in the validation cohort. The two AUC values were relatively high and differed very little, indicating that the derived model attained robust performance. The 34 HFHI risk indicators that survived in the final LASSO-based model included 5 demographic indicators, 7 disease-related features, 5 behavioural factors, 2 environmental exposures, 2 hearing cognitive factors, and 13 blood test indicators. Among them, history of coronary heart disease, otitis media and self-reported hearing issues, as well as several routine blood indices (e.g., RDW, PDW and LY%) and liver function indicators (e.g., TG, IBIL, AST and LDL), were identified as the most significant indicators (Additional file 5).

The HFHI screening nomogram and a case study for model interpretation

Based on the 34 HFHI risk indicators identified by LASSO regression, we further generated a nomogram to transform the LASSO regression model into an accessible screening tool that could ultimately be easily used by primary care physicians and community residents. In the nomogram (Fig. 3), an individual's total points were obtained by summing the points of each individual indicator achieved by a certain individual on the corresponding scale, and a vertical line was drawn on the scale based on the total points, providing this individual's final HFHI risk score. In our dataset, all individuals were ranked from low to high risk based on their risk scores and classified into three different risk groups: high-risk (score 0.75–1.00), medium-risk (score 0.45–0.75), and low-risk (score 0–0.45). The classification threshold of each risk category was determined based on the positive predictive value (PPV), sensitivity and specificity of each risk category. In general, the low-risk group comprised 1314 individuals, 22.91% (301/1314) of whom exhibited HFHI. In the medium-risk group, consisting of 822 individuals, 60.83% (500/822) were diagnosed with HFHI. Within the high-risk group, 91.42% (1129/1235) were confirmed as having HFHI. According to the nomogram, the demographic characteristics of the high-risk groups of HFHI patients could be described as elderly, low-educated, and low-income males. In terms of disease history, people with a history of tinnitus, hypertension, diabetes, coronary heart disease, or otitis media were at greater risk of HFHI. For lifestyle features, a history of smoking, alcohol consumption, high-level volume when using electronic products, high-level life pressure, and noise exposure in the working environment were risk factors for HFHI. Furthermore, 13 blood test indicators were identified by our model. Among them, the indicators that had the greatest impact on the HFHI were RDW, NE and TG.

In Fig. 3, we also use a community resident as an example to illustrate the application of the constructed HFHI screening model in community settings, showing the model's ability to identify potential high-risk residents who are normally overlooked. Generally, senior residents were at greater risk of HFHI, as age was one of the most important indicators of HI. However, for the 42-year-old man in our validation set, he was relatively young but was classified into the high-risk group of HFHI, as his identified HFHI risk was 0.846. Moreover, this middle-aged man was confirmed to have HFHI according to his audiology test results (47.50 dB for the poor ear). As shown in Fig. 3, the 34 involved features are marked with red triangles based on the patient records. From the perspectives of demographic and disease status, this man had self-perceived HI, hypertension and diabetes. He also has several behavioural risk factors, including smoking, alcohol consumption, excessive electronic volumes and daily noise exposure in the workplace. In terms of biomarker risk factors, the patient had an abnormally high level of IBIL. It is hoped that by applying this screening model, we could screen these potentially high-risk patients for subsequent confirmatory diagnosis and identify their risk factors for subsequent individualized interventions.

Characteristics differences in risk stratification

Differences in resident lifestyles according to risk stratification

To further explore the distribution of the lifestyle variables identified by the model in the three risk categories, we calculated the proportion of individuals with certain lifestyle behaviours within intervals of 0.1 points for their risk score and plotted loess curves over the spectrum of risk profiles. As shown in Fig. 4, the proportion of individuals with habits of using electronic volume over 40%, smoking, alcohol consumption, or self-reported workplace noise experience more than once a week increased significantly with increasing risk scores, while the proportion of individuals with habits of daily fruit and vegetable intake over 500 g and exercise more than once per month decreased significantly. In detail, 72.83% (957/1314) of individuals in the low-risk group exercised more than once per month, while this proportion decreased to 46.07% (569/1235) in the high-risk group. A total of 28.10% (347/1235) of individuals in the high-risk group drank alcohol, which was 3.13 times greater than that of the low-risk group (8.98%, 118/1314). Similarly, 39.51% (488/1235) of individuals in the high-risk group were smokers, which was 2.93 times greater than that of the low-risk group (13.47%, 177/1314).

Differences in the resident blood indices according to risk stratification

We also explored the distribution of the identified blood index variables in the three risk categories (Fig. 5). As the risk score increased, the proportion of individuals with high levels of LDL, LY%, RDW, TC and EO% also increased, while the proportion of individuals with high levels of HDL decreased slightly. Among them, 29.15% (360/1235) of individuals in the high-risk category had high levels of TC, which was 3.25 times that in the low-risk group (8.98%, 118/1314). In the high-risk group, 36.03% (445/1235) of individuals had high LDL levels, which decreased to 23.06% (303/1314) in the low-risk group. Conversely, the proportion of individuals with high levels of HDL decreased from 6.16% (81/1314) in the low-risk group to 3.16% (39/1235) in the high-risk group.

Discussion

HI is a prevalent sensorineural disorder and a growing public health issue of global concern. In this study, 57.3% were diagnosed with HFHI. It is suggested that half of all cases of HI can be prevented through public health measures [10, 37]. For instance, the World Health Organization (WHO) estimates that over 19% of childhood HI cases can be avoided by immunization against rubella and meningitis alone. In addition, the development and implementation of community-based hearing conservation programs aimed at changing listening behaviours and noise control in entertainment venues are also effective ways to intervene [38]. The development of a community-based risk screening tool for HI could be highly important and could be considered a first step in developing primary screening and prevention strategies for hearing health care in communities. In the current study, we adopted 7 ML algorithms to develop HFHI risk screening models by using questionnaire-based features and haematological test outcomes. The AUC values of the 6 models in the model validation stage were > 0.80, indicating that the ML model can be used to accurately identify HFHI individuals among ordinary residents.

When comparing the performance of the built models on the validation set, we found that the LASSO-based regression model achieved the best performance among all the algorithms tested, attaining an AUC of 0.868 (95% CI: 0.847–0.889) and reaching precision, specificity and F-score values all greater than 80% on the validation set. LASSO is a regression-based methodology allowing for simultaneous involvement of a large number of covariates in the model. More importantly, it penalizes the absolute value of a regression coefficient; thus, it is capable of regulating the impact a coefficient may have on the overall regression. The greater the penalization is, the greater the shrinkage of coefficients, with some reaching 0, thus automatically removing unnecessary/uninfluential covariates, making the model lean and resulting in a high discrimination rate [39]. The LASSO-based screening model is a classifier that can accumulate relative risks of meaningful variables and maximize its predictive power. This fundamental characteristic might explain the optimal discriminative ability of the LASSO model in our study [33].

Routine blood tests in primary community health care institutions are generally performed regularly at a frequency of at least once a year; on this basis, making full use of these indicators may promote primary screening and early warning of common diseases in community settings. In our study, the developed model revealed the value of haematological test data in screening HFHI, including metabolic parameters (e.g., LDL, HDL and TG), inflammatory parameters (e.g., NE and MPV) and other factors (e.g., AST). Among them, 4 indicators that had a significant impact on the screening model of the HFHI were TG, RDW, LY (%), and NE. Our research showed that residents with low TG levels and high TC levels are at greater risk of HFHI. Similarly, related studies have shown that TG, TC, and the LDL/HDL ratio are strongly associated with the prognosis of HI. When TC ranges from 5.2 to 6.2 mmol/L and TG ranges from 1.7 to 2.3 mmol/L, hypercholesterolemia increases the whole blood viscosity of the inner ear microvessels, which leads to haemorheological alterations and incomplete ischaemia [40]. A reasonable explanation is that abnormal blood lipid levels may lead to changes in whole blood viscosity, which may result in impaired microcirculation and ultimately lead to inadequate blood supply to target organs, resulting in sudden sensorineural HI [41]. In addition, our model revealed that high RDW is a risk factor for HFHI. Consistent with prior research, it has been shown that inflammatory markers are associated with HI in patients with inflammatory diseases, that there is a positive correlation between the RDW and mean hearing threshold, and that capturing inflammatory status is valuable for screening the risk of HI in residents with underlying chronic inflammation [42]. Furthermore, several social determinant features were recognized by our developed screening model as indicators of HFHI risk, including old age, male sex, low education level, and low income, which were all reported as risk factors for HI in previous studies [43]. Lifestyle factors such as smoking, alcohol drinking, a high volume when using electronic products and a high level of life pressure were also identified as risk factors for HFHI in our study. Previous research has shown that smokers are at a greater risk of HI and that the risk of smoking-related HI may decrease over time after quitting smoking [44]. There are many studies on the relationship between alcohol consumption and HI [45]. Some studies have found that moderate alcohol consumption has a protective effect on HI, which is similar to the cardiac protective effect of alcohol [46], whereas in contrast, our study found that alcohol consumption is a risk factor for HI. In addition, a cohort study has shown that individual music player users listening to high volumes increase the progression of their HI [47]. Therefore, timely intervention after early identification of risk factors in our model may have research significance for the occurrence of HFHI in residents.

The nomogram has been previously used as a predictive method in HI screening for occupational noise workers and achieved good screening results [48]. Therefore, to strengthen the operability and generalizability of our screening model for community physicians and residents, we also created an HFHI nomogram based on the derived LASSO-based HFHI risk model. Our nomogram achieved good calibration and discriminative ability in the validation cohort. This case study also revealed that this HFHI risk screening nomogram could be an effective tool for assessing HFHI risk in the general residential population and was especially valuable for screening potential HFHI patients in young or middle-aged adults. It is hoped that this nomogram could be further applied by primary care physicians as a routine tool at community health centres in the future to facilitate primary screening for HI and to help recognize a patient's modifiable risk factors for individualized intervention and treatment plans, ultimately driving current primary care towards personalized health care.

To our knowledge, this is the first study to combine residents' routine physical examination indicators and demographic and lifestyle risk factors from community residents to establish a risk assessment model for residents' HFHI based on an ML algorithm. Previous screening tools for HI have mostly targeted the elderly population, but they have primarily assessed the negative impact of HI on emotional and social problems and have not identified risk factors [49]. Model-based screening for HI achieved a better prediction effect, with AUC values ranging from 0.713–0.776, but it was mainly designed for noise-induced workers. Its risk factors are mainly industry type, noise exposure duration and median kurtosis [19, 50], which are quite different from those of ordinary residents. One study used demographics, clinical factors, and self-reported hearing status to predict whether speech frequency HI achieved good performance, but intervention after speech frequency HI was usually irreversible. The value of early prevention and intervention for HI is limited [21].

Implications of the study

Our model has several potential implications and contributions. First, it encompassed a comprehensive array of variables, including basic personal information, current disease history, behavioural and environmental factors, and self-perceived hearing cognition. Additionally, it integrates haematological indicators at the biochemical level, thereby forming a screening model that systematically considers potential risk predictors across multiple dimensions, which can provide research ideas for the exploration of biomarkers related to HI. Second, we used and compared various ML algorithms to construct the model, and successfully proved the feasibility of using ML algorithms for developing HFHI screening models in the general population. More importantly, our screening tool is low-cost and widely accessible, making it suitable for deployment in grassroots community health service centers in China. In this context, community doctors can integrate inquiries about the HI risk factors into routine medical check-ups, leveraging our risk screening model to assess residents’ HI risk levels. As a result, the tool enables accurate identification of high-risk populations, making the initial stride towards early HI screening and prevention for community residents. Moreover, the model can help identify modifiable and intervenable risk factors among individuals at high risk of HI, thereby facilitating the delivery of precise interventions and targeted health guidance. Meanwhile, this model can also serve as an epidemiological tool to summarize shared and common HI risk factors at the population level. This application allows for the design of evidence-based prevention strategies and policy frameworks in public health, contributing significantly to proactive healthcare management and community well-being.

Limitations

The limitations of this study need to be noted. First, the use of a cross-sectional design restricts the establishment of a direct causal relationship between identified predictors and the incidence of HFHI. Instead, this study provides suggestions from an aetiological perspective. Subsequent prospective studies are necessary to substantiate whether such associations are indicative of causal relationships. Second, our data were exclusively collected from seven health centres within a single province of China, potentially limiting the generalizability of the model to other regions in China with distinct socioeconomic and behavioural factors. In addition, the measurement of haematological data from different regions was conducted in the clinical laboratories of the investigated health care centres separately, which may introduce the potential for detection and measurement bias in this study.

Conclusion

In this study, we developed and validated an HFHI risk screening model based on seven ML algorithms that included demographic information, symptoms and medical history, behavioural factors, environmental exposure, hearing cognition variables, and haematology test variables. Finally, the LASSO-based screening model achieved the best accuracy, and we further transformed it into a screenable nomogram that can be directly used in the community environment combined with electronic case information to conduct risk screening of the population. The model can classify community residents into different HFHI risk categories (high-, medium- and low-risk). The risk factors for each HI high-risk resident can be identified so that HI prevention or intervention can be personalized. Following this study, we intend to use the screening model to develop an early warning plug-in tailored for electronic medical record systems, with a specific focus on its implementation within primary care settings. This initiative aims to facilitate the ongoing collection of the latest data, allowing for a comprehensive external validation of the screening model on a larger scale, and eventually ascertain the viability of the instrument as a nationally applicable tool for future use.

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available due to the owners considering further publications, but are available from the corresponding author on reasonable request.

Abbreviations

ALB:: Albumin
ALT:: Alanine aminotransferase
AST:: Aspartate aminotransferase
AUC:: Area under curve
BA:: Basophilic
BASO:: Basophilic
BPC:: Blood platelet count
BUN:: Blood urea nitrogen
CI:: Confidence interval
CR:: Creatinine
dB:: Decibel
DBIL:: Direct bilirubin
EO:: Eosinophil
FN:: False negatives
FP:: False positives
HCT:: Hematocrit
HDL:: High-density lipoprotein
HFHI:: High-frequency hearing impairment
HGB:: Hemoglobin
HI:: Hearing impairment
IBIL:: Indirect bilirubin
kHz:: Kilohertz
KMO:: Kaiser-Meyer-Olkin
KNN:: K-nearest neighbors
LASSO:: Least absolute shrinkage and selection operator
LDL:: Low-density lipoprotein
LY:: Lymphocyte
MCHC:: Mean corpuscular hemoglobin concentration
MCH:: Mean corpuscular hemoglobin
MCV:: Mean corpuscular volume
ML:: Machine learning
MO:: Monocyte
MO:: Monocyte
MPV:: Mean platelet volume
NB:: Naive Bayes
NE:: Neutrophil
PDW:: Platelet distribution width
PPV:: Positive predictive value
RBC:: Red blood cell
RDW:: Red cell distribution width
RF:: Random forest
ROC:: Receiver operating characteristic
SVM:: Support vector machine
TBIL:: Total bilirubin
TC:: Total cholesterol
TG:: Triglyceride
TN:: True negatives
TP:: True positives
WBC:: White blood cell
WHO:: World Health Organization
XGBoost:: eXtreme Gradient Boosting

References

The Lancet Global H. Amplifying the global issue of hearing loss. Lancet Glob Health. 2022;10(10):e1360. https://doi.org/10.1016/s2214-109x(22)00390-4.
Article CAS Google Scholar
Jiang CY, Han K, Yang F, Yin SY, Zhang L, Liang BY, et al. Global, regional, and national prevalence of hearing loss from 1990 to 2019: a trend and health inequality analyses based on the global burden of disease study 2019. Ageing Res Rev. 2023;92:102124. https://doi.org/10.1016/j.arr.2023.102124.
Article PubMed Google Scholar
Croll PH, Voortman T, Vernooij MW, Baatenburg de Jong RJ, Lin FR, Rivadeneira F, et al. The association between obesity, diet quality and hearing loss in older adults. Aging (Albany NY). 2019;11(1):48–62. https://doi.org/10.18632/aging.101717.
Article PubMed Google Scholar
Ye X, He P. Direct costs attributable to hearing loss in China: based on an econometric model. Ear Hear. 2023;44(2):330–7. https://doi.org/10.1097/aud.0000000000001284.
Article PubMed Google Scholar
Ge J, Geng S, Gao Y, Ren G, Sun X, Jiang H. Construction and effect of the three-level and two-stage screening mode for age-related hearing loss: a study based on the community in Shanghai. China Clin Interv Aging. 2023;18:1309–20. https://doi.org/10.2147/cia.S423822.
Article PubMed Google Scholar
Long M, Zheng XY, Bu XK, Ma FR, Gong SS, Gong ZZ. Report on hearing health in CHINA(2021). Social Sciences Literature Press. 2021.
Cunningham LL, Tucci DL. Hearing loss in adults. N Engl J Med. 2017;377(25):2465–73. https://doi.org/10.1056/NEJMra1616601.
Article PubMed PubMed Central Google Scholar
Vlaming MS, MacKinnon RC, Jansen M, Moore DR. Automated screening for high-frequency hearing loss. Ear Hear. 2014;35(6):667–79. https://doi.org/10.1097/aud.0000000000000073.
Article PubMed PubMed Central Google Scholar
Li P, Pang K, Zhang R, Zhang L, Xie H. Prevalence and risk factors of hearing loss among the middle-aged and older population in China: a systematic review and meta-analysis. Eur Arch Otorhinolaryngol. 2023;280(11):4723–37. https://doi.org/10.1007/s00405-023-08109-3.
Article PubMed Google Scholar
Tsimpida D, Kontopantelis E, Ashcroft D, Panagioti M. Socioeconomic and lifestyle factors associated with hearing loss in older adults: a cross-sectional study of the English Longitudinal Study of Ageing (ELSA). BMJ Open. 2019;9(9):e031030. https://doi.org/10.1136/bmjopen-2019-031030.
Article PubMed PubMed Central Google Scholar
Baiduc RR, Sun JW, Berry CM, Anderson M, Vance EA. Relationship of cardiovascular disease risk and hearing loss in a clinical population. Sci Rep. 2023;13(1):1642. https://doi.org/10.1038/s41598-023-28599-9.
Article CAS PubMed PubMed Central Google Scholar
Jung DJ, Do JY, Cho KH, Kim AY, Kang SH. Association between triglyceride/high-density lipoprotein ratio and hearing impairment in a Korean population. Postgrad Med. 2017;129(8):943–8. https://doi.org/10.1080/00325481.2017.1381538.
Article PubMed Google Scholar
Verschuur CA, Dowell A, Syddall HE, Ntani G, Simmonds SJ, Baylis D, et al. Markers of inflammatory status are associated with hearing threshold in older people: findings from the Hertfordshire Ageing Study. Age Ageing. 2012;41(1):92–7. https://doi.org/10.1093/ageing/afr140.
Article PubMed Google Scholar
Nonoyama H, Tanigawa T, Shibata R, Nakao Y, Horibe Y, Katahira N, et al. Red blood cell distribution width predicts prognosis in idiopathic sudden sensorineural hearing loss. Acta Otolaryngol. 2016;136(11):1137–40. https://doi.org/10.1080/00016489.2016.1195919.
Article CAS PubMed Google Scholar
Bing D, Ying J, Miao J, Lan L, Wang D, Zhao L, et al. Predicting the hearing outcome in sudden sensorineural hearing loss via machine learning models. Clin Otolaryngol. 2018;43(3):868–74. https://doi.org/10.1111/coa.13068.
Article CAS PubMed Google Scholar
You S, Han W, Kim S, Maeng S, Seo YJ. Reliability and validity of self-screening tool for hearing loss in older adults. Clin Interv Aging. 2020;15:75–82. https://doi.org/10.2147/cia.S238053.
Article PubMed PubMed Central Google Scholar
Wang K, Wei W, Shi J, Qi B, Zhu Z, Li Z. Diagnostic accuracy of mobile health-based audiometry for the screening of hearing loss in adults: a systematic review and meta-analysis. Telemed J E Health. 2023;29(10):1433–45. https://doi.org/10.1089/tmj.2022.0427.
Article PubMed Google Scholar
Heo J, Yoon JG, Park H, Kim YD, Nam HS, Heo JH. Machine learning-based model for prediction of outcomes in acute stroke. Stroke. 2019;50(5):1263–5. https://doi.org/10.1161/strokeaha.118.024293.
Article PubMed Google Scholar
Chen F, Cao Z, Grais EM, Zhao F. Contributions and limitations of using machine learning to predict noise-induced hearing loss. Int Arch Occup Environ Health. 2021;94(5):1097–111. https://doi.org/10.1007/s00420-020-01648-w.
Article PubMed PubMed Central Google Scholar
Zhang M, Bi Z, Fu X, Wang J, Ruan Q, Zhao C, et al. A parsimonious approach for screening moderate-to-profound hearing loss in a community-dwelling geriatric population based on a decision tree analysis. BMC Geriatr. 2019;19(1):214. https://doi.org/10.1186/s12877-019-1232-x.
Article PubMed PubMed Central Google Scholar
Gathman TJ, Choi JS, Vasdev RMS, Schoephoerster JA, Adams ME. Machine learning prediction of objective hearing loss with demographics, clinical factors, and subjective hearing status. Otolaryngol Head Neck Surg. 2023;169(3):504–13. https://doi.org/10.1002/ohn.288.
Article PubMed Google Scholar
Choi YH, Park SK. Environmental exposures to lead, mercury, and cadmium and hearing loss in adults and adolescents: KNHANES 2010–2012. Environ Health Perspect. 2017;125(6):067003. https://doi.org/10.1289/ehp565.
Article CAS PubMed PubMed Central Google Scholar
Zhang J, Li C, Yin S, Wang Y, Zhou Y, Wang S, et al. Environmental exposure to organochlorine pesticides and its association with the risk of hearing loss in the Chinese adult population: a case-control study. Sci Total Environ. 2021;767:145153. https://doi.org/10.1016/j.scitotenv.2021.145153.
Article CAS PubMed Google Scholar
Wang Y, Ye C, Wang D, Li C, Wang S, Li J, et al. Construction and evaluation of a high-frequency hearing loss screening tool for community residents. Int J Environ Res Public Health. 2021;18(23):12311. https://doi.org/10.3390/ijerph182312311.
Article PubMed PubMed Central Google Scholar
Wang D, Zhang H, Ma H, Zhang L, Yang L, Xu L. Hearing threshold levels and hearing loss among people in Zhejiang, China: a population-based cross-sectional study. BMJ Open. 2019;9(4):e027152. https://doi.org/10.1136/bmjopen-2018-027152.
Article PubMed PubMed Central Google Scholar
Zhang Z. Multiple imputation with multivariate imputation by chained equation (MICE) package. Ann Transl Med. 2016;4(2):30. https://doi.org/10.3978/j.issn.2305-5839.2015.12.63.
Article PubMed PubMed Central Google Scholar
Nuryunarsih D, Herawati L, Badi’ah A, Donsu JDT, Okatiranti. Predicting changes in systolic and diastolic blood pressure of hypertensive patients in indonesia using machine learning. Curr Hypertens Rep. 2023;25(11):377–83. https://doi.org/10.1007/s11906-023-01261-5.
Article PubMed PubMed Central Google Scholar
Liu C, Yang H, Feng Y, Liu C, Rui F, Cao Y, et al. A K-nearest neighbor model to predict early recurrence of hepatocellular carcinoma after resection. J Clin Transl Hepatol. 2022;10(4):600–7. https://doi.org/10.14218/jcth.2021.00348.
Article PubMed PubMed Central Google Scholar
Chiong R, Fan Z, Hu Z, Chiong F. Using an improved relative error support vector machine for body fat prediction. Comput Methods Programs Biomed. 2021;198:105749. https://doi.org/10.1016/j.cmpb.2020.105749.
Article PubMed Google Scholar
Ubels J, Schaefers T, Punt C, Guchelaar HJ, de Ridder J. RAINFOREST: a random forest approach to predict treatment benefit in data from (failed) clinical drug trials. Bioinformatics. 2020;36(Suppl_2):601–9. https://doi.org/10.1093/bioinformatics/btaa799.
Article CAS Google Scholar
Hou N, Li M, He L, Xie B, Wang L, Zhang R, et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J Transl Med. 2020;18(1):462. https://doi.org/10.1186/s12967-020-02620-5.
Article CAS PubMed PubMed Central Google Scholar
Liu Y, Lei P, Wang Y, Zhou J, Zhang J, Cao H. Boosting framework via clinical monitoring data to predict the depth of anesthesia. Technol Health Care. 2022;30(S1):493–500. https://doi.org/10.3233/thc-thc228045.
Article PubMed PubMed Central Google Scholar
Kang J, Choi YJ, Kim IK, Lee HS, Kim H, Baik SH, et al. LASSO-based machine learning algorithm for prediction of lymph node metastasis in T1 colorectal cancer. Cancer Res Treat. 2021;53(3):773–83. https://doi.org/10.4143/crt.2020.974.
Article CAS PubMed Google Scholar
El-Hasnony IM, Elzeki OM, Alshehri A, Salem H. Multi-label active learning-based machine learning model for heart disease prediction. Sensors (Basel). 2022;22(3):1184. https://doi.org/10.3390/s22031184.
Article PubMed Google Scholar
White N, Parsons R, Collins G, Barnett A. Evidence of questionable research practices in clinical prediction models. BMC Med. 2023;21(1):339. https://doi.org/10.1186/s12916-023-03048-6.
Article PubMed PubMed Central Google Scholar
Wu Y, Hu H, Cai J, Chen R, Zuo X, Cheng H, et al. A prediction nomogram for the 3-year risk of incident diabetes among Chinese adults. Sci Rep. 2020;10(1):21716. https://doi.org/10.1038/s41598-020-78716-1.
Article CAS PubMed PubMed Central Google Scholar
Wilson BS, Tucci DL, Merson MH, O’Donoghue GM. Global hearing health care: new findings and perspectives. Lancet. 2017;390(10111):2503–15. https://doi.org/10.1016/s0140-6736(17)31073-5.
Article PubMed Google Scholar
Portnuff CD. Reducing the risk of music-induced hearing loss from overuse of portable listening devices: understanding the problems and establishing strategies for improving awareness in adolescents. Adolesc Health Med Ther. 2016;7:27–35. https://doi.org/10.2147/ahmt.S74103.
Article PubMed PubMed Central Google Scholar
McEligot AJ, Poynor V, Sharma R, Panangadan A. Logistic LASSO regression for dietary intakes and breast cancer. Nutrients. 2020;12(9):2652. https://doi.org/10.3390/nu12092652.
Article CAS PubMed PubMed Central Google Scholar
Shao M, Xiong G, Xiang G, Xu S, Zhang L. Correlation between serum lipid and prognosis of idiopathic sudden sensorineural hearing loss: a prospective cohort study. Ann Transl Med. 2021;9(8):676. https://doi.org/10.21037/atm-21-907.
Article CAS PubMed PubMed Central Google Scholar
Tripolino C, Irace C, Carallo C, Scavelli FB, Gnasso A. Body fat and blood rheology: Evaluation of the association between different adiposity indices and blood viscosity. Clin Hemorheol Microcirc. 2017;65(3):241–8. https://doi.org/10.3233/ch-16172.
Article CAS PubMed Google Scholar
Jung DJ, Yoo MH, Lee KY. Red cell distribution width is associated with hearing impairment in chronic kidney disease population: a retrospective cross-sectional study. Eur Arch Otorhinolaryngol. 2020;277(7):1925–30. https://doi.org/10.1007/s00405-020-05912-0.
Article PubMed Google Scholar
Chou CF, Beckles GL, Zhang X, Saaddine JB. Association of socioeconomic position with sensory impairment among US working-aged adults. Am J Public Health. 2015;105(6):1262–8. https://doi.org/10.2105/ajph.2014.302475.
Article PubMed PubMed Central Google Scholar
Lin BM, Wang M, Stankovic KM, Eavey R, McKenna MJ, Curhan GC, et al. Cigarette smoking, smoking cessation, and risk of hearing loss in women. Am J Med. 2020;133(10):1180–6. https://doi.org/10.1016/j.amjmed.2020.03.049.
Article PubMed PubMed Central Google Scholar
Qian P, Zhao Z, Liu S, Xin J, Liu Y, Hao Y, et al. Alcohol as a risk factor for hearing loss: a systematic review and meta-analysis. PLoS ONE. 2023;18(1):e0280641. https://doi.org/10.1371/journal.pone.0280641.
Article CAS PubMed PubMed Central Google Scholar
Dawes P, Cruickshanks KJ, Moore DR, Edmondson-Jones M, McCormack A, Fortnum H, et al. Cigarette smoking, passive smoking, alcohol consumption, and hearing loss. J Assoc Res Otolaryngol. 2014;15(4):663–74. https://doi.org/10.1007/s10162-014-0461-0.
Article PubMed PubMed Central Google Scholar
Engdahl B, Aarhus L. Personal music players and hearing loss: the HUNT cohort study. Trends Hear. 2021;25:23312165211015880. https://doi.org/10.1177/23312165211015881.
Article PubMed PubMed Central Google Scholar
Sun R, Shang W, Cao Y, Lan Y. A risk model and nomogram for high-frequency hearing loss in noise-exposed workers. BMC Public Health. 2021;21(1):747. https://doi.org/10.1186/s12889-021-10730-y.
Article PubMed PubMed Central Google Scholar
Cassarly C, Matthews LJ, Simpson AN, Dubno JR. The revised hearing handicap inventory and screening tool based on psychometric reevaluation of the hearing handicap inventories for the elderly and adults. Ear Hear. 2020;41(1):95–105. https://doi.org/10.1097/aud.0000000000000746.
Article PubMed PubMed Central Google Scholar
Zhao Y, Li J, Zhang M, Lu Y, Xie H, Tian Y, et al. Machine learning models for the hearing impairment prediction in workers exposed to complex industrial noise: a pilot study. Ear Hear. 2019;40(3):690–9. https://doi.org/10.1097/aud.0000000000000649.
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We wish to thank the all the community healthcare centers for their support of this program. We would also like to thank the editors and reviewers for their precious time and constructive comments.

Funding

This research was funded by National Natural Science Foundation of China (grant number: 72074066) and Science and Technology Department of Zhejiang Province (grant number: 215C03050).

Author information

Yi Wang and Xinmeng Yao are co-first authors/contributed equally to this work.

Authors and Affiliations

Department of Epidemiology and Biostatistics, School of Public Health, Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China
Yi Wang, Xinmeng Yao & Liangwen Xu
Hangzhou Center for Disease Control and Prevention, Hangzhou, Zhejiang, China
Yi Wang
Department of Health Management, School of Public Health, Hangzhou Normal University, Hangzhou, Zhejiang, China
Dahui Wang & Chengyin Ye

Authors

Yi Wang
View author publications
You can also search for this author in PubMed Google Scholar
Xinmeng Yao
View author publications
You can also search for this author in PubMed Google Scholar
Dahui Wang
View author publications
You can also search for this author in PubMed Google Scholar
Chengyin Ye
View author publications
You can also search for this author in PubMed Google Scholar
Liangwen Xu
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceptualization, Y.W. (YiWang), C.Y. (Chengyin Ye) and L.X. (Liangwen Xu); Data curation, X.Y. (Xinmeng Yao), D.W. (Dahui Wang) and C.Y.; Formal analysis, Y.W., X.Y. and C.Y.; Writing-original draft, Y.W. and X.Y.; Writing-review and editing; Y.W., X.Y. and C.Y.; Supervision, L.X. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Chengyin Ye or Liangwen Xu.

Ethics declarations

Ethics approval and consent to participate

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of Hangzhou Normal University Ethics Committee (grant number: 2017LL107). Study participants were informed about the objective of the study and written informed consent was obtained from each respondent. All the information obtained was and will be held confidential and it was and will be used only for this study's purpose.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Supplementary Material 3.

Supplementary Material 4.

Supplementary Material 5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Wang, Y., Yao, X., Wang, D. et al. A machine learning screening model for identifying the risk of high-frequency hearing impairment in a general population. BMC Public Health 24, 1160 (2024). https://doi.org/10.1186/s12889-024-18636-1

Download citation

Received: 22 July 2023
Accepted: 17 April 2024
Published: 25 April 2024
DOI: https://doi.org/10.1186/s12889-024-18636-1

A machine learning screening model for identifying the risk of high-frequency hearing impairment in a general population

Abstract

Background

Methods

Results

Conclusions

Introduction

Materials and methods

Study design and dataset

Audiometric assessment

Questionnaire survey

Blood sample collection and laboratory testing methods

Candidate indicators

Statistical analyses

Data pre-processing and feature selection

Model development and validation

Results

Subject characteristics and prevalence of HFHI

Performance evaluation of the HFHI screening models

The HFHI screening nomogram and a case study for model interpretation

Characteristics differences in risk stratification

Differences in resident lifestyles according to risk stratification

Differences in the resident blood indices according to risk stratification

Discussion

Implications of the study

Limitations

Conclusion

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Supplementary Material 3.

Supplementary Material 4.

Supplementary Material 5.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Public Health

Contact us