While universal hepatitis B virus (HBV) testing is useful in high-prevalence areas, it is less cost-effective in low-prevalence areas like the United States.
However, the use of machine learning technology could help identify those at high risk.
A team led by Nathan S. Ramrakhiani, Division of Gastroenterology and Hepatology, Stanford University Medical Center, identified patients with HBV using newly developed logistic regression and machine learning models that leverage demographic data from a population-based dataset.
Although HBV affects more than 290 million people worldwide, only 10% of individuals with chronic HBV infection have been officially diagnosed. Even though guidelines have called for screening people at high risk for the past 2 decades, cases remain underreported.
In the study, investigators identified participants with data on hepatitis B surface antigen (HBsAg), year of birth, gender, race and ethnicity, and place of birth using 10 cycles of the National Health and Nutrition Examination Survey (NHANES) from 1999 to 2018. The median year of birth for the study population was 1973.
The participants were divided into 2 distinct cohorts: training (cycles 2, 3, 5, 6, 8, 10; n = 39,119) and validation (cycles 1, 4, 7, 9; n = 21,569).
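The cycle-based split described above can be sketched in a few lines of Python. This is only an illustration: the record layout (a dict with an integer `cycle` key numbered 1 to 10) and the function name `split_by_cycle` are assumptions, not the investigators' code. Only the cycle assignments come from the study description.

```python
# Cycle numbers are taken from the study description; everything else
# (record layout, names) is a hypothetical illustration.
TRAINING_CYCLES = {2, 3, 5, 6, 8, 10}
VALIDATION_CYCLES = {1, 4, 7, 9}


def split_by_cycle(records):
    """Partition participant records into training and validation cohorts.

    Each record is assumed to be a dict with an integer "cycle" key (1-10).
    """
    training, validation = [], []
    for record in records:
        cycle = record["cycle"]
        if cycle in TRAINING_CYCLES:
            training.append(record)
        elif cycle in VALIDATION_CYCLES:
            validation.append(record)
        else:
            raise ValueError(f"unexpected survey cycle: {cycle}")
    return training, validation
```

Splitting by whole survey cycles, rather than by individual participants, means the validation cohort comes from different survey waves than the training cohort.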
Next, the researchers developed and tested the new logistic regression and machine learning models.
The primary outcome of the logistic regression model was HBV infection, defined as positive HBsAg, with demographic variables as the primary predictors. Univariate and multivariate logistic regressions were performed on the training cohort.
Comparing the 2 models
In the machine learning model, researchers determined the demographic factors and place of birth associated with the primary outcome. The model used the training cohort with sub-sampling of controls and 10-fold cross-validation to determine the test characteristics of the model.
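The two resampling steps mentioned above, sub-sampling of controls and 10-fold cross-validation, might look roughly like the following standard-library sketch. The article does not state the control-to-case ratio or the learning algorithm, so `controls_per_case`, the seeds, and both function names are hypothetical.

```python
import random


def subsample_controls(labels, controls_per_case, seed=0):
    """Return sorted indices of all cases plus a random subset of controls.

    labels: list of 0/1 values (1 = HBsAg positive).
    controls_per_case: hypothetical ratio; the article does not give one.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(controls), controls_per_case * len(cases))
    return sorted(cases + rng.sample(controls, n_keep))


def k_fold_splits(indices, k=10, seed=0):
    """Yield (train, test) index lists; each index is held out exactly once."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [shuffled[i::k] for i in range(k)]
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]
```

Sub-sampling controls is a common way to handle a rare outcome such as HBsAg positivity in a low-prevalence population. Cross-validation then estimates test characteristics on data the model did not see during fitting.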
Using multivariate logistic regression, investigators identified several factors that were significantly associated with HBV infection, including year of birth 1991 or later (aOR, 0.28; 95% CI, 0.14-0.55) and Black and Asian/Other vs White race (aOR, 5.23 and 9.13; 95% CI, 3.10-8.83 and 5.23-15.96, respectively).
Machine learning model bests logistic regression
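For readers less familiar with adjusted odds ratios: an aOR is the exponential of a logistic regression coefficient, so a value below 1 (such as 0.28 for year of birth 1991 or later) indicates lower odds of infection and a value above 1 indicates higher odds. The sketch below shows that conversion and, assuming the intervals are standard Wald intervals (an assumption; the article does not say how they were computed), recovers the standard error from a reported interval. The function names are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient to (odds ratio, lower, upper)."""
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))


def se_from_ci(lower, upper, z=1.96):
    """Recover the coefficient's standard error from a reported Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Round trip with the reported year-of-birth estimate (aOR 0.28; 95% CI 0.14-0.55).
beta = math.log(0.28)
se = se_from_ci(0.14, 0.55)
odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
```

Because the interval is symmetric on the log scale, the reported aOR should sit near the geometric mean of its bounds, which is a quick consistency check on published estimates.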
In the end, the machine learning model was superior, with a higher area under the receiver operating characteristic curve (AUROC) in the validation cohort (0.83 vs 0.75).
In the training cohort, AUROC was significantly higher for the machine learning model, at 0.90 (95% CI, 0.88-0.92), than for the logistic regression model, at 0.81 (95% CI, 0.79-0.84).
In the validation cohort, AUROC was also higher for the machine learning model (0.83; 95% CI, 0.78-0.88) than for the logistic regression model (0.75; 95% CI, 0.70-0.80).
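AUROC can be read as the probability that a model assigns a higher risk score to a randomly chosen infected participant than to a randomly chosen uninfected one. A score of 0.5 is no better than chance and 1.0 is perfect discrimination. The minimal sketch below computes it directly from that definition. It is an illustration only, not the study's code, and the function name `auroc` is an assumption.

```python
def auroc(labels, scores):
    """AUROC as the share of case/control pairs ranked correctly.

    labels: 0/1 outcomes (1 = infected); scores: predicted risk per participant.
    Tied pairs count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in cohort size, which is fine for illustration. For cohorts with tens of thousands of participants, a rank-based calculation would be the usual choice.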
“Our machine learning model has consistently outperformed the logistic regression model, laying the groundwork for what could potentially be a practical and cost-effective HBV screening strategy for low-prevalence regions with more ‘imported’ HBV infection, like the United States or Western Europe,” the authors wrote. “We are also advocating for additional risk-based screening for populations at specific risk of exposure according to professional society and CDC guidelines.”
The study, “Optimizing hepatitis B virus screening in the United States using a simple demographic-based model,” was published online in Hepatology.