Prognostic scores in primary biliary cholangitis patients with advanced disease

2023-10-21 01:01

Juan Feng, Jia-Min Xu, Hai-Yan Fu, Nan Xie, Wei-Min Bao, Ying-Mei Tang


Key Words: Primary biliary cholangitis; Prognostic value; Liver transplantation; Cholangitis; Mayo score


Primary biliary cholangitis (PBC) is a chronic progressive liver disease that causes the gradual destruction of the small intrahepatic bile ducts[1]. Patients with preclinical PBC may test positive for the disease-specific anti-mitochondrial antibody (AMA) yet remain asymptomatic with normal liver function for over a decade. Approximately 50%-60% of patients are asymptomatic at diagnosis[2]. Ursodeoxycholic acid (UDCA) is the first-line treatment and increases long-term survival. However, approximately 40% of patients with PBC respond incompletely, and these patients progress rapidly to the middle and late stages of disease even after early diagnosis and treatment[3]. Because PBC is chronic and progressive, patients in the middle and late stages warrant particular attention.

Over the past 20 years, several risk-scoring models for PBC have been proposed as tools to estimate the risk of adverse outcomes and to guide management[4]. The most influential scores are GLOBE and UK-PBC, which were developed for early PBC patients. Recent studies reported that these scores accurately predict outcomes in UDCA-treated patients at various disease stages[5-7]. However, their application to middle and late stage PBC patients remains to be studied. The Mayo score was developed to determine the timing of liver transplantation (LT) in PBC and is now used as a model for predicting PBC survival[8-10]. The aspartate aminotransferase-to-platelet ratio index (APRI) and fibrosis-4 index (FIB-4) are non-invasive fibrosis scores based on biochemical indicators[11]. All of their parameters, including aminotransferases, platelets, and age, are associated with PBC outcomes[12,13]. The albumin-bilirubin (ALBI) score was initially developed to assess liver function in hepatocellular carcinoma patients[14]. Its two components, total bilirubin (TBil) and albumin, are associated with PBC progression, and some studies have used them to predict PBC outcomes[15,16]. Few studies have compared the efficacy of the various prognostic scoring systems in PBC patients, especially in patients in advanced stages[17-19].

Some patients with decompensated cirrhosis return to a clinical state consistent with compensated cirrhosis when they receive appropriate etiological and symptomatic supportive treatment, a course known as the “recompensation phenomenon”[20]. Portal hypertension and systemic inflammation can drive the progression of decompensated cirrhosis. Recent studies have examined the mechanism and clinical feasibility of reversing decompensation and achieving recompensation in cirrhosis[21-23]. These findings have prompted updates to the concept of stage evaluation and to the systems used to estimate outcomes in decompensated cirrhosis.

The present study enrolled patients who were diagnosed with PBC during hospitalization and whose disease was in the middle or late stage. We compared the effectiveness of various prognostic scoring systems to optimize monitoring, disease evaluation, and timely treatment for advanced stage PBC.


Population and study design

Patient data were derived from nine hospitals in Yunnan Province, China. Patients were included if PBC (ICD-10 code K74.3) was the diagnosis recorded on the first page of the medical record and they were treated with UDCA after diagnosis. The diagnostic criteria were as follows: elevated serum alkaline phosphatase; AMA positivity, or, in AMA-negative patients, the presence of PBC-specific autoantibodies such as sp100 and gp210; and histological evidence of non-suppurative destructive cholangitis and interlobular bile duct injury. PBC can be diagnosed when two criteria are met, and the diagnostic criteria were consistent with the 2018 American Association for the Study of Liver Diseases guidelines[24].

Patients were excluded if they were followed up for less than 6 months or if the dates of treatment initiation or major clinical events were unknown.

Data collection

Clinical data were obtained from 397 PBC patients diagnosed during hospitalization from May 1, 2015 to December 31, 2021. Clinical data collected from these patients included age, sex, ethnicity, date of PBC diagnosis, past medical and personal histories, clinical manifestations, liver disease complications, liver biopsy results, imaging results, gastroscopy results, and laboratory values (immunological tests, serum biochemistries, complete blood counts, and coagulation times). UDCA (13-15 mg/kg/day) was prescribed after diagnosis. Current guidelines and reports from centers worldwide state that biochemical improvement after 1 year of UDCA treatment accurately predicts long-term outcomes and survival[24-26]; therefore, we collected laboratory results at the 1-year follow-up for prognostic assessment.

All patients were followed up by telephone until December 31, 2021. Endpoint events were liver-related death or LT. Patients without an endpoint event were classified as transplant-free survivors. The disease stage was classified according to the patient’s clinical characteristics and examination data. A cirrhosis diagnosis was based on liver imaging (B-mode ultrasound, computed tomography), liver biopsy, or liver transient elastography documented in the medical records. The diagnostic standard was derived from the 2020 guidelines[27]. We divided the patients into three groups: no cirrhosis, compensated cirrhosis, and decompensated cirrhosis.

Ethical considerations

This study was performed in accordance with the Declaration of Helsinki. The Ethics Committee of the Second Affiliated Hospital of Kunming Medical University approved the study (approval No. YJ-2022-14). Each participating center approved the protocol. We analyzed all data anonymously.

Statistical analyses

The baseline time was the start of UDCA treatment, and the primary endpoint was a composite of death or LT. Patients not meeting this endpoint during follow-up were censored at their final follow-up visit. The formulas of the prognostic scores can be found in the Supplementary material. These scores were computed at baseline and after 1 year of UDCA treatment. Descriptive statistics of these risk scores were used to compare patients who did and did not meet the composite endpoint.
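The score formulas themselves are given only in the Supplementary material, which is not reproduced here. As a minimal illustration, the three non-PBC-specific scores can be sketched from their commonly published definitions. The function names, the units in the parameter names, and the default aspartate aminotransferase upper limit of normal (40 U/L) are assumptions of this sketch, not values taken from the study. The GLOBE, UK-PBC, and Mayo scores are omitted because their coefficients are not stated in the text.

```python
import math

def apri(ast_u_l, platelets_10e9_l, ast_uln_u_l=40.0):
    """APRI: (AST / upper limit of normal) / platelet count * 100.

    The default ULN of 40 U/L is an assumption; laboratories differ.
    """
    return (ast_u_l / ast_uln_u_l) / platelets_10e9_l * 100.0

def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4: age * AST / (platelet count * sqrt(ALT))."""
    return age_years * ast_u_l / (platelets_10e9_l * math.sqrt(alt_u_l))

def albi(bilirubin_umol_l, albumin_g_l):
    """ALBI: 0.66 * log10(bilirubin, umol/L) - 0.085 * albumin (g/L)."""
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l
```

Higher APRI and FIB-4 values indicate more fibrosis, and a higher (less negative) ALBI value indicates poorer liver function. Unit conventions matter: bilirubin reported in mg/dL or albumin in g/dL must be converted before calling `albi`.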

Predictive validity was based on model discrimination and calibration. Cox proportional hazard regression analyses were performed to assess the discriminative performance of the risk scoring models at baseline and after UDCA treatment for 1 year. The overall discriminative performance of these models was calculated using the concordance (C)-statistic. The value of combining these predictive models to assess the risk of death or LT, based on data collected following UDCA treatment for 1 year, was further evaluated using Cox regression analyses. C-statistic values were also assessed for various combinations of risk prediction models.
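To make the C-statistic concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' code (the analyses were run in R), the function name is hypothetical, and for simplicity pairs with tied follow-up times are skipped.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an event and a
    shorter follow-up time than subject j. The pair is concordant when
    subject i also has the higher risk score; tied scores count as 0.5.
    Pairs with equal follow-up times are ignored in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the C-statistic values in this study are reported.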

A graphical approach was used to assess model calibration by comparing the transplant-free survival predicted by these risk prediction models after 1 year of UDCA treatment with the observed Kaplan-Meier estimates.
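The observed side of such a calibration comparison is the Kaplan-Meier (product-limit) estimate. A minimal sketch is shown below, with hypothetical function names; the model-predicted survival would come from each score's own published formula, which is not reproduced here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve

def survival_at(curve, t):
    """Step-function lookup: the last estimate at or before time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s
```

Given a model's predicted survival probability at year `t`, the calibration deviation at that horizon is simply the predicted value minus `survival_at(curve, t)`; a positive difference means the model overestimates survival.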

All analyses were performed using R v 4.2.1. To account for missing values, predictive mean matching from the mice package was applied to impute missing laboratory results using multiple imputation. Continuous data were expressed as the median and interquartile range. P < 0.05 was the threshold of significance.
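The idea behind predictive mean matching can be sketched in a few lines. This is a simplified, single-predictor illustration in Python rather than the mice implementation used in the study: the function name and arguments are hypothetical, the regression is ordinary least squares without the parameter draws that mice performs, and a single call yields one imputed dataset, whereas multiple imputation repeats the procedure with different random draws.

```python
import random

def pmm_impute(x, y, k=3, seed=0):
    """Fill None entries of y by predictive mean matching on one predictor x.

    Assumes x is fully observed and not constant among the observed cases.
    """
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(observed)
    mean_x = sum(xi for xi, _ in observed) / n
    mean_y = sum(yi for _, yi in observed) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(xi):
        return intercept + slope * xi

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = predict(xi)
        # Donors: the k observed cases whose predicted values are closest.
        donors = sorted(observed, key=lambda d: abs(predict(d[0]) - target))[:k]
        completed.append(rng.choice(donors)[1])  # borrow a real observed value
    return completed
```

Because every imputed value is copied from an observed donor, the method cannot produce implausible results such as a negative platelet count, which is one reason it is often preferred for skewed laboratory variables.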



Study population characteristics

We enrolled 397 PBC patients who were initially diagnosed while hospitalized and underwent UDCA treatment. The mean age was 56.84 (standard deviation 11.2) years, and 343 (86.4%) patients were female. The specific staging, clinical, and biochemical characteristics at the beginning of UDCA treatment are displayed in Table 1.

The patients were followed for 6.4 ± 1.4 years, with 3 patients lost to follow-up at the final follow-up. During follow-up, 86 patients experienced a clinical endpoint: 4 patients underwent LT, and 82 patients died. The cause of death was liver-related in 79/82 (96.3%) patients. The 3-, 5-, and 7-year transplant-free survival rates were 94.0%, 86.9%, and 78.3%, respectively (Figure 1). Advanced stages correlated with lower survival (P < 0.001).

At the start of UDCA therapy, 80 (20.2%) patients had no cirrhosis, 43 (10.9%) patients had compensated cirrhosis, and 274 (69.0%) patients had decompensated cirrhosis.

Table 1 Baseline cohort characteristics

Discriminative performance of different prognostic risk scoring models

The overall discriminative performance of the Mayo, APRI, FIB-4, and ALBI models in predicting death or LT was assessed at baseline using C-statistic values. GLOBE and UK-PBC scores were based on values measured at baseline and after UDCA treatment for 1 year. The baseline C-statistic values for the Mayo and ALBI scores were 0.702 [95% confidence interval (CI): 0.653-0.751] and 0.705 (95%CI: 0.656-0.755), respectively, while the FIB-4 and APRI scores showed poorer performance (Table 2).

Following UDCA treatment for 1 year, the C-statistic values for Mayo, GLOBE, UK-PBC, and ALBI scores were 0.740 (95%CI: 0.690-0.791), 0.731 (95%CI: 0.681-0.782), 0.727 (95%CI: 0.678-0.776), and 0.725 (95%CI: 0.672-0.778), respectively. In contrast, the FIB-4 score showed poorer discriminatory power, and the APRI score showed virtually no discriminatory performance (Table 2; Supplementary Figure 1).

Table 2 Discriminative performance of the various risk prediction scores calculated at baseline and after 1 year of ursodeoxycholic acid therapy

Figure 1 Kaplan-Meier estimates of patient survival from baseline. Transplant-free survival (freedom from death or LT) of primary biliary cholangitis patients with compensated and decompensated cirrhosis at baseline. LT: Liver transplantation.

There were no significant differences between the GLOBE, UK-PBC, Mayo, and ALBI scores concerning predictive performance at the start of UDCA treatment or 1 year after (Supplementary Table 1).

Analysis of the combined performance of different risk prediction scores

Cox regression analyses were used to evaluate the value of combining predictive models when assessing the risk of death or LT based on data collected following UDCA treatment for 1 year. In univariate Cox regression analyses, the UK-PBC, ALBI, GLOBE, and Mayo scores were all significantly associated with death or LT (P < 0.001) (Table 3). The hazard ratio of UK-PBC was the largest (hazard ratio: 6.046, 95%CI: 3.479-10.510). In multivariate analysis, only the GLOBE score remained significantly associated with death or LT (Table 3).
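For readers less familiar with Cox output, a hazard ratio and its confidence interval are the exponentiated regression coefficient and its Wald limits. The sketch below works backwards from the reported UK-PBC interval to the implied coefficient and standard error, assuming the interval is a standard Wald interval that is symmetric on the log scale; the function names are hypothetical, and the exact method used by the authors is not stated.

```python
import math

def hazard_ratio_ci(beta, se, z=1.96):
    """Hazard ratio and 95% CI from a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def implied_log_scale(lower, upper, z=1.96):
    """Recover (beta, se) implied by a CI that is symmetric on the log scale."""
    beta = (math.log(lower) + math.log(upper)) / 2.0
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se

# Reported UK-PBC interval from the univariate analysis: 3.479 to 10.510.
beta, se = implied_log_scale(3.479, 10.510)
hr, lower, upper = hazard_ratio_ci(beta, se)
```

Under this assumption the back-calculated hazard ratio agrees closely with the reported 6.046, which serves as a quick consistency check on the published interval.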

Adding the UK-PBC, APRI, FIB-4, Mayo, and ALBI scores to the GLOBE score did not significantly improve the discriminative performance, with a C-statistic value that remained at 0.73 (Supplementary Table 2). The C-statistics of the individual scores before combination are displayed in Table 2.

Combining the UK-PBC score with the APRI, FIB-4, and ALBI scores did not cause a significant increase in discrimination performance. The C-statistic remained at 0.72 (Supplementary Table 2); only with the addition of the Mayo score did the C-statistic increase (+0.02).

Table 3 Multivariable analyses of risk prediction scores after 1 year of ursodeoxycholic acid therapy

The largest increases in C-statistic values were observed when the Mayo score was combined with the others. With the Mayo score added, the C-statistic of the APRI score increased to 0.740 (95%CI: 0.689-0.791), and that of the FIB-4 score increased to 0.741 (95%CI: 0.690-0.791) (Supplementary Table 2).

Calibration analyses of different predictive risk scores

The ALBI, GLOBE, and Mayo scores, which had superior discriminatory performance, were selected to compare predicted and observed survival (Figure 2). The UK-PBC score was omitted from the analyses because it primarily predicts liver-related death rather than transplant-free survival[28]. The three risk prediction models tended to overestimate transplant-free survival. They showed good calibration for short-term survival; the deviation from observed survival at 1 year to 3 years for ALBI, GLOBE, and Mayo was < 0.2%. After 3 years, the deviation tended to grow each year. The largest deviation was for the GLOBE score (2.0%-4.3%), and the smallest was for the Mayo score (1.0%-2.4%). Across the yearly evaluations up to 7 years, the Mayo score therefore demonstrated the best calibration.


We assessed the PBC-specific scores GLOBE, UK-PBC, and Mayo and compared them with the ALBI, APRI, and FIB-4 scores. These analyses revealed that the ALBI and Mayo scores showed adequate discriminatory performance and good predictive accuracy at baseline. The Mayo score demonstrated superior discriminatory performance and calibration both alone and combined with other risk models, suggesting that this score is the best risk prediction model for predicting liver-related death or LT in PBC patients in the advanced stage. These findings also suggested that the performance of the PBC-specific risk scores was superior to other prognostic scores for advanced PBC.

Models with a C-statistic value greater than 0.7 are considered good prognostic models. The Mayo score was the only model consistently reaching this threshold at baseline and after 1 year of UDCA treatment. The C-statistic of the Mayo score was greater after patients received UDCA for 1 year, suggesting an increase in discriminatory performance with prolonged UDCA treatment. The next most effective predictive models were the GLOBE, UK-PBC, and ALBI scores, with no significant differences among them in predicting liver-related death or LT following UDCA treatment for 1 year.

The Mayo score exhibited consistently better discriminative performance than other scores in this PBC patient cohort. The Mayo score is a traditional risk prediction model for PBC that was primarily developed to evaluate untreated patients. However, this study enrolled patients who had undergone UDCA treatment and were in an advanced stage. The Mayo score has previously been linked to transplant-free survival among patients who underwent UDCA treatment, enabling their stratification into low- and high-risk groups based on the original thresholds[29,30]. However, the reliance of this scoring model on ascites, the assessment of which can be subjective, may limit its clinical applicability.

In this study, the ascites parameter was derived from imaging examinations performed during hospitalization, which are routine for these hospitalized patients. Therefore, the assessment of ascites was relatively objective. In theory, the superior discriminatory performance of the Mayo score may be driven by ascites and prothrombin time, which are the most relevant parameters in late stage PBC; its other parameters include TBil and albumin (ALB), which also change significantly in patients with more advanced disease. Based on these characteristics, the Mayo score may be more applicable for prognosis assessment in advanced stage PBC patients. Our study supports this point, but it remains to be verified in larger populations and further studies.

The discriminatory performance of the GLOBE, UK-PBC, and ALBI scores was second to that of the Mayo score. Both the UK-PBC and GLOBE scores were developed as PBC-specific scoring systems and have previously been applied to evaluate the prognosis of early PBC patients. Our cohort consisted mainly of late stage patients, in whom these scores performed less well than the Mayo score. The ALBI score is calculated using two indicators (TBil, ALB), which are validated biomarkers associated with PBC disease progression[31-33]. The APRI and FIB-4 scores had inferior discriminatory performance in this study, although both are liver fibrosis scores based on biochemical indicators. Their poor performance may be because most patients already had cirrhosis, so the degree of liver fibrosis differed little between patients; these scores therefore appear unsuitable for predicting outcomes in advanced PBC patients.

Figure 2 Calibration analyses of the predictive accuracy of the ALBI, GLOBE, and Mayo scores calculated after ursodeoxycholic acid treatment for 1 year, over a 7-year follow-up interval.

The different combinations of prognostic models were evaluated for their ability to predict death or LT. The study results showed that GLOBE and UK-PBC were relatively stable, with little change in the C-statistic when other scores were added. Moreover, the univariate and multivariate Cox regression analyses of all predictive models also support this point. The highest C-statistic value increases were observed when the Mayo score was combined with the other scores. The results demonstrated that the GLOBE and UK-PBC score models have good stability and are suitable for prognosis assessment on their own. When the APRI and FIB-4 scores were combined with other scores, the best discriminatory performance was achieved in combination with the Mayo score.

We chose the ALBI, GLOBE, and Mayo scores for model calibration because they had superior discriminatory performance after UDCA therapy for 1 year, while the UK-PBC model was omitted because it predicts liver-related death and not transplant-free survival[28]. These scores all tended to overestimate the transplant-free survival rate, with better calibration at 1-3 years. The deviation tended to increase yearly after 3 years. In the 1-7-year interval, the deviation of the GLOBE score was the greatest, and that of the Mayo score was the smallest. The Mayo score thus showed the best model calibration. These findings suggested that the Mayo score has the best prediction performance and accuracy for advanced PBC patients.

This study has several limitations. First, we did not have a large study cohort, and the prognostic scores were compared only at baseline and 1 year later. This limitation indicates the need for verification using large sample sizes and prospective studies. Second, this was a retrospective analysis, and some of the included data were missing. We applied predictive mean matching to impute the missing values. Third, although the UK-PBC risk score was developed to predict liver-related death and not transplant-free survival (unlike the other score models), we applied the same endpoints to it in our analyses, which indicated similar discriminatory performance. Despite these limitations, the study is significant because comparisons of prognostic scores in advanced PBC patients are lacking.


The Mayo, GLOBE, UK-PBC, and ALBI scores had excellent prediction performance for death and LT. The Mayo score had the best discriminatory performance and predictive accuracy. The significance of this study is that it enables advanced PBC patients to be monitored and assessed closely in clinical practice to delay PBC progression.


Research motivation

This study was designed to compare the prognostic value of different risk scores in PBC patients with advanced disease stages.

Research objectives

To determine the best prognostic score so that the majority of PBC patients seen in clinical practice can be monitored and assessed more closely.

Research methods

The discriminatory performance of the scores was assessed with concordance statistics at baseline and after 1 year of ursodeoxycholic acid (UDCA) treatment. The combined performance of prognostic scores in estimating the risk of death or liver transplantation after 1 year of UDCA treatment was assessed using Cox regression analyses. Predictive accuracy was evaluated by comparing predicted and actual survival through Kaplan-Meier analyses.

Research results

After receiving UDCA treatment for 1 year, the score with the best discrimination performance was the Mayo score, with a concordance statistic of 0.740 (95% confidence interval: 0.690-0.791). The ALBI, GLOBE, and Mayo scores tended to overestimate transplant-free survival. Comparing 7 years of calibration results showed that the Mayo score was the best model.

Research conclusions

The Mayo, GLOBE, UK-PBC, and ALBI scores demonstrated comparable discriminating performance for advanced stage PBC. The Mayo score showed optimal discriminatory performance and excellent predictive accuracy.

Research perspectives

There is a need for verification of our results with larger sample sizes and prospective studies.


Author contributions: Feng J, Xu JM, Bao WM, and Tang YM designed the research study; Feng J, Xu JM, Fu HY, and Xie N performed the research; All authors contributed to data collection and collation; Feng J and Xu JM analyzed the data and wrote the manuscript; All authors read and approved the final manuscript.

Supported by Medicine Leading Talents of Yunnan Province, No. L-2019013; the Yunnan Wanren Project, No. YNWR-MY-2018-028; and Clinical Research Project of the Second Affiliated Hospital of Kunming Medical University, No. 2020ynlc010.

Institutional review board statement: The Ethics Committee of the Second Affiliated Hospital of Kunming Medical University approved the study (Approval No. YJ-2022-14), and the protocol was approved by each participating center.

Informed consent statement: Patients were not required to give informed consent to the study because the analysis used anonymous clinical data that were obtained after each patient agreed to treatment by written consent.

Conflict-of-interest statement: The authors declare that they have no competing interests.

Data sharing statement: Not applicable.

STROBE statement: The authors have read the STROBE Statement checklist of items, and the manuscript was prepared and revised according to the STROBE Statement checklist of items.

Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

Country/Territory of origin: China

ORCID number: Juan Feng 0009-0007-1690-832X; Ying-Mei Tang 0000-0002-0731-4198.

S-Editor: Yan JP


P-Editor: Cai YX