Notes
Article history
The research reported in this issue of the journal was funded by the HTA programme as project number 03/38/01. The contractual start date was in July 2005. The draft report began editorial review in October 2011 and was accepted for publication in November 2011. The authors have been wholly responsible for all data collection, analysis and interpretation, and for writing up their work. The HTA editors and publisher have tried to ensure the accuracy of the authors' report and would like to thank the reviewers for their constructive comments on the draft document. However, they do not accept liability for damages or losses arising from material published in this report.
Permissions
Copyright statement
© Queen's Printer and Controller of HMSO 2013. This work was produced by Lilford et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.
Chapter 1 Introduction to report
Background
Introduction
Liver disease represents a major source of morbidity and mortality in the UK. 1 Abnormal liver function tests (LFTs) have been shown to be predictive not only of liver disease mortality, but also of more general causes of mortality. 2 LFTs are a good example of inexpensive tests (modern auto-analysers process large batches of samples using inexpensive reagents) that are frequently ordered as a ‘test of exclusion’ in patients with non-specific symptoms, such as tiredness or upper abdominal discomfort. The tests are also non-specific in the sense that none of the four to eight analytes included in the LFT panel points directly to a specific diagnosis, and many are not even specific to the liver. A doctor may order a laboratory test because a patient has features of a particular disease; for example, the gradual onset of jaundice in a user of injectable substances points to hepatitis C. The prior risk of hepatitis in such a person would be high: many positives would be true-positives. In most cases, however, LFTs are ordered without such a traceable link between symptoms and a specific diagnosis, for example when patients have vague symptoms or as part of the monitoring of patients with chronic diseases. Such tests are often offered as a type of insurance policy, but the prior risk of disease is low and the predictive value of LFTs is, a priori, likely to be low also. LFTs are interpreted by reference to population norms, rather than explicit calculus of the relative benefits and harms of false-positive and false-negative diagnoses. Many patients have a positive test, but it is not clear what proportion of these are true-positives, especially when the test result is only mildly abnormal. Review of the literature (see Previous research) shows that there is little evidence from large cohorts of people with abnormal LFT results to guide clinical actions when LFT results are mildly abnormal. 
The issue of how, or even whether, to investigate abnormal LFTs under various scenarios is not settled.
It is clear that a very large number of tests are ordered and abnormal results are common. The laboratory at University Hospital Birmingham received 67,182 requests for LFTs in 2003, from 83 general practitioner (GP) practices representing 210 GPs. Of these, 9779 (15%) led to an abnormal result in the sense that at least one of the analytes on the LFT panel exceeded the reference range. As LFTs are inexpensive and easy to organise as one of the standard ‘blood tests’ in the GP's repertoire, their use has become widespread without careful study of their meaning in a general practice setting. As the meaning of the various combinations of possible test results and clinical features is unclear, different practitioners respond in different ways to the same test profile – the eclectic nature of practitioners' responses to the same scenarios has been well documented. 3
Most abnormal LFT results are false-positives. Thus, large numbers of follow-on tests and much anxiety can ensue if a low threshold is used to define abnormality. On the other hand, there are arguments for adopting a low threshold for subsequent evaluation, as LFTs have the ability to detect diseases when they are most treatable, for example by reducing overload in patients with metal storage diseases or by administering antiviral agents in those with chronic viral hepatitis. Furthermore, theory-based interventions designed to modify behaviour that leads to liver damage, while clearly far from a panacea, nevertheless produce worthwhile benefits in that some people adopt healthy lifestyles when they perceive that their health is threatened and that engaging in the recommended behaviour will reduce this threat. 4–6
The incidence of many liver diseases is rising, for example with migration from places with high rates of chronic hepatotoxic viral infection, and as a result of alcohol and calorie excess. Comorbidity is becoming more common as alcohol misuse and calorie excess unmask other diseases of the liver, such as haemochromatosis.
Thus, three interacting factors create an urgent need to better understand the clinical epidemiology of abnormal LFTs:
- frequent use of these tests
- lack of clarity about the meaning of the results
- increasing treatability and rising incidence of liver diseases.
A number of authors have produced diagnostic algorithms for the investigation of people with abnormal LFTs. 7–12 These provide sensible advice – for example stressing the importance of taking a careful family history or of responding to tests that suggest obstructive biliary disease – but they do not provide a clear probabilistic basis for their reasoning. In particular, there is no scientific rationale for the widespread advice to repeat an abnormal LFT before conducting further tests. Green and Flamm13 state in their 2002 review of 1400 papers: ‘Unfortunately … there are no long term prospective studies to define the natural history of liver disease in patients with abnormal liver chemistries tests.’ They call for a substantial prospective study of a well-documented population given a standardised diagnostic work-up in general practice and then followed up for a period of time. It was this gap in the literature that the Birmingham and Lambeth Liver Evaluation Testing Strategies (BALLETS) study was designed to rectify.
Previous research
There is considerable literature on the laboratory measurement of analytes. Dufour et al. 14,15 carried out a systematic review of this topic in 2000. This review contains much useful information on biological variability and how it is affected by sex, age, race, use of the oral contraceptive pill (and other medicines), pregnancy, exercise, delay in analysis and time of day. The study also reviews the patterns of abnormality of each analyte given different diseases. A further systematic review that distilled 14,000 references was commissioned by the American Gastroenterological Association Clinical Practice Committee in 2002. 13 Again, most of the references describe probabilities of test results given various diseases, rather than the probabilities of the various diseases given test results. For example, Bonacini 16 describes ‘test results in people with cirrhosis due to chronic hepatitis infection’. Only a small proportion of articles report likelihood of disease by test result. Studies in this category tend to be based on hospital patients with serious abnormalities, such as ‘notably raised aspartate aminotransferase’17 or ‘requiring liver biopsy’. 18–20 Angulo et al. 21 investigated a remarkable 733 patients with non-alcoholic fatty liver disease (NAFLD) confirmed by liver biopsy to determine which features were associated with more serious disease, while Ekstedt et al. 22 followed up 129 patients with biopsy-confirmed NAFLD for a mean of 13 years and showed that the subgroup with ‘steatohepatitis’ had an increased risk of both cardiovascular and liver-related death compared with a reference population.
We updated the above review (Table 1) and selected studies that started with the LFT result and then followed the cohort, so as to provide the type of probability needed for decision-making. MEDLINE was interrogated, with limits placed on the overall search with respect to ‘humans’ and ‘publishing date post 1980’. Owing to the variety of nomenclature regarding LFTs a variety of search strings were used for this category. Search strings relating to abnormal LFT results included ‘liver function test’, ‘transaminases’, ‘alanine aminotransferases’, ‘aspartate aminotransferases’, ‘alkaline phosphatase’ and ‘gamma-glutamyltransferases’. The search was focused by using the limits of blood, analysis and metabolism. Despite the limits, these search strings retrieved over 35,000 references. The term ‘hepatitis’ was considered too narrow when attempting to find studies that followed up patients for a variety of diseases, so the more general term of ‘liver diseases’ was included, with limits of diagnosis, enzymology, epidemiology, mortality and virology, which retrieved around 8500 references. When these two search strategies were combined, 1448 papers were returned, the abstracts of which were read.
LFT search strings (limited using the subheadings; blood, analysis and metabolism) | Hepatitis search strings |
---|---|
‘liver function test’ | ‘liver diseases’ (diagnosis) |
‘transaminases’ | ‘liver diseases’ (epidemiology) |
‘alanine aminotransferases’ | ‘liver diseases’ (enzymology) |
‘aspartate aminotransferases’ | ‘liver diseases’ (virology) |
‘alkaline phosphatase’ | ‘liver diseases’ |
‘gamma-glutamyltransferases’ | |
With limits added (‘humans’ and ‘publishing date post 1980’) | With limits added (‘humans’ and ‘publishing date post 1980’) |
Papers returned = 35,070 | Papers returned = 8526 |
Eight studies were found that matched our requirement of following up patients who had experienced an abnormal LFT result. Two additional articles were selected from the references of relevant studies. As a result, to the best of our knowledge, there are only 10 studies for which a cohort of asymptomatic patients with abnormal LFTs was followed up (Table 2). However, one article was written in Korean (only the abstract was translated) and was excluded from our analysis.
Author and country | Date | Type of study and population studied | Analytes used | No. of patients enrolled | No. of patients with abnormal LFT results (%) | Prevalence of viral hepatitis in patients with abnormal LFT results (%) | Notes |
---|---|---|---|---|---|---|---|
McLernon et al.,2 Scotland | 2009 | Record linkage; laboratory database of GP tests, hospital admissions and death certificates | Bilirubin, albumin, ALP, GGT, ALT, AST (transaminases sometimes combined). GP selected | 95,977 | 20,827 (21.7) | 2.2 | Mean follow-up of 3.7 years. Risk of underascertainment |
Pendino et al.,27 Italy | 2005 | Prospective cohort study; general population | AST, ALT, GGT | 1645 | 319 (19.4) | 17.9 | High baseline rate of viral hepatitis: 5.6% |
Kim et al.,23 Korea | 2004 | Record linkage: insurance data and death certificates | AST, ALT | 142,055 | 11,193 (7.9) | N/A | Outcome was liver disease mortality |
Yano et al.,28 Japan | 2001 | Prospective cohort study; ‘healthy’ office workers | AST, ALT, GGT | 1973 | 358 (18.1) | 2.7 | Assumed that all liver cancer and cirrhosis was a result of viral hepatitis |
Daniel et al.,18 USA | 1999 | Prospective cohort study; primary-care population | ALT, AST raised 50% above normal on at least two occasions across a 6-month period | 1124 | 1124 (100) | N/A | Marker-negative patients only, so infected patients excluded from analysis |
Mathiesen et al.,30 Sweden | 1999 | Prospective cohort study; primary-care population | AST, ALT raised for at least 6 months (ALP had to be normal) | 150 | 150 (100) | 15.3 | |
Whitehead et al.,17 UK | 1999 | Prospective cohort study; primary-care population | AST markedly raised [10 times (> 400 U/l) above the ULN] | 137 | 137 (100) | 2.2 | |
Bellentani et al.,26 Italy | 1994 | Prospective cohort; general population | AST, ALT, GGT | 6917 | 1473 (21.3) | 2.4 | |
Hultcrantz et al.,29 Scandinavia | 1986 | Prospective cohort study; primary-care population | AST, ALT moderately raised for at least 6 months (ALP had to be below twice the ULN) | 149 | 149 (100) | 2.7 | |
Two of the remaining nine English-language papers described record linkage studies. One such study was based on the Korean insurance database, which was linked with death certificates. 23 This study reported that increased alanine aminotransferase (ALT), even within the upper end of the normal range, was associated with eventual death from liver disease. A study carried out in Scotland linked general practice and hospital databases. 2,24 However, this was a retrospective study so a full liver screen was not conducted and follow-up was for a median of only 4 years, whereas many diseases, including chronic viral hepatitis, have much longer prodromal periods. 25
The other seven studies were prospective cohort studies, based on testing asymptomatic members of the general population. The famous Dionysos study,26 based on three analytes from the LFT panel, is included among these. In this study, an impressive 6917 citizens from two communities in northern Italy were screened. Although the authors tested for viral hepatitis in all of those in whom the LFT result was abnormal (n = 1473), among whom they found a prevalence of 2.4%, the main aim of their study was to determine the effect of alcohol and diet on LFTs. Testing for viral hepatitis was used as a method of excluding causes of liver damage other than their topic of interest, so in-depth analysis of how viral hepatitis affected the pattern of LFTs was not published. Another Italian study, by Pendino et al.,27 screened 1645 inhabitants from a town in southern Italy, with both an LFT [ALT, aspartate aminotransferase (AST) and gamma-glutamyltransferase (GGT)] and viral screen. The prevalence of viral hepatitis is much higher in this region because of a significant immigrant population, and the authors performed a more extensive analysis of the impact of viral hepatitis on LFTs. Of the 319 (19.4%) individuals in whom LFT results were abnormal, nearly 18% were infected with viral hepatitis. However, the LFT failed to detect 34 (37%) of the 92 cases of viral hepatitis present in the community. Perhaps the most comprehensive prospective analysis looking at the effect of viral hepatitis on individual analytes was carried out on a population of Japanese office workers. 28 The study used data from compulsory health checks, which included an ALT/AST/GGT panel along with certain additional tests, including a viral screen, which were added for study purposes. The authors found that ALT was the most sensitive of the three analytes used, detecting nearly half of cases of viral hepatitis, while being abnormal in 14% of the cohort (278 abnormal results in 1973 participants).
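The figures quoted above for the Pendino et al. study can be arranged as a 2 × 2 table, which makes the sensitivity and predictive value of the LFT for viral hepatitis explicit. The following is an illustrative sketch only: it uses the four counts given in the text, derives the remaining cells by subtraction, and uses variable names of our own choosing. Derived percentages may differ slightly from the published ones because of rounding.

```python
# Counts quoted in the text for the Pendino et al. cohort.
screened = 1645          # inhabitants screened
abnormal_lft = 319       # abnormal LFT results
hepatitis_cases = 92     # cases of viral hepatitis in the community
missed_by_lft = 34       # cases with a normal LFT result

# Remaining cells of the 2 x 2 table, derived by subtraction.
true_pos = hepatitis_cases - missed_by_lft       # infected, abnormal LFT
false_neg = missed_by_lft                        # infected, normal LFT
false_pos = abnormal_lft - true_pos              # not infected, abnormal LFT
true_neg = screened - abnormal_lft - false_neg   # not infected, normal LFT

sensitivity = true_pos / hepatitis_cases
specificity = true_neg / (true_neg + false_pos)
ppv = true_pos / abnormal_lft                    # P(hepatitis | abnormal LFT)
npv = true_neg / (true_neg + false_neg)          # P(no hepatitis | normal LFT)

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.1%}")
```

Even in this high-prevalence population, fewer than one in five abnormal LFT results corresponds to a case of viral hepatitis, which is the kind of disease-given-test-result probability that the review set out to find.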
The remaining four prospectively designed studies were carried out in general practice and were therefore closer in population terms to the BALLETS cohort. However, three of these are restricted to patients with persistently abnormal LFT results over a 6-month period,18,29,30 and one of these did not include a test for viral hepatitis. The final prospective study, by Whitehead et al.,17 was small and based on only one analyte.
After this review of the literature we concluded that no study has fully investigated a cohort of patients in primary care with an abnormal LFT result (from the full LFT panel) and no obvious or known liver disease. BALLETS is thus the first study to test the validity of the various strategies that a GP could use to make a diagnosis in patients with abnormal LFTs. The BALLETS study was based on performing a full LFT panel of investigations to identify diseases such as chronic viral hepatitis and primary biliary cirrhosis (PBC) that could otherwise be identified only by follow-up lasting many decades. The study was therefore designed to look into, and ‘concertina’, the future. Patients were also followed up over 2 years to detect systemic diseases attacking the liver (e.g. disseminated cancer), to follow the progress of people with excess alcohol consumption and/or ‘fatty liver’ on ultrasound and to ascertain the rates at which abnormal LFTs reverted to normal according to diagnostic category and type of analyte that was abnormal.
We also identified a relevant study by Kim et al. 31 This study prospectively followed a group of ‘healthy’ Korean factory workers, taking measurements of ALT, AST and GGT on at least two separate occasions. The full article was in Korean so we had access to the abstract only.
Structure of this report
The central idea behind the BALLETS study was to create a well-characterised cohort (as described above) and follow patients for 2 years. A database would thereby be created for statistical analysis. The generation and analysis of this database are referred to as the ‘main study’. The objectives of this study are detailed in Chapter 2, the methods are described in Chapter 3 and the results are presented in Chapter 4. The report also contains a series of substudies, the objectives of which are spelled out in Chapter 2. The methods and results of these substudies are then described in Chapter 5, which contains sections dealing with the psychological effects of a positive test (see Chapter 5 Psychology 1: effects of positive tests); a qualitative account concerning the effects of testing on behaviour (see Chapter 5 Psychology 2: effects of results on behaviour); a qualitative account of clinicians' motivations for testing (see Chapter 5 Sociology of testing: an exploration of the clinical and non-clinical motives behind the decision to order a liver function test); a decision analysis covering options following a positive LFT test result (see Chapter 6); and a study of markers for fibrosis in a subset of patients with ‘fatty liver’ from the Birmingham cohort (see Chapter 6). In Chapter 6 we discuss the implications of our study, integrating lessons from the main study and substudies. We approach this task by imagining that all of the scientific information regarding LFTs – including that from the BALLETS study – was available, but that LFTs had not yet come into widespread, routine use. We also make use of the different reasons for testing that emerge from the qualitative substudy of GP reasons for ordering LFTs. This perspective leads to proposals to use different testing strategies according to the different reasons for conducting laboratory investigations. Perhaps provocatively we argue that the idea of a one-size-fits-all panel is obsolete. 
The original protocol for the study is included as Appendix 1 (BALLETS study protocol).
Chapter 2 Objectives
Main study
The Health Technology Assessment (HTA) commissioning brief made it clear that the overall objective was to inform general practice decision-making. Thus, the main objective can be framed as follows: ‘How does the probability of disease vary by the pattern of abnormal LFTs and the clinical features of a patient?’. ‘Pattern’ of abnormal LFTs describes which analytes are abnormal (singly or in combination) and the degree (extent, magnitude) of the abnormality. In particular, we set out to ascertain the predictive value of the pattern of LFTs for the specific and often treatable viral, genetic or autoimmune liver diseases in Table 3.
Disease | Prevalence (%) | Source |
---|---|---|
Chronic viral hepatitis C | 0.42 | Health Protection Agency website, cited 201132 |
Chronic viral hepatitis B | 0.3 | Health Protection Agency website, cited 201133 |
Metal storage disease – iron | 0.25 | Worwood 199834 |
PBC | 0.024 | Metcalf et al. 199735 |
Autoimmune hepatitis | 0.001 | Autoimmune Hepatitis website, cited 200936 |
Metal storage disease – copper | < 0.025 | Olivarez et al. 200137 |
A1AT deficiency | < 0.025 | de Serres 200238 |
Secondary objectives of the main study were:
- To follow up people who had neither one of the above serious and treatable liver diseases nor another serious disease (such as metastatic cancer) and to evaluate the extent to which abnormal LFTs progressed or remitted over a 2-year period.
- To determine the proportions where ‘fatty liver’ progressed, improved or stayed the same and to investigate how clinical, behavioural and biochemical features correlated with progression, resolution or maintenance of the ultrasound finding. This study was not part of the original protocol but was prompted by the high incidence of fatty liver at entry to the study. Repeat ultrasound was funded under an extension to the original grant.
- To investigate the issue of redundancy among LFT analytes by measuring what would be lost in terms of prognostic accuracy by dropping certain analytes from the full panel of LFT analytes. This is an important issue because the benefit of analytes that offer small marginal gains in detection rates may be outweighed by losses as a result of false-positives.
- To shed light on the utility of undertaking LFTs in the first place by determining the prevalence of serious disease in the cohort as a whole.
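The redundancy objective in the list above can be pictured with a small sketch. This is not the analysis used in the study: the records are hypothetical toy data, the analyte names are only labels, and the function `panel_performance` is our own illustration of how dropping one analyte trades cases detected against false-positives.

```python
# Hypothetical toy records: (set of abnormal analytes, patient has disease?)
records = [
    ({"ALT", "AST"}, True),
    ({"GGT"}, False),
    ({"ALP"}, True),
    ({"ALT"}, False),
    ({"GGT", "ALT"}, True),
    ({"bilirubin"}, False),
]

def panel_performance(records, panel):
    """Return (cases flagged, total cases, false positives) when only the
    analytes in `panel` are measured; a patient is flagged if any measured
    analyte is abnormal."""
    panel = set(panel)
    flagged_cases = sum(1 for abnormal, diseased in records
                        if diseased and abnormal & panel)
    total_cases = sum(1 for _, diseased in records if diseased)
    false_pos = sum(1 for abnormal, diseased in records
                    if not diseased and abnormal & panel)
    return flagged_cases, total_cases, false_pos

full_panel = {"ALT", "AST", "ALP", "GGT", "bilirubin"}
print("full panel:", panel_performance(records, full_panel))
for dropped in sorted(full_panel):
    reduced = full_panel - {dropped}
    print(f"without {dropped}:", panel_performance(records, reduced))
```

In this toy data set, dropping one analyte loses a case while dropping another removes a false-positive without losing any cases; the study question is which of these situations applies to each analyte in real primary-care data.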
Some of these figures may be underestimates of the incidence of the various pathological entities since we now know that many people may have subclinical disease with such long lead times that they do not present clinically during the person's lifetime. This applies in particular to haemochromatosis and PBC, a point to which we return.
Psychological substudy
Abnormal LFTs may have psychological consequences, and this is important given the high proportion of false-positive results that was anticipated. The original protocol thus included a psychological substudy based mainly around the measurement of (any) induced anxiety at various stages following disclosure of a positive result.
We became increasingly aware that knowledge of abnormal LFT results, and performance of some tests prompted by abnormal LFT results, might constitute an intervention in their own right, as news of these results might affect behaviour (see Sociological substudy, below). For example, a person with persistently abnormal LFT results and an ultrasound diagnosis of fatty liver may be influenced by these results to modify unhealthy behaviour (excessive calorie and/or alcohol intake). Conversely, a normal result may provide false reassurance. The follow-on study was thus adapted not only to observe any residual anxiety caused by testing, but also to collect data on (any) changes in eating and drinking habits. The additional data collection for this purpose at the 2-year follow-up point was funded by an extension to the HTA grant.
Sociological substudy
A (perhaps predictable) early finding from our study was that LFTs do not offer high diagnostic precision, and that the positive predictive value (PPV) (probability of disease given a positive test) is low. Moreover, the value of LFTs, as of any test, lies in their incremental diagnostic accuracy given what the doctor knows before the result is made available. For example, finding a raised ALT level in a patient with a known alcohol problem would not be a surprise. On the other hand, such a result may buttress the doctor's advice to reduce alcohol consumption. These considerations raise the question of why so many LFTs are ordered in the first place. If GPs (erroneously) thought that LFTs were highly predictive of serious treatable disease then we may expect the BALLETS results to reduce demand for LFTs. If, however, the low predictive value of these tests is not news to GPs then other approaches would be necessary to reduce test ordering (if this was perceived as desirable – see Decision analysis, below). We therefore carried out a further study, not included in the original protocol, to find out more about GPs' motivation for ordering LFTs. This substudy included a general review of the literature on GPs' test-ordering behaviour. The protocol for this study is described in this report.
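The dependence of PPV on prior risk follows directly from Bayes' theorem and can be shown with a short calculation. The sketch below is illustrative only: the prevalence of 0.42% is the figure for chronic hepatitis C in Table 3, whereas the sensitivity, the specificity and the 20% prior for a high-risk patient are assumed values chosen for the example, not results from this study.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Assumed test characteristics (hypothetical, for illustration only).
SENSITIVITY = 0.60
SPECIFICITY = 0.85

scenarios = [
    ("general population prevalence (0.42%, Table 3)", 0.0042),
    ("patient with a strong clinical lead (20%, assumed)", 0.20),
]
for label, prior in scenarios:
    ppv = positive_predictive_value(prior, SENSITIVITY, SPECIFICITY)
    print(f"{label}: PPV = {ppv:.1%}")
```

With the same test characteristics, the PPV is below 2% at the population prevalence but about 50% at the higher prior, which is why a result can be informative in a patient with specific risk factors and close to uninformative when the test is ordered as a test of exclusion.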
Decision analysis
As stated in Sociological substudy, above, it became clear from the literature (and emerging results in this study) that the predictive value of LFTs was rather low. This raises the question of what action (if any) a doctor should take when confronted by a mildly abnormal LFT result. Clearly, if there is an obvious clinical lead then this should be followed; for example, if a person has a history of intravenous drug use then a test for viral hepatitis is indicated. However, the majority of cases are more ambiguous. We therefore decided that it would be helpful to carry out a formal decision analysis to examine the losses and gains associated with various clinical options. Conducting a decision analysis for each potential disease and then consolidating them into one composite analysis would be well beyond the scope and resources of this project. We therefore selected one disease class – chronic viral hepatitis – as an exemplar on the basis that:
- Unlike high alcohol intake and obesity, the clinician can diagnose the condition only by further testing.
- The disease, if caught early, is highly treatable.
- It is one of the most common of the specific liver diseases to present clinically.
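The general logic of weighing losses and gains can be sketched as an expected-value calculation for a follow-on test. This is a generic illustration and not the decision model reported in this monograph: the function names are ours, and the test characteristics, benefit, harm and cost are hypothetical values in arbitrary units.

```python
def net_benefit_of_testing(p_disease, sensitivity, specificity,
                           benefit_tp, harm_fp, cost_test):
    """Expected net benefit of ordering a follow-on test, relative to not
    testing, for a patient whose probability of disease is p_disease."""
    true_pos = p_disease * sensitivity
    false_pos = (1 - p_disease) * (1 - specificity)
    return true_pos * benefit_tp - false_pos * harm_fp - cost_test

def threshold_probability(sensitivity, specificity,
                          benefit_tp, harm_fp, cost_test):
    """Probability of disease at which testing and not testing break even."""
    fp_rate = 1 - specificity
    return (fp_rate * harm_fp + cost_test) / (sensitivity * benefit_tp
                                              + fp_rate * harm_fp)

# Hypothetical inputs (arbitrary units), chosen only to show the mechanics.
params = dict(sensitivity=0.95, specificity=0.99,
              benefit_tp=100.0, harm_fp=5.0, cost_test=1.0)

p_star = threshold_probability(**params)
print(f"break-even probability of disease: {p_star:.2%}")
for p in (0.0042, 0.05):
    print(f"p = {p:.2%}: net benefit = {net_benefit_of_testing(p, **params):+.2f}")
```

Under these assumed inputs, testing is worthwhile only when the probability of disease after the LFT result exceeds the break-even value, so the practical question becomes whether a mildly abnormal LFT raises the probability of chronic viral hepatitis above that threshold.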
We were aware of the decision analysis in the previous HTA report2 and our analysis includes a critique of this work.
Biochemistry of ongoing liver disease
It became clear at an early stage that the BALLETS study would generate a sizable cohort of people with fatty liver.
The extensive testing algorithm incorporated in the study did not include all necessary tests for the diagnosis of the enigmatic condition called ‘metabolic syndrome’. The literature suggests that a small percentage (5–10%) of people with fatty liver would progress to liver fibrosis, and the BALLETS study provides a platform for the study of novel blood tests that might predict such progression. We therefore performed an add-on study in which a fibrosis score was calculated. In addition, new hypotheses concerning the origin and prognosis of fatty liver may emerge over the next 4 years in this fast-moving field of enquiry. For these reasons, additional funding was sought and granted by the HTA programme to store frozen blood samples from consenting participants.
Chapter 3 Methods: main study
Selection of practices and patients
Practices were selected on the basis of geographic spread and their willingness to join the study. They had to be multiple-partner practices. Two city areas were selected: Birmingham and the Lambeth district of London. We deliberately included inner city practices in order to ‘enrich’ the population with a higher than average prevalence of chronic viral hepatitis, so that the relationship between LFTs and this disease could be studied. The geographical location and demographic and ethnic features of the eight Birmingham practices and three Lambeth practices that we were able to recruit are described in Chapter 4 (see Nature of the population studied: Birmingham and Lambeth sites).
General practitioners from participating practices reviewed all abnormal LFT results arising in their practice to determine eligibility. Patients aged > 18 years were eligible if one or more analyte was abnormal, they did not have known liver disease, they were not deemed to require immediate referral to hospital and they were not pregnant. Seven out of the eight Birmingham practices sent samples to a single laboratory (University Hospitals Birmingham NHS Foundation Trust laboratories), whereas the remaining practice (Wand Medical Centre) sent samples to the laboratory of Russells Hall Hospital. All Lambeth practices used a single laboratory (Guy's and St Thomas' NHS Foundation Trust laboratory). The repertoire of analytes measured in response to a request for LFTs from the participating practices was extended over the study period from the usual five analytes in our laboratories to all eight listed in Table 4. The idea was to enable redundancy between tests to be detected and to help generalise to centres that included different analytes. The analytes were classified as normal or abnormal according to standard laboratory practice that is compliant with International Quality Control Standards. The classification was based on reference ranges specific to each of the (three) individual laboratories (see Table 4).
Test | Reference range: University Hospitals Birmingham NHS Foundation Trust | Reference range: Russells Hall Hospital NHS Trust | Reference range: Guy's and St Thomas' NHS Foundation Trust |
---|---|---|---|
ALT | 1–41 U/l | 1–56 U/l | 1–45 U/l M, 1–28 U/l F |
AST | 1–43 U/l | 1–45 U/l | 1–49 U/l |
Bilirubin | 1–22 μmol/l | 1–22 μmol/l | 1–22 μmol/l |
ALP | 1–320 U/l age < 40 years M; 1–330 U/l age ≥ 40 years M; 1–260 U/l age < 40 years F; 1–290 U/l age 40–49 years F; 1–330 U/l age ≥ 50 years F | 1–120 U/l | 1–129 U/l |
GGT | 1–40 U/l F, 1–50 U/l M | 1–58 U/l | 1–65 U/l M, 1–38 U/l F |
Albumin | 34–51 g/l | 35–47 g/l | 40–52 g/l |
Globulin (derived) | 21–37 g/l | 21–37 g/l | 21–37 g/l |
Total protein | 60–80 g/l | 65–83 g/l | 61–79 g/l |
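Because the reference ranges differ by laboratory and, for some analytes, by sex, the same measured value can be normal at one site and abnormal at another. The sketch below illustrates this together with the eligibility criteria stated above. It is a simplified illustration, not the study software: only four analytes from Table 4 are transcribed (ALP is omitted because of its age bands), the laboratory keys and function names are our own, and any value outside the reference range is assumed to count as abnormal.

```python
# Reference ranges transcribed from Table 4 (subset of analytes).
# A tuple applies to both sexes; a dict gives sex-specific ranges.
# Laboratory keys are short labels chosen for this example.
REFERENCE_RANGES = {
    "UHB": {"ALT": (1, 41), "AST": (1, 43), "bilirubin": (1, 22),
            "GGT": {"F": (1, 40), "M": (1, 50)}},
    "Russells Hall": {"ALT": (1, 56), "AST": (1, 45), "bilirubin": (1, 22),
                      "GGT": (1, 58)},
    "GSTT": {"ALT": {"M": (1, 45), "F": (1, 28)}, "AST": (1, 49),
             "bilirubin": (1, 22), "GGT": {"M": (1, 65), "F": (1, 38)}},
}

def is_abnormal(lab, analyte, value, sex):
    """True if the value lies outside the laboratory's reference range."""
    entry = REFERENCE_RANGES[lab][analyte]
    low, high = entry[sex] if isinstance(entry, dict) else entry
    return not (low <= value <= high)

def abnormal_analytes(lab, results, sex):
    """Sorted list of analytes in `results` that are abnormal at this lab."""
    return sorted(a for a, v in results.items() if is_abnormal(lab, a, v, sex))

def is_eligible(age, abnormal, known_liver_disease,
                needs_immediate_referral, pregnant):
    """Eligibility criteria as stated in the text (age > 18 years)."""
    return (age > 18 and len(abnormal) > 0 and not known_liver_disease
            and not needs_immediate_referral and not pregnant)

results = {"ALT": 50, "AST": 30, "bilirubin": 10, "GGT": 45}
for lab in REFERENCE_RANGES:
    flagged = abnormal_analytes(lab, results, "M")
    print(lab, flagged, is_eligible(45, flagged, False, False, False))
```

In this example, an ALT of 50 U/l in a man is abnormal at two of the laboratories but within range at the third, so the same patient would be eligible at some practices and not at others.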
Eligible patients were contacted to seek verbal consent to participate in the study. The method of contact varied from practice to practice so that it would be compatible with the normal procedures used in the practices. The bespoke protocols to inform patients of their results and the study process are described, for each practice, in Appendix 1 (section 10.2a–f). Once an eligible patient had been identified he or she was contacted and invited to attend the practice for a study session. The practice sent a Patient Information Sheet to all potential patients in advance of their attendance at the study session.
Testing strategy for patients in the Birmingham and Lambeth Liver Evaluation Testing Strategies study
Formal written consent was sought when the patient attended the study session. The following information was collected and recorded:
- Clinical details (Table 5).
- An alcohol use questionnaire was completed and the patient's weight, height, waist and hip size were measured (Table 6).
- A single blood sample was obtained for detailed analysis. The LFT panel was repeated along with tests for specific (autoimmune, genetic and viral) diseases (Table 7).
- An ultrasound scan (USS) of the liver was obtained using a portable ultrasound machine (TITAN® SonoSite) operated by experienced (10 years minimum) sonographers from the ultrasound department of the University Hospitals Birmingham NHS Foundation Trust, Worcester Acute Hospitals NHS Trust or Guy's and St Thomas' Hospital NHS Trust. The sonographer completed a pro forma (see Appendix 1, section 10.7) that included a description of liver texture on a four-point scale, indicating normal, mild, moderate and severe echo density. Fatty liver on ultrasound was determined by comparison of brightness/echogenicity in the liver with the right kidney. The sonographer notified the named or on-call GP of any findings of a sinister nature so that they could be acted upon immediately. All scans were recorded on tape and 50 of these were selected at random from the first participating practice for scrutiny by a senior radiologist, as a form of quality control (see Quality control of ultrasound).
1. GP name and practice code | ||||
2. Patient study ID | ||||
3. Name and address | ||||
4. Date of birth | ||||
5. NHS no. | ||||
6. Gender | ||||
7. Current and recent medication | ||||
8. Reason for GP consultation/LFTs ordered? | ||||
9. Current/past illnesses | ||||
10. Recent febrile illness | ||||
11. Recent muscle damage | ||||
12. Substance abuse | Past □ | Current □ | Intravenous □ | Oral □ |
13. Recent travel history | Over last 6 months? | Where? | ||
14. Immunisation against HBV | ||||
15. Transfusion history | No □ | Yes □ | Date | |
16. Length of residence in the UK | ||||
17. Ethnic group | ||||
18. Preferred language | ||||
19. Country of birth |
Alcohol consumption (units per week over past 6 months?) | ||||||||
a. How often do you drink? | Annually | Special occasions | Monthly | Fortnightly | ||||
Weekly/daily | M | T | W | T | F | S | S | |
b. What is the type or brand? | ||||||||
c. What size of glass or can do you drink? | ||||||||
d. Number of each type of drink consumed in a session? | ||||||||
Measurements | ||||||||
Height (cm) | ||||||||
Weight (kg) | ||||||||
Waist measurement (cm) | ||||||||
Hip measurement (cm) |
Hepatitis B viral markers (HBV surface Ag) |
HCV antibody (HCV Ab) |
A1AT |
Caeruloplasmin |
Iron and transferrin |
SMA |
AMAs |
The research team produced a consolidated report comprising the results of the index LFT and the first follow-up LFT, and all of the information described in Tables 5–7, along with the result of the ultrasound examination. The patient participant then attended the GP for a consultation informed by all of these data.
Note that the intention was for each patient to have three LFT panels performed as part of the BALLETS study:
-
the test that confirms eligibility: ‘the index test’
-
repeat test on agreeing to enter the study: ‘the first follow-up test’ (FU1)
-
test at 2-year follow-up: ‘the second follow-up test’ (FU2).
The GPs were provided with a set of guidelines to assist decision-making when one of the tests in Table 7 was abnormal or when an abnormality, such as fatty liver, was seen on the USS. The guidelines were produced by members of the study team (JN and RL) and approved by each practice. The guidelines are outlined in Appendix 1 (section 10.9). In addition, clinical members of the research team visited practices to provide proctorship on what to do about abnormal results. The results of follow-up tests were obtained from the laboratories by the research team. In some cases a follow-on test indicated according to the guideline was absent from the laboratory records. In these cases the chief investigator contacted the practice concerned to remind the GP to consider recommending the test to the patient. This issue of missing follow-on blood tests had not been foreseen by the research team and ethical permission was obtained to amend the protocol so that GPs could be contacted.
The 2-year follow-up visit
A second follow-up visit was offered to patients 2 years after the first follow-up visit. The electronic patient records at practices were scrutinised where possible and patients were placed in four categories for the purpose of 2-year follow-up:
-
Deceased The cause and date of death were ascertained from notes or the practice database.
-
No longer registered with the practice The new practice was contacted and the GP asked to invite the patient to attend for a second follow-up LFT for submission to the original laboratory.
-
Patient under ongoing hospital care The diagnosis was obtained from study hepatologists in Birmingham or Lambeth.
-
Remaining patients The remainder were invited to attend the practice for the second follow-up LFT. The weight and body measurements and alcohol history were repeated at this visit. Extensions to the protocol were obtained from the funder to enable patients at Birmingham to undergo a repeat ultrasound examination and to be asked to consent for an aliquot of blood being preserved for cryogenic storage of cells and serum. These protocol amendments and patient documentation for this enhanced follow-up in Birmingham were approved by the ethics committee.
A summary of the full patient journey is illustrated in Figure 1.
Laboratory methods
The biochemical measurements were carried out in the accredited (Clinical Pathology Accreditation UK) laboratories of University Hospitals Birmingham NHS Foundation Trust (Queen Elizabeth and Selly Oak Hospitals, Birmingham), of Guy's and St Thomas' NHS Foundation Trust (St Thomas' Hospital), and of Dudley Group of Hospitals NHS Foundation Trust (Russells Hall Hospital, Dudley). The measurements were performed on serum obtained from blood samples collected into Vacuette tubes (evacuated collection tubes) containing no anticoagulant (Greiner Bio-One GmbH, Kremsmuenster, Austria). Serum was obtained by centrifugation of the samples for 5 minutes at 1200 × g and measurements were performed on a Roche Modular Analytic system using specific reagents supplied by Roche Diagnostics (Roche Diagnostics Ltd, Burgess Hill, UK) in University Hospitals Birmingham NHS Foundation Trust (Queen Elizabeth and Selly Oak Hospitals Birmingham) and Guy's and St Thomas' NHS Foundation Trust (St Thomas' Hospital), and on Vitros 5.1 analysers using reagents supplied by Ortho Clinical Diagnostics (Ortho Clinical Diagnostics, Johnson & Johnson, High Wycombe, UK) in the Dudley Group of Hospitals NHS Foundation Trust. ALT, albumin, alkaline phosphatase (ALP), AST, GGT, total bilirubin, total protein, caeruloplasmin and alpha-1 antitrypsin (A1AT) were assayed. Where A1AT concentrations were noted to be < 1.5 g/l, the sample was phenotyped by isoelectric focusing to help in diagnosis and monitoring. The phenotyping was performed on a Sebia Hydrasys instrument (Sebia UK River Court, Camberley, UK) with specific reagents and isoelectric focusing gels.
Integral pilot
Purpose of integral pilot
Rather than follow convention and collect a full data set before setting out on the analysis it was decided to analyse data from the first practice to complete recruitment – the Hall Green Practice in Birmingham. This practice completed its recruitment at a point in time when recruitment in other practices was nascent or yet to begin. Analysis of the integral pilot was carried out as soon as the FU1 data became available, i.e. the integral pilot does not include the FU2 results.
The purposes of this pilot were threefold:
-
to ‘test the system’ by detecting incomplete data and exploring systematic failures so that remedial action could be taken where necessary
-
to compare patients entered in the study with those who might have been eligible but who were not entered in the study
-
to conduct a quality control study on the accuracy of ultrasound findings by reviewing a sample of images stored on tape.
Missing data
One hundred and sixty-one patients were entered in the study in the pilot practice. Two patients did not attend for the ultrasound examination and have been excluded from the pilot analysis. The following analyses all relate to the remaining 158 cases. Their age and sex distributions are shown in Table 8.
Age (years) | Male, n (%) | Female, n (%) | Total, n (%) |
---|---|---|---|
≤ 44 | 24 (25.8) | 14 (21.5) | 38 (24.1) |
45–54 | 20 (21.5) | 9 (13.8) | 29 (18.4) |
55–64 | 20 (21.5) | 19 (29.2) | 39 (24.7) |
≥ 65 | 29 (31.2) | 23 (35.4) | 52 (32.9) |
Age (years), mean (SD) | 54.7 (15.4) | 57.8 (15.0) | 56.0 (15.2) |
Total | 93 (100.0) | 65 (100.0) | 158 (100.0) |
The index panel of LFT analytes was incomplete (i.e. not all of the eight results were available) for 26 out of the 158 patients and complete for 132 (84%) patients. The first follow-up panel of LFT analytes was not available in five cases and the panel was incomplete in 27 cases – thus complete data were available on the follow-up LFT panel for 126 out of the 158 (80%) patients. The full breakdown of the missing data is given in Table 9.
No. of tests | Index test | FU1 test |
---|---|---|
0 | 0 | 5 |
1 | 0 | 0 |
2 | 0 | 1 |
3 | 0 | 7 |
4 | 13 | 14 |
5 | 6 | 1 |
6 | 6 | 0 |
7 | 1 | 4 |
8 | 132 | 126 |
Total | 158 | 158 |
The missing data did not follow any anticipated pattern (see Table 9). One might have expected that, in those cases for which not all eight analytes required for study purposes had been measured, the five default analytes for this particular laboratory would have been measured instead. This would have resulted in a bimodal distribution, with high peaks at eight and five analytes. On further enquiry, it transpired that the clerks who receive the request forms and enter the requests on computer do so with variable fidelity (for study patients as for routine patients). A programme of staff training was therefore put in place to try to reduce this problem. However, we were advised that, with large numbers and high turnover of clerical staff in the laboratory, some remaining laboratory omissions were inevitable.
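The completeness figures quoted above can be re-derived directly from the counts in Table 9. The following is a minimal sketch for checking that arithmetic; the function and variable names are illustrative and are not taken from the study's own analysis code.

```python
# Counts of patients by number of analytes reported, transcribed from Table 9.
# Position k in each list is the number of patients with k analytes reported.
INDEX_COUNTS = [0, 0, 0, 0, 13, 6, 6, 1, 132]
FU1_COUNTS = [5, 0, 1, 7, 14, 1, 0, 4, 126]
FULL_PANEL = 8  # number of analytes required for study purposes


def completeness(counts, full=FULL_PANEL):
    """Return (total, complete, incomplete, absent, percentage complete)."""
    total = sum(counts)
    complete = counts[full]          # patients with all eight analytes
    absent = counts[0]               # patients with no panel at all
    incomplete = total - complete - absent
    return total, complete, incomplete, absent, round(100 * complete / total)


print(completeness(INDEX_COUNTS))  # (158, 132, 26, 0, 84)
print(completeness(FU1_COUNTS))    # (158, 126, 27, 5, 80)
```

The same tally makes the absence of a second peak at five analytes easy to see: only 6 index panels and 1 FU1 panel contained exactly five results.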
Comparison of patients who were and were not ‘recruited’
Some eligible patients declined to participate, but we became aware that many more were not invited to participate by their GPs. Furthermore, some GPs recruited many more patients than others. One possible explanation was a tendency to select patients with the more severely abnormal results for entry in the study. This tendency could have been motivated by a desire to obtain all of the ancillary tests inherent in entry in the BALLETS study while reducing the need for further attendances and testing among those at lower perceived risk. This could lead to bias if, even among cases with equal severity of abnormality, GPs were somehow identifying patients with the worst prognosis for inclusion in the study. This could result in exaggerated estimates of the risks associated with abnormal LFTs.
In order to shed light on this issue, we collected baseline data from all (195) eligible but non-entered patients for two calendar months – May and June 2006 – and compared them with 53 participating patients for those months. This epoch was selected on the grounds that it corresponded to the period of highest recruitment.
The 195 non-entered patients constituted two subgroups: 129 patients had simply not been invited by the GP, despite fulfilling all objective criteria of entry to the study, while the remaining 66 had declined to take part (Table 10a). These subgroups are broken down by age and sex in Table 10b. The mean age of the invited patients, 58.6 years, is somewhat higher than the mean age of not-invited patients, 54.1 years (p = 0.028, two-sided t-test). There was no significant age difference between ‘consenters’ and ‘refusers’ within the invited group (p = 0.766). Thus, the 53 patients in the study tended to be older than those outside it. To put this in perspective, 68% (40/59) of eligible 65- to 74-year-olds were invited to join compared with 31% (22/72) of eligible patients under 45 years.
Status | n | Mean (SD) age (years) |
---|---|---|
Consented | 53 | 58.2 (13.9) |
Refused | 66 | 59.0 (16.5) |
Total invited | 119 | 58.6 (15.3) |
Not invited | 129 | 54.1 (17.2) |
Total | 248 | 55.4 (17.3) |
Category | Consented, n (%) | Refused, n (%) | Total invited, n (%) | Not invited, n (%) | Total, n (%) |
---|---|---|---|---|---|
Age (years) | |||||
≤ 44 | 9 (17.0) | 13 (19.7) | 22 (18.5) | 40 (31.0) | 62 (25.0) |
45–54 | 10 (18.9) | 10 (15.2) | 20 (16.8) | 27 (20.9) | 47 (19.0) |
55–64 | 12 (22.6) | 13 (19.7) | 25 (21.0) | 26 (20.2) | 51 (20.6) |
65+ | 22 (41.5) | 30 (45.5) | 52 (43.7) | 36 (27.9) | 88 (35.5) |
Sex | |||||
Male | 33 (62.3) | 42 (63.6) | 75 (63.0) | 77 (59.7) | 152 (61.3) |
Female | 20 (37.7) | 24 (36.4) | 44 (37.0) | 52 (40.3) | 96 (38.7) |
By contrast, the sex distribution was stable across all subgroups.
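The age comparisons reported above can be approximated from the summary statistics in Table 10a alone. The sketch below assumes an equal-variance two-sample t-test and replaces the t distribution with a normal approximation, which is reasonable only because the degrees of freedom are large. Since it works from rounded means and SDs rather than the raw data, it should give p-values close to, but not identical with, those reported. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic computed from summary statistics."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se


def approx_two_sided_p(t):
    """Two-sided p-value using a normal approximation (assumes large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Invited (n = 119) versus not invited (n = 129), from Table 10a.
t_invited = pooled_t(119, 58.6, 15.3, 129, 54.1, 17.2)
# Consented (n = 53) versus refused (n = 66), from Table 10a.
t_consent = pooled_t(53, 58.2, 13.9, 66, 59.0, 16.5)

print(round(t_invited, 2), round(approx_two_sided_p(t_invited), 3))
print(round(t_consent, 2), round(approx_two_sided_p(t_consent), 3))
```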
Abnormalities in the index LFTs for the 195 non-entered patients and the 53 participants are analysed in Tables 11 and 12. The proportion of patients with abnormal GGT was higher (p = 0.011) among those invited to join the study (73.7%) than among those not invited (58.1%). There is no evidence of preferential invitation associated with abnormality on any other analyte, nor with the presence of more than one abnormality in the index panel (see Table 11). There was, however, an (unexplained) tendency for invited patients with abnormal globulin to decline to participate (p = 0.002). Otherwise we found no evidence of recruitment bias.
Analytes | Consented, n (%) | Refused, n (%) | Exact test p (consented vs refused) | Total invited, n (%) | Not invited, n (%) | Exact test p (invited vs not invited) | Total, n (%) |
---|---|---|---|---|---|---|---|
Total | 53 (100) | 66 (100) | | 119 (100) | 129 (100) | | 248 (100) |
ALT | 19 (38.0) | 18 (27.3) | 0.234 | 37 (31.9) | 33 (25.6) | 0.322 | 70 (28.6) |
AST | 5 (10.2) | 3 (4.5) | 0.283 | 8 (7.0) | 15 (11.6) | 0.274 | 23 (9.4) |
Bilirubin | 4 (8.2) | 1 (1.5) | 0.162 | 5 (4.3) | 10 (7.8) | 0.299 | 15 (6.1) |
ALP | 2 (3.9) | 7 (10.6) | 0.295 | 9 (7.7) | 16 (12.4) | 0.291 | 25 (10.2) |
GGT | 41 (78.9) | 46 (69.7) | 0.298 | 87 (73.7) | 75 (58.1) | 0.011 | 162 (65.6) |
Albumin | 1 (1.9) | 3 (4.5) | 0.628 | 4 (3.4) | 3 (2.3) | 0.713 | 7 (2.8) |
Globulin | 1 (2.0) | 14 (21.2) | 0.002 | 15 (12.9) | 23 (17.8) | 0.377 | 38 (15.5) |
Total protein | 5 (10.0) | 18 (27.3) | 0.033 | 23 (19.8) | 30 (23.3) | 0.538 | 53 (21.6) |
More than one abnormal analyte | 17 (37.8) | 29 (43.9) | 0.560 | 46 (41.4) | 54 (41.9) | 1.000 | 100 (41.7) |
Analyte | Consented, n | Consented, excess mean | Consented, excess median | Refused, n | Refused, excess mean | Refused, excess median | Total invited, n | Total invited, excess mean | Total invited, excess median | Not invited, n | Not invited, excess mean | Not invited, excess median |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ALT | 19 | 1.37 | 1.29 | 18 | 1.46 | 1.32 | 37 | 1.42 | 1.32 | 33 | 1.55 | 1.32 |
AST | 5 | 1.37 | 1.16 | 3 | 1.52 | 1.56 | 8 | 1.42 | 1.47 | 15 | 1.54 | 1.26 |
Bilirubin | 4 | 1.90 | 1.86 | 1 | 1.09 | 1.09 | 5 | 1.74 | 1.55 | 10 | 1.34 | 1.20 |
ALP | 2 | 1.20 | 1.20 | 7 | 1.49 | 1.17 | 9 | 1.43 | 1.17 | 16 | 1.85 | 1.24 |
GGT | 41 | 1.79 | 1.30 | 46 | 2.16 | 1.54 | 87 | 1.99 | 1.42 | 75 | 2.15 | 1.42 |
Albumin | 1 | 1.06 | 1.06 | 1 | 1.02 | 1.02 | 2 | 1.04 | 1.04 | 2 | 1.03 | 1.03 |
Globulin | 1 | 1.08 | 1.08 | 14 | 1.15 | 1.14 | 15 | 1.14 | 1.14 | 20 | 1.08 | 1.05 |
Total protein | 5 | 1.02 | 1.02 | 18 | 1.05 | 1.04 | 23 | 1.04 | 1.04 | 30 | 1.04 | 1.02 |
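The exact tests in Table 11 compare proportions between two groups. As a minimal sketch of how such a p-value can be obtained, the function below computes a two-sided Fisher exact test for a 2 × 2 table by summing hypergeometric probabilities. Whether the study used exactly this two-sided definition is an assumption. The GGT example also assumes that 118 of the 119 invited patients had a GGT result, a denominator inferred from the reported 87 (73.7%) rather than stated in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )


# GGT abnormal / not abnormal: invited (87 of an assumed 118) vs not invited (75 of 129).
p_ggt = fisher_exact_two_sided(87, 118 - 87, 75, 129 - 75)
print(round(p_ggt, 3))
```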
In the end, we were not able to exclude a degree of selection in patients entered. Some selection effects associated with age and GGT abnormality are suggested. Individual differences in how eligibility criteria are applied are inevitable in a large and busy practice and we cannot exclude a degree of bias owing to hidden confounders. If clinicians selected a group of patients with significantly higher prior risk, then it is possible that the study will somewhat overstate the association between mildly abnormal test results and the various disease end points. All we really know is that some apparently eligible patients were not invited by their GP to participate. This may be because of a purposive decision to exclude or because of some oversight. Such data are difficult to collect because doctors who are unwilling or too hard pressed to select patients for study entry are unlikely to go to the trouble of recording their reasons.
Quality control of ultrasound
In order to quality assure the liver imaging, the first (FU1) ultrasound images and paper reports of 50 randomly selected BALLETS patients were presented to the study radiologist, who was in complete agreement with the sonographer's findings in 34 out of the 50 cases. In the remaining 16 cases there was some ‘technical or relatively minor’ disagreement but ‘no serious clinical disagreement’ that might have altered clinical decision-making.
The process was repeated with 50 randomly selected FU2 ultrasound scans. The study radiologist agreed with the written report in 38 instances, and in the remaining 12 cases found that there was ‘technical or relatively minor’ disagreement, but, once again, no disagreement that would alter the clinical decision-making process.
Production of reference categories (categories of diagnostic groupings)
The analysis plan in the study protocol was to investigate the association between the index LFT panel and clinical outcome in order to address such questions as:
-
Which profiles of index test results suggest higher and lower risk of the presence of serious specific disease, and of the other reference standards?
-
What is the contribution of different test analytes? How does this vary by clinical features?
The idea here was that the information the GP would have at the time of the index test would constitute explanatory variables in an analysis of clinical outcome using polytomous regression methods to cope with multiple diagnostic categories. The BALLETS study would provide outcome data partly by repeating the LFT (at FU1 and FU2), but, more specifically, by doing exhaustive further testing to ‘concertina’ the future and reach a diagnosis.
This exercise required that each participant be assigned to an outcome (diagnostic) category. That is to say, we needed a reference standard. However, experience gained from our integral pilot suggests that this is tricky. The problem we encountered might be called ‘multiple and overlapping categories’. Briefly, when we came to analyse the data, we found that patients did not fall into a manageable number of discrete categories. For example, the category ‘fatty liver’ could be divided into ‘fatty liver alcohol excess’, ‘fatty liver overweight’, ‘fatty liver overweight and alcohol excess’ and ‘fatty liver and not overweight and no alcohol excess’. However, the job would still not be done – there could then be subcategories for each of the above according to whether the virology was positive or negative, for example. Then there would have to be categories for excess alcohol, overweight, viral diseases, immunological diseases and metal storage diseases – all with and without fatty liver. Our initial discussions with chemical pathologists, liver specialists and GPs suggested that consensus regarding a manageable number of mutually exclusive pathological diagnoses was unlikely to be obtainable. Indeed, even taking a liver biopsy would fall well short of resolving this issue.
It was therefore decided to ‘collapse’ the reference standards into a small number of broad ‘action groups’. These groups are based on the appropriate clinical response, rather than on the precise underlying (and often unknowable) pathophysiological entity.
The following groups were created:
-
Group 1 Specific category of viral, autoimmune or genetic aetiology from Table 13:
-
hepatocellular diseases: chronic viral hepatitis B and hepatitis C, haemochromatosis, autoimmune hepatitis, Wilson's disease, antitrypsin deficiency, cirrhosis (alcohol or fat induced)
-
diseases of the intrahepatic bile ducts: PBC, primary sclerosing cholangitis (PSC).
-
Group 2 Serious liver or other pathology requiring referral. This would include metastatic cancer.
-
Group 3 Non-specific category. This is broken down into:
-
echo-bright (fatty) liver
-
not fatty liver.
Reference group | Subgroups and sub-subgroups |
---|---|
Group 1 (serious viral, genetic or autoimmune disease) | Subgroup A, hepatocellular disease |
Viral hepatitis B or C | |
Haemochromatosis | |
Wilson's disease | |
Antitrypsin deficiency | |
Autoimmune hepatitis | |
Cirrhosis (alcohol or fat induced) | |
Subgroup B | |
PBC | |
PSC | |
Group 2 | Metastatic disease |
Paget's disease of bone | |
Infectious diseases, such as hepatitis A, glandular fever, leptospirosis | |
Thyroid disease | |
Group 3 (non-specific) | Echo-bright (fatty) liver on ultrasound |
Alcohol excess | |
Overweight | |
Alcohol + overweight | |
Neither alcohol nor overweight | |
No fatty liver | |
Gilbert syndrome | |
Persistence of LFT abnormality at 2 years | |
LFT abnormality resolved at 2 years |
The groups are hierarchical in the sense that a person would be assigned to the ‘top’ category when more than one category might apply. Thus, a person with ‘two hits’, such as haemochromatosis and ‘fatty liver’, would be assigned to group 1, not group 3.
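The hierarchical rule can be expressed as a simple ordered check. The sketch below is illustrative only: the group contents follow Table 13, but the finding labels (for example `"haemochromatosis"`) are hypothetical identifiers chosen for this example, and any finding not listed in group 1 or group 2 is treated as non-specific.

```python
# Hypothetical finding labels; group membership follows Table 13.
GROUP_1 = {
    "viral_hepatitis_b_or_c", "haemochromatosis", "wilsons_disease",
    "antitrypsin_deficiency", "autoimmune_hepatitis", "cirrhosis",
    "pbc", "psc",
}
GROUP_2 = {
    "metastatic_disease", "pagets_disease_of_bone",
    "infectious_disease", "thyroid_disease",
}


def assign_reference_group(findings, fatty_liver):
    """Assign a patient to the highest applicable reference group.

    findings    -- iterable of finding labels recorded for the patient
    fatty_liver -- True if the ultrasound showed an echo-bright (fatty) liver
    """
    findings = set(findings)
    if findings & GROUP_1:   # group 1 takes precedence over everything else
        return "Group 1"
    if findings & GROUP_2:   # group 2 takes precedence over group 3
        return "Group 2"
    return "Group 3 (fatty liver)" if fatty_liver else "Group 3 (not fatty liver)"


# A patient with 'two hits' is assigned to group 1, not group 3.
print(assign_reference_group(["haemochromatosis"], fatty_liver=True))
```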
The question could be asked as to why we did not include the diagnosis of alcoholic liver damage of a degree less extreme than cirrhosis. The answer is that, had we done so, alcohol use would serve two non-independent functions – as a clinical feature known in advance of testing (in many/most cases) and also as the outcome of testing. That is to say, the results would be subject to incorporation bias (where a variable serves both as an explanatory and as an outcome variable). Formally, the same could be said of alcoholic cirrhosis, but here the diagnosis rests on ultrasound and exclusion of other causes as well as alcohol history.
Statistical methods
This section gives an outline of the methods and approaches used. Fuller details of individual methodologies are presented as appropriate in the results sections.
Variables and data
Demographic and lifestyle information – including body mass index (BMI) and alcohol consumption in units per week – was coded using categorical variables. Six categories each were used for age and alcohol consumption and four for BMI. The details may be read from Table 14.
Characteristic | All subjects (n = 1290), n | All subjects (n = 1290), % | Subjects with 2-year follow-up LFTs (n = 790), n | Subjects with 2-year follow-up LFTs (n = 790), % |
---|---|---|---|---|
Sex | ||||
Male | 724 | 56.12 | 453 | 57.34 |
Female | 566 | 43.88 | 337 | 42.66 |
Age (years) | ||||
≤ 34 | 106 | 8.22 | 33 | 4.18 |
35–44 | 165 | 12.79 | 91 | 11.52 |
45–54 | 240 | 18.60 | 149 | 18.86 |
55–64 | 325 | 25.19 | 243 | 30.76 |
65–74 | 273 | 21.16 | 187 | 23.67 |
75+ | 181 | 14.03 | 87 | 11.01 |
Ethnic group | ||||
White | 1056 | 81.86 | 663 | 83.92 |
Asian | 89 | 6.90 | 56 | 7.09 |
Black | 66 | 5.12 | 33 | 4.18 |
Other | 40 | 3.10 | 18 | 2.28 |
Not known | 39 | 3.02 | 20 | 2.53 |
BMI (kg/m²) | ||||
< 20 | 49 | 3.80 | 13 | 1.65 |
20–24.99 | 250 | 19.38 | 149 | 18.86 |
25–29.99 | 454 | 35.19 | 248 | 31.39 |
30+ | 498 | 38.60 | 294 | 37.22 |
Not known | 39 | 3.02 | 86 | 10.89 |
Alcohol consumption (units per week) | ||||
0 | 547 | 42.40 | 282 | 35.70 |
1–14 | 352 | 27.29 | 251 | 31.77 |
15–29 | 153 | 11.86 | 90 | 11.39 |
30–49 | 122 | 9.46 | 53 | 6.71 |
50–99 | 84 | 6.51 | 39 | 4.94 |
100+ | 24 | 1.86 | 4 | 0.51 |
Not known | 8 | 0.62 | 71 | 8.99 |
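The categorical coding summarised in Table 14 can be sketched as a lookup against band boundaries. This is an illustration rather than the study's own code. It assumes that each band includes its lower bound, that missing values are coded "Not known", and that consumption of less than one unit per week falls in the "0" band; none of these conventions is stated in the report. ASCII labels are used in place of the typeset ones.

```python
from bisect import bisect_right

# Lower bounds of the second and subsequent bands, with matching labels.
AGE_CUTS, AGE_LABELS = [35, 45, 55, 65, 75], ["<=34", "35-44", "45-54", "55-64", "65-74", "75+"]
BMI_CUTS, BMI_LABELS = [20, 25, 30], ["<20", "20-24.99", "25-29.99", "30+"]
ALCOHOL_CUTS, ALCOHOL_LABELS = [1, 15, 30, 50, 100], ["0", "1-14", "15-29", "30-49", "50-99", "100+"]


def categorise(value, cuts, labels):
    """Return the band label for value, or 'Not known' when value is None."""
    if value is None:
        return "Not known"
    # bisect_right counts how many lower bounds are <= value.
    return labels[bisect_right(cuts, value)]


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    return weight_kg / (height_cm / 100) ** 2


print(categorise(58, AGE_CUTS, AGE_LABELS))               # 55-64
print(categorise(bmi(85, 170), BMI_CUTS, BMI_LABELS))     # 25-29.99
print(categorise(None, ALCOHOL_CUTS, ALCOHOL_LABELS))     # Not known
```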
Concentrations of analytes in the LFT panels were recorded (see Table 4 for units) by the three individual laboratories. Laboratory-specific reference ranges, incorporating adjustments for age and sex (see Table 4), were used to categorise values as normal or abnormal. Thus, each LFT result was available in two forms: as a continuous variable (measured concentration) and as a dichotomous variable (normal/abnormal).
Liver fat on ultrasound was recorded on a four-point ordinal scale (normal, mild, moderate and severe). The condition ‘fatty liver on ultrasound’ was identified with the categories mild, moderate and severe, and analysed as a binary variable.
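The two dichotomisations described here can be illustrated as follows. The sketch assumes that a result is abnormal when it falls outside an inclusive reference range, and that any ultrasound grade other than normal counts as fatty liver. The reference limits in the example are hypothetical placeholders and are not the laboratory-specific values of Table 4.

```python
FAT_GRADES = ("normal", "mild", "moderate", "severe")


def is_abnormal(value, lower, upper):
    """True when value lies outside the inclusive reference range [lower, upper]."""
    return value < lower or value > upper


def fatty_liver(grade):
    """Collapse the four-point ultrasound grade into a binary variable."""
    if grade not in FAT_GRADES:
        raise ValueError(f"unknown ultrasound grade: {grade!r}")
    return grade != "normal"


# Hypothetical reference range, for illustration only.
print(is_abnormal(62, lower=5, upper=40))      # True
print(fatty_liver("mild"), fatty_liver("normal"))  # True False
```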
Summaries of categorical variables (with percentages) are presented in tabular form. Summaries of analyte concentrations were expressed in terms of medians and quartiles.
Analysis of liver function test data
Abnormality
The presence of an abnormal analyte in the index panel was a criterion of entry to the study. Redundancy in the test panel was investigated by identifying subsets of analytes (i.e. subpanels) which would have recruited the highest proportions of study patients.
Analyte concentrations
Many of the analytes exhibited positive distributional skewness. Regression analyses of concentrations were conducted on log-transformed data. Differences in the distribution of results between laboratories were examined using quantile–quantile (Q–Q) plots and modelled using multiplicative factors (additive on the log-scale).
Pearson correlation analyses (using logged data standardised within laboratories) were carried out for different analytes in the same panel, and for individual analytes over time.
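A minimal sketch of the standardisation and correlation steps is given below. It assumes that "standardised within laboratories" means converting each logged concentration to a z-score using the mean and SD of its own laboratory; the report does not spell this out. The data are invented for illustration. Under this assumption a purely multiplicative difference between laboratories disappears after standardisation, because it is additive on the log scale.

```python
from collections import defaultdict
from math import log, sqrt
from statistics import mean, stdev


def standardise_logs_within_lab(records):
    """Return z-scores of log concentration, standardised within each laboratory.

    records -- list of (laboratory, concentration) pairs; output keeps input order.
    """
    logs_by_lab = defaultdict(list)
    for lab, conc in records:
        logs_by_lab[lab].append(log(conc))
    stats = {lab: (mean(v), stdev(v)) for lab, v in logs_by_lab.items()}
    return [(log(conc) - stats[lab][0]) / stats[lab][1] for lab, conc in records]


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented example: laboratory B reads exactly twice as high as laboratory A.
analyte_1 = [("A", 10), ("A", 20), ("A", 40), ("B", 20), ("B", 40), ("B", 80)]
analyte_2 = [("A", 30), ("A", 35), ("A", 90), ("B", 50), ("B", 80), ("B", 70)]
z1 = standardise_logs_within_lab(analyte_1)
z2 = standardise_logs_within_lab(analyte_2)
print([round(z, 2) for z in z1])
print(round(pearson(z1, z2), 2))
```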
For each patient in the study, data were available from (up to) three LFT panels, recorded at different times. This gives an opportunity to analyse the development of patient readings over time as well as to relate results to demographic and diagnostic information. However, abnormality on the first panel is a criterion of entry to the study. It was anticipated that this feature would manifest itself in a ‘regression to the mean’ effect over the course of the study. Such selection effects could compromise the interpretation of any statistical analysis of the measured concentrations. The FU1 panels are the most complete panels in terms of missing data (certainly more complete than the FU2 data) and somewhat less biased by selection effects than the index panels, as they were not used as a criterion of entry to the study.
A time-series modelling approach was used to partition the variation in analyte concentrations between transient (short-term) components and persistent (long-term) components. The latter may be more relevant for the diagnosis of serious conditions. For this analysis, selection bias was handled by conditioning on the index LFTs. The variance explained by the persistent component in the model was compared with that from an analysis based on intrapatient correlations. Full details of the modelling methodology are found in Appendix 2 (BALLETS study analysis) along with the results.
Stepwise regression procedures were used to model the impact of demographic and lifestyle factors on LFT results from the FU1 panel. The explanatory models obtained were used in subsequent analyses.
Diagnostic category and liver function test results
The relationship between diagnostic category and LFT results was investigated in two ways: (1) by adding diagnostic category to the explanatory models already derived and (2) by means of a multiple discriminant analysis. The discriminant analysis was carried out using a set of patient-level variables and (logged) analyte concentrations that had been identified from a series of preliminary logistic regression analyses. The preliminary analyses involved separate stepwise logistic regressions designed to find significant predictors of individual disease categories. These predictors were then used in a multiple logistic discriminant (polytomous logistic regression) analysis between the separate diagnostic categories. The performance of the discriminant to distinguish between liver disease and a non-specific diagnosis was assessed using the area under a receiver operating characteristic (ROC) curve.
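The area under the ROC curve has a direct interpretation: it is the probability that a randomly chosen diseased patient receives a higher discriminant score than a randomly chosen non-diseased patient, with ties counted as one half. The sketch below computes it from that definition. The scores and labels are invented, and the report does not state which software or estimator was actually used.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    scores -- discriminant scores (higher means more suggestive of disease)
    labels -- 1 for liver disease, 0 for a non-specific diagnosis
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # Each positive/negative pair scores 1 if correctly ordered, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Invented example: three of the four positive/negative pairs are correctly ordered.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise form is quadratic in the number of patients, which is acceptable at the scale of this study; a rank-based calculation gives the same value more efficiently.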
The stepwise element in the discriminant analysis described here was restricted to patients with a complete panel of LFTs at FU1. The diagnostic groups for serious liver disease were very small compared with the non-specific group, and further depleted in the complete case analysis. In order to make full use of the LFT data that were available from diseased patients, the analysis was repeated using a multiple imputation technique. Further details may be found in Chapter 4 (see Analysis of imputed data).
Fatty liver
Stepwise logistic regression was used to explore the relationship between fatty liver at FU1 and patient characteristics, including (logged) LFT results, BMI and alcohol consumption. In this analysis, linear and quadratic components for age and alcohol consumption were substituted for the categorical variable for ease of interpretation. These components relate to the ordered categories themselves rather than the raw data. Persistence of fatty liver from FU1 to FU2 was investigated by stepwise logistic regression, including fatty liver at FU1 as a predictor for fatty liver at FU2. The consequences for liver fat of a change in BMI within an individual subject were investigated by means of two ordinal regression analyses: (1) liver fat category at FU2 on liver fat category at FU1 with percentage change in BMI as a covariate and (2) numerical difference in liver fat category between FU1 and FU2 on percentage change in BMI. Change in alcohol units (on a square root scale) was incorporated as a covariate in these analyses.
Sample size
A main objective of the study was to investigate the connection between LFTs and serious liver disease. However, there were several diseases under consideration, and no single primary question on which to power the study [see Production of reference categories (categories of diagnostic groupings)]. In the original protocol, logistic regression methods were proposed to explicate the relationship between diagnostic group and analyte concentrations. Sample size calculations for such problems often focus on ‘events per variable’ rules, which, in this study, suggest that 5 to 10 positive diagnoses would be needed for each predictor variable. Here there are seven independent LFTs (given that total protein is the aggregate of two other analytes), suggesting that between 35 and 70 positive cases would be needed for an unadjusted analysis. The study was designed for 1500 patients, which gives a satisfactory number of events (60) assuming a prevalence of a positive diagnosis of 4%. In practice, 44 cases of serious liver disease were found. If serious liver disease is considered as a composite outcome, the events per variable approach suggests that a reliable analysis is possible, at least for the five non-protein analytes in the LFT panel together with a small number of covariates.
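The arithmetic behind the 'events per variable' reasoning is simple enough to set out explicitly. The sketch below only re-derives the figures quoted in this paragraph; the function names are illustrative.

```python
def events_needed(n_predictors, epv_low=5, epv_high=10):
    """Range of positive cases implied by an events-per-variable rule."""
    return n_predictors * epv_low, n_predictors * epv_high


def expected_events(n_patients, prevalence):
    """Expected number of positive diagnoses at a given prevalence."""
    return n_patients * prevalence


def events_per_variable(n_events, n_predictors):
    """Observed events available per predictor variable."""
    return n_events / n_predictors


print(events_needed(7))                    # (35, 70): seven independent LFTs
print(round(expected_events(1500, 0.04)))  # 60: design assumption
print(events_per_variable(44, 5))          # 8.8: observed cases, five non-protein analytes
```

With 44 observed cases, the five non-protein analytes give 8.8 events per variable, which lies inside the 5 to 10 range; each added covariate lowers that ratio, which is why only a small number of covariates can be accommodated.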
The ‘events per variable’ approach focuses purely on technical aspects of logistic regression estimates. In the original protocol, we also considered a novel alternative criterion based on the ability of a logistic discriminant to identify high-risk cases (i.e. patients with risk of disease higher than an acceptable threshold level). For this purpose a baseline level of acceptable risk was taken to be 2% of the average population prevalence. According to this approach, 1500 patients would be sufficient to estimate a logistic discriminant function with a 90% chance of flagging up any patient whose true risk was twice the acceptable baseline level. These calculations posited an average prevalence of 4% – close to that actually observed in the sample. In retrospect, it seems that the degree to which the risk is predictable from the LFT results was underestimated in the original calculations, suggesting that the true performance of the discriminant would exceed expectations. However, there does not seem to be any direct way to verify this, as the true risk profile remains an unknown function of LFT results. In any event, this approach is concerned only with case finding and attaches no penalty to false-positives.
The other principal statistical analysis is concerned with the incidence of fatty liver. For this the event rates are much higher than for serious liver disease, and the sample numbers required for a meaningful analysis are correspondingly less stringent.
Summary of changes to the protocol
The study protocol can be found in Appendix 1.
Changes to protocol by section
- Section 2.5.6 The patient's perspective. Altered to describe the contents of psychology questionnaires.
- Section 3.2.1 Practices. Additional practices were recruited in order to improve and maintain recruitment to the study.
- Section 3.3.2 Formal enrolment in subsequent testing protocol: defining of the patient population and seeking consent. Changes to this section reflect alterations to the clinical process at each practice. The original clinical protocol was adapted to routine clinical practice.
- Section 3.3.4 Collection of patient information. Altered to indicate that psychology questionnaires would not be translated into languages other than English.
- Section 3.3.6 Long term follow-up. The 2-year follow-up phase in Birmingham was more complex than originally planned, including repeat USS, alcohol questionnaire and measurements. This section was altered to reflect changes to the process.
- Section 3.5.1 Broad aim. Altered to address the possibility of selection bias, which could occur when suitable patients declined to take part or when suitable patients were not selected by their GP to take part (see section 3.5).
- Section 3.7.1 Psychological pilot study. This pilot study was implemented to inform the development of psychological questionnaires for use in the main study. The study process was updated to indicate changes to measures and time points used for data collection.
- Section 5.1 Substudy: Cryogenic storage and later testing. Approval was obtained to collect and store an anonymised blood sample from consenting Birmingham patients attending their 2-year (FU2) clinic appointment.
- Section 5.2 Substudy: A qualitative investigation into liver function test ordering behaviour of general practitioners involved in the BALLETS study. A substudy designed to examine the non-clinical motives behind a GP's decision to order an LFT.
- Section 5.3 Substudy: Follow-up of abnormal test results. In the course of the study some patients tested positive for some specific liver diseases, but many were not followed up according to the agreed algorithm for referral or further testing. Letters were prepared by the study hepatologist and chief investigator suggesting appropriate follow-up of individual study patients testing positive for particular diseases.
- Section 5.4 Substudy: Qualitative investigation exploring anecdotal and preliminary evidence that events associated with participation in the BALLETS study were motivational to patients with and without fatty liver. A random selection of study patients were interviewed, in response to anecdotal accounts from patients at 2-year follow-up clinics that they had implemented lifestyle changes following their first BALLETS clinic appointment.
Ethics committee approval for changes to the protocol
The main research ethics committee, St Thomas' Hospital Research Ethics Committee, gave favourable ethical opinion to the BALLETS study in April 2005.
During the recruitment and follow-up phases of the study, the St Thomas' Hospital Research Ethics Committee Modifications Subcommittee approved 10 substantial amendment applications for alterations to the study protocol and documentation. Detailed summaries of each amendment are provided in the appendices to the main report (Appendix 3, BALLETS study: summary of ethics and substantial amendment approval). All amendments were also approved by South Birmingham and Lambeth local research ethics/research and development committees.
Approval was sought for the recruitment of new study practices in Birmingham, and corresponding patient documentation, for two qualitative substudies, as described in Chapter 5 (see Psychology 1: effects of positive tests and Psychology 2: effects of results on behaviour), and for a more intensive 2-year follow-up phase for Birmingham patients, which included an additional USS and the cryogenic storage and later testing of cells and serum. In addition, approval was obtained for the study team to remind GPs, by letter, of the need to follow up patients who tested positive for some specific liver diseases.
Chapter 4 Results: main study
This section begins with a brief description of the practices from which the participants were drawn and the demographics of the patients in the study (see Nature of the population studied: Birmingham and Lambeth sites and Patients and practice characteristics, below). Numerical summaries of the clinical data are also presented (see Summary of clinical data), namely LFT panels, diagnostic categories and ultrasound features. Some observations on the timing and completeness of panels are included here, together with a brief discussion of selection effects.
Analysis of the LFT panels themselves (see Analysis of the liver function test panels) is presented in two parts: (1) the inter-relationships between unadjusted LFT results and the utility of laboratory reference ranges for assessing abnormality (see Analysis of unadjusted liver function test results) and (2) variation in the concentrations of individual analytes, investigated using regression models to adjust for patient characteristics (see Impact of patient characteristics on liver function test results). The contribution of diagnostic grouping and fatty liver status to these models is also considered (see Impact of diagnostic surgery and Impact of fatty liver, respectively).
The section Liver-related disease contains a detailed clinical description of the categories of liver disease in the sample. Diagnostic value of liver function tests builds on the regression models (see Impact of patient characteristics on liver function test results) to investigate the diagnostic potential of the LFT panel, taking account of individual patient characteristics. The approach is based on stepwise procedures to find the analytes with the greatest discriminatory potential and uses imputation methods to cope with missing values in sparsely populated diagnostic categories.
Fatty liver is investigated in Fatty liver on ultrasound. The risk of fatty liver is modelled using logistic regression techniques. Relationships between fatty liver and lifestyle variables over the course of the study are investigated.
Nature of the population: Birmingham and Lambeth sites
In Birmingham, 1197 participants were recruited from eight practices. The location of the practices within the Birmingham conurbation and the socioeconomic and ethnic group characteristics of the surrounding areas are shown in Figures 2 and 3. In Lambeth, 147 participants were recruited from three practices. The location of the practices within the Lambeth conurbation and the socioeconomic and ethnic group characteristics of the surrounding areas are shown in Figures 4 and 5.
Patients and practice characteristics
Forty-six of the 1344 recruited patients were excluded because none of the original LFTs was, in fact, abnormal. A further eight patients were excluded because the second blood test result was completely missing, such that neither the FU1 LFT nor any of the tests for specific diseases was available (Figure 6). The analyses use data from the remaining 1290 patients, whose individual characteristics are summarised in Table 14. This includes basic demographic information (recorded at FU1), together with BMI measurements (taken at FU1 and again at FU2) and results from the alcohol questionnaire (administered at FU1 and FU2). Results are given for all subjects (at FU1) and for the subsample who contributed to the FU2 LFT data. The reasons for ordering the index LFTs are summarised in Table 15.
Reasons for LFT ordering | Total |
---|---|
Investigations | |
Abdominal symptoms or signsa | 70 |
General symptoms or signs | 318 |
Suspected alcohol abuse | 18 |
Reviews | |
CVD | 53 |
Cholesterol | 57 |
Hypertension | 151 |
Diabetes | 220 |
Medication | 95 |
Other | 308 |
Total | 1290 |
Eleven general practices contributed to the study and their participation is summarised in Table 16. The first eight practices in the table are situated in Birmingham, and the remaining three in London.
Practice | All subjects (n = 1290): n | All subjects: % | Subjects with 2-year follow-up LFTs (n = 790): n | Subjects with 2-year follow-up LFTs: % | Processing laboratory |
---|---|---|---|---|---|
Hall Green | 161 | 12.48 | 117 | 14.81 | a |
Lordswood | 213 | 16.51 | 134 | 16.96 | a |
Greenridge | 195 | 15.12 | 103 | 13.04 | a |
Yardley Wood | 144 | 11.16 | 100 | 12.66 | a |
Woodland Road | 149 | 11.55 | 97 | 12.28 | a |
Cofton | 126 | 9.77 | 76 | 9.62 | a |
Shenley Green | 75 | 5.81 | 42 | 5.32 | a |
Wand | 89 | 6.90 | 58 | 7.34 | b |
Lambeth Walk | 71 | 5.50 | 31 | 3.92 | c |
Waterloo Health | 48 | 3.72 | 26 | 3.29 | c |
The Hurley Clinic | 19 | 1.47 | 6 | 0.76 | c |
Total | 1290 | 790 |
Summary of clinical data
Diagnostic categories
Patients were categorised into three broad diagnostic groups, described more fully in Chapter 3 (Methods: main study). Categories 1 and 2 are the two broad ‘serious liver disease’ categories, which may be further subdivided into category 1a (hepatitis B and C and other hepatocellular diseases), category 1b (hepatic bile duct disease) and category 2 [other diseases affecting the liver (such as metastatic cancer)]. The remaining cases (category 3) are rather non-specific.
Ultrasound features
Sonography reports for the liver were obtained at FU1 for 1277 patients and at FU2 for 658 out of 1152 patients from Birmingham practices. Second ultrasound examinations were not performed in Lambeth (see Chapter 3, The 2-year follow-up visit). A four-point ordinal scale (normal, mild, moderate and severe) was used to describe liver fat on each occasion (see Chapter 3, Testing strategy for patients in the BALLETS study), with results summarised in Table 17.
First follow-up (rows) by second follow-up (columns) | Normal | Mild | Moderate | Severe | Not known | DNA | Total |
---|---|---|---|---|---|---|---|
Normal | 324 | 44 | 10 | 1 | 1 | 413 | 793 |
Mild | 62 | 61 | 28 | 0 | 1 | 111 | 263 |
Moderate | 23 | 37 | 36 | 6 | 1 | 74 | 177 |
Severe | 4 | 4 | 8 | 2 | 0 | 26 | 44 |
Not known | 0 | 1 | 1 | 0 | 1 | 5 | 8 |
DNA | 2 | 0 | 0 | 0 | 0 | 3 | 5 |
Total | 415 | 147 | 83 | 9 | 4 | 632 | 1290 |
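A collapsed version of the table (Table 18), in which the mild, moderate and severe grades are combined as ‘abnormal’ and patients without a known result on both occasions are omitted, shows the persistence of the ultrasound diagnosis of fatty liver between the two epochs.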
Initial sonography | Two-year sonography: normal, n (%) | Two-year sonography: abnormal, n (%) | Total, n (%) |
---|---|---|---|
Normal | 324 (85.49) | 55 (14.51) | 379 (100.00) |
Abnormal | 89 (32.84) | 182 (67.16) | 271 (100.00) |
Total | 413 (63.54) | 237 (36.46) | 650 (100.00) |
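The collapse from the four-point scale of Table 17 to the two-way layout of Table 18 can be reproduced with a few lines of Python. This is a minimal sketch rather than the code used in the study: the counts are copied from Table 17, and the function names `collapse` and `row_percentages` are illustrative.

```python
# Counts copied from Table 17: rows are the FU1 grade, columns the FU2 grade
# (normal, mild, moderate, severe). Patients whose result was "not known" or
# who did not attend on either occasion are left out, as in Table 18.
counts = {
    "normal":   [324, 44, 10, 1],
    "mild":     [62, 61, 28, 0],
    "moderate": [23, 37, 36, 6],
    "severe":   [4, 4, 8, 2],
}


def collapse(counts):
    """Collapse the four-point scale to normal versus abnormal (any fat)."""
    table = {"normal": [0, 0], "abnormal": [0, 0]}
    for fu1_grade, row in counts.items():
        fu1 = "normal" if fu1_grade == "normal" else "abnormal"
        table[fu1][0] += row[0]        # normal at FU2
        table[fu1][1] += sum(row[1:])  # mild, moderate or severe at FU2
    return table


def row_percentages(table):
    """Express each row as percentages of its own total."""
    return {
        key: [round(100 * value / sum(row), 2) for value in row]
        for key, row in table.items()
    }


two_by_two = collapse(counts)
percentages = row_percentages(two_by_two)
print(two_by_two)
print(percentages)
```

The row percentages give the proportion of patients whose sonographic status persisted or changed between FU1 and FU2.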
Liver function test panels
Timing and completeness of panels
The LFT panel was extended from the usual five analytes to eight analytes for study purposes; that is, the intention was to report eight analytes on each occasion: the index result that triggered entry to the study, and the FU1 and FU2 tests used as part of the comprehensive testing strategy. The number of analytes actually reported on each occasion is shown in Table 19. Complete reporting (all eight analytes) occurred for 915 (70.9%) participants on index testing, 1168 (90.5%) at FU1 and 642 (81.3%) at FU2. A complete panel was recorded on the first two occasions (index and FU1) for 844 (65.4%) participants. Compared with the integral pilot data (see Chapter 3, Laboratory methods), completion improved for the follow-up visit but deteriorated for the index visit. A bimodal pattern of testing can be observed, with modes at eight analytes (as required for study purposes) and five analytes (the default situation) (see Chapter 3, Missing data).
No. of tests | Index | FU1 | FU2 |
---|---|---|---|
0 | 0 (0.0) | 15 (1.2) | – |
1 | 3 (0.2) | 3 (0.2) | 1 (0.1) |
2 | 6 (0.5) | 3 (0.2) | 1 (0.1) |
3 | 6 (0.5) | 23 (1.8) | 26 (3.3) |
4 | 103 (8.0) | 16 (1.2) | 42 (5.3) |
5 | 134 (10.4) | 21 (1.6) | 29 (3.7) |
6 | 99 (7.7) | 17 (1.3) | 25 (3.2) |
7 | 24 (1.9) | 24 (1.9) | 24 (3.0) |
8 | 915 (70.9) | 1168 (90.5) | 642 (81.3) |
Total | 1290 | 1290 | 790a |
At baseline, 1290 (100%) patients provided at least one LFT result. At FU1 this number fell to 1275 (98.8%) and at FU2 to 790 (61.2%). However, as described in Table 19, not all LFT panels were complete. Table 20 shows the number of times each individual analyte was recorded as a percentage of the number of panels available.
Analyte | Index panel (n = 1290) | FU1 panel (n = 1275) | FU2 panel (n = 790) |
---|---|---|---|
ALT | 86.4 | 96.8 | 89.5 |
AST | 89.8 | 95.1 | 91.6 |
Bilirubin | 98.1 | 96.7 | 96.3 |
ALP | 98.6 | 96.9 | 95.9 |
GGT | 89.3 | 97.5 | 90.6 |
Albumin | 99.1 | 98.4 | 98.5 |
Globulin | 75.7 | 95.2 | 88.2 |
Total protein | 76.1 | 96.9 | 89.5 |
It was intended that FU1 occur as soon after the index LFT panel as might occur in practice, and that FU2 would occur after a further 2 years. The actual times that elapsed between the index and follow-up tests are summarised in Table 21.
LFT testing interval (months) | n | Minimum | Q1 | Median | Q3 | Maximum |
---|---|---|---|---|---|---|
Index to FU1 | 1288 | 0.1 | 0.7 | 1.0 | 1.7 | 9.0 |
FU1 to FU2 | 790 | 3.2 | 21.1 | 23.9 | 27.1 | 41.9 |
Index to FU2 | 790 | 4.1 | 22.3 | 25.3 | 28.4 | 43.4 |
The intended interval between FU1 and FU2 was 2 years. In the event, the median interval was almost exactly 2 years (23.9 months) with an interquartile range (IQR) of 21–27 months.
Abnormalities in the index liver function tests
The presence of some abnormality in the index panel was a main criterion for entry to the study. The number of analytes that were abnormal in the index panel is shown in Table 22.
No. of abnormal analytes | Total (%) |
---|---|
1 | 750 (58.1) |
2 | 342 (26.5) |
3 | 152 (11.8) |
4 | 41 (3.2) |
5 | 5 (0.4) |
Total | 1290 (100.0) |
The extent to which analytes were abnormal, when abnormal, is summarised in Table 23 in terms of average values expressed in units of the threshold of abnormality. Thus, for example, the median of the abnormally high ALT values is 1.37 times the upper limit of normal (ULN) (as defined by the laboratory concerned). It can be seen that the degree of abnormality is low in most cases. The exception is GGT, for which the corresponding median is 1.68.
Analyte | Total (n) | Normal (n) | Below normal: n | Below normal: mean | Below normal: median | Above normal: n | Above normal: mean | Above normal: median |
---|---|---|---|---|---|---|---|---|
ALT | 1114 | 676 | – | – | – | 438 | 1.62 | 1.37 |
AST | 1158 | 903 | – | – | – | 255 | 1.52 | 1.26 |
Bilirubin | 1265 | 1117 | – | – | – | 148 | 1.44 | 1.26 |
ALP | 1272 | 1083 | – | – | – | 189 | 1.30 | 1.16 |
GGT | 1152 | 285 | – | – | – | 867 | 2.41 | 1.68 |
Albumin | 1278 | 1248 | 9 | 0.87 | 0.91 | 21 | 1.04 | 1.04 |
Globulin | 977 | 922 | 10 | 0.90 | 0.93 | 45 | 1.10 | 1.05 |
Total protein | 981 | 884 | 4 | 0.85 | 0.91 | 93 | 1.04 | 1.03 |
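The scaling used in Table 23 expresses each abnormal result as a multiple of the relevant limit of normality. The sketch below illustrates the calculation for results above the upper limit of normal (ULN). The ALT values and the ULN of 40 are hypothetical and are not taken from the BALLETS data or from any of the study laboratories; the function name is also illustrative.

```python
from statistics import mean, median


def summarise_above_normal(values, upper_limit):
    """Summarise results above the ULN as multiples of the ULN."""
    folds = [value / upper_limit for value in values if value > upper_limit]
    if not folds:
        return None
    return {"n": len(folds), "mean": mean(folds), "median": median(folds)}


# Hypothetical ALT results and a hypothetical upper limit of normal.
alt_values = [22, 35, 41, 50, 60, 80, 120]
summary = summarise_above_normal(alt_values, upper_limit=40)
print(summary)
```

Because each laboratory applies its own reference range, dividing by the local limit places results from different laboratories on a common scale.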
Summary of liver function test data
The blood samples were processed by three different laboratories (see Chapter 3, Selection of practices and patients), labelled 1–3. Each laboratory operates its own reference ranges for the detection of abnormality. Mostly the differences between laboratories were slight, but for one analyte (ALP) the reported results for laboratory 1 followed a markedly different distribution from the other two laboratories (see Appendix 2, Liver function test results by laboratory). Given the potential for differences in laboratory practice, the summary statistics in Table 24 have been computed separately for each laboratory (medians and quartiles).
Analyte (index panel) | Lab 1: n | Lab 1: Q1 | Lab 1: median | Lab 1: Q3 | Lab 2: n | Lab 2: Q1 | Lab 2: median | Lab 2: Q3 | Lab 3: n | Lab 3: Q1 | Lab 3: median | Lab 3: Q3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ALT | 898 | 22 | 32.5 | 50 | 79 | 31 | 48 | 65 | 137 | 22 | 35 | 55 |
AST | 1049 | 23 | 28 | 38 | 79 | 29 | 40 | 53 | 30 | 27 | 33.5 | 44 |
Bilirubin | 1049 | 6 | 9 | 13 | 79 | 6 | 8 | 11 | 137 | 9 | 12 | 20 |
ALP | 1053 | 166 | 203 | 264 | 81 | 71 | 94 | 124 | 138 | 65 | 78 | 99 |
GGT | 934 | 46 | 64 | 106 | 81 | 39 | 67 | 89 | 137 | 29 | 72 | 99 |
Albumin | 1059 | 43 | 45 | 47 | 81 | 43 | 45 | 48 | 138 | 45 | 47 | 49 |
Globulin | 863 | 27 | 29 | 32 | 71 | 28 | 31 | 34 | 43 | 26 | 29 | 32 |
Total protein | 864 | 71 | 74 | 77 | 74 | 73 | 76 | 79 | 43 | 73 | 76 | 78 |
Analyte (FU1 panel) | Lab 1: n | Lab 1: Q1 | Lab 1: median | Lab 1: Q3 | Lab 2: n | Lab 2: Q1 | Lab 2: median | Lab 2: Q3 | Lab 3: n | Lab 3: Q1 | Lab 3: median | Lab 3: Q3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ALT | 1021 | 22 | 30 | 46 | 84 | 29 | 45 | 60.5 | 129 | 20 | 31 | 44 |
AST | 1018 | 22 | 27 | 36 | 82 | 28 | 35.5 | 47 | 112 | 25 | 32 | 39.5 |
Bilirubin | 1018 | 6 | 9 | 13 | 84 | 6 | 8.5 | 11 | 131 | 7 | 10 | 16 |
ALP | 1021 | 163 | 200 | 251 | 84 | 75 | 94.5 | 121 | 131 | 64 | 77 | 91 |
GGT | 1027 | 38 | 59 | 99 | 88 | 34 | 58 | 92.5 | 128 | 27.5 | 52.5 | 92.5 |
Albumin | 1039 | 44 | 46 | 48 | 84 | 43.5 | 46 | 48 | 131 | 45 | 47 | 49 |
Globulin | 1015 | 27 | 30 | 33 | 84 | 28 | 31 | 33 | 115 | 26 | 29 | 31 |
Total protein | 1027 | 73 | 76 | 79 | 88 | 73 | 77 | 80 | 120 | 73 | 76 | 78.5 |
Analyte (FU2 panel) | Lab 1: n | Lab 1: Q1 | Lab 1: median | Lab 1: Q3 | Lab 2: n | Lab 2: Q1 | Lab 2: median | Lab 2: Q3 | Lab 3: n | Lab 3: Q1 | Lab 3: median | Lab 3: Q3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ALT | 588 | 19 | 27 | 39 | 56 | 30.5 | 39 | 51 | 63 | 22 | 32 | 55 |
AST | 640 | 21 | 26 | 32 | 28 | 24.5 | 34 | 39 | 56 | 26 | 34.5 | 52.5 |
Bilirubin | 640 | 6 | 8 | 11 | 58 | 5 | 8 | 11 | 63 | 7 | 9 | 14 |
ALP | 640 | 161 | 193 | 240 | 55 | 76 | 85 | 117 | 63 | 60 | 71 | 97 |
GGT | 613 | 34 | 54 | 90 | 40 | 29 | 49.5 | 68.5 | 63 | 27 | 44 | 96 |
Albumin | 659 | 44 | 46 | 48 | 56 | 44 | 45 | 47 | 63 | 46 | 47 | 49 |
Globulin | 613 | 25 | 28 | 31 | 33 | 27 | 31 | 35 | 51 | 27 | 29 | 32 |
Total protein | 623 | 71 | 74 | 77 | 33 | 73 | 77 | 81 | 51 | 74 | 76 | 78 |
The distribution of analyte concentrations is represented by the histograms in Figures 7 and 8. It is clear that the non-protein analytes exhibit substantial positive skewness. Accordingly, much of the analysis reported here deals with log-transformed LFT values, as discussed in Appendix 2 (see Liver function test results by laboratory). One advantage of this approach is that a multiplicative laboratory effect can be readily incorporated as an additive term in any linear model for LFTs. As necessary (e.g. in the stepwise analyses of Diagnostic value of liver function tests and Fatty liver on ultrasound, below) the log-transformed data have been explicitly standardised to zero mean and unit standard deviation (SD) within laboratories.
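The within-laboratory standardisation described above can be sketched as follows. The example is illustrative only: the records are hypothetical, the function name is not from the study, and the use of the sample SD (rather than the population SD) is an assumption, as the report does not specify which was used.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def standardise_log_within_lab(records):
    """Return z-scores of log-values, centred and scaled within each laboratory.

    records is a list of (laboratory, value) pairs with value > 0.
    """
    logs_by_lab = defaultdict(list)
    for lab, value in records:
        logs_by_lab[lab].append(math.log(value))
    # Assumption: the sample SD is used for scaling.
    centre_and_scale = {
        lab: (mean(logs), stdev(logs)) for lab, logs in logs_by_lab.items()
    }
    z_scores = []
    for lab, value in records:
        centre, scale = centre_and_scale[lab]
        z_scores.append((math.log(value) - centre) / scale)
    return z_scores


# Hypothetical results: laboratory B reports values three times those of
# laboratory A, mimicking a purely multiplicative laboratory effect.
records = [
    ("lab A", 50.0), ("lab A", 100.0), ("lab A", 200.0),
    ("lab B", 150.0), ("lab B", 300.0), ("lab B", 600.0),
]
z_scores = standardise_log_within_lab(records)
print(z_scores)
```

On the log scale the threefold difference between the two hypothetical laboratories becomes an additive shift, which the within-laboratory centring removes, so the two sets of standardised values coincide.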
Selection effects
Considered as a sample from a natural population, the index results are subject to selection bias, as an abnormal LFT was a criterion of entry to the study. It might be anticipated that the selection effects would be attenuated over time as the impact of abnormal results arising from short-term or chance effects dies away, at least in the group with no specific serious liver diagnosis. This phenomenon is investigated in Table 25. The first five ‘non-protein’ analytes (ALT, AST, bilirubin, ALP, GGT) all show a general reduction over time, which may be interpreted as ‘regression to the mean’. Although the 2-year follow-up data are most likely to be free of LFT-related selection effects, they are comparatively incomplete (available for only 61.2% of patients) and more vulnerable to systematic dropout. For example, middle-aged patients are over-represented in the FU2 data. By contrast, FU1 was the primary focus of the data collection exercise, and is essentially complete, with 98.8% of patients represented.
Analyte and occasion (all subjects) | n | Minimum | Q1 | Median | Q3 | Maximum |
---|---|---|---|---|---|---|
ALT | ||||||
Index | 1114 | 8 | 22 | 33 | 50 | 329 |
FU1 | 1234 | 6 | 21 | 31 | 45 | 534 |
FU2 | 707 | 6 | 19 | 27 | 40 | 170 |
AST | ||||||
Index | 1158 | 12 | 23 | 28 | 38 | 299 |
FU1 | 1212 | 11 | 22 | 27 | 36 | 248 |
FU2 | 724 | 10 | 21 | 26 | 33 | 152 |
Bilirubin | ||||||
Index | 1265 | 1 | 6 | 9 | 13 | 130 |
FU1 | 1233 | 1 | 6 | 9 | 13 | 62 |
FU2 | 761 | 1 | 6 | 8 | 11 | 49 |
ALP | ||||||
Index | 1272 | 11 | 165 | 204 | 264 | 1075 |
FU1 | 1236 | 46 | 164 | 201 | 252 | 1105 |
FU2 | 758 | 16 | 161 | 192 | 241 | 1340 |
GGT | ||||||
Index | 1152 | 7 | 45 | 65 | 109 | 1582 |
FU1 | 1243 | 8 | 37 | 60 | 101 | 2890 |
FU2 | 716 | 7 | 33 | 54 | 92 | 683 |
Albumin | ||||||
Index | 1278 | 19 | 43 | 45 | 47 | 55 |
FU1 | 1254 | 19 | 44 | 46 | 47 | 56 |
FU2 | 778 | 28 | 44 | 46 | 48 | 145 |
Globulin | ||||||
Index | 977 | 16 | 27 | 29 | 32 | 75 |
FU1 | 1214 | 4 | 27 | 30 | 33 | 112 |
FU2 | 697 | 16 | 25 | 28 | 31 | 47 |
Total protein | ||||||
Index | 981 | 36 | 71 | 74 | 77 | 113 |
FU1 | 1235 | 39 | 73 | 76 | 79 | 162 |
FU2 | 707 | 54 | 71 | 74 | 77 | 90 |
The absence from the study of any patients with an index panel of normal LFTs will compromise assessments of the diagnostic value of LFTs. When considering diagnostic criteria based on conventional limits of abnormality, the number of ‘negatives’ (both true-negatives and false-negatives) will be underestimated. This means that many conventional measures of diagnostic performance will be biased. The direction of bias for some common measures is given in Table 26.
Measure | Definition | Direction of bias in BALLETS study |
---|---|---|
TPR = sensitivity | TP/(TP + FN) | Overestimate |
FPR | FP/(FP + TN) | Overestimate |
FNR | FN/(TP + FN) | Underestimate |
TNR = specificity | TN/(FP + TN) | Underestimate |
PPV | TP/(TP + FP) | Unbiased |
NPV | TN/(TN + FN) | Unclear |
Notice that PPVs can be estimated without bias because they do not depend on negative results. It is plausible that NPVs would be underestimated in the study, but this cannot be established without further assumptions.
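The directions of bias in Table 26 can be illustrated numerically. The sketch below uses entirely hypothetical counts (not BALLETS data) and assumes, purely for illustration, that only one-tenth of the test-negative patients are observed while all test-positive patients are retained.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Standard diagnostic measures from the four cells of a 2 x 2 table."""
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "FNR": fn / (tp + fn),
        "TNR": tn / (fp + tn),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# Hypothetical counts for a complete, unselected population.
full = diagnostic_measures(tp=40, fp=800, fn=10, tn=4000)

# The same population when only one-tenth of the negatives are observed
# (an assumption made for illustration only).
selected = diagnostic_measures(tp=40, fp=800, fn=1, tn=400)

for name in full:
    print(f"{name}: full = {full[name]:.3f}, selected = {selected[name]:.3f}")
```

In this example the NPV is unchanged only because false-negatives and true-negatives were thinned in the same proportion; if the two were lost at different rates the NPV could move in either direction, which is why Table 26 lists its bias as unclear.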
Analysis of liver function test panels
Analysis of unadjusted liver function test results
Patterns of abnormality within the index panel
One of our aims was to evaluate test redundancy and, more generally, to investigate which tests tend to group together when more than one test is abnormal. In Table 27 we analyse the co-occurrence of abnormalities among the analytes of the index panel.
Detecting analyte (row) | % abnormal | ALT | AST | Bilirubin | ALP | GGT | Albumin | Globulin | Total protein |
---|---|---|---|---|---|---|---|---|---|
% abnormal (column analyte) | | 39.3 | 22.0 | 11.7 | 14.9 | 75.3 | 2.3 | 5.6 | 9.9 |
ALT | 39.3 | 1.00 | 0.88 | 0.22 | 0.30 | 0.37 | 0.23 | 0.18 | 0.31 |
AST | 22.0 | 0.44 | 1.00 | 0.15 | 0.22 | 0.18 | 0.08 | 0.11 | 0.15 |
Bilirubin | 11.7 | 0.06 | 0.06 | 1.00 | 0.04 | 0.05 | 0.17 | 0.06 | 0.05 |
ALP | 14.9 | 0.09 | 0.15 | 0.05 | 1.00 | 0.10 | 0.13 | 0.13 | 0.08 |
GGT | 75.3 | 0.71 | 0.72 | 0.33 | 0.64 | 1.00 | 0.48 | 0.49 | 0.54 |
Albumin | 2.3 | 0.01 | 0.01 | 0.03 | 0.02 | 0.01 | 1.00 | 0.07 | 0.08 |
Globulin | 5.6 | 0.02 | 0.03 | 0.04 | 0.06 | 0.03 | 0.24 | 1.00 | 0.37 |
Total protein | 9.9 | 0.08 | 0.08 | 0.06 | 0.07 | 0.07 | 0.47 | 0.65 | 1.00 |
The entries in Table 27 estimate the sensitivity (in the usual diagnostic sense) of using the analyte defined by the row, to detect abnormalities in the analyte defined by the column. From this point of view, ALT alone is superior to AST alone because it misses relatively few cases of AST abnormality, whereas AST would miss more than half of the cases of abnormality in ALT. Where ALT is abnormal, there is a high chance that GGT will also be abnormal. However, GGT is frequently abnormal when ALT is not. There is relatively little overlap between an abnormal ALT level and abnormal bilirubin, ALP or protein levels. Interestingly, an abnormal bilirubin was associated with abnormal GGT (although the reverse was not true). A raised bilirubin was not strongly associated with abnormal ALP and vice versa.
It is clear that GGT is the best candidate if a single analyte is to be chosen to detect abnormality in other analytes. Not only does it provide the greatest individual rate of abnormality in this group, but also it finds a substantial proportion of abnormalities in other analytes too.
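A matrix of the kind shown in Table 27 can be computed from abnormality flags as sketched below. The four patient records are hypothetical, complete panels are assumed (the handling of missing analytes in the study is not reproduced here), and the function name is illustrative.

```python
def coabnormality(flags, analytes):
    """matrix[row][col] is the proportion of patients abnormal on the column
    analyte who are also abnormal on the row analyte."""
    matrix = {}
    for row in analytes:
        matrix[row] = {}
        for col in analytes:
            col_abnormal = [patient for patient in flags if patient[col]]
            if not col_abnormal:
                matrix[row][col] = None  # no abnormal cases to detect
            else:
                hits = sum(patient[row] for patient in col_abnormal)
                matrix[row][col] = hits / len(col_abnormal)
    return matrix


# Hypothetical abnormality flags for four patients (True = abnormal).
flags = [
    {"ALT": True, "AST": True, "GGT": True},
    {"ALT": True, "AST": False, "GGT": True},
    {"ALT": False, "AST": False, "GGT": True},
    {"ALT": True, "AST": False, "GGT": False},
]
matrix = coabnormality(flags, ["ALT", "AST", "GGT"])
print(matrix["ALT"]["AST"], matrix["AST"]["ALT"])
```

The matrix is not symmetric: in this toy example ALT detects every AST abnormality, whereas AST detects only one-third of the ALT abnormalities, mirroring the kind of asymmetry discussed for Table 27.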
Nevertheless, GGT misses nearly one-quarter of abnormal cases. Other analytes must be considered if this shortfall is to be addressed. The effects of removing analytes from the standard panel are investigated in Table 28. The best subsets of analytes (i.e. those that flag up the greatest number of cases) of given size have been obtained using the 915 complete index panels of LFTs.
No. of analytes | Best subset (with next best alternative) | Cases detected by best (and next best) subset | Sensitivity (%) of best (and next best) subset |
---|---|---|---|
1 | GGT (ALT) | 687 (352) | 75.1 (38.5) |
2 | GGT + ALT (AST) | 795 (738) | 86.9 (80.7) |
3 | GGT + ALT + bilirubin (total protein) | 843 (832) | 92.1 (90.9) |
4 | GGT + ALT + bilirubin + total protein (ALP) | 877 (875) | 95.8 (95.6) |
5 | GGT + ALT + bilirubin + total protein (globulin) + ALP | 905 (895) | 98.9 (97.8) |
6 | GGT + ALT + bilirubin + total protein + ALP + globulin | 909 | 99.3 |
 | GGT + ALT + bilirubin + total protein + ALP + AST | 909 | 99.3 |
7 | GGT + ALT + bilirubin + total protein + ALP + globulin + AST | 913 (911) | 99.8 |
8 | GGT + ALT + bilirubin + total protein + ALP + globulin + AST + albumin | 915 | 100.0 |
In our sample, it appears that > 90% of abnormal cases can be identified by recourse to a panel of only three analytes. The table also suggests a hierarchy of analytes in decreasing order of importance, because the best subsets of increasing size form a nested sequence (GGT > ALT > bilirubin > total protein ≥ ALP). Between them, these five account for nearly 99% of abnormal cases in our sample. If formal ‘abnormality’, as defined within standard laboratory practice, is the only criterion, then the remaining analytes (AST, globulin and albumin) may be seen as redundant. However, some caution is indicated here, given that no allowance has been made for sampling variability. Moreover, the detection of analyte abnormality may be a relatively minor concern in practice; the diagnostic value of individual analytes for predicting liver disease is of much greater importance when considering which tests might be dropped from the panel.
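The search for best subsets summarised in Table 28 can be sketched as an exhaustive enumeration. This is an illustration under stated assumptions, not the procedure used in the study: the seven patient records are hypothetical, only three analytes are included, and ties are broken by enumeration order.

```python
from itertools import combinations


def cases_detected(panels, subset):
    """Number of patients with at least one abnormal analyte in the subset."""
    return sum(any(patient[analyte] for analyte in subset) for patient in panels)


def best_subsets(panels, analytes):
    """For each subset size, return the subset flagging the most patients."""
    results = {}
    for size in range(1, len(analytes) + 1):
        best = max(
            combinations(analytes, size),
            key=lambda subset: cases_detected(panels, subset),
        )
        results[size] = (best, cases_detected(panels, best))
    return results


def flags(*abnormal):
    """Hypothetical helper: build a complete panel from the abnormal analytes."""
    return {analyte: analyte in abnormal for analyte in ("ALT", "AST", "GGT")}


# Hypothetical complete panels, each with at least one abnormal analyte.
panels = [
    flags("GGT"), flags("GGT"), flags("GGT", "ALT"), flags("ALT"),
    flags("AST"), flags("ALT"), flags("GGT"),
]
results = best_subsets(panels, ["ALT", "AST", "GGT"])
for size, (subset, detected) in results.items():
    print(size, " + ".join(subset), detected, f"{100 * detected / len(panels):.1f}%")
```

With eight analytes the enumeration covers only 255 subsets, so an exhaustive search is feasible and no stepwise approximation is needed.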
Patterns of abnormality and disease classes
The predictive value of index abnormality for individual analytes, and pairs of individual analytes, is investigated in Table 29. The table contains the percentage of patients in each liver disease category among those who have registered an abnormality in the analyte concerned, or in at least one member of a pair of analytes. It is interesting to compare these predictive values with the prevalence of liver disease among the subsample of patients with a complete index panel of eight analytes. This is given in the first row of the table and represents the PPV of the full LFT panel.
Analyte combination | Liver disease 1a | Liver disease 1b | Liver disease 2 | Any liver disease: 1a + 1b + 2 | No. abnormal |
---|---|---|---|---|---|
Complete panel | 2.5 | 0.8 | 1.5 | 4.7 | 891 |
Single analytes | |||||
ALT | 4.3 | 0.7 | 1.0 | 6.0 | 418 |
AST | 5.6 | 1.2 | 0.8 | 7.7 | 248 |
Bilirubin | 3.7 | 0.7 | 0.0 | 4.4 | 135 |
ALP | 2.7 | 4.9 | 2.2 | 9.8 | 184 |
GGT | 2.2 | 1.0 | 1.7 | 4.8 | 833 |
Albumin | 3.6 | 0.0 | 0.0 | 3.6 | 28 |
Globulin | 3.6 | 0.0 | 0.0 | 3.6 | 55 |
Total protein | 4.3 | 0.0 | 0.0 | 4.3 | 94 |
Pairs of analytes | |||||
ALT or AST | 4.6 | 1.0 | 0.8 | 6.3 | 505 |
ALT or bilirubin | 4.2 | 0.8 | 0.8 | 5.7 | 527 |
ALT or ALP | 4.1 | 1.8 | 1.2 | 7.1 | 564 |
ALT or GGT | 2.7 | 0.8 | 1.5 | 5.0 | 963 |
ALT or albumin | 4.3 | 0.7 | 0.9 | 5.9 | 442 |
ALT or globulin | 4.3 | 0.6 | 0.9 | 5.8 | 464 |
ALT or total protein | 3.9 | 0.6 | 0.8 | 5.3 | 486 |
AST or bilirubin | 4.4 | 1.1 | 0.5 | 6.0 | 367 |
AST or ALP | 4.1 | 2.3 | 1.3 | 7.6 | 394 |
AST or GGT | 2.4 | 1.1 | 1.5 | 5.0 | 942 |
AST or albumin | 5.5 | 1.1 | 0.7 | 7.3 | 274 |
AST or globulin | 5.4 | 1.0 | 0.7 | 7.1 | 297 |
AST or total protein | 4.6 | 0.9 | 0.6 | 6.1 | 328 |
Bilirubin or ALP | 2.9 | 2.9 | 1.3 | 7.1 | 312 |
Bilirubin or GGT | 2.3 | 1.0 | 1.5 | 4.7 | 930 |
Bilirubin or albumin | 3.2 | 0.6 | 0.0 | 3.8 | 158 |
Bilirubin or globulin | 3.2 | 0.5 | 0.0 | 3.7 | 187 |
Bilirubin or total protein | 3.6 | 0.4 | 0.0 | 4.0 | 224 |
ALP or GGT | 2.1 | 1.3 | 1.6 | 5.0 | 933 |
ALP or albumin | 2.9 | 4.3 | 1.9 | 9.1 | 208 |
ALP or globulin | 3.0 | 3.9 | 1.7 | 8.6 | 232 |
ALP or total protein | 3.3 | 3.3 | 1.5 | 8.1 | 270 |
GGT or albumin | 2.2 | 0.9 | 1.6 | 4.8 | 851 |
GGT or globulin | 2.3 | 0.9 | 1.6 | 4.9 | 864 |
GGT or total protein | 2.3 | 0.9 | 1.6 | 4.8 | 880 |
Albumin or globulin | 2.5 | 0.0 | 0.0 | 2.5 | 79 |
Albumin or total protein | 4.3 | 0.0 | 0.0 | 4.3 | 115 |
Globulin or total protein | 4.4 | 0.0 | 0.0 | 4.4 | 113 |
One striking feature of the table is the poor predictive performance of GGT. It is outperformed by ALP in all disease categories and by ALT and AST for category 1a.
Positive predictive value may not be the most important criterion of diagnostic performance, because a high value can be achieved despite missing a large fraction of the cases present. However, it is the one measure that can be properly estimated from this study, as our data relate to a comprehensive set of patients with abnormal LFTs.
True-positive rates can also be estimated, and these estimates are shown in Table 30. However, they apply only to the restricted population of patients with an abnormal LFT panel; there may, of course, be cases of liver disease in which the LFT panel happens to be normal.
Analyte combination | Disease 1a (n = 32) | Disease 1b (n = 12) | Disease 2 (n = 15) | 1a + 1b + 2 (n = 59) | No. of patients |
---|---|---|---|---|---|
Single analytes | |||||
ALT | 66.7 | 37.5 | 28.6 | 51 | 1064 |
AST | 48.3 | 25 | 14.3 | 34.5 | 1131 |
Bilirubin | 16.1 | 8.3 | 0 | 10.5 | 1213 |
ALP | 16.1 | 75 | 28.6 | 31.6 | 1220 |
GGT | 64.3 | 100 | 100 | 80 | 1101 |
Albumin | 3.1 | 0 | 0 | 1.7 | 1226 |
Globulin | 8.7 | 0 | 0 | 4.4 | 951 |
Total protein | 17.4 | 0 | 0 | 8.9 | 955 |
Pairs of analytes | |||||
ALT or AST | 79.2 | 37.5 | 30.8 | 57.8 | 966 |
ALT or bilirubin | 76.9 | 37.5 | 30.8 | 57.4 | 1054 |
ALT or ALP | 76.9 | 75 | 46.2 | 68.1 | 1057 |
ALT or GGT | 92.3 | 100 | 100 | 95.8 | 1051 |
ALT or albumin | 70.4 | 37.5 | 28.6 | 53.1 | 1062 |
ALT or globulin | 87 | 28.6 | 28.6 | 59.1 | 916 |
ALT or total protein | 82.6 | 28.6 | 28.6 | 56.8 | 917 |
AST or bilirubin | 51.7 | 33.3 | 14.3 | 38.2 | 1125 |
AST or ALP | 55.2 | 75 | 35.7 | 54.5 | 1125 |
AST or GGT | 76.9 | 100 | 100 | 87.2 | 1005 |
AST or albumin | 51.7 | 25 | 14.3 | 36.4 | 1125 |
AST or globulin | 54.5 | 12.5 | 15.4 | 34.9 | 927 |
AST or total protein | 50 | 12.5 | 15.4 | 32.6 | 930 |
Bilirubin or ALP | 29 | 75 | 28.6 | 38.6 | 1212 |
Bilirubin or GGT | 70.4 | 100 | 100 | 83.3 | 1080 |
Bilirubin or albumin | 16.1 | 8.3 | 0 | 10.5 | 1212 |
Bilirubin or globulin | 18.2 | 12.5 | 0 | 11.6 | 939 |
Bilirubin or total protein | 27.3 | 12.5 | 0 | 16.3 | 939 |
From Table 30 it appears that the combination of GGT and ALT gives the best overall disease coverage, identifying nearly 96% of all cases of serious liver disease.
The limits of abnormality used to define a positive diagnosis in Tables 29 and 30 are based on conventional reference ranges. These are determined so as to reflect the limits of ‘normal’ physiology, but without regard to the impact of specific diseases. Because the BALLETS index panels include all patients with a positive LFT result (as conventionally determined), they can be used to investigate the effect of increasing the upper thresholds of abnormality on the numbers of positive diagnoses. If this approach is taken then the number of positives will inevitably fall, as some cases will be missed by the more stringent criterion. For an analyte that carries diagnostic information, it is to be expected that the ratio of true-positives to false-positives (and hence the PPV) will increase as the threshold rises above the normal limit. This is investigated in Table 31 for thresholds set at twice the normal limit. A range of thresholds is considered in Figure 9: here the curves for ALT, AST and ALP lie consistently above the diagonal line, confirming that the ratio of true-positives to false-positives does in fact increase with increasing thresholds. Indeed the slopes of these curves rise sharply as they approach the origin, suggesting that a very high analyte concentration has very high predictive value for liver disease. For GGT the situation is less clear-cut. The ratio of true-positives to false-positives remains effectively constant as the threshold increases even to twice the conventional limit, rising only as it approaches a threefold increase. This observation may cast doubt on the value of GGT as a marker of liver disease in an unselected population, or at least suggest that its full diagnostic contribution is not captured by conventional reference ranges.
Analyte | Standard thresholds: true-positives (n) | Standard thresholds: PPV (%) | Twice standard thresholds: true-positives (n) | Twice standard thresholds: PPV (%) |
---|---|---|---|---|
ALT | 25 | 6.0 | 9 | 11.7 |
AST | 19 | 7.7 | 4 | 14.3 |
Bilirubin | 6 | 4.4 | 0 | – |
ALP | 18 | 9.8 | 3 | 20.0 |
GGT | 40 | 4.8 | 15 | 4.7 |
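The threshold analysis behind Table 31 amounts to recomputing the PPV after raising the cut-off to a multiple of the conventional upper limit. A minimal sketch follows; the eight patient records are hypothetical, with each result already expressed as a multiple of the upper limit of normal, and the function name is illustrative.

```python
def ppv_at_threshold(patients, multiple):
    """PPV when 'positive' means a result above `multiple` times the ULN.

    patients is a list of (fold, diseased) pairs, where fold is the result
    divided by the upper limit of normal.
    """
    positives = [diseased for fold, diseased in patients if fold > multiple]
    if not positives:
        return None  # no positives at this threshold, so the PPV is undefined
    return sum(positives) / len(positives)


# Hypothetical results (multiples of the ULN) with disease status.
patients = [
    (0.8, False), (1.1, False), (1.2, False), (1.3, True),
    (1.5, False), (2.2, True), (2.5, False), (3.0, True),
]
ppv_standard = ppv_at_threshold(patients, multiple=1.0)
ppv_doubled = ppv_at_threshold(patients, multiple=2.0)
print(ppv_standard, ppv_doubled)
```

The undefined case corresponds to the dash shown for bilirubin in Table 31, where no true-positives remain at twice the standard threshold.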
Diagnostic performance of alternative liver function test panels
As already noted (see Selection effects), the selected nature of the sample – confined, as it is, to subjects with at least one abnormal analyte – precludes direct estimation of absolute sensitivities and specificities of the LFTs as markers of liver disease. However, the relative diagnostic performance of alternative LFT panels can be assessed by comparing the numbers of true-positives and false-positives in the index sample. These quantities are plotted in Figure 10 for each of the 255 (= 2^8 − 1) possible LFT panels that can be constructed using eight analytes. In this analysis, ‘liver-related disease’ is defined broadly to include all group 1 and 2 diseases. The data are restricted to the 915 subjects with a complete set of index LFTs.
One panel can be said to dominate another if it generates both more true-positives and fewer false-positives than its competitor. On this basis, the best candidates are those that are not dominated by any other panel. This set of candidates is well approximated by the panels on the frontier in Figure 10, which involve three analytes only: ALT, ALP and GGT. In theory, the overall ‘best’ panel would follow on specifying the trade-off between the value of a true-positive and the cost of a false-positive, and comparing this ratio with the slope of the line segments that go to make up the frontier in the figure. Using the three ‘frontier’ analytes only, the best panel would be determined as follows:
- Value–cost ratio < 10: it is best not to use routine LFTs for these patients.
- Value–cost ratio between 10 and 15: the best LFT panel is ALP alone.
- Value–cost ratio between 15 and 30: the best panel is ALP with ALT.
- Value–cost ratio > 30: the best panel is ALT with GGT.
The panels mentioned here delineate a plausible range of options, although, in practice, it may not be straightforward to specify an appropriate value–cost ratio.
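The dominance argument and the value–cost rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the true-positive and false-positive counts in `EXAMPLE` are hypothetical (the study's panel-level counts are not reproduced in this section), and the scoring rule `ratio × TP − FP` is one assumed way of expressing the trade-off between the value of a true-positive and the cost of a false-positive.

```python
from itertools import combinations

ANALYTES = ["ALT", "AST", "bilirubin", "ALP", "GGT",
            "albumin", "globulin", "total protein"]


def all_panels(analytes):
    """Every non-empty subset of the analytes (2**n - 1 panels)."""
    return [frozenset(c)
            for k in range(1, len(analytes) + 1)
            for c in combinations(analytes, k)]


def dominates(a, b):
    """a and b are (true_pos, false_pos); a dominates b if it has
    more true-positives and fewer false-positives."""
    return a[0] > b[0] and a[1] < b[1]


def non_dominated(counts):
    """Names of panels that are not dominated by any other panel."""
    return [name for name, c in counts.items()
            if not any(dominates(other, c)
                       for other_name, other in counts.items()
                       if other_name != name)]


def best_panel(counts, value_cost_ratio):
    """Panel maximising ratio * TP - FP; None means 'do not test'."""
    best, best_score = None, 0.0
    for name, (tp, fp) in counts.items():
        score = value_cost_ratio * tp - fp
        if score > best_score:
            best, best_score = name, score
    return best


# HYPOTHETICAL (true_pos, false_pos) counts, for illustration only.
EXAMPLE = {
    "ALP": (10, 95),
    "ALP+ALT": (20, 250),
    "ALT+GGT": (40, 800),
    "AST+GGT": (18, 300),  # dominated by ALP+ALT
}

print(len(all_panels(ANALYTES)))       # 255
print(non_dominated(EXAMPLE))
for ratio in (5, 12, 20, 40):
    print(ratio, best_panel(EXAMPLE, ratio))
```

With these made-up counts the preferred panel moves from no testing, through ALP alone and ALP with ALT, to ALT with GGT as the ratio increases, mirroring the qualitative pattern of the list above; the switching points depend entirely on the counts supplied.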
The PPV of a panel depends on the ratio of true-positives to false-positives. For the 915 complete index panels that contribute to Figure 10 it ranges from 9.6% for ALP alone, through 7.0% (ALP and ALT) to 5.3% (ALT and GGT). The PPV of the eight-analyte panel is 4.8%.
The results above suggest that the functions of a routine LFT panel can be largely subsumed into just three analytes: ALT, ALP and GGT. Nevertheless, an LFT panel – however carefully chosen – will never function as a definitive diagnostic procedure for any particular pathology; symptomatic patients will continue to be monitored in a primary-care setting regardless of a ‘negative’ test result. In this context, a panel with high PPV is perhaps most useful to the clinician, particularly if a plausible biological interpretation of positive results can suggest a route towards a definitive diagnosis. For these reasons, we incline towards the recommendation that routine testing should be confined to ALP and ALT alone. Addition of GGT will certainly increase the number of positive results but will not necessarily help specify a future clinical pathway for the patient. As remarked above (see Figure 9) the PPV of an abnormal GGT result does not increase when a higher threshold of abnormality is used. This is not suggestive of a strong clinical effect.
Repeat testing
There is a natural impulse to repeat a positive test to see if it is confirmed, particularly if the test has low specificity. It appears that repeating the full eight-analyte panel is of little value, as it achieves results similar to a single administration of the two-analyte panel (ALT + GGT) (see Figure 1). In our sample, repeating a positive ALP result does not affect the PPV, although the number of positive results that survive is reduced by 30%. For both ALP + ALT and ALT + GGT, repeat testing improves the PPV and may be recommended.
In summary, the analysis in this section points to a reduction in the number of analytes in the routine panel, to ALT + ALP, leaving open the possibility of repeat testing should the initial result be abnormal.
Correlation analysis of liver function tests
Table 32 shows the correlations between all pairs of LFTs over all three panels (index, FU1 and FU2). These were obtained from the log-transformed data, standardised within laboratories and panels to have zero mean and unit standard deviation.
Analyte | ALT | AST | Bilirubin | ALP | GGT | Albumin | Globulin | Total protein |
---|---|---|---|---|---|---|---|---|
ALT | 1.000 | |||||||
AST | 0.773 | 1.000 | ||||||
Bilirubin | 0.068 | 0.150 | 1.000 | |||||
ALP | −0.047 | 0.001 | −0.204 | 1.000 | ||||
GGT | 0.363 | 0.379 | −0.103 | 0.199 | 1.000 | |||
Albumin | 0.150 | 0.058 | 0.162 | −0.206 | −0.015 | 1.000 | ||
Globulin | 0.032 | 0.099 | −0.067 | 0.152 | 0.072 | −0.146 | 1.000 | |
Total protein | 0.122 | 0.120 | 0.049 | 0.014 | 0.064 | 0.485 | 0.778 | 1.000 |
Each correlation coefficient in Table 32 derives from more than 2500 pairs of observations. Although these originate from only 1290 patients (implying a lack of independence between pairs), statistical significance (p < 0.05) can be claimed for any coefficient > 0.05 in absolute magnitude. On this basis, most of the coefficients in Table 32 (22 out of 28) represent significant relationships between analytes. However, most of these correlations are too small to be of practical interest. Disregarding the structural correlations involving total protein and its constituent components (albumin and globulin), the only analytes with mutual correlation above 0.3 (i.e. sufficient to account for around 10% of each other's variation) are ALT, AST and GGT. ALT and AST are the most highly correlated, explaining about 60% of one another's variation.
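The counts and proportions quoted in this paragraph can be checked directly against Table 32. The sketch below transcribes the off-diagonal lower triangle of the table; the variable names are illustrative, and the 0.05 and 0.3 cut-offs are those stated in the text.

```python
ANALYTES = ["ALT", "AST", "Bilirubin", "ALP", "GGT",
            "Albumin", "Globulin", "Total protein"]

# Off-diagonal lower triangle of Table 32 (row i holds correlations
# of analyte i with analytes 0..i-1).
LOWER = [
    [],
    [0.773],
    [0.068, 0.150],
    [-0.047, 0.001, -0.204],
    [0.363, 0.379, -0.103, 0.199],
    [0.150, 0.058, 0.162, -0.206, -0.015],
    [0.032, 0.099, -0.067, 0.152, 0.072, -0.146],
    [0.122, 0.120, 0.049, 0.014, 0.064, 0.485, 0.778],
]

pairs = {(ANALYTES[i], ANALYTES[j]): r
         for i, row in enumerate(LOWER)
         for j, r in enumerate(row)}

n_pairs = len(pairs)                                        # 28
n_significant = sum(abs(r) > 0.05 for r in pairs.values())  # 22

# Pairs above 0.3 once the structural total-protein pairs are set aside.
strong = {pair for pair, r in pairs.items()
          if abs(r) > 0.3 and "Total protein" not in pair}

# Shared variation between ALT and AST is the squared correlation.
shared_alt_ast = pairs[("AST", "ALT")] ** 2                 # about 0.60

print(n_pairs, n_significant, sorted(strong), round(shared_alt_ast, 3))
```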
The proportion of the variation in each analyte that can be explained by the others (i.e. an R² value from a regression model) is presented in Table 33, using FU1, which is the most comprehensive data set. The values have been obtained by regression of the log-transformed LFTs on all other (log-transformed) analytes, after adjustment for additive laboratory effects. This is equivalent to fitting multiplicative laboratory effects to the raw data, as discussed above (see Summary of LFT data and also Appendix 2, Liver function test results by laboratory). Total protein has been excluded from these analyses.
Analyte | R² from best linear predictor (%) | Analyte | R² from best single predictor (%) | Analytes | R² from best pair of predictors (%) |
---|---|---|---|---|---|
ALT | 65.4 | AST | 62.0 | AST, albumin | 63.5 |
AST | 66.4 | ALT | 62.0 | ALT, GGT | 66.4 |
Bilirubin | 9.5 | ALP | 3.6 | ALP, AST | 5.6 |
ALP | 13.5 | Bilirubin | 3.6 | Bilirubin, GGT | 7.5 |
GGT | 24.5 | AST | 16.9 | AST, ALP | 19.6 |
Albumin | 8.3 | ALP | 3.6 | ALP, ALT | 5.3 |
Globulin | 5.8 | ALP | 2.8 | ALP, AST | 4.4 |
From Table 33 it appears that around 65% of the variation in ALT and AST can be explained by regression on the other analytes, which still leaves 35% unexplained. For all other analytes the explained variation is < 25% of the total. This analysis has some bearing on the question of test redundancy in that no LFT can be dropped from the panel without substantial loss of information. We shall return later to the question of whether or not the information that would be lost in dropping a particular LFT from the panel is relevant to any significant clinical decision.
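The equivalence between additive laboratory effects on the log scale and multiplicative effects on the raw scale, which underlies the adjustment used for Table 33, can be illustrated with a small sketch. The data and function names below are hypothetical, and the single-predictor R² shown is simply the squared Pearson correlation of the adjusted values; it is not a reproduction of the authors' regression models.

```python
import math
from collections import defaultdict


def adjust_for_lab(values, labs):
    """Subtract each laboratory's mean log-value, i.e. remove an
    additive laboratory effect on the log scale."""
    logs = [math.log(v) for v in values]
    sums, counts = defaultdict(float), defaultdict(int)
    for x, lab in zip(logs, labs):
        sums[lab] += x
        counts[lab] += 1
    return [x - sums[lab] / counts[lab] for x, lab in zip(logs, labs)]


def r_squared(x, y):
    """R-squared from simple linear regression of y on x
    (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# HYPOTHETICAL raw concentrations from two laboratories.
labs = ["A", "A", "A", "B", "B", "B"]
alt = [30.0, 55.0, 80.0, 25.0, 60.0, 110.0]
ast = [28.0, 40.0, 66.0, 22.0, 45.0, 90.0]

# Pretend laboratory B reports values 50% higher (a multiplicative effect).
alt_scaled = [v * (1.5 if lab == "B" else 1.0) for v, lab in zip(alt, labs)]

adj = adjust_for_lab(alt, labs)
adj_scaled = adjust_for_lab(alt_scaled, labs)
print(max(abs(a - b) for a, b in zip(adj, adj_scaled)))  # effectively zero
print(r_squared(adjust_for_lab(ast, labs), adj))
```

Because the multiplicative factor becomes a constant shift after taking logs, centring within laboratory removes it, so the adjusted values and any R² computed from them are unaffected by the rescaling.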
Associations over time
Persistence of abnormality from index to first follow-up
We analysed the proportion of cases where the FU1 panel of LFTs was abnormal given that an abnormal index test was the entry criterion for the study. This analysis is restricted to the 844 participants for whom a full panel of eight analytes was available on both occasions (index and FU1). Only 138 (16%) had a normal repeat (FU1) test result and the breakdown by test is given in Table 34. The proportion of normal repeat panels at FU1, about 1 month after the index test, is < 10% if three tests are abnormal, bilirubin is abnormal or if two tests are abnormal and one of the two is GGT. An isolated abnormal ALT has a rather high chance of reverting to normal (42%), whereas the repeat panel is seldom normal if GGT is raised (16%).
Analyte(s) | Abnormal index test, n | Same abnormality at FU1, n (%) | Different abnormality at FU1, n (%) | Normal panel at FU1, n (%) |
---|---|---|---|---|
ALT alone | 50 | 17 (34.0) | 12 (24.0) | 21 (42.0) |
Bilirubin alone | 39 | 22 (56.4) | 5 (12.8) | 12 (30.8) |
GGT alone | 328 | 213 (64.9) | 65 (19.8) | 50 (15.2) |
ALT + GGT | 103 | 42 (40.8) | 52 (50.5) | 9 (8.7) |
ALT + AST | 35 | 10 (28.6) | 16 (45.7) | 9 (25.7) |
ALT + AST + GGT | 75 | 21 (28.0) | 49 (65.3) | 5 (6.7) |
ALP + GGT | 32 | 17 (53.1) | 12 (37.5) | 3 (9.4) |
There were 52 patients with serious liver disease and a complete panel of repeat LFTs at FU1 (28 in category 1a; 10 in category 1b; 14 in category 2). Of these, only two had reverted to normal (one each in categories 1b and 2).
Persistence of abnormality at 2 years
Of the 1168 complete panels available at FU1, 176 (15.1%) had reverted to normal. At FU2, only 577 complete panels were available, of which 176 (30.5%) were normal. The pattern of abnormality for each analyte over time is summarised in Table 35.
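2-year follow-up | Normal/normal at baseline/first follow-up, n (%) | Normal/abnormal at baseline/first follow-up, n (%) | Abnormal/normal at baseline/first follow-up, n (%) | Abnormal/abnormal at baseline/first follow-up, n (%) |
---|---|---|---|---|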
ALT | ||||
Normal | 310 (92.8) | 14 (73.7) | 60 (73.2) | 71 (47.7) |
Abnormal | 24 (7.2) | 5 (26.3) | 22 (26.8) | 78 (52.3) |
AST | ||||
Normal | 447 (94.5) | 21 (84.0) | 50 (80.6) | 47 (69.1) |
Abnormal | 26 (5.5) | 4 (16.0) | 12 (19.4) | 21 (30.9) |
Bilirubin | ||||
Normal | 603 (98.2) | 10 (90.9) | 17 (68.0) | 16 (34.8) |
Abnormal | 11 (1.8) | 1 (9.1) | 8 (32.0) | 30 (65.2) |
ALP | ||||
Normal | 587 (97.2) | 9 (69.2) | 24 (80.0) | 26 (46.4) |
Abnormal | 17 (2.8) | 4 (30.8) | 6 (20.0) | 30 (53.6) |
GGT | ||||
Normal | 120 (90.9) | 5 (62.5) | 58 (69.9) | 79 (20.4) |
Abnormal | 12 (9.1) | 3 (37.5) | 25 (30.1) | 308 (79.6) |
Albumin | ||||
Normal | 692 (98.3) | 12 (80.0) | 13 (100.0) | 2 (50.0) |
Abnormal | 12 (1.7) | 3 (20.0) | 0 (0.0) | 2 (50.0) |
Globulin | ||||
Normal | 455 (94.2) | 8 (80.0) | 11 (84.6) | 9 (64.3) |
Abnormal | 28 (5.8) | 2 (20.0) | 2 (15.4) | 5 (35.7) |
Total protein | ||||
Normal | 399 (91.5) | 43 (76.8) | 16 (84.2) | 11 (37.9) |
Abnormal | 37 (8.5) | 13 (23.2) | 3 (15.8) | 18 (62.1) |
Correlations between epochs
Correlations between measurements of the same analyte within individual patients are presented in Table 36.
Analyte | Correlation between index and FU1 | Correlation between index and FU2 | Correlation between FU1 and FU2 |
---|---|---|---|
ALT | 0.792 | 0.585 | 0.596 |
AST | 0.736 | 0.470 | 0.511 |
Bilirubin | 0.760 | 0.701 | 0.717 |
ALP | 0.865 | 0.651 | 0.708 |
GGT | 0.891 | 0.720 | 0.756 |
Albumin | 0.726 | 0.575 | 0.617 |
Globulin | 0.676 | 0.491 | 0.492 |
Total protein | 0.659 | 0.544 | 0.529 |
The index and FU1 panels were taken close together in time (median interval 1 month), and it is therefore not surprising to see some high correlations in the first column of Table 36. The median interval between the index test and FU2 was 25.3 months. It is notable that the temporal correlation remains relatively high for all analytes over such a time interval. In practice, there was considerable variation in the timing of the FU1 and FU2 panels (see Table 21) for different patients. This feature is exploited in the analysis of Appendix 2 (see Temporal modelling of liver function tests), which investigates the persistence of LFT results over time and seeks to identify the proportion of the variance in LFTs that is due to genuine differences between patients.
Variation between and within patients
For any particular patient, the measured concentration of an analyte from the LFT panel will be subject to temporal fluctuations. Thus, only part of the variation in the recorded LFT panels can be attributed to genuine differences between patients. A simple estimate of the proportion of the variance that can be explained in this way may be obtained by means of an intraclass correlation coefficient (ICC) computed across the three epochs (index, FU1, FU2) at which the LFT panels were recorded (see Table 37, column 1).
Analyte | Intrapatient correlation (%) | Estimate from temporal analysis (%) | Lower 95% confidence limit | Upper 95% confidence limit |
---|---|---|---|---|
ALT | 68.8 | 52.5 | 47.7 | 57.4 |
AST | 61.2 | 38.3 | 32.9 | 43.8 |
Bilirubin | 72.5 | 69.6 | 64.9 | 74.3 |
ALP | 76.0 | 66.7 | 62.2 | 71.2 |
GGT | 80.9 | 76.5 | 72.8 | 80.4 |
Albumin | 55.8 | 46.7 | 25.6 | 67.8 |
Globulin | 55.2 | 51.8 | 42.7 | 60.8 |
Total protein | 56.6 | 62.9 | 58.9 | 67.0 |
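The simple intraclass correlation in the first column of Table 37 can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard one-way random-effects estimator for balanced data (every patient measured at every epoch), whereas the study data are log-transformed and not all patients have all three panels. The example values are hypothetical.

```python
def icc_oneway(data):
    """One-way random-effects ICC for balanced data.

    data: list of rows, one per patient, each holding the k
    measurements (one per epoch) of a single analyte.
    """
    n = len(data)
    k = len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    means = [sum(row) / k for row in data]
    # Between-patient and within-patient mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(data, means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# HYPOTHETICAL log-concentrations for four patients at index, FU1 and FU2.
example = [
    [3.4, 3.5, 3.3],
    [4.1, 4.0, 4.3],
    [3.0, 3.2, 3.1],
    [3.9, 3.6, 3.8],
]
print(round(icc_oneway(example), 3))
```

A value close to 1 indicates that most of the variance lies between patients; a value close to 0 indicates that within-patient fluctuation dominates.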
However, an analysis based on intraclass correlation necessarily disregards the selection effects that arise because abnormal LFTs were used as a criterion of entry to the study (see Selection effects). Moreover, it takes no account of the actual time intervals between the three test epochs for individual patients. Hence, the method does not adjust for ephemeral or ‘transient’ variation in levels attributable to medium-term (perhaps seasonal) fluctuations in the patient's environment or behaviour, which are of limited interest in the context of long-term clinical conditions. These issues are addressed more fully by the temporal analysis in Appendix 2 (see Temporal modelling of liver function tests). Some of the results of this analysis are reproduced in the final columns of Table 37, which show the estimated proportion of the variance in each analyte that is attributable to long-term patient differences according to the methods described in Appendix 2 (see Temporal modelling of liver function tests).
It appears that the intraclass correlation tends to overestimate the variance attributable to differences between patients – particularly striking in the case of AST. Nevertheless, the impact of patient differences is substantial for all analytes.
The usefulness of LFTs as a discriminatory tool for long-term clinical prognosis may be limited by the magnitude of the patient-level component of variance. In this respect it is possible that ALT has more potential than its close competitor AST. Although these are highly correlated with one another (see Diagnostic performance of alternative liver function test panels), the level of ALT may be more reflective of long-term differences between patients.
Impact of patient characteristics on liver function test results
Patient characteristics selected
Patient characteristics selected here are sex, age, ethnic group, BMI and alcohol consumption.
Univariate analysis
Results of one-way analysis of variance on log-LFT data with adjustment for laboratory effects are shown in detail in Appendix 2 (see Univariate analyses). The results may be summarised as follows.
Sex
Most of the analytes exhibit a sex effect. ALP levels tend to be higher in females than in males. For all other non-protein LFTs, and for albumin, average levels for females are lower than those for males.
Age
All analytes except globulin show significant relationships with age. These are strongest for ALT, there being some evidence that levels are lowest in the youngest and oldest patients, and for albumin, which declines steadily with increasing age.
Ethnic group
Here significant effects are largely confined to the protein measures globulin and total protein, with black and Asian subjects exhibiting higher average levels than white subjects.
BMI
Separate BMI measurements were not available at baseline, so the index analysis of LFTs utilises FU1 BMI categories. Both ALT and GGT levels were raised among patients in the higher BMI categories. Globulin also shows some increase, although this effect is not evident at the 2-year follow-up.
Alcohol
Alcohol intake was not requested at baseline, so the index analysis of LFTs utilises FU1 alcohol categories. All the non-protein analytes are significantly related to alcohol intake, particularly at the beginning of the study (index and FU1). For ALT, AST and GGT, the association is in the positive direction, with high alcohol accompanied by raised LFTs. For ALP, the direction of effect is reversed. For bilirubin, the association is less strong and its character unclear.
Multivariate analysis
It is possible that some of the effects identified in the univariate analyses can be attributed to confounding between different patient characteristics. To address this issue, stepwise analyses were carried out using a backwards elimination method, with the aim of finding the most convincing models for the influence of patient-level covariates on LFT results.
These analyses were carried out using FU1 data only. The model-fitting strategy is described below.
- For each analyte, perform an analysis of variance (ANOVA) on the log-transformed LFT values, including the following terms:
  - (i) main effect of laboratory
  - (ii) main effects and interaction for sex and age categories
  - (iii) main effects of ethnic group, BMI and alcohol intake (categorised as usual)
  - (iv) all two-way interactions of sex category with items in (iii)
  - (v) all two-way interactions of age category with items in (iii).
- Eliminate non-significant (p > 0.05) interaction terms under (iv) and (v) using backwards elimination.
- Drop non-significant main effects under (iii) using backwards elimination, unless the retention of a main effect was necessary to support the interpretation of a significant interaction term.
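The control flow of this strategy can be sketched in a few lines. The sketch below is a hedged illustration rather than the authors' implementation: the term names, the `p_value` callable and the fixed p-values in `P` are all hypothetical stand-ins for the ANOVA F-tests, and in a real analysis the p-values would be recomputed after each term is dropped.

```python
def drop_weakest(current, removable, p_value, alpha=0.05):
    """Backwards elimination: drop the removable term with the largest
    p-value, one at a time, until all removable terms have p <= alpha."""
    current = list(current)
    while True:
        candidates = [t for t in current if removable(t, current)]
        if not candidates:
            return current
        worst = max(candidates, key=lambda t: p_value(t, current))
        if p_value(worst, current) <= alpha:
            return current
        current.remove(worst)


def in_retained_interaction(term, current):
    """True if a main effect still appears in a retained interaction."""
    return any(":" in t and term in t.split(":") for t in current)


def fit_strategy(terms, p_value, protected, alpha=0.05):
    # Stage 1: eliminate non-significant interactions (items iv and v).
    after_interactions = drop_weakest(
        terms,
        lambda t, cur: ":" in t and t not in protected,
        p_value, alpha)
    # Stage 2: drop non-significant main effects (item iii) unless
    # they are needed to support a retained interaction.
    return drop_weakest(
        after_interactions,
        lambda t, cur: (":" not in t and t not in protected
                        and not in_retained_interaction(t, cur)),
        p_value, alpha)


# Terms always kept: laboratory, sex, age and the sex x age interaction.
PROTECTED = {"laboratory", "sex", "age", "sex:age"}
TERMS = ["laboratory", "sex", "age", "sex:age",
         "ethnicity", "bmi", "alcohol",
         "sex:ethnicity", "sex:bmi", "sex:alcohol",
         "age:ethnicity", "age:bmi", "age:alcohol"]

# HYPOTHETICAL fixed p-values, for illustration only.
P = {"ethnicity": 0.40, "bmi": 0.34, "alcohol": 0.003,
     "sex:ethnicity": 0.60, "sex:bmi": 0.50, "sex:alcohol": 0.70,
     "age:ethnicity": 0.80, "age:bmi": 0.005, "age:alcohol": 0.30}

final = fit_strategy(TERMS, lambda t, cur: P[t], PROTECTED)
print(final)
```

In this made-up example the BMI main effect is retained despite its large p-value because the age × BMI interaction survives, which is the situation the third step of the strategy is designed to handle.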
Where the final model contained a two-way interaction between age or sex and one of BMI, alcohol intake or ethnic group, this interaction was extended to a three-way interaction in which both age and sex were included. In no case was the three-way interaction statistically significant.
In addition to the laboratory effects, all of the ‘final models’ obtained by this strategy necessarily include age, sex and their interaction effect, whether or not these achieved formal statistical significance.
The outcome of the model-fitting strategy is briefly summarised in Table 38, which contains the p-values for the terms labelled (iii), (iv) or (v) from the list above that were retained in the final models. All models include age, sex and age × sex, as well as an adjustment for laboratory effects. The final column in Table 38 refers to the proportion of the variance explained by the model (i.e. an adjusted R²-statistic), computed net of laboratory effects. Parameter estimates representing the detailed impact of the covariates in the final fitted models are given in Appendix 2 (see Multivariate analyses).
Analyte | BMI | Alcohol | Ethnicity | BMI × age | Other terms | Variance explained (%) |
---|---|---|---|---|---|---|
ALT | p = 0.0000a | p = 0.0031 | | p = 0.0012 | | 19.7 |
AST | p = 0.3367a | p = 0.0000 | | p = 0.0055 | | 6.7 |
Bilirubin | p = 0.0021 | p = 0.0321 | | | | 9.8 |
ALP | | p = 0.0301 | | | | 9.3 |
GGT | p = 0.0002a | p = 0.0000 | | p = 0.0068 | | 13.8 |
Albumin | p = 0.0005a | p = 0.4734a | p = 0.0192 | p = 0.0081 | Alcohol × sex; p = 0.0281 | 11.9 |
Globulin | p = 0.0000a | | p = 0.0000 | p = 0.0083 | | 8.7 |
Total protein | p = 0.2224a | | p = 0.0000a | p = 0.0035 | Ethnicity × age; p = 0.0384 | 8.5 |
It is striking that ethnic group impacts only on protein measures, and that alcohol impacts (to some extent, at least) on all non-protein measures. The influence of BMI is pervasive, affecting everything except ALP, but its effects vary with age in most cases. The weak effect on albumin of the alcohol × sex interaction may be accidental given that the main effect of alcohol is not significant for this analyte. For some purposes [e.g. in the temporal correlation analysis of Appendix 2 (see Summary of analyses of liver function test results)], these terms have been omitted together with the weak ethnicity × age interaction in the total protein model.
The final column in the table quantifies the extent to which an LFT is predictable from the general characteristics of the patient in our data set, without taking direct account of any pathological condition.
Impact of diagnostic category
Table 39 shows the impact of ‘serious liver disease’ on the models developed above. The results were obtained by adding a four-level diagnostic category (see Diagnostic categories and Liver-related disease) to the multivariate models (see Multivariate analysis). Entries in this table are estimates and confidence intervals (CIs) for the multiplicative factors that represent the impact of diseases on the LFT result with the ‘non-specific’ (i.e. category 3) as base.
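Analyte | Hepatitis B or C (n = 13): ratio | Hepatitis B or C (n = 13): 95% CI | Hepatocellular disease (1a), excluding hepatitis (n = 19): ratio | Hepatocellular disease (1a), excluding hepatitis (n = 19): 95% CI | Hepatic bile duct disease (1b) (n = 12): ratio | Hepatic bile duct disease (1b) (n = 12): 95% CI | Other diseases affecting liver (2) (n = 15): ratio | Other diseases affecting liver (2) (n = 15): 95% CI |
---|---|---|---|---|---|---|---|---|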
ALT | 2.25 | 1.65 to 3.07 | 1.61 | 1.27 to 2.04 | 1.07 | 0.79 to 1.46 | 1.07 | 0.82 to 1.39 |
AST | 1.69 | 1.35 to 2.13 | 1.56 | 1.30 to 1.88 | 1.20 | 0.94 to 1.51 | 1.00 | 0.82 to 1.24 |
Bilirubin | 0.93 | 0.68 to 1.28 | 1.27 | 0.99 to 1.64 | 0.83 | 0.60 to 1.14 | 1.02 | 0.77 to 1.35 |
ALP | 0.94 | 0.77 to 1.15 | 0.98 | 0.84 to 1.15 | 1.64 | 1.35 to 1.98 | 1.27 | 1.07 to 1.50 |
GGT | 1.17 | 0.76 to 1.79 | 1.51 | 1.08 to 2.13 | 1.68 | 1.06 to 2.68 | 1.19 | 0.82 to 1.74 |
Albumin | 1.02 | 0.97 to 1.06 | 1.00 | 0.97 to 1.04 | 0.97 | 0.93 to 1.01 | 0.96 | 0.93 to 1.00 |
Globulin | 1.05 | 0.95 to 1.15 | 0.97 | 0.90 to 1.05 | 1.13 | 1.03 to 1.24 | 1.02 | 0.94 to 1.10 |
Total protein | 1.02 | 0.98 to 1.07 | 0.99 | 0.96 to 1.03 | 1.03 | 0.99 to 1.07 | 0.99 | 0.96 to 1.02 |
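Because the models are fitted to log-transformed LFTs, each multiplicative factor in Table 39 corresponds to an exponentiated regression coefficient. The sketch below shows the conversion in both directions. It assumes a normal-approximation interval (z = 1.96) on the log scale, which may differ slightly from the method actually used, and the function names are hypothetical.

```python
import math


def ratio_with_ci(beta, se, z=1.96):
    """Multiplicative factor and CI from a log-scale coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


def implied_log_scale(ratio, lower, upper, z=1.96):
    """Back out the log-scale coefficient and standard error implied
    by a published ratio and its confidence limits."""
    beta = math.log(ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# ALT in the hepatitis B or C column of Table 39: 2.25 (1.65 to 3.07).
beta, se = implied_log_scale(2.25, 1.65, 3.07)
print(ratio_with_ci(beta, se))
```

The round trip recovers the published limits to within rounding, which is consistent with intervals that are symmetric on the log scale.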
From this analysis, it appears that both ALT and AST are raised in category 1a diseases (including viral hepatitis). ALP is raised in category 1b and also in category 2. GGT is raised in categories 1a and 1b, and globulin in 1b. These are the only findings to achieve formal statistical significance, although the small size of the disease categories will certainly compromise the power here.
Impact of fatty liver
A similar analysis is presented in Table 40 for the impact of fatty liver (as reported on ultrasound at FU1) on the panel of LFTs. Entries in the table represent multiplicative factors that apply to the average LFT level under the condition described at the head of each column, with the ‘Normal’ category (i.e. no fatty liver) as base. They derive from two analyses: for the column ‘fatty liver present’ a two-level categorical variable was added to the patient characteristic model; the remaining columns derive from an analysis in which fatty liver was represented by a four-level categorical variable.
Analyte | Fatty liver present (n = 484): ratio | Fatty liver present (n = 484): 95% CI | Mild (n = 263): ratio | Mild (n = 263): 95% CI | Moderate (n = 177): ratio | Moderate (n = 177): 95% CI | Severe (n = 44): ratio | Severe (n = 44): 95% CI |
---|---|---|---|---|---|---|---|---|
ALT | 1.35 | 1.27 to 1.44 | 1.29 | 1.19 to 1.39 | 1.40 | 1.29 to 1.53 | 1.58 | 1.35 to 1.84 |
AST | 1.20 | 1.14 to 1.26 | 1.16 | 1.09 to 1.23 | 1.22 | 1.14 to 1.31 | 1.34 | 1.18 to 1.53 |
Bilirubin | 1.02 | 0.95 to 1.09 | 1.05 | 0.97 to 1.14 | 0.97 | 0.89 to 1.07 | 1.03 | 0.87 to 1.22 |
ALP | 0.96 | 0.92 to 1.00 | 0.95 | 0.91 to 1.00 | 0.98 | 0.92 to 1.04 | 0.95 | 0.85 to 1.05 |
GGT | 1.13 | 1.03 to 1.25 | 1.06 | 0.95 to 1.19 | 1.23 | 1.08 to 1.40 | 1.28 | 1.02 to 1.62 |
Albumin | 1.02 | 1.01 to 1.03 | 1.02 | 1.01 to 1.03 | 1.02 | 1.01 to 1.03 | 1.02 | 1.00 to 1.04 |
Globulin | 0.99 | 0.97 to 1.01 | 0.97 | 0.95 to 0.99 | 1.02 | 0.99 to 1.05 | 0.99 | 0.94 to 1.04 |
Total protein | 1.01 | 1.00 to 1.02 | 1.00 | 0.99 to 1.01 | 1.02 | 1.01 to 1.03 | 1.00 | 0.98 to 1.02 |
The contribution of fatty liver was highly significant (p < 0.0005) in both analyses for ALT, AST and albumin with an impact that worsens with the severity of the condition. For GGT, a similar qualitative effect was observed, but with less extreme p-values (p = 0.0088 and 0.0075 for the contribution in the two analyses). Bilirubin and ALP exhibited no significant effects (p = 0.5333 and 0.4606 for bilirubin, p = 0.0593 and 0.2357 for ALP). The results for globulin and total protein are anomalous. Although there is no significant evidence for the impact of the presence of fatty liver (p = 0.3011 for globulin; p = 0.1434 for total protein), the impact of severity is formally significant in both cases (p = 0.0076, p = 0.0187). Inspection of the ratio estimates suggests a pattern in which the impact rises with moderate fattiness and falls back in severe cases. This is scarcely a plausible finding but it could arise as a chance result for globulin, with a knock-on effect in the aggregate protein measure total protein.
Laboratory and practice effects
In the data set, differences between laboratories are confounded with practice effects; indeed, they may be regarded as a component of the differences between practices. Thus, it is possible that some ‘laboratory’ effects (see Appendix 2, Liver function test results by laboratory) have little to do with differences in laboratory procedure.
Practice effects could arise from several different sources. For instance:
- as a proxy for geographical effects
- as a reflection of different testing policies in different practices
- as a reflection of different testing policies between different GPs within practices.
Practice and laboratory effects were assessed using the models developed above for the first follow-up LFTs. Table 41 shows the results of testing for laboratory and practice effects in a nested ANOVA. Patient-level covariates identified in Table 38 above are included for each analyte.
Analyte | Between laboratories (assessed with respect to variation between practices): F (2 and 8 df) | Between laboratories: p-value | Differences between practices within laboratories: F (8 and 1100+ df)b | Differences between practices within laboratories: p-value |
---|---|---|---|---|
ALT | 4.19 | 0.0570 | 2.50 | 0.0107 |
AST | 4.99 | 0.0393 | 3.71 | 0.0003 |
Bilirubin | 6.26 | 0.0231 | 1.85 | 0.0646 |
ALP | 168.44 | 0.0000 | 2.87 | 0.0036 |
GGT | 0.17 | 0.8461 | 4.18 | 0.0001 |
Albumin | 8.35 | 0.0110 | 1.04 | 0.4056 |
Globulin | 4.12 | 0.0588 | 1.04 | 0.4066 |
Total protein | 4.50 | 0.0490 | 0.37 | 0.9388 |
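In the nested analysis, the between-laboratory F statistic is formed against the variation between practices within laboratories, which gives the 2 and 8 degrees of freedom shown. For an F distribution with 2 numerator degrees of freedom the upper-tail probability has a closed form, (1 + 2F/df2)^(−df2/2), so the between-laboratory p-values in Table 41 can be checked without statistical software. The sketch below is a verification aid with hypothetical names, not the authors' code.

```python
def f_pvalue_df1_2(f_stat, df2):
    """Upper-tail p-value for an F statistic with 2 numerator degrees
    of freedom and df2 denominator degrees of freedom."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


# Between-laboratory F statistics and p-values from Table 41 (2 and 8 df).
BETWEEN_LABS = {
    "ALT": (4.19, 0.0570),
    "AST": (4.99, 0.0393),
    "Bilirubin": (6.26, 0.0231),
    "ALP": (168.44, 0.0000),
    "GGT": (0.17, 0.8461),
    "Albumin": (8.35, 0.0110),
    "Globulin": (4.12, 0.0588),
    "Total protein": (4.50, 0.0490),
}

for analyte, (f_stat, published) in BETWEEN_LABS.items():
    p = f_pvalue_df1_2(f_stat, 8)
    print(f"{analyte:14s} F={f_stat:7.2f} p={p:.4f} (published {published:.4f})")
```

The recomputed values agree with the published ones to within the rounding of the F statistics.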
The evidence for differences between laboratories is overwhelming only in the case of ALP, where it was already known to be present. For six of the other analytes (i.e. except for GGT) there is some modest evidence for a laboratory effect. The protein measures (albumin, globulin and total protein) seem not susceptible to variation at the practice level. However, the evidence for such variation is moderately strong for the other LFTs (especially AST, GGT and ALP), with only bilirubin yielding a result that is not formally significant.
The impact of practice effects on the overall fitted models is comparatively small and does not appear to distort the impact of patient characteristics on LFT results. In particular, none of the terms would have been dropped (as non-significant) from any of the models in Table 38 following inclusion of fixed practice effects.
Liver-related disease
Disease categories
Over the course of the study, 59 patients were diagnosed with a serious liver-related condition. Of these, 32 had category 1a diseases (including 13 with hepatitis B or C), 12 had category 1b diseases, and 15 had category 2 diseases. The breakdown of these conditions is shown in Table 42. The remaining 1231 patients were classified as having no specific liver diagnosis. However, 53 of the non-specific group were not tested for at least one of hepatitis B and C – the most common of the serious diseases. These 53 patients were excluded from analyses that involve the diagnostic grouping, leaving 1178 patients in the non-specific group. A combination of non-specific and diagnostic categories yields 1237 patients.
Category | No. of patients | Subtotal | Total |
---|---|---|---|
1. Specific liver diseases | | | 44 |
1a. Hepatocellular diseases | | 32 | |
Viral hepatitis | 13 | | |
Haemochromatosis | 10 | | |
A1AT deficiency | 3 | | |
Alcoholic cirrhosis | 5 | | |
Hepatocellular cancer | 1 | | |
1b. Intrahepatobiliary duct disease | | 12 | |
PBC | 10 | | |
PSC | 2 | | |
2. Systemic diseases involving the liver | | | 15 |
Liver metastases | 4 | | |
Cancer pancreas/bile duct | 4 | | |
Lyme disease | 1 | | |
Hypothyroidism40 | 4 | | |
Amoebic liver abscess | 1 | | |
Chronic pancreatitis | 1 | | |
Chronic viral hepatitis
Thirteen out of 1236 patients for whom results of both tests were available had chronic viral hepatitis (hepatitis B or hepatitis C test results were missing for 54 of 1290 patients; both were missing in 38 cases): nine patients had hepatitis B and four had hepatitis C. The breakdown of the type of analyte that was abnormal in the index panel is shown in Table 43. In 10 out of these 13 cases, more than one analyte was abnormal. In eight cases, the ALT was abnormal, and there was no case in which AST was high but ALT was normal. In one case, only protein levels were abnormal and all the enzyme tests (ALT, AST, GGT and ALP) were normal. Fatty liver was present in six cases, a proportion that is similar to the overall rate of fatty liver in the complete data set (see Ultrasound features). One patient with hepatitis B has subsequently progressed to cirrhosis.
Type of hepatitis | Case | ALT | AST | Bilirubin | ALP | GGT | Albumin | Globulin | Total protein | Fattya |
---|---|---|---|---|---|---|---|---|---|---|
Hepatitis B | 1 | High | High | Normal | Normal | High | Normal | Normal | High | Yes |
Hepatitis B | 2 | Normal | Normal | High | Normal | Normal | High | Low | Normal | No |
Hepatitis B | 3 | High | Normal | Normal | Normal | High | Normal | Normal | Normal | Yes |
Hepatitis B | 4 | High | High | High | Normal | High | Normal | Normal | High | Yes |
Hepatitis B | 5 | High | High | Normal | Normal | Normal | Normal | Normal | Normal | Yes |
Hepatitis B | 6 | High | – | – | – | Normal | Normal | Normal | Normal | No |
Hepatitis B | 7 | Normal | – | High | Normal | – | Normal | – | – | No |
Hepatitis B | 8 | – | Normal | Normal | Normal | Normal | Normal | High | High | No |
Hepatitis B | 9 | – | High | Normal | Normal | – | Normal | – | – | No |
Hepatitis C | 1 | High | Normal | Normal | Normal | High | Normal | Normal | Normal | No |
Hepatitis C | 2 | High | High | Normal | Normal | Normal | Normal | Normal | High | Yes |
Hepatitis C | 3 | Normal | – | Normal | Normal | High | Normal | – | – | Yes |
Hepatitis C | 4 | High | High | Normal | Normal | Normal | Normal | Normal | Normal | No |
When ALT or AST levels were abnormal, the values tended to be more extreme in patients with viral hepatitis than in patients who did not have this disease (Table 44). The same trend was observed in all patients as a whole, not only those in whom the analyte was abnormal (data not shown). In nine participants either ALT or AST was abnormal. Country of origin was recorded in 1208 out of the 1236 patients in whom both viral hepatitis tests were undertaken: 107 were born in a medium- or high-risk country for hepatitis B or hepatitis C, according to World Health Organization (WHO) definitions, but 11 out of the 13 patients with chronic hepatitis originated from a medium- or high-risk country. None of the 13 patients admitted to use of intravenous drugs at any time.
Analyte | Upper limit | HBV or HCV: n | HBV or HCV: mean | HBV or HCV: median | Non-hepatitis: n | Non-hepatitis: mean | Non-hepatitis: median |
---|---|---|---|---|---|---|---|
ALT | 41 | 8 | 98.0 | 89.5 | 426 | 65.4 | 56.0 |
AST | 43 | 6 | 94.5 | 69.5 | 254 | 64.5 | 53.5 |
Primary biliary cirrhosis and primary sclerosing cholangitis
Primary biliary cirrhosis was defined as a cholestatic blood picture with positive serology for anti-mitochondrial antibody (AMA). AMAs were positive in 13 BALLETS cases. Three were weakly positive, leaving 10 positive cases included as category 1b cases in the statistical analysis. In retrospect, one of those cases (case 7) may have been misclassified (Table 45). Nine patients had a diagnosis of PBC confirmed by liver specialist follow-up, with a strong predominance for female sex (8/9) and white race (9/9). The mean age was 69.1 years, with two-thirds being aged > 65 years. Other risk factors for liver disease, namely alcohol excess and obesity, were unremarkable in this PBC cohort: the mean BMI was 27.7 kg/m² and all patients consumed ≤ 6 units of alcohol per week on average. ALP was abnormal on index blood testing in all the identified female patients with PBC. The only exception was the male patient with a solitary GGT abnormality on index testing. The ALP remained abnormal on repeat blood testing at FU1 in seven individuals. The course of the disease is usually benign in patients detected by LFTs rather than features of cirrhosis. However, two cases in our sample had features of early cirrhosis on ultrasound.
Case no. | Age (years) | Sex | Titre | M2 Subtype | ALP | Proportional increase of ALP above ULN | Abdominal ultrasound | Confirmed diagnosis by liver specialist |
---|---|---|---|---|---|---|---|---|
1 | 73 | M | 1 : 40 | Weak positive | 278 | Normal | Mild fatty liver | PBC |
2a | 58 | M | 1 : 40 | Negative | 170 | Normal | Normal | PBC |
3 | 75 | F | 1 : 100 | Strong positive | 456 | 1.38 | Mildly irregular liver surface | PBC |
4 | 86 | F | 1 : 100 | Strong positive | 462 | 1.4 | 1-cm simple cyst head of pancreas and pancreatic duct at head at ULN (at 3 mm); common bile duct dilated at 0.9 cm | PBC |
5 | 87 | F | 1 : 100 | Positive | 362 | 1.1 | Normal | PBC |
6 | 69 | F | 1 : 100 | Strong positive | 346 | 1.05 | Normal | PBC |
7b | 45 | F | 1 : 25 | Negative | 206 | Normal | Mild fatty liver; mid-portion bile duct minimally dilated at 72 cm | Not referred |
8 | 48 | F | 1 : 100 | Strong positive | 364 | 1.26 | Moderate fatty liver; liver enlarged at 18 cm; bright, coarse texture – possibly cirrhotic change | PBC |
9 | 52 | F | 1 : 40 | Positive | 407 | 1.23 | Gallbladder multiple small calculi – largest 5 mm | PBC |
10 | 81 | F | 1 : 100 | Strong positive | 633 | Normal | Gallbladder multiple small calculi; small aortic aneurysm | PBC |
In summary, if a GP identifies an incidental raised ALP (± GGT) in a white woman aged > 65 years, having excluded obesity and/or alcohol excess as causes of liver disease, it is likely to be cost-effective and clinically more intuitive to proceed straight to an AMA test rather than proceed to repeat tests or a full liver screen. An AMA test costs £8.01 (University Hospitals Birmingham NHS Foundation Trust, 2011) and the result should be available within 7 days, thus not delaying further clinical investigation if indicated. Two cases of PSC were detected in the study (Table 46).
Case no. | Age (years) | Sex | ALP | Co-existing conditions | Abdominal ultrasound | Confirmed diagnosis by liver specialist |
---|---|---|---|---|---|---|
1 | 40 | M | 349 | Ulcerative colitis | Mild intrahepatic duct dilatation; mildly heterogeneous liver texture | PSC (MRI) |
2 | 29 | M | 1075 | Ulcerative colitis | Common bile duct at ULN (7 mm) | PSC (liver biopsy) |
Autoimmune hepatitis
Smooth muscle antibodies (SMAs) were positive in 47 cases (weakly positive in five of these). In two cases (Table 47) either the ALT or AST exceeded twice the ULN. These cases had not been followed up at the time of the report and the study hepatologist is reluctant to make a firm diagnosis of this very rare disease in elderly patients. Moreover, such a diagnosis would use test results as both the topic of investigation and a diagnostic criterion, thereby risking inclusion bias.
Case no. | Age (years) | Smooth muscle | ALT and AST | Abdominal ultrasound | Action |
---|---|---|---|---|---|
1 | 74 | Positive | ALT 29 (normal); AST 29 (normal) | Stone present in fundus of gallbladder; enlarged spleen at 15.7 cm | Referred. Reviewed by two liver specialists. Outcome = alcohol excess, unlikely autoimmune hepatitis |
2 | 70 | Positive | ALT missing; AST 249 (raised) | Abnormal parenchyma; small calcified speck in right lobe | Not referred |
Haemochromatosis
Iron saturation exceeded 50% in 39 cases, and in eight of these it exceeded 80%. We obtained a haemochromatosis genotype for 27 cases with iron saturation > 50% during the 2-year follow-up.
These cases are summarised in Table 48, in which it can be seen that there are six cases of homozygous haemochromatosis (C282Y or H63D) and four compound heterozygotes (C282Y + H63D) who may be classed as having haemochromatosis (category 1a disease). There were also two carrier heterozygotes (C282Y). Five of the six homozygous patients had iron saturations above 80%. In none of the compound heterozygote patients did iron saturation exceed this level. Three patients were deceased and one patient was no longer registered at the practice. Five patients did not attend follow-up clinic appointments. Three patients attended hospital liver clinic appointments but did not have HFE genotype results. Four out of the six homozygous cases had ferritin levels of > 1000 mg/dl and receive frequent venesection. Their families have been screened.
Case no. | Sex | Age (years) | Fe saturation (%) | HFE genotype | Ultrasound abnormalities |
---|---|---|---|---|---|
1 | M | 43 | 58.0 | HHCC normal | None |
2 | F | 60 | 51.9 | HHCC normal | Mildly fatty liver |
3 | F | 75 | 53.3 | HHCC normal | Mildly fatty liver |
4 | M | 56 | 51.7 | Compound heterozygote | Abnormal parenchyma and moderately fatty liver |
5 | M | 60 | 55.7 | HHCC normal | Abnormal parenchyma and single, solid focal lesion 1.4 × 1 cm; possibly haemangioma |
6 | F | 72 | 61.9 | Haemochromatosis homozygote (H63D) | Three polyps in gallbladder – largest = 4.3 mm |
7 | F | 45 | 85.9 | Alcohol dependent; did not return to clinic | Enlarged liver and abnormal parenchyma and moderately fatty liver |
8 | M | 51 | 87.8 | Deceased; septicaemia/septic arthritis | Enlarged liver and abnormal parenchyma and marked fatty liver; Spleen enlarged at 14 cm |
9 | M | 75 | 94.6 | Haemochromatosis homozygote (C282Y) | Abnormal parenchyma and small highly echogenic focus in posterior aspect of right lobe – 1.1 cm |
10 | M | 64 | 64.0 | HHCC normal | None |
11 | F | 53 | 56.7 | HHCC normal | Abnormal parenchyma and mildly fatty liver; multiple angiomyolipomas in right kidney |
12 | M | 54 | 79.2 | Compound heterozygote | Abnormal parenchyma and moderately fatty liver |
13 | F | 67 | 85.1 | Haemochromatosis homozygote (C282Y) | Abnormal parenchyma and moderately fatty liver; two small calculi in gallbladder |
14 | M | 41 | 61.4 | Did not return to clinic | Left renal calculus – 8 mm |
15 | M | 61 | 64.8 | Deceased; metastatic cholangiocarcinoma | Dilated tubular structure extending into right lobe, numerous areas of calcification |
16 | M | 74 | 62.2 | HHCC normal | Abnormal parenchyma and a solid focal lesion on liver adjacent to gallbladder; moderately fatty liver |
17a | M | 73 | 70.5 | No result; alcoholic cirrhosis | Abnormal parenchyma and liver has irregular surface and abnormal coarse texture; cirrhotic changes; enlarged spleen at 15 cm |
18 | M | 32 | 51.2 | No longer registered | None |
19 | M | 58 | 59.5 | Did not return to clinic | None |
20 | M | 54 | 55.2 | HHCC normal | Ascites around liver and in pelvis |
21 | M | 50 | 50.8 | Did not return to clinic | None |
22 | M | 45 | 81.5 | Haemochromatosis; homozygote (C282Y) | Moderately fatty liver; 1.9 cm × 1.3 cm solid hypoechoic area adjacent to gallbladder |
23 | F | 64 | 60.5 | HHCC normal | None |
24 | M | 78 | 65.4 | HHCC normal | None |
25 | M | 54 | 64.8 | Compound heterozygote | None |
26 | M | 51 | 91.1 | Carrier, heterozygote | Mildly fatty liver and abnormal parenchyma |
27 | F | 50 | 94.1 | Haemochromatosis; homozygote (C282Y) | None |
28 | F | 75 | 50.1 | HHCC normal | Abnormal parenchyma and moderately fatty liver; enlarged liver |
29 | F | 72 | 71.5 | HHCC normal | None |
30 | M | 62 | 58.8 | Did not return to clinic | Abnormal parenchyma and moderately fatty liver; dilation of intrahepatic common bile duct at 8 mm |
31 | F | 54 | 72.4 | Hepatology referral; no result | None |
32a | F | 67 | 57.0 | Deceased; liver failure; alcohol dependence | Liver appears coarse – increased reflectivity in gallbladder – no shadowing |
33 | M | 39 | 53.7 | HHCC normal | None |
34 | F | 75 | 94.3 | Haemochromatosis; homozygote (C282Y) | None |
35 | M | 54 | 50.9 | HHCC normal | None |
36 | M | 49 | 53.2 | Compound heterozygote | Abnormal parenchyma and moderately fatty liver; area of calcification in head of pancreas |
37 | M | 36 | 55.5 | Hepatology referral; no result | None |
38 | F | 33 | 79.7 | HHCC normal | Abnormal parenchyma and mildly fatty liver |
39 | M | 62 | 54.2 | Carrier, heterozygote | None |
Wilson's disease
Four patients had abnormal caeruloplasmin levels (Table 49). Wilson's disease was excluded by 24-hour urine test in three patients. We reminded the GP of the remaining patient of the possible desirability of referral, but this does not seem to have occurred.
Case no. | Age (years) | Caeruloplasmin (mg/dL) | Abdominal ultrasound | GP action |
---|---|---|---|---|
1 | 66 | 0.11 | Two renal calculi | Referral to liver clinic; Wilson's disease excluded by 24-hour urine test |
2 | 53 | 0.14 | Mildly fatty liver, smaller left lobe | Wilson's disease excluded by 24-hour urine test |
3 | 36 | 0.09 | None | Wilson's disease excluded by 24-hour urine test |
4 | 57 | 0.14 | Mildly fatty liver, abnormal parenchyma | Not referred |
Alpha-1 antitrypsin deficiency
Low A1AT levels were found in 47 patients. Thirty-seven have had phenotype testing and these were abnormal in three cases (Table 50). Cases 1 and 2 are under the care of a specialist. Case 3 has a lower risk phenotype (Pi MZ).
Case no. | Age (years) | A1AT | Phenotype | Follow-up |
---|---|---|---|---|
1 | 61 | A1AT | Pi SZ | Respiratory |
2 | 52 | A1AT | Pi SS | Hepatology |
3 | 60 | A1AT | Pi MZ | |
Alcoholic cirrhosis
There were five cases in which the hepatologist agreed (on the basis of the ultrasound picture and history) that the patient had alcoholic cirrhosis (with some overlap with haemochromatosis). A further case of hepatocellular cancer was detected in a patient who did not have hepatitis B but in whom biopsy confirmed a diagnosis of non-alcoholic steatohepatitis (NASH).
Other diseases (category 2) involving the liver
Metastatic cancers in the liver, cancer of the pancreas/bile duct and hypothyroidism were the common diseases in this group. We did not include incidental cancers (e.g. cancer of kidney) or gallstones confined to the gallbladder in this category.
Diagnostic value of liver function tests
Objective
In this section the possibility of using LFTs to predict diagnostic group is explored by means of a discriminant analysis incorporating appropriate adjustment for patient-level demographic and clinical variables. These comprise age, sex, BMI, ethnicity and country of birth. Alcohol consumption is excluded; although it may be implicated in the onset of certain conditions, it is clearly a lifestyle variable and, moreover, is the one variable in the data set which has not been objectively determined. The aim here is to summarise clinical information in such a way as to inform diagnosis. The LFT panels used here were obtained at FU1.
The analysis concerns (and is restricted to) the patients diagnosed with category 1a and 1b liver disease, together with those in the non-specific diagnostic category. These groups comprise 1222 patients in all. For 14% of these patients some of the clinical and demographic data were missing. In order to make full use of the information available the complete case analysis was supplemented with an analysis using an imputation method to cope with missing observations.
Method (complete case analysis)
Adjustment for laboratory effects
For the current analysis, each LFT was corrected for a multiplicative laboratory effect obtained from fitting the explanatory model derived in Chapter 4 (see Multivariate analysis) to the patients in the ‘non-specific’ diagnostic group. Inclusion of all patients – and incorporating diagnostic group in the model – might have weakened the ability to identify diagnostic groups whose prevalence varied across the groups of practices associated with different laboratories. In practice, there is little difference between these two approaches, as is evident from Table 51.
Analyte | Used (estimated from 'non-specific' group): Birmingham (1), laboratory 1 | Used (estimated from 'non-specific' group): Birmingham (2), laboratory 2 | Used (estimated from 'non-specific' group): London, laboratory 3 | Estimated from all diagnostic groups: Birmingham (1), laboratory 1 | Estimated from all diagnostic groups: Birmingham (2), laboratory 2 | Estimated from all diagnostic groups: London, laboratory 3 |
---|---|---|---|---|---|---|
ALT | 1 | 1.37 | 1.05 | 1 | 1.32 | 0.99 |
AST | 1 | 1.32 | 1.20 | 1 | 1.31 | 1.17 |
Bilirubin | 1 | 0.87 | 1.24 | 1 | 0.87 | 1.22 |
ALP | 1 | 0.47 | 0.40 | 1 | 0.47 | 0.39 |
GGT | 1 | 1.12 | 0.98 | 1 | 1.10 | 0.98 |
Albumin | 1 | 1.00 | 1.03 | 1 | 1.00 | 1.03 |
Globulin | 1 | 0.97 | 0.96 | 1 | 0.97 | 0.96 |
Total protein | 1 | 0.98 | 1.00 | 1 | 0.99 | 1.00 |
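As a rough illustration of how the factors in Table 51 might be applied, the sketch below rescales a raw result to the laboratory 1 scale. The text states only that the laboratory effect is multiplicative; dividing by the factor is an assumption made here, and the names `LAB_FACTORS` and `adjust_lft` are hypothetical.

```python
# Multiplicative laboratory factors for three analytes, copied from Table 51
# (estimated from the 'non-specific' diagnostic group). Laboratory 1 is the reference.
LAB_FACTORS = {
    "ALT": {1: 1.0, 2: 1.37, 3: 1.05},
    "AST": {1: 1.0, 2: 1.32, 3: 1.20},
    "ALP": {1: 1.0, 2: 0.47, 3: 0.40},
}


def adjust_lft(analyte: str, laboratory: int, value: float) -> float:
    """Rescale a raw result to the laboratory 1 scale.

    Assumption (not confirmed by the report): a result from laboratory k is
    modelled as factor_k times the laboratory 1 result, so the correction
    divides the raw value by the factor.
    """
    return value / LAB_FACTORS[analyte][laboratory]


# Example: a hypothetical ALP result of 170 reported by laboratory 2.
adjusted_alp = adjust_lft("ALP", 2, 170.0)
```

Because the analysis works with log-LFTs, the same correction could equally be written as subtracting the log of the factor from the log of the raw value.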
Analysis strategy
There are three disease categories of primary interest:
- liver disease (1a), including viral hepatitis (n = 32)
- hepatitis B and C (n = 13)
- liver disease (1b) (n = 12).
In addition, a separate analysis was conducted for liver disease (1a) – excluding viral hepatitis (n = 19).
In each case we attempt to discriminate between the ‘non-specific’ group (n = 1178) and the diagnostic group of interest using a linear predictor within a logistic discrimination set-up. In each analysis, subjects not falling into one of these two groups were ignored.
Forwards and backwards stepwise procedures were applied to the following list of variables:
- log-LFTs (eight analytes)
- sex
- age group (six categories)
- BMI (four categories)
- ethnic group (four categories)
- country of birth (three categories).
Interaction terms were not considered, mainly because the diagnostic groups were too sparsely populated in the data to avoid problems associated with complete outcome specification within some subgroups.
Initially, the predictors were identified using backwards elimination, with a p = 0.01 threshold for exclusion from the model. Comparison was made with a forwards selection procedure (with a p = 0.01 threshold for inclusion) and with more general stepwise selection procedures involving both forward and backward steps. In no case did the more general procedures identify a set of predictors that had not been obtained already using either the backwards or forwards methods, which are the only ones reported here.
Demographic variables and LFTs with independent discriminatory power (identified from the stepwise procedures) were included in a multiple logistic discriminant analysis of the non-specific group with liver disease categories 1a and 1b.
Results (complete case analysis)
Stepwise procedures
A summary of results for the four discrimination problems is presented in Tables 52–55. Pseudo R2 is a likelihood-based measure of fit with properties similar to a conventional R2-statistic, and the area under the curve (AUC) denotes the area under the ROC curve, a c-statistic that measures diagnostic potential. For both of these quantities, values close to ‘1’ indicate excellent predictive ability.
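The two summary measures can be sketched as follows. The pseudo R2 is assumed here to be McFadden's version (one minus the ratio of fitted to intercept-only log-likelihoods), which the text does not confirm; the AUC is computed directly from its definition as a c-statistic. The function names and the example scores are hypothetical.

```python
from typing import Sequence


def mcfadden_pseudo_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo R2: 1 - (fitted log-likelihood / intercept-only log-likelihood)."""
    return 1.0 - loglik_model / loglik_null


def auc(scores_diseased: Sequence[float], scores_other: Sequence[float]) -> float:
    """c-statistic: the probability that a randomly chosen diseased patient has a
    higher score than a randomly chosen non-diseased patient (ties count one half)."""
    wins = 0.0
    for d in scores_diseased:
        for o in scores_other:
            if d > o:
                wins += 1.0
            elif d == o:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_other))


# Hypothetical predicted probabilities for three diseased and four other patients.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2, 0.1])
```

An AUC of 0.5 corresponds to a marker that ranks diseased and non-diseased patients no better than chance.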
Stepwise procedure | Variables retained | Pseudo R2 | AUC |
---|---|---|---|
Backwards elimination | ALT, BMI, ethnic group | 0.219 | 0.861 |
Forwards selection | AST, country of birth | 0.160 | 0.771 |
Stepwise procedure | Variables retained | Pseudo R2 | AUC |
---|---|---|---|
Backwards elimination | ALT, BMI, ethnic group | 0.443 | 0.960 |
Forwards selection | AST, country of birth | 0.304 | 0.860 |
Procedure | Variables retained | Pseudo R2 | AUC |
---|---|---|---|
Backwards elimination | ALP | 0.164 | 0.842 |
Forwards selection | ALP | 0.164 | 0.842 |
Procedure | Variables retained | Pseudo R2 | AUC |
---|---|---|---|
Backwards elimination | ALT | 0.099 | 0.776 |
Forwards selection | AST | 0.105 | 0.749 |
Discrimination between diagnostic groups
From the results above, it appears that ALT, AST and ALP are the only LFT analytes that contribute independent discriminatory power to the problem of diagnosis. Accordingly, these were included in the discriminant analysis, alongside age, sex, BMI, ethnic group and country of birth. Four groups were retained in the analysis: non-specific; 1a, excluding hepatitis; hepatitis; and 1b. The potential of the method to distinguish between liver disease (categories 1a and 1b) and a non-specific diagnosis is summarised in Figure 11. This shows a ROC plot of the true-positive rate against the false-positive rate when the estimated probability of disease (i.e. probability of lying in either category 1a or category 1b) from a logistic discrimination analysis is used as a marker of disease.
According to this plot, when the threshold for a positive diagnosis is set so that the true-positive rate (sensitivity) is 90%, then more than half of the non-specific group will be misclassified as having liver disease (false-positive rate = 53%). This gives a pragmatic indication of discriminatory capability. In practice, the performance is likely to be worse than that depicted because the same data have been used both for estimation and assessment of the discriminant function.
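The operating point described above (fix the sensitivity, then read off the false-positive rate) can be sketched as below. The function name and the example probabilities are hypothetical; in the study the probabilities would come from the fitted logistic discriminant function, and assessing them on the same patients used for fitting gives an optimistic picture, as noted in the text.

```python
import math
from typing import Sequence, Tuple


def fpr_at_sensitivity(
    probs_diseased: Sequence[float],
    probs_other: Sequence[float],
    target_sensitivity: float = 0.90,
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, false-positive rate).

    A patient is called positive when the estimated probability of disease is
    at or above the threshold. The threshold is the largest observed value
    among diseased patients that still achieves the target sensitivity.
    """
    ranked = sorted(probs_diseased, reverse=True)
    needed = math.ceil(target_sensitivity * len(ranked))  # diseased patients to capture
    threshold = ranked[needed - 1]
    sensitivity = sum(p >= threshold for p in probs_diseased) / len(probs_diseased)
    fpr = sum(p >= threshold for p in probs_other) / len(probs_other)
    return threshold, sensitivity, fpr


# Hypothetical estimated probabilities, for illustration only.
diseased = [0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
other = [0.01, 0.05, 0.1, 0.15, 0.25, 0.3, 0.02, 0.6]
threshold, sensitivity, fpr = fpr_at_sensitivity(diseased, other, 0.90)
```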
Analysis of imputed data
Method
The stepwise method described above uses only complete cases, with the result that the final equation is computed only from patients with a complete LFT panel, and also ignores individuals with unrecorded BMI, ethnicity and country of birth. In all, 176 (14.4%) of the patients eligible for analysis had incomplete data, including 9 of the 44 patients with diagnosed liver disease in category 1. The result is that a substantial fraction of the information in the diseased groups has not contributed to the analysis presented above. To make better use of the available information, missing values were imputed from the rest of the data, augmented by age group (six categories), sex, alcohol consumption (six categories), liver fat (five categories from the sonographer's report, including non-visualised liver as a separate category) and diagnosis (five categories: non-specific; category 1a, excluding hepatitis; hepatitis; category 1b; and category 2). Imputation was carried out using the chained equation method of van Buuren et al.,41 as implemented in Stata version 11 (StataCorp LP, College Station, TX, USA). 42 Twenty imputed samples were simulated. Within the stepwise procedures, the regression analyses were repeated for each imputed sample and the results combined using Rubin's rules.
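For readers unfamiliar with Rubin's rules, the following minimal sketch pools a single regression coefficient across imputed samples. It is an illustration of the standard formulae rather than the Stata routine used in the study; the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: the coefficient estimated in each imputed sample
    variances: its squared standard error in each imputed sample
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical coefficients and squared standard errors from three imputed samples.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term inflates the variance to reflect the uncertainty introduced by the missing values, so pooled standard errors are never smaller than the average within-sample standard error.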
The laboratory effects were re-estimated using the imputed samples, and are displayed in Table 56.
Analyte | Imputed samples: Birmingham (1), laboratory 1 | Imputed samples: Birmingham (2), laboratory 2 | Imputed samples: London, laboratory 3 |
---|---|---|---|
ALT | 1 | 1.30 | 1.06 |
AST | 1 | 1.31 | 1.14 |
Bilirubin | 1 | 0.89 | 1.21 |
ALP | 1 | 0.48 | 0.42 |
GGT | 1 | 1.12 | 0.98 |
Albumin | 1 | 1.00 | 1.03 |
Globulin | 1 | 0.97 | 0.96 |
Total protein | 1 | 0.98 | 1.00 |
Results (imputed data)
Stepwise procedures similar to those used for the complete case analyses were followed. The results are displayed in Tables 57–60. These may be compared with Tables 52–55. For disease category 1b there are eight complete cases, augmented to 12 in the imputation analysis. Nevertheless, the same variable (ALP) emerges as the only significant predictor in both sets of analyses. The situation in category 1a is less clear cut. Here there are 27 complete cases and 32 cases in the imputation analyses. ALT and AST emerge as the only LFTs that figure in any of the selected models, although never together in the same model. For category 1a excluding hepatitis (19 cases, 16 complete), BMI features in the imputation model, enhancing the discrimination achieved by ALT in the complete case analysis. In the imputation analysis, country of birth (in combination with ALT or AST) has superseded ethnicity and BMI (also with ALT or AST) in predicting hepatitis (13 cases, 11 complete).
Stepwise procedure | Variables retained | Pseudo R2 | AUC |
---|---|---|---|
Backwards elimination | ALT, BMI | 0.144 | 0.800 |
Forwards selection | AST, country of birth | 0.135 | 0.755 |
Stepwise procedure | Variables retained | Pseudo R2 | AUC |
---|---|---|---|
Backwards elimination | ALT, country of birth | 0.308 | 0.924 |
Forwards selection | AST, country of birth | 0.302 | 0.894 |
Procedure | Variables retained | Pseudo R2 | AUC |
---|---|---|---|
Backwards elimination | ALP | 0.189 | 0.835 |
Forwards selection | ALP | 0.189 | 0.835 |
Procedure | Variables retained | Pseudo R2 | AUC |
---|---|---|---|
Backwards elimination | ALT | 0.075 | 0.758 |
Forwards selection | AST | 0.096 | 0.756 |
The discriminant analysis described above for complete cases was repeated for the imputed samples, using ALT, AST and ALP alongside age, sex, BMI and country of birth. In contrast to the complete case analysis, ethnic group was omitted from further analysis as it does not feature in the imputed stepwise results. The implied diagnostic performance does not change substantially, although the trade-off between false-positive and true-positive rates (as measured by the AUC statistic) is marginally less favourable (Figure 12).
Discussion
The general conclusion appears to be that automatic diagnosis of serious conditions has little chance of success given the high false-positive rates. The diagnosis of viral hepatitis is the most promising possibility, with an estimated AUC of > 0.90. The power of the method relies heavily on the country of origin of the patient.
Among the LFT panel only three analytes make a significant contribution: ALP is the only useful predictor of disease category 1b; ALT and AST are both implicated in the diagnosis of category 1a diseases, though they appear to substitute for one another. It remains unclear which should be preferred if a choice has to be made.
Fatty liver on ultrasound
Of the 1277 participants in whom the texture of the liver could be discerned, 484 (38%) (see Figure 6) had an ultrasound diagnosis of fatty liver at FU1, and this was classified as moderate or marked in 221 cases (46%).
Fatty liver in patients with abnormal liver function tests
The presence of an abnormal ALT or AST was associated with an increased likelihood of fatty liver, but the prevalence of fatty liver among patients with abnormal bilirubin or ALP was reduced. Results for GGT were of marginal significance and the protein analytes exhibited no clear effects (Table 61).
Analyte | Tests, n | Abnormal result, n | Fatty liver among abnormal, n (%) | Normal result, n | Fatty liver among normal, n (%) | p-value |
---|---|---|---|---|---|---|
ALT | 1102 | 434 | 230 (53.0) | 668 | 178 (26.6) | < 0.001 |
AST | 1147 | 255 | 137 (53.7) | 892 | 307 (34.4) | < 0.001 |
Bilirubin | 1252 | 148 | 36 (24.3) | 1104 | 437 (39.6) | < 0.001 |
ALP | 1259 | 186 | 41 (22.0) | 1073 | 432 (40.3) | < 0.001 |
GGT | 1139 | 858 | 342 (39.9) | 281 | 90 (32.0) | 0.019 |
Albumin | 1265 | 29 | 7 (24.1) | 1236 | 470 (38.0) | 0.174 |
Globulin | 966 | 55 | 16 (29.1) | 911 | 348 (38.2) | 0.199 |
Total protein | 970 | 96 | 34 (35.4) | 874 | 332 (38.0) | 0.659 |
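The comparisons in Table 61 are of two proportions. The text does not state which test produced the p-values; the sketch below assumes a Pearson chi-squared test on the 2 × 2 table without continuity correction, which may not be the method used for every row (an exact test may have been preferred where counts are small). The function name is hypothetical.

```python
import math
from typing import Tuple


def chi2_2x2(fatty_abn: int, n_abn: int, fatty_norm: int, n_norm: int) -> Tuple[float, float]:
    """Pearson chi-squared test (1 df, no continuity correction) comparing the
    proportion with fatty liver between abnormal and normal analyte results.

    Returns (chi-squared statistic, p-value).
    """
    observed = [
        [fatty_abn, n_abn - fatty_abn],
        [fatty_norm, n_norm - fatty_norm],
    ]
    total = n_abn + n_norm
    row_totals = [n_abn, n_norm]
    col_totals = [fatty_abn + fatty_norm, total - fatty_abn - fatty_norm]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-squared with 1 df
    return stat, p_value


# Counts taken from the ALT and GGT rows of Table 61.
alt_stat, alt_p = chi2_2x2(230, 434, 178, 668)
ggt_stat, ggt_p = chi2_2x2(342, 858, 90, 281)
```

Under this assumption the ALT comparison is highly significant and the GGT comparison falls in the same region as the tabulated value, consistent with the description of GGT as marginal.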
A detailed breakdown of the severity of fatty liver when different analytes (and pairs of analytes) are abnormal is given in Table 62. The PPVs refer to the performance of the LFTs in determining that the liver is in the mild, moderate or severe category on ultrasound. The aminotransferase enzymes (ALT and AST) have the highest PPVs.
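Analyte | Fatty liver status when LFT abnormal: normal, n (%) | Fatty liver status when LFT abnormal: mild, n (%) | Fatty liver status when LFT abnormal: moderate, n (%) | Fatty liver status when LFT abnormal: severe, n (%) | Per cent fatty liver: PPV (%) | Per cent fatty liver: CI |
---|---|---|---|---|---|---|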
Complete panel | 524 (62.5) | 170 (20.3) | 119 (14.2) | 26 (3.1) | 37.5 | 34.3 to 40.9 |
Single analytes | ||||||
ALT | 181 (46.5) | 102 (26.2) | 82 (21.1) | 24 (6.2) | 53.5 | 48.4 to 58.5 |
AST | 106 (46.3) | 63 (27.5) | 42 (18.3) | 18 (7.9) | 53.7 | 47.0 to 60.3 |
Bilirubin | 96 (74.4) | 25 (19.4) | 7 (5.4) | 1 (0.8) | 25.6 | 18.3 to 34.0 |
ALP | 126 (77.3) | 19 (11.7) | 15 (9.2) | 3 (1.8) | 22.7 | 16.5 to 29.9 |
GGT | 473 (60.3) | 151 (19.3) | 129 (16.5) | 31 (4.0) | 39.7 | 36.2 to 43.2 |
Albumin | 19 (73.1) | 6 (23.1) | 1 (3.8) | 0 (0.0) | 26.9 | 11.6 to 47.8 |
Globulin | 37 (69.8) | 9 (17.0) | 7 (13.2) | 0 (0.0) | 30.2 | 18.3 to 44.3 |
Total protein | 58 (65.2) | 20 (22.5) | 10 (11.2) | 1 (1.1) | 34.8 | 25.0 to 45.7 |
Pairs of analytes | ||||||
ALT or AST | 174 (48.3) | 94 (26.1) | 72 (20.0) | 20 (5.6) | 51.7 | 46.4 to 56.9 |
ALT or bilirubin | 248 (53.1) | 111 (23.8) | 84 (18.0) | 24 (5.1) | 46.9 | 42.3 to 51.5 |
ALT or ALP | 253 (53.7) | 105 (22.3) | 88 (18.7) | 25 (5.3) | 46.3 | 41.7 to 50.9 |
ALT or GGT | 499 (58.6) | 179 (21.0) | 139 (16.3) | 34 (4.0) | 41.4 | 38.0 to 44.8 |
ALT or albumin | 194 (48.1) | 103 (25.6) | 82 (20.3) | 24 (6.0) | 51.9 | 46.9 to 56.8 |
ALT or globulin | 181 (49.9) | 95 (26.2) | 70 (19.3) | 17 (4.7) | 50.1 | 44.9 to 55.4 |
ALT or total protein | 196 (51.0) | 100 (26.0) | 71 (18.5) | 17 (4.4) | 49.0 | 43.9 to 54.1 |
AST or bilirubin | 169 (54.9) | 77 (25.0) | 43 (14.0) | 19 (6.2) | 45.1 | 39.5 to 50.9 |
AST or ALP | 197 (57.4) | 75 (21.9) | 51 (14.9) | 20 (5.8) | 42.6 | 37.3 to 48.0 |
AST or GGT | 462 (59.9) | 157 (20.4) | 122 (15.8) | 30 (3.9) | 40.1 | 36.6 to 43.6 |
AST or albumin | 119 (48.6) | 67 (27.3) | 41 (16.7) | 18 (7.3) | 51.4 | 45.0 to 57.8 |
AST or globulin | 113 (53.3) | 51 (24.1) | 34 (16.0) | 14 (6.6) | 46.7 | 39.8 to 53.7 |
AST or total protein | 135 (55.3) | 59 (24.2) | 36 (14.8) | 14 (5.7) | 44.7 | 38.3 to 51.1 |
Bilirubin or ALP | 216 (76.3) | 43 (15.2) | 20 (7.1) | 4 (1.4) | 23.7 | 18.8 to 29.1 |
Bilirubin or GGT | 520 (62.1) | 160 (19.1) | 126 (15.0) | 32 (3.8) | 37.9 | 34.6 to 41.3 |
Bilirubin or albumin | 112 (74.2) | 30 (19.9) | 8 (5.3) | 1 (0.7) | 25.8 | 19.1 to 33.6 |
Bilirubin or globulin | 90 (72.0) | 24 (19.2) | 11 (8.8) | 0 (0.0) | 28.0 | 20.3 to 36.7 |
Bilirubin or total protein | 111 (69.8) | 33 (20.8) | 14 (8.8) | 1 (0.6) | 30.2 | 23.2 to 38.0 |
ALP or GGT | 505 (62.0) | 150 (18.4) | 127 (15.6) | 33 (4.0) | 38.0 | 34.7 to 41.5 |
ALP or albumin | 141 (76.6) | 24 (13.0) | 16 (8.7) | 3 (1.6) | 23.4 | 17.5 to 30.2 |
ALP or globulin | 115 (77.2) | 17 (11.4) | 15 (10.1) | 2 (1.3) | 22.8 | 16.3 to 30.4 |
ALP or total protein | 134 (73.2) | 28 (15.3) | 18 (9.8) | 3 (1.6) | 26.8 | 20.5 to 33.8 |
GGT or albumin | 476 (60.7) | 151 (19.3) | 126 (16.1) | 31 (4.0) | 39.3 | 35.8 to 42.8 |
GGT or globulin | 418 (62.0) | 131 (19.4) | 102 (15.1) | 23 (3.4) | 38.0 | 34.3 to 41.8 |
GGT or total protein | 431 (62.3) | 136 (19.7) | 102 (14.7) | 23 (3.3) | 37.7 | 34.1 to 41.4 |
Albumin or globulin | 47 (73.4) | 10 (15.6) | 7 (10.9) | 0 (0.0) | 26.6 | 16.3 to 39.1 |
Albumin or total protein | 65 (67.7) | 20 (20.8) | 10 (10.4) | 1 (1.0) | 32.3 | 23.1 to 42.6 |
Globulin or total protein | 72 (67.3) | 22 (20.6) | 12 (11.2) | 1 (0.9) | 32.7 | 24.0 to 42.5 |
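The PPVs in Table 62 are simple proportions (mild, moderate or severe fatty liver among patients in whom the analyte was abnormal). The report does not say how the CIs were obtained; the sketch below assumes an exact (Clopper–Pearson) binomial interval at the 95% level, found by bisection. The function names are hypothetical.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _root(f: Callable[[float], float]) -> float:
    """Bisection on [0, 1] for a function that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ppv_with_exact_ci(fatty: int, n: int, alpha: float = 0.05) -> Tuple[float, float, float]:
    """Return (PPV, lower limit, upper limit) using the Clopper-Pearson interval."""
    lower = 0.0 if fatty == 0 else _root(
        lambda p: (1.0 - binom_cdf(fatty - 1, n, p)) - alpha / 2.0
    )
    upper = 1.0 if fatty == n else _root(
        lambda p: alpha / 2.0 - binom_cdf(fatty, n, p)
    )
    return fatty / n, lower, upper


# Albumin row of Table 62: 6 mild + 1 moderate + 0 severe = 7 fatty livers out of 26.
albumin_ppv, albumin_lo, albumin_hi = ppv_with_exact_ci(7, 26)
```

For the albumin row this should come out close to the tabulated 26.9% (11.6 to 47.8), which suggests, without confirming, that an exact interval was used.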
The proportions of fatty livers according to BMI and alcohol consumption are shown in Figure 13 and Table 63. Among people with abnormal LFTs, the probability of fatty liver is 64.6% if they are obese (BMI ≥ 30 kg/m2) drinkers, rising to 73.8% if the abnormality includes an abnormal ALT. However, there is a sizeable probability (31%) of fatty liver in participants who were neither (moderate to heavy) drinkers nor obese.
Risk | Normal liver, n (%) | Fatty liver, n (%) | Total, n (%) |
---|---|---|---|
No risk | 450 (76.3) | 140 (23.7) | 590 (100.0) |
High alcohol | 93 (60.4) | 61 (39.6) | 154 (100.0) |
High BMI | 183 (46.6) | 210 (53.4) | 393 (100.0) |
High alcohol + BMI | 34 (35.4) | 62 (64.6) | 96 (100.0) |
Not known | 33 (75.0) | 11 (25.0) | 44 (100.0) |
Patient characteristics associated with ultrasound diagnosis of fatty liver
The present study provides an opportunity for a detailed investigation of the impact of patient characteristics on fatty liver – especially the lifestyle factors of weight and alcohol consumption. LFT results may also be considered to see how well they discriminate between fatty and non-fatty livers without reference to ultrasound. The analysis was done using the FU1 results.
A logistic regression model was constructed for the presence of fatty liver on ultrasound using a backwards elimination approach. The explanatory variables considered were:
- sex
- age group (six categories)
- ethnic group (four categories)
- BMI (four categories)
- alcohol consumption (six categories).
These variables were entered into the model, together with all two-way interactions involving age or sex. Then all non-significant interaction terms (p > 0.05) were sequentially removed. At this stage, the intention had been to remove also non-significant main effects for variables not included in any surviving interaction (in practice, this eventuality did not arise). The method was repeated for an analysis of the ordinal category (‘severity’) for fatty liver using ordinal logistic regression.
The model was supplemented by LFTs (log transformed) obtained from a stepwise procedure involving both forwards and (potentially) backwards steps.
The backwards elimination and stepwise analyses were performed both for presence/absence of fatty liver (using ordinary logistic regression) and for severity (using ordinal logistic regression). The results, in terms of variables selected, were identical. Although several analytes are individually correlated with the presence of fatty liver, these correlations are subsumed into two analytes only – ALT and albumin. Once these two are entered into the model, no other analyte achieves predictive significance for the presence or severity of fatty liver.
Table 64 summarises the patient characteristic model from the logistic regression analysis. This model achieved a pseudo R2 of 15.3% and an AUC measure of discrimination of 0.75. The ordinal regression results (not shown) are similar. It can be seen that BMI is by far the most important predictor here.
Terms in model | Degrees of freedom | Chi-squared | p-value |
---|---|---|---|
Age group | 5 | 37.05 | 0.0000 |
BMI | 3 | 102.79 | 0.0000 |
Ethnic group | 3 | 10.72 | 0.0134 |
Alcohol × sex interaction | 5 | 13.33 | 0.0205 |
Alcohola | 5 | 16.07 | 0.0067 |
Sexa | 1 | 2.97 | 0.0850 |
The model described above includes main effects for sex, age group (six categories), ethnic group (four categories), BMI (four categories) and alcohol consumption (six categories) together with the sex × alcohol interaction. The model may be simplified using linear and quadratic components for these terms as appropriate, leading to a more readily interpretable analysis.
Table 65 shows the effect on the deviance of sequentially replacing the categorical variables alcohol, age group and BMI by their linear and quadratic components. It appears from the p-values in the table that the fit of the model is not compromised by this process since the change in deviance is less than the change in degrees of freedom (df) in every case. Finally, it turns out that the interaction of sex with the quadratic component of alcohol (1 df) is not formally significant (p = 0.0697). It is convenient to omit this term from the final model, especially as its interpretation is not straightforward in any case.
Source | Deviance (negative) | df | Likelihood ratio (chi-squared) test: change in deviance | Likelihood ratio (chi-squared) test: df | Likelihood ratio (chi-squared) test: p-value |
---|---|---|---|---|---|
1. Full model | 244.95 | 22 | – | – | – |
2. Linear and quadratic alcohol | 239.34 | 16 | 5.61 | 6 | 0.4683 |
3. Linear and quadratic age group and alcohol | 237.57 | 13 | 1.77 | 3 | 0.6215 |
4. Linear BMI category | 237.13 | 11 | 0.44 | 2 | 0.8025 |
5. After removal of sex × quadratic alcohol | 233.84 | 10 | 3.29 | 1 | 0.0697 |
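Each p-value in Table 65 is the upper-tail probability of a chi-squared distribution evaluated at the change in deviance, with degrees of freedom equal to the change in df. The sketch below evaluates this with closed-form expressions valid for integer df; the function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


# (change in deviance, change in df) for steps 2 to 5 of Table 65.
steps = [(5.61, 6), (1.77, 3), (0.44, 2), (3.29, 1)]
p_values = [chi2_sf(change, df) for change, df in steps]
```

Applied to the four steps, the calculation should agree with the tabulated p-values to within rounding of the deviances.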
The final (simplified) model (numbered ‘5’ in the table) predicts that the probability of having a fatty liver:
- has an inverted U-shaped relationship with age, reaching a maximum at around age 55 years
- increases with BMI
- increases with alcohol intake above 30 units per week, though with some variation between the sexes
- is less for patients of Asian origin (compared with white patients).
These features are shown graphically in Figures 14 and 15, which show the relationship between fatty liver, age and alcohol intake separately for males and females of normal BMI and BMI > 30 kg/m2. It is apparent that fatty liver is more responsive to alcohol intake in females than in males. In males there is perhaps even a suggestion that alcohol may have a protective effect at low doses, a finding corroborated in the literature (see Discussion).
Liver function tests were added to the model of Table 65 using a stepwise procedure. Only two LFTs (ALT and albumin) were retained and the results are summarised in Table 66. ALT is an important predictor here, second only to BMI. The pseudo R2 is 22.1% and the AUC is 0.802. These figures do not suggest that LFTs could furnish a reliable substitute for ultrasound for the determination of fatty liver.
Terms in model | df | Chi-squared | p-value |
---|---|---|---|
Age group | 5 | 19.36 | 0.0016 |
BMI | 3 | 95.98 | 0.0000 |
Ethnic group | 3 | 7.52 | 0.0571 |
Alcohol × sex interaction | 5 | 12.35 | 0.0303 |
ALT | 1 | 66.80 | 0.0000 |
Albumin | 1 | 18.24 | 0.0000 |
Alcohol^a | 5 | 9.78 | 0.0816 |
Sex^a | 1 | 0.01 | 0.9192 |
Persistence of fatty liver from first to second follow-up
Thirteen patients were excluded from this analysis, as belonging to the serious disease categories (categories 1 and 2). This left 628 cases for analysis. LFTs, BMI and alcohol intake were all taken from the FU2 data. Logistic regression analyses were performed with FU2 fatty liver as the outcome variable. Fatty liver at FU1 was included as a predictor in the analyses.
Sparseness of data for some of the covariate combinations impeded the fitting of the above models to the second follow-up sonography data. (For example, there are no instances of fatty liver at FU2 among the eight subjects with BMI of < 20 kg/m2.) To simplify the model fitting, age group (six categories) was replaced by linear and quadratic effects (2 df) and the category for BMI < 20 kg/m2 was amalgamated with the base category (25 kg/m2 ≤ BMI < 30 kg/m2). For ease of interpretation, units of alcohol were represented by the linear component of the six-level categorical variable.
The results (without LFTs) are summarised in Table 67 (pseudo R2 = 29.8% and AUC = 0.845). As might be expected, the presence of fatty liver at FU1 is the most important predictor of fatty liver at FU2.
Terms in model | df | Chi-squared | p-value |
---|---|---|---|
Fatty liver at FU1 | 1 | 97.92 | 0.0000 |
Age group | 2 | 7.91 | 0.0192 |
BMI | 2 | 22.45 | 0.0000 |
Ethnic group | 3 | 3.98 | 0.2639 |
Alcohol | 1 | 3.74 | 0.0531 |
Sex | 1 | 0.49 | 0.4823 |
The results of adding ALT and albumin to the model are summarised in Table 68 (pseudo R2 = 28.7% and AUC = 0.841). Fatty liver at FU1, BMI and ALT are the only variables that contribute independently to the chance of fatty liver at FU2. AST could stand in for ALT, but results in a marginally inferior fit. Albumin no longer features as a significant predictor either instead of ALT or in addition to it.
Terms in model | df | Chi-squared | p-value |
---|---|---|---|
Fatty liver at FU1 | 1 | 77.76 | 0.0000 |
Age group | 2 | 4.19 | 0.1230 |
BMI | 2 | 17.74 | 0.0001 |
Ethnic group | 3 | 2.86 | 0.4137 |
Alcohol | 1 | 1.45 | 0.2283 |
Sex | 1 | 0.10 | 0.7518 |
ALT | 1 | 8.35 | 0.0039 |
Albumin | 1 | 1.94 | 0.1639 |
The effect of changes in body mass index and alcohol intake on fatty liver
It is clear from earlier sections that raised BMI is the most important risk factor for fatty liver. As BMI is partly determined by lifestyle and voluntary behaviour, it is of some interest to determine whether or not a change in BMI over the period of the study is associated with a concomitant change in fatty liver status for the individual patient. The analysis of this question is confined to the ‘non-specific’ diagnostic group. Table 69 suggests an association between even small reductions in BMI and improved liver fat.
Liver fat from FU1 to FU2 | n | BMI at FU1 (kg/m2), mean (SD) | Change in BMI from FU1 to FU2 (kg/m2), mean (SD) | % change in BMI (within patient), mean (SD) |
---|---|---|---|---|
Improved liver fat | 129 | 32.4 (6.2) | −0.5 (2.1) | −1.3 (6.3) |
Unchanged liver fat | 397 | 28.3 (5.2) | −0.0 (1.9) | 0.1 (6.8) |
Worsened liver fat | 80 | 32.1 (5.7) | 0.1 (2.3) | 0.5 (6.4) |
Total | 606 | 29.7 (5.8) | −0.1 (2.0) | −0.1 (6.7) |
The association was investigated by an ordinal logistic regression analysis, using a seven-level outcome variable, defined as the difference in the ordinal number of the liver fat category between FU1 and FU2, i.e. a measure of liver fat improvement ranging from −3 (representing a change from ‘normal’ to ‘severe’) to +3 (i.e. a change from ‘severe’ to ‘normal’). This outcome was regressed on percentage change in BMI, with a marginally significant result [p = 0.030, odds ratio (OR) = 0.76 per 10 percentage points change in BMI (95% CI 0.60 to 0.97)].
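The construction of the seven-level outcome can be sketched as follows. This is an illustration only: the report names just the ‘normal’ and ‘severe’ categories, so the intermediate labels `mild` and `moderate` are assumptions, as are the function names. The second helper shows how an OR quoted per 10 percentage points rescales to another unit under the log-linear form of the model.

```python
# Ordered liver fat categories; 'mild' and 'moderate' are assumed labels.
LIVER_FAT_LEVELS = ["normal", "mild", "moderate", "severe"]

def fat_improvement(fu1, fu2):
    """Ordinal improvement score: FU1 category number minus FU2 category number.

    Positive values mean less liver fat at FU2 than at FU1.
    """
    return LIVER_FAT_LEVELS.index(fu1) - LIVER_FAT_LEVELS.index(fu2)

def rescale_or(odds_ratio, from_units, to_units):
    """Re-express an odds ratio quoted per `from_units` as one per `to_units`."""
    return odds_ratio ** (to_units / from_units)

print(fat_improvement("normal", "severe"))   # worst possible change
print(fat_improvement("severe", "normal"))   # best possible change
print(round(rescale_or(0.76, 10, 1), 4))     # OR per 1 percentage point of BMI
```

With four categories the score takes the seven integer values from −3 to +3, matching the outcome described above.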
Change in alcohol consumption (represented as a difference in units per week on the square root scale) was added to these models, though without any significant, or near significant, finding (Table 70).
Term | OR | 95% CI lower limit | 95% CI upper limit | p-value |
---|---|---|---|---|
% change in BMI (÷ 10) | 0.77 | 0.60 | 0.98 | 0.032 |
Change in alcohol intake (square root) | 0.97 | 0.90 | 1.05 | 0.503 |
These results are necessarily inconclusive, but furnish some confirmatory evidence for the benefits of weight loss on fatty liver.
It is of great interest to explore whether or not the finding of a fatty liver prompts weight loss. There was a non-significant change in BMI in the hypothesised direction. The mean weight change in the fatty liver group was −0.4% and in the non-fatty liver group was 0.2% (p = 0.30).
Other ultrasound features
At FU1, the liver was abnormal in size in 58 cases (4.5%), and was large in all but two of these. Eight patients had a diagnosis of diffuse cirrhosis.
A focal lesion was found in 106 cases (8.2%), but in only 21 cases was the lesion suspicious (20) or obviously malignant (1). The gall bladder was identified in 1150 cases (90%) and gallstones were detected in 191 (17%) of those.
The extrahepatic bile ducts were dilated in 29 of the 1123 patients in whom they were seen (2.4%), and the mean ALP was higher in these cases (271 vs 203). The difference in ALP was even greater in the 12 out of 1230 patients (0.98%) in whom the intrahepatic duct was dilated (364 vs 203).
Chapter 5 Substudies
Psychology 1: effects of positive tests
Background
Chronic liver disease is often asymptomatic or associated with non-specific symptoms and its early diagnosis is usually through the use of blood-based LFTs, which are routinely requested in primary care. Although the result of an LFT might indicate serious liver pathology, an abnormal result is much more often a chance finding, with a predictive value of < 5%. In fact, as we pointed out earlier, the proportion of people who really benefit from an abnormal LFT result is much smaller than 5%. Most of the cases of haemochromatosis and PBC identified in BALLETS appeared to be progressing at a very slow rate and were likely to have lead times longer than patients' remaining lives. There must be great doubt about the benefits that would have arisen from identifying four cases of metastatic cancer. The 1% of patients with chronic viral hepatitis really did stand to benefit, but over 1300 positive test results is a considerable number when only 13 patients are to benefit. This ‘yield’ would be especially worrying if it was associated with anxiety sufficient to impair quality of life. On the other hand, long-term benefits might accrue if LFT results prompted people to adopt healthier lifestyles – a situation that could arise if LFT results were used to reinforce behaviour change advice in addition to their rather minimal diagnostic value. The psychological consequences of testing are therefore important. In this section we consider the effect of testing on anxiety. We considered the effects of LFT results on behaviour in Chapter 4 and will do so again in this chapter (see Psychology 2: effects of results on behaviour).
A psychological evaluation was added to the main study to monitor any psychological harms created by reporting abnormal LFT test results to patients and informing them of their ultrasound results. Previous evaluations of the process of screening report negative effects on psychological outcomes including anxiety, depression and reduced quality of life in the short term, but little effect in the longer term. 43,44 Screening for potential liver disease, however, has not been investigated and this study therefore examined the psychological effects on patients.
Methods and rationale
Procedures
Participants completed psychological assessment questionnaires at two points: at recruitment, following results of the index test (T1), and again at 2 years (T2).
A pilot study was implemented to inform development of psychological questionnaires (T1 and T2) for use in the main study. This phase gave the research team a clearer idea of the ways that patients tend to think about and respond to abnormal test results.
The first questionnaire (T1) was ready to administer 11 months after the recruitment phase commenced, by which time the first 250 patients had already been recruited to the study.
There were slight differences in the administration of T1 at individual practices, as the study clinical process was modified to merge with routine practice. All changes to the clinical process were approved by ethics and local research and development committees.
General practitioners at three practices invited patients to take part in the study and practice administration staff posted information sheets and T1 questionnaires to patients. At the remaining eight practices, GPs identified patients meeting the study criteria, and provided the research team with a patient list. The research team telephoned listed patients to invite them to a clinic and posted T1 questionnaires and other study documentation to patients prior to clinics.
Patients who had not completed a questionnaire were offered another opportunity on arrival at the clinic. Reminder letters and additional T1 questionnaires were sent by study psychologists 1 week following non-response.
Two-year follow-up (T2) questionnaires (Appendix 1, section 10.10.d) were posted to patients, along with information concerning their study appointment. Patients were asked to complete the questionnaire and either return it by post or bring it with them to their study appointment. Again a further supply of questionnaires was available at GP surgeries so that patients who had not completed a questionnaire could be offered another opportunity to do so.
Outcome measures
Disease-specific worry
Disease-specific worry was assessed using the item ‘How worried are you about the health of your liver?’, adapted from Lerman's cancer-specific worry scale. 45 Participants responded on a seven-point scale ranging from 1, ‘not at all worried’, through to 7, ‘extremely worried’.
State anxiety
This was assessed using the short form of the Spielberger State Trait Anxiety Inventory,46 in which participants are asked to rate six mood states: calm, tense, upset, relaxed, content and worried. Items are scored on a four-point scale ranging from 1, ‘not at all’, to 4, ‘very much’. Scores were transformed to provide a scale ranging between 0 and 100, with higher scores indicative of higher state anxiety.
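The report does not spell out the transformation to the 0–100 scale. A minimal sketch of one plausible scoring is given below, under two stated assumptions: the three positively worded items (calm, relaxed, content) are reverse-scored, and the summed score (range 6–24) is rescaled linearly. The function name `state_anxiety_score` is illustrative.

```python
ITEMS = ("calm", "tense", "upset", "relaxed", "content", "worried")
# Assumption: positively worded items are reverse-scored before summing.
POSITIVE_ITEMS = {"calm", "relaxed", "content"}

def state_anxiety_score(responses):
    """Map six 1-4 item responses to a 0-100 score (higher = more anxious)."""
    total = 0
    for item in ITEMS:
        value = responses[item]
        if value not in (1, 2, 3, 4):
            raise ValueError(f"{item}: response must be 1, 2, 3 or 4")
        total += (5 - value) if item in POSITIVE_ITEMS else value
    # Assumed linear rescaling of the 6-24 sum onto 0-100.
    return (total - 6) / 18 * 100

example = {"calm": 3, "tense": 2, "upset": 1,
           "relaxed": 3, "content": 3, "worried": 2}
print(round(state_anxiety_score(example), 1))
```

A respondent who is ‘very much’ calm, relaxed and content and ‘not at all’ tense, upset or worried scores 0 under this scheme; the opposite pattern scores 100.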
Self-assessed health
This was assessed using responses to five items from the Short Form questionnaire-36 items (SF-36) health survey47 comprising a single item rating of self-rated health (‘Would you say your health is: excellent, very good, good, fair, poor?’) and four further items: ‘I seem to get sick a little easier than other people’; ‘I am as healthy as anybody I know’; ‘I expect my health to get worse’; ‘My health is excellent’. Scores were transformed to provide a scale range from 0 to 100, with higher scores indicating higher self-assessed health.
Results
Not all patients were offered a psychological assessment questionnaire to complete, and some declined. Overall, 527 questionnaires were obtained following the index test (T1). Two years later, T2 questionnaires were returned by 596, of whom 243 had returned baseline questionnaires.
Table 71 shows the demographic and clinical characteristics of the 527 patients completing the T1 questionnaire.
Demographic characteristics | |
---|---|
Age (years), mean (SD) | 57.5 (15.5) |
Gender, n (%) | |
Male | 296 (56) |
Female | 230 (44) |
Ethnicity, n (%) | |
White | 445 (87) |
Other | 65 (13) |
Social deprivation, IMD score, mean (SD) | 36.5 (7.8) |
Clinical characteristics | |
BMI (kg/m2), mean (SD) | 29.3 (6.3) |
Waist–hip ratio, mean (SD) | 0.93 (0.09) |
Alcohol units per week, mean (SD) | 14.9 (28.9) |
Fatty liver, n (%) | |
Yes | 179 (35) |
No | 336 (65) |
Repeat abnormal blood test result, n (%) | |
Yes | 430 (83) |
No | 89 (17) |
Baseline characteristics of patients completing T1 (as shown in Table 71) were compared with those not completing the questionnaire: there were no significant differences. Similarly, there were no significant differences between this cohort and those responding to both T1 and T2 questionnaires except for age, with the latter being slightly older (mean age 59.5 vs 56.9 years; t-test, p < 0.01).
Psychological assessment | T1 | T2 |
---|---|---|
Self-rated health | 68.85 (17.69) | 63.37 (15.29) |
The impact over time of the report of an abnormal LFT was examined in those patients returning questionnaires at T1 and T2. Table 72 shows that both anxiety and worry declined significantly over the 2-year period; there was no change in self-reported health.
As the impact of the report of an abnormal LFT might have been amplified by the subsequent diagnosis of fatty liver following ultrasound, the change in emotional state was examined in those with and without a reported fatty liver. Figures 16 and 17 show similar declines in anxiety and worry over the 2 years, irrespective of the diagnosis of a fatty liver.
Discussion
The results of this study are in accord with previous research into the impact of screening: screening might well raise initial anxieties and worries, but these soon return to levels within the normal range. Previous research on the emotional impact of screening, however, has largely examined the impact of the screen on the bulk of patients, who are subsequently judged to be negative. In that context, initial anxiety might well be allayed by the reassurance of a negative test. In this study, however, the cohort recruited comprised patients who had screened positive, albeit on an indicator whose predictive value was poorly calibrated and unlikely to be high. As the GPs had to inform the patients of the results, it is likely that they were reassuring: LFT results were slightly raised, but this was probably of no serious clinical significance. In that sense, patients may have interpreted their abnormal blood test as within normal limits and therefore as a sort of negative.
However, fatty liver, which does suggest the early signs of liver disease, was diagnosed in one-third of patients. This diagnosis was reported to the patients after they had completed their baseline T1 questionnaire. It might be expected that, if patients had been alarmed by a fatty liver diagnosis, this would have been apparent at 2 years. But again, whatever the initial concerns, these had clearly dissipated over time. Lastly, if sustained anxiety is a necessary ingredient of behaviour change, then this would suggest that the fatty liver diagnosis would not affect lifestyle to a material degree. If, on the other hand, anxiety is a necessary trigger for change which then becomes self-reinforcing then the initial anxiety may be sufficient. The initial levels of anxiety are slightly higher in those ‘with’ than in those ‘without’ a diagnosis of fatty liver.
In conclusion, this study confirms previous research showing that screening has no long-term emotional effects. Where it adds to these findings is that, when the screening result is ‘positive’ but surrounded by prognostic uncertainty, this too becomes normalised over time and has no long-term effect.
Psychology 2: effects of results on behaviour
Background
The BALLETS study recruited 1300 patients from Birmingham and Lambeth practices. At the initial assessment 40% of study patients were found to have fatty liver on ultrasound. The recognised primary treatments for fatty liver are diet and regular exercise. 48
The literature on the subject suggests that the ‘working alliance’ between care provider and patient is important in adopting health behaviours. 49 This alliance is defined by the mutual agreement on goals and objectives and the extent of the emotional bond (liking or trust) between patient and provider. 50 In addition, care providers use a number of verbal compliance strategies in which the subtle use of language can help influence a patient's behaviour. 51 Recently, evidence emerged that moderate- and low-level lifestyle counselling interventions in patients with fatty livers are a practical and effective method of improving health. 52
At Birmingham follow-up clinics, where patients returned for a repeat USS, BMI and LFT, research nurses had positive anecdotal reports from patients regarding improved drinking, eating and exercise habits. Many reported having had an abnormal first ultrasound. The results were supported by preliminary analysis from the first 277 patients who were followed up at FU2. In the event, an association between a change in mass and a reversal of fatty liver to normal was confirmed in the final analysis (see Chapter 4, The effect of changes in body mass index and alcohol intake on fatty liver). We therefore sought an extension to conduct a qualitative study of BALLETS patients to better understand a possible modifying effect on behaviour of having an ultrasound showing fatty liver.
Methods and rationale
Based on the above evidence, we conducted a qualitative substudy to explore the patient's experience of participation in the BALLETS study with respect to the finding of fatty liver. We focused on the patient's perception of the results from the initial scan, how the results were imparted to the patient and whether the finding of a fatty liver led to making any lifestyle changes. Therefore, the main aim of this substudy was to understand the overall experience of taking part in the BALLETS study with special reference to the psychological impact of the finding of fatty liver on USS.
Forty patients who participated in the BALLETS study and attended for initial clinics (FU1) and follow-up clinics (FU2) were invited to be interviewed. These 40 patients were divided into four subgroups (Table 73) according to whether or not their BMI had reduced and whether or not their ultrasound showed fatty liver.
BMI | Fatty liver at FU1 | Non-fatty liver at FU1 |
---|---|---|
Unchanged at FU2 | 10 | 10 |
Reduced by ≥ 5% at FU2 | 10 | 10 |
Patients were randomly selected across the four categories using BALLETS study participant ID numbers. Patients from all GP practices taking part in BALLETS were included. Because it was expected that not all patients would agree to take part in the substudy, 20 patients were randomly selected for each group. Patients were phoned by the substudy research associate, in list order. If patients could not be contacted or were unable to take part, the next patient on the list was invited, until the final sample of 40 was reached. Patients were sent information sheets after providing verbal consent. An appointment was made for the research associate to interview patients if they were in agreement.
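The selection procedure for one subgroup can be sketched as below. This is an illustration of the described steps, not the study's actual tooling: the function name `recruit`, the `agrees` callback (standing in for the telephone contact) and the seed are all hypothetical, and the target of 10 per subgroup is taken from Table 73.

```python
import random

def recruit(group_ids, agrees, target=10, shortlist_size=20, seed=None):
    """Randomly shortlist patients, then approach them in list order.

    group_ids: participant ID numbers eligible for one subgroup.
    agrees:    callable returning True if the patient is contacted and consents.
    Returns (shortlist, recruited); recruitment stops once `target` is reached.
    """
    rng = random.Random(seed)
    shortlist = rng.sample(group_ids, min(shortlist_size, len(group_ids)))
    recruited = []
    for patient_id in shortlist:
        if len(recruited) == target:
            break
        if agrees(patient_id):
            recruited.append(patient_id)
    return shortlist, recruited

# Hypothetical usage: 100 eligible IDs, every other patient agrees.
ids = list(range(1, 101))
shortlist, recruited = recruit(ids, lambda pid: pid % 2 == 0, seed=2013)
print(len(shortlist), len(recruited))
```

Shortlisting twice the target number reflects the expectation, stated above, that not all patients would agree to take part.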
All Birmingham BALLETS study sonographers were invited to be interviewed to determine their opinions on the consultation process, the methods used to impart the results of the scan and possible implications.
Interview process
During the main visit, informed consent was sought by the research associate (DC), who had received informed consent training. The interview was semistructured in nature. Interviews took 30–60 minutes and were audio-recorded with the permission of the patient. A similar process was used for approaching and surveying sonographers, using semistructured interviews.
Data analysis
Once all the interviews were complete, they were transcribed. The transcripts were anonymised. Each transcript was analysed using a qualitative data analysis method of interpretative phenomenological analysis53–56 as an attempt to unravel the meanings contained in the transcripts. 54 This method recognises that the meanings that people ascribe to events are the product of interactions between people in the social world. 57 The analysis explored the participants' view of the world adopting an ‘insider perspective’58 of the phenomenon under study. 54 This is in accordance with the guidelines of Elliott et al. 59 and Parker60 for good qualitative research, whereby owning up to one's perspective and assumptions helps readers to interpret and understand the researcher's data.
The transcripts were analysed by the research associate who conducted the patient interviews, the research associate who conducted the sonographer interviews and the research fellow who wrote the study protocol. A process of data triangulation took place with constant comparison of emergent themes and discussion within the research team.
Results
We interviewed 40 participants (see Methods and rationale) and five key themes emerged. These are described below and in Table 74. Five sonographers were also interviewed in order to gain a greater understanding of the consultation process.
Subgroups | Research participation, theme 1: poor recall of study participation | Research participation, theme 2: motivation to participate (health concerns) | Research participation, theme 2: motivation to participate (maintenance of relationship with GP/practice) | Research participation, theme 2: motivation to participate (altruism) | Health beliefs, theme 3: normalisation of results such as LFTs, fatty liver | Health beliefs, theme 4: external factors such as other illnesses | Health beliefs, theme 5: lifestyle awareness (understanding of what affects the liver) | Health beliefs, theme 5: lifestyle awareness (awareness of healthy lifestyle factors) | Health beliefs, theme 5: lifestyle awareness (proactive behaviour) |
---|---|---|---|---|---|---|---|---|---|
Group A (no change in BMI/normal liver) | 5 | 4 | 2 | 5 | 5 | 6 | 5 | 7 | 1 |
Group B (no change in BMI/fatty liver) | 5 | 6 | 2 | 3 | 5 | 6 | 5 | 9 | 6 |
Group C (> 5% reduction in BMI/normal liver) | 8 | 2 | 1 | 4 | 3 | 4 | 4 | 4 | 2 |
Group D (> 5% reduction in BMI/fatty liver) | 5 | 4 | 4 | 5 | 7 | 6 | 4 | 9 | 6 |
Participants
Theme 1: poor recall of BALLETS research study
Most participants exhibited poor recall regarding their involvement in BALLETS including the results from either of the scans. In fact, some appeared unaware that they had participated in a research study. One participant suggested that he confused BALLETS with regular visits to the GP for clinical reasons.
I don't think there was anything different from perhaps going on other occasions.
(Patient 2B)
Most participants seemed to have problems remembering the time scale and sequence of events associated with the study, with many having poor recall of results obtained from either consultation. Two common explanations for poor recall of study results emerged. Participants believed that if an untoward result was reported they would remember the information.
I would have remembered if … if they'd said anything derog … you know anything that may have been wrong.
(Patient 2B)
In addition, poor recall of scan results was also explained by the sense of trust engendered by their GP or their practice, which meant that the participant would be contacted if any adverse results emerged.
I just thought it was all right because I thought that is there was something wrong it would come back to the doctor
(Patient 17A)
Theme 2: reasons for participating in the BALLETS study
Reasons for participating fell into one of three subthemes.
Participants felt that involvement in BALLETS would benefit their health.
… as I say I'm willing to do these things because it's helping me as well.
(Patient 13B)
Some participants were concerned about hereditary health conditions which motivated them to consult their GP leading to a greater awareness of their health and their decision to take part in BALLETS.
Mum was concerned that there may be something hereditary in our family to do with the heart, so he did a load of tests and then he found that my liver reading was borderline or slightly higher (laughs) or lower than it should have been. I don't really understand much about my liver and anyway it was after that the study contacted me and asked if I'd like to take part so I thought I may as well.
(Patient 18C)
Participants would take part in research if asked to by their GP to help maintain a constructive relationship with the GP or practice.
… my surgery have been very good to me over the years and looked after me so it's the least I could do really, so yeah.
(Patient 17B)
The altruistic nature of taking part in research was identified by most participants.
… the BALLETS study is there to sort of find out the information they need to, sort of, improve people's lives and to improve people's medical side of things.
(Patient 18D)
Theme 3: result interpretation
In many cases patients recalled abnormal ultrasound findings even though they had not recalled detail of the study process.
I think yeah, but like, basically … I have got a slightly fatty liver.
(Patient 1D)
Sonographers were reported as assuring the participant that the abnormality had minimal implications for participants' health.
… they said ‘yes, everything was OK, that it was ‘a little bit fatty but it was OK’ …
(Patient 4C)
Theme 4: external causality of BALLETS study results
During follow-up clinics, anecdotal evidence emerged that many participants had lost weight and engaged in making lifestyle changes following the initial consultation. However, sometimes changes in lifestyle or weight loss were perceived to be associated with factors other than the liver ultrasound or LFT results, such as existing health problems or medication.
… because I was taking my tablets as well … at the time. But since then I've stopped really taking my tablets … I'd kind of lost weight … I felt better in myself and I wanted to be in control of what I was doing, and not want the tablets to be in control.
(Patient 7A)
Theme 5: lifestyle awareness
Three subthemes were identified within the emergent theme of lifestyle awareness. It was noted that more individuals in groups B and D (with fatty liver) discussed their awareness of the factors that contribute to a healthy lifestyle (see Subtheme: conviction regarding own lifestyle) and engaged in proactive behaviour (see Subtheme: proactive behaviour) as described in Table 74.
Participants believed that a poor diet or excessive alcohol consumption can cause problems to the liver.
I suppose in some respects I … could have expected a problem with me liver … because I used to be a very heavy drinker over a long period of time.
(Patient 10A)
Participants displayed a good understanding of what constitutes healthy living and were confident that they maintained a healthy lifestyle.
I knew I didn't drink alcohol or have never drunk it very much … I've never been one to eat a great deal of fatty food.
(Patient 1A)
For some participants, BALLETS appears to have encouraged beneficial lifestyle changes.
… he [GP] said that ‘a lot of people get these fatty liver cysts … possibly a change in diet might help’ … which I've since tried to do.
(Patient 13B)
They said … that it was a little bit fatty … From the questionnaires … it asks you various questions about your food intake, your alcohol intake … it's one of those things that you take on board and you tend to live that sort of lifestyle … you don't do excessive alcohol and you don't do excess of foods …
(Patient 4C)
Sonographers
Five sonographers were questioned about their role in BALLETS. We were interested in exploring details of the consultation, particularly regarding their interaction with the patient, and whether there was a difference in their approach to the consultation and participant, in comparison with routine hospital consultations. Sonographers reported differences in the equipment and the setting though these made little difference to the scan.
… considering we weren't in our normal environment we found that we got quite good because every time we recalled a patient to the QE we didn't actually gain any more from it, a lot of the time.
(Sonographer ID S1)
When asked if a scan in the secondary-care environment was comparable to the study scan, attitudes of sonographers varied.
Well it's a totally different thing really, because with the study the patients were going through questionnaires, blood tests, explanations.
(S3)
… but I would say perhaps comparable – the patient knows why they're there.
(S1)
Sonographers would inform patients of scan results depending upon their clinical significance, being careful not to exceed their own clinical expertise.
But if they come in for query ‘have you got secondary cancer in your liver’, I wouldn't tell a patient they had that for instance, ’cause it's not my place to, you know. They're going to want to ask lots of questions and I don't know the answers … but then you know, in the same way if they've got gallstones, I'd probably say ‘yeah you've got gallstones’.
(S3)
Discussion
The growing commitment to patient involvement in research has been reflected by the expanding literature on the aims and core features of research from a patient's perspective. There is, however, scant literature on the impact of research participation on patients, particularly regarding beneficial health effects resulting from behavioural and lifestyle changes.
Both compliance and adherence to lifestyle changes are influenced by a number of factors. Our initial hypothesis was that the results from the first ultrasound consultation acted as a powerful driver in motivating people towards improving behavioural and lifestyle factors, reflected in their change in BMI. Across the four groups, recall of participation in BALLETS was poor. Participants were uncertain when and how they received results, if at all. It was evident that in this context the impact of the consultation with the sonographer appeared to be minimal. Evidence elsewhere has indicated that even in the most serious of clinical cases patient recall of clinical information is poor:61 between 40% and 80% of medical information presented by health-care professionals is forgotten by patients. 62 A number of factors can contribute to this lack of recall, such as complicated medical terminology, educational status of the patient and the means by which the information is presented. 62 Existing literature indicates that participant recall of Central Office of Research Ethics Committees (COREC)-approved, informed consent information is poor, even among those with medical training. 63 The relevant details of BALLETS were contained within the information sheet, although the complicated constraints of a COREC-approved information form may have inhibited understanding and retention of this information and greater engagement with the study by BALLETS participants, including understanding the context of the consultation and implications of the results of the scan. 64
Individuals can be motivated to participate by several potentially interacting factors, including the likelihood of improved clinical care as a result of their involvement,65 and social influences such as a desire to please the practitioner. 66 As elsewhere, for the majority of those interviewed, participation was altruistically motivated. 67 The results were seen as of relevance to the study team and not to them as individuals.
Improving communication between patient and care provider, including adopting a less formal approach, can increase the likelihood of adherence to treatment and behavioural regimes. 68 The potentially more relaxed consultation between participant and sonographer exemplified a more informal discourse. Sonographers would impart information of low clinical impact during the scan, and as a result participants, even those in whom abnormalities were observed, reported that results were underplayed and as a consequence they felt no obligation to alter their behaviour.
Participants were aware of the requisites for a healthier lifestyle, some because of existing health conditions and others as a result of their participation in BALLETS. This capacity to obtain, process, and understand basic health information can then lead them to make appropriate health decisions and may account for the observed changes in liver status and BMI. The results within theme 5 indicate that there may be a relationship between being diagnosed with a fatty liver and an increase in awareness of healthy lifestyle factors. However, we did not find strong evidence that patients were powerfully motivated to change lifestyle by the finding of a fatty liver on ultrasound.
Sociology of testing: an exploration of the clinical and non-clinical motives behind the decision to order a liver function test
Background
The numbers of diagnostic tests used in public health systems are increasing in most countries69 (by 10% per annum in the UK over the last 3 years). 70 The proportion of tests originating from GPs is also increasing; requests from GPs accounted for 37.2% of biochemistry tests in 2002, compared with 41.7% in 2005. 70
Increases in the number of tests ordered could be because of a number of factors: an older population,71 increased range of tests available, increased expectations of patients and guidelines promoting multiple test use. 72 Increased testing inevitably produces more positive results, leading to knock-on investigations, adding further to the number of tests ordered. 71,73
The motivation for ordering a test can be conceptualised under two non-exclusive categories: technical factors related to the diagnosis and management of disease and social factors. The latter include reassurance for patient and/or doctor, patient expectation and maintaining the doctor–patient relationship. 9,74,75 Guthrie75 found that non-technical motivations behind blood tests were commonly viewed as relevant by GPs, particularly when used to reassure the patient or the doctor, and van der Weijden et al. 76 concluded that GPs order tests for many purposes and that non-medical motives were viewed as rational and legitimate.
Liver function tests are a good example of inexpensive tests that are frequently ordered in patients with non-specific symptoms, such as tiredness or upper abdominal discomfort. 77 LFTs are often carried out when the prior risk of disease is low, thereby yielding a high proportion of false-positive results. As LFTs are frequently used despite their lack of specificity, we decided that they would provide an interesting model through which to explore GP motivations behind test ordering.
Methods and rationale
Sample
The study group consisted of GPs participating in the BALLETS study from South Birmingham Primary Care Trust. BALLETS is a National Institute for Health Research (NIHR) HTA study of the value of abnormal LFTs among patients in primary care with non-specific symptoms.
Recruitment
Practice managers in the eight practices participating in the BALLETS study were approached and asked to consult their constituent GPs to ascertain their willingness to take part in the study. GPs from six practices elected to participate. The GPs (29 in total) at each of these practices were supplied with an information sheet and consent form. Interviews were arranged with consenting GPs at a time and date of their choosing.
Interviews
Semistructured interviews with a topic guide and prompts were used. The themes in the topic guide were identified from the existing literature concerning the test-ordering behaviour of GPs and included the impact of a GP's formal and experiential knowledge base, social influences, defensive medicine and characteristics of the test and order process.
Analyses
The interviews were digitally recorded and transcribed by the author. Following initial discussions within the study team the principal codes were determined. The constant comparative method77 was used, leading to the inclusion of an additional question, addressing the use of LFTs as a tool for modifying patient behaviour. All GPs preferred a telephone interview, usually immediately following morning surgery. Interviews were carried out by the same individual. Saturation was reached after 11 interviews.
As a way of ordering the themes and categories, we adapted the ‘attitude–social influence–efficacy’ model defined by Kok et al. 78 and used by van der Weijden et al. 76 The model is based on the assumption that a GP's intention to order a test can be determined by a number of factors, which we placed in one of two broad categories. The first is internal, and includes the themes of expectation of efficacy and general attitude toward LFTs (positive or negative). The second category contains external influences, and consists of the themes of social influence, test characteristics and defensive medicine.
Results
General practitioner characteristics
Breakdown of GPs by age, sex, duration of service, and part-time versus full-time working is given in Table 75. The participating GPs were heterogeneous with respect to these attributes.
GP study ID | Practice | Gender (M/F) | Age | Part time (%)/full time | Years practising as a GP (including training) | Years at current practice |
---|---|---|---|---|---|---|
1 | C | M | 31 | Full time | 2 years 6 months | 1 year 8 months |
2 | D | M | 36 | Full time | 9 | 8 |
3 | B | M | 41 | Full time | 12 | 10 |
4 | C | M | 52 | Full time | 20+ | 20 |
5 | E | M | 54 | 66 | 25 | 24 |
6 | A | F | 33 | Full time | 6 | 6 |
7 | A | F | 38 | 75 | 11 | 9 |
8 | F | F | 41 | 55 | 14 | 14 |
9 | F | F | 43 | Full time | 15 | 8 |
10 | C | F | 46 | 77 | 16 | 15 |
11 | D | F | 58 | 50 | 28 | 28 |
Motives behind a decision to order a liver function test
Table 76 shows the themes and subthemes mentioned by each respondent represented by ‘×’ according to Kok et al. 's classification. 78
GP no. | Internal | External | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Expectation of efficacy | General attitude to LFT | Social influence | Test characteristics | ||||||||||
Formal knowledge | Craft knowledge | Personal reassurance | Overordered | Positive | Negative | Colleagues | Patients | Research participation | a | b | c | Defensive medicine | |
1 | × | × | × | × | × | ||||||||
2 | × | × | × | × | |||||||||
3 | × | × | × | × | × | × | × | ||||||
4 | × | × | × | × | × | ||||||||
5 | × | × | × | × | × | × | × | ||||||
6 | × | × | × | ||||||||||
7 | × | × | × | × | × | × | × | × | × | ||||
8 | × | × | × | × | × | ||||||||
9 | × | × | × | × | × | × | × | ||||||
10 | × | × | × | × | × | × | × | × | |||||
11 | × | × | × |
Internal influences on the decision to order a liver function test
Expectation of efficacy
The expectation a GP has of his or her own ability to correctly diagnose a patient and order the correct test at the apposite time is a function of the knowledge gained from formal training, and knowledge in the form of experience gained as a practising GP. 78
Clinical reasons for test ordering were mentioned spontaneously by all interviewees. These included decisions based on a patient presenting symptoms of liver disease such as jaundice or pruritus, and medicines known to affect (or be affected by) liver function.
If someone is jaundiced or suffering from weight loss or something like that …
(GP8)
I would tend to tick someone's LFTs if I was checking someone's cholesterol. If they are going to go on a statin then I am going to need to know what someone's LFTs are like.
(GP2)
Tests were ordered for a number of personal reasons related to the GP's beliefs and experiences. Evidence emerged during the early interviews that LFTs were used to incentivise certain patients to make behavioural modifications necessary to improve their health. Notably, GPs would order LFTs for patients suspected of drinking too much alcohol, in the expectation that an abnormal test result would provide evidence of impending self-harm and thereby prompt a change in behaviour.
If someone has got alcohol related problems … and the LFT does come back as abnormal then I would use that as a way of saying ‘look, what you're doing is affecting your liver and you're at a stage where you can do something about it’.
(GP8)
I've got one particular alcoholic who successfully became a teetotaller. His GGTs were up in the sky and then came down to normal or near normal again and with his permission I use a printout of his GGTs going up and down to try and motivate other patients.
(GP11)
GPs we interviewed conceded a lack of complete confidence in their ability to identify a condition by using physical examinations and medical history and so sought reassurance from tests such as the LFT.
… I get the feeling that the more experienced you become the more you do a lot more tests because you know what can happen.
(GP3)
Rather than just keep saying that ‘yes, everything's OK and its just anxiety which is x, y, z and more of a psychological and mental component’, sometimes you do the blood test so that you're more reassured …
(GP8)
General attitude to liver function test
Despite the fact that none of the analytes in an LFT can provide a definitive diagnosis nor is necessarily specific for liver complaints, 10 out of the 11 GPs interviewed held positive opinions on the effectiveness of the LFT, though one recently qualified GP was less convinced.
They are a useful tool, especially for a patient that is unwell and you can't work out what is going on.
(GP10)
I think they're pretty useless to be honest. I think they throw up a lot of spurious results, most of which don't mean anything at all.
(GP1)
It became apparent that those interviewed felt that LFTs were not used as efficiently as they might be. Drawing comparison with other blood tests, they felt that too many were being ordered.
I think like most tests we order too many.
(GP4)
External influences on the decision to order a liver function test
Social influence
The external sources affecting the motivation to order LFTs included patient influence, defensive medicine, and characteristics of the test and ordering process.
Ordering an LFT can be used as a way of reassuring anxious patients that their concerns are being taken seriously, so maintaining the working alliance between patient and doctor.
I do think that patients do feel on the whole that they're being taken more seriously if you stick a needle in them.
(GP7)
One of the GPs in our sample used LFTs alongside other blood tests as a way of managing patients who are presenting psychosomatic complaints.
Sometimes a patient's come and you're sure that they have a psychosocial problem or even depression … but you take a blood test and they're all normal. That's actually quite useful information to feed back to the patient.
(GP10)
The experience of private health care can also serve to raise levels of expectation amongst patients.
They may go to a private consultation and have panels of blood tests done so they have an expectation that they have regular blood tests.
(GP10)
A theme that we had not anticipated was introduced by three of those interviewed, who mentioned the effect of taking part in the primary BALLETS study on their attitude to LFTs. This had led them to question their use of the LFT and helped them focus on the underlying physiology behind the test, increasing the confidence in their evaluation of the LFT.
In light of the BALLETS study I’ll probably find them less useful. If I get a slightly abnormal liver function test I'm probably not going to worry about it.
(GP4)
Since we've done the BALLETS I feel much more able to understand what's going on.
(GP7)
Negative defensive practice was observed in our sample.
We have to do that [LFT test]. Because if someone ends up with liver disease because they were on statins and you didn't do the test then you can end up in big trouble.
(GP10)
Test characteristics
Currently there is less financial pressure on investigation than on prescribing and referral. The lower financial impact of ordering a test means that the decision can become easier.
Instead of just doing one, checking renal samples, you might check the whole lot: kidneys, liver, bones, because it doesn't cost any more.
(GP10)
The ease with which an LFT can be ordered can influence the decision-making process.
I think one of the reasons [we order too many] is because of the tick box, you end up doing a profile on people and you end up taking them.
(GP3)
Ordering an LFT has little impact on the patient, particularly if other tests are being ordered, and so lowers the decision-making threshold for ordering LFTs.
I will do an LFT because it's a relatively non-invasive test isn't it, really, to be honest? It's not like a colonoscopy.
(GP7)
Discussion
Summary of main findings
Our study sample consistently admitted using an LFT for routine monitoring of medication and liver-specific diagnostic reasons. In addition, a number of non-clinical motives behind the test-ordering decision were explored. These include the ‘internal’ influences stemming from their own expectations of efficacy including their clinical training and the need for personal reassurance. Two novel findings also emerged. First, it became clear that some GPs used LFTs as a means to actively influence unhealthy (eating and drinking) behaviours. The pattern of alcohol consumption in the UK is changing; young people are drinking more and from an earlier age,79 as are women,80 with potentially large costs to their health and to the NHS. The use of LFTs to promote lifestyle change among heavy drinkers is an interesting idea that warrants further study. Not only can an LFT provide hard evidence of harm, but repeating the LFT after a period of reduced alcohol consumption can also confirm improvement in a patient's condition. However, there are also potential dangers in using LFTs in this way, as a normal result may have a perverse effect by providing false reassurance. The other novel finding was that active participation in research (i.e. the BALLETS study) led a number of GPs to reappraise their use of LFTs. It will be interesting to observe any effect of the result of the BALLETS study on the test-ordering behaviour of participating and non-participating GPs.
In addition, there are the ‘external’ influences, for example social interaction with patients, characteristics of the test and the litigative pressure for defensive practice.
Strengths and limitations of this study
This study has for the first time explored the underlying influences behind a GP's decision to order an LFT. LFTs are somewhat unusual in that each ‘test’ is composed of a panel of five to eight analytes, so it could be seen as a kind of ‘catch-all’. Moreover, the tests are fairly sensitive to alcohol abuse and (to a lesser extent) overeating. A limitation is that none of the behaviours documented was observed directly by the research team and recall bias may have been introduced.
Relationship to existing literature
The GPs in our study who had experience of discovering something unexpected said they were more likely to test in the future, a heuristic known as the ‘availability bias’ in the psychological literature. 81 This may explain a positive correlation between experience and propensity to order seven common blood tests. Another key factor in testing for unexplained complaints is the need to maintain the doctor–patient relationship by meeting user expectations. We found that GPs frequently ordered tests to reassure patients and to signal to them that they were being taken seriously. Also, as reflected in this study, blood tests, such as LFTs, can be used as a way of managing a patient with psychological problems. 82
The drive towards patient-centred care83 means that individuals are increasingly aware of their role as customers, which may engender a sense of entitlement. Evidence of patient pressure was observed in this study, and it has been reported that GPs are more likely to test if a patient is assertive and actively asks for a test. 84 GPs in our study also acknowledged reassuring a worried or concerned patient by ordering a test. This may increasingly be the case, as many patients now see a blood test as the most reliable diagnostic tool at the GP's disposal. 82,84,85 The countervailing risks of embarking on an investigation ‘cascade’, triggered by a false-positive test, seem to weigh less highly with patients.
Many in our study group felt an increased need to practise defensively, and other research in the UK has shown that GPs here are now more likely to pursue diagnostic testing as a result of fear of litigation. 86
A number of GPs in our study provided comments on the ease with which an LFT can be ordered. Studies elsewhere have demonstrated that reducing the options on the test order form can reduce the total number of tests. 87,88 It has also been shown that the design of laboratory request forms can influence the decision to order a test. 87,88 Similarly, the low cost and non-invasive nature of LFTs mean that the GP can order them with minimum impact on budgets.
Implications for future research or clinical practice
As described, a number of elements interact to prompt frequent orders of LFTs. The need that patients feel for reassurance, and the need for investigation perceived by GPs in our study, could be driven in part by the ‘democratisation’ of medical information as web-based sources of medical data continue to proliferate. This is a situation unlikely to change soon, and all GPs who participated in this study felt that the number of LFTs ordered was higher than necessary. However, the GP cannot be solely influenced or restricted by formal guidelines and training, as this approach would exclude the social and consultative nature of the doctor–patient relationship and the carefully constructed working alliance that exists between GP and patient. The character and maintenance of this relationship often drives the testing process, beyond narrowly defined clinical need. The GP's acceptance of the need to balance what the patient expects with what the patient requires is further influenced by the low financial and temporal costs of ordering these tests, their non-invasive nature and the increasing threat of litigation for failing to use correctly the diagnostic tools at their disposal. The study illustrates that social and behavioural reasons are strong motivators to order an LFT and may even take precedence over clinical motives on some occasions. In particular, the use of LFTs as a tool to increase uptake of health-promoting behaviour could be further explored. Therefore, although an educational change to reduce testing among patients and their doctors might be the theoretical optimal solution, the above range of factors favouring test use suggests that large-scale, rapid change is unlikely to occur.
Chapter 6 Decision analysis
Background
Liver function tests are ordered in large numbers in primary care
Liver function tests comprise a panel of five to eight analytes that are processed inexpensively in large batches. LFTs are one of the most commonly performed ‘blood tests’ in primary care, such that in 2003 the laboratory at University Hospital Birmingham received 67,182 requests for LFTs from 83 GP practices that serve a population of 300,000 people. 89
Enigmatic responses to abnormal liver function tests in primary-care settings
An abnormal LFT may signify a serious disease that can be identified only through further testing. These conditions include liver diseases (such as PBC), diseases of other organs (such as Paget's disease of bone) and multiorgan diseases (such as haemochromatosis). However, the majority of people with an abnormal LFT result in primary care settings will not have any such previously undetected disease. They will have either no disease at all or will be manifesting the effects of alcohol abuse or obesity. The doctor is likely to be aware, or at least suspicious, of these behaviours when ordering LFTs, but this does not exclude the presence of other diseases that may aggravate liver damage. There is thus a real question about which specific further tests, if any, a GP should order when an abnormal LFT result is obtained in a patient with non-specific symptoms, or as a result of routine testing. In some cases there may be a clear indication for further tests. For example, in a patient with a family history of haemochromatosis, iron saturation should be measured. In some cases, the pattern of LFT abnormality may suggest a diagnosis – for example, an isolated raised unconjugated bilirubin suggests Gilbert's disease, whereas a high blood level of ALP is indicative of PBC. In most cases, however, no unambiguous clinical indication for follow-on testing exists. The literature deals mostly with the pattern of abnormality given a diagnosis, rather than the probability of the various diagnoses given a pattern of abnormal LFTs. It is therefore not surprising that guidelines for GPs3,10,90–93 confronted with an abnormal LFT result, whether in patients with non-specific symptoms or detected fortuitously, are inconsistent, or that the way in which GPs respond has been found to be eclectic. 94 A point on which guidelines do agree is that the LFT panel should be repeated following an abnormal result.
Criteria for selection of a topic for decision analysis
If there is any particular previously unrecognised disease that a patient would wish to have excluded by further testing, then it will have the following features:
- It is a serious disease.
- It is treatable in the prodromal phase.
- Failure to identify the condition can lead to permanent damage.
- It can be diagnosed with a high specificity by a familiar and inexpensive test.
- It is among the more prevalent of the serious diseases.
- It is not a condition, like alcohol misuse or obesity, which can be diagnosed from history and examination.
Viral hepatitis
We discern that chronic viral hepatitis is the prime candidate based on the above criteria. It is a massive problem worldwide95–97 and Table 77 shows that it is the most common of the specific liver diseases in the UK population after alcohol damage. Moreover, chronic viral hepatitis can be reliably confirmed or excluded by means of a relatively inexpensive blood test. 98 The disease has a prodromal period lasting many decades and is eminently treatable if caught early, thereby averting cirrhosis and liver cancer. 99
Disease | Prevalence among adult population (%) | Blood tests carried out on all members of the cohort (to diagnose or screen for the disease) | Diagnostic algorithm |
---|---|---|---|
Chronic viral hepatitis C | 0.4 (ref. 32) | HCV antibody (HCV Ab) | Viral marker positive |
Chronic viral hepatitis B | 0.3 (ref. 33) | Hepatitis B viral markers (HBV surface Ag) | Viral marker positive |
Metal storage disease: iron | 0.25 (prevalence of phenotype; homozygous plus complex heterozygous) (ref. 34) | Iron saturation | Genotype if iron saturation > 50% |
PBC | 0.024 (ref. 35) | AMA | Raised antibodies and raised ALP level |
Autoimmune hepatitis | 0.001 (ref. 36) | SMA | Raised antibodies and raised ALT, AST or globulin exceeding twice the ULN. Confirmed by hepatologist |
Metal storage disease: copper | < 0.025 (ref. 37) | Caeruloplasmin | Low levels of caeruloplasmin |
A1AT deficiency | < 0.025 (ref. 38) | A1AT | Low A1AT levels followed by phenotype testing |
The purpose of the decision analysis described here is to inform the selection of an efficient strategy for the diagnosis of chronic viral hepatitis. Such a strategy should optimise the trade-off between detection rate and cost.
Methods and rationale
Testing strategies
A simple decision tree was constructed in Microsoft Excel (Microsoft Corporation, Redmond, WA, USA) to enable costs per case detected to be calculated for seven strategies. 100 The strategies were developed in consultation with a GP and hepatologist (PG and JN, respectively), who were aware of the relevant literature and guidelines.
The root decision (or starting point) of the tree is the discovery of an abnormal LFT result in primary care where the patient does not have known or self-evident liver disease. From the root node we identified seven decisions that may be considered by a GP under such a scenario:
- Strategy A: Repeat the LFT panel and then perform a specific test for viral hepatitis if an abnormality is still present on retesting. This could be considered the intuitive response by a GP on receiving an abnormal LFT result in a patient without the indicators of a specific disease, and is the strategy recommended in the literature. 10,90–93
- Strategy B: Perform a viral test in all patients with an abnormal ALT. The rationale for this strategy is that ALT is the most specific indicator of viral hepatitis10 and has been recommended as the testing criterion by other authors. 28,101,102
- Strategy C: Select ALT as the trigger for viral testing, but nominate a higher threshold, at twice the ULN as recommended by Jamali et al. 103 This is also the threshold for instigating viral therapy for HBV in certain treatment guidelines. 104–106
- Strategy D: Perform a test for viral infection in all patients who originate from a country with an intermediate or high prevalence of viral hepatitis according to WHO criteria. 107–109 Screening has been shown to be cost-effective for people who were born in intermediate- or high-prevalence countries and it is likely that testing would be more cost-effective still in a population with abnormal LFTs. 106,110
- Strategy E: Combine the two previous strategies by testing those who have an ALT level exceeding twice the ULN and who also originate from an intermediate- or high-prevalence country.
- Strategy F: Test all patients from intermediate- or high-prevalence countries as well as those with an ALT level exceeding twice the ULN.
- Strategy G: Test all patients for viral hepatitis irrespective of the type or extent of abnormal LFT results.
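The seven strategies can be read as simple selection rules applied to each patient record. The sketch below is illustrative only: the `Patient` fields, the function name and the handling of missing data (returning `None` when a rule cannot be evaluated) are assumptions made for this example, not part of the study's Excel model. The ALT upper limit of 41 is the value listed in Table 80; the laboratory's actual reference ranges may be more detailed.

```python
from dataclasses import dataclass
from typing import Optional

ALT_ULN = 41.0  # ALT upper limit of normal as listed in Table 80 (assumed to apply to all patients)


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative."""
    alt: Optional[float]                    # index ALT result, None if missing
    repeat_abnormal: Optional[bool]         # any abnormality on the repeat LFT panel
    high_prevalence_origin: Optional[bool]  # born in an intermediate- or high-prevalence country


def select_for_viral_test(p: Patient, strategy: str) -> Optional[bool]:
    """Return True/False if the strategy would order a viral test,
    or None if the data needed to apply the rule are missing."""
    alt_raised = None if p.alt is None else p.alt > ALT_ULN
    alt_2x = None if p.alt is None else p.alt > 2 * ALT_ULN
    origin = p.high_prevalence_origin

    if strategy == "A":   # persistent abnormality on repeat panel
        return p.repeat_abnormal
    if strategy == "B":   # any raised ALT
        return alt_raised
    if strategy == "C":   # ALT above twice the ULN
        return alt_2x
    if strategy == "D":   # country of origin only
        return origin
    if strategy == "E":   # both criteria must hold
        if alt_2x is False or origin is False:
            return False
        if alt_2x is None or origin is None:
            return None
        return True
    if strategy == "F":   # either criterion is enough
        if alt_2x or origin:
            return True
        if alt_2x is None or origin is None:
            return None
        return False
    if strategy == "G":   # test everyone
        return True
    raise ValueError(f"unknown strategy: {strategy}")
```

Returning `None` rather than guessing mirrors the fact that patients with missing data cannot be used to evaluate a given strategy; other treatments of missing values are equally defensible.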
There is also an option to take no action with respect to viral hepatitis, and although this may be a sound decision in some cases, for example when a LFT is ordered in the hope that a positive result will prompt a reduction in alcohol intake, this was not considered here.
In this study, the hepatitis status of all patients was known. Moreover, most had an ALT test result and the results of a repeat LFT panel. Thus, it was possible to evaluate the performance of each of the above strategies.
Populating the decision tree with probabilities/statistical model
All 1236 patients were used in the evaluation of strategy G, but for all other strategies the effective sample size was reduced because of missing data in some of the patient records. Estimates of the proportion of patients undergoing viral tests and the proportion of actual cases detected (sensitivity) were obtained using the sample of patients available for evaluating each strategy. The PPV of a strategy was defined as the proportion of hepatitis cases among those selected for viral testing. Confidence limits for this quantity were calculated using Wilson's method for binomial data. 111
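For readers who wish to reproduce the confidence limits, a minimal sketch of the Wilson score interval follows. It assumes the standard form without continuity correction (the report does not state which variant was used), and the function name is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction).

    z defaults to the normal quantile for a two-sided 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Strategy G tests every patient, so its PPV is the 13 cases among 1236 patients.
low, high = wilson_interval(13, 1236)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # roughly 0.6% to 1.8%
```

Unlike the simple normal approximation, the Wilson interval stays within 0 and 1 and behaves sensibly for small proportions, which is relevant here because the PPVs are only a few per cent.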
Estimation of costs
The direct costs incurred at the time of the test were the laboratory costs of the liver function and viral hepatitis tests (Pathology Laboratory Manager, University Hospitals Birmingham NHS Foundation Trust, 2005, personal communication) and the GP costs for scheduling each test and following up on results. Administrative costs were estimated from the time needed for a secretary to add patients to appointment slots and for a receptionist to check the patient in for an appointment (MidReC: West Midlands Research Consortium, Department of Primary Care, University of Birmingham, 2006, personal communication; figures correct as of February 2009). The costs are presented in pounds sterling (£) (and were correct for the year 2009). Non-health service costs (patient travel cost and lost earnings) were not measured but are considered in the discussion.
Analysis
The number of cases detected per 100 patients was estimated as the sensitivity of the strategy (cases detected ÷ cases present) multiplied by the prevalence (per 100 patients) of viral hepatitis in the whole sample of 1236 patients. For each strategy, the cost per case detected was then computed as the ratio of the cost per patient to the number of cases detected per patient. The strategy which minimised this quantity was taken as the base case. For each alternative strategy, the incremental cost-effectiveness ratio (ICER) was computed, defined as the incremental cost per additional case detected compared with the base case. The analysis is deterministic and does not consider the impact of sampling variability. The results of these analyses were compared with published results of cost-effectiveness analysis of screening for chronic viral hepatitis, bearing in mind likely differences between a screening and a diagnostic population. We used this analysis to develop a ‘fast and frugal’ heuristic,112 which we offer to readers for their consideration.
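The calculation described above can be sketched as follows. The prevalence (13 of 1236) and the sensitivity of strategy D (11 of 13 cases) are taken from the study results; the sensitivity of strategy G is 1 by construction. The costs per patient are hypothetical placeholders chosen only to make the example run, and the class and function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

PREVALENCE = 13 / 1236  # chronic viral hepatitis cases in the whole sample


@dataclass
class Strategy:
    name: str
    sensitivity: float       # cases detected / cases present
    cost_per_patient: float  # pounds sterling; HYPOTHETICAL values in this sketch


def cases_per_100(s: Strategy) -> float:
    """Cases detected per 100 patients with an abnormal LFT."""
    return 100 * s.sensitivity * PREVALENCE


def cost_per_case(s: Strategy) -> float:
    """Cost per patient divided by cases detected per patient."""
    return s.cost_per_patient / (s.sensitivity * PREVALENCE)


def icer(s: Strategy, base: Strategy) -> Optional[float]:
    """Incremental cost per additional case detected versus the base case.
    Returns None when the strategy detects no more cases than the base case."""
    extra_cases = (s.sensitivity - base.sensitivity) * PREVALENCE
    if extra_cases <= 0:
        return None
    return (s.cost_per_patient - base.cost_per_patient) / extra_cases


strategies: List[Strategy] = [
    Strategy("D", sensitivity=11 / 13, cost_per_patient=3.00),   # cost is hypothetical
    Strategy("G", sensitivity=1.0, cost_per_patient=15.00),      # cost is hypothetical
]
base_case = min(strategies, key=cost_per_case)
for s in strategies:
    print(s.name, round(cases_per_100(s), 2), round(cost_per_case(s)), icer(s, base_case))
```

As in the report's analysis, the sketch is deterministic: it takes the point estimates at face value and does not propagate sampling uncertainty.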
Results
Patients
A total of 1344 patients consented to the study; 54 were excluded because they did not match the entry criteria in the protocol, along with a further 54 for whom data on at least one viral hepatitis test were missing (Figure 18). This left 1236 patients for this study; 105 of these patients were from Lambeth and 1131 were from Birmingham. The median interval between index and repeat testing was 31 days (IQR 19–52 days).
Chronic viral hepatitis cases
Thirteen of the 1236 patients for whom the test result was available had chronic viral hepatitis – nine had hepatitis B and four had hepatitis C. This gives an estimate of 1.1% (95% CI 0.6% to 1.8%) for the prevalence rate in the primary-care population with abnormal LFTs: only slightly more than the baseline prevalence in the general population (0.7%). The demographic breakdown of patients with and without viral hepatitis is shown in Table 78.
Feature | Total | Viral hepatitis | Not viral hepatitis |
---|---|---|---|
n | 1236 | 13 | 1223 |
Age (years), mean (SD) | 57.7 (15.2) | 54.0 (15.9) | 57.7 (15.2) |
Sex (n, %) | |||
Male | 693 (56.1) | 9 (69.2) | 684 (55.9) |
Female | 543 (43.9) | 4 (30.8) | 539 (44.1) |
Ethnic group (n, %) | |||
White | 1023 (82.8) | 3 (23.1) | 1020 (83.4) |
Asian | 88 (7.1) | 5 (38.5) | 83 (6.8) |
Black | 53 (4.3) | 3 (23.1) | 50 (4.1) |
Other | 38 (3.1) | 2 (15.4) | 36 (2.9) |
Missing | 34 (2.8) | 0 (0.0) | 34 (2.8) |
Reason (n, %) | |||
Abdominal signs/symptoms | 69 (5.6) | 1 (7.7) | 68 (5.6) |
Non-abdominal signs/symptoms | 302 (24.4) | 6 (46.2) | 296 (24.2) |
Diagnosis – alcohol abuse | 17 (1.4) | 0 (0.0) | 17 (1.4) |
Review – CVD | 50 (4.0) | 0 (0.0) | 50 (4.1) |
Review – cholesterol | 53 (4.3) | 0 (0.0) | 53 (4.3) |
Review – hypertension | 147 (11.9) | 2 (15.4) | 145 (11.9) |
Review – diabetes | 216 (17.5) | 2 (15.4) | 214 (17.5) |
Review – medication | 92 (7.4) | 0 (0.0) | 92 (7.5) |
Review – other (medical) | 290 (23.5) | 2 (15.4) | 288 (23.5) |
The breakdown of LFT results in the infected cases is given in Table 79. In 10 of these 13 cases, more than one analyte was abnormal. In eight cases, the ALT was abnormal, and it was notably raised in six of those (above twice the ULN). In one case (perhaps detected by serendipity), only protein levels were abnormal and all the enzyme tests (ALT, AST, GGT and ALP) were normal. Eleven of the 13 patients with chronic viral hepatitis had an abnormality on the repeat LFT. In the two other cases, there were missing data among the repeat LFT panels. Of the 1113 patients with no viral hepatitis who underwent a complete LFT panel, 169 (15%) reverted to normal.
Case no. | ALT | AST | Bilirubin | ALP | GGT | Albumin | Globulin | Total protein | Repeat LFT | Country of origin (prevalence of viral hepatitis) |
---|---|---|---|---|---|---|---|---|---|---|
HBV | ||||||||||
1 | Higha | High | Normal | Normal | High | Normal | Normal | High | Abnormal | Kenya (high) |
2 | Normal | Normal | High | Normal | Normal | High | Low | Normal | Abnormal | UK (low) |
3 | High | Normal | Normal | Normal | High | Normal | Normal | Normal | Abnormal | Pakistan (high) |
4 | Higha | High | High | Normal | High | Normal | Normal | High | Abnormal | India (high) |
5 | Higha | High | Normal | Normal | Normal | Normal | Normal | Normal | Abnormal | Malaysia (high) |
6 | Higha | No result | No result | No result | Normal | Normal | Normal | Normal | Abnormal | UK (low) |
7 | Normal | Normal | Normal | Normal | Normal | Normal | High | High | Abnormal | Kenya (high) |
8 | No result | High | Normal | High | No result | Normal | No result | No result | Abnormal | Iraq (high) |
9 | Normal | No result | High | Normal | No result | Normal | No result | No result | Incompleteb | Malta (high) |
HCV | ||||||||||
1 | High | Normal | Normal | Normal | High | Normal | Normal | Normal | Incomplete | Pakistan (high) |
2 | Higha | High | Normal | Normal | Normal | Normal | Normal | High | Abnormal | Hong Kong (high) |
3 | Normal | No result | Normal | Normal | High | Normal | No result | No result | Abnormal | Jamaica (high) |
4 | Higha | High | Normal | Normal | Normal | Normal | Normal | Normal | Abnormal | Somalia (high) |
The country of origin was recorded in 1208 of the 1236 study participants, and of these 170 were born in a country with an intermediate or high prevalence of viral hepatitis (based on WHO definitions of prevalence107–109) and 1038 were from low-risk countries. The high-risk group contained 11 out of the 13 patients (85%) with viral hepatitis. None of the 13 cases admitted to use of intravenous drugs at any time.
As expected from the literature, ALT or AST levels when abnormal tended to be more extreme in patients with viral hepatitis than in patients who did not have this disease (Table 80).
Analyte | Upper limit | HBV or HCV: n | HBV or HCV: mean | HBV or HCV: median | Non-hepatitis: n | Non-hepatitis: mean | Non-hepatitis: median |
---|---|---|---|---|---|---|---|
ALT | 41 | 8 | 98.0 | 89.5 | 426 | 65.4 | 56.0 |
AST | 43 | 6 | 94.5 | 69.5 | 254 | 64.5 | 53.5 |
Diagnostic performance
The sensitivity and PPV of each detection strategy are given in Table 81. It can be seen that the recommended strategy (A), of repeating the LFT and then performing a viral test if an abnormality persists, is highly sensitive. However, the predictive value is low (1.15%). Strategy D, simply carrying out a viral test if the patient originates from a high- or intermediate-risk country, detects 85% of cases and has a much higher predictive value (6.47%) than the strategy of repeating the LFT. The strategy (B) of ordering a viral test if the ALT is raised is not particularly sensitive (67%), nor does it have a high predictive value (1.91%). The more selective strategy (C) of testing if the index ALT is more than twice the ULN has a higher predictive value, but is less sensitive. The best features of strategies C and D are combined in the hybrid strategy F, which achieves high sensitivity (92%) and worthwhile predictive value (5.12%).
Strategy for viral testing | No. of patientsa | Hepatitis casesa | Viral tests | Cases detected | Sensitivity (%) | PPV, % (95% CI) |
---|---|---|---|---|---|---|
A. If repeat LFT panel is abnormal | 1124 | 11 | 955 | 11 | 100 | 1.15 (0.64 to 2.05) |
B. If ALT abnormal on primary test | 1064 | 12 | 418 | 8 | 67 | 1.91 (0.97 to 3.73) |
C. If ALT > twice ULN on primary test | 1064 | 12 | 77 | 6 | 50 | 7.79 (3.62 to 15.98) |
D. If patient born in a country of intermediate to high viral hepatitis prevalence | 1208 | 13 | 170 | 11 | 85 | 6.47 (3.65 to 11.21) |
E. If patient born in a country of intermediate to high viral hepatitis prevalence and ALT > twice ULN on primary test | 1041 | 12 | 16 | 5 | 42 | 31.25 (14.16 to 55.60) |
F. If patient born in a country of intermediate to high viral hepatitis prevalence, or ALT > twice ULN on primary test | 1041 | 12 | 215 | 11 | 92 | 5.12 (2.88 to 8.93) |
G. Test all cases | 1236 | 13 | 1236 | 13 | 100 | 1.05 (0.62 to 1.79) |
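The sensitivity and PPV columns of Table 81 can be recomputed from the counts in the table. The sketch below does this for strategy D. The use of a Wilson score interval for the 95% CI is an assumption: the report does not name its interval method, although this choice appears to reproduce the tabulated limits for strategies A and D. The function names are illustrative only.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion; returns (low, high)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def performance(cases_detected, cases_total, viral_tests):
    """Sensitivity, PPV and the assumed Wilson 95% CI for the PPV."""
    sensitivity = cases_detected / cases_total
    ppv = cases_detected / viral_tests
    low, high = wilson_interval(cases_detected, viral_tests)
    return sensitivity, ppv, low, high


# Strategy D in Table 81: 11 of 13 cases detected using 170 viral tests
sens, ppv, low, high = performance(11, 13, 170)
print(f"sensitivity {sens:.0%}, PPV {ppv:.2%} ({low:.2%} to {high:.2%})")
```

The same interval function can be applied to the home-born group discussed later (2 cases among 1038 patients) to check the quoted limits.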
Costs and cost minimisation analysis
The cost of the laboratory tests and the practice costs are given in Table 82. The average cost per case detected and the incremental costs of detecting each additional case are shown in Table 83. Strategy E (viral test if patient born in an intermediate-/high-risk country and ALT is greater than twice the ULN) provides the lowest cost per case detected. This strategy was therefore designated as the base case for calculation of ICERs. Strategy A, the intuitive and widely advocated practice of repeating LFTs, turns out to be the most expensive per case detected. It is dominated by strategy G, in which all patients undergo a viral test. Similarly, strategy B (viral test if the index ALT is abnormal) is dominated by strategy D (perform viral test if patient was born in an intermediate- or high-risk country). Strategy C (viral test if the ALT is greater than twice the ULN) can be eliminated by an extended dominance principle. If strategy C is preferred to strategy E, this can only be because the extra cases detected by strategy C are deemed worth the extra cost. However, strategy D finds yet more cases than strategy C at lower incremental cost. Therefore, either strategy E or strategy D is preferable to strategy C. The cost-effectiveness of the remaining admissible strategies is shown in Figure 19. The dotted lines join strategies that cannot be eliminated by dominance principles. The absence of any explicit penalty for missing cases of viral hepatitis in this analysis implies that the costs of strategies E, D and F are underestimated with respect to strategy G. However, strategy F must be regarded as highly competitive with strategy G – it picks up almost as many cases and has very high efficiency in terms of cost per case detected.
Cost category | Resources (£) |
---|---|
GP consultation cost to check LFT results | 12.86a |
Receptionist to check patient in for appointment (2 minutes) | 0.91a |
Secretary time (1 minute) | 0.33a |
Phlebotomist time (5 minutes) | 1.00a |
Sample analysis: LFT | 2.69b |
Sample analysis: hepatitis B surface Ag and hepatitis C | 25.42b |
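The unit costs in Table 82 can be combined with the testing counts in Table 81 to give a cost per 100 patients for each strategy. The sketch below is a reconstruction, not the authors' stated costing method. It assumes that every viral test incurs the four practice costs plus the viral assay, and that strategy A additionally incurs the practice costs plus a repeat LFT for every patient. Under these assumptions the results appear to agree with the cost column of Table 83.

```python
# Unit costs in GBP, taken from Table 82
GP_CONSULTATION = 12.86
RECEPTIONIST = 0.91
SECRETARY = 0.33
PHLEBOTOMIST = 1.00
LFT_ANALYSIS = 2.69
VIRAL_ANALYSIS = 25.42

# Assumption: each blood draw incurs all four practice costs
VISIT = GP_CONSULTATION + RECEPTIONIST + SECRETARY + PHLEBOTOMIST
VIRAL_TEST = VISIT + VIRAL_ANALYSIS  # about 40.52 per patient tested
REPEAT_LFT = VISIT + LFT_ANALYSIS    # about 17.79 per patient retested

# strategy: (patients assessed, viral tests performed), from Table 81
TESTED = {
    "A": (1124, 955),
    "B": (1064, 418),
    "C": (1064, 77),
    "D": (1208, 170),
    "E": (1041, 16),
    "F": (1041, 215),
    "G": (1236, 1236),
}


def cost_per_100(strategy):
    """Estimated cost (GBP) of applying a strategy to 100 patients."""
    patients, viral_tests = TESTED[strategy]
    per_patient = VIRAL_TEST * viral_tests / patients
    if strategy == "A":
        # Strategy A repeats the LFT in everyone before any viral testing
        per_patient += REPEAT_LFT
    return 100 * per_patient


for name in TESTED:
    print(name, round(cost_per_100(name)))
```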
The number of detected cases per patient is estimated as (sensitivity of strategy) × 1.05%, where the latter figure is the viral hepatitis prevalence observed in the complete sample of 1236 patients. The number used differs slightly from the actual number of cases detected per patient in Table 83 because of variation in the prevalence of the condition across the samples in which each strategy was tested. The current approach achieves a more consistent comparison of strategies within our data set; for example, it ensures that the estimate of detected cases per patient for a strategy with 100% sensitivity will always be at least as great as that of any other strategy.
Strategy | Cost per 100 patients (£)a | Cases detected per 100 patients | Cost per case detected (£) | Incremental cost (£) per 100 patients (with base = E) | Incremental cases detected per 100 patients (base = E) | ICER |
---|---|---|---|---|---|---|
A | 5222 | 1.05 | 4965 | 5159 | 0.61 | Dominatedb |
B | 1592 | 0.70 | 2270 | 1530 | 0.26 | Dominated |
C | 293 | 0.53 | 558 | 231 | 0.09 | (2635)c |
D | 570 | 0.89 | 641 | 508 | 0.45 | 1124 |
E (base)d | 62 | 0.44 | 142 | 0 | 0.00 | Base |
F | 837 | 0.96 | 868 | 775 | 0.53 | 1473 |
G | 4052 | 1.05 | 3853 | 3990 | 0.61 | 6503 |
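The columns of Table 83 follow from two inputs per strategy: the cost per 100 patients and the sensitivity from Table 81, scaled by the overall prevalence of 13 cases in 1236 patients. The sketch below recomputes the ICERs against base strategy E and applies a simple (strict) dominance check. It is illustrative only; small differences from the tabulated ICERs are to be expected because the costs used here are the rounded values from the table.

```python
# Observed prevalence in the complete sample, expressed per 100 patients
PREVALENCE_PER_100 = 100 * 13 / 1236  # about 1.05

# strategy: (cost per 100 patients in GBP from Table 83, sensitivity from Table 81)
STRATEGIES = {
    "A": (5222, 11 / 11),
    "B": (1592, 8 / 12),
    "C": (293, 6 / 12),
    "D": (570, 11 / 13),
    "E": (62, 5 / 12),
    "F": (837, 11 / 12),
    "G": (4052, 13 / 13),
}


def cases_per_100(strategy):
    """Estimated cases detected per 100 patients: sensitivity x prevalence."""
    return STRATEGIES[strategy][1] * PREVALENCE_PER_100


def icer(strategy, base="E"):
    """Incremental cost per additional case detected, relative to the base."""
    extra_cost = STRATEGIES[strategy][0] - STRATEGIES[base][0]
    extra_cases = cases_per_100(strategy) - cases_per_100(base)
    return extra_cost / extra_cases


def is_dominated(strategy):
    """True if another strategy is cheaper and detects at least as many cases."""
    cost = STRATEGIES[strategy][0]
    cases = cases_per_100(strategy)
    return any(
        other != strategy
        and STRATEGIES[other][0] < cost
        and cases_per_100(other) >= cases
        for other in STRATEGIES
    )


for name in STRATEGIES:
    if name == "E":
        continue  # base strategy: the ICER is undefined
    label = "dominated" if is_dominated(name) else f"ICER {icer(name):.0f}"
    print(name, label)
```

Strict dominance flags only strategies A and B. Strategy C is excluded by extended dominance instead: its ICER relative to E is higher than that of strategy D, which detects more cases.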
Discussion
Summary of main findings
The BALLETS study is the first GP-based study in which the entire cohort was comprehensively tested for additional diseases (such as viral hepatitis) after an abnormal LFT, using the full analyte panel and normal reference ranges. We have shown that an abnormal LFT alone does not select out a population in which the prevalence rate approaches a threshold that would justify viral screening. We have assessed the validity of the various strategies a GP could adopt, at least as far as viral hepatitis is concerned, when faced with an abnormal LFT of uncertain provenance. The intuitive response for a GP in such a situation would be to repeat the LFT, an approach advocated by current literature. This study shows that this may not be the optimal policy. This strategy is the most expensive, even more so than viral testing all patients, as the costs incurred include repeating the LFT as well as viral testing the majority. The study also shows that, if ALT is notably raised (greater than twice the ULN), then the probability of chronic viral hepatitis is high (nearly 8%), but sensitivity is low. The strategy of testing all people from intermediate- or high-prevalence countries is the second most efficient, in terms of cost per case detected, and detects almost twice as many cases as the most efficient strategy – testing for viral infection when two conditions (birth in an intermediate- or high-prevalence country and an ALT greater than twice the ULN) are satisfied. The relative financial disadvantages of the strategy of repeating the LFT would be even greater if patient cost were included, as the extra visit would have to be factored in.
Strengths and limitations of the study
The main strength lies in the unique nature of the BALLETS cohort, being the only prospective study in a primary-care setting that has looked at the consequences of an abnormal LFT from a full analyte panel. The main limitation of our study relates to the rather small number of cases of chronic viral hepatitis (n = 13) and hence wide confidence limits on the results. That said, the results are plausible, in the sense that they are consistent with the pathophysiology of hepatitis and in line with what was found in non-practice settings (see Table 2). They are available for meta-analysis with potential future studies.
We deliberately selected multicultural inner city populations in order to provide a sizeable subgroup of people from countries where chronic viral hepatitis is common, as a result of infection during infancy (hepatitis B)113 and iatrogenic infection (hepatitis C). It turns out that 11 out of the 13 cases originated in medium- or high-prevalence countries. This has two implications. First, the prior probability is low (< 0.2%) and independent of ethnic group in an inner city UK population who do not originate from medium- or high-risk countries. Second, one of the most important questions a doctor can ask of a patient with abnormal LFTs is his or her country of origin – this is likely to apply irrespective of where the patient finally settles, as in most cases hepatitis is acquired soon after birth (hepatitis B) or as a result of iatrogenic infections in countries where this has been a real risk (hepatitis C). Inner city populations were selected for this study in order to provide an ‘enriched’ population with a high proportion of immigrants. Given the biology of hepatitis B and C, there is little or no reason to suspect that an immigrant from a high-risk area will have a different risk according to where they settle in a low-risk country, whereas only 6% of the ethnic minority population had chronic viral hepatitis (see Table 77).
Our study considers only one disease type, chronic viral hepatitis, whereas GP decision-making must take into account other diseases, such as haemochromatosis, as well as other behavioural and social motivations for testing. 5,6 That said, our conclusion that repeating the LFT ‘offers more than it delivers’ may well apply to diseases such as PBC and haemochromatosis.
Lastly, we have presented an analysis for cost minimisation and incremental cost per case detected. This is not a full cost-effectiveness or decision analysis. Donnan et al. 24 did attempt a decision analysis. However, this decision analysis was intended to find the most cost-effective strategy in the short term and used a limited time horizon of 1 year. LFTs are often ordered to prevent poor outcome in the long term, with many serious liver diseases, viral hepatitis included, manifesting over decades. Anxiety resulting from a false-positive result was included in the model, whereas long-term health gains as a result of successful case finding and treatment were not captured.
We consider our results in the context of published cost-effectiveness analyses of screening for viral hepatitis (i.e. studies that found screening to be cost-effective in populations with high prevalence rates, for example migrants) and attempt to produce a ‘fast and frugal heuristic’112 guide to practice.
Implications for practice: a fast and frugal heuristic
The intuitively appealing practice of repeating abnormal LFTs (strategy A) gets little support from our analysis. It is more expensive, both in absolute terms and in terms of cost per case detected, than all of the alternative strategies (see Table 83), including that of simply testing everyone for viral infection.
The most important question a doctor can ask a patient with abnormal LFTs is his or her country of origin. This holds good whether the person settles in an area of high or low ethnic mix, as infections are acquired in infancy (hepatitis B) or as a result of substandard medical practices, such as needle sharing (hepatitis C). Once infected, people ‘take their risk with them’ – fewer people will need to be tested in a low-ethnic-mix area, but those from intermediate- or high-prevalence countries still need testing. The strategy of testing people from such countries promises good value for money. In this study, 11 of the 13 patients with chronic hepatitis originated in medium- or high-risk countries. Thus, the prevalence of chronic viral hepatitis infection (PPV) among people with an abnormal LFT who were born in a medium- or high-risk country was 6.5% (11/170; 95% CI 3.7% to 11.2%; see Table 81), whereas the prevalence among the home-born population (of all ethnic groups) was < 0.2% (2/1038; 95% CI 0.05% to 0.7%). Our findings support viral testing only in the former group, consistent with the threshold prevalence for both HBV and HCV, of approximately 3%, at which population screening becomes cost-effective. 106,114,115
Four of the strategies – C, D, E and F – entail viral testing in a population in which the rate of hepatitis exceeds the 3% threshold for which testing has proven cost-effective in screening programmes (see Table 81). The cost-effective threshold is probably a little lower in a diagnostic population than in a screening population (costs of inviting people to attend are lower and cases detected might be at slightly higher risk), but no other strategy yields a population with a hepatitis rate exceeding even 2%.
Strategy D (test immigrants from prevalent countries) has a better (lower) ICER than strategy C and detects twice as many cases as strategy E. However, strategy F, testing immigrants from prevalent countries or any people with a very high ALT, is our preferred strategy, being both sensitive and efficient. We therefore recommend the ‘fast and frugal’ heuristic described in Figure 20. This combines strategy F with normal judgement of clinical indications. For example, a patient who is an intravenous drug user, or who has recently returned from a trip abroad where they had an attack of hepatitis, would be tested notwithstanding the result of the LFTs. Otherwise we recommend testing all patients with an abnormal LFT who were born in a country of intermediate or high prevalence, and all patients for whom the ALT exceeds twice the ULN.
The probability of chronic viral hepatitis is low, even when the ALT exceeds this limit and the patient does not originate from a medium- or high-risk country (about 0.2%). Nevertheless, we advocate testing in these patients for the following reasons:
- It is hard to ignore a level this high, and the wide confidence intervals from our data suggest the need for flexibility. 116
- The progression of undetected chronic viral hepatitis is worse in patients with ALT levels that are greater than twice the ULN, and this level has been used as a threshold for treatment in guidelines.
- If chronic viral hepatitis is not present at this level, a more in-depth search for other causes of hepatocellular damage is indicated.
We draw the line on further viral testing after this algorithm has been followed, unless of course further clinical indicators emerge. The likelihood of a case of viral hepatitis being present following the exclusions in this algorithm is approximately 0.1% in our study. This is considerably below the UK population prevalence.
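The recommended heuristic (strategy F combined with clinical judgement) can be written as a short decision rule for a patient who already has an abnormal LFT. The sketch below is illustrative only: the function and parameter names are hypothetical, the content of Figure 20 is not reproduced here, and a missing ALT result is treated as not meeting the ALT criterion, which is an assumption.

```python
def should_test_for_viral_hepatitis(
    alt,
    alt_upper_limit,
    born_in_intermediate_or_high_prevalence_country,
    clinical_indication=False,
):
    """Decide on viral testing for a patient with an abnormal LFT.

    alt may be None when no ALT result is available (assumed handling).
    """
    # Clinical indications (e.g. intravenous drug use) override the LFT result
    if clinical_indication:
        return True
    # Birth in an intermediate- or high-prevalence country is sufficient
    if born_in_intermediate_or_high_prevalence_country:
        return True
    # Otherwise test only if the ALT exceeds twice the upper limit of normal
    return alt is not None and alt > 2 * alt_upper_limit


# Example: ALT of 95 against an upper limit of 41, patient born in a low-risk country
print(should_test_for_viral_hepatitis(95, 41, False))
```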
Conclusions
This analysis indicates that the strategy of repeating LFTs in asymptomatic patients, advocated by current guidelines, is less sensitive and far more expensive than viral testing those patients born in countries where viral hepatitis is prevalent. Despite few cases of viral hepatitis, the data on costs of the various strategies are strong and the results of prevalence rates within the cohort are consistent with other literature. The finding that a notably raised ALT level was also effective at identifying infected patients inspired the construction of a ‘fast and frugal’ heuristic that might aid GPs who are faced with abnormal LFTs in asymptomatic patients, with regard to viral hepatitis. Our proposal addresses the diagnostic problem by identifying a clear high-risk population originating in high-risk countries. The residual population who are not immigrants from such countries are at low risk. However, the heuristic should not over-ride clinical judgement. The overall cost of the heuristic in other settings will depend on the relative proportions of patients in these risk strata, but our results suggest that the cost of automatic testing of high-risk individuals will be repaid in terms of additional cases detected.
Clearly, the situation might change as vaccination catches on in developing countries and needle hygiene improves. The key points to emerge are that:
- It is more efficient to determine country of origin with a view to viral testing, than to simply repeat the LFT.
- It is more cost-effective to test the whole LFT positive population for viral hepatitis than to repeat the LFT with a view to viral testing if it remains positive.
Chapter 7 Presence and severity of non-alcoholic fatty liver disease
Introduction
The incidence of liver disease is rising throughout the world, and liver disease now accounts for 1.5% of deaths in the UK. 117 In parallel with this, there has been a year-on-year rise in the number of LFTs carried out in primary care. Primary-care practitioners (PCPs) are thus commonly faced with the scenario of abnormal LFT results in patients in whom there are no clinical risks, signs or symptoms of liver disease. NAFLD is now recognised as the most common cause of hepatic dysfunction in the general population; however, this is yet to be confirmed in primary-care practice. 118,119 Furthermore, because of the indolent asymptomatic nature of NAFLD, identifying those with advanced disease in whom specific interventions may be required remains a clinical challenge in primary care.
The prevalence of NAFLD has risen markedly to 14–34% of the general population in Europe,119,120 Asia121 and America122 in recent years. Although patients with simple NAFLD are believed to have benign disease, there is now clear evidence that those who have progressed to NASH and fibrosis are at a much higher risk of developing hepatocellular carcinoma (HCC), liver failure and death. 22,123 The majority of data describing the severity of liver fibrosis in NAFLD arise from selected populations in secondary referral centres. 18,19,21,22,29,124,125 In a large UK prospective study, Skelly et al. 19 demonstrated that 19% (23/120) of patients with biopsy-confirmed NASH had significant fibrosis after presenting to their secondary-care centre with unexplained abnormal LFTs. 19 This, and other such studies,18,29 included patients in whom the decision to refer had been made on clinical grounds by PCPs/consultant colleagues and who were then rigorously screened in liver clinics for other disease aetiologies prior to proceeding to liver biopsy. These studies are therefore influenced by ascertainment bias and may overestimate the severity of NAFLD emerging from primary care.
With the alarming growth of obesity and type 2 diabetes, it is currently expected that the burden of NAFLD on primary care and liver services will continue to rise in the UK. 126 To date, no studies have determined the underlying disease severity of NAFLD in primary care. PCPs remain at the forefront of identifying the patients with advanced NAFLD who require further evaluation, closer surveillance for complications (and interventions where appropriate) and stricter lifestyle modifications. By investigating a large UK primary care sample of patients with incidental abnormal LFTs and absent clinical features of liver disease, this study is the first of its kind to determine the presence and disease severity of silent NAFLD in a primary-care setting.
Methods
Study population
This cross-sectional substudy utilises baseline data from patients enrolled in the BALLETS study from the eight primary-care practices within the Birmingham region only. Patients identified as having significant, positive liver disease aetiology were followed up in the specialist liver outpatient clinic at the Queen Elizabeth University Hospital, Birmingham. Electronic liver clinic letters were reviewed for this substudy cohort until May 2010 to strengthen the reliability of the initial study finding of liver-specific disease.
Data definitions
The LFT blood profile consisted of ALT, AST, ALP, GGT, total bilirubin, globulin and albumin measurements. LFTs were classified as abnormal according to reference ranges in the local laboratories, which are compliant with quality control standards. All patients were screened for hereditary (Wilson's disease, A1AT deficiency and genetic haemochromatosis), infectious (HBV and HCV), autoimmune (autoimmune hepatitis, PBC and PSC) and drug-induced liver injury.
Body mass index was defined as weight in kilograms divided by the square of the height in metres (kg/m2). Obesity was defined as BMI ≥ 30 kg/m2. Alcohol intake was reported as standard units (1 unit = 10 g alcohol) of alcohol consumed on average per week in the 6 months prior to recruitment. The past medical history was also extensively reviewed to identify study participants who had a history of alcohol excess or alcohol-related health problems. Mild (female 1–7 units/week, male 1–11 units/week) and moderate (female 8–14 units/week, male 12–21 units/week) alcohol consumption were defined as drinking within the current UK health guidelines (female ≤ 14 units/week, male ≤ 21 units/week). 127 At-risk alcohol consumption was defined as exceeding these guidelines.
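The BMI and alcohol definitions above translate directly into a small classification routine. The sketch below is illustrative: the function names and category labels are hypothetical, and the handling of zero intake and of fractional values between bands is an assumption, as the text defines the bands only for whole units.

```python
# (upper limit for mild, upper limit for moderate) in units/week, by sex
ALCOHOL_BANDS = {"female": (7, 14), "male": (11, 21)}


def bmi(weight_kg, height_m):
    """Body mass index in kg/m2."""
    return weight_kg / height_m**2


def is_obese(weight_kg, height_m):
    """Obesity as defined in this substudy: BMI of 30 kg/m2 or more."""
    return bmi(weight_kg, height_m) >= 30


def alcohol_category(units_per_week, sex):
    """Classify average weekly intake (1 unit = 10 g alcohol)."""
    mild_max, moderate_max = ALCOHOL_BANDS[sex]
    if units_per_week <= 0:
        return "none"  # assumed label for non-drinkers
    if units_per_week <= mild_max:
        return "mild"
    if units_per_week <= moderate_max:
        return "moderate"
    return "at-risk"  # exceeds the guideline limit


print(alcohol_category(16, "male"), is_obese(90, 1.73))
```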
For the purposes of this substudy, type 2 diabetes was defined in patients with a documented history of the disease or a recorded drug history of anti-diabetic medication. Hypertension was defined as a past medical history of the disease or a current recorded drug history of two or more antihypertensive medications.
The diagnosis of NAFLD was based on the following criteria: (1) sonographic features of fatty liver on USS (increased hepatic parenchymal echotexture and vascular blurring); (2) a negative history of alcohol consumption exceeding current UK health guidelines; and (3) exclusion of liver disease of other aetiology including drug-induced, autoimmune, viral hepatitis, cholestatic, metabolic and genetic liver disease.
Non-alcoholic fatty liver disease fibrosis score
The NAFLD Fibrosis Score (NFS)21 is a simple non-invasive scoring system designed to identify or exclude advanced fibrosis (classified as Kleiner stages F3 and F4128) in patients with an established diagnosis of NAFLD on imaging. The NFS was developed and validated by Angulo et al. 21 in over 700 liver patients with biopsy-proven NAFLD and is routinely used in liver clinics to select those at risk of disease progression and HCC. The NFS utilises a number of simple clinical (age, hyperglycaemia/diabetes, BMI) and laboratory (platelet count, albumin and AST/ALT ratio) independent predictors of advanced liver fibrosis. The low cut-off score (< –1.455) has a negative predictive value (NPV) of 88–93% and the high cut-off score (> +0.676) has a PPV of 79–90% for the presence of advanced fibrosis in NAFLD in secondary-care populations. 21,129 The NFS was calculated using the web-based electronic calculator (http://Nafldscore.com).
As the original BALLETS study protocol did not incorporate a platelet count, retrospective data collection of the electronic haematology laboratory archive at the University Hospital Birmingham enabled platelet counts within 6 months of patient enrolment to be recorded. To avoid false-positive or false-negative NFS, the scoring system was not applied to participants with a past medical history of platelet disorder or an active systemic inflammatory disease or being treated with myelosuppressive medications.
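For readers who wish to see how the score is assembled, the sketch below computes the NFS and applies the two cut-offs quoted above. The coefficients are those of the published score of Angulo et al. and are not given in this report; the study itself used the web-based calculator. The function names are hypothetical, and the albumin input is assumed to be in g/l and is converted to g/dl, the unit used in the published formula.

```python
LOW_CUT_OFF = -1.455
HIGH_CUT_OFF = 0.676


def nafld_fibrosis_score(age_years, bmi, diabetes_or_ifg, ast, alt,
                         platelets_e9_per_l, albumin_g_per_l):
    """NAFLD Fibrosis Score using the published coefficients (Angulo et al.)."""
    albumin_g_per_dl = albumin_g_per_l / 10  # assumed input unit: g/l
    return (
        -1.675
        + 0.037 * age_years
        + 0.094 * bmi
        + 1.13 * (1 if diabetes_or_ifg else 0)
        + 0.99 * (ast / alt)
        - 0.013 * platelets_e9_per_l
        - 0.66 * albumin_g_per_dl
    )


def classify_nfs(score):
    """Map a score to the three categories used in this substudy."""
    if score < LOW_CUT_OFF:
        return "low"            # advanced fibrosis predicted to be absent
    if score > HIGH_CUT_OFF:
        return "high"           # advanced fibrosis predicted to be present
    return "indeterminate"


# Hypothetical patient: 50 years, BMI 30, diabetic, AST 40, ALT 40,
# platelets 250 x 10^9/l, albumin 40 g/l
score = nafld_fibrosis_score(50, 30, True, 40, 40, 250, 40)
print(round(score, 3), classify_nfs(score))
```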
Statistical analysis
After exclusion of a positive blood/drug/alcohol aetiology screen, patients were diagnosed with NAFLD based on the presence of fatty liver on USS. Descriptive statistics were applied to characterise the whole study cohort and the identified NAFLD group. Continuous clinical variables are reported as medians and IQR. Categorical variables are reported as numbers and percentages.
Results
A total of 1118 primary-care patients were included. The largest proportion (38%; 424/1118) of these resulted from routine chronic disease check-ups. In 4.5% (50/1118) of cases no reason was recorded. Liver aetiology screen and ultrasound were successfully completed in 98% (1095/1118) of patients at the study visit.
Causes of abnormal liver function tests
The cause of abnormal LFTs was identified in 54.9% (614/1118) of cases. Detailed testing for viral, genetic and autoimmune causes yielded 33 diagnoses (3.0%). NAFLD was identified as the commonest cause of abnormal LFTs, accounting for 26.4% of all cases, exceeding alcohol excess (25.3%). There were no reported cases of cirrhotic appearances or ascites on USS in the NAFLD cohort. Two or more abnormal LFT analytes were present in 40.7% of NAFLD subjects (120/295), with the remainder having a single analyte abnormality (59.3%; 175/295) on GP sampling. GGT was the most common LFT abnormality in the NAFLD cohort (76.5%; 199/260). The median time difference between GP ordering blood tests and the study visit was 30 days (IQR 18–51 days).
At-risk alcohol consumption was reported in 25.2% (282/1118). The majority of at-risk alcohol consumers were male (44.7%; 126/282) and drank a significantly greater amount of alcohol (units per week) than women [median 42 (IQR 30–56) units/week vs 29 (IQR 21–46) units/week; Mann–Whitney U-test, p < 0.001]. An echo-bright fatty liver was identified with USS in 44.7% (126/282) of subjects who consumed at-risk levels of alcohol. The majority of excess drinkers (87%; 110/126) had a BMI of > 25 kg/m2. Cirrhotic appearances (coarse texture with irregular outline) on USS were reported in two patients with at-risk alcohol consumption. The diagnosis of compensated alcohol-induced cirrhosis was confirmed by tertiary liver specialists. No cause for LFT abnormality was identified in the remainder of study subjects (45.1%; 504/1118). Of note, 17.5% (88/504) of the unexplained abnormal LFT cohort were obese with a concurrent diagnosis of either type 2 diabetes and/or hypertensive disease.
Disease severity in the cohort of patients with non-alcoholic fatty liver disease
To estimate the severity of NAFLD in this cohort we used the NFS. The score was calculated in 236 of the 295 patients who met the diagnostic criteria for NAFLD. The NFS was not calculated in the remaining 59 patients with NAFLD as a result of incomplete records of blood platelets (n = 50), BMI (n = 5) or AST/ALT ratio (n = 4). A high NFS (> +0.676) was found in 7.6% (18/236) of patients with NAFLD, suggesting the presence of underlying advanced liver fibrosis (stages F3/F4 on Kleiner classification). 130 Advanced fibrosis was predicted to be absent in the majority of NAFLD subjects, a low NFS (< −1.455) being calculated in 57.2% (135/236). The presence of advanced fibrosis, however, could not be confidently excluded in the 35.2% (83/236) of NAFLD patients who scored an indeterminate value with the NFS (−1.455 to +0.676).
Discussion
This large prospective primary-care study highlights that NAFLD accounts for over 25% of incidental abnormal LFTs in primary-care consultations in which the consulting GP's suspicion of underlying liver disease is low or absent. In contrast, a specific viral (HBV/HCV), genetic or autoimmune disease was identified on thorough study testing in only 3.0% of all study patients. Application of a simple, non-invasive scoring system suggests that undetected advanced liver fibrosis is present in 7.6% and absent in 57.2% of patients with NAFLD. Incidental abnormal LFTs were most commonly encountered during routine chronic disease reviews (38% cases), including diabetes, hypertension and cardiovascular disease. This study is the first of its kind to report the severity of NAFLD in patients with incidental abnormal LFTs in primary care.
Our study evaluated a primary care-based population with abnormal LFTs rather than a volunteer population from the general community. Nonetheless, the frequency of NAFLD (26%) identified in our study is within the wide range (14–34%) previously reported in general population studies carried out in Italy,118 Spain,120 Asia121 and America. 122 The variation in reported frequencies may be influenced by ethnic diversity122,130 and differences in study methodologies. These include variable alcohol thresholds that define NAFLD, lack of consistency in screening for other disease aetiologies and variation in risk stratification for liver disease at study enrolment. All the studies nevertheless confirm the strong association between NAFLD and components of the metabolic syndrome,121,131 the prevalence of which has increased rapidly worldwide. 126 The high proportion of patients with diabetes (38.6%), obesity (60.3%) and hypertension (45.4%) in the NAFLD group in our study is in keeping with population-based studies. 118
The suspected proportion of advanced fibrosis within our NAFLD cohort is 7.6%. Additionally, from experiences in hospital care21,129,132 we predict that a subset of the 35.2% of patients with an indeterminate NFS may also have advanced fibrosis. There are currently no data on the severity of NAFLD in primary care. The most relevant studies that best reflect low-risk populations are restricted to biopsy findings in living related liver donors, among whom the prevalence of NASH (± fibrosis) ranges from 1.1% in Japan to 18.5% in the USA. 133 The latter figure is likely to be an overestimate due to the lack of detail on alcohol consumption and full liver aetiology screening in liver donors. Secondary/tertiary centre studies of variable size (range 118–733) and white predominance have reported that 11–27% of patients with biopsy-proven NAFLD and elevated aminotransferases have advanced (stages 3/4) fibrosis. 22,125,132,134,135 The higher rates of advanced fibrosis reported in these liver specialist centres are likely to be due to referral/sampling bias.
Our study has several strengths. First, this is the largest prospective cohort of primary-care patients with clinically unsuspected liver disease and incidental abnormal LFTs to be reported. Second, this is the first study to apply the non-invasive NFS to identify patients with advanced NAFLD fibrosis in primary care who are most in need of intensive lifestyle modifications and surveillance for liver-related complications (e.g. HCC detection). Third, the detailed assessment of the liver aetiology screen (alcohol/drug data, serology, genetics and USS imaging) undertaken and high completion rate (98%) mean that a cause for abnormal LFT was identified in the majority of cases (55%). Previous large-scale population-based retrospective analyses of abnormal LFTs have been limited by the absence of USS119 and the lack of information on alcohol and measured anthropometry2 to accurately describe the presence of NAFLD. The high rate of liver disease identification in our patient sample that PCPs perceived as a low-risk group may also be explained by the fact that GGT, which has the highest reported sensitivity for liver disease, above other LFTs,2 was the commonest LFT abnormality. The finding of an elevated GGT in more than 70% of the NAFLD group, compared with raised ALT in 51.0% and AST in 26.2%, has not been previously reported in adult patients with NAFLD. A similar finding has, however, been reported in children with NAFLD. 136
One limitation of this study is that the application of the NFS was validated against liver biopsy in patients with NAFLD attending hospital,21,129,132 and so it is possible that the severity of NAFLD may be overestimated in our primary-care cohort. However, our NAFLD cohort has very similar patient characteristics (white, obese, middle-aged, with abnormal LFT results) to those reported by Angulo et al. ,21 and in many countries the distinction between primary and secondary care is not as clear. For the purpose of our study, the NFS was chosen over other non-invasive systems135,137,138 that detect advanced fibrosis as it is an easily applicable tool (web-based calculator) that has the best reported PPV in secondary care,129 entails minimal extra cost to GPs (i.e. platelet sampling) and incorporates blood and clinical parameters that are routinely available in primary care. We were not able to validate the NFS against other non-invasive modalities,137–139 as these had not been developed or sufficiently studied by the time our study had started. Moreover, there are issues about how to validate such modalities in primary care, as it is unlikely that liver biopsies would ever be performed in such a large sample of patients or in this setting (and would also be unethical).
Despite a thorough non-invasive aetiology screen and detailed alcohol history, 45% of our cohort had unexplained abnormal LFTs. However, as we targeted the more problematic patients in primary care, who have incidental abnormal LFTs in the absence of a clinical suspicion of underlying liver disease, this is not a surprise. Furthermore, unlike previous general population studies118,119 that utilised only ALT, AST and/or GGT, our study recruited patients with a wider spectrum of LFT analytes to reflect common practice in primary care. It is therefore possible that some of the unexplained abnormal LFTs represent transient viral illness, Gilbert syndrome, under-reported (self-reported) use of alcohol/over-the-counter medications or non-liver-related disease (e.g. bone, muscle). 119 Although USS is the most readily available imaging tool in primary care, the fact that 18% of the ‘unexplained’ group had co-existing obesity with diabetes and/or hypertension raises the possibility that reliance on ultrasound alone will miss a proportion of cases of NAFLD. The difficulty in detecting the presence of fatty liver with USS is well reported in the morbidly obese and when the degree of fat infiltration is < 33% of the hepatic content. 140 Furthermore, biopsy reports have shown that fat content is lost towards the more advanced stages of NAFLD, with the resultant fibrotic tissue being undetectable on USS. 140 The lack of markers of insulin sensitivity and lipid profile in the study meant that we were unable to non-invasively quantify hepatic fat,141 and hence potentially determine the numbers of patients with undetected NAFLD on USS within the ‘unexplained’ group.
Our findings have important clinical and public health implications. This study raises awareness that NAFLD accounts for a significant proportion of incidental abnormal LFT results commonly encountered by PCPs, in the absence of a clinical suspicion of liver disease. We have identified a potential subset of patients with NAFLD with advanced fibrosis (7.6%) who require further follow-up and management in secondary care. We would advocate reassurance and lifestyle modifications to patients with a low NFS (57.2%). In the absence of validated scoring systems, at present patients with an indeterminate NFS require close surveillance in primary care with referral to secondary care as deemed appropriate by the PCP.
In conclusion, we provide novel information on the severity of NAFLD in a primary-care setting, as well as guidance on the triaging of such patients for further investigation and management.
Chapter 8 Interpretation and discussion
The BALLETS study did what it set out to do – recruit a cohort of patients with abnormal LFT results in primary care, characterise them comprehensively and follow them up for 2 years. It is a prognostic study but not a standard diagnostic study. It is possible to calculate ‘sensitivity’ (‘true-positive rate’), as there are seven independent analytes and therefore many negative results for each analyte. However, it is important to remember that this will be an overestimate, as the results refer to a selected population of patients with at least one abnormality at index LFT (see Chapter 4, Selection effects). The degree to which this is representative of a wider population cannot be determined from within the study. However, the predictive value of abnormal LFTs can be confidently estimated from the study data.
Comparison of our results with previous literature
Research questions
Many of the BALLETS findings reinforce existing understanding concerning LFTs (or corroborate, in a primary-care setting, what is known in hospital care). Some findings reinforce ideas that many had long suspected but for which the evidence was scanty. A small but important number of findings are new or could be taken to contradict current understanding. We shall consider our findings with respect to the following questions:
1. Which LFTs (or combination of LFTs) predict what diseases (or disease classes)?
2. Which LFTs contribute most to diagnosis and which are more marginal?
3. What is the utility of the standard advice to repeat abnormal LFT results?
4. What is the overall contribution of LFTs to diagnosis in a primary-care setting?
5. What are the psychological sequelae of being told that LFT results are abnormal?
6. Why do doctors do so many LFTs and what are the different reasons for doing them?
7. What are the implications of 1–6, above, for ordering and interpreting LFTs?
8. How is fatty liver affected by change in weight for obese and non-obese patients?
Which tests predict which disorders?
There is a large literature on factors other than liver injury affecting LFTs, as summarised by Dufour et al. 14,15 Levels of both aminotransferase enzymes (ALT, AST) were higher in men, and levels of ALP were higher in women, in both the BALLETS study and Dufour's review. Both studies find an inverted U-shaped relationship between ALT and age, but Dufour did not find the age-related decline in albumin levels that we found in BALLETS. We found that globulin was higher in certain ethnic groups than others, but Dufour does not comment on this relationship. Laboratory reference ranges should be designed to take these factors into account, although, in practice, ethnicity is not considered. The positive association between ALT, AST and GGT with alcohol intake is well known but we have found an interesting negative association with ALP.
As far as the relationship between LFT levels and disease is concerned, our results are again in line with findings from the systematic review by Dufour et al. 15 both in the univariate analysis (see Chapter 4, Patterns of abnormality and disease classes) and in the various multivariate analyses (summarised in Chapter 4, Discussion). The major distinction typically drawn between diseases that damage hepatocytes directly (e.g. alcohol, viral infection) and those that cause intrahepatic bile obstruction (namely PBC and PSC) was confirmed in the BALLETS cohort. As expected, the first group (disease category 1a) was associated with increased levels of aminotransferase enzymes (ALT and AST), whereas the second (disease category 1b) was characterised by high levels of ALP, the production and release of which from cell membranes is stimulated by cholestasis. ALP was also the analyte that was most strongly associated with type 2 diseases (which include metastatic cancer); it was the only analyte for which abnormality was statistically significantly associated with this category.
Gamma-glutamyltransferase is by far the most frequently abnormal analyte (it has a strongly positively skewed distribution) and is raised across disease categories 1a and 1b. However, the high sensitivity of GGT is a function of its high overall positivity and it has lower predictive values for type 1a diseases than ALT and lower predictive values for 1b and category 2 diseases than ALP (see Chapter 4, Diagnostic performance of alternative liver function test panels, and Figure 10). Moreover, GGT was less sensitive than ALT for the most important 1a disease – viral hepatitis – and the cases of PBC that showed impending cirrhosis were all associated with abnormal ALP. Curiously, there is no tendency for higher GGT levels to achieve higher sensitivity as results become more extreme (see Chapter 4, Patterns of abnormality and disease classes, and Figure 9). We discuss the implications of the very low ‘specificity’ of GGT below.
Albumin levels are known to decline in decompensating cirrhosis and in many other late-stage diseases. 15 However, albumin measurement did not emerge as a useful test in our sample of low-risk patients with non-specific symptoms or attending for review of chronic diseases. It was the analyte that was least predictive of any other analyte being abnormal. It was not statistically associated with any disease category, nor did it emerge as an independent predictor for any disease class. We discuss this topic further below.
In summary, this study confirmed the well-known finding that aminotransferases are associated with ‘hepatocellular’ (1a) diseases, ALP with cholestatic (1b) diseases and systemic diseases involving the liver (type 2 diseases). It confirms that GGT is the most commonly abnormal analyte but its predictive value is relatively low. Albumin emerges as unhelpful for the diagnosis of liver disease in a non-high-risk population.
Which analytes are most useful and which are candidates for relegation?
Many laboratories use only five analytes and few use all eight deliberately included in BALLETS. It is reasonable to suppose that adding an analyte to a set that is already fit for its discriminatory purpose will add marginal diagnostic value at the cost of greater anxiety, patient inconvenience and health service expense. It is therefore important to use the BALLETS study to define the default set of analytes. If one wished to select only one analyte, GGT would be a very strong candidate, especially if sensitivity were regarded as a more important goal than specificity. However, as soon as two analytes can be selected, two prime candidates emerge – ALT and ALP (see Chapter 4, Patterns of abnormality and disease classes). The former ascertains most category 1a diseases and the latter most cases of 1b along with many in category 2. There was no correlation between ALP on the one hand and either ALT or AST on the other. In so far as they portend disease, they portend different diseases: ALT and AST are indicators of category 1a diseases (most often viral hepatitis, haemochromatosis), whereas ALP is a sign of intrahepatic biliary disease and, to a lesser extent perhaps, tumours in the liver. The univariate analysis showed that ALT is most strongly associated with 1a diseases and ALP with 1b and with category 2. The discriminant analysis confirms that these analytes are the strongest independent predictors of 1a and 1b diseases respectively [see Chapter 4, Results (complete case analysis) and Chapter 4, Analysis of imputed data]. The data show clearly that these two analytes, used together, are, by a considerable margin, the most discriminatory combination of tests in the extended LFT panel investigated. Bilirubin did not emerge as a strong discriminator. However, bilirubin is a marker of acute liver disease and there were few such cases in our sample. 
We can see an argument to retain bilirubin in the panel so that it can be used in both acute and chronic liver disease. 15 Candidates for removal from the LFT panel (when used to diagnose disease of the liver – see below) are therefore GGT, AST and the protein fractions.
These will be considered in turn. Our conclusions regarding the first and third of these analytes are the opposite of those proposed in a companion study also funded by the HTA programme. This study was based on record linkage in Tayside, Scotland, and was conducted by Donnan et al. 24 The study by Donnan et al. 24 did not break down liver disease by type – the outcomes were ‘liver disease diagnoses’, ‘liver disease mortality’ and ‘all-cause mortality’. As with BALLETS, the target population consisted of ‘patients with no obvious liver disease’. Patients were excluded if they were referred to hospital following an abnormal test, had an abnormal LFT within the 6 months preceding the index test or had ‘clinically obvious liver disease’ recorded in the electronic patient records. No distinction was drawn between ALT and AST, which were combined into an aminotransferase category. Follow-up was for a median of 3.5 years. Comprehensive testing for type 1 disease was not carried out, nor was ultrasound examination, as this was a retrospective study. These points are mentioned because we think they go some way to explaining the diametrically opposite conclusions we have reached with respect to GGT and albumin.
-
Gamma-glutamyltransferase Donnan et al. 24 recommend that GGT should be retained by laboratories when it is routine and that it should be incorporated in laboratories when this is not the case. This argument is based on the finding that GGT was the most sensitive test for liver disease (sensitivity of 62%), whereas the next most sensitive test (transaminase) had a sensitivity of 36%. We contest their argument that GGT should be retained for this reason on three grounds:
-
The clinically important factor is the added value of a test given other information available to the clinician at the time. It is not overall sensitivity that should drive decision-making but marginal sensitivity given other information. There is no suggestion from any quarter that ALT and ALP should not be included in the LFT panel. So the salient question can be stated thus: given ALT and ALP, what are the marginal gains from adding GGT and what is the cost in terms of false-positives? ALT identified the majority of category 1a diseases, whereas ALP was raised in all but four of the 12 category 1b diseases. We provide an argument (see Chapter 5, Summary of main findings) that the most important result not to miss is chronic viral hepatitis. In this regard ALT was increased in 8 out of 12 cases in which the measurement was available from the index test; raised GGT identified only one additional case, and was seen in only 5 out of 11 cases for which the analyte was available, a smaller proportion than in the remainder of the study patients. It therefore appears that ALT and ALP in combination provide a sensitive strategy for detection of serious disease, and that the marginal yield from GGT is modest.
-
Any gains from GGT must be offset against the ‘costs’ contingent on a high false-positive rate. GGT was by far the most commonly abnormal analyte in BALLETS, with a high false-positive rate, and Donnan et al. 24 also found it to be the least specific of all the analytes they tested. As positive results create anxiety and risk a cascade of further tests, this undesirable feature of GGT must be offset against the small marginal gains in detection.
-
The expected relationship between true-positive rate and threshold (found for the other enzymes) is not present for GGT when the threshold is increased threefold (Figure 9). This casts further doubt on the value of GGT as a diagnostic test for liver disease in a general practice population at the existing threshold.
GGT was less sensitive than ALT for identification of fatty liver, but the relationship to alcohol use was confirmed. We discuss the use of LFTs in relation to alcohol misuse below.
-
Albumin Donnan et al. 24 note that albumin is a sensitive marker for ‘all-cause mortality’, while being more specific than GGT in this regard. The PPV for albumin, for death over 5 years, was 50% compared with 15% for GGT and 10% for ALP (the analyte with the lowest PPV). However, the marginal contribution to decision-making might have been less than these results imply, as patients with low albumin were older (mean age of 69 years) than patients with normal albumin (mean age of 53 years) and they had considerably more comorbidities. The fact that these patients were frail and at high risk would therefore not have come as ‘news’ to the doctor in many cases. Donnan et al. 24 also found that albumin was the least sensitive marker for liver disease. We will discuss the role of LFTs as a general marker for health below, but the BALLETS results confirm that albumin is a very poor marker for type 1 and 2 liver diseases. It was associated with fatty liver but to a lesser extent than either BMI or ALT and does not emerge as a sensible screening test for people needing further investigation of this condition. In our opinion albumin could be dropped from the standard panel of tests for diagnosis of diseases involving the liver, although it might have utility as a marker for general health, notwithstanding the point we make above. The issue of LFT markers for non-liver diseases is discussed below.
-
Aspartate aminotransferase AST seems to be the ‘poor relation’ of ALT. It has little independent diagnostic precision, and misses more type 1a diseases generally, and more cases of viral hepatitis specifically, than ALT. It seems to be a very strong candidate for relegation and this is not highly controversial; most laboratories in the UK incorporate either one or the other of the aminotransferase enzymes in their standard panel. As with all analytes, it is in no way part of our argument that analytes excluded from the standard panel should not be available in special circumstances and such an argument applies to AST, which has a proven utility in distinguishing between alcohol damage and other type 1a liver diseases. 15
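The marginal-value argument made for GGT above can be illustrated with a minimal sketch. The per-patient flags below are invented for illustration and are not BALLETS data: they are chosen only so that ALT flags 8 of 12 diseased cases and GGT adds one further case, echoing the viral hepatitis figures quoted in the text. The disease-free group, the helper names `flagged_fraction` and `make`, and all resulting false-positive figures are hypothetical.

```python
from typing import Dict, List, Sequence

Patient = Dict[str, bool]  # analyte name -> True if the result is abnormal


def flagged_fraction(patients: List[Patient], panel: Sequence[str]) -> float:
    """Fraction of patients with at least one abnormal analyte in the panel."""
    if not patients:
        raise ValueError("patient list is empty")
    flagged = sum(any(p[a] for a in panel) for p in patients)
    return flagged / len(patients)


def make(n: int, alt: bool, ggt: bool) -> List[Patient]:
    """Build n hypothetical patients sharing the same ALT/GGT flags."""
    return [{"ALT": alt, "GGT": ggt} for _ in range(n)]


# Hypothetical diseased cases: ALT abnormal in 8 of 12, GGT adds 1 more.
diseased = (make(4, True, True) + make(4, True, False)
            + make(1, False, True) + make(3, False, False))

# Hypothetical disease-free patients: GGT is abnormal far more often than ALT.
healthy = (make(5, True, True) + make(10, True, False)
           + make(40, False, True) + make(45, False, False))

# Sensitivity and false-positive rate, with and without GGT in the panel.
sens_alt = flagged_fraction(diseased, ["ALT"])
sens_both = flagged_fraction(diseased, ["ALT", "GGT"])
fpr_alt = flagged_fraction(healthy, ["ALT"])
fpr_both = flagged_fraction(healthy, ["ALT", "GGT"])

print(f"Marginal sensitivity from adding GGT: {sens_both - sens_alt:.3f}")
print(f"Marginal false-positive rate from adding GGT: {fpr_both - fpr_alt:.3f}")
```

Under these invented numbers the extra analyte buys a small gain in detection at the price of a much larger rise in false-positives, which is the shape of trade-off the text describes; the real magnitudes would have to come from the study data.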
In summary, we think that the standard testing for liver diseases should be built around just two analytes (ALT and ALP). We realise that this flies in the face of convention and may be too radical for immediate implementation. Bilirubin has a role when acute liver disease is suspected15 and it may be reasonable to include it in the standard panel for this reason. We think that protein measurements should, if possible, be kept in reserve for situations in which there is general ill health, rather than specifically disease of the liver, which prompts investigation as discussed below. GGT and AST also have potential special roles in relation to alcohol, also as discussed below.
It could be argued that we have not demonstrated, in a formal sense, that the marginal gains from an extended panel are outweighed by the contingent losses, in terms of anxiety, additional visits and further tests. However, a full decision analysis to ‘prove’ this point would be a massive and tedious undertaking, as there are so many inputs to the model (clinical features and patterns of LFT abnormality) and so many outputs (all category 1 and 2 diseases along with NAFLD). For each disease the added improvement in outcome would have to be modelled in the face of extremely uncertain transitional probabilities (viral hepatitis being something of an exception). Moreover, the drop-off in sensitivity gain and rise in specificity loss when extending the default panel (beyond the two analytes we recommend) are so stark that we think an intuitive response to the data is not only necessary but desirable. One claim that we cannot make is that there are significant savings to be made directly from reducing the number of analytes processed. This is because the tests are performed on the same analytical platform and directly from the same sample so that only reagent costs of about £0.17 could be saved. That said, volumes are high so the laboratory would save over £11,000 per year.
Repeating abnormal liver function tests versus testing for a specific disease forthwith
The standard advice when the LFT result is abnormal is to repeat it. 3,10,90,91,93 We have no quarrel with the conclusion of Donnan et al. 's decision analysis24 – repeating the test is better than sending the patient to hospital for further investigation. However, this is not the only alternative; the result can be simply ignored (on the grounds, say, that it is only marginally abnormal) or a more specific follow-on test can be arranged as an alternative to simply repeating the LFT panel. The latter policy is precisely what our decision analysis indicates with respect to chronic viral hepatitis.
We question the standard default advice to repeat the LFT panel. LFT results remained abnormal on retesting in 84% of BALLETS patients (see Chapter 4, Correlation analysis of liver function tests) and in 66% of patients in the National Health and Nutrition Examination Survey. 142 The corollary is that, for the great majority of patients, the decision to take more specific action is merely deferred by the retesting policy. Furthermore, the low specificity of the repeat panel (15.6%) means that there is little opportunity for reassuring healthy patients (see Chapter 4, Diagnostic performance of alternative liver function test panels). Following an initial abnormal result, we suggest that either the patient should be reassured immediately or a further, more specific, test should be conducted in most cases. This point is reinforced by the decision analysis (see Chapter 6). Although conducted with respect to viral hepatitis, it may serve as a specimen for LFTs in general. An LFT work-up for category 1 genetic, autoimmune and viral diseases cost £67 in the study. Based on the costs shown in Chapter 6 (see Costs and cost analysis), it is less expensive to the service to proceed directly to an LFT work-up than to repeat the LFT panel with a view to a full work-up if still positive, provided that the probability of a positive repeat LFT panel exceeds 67%. This result would be more extreme if the patient costs for an extra test were also factored in. The advice to proceed directly to a specific test might make yet more sense if the panel used to exclude/diagnose diseases of the liver could be restricted to just two or three analytes (ALT, ALP and bilirubin), as suggested above. In that case, a broad default framework for investigation of an abnormal LFT result could be described, as shown in Figure 21.
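The break-even reasoning in the preceding paragraph can be sketched in a few lines. This is a simplified two-branch model offered as an illustration, not the decision analysis of Chapter 6: going straight to the work-up costs the work-up fee, whereas repeating first costs the repeat fee plus the work-up fee whenever the repeat is still abnormal. The £67 work-up cost is taken from the text; the repeat-panel cost of £22 is an assumption, chosen only because it reproduces a threshold close to the quoted 67%, and the function names are hypothetical.

```python
def breakeven_probability(workup_cost: float, repeat_cost: float) -> float:
    """Probability of a positive repeat above which direct work-up is cheaper.

    Direct work-up costs workup_cost. Repeat-first costs
    repeat_cost + p * workup_cost. Direct is cheaper when
    p > 1 - repeat_cost / workup_cost.
    """
    if workup_cost <= 0:
        raise ValueError("workup_cost must be positive")
    return max(0.0, 1.0 - repeat_cost / workup_cost)


def expected_costs(p_repeat_positive: float, workup_cost: float,
                   repeat_cost: float) -> dict:
    """Expected service cost per patient under each strategy."""
    return {
        "direct": workup_cost,
        "repeat_first": repeat_cost + p_repeat_positive * workup_cost,
    }


WORKUP_COST = 67.0   # from the text (GBP)
REPEAT_COST = 22.0   # ASSUMPTION for illustration only; see Chapter 6 for actual costs

threshold = breakeven_probability(WORKUP_COST, REPEAT_COST)
# 0.84 is the proportion of BALLETS patients still abnormal on retesting.
costs = expected_costs(0.84, WORKUP_COST, REPEAT_COST)

print(f"Break-even probability: {threshold:.2f}")
print(f"Expected cost, direct work-up: {costs['direct']:.2f}")
print(f"Expected cost, repeat first:   {costs['repeat_first']:.2f}")
```

Because the observed persistence of abnormality (84%) lies above the threshold, the repeat-first branch is the more expensive one in this sketch. Patient costs and downstream consequences are deliberately left out.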
As ALP is also associated with category 2 diseases, a more complete algorithm for this scenario is posited in Figure 22. Likewise, excluding type 1a diseases when ALT is raised requires further elucidation, and a default guideline for this scenario is offered in Figure 23 – this builds on the fast and frugal heuristic developed on the basis of the decision analysis in Chapter 6.
This scheme is indicative rather than prescriptive, must be tailored to individual circumstances and does not trump clinical judgement. It could be adapted to cope with a more extended range of analytes, if these are retained in the panel used for the diagnosis of liver disease. The algorithms deal with the default scenario and do not preclude repeating the LFT. This would certainly be the appropriate course of action if, for example, it transpired that the patient had taken vigorous exercise shortly before testing, as it is known that this can cause up to a threefold elevation in ALT. 14 Likewise, repeating the test would make sense in cases of suspected drug reaction or transient viral infection (such as infectious mononucleosis).
In summary, the standard advice to routinely repeat an LFT following an abnormal result can be strongly questioned on the basis of BALLETS results. In 84% of cases this will simply defer the decision. The decision analysis (in which viral hepatitis is used as a specimen – see Chapter 6) suggests that moving directly to the test that would have been carried out had the LFT remained abnormal is the most cost-effective option from both a health service and societal perspective. Such a policy will also avoid cases of false reassurance that a repeat test reverting to normal can provide (see Chapter 4, Persistence of liver function test abnormality from index to first follow-up).
The overall contribution of liver function tests to diagnosis in a primary-care setting
Donnan et al. 24 state that one of the four main findings of their record linkage study is that ‘liver disease is not common in those with abnormal LFTs’. These authors would have underestimated disease frequency because they followed patients for a median of 3.5 years, whereas many diseases, such as viral hepatitis, PBC and haemochromatosis, emerge over decades. BALLETS, on the other hand, must overestimate the frequency of clinically relevant disease, as many of the diagnoses were pathophysiological entities with lead times that would exceed life expectancy in many cases. This applies, in particular, to haemochromatosis and PBC but even, to some extent, to viral hepatitis. Despite this potential ‘over-ascertainment’ in BALLETS and underestimation in the Donnan et al. study,24 both studies reach the same conclusion: the incidence of serious liver disease in patients with abnormal LFT results in primary care is low. Only 59 patients in nearly 1300 had a potential serious disease affecting the liver, including metastatic cancer – a predictive value of < 5%. Only 45 (< 3.5%) had one of the diseases that are captured by a comprehensive liver screen for viral, autoimmune and genetic diseases. Viral hepatitis (B and C) is the most common of the serious liver conditions for which a highly effective medical treatment is available. Only 1% of people had this condition – similar to the UK population prevalence of 0.7% (and the Birmingham prevalence in antenatal clinics) (see Chapter 5, Discussion). The risk of cirrhosis is also under 1%, at 0.7% (see Chapter 4, Disease categories). The results therefore confirm the prevailing opinion that LFTs are carried out in circumstances in which serious preventable disease is unlikely to be detected. 
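The predictive values quoted above are simple proportions, and a short sketch shows how they, and an indication of their precision, can be computed. The case counts (59 and 45) come from the text; the denominator of 1290 is an assumption standing in for ‘nearly 1300’, and the Wilson score interval is one common choice of interval that is not stated to have been used in the report.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


N_COHORT = 1290          # ASSUMPTION: the text says only 'nearly 1300'
SERIOUS_DISEASE = 59     # from the text
LIVER_SCREEN_DISEASE = 45  # from the text

ppv_serious = SERIOUS_DISEASE / N_COHORT
ppv_screen = LIVER_SCREEN_DISEASE / N_COHORT
ci_serious = wilson_interval(SERIOUS_DISEASE, N_COHORT)
ci_screen = wilson_interval(LIVER_SCREEN_DISEASE, N_COHORT)

print(f"Any serious disease: {ppv_serious:.1%} "
      f"({ci_serious[0]:.1%} to {ci_serious[1]:.1%})")
print(f"Liver-screen diseases: {ppv_screen:.1%} "
      f"({ci_screen[0]:.1%} to {ci_screen[1]:.1%})")
```

With any denominator close to 1300 the point estimates stay below the 5% and 3.5% bounds given in the text; the exact figures depend on the true cohort size.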
Category 1 diseases were rare in the BALLETS cohort, and even when detected the majority seemed to be following a benign course – increased availability and use of testing in developed countries identifies diseases that would not have presented but for the testing. Many of these category 1 diseases were subclinical and likely to remain so. Only two cases of PBC were likely to produce clinical sequelae – the remainder were in elderly patients with no evidence of incipient cirrhosis. Serious cases of PBC usually present in mid-life. Likewise, the majority of the 10 haemochromatosis cases are unlikely to come to harm and none was started on chelating treatment as a result of the diagnosis. Fewer than 1% of patients were started on a course of treatment that was likely to extend their life as a result of an abnormal LFT. These were patients with viral hepatitis, and all but two of them could have been detected not by LFT testing, but by simply testing all people from intermediate- or high-risk countries for the virus. In short, the BALLETS study confirms what many have long suspected: LFTs deliver much less than they promise, at least as far as detecting disease in the nominated organ is concerned.
Group 2 includes certain conditions that might not have been in the GP's mindset when the test was ordered; for example, Paget's disease of the bone, infectious diseases such as glandular fever and leptospirosis, and thyroid disease. Analyses excluding these cases from Group 2 have been published in a BMJ Open paper (Lilford RJ, Bentham LM, Armstrong MJ, Neuberger J, Girling AJ. What is the best strategy for investigating abnormal liver function tests in primary care? Implications from a prospective study. BMJ Open 2013; in press). The conclusions are essentially unchanged after these exclusions.
The 59 cases of putative liver disease do not include patients who have fatty liver and/or exceed safe alcohol limits. ALT is the most sensitive analyte for the identification of fatty liver. The high incidence of fatty liver (38%) is consistent with findings reported in the literature. 17,18,23,26–30 The finding nevertheless reinforces the high prevalence of ultrasonically detected liver fat. The ultrasound diagnosis of fatty liver is not an ‘exact science’, but the finding that obesity and ALT results increased with the presence and degree of fatty liver provided evidence of criterion validity for the interpretation of the ultrasonic images. That said, the value of making the ultrasound diagnosis of fatty liver is questionable. The probability of identifying incipient cases of fat-induced cirrhosis in this way must be small; the incidence of fatty liver is very much higher than that of fat-induced cirrhosis (38% vs < 0.5%). Moreover, the main argument for detecting fatty liver by ultrasound must rest on the expectation that a positive result will prompt behaviour change and motivate the identified person to eat and/or drink less. If this is true, then there may also be a risk that a negative result will provide false reassurance and hence an insouciant attitude to an unhealthy lifestyle. There are good arguments for behaviour change in people with high calorie or alcohol intake irrespective of LFT and liver ultrasound results. The use of LFTs or ultrasound as a method to encourage healthy lifestyles is an unproven intervention and arguably both should be used with circumspection pending further study – a topic to which we return below.
In summary, we conclude with Donnan et al. 24 that the proportion of cases in which LFTs lead to the diagnosis of a previously unsuspected liver disease, for which an evidence-based treatment is indicated, is very low – < 1% in BALLETS. Most of these cases relate to viral hepatitis and the value of LFTs would be further attenuated in a population in which all patients from high-risk countries had been screened. A more circumspect/discriminatory attitude to LFTs is recommended.
Psychological effects
The psychological consequences of being informed about a positive LFT were measured in nested studies in both the Donnan et al. 24 and BALLETS projects. Both studies found a measurable adverse effect on anxiety (state anxiety in BALLETS and both state and trait anxiety in Donnan et al. 24). The BALLETS study (see Chapter 5, Psychology 1: effects of positive tests) also found that disease-specific worry was markedly increased and self-assessed health slightly decreased after testing when compared with results 2 years later. An ultrasound diagnosis of fatty liver was associated with slightly worse scores on all three dimensions but this result did not reach statistical significance. The qualitative study (see Chapter 5, Psychology 2: results on behaviour) produced results that were consistent with the dissipation of anxiety and disease-specific worry seen in the quantitative study. The hypothesis that ultrasound detection of fat in the liver would be a powerful motivating factor in behaviour change did not gain support from the quantitative data. This is consistent with poor recall of findings and their significance found in the qualitative study as discussed below. The effect of finding abnormal LFTs and/or ultrasound on unhealthy behaviour, and whether or not repeating these tests can nudge people towards healthy lifestyles, remain unanswered questions. On the other hand, there can be little question that the tests are anxiety provoking in the short term and this must be included in the deficit column in the balance sheet of potential benefits and harms of testing. Demonstrable anxiety following an abnormal result forms part of the argument for removing GGT from the default list of analytes in the LFT panel (it adds little information at the margin and increases the probability of a positive test with all that entails) (see Chapter 4, Diagnostic performance of alternative liver function test panels). 
The idea of restricting the panel for suspected liver disease to just two (or possibly three) analytes in the first instance is merely an extension of this argument.
In summary, the documented negative effects of an abnormal LFT mean that false-positives must be taken seriously. It is an argument against simply advocating the inclusion of analytes with the highest sensitivity with no regard to predictive value. The idea that, in some circumstances, the disability of anxiety can be offset against positive effects on behaviour applies only to circumstances where there is an indication for behaviour change and even then the net benefit of using LFT results for this purpose is unproven. Taken in the round these considerations reinforce arguments for:
-
being more circumspect about doing LFTs in the first place
-
excluding analytes (e.g. GGT) that add only small marginal sensitivity to the LFT panel at the expense of a big increase in false-positives.
Doctors' motivations in ordering large numbers of liver function tests
The low probability of making a timely diagnosis of an important disease needing treatment suggests that LFTs have limited value in people with vague symptoms or as part of the monitoring of non-liver diseases. The time has come to re-examine the widespread use of these tests. Four motivations for testing can be discerned from the sociological substudy:
- to diagnose a serious disease affecting the liver (i.e. a category 1 or 2 disease)
- to test for a non-liver disease
- to promote/reinforce behaviour change and/or to elucidate a suspicion that the patient may underestimate alcohol consumption
- to reassure the patient and/or signal that the complaint was being taken seriously, to ‘buy time’ or as an ‘insurance policy’ against potential complaints.
We appreciate that these may be overlapping motivations and that doctors are not necessarily conscious of explicit motivations in practice; as one respondent said, ‘Ordering an LFT may have become a type of “tick-box” response.’ However, that might be part of the problem – failure to think through the purpose of testing can be blamed for the current situation where large numbers of people present with abnormal LFTs, the meaning of which is unclear; uncertain provenance generates low prognostic significance.
In summary, it makes sense to consider tailoring the LFT to the reason for testing rather than adopting a ‘one-size-fits-all’ approach.
Building a new testing paradigm
In this section, we combine the conclusions from the six preceding sections. First, we take as our starting point a hypothetical scenario in which LFTs have only recently been discovered and have not yet come into routine use. Second, we ask what the rational response would be under such a scenario, given both biological knowledge (summarised by Dufour et al. 14,15) and the results of the BALLETS study. Third, we attempt an answer to this question in terms of the four broad motives for doing LFTs described above:
- Concern over disease affecting the liver. We recommend that recourse to LFTs be more circumspect and that, when they are carried out, a panel of just two analytes (ALT and ALP) is used, with bilirubin added where an acute liver event is suspected. Alternatively, for simplicity, a three-analyte panel may be used. Such a panel would be suitable for monitoring liver-toxic drugs, in cases of suspected acute poisoning (e.g. paracetamol, mushroom) and in patients with infectious diseases, such as hepatitis A and leptospirosis. 15
- Concern over general (non-liver) conditions. The standard LFT panel is not fit for purpose. It will produce a crop of positive results that do not point clearly to the next step, and simply repeating the LFT is unlikely to advance the diagnosis. We recommend a dropdown list of tests from which the clinician can select according to circumstances. Pending further research we suggest the following candidates for inclusion on such a list: thyroid function tests (TFTs), the full blood count (FBC), an inflammatory marker (such as C-reactive protein) and albumin. Which tests are most propitious in these circumstances is unclear. The FBC is useful in patients with non-acute abdominal complaints in general practice, 143 whereas TFTs, FBC and an inflammatory marker are advocated for chronic fatigue. 144 TFTs and the FBC are ‘tractable’ in the sense that the required actions contingent on a positive result are reasonably well defined.
- As a means to promote behaviour change or to confirm suspicion of alcohol misuse. If the patient is suspected to be in denial about alcohol misuse, then AST and GGT would be a sensible choice of test. Using LFTs to reinforce behaviour change may be a reasonable option pending further evaluation but it gets no support from this study (see Chapter 7, The effect of changes in body mass index and alcohol intake on fatty liver). GGT is an obvious choice in the case of alcohol misuse. It is important to be aware that behaviour change is still warranted, even if the test is normal, and that false reassurance should be avoided.
- Meeting the patient's perceived need for a blood test. Here, it seems that the very last thing required is an ‘open-ended test’ – that is to say a test that has a high positive rate but whose meaning is obscure. We would recommend selection of tests that cover frequently missed diagnoses and where further action is well defined by a positive test result. TFTs and the FBC meet these requirements. Again, the clinician may be aided by a dropdown list or check list.
In summary, we think the LFT panel has outlived its usefulness and should be replaced by a more nuanced approach. Above, we outline such an approach based on the results of the BALLETS study and review of the literature.
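The approach outlined above could be encoded in a test-ordering system as a simple lookup from the reason for testing to a suggested list of tests. The following is a minimal sketch only: the names `Motive`, `SUGGESTED_TESTS` and `suggest_tests` are hypothetical, the lists restate the candidates suggested in the text pending further research, and nothing here represents a validated ordering rule.

```python
from enum import Enum


class Motive(Enum):
    """The four broad motives for testing discerned in this report."""
    LIVER_DISEASE = "concern over disease affecting the liver"
    GENERAL_CONDITION = "concern over general (non-liver) conditions"
    BEHAVIOUR_OR_ALCOHOL = "behaviour change or suspected alcohol misuse"
    PATIENT_EXPECTATION = "patient's perceived need for a blood test"


# Candidate tests per motive, taken from the recommendations in the text.
# These are suggestions pending further research, not validated rules.
SUGGESTED_TESTS = {
    Motive.LIVER_DISEASE: ["ALT", "ALP"],
    Motive.GENERAL_CONDITION: ["TFTs", "FBC", "C-reactive protein", "albumin"],
    Motive.BEHAVIOUR_OR_ALCOHOL: ["AST", "GGT"],
    Motive.PATIENT_EXPECTATION: ["TFTs", "FBC"],
}


def suggest_tests(motive, acute_liver_event_suspected=False):
    """Return the dropdown list of candidate tests for a given motive.

    Bilirubin is added to the liver panel only when an acute liver event
    is suspected, as recommended in the text.
    """
    tests = list(SUGGESTED_TESTS[motive])  # copy so the lookup is not mutated
    if motive is Motive.LIVER_DISEASE and acute_liver_event_suspected:
        tests.append("bilirubin")
    return tests


for motive in Motive:
    print(f"{motive.value}: {', '.join(suggest_tests(motive))}")
```

The clinician would still select from the list according to circumstances; the lookup only replaces the ‘one-size-fits-all’ default with a starting point matched to the purpose of testing.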
Non-alcoholic fatty liver disease
We found a high prevalence of fatty liver disease as described above. BMI and ALT were the strongest predictors of this condition. Seventy per cent of patients who had fatty liver on their initial scan were found to have a fatty liver 2 years later, whereas only 14% of patients who did not have a fatty liver at the outset had this finding at 2-year follow-up. We found an interesting J-shaped curve relating alcohol intake to probability of fatty liver in men. This has been discovered before in secondary care, 145,146 and moderate alcohol consumption was associated with lower risk of metabolic syndrome in a community-based study. 142
BALLETS found a significant (p = 0.032) association between change in BMI over 2 years and change in liver fat (see Chapter 7, The effect of changes in body mass index and alcohol intake on fatty liver). The improvement in liver fat was found to be sensitive to relatively small reductions in BMI. These findings should be encouraging to patients with metabolic syndrome/fatty liver, although the extent to which they may translate into clinical outcomes is conjectural. One important question relates to the effect of having a fatty liver on behaviour. The qualitative study did not suggest that the finding of a fatty liver was a sustained, powerful motivating factor (see Chapter 5, Psychology 2: effects of results on behaviour). Examination of weight change in BALLETS showed that patients with fatty livers lost some mass, on average, over the period of the study, whereas those without fatty liver experienced a small gain. The difference does not reach statistical significance (see Chapter 7, The effect of changes in body mass index and alcohol intake on fatty liver). A small but worthwhile effect remains an intriguing possibility.
Lastly, although most cases of fatty liver do not progress to cirrhosis, finding out why a small proportion do so is a priority, given the rising incidence of obesity and metabolic syndrome.
In summary, we provide yet further evidence that a small amount of alcohol is associated with healthy outcomes, but proof of a cause-and-effect relationship remains elusive. There is an intriguing hint that informing a person that they have a fatty liver will prompt weight loss but this needs confirmation, as does the hypothesis that this putative benefit is not vitiated by a countervailing effect in people who test negative. Using LFTs/ultrasound to promote behaviour change is unproven and clinicians should be cautious in doing so.
Strengths and limitations
Birmingham and Lambeth Liver Evaluation Testing Strategies (BALLETS) is a unique study comprising patients presenting in primary care who have been investigated by means of a ‘full’ panel of LFTs and who have then been comprehensively screened for liver disease and followed up for 2 years.
An important strength of the BALLETS study is that it investigates not only the psychological sequelae of testing, but also the reasons for ordering LFTs in the first place. This has enabled the authors to analyse the implications of the results not just in some general context, but in the context of what turned out to be very different motivations for testing. We have also conducted an analysis of cost per case detected for the most important liver disease – viral hepatitis – which resulted in a provocative finding that contradicts the current guidelines. Lastly, we have created a cohort of patients, many with fatty liver, for further follow-up. We had planned at one stage to develop a consensus statement regarding the practical implications of the BALLETS study. However, the results suggested that radical changes in practice were indicated and the groups of primary-care clinicians to whom the data have been presented were reluctant to immediately accept the radical corollaries that we believe flow from the data. Rather than sublimate our views in contemporary consensus we decided on a completely different philosophy – an interpretation that can be debated over time and gradually assimilated into practice as required. In economic terms, we felt that a form of supplier generated demand was needed, that this would take time, and that a ‘consensus development’ approach was likely to be excessively reactionary in the circumstances.
Not all eligible patients were recruited to the BALLETS study, but the substudy of patients who were not recruited in the integral pilot (see Chapter 3, Integral pilot) provides reassurance that the population studied is broadly representative of the target group.
Many patients diagnosed with specific conditions (especially PBC) represented pathophysiological entities rather than patients destined to suffer clinical effects. However, by providing details of each case we were able to distinguish between cases where clinical consequences were more or less likely to occur in a transparent way.
The number of patients in certain disease categories was small, limiting the statistical power of some of the analyses. However, this low incidence of disease emerges as an important finding in its own right. The populations chosen were deliberately rather high risk with a skew towards the inner city, rather than wealthier suburban or rural locations (see Chapter 6, Strengths and limitations of the study). The low predictive values observed in the BALLETS study would be, in all likelihood, lower still in more middle-class neighbourhoods.
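The expectation that predictive values would be lower still in lower-risk populations follows from Bayes' theorem: for a test of fixed sensitivity and specificity, the positive predictive value falls as the prior risk of disease falls. A minimal sketch follows, using hypothetical sensitivity, specificity and prevalence values rather than BALLETS estimates; the function name is ours.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test, by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("the test can never be positive with these inputs")
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, NOT BALLETS estimates.
SENSITIVITY = 0.80
SPECIFICITY = 0.85

# Hypothetical prior risks, from a higher-risk to a lower-risk population.
for prevalence in (0.10, 0.05, 0.02, 0.01):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: positive predictive value {ppv:.1%}")
```

The same test therefore generates a larger share of false-positives among its positive results when it is applied to a population with a lower prior risk of disease.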
Implications for research
- A pilot study of a ‘customised’ approach to test ordering should be considered. The clinical value of different tests when patients have vague symptoms, such as tiredness or upper abdominal pain, should be evaluated. Likewise, the need to carry out more blood tests when patients are on treatment for chronic disease, such as hypertension, is unclear. There is a mismatch between the frequency with which blood tests are used to monitor chronic diseases and investigate symptoms, on the one hand, and scientific exploration of this subject, on the other. We have made the point (see Chapter 5) that LFTs are sometimes (or even usually) ordered not because liver disease is suspected, but because of a less focused suspicion of disease interacting with a perceived societal expectation to perform a test of some type. We propose a research project aimed at better defining situations in which different blood tests are done for vague symptoms (such as tiredness). On the basis of the BALLETS results, we hypothesise that tests such as TFTs and the FBC, for which the meaning of a positive result is rather clear cut, will offer more than LFTs, for which the predictive meaning of a positive result is so uncertain. In addition to studying the yield from various tests we propose an evaluation of a specific dropdown menu comprising the FBC, TFTs and the various individual components of the LFTs.
- The BALLETS cohort should be followed up over time to find out whether or not it is possible to identify the minority of cases of fatty liver that are likely to progress to cirrhosis and to evaluate the fibrosis score in a primary-care setting. We are seeking permission to obtain death certificates for the cohort and also to obtain funding to follow up patients with special reference to the ultrasound diagnosis of NAFLD.
- A controlled study of the net effects of using serial LFTs (including liver ultrasound) as part of a package to reduce unhealthy behaviours should be seriously considered, especially in light of the rising incidence of obesity. The hypothesis that using test results to promote behaviour change will do more good than harm is unproven. The simpler solution – tackling unhealthy behaviours directly and irrespective of test results – may be more effective all round.
Chapter 9 Conclusions
Our conclusion, derived by integrating statistical findings with motivations for testing, is that we have reached the beginning of the end of the pervasive LFT panel. A more rational response is to blend biological knowledge and the statistical results from the BALLETS study, to create testing heuristics appropriate to the very different purposes for testing. The ubiquitous and frequently used LFT panel has been the subject of prolonged scepticism. What has been lacking hitherto was a sufficiently large empirical study of patients similar to the bulk of those encountered in clinical practice and an intellectual framework that started with the objective of testing. We offer our study as an example of the insights that can be achieved when biological knowledge, quantitative field work and qualitative interviews are combined. The conclusions are radical only because existing practice has evolved as a type of meme, with little empirical justification.
Acknowledgements
Contributions of authors
All authors read and approved the final manuscript. In addition:
Richard Lilford (Professor of Clinical Epidemiology, Director of Birmingham Clinical Research Academy) participated in the design of the study and acquired funding, supervised the research group and drafted the final report manuscript.
Louise Bentham (BALLETS Study Co-ordinator) participated in study management, data collection in Birmingham, the design of the sociological and qualitative substudies, and contributed to writing the final report.
Alan Girling (Senior Research Fellow) participated in the design of the study, performed the statistical analysis and wrote the first draft of the results section.
Ian Litchfield (Senior Research Fellow) conducted and co-authored qualitative and sociological substudies.
Robert Lancashire (Computer Officer) participated in design of analysis plan and provided critical revision of final report.
David Armstrong (Professor of Medicine and Sociology) participated in the design of the study and provided critical revision of final report.
Roger Jones (Professor of General Practice) participated in the design of the study, was lead GP at the Lambeth site and provided critical revision of final report.
Theresa Marteau (Professor of Psychology) participated in the design of the study, conducted psychology substudy and provided critical revision of final report.
James Neuberger (Honorary Consultant Physician) participated in the design of the study and provided critical revision of final report.
Paramjit Gill (Clinical Reader in Primary Care Research) participated in the design of the study and provided critical revision of final report.
Bob Cramb (Consultant Chemical Pathologist) participated in the design of the study, wrote a subsection of the report and provided critical revision of final report.
Simon Olliff (Consultant Radiologist) participated in the design of the study, conducted and drafted quality assurance of ultrasound subsection and provided critical revision of the final report.
David Arnold (Medical Student) conducted main study literature search and co-authored the decision analysis substudy.
Khalid Khan (Professor of Women's Health and Clinical Epidemiology) participated in the design of the study, the analysis plan and the final analysis, and provided critical revision of final report.
Matthew Armstrong (Clinical Research Fellow) conducted and co-authored the fibrosis substudy and provided critical revision of the final report.
Diarmaid Houlihan (Clinical Research Fellow) conducted and co-authored the fibrosis substudy.
Philip Newsome (Senior Lecturer in Hepatology) conducted and co-authored the fibrosis substudy.
Peter Chilton (Research Associate) co-authored the decision analysis substudy and provided critical revision of the final report.
Karel Moons (Professor of Clinical Epidemiology) participated in the design of the study, the analysis plan and the final analysis, and provided critical revision of the final report.
Doug Altman (Professor of Biostatistics) participated in the design of the study, the analysis plan and the final analysis, and provided critical revision of the final report.
Contribution of others
The authors are grateful to the following people:
Study collaborators Rosalind Raine (Professor of Health Care Evaluation) and Ramasamyiyer Swaminathan (Consultant Chemical Pathologist).
Sonographers Clair Powell, David Hill, Sharon Breedon, Vicky Conway, Judith Phillips and John Leddy.
Research associates Susan Hoult Robinson, Emily Benson, Ruth Collins and Divya Chadha.
Research nurses and clinical researchers Carol Wheeler, Anne-Marie Goddard and Ugochi Nwulu.
Lead GPs Dr Ewan Hamnett, Dr Bill Strange, Dr David Taylor, Dr Lakh Jhass, Dr Kirstie Blackford, Dr Adam Fraser and Dr Peter Clarke.
Practice managers and administrative support Chris Jenkins, Joyce Marriott, Sheila Jones, Fay Knight, Janet Barrass, Julie Brown, Julie Walker, Karen Leslie, Angela Styring, Vicky Chambers, Suzy Hill, Maria DelVecchio, Melita Shirley and Catharine Hill.
Laboratory staff Dr Peter Lewis, Timothy Plant, Barbara Swann, Dr David Vallance, Dr Mourad Labib, Jean Shaw and Gillian Muirhead.
Database and IT support Phill Martin and Ciaron Hoye.
Publications
- Arnold D, Bentham L, Jacob R, Girling A, Lilford J. Should patients with abnormal liver function tests in primary care be tested for chronic viral hepatitis: cost minimisation analysis based on a comprehensively tested cohort. BMC Fam Pract 2011;12:9.
- Lilford RJ, Bentham LM, Armstrong MJ, Neuberger J, Girling AJ. What is the best strategy for investigating abnormal liver function tests in primary care? Implications from a prospective study. BMJ Open 2013; in press.
Disclaimer
This report presents independent research funded by the National Institute for Health Research (NIHR). The views and opinions expressed by authors in this publication are those of the authors and do not necessarily reflect those of the NHS, the NIHR, NETSCC, the HTA programme or the Department of Health.
References
- Office for National Statistics (ONS). Health Service Quarterly 2008:59-60.
- McLernon DJ, Donnan PT, Ryder S, Roderick P, Sullivan FM, Rosenberg W, et al. Health outcomes following liver function testing in primary care: a retrospective cohort study. Fam Pract 2009;26:251-9.
- Sherwood P, Lyburn I, Brown S, Ryder S. How are abnormal results for liver function tests dealt with in primary care? Audit of yield and impact. BMJ 2001;322:276-8.
- Milne S, Sheeran P, Orbell S. Prediction and intervention in health related behaviour: a meta-analytic review of protection motivation theory. J Appl Soc Psychol 2000;30:106-43.
- Petrie KJ, Cameron LD, Ellis CJ, Buick D, Weinman J. Changing illness perceptions after myocardial infarction: an early intervention randomized controlled trial. Psychosom Med 2002;64:580-6.
- Witte K, Allen M. A meta-analysis of fear appeals: implications for effective public health campaigns. Health Educ Behav 2000;27:591-615.
- Chatwin T. Diagnosing liver disease in asymptomatic patients. JAAPA 2001;14:39-47.
- Moseley RH. Evaluation of abnormal liver function tests. Med Clin North Am 1996;80:887-906.
- Murphy MK, Black NA, Lamping DL, McKee CM, Sanderson CF, Askham J, et al. Consensus development methods, and their use in clinical guideline development. Health Technol Assess 1998;2.
- Pratt DS, Kaplan MM. Evaluation of abnormal liver-enzyme results in asymptomatic patients. N Engl J Med 2000;342:1266-71.
- Rochling FA. Evaluation of abnormal liver tests. Clin Cornerstone 2001;3:1-12.
- Younossi ZM. Evaluating asymptomatic patients with mildly elevated liver enzymes. Cleve Clin J Med 1998;65:150-8.
- Green RM, Flamm S. AGA technical review on the evaluation of liver chemistry tests. Gastroenterology 2002;123:1367-84.
- Dufour DR, Lott JA, Nolte FS, Gretch DR, Koff RS, Seeff LB. Diagnosis and monitoring of hepatic injury. I. Performance characteristics of laboratory tests. Clin Chem 2000;46:2027-49.
- Dufour DR, Lott JA, Nolte FS, Gretch DR, Koff RS, Seeff LB. Diagnosis and monitoring of hepatic injury. II. Recommendations for use of laboratory tests in screening, diagnosis, and monitoring. Clin Chem 2000;46:2050-68.
- Bonacini M. The value of hepatic iron index in cirrhosis. Gastroenterology 1997;113:1052-3.
- Whitehead MW, Hawkes ND, Hainsworth I, Kingham JG. A prospective study of the causes of notably raised aspartate aminotransferase of liver origin. Gut 1999;45:129-33.
- Daniel S, Ben Menachem T, Vasudevan G, Ma CK, Blumenkehl M. Prospective evaluation of unexplained chronic liver transaminase abnormalities in asymptomatic and symptomatic patients. Am J Gastroenterol 1999;94:3010-14.
- Skelly MM, James PD, Ryder SD. Findings on liver biopsy to investigate abnormal liver function tests in the absence of diagnostic serology. J Hepatol 2001;35:195-9.
- Sorbi D, McGill DB, Thistle JL, Therneau TM, Henry J, Lindor KD. An assessment of the role of liver biopsies in asymptomatic patients with chronic liver test abnormalities. Am J Gastroenterol 2000;95:3206-10.
- Angulo P, Hui JM, Marchesini G, Bugianesi E, George J, Farrell GC, et al. The NAFLD fibrosis score: a noninvasive system that identifies liver fibrosis in patients with NAFLD. Hepatology 2007;45:846-54.
- Ekstedt M, Franzen LE, Mathiesen UL, Thorelius L, Holmqvist M, Bodemar G, et al. Long-term follow-up of patients with NAFLD and elevated liver enzymes. Hepatology 2006;44:865-73.
- Kim HC, Nam CM, Jee SH, Han KH, Oh DK, Suh I. Normal serum aminotransferase concentration and risk of mortality from liver diseases: prospective cohort study. BMJ 2004;328.
- Donnan PT, McLernon D, Dillon JF, Ryder S, Roderick P, Sullivan F, et al. Development of a decision support tool for primary care management of patients with abnormal liver function tests without clinically apparent liver disease: a record-linkage population cohort study and decision analysis (ALFIE). Health Technol Assess 2009;13.
- Dore GJ, Freeman AJ, Law M, Kaldor JM. Is severe liver disease a common outcome for people with chronic hepatitis C? J Gastroenterol Hepatol 2002;17:423-30.
- Bellentani S, Tiribelli C, Saccoccio G, Sodde M, Fratti N, De MC, et al. Prevalence of chronic liver disease in the general population of northern Italy: the Dionysos Study. Hepatology 1994;20:1442-9.
- Pendino GM, Mariano A, Surace P, Caserta CA, Fiorillo MT, Amante A, et al. Prevalence and etiology of altered liver tests: a population-based survey in a Mediterranean town. Hepatology 2005;41:1151-9.
- Yano E, Tagawa K, Yamaoka K, Mori M. Test validity of periodic liver function tests in a population of Japanese male bank employees. J Clin Epidemiol 2001;54:945-51.
- Hultcrantz R, Glaumann H, Lindberg G, Nilsson LH. Liver investigation in 149 asymptomatic patients with moderately elevated activities of serum aminotransferases. Scand J Gastroenterol 1986;21:109-13.
- Mathiesen UL, Franzen LE, Fryden A, Foberg U, Bodemar G. The clinical significance of slightly to moderately increased liver transaminase values in asymptomatic patients. Scand J Gastroenterol 1999;34:85-91.
- Kim KM, Kim YJ, Lee KH, Paek D. Clinical characteristics of factory workers with asymptomatic liver function test abnormalities found on serial health examination. Korean J Hepatol 2005;11:144-56.
- Health Protection Agency. Hepatitis C information. Health Protection Agency 2003. www.hpa.org.uk/infections/topics_az/hepatitis_c/phlsgen_info.htm (accessed 22 February 2011).
- Health Protection Agency. Hepatitis B information. Health Protection Agency 2003. www.hpa.org.uk/infections/topics_az/hepatitis_b/gen_info.htm (accessed 22 February 2011).
- Worwood M. Haemochromatosis. Clin Lab Haematol 1998;20:65-7.
- Metcalf JV, Bhopal RS, Gray J, Howel D, James OF. Incidence and prevalence of primary biliary cirrhosis in the city of Newcastle upon Tyne, England. Int J Epidemiol 1997;26:830-6.
- AutoImmune Hepatitis Support Group. Autoimmune Hepatitis. AutoImmune Hepatitis Support Group Website 2009. www.autoimmunehepatitis.co.uk (accessed 3 April 2009).
- Olivarez L, Caggana M, Pass KA, Ferguson P, Brewer GJ. Estimate of the frequency of Wilson's disease in the US Caucasian population: a mutation analysis approach. Ann Hum Genet 2001;65:459-63.
- de Serres FJ. Worldwide racial and ethnic distribution of alpha1-antitrypsin deficiency: summary of an analysis of published genetic epidemiologic surveys. Chest 2002;122:1818-29.
- ESRC/JISC Census Programme, Census Geography Data Unit (UKBORDERS). EDINA: University of Edinburgh; n.d.
- Targher G, Montagnana M, Salvagno G, Moghetti P, Zoppini G, Muggeo M, et al. Association between serum TSH, free T4 and serum liver enzyme activities in a large cohort of unselected outpatients. Clin Endocrinol 2008;68:481-4.
- van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 1999;18:681-94.
- Royston P, Carlin JB, White IR. Multiple imputation of missing values: new features for mim. Stata Journal 2009;9:252-64.
- Heshka JT, Palleschi C, Howley H, Wilson B, Wells PS. A systematic review of perceived risks, psychological and behavioral impacts of genetic testing. Genet Med 2008;10:19-32.
- Shaw C, Abrams K, Marteau TM. Psychological impact of predicting individuals' risks of illness: a systematic review. Soc Sci Med 1999;49:1571-98.
- Lerman C, Trock B, Rimer BK, Jepson C, Brody D, Boyce A. Psychological side effects of breast cancer screening. Health Psychol 1991;10:259-67.
- Marteau TM, Bekker H. The development of a six-item short-form of the state scale of the Spielberger State-Trait Anxiety Inventory (STAI). Br J Clin Psychol 1992;31:301-6.
- Ware JE, Gandek B. Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) project. J Clin Epidemiol 1998;51:903-12.
- Grattagliano I, Portincasa P, Palmieri VO, Palasciano G. Managing nonalcoholic fatty liver disease: recommendations for family physicians. Can Fam Physician 2007;53:857-63.
- Fuertes JN, Mislowack A, Bennett J, Paul L, Gilbert TC, Fontan G, et al. The physician–patient working alliance. Patient Educ Couns 2007;66:29-36.
- Bordin E. The generalizability of the psychoanalytic concept of the working-alliance. Psychother Theory Res Pract Train 1979;16:252-60.
- Smith VA, DeVellis BM, Kalet A, Roberts JC, DeVellis RF. Encouraging patient adherence: primary care physicians' use of verbal compliance-gaining strategies in medical interviews. Patient Educ Couns 2005;57:62-76.
- St GA, Bauman A, Johnston A, Farrell G, Chey T, George J. Effect of a lifestyle intervention in patients with abnormal liver enzymes and metabolic risk factors. J Gastroenterol Hepatol 2009;24:399-407.
- Smith JA. Semistructured interviewing and qualitative analysis. Rethinking methods in psychology. London: Sage; 1995.
- Smith JA. Beyond the divide between cognition and discourse: using interpretative phenomenological analysis in health psychology. Psychol Health 1996;11:261-71.
- Smith JA. Identity development during the transition to motherhood: an interpretative phenomenological analysis. J Reprod Infant Psyc 1999;17:281-300.
- Smith JA, Osborn M, Smith JA. Qualitative psychology. London: Sage; 2003.
- Willig C. Qualitative research in psychology: a practical guide to theory and method. Buckingham: Open University Press; 2001.
- Conrad P, Roth JA, Conrad P. The experience of illness: recent and new directions. Research in the sociology of health care. London: JAI Press; 1987.
- Elliott R, Fischer CT, Rennie DL. Evolving guidelines for publication of qualitative research studies in psychology and related fields. Br J Clin Psychol 1999;38:215-29.
- Parker I. Criteria for qualitative research in psychology. Qual Res Psychol 2004;1:95-106.
- Ley P. Memory for medical information. Br J Soc Clin Psychol 1979;18:245-55.
- Kessels RP. Patients' memory for medical information. J R Soc Med 2003;96:219-22.
- Fortun P, West J, Chalkley L, Shonde A, Hawkey C. Recall of informed consent information by healthy volunteers in clinical trials. QJM 2008;101:625-9.
- Edwards SJ, Lilford RJ, Braunholtz DA, Jackson JC, Hewison J, Thornton J. Ethical issues in the design and conduct of randomised controlled trials. Health Technol Assess 1998;2.
- Lawton J, Fox A, Fox C, Kinmonth AL. Participating in the United Kingdom Prospective Diabetes Study (UKPDS): a qualitative study of patients' experiences. Br J Gen Pract 2003;53:394-8.
- Raftery J, Bryant J, Powell J, Kerr C, Hawker S. Payment to healthcare professionals for patient recruitment to trials: systematic review and qualitative study. Health Technol Assess 2008;12.
- Lovato LC, Hill K, Hertert S, Hunninghake DB, Probstfield JL. Recruitment for controlled clinical trials: literature summary and annotated bibliography. Control Clin Trials 1997;18:328-52.
- Cegala DJ, Marinelli T, Post D. The effects of patient communication skills training on compliance. Arch Fam Med 2000;9:57-64.
- Beastall GH. The impact of the new General Medical Services contract: national evidence. Bull R Coll Pathol 2004;128.
- Galloway MJ, Osgerby JC. An audit of the indications for the reporting of blood films: results from the National Pathology Benchmarking Study. J Clin Pathol 2006;59:479-81.
- Croal BD. Bull R Coll Pathol. 2007.
- Wolf JL, Starfield B, Anderson G. Prevalence, expenditures, and complications of multiple chronic conditions in the elderly. Arch Intern Med 2002;162:2269-76.
- Xiong T, McEvoy K, Morton DG, Halligan S, Lilford RJ. Resources and costs associated with incidental extracolonic findings from CT colonography: a study in a symptomatic population. Br J Radiol 2006;79:948-61.
- Axt-Adam P, van der Wouden JC, van der Does E. Influencing behaviour of physicians ordering laboratory tests: a literature study. Med Care 1993;31:784-94.
- Guthrie B. Why do general practitioners take blood? Eur J Gen Pract 2001;7.
- van der Weijden T, van Bokhoven MA, Dinant GJ, van Hasselt CM, Grol RP. Understanding laboratory testing in diagnostic uncertainty: a qualitative study in general practice. Br J Gen Pract 2002;52:974-80.
- Strauss A, Corbin J. Basics of qualitative research, grounded theory procedures. London: Sage; 1990.
- Kok G, de Vries H, Mudde NA, Strecher VJ. Planned health education and the role of self-efficacy: Dutch research. Health Edu Res 1991;6:231-8.
- Jefferis BJ, Power C, Manor O. Adolescent drinking level and adult binge drinking in a national birth cohort. Addiction 2005;100:543-9.
- Dantzer C, Wardle J, Fuller R, Pampalone SZ, Steptoe A. International study of heavy drinking: attitudes and sociodemographic factors in university students. J Am Coll Health 2006;55:83-9.
- Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science 1974;185:1124-31.
- Wileman L, May C, Chew-Graham CA. Medically unexplained symptoms and the problem of power in the primary care consultation: a qualitative study. Fam Pract 2002;19:178-82.
- Creating a patient-led NHS: delivering the NHS Improvement Plan. London: DoH; 2005.
- van der Weijden T, van VM, Dinant GJ, van Hasselt CM, Grol R. Unexplained complaints in general practice: prevalence, patients' expectations, and professionals' test-ordering behavior. Med Decis Making 2003;23:226-31.
- van Bokhoven MA, Pleunis-van Empel MC, Koch H, Grol RP, Dinant GJ, van der Weijden T. Why do patients want to have their blood tested? A qualitative study of patient expectations in general practice. BMC Fam Pract 2006;7.
- Summerton N. Positive and negative factors in defensive medicine: a questionnaire study of general practitioners. BMJ 1995;310:27-9.
- Bailey J, Jennings A, Parapia L. Change of pathology request forms can reduce unwanted requests and tests. J Clin Pathol 2005;58:853-5.
- Zaat JO, van Eijk JT, Bonte HA. Laboratory test form design influences test ordering by general practitioners in The Netherlands. Med Care 1992;30:189-98.
- Cramb R. Chemical Pathologist 2005.
- Cobbold JF, Anstee QM, Thomas HC. Investigating mildly abnormal serum aminotransferase values. BMJ 2010;341.
- Giannini EG, Testa R, Savarino V. Liver enzyme alteration: a guide for clinicians. CMAJ 2005;172:367-79.
- Limdi JK, Hyde GM. Evaluation of abnormal liver function tests. Postgrad Med J 2003;79:307-12.
- Theal RM, Scott K. Evaluating asymptomatic patients with abnormal liver function test results. Am Fam Physician 1996;53:2111-19.
- Butt AA. Hepatitis C virus infection: the new global epidemic. Expert Rev Anti Infect Ther 2005;3:241-9.
- Lavanchy D. Hepatitis B virus epidemiology, disease burden, treatment, and current and emerging prevention and control measures. J Viral Hepat 2004;11:97-107.
- Trepo C, Pradat P. Hepatitis C virus infection in Western Europe. J Hepatol 1999;31:80-3.
- Weber B, Taelem-Brule N, Berger A, Simon F, Geudin M, Ritter J. Evaluation of a new automated assay for hepatitis B surface antigen (HBsAg) detection VIDAS HBsAg Ultra. J Virol Methods 2006;135:109-17.
- Kanwal F, Farid M, Martin P, Chen G, Gralnek IM, Dulai GS, et al. Treatment alternatives for hepatitis B cirrhosis: a cost-effectiveness analysis. Am J Gastroenterol 2006;101:2076-89.
- Thompson-Coon J, Rogers G, Hewson P, Wright D, Anderson R, Cramp M, et al. Surveillance of cirrhosis for hepatocellular carcinoma: systematic review and economic analysis. Health Technol Assess 2007;11.
- Weinstein M, Fineberg H. Clinical decision analysis. Philadelphia, PA: WB Saunders; 1980.
- Lu SN, Wang JH, Kuo YK, Kuo HL, Chen TM, Tung HD, et al. Predicting the prevalence of antibody to hepatitis C virus (HCV) in a community by the prevalence of elevated levels of alanine aminotransferase: a method to identify areas endemic for HCV. Am J Trop Med Hyg 2002;67:145-50.
- Wang CS, Wang ST, Chou P. Using the prevalence of an elevated serum alanine aminotransferase level for identifying communities with a high prevalence of hepatitis C virus infection. Arch Intern Med 2001;161:392-4.
- Jamali R, Khonsari M, Merat S, Khoshnia M, Jafari E, Bahram KA, et al. Persistent alanine aminotransferase elevation among the general Iranian population: prevalence and causes. World J Gastroenterol 2008;14:2867-71.
- D’souza R, Foster GR. Diagnosis and treatment of chronic hepatitis B. J R Soc Med 2004;97:318-21.
- Persico M, Persico E, Suozzo R, Conte S, De SM, Coppola L, et al. Natural history of hepatitis C virus carriers with persistently normal aminotransferase levels. Gastroenterology 2000;118:760-4.
- Veldhuijzen IK, Toy M, Hahne SJ, De Wit GA, Schalm SW, de Man RA, et al. Screening and early treatment of migrants for chronic hepatitis B virus infection is cost-effective. Gastroenterology 2010;138:522-30.
- Introduction of hepatitis B vaccine into childhood immunization services: management guidelines, including information for health workers and parents. Geneva: WHO; 2001.
- International travel and health. Situation as on 1 January 2005. Geneva: WHO; 2005.
- International travel and health. Situation as on 1 January 2010. Geneva: WHO; 2010.
- Hutton DW, Tan D, So SK, Brandeau ML. Cost-effectiveness of screening and vaccinating Asian and Pacific Islander adults for hepatitis B. Ann Intern Med 2007;147:460-9.
- Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med 1998;17:857-72.
- Todd PM, Gigerenzer G. Precis of Simple heuristics that make us smart. Behav Brain Sci 2000;23:727-41.
- Poland GA, Jacobson RM. Clinical practice: prevention of hepatitis B with the hepatitis B vaccine. N Engl J Med 2004;351:2832-8.
- Castelnuovo E, Thompson-Coon J, Pitt M, Cramp M, Siebert U, Price A, et al. The cost-effectiveness of testing for hepatitis C in former injecting drug users. Health Technol Assess 2006;10.
- Stein K, Dalziel K, Walker A, Jenkins B, Round A, Royle P. Screening for hepatitis C in genito-urinary medicine clinics: a cost utility analysis. J Hepatol 2003;39:814-25.
- Nakamura J, Terajima K, Aoyagi Y, Akazawa K. Cost-effectiveness of the national screening program for hepatitis C virus in the general population and the high-risk groups. Tohoku J Exp Med 2008;215:33-42.
- Mortality statistics: deaths registered in 2008. London: ONS; 2008.
- Bedogni G, Miglioli L, Masutti F, Tiribelli C, Marchesini G, Bellentani S. Prevalence of and risk factors for nonalcoholic fatty liver disease: the Dionysos nutrition and liver study. Hepatology 2005;42:44-52.
- Clark JM, Brancati FL, Diehl AM. The prevalence and etiology of elevated aminotransferase levels in the United States. Am J Gastroenterol 2003;98:960-7.
- Caballeria L, Auladell MA, Toran P, Miranda D, Aznar J, Pera G, et al. Prevalence and factors associated with the presence of non alcoholic fatty liver disease in an apparently healthy adult population in primary care units. BMC Gastroenterol 2007;7.
- Hamaguchi M, Kojima T, Takeda N, Nakagawa T, Taniguchi H, Fujii K, et al. The metabolic syndrome as a predictor of nonalcoholic fatty liver disease. Ann Intern Med 2005;143:722-8.
- Browning JD, Szczepaniak LS, Dobbins R, Nuremberg P, Horton JD, Cohen JC, et al. Prevalence of hepatic steatosis in an urban population in the United States: impact of ethnicity. Hepatology 2004;40:1387-95.
- Bugianesi E, Leone N, Vanni E, Marchesini G, Brunello F, Carucci P, et al. Expanding the natural history of nonalcoholic steatohepatitis: from cryptogenic cirrhosis to hepatocellular carcinoma. Gastroenterology 2002;123:134-40.
- Neuschwander-Tetri BA, Clark JM, Bass NM, Van Natta ML, Unalp-Arida A, Tonascia J, et al. Clinical, laboratory and histological associations in adults with nonalcoholic fatty liver disease. Hepatology 2010;52:913-24.
- Soderberg C, Stal P, Askling J, Glaumann H, Lindberg G, Marmur J, et al. Decreased survival of subjects with elevated liver function tests during a 28-year follow-up. Hepatology 2010;51:595-602.
- Ahmed MH, Abu EO, Byrne CD. Non-Alcoholic Fatty Liver Disease (NAFLD): new challenge for general practitioners and important burden for health authorities? Prim Care Diabetes 2010;4:129-37.
- Alcohol: guidelines on sensible drinking. London: BMA; 1995.
- Kleiner DE, Brunt EM, Van NM, Behling C, Contos MJ, Cummings OW, et al. Design and validation of a histological scoring system for nonalcoholic fatty liver disease. Hepatology 2005;41:1313-21.
- McPherson S, Stewart SF, Henderson E, Burt AD, Day CP. Simple non-invasive fibrosis scoring systems can reliably exclude advanced fibrosis in patients with non-alcoholic fatty liver disease. Gut 2010;59:1265-9.
- Petersen KF, Dufour S, Feng J, Befroy D, Dziura J, Dalla MC, et al. Increased prevalence of insulin resistance and nonalcoholic fatty liver disease in Asian-Indian men. Proc Natl Acad Sci USA 2006;103:273-7.
- Marchesini G, Brizi M, Bianchi G, Tomassetti S, Bugianesi E, Lenzi M, et al. Nonalcoholic fatty liver disease: a feature of the metabolic syndrome. Diabetes 2001;50:1844-50.
- Wong VW, Wong GL, Chim AM, Tse AM, Tsang SW, Hui AY, et al. Validation of the NAFLD fibrosis score in a Chinese population with low prevalence of advanced fibrosis. Am J Gastroenterol 2008;103:1682-8.
- Tran TT, Changsri C, Shackleton CR, Poordad FF, Nissen NN, Colquhoun S, et al. Living donor liver transplantation: histological abnormalities found on liver biopsies of apparently healthy potential donors. J Gastroenterol Hepatol 2006;21:381-3.
- Angulo P. Long-term mortality in nonalcoholic fatty liver disease: is liver histology of any prognostic significance? Hepatology 2010;51:373-5.
- Harrison SA, Oliver D, Arnold HL, Gogia S, Neuschwander-Tetri BA. Development and validation of a simple NAFLD clinical scoring system for identifying patients without advanced disease. Gut 2008;57:1441-7.
- Feldstein AE, Charatcharoenwitthaya P, Treeprasertsuk S, Benson JT, Enders FB, Angulo P. The natural history of non-alcoholic fatty liver disease in children: a follow-up study for up to 20 years. Gut 2009;58:1538-44.
- Parkes J, Roderick P, Harris S, Day C, Mutimer D, Collier J, et al. Enhanced liver fibrosis test can predict clinical outcomes in patients with chronic liver disease. Gut 2010;59:1245-51.
- Poynard T, Lebray P, Ingiliz P, Varaut A, Varsat B, Ngo Y, et al. Prevalence of liver fibrosis and risk factors in a general population using non-invasive biomarkers (FibroTest). BMC Gastroenterol 2010;10.
- Castera L, Forns X, Alberti A. Non-invasive evaluation of liver fibrosis using transient elastography. J Hepatol 2008;48:835-47.
- Saadeh S, Younossi ZM, Remer EM, Gramlich T, Ong JP, Hurley M, et al. The utility of radiological imaging in nonalcoholic fatty liver disease. Gastroenterology 2002;123:745-50.
- Kotronen A, Peltonen M, Hakkarainen A, Sevastianova K, Bergholm R, Johansson LM, et al. Prediction of non-alcoholic fatty liver disease and liver fat using metabolic and genetic factors. Gastroenterology 2009;137:865-8.
- Freiberg MS, Cabral HJ, Heeren TC, Vasan RS, Curtis ER. Alcohol consumption and the prevalence of the Metabolic Syndrome in the US: a cross-sectional analysis of data from the Third National Health and Nutrition Examination Survey. Diabetes Care 2004;27:2954-9.
- Muris JW, Starmans R, Fijten GH, Crebolder HF, Schouten HJ, Knottnerus JA. Non-acute abdominal complaints in general practice: diagnostic value of signs and symptoms. Br J Gen Pract 1995;45:313-16.
- Harrison M. Pathology testing in the tired patient: a rational approach. Aust Fam Physician 2008;37:908-10.
- Dunn W, Xu R, Schwimmer JB. Modest wine drinking and decreased prevalence of suspected nonalcoholic fatty liver disease. Hepatology 2008;47:1947-54.
- Moriya A, Iwasaki Y, Ohguchi S, Kayashima E, Mitsumune T, Taniguchi H, et al. Alcohol consumption appears to protect against non-alcoholic fatty liver disease. Aliment Pharmacol Ther 2011;33:378-88.
Appendix 1 BALLETS study protocol, version 13.0, March 2010
Appendix 2 BALLETS study analysis
Appendix 3 BALLETS study: summary of ethics and substantial amendment approval
List of abbreviations
- A1AT: alpha-1 antitrypsin
- ALP: alkaline phosphatase
- ALT: alanine aminotransferase
- AMA: anti-mitochondrial antibody
- ANOVA: analysis of variance
- AST: aspartate aminotransferase
- AUC: area under the curve
- BALLETS: Birmingham and Lambeth Liver Evaluation Testing Strategies
- BMI: body mass index
- CI: confidence interval
- COREC: Central Office of Research Ethics Committees
- df: degrees of freedom
- FBC: full blood count
- FU1: first follow-up test
- FU2: second follow-up test
- GGT: gamma-glutamyltransferase
- GP: general practitioner
- HBV: hepatitis B virus
- HCC: hepatocellular carcinoma
- HCV: hepatitis C virus
- HTA: Health Technology Assessment
- ICC: intraclass correlation coefficient
- ICER: incremental cost-effectiveness ratio
- IQR: interquartile range
- LFT: liver function test
- NAFLD: non-alcoholic fatty liver disease
- NASH: non-alcoholic steatohepatitis
- NFS: NAFLD fibrosis score
- NIHR: National Institute for Health Research
- NPV: negative predictive value
- OR: odds ratio
- PBC: primary biliary cirrhosis
- PCP: primary-care practitioner
- PI: Principal Investigator
- PPV: positive predictive value
- PSC: primary sclerosing cholangitis
- Q–Q: quantile–quantile
- ROC: receiver operating characteristic
- SD: standard deviation
- SE: standard error
- SMA: smooth muscle antibody
- TFT: thyroid function test
- ULN: upper limit of normal
- USS: ultrasound scan
- WHO: World Health Organization
All abbreviations that have been used in this report are listed here unless the abbreviation is well known (e.g. NHS), or it has been used only once, or it is a non-standard abbreviation used only in figures/tables/appendices, in which case the abbreviation is defined in the figure legend or in the notes at the end of the table.