Notes
Article history
The research reported in this issue of the journal was commissioned and funded by the HTA programme on behalf of NICE as project number 11/118/01. The protocol was agreed in November 2012. The assessment report began editorial review in July 2013 and was accepted for publication in January 2014. The authors have been wholly responsible for all data collection, analysis and interpretation, and for writing up their work. The HTA editors and publisher have tried to ensure the accuracy of the authors’ report and would like to thank the reviewers for their constructive comments on the draft document. However, they do not accept liability for damages or losses arising from material published in this report.
Declared competing interests of authors
Aileen Clarke is a member of the NIHR HTA Editorial Board and Warwick Medical School receives payment for this work.
Permissions
Copyright statement
© Queen’s Printer and Controller of HMSO 2015. This work was produced by Clarke et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.
Chapter 1 Background
Description of the health problem
Arthritis is a general term that describes pain and inflammation within a joint. There are many causes, of which the most common is osteoarthritis (OA), a degenerative disease that has become a leading cause of pain and disability both in the UK and worldwide. 1 OA is a chronic syndrome of articular cartilage degeneration with associated synovitis and hypertrophic changes within bone. 2
Aetiology, pathology and prognosis
Osteoarthritis of the hip
The hip is a weight-bearing ball and socket joint that is commonly affected by OA. OA in the hip manifests itself as loss of articular cartilage, inflammation of synovial tissue and hypertrophy of the associated bone (e.g. osteophytes, bone sclerosis). The loss of cartilage tissue and new bone tissue growth suggests OA may result from disordered repair of cartilage damaged by mechanical and biochemical changes within the joint. 3
When the repair process is unable to keep up with the rate of tissue damage, the consequence is symptomatic OA characterised by pain, stiffness and progressive disability. 3
Osteoarthritis of the hip may be classified as primary or secondary. Secondary hip OA can be caused by most intra-articular diseases, including osteonecrosis, trauma, septic arthritis, Paget’s disease, hip dysplasia, Perthes’ disease and slipped upper femoral epiphysis. Primary hip OA is presumed when no other specific cause has been identified. 3
Rheumatoid arthritis of the hip
Rheumatoid arthritis (RA) is an autoimmune disease that commonly affects the synovial lining of peripheral joints, including those of the hand, foot and hip. RA is a multisystem disorder with implications for almost every region of the body, including the heart, lungs and eyes. 4 Multiple episodes of synovial inflammation lead to reduced articular cartilage (e.g. causing secondary OA), joint destruction and progressive disability. It has also been associated with reduced quality of life and premature mortality. 5–7
Rheumatoid arthritis manifests itself by gradual accumulation of structural changes within the joint, which can (particularly in late-stage disease) be detected by radiography or other imaging techniques. 5 In 2010, a joint working group of the American College of Rheumatology and the European League Against Rheumatism5 developed new criteria for identifying patients with early RA, which place more emphasis on characteristics associated with a high risk of later progression to severe and erosive disease.
Epidemiology of osteoarthritis and rheumatoid arthritis
Osteoarthritis is one of the most commonly encountered musculoskeletal diseases. There are an estimated 2.8 million patients with OA in the UK, based on symptomatic diagnosis in patients aged > 45 years. 8 A further 8.5 million people are estimated to be affected by joint pain that can be attributed to OA. 3
Current projections estimate that 10% of the world’s population aged ≥ 60 years will present with symptoms caused by OA. 9 The prevalence and incidence of OA, including hip OA, increase with age and are higher in women than in men after 50 years of age. 10,11 For example, the incidence rates of hip OA in men and women aged 70–79 years are estimated to be 430 and 600 per 100,000 person-years, respectively. 12
Estimates of age-standardised incidence rates of hip OA among women and men in Europe are about 53.3 and 38.1 per 100,000, respectively. 13 The prevalence of hip OA among Caucasians is demonstrably higher (range 3–6%) than in Asian, black and East Indian populations (≤ 1%). 14 In light of a longer life expectancy, an ageing population and increasing rates of obesity observed in developed countries, it is expected that both the incidence and the prevalence of OA will rise in future. 1,15,16
It is difficult to estimate the prevalence and incidence rates of OA accurately because of variable diagnostic criteria (e.g. radiographic, symptomatic or self-reported features). 10,17,18 For example, some patients with radiographic evidence of joint damage indicative of OA may not experience pain or disability whereas some patients with clinical OA may not demonstrate radiographic changes. These discrepancies make it challenging to determine the presence or absence of OA accurately. 10 In general, the prevalence of symptomatic or self-reported OA is higher than that of radiographic OA. 3
The prevalence of RA is estimated at 400,000 cases in the UK. Estimates of annual incidence suggest that 10,000–20,000 people develop RA in the UK each year. Although the disease may develop in patients at any age, onset classically occurs between the ages of 40 and 60 years. The incidence of RA is approximately two to three times greater in women than in men4 and approximately 10–40% of cases manifest within the hip. 19
Risk factors for osteoarthritis
Evidence suggests that contributing factors to OA can be classified broadly as:
- biomechanical (e.g. joint injury, reduced muscle strength)
- constitutional [e.g. advanced age (≥ 65 years), female sex, obesity and high bone density]
- genetic (high heritability estimates for OA).
Biomechanical factors are probably the most important cause and may explain both the relationship between OA and obesity as well as the tendency for OA to affect weight-bearing joints, for example the hips and knees. 2 Malalignment, instability and altered joint loading correlate with OA progression in both clinical and animal studies. 20,21 In the hip, the two types of femoroacetabular impingement are related to OA onset: ‘cam type’ impingement is a bump on the surface of the femoral head, typically affecting younger athletic men, whereas ‘pincer type’ impingement describes an overdeep acetabulum, which restricts the movement of the femoral head and typically affects middle-aged women. The prevalence of any type of congenital or acquired hip malformation is 4.3% in men and 3.6% in women. Similarly, epidemiological studies have demonstrated associations between certain occupational factors (e.g. long-distance running, farming, heavy physical work load) and hip OA. 22,23
However, biomechanical factors alone do not explain the onset of OA in non-weight-bearing joints (e.g. the carpometacarpal joints) and metabolic factors may also play a role. 2,24
Symptoms and diagnosis
Symptoms of hip OA include pain, stiffness and loss of function, that is, limited daily activities such as walking, climbing the stairs and performing household tasks. 1,11,19,25 The diagnosis of primary hip OA is usually based on history and clinical examination with particular assessment of joint pain, deformity and reduced range of movement. Physical examination can also exclude pain resulting from other causes, for example bursitis, tendonitis and muscle spasm. Plain radiographs of the hip are used to identify and stage OA.
Advanced imaging techniques such as magnetic resonance imaging (MRI) and computerised tomography can identify causes of secondary hip OA (e.g. stress fractures, osteonecrosis, Paget’s disease, inflammatory arthropathies) as well as evaluating and monitoring the extent of hip damage. 1,18
Natural history of osteoarthritis
The natural history of OA varies between affected joints but little is known about the natural history of the symptomatic disease. The prognosis of hip OA has been shown to be the least favourable, and hip OA is the most frequent reason for surgical intervention after 1–5 years of progression. 3 The national clinical guideline (CG) for OA3 states that hip OA has the worst outcome of all the OA sites discussed in the CG (hip, knee, hand). Occasionally, OA hips can improve without surgical intervention as measured by symptoms and radiographic change. 3 Comorbidity (e.g. diabetes, obesity, cardiovascular disease) may additionally influence the prognosis of OA, as does older age. 3
Impact of the health problem
Significance for patients in terms of ill health (burden of disease)
Osteoarthritis has a significant impact on an individual patient, resulting in pain, stiffness, limited mobility and reduced function. A UK-based survey assessed the impact of OA on daily living for 1762 people. 26 The majority of the sample consisted of people aged ≥ 50 years, of whom 75% were female. In total, 81% of respondents were found to have experienced constant pain and/or were limited in their ability to perform everyday tasks. Many respondents had visited a general practitioner three or four times before a diagnosis of OA, which was made on average 18 months after the onset of symptoms. Approximately 72% of respondents had comorbid conditions such as heart disease, diabetes and hypertension.
Significance for the NHS
The economic impact of arthritis consists of direct costs to health-care services and indirect costs because of lost productivity and early mortality. The impact of OA on health services and the UK economy has been substantial. The cost of treating OA has been estimated to be approximately £640 per person per year. 19 A report has suggested that, if one-tenth of the 15.2 people per 1000 who experience hip pain severe enough for surgery received medical and/or physical therapy, the cost to the NHS in England and Wales would be of the order of £48M per year in 2002. 19 The costs of both surgical and non-surgical interventions are reviewed in detail later in this chapter.
Because of the ageing of the population, OA is projected to become the fourth leading cause of disability worldwide by 2020. 3 In the present economic climate of tightening health-care spending, the implications of increasing demand for the treatment of arthritis of the hip have led to intense discussions about the cost-effectiveness of new technologies and treatment options.
Measurement of disease
More than 20 tools have been developed and validated for the assessment and monitoring of patient outcomes specific to hip arthritis. 27 One commonly used disease-specific tool is the Western Ontario and McMaster University Osteoarthritis Index (WOMAC). 28 This is a 24-item questionnaire that covers three domains of pain, stiffness and physical function, with a total score ranging from 0 (worst outcome) to 100 (best outcome). Other validated tools designed to measure outcomes specific to hip function and symptoms (e.g. disability, pain, range of motion, limitations in daily living and other activities) have also been used. 27,29
In the UK the most commonly used tools are the Oxford Hip Score (OHS)30 and the Harris Hip Score (HHS). 31
The Oxford Hip Score
The OHS is one of the most commonly used hip-specific measures. It was designed to assess function and pain in relation to daily activities (e.g. walking, dressing, sleeping) for patients undergoing total hip replacement (THR) surgery. 30 The OHS includes 12 multiple choice items and scores range from 0 (worst outcome) to 48 (best outcome).
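As a rough illustration of how a total of this kind is assembled, the sketch below sums 12 item responses. It assumes that each item is scored from 0 (worst) to 4 (best), which is what a 12-item questionnaire with a 0–48 range implies. The function name `oxford_hip_score` is hypothetical and this is not the official scoring software.

```python
def oxford_hip_score(item_scores):
    """Sum 12 item responses, each assumed to be scored 0 (worst) to 4 (best)."""
    if len(item_scores) != 12:
        raise ValueError("the OHS has 12 items")
    if any(not 0 <= score <= 4 for score in item_scores):
        raise ValueError("each item is assumed to be scored from 0 to 4")
    return sum(item_scores)


print(oxford_hip_score([4] * 12))  # best possible outcome: 48
print(oxford_hip_score([0] * 12))  # worst possible outcome: 0
```

Rejecting incomplete questionnaires, rather than imputing missing items, is a simplification made for this sketch only.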
The Harris Hip Score
The HHS is another frequently used tool. It includes 10 items (maximum score of 100 denoting ‘best possible outcome’) and consists of four domains: pain (severity, effect on activities, need for pain medication), function (daily activities – stair climbing, sitting, managing shoes/socks; gait – limp, support needed, walking distance), absence of deformity (hip flexion, abduction, internal rotation, extremity length) and range of motion (hip flexion, abduction, internal/external rotation and adduction). 31
Other commonly used measures include the Hip Disability and Osteoarthritis Outcome Score (HOOS),29 the Merle d’Aubigné and Postel hip score32 and the Lequesne Index of Severity for Osteoarthritis of the Hip (LISOH). 33–35
Current service provision
Management of disease
Treatment and management of arthritis in the UK can be categorised as non-surgical and surgical as detailed below. Patients in the early stages of OA begin treatment with non-surgical options; when non-surgical management has failed, patients are considered for intervention with surgical treatment.
Non-surgical management:
- self-management and patient education
- non-pharmacological (acupuncture, exercise, physical therapy, manual therapy, weight reduction)
- pharmacological [simple analgesics, non-steroidal anti-inflammatory drugs (NSAIDs), topical treatments, intra-articular steroid injections].
Surgical management:
- surgery [e.g. THR or resurfacing arthroplasty (RS), arthrodesis, arthroscopy, osteotomy].
Current service cost
Arthritis has a significant negative impact on the UK economy with an estimated total cost of 1% of gross national product. 36 It is the most common group of conditions for which people receive Disability Living Allowance in England. The benefits provided outweigh those provided for people diagnosed with heart disease, stroke, chest disease and cancer combined. 36 A reported £43M is spent annually on community services and £215M on social services for OA. 36 In 2002 an estimated 36 million workdays were lost because of OA, resulting in £3.2B of lost productivity. 36 Data for the numbers of people who have their symptoms managed by non-surgical interventions (such as pain relief, exercise, physical therapy and manual therapy) within England and Wales are difficult to ascertain.
Chen et al. 8 estimated the cost of topical and oral NSAIDs using prescribing data from 2005/6. They reported that an estimated 167,000 people with a diagnosis of OA had been prescribed topical NSAIDs and 1.4 million patients had been prescribed oral NSAIDs. The annual costs were £8.5M and £25M, respectively. 8 Adjusting for inflation, they found that this would equate to £19.2M and £25.65M, respectively, in 2010. Most health economic analyses have reported that surgery for the treatment of arthritis is a cost-effective intervention when judged by cost per quality-adjusted life-year (QALY) gained. 37
An earlier Health Technology Assessment (HTA) report (reference number 01/21/01)19 found that the annual cost to the NHS of elective hip replacement surgery for the treatment of OA was £140M and that each trust spent, on average, £257,000 on the purchase of hip prostheses in 1998/9. This study was conducted in 2002. 19 It reported that the cost to the NHS and social services of non-surgical treatment for an individual was approximately £640 per person per year. During the year 2000, £405M was spent on 44,000 hip and 35,000 knee replacements. 36 Since then the costs have increased substantially, as the estimated cost to the NHS of THR surgery alone in 2011 was reported to be £426M. 36
The cost of one surgical treatment in 2002 was £3891, averaged across all NHS trusts in 1999/2000, with the cost for 50% of trusts falling within the range £3404–4434.23. 19 According to the 8th Annual Report of the National Joint Registry for England and Wales (NJR),36 the cost of hip replacement surgery varies considerably from trust to trust in the UK, with no set national price for implants. The cost depends considerably on length of hospital stay. For example, the tariff reimbursement paid to a trust in one study in 2005/6 for a primary THR was £6000 whereas, in 2010, the national tariff was set at £5552 for an uncomplicated THR. 36
When hip replacement surgery fails, revision surgery to replace part or all of the prosthetic hip joint may be required. The number of revision procedures has increased in recent years, with 3012 carried out in 2003/4, rising to 6581 by 2008/9. 36 This accounted for approximately 9.4% of all elective hip replacement procedures performed in England and Wales. 36 Revision surgery is also a key element of the current service expenditure, with unit costs of revision generally higher than those for primary surgery. Briggs et al. 38 reported a mean cost for a standard hip revision procedure in 2000/1 as £5294 (£6385 in 2008 prices) compared with £3889 (£4690 in 2008 prices) for a primary procedure. The 2002 HTA report19 stated that, in 1989/90, one in seven of all procedures (5000 out of a total of 35,000) was a revision of a hip replacement. In 1999/2000 a crude estimate of 6700 revisions was reported. 19
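The two pairs of figures from Briggs et al. imply roughly the same price uplift between 2000/1 and 2008. The sketch below only back-calculates that ratio from the numbers quoted above. It does not reproduce whichever inflation index the authors actually used, and the variable names are illustrative.

```python
# Mean costs quoted above, in pounds: (2000/1 price, 2008 price)
costs = {
    "revision": (5294, 6385),
    "primary": (3889, 4690),
}

for procedure, (price_2000, price_2008) in costs.items():
    uplift = price_2008 / price_2000  # implied inflation factor
    print(f"{procedure}: implied uplift factor {uplift:.3f}")

# Ratio of revision cost to primary cost, both in 2008 prices
revision_to_primary = costs["revision"][1] / costs["primary"][1]
print(f"revision costs about {revision_to_primary:.2f} times a primary procedure")
```

Both procedures give an implied factor of about 1.21, so the two conversions are consistent with each other, and a revision costs roughly one-third more than a primary procedure.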
Randomised controlled trials (RCTs) have compared revision rates across prosthesis types but with insufficient sample sizes or durations of follow-up to produce conclusive results. 39 The largest observational study found that 7-year revision rates were lower for cemented (3.0%) than for hybrid (3.8%) or cementless (4.6%) prostheses. 36 Edlin et al. 40 reported that a total of 97% of UK hip replacements are still working (unrevised) at 5 years.
Variation in services and uncertainty about best practice
Outcomes for hip replacement surgery vary by geographical location, surgeon and hospital. The Global Orthopaedic Registry has shown that patient selection criteria vary between practitioners, surgeons and referring doctors and between countries. 41 Nationally, there are reported inconsistencies in the treatment, procedure and prostheses that are offered to patients in the NHS. 42
In 1998 more than 60 hip prostheses manufactured by 19 companies were available commercially in the UK, with total NHS expenditure of approximately £53M. 43 By 2008 this had risen to 124 brands of acetabular cups and 137 brands of femoral stems at a cost of £67M. 36 This represents a substantial increase in the variety of available prostheses in recent years. Implants are often grouped into cemented, cementless and hybrid prostheses. 44 The reported increasing use of cementless components in the UK has contributed to a doubling of prosthesis costs between 1996 and 2006. 44
There is variation in the rate of primary hip replacement expenditure in England per 1000 population weighted by age, sex and need. For example, hip RS accounts for 6% of the approximate 70,000 hip arthroplasty operations conducted in England and Wales every year, although the equivalent figure among men aged < 55 years is 33%. 40
Spend also varies significantly between regions in the UK, with the lowest reported in Tower Hamlets (£560) and the highest in Devon (£8140). 42 When examining data by local authority, the difference in the rate of provision of hip replacements per 1000 people in need was almost 14-fold. 42 National European Quality of Life-5 Dimensions (EQ-5D) data after hip replacement for England and Wales show that variation between the best and worst trusts is large (31–49%) and cost-effectiveness varies considerably between hospitals. 45
Relevant national guidance
In the UK, the National Collaborating Centre for Chronic Conditions (NCC-CC) of the Royal College of Physicians developed clinical practice guidelines for OA. 3 The National Institute for Health and Care Excellence (NICE) developed clinical guidance on the selection of prostheses for primary THR46 and metal-on-metal hip RS. 25
Summary of National Institute for Health and Care Excellence guidance on the selection of prostheses for primary total hip replacement
The 2000 technology appraisal (TA) guidance, TA2,46 stated that the ‘benchmark’ for the selection of prostheses for THR should be a revision rate of ≤ 10% at 10 years with evidence relating to data from adequately sized, well-conducted observational studies or RCTs. NICE recommended that various patient factors, including age and underlying pathology, should be taken into account when choosing prostheses, for example ease of revision (of particular importance for younger patients).
Specific recommendations on the selection of hip prostheses for primary THR were considered difficult to construct because the evidence base was generally poor and difficult to interpret. However, the available evidence supported the use of a range of cemented prostheses for primary THR. This was further supported by the evidence on immediate and long-term postoperative pain.
There are currently no cost-effectiveness data based on revision rates after ≥ 10 years of follow-up to support the use of the generally more costly cementless and hybrid hip prostheses. Some evidence suggested that these types of prostheses might lead to less bone loss, meaning that they were potentially easier to revise than cemented prostheses. However, no reliable evidence was available to support the proposition that the potential ease of revision of a hip prosthesis would outweigh its poorer revision rate.
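The benchmark stated in this guidance (a revision rate of ≤ 10% at 10 years) reduces to a simple threshold test. The following is a minimal sketch, assuming that the revision rate is expressed as a proportion. The function name and the example inputs are hypothetical and are not taken from any registry.

```python
NICE_BENCHMARK_RATE = 0.10  # revision rate of 10% or less at 10 years


def meets_ten_year_benchmark(revision_rate_at_10_years):
    """Return True if a 10-year revision rate (as a proportion) is within the benchmark."""
    if not 0 <= revision_rate_at_10_years <= 1:
        raise ValueError("revision rate must be a proportion between 0 and 1")
    return revision_rate_at_10_years <= NICE_BENCHMARK_RATE


# Hypothetical prostheses, for illustration only
print(meets_ten_year_benchmark(0.08))  # within the benchmark
print(meets_ten_year_benchmark(0.12))  # outside the benchmark
```

The sketch deliberately accepts only a 10-year rate, because rates from shorter follow-up cannot by themselves show that the benchmark has been met.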
Summary of National Institute for Health and Care Excellence guidance on the use of metal-on-metal hip resurfacing arthroplasty
In the June 2002 NICE guidance TA44,25 metal-on-metal hip RS was recommended as one option for people with advanced hip disease who would otherwise receive, and are likely to outlive, a conventional primary hip replacement. It did note, however, that the current evidence was principally in individuals aged < 65 years and that surgeons should bear this in mind. Furthermore, the guidance stated that all patients receiving this arthroplasty should be made aware of the relative paucity of evidence for medium- to long-term safety and reliability and the likely outcome of revision surgery compared with that for conventional THR.
However, in June 2012 advice about follow-up of patients receiving a metal-on-metal articulation changed. The Medicines and Healthcare products Regulatory Agency (MHRA) issued a medical device alert47 stating that a small number of patients implanted with these hips might be at risk of developing progressive soft tissue reactions to the wear debris associated with metal-on-metal articulations; this updated the original advice of April 2010. These reactions could also adversely affect the results of later revision surgery. However, it also stated that its evidence pointed to the fact that early revision of such poorly performing metal-on-metal articulations should give a better revision outcome. Therefore, the agency advised that clinicians should perform appropriate follow-up, depending on which group a patient’s hip surgery fitted into, as well as whether the patient was symptomatic or asymptomatic. Follow-up, if indicated, should consist of both imaging (MRI or ultrasound) and blood metal ion tests [ion level greater than seven parts per billion indicates the potential for soft tissue reaction]. Revision should be considered if imaging is abnormal and/or blood metal ion levels are rising.
Summary of Medicines and Healthcare products Regulatory Agency alert advice
Metal-on-metal hip RS implants:
- symptomatic: follow-up annually for life of implant
- asymptomatic: follow-up according to local protocols – no need for investigations unless cause for concern about cohort or patients who become symptomatic.
Metal-on-metal THRs with a head diameter < 36 mm:
- symptomatic: follow-up annually for life of implant
- asymptomatic: follow-up according to local protocols – no need for investigations unless cause for concern about implant.
Metal-on-metal THRs with a head diameter ≥ 36 mm:
- annual follow-up for life of implant whether symptomatic or not.
DePuy ASR™ hip replacements (all types) (DePuy, West Chester, PA, USA):
- annual follow-up for life of implant whether symptomatic or not.
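The four groups above can be read as a small decision rule. The sketch below encodes only the follow-up frequency as summarised in this section. The function name, the implant labels ("resurfacing", "thr", "asr") and the return strings are hypothetical, and the code is an illustration, not clinical guidance.

```python
ANNUAL = "annual follow-up for life of implant"
LOCAL = "follow-up according to local protocols"


def mhra_follow_up(implant, symptomatic, head_diameter_mm=None):
    """Return the follow-up advice for a metal-on-metal implant.

    implant: "resurfacing", "thr" (metal-on-metal THR) or "asr" (DePuy ASR, all types)
    head_diameter_mm: needed for "thr" only
    """
    if implant == "asr":
        return ANNUAL  # whether symptomatic or not
    if implant == "thr":
        if head_diameter_mm is None:
            raise ValueError("head diameter is needed for a metal-on-metal THR")
        if head_diameter_mm >= 36:
            return ANNUAL  # whether symptomatic or not
        return ANNUAL if symptomatic else LOCAL
    if implant == "resurfacing":
        return ANNUAL if symptomatic else LOCAL
    raise ValueError(f"unknown implant type: {implant!r}")


print(mhra_follow_up("thr", symptomatic=False, head_diameter_mm=28))
print(mhra_follow_up("thr", symptomatic=False, head_diameter_mm=36))
```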
National Institute for Health and Care Excellence guidance on the care and management of osteoarthritis in adults
The most recent NICE guidance on OA, issued in February 2008,3 stresses the importance of a holistic assessment of the patient, including his or her function, quality of life, occupation, mood, relationships and leisure activities. After this assessment, the clinician is advised to formulate and agree a management plan with the patient, which should include ‘core treatments’ such as education, muscle strengthening and aerobic exercise, and weight-loss programmes for the overweight or obese. It should also include other self-management and ‘conservative’ strategies such as application of heat/cold packs or transcutaneous electrical nerve stimulation to the site of pain, manipulation and stretching (particularly for hip OA) and assessment for bracing/joint supports/insoles/walking sticks.
Adjuncts to the above ‘core’ treatment could include pharmacological treatments, in particular paracetamol (regular dosing may be required) and topical NSAIDs or topical capsaicin (although topical treatments are less useful for hips). If these are found to be insufficient for relieving pain, practitioners are advised to consider adding opioid analgesics or oral NSAIDs. Intra-articular corticosteroid injections are recommended for moderate to severe pain. Clinicians are advised to consider a referral for joint surgery if the patient has already been offered the ‘core’ treatments and is still experiencing joint symptoms that have a substantial impact on quality of life.
The Orthopaedic Data Evaluation Panel
The Orthopaedic Data Evaluation Panel (ODEP) was established to provide an independent assessment of clinical evidence, submitted by suppliers, on the compliance of their implants for THR and hip RS with NICE benchmarks for safety and effectiveness. ODEP produced detailed criteria for this assessment and in 2010 there was an ongoing review of this guidance by all stakeholders. 36 ODEP does have to rely on the honesty of the submitting companies and therefore provides no warranty that the data in its database are accurate, complete or current.
For 10-year benchmark products (those recommended to last for 10 years), ODEP places them in one of four categories according to whether there is evidence that a product meets NICE guidelines:
- level A – strong evidence that product meets NICE guidelines
- level B – reasonable evidence that product meets NICE guidelines
- level C – weak evidence that product meets NICE guidelines
- unacceptable – unacceptable evidence that product meets NICE guidelines.
For products that fail to meet NICE’s 10-year benchmark, ODEP looks at evidence at 3, 5 and 7 years. Again, these products are split according to whether there exists acceptable, weak or unacceptable evidence for the product meeting NICE guidelines.
As of March 2011, ODEP ratings had been given to 38% of available brands of femoral stems and 41% of available brands of acetabular cups used in primary procedures. However, 42% of available brands of acetabular cups and 47% of available brands of femoral stems being used in England had not yet submitted data to ODEP. Clearly, for surgeons to make the most informed choices, it is important that all manufacturers submit their product data to ODEP using the pro forma and associated guidelines.
Description of the technology under assessment
Summary of total hip replacement
The predominant surgical intervention for the treatment of arthritis in England and Wales is THR, using a variety of cemented or uncemented stemmed femoral prostheses articulating with a cup that fits into the acetabulum. In 2011, 80,314 hip procedures were carried out in England and Wales; this rose to 88,599 in 2012. 48 THR has been so successful in treating hip OA that it has been described as the operation of the 20th century. 49 The average age of patients undergoing a hip replacement in 2010 was 67.2 years. There was a 3% increase in the percentage of women undergoing a THR in 2010/11 (59%) compared with 2009. On average, female patients were older than male patients at the time of their THR (68.8 years and 66.3 years, respectively). 36
Modern THR began in the 1970s with widespread use of the Charnley prosthesis (DePuy, West Chester, PA, USA). More than 80,000 procedures are performed every year in England and Wales, with excellent clinical outcomes showing > 95% implant survivorship at 10 years’ follow-up and > 80% implant survivorship at 25 years’ follow-up. 41
Rates for primary and revision THR have been increasing, with a 16% increase recorded in the UK between 2005 and 2010. 41 Although rates are 1.5–2 times higher for women than for men, THR is becoming more common for both sexes and for those in younger age groups. The greatest proportion of procedures (65%) is carried out in patients aged ≥ 65 years. However, the proportion of patients undergoing THR who are aged < 65 years is projected to increase to 50% of all arthroplasties by 2030. 41
The decision to undertake THR is guided by symptoms (pain, functional impairment) and by physical examination and radiographic findings. Patients presenting with hip pain will follow a care pathway similar to the one presented in the following section.
In the early stages, non-surgical treatment options will be provided such as exercise and physical therapy. Non-surgical options are used until the point at which they are deemed to have failed. The patient is then referred to an orthopaedic specialist for secondary assessment and possible surgical intervention. Indications for THR surgery in the UK are:
- OA (93%)
- avascular necrosis (2%)
- fractured neck of femur (2%)
- congenital dislocation (2%)
- inflammatory arthropathy (1%). 48
The success of surgical intervention can be influenced through patient selection. Assessment of patient and prosthesis outcomes is necessary to identify which designs or surgical techniques provide the best patient benefit. Relative contraindications to THR include severe obesity, advanced age and other medical comorbidities. There is a reported 40% increased risk of complications for every decade above the age of 65 years. 41 THR in younger patients, who are typically more active, is problematic because of the risk of poor prosthesis survivorship over a patient’s lifetime. Waiting time for surgery should also be considered as it can be an important factor in patient outcomes following THR. Under the current waiting time targets, people in England should not have to wait longer than 18 weeks for their hip replacement surgery once it has been recommended.
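To give a sense of scale for the reported 40% increase in complication risk per decade above 65 years, the sketch below converts it into a relative risk. It assumes that the increase compounds multiplicatively from decade to decade and can be interpolated between decades. The source states neither assumption, and the function name is hypothetical.

```python
def relative_complication_risk(age_years, increase_per_decade=0.40, reference_age=65):
    """Relative risk versus a 65-year-old, assuming the increase compounds per decade."""
    decades_over = max(0.0, (age_years - reference_age) / 10)
    return (1 + increase_per_decade) ** decades_over


for age in (65, 75, 85):
    print(age, round(relative_complication_risk(age), 2))
```

Under these assumptions a 75-year-old has 1.4 times, and an 85-year-old about 1.96 times, the complication risk of a 65-year-old. An additive reading of the same statement would instead give 1.8 times at 85 years.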
Example patient care pathway for hip arthroplasty
Figure 1 presents a typical care pathway for patients treated for arthritis in the NHS. In general, patients would be treated in primary care services and undergo various non-surgical management options. Once non-surgical management is said to have failed, the patient is classified as having end-stage arthritis and is recommended for surgery in secondary care.
Figure 2 presents the two surgical options THR and hip RS. The care pathways are similar in terms of pre- and postoperative care and follow-up.
Identification of different types of total hip replacement
The different types of THR can be categorised into the following subgroups:
- hip replacement with different fixation methods for implant components (cemented, cementless, hybrid or reverse hybrid prostheses)
- hip replacement with implant components (i.e. femoral stem, femoral head, acetabular cup) made from different materials (metal, ceramic, polyethylene)
- hip replacement with differing femoral head sizes.
Hip replacement with different fixation methods
Hip replacement prostheses can be categorised by their fixation method (Figure 3) as (a) cemented, (b) cementless, (c) reverse hybrid with a cemented cup and cementless stem or (d) hybrid with a cemented stem and cementless cup. Cemented prostheses are held in place with bone cement and generally consist of three components: a femoral stem, a femoral head (modular) and an acetabular cup. These components are permanently attached to the pelvis and the femur. According to the NJR, the percentage of cemented procedures had been in decline since 2005: the figure was 77% in 2004 and had fallen to 50% by 2010, although it did not change between 2009 and 2010. 36
Cementless prostheses rely on initial press-fit fixation followed by natural bone growth. They typically consist of four components: a femoral stem, a femoral head, an acetabular cup shell and an acetabular liner. The theoretical benefit of the cementless fixation is the possibility of bone–implant interface (human : technology) remodelling. In England and Wales there has been a 4% increase in cementless procedures in recent years. 36
The cementless prostheses include implant components coated in a porous material (hydroxyapatite) that is compatible with bone growth and helps to secure the liner in place. Hydroxyapatite is a mineral form of calcium apatite. 50 It is also commonly used as a filler to replace amputated bone, in addition to its use as a coating to promote bone ingrowth into prosthetic implants.
A hybrid hip replacement consists of a cemented femoral stem and a cementless acetabular cup, whereas the reverse hybrid uses a cementless femoral stem and a cemented acetabular cup. In 2010, 14% of these types of procedure were reverse hybrid (cementless stem, cemented acetabulum) and 86% were standard hybrid (cemented stem, cementless acetabulum). 36
Hip replacement with components made from different materials
The combinations of prosthetic components that are available are listed in Table 1. The different materials used for the implant components (i.e. femoral stem, femoral head, acetabular cup) produce various articulating surfaces or bearing surfaces.
Femoral head (press-fit) | Fixation method | Femoral stem | Acetabular cup^a | Acetabular cup shell | Acetabular liner |
---|---|---|---|---|---|
THR articulation type | | | | | |
Metal | Cemented | Metal | Polyethylene | – | – |
Metal | Cemented | Metal | Metal | – | – |
Ceramic | Cemented | Metal | Polyethylene | – | – |
Ceramic | Cemented | Metal | Ceramic | – | – |
Ceramic | Cementless | Metal | – | Metal | Ceramic |
Metal | Cementless | Metal | – | Metal | Polyethylene |
Metal | Cementless | Metal | – | Metal | Metal |
Ceramic | Hybrid (cemented femoral stem and a cementless acetabular cup) | Metal | – | Metal | Ceramic |
Ceramic | Hybrid (cemented femoral stem and a cementless acetabular cup) | Metal | – | Metal | Polyethylene |
Metal | Hybrid (cemented femoral stem and a cementless acetabular cup) | Metal | – | Metal | Metal |
Metal | Hybrid (cemented femoral stem and a cementless acetabular cup) | Metal | – | Metal | Polyethylene |
Metal | Reverse hybrid (cementless femoral stem and a cemented acetabular cup) | Metal | Polyethylene | – | – |
Metal | Reverse hybrid (cementless femoral stem and a cemented acetabular cup) | Metal | Metal | – | – |
Ceramic | Reverse hybrid (cementless femoral stem and a cemented acetabular cup) | Metal | Polyethylene | – | – |
Ceramic | Reverse hybrid (cementless femoral stem and a cemented acetabular cup) | Metal | Ceramic | – | – |
RS articulation type | | | | | |
– | Cemented | Metal | Metal | – | – |
– | Cementless | Metal | Metal | – | – |
– | Hybrid | Metal | Metal | – | – |
The 2011 NJR report 36 provided the percentage use of each fixation type during 2010 and 2011 (Table 2). Cemented fixation was the most popular method and, among cemented bearing surfaces, the polyethylene-on-metal articulation combination was used most often (86.1%). Cementless fixation was the second most common method and, within it, the polyethylene-on-metal combination was again the most popular (35.6%).
Articulation combination (cup material-on-femoral head material) | Cemented (n = 132,511) | Cementless (n = 102,688) | Hybrid^a (n = 43,933) | All (n = 279,132) |
---|---|---|---|---|
Other/unknown | 2.9 | 5.7 | 3.8 | 4.0 |
Ceramic-on-ceramic | 1.8 | 25.6 | 15.1 | 12.6 |
Polyethylene-on-ceramic | 8.4 | 14.2 | 11.7 | 11.0 |
Metal-on-metal | 0.9 | 18.9 | 3.0 | 7.9 |
Polyethylene-on-metal | 86.1 | 35.6 | 66.5 | 64.4 |
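As a rough consistency check on the table above, the 'All' column can be reproduced as an average of the three fixation-type columns weighted by their sample sizes. The sketch below assumes that this is how the pooled column was derived (the report does not state it); the names `GROUP_SIZES`, `ROWS` and `pooled_percentage` are illustrative only. Because the inputs are already rounded to one decimal place, the recomputed values agree with the reported ones only to within about 0.1 percentage points.

```python
# Subgroup sizes and percentages transcribed from the table above.
GROUP_SIZES = {"cemented": 132_511, "cementless": 102_688, "hybrid": 43_933}

# Each row: (cemented %, cementless %, hybrid %, reported "All" %).
ROWS = {
    "Other/unknown": (2.9, 5.7, 3.8, 4.0),
    "Ceramic-on-ceramic": (1.8, 25.6, 15.1, 12.6),
    "Polyethylene-on-ceramic": (8.4, 14.2, 11.7, 11.0),
    "Metal-on-metal": (0.9, 18.9, 3.0, 7.9),
    "Polyethylene-on-metal": (86.1, 35.6, 66.5, 64.4),
}


def pooled_percentage(cemented, cementless, hybrid):
    """Sample-size-weighted average of the three subgroup percentages."""
    total = sum(GROUP_SIZES.values())
    weighted = (
        cemented * GROUP_SIZES["cemented"]
        + cementless * GROUP_SIZES["cementless"]
        + hybrid * GROUP_SIZES["hybrid"]
    )
    return weighted / total


# Recompute the pooled percentage for every articulation combination.
pooled = {name: pooled_percentage(*row[:3]) for name, row in ROWS.items()}

for name, row in ROWS.items():
    print(f"{name}: recomputed {pooled[name]:.2f}%, reported {row[3]}%")
```

The subgroup sizes sum to the reported total of 279,132 procedures, which supports reading the 'All' column as a pooled figure across the three fixation types.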
Another way of characterising the variation of combination of articulation surface and fixation method is by frequency of use, as reported in the NJR. The most common combinations are listed in Table 3 along with the associated acronyms that have been used in the remainder of this report.
Implant characteristics | Acronym used in the report^a |
---|---|
Metal head (cemented stem) on cemented polyethylene cup | CeMoP |
Metal head (cementless stem) on cementless hydroxyapatite-coated metal cup (polyethylene liner) | CeLMoP |
Ceramic head (cementless stem) on cementless hydroxyapatite-coated metal cup (ceramic liner) | CeLCoC |
Hybrid metal head (cemented stem) on cementless hydroxyapatite-coated metal cup (polyethylene liner) | HyMoP |
Metal head (cementless stem) on cementless non-HA-coated metal cup (polyethylene liner) | CeLMoP (non-HA) |
Ceramic head (cemented stem) on cemented polyethylene cup | CeCoP |
Hybrid metal head (cemented stem) on cementless non-HA-coated metal cup (polyethylene liner) | HyMoP (non-HA) |
Polyethylene-on-metal (cup material-on-femoral head material)
A metal ball with polyethylene cup (or polyethylene liner inside a metal cup) (Figure 4) is the most common type of articulation combination (both cemented and cementless) and is one of the cheapest. The Charnley low-friction arthroplasty was the first widely accepted polyethylene-on-metal prosthesis to be used. It has a high reported implant survivorship at > 20 years’ follow-up (> 80%) and at 35 years’ follow-up (78%). 41 It also provides the baseline against which new prosthetic designs are compared. In England and Wales this was the most common articulation type used during 2010 and 2011 (see Table 2). Clinical advice suggested that, if a metal cup is used with a polyethylene liner, a cementless cup fixation is most commonly used in England, and the cementing of the metal cup is increasingly rare. Highly cross-linked polyethylene is being used by some surgeons in place of standard polyethylene in THRs because of its lower reported wear rates. 51,52
Polyethylene-on-ceramic
The polyethylene-on-ceramic option combines a polyethylene cup with a hard ceramic femoral head (Figure 5). This articulation type is reported to have a lower wear rate than the polyethylene-on-metal combination and is cheaper than the ceramic-on-ceramic option. It is used more often with a cementless fixation (14.2%) than with a cemented fixation (8.4%) (see Table 2). The ceramic head is harder than metal and hence reportedly withstands more wear. In the past ceramics were brittle and cracked, leading to failure of the implant, but advances in technology have limited this problem in recent years.
Ceramic-on-ceramic
The ceramic-on-ceramic articulation (Figure 6) provides the hardest bearing surface combination and is generally the most expensive combination available. 40 This combination has a lower reported wear rate than other options available to patients in England and Wales. The ceramic-on-ceramic articulation is mostly used without cement, as shown in Table 2 (25.6% cementless vs. 1.8% cemented). Clinical advice suggests that the cementless ceramic cup is the most common type of ceramic-on-ceramic articulation in England; cementing the ceramic cup is increasingly rare, as demonstrated in the NJR data. 36
Metal-on-metal
Metal-on-metal articulations (Figure 7) provide a hard bearing surface; however, because of their reportedly high revision rate, the MHRA has made recommendations for following up patients implanted with such devices. 47
The MHRA recommendations apply to four groups of metal-on-metal replacements:
- metal-on-metal hip RS implants
- metal-on-metal THRs with a head diameter < 36 mm
- metal-on-metal THRs with a head diameter ≥ 36 mm
- DePuy ASR hip replacements comprising:
  - ASR acetabular cups for hip RS or THR
  - ASR surface replacement heads for hip RS
  - ASR XL femoral heads for THR.
Revision is necessary when prostheses fail. Failure is more common in younger patients and is usually the result of loosening secondary to wear or dislocation. Interestingly, metal-on-metal bearing surfaces were actually designed by surgeons to reduce the proportion of replacements that require revision. They had been extensively assessed in simulator tests and were noted to be highly resistant to wear, even when used in very large head sizes. 53
Head size is important because in simulator tests larger head sizes give lower wear as the boundary lubrication regime becomes more favourable. 54 Implantation of large diameter metal-on-metal bearing surfaces on stemmed prostheses therefore became popular on the basis of such evidence, which suggested that they should result in less wear and thus lower failure rates. They seemed to be particularly appropriate for younger, more active patients.
However, several issues have arisen with the practical use of these metal-on-metal prostheses. It soon emerged that one brand of metal-on-metal prosthesis, the DePuy ASR, actually seemed to fail early. 55 Data received from the company56 showed that 5 years after surgery 12% of patients who received the ASR RS and 13% of patients who received the ASR THR required revision surgery.
This prompted recent analysis of NJR data on 402,051 hip replacements to assess whether metal-on-metal bearing surfaces lead to increased implant survival compared with other bearing surfaces in stemmed THR. 16 These authors additionally challenged the previous evidence that larger head sizes result in improved implant survival.
The results revealed that, in THR, metal-on-metal articulations failed at higher rates than other bearings. For example, 5-year revision rates in younger women were 6.1% [95% confidence interval (CI) 5.2% to 7.2%] for 46-mm metal-on-metal articulations compared with 1.6% (95% CI 1.3% to 2.1%) for 28-mm polyethylene-on-metal articulations. This effect was found even though the ASR data had been removed before analysis (the DePuy ASR articulations had already been removed from the market). Thus, it is a problem with all metal-on-metal prostheses, not an implant-specific characteristic. In addition, their failure was found to be related to head size, with larger heads failing earlier than smaller versions (this effect was the opposite to that found for ceramic-on-ceramic articulations). The authors suggested a number of potential reasons for the finding that larger metal heads fail earlier, such as a failure to achieve optimum lubrication or trunnion (post that inserts into head) wear55 resulting in metal debris leading to local soft tissue reactions57 or early loosening because of increased transmitted torque from the larger head. The authors of the paper therefore suggested that metal-on-metal replacements not be performed because of poor implant survival and that patients undergo at least an annual review with both clinical and radiological examination, in line with the MHRA recommendations. 47
Furthermore, there are the potential dangers of exposure to metals such as chromium and cobalt. Metal alloys used in metal-on-metal bearings degrade through wear, from corrosion or because of a combination of the two. 58 Consequently, they produce a vast number of nanometre- to submicrometre-sized metal particles that cumulatively present a large surface area for corrosion. 59 This is also relevant to the polyethylene-on-metal bearings, which also produce such particles through wear. The consequences of local and systemic exposure to the wear particles and the accompanying biologically active corrosion products have been extensively researched. 60 It is well known that metal debris can induce adverse local soft tissue reactions41 including the release of inflammatory cytokines from macrophages, histiocytosis, fibrosis and necrosis. 61 Local results include aseptic loosening because of osteolysis induced by some immunological reaction involving hypersensitivity62 and local pseudotumours (soft tissue masses relating to the joint) that are locally destructive and require revision surgery in the majority of patients. 63
Furthermore, it seems that metals can disseminate through the body and cause direct damage to end organs such as the kidneys, lungs and brain. 64,65 There is also evidence of genotoxicity and evidence that these metals can signal across biological barriers at concentrations produced after THR. 66 The genotoxic effects of the metal ions are thought to be mediated either by direct action, causing DNA breaks through free radical attack, or through an indirect effect by inhibiting the repair of DNA. 67 There have been concerns that this genotoxicity could cause a long-term increased risk of malignancy, which is particularly important for the younger, more active patients in whom life expectancy after implantation is long. However, recent studies have failed to find this increase68 and some have actually found a decrease in the numbers of certain malignancies in metal-on-metal articulation patients. 69
The US Food and Drug Administration (FDA),70 the UK MHRA47 and the British Orthopaedic Association71 have released statements of concern about metal-on-metal articulations. The MHRA recommendation states that patients with metal-on-metal bearings and a painful hip joint should have yearly measurements of whole blood metal ion concentrations and radiographic assessment to exclude adverse local tissue reactions as the source of pain. 47 These yearly assessments should continue for the lifetime of the hip replacement. The use of metal-on-metal bearing surfaces has consequently declined in England and Wales. In 2010/11 only 7.9% of all procedures used a metal-on-metal implant (see Table 2). However, data suggest that they are still being used extensively in other countries. For example, in the USA, 35% of articulations were metal-on-metal in 2009. 72
Hip replacement with differing femoral head sizes
Research has suggested that differing femoral head sizes lead to variation in the rate of revision. Smith et al. 16 reported that the use of larger head sizes (> 36 mm in diameter) improves stability and range of motion compared with the smaller head diameters that are used with other bearing surfaces. Use of large diameter femoral heads increases the distance that the head must travel before dislocation, without decreasing hip range of motion, thus increasing stability. 41
Summary of hip resurfacing arthroplasty
Hip RS has been developed as a surgical alternative to THR. It is reported to be an option that is predominantly suited to younger, active, male patients. 46 The procedure consists of placing a cobalt–chromium metal cap over the head of the femur while a matching metal cup (similar to that in THR) is placed in the acetabulum. This replaces the articulating surfaces of the hip joint and is bone-conserving compared with THR (Figure 8). According to clinical advice, in NHS practice the metal cup is generally cementless and the femoral metal head can be cemented or cementless.
In 2011 patients were on average 54.8 years of age when they underwent RS. Four times as many men underwent this procedure as women. 36 According to the NJR 2011 report,36 this shows good adherence by the orthopaedic community to guidelines issued by the British Orthopaedic Association during 2009/10 on patient selection criteria for metal-on-metal RS prostheses. As with THR, patient selection is crucial for the outcome of RS.
The FDA has produced patient selection criteria for metal-on-metal hip RS. These include:
- patient is fit and active
- patient has normal proximal femoral bone geometry and bone quality
- patient would otherwise receive a conventional primary THR
- patient is likely to live longer than current conventional THR prostheses are expected to last. 73
Johnson et al. 74 reported 100% implant survivorship at 5 years’ follow-up in 93 patients identified using narrow selection criteria who underwent RS. The selection criteria included avoiding RS in patients with large femoral head or neck cysts, ensuring proper seating of the femoral component band and ensuring an optimal thickness of the cement mantle. The authors of this study suggested that the best results were achieved in male patients aged < 50 years with a primary diagnosis of OA and a native femoral head > 50 mm in diameter. 74 Individual surgeon experience with hip RS is also an important factor and outcomes may differ between operators. Although positioning of the surgical component in RS is comparable in difficulty to that of THR, there is a learning curve that must be negotiated for surgeons inexperienced with the procedure. 41
Since 2011 there has been a significant decrease in the percentage of RS procedures conducted in England and Wales. There has also been a reduction in the percentage of procedures using a large head implant for RS. 36 This is thought to be because of the withdrawal of the DePuy ASR device from the market following the identification of higher than expected revision rates for this product.
Failure of hip replacement
A hip replacement may fail because of peri- and/or postoperative complications such as implant instability, dislocation, aseptic loosening, osteolysis, implant fracture and infection.
Implant instability and dislocation
Instability and recurrent dislocation are the most common reasons for THR failure and the second most common cause of failure of revision THR. The prevalence of dislocation ranges from 0.3% to 10% for primary THR and is 28% for revision THR. 75–77
The most common reasons for instability are component malpositioning and abductor (muscle) deficiency such as a loss of abduction power, which can lead to a severe limp. Cup malpositioning can lead to increased wear of particular sections of the prosthesis; for example, both 45-degree inclination (tilting) and 20-degree anteversion (forward tilting) have been associated with THR failure. 78,79 However, age, previous fracture, surgical volume, surgical approach, component sizing and polyethylene wear are also contributory factors to revision because of instability and dislocation. 80–83
Recurrent late dislocation remains a major source of THR failure. There are various treatment options for patients who have recurrent dislocations. These include revision surgery using constrained polyethylene liners (which offers increased stability but at the cost of smaller range of motion), larger diameter femoral heads and dual mobility devices.
Aseptic loosening and osteolysis
Aseptic loosening is a common cause of failure of THR. It arises because of osteoclast-mediated bone resorption at the bone–implant interface, which can lead to loosening, implant migration, implant failure and periprosthetic fracture. 84 Osteolysis is one of the most common complications after THR and may lead to implant failure. It is initiated as a result of an inflammatory process against polyethylene particulate debris. Component malpositioning is a major cause of severe wear and osteolysis, but both are also affected by activity level and by material and component design. 85
Aseptic loosening and osteolysis are diagnosed clinically by patient reports of pain. They are treated with replacement of loose components and correction of component malalignment. Outcomes after revision surgery are generally good, with reported mechanical failure rates < 5% at follow-up. 86
Periprosthetic fracture
Periprosthetic fracture is a major complication after THR and is associated with increased morbidity and mortality. Risk factors for periprosthetic fracture include previous revision surgery, component malalignment, age, osteoporosis, previous fracture and minor trauma. 87,88
Treatment for most periprosthetic fractures is usually surgical and the options depend on the fracture pattern. It can include open reduction and internal fixation with or without cortical strut allografts, longer femoral stems or increases in the setting of acetabular fractures, or tumour prostheses. 89,90
Infection
Infection of a THR prosthesis is associated with greatly increased morbidity, mortality and use of health-care resources. The infections can be treated with antibiotics; however, deep infections are rarely cured by antibiotics alone and may require revision surgery. As more THRs are performed, the absolute number of deep infections is likely to increase although, because of comprehensive infection control techniques, rates are relatively low. Risk factors for infection include age, obesity, comorbidities and American Society of Anesthesiologists (ASA) score. Longer operative times and reoperation within 90 days have also been implicated as risks for infection. 91,92
Revision of hip arthroplasty
Recent data demonstrated that 7-year revision rates were lower for cemented (3.0%) than for hybrid (3.8%) or cementless (4.6%) prostheses. 36 RCTs have compared revision rates across prosthesis types but with insufficient sample sizes or durations of follow-up to produce conclusive results. 39
Factors affecting long-term prosthesis survivorship include patient-related factors such as comorbidities and patient activity levels. 41 Once an implant has failed, patients will have implant revision surgery. The rate at which hip replacements are revised is termed the revision burden.
In England and Wales the NJR keeps a record of whether each operation performed is a primary replacement or a secondary revision of a replacement. This allows trends to be followed to estimate how many revision operations are expected in the future, hence the revision burden (Table 4).
Procedure | 2006/7 | 2007/8 | 2008/9 | 2009/10 | 2010/11 |
---|---|---|---|---|---|
Hip primary, n | 58,445 | 66,556 | 69,681 | 70,669 | 77,800 |
Hip revision, n (%) | 6198 (9.6) | 6725 (9.2) | 7345 (9.5) | 8285 (10.5) | 9200 (10.6) |
Total, N | 64,643 | 73,281 | 77,026 | 78,954 | 87,000 |
Table 4 shows a rise in both the number and the proportion of operations conducted for revision of THRs, which in absolute terms amounts to around 3000 more revisions in 2010/11 than in 2006/7. This may be because the recipients of the replacements are living longer and are thus outliving their THR, or possibly because of more stringent follow-up. At NHS hospitals, revision procedures account for a higher percentage of the total procedures (13%) than at any other type of provider, with 84% of all revision procedures in 2010/11 being performed in the NHS. 36
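The percentages in Table 4 can be recomputed from the counts. The sketch below assumes that the reported percentage is the number of revisions divided by all hip procedures (primary plus revision) in the same year; the report does not spell out the formula, and the names `PRIMARY`, `REVISION` and `revision_burden` are illustrative only.

```python
# Procedure counts transcribed from Table 4.
PRIMARY = {"2006/7": 58_445, "2007/8": 66_556, "2008/9": 69_681,
           "2009/10": 70_669, "2010/11": 77_800}
REVISION = {"2006/7": 6_198, "2007/8": 6_725, "2008/9": 7_345,
            "2009/10": 8_285, "2010/11": 9_200}


def revision_burden(primary_n, revision_n):
    """Revisions as a percentage of all hip procedures in a year (assumed definition)."""
    return 100.0 * revision_n / (primary_n + revision_n)


# Percentage of procedures that were revisions, rounded as in the table.
burden = {
    year: round(revision_burden(PRIMARY[year], REVISION[year]), 1)
    for year in PRIMARY
}

# Change in the annual number of revisions between the first and last year.
extra_revisions = REVISION["2010/11"] - REVISION["2006/7"]

for year, pct in burden.items():
    print(f"{year}: {pct}% of hip procedures were revisions")
print(f"Additional revisions in 2010/11 compared with 2006/7: {extra_revisions}")
```

Under this assumption the recomputed percentages match the values reported in the table, and the difference in annual revision counts between 2006/7 and 2010/11 corresponds to the figure of around 3000 quoted in the text.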
Clinical follow-up
Implants should be assessed every year for signs of loosening, migration/measure of prosthesis movement (e.g. femoral head penetration rate) and failure. Although no studies have examined the benefits of specific follow-up frequencies, NICE recommends continued periodic follow-up.
Follow-up using radiostereometric analysis allows for precise quantification of any movement of the prosthesis; however, visual inspection of the radiograph by the surgeon is commonly used in clinical follow-up. 93 Evidence suggests that early detection of lesions (e.g. aseptic lymphocyte-dominated vasculitis) is more cost-effective than waiting until patients report pain and loss of function and an assessment is conducted. 94
Disability, function, pain, limitations in daily activities, overall satisfaction and health-related quality of life should be routinely measured and documented at follow-up using validated instruments [e.g. Short Form questionnaire-12 items/Short Form questionnaire-36 items (SF-12/SF-36), EQ-5D]. 27
Current usage in the NHS
The following information was taken from the 8th Annual Report of the NJR. 36
General statistics
- In total, 179,450 operations (hip, ankle and knee) were reported to the NJR in 2010, a 9.9% increase on the previous year.
- However, 15.8% of these operations were accounted for by operations performed in previous years being added to the register.
- The increase in numbers of hip and knee replacements over the last few years is the result of increases in the number of operations performed in England; Wales has not seen similar growth.
Hip replacement surgery
According to these 2010/11 data, 83,014 hip replacement operations (95%) took place in England and 4024 operations took place in Wales. There are four types of organisation in England carrying out hip replacement surgery (Table 5) (note: there are no NHS treatment centres or independent sector treatment centres in Wales).
Organisation type | Percentage of procedures in 2010/11 |
---|---|
NHS hospitals | 67 |
NHS treatment centres | 3 |
Independent sector hospitals | 26 |
Independent sector treatment centres | 5 |
There have been no major changes in these proportions over the last 5 years although there has been a constant, very slight increase in the proportion of operations carried out by NHS hospitals over this time period and a slight decrease in the proportion carried out by NHS treatment centres. Annual fluctuations between types of provider have been small and the proportion of operations for each type of provider in 2010/11 is within two percentage points of the figures from 2006/7. In total, 93% of patients at independent sector hospitals and independent sector treatment centres were reported to be ‘fit and healthy’ or with ‘mild’ disease (ASA grading system) compared with only 80% at NHS centres.
Type of procedure
The operations carried out across the NHS organisations can be categorised by procedure type as displayed in Table 6.
Procedure type | Overall (68,907 treatments) | NHS hospitals (44,054 treatments) | NHS treatment centres (2075 treatments) |
---|---|---|---|
Cemented | 36 | 38 | 25 |
Cementless | 43 | 42 | 66 |
Hybrid | 3 | 17 | 4 |
RS | 2 | 3 | 4 |
The percentage of primary hip RS undertaken in independent hospitals (5%) is nearly double that carried out at NHS hospitals. Interestingly, at NHS treatment centres, 66% of primary procedures are cementless hip primary procedures, a greater proportion than at any other type of provider.
Background summary
Arthritis is a general term describing pain and inflammation within a joint. It commonly affects the hip, which is a weight-bearing ball and socket joint. The most common causes of the arthritis syndrome are OA and RA.
Osteoarthritis is a degenerative disease in which the degeneration and consequent loss of articular cartilage are associated with synovial inflammation and bone hypertrophy. This leads to symptoms of pain, stiffness and loss of function and mobility. The degeneration can be primary (no specific cause identified) or secondary to a number of intra-articular diseases. Its prevalence is also increased by a number of risk factors including biomechanical, constitutional and genetic factors. OA is by far the most common arthritis of the hip and is diagnosed clinically and by imaging. There are difficulties in estimating the disease burden of OA because of variable diagnostic criteria. However, there are an estimated 2.8 million patients in the UK alone who have the disease and current projections estimate that 10% of the world’s population aged ≥ 60 years will be affected at some point. Estimates of the annual incidence of RA suggest that 10,000–20,000 people develop RA in the UK each year. Although the disease may develop in patients at any age, onset is classically between the ages of 40 and 60 years. This is especially important in light of the ageing population as OA and RA mostly affect elderly people with comorbidities. Although the natural history of OA varies between affected joints, the prognosis of hip OA is particularly poor. Approximately 10–40% of cases of RA manifest within the hip joint.
The economic impact of arthritis is vast, both because of direct costs to the health-care system, community and social services and because of indirect costs from lost productivity and early mortality. In the present economic climate in which health-care spending must be carefully justified, the implications of increasing demand for the treatment of arthritis of the hip have led to intense discussion about the cost-effectiveness of new technologies and treatment options. To aid this comparison, different tools such as the OHS and the HHS have been developed and validated for the assessment and monitoring of patient outcomes.
Non-surgical and surgical treatments exist for the management of arthritis to provide symptomatic relief in the short term and to avoid progressive joint damage and improve quality of life in the longer term. Surgical options, including THR, are usually considered for patients with symptoms unmanageable through conservative management. The surgical interventions are believed to be cost-effective, with a favourable cost per QALY gained. Patient selection criteria, amount spent and outcomes for hip replacement surgery vary across geographical location, hospital and surgeon. The NCC-CC and NICE have developed guidelines to assist clinicians with making clinical decisions about whether or not a patient requires a hip replacement; however, inconsistencies remain in the surgeries offered at different NHS centres.
Total hip replacement is the predominant surgical intervention for the treatment of arthritis in the UK and is highly successful. Hip replacements can be categorised and compared according to their components, fixation methods, femoral head size and revision rates. For example, there are many different brands of prosthesis for a surgeon to choose from, with fixation types split into cemented, cementless or hybrid, in addition to the option of RS. Failure of the articulations and need for revision surgery are important considerations, especially considering the growing number of primary procedures that are taking place and the overall increasing revision burden. Reasons for revision include instability/dislocation, aseptic loosening and osteolysis, periprosthetic fracture and infection, and NICE recommends periodic follow-up to help identify such issues.
Chapter 2 Definition of the decision problem
Decision problem
This report aims to evaluate the clinical effectiveness and cost-effectiveness of THR and hip RS for the treatment of pain and disability in people with arthritis. More specifically, we aim to investigate, in people with pain and disability resulting from arthritis of the hip for whom non-surgical management has failed:
- the clinical effectiveness and cost-effectiveness of different types of elective primary THR compared with primary hip RS in those suitable for both procedures
- the clinical effectiveness and cost-effectiveness of different types of primary THR compared with each other in those not suitable for hip RS.
Overall aims and objectives
- To undertake a systematic review of the clinical effectiveness and cost-effectiveness of (a) different types of primary THR compared with RS for people in whom both procedures are suitable and (b) different types of primary THR compared with each other for people who are not suitable for hip RS and to investigate factors that influence benefits and costs. If data are sufficient, the influence of patient- and intervention-related factors on the magnitude of treatment effects will be explored through subgroup analysis and meta-regression.
- To further develop the cost-effectiveness and cost–utility models published in TA44 25 using updated NJR data and model inputs when available.
- To report on findings and make recommendations for future research.
This report aims to evaluate the clinical effectiveness and cost-effectiveness of THR and RS for the treatment of pain and disability in people with arthritis [Table 7 provides a summary of the population, intervention, comparator/control and outcome (PICO)].
PICO | Final scope issued by NICE (17/01/13)a | Decision problem addressed in the assessment report |
---|---|---|
Population | People with pain or disability resulting from arthritis of the hip for whom non-surgical management has failed | People with pain or disability resulting from end-stage arthritis of the hip for whom non-surgical management has failed |
Intervention | | |
Comparators | Different types of primary THR and hip RS will be compared for people in whom both procedures are suitable. Different types of primary THR will be compared for people in whom hip RS is not suitable. The different types of hip replacement that will be considered separately are dependent on the available evidence, but may include hip replacements with components made from different materials (metal, ceramic, polyethylene, ceramicised metal); cemented, cementless or hybrid prostheses; prostheses with differing femoral head sizes; prostheses with differing revision rates | Different types of primary THR and hip RS will be compared for people in whom both procedures are suitable. Different types of primary THR will be compared for people in whom hip RS is not suitable |
Outcomes | The outcome measures to be considered include functional result, pain, bone conservation, revision rates, radiostereometric analysis to assess prosthesis movement, dislocation rates, adverse effects of treatment (peri- and postprocedural) including degradation products when appropriate, health-related quality of life and mortality | Outcome measures considered include function, pain, bone conservation, revision rates (device failure/revision rates/time to revision), radiostereometric analysis (to assess prosthesis movement), radiological results, dislocation rates, health-related quality of life and mortality. Adverse events include peri- and postprocedural complications (e.g. infection, nerve palsy, dislocation rates, femoral neck fracture, metallosis, muscle weakness) and metal and other degradation products |
Economic analysis | The reference case stipulates that the cost-effectiveness of treatments should be expressed in terms of incremental cost per QALY. The reference case stipulates that the time horizon for estimating clinical effectiveness and cost-effectiveness should be sufficiently long to reflect any differences in costs or outcomes between the technologies being compared. Costs will be considered from NHS and Personal Social Services perspectives | Cost-effectiveness outcomes include mean difference in costs and clinical effectiveness measures or utility measures, ICERs, uncertainty measures, ceiling WTP ratios and probabilities from CEAC |
Different types of THR to be considered | If the evidence allows, subgroups based on activity levels will be compared. Guidance will be issued in accordance with CE marking only. If the recommendations remain based on long-term performance (revision rates, for example ODEP ratings), the collection and monitoring of performance data and arrangements for the effective implementation of such recommendations should be considered | With components made from different materials (metal, ceramic, polyethylene, ceramicised metal); cemented, cementless or hybrid prostheses; prostheses with differing femoral head sizes |
Chapter 3 Joint registries
Description of the three largest international registries
National joint registries have improved the recording of interventions, patient outcomes, implant survival and different surgical techniques for joint replacement. They aim to collect data on large, countrywide samples in order to improve the outcome of replacement surgery for patients. Interest in national registries has continued to grow and annual reporting from the registries is important for decision-makers, academia and the various industry professionals. Registries available worldwide include those from the UK, Canada, Australia, New Zealand, Sweden, Italy, Norway and Denmark (among others). We conducted a review of the recent annual reports published from these databases. A summary of the three longest-established joint registries is provided for information (Table 8 and following sections).
Name | Country | Year established | Lifetime reporting | Most recent report | Data collected |
---|---|---|---|---|---|
NJR | England and Wales | 2002 | 10 years | 2011, surgical data to 31 December 2010 | Reports a large number of process and outcome variables across England and Wales, including operation totals, provider sector and type; patient characteristics and procedure details; implant and operation details; implant survival (88.6%); compliance (85.2%) |
Swedish Hip Arthroplasty Register | Sweden | 1979 | 33 years | 2010 | Reports a large number of outcome variables at unit and aggregate county council levels, including reported health gains (EQ-5D index gain after 1 year); patient satisfaction after 1 year; short-term complications after 2 years; 10-year implant survival (95%); compliance (98.5%) |
Australian Orthopaedic Association National Joint Replacement Registry | Australia | 1999 | 13 years | 2012 | Reports outcome variables across all states: 10-year implant survival (95%); RS reported to be 1.6% of procedures; compliance (93.9%) |
Australian Orthopaedic Association National Joint Replacement Registry
The Australian Orthopaedic Association established the National Joint Replacement Registry (AOANJRR) in 1999. At that time, outcomes of surgery in Australia were unknown. The registry began data collection in South Australia on 1 September 1999 followed by the inclusion of each of the Australian states until 2002. 95 The register was expanded to include other joint replacements in November 2007, with all hospitals undertaking joint replacement in Australia approving participation in the collection of additional data. The number of hip replacements has been steadily increasing since 1999, with > 37,000 hip replacements undertaken in Australia in 2012. 95
The most recent report from the AOANJRR discussed the large increase in revision hip procedures in Australia. 95 In 2010, revision procedures represented 11.3% of all hip replacements, but by 2011 this had increased to 12.5%. The authors associated this increase with the DePuy ASR hip (discontinued metal-on-metal hip replacement) and its reported problems. The use of primary RS had declined by 39.7% between 2010 and 2011, accounting for only 1.6% of all hip procedures. In 2012 a reduction in the use of new hip prostheses and prosthetic combinations was reported. In 2010 there were 330 combinations being used in Australia; this had reduced to 97 in 2011.
The Swedish Hip Arthroplasty Register
The Swedish Hip Arthroplasty Register (SHAR) is entering its 33rd year of activity. 96 National coverage for 2010 was 98.5% and 15,935 primary THRs were performed. The registry collects data on all implant types, surgical techniques and reoperation frequency. Individual patient data (IPD) such as age, sex, diagnosis, surgical technique and type of implant used are recorded and, since 2002, patient-reported outcome measures (PROMs) such as pain relief, satisfaction and health-related quality of life have been included. The response rate for PROMs at the 1-year follow-up is just over 90%.
All units in Sweden (78 hospitals) that carry out total hip arthroplasty, both public and private, are included in the registry. The registry’s aim is to identify predictors for both good and poor outcomes. 96 In international comparisons, Sweden has the world’s highest reported 10-year implant survival rate for total hip arthroplasties. At county council level there are no large and significant differences that are detectable at unit level. The 10-year survival rate of the most common implants was > 95% in 2010. 96 The 2010 report stated that the potential for improvement lies chiefly among certain patient groups. Sweden reports the lowest frequency of revision worldwide. However, it states that problem areas still exist and that these can be overcome with systematic local analyses and subsequent improvement work.
National Joint Registry for England and Wales
The NJR aims to improve patient safety and clinical outcomes by providing information to patients and to all those involved in the management and delivery of joint replacement surgery. This is achieved by collecting data to monitor the effectiveness of hip, knee and ankle replacement surgery and prosthetic implants. 36
The NJR was established in October 2002 and began collecting data on hip and knee replacement operations on 1 April 2003. The most recent report36 was from the period 1 April 2010–31 March 2011 and also included statistics on joint replacement activity and a survivorship analysis of hip replacement surgery using data from 1 April 2003 to 31 December 2010. 36 The NJR is one of the largest registries with over one million recorded procedures and a compliance rate of 85.2% (from 1 April 2003 to 31 March 2010). Compliance has shown a steady upwards trend since 2003. 36
Quality assessment of the NJR36 is undertaken as a part of the annual reporting of the NJR process using robust statistical techniques. The following factors are considered: random variation, differences in surgical case mix and factors related to the practice of care. The quality assessment results from 2011 reported:
-
data from 1.2 million procedures
-
a sophisticated method of classifying implant components
-
a patient consent rate of 90.4%
-
activity and outcomes data at trust, health board and unit level.
Since 1 April 2009, providers of hip replacement surgery have been required to collect and report PROMs under the terms of the Standard NHS Contract for Acute Services. 36 This means that all providers of NHS-funded surgery are expected to invite patients undergoing this procedure to complete a preoperative PROMs questionnaire in accordance with the relevant guidance. Postoperative questionnaires are then sent to patients following their operation after a specified time period. Data collected in the NJR can be linked to the PROMs data collected by the Health and Social Care Information Centre. 97 The NJR is currently working to extend its own study of the follow-up of PROMs to 12 months. This will allow for investigation of population-level quality-of-life reporting after hip replacement. 36
Summary of national registries
Joint registries, such as those in the UK and Australia, are ‘government’ organisations. Some are funded by fees levied on orthopaedic implant manufacturers, with fund disbursement conducted under the discretion of the registry steering committee. Although the costs associated with the development and maintenance of national joint registries vary, registries are considered a beneficial medical development because of their ability to detect poorly performing implants at a national level.
The three national registries summarised here report long-term data and have compliance rates of 85.2% (NJR), 98.5% (SHAR) and 93.9% (AOANJRR). Implant survival rates are reported as 88.6%, 95% and 95% at 9, 10 and 10 years, respectively. In England and Wales the incorporation of new PROMs data is planned, which will allow for linkage between activity and patient outcomes.
Chapter 4 Assessment of evidence
Methods for the review of clinical effectiveness
A protocol was developed and approved by NICE (www.nice.org.uk/nicemedia/live/13690/62831/62831.pdf). General principles were applied as recommended by the NHS Centre for Reviews and Dissemination (CRD). 98
This report contains reference to confidential information provided as part of the NICE appraisal process. This information has been removed from the report and the results, discussions and conclusions of the report do not include the confidential information. These sections are clearly marked in the report.
Identification of studies
Initial scoping searches were undertaken in MEDLINE in October 2012 to assess the volume and type of literature relating to the assessment question. The scoping searches also informed development of the final search strategies (see Appendix 1). An iterative procedure was used to develop these strategies with input from clinical advisors and previous HTA reports (e.g. Vale et al. ,19 de Verteuil et al. 11). The strategies have been designed to capture generic terms for arthritis, THR and RS.
Search strategies
Final searches were undertaken in November and December 2012 (see Appendix 1) and were date limited from 2002 (the date of the most recent NICE guidance in this area25). Searches of the clinical effectiveness literature were restricted to RCTs and systematic reviews; additional searches were undertaken to capture literature relating to costs, resource use, utilities, cost-effectiveness, cost-effectiveness models and registries to inform the survival and cost-effectiveness analysis.
The following main sources were searched to identify relevant published and unpublished studies and studies in progress:
-
electronic bibliographic databases
-
contact with experts in the field
-
references of included studies
-
screening of relevant websites.
The following databases of published studies were searched: MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, EMBASE, Science Citation Index and Conference Proceedings Citation Index – Science, The Cochrane Library [specifically the Cochrane Database of Systematic Reviews (CDSR), Cochrane Central Register of Controlled Trials (CENTRAL), Database of Abstracts of Reviews of Effects (DARE), NHS Economic Evaluation Database (NHS EED), HTA database], Current Controlled Trials, ClinicalTrials.gov and UK Clinical Research Network (UKCRN) Portfolio Database. The search strategies were initially developed for MEDLINE and were adapted as appropriate for other databases.
The reference lists of included studies and relevant review articles were checked and the following websites of hip implant manufacturers were screened for relevant publications:
-
Amplitude
-
Biomet
-
B Braun/Aesculap
-
Comis Orthopaedics
-
Corin
-
DePuy
-
Exactech
-
Finsbury
-
JRI Orthopaedics
-
Implantcast
-
Implants International
-
Lima WG Healthcare
-
Mathys Orthopaedics
-
Medacta UK
-
Orthodynamics
-
Peter Brehm
-
SERF Dedienne santé
-
Smith & Nephew
-
Stanmore Implants Worldwide
-
Stryker
-
Symbios SA
-
Waldemar Link
-
Wright Medical UK
-
Zimmer, Inc.
Grey literature searches were undertaken using Google (Google Inc., Mountain View, CA, USA) and the online resources of the following regulatory bodies, health services, research agencies and professional societies:
-
British Hip Society
-
British Orthopaedic Association
-
Orthopaedic Research UK
-
ODEP
-
NJR
-
Arthritis Research UK
-
Cochrane Musculoskeletal Group
-
Arthritis Care
-
MHRA
-
American Association of Hip and Knee Surgeons
-
American Academy of Orthopedic Surgeons (AAOS)
-
The Hip Society
-
Royal College of Surgeons
-
Royal College of Surgeons of Edinburgh.
All bibliographic records identified through the electronic searches were collected in a managed reference database.
Inclusion criteria
Study design
-
RCTs.
-
Systematic reviews.
-
Meta-analyses.
Given the wide scope and large amount of identified evidence, we limited studies to those published since 2008 with a sample size of ≥ 100 participants.
Population
-
People with pain or disability resulting from end-stage arthritis of the hip for whom non-surgical management has failed.
Intervention
-
Elective primary THR.
-
Primary hip RS.
Comparator
-
Different types of primary THR compared with RS for people in whom both procedures are suitable.
-
Different types of primary THR compared with each other for people who are not suitable for hip RS.
Outcomes
Clinical effectiveness outcome measures were mortality, validated functional/pain and health-related quality of life total scores, revision rate, implant survival rate and femoral head penetration rate (measure of prosthesis movement). Adverse events included incidence of peri-/postprocedural complications (i.e. implant dislocation, infection, osteolysis, aseptic loosening, femoral fracture and deep-vein thrombosis).
Exclusion criteria
The exclusion criteria were as follows:
-
indications for hip replacement other than end-stage arthritis of the hip
-
revision surgery as the primary procedure of interest
-
abstract/conference proceedings, letters and commentaries
-
non-English language publications.
Study selection process
All retrieved records were collected in a specialised database. All duplicate records were identified and removed. Two reviewers pilot tested an a priori screening form based on the predefined study eligibility criteria. Afterwards, two independent reviewers applied the same inclusion/exclusion criteria and screened all identified bibliographic records for title/abstract (level I) and then for full text (level II). Disagreements over eligibility were resolved through consensus or by a third party reviewer. Reasons for exclusion of full-text papers were documented. The study flow was documented using a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram. 99
Quality assessment strategy
Two reviewers independently assessed the risk of bias of individual studies using validated tools (see Appendix 2). 100,101 Any disagreements between the two reviewers were resolved by a third reviewer through discussion.
Randomised controlled trials were assessed using the Cochrane Collaboration risk of bias tool,100 which covers the following domains of threat to internal validity: selection bias (randomisation sequence generation, treatment allocation concealment), performance bias (blinding of participants/personnel), detection bias (blinding of outcome assessors), attrition bias (incomplete outcome data), reporting bias (selective outcome/analysis reporting) and other prespecified bias [e.g. funding source, adequacy of statistical methods used, type of analysis (intention to treat/per protocol), imbalance in the distribution of baseline prognostic factors between the compared treatment groups]. The risk of bias assessment results fall into three distinct categories of high, low and unclear risk of bias. For each RCT, the risk of bias for the performance, detection and attrition bias domains was assessed for a priori defined groups of subjective (e.g. patient-administered clinical and functional scores) and objective (e.g. mortality, revision, survival, radiography result, complications) outcomes separately. Afterwards, the within-study summary risk-of-bias rating across all of the domains was derived for subjective and objective outcomes separately. The decision for determining the within-study summary risk of bias was based on the ratings prevailing for the selection, performance and detection bias domains. At data synthesis stage, the across-study average summary risk of bias was determined and assigned to each outcome of interest.
The methodological quality of included systematic reviews was assessed with the Assessment of Multiple Systematic Reviews (AMSTAR) tool,101 which covers the following domains: (1) research question, (2) inclusion/exclusion criteria, (3) search strategy (at least two major electronic databases), (4) data extraction by independent reviewers, (5) assessment of risk of bias by independent reviewers, (6) consideration of risk of bias in the analysis, (7) exploration of heterogeneity and (8) publication bias. For convenience of presentation, the methodological quality of each systematic review was graded according to the number of items satisfied as follows: high (range 9–11), medium (range 5–8) and low (range 0–4).
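The grading bands described above can be expressed as a small lookup. This is a minimal illustrative sketch rather than part of the review's methods; the function name `amstar_grade` is hypothetical, and the 0–11 range is inferred from the bands stated in the text.

```python
def amstar_grade(items_satisfied: int) -> str:
    """Map the number of AMSTAR items satisfied to the review's quality band.

    Bands as stated in the text: high (9-11), medium (5-8), low (0-4).
    """
    if not 0 <= items_satisfied <= 11:
        raise ValueError("AMSTAR item count must be between 0 and 11")
    if items_satisfied >= 9:
        return "high"
    if items_satisfied >= 5:
        return "medium"
    return "low"


# Hypothetical example: a review satisfying 7 items
print(amstar_grade(7))
```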
Grading the overall quality of clinical effectiveness evidence
The overall quality of evidence for each preselected (i.e. gradable) outcome across studies was assessed using the systematic approach developed by the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) Working Group (see www.gradeworkinggroup.org).
The GRADE approach102 indicates levels of confidence in the observed treatment effect estimate(s), which are categorised as high, moderate, low or very low. The grading of overall quality of evidence for each gradable outcome is based on assessments across five domains: (1) summary risk of bias across studies per gradable outcome (internal validity across studies, study limitations), (2) consistency of results (heterogeneity), (3) directness of the evidence (applicability of the results, indirect treatment comparisons), (4) precision of the results (the width of the 95% CI around the estimate) and (5) publication/reporting bias (detection of asymmetry in the funnel plot, selective outcome reporting). The definitions and explanations of the grading levels and the grading process across the five domains are presented later in this chapter (see Tables 35 and 43).
The gradable outcomes, selected according to their meaningfulness and importance for decision-making, were the following: HHS, WOMAC score, revision, mortality, femoral head penetration rate and implant dislocation.
Data extraction strategy
The relevant data were extracted from included studies independently by one reviewer using a data extraction form informed by the CRD. 103 The extracted data were cross-checked by a second reviewer. Uncertainty and/or any disagreements with the second researcher were resolved by discussion. The extracted data were entered into summary and full extraction tables (see Appendices 3 and 4, respectively). The extracted information included the following:
-
Study characteristics (i.e. authors, country, design, study setting, sample size, funding source, duration of follow-up and information relevant to risk-of-bias assessment such as generation of randomisation, allocation concealment, blinding, completeness of outcome ascertainment, patient withdrawals/attrition for randomised trials; for observational studies and non-randomised trials, information on potential confounding was additionally ascertained).
-
Patient baseline characteristics [i.e. inclusion/exclusion criteria, number of enrolled/analysed participants, age, race, sex, body mass index (BMI), underlying conditions, concomitant conditions, co-interventions, disability, activity levels, function, pain intensity and quality of life and disease-specific measures such as the OHS30 and HHS31].
-
Experimental treatment characteristics (e.g. type – THR, RS; training/experience of the operator and postoperative rehabilitation staff; method of fixation – cemented, cementless, hybrid; bearing surface material – metal-on-metal, ceramic-on-ceramic, polyethylene-on-metal; femoral head size; name/brand and country of manufacturer; postoperative rehabilitation).
-
Outcome characteristics [e.g. definition; timing of measurement; scale of measurement – dichotomous, continuous; measures of association – mean difference (MD), risk ratio (RR), odds ratio (OR), hazard ratio (HR)]. Statistical test results and measures of variability were also extracted [standard deviation (SD), 95% CI, standard error (SE), p-value].
Any additional relevant information found in multiple publications of included studies was also extracted. For studies of clinical effectiveness in which summary measures and 95% CIs for the association between the treatments were not reported, MDs with 95% CIs were calculated if data allowed (t-tests for independent samples and continuous outcomes and RRs for dichotomous outcomes). No RRs and 95% CIs were estimated for individual studies that observed zero events in one or both treatment arms. The 95% CIs and SEs were used to derive SDs or vice versa. All calculated parameters were entered into the data extraction sheets.
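The conversions and effect estimates described above can be sketched as follows. This is an illustration under stated assumptions, not the calculation used in the report: it uses a large-sample normal approximation (z = 1.96) where the text refers to t-tests, it treats the reported SE and CI as belonging to a single-arm mean, and all function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def sd_from_se(se, n):
    """SD of a single-arm mean recovered from its standard error."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=Z95):
    """SD of a single-arm mean recovered from its 95% CI (normal approximation)."""
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)


def mean_difference(m1, sd1, n1, m2, sd2, n2, z=Z95):
    """MD (arm 1 minus arm 2) with an approximate 95% CI, unpooled variances."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, md - z * se, md + z * se


def risk_ratio(e1, n1, e2, n2, z=Z95):
    """RR with a 95% CI on the log scale.

    Returns None when either arm has zero events, mirroring the rule in the
    text that no RR was estimated for such studies.
    """
    if e1 == 0 or e2 == 0:
        return None
    rr = (e1 / n1) / (e2 / n2)
    se_log = math.sqrt(1 / e1 - 1 / n1 + 1 / e2 - 1 / n2)
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)


# Hypothetical data: 10/100 events versus 20/100 events
print(risk_ratio(10, 100, 20, 100))
```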
Data management
Study, treatment, population and outcome characteristics were summarised in text, evidence and summary tables. The study results were compared qualitatively and quantitatively in text and summary tables. For each outcome of interest, the effectiveness of treatments reported in individual studies was compared as follows:
-
different types of primary THR compared with each other for people who are not suitable for hip RS
-
different types of primary THR compared with RS for people in whom both procedures are suitable.
Meta-analysis
The decision to pool individual study results was based on a degree of similarity with respect to methodological and clinical characteristics of studies under consideration (e.g. design, population, comparator treatment and outcome). Estimates of post-treatment MDs for continuous outcomes and RRs for binary outcomes (except for rare events) of individual studies were pooled using a DerSimonian and Laird random-effects model. 104 The choice of this model was based on the assumption that some residual clinical and methodological diversity will exist across pooled studies. Dichotomous outcomes with low event rates (5.0–10.0%) were pooled as RRs using a Mantel–Haenszel fixed-effects model. Dichotomous outcomes for studies with very low event rates (≤ 5.0%) or zero events in one of the treatment arms were pooled as ORs using a Peto fixed-effects model. 105
Trials were not pooled if the mean and/or SD for the continuous outcome of interest could not be ascertained.
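For the very rare event case described above, the Peto one-step method combines observed-minus-expected event counts across trials. The sketch below is illustrative only: the review performed its pooling in Review Manager, and the function name and the tuple layout of the input are assumptions made for this example.

```python
import math


def peto_odds_ratio(tables, z=1.959964):
    """Peto one-step fixed-effect pooled odds ratio.

    tables: iterable of (events_treatment, n_treatment, events_control, n_control).
    Returns (OR, lower, upper), or None if no trial contributes information.
    """
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for a, n1, c, n2 in tables:
        n = n1 + n2
        events = a + c
        non_events = n - events
        if events == 0 or non_events == 0:
            continue  # trials with no events (or all events) carry no information
        expected = n1 * events / n
        variance = n1 * n2 * events * non_events / (n ** 2 * (n - 1))
        sum_o_minus_e += a - expected
        sum_v += variance
    if sum_v == 0:
        return None
    log_or = sum_o_minus_e / sum_v
    se = 1 / math.sqrt(sum_v)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical trials, one with zero events in the treatment arm
print(peto_odds_ratio([(0, 100, 4, 100), (2, 150, 3, 150)]))
```

Unlike the RR calculation, this estimator remains defined when one arm has zero events, which is why it suits the rare-event case.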
The degree of statistical heterogeneity across pooled studies was determined through inspection of the forest plots, Cochran’s Q and the I² statistic. The presence of heterogeneity was judged according to predetermined levels of statistical significance (chi-squared p < 0.10 and/or I² > 50%). Statistical pooling was performed using The Cochrane Collaboration software package Review Manager version 5.2 (The Cochrane Collaboration, The Nordic Cochrane Centre, Copenhagen, Denmark).
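As a rough illustration of the random-effects pooling and heterogeneity statistics named above, the following sketch computes a DerSimonian and Laird pooled estimate together with Cochran's Q and the I-squared statistic. It is not the Review Manager implementation; the function name and return layout are hypothetical, and the chi-squared p-value for Q is omitted because the Python standard library has no chi-squared distribution function.

```python
import math


def dersimonian_laird(effects, variances, z=1.959964):
    """Random-effects pooling of study effects (e.g. MDs or log RRs).

    effects: per-study effect estimates; variances: their squared SEs.
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("effects and variances must be non-empty and equal in length")
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # I-squared, in per cent
    return {
        "pooled": pooled,
        "ci": (pooled - z * se, pooled + z * se),
        "tau2": tau2,
        "q": q,
        "df": df,
        "i2": i2,
    }


# Hypothetical mean differences in a functional score from three trials
result = dersimonian_laird([2.0, 5.0, -1.0], [1.0, 1.5, 2.0])
print(result["pooled"], result["i2"] > 50)
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights and the two pooled estimates coincide.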
Publication bias
It was planned to examine the extent of publication bias, given a sufficient number of data points, by visual inspection of funnel plots with respect to plot asymmetry as well as using linear regression tests. 106
Analysis to explore heterogeneity
If data allowed, exploration of study-level clinical and methodological sources of statistical heterogeneity of effect estimates across studies was planned through a priori-defined subgroup analysis (i.e. age, sex, function), sensitivity analysis (risk of bias item-specific ratings, intention-to-treat vs. per-protocol analysis) and meta-regression.
Data synthesis and interpretation
For both RCTs and systematic reviews, the comparison and synthesis of results for each outcome of interest was summarised and categorised as conclusive evidence (either there is a ‘difference’ or there is ‘no difference’) or inconclusive evidence (indeterminate results because of statistical uncertainty, statistical heterogeneity/inconsistency in treatment effects and/or incomplete information). This conclusion was based on several factors determined separately or in combination such as statistical significance of the observed difference (p-value), magnitude of the effect estimate, width of the 95% CIs, a minimal clinically important difference (MCID) for a given outcome, if known, and consistency in terms of effect direction and statistical significance. We ascertained the MCIDs for clinical/functional measures such as HHS (MCID range 7–10), OHS (MCID range 5–7), WOMAC score (MCID 8) and EQ-5D score (MCID 0.074) from previous empirical research evidence. 107–109
Evidence was considered conclusive in showing a ‘difference’ if a treatment effect estimate was statistically significant and the 95% CI included the MCID for any given outcome. Evidence was considered conclusive in showing ‘no difference’ if a treatment effect estimate was not statistically significant and the 95% CI around it was narrow enough to exclude the MCID for any given outcome. Alternatively, evidence was considered conclusive in showing ‘no difference’ if a treatment effect estimate was statistically significant but the 95% CI around it did not include the MCID for an outcome.
Evidence was considered inconclusive if a treatment effect estimate was not statistically significant and had 95% CIs that were sufficiently wide to include the MCID or any large effect size values. (Because for such studies the possibility of type II error cannot be ruled out, the observed non-significant results should not be interpreted as if there is no difference between the treatment effects. The lack of precision around the effect estimates may be a result of an insufficient sample size, a short follow-up period and/or low event counts, leading to inadequate study power and an increased chance of a type II error.)
The results were also considered inconclusive if there were partially missing data for continuous outcomes (e.g. reporting treatment arm-specific means without SDs; reporting only p-values for the between-treatment difference) or zero events for binary outcomes in both treatment arms. Evidence from studies showing inconsistent results, that is, significant effects but in opposing directions, was also classified as inconclusive.
Evidence from systematic reviews not reporting pooled results of RCTs (i.e. reporting only narrative syntheses), those reporting inappropriate pooling methods (e.g. indirect naive comparison of single group cohorts; pooling of studies of different design) or those reporting inconsistent summary findings was also considered inconclusive.
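The decision rules above for a continuous outcome can be sketched as a simple classifier. This is one possible reading, offered as an assumption rather than the authors' algorithm: a result is treated as statistically significant when the 95% CI excludes the null value, and the CI is taken to 'include the MCID' when its limit furthest from the null reaches or exceeds the MCID in magnitude. Missing data, zero-event studies and conflicting results across studies, which the text also classes as inconclusive, are not modelled. The function name is hypothetical.

```python
def classify_evidence(ci_lower, ci_upper, mcid, null_value=0.0):
    """Classify a mean-difference result as 'difference', 'no difference' or 'inconclusive'."""
    if ci_lower > ci_upper or mcid <= 0:
        raise ValueError("expected ci_lower <= ci_upper and a positive MCID")
    significant = ci_lower > null_value or ci_upper < null_value
    # Largest effect magnitude compatible with the interval
    largest_effect = max(abs(ci_lower - null_value), abs(ci_upper - null_value))
    reaches_mcid = largest_effect >= mcid
    if significant and reaches_mcid:
        return "difference"
    if not reaches_mcid:
        # Covers both a narrow non-significant CI and a significant but
        # clinically unimportant effect
        return "no difference"
    return "inconclusive"


# Hypothetical HHS results, using the lower bound of the stated MCID range (7)
print(classify_evidence(8.0, 15.0, 7))
print(classify_evidence(-2.0, 3.0, 7))
print(classify_evidence(-3.0, 9.0, 7))
```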
Industry submissions regarding effectiveness of treatments
The included clinical effectiveness evidence was compared with the evidence submitted by industry. These industry submissions are discussed in Appendix 5.
Results of the review of clinical effectiveness
Search results
A total of 2469 records were identified through our searches of different sources. The removal of duplicates left 1522 records to be screened. Of these, 1281 records were excluded as irrelevant at title and abstract screening, leaving 241 potentially relevant records. Of these 241 full-text records screened, 146 were excluded, leaving 95 potentially relevant full-text records, of which 58 were additionally excluded based on publication date (published before 2008 unless a companion paper to an included study) and sample size (< 100 participants). The remaining 37 records were included in the review. 107,110–145
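The screening arithmetic reported above can be checked with a short script. This is a sketch for illustration only; the helper name and stage labels are hypothetical, and the number of duplicates removed (947) is derived here as 2469 minus 1522 rather than stated in the text.

```python
def screening_flow(identified, stages):
    """Return the number of records remaining after each screening stage.

    stages: list of (label, number_removed) applied in order.
    """
    remaining = identified
    trail = [("identified", remaining)]
    for label, removed in stages:
        remaining -= removed
        if remaining < 0:
            raise ValueError(f"more records removed than available at stage: {label}")
        trail.append((label, remaining))
    return trail


flow = screening_flow(
    2469,
    [
        ("after duplicate removal", 2469 - 1522),  # derived, not stated in the text
        ("after title/abstract screening", 1281),
        ("after full-text screening", 146),
        ("after publication date and sample size limits", 58),
    ],
)
for label, count in flow:
    print(f"{label}: {count}")
```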
The flow chart outlining the process of identifying relevant literature can be found in Figure 9.
A list of records excluded at full-text screening with reasons for exclusion is provided in Appendix 6. The main reasons for exclusion were the comparison of different surgical/operative approaches (n = 42; references 11,146–186), study published before 2008 (unless a companion paper to an included study) (n = 33; references 19,39,187–217) and study includes < 100 participants (n = 25; references 83,218–241).
A separate search in December 2012 of ClinicalTrials.gov, Current Controlled Trials, the UKCRN Portfolio Database and the National Library of Medicine (NLM) Gateway Health Services Research Projects in Progress (HSRProj) database retrieved 511 potential trials or health services research projects. After screening titles and full records (if available), 20 clinical trials and one health services research project were identified, one of which130 had already been identified from the original database search (see Appendix 7). The identified clinical trials were considered potentially relevant based on the available information. The trials were ongoing or completed since 2009 or their status was unknown.
The included 37 records represent 16 RCTs107,110–136,145 and eight systematic reviews. 137–144
Six of the 16 RCTs were represented by multiple publications:
-
Capello et al. ,115 D’Antonio et al. 116,117 and Mesko et al. 118
-
Corten et al. ,119,122 Laupacis 2002120 and Bourne and Corten121
-
Vendittoli et al. ,132,133,136 Girard et al. 134 and Rama et al. 135
These six RCTs are cited as follows: Bjørgul et al.,110 Engh et al.,113 Capello et al.,115 Corten et al.,119 Costa et al. 130 and Vendittoli et al. 132 In total, 13 RCTs110,112,113,115,119,123–129,145 and five systematic reviews137–141 comparing different types of primary THR, and three RCTs130–132 and three systematic reviews142–144 comparing primary THR with RS, were included in the current review.
The following sections report the findings for the comparison of different types of THR first, followed by the findings for the comparison between THR and RS.
Comparison of different types of total hip replacement
Study and participant characteristics
Randomised controlled trials
The study and participant characteristics of the 13 included RCTs110,112,113,115,119,123–129,145 are summarised in Table 9. More details can be found in Appendices 3 and 4. Briefly, four RCTs were conducted in the USA,113,115,125,127 one in the UK,112 one in Australia,123 two in Norway,110,126 two in the Republic of Korea128,129 and three in Canada. 111,119,124 A total of 3175 participants were randomised across the 13 RCTs, with study size ranging from 100 participants124,128,145 to 557 participants. 123 The mean age of participants across the RCTs ranged from 45 years129 to 72 years. 123,145 The proportion of women across the studies ranged from 24%129 to 73%. 110 The length of follow-up of the studies ranged from 3 months119 to 20 years. 119,129 The proportion of participants diagnosed with primary OA was reported for nine studies110,112,113,115,123,124,127–129 and ranged from 14%129 to 96%. 123
Study characteristic | Metric |
---|---|
Geographical region | UK (n = 1); Australia (n = 1); Norway (n = 2); the Republic of Korea (n = 2); Canada (n = 3); USA (n = 4) |
Total number of randomised participants | 3175 (range 100–557) |
Mean age (years) | Range 45–72 |
Female participants (%) | Range 24–73 |
Length of follow-up | Range 3 months–20 years |
Diagnosis of primary OA (%) | Range 14–96 |
Comparison of THR interventions in the included RCTs was based on differences in hip replacement implant components (e.g. acetabular cup/shell, femoral stem and femoral head) according to their composition,127 design,115,128 bearing surface,113,115,124–126,145 fixation method110,112,119,129 and component size. 123 Table 10 shows the distribution of RCTs across the THR comparison categories.
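Basis of comparison | Study |
---|---|
Cup fixation | Bjørgul 2010110; Angadi 2012112 |
Cup liner bearing surface | McCalden 2009145; Engh 2012113 |
Cup shell design | Capello 2008115 |
Cup and femoral stem fixation | Corten 2011119 |
Femoral head size | Howie 2012123 |
Femoral head bearing surface | Lewis 2008124 |
Femoral head-on-cup liner bearing surface | Amanatullah 2011125; Capello 2008115; Kadar 2011126 |
Femoral stem composition | Healy 2009127 |
Femoral stem design | Kim 2011128 |
Femoral stem fixation | Kim 2011129 |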
Reported outcomes across the 13 RCTs varied. Most RCTs reported HHS110,112,113,115,119,124–129,145 and risk of revision. 112,113,115,119,123–125,127–129 The follow-up of outcome assessments ranged from 3 months119 to 20 years. 119,129 Outcomes reported in the included studies can be found in Appendix 8. A summary of the functional/clinical and quality of life measures/tools used is provided in Appendix 9.
Systematic reviews
The five included systematic reviews137–141 evaluated RCTs and non-RCTs of the clinical effectiveness of THR (see Appendix 3). The primary focus of these systematic reviews was the effect of different cup fixation methods (cemented vs. cementless)137–139 and of materials used for implant articulations140,141 on postoperative clinical/functional scores (HHS, OHS)137,138,140 and risk of revision. 138,139 Searches in these systematic reviews were undertaken between July 2007141 and June 2011. 139 Further details on specific outcomes reported in the included systematic reviews can be found in Appendix 8.
Risk of bias and methodological quality
Risk of bias in the randomised controlled trials
The risk-of-bias assessments for the 13 included RCTs comparing different types of THR are presented in risk-of-bias tables (see Appendix 2), the summary table (Table 11) and the risk-of-bias graph (Figure 10). Overall, four112,119,123,128 of the 13 RCTs reported an adequate method for random sequence generation and eight110,112,119,123–126,129 reported adequate treatment allocation concealment (low risk of bias). A greater proportion of the RCTs were rated as having a low risk of performance and detection bias for objective (e.g. mortality, dislocation) than for subjective (e.g. patient-administered functional scores) outcomes (92–100% vs. 15–23%, respectively). For at least eight of the RCTs, it was unclear whether or not awareness of THR type would influence the ascertainment of clinical/functional scores by patients/study personnel (performance bias)110,112,113,115,124,125,127–129,145 or outcome assessors (detection bias). 112,113,115,124–126,128,129 Most RCTs failed to report the blinding status of the patients, study personnel and/or outcome assessors. Eight RCTs were judged as having a low risk of attrition bias. Five RCTs115,124,125,127,128 were judged as being at high risk for selective outcome and/or analysis bias. The risk of other biases (e.g. funding source, baseline imbalance in important characteristics, inappropriate analysis) for about one-third of the RCTs was judged to be high.
Study | Selection bias: random sequence generation | Selection bias: allocation concealment | Performance bias: subjective (e.g. patient reported) | Performance bias: objective (e.g. mortality, radiography, dislocation) | Detection bias: subjective (e.g. patient reported) | Detection bias: objective (e.g. mortality, radiography, dislocation) | Attrition bias: subjective (e.g. patient reported) | Attrition bias: objective (e.g. mortality, radiography, dislocation) | Reporting bias: selective reporting of the outcome, subgroups or analysis | Other bias [funding source, adequacy of statistical methods used, type of analysis (ITT/PP), baseline imbalance in important characteristics] |
---|---|---|---|---|---|---|---|---|---|---|
Amanatullah 2011125 | ? | + | ? | + | ? | + | ? | – | – | – |
Angadi 2012112 | + | + | ? | + | ? | + | + | + | + | ? |
Bjørgul 2010110 | ? | + | ? | + | + | + | – | – | + | – |
Capello 2008115 | ? | ? | ? | + | ? | + | + | + | – | – |
Corten 2011119 | + | + | + | + | + | + | + | + | + | + |
Engh 2012113 | ? | ? | ? | ? | ? | + | – | – | + | ? |
Healy 2009127 | – | – | ? | + | – | + | + | + | – | + |
Howie 2012123 | + | + | NA | + | NA | + | NA | + | + | + |
Kadar 2011126 | ? | + | + | + | + | + | + | + | + | + |
Kim 2011128 | + | ? | ? | + | ? | + | + | + | – | + |
Kim 2011129 | ? | + | ? | + | ? | + | + | + | + | ? |
Lewis 2008124 | ? | + | ? | + | ? | + | ? | ? | – | ? |
McCalden 2009145 | ? | ? | ? | + | + | + | + | + | + | – |
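Several of the counts quoted in the text can be tallied directly from the summary table above. The sketch below is illustrative: the judgements are transcribed from the table (with an ASCII hyphen standing in for the high-risk symbol), and the domain keys and the `count` helper are hypothetical names introduced here.

```python
# Judgements transcribed from the risk-of-bias summary table:
# "+" low risk, "-" high risk, "?" unclear risk, "NA" not applicable.
# Domain keys and helper names are illustrative, not from the report.
DOMAINS = [
    "sequence_generation", "allocation_concealment",
    "performance_subjective", "performance_objective",
    "detection_subjective", "detection_objective",
    "attrition_subjective", "attrition_objective",
    "selective_reporting", "other",
]

ROWS = {
    "Amanatullah 2011 [125]": "? + ? + ? + ? - - -",
    "Angadi 2012 [112]":      "+ + ? + ? + + + + ?",
    "Bjørgul 2010 [110]":     "? + ? + + + - - + -",
    "Capello 2008 [115]":     "? ? ? + ? + + + - -",
    "Corten 2011 [119]":      "+ + + + + + + + + +",
    "Engh 2012 [113]":        "? ? ? ? ? + - - + ?",
    "Healy 2009 [127]":       "- - ? + - + + + - +",
    "Howie 2012 [123]":       "+ + NA + NA + NA + + +",
    "Kadar 2011 [126]":       "? + + + + + + + + +",
    "Kim 2011 [128]":         "+ ? ? + ? + + + - +",
    "Kim 2011 [129]":         "? + ? + ? + + + + ?",
    "Lewis 2008 [124]":       "? + ? + ? + ? ? - ?",
    "McCalden 2009 [145]":    "? ? ? + + + + + + -",
}


def count(domain, judgement):
    """Number of trials with the given judgement in the given domain."""
    idx = DOMAINS.index(domain)
    return sum(row.split()[idx] == judgement for row in ROWS.values())


n_trials = len(ROWS)
for domain, judgement in [
    ("sequence_generation", "+"),
    ("allocation_concealment", "+"),
    ("performance_objective", "+"),
    ("detection_objective", "+"),
    ("selective_reporting", "-"),
]:
    n = count(domain, judgement)
    print(f"{domain} [{judgement}]: {n}/{n_trials} ({100 * n / n_trials:.0f}%)")
```

A tally of this kind makes it straightforward to recheck the summary statements whenever a single judgement in the table is revised.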
Methodological quality of the systematic reviews
The assessment of methodological quality of the five included systematic reviews comparing different types of THR is presented in Table 12 and the quality assessment sheets (see Appendix 2). Briefly, based on the number of methodological items that were satisfied, two systematic reviews137,140 were judged to be of high quality (falling into the score range of 9–11) and two systematic reviews138,141 were of medium quality (falling into the score range of 5–8). The one remaining systematic review139 was judged to be of low quality (falling into the score range of 0–4). The specific unmet methodological items related to inappropriate analysis, absence of duplicate study selection, limited literature search, failure to address issues of publication bias and no information on conflicts of interest.
Study | Was an a priori design provided? | Was there duplicate study selection and data extraction? | Was a comprehensive literature search performed? | Was the status of publication (i.e. grey literature) used as an inclusion criterion? | Was a list of studies (included and excluded) provided? | Were the characteristics of the included studies provided? | Was the scientific quality of the included studies assessed and documented? | Was the scientific quality of the included studies used appropriately in formulating conclusions? | Were the methods used to combine the findings of studies appropriate? | Was the likelihood of publication bias assessed? | Was the conflict of interest stated? | Overall |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Clement 2012139 | Yes | No | No | Yes | Yes | Yes | No | No | No | No | No | Low quality |
Pakvis 2011138 | Yes | No | Yes | Yes | No | Yes | Yes | No | No | No | No | Medium quality |
Sedrakyan 2011140 | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | No | High quality |
Voigt 2012137 | Yes | Yes | Yes | CA | Yes | Yes | Yes | Yes | Yes | Yes | Yes | High quality |
Yoshitomi 2009141 | Yes | Yes | Yes | CA | Yes | Yes | Yes | NA | No | Yes | No | Medium quality |
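The banding described in the text can be illustrated by counting the satisfied items for each review. This is a sketch under a stated assumption: only "Yes" answers are counted as satisfied, and "CA" and "NA" are treated as not satisfied; the report does not say how these answers were scored. The function and variable names are illustrative.

```python
# Answers to the 11 methodological items, transcribed from the table above.
# Assumption: only "Yes" counts as a satisfied item; "CA" and "NA" do not.
ANSWERS = {
    "Clement 2012":   ["Yes", "No", "No", "Yes", "Yes", "Yes",
                       "No", "No", "No", "No", "No"],
    "Pakvis 2011":    ["Yes", "No", "Yes", "Yes", "No", "Yes",
                       "Yes", "No", "No", "No", "No"],
    "Sedrakyan 2011": ["Yes", "Yes", "Yes", "No", "Yes", "Yes",
                       "Yes", "Yes", "Yes", "Yes", "No"],
    "Voigt 2012":     ["Yes", "Yes", "Yes", "CA", "Yes", "Yes",
                       "Yes", "Yes", "Yes", "Yes", "Yes"],
    "Yoshitomi 2009": ["Yes", "Yes", "Yes", "CA", "Yes", "Yes",
                       "Yes", "NA", "No", "Yes", "No"],
}


def quality_band(score):
    """Map a 0-11 score to the bands given in the text."""
    if score >= 9:
        return "High quality"
    if score >= 5:
        return "Medium quality"
    return "Low quality"


scores = {review: answers.count("Yes") for review, answers in ANSWERS.items()}
bands = {review: quality_band(score) for review, score in scores.items()}

for review, score in scores.items():
    print(f"{review}: {score}/11 -> {bands[review]}")
```

Under this assumption the resulting bands agree with the overall ratings shown in the final column of the table.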
Clinical effectiveness findings for the comparison of different types of total hip replacement
This section summarises the evidence from the 13 RCTs110,112,113,115,119,123–129,145 and five systematic reviews. 137–141
The reported outcomes for this section were HHS (12 RCTs;110,112,113,115,119,124–129,145 three systematic reviews137,138,140), WOMAC score (four RCTs119,124,129,145), McMaster Toronto Arthritis Patient Preference Questionnaire (MACTAR) score (one RCT119), Merle d’Aubigné and Postel hip score (one RCT119), University of California Los Angeles (UCLA) activity score (one RCT129), OHS (one systematic review137), SF-12 score (three RCTs;124,125,145 one systematic review140), risk of revision (10 RCTs;112,113,115,119,123–125,127–129 five systematic reviews137–141), mortality (six RCTs110,113,119,123,128,145), femoral head penetration rate (three RCTs113,126,145), implant dislocation (seven RCTs;110,112,115,123–125,127 two systematic reviews139,140), osteolysis (seven RCTs;112,113,115,125,127,129,145 two systematic reviews138,139), aseptic loosening (five RCTs;112,113,119,124,127 one systematic review139), femoral fracture (three RCTs113,115,127), infection (four RCTs112,124,125,127) and deep-vein thrombosis (one RCT125).
Neither the RCTs nor the systematic reviews reported any evidence for the following clinical effectiveness outcomes:
- HOOS
- LISOH
- AAOS Hip and Knee Questionnaire
- Arthritis Impact Measurement Scale (AIMS)
- Nottingham Health Profile (NHP) questionnaire
- EQ-5D
- SF-36
- time to revision
- pain score [visual analogue scale (VAS)].
Summary results are presented separately for RCTs and systematic reviews in the following sections. The outcomes of interest are as follows:
- mortality
- validated functional/pain measures (total scores): HHS, OHS, pain score (VAS), Merle d’Aubigné and Postel score, UCLA activity score, WOMAC, MACTAR, HOOS, LISOH, AAOS Hip and Knee Questionnaire, AIMS
- health-related quality of life (total scores): EQ-5D, SF-36/SF-12, NHP
- revision rate (risk of revision, mean time to revision)
- femoral head penetration rate (measure of prosthesis movement)
- adverse events (peri-/postprocedural complications): implant dislocation, infection, osteolysis, aseptic loosening, femoral fracture, deep-vein thrombosis, muscle weakness, nerve palsy and pulmonary embolism.
Functional/clinical measures
Twelve of the 13 included RCTs comparing different types of THR reported at least some results for the following functional scores measured at different postprocedure follow-up times: HHS (12 studies110,112,113,115,119,124–129,145), WOMAC score (four studies119,124,129,145), MACTAR score (one study119), Merle d’Aubigné and Postel score (one study119) and UCLA activity score (one study129). None of these 12 studies reported measurements of the OHS.
Three of the five included systematic reviews comparing different types of THR reported at least some evidence on HHS137,138,140 and OHS. 137 None of the three reviews reported any summary evidence for WOMAC, MACTAR, Merle d’Aubigné and Postel, or UCLA scores.
Mean HHS at follow-up (range 6 months–10 years) did not differ between the following interventions: cup fixation (two studies;110,112 cemented vs. cementless), cup liner bearing surface (two studies;113,145 cross-linked polyethylene vs. non-cross-linked polyethylene), cup and femoral stem fixation (one study;119 cemented vs. cementless) and femoral head-on-cup liner bearing surfaces (one study;126 cobalt–chromium/oxinium-on-polyethylene vs. cobalt–chromium/oxinium-on-cross-linked polyethylene) (Table 13). In our meta-analysis of two studies113,145 comparing cross-linked with non-cross-linked polyethylene cup liners, the pooled MD for HHS was 2.29 (95% CI –0.88 to 5.45), a statistically non-significant difference in favour of cross-linked polyethylene cup liners (Figure 11).
Follow-up | Arm-specific estimate, mean (SD or 95% CI) | Difference (p-value or 95% CI) | Number of RCTs (SROB across studies)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
6 months | 90.2 (87.9 to 92.6) vs. 89.1 (86.9 to 91.3)110 | p > 0.05 (NS) | 2 (unclear) | No difference |
2 years | 92.7 (89.6 to 95.8) vs. 94.0 (92.4 to 95.7)110 | p > 0.05 (NS) | ||
5 years | 93.9 (91.6 to 96.2) vs. 91.4 (89.3 to 93.5)110 | p > 0.05 (NS) | ||
10 years | 89.8 (87.0 to 92.6) vs. 87.3 (84.1 to 90.6)110 | p > 0.05 (NS) | ||
10 years | 74.5 (NR) vs. 78.0 (NR)112 | p > 0.05 (NS) | ||
Cup liner bearing surface: XLPE vs. non-XLPE | ||||
1 year | 85.0 (10.3) vs. 83.4 (13.1)145 | MD 1.60 (–3.07 to 6.27)c | 2 (unclear) | No difference |
5 years | 86.0 (13.1) vs. 83.1 (15.4)145 | MD 2.90 (–2.77 to 8.57)c | ||
10 years | 88.0 (14.0) vs. 86.0 (15.0)113 | MD 2.00 (–1.85 to 5.85)c | ||
Pooled estimate of MDc 2.29 (–0.88 to 5.45)113,145 | ||||
Cup shell design: porous-coated shell vs. arc-deposited HA-coated shell | ||||
5 years | 97.0 (NR) vs. 96.4 (NR)115 | p > 0.05 (NS) | 1 (unclear) | Inconclusive |
10 years | 96.0 (NR) vs. 96.7 (NR)115 | p > 0.05 (NS) | ||
Cup and femoral stem fixation: cemented cup/femoral stem vs. cementless cup/femoral stem | ||||
3 months | 41 (12.0) vs. 41 (11.0)119 | MD 0.0 (–3.00 to 3.00)c | 1 (low) | No difference |
6 months | 47 (12) vs. 50 (13)119 | MD –3.0 (–6.32 to 0.32)c | ||
1 year | 52 (10.0) vs. 53 (11.0)119 | MD –1.0 (–3.86 to 1.86)c | ||
3 years | 50 (14.0) vs. 52 (11.0)119 | MD –2.0 (–5.62 to 1.62)c | ||
5 years | 47 (14.0) vs. 48 (13.0)119 | MD –1.0 (–4.88 to 2.87)c | ||
7 years | 44 (15) vs. 46 (14)119 | MD –2.0 (–7.07 to 3.05)c | ||
Femoral head bearing surface: oxinium femoral heads vs. CoCr femoral heads | ||||
2 years | 92 (NR) vs. 92.5 (NR)124 | p > 0.159 (NS) | 1 (unclear) | Inconclusive |
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. metal-on-PE | ||||
5 years | 96.4 (NR) vs. 97.0 (NR)115 | p > 0.05 (NS) | 1 (unclear) | Inconclusive |
10 years | 96.7 (NR) vs. 96.4 (NR)115 | p > 0.05 (NS) | ||
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. ceramic-on-PE | ||||
5 years | NR125 | p > 0.05 (NS) | 1 (unclear) | Inconclusive |
Femoral head-on-cup liner bearing surface: steel-on-PE vs. CoCr-on-PE vs. oxinium-on-PE vs. CoCr-on-XLPE vs. oxinium-on-XLPE | ||||
2 years | 91 (10.8) vs. 91 (8.5) vs. 91 (11.1) vs. 93 (11.3) vs. 88 (9.5)126 | p = 0.7 (NS); ANOVA-based p = 0.5 (NS)c | 1 (low) | No difference |
Femoral stem composition: CoCr vs. titanium | ||||
5 years | 83 (NR) vs. 87 (NR)127 | p = 0.029 (SS) | 1 (high) | Inconclusive |
Femoral stem design: short metaphyseal-fitting stem vs. conventional metaphyseal- and diaphyseal-fitting stem | ||||
3 years | 97.0 (NR) vs. 96.0 (NR)128 | p = 0.79 (NS) | 1 (unclear) | Inconclusive |
Femoral stem fixation: cemented vs. cementless | ||||
18 years | 91 (NR) vs. 90 (NR)129 | p = 0.71 (NS) | 1 (unclear) | Inconclusive |
The evidence for the other comparisons based on cup shell design (porous coated vs. arc-deposited hydroxyapatite coated),115 femoral head bearing surface (oxinium vs. cobalt–chromium),124 femoral head-on-cup liner bearing surfaces (ceramic-on-ceramic vs. metal-on-polyethylene or ceramic-on-polyethylene),115,125 femoral stem composition (cobalt–chromium vs. titanium),127 femoral stem design (short metaphyseal fitting vs. conventional diaphyseal fitting)128 and femoral stem fixation (cemented vs. cementless)129 was judged to be inconclusive.
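The pooled MD for cross-linked vs. non-cross-linked polyethylene cup liners shown in Table 13 can be approximately reproduced by inverse-variance pooling. The sketch below rests on assumptions that the report does not state explicitly: that a fixed-effect model was used, that the latest follow-up from each trial was pooled (5 years145 and 10 years113), and that standard errors can be back-calculated from the reported 95% CIs using a normal quantile of 1.96. Function names are illustrative.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% CI (assumed)


def se_from_ci(lower, upper, z=Z_95):
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * z)


def pool_fixed_effect(estimates, z=Z_95):
    """Inverse-variance fixed-effect pooling.

    estimates: list of (mean_difference, ci_lower, ci_upper) tuples.
    Returns (pooled_md, ci_lower, ci_upper).
    """
    weights = [1 / se_from_ci(lo, hi, z) ** 2 for _, lo, hi in estimates]
    total_weight = sum(weights)
    pooled = sum(w * md for w, (md, _, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se


# MDs and 95% CIs as reported in Table 13 (assumed to be the pooled inputs).
trial_mds = [
    (2.90, -2.77, 8.57),  # 5 years, ref. 145
    (2.00, -1.85, 5.85),  # 10 years, ref. 113
]

pooled_md, ci_low, ci_high = pool_fixed_effect(trial_mds)
print(f"pooled MD {pooled_md:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Under these assumptions the result lies within rounding distance of the reported pooled estimate of 2.29 (95% CI –0.88 to 5.45); small discrepancies are expected because the inputs are themselves rounded.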
One systematic review140 reported the pooled MD for the HHS (Table 14). Pooled estimates for the comparison between metal-on-metal and metal-on-polyethylene bearing surfaces at two different follow-up times were not consistent: at 2 years metal-on-metal bearing surfaces gave a significantly higher HHS than metal-on-polyethylene, but at > 2 years there was no significant difference between the two types of THR. The remaining two systematic reviews presented only narrative summaries. 137,138 In summary, for the HHS the systematic review-based evidence was judged to be inconclusive.
Follow-up | Pooled effect estimate (95% CI) | Number of RCTs in MA or narrative synthesis | AMSTAR rating | Treatment effect conclusiona |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
3 years | NR137 | 2137 | High quality137 | Inconclusive |
2–5 years | NR138 | 3138 | Medium quality138 | |
Femoral head-on-cup liner surface: metal-on-metal vs. metal-on-PE | ||||
2 years | MD –2.40 (–4.47 to –0.33) (SS)140 | 4140 | High quality140 | Inconclusive |
> 2 years | MD 1.21 (–2.41 to 4.83) (NS)140 | 2140 | ||
Femoral head-on-cup liner surface: ceramic-on-ceramic vs. ceramic-on-PE | ||||
NR | NR140 | 5140 | High quality140 | Inconclusive |
Femoral head-on-cup liner surface: ceramic-on-PE vs. metal-on-PE | ||||
NR | NR140 | 2140 | High quality140 | Inconclusive |
Femoral head-on-cup liner surface: metal-on-metal vs. ceramic-on-ceramic | ||||
NR | NR140 | 1140 | High quality140 | Inconclusive |
Results from all four RCTs reporting postprocedural mean WOMAC scores indicated statistically non-significant differences between the THR groups compared with respect to cup liner bearing surface (cross-linked polyethylene vs. non-cross-linked polyethylene),145 cup and femoral stem fixation (cemented vs. cementless),119 femoral head bearing surface (oxinium vs. cobalt–chromium)124 and femoral stem fixation (cemented vs. cementless)129 (Table 15). The MD in WOMAC score of –0.12 (95% CI –7.58 to 7.34) observed for one RCT145 suggested no difference between cross-linked polyethylene and non-cross-linked polyethylene cup liners. Results for WOMAC score in the remaining three RCTs119,124,129 were judged to be inconclusive because of incompletely reported data.
Follow-up | Arm-specific estimate, mean (SD or 95% CI) | Difference (p-value or 95% CI) | Number of RCTs (SROB)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup liner bearing surface: XLPE vs. non-XLPE | ||||
1 year | 83.0 (17.2) vs. 81.6 (17.6)145 | MD 1.43 (–5.48 to 8.34)c | 1 (unclear) | No difference |
5 years | 78.0 (19.4) vs. 78.1 (18.2)145 | MD –0.12 (–7.58 to 7.34)c | ||
Cup and femoral stem fixation: cemented cup/femoral stem vs. cementless cup/femoral stem | ||||
NA | Mean domain subscores only119 | – | 1 (low) | NA |
Femoral head bearing surface: oxinium femoral heads vs. CoCr femoral heads | ||||
2 years | 84.9 (NR) vs. 87.0 (NR)124 | p > 0.159 (NS) | 1 (unclear) | Inconclusive |
Femoral stem fixation: cemented vs. cementless | ||||
16 years | 11 (NR) vs. 13 (NR)129 | p = 0.927 (NS) | 1 (unclear) | Inconclusive |
No evidence on WOMAC score was identified from the systematic reviews.
In one RCT119 there was no difference in mean MACTAR scores (at 7 years: mean change difference 0.20, 95% CI –0.74 to 1.14) and Merle d’Aubigné and Postel scores (at 7 years: mean change difference –0.40, 95% CI –1.34 to 0.54) between patients who received a THR with cemented components and those who received a THR with cementless components (Tables 16 and 17). Results from one RCT129 comparing femoral stem fixation (cemented vs. cementless) by the postoperative UCLA activity score were inconclusive because of incomplete data reporting (Table 18).
Follow-up | Arm-specific estimate, mean (SD) | Difference (95% CI) | No. of RCTs (SROB)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup and femoral stem fixation: cemented cup/femoral stem vs. cementless cup/femoral stem | ||||
Mean change (postoperative): | Mean change difference: | 1 (low) | No difference | |
3 months | –5.3 (2.5) vs. –5.2 (2.2)119 | MD 0.10 (–0.51 to 0.71)c | ||
6 months | –6.6 (1.9) vs. –6.4 (2.1)119 | MD 0.20 (–0.33 to 0.73)c | ||
1 year | –7.0 (1.8) vs. –6.9 (2.0)119 | MD 0.10 (–0.41 to 0.61)c | ||
3 years | –6.6 (2.3) vs. –6.4 (2.3)119 | MD 0.20 (–0.46 to 0.86)c | ||
5 years | –6.0 (2.8) vs. –6.2 (2.4)119 | MD –0.20 (–0.95 to 0.55)c | ||
7 years | –6.2 (2.8) vs. –6.0 (2.6)119 | MD 0.20 (–0.74 to 1.14)c |
Follow-up | Arm-specific estimate, mean (SD) | Difference (95% CI) | Number of RCTs (SROB)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup and femoral stem fixation: cemented cup/femoral stem vs. cementless cup/femoral stem | ||||
Mean change (postoperative): | Mean change difference: | 1 (low) | No difference | |
3 months | 5.8 (1.9) vs. 5.6 (2.2)119 | MD 0.20 (–0.34 to 0.74)c | ||
6 months | 6.7 (2.1) vs. 7.0 (2.2)119 | MD –0.30 (–0.87 to 0.27)c | ||
1 year | 7.5 (1.8) vs. 7.4 (2.1)119 | MD 0.10 (–0.43 to 0.63)c | ||
3 years | 7.1 (2.2) vs. 6.9 (2.1)119 | MD 0.20 (–0.41 to 0.81)c | ||
5 years | 6.5 (2.3) vs. 6.6 (2.4)119 | MD –0.10 (–0.77 to 0.57)c | ||
7 years | 6.1 (2.6) vs. 6.5 (2.8)119 | MD –0.40 (–1.34 to 0.54)c |
Follow-up | Arm-specific estimate, mean (SD or 95% CI) | Difference (p-value) | Number of RCTs (SROB)a | Treatment effect conclusionb |
---|---|---|---|---|
Femoral stem fixation: cemented vs. cementless | ||||
16 years | 7.6 (NR) vs. 7.8 (NR)129 | p = 0.814 (NS) | 1 (unclear) | Inconclusive |
The OHS was reported in one systematic review137 comparing cup fixation methods (cemented vs. cementless), but the results were inconclusive (Table 19). This evidence was based on one RCT showing a statistically non-significant result.
Follow-up | Pooled effect estimate (95% CI) | Number of RCTs in MA or narrative synthesis | AMSTAR rating | Treatment effect conclusiona |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
3 years | NR137 | 1137 | High quality137 | Inconclusive |
Health-related quality of life
Only three RCTs124,125,145 and one systematic review140 reported any comparative evidence for measures of health-related quality of life.
In one RCT,145 at follow-up times of 1 and 5 years, there was no difference in quality of life (on the mental and physical subscales of SF-12) between two groups of patients receiving cross-linked and non-cross-linked polyethylene cup liner bearings (Table 20).
Follow-up | Arm-specific estimate, mean (SD) | Difference (p-value or 95% CI) | Number of RCTs (SROB)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup liner bearing surface: XLPE vs. non-XLPE | ||||
1 year | MCS: 55.79 (7.38) vs. 56.01 (8.55);145 PCS: 42.20 (11.37) vs. 40.86 (11.11)145 | MCS: MD –0.22 (–3.38 to 2.94);c PCS: MD 1.34 (–3.12 to 5.80)c | 1 (unclear) | No difference |
5 years | MCS: 55.24 (8.01) vs. 53.36 (10.13);145 PCS: 37.24 (12.16) vs. 40.00 (11.78)145 | MCS: MD 1.88 (–1.74 to 5.50);c PCS: MD –2.76 (–7.51 to 1.99)c | ||
Femoral head bearing surface: oxinium femoral heads vs. CoCr femoral heads | ||||
2 years | MCS: 53.80 (NR) vs. 52.57 (NR);124 PCS: 45.20 (NR) vs. 49.20 (NR)124 | MCS: p > 0.05 (NS); PCS: p > 0.05 (NS) | 1 (unclear) | Inconclusive |
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. ceramic-on-PE | ||||
5 years | NR125 | p > 0.05 (NS) | 1 (unclear) | Inconclusive |
In two other RCTs124,125 there were no statistically significant differences in mean SF-12 mental and physical subscale scores between THR groups with different femoral head bearings (oxinium vs. cobalt–chromium)124 and femoral head-on-cup liner articulations (ceramic-on-ceramic vs. ceramic-on-polyethylene). 125 This evidence was judged to be inconclusive (see Table 20).
One systematic review140 reported evidence from two studies that compared SF-12 scores across different articulations (metal-on-metal vs. metal-on-polyethylene) (Table 21). The review did not provide any formal narrative or quantitative synthesis of the data. The evidence was considered to be inconclusive.
Follow-up | Pooled effect estimate (95% CI) | Number of RCTs in MA or narrative synthesis | AMSTAR rating | Treatment effect conclusiona |
---|---|---|---|---|
Femoral head-on-cup liner surface: metal-on-metal vs. metal-on-PE | ||||
2–3 years | NR140 | 2140 | High quality140 | Inconclusive |
Revision
Evidence on revision was reported for 10 RCTs112,113,115,119,123–125,127–129 and five systematic reviews. 137–141
One RCT113 demonstrated a reduced risk of revision in patients who received cross-linked polyethylene compared with non-cross-linked polyethylene cup liners (RR 0.18, 95% CI 0.04 to 0.78) (Table 22). The evidence reported in the remaining nine RCTs showed statistically non-significant differences in the risk of revision between the different types of THR, with wide CIs compatible with large effects in either direction (i.e. favouring either treatment group). This evidence was deemed to be inconclusive (see Table 22).
Follow-up | Arm-specific counts, n/N | Difference (p-value or 95% CI) | Number of RCTs (SROB)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
10 years | 17/183 vs. 11/104112 | p > 0.05 (NS); RR 0.87 (95% CI 0.42 to 1.80)c | 1 (low) | Inconclusive |
Cup liner bearing surface: XLPE vs. non-XLPE | ||||
10 years | 2/111 vs. 11/109113 | p < 0.05 (SS); RR 0.18 (95% CI 0.04 to 0.78)c | 1 (unclear) | In favour of XLPE cup liner |
Cup shell design: porous-coated shell vs. arc-deposited HA-coated shell | ||||
5 years | 2/113 vs. 4/109115 | p > 0.05 (NS); RR 0.48 (95% CI 0.09 to 2.57)c | 1 (low) | Inconclusive |
5–10 years | 2/113 vs. 2/109115 | p > 0.05 (NS); RR 0.96 (95% CI 0.13 to 6.72)c | ||
Cup and femoral stem fixation: cemented cup/femoral stem vs. cementless cup/femoral stem | ||||
7 years | 13/124 vs. 6/126119 | p = 0.11 (NS); RR 2.20 (95% CI 0.86 to 5.60)c | 1 (low) | Inconclusived |
Femoral head size: 36 mm vs. 28 mm | ||||
1 year | 4/273 vs. 6/284123 | p = NR; RR 0.69 (95% CI 0.19 to 2.43)c | 1 (low) | Inconclusive |
Femoral head bearing surface: oxinium femoral heads vs. CoCr femoral heads | ||||
2 years | 1/50 vs. 1/50124 | p = NR; RR 1.00 (95% CI 0.06 to 15.50)c | 1 (low) | Inconclusive |
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. metal-on-PE | ||||
5 years | 6/222 vs. 8/106115 | p = 0.045 (SS); RR 0.35 (95% CI 0.12 to 1.00)c | 1 (low) | Inconclusive |
5–10 years | 4/222 vs. 5/106115 | p = 0.08 (NS); RR 0.38 (95% CI 0.10 to 1.39)c | ||
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. ceramic-on-PE | ||||
5 years | 11/196 vs. 3/161125 | p = 0.06 (NS); RR 3.01 (95% CI 0.85 to 10.61)c | 1 (low) | Inconclusive |
Femoral stem composition: CoCr vs. titanium | ||||
5 years | 2/199 vs. 0/191127 | p = 0.16 (NS); RR and 95% CI not estimated | 1 (unclear) | Inconclusive |
Femoral stem design: short metaphyseal-fitting stem vs. conventional metaphyseal- and diaphyseal-fitting stem | ||||
3 years | 0/50 vs. 0/50128 | p = NR; RR and 95% CI not estimated | 1 (low) | Inconclusive |
Femoral stem fixation: cemented vs. cementless | ||||
20 years | Acetabular: 14/109 vs. 18/110129 | p = 0.673 (NS); RR 0.78 (95% CI 0.41 to 1.49)c | 1 (low) | Inconclusive |
Femoral: 3/109 vs. 4/110129 | p = 0.912 (NS); RR 0.75 (95% CI 0.17 to 3.30)c |
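The RRs in Table 22 can be recomputed from the arm-specific counts. The sketch below uses the standard large-sample (log-scale) CI for a risk ratio; whether the report used exactly this formula is not stated, so small differences in the second decimal place are possible. The function name is illustrative. When either arm has zero events the log-scale interval is undefined, which is consistent with the table entries marked as not estimated.

```python
import math


def risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio (arm A vs. arm B) with a log-scale 95% CI.

    Returns None when either arm has zero events, because the
    log-scale standard error is then undefined.
    """
    if events_a == 0 or events_b == 0:
        return None
    rr = (events_a / total_a) / (events_b / total_b)
    se_log_rr = math.sqrt(
        1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# XLPE vs. non-XLPE cup liners at 10 years (ref. 113): 2/111 vs. 11/109.
rr, lower, upper = risk_ratio(2, 111, 11, 109)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Short vs. conventional stems at 3 years (ref. 128): no events in either arm.
no_events = risk_ratio(0, 50, 0, 50)
```

For the cross-linked polyethylene comparison the interval excludes 1, in line with the conclusion in favour of cross-linked polyethylene cup liners.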
Of the five systematic reviews reporting on revisions, two137,141 provided pooled estimates for risk of revision (Table 23). According to one review,141 at 9 years post surgery recipients of zirconia femoral heads had a similar risk of revision to recipients of non-zirconia femoral heads (three pooled RCTs; risk difference 0.02, 95% CI –0.01 to 0.06). This evidence was considered conclusive in detecting no difference in revision rates between these two types of femoral head.
Follow-up | Pooled effect estimate (95% CI) | Number of RCTs in MA or narrative synthesis | AMSTAR rating | Treatment effect conclusiona |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
4–8 years | RR 0.15 (0.02 to 1.18) (NS)137 | 2137 | High quality,137 medium quality,138 low quality139 | Inconclusive |
10 years | RR 1.36 (0.81 to 2.29) (NS)137 | 2137 | ||
< 10 years | NR138 | 6138 | ||
5–15 years | NR139 | NR139 | ||
Femoral head-on-cup liner surface: metal-on-metal vs. metal-on-PE | ||||
2–5 years | NR140 | 2140 | High quality140 | Inconclusive |
Femoral head-on-cup liner surface: ceramic-on-ceramic vs. metal-on-PE | ||||
6–8 years | NR140 | 1140 | High quality140 | Inconclusive |
Femoral head-on-cup liner surface: ceramic-on-ceramic vs. ceramic-on-PE | ||||
2–8 years | NR140 | 5140 | High quality140 | Inconclusive |
Femoral head-on-cup liner surface: ceramic-on-PE vs. metal-on-PE | ||||
8 years | NR140 | 1140 | High quality140 | Inconclusive |
Femoral head-on-cup liner surface: zirconia-on-PE vs. non-zirconia-on-PE | ||||
9 years | RD 0.02, 95% CI –0.01 to 0.06 (NS)141 | 3141 | Medium quality141 | No difference |
In another review137 the risk of revision did not differ significantly between cemented and cementless cup fixation THR groups at 4–8 years (pooled RR 0.15, 95% CI 0.02 to 1.18) or at 10 years (pooled RR 1.36) after surgery. These results were considered inconclusive given the uninformative 95% CIs. Evidence from the remaining three reviews138–140 was of a narrative nature, which precluded us from drawing conclusions (see Table 23).
Mortality
Evidence on mortality was reported for six RCTs. 110,113,119,123,128,145 None of the five systematic reviews reported on mortality.
Evidence from the six RCTs110,113,119,123,128,145 that reported mortality was inconclusive because of non-significant RR estimates and wide 95% CIs (Table 24). For example, based on a pooled RR estimate of 1.39 (95% CI 0.78 to 2.49),113,145 5- to 10-year post-surgery mortality rates in the group receiving cross-linked polyethylene cup liners were not significantly different from those in the group receiving non-cross-linked polyethylene cup liners (Figure 12). Similarly, the remaining studies showed non-significant differences in mortality between THR groups defined by femoral stem and/or cup fixation (cemented vs. cementless)110,119 and femoral head size (36 mm vs. 28 mm). 123 One RCT128 reported no deaths in either of the treatment groups receiving femoral stems of different design.
Follow-up | Arm-specific count (n/N) | Difference (p-value or 95% CI) | Number of RCTs (SROB across studies)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
10 years | 12/107 vs. 14/108110 | p = NR; RR 0.86 (95% CI 0.41 to 1.78)c | 1 (low) | Inconclusive |
Cup liner bearing surface: XLPE vs. non-XLPE | ||||
5 years | 7/50 vs. 2/50145 | p > 0.05 (NS); RR 3.50 (95% CI 0.76 to 16.03)c | 2 (unclear) | Inconclusive |
10 years | 17/111 vs. 15/109113 | p > 0.05 (NS); RR 1.11 (95% CI 0.58 to 2.11)c | ||
Pooled estimate of MH-RR: RR 1.39 (95% CI 0.78 to 2.49)113,145 | ||||
Cup and femoral stem fixation: cemented cup/femoral stem vs. cementless cup/femoral stem | ||||
7 years | 18/124 vs. 17/126119 | p = NR; RR 1.07 (95% CI 0.58 to 1.98)c | 1 (low) | Inconclusive |
Femoral head size: 36 mm vs. 28 mm | ||||
1 year | 5/273 vs. 2/284123 | p = NR; RR 2.58 (95% CI 0.53 to 13.20)c | 1 (low) | Inconclusive |
Femoral stem design: short metaphyseal-fitting stem vs. conventional metaphyseal- and diaphyseal-fitting stem | ||||
3 years | 0/50 vs. 0/50128 | p = NR; RR and 95% CI not estimated | 1 (low) | Inconclusive |
No evidence on mortality was identified from the systematic reviews.
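The pooled Mantel–Haenszel RR for mortality in Table 24 can be reproduced from the two sets of arm-specific counts. The following is a minimal sketch, assuming a fixed-effect Mantel–Haenszel estimator with the Greenland–Robins variance for the log RR (the report labels the estimate MH-RR but does not describe the variance method); the function name is illustrative.

```python
import math


def mantel_haenszel_rr(studies, z=1.96):
    """Fixed-effect Mantel-Haenszel pooled risk ratio.

    studies: list of (events_1, total_1, events_2, total_2) tuples.
    The CI uses the Greenland-Robins variance of log(RR_MH).
    Returns (pooled_rr, ci_lower, ci_upper).
    """
    numerator = sum(a * n2 / (n1 + n2) for a, n1, c, n2 in studies)
    denominator = sum(c * n1 / (n1 + n2) for a, n1, c, n2 in studies)
    rr = numerator / denominator

    p = sum(
        (n1 * n2 * (a + c) - a * c * (n1 + n2)) / (n1 + n2) ** 2
        for a, n1, c, n2 in studies
    )
    se_log_rr = math.sqrt(p / (numerator * denominator))

    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Deaths, XLPE vs. non-XLPE cup liners (counts from Table 24).
mortality_counts = [
    (7, 50, 2, 50),      # 5 years, ref. 145
    (17, 111, 15, 109),  # 10 years, ref. 113
]

pooled_rr, ci_lower, ci_upper = mantel_haenszel_rr(mortality_counts)
print(f"MH-RR {pooled_rr:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

The larger trial113 dominates the pooled estimate, which is why the pooled RR sits much closer to 1.11 than to 3.50.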
Femoral head penetration rate (measure of prosthesis movement)
Evidence on femoral head penetration rate was reported by three RCTs. 113,126,145 None of the five systematic reviews reported this end point.
Two RCTs113,145 demonstrated reduced femoral head penetration in favour of cross-linked polyethylene cup liners compared with non-cross-linked (conventional) polyethylene cup liners at 5–10 years of follow-up (Table 25). Similarly, in another RCT,126 cross-linked polyethylene cup liners with either metal or oxinium femoral heads outperformed conventional polyethylene cup liners in reducing femoral head penetration during 2 years of follow-up.
Follow-up | Arm-specific estimate (mm/year), mean (SD or 95% CI) | Difference (p-value or 95% CI) | Number of RCTs (SROB across studies)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup liner bearing surface: XLPE vs. non-XLPE | ||||
5 years | 0.003 (–0.024 to 0.030) vs. 0.051 (0.029 to 0.073)145 | p = 0.006 (SS) | 2 (unclear) | In favour of XLPE |
5 years | 0.24 (0.42) vs. 1.26 (0.62)113 | p < 0.001 (SS) | ||
10 years | 0.06 (0.05) vs. 0.22 (0.11)113 | p < 0.001 (SS) | ||
Femoral head-on-cup liner bearing surface: steel-on-PE vs. CoCr-on-PE vs. oxinium-on-PE vs. CoCr-on-XLPE vs. oxinium-on-XLPE | ||||
2 years | 0.19 (0.16 to 0.23) vs. 0.40 (0.33 to 0.46) vs. 0.44 (0.37 to 0.51) vs. 0.19 (0.15 to 0.23) vs. 0.18 (0.13 to 0.22)126 | p < 0.001 (SS; steel-on-PE, CoCr-on-XLPE and oxinium-on-XLPE vs. CoCr-on-PE and oxinium-on-PE) | 1 (low) | In favour of CoCr-on-XLPE, oxinium-on-XLPE and steel-on-PE |
No evidence was identified.
Complications
Evidence on the occurrence/absence of complications was reported by nine RCTs112,113,115,123–125,127,129,145 and three systematic reviews. 138–140 In most studies112,113,115,123–125,129,145 the reported complications were classified as postoperative. In one RCT127 some of the complications were classified as perioperative.
Implant dislocation
Evidence on the occurrence/absence of implant dislocation was reported by seven RCTs110,112,115,123–125,127 (Table 26). Our pooled estimate of two studies110,112 (Figure 13) indicated a reduced risk of implant dislocation at 10 years’ follow-up in recipients of cemented compared with cementless cups (pooled OR 0.34, 95% CI 0.13 to 0.89). Moreover, in one RCT123 after 1 year of follow-up, the THR recipients with a larger size of femoral head experienced a lower risk of implant dislocation than those with a smaller size of femoral head (36 mm vs. 28 mm: RR 0.17, 95% CI 0.04 to 0.78). Evidence on implant dislocation for the remaining four RCTs115,124,125,127 was inconclusive because of incomplete data and non-significant results.
Follow-up | Arm-specific count, n/N | Difference (p-value or 95% CI) | Number of RCTs (SROB across studies)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
10 years | 4/107 vs. 10/108110 | p > 0.05 (NS); RR 0.40 (95% CI 0.13 to 1.24)c | 2 (low) | In favour of cemented cup |
10 years | 1/183 vs. 3/104112 | p = NR; RR 0.18 (95% CI 0.02 to 1.79)c | ||
Pooled estimate of Peto OR: 0.34 (95% CI 0.13 to 0.89)110,112c | ||||
Cup shell design: porous-coated shell vs. arc-deposited HA-coated shell | ||||
10 years | 2/113 vs. 3/109115 | p = NR; RR 0.64 (95% CI 0.10 to 3.77)c | 1 (low) | Inconclusive |
Femoral head size: 36 mm vs. 28 mm | ||||
1 year | 2/258 vs. 12/275123 | p = NR; RR 0.17 (95% CI 0.04 to 0.78)c | 1 (low) | In favour of 36-mm head size |
Femoral head bearing surface: oxinium femoral heads vs. CoCr femoral heads | ||||
2 years | 2/50 vs. 1/50124 | p = NR; RR 2.00 (95% CI 0.18 to 21.35)c | 1 (low) | Inconclusive |
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. ceramic-on-PE | ||||
5 years | 10/166 vs. 9/146125 | p = 0.672 (NS); RR 0.97 (95% CI 0.40 to 2.33)c | 1 (low) | Inconclusive |
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. metal-on-PE | ||||
10 years | 5/222 vs. 5/106115 | p = 0.25 (NS); RR 0.47 (95% CI 0.14 to 1.61)c | 1 (low) | Inconclusive |
Femoral stem composition: CoCr vs. titanium | ||||
5 years | 3/199 vs. 0/191127 | p = 0.678 (NS); RR and 95% CI not estimated | 1 (unclear) | Inconclusive |
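
The pooled Peto OR for cup fixation in Table 26 can be reconstructed from the two trials' counts. The Python sketch below uses the standard Peto one-step method (observed minus expected events, with hypergeometric variance); the function name is hypothetical and the report does not state which software was used, so this is an illustration rather than the authors' implementation.

```python
import math

def peto_pooled_or(studies, z=1.959964):
    """Peto one-step pooled odds ratio with a 95% CI.

    studies: list of (events_1, total_1, events_2, total_2) tuples.
    """
    sum_o_minus_e = sum_v = 0.0
    for a, n1, c, n2 in studies:
        n = n1 + n2
        events = a + c
        expected = n1 * events / n  # expected events in arm 1 under no effect
        # Hypergeometric variance of the observed count in arm 1
        variance = n1 * n2 * events * (n - events) / (n ** 2 * (n - 1))
        sum_o_minus_e += a - expected
        sum_v += variance
    log_or = sum_o_minus_e / sum_v
    se = 1 / math.sqrt(sum_v)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Dislocation counts from Table 26 (cemented vs. cementless cups, 10 years)
dislocation = [(4, 107, 10, 108), (1, 183, 3, 104)]
pooled_or, lower, upper = peto_pooled_or(dislocation)
print(f"Peto OR {pooled_or:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The result rounds to the tabulated OR of 0.34 (95% CI 0.13 to 0.89). It also shows how two individually non-significant trials can yield a pooled CI that excludes 1.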
Overall, no conclusions on implant dislocation could be drawn from the two systematic reviews, given the narrative evidence summary140 and the mixed study designs139 (Table 27). The pooled data from one review139 were based on nine studies, most of which were not randomised; these data indicated a lower risk of dislocation in the groups receiving cemented compared with cementless cups.
Follow-up | Pooled effect estimate (95% CI) | Number of RCTs in MA or narrative synthesis | AMSTAR rating | Treatment effect conclusiona |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
5–15 years | 12/914 (1.3%) vs. 28/696 (4.1%) (p = 0.001).139 Pooled data from nine comparative studies (most non-RCTs) suggested that cemented cups had lower dislocation rates than cementless cups | NR139 | Low quality139 | Inconclusive |
Femoral head-on-cup liner surface: metal-on-metal vs. metal-on-PE | ||||
2–5 years | NR.140 No significant difference based on results from three RCTs | 3140 | High quality140 | Inconclusive |
Osteolysis
Evidence on osteolysis was reported by seven RCTs112,113,115,125,127,129,145 (Table 28). In one RCT115 comparing different femoral head-on-cup liner bearing surfaces, recipients of ceramic-on-ceramic articulations had a reduced risk of osteolysis compared with recipients of metal-on-polyethylene articulations at 10 years post operation (RR 0.10, 95% CI 0.02 to 0.32).
Follow-up | Arm-specific count, n/N | Difference (p-value or 95% CI) | Number of RCTs (SROB across studies)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
10 years | 0/183 vs. 1/104112 | p = NR; RR and 95% CI not estimated | 1 (low) | Inconclusive |
Cup liner bearing surface: XLPE vs. non-XLPE | ||||
5 years | 0/50 vs. 0/50145 | p = NR; RR and 95% CI not estimated | 2 (unclear) | Inconclusive |
10 years | 0/111 vs. 15/109113 | p < 0.001; RR and 95% CI not estimated | ||
Cup shell design: porous-coated shell vs. arc-deposited HA-coated shell | ||||
10 years | 1/113 vs. 2/109115 | p = NR; RR 0.48 (95% CI 0.04 to 5.24)c | 1 (low) | Inconclusive |
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. ceramic-on-PE | ||||
5 years | 1/166 vs. 1/146125 | p = 0.797 (NS); RR 0.87 (95% CI 0.05 to 13.93)c | 1 (low) | Inconclusive |
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. metal-on-PE | ||||
10 years | 3/222 vs. 15/106115 | p < 0.001 (SS); RR 0.10 (95% CI 0.02 to 0.32)c | 1 (low) | In favour of ceramic-on-ceramic bearing surface |
Femoral stem composition: CoCr vs. titanium | ||||
5 years | 0/199 vs. 0/191127 | p = NR; RR and 95% CI not estimated | 1 (unclear) | Inconclusive |
Femoral stem fixation: cemented vs. cementless | ||||
20 years | Acetabular: 35/109 vs. 40/110129 | p = 0.168 (NS); RR 0.88 (95% CI 0.61 to 1.27)c | 1 (low) | Inconclusive |
20 years | Femoral: 31/109 vs. 35/110129 | p = 0.159 (NS); RR 0.89 (95% CI 0.59 to 1.33)c | ||
For seven RCTs, the evidence for osteolysis was inconclusive across the comparisons based on different methods of cup fixation (cemented vs. cementless),112 cup liner bearing surface (cross-linked polyethylene vs. non-cross-linked polyethylene),113,145 cup shell design (porous coated vs. arc-deposited hydroxyapatite coated),115 femoral head-on-cup liner bearing surface (ceramic-on-ceramic vs. ceramic-on-polyethylene),125 femoral stem composition (cobalt–chromium vs. titanium)127 and femoral stem fixation (cemented vs. cementless). 129
Overall, no conclusions could be drawn on the incidence of osteolysis from two low-quality systematic reviews138,139 comparing cemented and cementless methods of cup fixation, given the narrative evidence summaries, mixed study designs and inconsistent findings (Table 29).
Follow-up | Pooled effect estimate (95% CI) | Number of RCTs in MA or narrative synthesis | AMSTAR rating | Treatment effect conclusiona |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
2–6 years | NR.138 The analysis and narrative synthesis of RCT data showed no statistically significant difference in the occurrence of osteolysis between cemented and cementless cups | 3138 | Low quality138 | Inconclusive |
5–15 years | NR.139 The narrative synthesis of nine comparative studies (most non-RCTs) indicated lower rates of osteolysis with cemented cups | NR139 | Low quality139 | Inconclusive |
Other complications
Seven RCTs reported other complications such as aseptic loosening (Table 30),112,113,119,124,127 femoral fracture (Table 31),113,115,127 infection (Table 32),112,124,125,127 and deep-vein thrombosis (Table 33). 125 This evidence was judged to be inconclusive because of low or zero event counts and wide CIs indicating great uncertainty.
Follow-up | Arm-specific count, n/N | Difference (p-value or 95% CI) | Number of RCTs (SROB)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
10 years | 11/183 vs. 2/104112 | p = NR; RR 3.12 (95% CI 0.70 to 13.83)c | 1 (low) | Inconclusive |
Cup liner bearing surface: XLPE vs. non-XLPE | ||||
10 years | 0/111 vs. 0/109113 | NA; RR and 95% CI not estimated | 1 (unclear) | Inconclusive |
Cup and femoral stem fixation: cemented cup/femoral stem vs. cementless cup/femoral stem | ||||
20 years | 9/124 vs. 4/126119 | p = NR; RR 2.28 (95% CI 0.72 to 7.23)c | 1 (low) | Inconclusive |
Femoral head bearing surface: oxinium femoral heads vs. CoCr femoral heads | ||||
2 years | 0/50 vs. 1/50124 | p = NR; RR and 95% CI not estimated | 1 (low) | Inconclusive |
Femoral stem composition: CoCr vs. titanium | ||||
5 years | 1/199 vs. 0/191127 | p = 0.324 (NS); RR and 95% CI not estimated | 1 (unclear) | Inconclusive |
Follow-up | Arm-specific count, n/N | Difference (p-value or 95% CI) | Number of RCTs (SROB)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup liner bearing surface: XLPE vs. non-XLPE | ||||
10 years | 2/111 vs. 0/109113 | p = NR; RR and 95% CI not estimated | 1 (unclear) | Inconclusive |
Cup shell design: porous-coated shell vs. arc-deposited HA-coated shell | ||||
10 years | 0/113 vs. 0/109115 | NA; RR and 95% CI not estimated | 1 (low) | Inconclusive |
Femoral stem composition: CoCr vs. titanium | ||||
5 years | 0/199 vs. 1/191127 | p = 0.309 (NS); RR and 95% CI not estimated | 1 (unclear) | Inconclusive |
Follow-up | Arm-specific count, n/N | Difference (p-value or 95% CI) | Number of RCTs (SROB)a | Treatment effect conclusionb |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
10 years | 0/183 vs. 2/104112 | p = NR; RR and 95% CI not estimated | 1 (low) | Inconclusive |
Femoral head bearing surface: oxinium femoral heads vs. CoCr femoral heads | ||||
2 years | 1/50 vs. 1/50124 | p = NR; RR 1.00 (95% CI 0.06 to 15.55)c | 1 (low) | Inconclusive |
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. ceramic-on-PE | ||||
5 years | Superficial: 6/166 vs. 3/146125 | p = 0.357 (NS); RR 1.75 (95% CI 0.44 to 6.90)c | 1 (low) | Inconclusive |
5 years | Deep: 1/166 vs. 2/146125 | p = 0.909 (NS); RR 0.43 (95% CI 0.04 to 4.79)c | ||
Femoral stem composition: CoCr vs. titanium | ||||
5 years | 1/199 vs. 0/191127 | p = 0.324 (NS); RR and 95% CI not estimated | 1 (unclear) | Inconclusive |
Follow-up | Arm-specific count, n/N | Difference (p-value or 95% CI) | Number of RCTs (SROB)a | Treatment effect conclusionb |
---|---|---|---|---|
Femoral head-on-cup liner bearing surface: ceramic-on-ceramic vs. ceramic-on-PE | ||||
5 years | 3/166 vs. 2/146125 | p = 0.909 (NS); RR 1.31 (95% CI 0.22 to 7.78)c | 1 (low) | Inconclusive |
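
The uncertainty that low or zero event counts introduce in Tables 30–33 can be illustrated with a single-study risk ratio. The Python sketch below assumes a Wald interval on the log scale with no continuity correction; the report does not state how the per-study RRs and CIs were calculated, and the function name is hypothetical.

```python
import math

def risk_ratio(events_1, total_1, events_2, total_2, z=1.959964):
    """Risk ratio with a Wald 95% CI computed on the log scale.

    Returns None when either arm has zero events, because the log-scale
    standard error is then undefined (no continuity correction is applied).
    This mirrors the tables' 'RR and 95% CI not estimated' entries.
    """
    if events_1 == 0 or events_2 == 0:
        return None
    rr = (events_1 / total_1) / (events_2 / total_2)
    se = math.sqrt(1 / events_1 - 1 / total_1 + 1 / events_2 - 1 / total_2)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

# Infection, oxinium vs. CoCr femoral heads at 2 years (Table 32): 1/50 vs. 1/50
rr, lower, upper = risk_ratio(1, 50, 1, 50)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Femoral fracture, XLPE vs. non-XLPE at 10 years (Table 31): 2/111 vs. 0/109
print(risk_ratio(2, 111, 0, 109))  # None: a zero cell, so no RR or CI is produced
```

With one event in each arm of 50 participants, the interval runs from about 0.06 to 15.55, matching the tabulated row. An interval this wide cannot support a conclusion in either direction.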
Among the other complications, only aseptic loosening was reported by a systematic review, and by only one low-quality review139 (Table 34). Pooled data from 11 studies, most of which were not randomised, pointed towards a greater risk of aseptic loosening with cemented compared with cementless cups; however, the evidence is inconclusive given the lack of numerical data and because the evidence synthesis was based on mixed study designs.
Follow-up | Pooled effect estimate (95% CI) | Number of RCTs in MA or narrative synthesis | AMSTAR rating | Treatment effect conclusiona |
---|---|---|---|---|
Cup fixation: cemented vs. cementless | ||||
5–15 years | NR.139 Pooled data from 11 comparative studies (most non-RCTs) presented only graphically suggested higher rates of aseptic loosening with cemented vs. cementless cups | NR139 | Low quality139 | Inconclusive |
Grading the overall quality of the evidence
The results for graded outcomes are presented in the evidence profile (Table 35). For a meaningful grading process and for consistency, only the THR comparison categories that included at least two studies were selected: cup fixation (cemented vs. cementless) and cup liner bearing surface (cross-linked polyethylene vs. non-cross-linked polyethylene). The overall quality for gradable outcomes across these two categories (cup fixation and cup liner bearing surface, respectively) was as follows: HHS: moderate grade for both; WOMAC score: not graded and very low grade; revision: very low grade for both; mortality: very low grade and low grade; femoral head penetration: not graded and moderate grade; and implant dislocation: high grade and not graded.
Outcome (follow-up timing) | Number of studies reporting outcome (participants) | Pooled effect estimate (95% CI) and conclusion | SROB across studies | Consistency | Directness | Precision | Outcome reporting bias | Quality of the evidence (GRADE)b |
---|---|---|---|---|---|---|---|---|
Cup fixation (cemented vs. cementless) – two RCTs110,112 | ||||||||
HHS (6 months–10 years) | 2 (502) | None; no difference | Unclear | Consistent | Direct | Precise | Unlikely | Moderate |
WOMAC score (NA) | 0 | NA | NA | NA | NA | NA | NA | NA (no evidence) |
Revision (10 years) | 1 (287) | None; inconclusive | Low | NA | Direct | Imprecise | Likely | Very low |
Mortality (10 years) | 1 (215) | None; inconclusive | Low | NA | Direct | Imprecise | Likely | Very low |
Femoral head penetration (NA) | 0 | NA | NA | NA | NA | NA | NA | NA (no evidence) |
Implant dislocation (10 years) | 2 (502) | OR 0.34 (0.13 to 0.89); in favour of cemented cup | Low | Consistent | Direct | Precise | Unlikely | High |
Cup liner bearing surface (XLPE vs. non-XLPE) – two RCTs113,145 | ||||||||
HHS (1–10 years) | 2 (320) | MD 2.29 (–0.88 to 5.45); no difference | Unclear | Consistent | Direct | Precise | Unlikely | Moderate |
WOMAC score (1–5 years) | 1 (100) | None; no difference | Unclear | NA | Direct | Precise | Likely | Very low |
Revision (10 years) | 1 (220) | None; in favour of XLPE cup liner | Unclear | NA | Direct | Precise | Likely | Very low |
Mortality (5–10 years) | 2 (320) | RR 1.39 (0.78 to 2.49); inconclusive | Unclear | Consistent | Direct | Imprecise | Unlikely | Low |
Femoral head penetration (5–10 years) | 2 (320) | None; in favour of XLPE cup liner | Unclear | Consistent | Direct | Precise | Unlikely | Moderate |
Implant dislocation (NA) | 0 | NA | NA | NA | NA | NA | NA | NA (no evidence) |
Summary conclusions for the comparison between different types of total hip replacement
Randomised controlled trials
We rated the majority of the evidence comparing THRs as inconclusive (Table 36). In three RCTs there was evidence of a reduced risk of implant dislocation with the use of a cemented cup (vs. a cementless cup)110,112 or a larger femoral head size (36 mm vs. 28 mm)123 (high-grade evidence for the cup fixation comparison). In three other RCTs, patients who received a THR with a cross-linked polyethylene cup liner experienced a reduced (i.e. improved) femoral head penetration rate (moderate-grade evidence)113,126,145 and a reduced risk of revision (very low-grade evidence)113 compared with recipients of conventional polyethylene cup liners. In one RCT119 the use of cementless fixation of the cup and femoral stem (vs. cemented fixation) was associated with a better implant survival rate. Moreover, the recipients of ceramic-on-ceramic articulations (vs. metal-on-polyethylene) experienced a reduced risk of osteolysis. 115 For half of the studies,110,112,113,119,126,145 the mean post-THR clinical and functional scores (i.e. HHS, WOMAC score, SF-12 score, MACTAR score, Merle d’Aubigné and Postel score) measured at different follow-up times were similar between the different THR treatment groups (moderate-grade evidence for no difference in HHS across the comparisons for cup fixation and cup liner surface types).
Conclusive evidence suggesting a difference | Conclusive evidence suggesting no difference | Inconclusive evidence |
---|---|---|
Cup fixation: cemented vs. cementless110,112 | ||
Implant dislocation (high-grade evidence)110,112 (in favour of cemented) | HHS (moderate-grade evidence)110,112 | Mortality (very low-grade evidence),110 revision (very low-grade evidence),112 osteolysis,112 aseptic loosening,112 infection112 |
Cup liner bearing surface: XLPE vs. non-XLPE113,145 | ||
Femoral head penetration (moderate-grade evidence),113,145 revision rate (very low-grade evidence)113 (in favour of XLPE) | HHS (moderate-grade evidence),113,145 WOMAC score (very low-grade evidence),145 SF-12 score (mental/physical)145 | Mortality (low-grade evidence),113,145 aseptic loosening,113 femoral fracture113 |
Cup shell design: porous coated vs. arc-deposited HA coated115 | ||
None | None | HHS, revision, implant dislocation, osteolysis, femoral fracture |
Cup and femoral stem fixation: cemented vs. cementless119 | ||
Nonea | HHS, Merle d’Aubigné and Postel score, MACTAR score | WOMAC score, mortality, revision, aseptic loosening |
Femoral head size: 36 mm vs. 28 mm123 | ||
Implant dislocation (in favour of 36 mm) | None | Mortality, revision |
Femoral head bearing surface: oxinium vs. CoCr124 | ||
None | None | HHS, SF-12 score, WOMAC score, revision, implant dislocation, aseptic loosening, infection |
Femoral head-on-cup liner bearing: ceramic-on-ceramic vs. metal-on-PE115 | ||
Osteolysis (in favour of ceramic-on-ceramic) | None | HHS, revision, implant dislocation |
Femoral head-on-cup liner bearing: ceramic-on-ceramic vs. ceramic-on-PE125 | ||
None | None | HHS, SF-12 score, revision, implant dislocation, osteolysis, infection, deep-vein thrombosis |
Femoral head-on-cup liner bearing: steel-on-PE vs. CoCr/oxinium-on-XLPE vs. CoCr/oxinium-on-PE126 | ||
Femoral head penetration (in favour of steel-on-PE or CoCr/oxinium-on-XLPE) | HHS | None |
Femoral stem composition: CoCr vs. titanium127 | ||
None | None | HHS, revision, implant dislocation, osteolysis, aseptic loosening, femoral fracture, infection |
Femoral stem design: short metaphyseal fitting vs. conventional metaphyseal and diaphyseal fitting128 | ||
None | None | HHS, mortality, revision |
Femoral stem fixation: cemented vs. cementless129 | ||
None | None | HHS, UCLA activity score, WOMAC score, revision, osteolysis |
Evidence from studies reporting the UCLA activity score,129 mortality (very low-grade evidence),110,113,119,123,128,145 aseptic loosening,112,113,119,124,127 femoral fracture,113,115,127 infection112,124,125,127 and deep-vein thrombosis125 was all inconclusive. Also, the evidence reported in four studies was considered inconclusive for all outcomes (very low-grade evidence). 124,125,127,128 We considered results inconclusive because of partial reporting (missing data for effect estimates, CIs, SEs, SDs, p-values), great uncertainty (wide CIs), zero event counts and/or inconsistency in estimates.
Systematic reviews
The majority of evidence from the five systematic reviews comparing different types of THR137–141 was considered inconclusive. This is because of unreported pooled results across RCTs (i.e. reporting only narrative syntheses), the reporting of inappropriate pooling methods (e.g. indirect naive comparison of single-group cohorts; pooling of studies of different design)138,139,141 or the reporting of inconsistent summary findings140 (Table 37). The evidence from one review141 indicated no difference in the risk of revision between zirconium-on-polyethylene and non-zirconium-on-polyethylene articulations.
Conclusive evidence suggesting a difference | Conclusive evidence suggesting no difference | Inconclusive evidence |
---|---|---|
Cup fixation: cemented vs. cementless137–139 | ||
None | None | HHS,137,138 OHS,137 revision,137–139 aseptic loosening139 |
aFemoral head-on-cup liner bearing: different comparisons140,141 | ||
None | Revision141 | HHS,140 SF-12 score,140 revision,140 implant dislocation140 |
Other analysis
Publication bias
The extent to which publication bias could have influenced the pooled treatment effect estimates (i.e. degree of funnel plot asymmetry) could not be explored because of an insufficient number of data points in the forest/funnel plots.
Heterogeneity, subgroup effects and sensitivity analysis
The data reviewed from RCTs were too sparse and heterogeneous (in terms of different types of THR) to allow exploration of whether or not the relative effect of any given THR differed by study-level methodological characteristics (i.e. risk of bias, type of data analysis) or patient-related characteristics (i.e. age, sex or functional status). None of the included RCTs reported within-study subgroup effects of the different THRs compared.
Comparison between total hip replacement and resurfacing arthroplasty
Study and participant characteristics
Randomised controlled trials
Study and participant characteristics of the three included RCTs130–132 are summarised in Table 38. More details can be found in Appendices 3 and 4. Two RCTs131,132 were conducted in Canada and one130 was conducted in the UK. A total of 422 participants were randomised across the three RCTs, with study sizes ranging from 104131 to 192132 participants. The mean age of participants ranged from 50132 to 56130 years and the proportion of female participants across the studies ranged from 10.5%131 to 41%. 130 The total length of follow-up of the studies ranged from 1 year130 to 6 years. 132 The proportion of participants diagnosed with primary OA was reported for two studies130,132 and was 33%132 and 95%. 130
Study characteristic | Metric |
---|---|
Geographical region | UK (n = 1), Canada (n = 2) |
Total number of randomised participants | 422 (range 104–192) |
Mean age (years) | Range 50–56 |
Female participants (%) | Range 10.5–41 |
Length of follow-up (years) | Range 1–6 |
Diagnosis of primary OA (%) | Range 33–95 |
The three RCTs reported on clinical/functional scores (e.g. HHS, OHS, UCLA activity score, WOMAC score), health-related quality of life and risk of revision. Follow-up of outcome assessments ranged from 3 weeks130 to 5 years. 132 Outcomes reported in the included studies can be found in Appendix 8.
Systematic reviews
Three systematic reviews142–144 were included that evaluated the clinical effectiveness of THR compared with RS with respect to postoperative clinical/functional scores (HHS, WOMAC score), risk of revision, mortality and complications. Searches for these systematic reviews were undertaken between March 2008144 and January 2010. 143 Evidence was synthesised from both RCTs and non-RCTs (see Appendices 3 and 4). Further details on specific outcomes reported (or not reported) in the included systematic reviews can be found in Appendix 8.
Risk of bias and methodological quality
Risk of bias in randomised controlled trials
The risk of bias assessment for the three included RCTs130–132 comparing THR with RS is presented in risk of bias tables (see Appendix 2), the summary table (Table 39) and the risk of bias graph (Figure 14). Overall, two studies130,132 reported an adequate method for random sequence generation and all three studies130–132 reported treatment allocation concealment (low risk of bias). Two of the three studies130,132 were rated as having a low risk of performance and detection bias for objective outcomes (e.g. revision, dislocation). The same two studies had a high risk of performance bias for subjective outcomes (e.g. patient-administered functional scores). Patients and study personnel were blinded in only one study. 131 For two studies130,132 the influence of attrition bias on objective outcomes was judged to be of low risk. All three studies were judged as being at low risk for selective outcome and/or analysis bias. Risk of other biases (e.g. funding source, balance/imbalance in important characteristics, inappropriate analysis) for one of the three studies was judged to be high. 131
Study | Selection bias: random sequence generation | Selection bias: allocation concealment | Performance bias: subjective (e.g. patient reported) | Performance bias: objective (e.g. mortality, radiography, dislocation) | Detection bias: subjective (e.g. patient reported) | Detection bias: objective (e.g. mortality, radiography, dislocation) | Attrition bias: subjective (e.g. patient reported) | Attrition bias: objective (e.g. mortality, radiography, dislocation) | Reporting bias: selective reporting of the outcome, subgroups or analysis | Other bias [funding source, adequacy of statistical methods used, type of analysis (ITT/PP), baseline imbalance in important characteristics] |
---|---|---|---|---|---|---|---|---|---|---|
Costa 2012130 | + | + | – | + | + | + | + | + | + | + |
Garbuz 2010131 | ? | + | + | NA | ? | NA | + | NA | + | – |
Vendittoli 2010132 | + | + | – | + | ? | + | – | + | + | + |
Methodological quality of systematic reviews comparing total hip replacement with resurfacing arthroplasty
The assessment of methodological quality of the three included systematic reviews142–144 is presented in Table 40 and the data extraction sheets (see Appendices 3 and 4). Given the number of methodological items that were satisfied, one of the three reviews was judged as being of high quality (falling into the score range 9–11),143 one was judged as being of medium quality (falling into the score range 5–8)142 and one was judged as being of low quality (falling into the score range 0–4). 144 The specific unmet methodological items related to inappropriate analysis, failure to address issues of publication bias and no information on conflicts of interest.
Study | Was an a priori design provided? | Was there duplicate study selection and data extraction? | Was a comprehensive literature search performed? | Was the status of publication (i.e. grey literature) used as an inclusion criterion? | Was a list of studies (included and excluded) provided? | Were the characteristics of the included studies provided? | Was the scientific quality of the included studies assessed and documented? | Was the scientific quality of the included studies used appropriately in formulating conclusions? | Were the methods used to combine the findings of studies appropriate? | Was the likelihood of publication bias assessed? | Was the conflict of interest stated? | Overall |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Jiang 2011142 | Yes | Yes | Yes | No | Yes | No | Yes | CA | No | No | No | Medium quality |
Smith 2010143 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | CA | No | Yes | Yes | High quality |
Springer 2009144 | Yes | Yes | No | No | Yes | Yes | CA | No | No | No | No | Low quality |
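
The quality bands described above can be checked against the item-level answers in Table 40. The Python sketch below assumes that the AMSTAR score is simply the number of items answered 'Yes', with 'CA' and 'No' both counted as unmet; the report refers only to 'the number of methodological items that were satisfied', so this scoring rule and the function name are assumptions.

```python
def amstar_quality(answers):
    """Band an 11-item AMSTAR checklist by the number of 'Yes' answers.

    Bands follow the report: 0-4 low, 5-8 medium, 9-11 high.
    Counting 'CA' and 'No' as unmet items is an assumption.
    """
    if len(answers) != 11:
        raise ValueError("expected answers for all 11 AMSTAR items")
    score = sum(answer == "Yes" for answer in answers)
    if score >= 9:
        band = "High quality"
    elif score >= 5:
        band = "Medium quality"
    else:
        band = "Low quality"
    return score, band

# Item-level answers transcribed from Table 40, in column order
reviews = {
    "Jiang 2011": ["Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "CA", "No", "No", "No"],
    "Smith 2010": ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "CA", "No", "Yes", "Yes"],
    "Springer 2009": ["Yes", "Yes", "No", "No", "Yes", "Yes", "CA", "No", "No", "No", "No"],
}
for name, answers in reviews.items():
    score, band = amstar_quality(answers)
    print(f"{name}: {score}/11, {band}")
```

Under this scoring rule the three reviews score 5, 9 and 4, which fall in the medium-, high- and low-quality bands, respectively, in agreement with the 'Overall' column of Table 40.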
Clinical effectiveness findings for the comparison between total hip replacement and resurfacing arthroplasty
This section summarises the findings from the three RCTs130–132 and three systematic reviews. 142–144
The reported outcomes for this section were the HHS (one RCT;130 two systematic reviews142,143), WOMAC score (two RCTs;131,132 two systematic reviews142,143), Merle d’Aubigné and Postel score (one RCT;132 one systematic review142), UCLA activity score (two RCTs;131,132 one systematic review142), OHS (one RCT130), health-related quality of life scales (SF-36 and EQ-5D; two RCTs130,131), risk of revision (one RCT;132 two systematic reviews142,143), mortality (two systematic reviews142,143), infection (two RCTs;130,132 one systematic review142), aseptic loosening (one RCT;132 two systematic reviews142,143), implant dislocation (two RCTs;130,132 one systematic review142) and deep-vein thrombosis (two RCTs130,132).
Neither the RCTs nor the systematic reviews reported any evidence for the following clinical effectiveness outcomes:
- HOOS
- LISOH
- AAOS Hip and Knee Questionnaire
- AIMS
- MACTAR
- NHP questionnaire
- SF-12
- time to revision
- pain score (VAS)
- femoral head penetration.
Summary results for the included outcomes are presented separately for RCTs and systematic reviews.
Evidence from randomised controlled trials
Functional/clinical measures
All three included RCTs comparing THR and RS reported some evidence for the following functional scores measured at 12–24 months after the procedure: HHS,130 OHS,130 WOMAC score,131,132 UCLA activity score131,132 and Merle d’Aubigné and Postel score. 132
In two RCTs there was no difference between the THR group and the RS group in mean postoperative OHS (12 months; MD –2.23, 95% CI –5.98 to 1.52),130 Merle d’Aubigné and Postel score (24 months; MD 0.0, 95% CI –1.06 to 1.06)132 or WOMAC score (12 months; MD 2.20, 95% CI –1.57 to 5.97). 132 One of these RCTs showed a significantly improved mean WOMAC score for the RS group compared with the THR group at 24 months of follow-up; however, this difference was not deemed to be clinically important (MD 3.30, 95% CI 0.01 to 6.58). 132
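The borderline nature of the 24-month WOMAC result can be made explicit by back-calculating the standard error and an approximate p-value from the reported CI. The Python sketch below assumes a symmetric 95% CI based on a normal approximation, which the report does not confirm; the function name is hypothetical.

```python
import math

def z_and_p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Back-calculate the SE, z statistic and two-sided p-value from a 95% CI.

    Assumes the interval is symmetric and based on a normal approximation.
    """
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return se, z, p

# WOMAC mean differences reported above (THR vs. RS)
womac = [("12 months", 2.20, -1.57, 5.97), ("24 months", 3.30, 0.01, 6.58)]
for label, md, ci_low, ci_high in womac:
    se, z, p = z_and_p_from_ci(md, ci_low, ci_high)
    print(f"{label}: SE {se:.2f}, z {z:.2f}, p {p:.3f}")
```

Under these assumptions the 24-month difference gives a p-value only just below 0.05, whereas the 12-month difference is clearly non-significant. This is consistent with the 24-month lower confidence limit of 0.01 lying almost exactly on the null value.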
Health-related quality of life
Two RCTs reporting quality of life measures showed statistically non-significant differences between the THR group and the RS group for both the SF-36 (p = 0.55 and p = 0.97 for mental and physical components, respectively)131 and the EQ-5D (MD –0.08, 95% CI –0.19 to 0.03). 130 These results were deemed to be inconclusive given the wide CI130 and incomplete data reporting. 131
Revision
The occurrence of implant revision was reported for only one RCT. 132 There was no statistically significant difference between the THR group and the RS group for risk of revision at 6 months (RR 1.01, 95% CI 0.06 to 15.92), 24 months (RR 0.50, 95% CI 0.04 to 5.48) or 56 months (RR 0.54, 95% CI 0.10 to 2.91) post surgery. The 95% CIs around the effect estimates were wide and included the value 1.00, and therefore did not allow definitive conclusions to be made regarding the effectiveness of THR compared with RS.
Mortality rate
No evidence on mortality rates was identified from the RCTs.
Complications
Evidence on complications was reported for two RCTs. 130,132 Meta-analysis of the data on risk of infection from the two RCTs indicated that, at 12–56 months post operation, THR recipients were at an increased risk of infection compared with RS recipients (pooled OR 7.94, 95% CI 1.78 to 35.40) (Figure 15). In addition, we judged the evidence on between-group differences to be inconclusive for the risk of deep-vein thrombosis (Figure 16; pooled OR 0.60, 95% CI 0.15 to 2.42),130,132 implant dislocation (Figure 17; pooled OR 3.97, 95% CI 0.79 to 19.90),130,132 wound complications (RR 4.01, 95% CI 0.92 to 18.18)130 and aseptic loosening (RR not estimable). 132
A summary of the results for the different outcomes is presented in Table 41.
Follow-up (months) | Arm-specific estimates, n/N or mean (SD or 95% CI) (THR vs. RS) | Difference (p-value or 95% CI) | Number of RCTs (SROB across studies)a | Treatment effect conclusionb |
---|---|---|---|---|
HHS (range 0–100) | ||||
12 | 82.3 (77.2 to 87.5) vs. 88.4 (84.4 to 92.4)130 | MD –6.04 (–12.58 to 0.51) | 1 (low) | Inconclusive |
OHS (range 0–48) | ||||
12 | 38.2 (35.3 to 41.0) vs. 40.4 (37.9 to 42.9)130 | MD –2.23 (–5.98 to 1.52) | 1 (low) | No difference |
WOMAC score (range 0–100) | ||||
3 | c19.2 (NR) vs. 19.9 (NR)132 | p = 0.76 (NS)132,136 | 2 (unclear) | No difference |
6 | c11.3 (NR) vs. 13.9 (NR)132 | p = 0.20 (NS)132,136 | ||
12 | c10.2 (10.7) vs. 8.0 (13.2)132 | dMD 2.20 (–1.57 to 5.97)132,136 | ||
12 | e90.18 (NR) vs. 90.40 (NR)131 | p = 0.95 (NS)131 | ||
24 | c9.0 (11.9) vs. 5.7 (8.6)132 | dMD 3.30 (0.01 to 6.58)132,136 | ||
Merle d’Aubigné and Postel score (range 0–18) | ||||
3 | 15.8 (NR) vs. 16.2 (NR)132 | p = 0.59 (NS) | 1 (unclear) | No difference |
6 | 17.1 (NR) vs. 17.2 (NR)132 | p = 0.72 (NS) | ||
12 | 16.6 (NR) vs. 16.7 (NR)132 | p = 0.94 (NS) | ||
24 | 17.5 (1.3) vs. 17.5 (1.3)132 | p = 0.94 (NS); MD 0.0 (–1.06 to 1.06)c | ||
UCLA activity score (range 1–10) | ||||
12 | 6.3 (NR) vs. 6.8 (NR)131 | p = 0.24 (NS)131 | 2 (unclear) | Inconclusive |
12 | 6.3 (NR) vs. 7.1 (NR)132 | p = 0.03 (SS)132,136 | ||
24 | NR (NR) vs. NR (NR)132 | p = 0.09 (NS)132,136 | ||
SF-36 score (range 0–100) | ||||
12 | MCS: 55.13 (NR) vs. 53.87 (NR)131 | p = 0.55 (NS) | 1 (unclear) | Inconclusive |
12 | PCS: 51.28 (NR) vs. 51.22 (NR)131 | p = 0.97 (NS) | ||
EQ-5D score (range 0–1) | ||||
12 | 0.71 (0.63 to 0.80) vs. 0.79 (0.72 to 0.87)130 | MD –0.077 (–0.188 to 0.034) | 1 (low) | Inconclusive |
Revision rate | ||||
3 | 1/102 vs. 0/103132 | p = NR; RR and 95% CI not estimated | 1 (low) | Inconclusive |
6 | 1/102 vs. 1/103132 | p = NR; RR 1.01 (0.06 to 15.92)d | ||
12 | 1/102 vs. 2/103132 | p = NR; RR 0.50 (0.04 to 5.48)d | ||
24 | 1/102 vs. 2/103132 | p = NR; RR 0.50 (0.04 to 5.48)d | ||
56 | 2/100 vs. 4/109132 | p = 0.47 (NS); RR 0.54 (0.10 to 2.91)d | ||
Complications | ||||
Infection | ||||
12 | 2/66 vs. 0/60130 | p = 0.49 (NS); RR and 95% CI not estimated | 2 (low) | In favour of RS |
56 | 5/100 vs. 0/109132 | p = 0.02 (SS); RR and 95% CI not estimated | ||
dPooled estimate of Peto OR: 7.94 (1.78 to 35.40)130,132 | ||||
Deep-vein thrombosis | ||||
12 | 0/66 vs. 4/60130 | p = 0.05 (NS); RR and 95% CI not estimated | 2 (low) | Inconclusive |
56 | 3/100 vs. 1/109132 | p = NR (NS); RR 3.27 (95% CI 0.30 to 30.90)d | ||
dPooled estimate of Peto OR: 0.60 (95% CI 0.15 to 2.42)130,132 | ||||
Implant dislocation | ||||
12 | 1/66 vs. 1/60130 | p = 1.00 (NS); RR 0.90, 95% CI 0.05 to 14.21d | 2 (low) | Inconclusive |
56 | 4/100 vs. 0/109132 | p = 0.038 (SS); RR and 95% CI not estimated | ||
dPooled estimate of Peto OR: 3.97 (95% CI 0.79 to 19.90)130,132 | ||||
Superficial wound complication | ||||
12 | 9/66 vs. 2/60130 | p = 0.06 (NS); RR 4.01 (95% CI 0.92 to 18.18)d | 1 (low) | Inconclusive |
Aseptic loosening | ||||
56 | 0/100 vs. 6/109132 | p = 0.017 (SS); RR and 95% CI not estimated | 1 (low) | Inconclusive |
Evidence from systematic reviews
Functional/clinical measures
Two of the three included systematic reviews comparing THR with RS reported evidence on HHS,142,143 WOMAC score,142,143 Merle d’Aubigné and Postel score142 and UCLA activity score142 (Table 42). The evidence was inconclusive because of the lack of pooled MD estimates for all four scores as well as the inconsistent results for the mean HHS and WOMAC score.
Follow-up (years) | Pooled effect estimate (95% CI) (RS vs. THR) | Number of RCTs in the MA or narrative synthesis | AMSTAR rating | Treatment effect conclusiona |
---|---|---|---|---|
HHS (range 0–100) | ||||
1–2 | NR;142 no significant difference | 3142 | Medium quality142 | Inconclusive |
2 | MD 2.51 (1.24 to 3.77) (SS);143 better for RS than for THR | NR143 | High quality143 | |
WOMAC score (range 0–100) | ||||
1–2 | NR;142 no significant difference | 3142 | Medium quality142 | Inconclusive |
2 | MD –2.41 (–3.88 to –0.94) (SS);143 better for RS than for THR | NR143 | High quality143 | |
Merle d’Aubigné and Postel score (range 0–18) | ||||
1–2 | NR;142 no significant difference | 3142 | Medium quality142 | Inconclusive |
UCLA activity score (range 1–10) | ||||
1–2 | NR;142 mean UCLA activity score significantly higher in RS group than in THR group | 2142 | Medium quality142 | Inconclusive |
Revision rate | ||||
1–10 | RR 2.60 (1.31 to 5.15) (SS)142 | 4142 | Medium quality142 | In favour of THR |
NR | RR 1.72 (1.20 to 2.45) (SS);143 higher in RS group than in THR group (19 pooled RCTs and non-RCTs) | NR143 | High quality143 | |
Mortality rate | ||||
3 | NR;142 one study showed no significant difference between RS and THR RR 1.05 (0.24 to 4.66) | 1142 | Medium quality142 | Inconclusive |
NR | RR 1.10 (0.10 to 17.8) (NS)143 | NR143 | High quality143 | |
Failure rate | ||||
NR | 3.70% (2.0% to 6.5%) vs. 11.60% (7.50% to 17.40%);144 indirect naive comparison of 15 studies of RS and 19 studies of THR | NA144 | Low quality144 | Inconclusive |
Dislocation rate | ||||
1–2 | RR 0.25 (0.05 to 1.21) (NS)142 | 3142 | Medium quality142 | In favour of RS |
NR | RR 0.20 (0.10 to 0.50) (SS);143 lower in RS group than in THR group (no. of pooled studies NR) | NR143 | High quality143 | |
Component loosening | ||||
1–10 | RR 4.96 (1.82 to 13.50) (SS);142 higher in RS group than in THR group | 4142 | Medium quality142 | In favour of THR |
NR | RR 3.00 (1.11 to 8.50) (SS);143 higher in RS group than in THR group (10 pooled RCTs and non-RCTs) | NR143 | High quality143 | |
Infection | ||||
1–3 | RR 2.25 (0.61 to 8.31) (NS)142 | 3142 | Medium quality142 | Inconclusive |
Health-related quality of life
No evidence was identified.
Revision
Both systematic reviews142,143 found a higher risk of revision in patients receiving RS than in those receiving THR. One review meta-analysed data from four RCTs that compared risk of revision in RS and THR recipients, reporting a pooled RR estimate of 2.60 (95% CI 1.31 to 5.15) (see Table 42). 142
Mortality
Overall, evidence on mortality reported by both systematic reviews142,143 was inconclusive because of great uncertainty in the effect estimates and the variability around them. For example, the pooled RR for mortality in one review143 for the comparison between RS and THR was 1.10 (95% CI 0.10 to 17.8) (see Table 42).
Failure rate
One systematic review144 reported an indirect naive comparison analysis (i.e. analysis without a common comparator) based on data from 15 studies of RS and 19 studies of THR (see Table 42). The analysis suggested a reduced risk of failure in the RS recipients compared with the THR recipients (3.70% vs. 11.60%). Given the well-recognised problems with validity of such methodology, this evidence was judged to be inconclusive.
Complications
Evidence on complications was reported by both systematic reviews142,143 (i.e. implant dislocation, infection and component loosening) (see Table 42). The evidence consistently showed an increased risk for component loosening142,143 but a reduced risk for implant dislocation142 among RS recipients compared with THR recipients. One review,142 which provided the risk of infection pooled across three studies, was not informative enough to draw any conclusions (RR 2.25, 95% CI 0.61 to 8.31).
Grading the overall quality of the evidence
The results for graded outcomes are presented in the evidence profile (Table 43). The overall quality for gradable outcomes across the reviewed evidence comparing THR with RS was as follows: HHS – very low grade; WOMAC score – low grade; revision – very low grade; mortality – not graded because of absence of evidence; and implant dislocation – very low grade.
Outcome (follow-up timing) | Number of studies reporting outcome (participants) | Pooled effect estimate (95% CI) and conclusion | SROB across studies | Consistency | Directness | Precision | Outcome reporting bias | Quality of the evidence (GRADE)b |
---|---|---|---|---|---|---|---|---|
HHS (12 months) | 1 (126)130 | None; inconclusive | Low | NA | Direct | Imprecise | Likely | Very low |
WOMAC score (3–24 months) | 2 (313)131,132 | None; no difference | Unclear | Consistent | Direct | Precise | Likely | Low |
Revision (3–56 months) | 1 (209)132 | None; inconclusive | Low | NA | Direct | Imprecise | Likely | Very low |
Mortality (NA) | 0 | NA | NA | NA | NA | NA | NA | NA (no evidence) |
Implant dislocation (12–56 months) | 2 (335)130,132 | OR 3.97 (0.79 to 19.90); inconclusive | Low | Inconsistent | Direct | Imprecise | Likely | Very low |
Summary conclusions for the comparison between total hip replacement and resurfacing arthroplasty
The majority of the evidence from three RCTs130–132 (Table 44) and three systematic reviews142–144 (Table 45) comparing THR and RS was rated as inconclusive (RCTs – very low-grade evidence). Nevertheless, the evidence from two RCTs and two systematic reviews indicated a reduced risk of infection130,132 and implant dislocation142,143 among RS patients compared with THR patients. However, the evidence from the same reviews also indicated that recipients of RS were at higher risk of revision and component loosening than patients who received a THR. In three RCTs130–132 the mean postoperative OHS, WOMAC score (low-grade evidence) and Merle d’Aubigné and Postel score were not different between patients who received THR and those who received RS.
Conclusive evidence suggesting difference | Conclusive evidence suggesting no difference | Inconclusive evidence |
---|---|---|
Infection130,132 (in favour of RS) | OHS,130 WOMAC score (low-grade evidence),131,132 Merle d’Aubigné and Postel score132 | HHS (very low-grade evidence)130 UCLA activity score,131,132 SF-36,131 EQ-5D,130 revision (very low-grade evidence),132 mortality (no evidence; not graded), deep-vein thrombosis,130,132 implant dislocation (very low-grade evidence),130,132 superficial wound complications,130 aseptic loosening132 |
Conclusive evidence suggesting difference | Conclusive evidence suggesting no difference | Inconclusive evidence |
---|---|---|
Revision142,143 (in favour of THR), implant dislocation142,143 (in favour of RS), component loosening142,143 (in favour of THR) | None | HHS,142,143 WOMAC score,142,143 Merle d’Aubigné and Postel score,142 UCLA activity score,142 mortality,142,143 failure,144 infection142 |
There was inconclusive evidence on mortality (three RCTs130–132 and two systematic reviews142,143), HHS (one RCT130 and two systematic reviews142,143), UCLA activity score (two RCTs131,132 and one systematic review142) and selected complications (i.e. infection, wound complication, deep-vein thrombosis; two RCTs130,132 and one systematic review142).
Results from individual RCTs were considered inconclusive because of the partial reporting (missing data for effect estimates, CIs, SEs, SDs, p-values) and great uncertainty in the estimates (wide CIs). The findings from the systematic reviews were inconclusive because of great uncertainty in the pooled estimates (wide CIs), lack of reporting of pooled results across RCTs (i.e. only narrative synthesis reported) or inconsistent summary findings.
Other analysis
Publication bias
The extent to which publication bias could have influenced the pooled treatment effect estimates (i.e. degree of funnel plot asymmetry) could not be explored because of insufficient numbers of data points in the forest/funnel plots.
Heterogeneity, subgroup effects and sensitivity analysis
The reviewed data from RCTs were too sparse (only three RCTs) to allow an exploration of whether or not the effect of any given THR relative to RS differed by study-level methodological (i.e. risk of bias, type of data analysis) or patient-related (i.e. age, sex or functional status) characteristics. None of the included RCTs reported within-study subgroup effects of the THR relative to RS (or vice versa).
Overall summary of the clinical effectiveness findings
A large proportion of evidence appraised and summarised in this review has been judged to be inconclusive (very low to low grade) because of poor reporting, missing data, inconsistent results and/or great uncertainty in the treatment effect estimates. Nevertheless, results from most studies suggested significantly improved post-surgery scores for functional/clinical measures (HHS, OHS, WOMAC score, MACTAR score, Merle d’Aubigné and Postel score and SF-12 score), regardless of the type of THR or RS received. Some moderate- or lower-grade evidence indicated no difference for these measures between different types of THR (or between THR and RS) at different follow-up times. There was a reduced risk of implant dislocation for participants receiving a THR with a larger femoral head size (vs. a smaller head size) or with a cemented cup (vs. cementless; high-grade evidence). Moreover, the evidence suggested a reduced femoral head penetration rate (moderate grade evidence) and risk of implant revision (very low-grade evidence) for participants who received cross-linked polyethylene compared with conventional polyethylene cup liner bearings. Participants with ceramic-on-ceramic articulations (vs. metal-on-polyethylene articulations) experienced a reduced risk of osteolysis. Recipients of RS had a lower risk of infection than recipients of a THR. The evidence on mortality and other complications (e.g. loosening, femoral fracture and deep-vein thrombosis) was inconclusive (very low grade).
Limitations of the reviewed evidence and pitfalls in interpretation
The review findings warrant cautious interpretation given the limitations of the available evidence. Specifically, great uncertainty in the treatment effect estimates (i.e. wide 95% CIs) because of limited sample sizes and/or small numbers of events (especially for deaths, revisions and complications), as well as incomplete or poor reporting (e.g. missing effect measures, SDs/SEs, 95% CIs, p-values), rendered some of the reviewed evidence inconclusive. Moreover, reported evidence on complications was scarce. It is unclear whether this is because of the absence or rarity of these events or because of under-reporting. In light of poor reporting, it was not possible to explore contextual factors that might have influenced the study results. For example, lack of blinding of participants and study personnel may have led to systematic differences in caregiving or co-interventions across implant groups, which would independently influence outcome measures. Furthermore, none of the studies reported the between-group distribution of experience and skills of study personnel, including surgeons, physicians, physiotherapists and occupational therapists. Any imbalance between the study treatment groups in the above-mentioned factors would influence the participants’ prognosis apart from treatment.
The paucity of data did not allow the exploration of any variation in the treatment effect across the predefined subgroups of patients or methodological features of studies; likewise, the extent of publication bias could not be examined using funnel plots because of the small numbers of studies in the meta-analyses.
Scenario analysis around revision rates
We did not consider it appropriate to use data from other clinical trials or registries to check the findings of our economic modelling, because the clinical effectiveness studies that we identified that reported revision rates were based on low event counts and/or small trials and carried a great deal of uncertainty. Overall, across the THR/THR and THR/RS comparisons, trials were often based on selective populations or interventions and provided inconclusive data on revision rates, often with wide CIs.
Comparison of the results from randomised controlled trials and systematic reviews
The findings of the RCTs and systematic reviews could be compared only with regard to implant fixation methods (cemented vs. cementless) and femoral head-on-cup articulations (e.g. metal-on-metal vs. metal-on-polyethylene, ceramic-on-ceramic vs. metal-on-polyethylene, ceramic-on-ceramic vs. ceramic-on-polyethylene). In summary, the effect estimates for differences between the above-mentioned THR groups in risk of revision, mortality and complications reported in RCTs and systematic reviews were statistically non-significant and had wide, uninformative CIs. Therefore, the evidence from both RCTs and systematic reviews was judged to be inconclusive because of the wide variability around the estimates and/or missing data. The reviewed evidence from RCTs suggested that there was no difference in postoperative HHS between cemented and cementless THR groups. The evidence for HHS reported in the included systematic reviews was judged to be inconclusive.
Our update search identified four new relevant systematic reviews. 242–245 Of these four systematic reviews, three compared the effectiveness of THRs using different articulations (metal-on-metal vs. metal-on-polyethylene),242 implant fixation methods (cemented vs. cementless)245 or femoral stem coating materials (hydroxyapatite coated vs. non-hydroxyapatite coated)244 for risk of revision,245 HHS,242,244,245 mortality245 and complications. 242,245 The remaining systematic review compared THR with RS for risk of revision. 243
Briefly, the review by Voleti et al. 242 presented a meta-analysis based on three RCTs and found no significant difference in HHS between the two articulations (metal-on-metal vs. metal-on-polyethylene) at 6 years post-surgery follow-up (pooled MD –1.05; p = 0.37). However, the risk of complications (dislocation, aseptic loosening, trochanteric/iliopsoas bursitis, femoral fracture and wound dehiscence) was greater in the metal-on-metal articulation group than in the metal-on-polyethylene articulation group (OR 3.37, 95% CI 1.57 to 7.26). 242 Similarly, another review245 presented a meta-analysis of seven RCTs showing a statistically non-significant difference in the mean postoperative HHS between the cemented and the cementless THR groups (pooled MD 1.12, 95% CI –1.17 to 3.41). In the same review, the meta-analytic estimates for risk of revision (six RCTs; pooled RR 1.44, 95% CI 0.88 to 2.36), mortality (five RCTs; pooled RR 1.06, 95% CI 0.73 to 1.52) and complications (four RCTs; pooled RR 1.54, 95% CI 0.21 to 11.03) between the cemented and the cementless groups of THR were also statistically non-significant. In the review by Li et al.,244 the postoperative pooled mean HHS was not statistically significantly different between the hydroxyapatite-coated and the non-hydroxyapatite-coated THR groups (four RCTs; pooled MD 3.04, 95% CI –4.47 to 10.54). The review by Pailhe et al. 243 included a qualitative synthesis of three RCTs and eight non-RCTs, providing no definitive conclusions regarding the differences between THR and RS in terms of implant survival or risk of revision.
In summary, the findings from the newly identified systematic reviews242–245 are in agreement with those of this review in showing no difference in postoperative HHS between the cemented and the cementless THR groups. Also in agreement with our findings, the pooled estimates for revision, mortality and complications did not differ statistically significantly between the groups, and the 95% CIs were wide enough (because of low event counts and the small sample sizes of trials) to be compatible with a moderate-to-large effect in either direction, rendering these findings inconclusive. 245 Future well-designed RCTs are needed to corroborate or refute the finding of one systematic review242 which suggests that there is an increased risk of complications in the metal-on-metal articulation group compared with the metal-on-polyethylene articulation group.
Strengths and limitations of the review
A strength of this review is that the reviewers used systematic and independent strategies to minimise bias in searching for, identifying, selecting, extracting and appraising the relevant evidence. The search strategy was applied to multiple electronic sources. Apart from the limitations of the evidence itself, the scope of this review was limited to a predefined set of outcomes ascertained from recently published evidence (2008 or later); evidence from non-English publications was not included. Given the wide scope and large amount of evidence identified, we limited inclusion to studies with a sample size of at least 100 that were published since 2008. The rationale for the size limitation was that smaller studies tend to be underpowered to detect meaningful differences in outcomes. 244,245 The results of such studies are usually rendered inconclusive because of statistically non-significant estimates with wide CIs that include large treatment effect size values compatible with both a better and a worse outcome for any given treatment compared with the control treatment. Therefore, to minimise this problem we calculated the minimum sample size for a study that would have 90% power at a two-tailed significance level of 0.05 to detect a MD of 10 on the HHS (we selected a SD of 15 based on external sources). 107,246 This calculation yielded a total sample size of 100 participants.
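The sample size calculation described above can be approximated as follows. This is a sketch that assumes a two-arm comparison of means with equal allocation and a normal approximation; the exact method used by the review team is not stated, and the function name is illustrative. Under these assumptions the formula gives 48 participants per group (96 in total), which is of the same order as the reported threshold of 100.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(mean_difference, sd, alpha=0.05, power=0.90):
    """Total n for a two-arm, equal-allocation comparison of means.

    Uses the normal approximation n per group = 2 * ((z_alpha + z_beta) * sd / delta)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    n_per_group = ceil(2 * ((z_alpha + z_beta) * sd / mean_difference) ** 2)
    return 2 * n_per_group


# MD of 10 points on the HHS with an assumed SD of 15, as stated in the text
total = total_sample_size(mean_difference=10, sd=15)
print(total)
```

A calculation based on the t-distribution, or rounding up to a convenient figure, would give a slightly larger number than this approximation.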
Future research
Because the evidence for any given comparison of two types of THR was sparse (maximum of two trials), the observed findings need to be replicated in larger, long-term pragmatic trials comparing the same THRs with each other (or with RS) before more definitive conclusions or recommendations are made. Large, multicentre, long-term pragmatic trials would help to reliably evaluate relative treatment effects and their variation(s) across patients, as well as manufacturer-based subgroups, and maximise generalisability of the findings to larger populations in clinical practice settings. For a more complete picture to aid health-care policy decisions, trials are also needed to investigate the cost-effectiveness of alternative THR (or RS) techniques. Study authors are encouraged to specify MCIDs and power calculations for their primary outcome(s). This information would help in the interpretation of the study findings in both clinical and statistical terms. Better reporting of future trial results is also warranted.
Methods for the review of cost-effectiveness
Identification of studies
Initial scoping searches were undertaken in MEDLINE in October 2012 to assess the volume and type of literature relating to the assessment question. These scoping searches also informed development of the final search strategies (see Appendix 1). An iterative procedure was used to develop these strategies with input from clinical advisors and previous HTA reports (e.g. Vale et al.,19 de Verteuil et al. 11). The strategies were designed to capture generic terms for arthritis, THR and RS. Searches were limited by the addition of economic and quality of life terms, which were selected with reference to previous research. 247,248
Searches were date limited from 2002 (the date of the most recent NICE guidance in this area25). The searches were undertaken in November 2012 (for exact search dates see Appendix 1).
All bibliographic records identified through the electronic searches were collected in a managed reference database.
The following main sources were searched to allow for identification of relevant published and unpublished studies and studies in progress:
- electronic bibliographic databases, including research in progress
- references of included studies.
The following databases of published studies were searched: MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, EMBASE, Science Citation Index and Conference Proceedings Citation Index – Science, The Cochrane Library (specifically CDSR, CENTRAL, DARE, NHS EED and HTA database) and the Cost-effectiveness Analysis Registry (CEA Registry) (Articles).
The following databases of research in progress were searched: Current Controlled Trials, ClinicalTrials.gov, UKCRN Portfolio Database and NLM Gateway (HSRProj).
The reference lists of included studies were checked for additional studies.
Inclusion and exclusion criteria
The following inclusion and exclusion criteria were used to identify eligible studies reporting costs and/or effects of THR and RS useful for the economic model and decision analysis:
Inclusion criteria
Study design
- RCTs.
- Observational designs, cohort studies and registry-based studies.
- Decision-analytic modelling studies.
- Systematic reviews.
- Meta-analyses.
Population
- People with pain or disability resulting from end-stage arthritis of the hip for whom non-surgical management has failed.
Intervention
- Elective primary THR.
- Primary hip RS.
Comparator
- Different types of primary THR compared with RS for people in whom both procedures are suitable.
- Different types of primary THR compared with each other for people who are not suitable for hip RS.
- Studies reporting costs or utilities without a comparator were also included.
Record
- Full-text articles of completed or in-progress studies (protocols) published in English.
Outcomes
- Cost-effectiveness outcomes were costs (cost of resources/devices, quantitative use of resources reported) and clinical effectiveness measures or utility measures (utility, EQ-5D score or QALYs), incremental cost-effectiveness ratios (ICERs), uncertainty measures, the ceiling willingness-to-pay (WTP) ratios and probabilities of cost-effectiveness from cost-effectiveness acceptability curves (CEACs).
Exclusion criteria
- Non-English-language publications.
- Abstract/conference proceedings, letters and commentaries.
- Quality of life reported without utilities or QALYs.
- Hip/knee data not reported separately.
- Studies including only patients aged < 35 years.
Assessment of eligibility
All retrieved records were collected in a specialist database and duplicate records were identified and removed. An initial sift was undertaken by one reviewer to exclude clearly non-relevant records using the following exclusion criteria:
- non-hip only papers
- papers on animals
- papers on children
- papers on surgery for hip fracture only
- non-English full-text papers.
This was followed by a formal sift by title and abstract by two reviewers using the inclusion/exclusion criteria. All identified relevant studies were read in full by two reviewers to identify eligible studies. Disagreement was resolved by a third reviewer. Reasons for exclusion of full-text papers were documented. The study flow was documented using a PRISMA diagram. 99
Data extraction
Data extraction was carried out in two stages by one reviewer using the data extraction sheets (see Appendix 4) and was checked by a second reviewer. Stage one considered all eligible studies and stage two considered studies assessed for usefulness for populating the economic model and decision analysis. Data extracted during stage one included the following:
- study characteristics [i.e. author names, country, design, study aim, type of economic evaluation (cost-effectiveness analysis, cost–utility analysis), perspective (e.g. societal, health-care payer, patient) and study currency]
- patient characteristics (i.e. number of participants, age, sex, OA)
- outcomes [i.e. utilities, resources use and costs (both direct and indirect), ICERs]
Data extraction also included the overall study conclusion and a comment on the type of data included in the study that are relevant for the economic model. Studies were subsequently categorised by topic (THR or RS) and outcomes (costs or utilities) and cost studies were also ordered by country and publication date using the following hierarchy:
- UK study published in 2008 or later
- UK study published before 2008
- non-UK study published in 2008 or later
- non-UK study published before 2008.
Utility studies were ordered by study size and ‘patient-reported utility data’ (utilities derived prospectively using patient questionnaires or from databases that prospectively collected utilities) using the following hierarchy:
- > 100 THR/RS patients and primary data
- < 100 THR/RS patients and primary data
- > 100 THR/RS patients and secondary data
- < 100 THR/RS patients and secondary data.
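The two ordering rules above can be expressed as simple ranking functions. The sketch below is illustrative only: the study records are hypothetical and are not studies from the review, the function names are assumptions, and a study with exactly 100 patients is grouped with the smaller studies because the text does not say how that boundary case was handled.

```python
def cost_rank(country: str, year: int) -> int:
    """1 = UK, 2008 or later; 2 = UK, before 2008;
    3 = non-UK, 2008 or later; 4 = non-UK, before 2008."""
    return (0 if country == "UK" else 2) + (1 if year >= 2008 else 2)


def utility_rank(n_patients: int, primary_data: bool) -> int:
    """1 = >100 patients, primary data; 2 = <100, primary data;
    3 = >100, secondary data; 4 = <100, secondary data.

    Assumption: exactly 100 patients is treated as the smaller category.
    """
    return (0 if primary_data else 2) + (1 if n_patients > 100 else 2)


# Hypothetical cost studies: (label, country, publication year)
cost_studies = [("A", "Canada", 2010), ("B", "UK", 2005),
                ("C", "UK", 2012), ("D", "Norway", 2004)]
ordered_costs = sorted(cost_studies, key=lambda s: cost_rank(s[1], s[2]))

# Hypothetical utility studies: (label, number of THR/RS patients, primary data?)
utility_studies = [("P", 80, True), ("Q", 450, False),
                   ("R", 230, True), ("S", 60, False)]
ordered_utilities = sorted(utility_studies, key=lambda s: utility_rank(s[1], s[2]))

print([s[0] for s in ordered_costs])
print([s[0] for s in ordered_utilities])
```

In both hierarchies the first criterion (UK setting, or primary data) dominates the second (recency, or study size), which is why the rank is built from the first criterion in steps of two.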
Data extracted during the second stage considered the costs of THR (cost of the device, cost of surgical time/hospital stay), follow-up for successful THR, revision THR, follow-up for successful revision THR, RS (cost of the device, cost of surgical time/hospital stay), follow-up for successful RS, revision RS, follow-up for successful revision RS and utilities at baseline, post surgery up to 12 months and > 12 months post surgery. Information on definition of costs, source of costs, cost year and currency was also extracted.
Quality assessment
The key cost-effectiveness papers that were identified as relevant for the economic model were assessed by one reviewer and checked by a second reviewer using the Consensus on Health Economic Criteria (CHEC);249 cost-effectiveness studies with decision-analytic models were also assessed using the criteria of Philips et al. 250
Results of the review of cost-effectiveness
Identification of studies
The flow chart outlining the process of identifying relevant literature can be found in Figure 18. The database search identified 1650 records, with an additional 14 records identified through screening of reference lists of included studies. Removal of duplicates left 913 studies to be screened for inclusion. The initial sift excluded 283 studies that were clearly not relevant, with a further 525 records excluded on title and abstract (κ = 0.89). The remaining 105 full-text articles were assessed for eligibility, of which 35 were excluded with reasons (see Appendix 13). This resulted in a total of 70 eligible articles,8,11,19,37,38,40,43,44,120,130,148,208,251–308 in which 66 studies were reported and subsequently included in the review. Of these, 35 were observational studies with or without an economic analysis,37,208,251,252,254,255,258,264,266–268,270–272,274,276,277,279–282,284–287,290,294,295,297,298,300–302,305,306,308 22 were economic analyses11,19,38,44,148,253,256,257,259–262,269,273,275,278,288,291–293,299,304,307 including three HTAs,11,19,148,299 four were reviews8,43,289,296 (three non-systematic43,289,296 and one systematic8), four were RCTs40,120,130,263,283,303 and one was a before-and-after trial. 265 Study location covered the UK (n = 13 8,11,19,37,38,40,43,44,130,251,252,257,292,295,299,304), other European countries (n = 22 256,258,260,261,263,265,266,271,274–276,278,280,281,283,287,288,297,298,302,303,305,306), North America (n = 21 120,148,208,253–255,259,262,267,268,277,284,285,289,291,293,294,296,300,301,307), Australia and New Zealand (n = 6 262,267,271,280,284,288) and Asia (n = 4 270,272,279,308). Costs/resource use were reported by 30 studies,43,148,254,256,261,264,267,268,270,271,273,274,276–280,283,285–293,300,304,308 utilities/QALYs by 15 studies122,251,252,258,266,272,284,294–298,301,302,305,306 and both costs/resource use and utilities/QALYs by 21 studies. 8,11,19,37,38,40,44,130,208,253,255,257,259,260,262,263,265,269,275,281,282,299,303,307 Seven of the 14 economic models reported transition probabilities. 8,11,19,253,259,261,275,299
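The counts in the study flow can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the paragraph above; the number of duplicates removed is not reported and is derived here, and the per-location counts are read as 13, 22, 21, 6 and 4 on the assumption that the digits following each count in the source are citation numbers.

```python
# Counts as reported in the study-flow paragraph
database_records = 1650
reference_list_records = 14
screened_after_deduplication = 913
excluded_initial_sift = 283
excluded_title_abstract = 525
excluded_full_text = 35

identified = database_records + reference_list_records
duplicates_removed = identified - screened_after_deduplication  # derived, not reported
full_text_assessed = (screened_after_deduplication
                      - excluded_initial_sift - excluded_title_abstract)
eligible_articles = full_text_assessed - excluded_full_text

# Breakdowns of the 66 included studies
by_design = {"observational": 35, "economic analysis": 22,
             "review": 4, "RCT": 4, "before-and-after": 1}
by_location = {"UK": 13, "other Europe": 22, "North America": 21,
               "Australia/New Zealand": 6, "Asia": 4}
by_outcome = {"costs only": 30, "utilities only": 15, "both": 21}

print(identified, duplicates_removed, full_text_assessed, eligible_articles)
for name, breakdown in [("design", by_design), ("location", by_location),
                        ("outcome", by_outcome)]:
    print(name, sum(breakdown.values()))
```

Each breakdown sums to the 66 included studies, and the screening steps reconcile with the 105 full-text articles and 70 eligible articles reported.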
A separate search (December 2012) of the ClinicalTrials.gov, Current Controlled Trials, UKCRN Portfolio and HSRProj Databases retrieved 511 potential trials or health services research projects. After screening titles and full records (if available), eight clinical trials were identified as potentially relevant from the cost-effectiveness point of view (see Appendix 7). All were either ongoing or completed since 2009.
Description of included studies
Resurfacing arthroplasty
Evidence on RS was scarce with only five of the 66 included studies investigating hip RS (see Appendix 10). A 2012 UK RCT including 126 OA patients suitable for RS investigated the cost-effectiveness of RS compared with THR. 40,130 At the end of this 12-month trial small benefits of RS in terms of QALYs could be shown for a selected patient group, resulting in an ICER of £17,451 per QALY. This evidence was stronger for male than for female patients. In a comparison between ceramic-on-ceramic THR and RS at 3 months post surgery, evidence was not as strong, favouring THR over RS. 208 However, longer-term follow-up in a study comparing hybrid THR with RS confirmed that, after 5 and 9 years, the revision rates for RS were lower than for hybrid THR (9.3% and 16.7% at 9 years post surgery, respectively) and patients were more active. 251,252
A retrospective economic decision analysis of published data over a 30-year time horizon showed the cost-effectiveness of RS compared with THR for women aged < 55 years and men aged < 65 years. 253 The main drivers of cost-effectiveness were the cost of the implant and length of hospital stay. 40,208 However, Vale et al. 19 reported in their HTA that RS would be cost-effective compared with THR only if RS revision rates could be shown to be 80–88% lower than revision rates for THR. They further concluded that RS could be cost-effective compared with ‘watchful waiting’ followed by THR or an extended period of ‘watchful waiting’ over 20 years.
Total hip replacement
The majority of studies investigated THR (n = 61) (see Appendix 10). Of these, five compared minimally invasive techniques with standard THR, reporting perioperative advantages, better short-term outcomes and reduced costs in favour of minimally invasive techniques. 11,148,254–256 However, Coyle et al. 148 concluded that there is little evidence of a difference between the two surgical techniques in the long term, mainly because of lack of data.
Ten of the THR studies focused on the comparison of different types of THR or specific components/brands of THR. Briggs et al. ,38 Davies et al. ,43 Fordham et al. 257 and Hulleberg et al. 258 assessed different brands of THR; Bozic et al. 259 investigated alternative bearings including metal-on-metal, ceramic-on-ceramic and ceramic-on-polyethylene; and Laupacis et al. ,120 Marinelli et al. ,260 Pennington et al. 44 and di Tanna et al. 261 compared cemented, cementless and hybrid THR more generally and reported inconsistent findings. The most recent economic model by Pennington et al. 44 used PROMs and showed that (1) cemented prostheses were the least costly type for THR, (2) hybrid prostheses were the most cost-effective and (3) cementless prostheses did not provide sufficient improvement in health outcomes to justify their additional costs. Similarly, Davies et al. 43 identified cemented prostheses as the least costly type of prosthesis in their review. However, they concluded that there is a lack of observed long-term prosthesis survival data and particularly limited up-to-date evidence for the UK, which led them to call for more trials with longer-term follow-up. Cummins et al. 262 reported that use of antibiotic-impregnated bone cement can result in an overall decrease in costs. For more detail on the studies investigating the different types of THR see Appendix 12.
Patient management and rehabilitation was the focus of four studies,263–266 which reported that perioperative management and rehabilitation programmes could improve patient outcomes and reduce costs.
The majority of the THR studies (30/61; refs 8,37,267–285,298,300–302,304–308) assessed the costs and/or effectiveness of THR without a specific focus on a rehabilitation programme, surgical intervention, implant brand or prosthesis type. Of these, two US studies267,268 concentrated on obese patients and reported that, even though operative costs are higher for obese patients, overall care costs and in-hospital outcomes for THR are comparable across all BMI groups. Eleven studies269–279 evaluated the cost-effectiveness of THR in a specific country, and two multicentre studies280,281 aimed to assess the costs and outcomes of THR comparatively across a number of European member states. These two studies concluded that improvement after surgery is associated with high preoperative expectations. Stargardt et al. 280 reported further that the total cost of treatment ranged from €1290 (Hungary) to €8739 (the Netherlands) and that the two main cost drivers were the cost of the implants and ward costs.
The overall findings of the cost-effectiveness studies were that (1) THR resulted in greater benefits than conservative treatment and (2) longer waiting times incurred greater costs and resulted in physical deterioration. 271,282,283 Further, agreement was reached on the long-term cost-effectiveness and sustained benefits of THR. 37,120,257,273,275 However, Bozic et al. 284 stated that, although THR improved quality of life, failed THR could lead to health states worse than chronic OA. Resource use might be increased as patients with a THR were shown to have a 10% increase in hospital stay compared with patients pre surgery. 285
In contrast, two studies286,287 that took a patient perspective rather than a health-care perspective concluded that out-of-pocket costs (including hospital costs, medication costs, rehabilitation costs, costs of health professional visits, costs of tests, costs of special equipment, costs of household alterations, use of private and community services and transportation costs that are not paid for by the health system), as well as use of health services, fell dramatically in the first year post surgery and that costs as well as resource use depended on pre-surgery health status.
Studies that focused on revision THR concluded that revision THR seems cost-effective but that it is resource intensive and has important implications for the allocation of health-care funding as the number of revisions is expected to increase with increasing demand for THR. 288–293 Vanhegan et al. 292 evaluated the costs associated with revision THR for different indications and reported that costs vary significantly by indication and that these variations were not reflected in the NHS tariffs. Durable implants and reduction in complications such as early dislocations have been suggested to be the solutions to reduce revision rates. 289 However, the highest revision costs were reported for revision as a result of infection,292 with infections caused by methicillin-resistant strains of bacteria (41% of periprosthetic joint infections) incurring significantly higher costs than infections with sensitive strains of bacteria. 293
Four studies evaluated the usefulness of different outcome measures for measuring quality of life after THR or revision THR, which showed that there was no consistency in the tools used to assess quality of life. Feeny et al. 294 reported that there is low agreement between certain outcome measures [SF-36, standard gamble, Health Utilities Index (HUI)-2 and HUI-3]. Dawson et al. 295 and Jones et al. 296 found that disease-specific measures reported larger changes than generic and utility measures. Ostendorf et al. 297 recommended the use of the OHS and the SF-12 in the assessment of THR and the EQ-5D in situations in which utility values are needed.
Overall, studies confirmed the long-standing claims that THR and RS are cost-effective interventions for patients with OA of the hip. However, there is little evidence from long-term trials on differences between implant brands and types of prostheses. This limits the conclusions that can be drawn with regard to the most cost-effective type of prosthesis. Studies used different methodologies to estimate costs (reference costs vs. prices actually paid by health-care centres), the cost components included varied widely, and many studies did not clearly report how costs were broken down. Although this review concentrates on clinical outcomes measured by the EQ-5D, the included studies tended to use more than one outcome measure with great variation across studies. In summary, THR, more so than RS, is a widely researched topic and receives great interest in many countries; however, further research should set out to include an assessment of the cost-effectiveness of different treatments.
Core studies for the cost-effectiveness analysis
Ranking eligible cost studies by year and country (most recent UK studies on top) and utility studies by number of participants, 11 studies were identified that were potentially useful to inform the decision model. These included one HTA and a further four cost-effectiveness studies. The HTA assessed the cost-effectiveness of hip RS compared with watchful waiting and THR. 19 The cost-effectiveness studies included three models that compared the cost-effectiveness of RS and THR,253 the cost-effectiveness of cemented, cementless and hybrid prostheses44 and the cost-effectiveness of two particular prosthesis types,38 respectively. One cost-effectiveness study was included that evaluated THR and RS but did not use a model. 40
The remaining six studies included partial economic evaluations that examined either costs or consequences but not both. Vanhegan et al. 292 reported costs for revision THR; Baker et al. 252 and Hulleberg et al. 258 reported medium- to long-term utilities in small populations; Dawson et al. 295 investigated quality of life post revision THR; and Bozic et al. 284 measured health state utilities for chronic OA of the hip, successful primary THR, failed primary THR, successful revision THR, failed revision THR and chronically infected THR. Rolfson et al. 298 evaluated the Swedish patient-reported outcomes data, reporting utilities for close to 35,000 THR patients.
Of the 11 studies three reported costs for THR,19,40,44 two reported costs for follow-up of successful THR19,40 and three reported costs of revision THR19,44,292 (see Appendix 12). Costs for RS were reported in three studies. 19,40,253 Of these, Edlin et al. 40 and Vale et al. 19 also reported follow-up costs after successful RS and Bozic et al. 253 reported costs for revision RS (see Appendix 11).
The studies reporting the most useful data on utilities following THR were those by Pennington et al. ,44 Rolfson et al. ,298 Hulleberg et al. ,258 Dawson et al. 295 and Bozic et al. 284 (see Appendix 14). Utilities for RS were reported in only three studies40,252,253 (see Appendix 15). No data were identified on quality of life at > 12 months post RS or for post-revision RS. Follow-up costs reported by Vale et al. 19 were the same for THR, RS and revision THR. Similarly, Bozic et al. 253 made no distinction between revision following THR or RS in terms of costs.
Quality assessment of core studies
Of the 11 core studies, five252,253,258,295,298 provided useful information on EQ-5D utility scores only and one292 provided useful data on costs only. These partial economic evaluations were not included in the critical appraisal. 309
Five studies19,38,40,44,253 were full economic evaluations and have been critically appraised using the CHEC-list. 249 Of these five studies, four19,38,44,253 included models. These studies have also been critically appraised using an adapted checklist for models developed by Philips et al. 250
Table 46 shows that all studies met ≥ 16 of the 19 criteria in the CHEC-list. 249
CHEC-list | Bozic 2010 (ref. 253) | Briggs 2004 (ref. 38) | Edlin 2012 (ref. 40) | Pennington 2013 (ref. 44) | Vale 2002 (ref. 19) |
---|---|---|---|---|---|
1. Is the study population clearly described? | Y | Y | Y | Y | Y |
2. Are competing alternatives clearly described? | Y | Y | Y | Y | Y |
3. Is a well-defined research question posed in answerable form? | Y | Y | Y | Y | Y |
4. Is the economic study design appropriate to the stated objective? | Y | Y | Y | Y | Y |
5. Is the chosen time horizon appropriate to include relevant costs and consequences? | Y | Y | Y | Y | Y |
6. Is the actual perspective chosen appropriate? | Y | Y | Y | Y | Y |
7. Are all important and relevant costs for each alternative identified? | Y | Y | Y | Y | Y |
8. Are all costs measured appropriately in physical units? | Y | Y | Y | Y | Y |
9. Are costs valued appropriately? | Y | Y | Y | Y | Y |
10. Are all important and relevant outcomes for each alternative identified? | Y | Y | Y | Y | Y |
11. Are all outcomes measured appropriately? | Y | Y | Y | Y | Y |
12. Are outcomes valued appropriately? | Y | Y | Y | Y | Y |
13. Is an incremental analysis of costs and outcomes of alternatives performed? | Y | Y | Y | Y | Y |
14. Are all future costs and outcomes discounted appropriately? | Y | Y | NA | Y | Y |
15. Are all important variables whose values are uncertain appropriately subjected to sensitivity analysis? | Y | Y | Y | Y | Y |
16. Do the conclusions follow from the data reported? | Y | Y | Y | Y | Y |
17. Does the study discuss the generalisability of the results to other settings and patient/client groups? | Y | N | Y | UN | N |
18. Does the article indicate that there is no potential conflict of interest of study researcher(s) and funder(s)? | UN | Y | Y | Y | UN |
19. Are ethical and distributional issues discussed appropriately? | N | N | N | UN | N |
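The statement that every study met at least 16 of the 19 CHEC-list criteria can be reproduced by tallying the 'Y' ratings per study. The sketch below transcribes the ratings from the table above, in item order 1 to 19; the dictionary layout and function name are illustrative assumptions, not part of the original appraisal.

```python
# Ratings per study for CHEC-list items 1-19, transcribed from the table.
# Y = criterion met, N = not met, UN = unclear, NA = not applicable.
chec_ratings = {
    "Bozic 2010":      ["Y"] * 16 + ["Y", "UN", "N"],
    "Briggs 2004":     ["Y"] * 16 + ["N", "Y", "N"],
    "Edlin 2012":      ["Y"] * 13 + ["NA"] + ["Y"] * 2 + ["Y", "Y", "N"],
    "Pennington 2013": ["Y"] * 16 + ["UN", "Y", "UN"],
    "Vale 2002":       ["Y"] * 16 + ["N", "UN", "N"],
}

def criteria_met(ratings):
    """Count the items rated 'Y' for one study."""
    return sum(1 for rating in ratings if rating == "Y")

met = {study: criteria_met(ratings) for study, ratings in chec_ratings.items()}
for study, count in met.items():
    print(f"{study}: {count} of {len(chec_ratings[study])} criteria met")
print("minimum across studies:", min(met.values()))
```

Counting only 'Y' ratings treats 'NA' (item 14 for the 12-month trial by Edlin et al.) as not met, which is the more conservative reading.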
Table 47 shows that all studies met ≥ 20 of the 32 criteria for economic models provided by Philips et al. 250 All studies correctly reported the time horizon and the perspective of the model, and the inputs used within the models were consistent with the perspectives that were chosen. In terms of costs and outcomes used in the model, these were appropriate to the specific study data set that was used. All studies conducted subgroup analyses. None of the studies applied a half-cycle correction and no justification was given for its exclusion. In addition, Pennington et al. 44 did not provide a clear definition of all of the options under evaluation and Briggs et al. 38 did not specify the cycle length of the model.
Philips criteria | Bozic 2010 (ref. 253) | Briggs 2004 (ref. 38) | Pennington 2013 (ref. 44) | Vale 2002 (ref. 19) |
---|---|---|---|---|
Structure | ||||
1. Is there a clear statement of the decision problem? | Y | Y | Y | Y |
2. Is the objective of the model specified and consistent with the stated decision problem? | Y | Y | Y | Y |
3. Is the primary decision-maker specified? | N | Y | N | Y |
4. Is the perspective of the model stated clearly? | Y | Y | Y | Y |
5. Are the model inputs consistent with the stated perspective? | Y | Y | Y | Y |
6. Is the structure of the model consistent with a coherent theory of the health condition under evaluation? | Y | Y | Y | Y |
7. Are the sources of the data used to develop the structure of the model specified? | Y | Y | Y | Y |
8. Are the structural assumptions reasonable given the overall objective, perspective and scope of the model? | UN | Y | UN | UN |
9. Is there a clear definition of the options under evaluation? | Y | Y | UN | Y |
10. Have all feasible and practical options been evaluated? | Y | N | Y | Y |
11. Is there justification for the exclusion of feasible options? | UN | N | UN | UN |
12. Is the chosen model type appropriate given the decision problem and specified causal relationships within the model? | Y | Y | Y | Y |
13. Is the time horizon of the model sufficient to reflect all important differences between the options? | Y | Y | Y | Y |
14. Do the disease states (state transition model) or the pathways (decision tree model) reflect the underlying biological process of the disease in question and the impact of interventions? | Y | Y | Y | Y |
15. Is the cycle length defined and justified in terms of the natural history of disease? | Y | UN | Y | Y |
Data | ||||
16. Are the data identification methods transparent and appropriate given the objectives of the model? | N | Y | Y | Y |
17. Where choices have been made between data sources are these justified appropriately? | Y | UN | Y | Y |
18. Where expert opinion has been used are the methods described and justified? | NA | NA | NA | Y |
19. Is the choice of baseline data described and justified? | N | Y | Y | Y |
20. Are transition probabilities calculated appropriately? | UN | Y | UN | Y |
21. Has a half-cycle correction been applied to both costs and outcomes? | N | N | N | N |
22. If not, has the omission been justified? | N | N | N | N |
23. Have the methods and assumptions used to extrapolate short-term results to final outcomes been documented and justified? | UN | Y | Y | Y |
24. Are the costs incorporated into the model justified? | Y | Y | Y | Y |
25. Has the source for all costs been described? | Y | Y | Y | Y |
26. Have discount rates been described and justified given the target decision-maker? | Y | Y | Y | Y |
27. Are the utilities incorporated into the model appropriate? | Y | Y | Y | Y |
28. Is the source of utility weights referenced? | Y | Y | Y | Y |
29. If data have been incorporated as distributions, has the choice of distributions for each parameter been described and justified? | N | Y | N | NA |
30. If data are incorporated as point estimates, are the ranges used for sensitivity analysis stated clearly and justified? | NA | NA | NA | N |
31. Has heterogeneity been dealt with by running the model separately for different subgroups? | Y | Y | Y | Y |
32. Have the results been compared with those of previous models and any differences in results explained? | Y | N | N | N |
Core studies for the economic model
Of the 11 core studies, Edlin et al. ,40 Pennington et al. ,44 Vale et al. 19 and Vanhegan et al. 292 provided data for the model in Chapter 9 (see Chapter 9 for the rationale of the selection procedure). This section will provide a brief description of the four core studies (Table 48).
Study and country | Study design | Methods | Results | Main conclusion | Information provided in the study |
---|---|---|---|---|---|
Edlin 2012,40 UK; Costa 2012,130 UK | Type: RCT and economic (cost–utility) analysis. Aim: to report on the relative cost-effectiveness of THR and RS in patients with severe arthritis suitable for hip joint RS | Population: patients aged > 18 years with severe arthritis of the hip joint suitable for RS (n = 126): THR n = 66, RS n = 60. Outcomes: primary: hip function (12 months post-surgery OHS and HHS); secondary: quality of life (EQ-5D), disability rating, physical activity level, complications, cost-effectiveness, incremental costs, ICERs. Economic analysis: NHS perspective, 12-month time horizon, cost year 2009/10 (£), univariate sensitivity analyses | Hip function: mean OHS: effect size 2.23 (95% CI –1.52 to 5.98, p = 0.242); mean HHS: effect size 6.04 (95% CI –0.51 to 12.58, p = 0.070). Complication rates did not differ (p = 0.291). Quality of life at 12 months: RS 0.795, THR 0.727; RS vs. THR: incremental QALYs 0.032, incremental cost £564, ICER £17,451 per QALY | No evidence of a difference in hip function between groups was seen in patients with severe arthritis of the hip, 1 year post surgery. RS appears to offer very short-term efficiency benefits over THR within a selected patient group | 1(a) Resource use ⊠; 1(b) Costs ⊠; 2(a) Utilities ⊠; 2(b) QALYs ⊠; 3 Transition probabilities ☐ |
Pennington 2013,44 UK | Type: retrospective economic (cost–utility) and decision analysis. Aim: to evaluate the relative cost-effectiveness of cemented, cementless and hybrid prostheses for elective THR surgery | Population: patients undergoing primary THR for OA (n = 30,203 for quality of life analysis). Male: cemented 35.1% (n = 4195), cementless 44.6% (n = 6548), hybrid 38.0% (n = 1350). Age (years), mean (SD): cemented 72.4 (6.7), cementless 67.8 (7.2), hybrid 70.4 (7.2). Outcomes: quality of life 6 months post surgery (OHS, EQ-5D), lifetime cost-effectiveness, costs (£), ICERs. Economic model: health service perspective, cost year 2010/11 (£); sensitivity analysis of QALYs post 2 years, revision rates using different hazard function, failed hip category without revision, excluding metal-on-metal prostheses | Lifetime costs: lowest with cemented prostheses. Postoperative quality of life and lifetime QALYs: highest with hybrid prostheses. Women aged 70 years: mean costs for cemented prosthesis £6900, mean costs for cementless prosthesis £7800, mean costs for hybrid prosthesis £7500. Mean postoperative EQ-5D scores: cemented 0.78, cementless 0.80, hybrid 0.81. Lifetime QALYs: cemented 9.0, cementless 9.2, hybrid 9.3. ICER: hybrid vs. cemented £2500 per QALY | Cemented prostheses were the least costly type for THR. For most patient groups hybrid prostheses were the most cost-effective. Cementless prostheses did not provide a sufficient improvement in health outcomes to justify their additional costs | 1(a) Resource use ☐; 1(b) Costs ⊠; 2(a) Utilities ⊠; 2(b) QALYs ☐; 3 Transition probabilities ☐. Comment: initial costs (including prosthesis, operating theatre and hospital stay costs); utilities and revision rates; costs and utilities by sex, year group and prosthesis type |
Vale 2002,19 UK; McKenzie 2003,299 UK | Type: systematic review and retrospective economic (cost–utility) analysis. Aim: to assess the effectiveness and cost-effectiveness of metal-on-metal hip RS compared with watchful waiting, THR, osteotomy, arthrodesis and arthroscopy of the hip joint | Population: patients with hip disease. Age (years): 45–50 and 65–70. Outcomes: costs (£), QALYs, ICERs. Economic model: Markov model, 20-year time horizon, NHS perspective, cost year 2000 (£); subgroup analysis considering those who would not outlive a THR; sensitivity analyses for revision rates, operation times, watchful waiting costs, time horizon and quality of life | Revisions: RS over 3-year follow-up: 0–14%, THR over 10-year follow-up: ≤ 10%, osteotomy over 10- to 17-year follow-up: between 2.9% and 29%. Patients pain free: RS: 91% at 4 years, THR: 84% at 11 years, arthrodesis: 22% at 8 years. Costs: RS for a patient aged < 65 years £5515, THR £4195, revision £6027, arthroscopy £951, osteotomy £2731, watchful waiting £642 annually. Cost-effectiveness: for patients aged < 65 years, RS dominated by THR; RS dominated watchful waiting within 20 years’ follow-up. Incremental cost per QALY: RS vs. osteotomy £3039, RS vs. arthroscopy £366. For patients aged > 65 years, THR dominated RS | Metal-on-metal RS had lower revision rates than THR over an extended time period and resulted in better outcomes overall for those who are likely to outlive a primary THR. If metal-on-metal RS has lower revision rates than THR over an extended period and results in better outcomes from subsequent THR, then metal-on-metal RS could possibly be considered cost-effective or even dominant | 1(a) Resource use ☐; 1(b) Costs ⊠; 2(a) Utilities ☐; 2(b) QALYs ⊠; 3 Transition probabilities ⊠. Comment: revision rates for metal-on-metal RS and THR; costs including prosthesis costs; broken-down costs for watchful waiting |
Vanhegan 2012,292 UK | Type: retrospective economic analysis. Aim: to evaluate the costs associated with revision THR for different indications | Population: patients undergoing revision THR (n = 286; n = 305 procedures). Male: aseptic loosening (n = 194): 34% (n = 65), deep infection (n = 76): 42% (n = 32), periprosthetic fracture (n = 24): 25% (n = 6), dislocation (n = 11): 28% (n = 3). Age (years), mean (range): aseptic loosening 67 (20–89), deep infection 62 (29–83), periprosthetic fracture 76 (31–88), dislocation 79 (54–90). OA: aseptic loosening 69%, deep infection 48%, periprosthetic fracture 80%, dislocation 54%. Outcomes: LOS, costs (£) | Mean total costs for revision surgery: aseptic cases £11,897 (SD £4629), septic revision £21,937 (SD £10,965), periprosthetic fracture £18,185 (SD £9124), dislocation £10,893 (SD £5476). Surgery for infection and periprosthetic fracture: longer operating times, increased blood loss, increase in complications, longer LOS | Financial costs vary significantly by indication. Variation is not reflected in current NHS tariffs | 1(a) Resource use ⊠; 1(b) Costs ⊠; 2(a) Utilities ☐; 2(b) QALYs ☐; 3 Transition probabilities ☐ |
Edlin et al. 40 reported a cost–utility analysis of RS compared with THR as part of an RCT of 126 adult patients with severe arthritis of the hip. Patients were randomised on a 1 : 1 basis between THR and RS. All RS patients received a Cormet™ (Corin Group, Cirencester, UK) metal-on-metal RS prosthesis. The THR patients received one of three types of prosthesis (ceramic-on-ceramic, metal-on-metal or metal-on-polyethylene) depending on the surgeon’s preference. The study took the NHS perspective and considered the within-trial period without any extrapolation past the 12-month trial period. The costs were reported in 2009/10 UK pounds and EQ-5D 3 Levels (EQ-5D-3L) outcomes were measured as secondary outcomes of the trial.
The study used Healthcare Resource Group v4 (HRG4) reference costs combined with NHS trust finance department list prices for implants and IPD on length of stay (LOS). Resource use data and personal costs were obtained from patient-reported data. Univariate sensitivity analyses included an assessment of the impact of using the cheapest THR type (metal-on-metal) for all THR operations. The study reported NHS and Personal Social Services (PSS) costs after 12 months by type of hip replacement (THR vs. RS), including the costs of the initial operation/care, subsequent inpatient, outpatient, primary and community care, aids and medication [THR £7217 (£1320); RS £6653 (£917)], as well as private and social costs. The main results of this analysis included a difference in QALYs of 0.033 in favour of RS after 12 months and a greater cost of RS (difference of £564) in the first 12 months following surgery. This resulted in an ICER for RS of £17,451 per QALY. These results are based on a short-term trial using a single RS prosthesis type. The study did not explore variation in costs within each type of prosthesis used in THR. Variation in prosthesis costs by hospital, a change in current practice regarding the choice of THR implant, longer follow-up (including higher revision rates for RS than for THR) and use of different RS implants may affect the reported cost-effectiveness in this study.
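The ICER quoted for this trial is the incremental cost divided by the incremental QALYs. The following minimal Python sketch uses the incremental cost (£564) and the published ICER (£17,451 per QALY) given above; the function name is an illustrative assumption. Dividing the rounded figures gives a slightly different ratio from the published one, which would be expected if the published ICER was calculated from unrounded estimates (an assumption, as the underlying figures are not reported here).

```python
def icer(incremental_cost, incremental_qalys):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    if incremental_qalys == 0:
        raise ValueError("ICER is undefined when there is no QALY difference")
    return incremental_cost / incremental_qalys

incremental_cost = 564.0   # RS minus THR, 12 months, in pounds
published_icer = 17451.0   # pounds per QALY, as reported

# ICER recomputed from a QALY gain rounded to three decimal places.
icer_from_rounded = icer(incremental_cost, 0.032)

# QALY gain implied by the published ICER and the incremental cost.
implied_qaly_gain = incremental_cost / published_icer

print(f"ICER from rounded inputs: {icer_from_rounded:,.0f} per QALY")
print(f"QALY gain implied by published ICER: {implied_qaly_gain:.4f}")
```

The implied QALY gain of roughly 0.0323 lies between the two rounded values quoted for this trial in this chapter (0.032 and 0.033), which illustrates how sensitive an ICER is to rounding when the QALY difference is small.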
Pennington et al. 44 used IPD from three data sources (national PROMs programme, the NJR and Hospital Episode Statistics) to compare the cost-effectiveness of cemented, cementless and hybrid THR in adult patients with hip OA. They built a probabilistic Markov model over patients’ lifetime, taking the NHS perspective. Implant prices were based on prices paid by English NHS centres. Costs for surgery plus hospital stay were taken from the literature and adjusted for LOS by prosthesis type and costs of revision were varied by reason for revision. Costs were reported as 2010/11 prices. The national data sources provided data on quality of life, LOS, rates of revision and rerevision and mortality for 30,203 patients.
Patients receiving different prosthesis types were matched by age, sex, number of comorbidities, ASA grade, BMI, deprivation, preoperative quality of life, surgeon experience and hospital type. The study reported data on the combined cost of the prosthesis, operating theatre and hospital stay, quality of life at 6 months post surgery and 5- and 10-year revision rates by prosthesis type, age group and sex. Overall, the study concluded that in patients aged 70 years the ICER for a hybrid prosthesis compared with a cemented prosthesis was £2100 for men and £2500 for women, with hybrid prostheses resulting in higher quality of life in all subgroups except women aged 80 years and cemented prostheses being the least costly option. The initial costs of a cementless prosthesis were highest in all subgroups. One of the limitations of the study was that it assumed that the observed quality of life at 6 months post surgery would remain unchanged for the patients’ lifetime. Furthermore, the study did not consider different revision rates by brand for the three different THR types.
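The conclusion that cementless prostheses do not justify their additional costs can be illustrated with a simple dominance check. The sketch below uses the mean costs for women aged 70 years and the lifetime QALYs listed for this study in Table 48; pairing these costs with these QALYs for the same subgroup is an assumption made for illustration, and the function names are hypothetical.

```python
# (mean cost in pounds, lifetime QALYs) per prosthesis type, from Table 48.
# Assumption: the QALY figures apply to the same subgroup as the costs.
options = {
    "cemented":   (6900.0, 9.0),
    "cementless": (7800.0, 9.2),
    "hybrid":     (7500.0, 9.3),
}

def dominated_options(options):
    """Return options that cost at least as much and yield no more QALYs
    than some alternative, with at least one of the two strictly worse."""
    dominated = set()
    for name, (cost, qalys) in options.items():
        for other, (other_cost, other_qalys) in options.items():
            if other == name:
                continue
            no_better = other_cost <= cost and other_qalys >= qalys
            strictly_worse = other_cost < cost or other_qalys > qalys
            if no_better and strictly_worse:
                dominated.add(name)
    return dominated

def icer(option, comparator):
    """Incremental cost per QALY gained of option versus comparator."""
    d_cost = options[option][0] - options[comparator][0]
    d_qalys = options[option][1] - options[comparator][1]
    return d_cost / d_qalys

print("dominated:", dominated_options(options))
print(f"hybrid vs. cemented: {icer('hybrid', 'cemented'):,.0f} per QALY")
```

On these rounded figures the cementless prosthesis is dominated by the hybrid prosthesis (more costly, fewer QALYs), and the hybrid versus cemented ratio comes out at about £2000 per QALY. The difference from the published £2500 presumably reflects rounding of the tabulated inputs, which is an assumption rather than something stated by the study.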
Vale et al. 19 undertook an assessment of the clinical effectiveness and cost-effectiveness of RS compared with watchful waiting (i.e. patient monitoring, drug-based treatment and supportive activities including physiotherapy), THR and other bone-conserving treatments. The HTA comprised a systematic review of the clinical effectiveness and cost-effectiveness of RS compared with any of the treatments above and a Markov model comparing the comparators from the NHS perspective for patients suitable for RS for up to 20 years. Cost data (in 2000/1 UK pounds) for THR and revision THR were taken from the literature (£4195 and £6027, respectively) and prostheses costs for RS were obtained from manufacturers. The model considered the lower of the two RS implant costs obtained (£1730 vs. £1890), resulting in an overall cost of £5515 for RS. LOS was estimated to be 10 or 12 days for THR and 8 or 10 days for RS. All other costs including use of the operating theatre and staff, radiography, outpatient visits and first-year follow-up costs were assumed to be the same for RS and THR. First-year follow-up included two outpatient visits with one radiography scan, totalling £118.74. Quality-of-life estimates considered pain levels and quality-of-life scores for mild, moderate and severe OA and were combined with revision and mortality rates to generate QALYs.
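To make the mechanics of a Markov cohort model and of the half-cycle correction concrete, the following Python sketch runs a three-state model (primary implant, revised, dead) over 20 annual cycles. It is not the model used by Vale et al.: only the THR and revision THR costs (£4195 and £6027) are taken from the text, whereas the transition probabilities, utilities, discount rate and function name are hypothetical values chosen purely for illustration.

```python
def run_cohort(p_revision, p_death, initial_cost, revision_cost,
               utility_primary, utility_revised,
               cycles=20, discount_rate=0.035, half_cycle=True):
    """Three-state annual-cycle cohort model: primary implant, revised, dead."""
    primary, revised, dead = 1.0, 0.0, 0.0
    total_cost = initial_cost          # index operation at time zero
    total_qalys = 0.0
    for year in range(cycles):
        survivors_primary = primary * (1.0 - p_death)
        new_revisions = survivors_primary * p_revision
        next_primary = survivors_primary - new_revisions
        next_revised = revised * (1.0 - p_death) + new_revisions
        next_dead = dead + (primary + revised) * p_death

        if half_cycle:
            # Half-cycle correction: average occupancy at start and end of cycle.
            occ_primary = (primary + next_primary) / 2.0
            occ_revised = (revised + next_revised) / 2.0
        else:
            # Uncorrected: everyone counted in their start-of-cycle state.
            occ_primary, occ_revised = primary, revised

        discount = 1.0 / (1.0 + discount_rate) ** year
        total_qalys += discount * (occ_primary * utility_primary
                                   + occ_revised * utility_revised)
        # Revision costs are event based, so they are charged on transition.
        total_cost += discount * new_revisions * revision_cost
        primary, revised, dead = next_primary, next_revised, next_dead
    return {"cost": total_cost, "qalys": total_qalys,
            "final_state": (primary, revised, dead)}

# Hypothetical inputs: 1% annual revision risk, 2% annual mortality,
# utilities of 0.80 (primary implant) and 0.60 (after revision).
params = dict(p_revision=0.01, p_death=0.02,
              initial_cost=4195.0, revision_cost=6027.0,
              utility_primary=0.80, utility_revised=0.60)
corrected = run_cohort(**params, half_cycle=True)
uncorrected = run_cohort(**params, half_cycle=False)
print(f"cost: {corrected['cost']:.0f}")
print(f"QALYs with half-cycle correction: {corrected['qalys']:.3f}")
print(f"QALYs without correction: {uncorrected['qalys']:.3f}")
```

In this sketch the uncorrected run counts every patient in their start-of-cycle state for the whole year and therefore yields more QALYs than the corrected run. Costs are identical in both runs because the only recurring cost here is event based; a model with state-based follow-up costs would need the same correction applied to those costs.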
The main conclusion from the systematic review was that evidence from the literature on the effectiveness of RS was limited. Revision rates were reported to range between 0% and 14% over a 3-year follow-up period for RS compared with ≤ 10% over 10 years for THR. Patients with RS experienced less pain than patients managed by watchful waiting. Results from the model showed that RS was dominated by THR based on assumptions about revision rates for RS and the lower cost of THR. In subsequent sensitivity analyses the revision rates for RS had to be reduced to < 80–88% of the THR revision rates before RS was no longer dominated by THR. However, RS dominated watchful waiting within the 20-year follow-up. The study was limited because of the lack of data for the parameters of the model, particularly revision rates for different RS brands and effectiveness data for revision THR following RS. Furthermore, available data for RS originated from a small number of surgeons.
Vanhegan et al. 292 investigated the costs of 305 consecutive revision THRs by reason for revision in 286 patients, with a diagnosis of hip OA in 64% of revisions (n = 195). Revision THR was carried out in a single tertiary centre by one of three experienced surgeons. Costs were obtained from the finance department of the tertiary centre (in 2007/8 UK pounds) and included costs of the implant, materials and augmentation, use of the operating theatre and recovery room, the inpatient stay and laboratory tests, radiology, pharmacy, physiotherapy and occupational therapy. The study provided cost data on 13 different implants and data on resource use and costs by reason for revision (aseptic loosening, deep infection, periprosthetic fracture and dislocation).
The mean costs of revision for aseptic loosening, deep infection, periprosthetic fracture and dislocation were reported to be £11,897 (SD £4629), £21,937 (SD £10,965), £18,185 (SD £9124) and £10,893 (SD £5476), respectively. Higher complication rates as well as reoperation rates were associated with revisions for deep infection, periprosthetic fracture and dislocation. However, the numbers of revisions for these three indications were relatively small (n = 76, n = 24 and n = 11, respectively). Although the cost estimates can be assumed to be very accurate, they are limited by their lack of generalisability as they were based on one single tertiary centre. Furthermore, the study did not consider the cost of readmission for complications and other direct and indirect medical and social costs.
Summary of the cost-effectiveness evidence
We found that four19,40,44,292 of the 11 core cost-effectiveness studies were able to provide utility and cost data for the model. We assessed these using the checklists developed by Evers et al. 249 and Philips et al. 250 and found them to be of varying quality. All studies met ≥ 16 of the 19 criteria for economic analyses provided by Evers et al. 249 and ≥ 20 of the 32 criteria for economic models provided by Philips et al. 250
Methods for the review of registries
Identification of studies
Initial scoping searches were undertaken in MEDLINE in October 2012 to assess the volume and type of literature relating to national joint registries for hip replacement procedures. These scoping searches informed the development of the final search strategy (see Appendix 1). The registry search strategy was designed to capture the generic terms for ‘arthritis’, ‘total hip replacement’ and ‘resurfacing arthroplasty’ in addition to the word ‘registry’. Searches were not date limited for the registry search and were undertaken in November 2012 (see Appendix 1). All bibliographic records identified through the electronic searches were collected in a managed reference database.
The following databases of published studies were searched: MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, EMBASE, Science Citation Index and Conference Proceedings Citation Index – Science, The Cochrane Library (specifically CDSR, CENTRAL, DARE, NHS EED, HTA database) and CEA Registry (articles).
Inclusion and exclusion criteria
The following inclusion and exclusion criteria were used to identify eligible papers reporting joint replacement studies. The aim was to identify any studies that reported survival, utilities and outcomes that would potentially be useful for the economic model and survival analysis.
Inclusion criteria
Study design (registries)
- Reporting of the results of joint replacement registry data collection.
- All study designs.
- Most recent publication in the series.
Population
- People with pain or disability resulting from end-stage arthritis of the hip for whom non-surgical management has failed.
Intervention
- Elective primary THR.
- Primary hip RS.
Comparator
- Different types of primary THR compared with hip RS for people in whom both procedures are suitable.
- Different types of primary THR compared with each other for people not suitable for hip RS.
Record
- Full-text articles of completed studies published in English and annual reports of national registries.
Outcomes
- All reported outcomes.
Exclusion criteria
- Abstract/conference proceedings, letters and commentaries.
- Non-English-language publications.
- < 1000 patients included in the registry study at the time of publication.
- Hip/knee data not reported separately.
Assessment of eligibility
All retrieved records were collected in a referencing database and all duplicate records were identified and removed. The search returned 541 records. An initial sift was undertaken by one reviewer to exclude clearly non-relevant records using the following exclusion criteria:
- non-hip only papers
- papers on animals
- papers on children
- non-registry papers
- papers on surgery for hip fracture only
- non-English full-text papers.
This was followed by a formal sift of 329 papers by title and abstract by two reviewers using the inclusion/exclusion criteria. All identified relevant studies were read in full by one reviewer to identify eligible studies, with cross-checking by a second reviewer. Disagreement was resolved by a third reviewer. Reasons for exclusion of full-text papers were documented.
Data extraction
Data extraction was carried out on the final eligible papers by one reviewer in two stages. In stage one all eligible studies were considered and in stage two the studies that would provide useful input to the economic model and survival analysis were identified. Data extracted in stage one included the following:
- author surname
- publication year
- country of registry
- year that registry data were collected
- type of registry data collected
- size of the registry database
- description of the patient population
- results of key outcomes.
Data extraction of the overall aim and conclusion of each paper was also conducted to help identify inputs for the economic model and survival analysis. During stage two data extraction, registry studies were ordered by their publication year to ensure that the most recent data were extracted. Stage two extraction included the following additional exclusion criteria:
- not the most recent paper in a publication series
- not the most recent annual report from a national joint registry.
Results of the registry review
Identification of studies
The PRISMA flow diagram outlining the identification of registry studies is shown in Figure 19. 99 The database search for registry studies identified 538 publications, with an additional record identified through other sources. A total of 326 papers remained once duplicates were removed and these were screened for relevance. This process resulted in the exclusion of a further 230 papers, with 96 papers screened at title and abstract level. A further 47 studies were excluded with a reason provided (see Appendix 16), resulting in the inclusion of 49 studies in the review. 15,16,49,261,298,310–353
Of the 49 papers included in the review, 42 were carried out in the following 10 countries: Japan (n = 1310), Australia (n = 5311–315), the UK (n = 715,16,316–319,353), Italy (n = 2261,320), Finland (n = 10321–330), Norway (n = 5331–335), the USA (n = 449,336–338), Denmark (n = 4339–342), Sweden (n = 3298,343,344), and Slovakia (n = 1345). In addition, seven papers346–352 reported outcomes from multinational registries.
In stage two, 1949,298,310,312,316,319,321,322,324–326,331,332,336,338,340,342,346,352 of the 49 papers were excluded (not most recent paper publication in a series or not most recent annual report from a national joint registry). Therefore, 30 papers were included in the narrative review, reflecting the most recent publication in a series from each particular registry for both THR and RS. 15,16,261,311,313–315,317,318,320,323,327–330,333–335,337,339,341,343–345,347–351,353
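The study flow described above can be checked with simple arithmetic. The following Python sketch tallies the counts quoted in the text; the variable names are illustrative and are not taken from the review.

```python
# Counts quoted in the text for the registry review (variable names are illustrative).
records_after_deduplication = 326
excluded_as_not_relevant = 230
excluded_with_reason = 47
excluded_at_stage_two = 19

screened_at_title_abstract = records_after_deduplication - excluded_as_not_relevant
included_in_review = screened_at_title_abstract - excluded_with_reason
included_in_narrative_review = included_in_review - excluded_at_stage_two

# Number of included papers per country, as listed in the text.
papers_by_country = {
    "Japan": 1, "Australia": 5, "UK": 7, "Italy": 2, "Finland": 10,
    "Norway": 5, "USA": 4, "Denmark": 4, "Sweden": 3, "Slovakia": 1,
}
multinational_papers = 7
single_country_papers = sum(papers_by_country.values())

print(screened_at_title_abstract, included_in_review, included_in_narrative_review)
print(single_country_papers + multinational_papers)
```

The per-country counts and the seven multinational papers together account for all 49 papers included in the review.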
Review of included studies following stage two exclusion
A narrative review of the included papers by intervention type (THR, RS) and country is given in the following sections. The 30 papers did not report similar patient populations, interventions, comparator groups or outcomes and therefore they are reported separately. For the purposes of the economic model and survival analysis, revision rate and implant survival were the key outcomes to be extracted.
Resurfacing arthroplasty
Eight registry studies provided evidence on RS. 15,311,313,318,329,349,351,353 The majority of these studies investigated various comparisons between THR and RS. Table 49 provides a summary of the RS studies.
Study | Registry | Implant type/comparator | Outcomes | Results |
---|---|---|---|---|
Jameson 2012353 | NJR | Men vs. women undergoing RS | Survival time to revision for RS procedures | Women were at greater risk of revision than men (HR 1.30, 99% CI 1.01 to 1.76; p = 0.007) |
McMinn 2012318 | NJR | Cemented vs. uncemented THR procedures, and cemented and uncemented THR procedures vs. RS in men only | Mortality and revision rates (8 years) | Higher mortality rate for patients undergoing cemented than for patients undergoing uncemented THR (adjusted HR 1.11, 95% CI 1.07 to 1.16) |
Smith 201215 | NJR | Men vs. women undergoing RS by femoral head size | Revision rate (5 years) | Revision rate: women, 55 years: 8.3% (95% CI 7.2% to 9.7%) for 42-mm RS head, 6.1% (95% CI 5.3% to 7.0%) for 46-mm RS head and 1.5% (95% CI 0.8% to 2.6%) for 28-mm cemented metal-on-polyethylene stemmed THR; men, 55 years: 4.1% (95% CI 3.3% to 4.9%) for 46-mm RS head, 2.6% (95% CI 2.2% to 3.1%) for 54-mm RS head and 1.9% (95% CI 1.5% to 2.4%) for 28-mm cemented metal-on-polyethylene stemmed THR |
Seppanen 2012329 | Finnish Arthroplasty Register | RS vs. THR | Risk of revision (3.5–3.9 years) | No statistically significant difference in risk of revision between RS and THR (risk of revision 0.93, 95% CI 0.78 to 1.10) |
Buergi 2007311 | Australian National Joint Replacement Registry | RS vs. THR | Risk of revision (3 years) | Revision rates after RS and THR were 2.8% and 2.0%, respectively |
Corten 2010313 | Multinational | RS vs. THR | Revision rate (3 years) | Revision rate for RS was 1.8% in England and Wales and 3.4% in Sweden |
Johanson 2010349 | Nordic Arthroplasty Registry | RS vs. THR | RR | RS had a threefold increased revision risk compared with THR (RR 2.7, 95% CI 1.9 to 3.7) |
Schuh 2012351 | Multinational | RS reported in registry vs. clinical studies from specialist centres | Revision rates (difference in revisions per 100 observed component-years) | Specialist clinical centres (defined by the number of patients treated, staff training and personal expertise): 0.27 (95% CI 0.14 to 0.40) per 100 observed component-years; register data: 0.74 (95% CI 0.72 to 0.76) per 100 observed component-years. Average revision rate from the register data was 3.41% (SD 1.79%) |
England and Wales
Jameson et al. 353 conducted a retrospective cohort study and reported survival time to revision for RS procedures from 2003 to 2010. The study explored the risk factors independently associated with failure. Mean time to revision for each group was not reported. Data were taken from the NJR for England and Wales. The study concluded that women were at greater risk of revision than men (HR 1.30, 99% CI 1.01 to 1.76; p = 0.007), independent of age. Smaller femoral head components were also significantly more likely to require revision than medium or large heads (≤ 44 mm: HR 2.14, 99% CI 1.53 to 3.00; p < 0.001; 45–47 mm: HR 1.48, 99% CI 1.09 to 2.00; p = 0.001), as was surgery performed by low-volume surgeons (HR 1.36, 99% CI 1.09 to 1.71; p < 0.001).
McMinn et al. 318 examined mortality and revision rates among patients with OA undergoing THR, both cemented and uncemented procedures, or RS. The authors used data from the NJR database for the analysis [154,996 patients receiving cemented THR, 120,017 receiving uncemented THR and 8352 receiving RS (in particular, Birmingham hip RS)]. The baseline characteristics recorded include age (cemented mean 73.2 years, uncemented mean 66.7 years), sex (cemented: men 53,409, women 101,587; uncemented: men 50,529, women 69,488) and ASA grade. The analysis took into account the age of patients at primary surgery and their length of follow-up. Survival analysis was used to compare the cemented and uncemented procedures with adjustment for sex, age at primary surgery, ASA grade before the operation, complexity of the procedure and ‘both sides’ (surgery on both hips at the same time).
The multivariable survival analyses demonstrated a higher mortality rate for patients undergoing cemented THR than for those undergoing uncemented THR (adjusted HR 1.11, 95% CI 1.07 to 1.16). There was a lower revision rate for cemented procedures (adjusted HR 0.53, 95% CI 0.50 to 0.57). The authors stated that these findings translate into small predicted differences in the population-averaged absolute survival probability at all time points. At 8 years post surgery the predicted probability of death in the cemented group was 0.013 higher (95% CI 0.007 to 0.019) than that in the uncemented group and the predicted probability of revision was 0.015 lower (95% CI 0.012 to 0.017). In multivariable analyses that included only men, there was a higher mortality rate in the cemented group and the uncemented group than in the RS group. RS had a similar revision rate to uncemented THR and both had a higher revision rate than cemented THR. The authors concluded that there was a small but significant increased risk of revision with uncemented THR compared with cemented THR, and a small but significant increased risk of death with cemented procedures.
A study from Smith et al. 15 reported that, in women, RS resulted in worse implant survival than THR, regardless of head size. The predicted 5-year revision rates in 55-year-old women were 8.3% (95% CI 7.2% to 9.7%) for a 42-mm RS head, 6.1% (95% CI 5.3% to 7.0%) for a 46-mm RS head and 1.5% (95% CI 0.8% to 2.6%) for a 28-mm cemented metal-on-polyethylene stemmed THR. In men with smaller femoral heads, RS resulted in poor implant survival. Predicted 5-year revision rates in 55-year-old men were 4.1% (95% CI 3.3% to 4.9%) for a 46-mm RS head, 2.6% (95% CI 2.2% to 3.1%) for a 54-mm RS head and 1.9% (95% CI 1.5% to 2.4%) for a 28-mm cemented metal-on-polyethylene stemmed THR. Of the male RS patients, only 23% (5085/22,076) had a head size ≥ 54 mm. The authors concluded that RS resulted in similar implant survival to other surgical options in men with large femoral heads, and worse implant survival in other patients, particularly women.
Finland
Seppanen et al. 329 analysed the risk of revision of 4401 RS procedures in the Finnish Arthroplasty Register compared with the risk of revision of 48,409 THRs performed during the same time period. The median follow-up time was 3.5 (range 0–9) years for RS and 3.9 (range 0–9) years for THRs. The study reported no statistically significant difference in risk of revision between RS and THR (risk of revision 0.93, 95% CI 0.78 to 1.10). The 4-year unadjusted Kaplan–Meier survival rate was 96% (95% CI 96% to 97%) for both the RS group and the THR group. Female patients had about double the risk of revision of male patients (risk of revision 2.0, CI 1.4 to 2.7).
Australia
Buergi et al. 311 reported the use of RS based on the Australian National Joint Replacement Registry. A total of 7205 RS procedures were carried out between 1999 and 2005. The study concluded that, in the database, early revision rates were higher for RS than for THR. At 3 years, the revision rate after RS was 2.8% and that after THR was 2.0%.
Multinational
Corten et al. 313 compared RS survivorship reported by registries in Australia, England and Wales and Sweden with the failure of THR between 2006 and 2009. RS was associated with an overall increased failure rate compared with THR. The cumulative revision rates in the Australian registry were 3.7% for RS and 2.7% for THR. The 3-year revision rate for RS was 1.8% in England and Wales and 3.4% in Sweden.
A study using data from the Nordic Arthroplasty Registry compared the outcome of RS (n = 1638) with that of THR (n = 309,290) between 1995 and 2007. 349 Results indicated that RS had a threefold increased revision risk compared with THR (RR 2.7, 95% CI 1.9 to 3.7). The difference was greater when RS was compared with cemented THR (RR 3.8, 95% CI 2.7 to 5.3). In men aged < 50 years the difference in revision risk was less (RS vs. THR: RR 1.9, 95% CI 1.0 to 3.9; RS vs. cemented THR: RR 2.4, 95% CI 1.1 to 5.3). However, the difference in revision risk was higher in women of the same age group (RS vs. THR: RR 4.7, 95% CI 2.6 to 8.5; RS vs. cemented THR: RR 7.4, 95% CI 3.7 to 15). In the Cox regression analysis, RS showed an increased risk of early aseptic revision compared with THR (RR 2.7, 95% CI 1.9 to 3.7; p < 0.001) and cemented THR (RR 3.8, 95% CI 2.7 to 5.3; p < 0.001).
The purpose of one recent study351 was to evaluate the outcome of Birmingham hip RS using revision rates as reported in national joint replacement registry studies (categorised as from the UK, Australia, Asia and the USA). In total, 9806 RS procedures were analysed (reported as 44,294 observed component-years). The analysis revealed a significant difference in revisions per 100 observed component-years between studies authored by specialist clinical centres (defined by the number of patients treated, staff training and personal expertise) (0.27, 95% CI 0.14 to 0.40) and the register data (0.74, 95% CI 0.72 to 0.76). The average revision rate from register data was 3.41% (SD 1.79%).
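Revisions per 100 observed component-years expresses the number of revisions relative to the total follow-up accumulated by all implants. A minimal Python sketch of the calculation follows. The function names and the revision count in the second usage line are hypothetical; only the 9806 procedures and 44,294 component-years come from the study as reported here.

```python
def revisions_per_100_component_years(revisions, component_years):
    """Revisions divided by total observed follow-up, scaled to 100 component-years."""
    return 100 * revisions / component_years


def mean_follow_up_years(component_years, procedures):
    """Average observed follow-up per implant, in years."""
    return component_years / procedures


# Figures reported for the pooled RS procedures.
print(round(mean_follow_up_years(44_294, 9_806), 2))

# Hypothetical example (not study data): 10 revisions over 1000 component-years.
print(revisions_per_100_component_years(10, 1_000))
```

Scaling by follow-up in this way allows studies with different follow-up lengths to be compared on a common footing.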
Summary of resurfacing arthroplasty in registry studies
In summary, the eight studies that reported data from joint registries had mixed results. There is little evidence from long-term studies; generally, 5-year revision rates (or less) were reported. No two studies had the same comparators for analysis, which makes drawing conclusions from the eight studies difficult. The reported benefits of RS include preservation of the bone on the femoral side, greater physiological stress transfer at the proximal femur and lower risk of dislocation because of the larger femoral head compared with conventional THR. 351 However, the majority of studies included in this review found that RS had a higher revision rate than THR, particularly in female patients. Only one study found no significant difference between the procedures. 329 No studies were included that reported RS implant survival as better than that for THR. One study of men only reported that RS had a similar revision rate to that of uncemented THR, but that both had a higher revision rate than that of cemented THR. 318
Total hip replacement
In total, 22 registry studies reported evidence on THR, with the majority of these studies investigating various types of THR surgery or demographic differences between specific countries. Table 50 provides a summary of the THR studies.
Study | Registry | Implant type/comparator | Outcomes | Results |
---|---|---|---|---|
Jameson 2012317 | NJR | Primary cemented THR | Survival time to revision (7 years) | 7-year rate of revision for any reason 1.70% |
Smith 201216 | NJR | Metal-on-metal THR vs. non-metal-on-metal THR – head size and sex | Survival time to revision (5 years) | Larger heads failed earlier: cumulative incidence of revision 3.2% (95% CI 2.5% to 4.1%) for 28-mm heads and 5.1% (95% CI 4.2% to 6.2%) for 52-mm heads at 5 years in men aged 60 years. The 5-year revision rates in younger women were 6.1% (95% CI 5.2% to 7.2%) for 46-mm metal-on-metal THR and 1.6% (95% CI 1.3% to 2.1%) for 28-mm metal-on-polyethylene THR |
Johnsen 2006339 | Danish Hip Arthroplasty Registry | Patient-related factors and the risk of initial, short-term and long-term failure after primary THR | Implant revision | Male sex and comorbidity index score (Charlson Comorbidity Index) were strongly predictive of THR failure. In total, 3.1% of the 36,984 procedures were revised |
Pedersen 2011341 | Danish Hip Arthroplasty Registry | Mortality of patients undergoing primary THR compared with that in the general population | Adjusted mortality rate ratio | Long-term mortality was lower among THR patients than in the general population control group (adjusted mortality rate ratio 0.7, 95% CI 0.7 to 0.7) |
Lazarinis 2010343 | SHAR | Cementless cups with or without HA | Revision because of aseptic loosening | HA coating was a risk factor for cup revision because of aseptic loosening (adjusted RR 1.7, 95% CI 1.3 to 2) |
Weiss 2012344 | SHAR | Monoblock cups vs. modular cups | Implant survival (5 years) | Implant survival 95% (95% CI 91% to 98%) for monoblock cups and 97% (95% CI 96% to 98%) for modular cups (p = 0.6) |
Luo 2012314 | AOANJRR | Identification of implants with higher than expected failure rates between 2003 and 2007 | NR | Results state that if the poor-performing THRs had been conducted using average longevity designs, the number of THR revisions could have been reduced by 47% |
Sexton 2009315 | AOANJRR | Metal-on-polyethylene vs. ceramic-on-ceramic THR | Rate of revision | Higher rate of revision for dislocation in ceramic-on-ceramic THR than in metal-on-polyethylene THR when smaller head sizes (≤ 28 mm) were used in younger patients (< 65 years) (HR 1.53, p = 0.041) and also with larger head sizes (> 28 mm) in older patients (≥ 65 years) (HR 1.73, p = 0.016) |
Di Tanna 2011261 | Emilia-Romagna Regional Registry on Orthopaedic Prosthesis | Cementless vs. hybrid prostheses | Numbers of revisions expected | 243 revisions would be expected in the cementless group vs. 300 in the hybrid group. This was equal to a 19% difference and a NNT of 18 |
Stea 2009320 | Emilia-Romagna Regional Registry on Orthopaedic Prosthesis | Survival rates for THR in Italy between 2000 and 2006 | Implant survival rate (7 years) | 7-year implant survival rate was 96.8% (95% CI 96.4% to 97.1%) |
Eskelinen 2005323 | Finnish Arthroplasty Register | Population-based survival of cementless THR | Implant survival rate (10 years) | Survival rate of > 90% at 10 years for cementless THR |
Makela 2011327 | Finnish Arthroplasty Register | Cemented vs. cementless THR | Implant survival rate (15 years) | 15-year survival rate for cementless THR (80%) was comparable with rates in the cemented groups (86%) |
Makela 2011328 | Finnish Arthroplasty Register | Cemented vs. cementless THR for OA patients | Implant survival rate (15 years) | Implant survival rates for the cementless THR groups (62%, 95% CI 57% to 67% and 58%, 95% CI 52% to 66%) were worse than that of the cemented THR group (71%, 95% CI 62% to 80%) |
Nečas 2011345 | Slovakia | Operations performed between 2003 and 2010 | Revision rate (7 years) | Revision rate in period 2003–10 was 9.15% |
Espehaug 2011332 | Norwegian Arthroplasty Register | Differences by county and regional health authority over a 20-year period (1989–2008) | Numbers of THR procedures performed | Increase in number of THR procedures performed from 109 operations per 100,000 inhabitants in 1991–5 to 140 in 2006–8 |
Fevang 2010333 | Norwegian Arthroplasty Register | Risks of revision during the time periods 1993–7, 1998–2002 and 2003–7 were compared to that in the reference period 1987–92 | Revision risk | Reduced risk of revision in the time periods 1993–7, 1998–2002 and 2003–7 compared with the reference period |
Schrama 2010335 | Norwegian Arthroplasty Register | THR in RA patients vs. OA patients | Implant survival (5 years) | 5-year survival was 99.5% in RA patients and 99.4% in OA patients (RR 0.98, 95% CI 0.65 to 1.48 for RA vs. OA patients) |
Namba 2012337 | Kaiser Permanente Total Joint Replacement Registry | Factors associated with deep SSI following THR | Incidence of SSI | 155 deep SSIs (0.51%, 95% CI 0.43% to 0.59%) occurred at a mean of 72 days (median 28, SD 93.3 days) after the procedure |
Sadoghi 2012350 | Multinational | Compared primary THRs between different countries in terms of THR number per inhabitant, age and procedure type | Implant survival | THRs performed in Denmark showed the lowest survival rate within the first 15 years; however, THRs performed in Norway had similar low survival rates |
Graves 2011347 | Multinational | The use of metal-on-metal THR across three registries | NR | All registries reported an increased revision rate associated with larger femoral head sizes when using metal-on-metal bearing surfaces |
Havelin 2009348 | Nordic Registry | Compared demographics, choice of implant, fixation techniques and results between countries | Implant survival (10 years) | 10-year survival rate was 92% (95% CI 91.6% to 92.4%) in Denmark, 94% (95% CI 93.6% to 94.1%) in Sweden and 93% (95% CI 92.3% to 93.0%) in Norway |
Kadar 2012334 | Nordic Registry | Metal femoral heads made from various materials (cobalt–chromium, aluminium, zirconium) | Implant survival (12 years) | The survival rate was 88.1% with cobalt–chromium heads and 74.8% with zirconium heads |
England and Wales
Jameson et al. 317 reported survival time to revision following primary cemented THR in 34,721 THRs recorded in the NJR for England and Wales between 2003 and 2010. The authors reported the 7-year rate of revision for any reason as 1.70% (99% CI 1.28% to 2.12%). The overall risk of revision was independent of age, sex, ASA grade, BMI, surgeon volume, surgical approach, brand of cement/presence of antibiotic, femoral head material (stainless steel/alumina) and stem taper size/offset.
Smith et al. 16 assessed the use of metal-on-metal bearing surfaces in the NJR between 2003 and 2011. They reported that metal-on-metal THR failed at high rates and that this was linked to head size. Analysis of the 31,171 metal-on-metal THRs showed that larger heads failed earlier (cumulative incidence of revision: 3.2%, 95% CI 2.5% to 4.1% for 28-mm heads and 5.1%, 95% CI 4.2% to 6.2% for 52-mm heads at 5 years in men aged 60 years). The 5-year revision rates in younger women were 6.1% (95% CI 5.2% to 7.2%) for 46-mm metal-on-metal THR and 1.6% (95% CI 1.3% to 2.1%) for 28-mm metal-on-polyethylene THR. This finding contrasted with findings for ceramic-on-ceramic bearing surfaces, for which larger head sizes were associated with improved survival (5-year revision rate: 3.3%, 95% CI 2.6% to 4.1% for 28-mm heads and 2.0%, 95% CI 1.5% to 2.7% for 40-mm heads for men aged 60 years).
Denmark
Johnsen et al. 339 examined the association between patient-related factors and the risk of initial, short-term and long-term failure after primary THR using data from the Danish Hip Arthroplasty Registry (n = 36,984). The study concluded that in Denmark between 1995 and 2002 male sex and comorbidity index score (Charlson Comorbidity Index) were strongly predictive of THR failure. The Charlson Comorbidity Index includes 19 disease categories, which correspond to International Classification of Diseases, Eighth Edition (ICD-8) and International Classification of Diseases, Tenth Edition (ICD-10) codes used in the national registries. A total of 1132 primary THRs were revised (3.1% of the 36,984 procedures) during this time period.
A more recent study from Denmark341 evaluated short-term (0–90 days) and longer-term (up to 12.7 years) mortality of patients undergoing primary THR compared with mortality in the general population. Each THR patient (n = 44,558) was matched at the time of surgery with three people from the general population (n = 133,674). The findings suggest that there was a 1-month period of increased mortality immediately after surgery among THR patients (adjusted mortality rate ratio 1.4, 95% CI 1.2 to 1.7); however, overall short-term mortality (0–90 days) was significantly lower (adjusted mortality rate ratio 0.8, 95% CI 0.7 to 0.9). THR surgery was associated with increased short-term mortality in subjects aged < 60 years and among THR patients without comorbidity. Long-term mortality was lower among THR patients than in the general population control group (adjusted mortality rate ratio 0.7, 95% CI 0.7 to 0.7).
Sweden
Lazarinis et al. 343 analysed patient data (n = 8043) on cementless cups with or without a hydroxyapatite coating that had been recorded in the SHAR between 1992 and 2007. The primary end point was revision because of aseptic loosening; the secondary end points were cup revision for any reason and cup revision because of infection. The results reported that the hydroxyapatite coating was a risk factor for cup revision because of aseptic loosening (adjusted RR 1.7, 95% CI 1.3 to 2). Age at primary THR of < 50 years, paediatric hip disease, a cemented stem and the cup brand were also associated with a statistically significantly increased risk of cup revision due to aseptic loosening.
A more recent study from Sweden reported data from 1999 to 2010. 344 The authors investigated revision rates of monoblock cups used in primary THR that were registered in the SHAR. Kaplan–Meier and Cox regression analyses with adjustment for age, sex and other variables were used to calculate survival rates and adjusted HRs of the revision risk for any reason. The cumulative 5-year survival rate with any revision as the end point was 95% (95% CI 91% to 98%) for monoblock cups and 97% (95% CI 96% to 98%) for modular cups (p = 0.6). The adjusted HR for revision of monoblock cups compared with modular cups was 2 (95% CI 0.8 to 6, p = 0.1). The authors concluded that there was no clinically relevant difference in risk of revision between monoblock and modular acetabular cups in the medium term.
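Several of the registry studies report implant survival estimated with the Kaplan–Meier method. As an illustration of how such an estimate is built up, the following Python sketch computes a Kaplan–Meier curve with revision as the end point. The function name and the follow-up data are hypothetical and are not drawn from any registry; the sketch omits CIs and the covariate adjustment that a Cox regression would provide.

```python
def kaplan_meier(times, revised):
    """Return [(time, survival)] at each time with at least one revision.

    times   -- follow-up in years for each implant
    revised -- True if the implant was revised at that time, False if censored
    """
    survival = 1.0
    at_risk = len(times)
    curve = []
    for t in sorted(set(times)):
        events = sum(1 for ti, ri in zip(times, revised) if ti == t and ri)
        leaving = sum(1 for ti in times if ti == t)
        if events:
            # Multiply by the proportion of at-risk implants surviving this time point.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Revised and censored implants both leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up for eight implants (illustrative only).
follow_up_years = [1, 2, 2, 3, 4, 5, 5, 5]
was_revised = [True, False, True, False, True, False, False, False]

for year, surviving in kaplan_meier(follow_up_years, was_revised):
    print(year, round(surviving, 4))
```

Censored implants (those not revised by the end of their follow-up) still contribute to the number at risk until they leave observation, which is why the method suits registry data with unequal follow-up.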
Australia
Luo et al. 314 analysed the effect of the AOANJRR on the cost of joint arthroplasty through identification of implants with higher than expected failure rates between 2003 and 2007. A total of 242,454 primary joint arthroplasties were performed in Australia at a cost of AU$4.1B. The authors state that if the poor-performing THRs had been conducted using average longevity designs, the number of THR revisions could have been reduced by 47%.
One study315 investigated the relationship between the bearing surface and the risk of revision because of dislocation using 110,239 records in the AOANJRR from 1999 to 2007. The authors reported that 2621 (2.4%) primary THRs were revised for any reason; 862 (0.78%) THRs were revised because of dislocation. Overall, ceramic-on-ceramic bearing surfaces had a lower risk of revision for dislocation than metal-on-polyethylene and ceramic-on-polyethylene bearing surfaces at 7 years’ follow-up. However, the authors reported a significantly higher rate of revision for dislocation with ceramic-on-ceramic bearing surfaces than with metal-on-polyethylene bearing surfaces when smaller head sizes (≤ 28 mm) were used in younger patients (< 65 years) (HR 1.53, p = 0.041) and also with larger head sizes (> 28 mm) in older patients (≥ 65 years) (HR 1.73, p = 0.016).
Italy
Di Tanna et al. 261 reported data from the Emilia-Romagna Regional Registry on Orthopaedic Prosthesis from 2000 to 2007. This registry collects information on all orthopaedic interventions performed in Emilia-Romagna, Italy. The study assessed the cost-effectiveness of cementless prostheses compared with hybrid prostheses in 41,199 THRs and concluded that there were differences in the revision rate and impact on costs between the two groups. The authors concluded that, considering two cohorts of 1000 subjects, 243 revisions would be expected in the cementless group compared with 300 in the hybrid group. This was equal to a 19% difference and a number needed to treat of 18.
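The relative difference and number needed to treat quoted above follow from the expected numbers of revisions. The Python sketch below reproduces them under the assumption that the expected revisions are per cohort of 1000 subjects, which is the cohort size at which the quoted 19% and number needed to treat of 18 are recovered. The function names are illustrative, and the rounding convention (rounding up) is also an assumption.

```python
import math


def relative_difference(events_treated, events_control):
    """Relative reduction in events compared with the control group."""
    return (events_control - events_treated) / events_control


def number_needed_to_treat(events_treated, events_control, cohort_size):
    """Reciprocal of the absolute risk reduction, rounded up to a whole patient."""
    absolute_risk_reduction = (events_control - events_treated) / cohort_size
    return math.ceil(1 / absolute_risk_reduction)


# Expected revisions: 243 (cementless) vs. 300 (hybrid); cohort size of 1000 is assumed.
print(round(100 * relative_difference(243, 300)))
print(number_needed_to_treat(243, 300, 1000))
```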
A second paper reporting on the Emilia-Romagna Regional Registry on Orthopaedic Prosthesis320 conducted survival analysis using the Kaplan–Meier method to analyse survival rates for THRs in Italy between 2000 and 2006 (35,042 THRs, 5878 revisions). The reported cumulative survival rate for THR at 7 years was 96.8% (95% CI 96.4% to 97.1%). Multivariate analysis demonstrated that THR survival was affected by pathology, for example the presence of RA. Women comprised 66.4% of patients and > 54.0% of patients were overweight (BMI > 25 kg/m2). Mean age at primary surgery was 66.9 years (range 16–101 years) and at revision was 70.0 years (range 22–98 years).
Finland
Eskelinen et al. 323 evaluated the population-based survival of cementless THR in patients aged < 55 years using data from the Finnish Arthroplasty Register. All cementless stems studied showed a survival rate of > 90% at 10 years.
Makela et al. 327 analysed population-based survival rates for cemented and cementless THRs in patients aged ≥ 55 years in Finland between 1980 and 2006. The 15-year survival rate for cementless THR (80%) was comparable with the rates for the cemented groups [86% in cemented group 1 (a cemented, loaded-taper stem combined with a cemented, all-polyethylene cup) and 79% in cemented group 2 (a cemented, composite-beam stem with a cemented, all-polyethylene cup)] when revisions for any reason were used as the end point. The authors concluded that both cementless stems and cementless cups, analysed separately, had a significantly lower risk of revision for aseptic loosening than cemented implants.
The same authors reported revision outcomes in primary OA. 328 The 15-year survival rate of group 1 cementless THR (implants with a cementless, straight, proximally circumferentially porous-coated stem and a porous-coated press-fit cup) performed in 1987–96 (62%, 95% CI 57% to 67%) and group 2 cementless THR (implants with a cementless, anatomic, proximally circumferentially porous-coated stem, with or without hydroxyapatite, and a porous-coated press-fit cup with or without hydroxyapatite) performed during the same time period (58%, 95% CI 52% to 66%) was worse than that of cemented THR (71%, 95% CI 62% to 80%), although the difference was not statistically significant. The risk of revision for aseptic loosening of group 1 cementless THR (0.49, 95% CI 0.32 to 0.74) was lower than that of cemented THR (p = 0.001).
Slovakia
One study345 reported findings from Slovakia from 2003 to 2010, including a total of 4970 primary THRs and 457 revisions. Cement was used for all components in 35.45% of all arthroplasties, 53.25% were cementless and 11.28% were hybrids. By 2010, the revision rate reached 9.20%, representing an annual increase of 1.1%. The revision rate in the whole observed period from 2003 to 2010 was 9.15%.
Norway
Espehaug et al. 332 studied differences by county and regional health authority over a 20-year period (1989–2008) using data from the Norwegian Arthroplasty Register. The authors observed an increase in the number of THR operations, from 109 operations per 100,000 inhabitants in the years 1991–5 to 140 in 2006–8. Variations were found across the four regions studied.
A second study from Norway333 reported the risks of revision after THR during a 21-year period among hip replacements reported to the Norwegian Arthroplasty Register. The risks of revision during the time periods 1993–7, 1998–2002 and 2003–7 were compared with that of the reference period 1987–92. There was an overall reduced risk of revision in the time periods 1993–7, 1998–2002 and 2003–7 compared with the risk of revision in the reference period. The improved results were due to a reduction in the incidence of aseptic loosening of the femoral and acetabular components in all time periods and in all subgroups of prostheses. The best results were obtained with the use of cemented prostheses. Analyses of revision for any cause were carried out for all prostheses together and separately for cemented, hybrid, reverse hybrid and cementless prostheses. The major cause of revision was aseptic loosening of one or both implant components.
One study used data from the Norwegian Arthroplasty Register335 to compare differences in the risk of THR revision from infection and changes in that risk over time. Data were from 1987 to 2008. 333 Of the 84,492 THRs, 534 (0.6%) were revised for infection. Women had a significantly lower risk of revision for infection than men (RR 0.41, 95% CI 0.34 to 0.48). The cumulative 5-year survival rate was 99.5% in RA patients and 99.4% in OA patients (RR 0.98, 95% CI 0.65 to 1.48 for RA vs. OA patients) with revision for infection as the end point. The risk of revision for infection from 6 years postoperatively was higher in patients with RA.
USA
One study reported registry data from the USA. 337 It examined patient and surgical factors associated with deep surgical site infection (SSI) following THR using data from the Kaiser Permanente Total Joint Replacement Registry between 2001 and 2009. A total of 30,491 THRs were included in the analysis, of which 17,474 (57%) were performed on women. There were 155 deep SSIs, an incidence of 0.51% (155/30,491); they occurred at a mean of 72 days (median 28 days, SD 93.3 days) after the procedure. Patient factors associated with SSIs included female sex, obesity and ASA grade ≥ 3.
Multinational
Sadoghi et al. 350 compared primary THRs between different countries in terms of THR number per inhabitant, age and procedure type and compared survival curves including all THRs using data from nine registries. On average, the annual number of primary THRs per 100,000 inhabitants was found to be 133 for all ages, 26 for those aged < 55 years, 269 for those aged 55–64 years, 520 for those aged 65–74 years and 531 for those aged ≥ 75 years. The fixation method varied by country, for example in Sweden 67% of THRs are cemented whereas in Emilia-Romagna (Italy) 89% are cementless. Cementless fixation was more popular in Australia, Denmark, Emilia-Romagna, New Zealand and Portugal (50%) and cemented fixation was used more in Sweden and Norway (50%). Cemented and cementless fixations were used equally in England and Wales and Slovakia. The use of hybrid fixation was more uniform across countries and ranged from 8% in Portugal to 34.5% in New Zealand. Denmark showed the lowest survival rate within the first 15 years; however, THRs performed between 2006 and 2009 in Norway had similar low survival rates. All survival curves calculated in the study (except for Danish data) varied by < 1% within the first 9 years. Multivariate or subgroup analyses were not performed to compare the survival curves. The use of primary RS was not reported separately in the registries from Norway and Slovakia. Use of RS in the other countries varied from 1% in Portugal to between 2% and 3% in Denmark, Emilia-Romagna, New Zealand and Sweden to approximately 5% and 6% in Australia and England and Wales, respectively.
Graves et al. 347 performed an investigation of the use of metal-on-metal THRs in the National Arthroplasty Registries of Australia, England and Wales and New Zealand. All registries reported an increased revision rate associated with larger femoral head size when metal-on-metal bearing surfaces were used.
The Nordic Registry includes the joint registries of Denmark, Sweden and Norway. One study348 aimed to compare demographics, choice of implant, fixation techniques and results between the countries, including a total of 280,201 THRs performed between 1995 and 2006. The study reported that 9596 THRs (3.4%) had later been revised. RS accounted for ≤ 0.5% of procedures in all countries. The 10-year survival rate was 92% (95% CI 91.6% to 92.4%) in Denmark, 94% (95% CI 93.6% to 94.1%) in Sweden and 93% (95% CI 92.3% to 93.0%) in Norway.
A second study reporting data from the Nordic Registry compared the survival of cemented THRs with metal femoral heads made from various materials (cobalt–chromium, aluminium and zirconium). 334 The study reported prosthesis survival and relative revision risks adjusting for age, sex and diagnosis between 1987 and 2010. In total, 132,000 cases of THR were included in the analysis. At 12 years the survival rate was 88.1% for cobalt–chromium heads and 74.8% for zirconium heads. Aluminium femoral heads provided no advantage over cobalt–chromium heads for prosthesis survival. The authors concluded that cemented polyethylene THR with aluminium heads had a similar survival rate as the same THR with ceramic-on-ceramic heads when any revision was the end point.
Summary of the total hip replacement studies
The 22 THR studies reported the analysis of registry data from nine countries. These studies examined various aspects of the THR procedure, including revision and survival rates; different implants and combinations of implant bearing surfaces; and outcome measures such as reason for failure and patient differences associated with failure. Four of the 22 THR studies used registry data from multinational databases. Sadoghi et al. 350 provided an extensive review of registries worldwide. They stated that fixation methods varied by country, with the cemented THR being most popular in Sweden and Norway and the cementless THR being most common in Emilia-Romagna (Italy) but also popular in Australia, Denmark, New Zealand and Portugal. Cemented and cementless fixations were used equally in England and Wales and Slovakia. In terms of survival rates, THRs carried out in Denmark showed the lowest survival rate within the first 15 years.
Core articles included in the economic model and survival analysis
The prioritisation of the eligible studies resulted in the identification of 30 papers that were deemed to be potentially useful for the economic model and survival analysis. The final number of core papers that helped to inform the survival analysis in this report was three. 15,16,318 This was in addition to the annual reports from the Swedish Arthroplasty Registry,96 the NJR36 and the AOANJRR,95 which were used for comparison of survival analysis methods.
Summary of the registry evidence
Thirty papers were identified in the registry review and were included in the narrative synthesis. Eight of the studies reported registry data investigating the use of RS for the treatment of arthritis. Five of the studies combined findings in three individual countries and three studies used multinational data. The final number of THR papers included was 22. These papers reported various aspects of the THR procedure, including revision and survival rates; however, the time periods over which the analyses were carried out varied between 3 years and 15 years. Comparison of different implants and combinations of implant bearing surfaces was also conducted. Finally, additional outcome measures analysed included reason for failure (e.g. infection) and patient/demographic differences associated with failure.
Chapter 5 Individual patient data set
Introduction to individual patient data analysis
This chapter provides a narrative description of the IPD that were retrieved from the NJR and used for analysis in this report. The data set is known here as the NJR data set; the data come from the 009 data set, which includes primary operations carried out before 1 March 2012. Any revision or notified death up to September 2012 has been included. The NJR is maintained on behalf of the Department of Health and the Welsh government. It was established in 2002 and is updated annually; data on hip and knee joint replacements were collected from April 2003. Northern Ireland joined the registry in 2013, which was after the receipt of the data. 36 Data are collected for all types of implants used in joint replacements carried out across England and Wales. The NJR also includes data from some of the private operations carried out in independent hospitals.
Method
This is a retrospective cohort study that involves analysis of NJR data to derive time to revision of hip replacement procedures. The data provided by the NJR were divided into two types depending on type of surgery carried out: RS or THR. THR data were separated into five categories on the basis of the frequency of combinations of the components used in the procedures.
Selection of patients
Within this report THR and RS used for hip replacement procedures in England and Wales have been considered. This chapter explains the NJR data used for calculating parameter values to evaluate cost-effectiveness in the THR and RS economic models (see Chapters 8 and 9). For the purpose of this report and in line with the scope, information and analyses have been stratified by procedure type (THR and RS).
Structure of the database
The NJR database collects numerous variables relating to the joint affected, outcomes, procedures and implants. For the purposes of this study, 198 variables were requested from both the RS database and the THR database. The extracted data contained the following information:
- patient demographics
- provider type
- lead surgeon grade
- procedure types/patient procedure/side
- indications for primary surgery
- primary thromboprophylaxis
- primary untoward intraoperative events
- primary bone-graft usage
- all primary implant details
- current outcome type
- time from primary operation to outcome
- age at death
- any revision details – date and reasons and implants removed.
All but a few entries for ‘indication’ included the word ‘osteoarthritis’; the few that did not were mostly entered as RA seronegative or RA seropositive. These were excluded from the analysis of time to revision.
Contents of the database
To evaluate the cost-effectiveness of hip replacement procedures in line with the scope, we requested the variables outlined in the previous section separately for the two patient groups (RS and THR):
- RS – this involves removing the damaged surfaces of bones inside the hip joint and cementing a metal surface to the reshaped bone; the socket has a metal surface and is fixed into the pelvis without using cement (n = 31,222 excluding RA patients)
- THR – this involves the removal of the entire damaged hip joint and replacement with an artificial joint (n = 387,667 including RA patients; 386,556 excluding RA patients).
Results
For statistical and economic modelling the primary outcome was time to revision.
Hip resurfacing arthroplasty
This section describes the data reported for the patients in the NJR RS data set. Figure 20 shows the outcomes for this group of patients. Of 31,222 patients, 9339 were female and 21,883 were male. Further subdivision according to age and head size is shown in Tables 51 and 52 (excludes RA patients).
Age group (years) by head size (mm) | 36 | 38 | 40 | 42 | 44 | 46 | 48 | 50 | 52 | 54 | 56 | 58 | 60 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15–24 | 0 | 2 | 0 | 3 | 0 | 8 | 2 | 11 | 4 | 7 | 1 | 0 | 0 | 38 |
25–34 | 0 | 1 | 0 | 2 | 7 | 37 | 44 | 69 | 28 | 36 | 6 | 4 | 1 | 235 |
35–44 | 0 | 0 | 2 | 12 | 30 | 205 | 300 | 776 | 311 | 405 | 41 | 31 | 0 | 2113 |
45–54 | 0 | 2 | 3 | 13 | 89 | 565 | 936 | 2516 | 1109 | 1312 | 164 | 121 | 3 | 6833 |
55–64 | 1 | 1 | 5 | 22 | 123 | 776 | 1334 | 3717 | 1519 | 1882 | 204 | 150 | 4 | 9738 |
65–74 | 0 | 0 | 1 | 9 | 24 | 206 | 340 | 1070 | 404 | 564 | 87 | 47 | 3 | 2755 |
75–84 | 0 | 0 | 1 | 2 | 3 | 15 | 11 | 63 | 20 | 44 | 2 | 5 | 0 | 166 |
85–94 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 5 |
Total | 1 | 6 | 12 | 63 | 276 | 1812 | 2969 | 8223 | 3396 | 4251 | 505 | 358 | 11 | 21,883 |
Age group (years) by head size (mm) | 34 | 36 | 38 | 40 | 42 | 44 | 46 | 48 | 50 | 52 | 54 | 58 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15–24 | 0 | 0 | 7 | 2 | 10 | 5 | 7 | 1 | 2 | 0 | 1 | 0 | 35 |
25–34 | 0 | 0 | 5 | 9 | 46 | 24 | 52 | 10 | 14 | 0 | 0 | 0 | 160 |
35–44 | 1 | 0 | 17 | 45 | 245 | 172 | 361 | 72 | 53 | 10 | 0 | 0 | 976 |
45–54 | 0 | 0 | 45 | 163 | 769 | 604 | 1267 | 240 | 225 | 22 | 14 | 1 | 3350 |
55–64 | 0 | 1 | 31 | 133 | 738 | 759 | 1678 | 355 | 342 | 20 | 9 | 1 | 4067 |
65–74 | 0 | 1 | 6 | 25 | 118 | 119 | 299 | 69 | 74 | 3 | 2 | 0 | 716 |
75–84 | 0 | 0 | 1 | 1 | 2 | 5 | 17 | 1 | 4 | 0 | 1 | 0 | 32 |
85–94 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 3 |
Total | 1 | 2 | 112 | 378 | 1928 | 1689 | 3683 | 748 | 714 | 55 | 27 | 2 | 9339 |
Total hip replacement
The NJR describes the outcomes of patients undergoing THR surgery in England and Wales from April 2003 to December 2012. On date of receipt of the data (6 December 2012) the data set had a total of 387,694 records. Of these, only 387,667 records were usable; the remainder were excluded for one or more of the following reasons:
- irrelevant data type reported (negative age, zero age) (22 records)
- missing variable information (11 records).
The remaining 387,667 patients could have one of three outcomes (Figure 21):
- death
- unrevised THR
- revision surgery.
Of these 387,667 patients, 240,156 (62%) were selected for analysis on the basis of the frequency of use of different THR components and of these 239,089 patients had an OA indication for surgery. Five different types of THR category were selected by looking at the frequency distribution of THR components used in the population of NJR participants using cross-tabulation.
Total hip replacement category development
The NJR database for non-RS procedures contained 387,694 records. After removing unusable records (this included records with missing entries and in which the primary time to outcome was negative), the database contained 387,667 useable records.
The database contained several key components of THR procedures, which were used to determine the categories that were used in the survival and cost-effectiveness analyses:
- cup component group
- cup component type
- cup composition
- cup fixation
- cup implant type
- head component type
- head composition
- liner component type
- liner composition
- stem component type
- stem fixation
- stem implant type.
We conducted two-way cross tabulations for each of the variables listed above to determine the most frequent combinations. For example, we cross-tabulated the cup component group with liner composition. We then added another component that was the most frequently occurring. For example, looking at the two-way cross-tabulation for cup component group and head composition, we know from the previous two-way cross-tabulation that the most frequent cup component group is shell, so taking this into account we then added the most frequent head composition. The next most frequent combination was then added and so forth and the process was repeated until all of the key components listed above had been taken into account.
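The cross-tabulation step can be sketched in a few lines of Python. This is only an illustration of counting the most frequent component combinations and then refining them by adding a further component; it is not the procedure run on the NJR data, and the field names and toy records are hypothetical.

```python
from collections import Counter

def most_frequent_combinations(records, fields, top=3):
    """Count how often each combination of `fields` occurs; return the most common."""
    counts = Counter(tuple(rec[f] for f in fields) for rec in records)
    return counts.most_common(top)

# Hypothetical toy records; the field names are illustrative, not NJR variable names.
records = [
    {"cup_group": "Shell", "liner": "Polyethylene", "head": "Metal"},
    {"cup_group": "Shell", "liner": "Polyethylene", "head": "Metal"},
    {"cup_group": "Shell", "liner": "Ceramic", "head": "Ceramic"},
    {"cup_group": "Cup", "liner": None, "head": "Metal"},
    {"cup_group": "Cup", "liner": None, "head": "Metal"},
    {"cup_group": "Cup", "liner": None, "head": "Ceramic"},
]

# Step 1: two-way cross-tabulation of cup component group and liner composition.
two_way = most_frequent_combinations(records, ["cup_group", "liner"])

# Step 2: add a further component (head composition) to refine the combinations.
three_way = most_frequent_combinations(records, ["cup_group", "liner", "head"])
```

Repeating the second step for each remaining component gives progressively narrower, mutually exclusive combinations ranked by frequency.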
This was an iterative process; by adding on the next most frequent combinations, we identified seven mutually exclusive categories. After consulting with our expert clinical advisor, we included four of these categories, which each accounted for > 25,000 operations. Our expert clinical advisor identified a further exclusive category (n = 12,705), which is a well-known option consisting of a cemented stem with a ceramic head articulating with a cemented polyethylene cup (Figures 22 and 23). Both the cup and stem are cheaper than cementless options and the ceramic femoral head is known to have better wear properties than the metal equivalent. Our advisor suggested that this combination is often used in younger high-demand patients because of its low wear characteristics.
Table 53 shows the final five categories that were used in the time to revision and cost-effectiveness analysis and this accounts for 239,089 patients (∼62% of patients in the NJR non-RS database). Characteristics of the five THR categories are provided along with their short-form acronyms. Further information on age and sex distribution and technical characteristics of the categories is provided in Tables 54 and 55, respectively.
Category | Characteristics | Acronym used in the report |
---|---|---|
A | Metal head (cemented stem) on cemented polyethylene cup | CeMoP |
B | Metal head (cementless stem) on cementless hydroxyapatite-coated metal cup (polyethylene liner) | CeLMoP |
C | Ceramic head (cementless stem) on cementless hydroxyapatite-coated metal cup (ceramic liner) | CeLCoC |
D | Metal head (cemented stem) on cementless hydroxyapatite-coated metal cup (polyethylene liner) | HyMoP |
E | Ceramic head (cemented stem) on cemented polyethylene (poly) cup | CeCoP |
Category | Women aged > 65 years | Men aged > 65 years | Women aged < 65 years | Men aged < 65 years | Total |
---|---|---|---|---|---|
A | 75,734 | 37,018 | 8079 | 4454 | 125,285 |
B | 18,396 | 11,878 | 4423 | 3177 | 37,874 |
C | 7554 | 6186 | 11,698 | 9316 | 34,754 |
D | 15,641 | 8657 | 2649 | 1524 | 28,471 |
E | 4655 | 2777 | 3073 | 2200 | 12,705 |
Total | 121,980 | 66,516 | 29,922 | 20,671 | 239,089 |
Category | Cup component group | Cup component type | Cup composition | Cup fixation | Cup implant type | Head component type | Head composition | Liner component type | Liner composition | Stem component type | Stem fixation | Stem implant type | Number of patients in category with OA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | Cup | Monobloc | Polyethylene | Cemented | Cups cemented | Modular | Metal | Null | Null | Modular | Cemented | Stem cemented | 125,285 |
B | Shell | Standard | Metal | Cementless HA coated | Cups cementless | Modular | Metal | Standard | Polyethylene | Modular | Cementless HA coated | Stem cementless | 37,874 |
C | Shell | Standard | Metal | Cementless HA coated | Cups cementless | Modular | Ceramic | Standard | Ceramic | Modular | Cementless HA coated | Stem cementless | 34,754 |
D | Shell | Standard | Metal | Cementless HA coated | Cups cementless | Modular | Metal | Standard | Polyethylene | Modular | Cemented | Stem cemented | 28,471 |
E | Cup | Monobloc | Polyethylene | Cemented | Cups cemented | Modular | Ceramic | Null | Null | Modular | Cemented | Stem cemented | 12,705 |
Matching
In health evaluation, data often do not come from RCTs but from (non-randomised) observational studies. Rosenbaum and Rubin354 proposed propensity score matching as a method to reduce the bias in the estimation of treatment effects using observational data sets. Propensity matching on age and sex was undertaken using the Edwin Leuven procedure. 355
The rationale for using propensity scores is that, because in observational studies assignment of subjects to the treatment and control groups is not random, estimation of the effects of treatment may be biased by the existence of confounding factors. Using propensity score matching is a way to adjust or correct the estimation of treatment effects, controlling as far as possible for the existence of confounding factors, and is based on the idea that bias is reduced when comparison of outcomes is performed using treated and control subjects who are as similar as possible. We used the IPD retrieved from the 009 NJR data set with primary surgery undertaken before 1 March 2012.
We stratified data by sex (RS n = 31,222: 21,883 male, 9339 female; THR categories A–E n = 239,089: 87,187 male and 151,902 female) and matched by age within each sex stratum. From the RS group 9321 women and 17,322 men were matched by age with 9321 women and 17,322 men, respectively, from the THR group.
Analysis to match the RS and THR groups was performed using the statistical package Stata 12 Special Edition (StataCorp LP, TX, USA).
We used the Stata command ‘psmatch2’. 355 Matching was one-to-one nearest-neighbour matching: for each RS patient we identified the THR patient with the closest propensity score. The variable used to construct the propensity score was age within sex strata.
In using these programs it should be kept in mind that they allow us only to reduce, and not to eliminate, the bias generated by unobservable confounding factors.
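The idea of one-to-one nearest-neighbour matching within sex strata can be illustrated with a short Python sketch. It makes two simplifying assumptions that are not taken from the report: it matches directly on age rather than on an estimated propensity score (with age as the only covariate the ordering of patients is the same), and it matches greedily without replacement. The patient records are hypothetical, and the sketch is not a reimplementation of ‘psmatch2’.

```python
def match_nearest_age(treated, controls):
    """Greedy one-to-one nearest-neighbour matching on age, without replacement.

    `treated` and `controls` are lists of (patient_id, age) for one sex stratum.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = list(controls)
    pairs = []
    for t_id, t_age in treated:
        if not available:
            break  # no controls left to match
        # Index of the remaining control whose age is closest to this patient.
        best = min(range(len(available)), key=lambda i: abs(available[i][1] - t_age))
        c_id, _ = available.pop(best)
        pairs.append((t_id, c_id))
    return pairs

def match_within_sex(rs_patients, thr_patients):
    """Match separately within each sex stratum, then combine the matched pairs."""
    pairs = []
    for sex in ("F", "M"):
        rs = [(p["id"], p["age"]) for p in rs_patients if p["sex"] == sex]
        thr = [(p["id"], p["age"]) for p in thr_patients if p["sex"] == sex]
        pairs.extend(match_nearest_age(rs, thr))
    return pairs

# Hypothetical example records (not NJR data).
rs_patients = [
    {"id": "rs1", "sex": "F", "age": 52.0},
    {"id": "rs2", "sex": "M", "age": 60.0},
]
thr_patients = [
    {"id": "t1", "sex": "F", "age": 70.0},
    {"id": "t2", "sex": "F", "age": 53.5},
    {"id": "t3", "sex": "M", "age": 59.0},
    {"id": "t4", "sex": "M", "age": 75.0},
]
pairs = match_within_sex(rs_patients, thr_patients)
```

Greedy matching depends on the order in which RS patients are processed; this is one reason the result should be read as a sketch rather than as the matched cohorts used in the analysis.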
Assessment of utility and quality of the National Joint Registry for England and Wales database
This section considers the utility and quality of the data set from the perspective of the requirements of the present report. Unsurprisingly, the database structure of this resource was not tailored specifically for the task in hand. The strengths and weaknesses of the data set are briefly summarised below.
Strengths
- The data set was comprehensive in that it contained information on all patients listed for hip arthroplasty surgery in NHS hospitals in England and Wales between April 2003 and December 2012.
- Only a small proportion of variable information was missing (less than 0.2% for the THR database).
- The size of the data set was large; this provides narrow CIs for survival analysis and hence more certainty in the evaluation of cost-effectiveness.
- It was possible to distinguish between THR and RS patients.
Weaknesses
- The elapsed time to any primary outcome was reported in years rather than number of days or by date.
- No costs were reported for the procedures.
- It was not possible to link patients who progressed from treatment with RS to full THR in the database.
- Our data set was not linked by revision surgery.
- There was very poor reporting of BMI.
- There was no linkage to the PROMs data set in our data.
Summary of the individual patient data set
The NJR provides valuable information about patient subgroups and the categorisation of hip replacement procedures for all patients receiving treatment in the NHS in England and Wales. There were insufficiently complete data to estimate linked primary and secondary surgery for each patient or costs or utilities associated with the procedures.
Subsequent chapters describe further analysis of this data set in the cost-effectiveness model.
Chapter 6 Patient-reported outcome measures
Quality of life and utilities
Background
This section provides a brief description of the PROMs data set, which was used to provide utility data for analysis in the Markov model. We obtained quality-of-life data from the PROMs data set for patients who had a THR between January 2009 and December 2012. 97 The variables in the data set included the following: PROMs ID, patient sex, patient death, surgery date, complications (e.g. bleeding, infection and wound problems, readmission, further surgery) and EQ-5D-3L data, which was completed 6 months after surgery.
The EQ-5D-3L is a generic health-related quality of life measure that comprises the following five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each dimension has three levels of scoring: no problems, some problems or severe problems. This creates 243 possible health states, to which unconscious and dead have been added, giving a total of 245 health states. These health states are then converted to an index score from 0 (dead) to 1 (perfect health).
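The count of health states follows directly from five dimensions with three levels each. A minimal Python check is given below; the numeric coding of levels as 1 to 3 is the conventional EQ-5D notation and is used here only for illustration.

```python
from itertools import product

DIMENSIONS = [
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
]
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some problems, 3 = severe problems

# Every combination of one level per dimension is one health state.
states = list(product(LEVELS, repeat=len(DIMENSIONS)))
n_profile_states = len(states)          # 3 ** 5
n_total_states = n_profile_states + 2   # plus 'unconscious' and 'dead'
```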
Methods
Two analyses of the PROMs data set were undertaken.
Analysis 1
The PROMs data set for patients who had a THR between January 2009 and December 2012 included 207,436 records. After removing records with missing EQ-5D scores or surgery dates the data set contained 117,044 records. No age-specific utilities by sex were available in this data set.
Analysis 2
A second PROMs data set containing EQ-5D-3L data for THR by age and sex for the year 2010/11 was downloaded from the NHS Information Centre website in March 2013 (www.ic.nhs.uk/catalogue/PUB07049) for further analysis. This data set included 38,378 records. After removing patients with missing information with regard to EQ-5D scores, sex and age category, and after excluding patients aged < 40 years, the data set contained 32,577 records.
Overall
For both analyses, mean EQ-5D index results including SDs and 95% CIs were calculated. Linear regression analyses were conducted for EQ-5D index scores by sex for the different age categories. All statistical analyses were conducted in Stata 12.
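As an illustration of how a 95% CI for a mean can be obtained from summary statistics, the sketch below uses a normal approximation (mean ± 1.96 × SD/√n). The report does not state which method was used in Stata, so this is an assumption; the input values are the mean, SD and n reported for men in Table 57.

```python
import math

def mean_ci_95(mean, sd, n, z=1.96):
    """Normal-approximation 95% CI for a mean, from summary statistics."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se

# Men requiring further surgery (Table 57): mean 0.575, SD 0.352, n = 1320.
lower, upper = mean_ci_95(0.575, 0.352, 1320)
```

Rounded to three decimal places, the interval agrees with the values reported in Table 57.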
Results
For all patients, the mean EQ-5D score after their hip operation was 0.767 (Table 56). Men had a slightly higher EQ-5D utility index score than women (0.787 vs. 0.753).
Statistic | All patients | Men | Women |
---|---|---|---|
n | 117,044 | 47,745 | 68,676 |
Mean (SD) | 0.767 (0.256) | 0.787 (0.253) | 0.753 (0.257) |
95% CI | 0.765 to 0.768 | 0.785 to 0.790 | 0.751 to 0.754 |
Table 57 shows that the mean EQ-5D utility score for patients who required further surgery after hip replacement was 0.575 for men and 0.553 for women.
Statistic | All patients | Men | Women |
---|---|---|---|
n | 3096 | 1320 | 1776 |
Mean (SD) | 0.562 (0.341) | 0.575 (0.352) | 0.553 (0.332) |
95% CI | 0.550 to 0.574 | 0.556 to 0.594 | 0.537 to 0.568 |
Table 58 shows the EQ-5D results for patients after surgery for the period 2010/11 by sex and age band. Overall, men had a slightly higher EQ-5D utility index score than women after their hip operation for all age bands. Men in the age band 60–70 years gave a slightly higher value to their health-related quality of life than men in any other age band; likewise, women in the age band 60–70 years gave a slightly higher value to their health-related quality of life than women in any other age band.
Age band | All patients | Men | Women |
---|---|---|---|
40–50 years | |||
n | 794 | 316 | 478 |
Mean (SD) | 0.726 (0.297) | 0.736 (0.319) | 0.720 (0.282) |
95% CI | 0.706 to 0.747 | 0.700 to 0.771 | 0.695 to 0.746 |
50–60 years | |||
n | 4352 | 1883 | 2469 |
Mean (SD) | 0.753 (0.287) | 0.767 (0.287) | 0.742 (0.286) |
95% CI | 0.744 to 0.761 | 0.754 to 0.780 | 0.731 to 0.753 |
60–70 years | |||
n | 11,106 | 4758 | 6348 |
Mean (SD) | 0.779 (0.259) | 0.792 (0.261) | 0.769 (0.257) |
95% CI | 0.774 to 0.784 | 0.784 to 0.799 | 0.763 to 0.775 |
70–80 years | |||
n | 12,308 | 4841 | 7467 |
Mean (SD) | 0.764 (0.246) | 0.790 (0.235) | 0.747 (0.251) |
95% CI | 0.759 to 0.768 | 0.783 to 0.797 | 0.741 to 0.752 |
80–90 years | |||
n | 4017 | 1234 | 2783 |
Mean (SD) | 0.721 (0.253) | 0.745 (0.249) | 0.710 (0.254) |
95% CI | 0.713 to 0.729 | 0.731 to 0.759 | 0.701 to 0.720 |
Summary of patient-reported outcome measure data
The PROMs data set has provided valuable EQ-5D data by age and sex for patients who have undergone a THR for use in the economic model. However, there were insufficient linkage data to link the PROMs data set to the NJR data set.
Chapter 7 Methods for modelling revision rates
Introduction
This section describes methods used for modelling revision rates to feed into the economic model. Revision rates found, the justification for using subgroups and findings by age and sex subgroups are included. We also compare here our findings with the previous benchmark generated from NICE TA 2 and TA 4.
Data were extracted from the NJR database (see Chapter 5) and patient cohorts were analysed for time to revision. Kaplan–Meier and competing risk analyses were implemented in Stata 11. For Kaplan–Meier analysis, non-revision by end of follow-up and death were censored; for competing risk analysis, the competing risk was death and the risk of interest was revision according to the Stata user-written routine. 356
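The Kaplan–Meier product-limit estimate can be sketched in plain Python as follows. This is an illustrative implementation, not the Stata code used for the report; the toy follow-up times are hypothetical. An event indicator of 1 denotes revision and 0 denotes censoring (death or no revision by end of follow-up).

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one revision.

    times:  follow-up time for each patient
    events: 1 if the patient was revised at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        revised = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            revised += data[i][1]
            removed += 1
            i += 1
        if revised:
            survival *= 1 - revised / at_risk
            curve.append((t, survival))
        # Revised and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve

# Hypothetical follow-up times (years) and event indicators.
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 0, 1, 0, 1, 0])
```

Censored patients contribute to the number at risk up to their censoring time but never reduce the survival estimate themselves.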
Kaplan–Meier analyses were fitted with parametric distributions to allow for extrapolation beyond the observed data. Following the NICE Decision Support Unit (DSU) recommendation (see www.nicedsu.org.uk/Methods-Development(1985316).htm; accessed August 2014), the IPD were fitted with Weibull, Gompertz, log-logistic, log-normal and gamma distributions using the ‘streg’ command in Stata. It was found that for most cohorts of patients these commonly used distributions predicted decreasing hazard for revision beyond the observed data. As decreasing hazard is unlikely to capture increasing likelihood of revision from wear and tear, particularly for those who are active or of young age, further alternative models (bathtub, Rayleigh and Mitscherlich) were explored to allow for increasing hazard of revision beyond the observed data. An initial analysis of these was carried out using ordinary least squares in Stata 11 or Excel (2010; Microsoft Corporation, Redmond, WA, USA). The Rayleigh model predicts a linearly increasing hazard, the bathtub model a U-shaped hazard and the Mitscherlich model a hazard that increases at a decreasing rate with time to reach an asymptote:357,358
where π, a, b, g and l are constant parameters and t is time.
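The contrast between a decreasing and a linearly increasing hazard can be illustrated numerically. The sketch below uses a common Weibull parameterisation and a generic linear hazard of the Rayleigh type; both parameterisations and all parameter values are illustrative assumptions and are not the fitted models or the notation of this report.

```python
import math

def weibull_hazard(t, scale, shape):
    """Weibull hazard h(t) = scale * shape * t**(shape - 1), for t > 0.

    The hazard decreases with time when shape < 1.
    """
    return scale * shape * t ** (shape - 1)

def weibull_survival(t, scale, shape):
    """Weibull survival S(t) = exp(-scale * t**shape)."""
    return math.exp(-scale * t ** shape)

def linear_hazard(t, intercept, slope):
    """Linearly increasing hazard h(t) = intercept + slope * t."""
    return intercept + slope * t

def linear_hazard_survival(t, intercept, slope):
    """Survival for the linear hazard: exp(-(intercept*t + slope*t**2 / 2))."""
    return math.exp(-(intercept * t + 0.5 * slope * t ** 2))

# Illustrative values only: hazards at 1 and 30 years after surgery.
weibull_early = weibull_hazard(1, scale=0.01, shape=0.8)
weibull_late = weibull_hazard(30, scale=0.01, shape=0.8)
linear_early = linear_hazard(1, intercept=0.002, slope=0.0005)
linear_late = linear_hazard(30, intercept=0.002, slope=0.0005)
```

With a shape parameter below 1 the Weibull hazard keeps falling under extrapolation, whereas the linear hazard keeps rising, which is the behaviour the alternative models were explored to capture.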
In practice, the Mitscherlich and Rayleigh models generated poor fits and were not pursued. The results from the Weibull, Gompertz, log-logistic, log-normal and bathtub models for each cohort are catalogued in Appendix 17, which presents modelled time to revision and hazard for the observation period and for extrapolation to 50 years.
The selection of an appropriate model or models for use in the economic analysis was based on the Akaike information criterion (AIC), judgement of the plausibility of the resulting extrapolations, visual goodness of fit to the IPD-derived Kaplan–Meier plot, and plots of the log-Kaplan–Meier estimated cumulative hazard compared with the log-modelled cumulative hazard. 359 In sex-stratified sensitivity analyses parametric fits were adjusted for age, with age for each cohort centred near the mean. The bathtub models were analysed using the Stata ‘stgenreg’ package developed by Crowther and Lambert. 360 This provided considerable advantages including the use of IPD, adjustment for age, prediction of hazard and survival and generation of AIC estimates for comparison with other models and of a covariance matrix of parameters that could be employed for probabilistic economic analysis. Flexible parametric models of Royston–Parmar were implemented using the ‘stpm2’ package in Stata developed by Lambert and Royston. 361,362
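The AIC component of model selection can be sketched as follows, using the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and L the maximised likelihood. The model names and log-likelihood values are hypothetical placeholders, not results from this report, and AIC was only one of the criteria used.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates: (maximised log-likelihood, number of parameters).
candidates = {
    "model_a": (-1250.4, 2),
    "model_b": (-1248.9, 3),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # model with the lowest AIC
```

Here the extra parameter in the second hypothetical model is justified only because it improves the log-likelihood by more than one unit.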
Revision rates
Categories of total hip replacement
We considered five separate categories of THR, which differ from each other with regard to the characteristics of the component parts of each type of prosthesis. The main features of these five categories are detailed in Table 53.
Patient populations to be compared
The remit from NICE for this report (see http://guidance.nice.org.uk/TA304; accessed August 2014) specified the following comparisons in people with pain and disability resulting from arthritis of the hip for which non-surgical management has failed:
- different types of primary THR compared with hip RS for people in whom both procedures are suitable
- different types of primary THR compared with each other for people who are not suitable for hip RS.
We considered five separate categories of THR, which differ from each other with regard to the characteristics of the component parts of each prosthesis category. The derivation and main features of these five categories are detailed in Chapter 5. The five categories account for ≈62% of all NJR THR recipients.
We used NJR data to investigate revision rates. Figure 24 shows the age distribution, according to decade, of NJR patients who received THR or RS and the age distribution by sex for those who received RS.
Most RS patients were aged < 65 years at the time of the intervention whereas most THR recipients were aged > 65 years. Figure 25 is a kernel density diagram showing the overlap between the two distributions. We found that populations undergoing RS or THR overlapped substantially (for RS 89.7% were aged < 65 years and for all THR categories 22.6% were aged < 65 years).
Table 59 summarises the age and sex differences between the population who received RS and the population who received THR. THR interventions outnumbered RS interventions by more than 10 : 1, the proportion of women was twice as large for THR as it was for RS and the mean age of RS recipients was about 15 years less than that for THR recipients.
Population | Number | % female | Mean (SD) age (years) | Median age (years) | Interquartile range (years) |
---|---|---|---|---|---|
All RS recipients | 31,222 | 29.9 | 55.0 (8.6) | 55.7 | 49.7–60.9 |
All THR recipients | 386,556 | 61.4 | 69.5 (10.3) | 70.4 | 63.2–76.8 |
THR category A–E recipients | 239,089 | 63.5 | 71.6 (9.6) | 72.5 | 65.8–78.3 |
To compare RS with THR we needed to define patients who were eligible for both interventions. The NJR did not contain information indicating which patients were suitable for both THR and RS, nor was there information on those who might be considered unsuitable for RS. Expert clinical opinion indicated that RS was selected mainly for relatively active younger patients whereas THR was the predominant option for less active older patients. However, the NJR did not provide information on activity levels of patients.
The literature indicates that revision rates after RS are much higher for women than for men,15 whereas for THR the reverse is the case, a finding that we confirmed in our preliminary analysis (see Appendix 17). It is known that revision rates in general are lower for older patients. Because revision rates differ by sex and age it is likely that the cost-effectiveness of interventions will reflect the age and sex mix of the population(s) examined. Given the observed differences in age and sex for RS and THR populations, the following alternative strategies were considered to identify appropriate RS and THR populations for comparison of the interventions:
1. all RS recipients compared with all THR recipients, not matched
2. all RS recipients compared with recipients of the five identified THR categories, not matched (see Chapter 5)
3. all RS recipients compared with each of the 16+ different categories of THR in the NJR data set, separately matched by age and sex
4. all RS recipients compared with THR recipients from each of the five identified categories, separately matched by age and sex
5. all RS recipients compared with all THR recipients from the combined five identified categories, matched by age and sex
6. all RS recipients compared with the total pool of all THR recipients, matched by age and sex.
Options 1 and 2 (without matching) were rejected because of the large age and sex differences between RS recipients and THR recipients; these imbalances influence revision rates and were judged likely to result in an inequitable comparison of the interventions. Options 3–6 avoid age and sex mismatch if age matching is undertaken separately for each sex and then the matched male and female populations combined. Age matching within sexes was in general feasible because of the much larger number of THR recipients than RS recipients. Therefore, we judged options 3–6 to be preferable to options 1 and 2.
Option 3 was considered impractical because of the large number of different THR interventions in the NJR database. Also, for options 3 and 4 the number of recipients within some individual THR categories was too small to allow age and sex matching with a significant proportion of RS recipients. Furthermore, expert clinical advice indicated that the relevant clinical decision was between RS and THR rather than between RS and any one of many THR options and therefore options 3 and 4 were considered less appropriate than options 5 and 6.
For these two important reasons we therefore selected option 5 for the base case. This represents a departure from the comparison specified in the protocol and scope. We selected option 5 to represent the most likely clinical comparison (the selection of THR prosthesis for a patient eligible for both RS and THR is likely to be from the most frequently used prostheses with the lowest revision rates, as represented by the five identified THR categories) (see Figure 23).
We therefore used propensity matching to match RS recipients in the NJR with THR recipients from the five identified categories for decision problem 1 (see Chapter 2). Propensity matching on age and sex was undertaken using the Edwin Leuven procedure. 355
The comparison of revision rates among these matched individuals was used in the economic analysis.
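The matching step can be illustrated with a minimal Python sketch. This is not the Leuven procedure used for the report. It assumes that, with only age and sex as covariates, greedy 1:1 nearest-neighbour matching on age within each sex (without replacement) is an acceptable stand-in for propensity matching. The function name `match_within_sex`, the `caliper` parameter and the toy records are all hypothetical.

```python
from collections import defaultdict

def match_within_sex(rs, thr, caliper=0.5):
    """Greedy 1:1 nearest-neighbour matching on age within sex, without replacement.

    rs, thr: lists of (patient_id, sex, age) tuples (hypothetical record layout).
    Returns (rs_id, thr_id) pairs; an RS patient with no same-sex THR
    candidate within `caliper` years is left unmatched.
    """
    pool = defaultdict(list)
    for pid, sex, age in thr:
        pool[sex].append((age, pid))
    pairs = []
    for pid, sex, age in rs:
        candidates = pool[sex]
        if not candidates:
            continue
        # Index of the closest remaining THR recipient of the same sex.
        best = min(range(len(candidates)), key=lambda i: abs(candidates[i][0] - age))
        if abs(candidates[best][0] - age) <= caliper:
            # pop() removes the THR recipient so it cannot be matched twice.
            pairs.append((pid, candidates.pop(best)[1]))
    return pairs

# Toy data only; not taken from the NJR.
rs = [("r1", "F", 52.0), ("r2", "M", 58.0), ("r3", "M", 41.0)]
thr = [("t1", "F", 52.3), ("t2", "M", 57.8), ("t3", "M", 70.1), ("t4", "F", 75.0)]
pairs = match_within_sex(rs, thr)
print(pairs)
```

Matching separately within each sex and then pooling the matched pairs mirrors the approach described above for avoiding age and sex mismatch; RS recipients without a suitable THR counterpart simply drop out of the comparison.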
We undertook subgroup analyses in which the comparison between RS and THR was examined separately for each sex, within which parametric models of revision were controlled for age. Revision rates were then estimated for men and women aged 40, 50 and 60 years. These ages were selected to avoid extremes in the age distribution of patients while capturing age-dependent differences that may exist in revision rates. There were three reasons for undertaking subgroup analyses: (1) the difference between the sexes in mechanical load bearing through the hip joint;363 (2) the large difference in observed revision rates between men and women (see Figures 31 and 46); and (3) expert clinical opinion, which indicated that age represents a reasonable proxy for activity levels.
In the selection of alternative interventions to address our objective (2) (comparison between different types of THR), we were guided by the frequency of use of different prostheses and by clinical advice (see Chapter 5). The wording of the scope required identification of THR recipients unsuitable for RS. However, the NJR did not provide information about which THR recipients were unsuitable for RS. Although it can be assumed that all RS patients may also be candidates for THR, the reverse is less likely. The majority of NJR THR recipients were aged > 65 years (see Figure 24), consistent with expert clinical opinion that older patients would be more likely candidates for THR than RS. Furthermore, the observed high revision rates that follow RS15,16 imply that in future fewer younger patients (aged < 65 years) will be considered to be candidates for both procedures. Therefore, for the base case we took the decision to compare THR categories across the whole population who received them (irrespective of age and sex).
However, because of the wide age range of patients who received a THR, and the different proportions of men and women receiving the different types of THR, we conducted sensitivity analysis controlling for age and sex. In addition, as only ≈10% of RS recipients were aged > 65 years, it appears that patients over this age are unlikely to be suitable for RS.
We therefore conducted subgroup analyses in which the THR populations were stratified by age (> 65 years or < 65 years) and examined separately by sex. Parametric models for revision in these subgroups were controlled for age and then revision rates were estimated for men and women aged 40, 50 and 60 years using the population aged < 65 years, and for men and women aged 70 and 80 years using the population aged > 65 years. The ages were selected to avoid extremes in the age distribution of patients while capturing age-dependent differences that may exist in revision rates.
The use of subgroups described above is consistent with NICE consultations for the update of its previous technology assessments of hip replacement interventions (TA2 46 and TA44 25), which recommended, should evidence allow, that different interventions should be compared in subgroups of patients according to age and sex. 364 However, these subgroup analyses represent an extension from our protocol and scope. Table 60 summarises the make-up of the THR populations by age and sex.
Population | Number | % female | Mean (SD) age (years) | Median age (years) | Interquartile range (years) |
---|---|---|---|---|---|
All THR recipients | 386,556 | 61.4 | 69.5 (10.3) | 70.4 | 63.2–76.8 |
All THR female recipients | 237,436 | 100 | 70.2 (10.3) | 71.1 | 63.8–77.6 |
All THR male recipients | 149,120 | 0 | 68.45 (10.3) | 69.4 | 62.3–75.6 |
All THR category A–E recipients | 239,089 | 63.5 | 71.6 (9.6) | 72.5 | 65.8–78.3 |
All THR category A–E female recipients | 151,902 | 100 | 72.1 (9.6) | 73 | 66.4–78.9 |
All THR category A–E male recipients | 87,187 | 0 | 70.5 (9.6) | 71.5 | 64.9–77.1 |
All category A recipients | 125,285 | 66.9 | 74.6 (7.9) | 74.9 | 69.7–80 |
All category B recipients | 37,874 | 60.2 | 71.5 (8.7) | 72 | 65.9–77.5 |
All category C recipients | 34,754 | 55.4 | 61.6 (9.9) | 62.3 | 55.9–67.9 |
All category D recipients | 28,471 | 64.2 | 73.0 (8.3) | 73.4 | 67.8–78.7 |
All category E recipients | 12,705 | 60.1 | 66.2 (9.6) | 66.3 | 60.7–72.5 |
All category A male recipients | 41,472 | 0 | 73.9 (7.7) | 74.2 | 69.2–79.0 |
All category B male recipients | 15,055 | 0 | 70.9 (8.6) | 71.6 | 65.6–76.7 |
All category C male recipients | 15,502 | 0 | 61.6 (9.8) | 62.5 | 56–67.9 |
All category D male recipients | 10,181 | 0 | 72.5 (8.1) | 72.9 | 67.6–77.9 |
All category E male recipients | 4977 | 0 | 65.5 (9.4) | 65.6 | 60.3–71.6 |
All category A female recipients | 83,813 | 100 | 74.9 (8.0) | 75.3 | 70.0–80.5 |
All category B female recipients | 22,819 | 100 | 71.8 (8.8) | 72.3 | 66.2–78 |
All category C female recipients | 19,252 | 100 | 61.6 (9.9) | 62.2 | 55.8–67.9 |
All category D female recipients | 18,290 | 100 | 73.3 (8.5) | 73.7 | 67.9–79.2 |
All category E female recipients | 7728 | 100 | 66.7 (9.7) | 66.8 | 60.9–73.1 |
Overall revision rates, competing risks and rationale for analysis
Revision rates among NJR patients have been the subject of several recent publications. 15,16,318,353 Some investigators have used Kaplan–Meier analysis whereas others have employed competing risk analysis in which the event of interest is revision and death is taken as a competing risk. In Kaplan–Meier analysis death, as well as no revision at the end of follow-up, is censored. We briefly compared overall revision rates in our NJR RS and THR patients according to these methodologies (see Appendix 17 for results). RS revision rate estimates were very similar for both Kaplan–Meier and competing risk analyses and were similar to those reported by Smith et al. 15 For THR the Kaplan–Meier analysis generated somewhat higher rates of revision than the competing risk analysis.
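The difference between the two approaches can be sketched in a few lines of Python. The example below is illustrative only: the function name `km_and_cif`, the status coding and the toy records are assumptions, and the competing risk estimate is the standard cumulative incidence estimator rather than necessarily the exact routine used for the report.

```python
def km_and_cif(records):
    """records: list of (time, status); status 0 = censored,
    1 = revision, 2 = death before revision.

    Returns (km_revision, cif_revision):
      km_revision  = 1 - Kaplan-Meier survival, with deaths treated as censored
      cif_revision = cumulative incidence of revision, death as a competing risk
    """
    event_times = sorted({t for t, s in records if s != 0})
    km_surv = 1.0       # revision-free survival, deaths censored
    overall_surv = 1.0  # free of both revision and death
    cif = 0.0
    for t in event_times:
        at_risk = sum(1 for u, _ in records if u >= t)
        revisions = sum(1 for u, s in records if u == t and s == 1)
        deaths = sum(1 for u, s in records if u == t and s == 2)
        # Revision can only occur in patients still alive and unrevised.
        cif += overall_surv * revisions / at_risk
        km_surv *= 1 - revisions / at_risk
        overall_surv *= 1 - (revisions + deaths) / at_risk
    return 1 - km_surv, cif

# Toy follow-up data (years, status); not NJR data.
records = [(1, 1), (2, 2), (3, 1), (4, 2), (5, 0), (6, 1), (7, 0), (8, 0)]
km_revision, cif_revision = km_and_cif(records)
print(round(km_revision, 3), round(cif_revision, 3))
```

Because the Kaplan–Meier estimate treats patients who die as if they could still go on to be revised, it is never lower than the competing risk estimate, and the gap widens as mortality increases. This is in line with the larger discrepancy reported for the older THR population.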
Both Kaplan–Meier- and competing risk-estimated revision rates were higher for women than for men for RS whereas revision rates for women were less than those for men for THR. For this reason some sensitivity analyses in the economic analyses that follow have been stratified according to sex. To be consistent with all previous economic analyses of hip replacement technologies, we have used the revision estimates from Kaplan–Meier analysis together with parametric modelling to predict the rate of revision beyond the observed data.
In practice, several parametric models fitted the observed data for revision well. On extrapolation, models generated quite different revision rates, mainly determined by the different modelled hazard during the extrapolation period, with some models predicting an increasing hazard (e.g. bathtub) and others a decreasing hazard (e.g. log-normal); an example is shown in Figure 26. An increasing hazard of revision appears reasonable for ‘younger’ patients who are likely to outlive their prosthesis; however, it is clear that for patients of advanced age there is a relative lack of clinical imperative to undertake revision and an extrapolation with increasing hazard becomes less appropriate (Figure 27).
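The contrast between increasing and decreasing extrapolated hazards can be sketched as follows. The parametric form of the bathtub model is not given in this section, so a Weibull model with shape > 1 is used here purely as a stand-in for an increasing hazard. All parameter values are hypothetical and were not fitted to NJR data.

```python
import math

def weibull_hazard(t, shape, scale):
    """Weibull hazard; increases with t when shape > 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def lognormal_hazard(t, mu, sigma):
    """Log-normal hazard = density / survival; declines at late times."""
    z = (math.log(t) - mu) / sigma
    density = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2))
    return density / survival

years = [10, 20, 30]
# Hypothetical parameters chosen only to illustrate the two shapes.
increasing = [weibull_hazard(t, shape=1.5, scale=100.0) for t in years]
decreasing = [lognormal_hazard(t, mu=8.0, sigma=3.0) for t in years]
for t, up, down in zip(years, increasing, decreasing):
    print(t, round(up, 5), round(down, 5))
```

Two models that agree closely over 9 years of observed data can therefore diverge substantially once extrapolated to 20 or 30 years, which is why the choice of extrapolated hazard is examined in sensitivity analysis.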
In view of these considerations, in the base-case analysis we selected the best fit to the observed data across all of the interventions that we compared. Because in practice the best fit was provided by the bathtub model (increasing hazard on extrapolation), sensitivity analyses were conducted with the best alternative fit that allowed for a decreasing extrapolated hazard. In subgroup analyses according to age and sex a dual approach was adopted in which increasing and decreasing extrapolated hazards were both investigated.
In principle, our approach conforms to NICE DSU guidance for modelling time-to-event IPD. This guidance, however, specifically refers to interventions compared within a single clinical trial and recommends that it is desirable to adopt the same parametric form for the interventions being compared. 365,366 The NJR comprises observational rather than RCT data so parametric fits for different interventions and/or patient groups may not be well described by a single parametric form.
Published cost-effectiveness analyses of THR have predominantly adopted a bathtub hazard model for revision rates. 38,44,273,367
Information criteria [AIC, Bayesian information criterion (BIC)] scores for modelled fits and plots of the modelled log-cumulative hazard compared with the log-Kaplan–Meier estimated hazard were used to judge goodness of fit and are provided in the main text or in Appendices 19 and 20 respectively.
Results
The parametric modelling results are reported in full in Appendix 17.
Proportional hazards tests
The condition of proportional hazards between observed revision rates for compared groups was examined using log-Kaplan–Meier estimated cumulative hazard compared with log-time. The results for RS compared with THR and for the five categories of THR prostheses are shown in Figures 28 and 29, respectively.
Cumulative hazard plots for women for the comparison between RS and THR are not parallel (see Figure 28). For men for the comparison between RS and THR a proportional hazards assumption appears to hold moderately well.
For the comparison between different THR prostheses, the cumulative hazards were again not noticeably parallel (see Figure 29); this held also for THR categories when the population was stratified by sex and age. Because there was a lack of general support for proportional hazards for most comparisons, separate models were fitted for each comparison rather than using treatment as a covariate.
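The graphical check described above can be approximated numerically. Under proportional hazards the curves of log cumulative hazard against log-time are parallel, so the vertical gap between two groups should be roughly constant over time. The sketch below is a minimal illustration under that assumption; the function names and the toy data are hypothetical.

```python
import math

def km_cumulative_hazard(records, at_times):
    """records: (time, event) with event 1 = revision, 0 = censored.
    Returns H(t) = -ln S_KM(t) at each requested time."""
    event_times = sorted({t for t, e in records if e == 1})
    out = []
    for horizon in at_times:
        surv = 1.0
        for t in event_times:
            if t > horizon:
                break
            at_risk = sum(1 for u, _ in records if u >= t)
            events = sum(1 for u, e in records if u == t and e == 1)
            surv *= 1 - events / at_risk
        out.append(-math.log(surv))
    return out

def log_hazard_gap(group_a, group_b, at_times):
    """Difference in log cumulative hazard between two groups at each time.
    Requires at least one event in each group by the first requested time."""
    ha = km_cumulative_hazard(group_a, at_times)
    hb = km_cumulative_hazard(group_b, at_times)
    return [math.log(a) - math.log(b) for a, b in zip(ha, hb)]

# Toy data: 20 patients per group, follow-up censored at 10 years.
group_a = [(t, 1) for t in (1, 2, 3, 4, 5, 6)] + [(10, 0)] * 14
group_b = [(t, 1) for t in (2, 4, 6)] + [(10, 0)] * 17
gaps = log_hazard_gap(group_a, group_b, [2, 4, 6])
print([round(g, 3) for g in gaps])
```

A gap that stays roughly constant supports a proportional hazards assumption; a gap that drifts or changes sign suggests fitting separate models to each group.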
Comparison of resurfacing arthroplasty with total hip replacement
For both sexes many more patients received THR than RS. The observed revision rate for all RS recipients (n = 31,222) over 9 years of follow-up was about three times that for all THR recipients (n = 386,556) (Figure 30). When the comparison was made by sex the observed revision rate for female RS recipients was more than three times that of female THR patients and the observed revision rate for male RS recipients was about twice that for male THR recipients (Figure 31).
When the comparison between RS and THR was restricted to THR recipients of the five prosthesis categories A–E (n = 239,089), the differences were larger (Figure 32) and again held across sexes (Figures 33–35). When revision rates for recipients of the individual categories of THR were compared with all RS recipients the observed revision rates for both sexes were considerably higher for RS than for any single THR category.
It is clear that revision rates after RS are much higher for both sexes than those after THR of any category. However, age and sex differences between the RS and the THR populations (Table 61) make these comparisons inequitable. More men than women received RS whereas more women than men received THR, and nearly all RS recipients were aged < 65 years (mean age ≈56 years) whereas most THR recipients were aged > 65 years (mean age ≈72 years). For an equitable comparison of the interventions it is necessary to match populations by sex and age.
Population | Number | % female | Mean (SD) age (years) | Median age (years) | Interquartile range (years) |
---|---|---|---|---|---|
All RS recipients | 31,222 | 29.9 | 55.0 (8.6) | 55.7 | 49.7–60.9 |
All THR recipients | 386,556 | 61.4 | 69.5 (10.3) | 70.4 | 63.2–76.8 |
THR category A–E recipients | 239,089 | 63.5 | 71.6 (9.6) | 72.5 | 65.8–78.3 |
RS propensity-matched population | 26,643 | 35.0 | 55.83 (8.3) | 54.0 | 49–59 |
THR propensity-matched population | 26,643 | 35.0 | 55.83 (8.3) | 54.0 | 49–59 |
RS propensity-matched population male | 17,322 | 0 | 57.1 (8.03) | 58 | 53–62 |
THR propensity-matched population male | 17,322 | 0 | 57.1 (8.03) | 58 | 53–62 |
RS propensity-matched population female | 9321 | 100 | 53.5 (8.4) | 54.0 | 49–59 |
THR propensity-matched population female | 9321 | 100 | 53.5 (8.4) | 54.0 | 49–59 |
Of the male and female patients who received RS for OA, 17,322 and 9321, respectively, were successfully propensity matched by age with THR patients from THR categories A–E (n = 239,089), providing 26,643 matched pairs for comparison (see Chapter 5, Matching and Figure 23). Age distribution was identical in the RS and THR matched populations (see Table 61) but was slightly skewed from normal (Figure 36). Kaplan–Meier analysis (see Figure 36) revealed that revision rates were much higher for RS than for the matched THR population.
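As a consistency check, the combined matched population in Table 61 can be recovered from the sex-specific rows. The short sketch below uses only figures from Table 61; the variable names are illustrative.

```python
# Sex-specific matched populations from Table 61.
male_n, male_mean_age = 17322, 57.1
female_n, female_mean_age = 9321, 53.5

total_pairs = male_n + female_n
pct_female = 100 * female_n / total_pairs
# Mean age of the pooled population is the size-weighted mean of the two sexes.
pooled_mean_age = (male_n * male_mean_age + female_n * female_mean_age) / total_pairs

print(total_pairs, round(pct_female, 1), round(pooled_mean_age, 2))
```

The pooled values agree with the combined rows of Table 61 to within the rounding of the sex-specific means.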
Revision was more frequent among the matched THR population than among the whole THR population (Figure 37), demonstrating the importance of the matching process before comparison of RS with THR.
Information criteria (Table 62) indicated that bathtub models provided the best fit for both RS and THR (see Figure 36): the bathtub model had the lowest AIC for both interventions and the lowest BIC for RS, whereas for THR the BIC values for the bathtub, Weibull and log-logistic models differed by less than 1.5 points. Therefore, to compare RS with THR in the base-case economic analysis, transition probabilities were calculated using the bathtub model. Bathtub fits and extrapolations are shown in Figure 38 and reflect clinical practice as represented by patients in the NJR database. Bathtub fits were supported visually (see Appendix 17) and by plots of modelled compared with Kaplan–Meier-estimated cumulative hazards (Figure 39).
Intervention | Model | Observations | Model likelihood | Parameters | AIC | BIC |
---|---|---|---|---|---|---|
THR | Exponential | 26,643 | –3239.377 | 1 | 6480.753 | 6488.944 |
THR | Weibull | 26,643 | –3219.967 | 2 | 6443.935 | 6460.315 |
THR | Gompertz | 26,643 | –3230.912 | 2 | 6465.825 | 6482.205 |
THR | Log-normal | 26,643 | –3221.913 | 2 | 6447.827 | 6464.207 |
THR | Log-logistic | 26,643 | –3220.111 | 2 | 6444.222 | 6460.603 |
THR | Bathtub | 26,643 | –3215.51 | 3 | 6437.021 | 6461.592 |
RS | Exponential | 26,643 | –8102.451 | 1 | 16206.9 | 16215.09 |
RS | Weibull | 26,643 | –8101.688 | 2 | 16207.38 | 16223.76 |
RS | Gompertz | 26,643 | –8094.569 | 2 | 16193.14 | 16209.52 |
RS | Log-normal | 26,643 | –8162.981 | 2 | 16329.96 | 16346.34 |
RS | Log-logistic | 26,643 | –8107.527 | 2 | 16219.05 | 16235.43 |
RS | Bathtub | 26,643 | –8037.685 | 3 | 16081.37 | 16105.94 |
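The AIC and BIC columns of Table 62 are consistent with the standard definitions, AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, with n taken as the number of observations. That the report's software used exactly these definitions is an assumption, but the sketch below reproduces the tabulated THR values to within rounding. The function names are illustrative.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion for k fitted parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik

# THR rows of Table 62: (model, log-likelihood, number of parameters).
n_obs = 26643
thr_fits = [
    ("Exponential", -3239.377, 1),
    ("Weibull", -3219.967, 2),
    ("Gompertz", -3230.912, 2),
    ("Log-normal", -3221.913, 2),
    ("Log-logistic", -3220.111, 2),
    ("Bathtub", -3215.51, 3),
]
scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, ll, k in thr_fits}
best_by_aic = min(scores, key=lambda m: scores[m][0])
best_by_bic = min(scores, key=lambda m: scores[m][1])
for name, (a, b) in scores.items():
    print(f"{name:13s} AIC={a:9.3f} BIC={b:9.3f}")
print("lowest AIC:", best_by_aic, "| lowest BIC:", best_by_bic)
```

Lower scores indicate a better trade-off between fit and complexity. BIC penalises each extra parameter by ln(n), about 10.2 here, compared with 2 for AIC, so the three-parameter bathtub model needs a larger gain in log-likelihood to rank first on BIC than on AIC.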
The bathtub-modelled percentage revision at 10, 20 and 30 years is summarised in Table 63.
Intervention | Revision at 10 years (%) | Revision at 20 years (%) | Revision at 30 years (%) |
---|---|---|---|
RS | 17.2 | 48.3 | 76.3 |
THR | 4.6 | 12.9 | 24.6 |
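To show how cumulative revision percentages of this kind relate to the transition probabilities used in an economic model, the sketch below converts the Table 63 values into average annual revision probabilities for each decade. It assumes a constant hazard within each 10-year interval, which is a simplification: the report derived transition probabilities directly from the fitted bathtub model. The function name is hypothetical.

```python
def mean_annual_probability(cum_start, cum_end, years):
    """Average annual revision probability among patients unrevised at the
    start of the interval, assuming a constant hazard within the interval."""
    surv_start, surv_end = 1 - cum_start, 1 - cum_end
    return 1 - (surv_end / surv_start) ** (1 / years)

# Cumulative revision at 10, 20 and 30 years (Table 63), as proportions.
table_63 = {"RS": (0.172, 0.483, 0.763), "THR": (0.046, 0.129, 0.246)}

annual = {}
for name, (c10, c20, c30) in table_63.items():
    annual[name] = (
        mean_annual_probability(0.0, c10, 10),
        mean_annual_probability(c10, c20, 10),
        mean_annual_probability(c20, c30, 10),
    )
    print(name, [round(p, 4) for p in annual[name]])
```

For both interventions the average annual probability rises from one decade to the next, as expected from a model with an increasing extrapolated hazard.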
As the age distributions of the matched populations were somewhat removed from normal (see Figure 36) we undertook sensitivity analysis in which bathtub models were controlled for age and sex and extrapolated revision was calculated for an ‘average’ population aged 55.8 years with 35% women (see Figure 38). Because it was evident that revision rates were much higher for women receiving RS than for men receiving RS, and because revision rates likely vary according to the age of patients, subgroup analyses focused on comparing populations stratified by sex and controlled for age. The results of the analysis of revision rates for these subgroups are provided in the following sections and in Appendix 17.
Comparison of the total hip replacement categories
For THR patients encompassed within the five selected categories (A–E, n = 239,089), the proportion remaining unrevised at 9 years according to Kaplan–Meier analysis was 0.974. The proportion of all 386,556 THR recipients unrevised at 9 years was 0.962 (Figure 40). The Kaplan–Meier plot for the five selected THR interventions indicated a relatively high initial hazard for revision that gradually decreased over about 4 years and subsequently gradually increased between 5 and 9 years.
Kaplan–Meier analyses indicated different revision rates across the five categories of THR (Figures 41 and 42). Revision rates for patients who received CeLCoC (category C) and CeLMoP (category B) THRs were clearly higher than those for patients who received CeCoP (category E) and CeMoP (category A) THRs.
According to information criteria scores (Table 64), other than for CeCoP (category E) THR, the bathtub model provided the best parametric fit, followed by the log-normal model. For CeCoP (category E), the log-normal model was marginally superior to the bathtub model. These inferences were supported by visual inspection (see Appendix 17) and by comparing modelled with Kaplan–Meier-estimated cumulative hazards for each category (Figure 43).
THR | Model | Observations | Model likelihood | Parameters | AIC | BIC |
---|---|---|---|---|---|---|
CeLCoC (category C) | Exponential | 34,754 | –3955.734 | 1 | 7913.467 | 7921.923 |
CeLCoC (category C) | Weibull | 34,754 | –3882.115 | 2 | 7768.229 | 7785.141 |
CeLCoC (category C) | Gompertz | 34,754 | –3906.282 | 2 | 7816.563 | 7833.475 |
CeLCoC (category C) | Log-normal | 34,754 | –3872.162 | 2 | 7748.323 | 7765.235 |
CeLCoC (category C) | Log-logistic | 34,754 | –3881.911 | 2 | 7767.822 | 7784.734 |
CeLCoC (category C) | Bathtub | 34,754 | –3858.878 | 3 | 7723.755 | 7749.123 |
HyMoP (category D) | Exponential | 28,471 | –2428.234 | 1 | 4858.468 | 4866.724 |
HyMoP (category D) | Weibull | 28,471 | –2387.427 | 2 | 4778.854 | 4795.368 |
HyMoP (category D) | Gompertz | 28,471 | –2405.936 | 2 | 4815.872 | 4832.385 |
HyMoP (category D) | Log-normal | 28,471 | –2383.97 | 2 | 4771.94 | 4788.454 |
HyMoP (category D) | Log-logistic | 28,471 | –2387.411 | 2 | 4778.822 | 4795.335 |
HyMoP (category D) | Bathtub | 28,471 | –2373.646 | 3 | 4753.291 | 4778.061 |
CeLMoP (category B) | Exponential | 37,874 | –4535.478 | 1 | 9072.955 | 9081.497 |
CeLMoP (category B) | Weibull | 37,874 | –4391.882 | 2 | 8787.763 | 8804.847 |
CeLMoP (category B) | Gompertz | 37,874 | –4442.601 | 2 | 8889.202 | 8906.286 |
CeLMoP (category B) | Log-normal | 37,874 | –4377.507 | 2 | 8759.014 | 8776.098 |
CeLMoP (category B) | Log-logistic | 37,874 | –4391.567 | 2 | 8787.133 | 8804.217 |
CeLMoP (category B) | Bathtub | 37,874 | –4345.8 | 3 | 8697.601 | 8723.227 |
CeMoP (category A) | Exponential | 125,285 | –10000.51 | 1 | 20003.01 | 20012.75 |
CeMoP (category A) | Weibull | 125,285 | –9929.73 | 2 | 19863.46 | 19882.94 |
CeMoP (category A) | Gompertz | 125,285 | –9965.745 | 2 | 19935.49 | 19954.97 |
CeMoP (category A) | Log-normal | 125,285 | –9927.767 | 2 | 19859.53 | 19879.01 |
CeMoP (category A) | Log-logistic | 125,285 | –9929.867 | 2 | 19863.73 | 19883.21 |
CeMoP (category A) | Bathtub | 125,285 | –9909.508 | 3 | 19825.02 | 19854.23 |
CeCoP (category E) | Exponential | 12,705 | –759.4492 | 1 | 1520.898 | 1528.348 |
CeCoP (category E) | Weibull | 12,705 | –757.1662 | 2 | 1518.332 | 1533.232 |
CeCoP (category E) | Gompertz | 12,705 | –757.8727 | 2 | 1519.745 | 1534.645 |
CeCoP (category E) | Log-normal | 12,705 | –756.8497 | 2 | 1517.699 | 1532.599 |
CeCoP (category E) | Log-logistic | 12,705 | –757.163 | 2 | 1518.326 | 1533.226 |
CeCoP (category E) | Bathtub | 12,705 | –756.6023 | 3 | 1519.205 | 1541.554 |
For the base-case economic analysis, transition probabilities were calculated from the bathtub fit for all categories. The fit to the Kaplan–Meier estimates and the extrapolation beyond the observed data are shown in Figures 44 and 45, respectively. These analyses reflect the performance of the five types of prosthesis for NJR patients over 9–10 years to 2012.
The lowest and highest revision rates were experienced by CeCoP (category E) and CeLCoC (category C) recipients, respectively (Table 65). The bathtub-modelled percentage of patients requiring revision at 10, 20 and 30 years is summarised in Table 66.
Population | Number | % female | Mean (SD) age (years) | Median age (years) | Interquartile range (years) |
---|---|---|---|---|---|
All THR (category A–E) recipients | 239,089 | 63.5 | 71.6 (9.6) | 72.5 | 65.8–78.3 |
All CeMoP (category A) recipients | 125,285 | 66.9 | 74.6 (7.9) | 74.9 | 69.7–80 |
All CeLMoP (category B) recipients | 37,874 | 60.2 | 71.5 (8.7) | 72 | 65.9–77.5 |
All CeLCoC (category C) recipients | 34,754 | 55.4 | 61.6 (9.9) | 62.3 | 55.9–67.9 |
All HyMoP (category D) recipients | 28,471 | 64.2 | 73.0 (8.3) | 73.4 | 67.8–78.7 |
All CeCoP (category E) recipients | 12,705 | 60.1 | 66.2 (9.6) | 66.3 | 60.7–72.5 |
THR category | Revision at 10 years (%) | Revision at 20 years (%) | Revision at 30 years (%) |
---|---|---|---|
CeMoP (category A) | 2.8 | 7.9 | 15.6 |
CeLMoP (category B) | 3.9 | 9.9 | 18.7 |
CeLCoC (category C) | 4.6 | 12.3 | 23.5 |
HyMoP (category D) | 3.0 | 8.4 | 16.5 |
CeCoP (category E) | 2.1 | 5.2 | 9.9 |
Across the five THR category recipients 36.5% were men and 63.5% were women but within categories the ratio of women to men varied from 1.24 for CeLCoC (category C) to 2.02 for CeMoP (category A). Revision was more frequent for men than for women (Figure 46) although this was least pronounced for the CeCoP (category E) prosthesis.
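The sex ratios quoted here follow directly from the recipient counts in Table 60. The sketch below recomputes them; the dictionary layout and variable names are illustrative.

```python
# Recipient counts from Table 60: category -> (women, men).
counts = {
    "A (CeMoP)": (83813, 41472),
    "B (CeLMoP)": (22819, 15055),
    "C (CeLCoC)": (19252, 15502),
    "D (HyMoP)": (18290, 10181),
    "E (CeCoP)": (7728, 4977),
}

ratios = {cat: women / men for cat, (women, men) in counts.items()}
lowest = min(ratios, key=ratios.get)
highest = max(ratios, key=ratios.get)

total_women = sum(w for w, _ in counts.values())
total_men = sum(m for _, m in counts.values())
pct_women = 100 * total_women / (total_women + total_men)

for cat, ratio in ratios.items():
    print(cat, round(ratio, 2))
print("lowest:", lowest, "| highest:", highest, "| % women:", round(pct_women, 1))
```

On these counts the ratio of women to men is lowest for category C (CeLCoC) and highest for category A (CeMoP).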
Similarly, the age distribution of patients differed somewhat according to THR category (Figure 47). CeLCoC (category C) prostheses were used more for younger patients and CeMoP (category A) prostheses were used more for older patients. Across the five THR categories the mean age was 71.56 years.
In sensitivity analysis the bathtub model was controlled for age and sex to adjust for spurious differences in revision rates because of differing proportions of men and women or of younger or older patients in the different THR categories. The relative performance of the five categories modelled for a population aged 71.6 years with 63.5% women demonstrates that the superiority of the CeCoP prosthesis was somewhat enhanced.
In further sensitivity analysis we used log-normal fits to the Kaplan–Meier-estimated revision rates; these are shown for each of the types of THR (categories A–E) in Table 67 and Figure 48. With a mean age across all categories of nearly 72 years, extrapolation predicting a decreasing hazard for revision may be appropriate. The best-fit model providing this condition was the log-normal model. These fits are shown in Figure 49. The relative performance of the prostheses was similar to that with the bathtub model; however, unsurprisingly, extrapolated revision rates were lower than with the bathtub model.
THR category | Revision at 10 years (%) | Revision at 20 years (%) | Revision at 30 years (%) |
---|---|---|---|
CeMoP (category A) | 2.3 | 3.5 | 4.4 |
CeLMoP (category B) | 3.3 | 4.6 | 5.5 |
CeLCoC (category C) | 3.7 | 5.3 | 6.4 |
HyMoP (category D) | 2.4 | 3.4 | 4.2 |
CeCoP (category E) | 1.8 | 2.9 | 3.8 |
Further sensitivity analysis was carried out in which the log-normal model was controlled for age and sex. With this model the superior performance of the CeCoP (category E) prosthesis was maintained (see Figure 49).
Comparison between resurfacing arthroplasty and total hip replacement: subgroup analyses according to sex (women)
Because the use of different categories of THR prostheses differed by age and sex and as recipients of THR interventions aged > 65 years approximate a population unlikely to be considered candidates for RS (see Figure 24), we undertook subgroup analyses in which the THR population for each category was stratified by sex and by age (> 65 years and < 65 years) and parametric models were controlled for age. Results from these analyses are presented in Figure 49.
As expected, the matched groups (n = 9321) had an identical age distribution [mean 53.5 (SD 8.4) years, range 15–93 years] (Figure 50).
The observed time to revision was far shorter for RS than for THR recipients (Figure 51).
For RS, Gompertz, bathtub and Weibull models provided good fits and each predicted an increasing hazard beyond the observed data; according to AIC scores and cumulative hazard plots the Gompertz and bathtub models were the better fits (see Appendices 19 and 20, respectively) and predicted similar revision beyond the observed data.
For THR patients the bathtub fit was as good as the alternatives (see Appendix 17) and was the only model that predicted an increasing hazard beyond the observation period. According to AIC scores and cumulative hazard plots, differences were trivial between the bathtub, log-normal and Weibull models (see Appendices 19 and 20, respectively). For the economic analysis the bathtub model was adopted for both the RS group and the THR group. The predicted requirement for revision at 10, 20 and 30 years using the bathtub model is shown in Table 68.
Intervention | Revision at 10 years (%) | Revision at 20 years (%) | Revision at 30 years (%) |
---|---|---|---|
RS | 23.1 | 61.2 | 87.6 |
THR | 4.8 | 13.2 | 25.2 |
Comparison between resurfacing arthroplasty and total hip replacement: subgroup analyses according to sex (men)
Each of the matched groups (n = 17,322) had a mean age of 57.1 years (SD 8.03 years; range 16–89 years) and an identical age distribution (Figure 52).
The observed revision rate was higher for RS than for THR (Figure 53). Parametric fits are presented in Appendix 17. The bathtub distribution produced the lowest AIC scores and visually the superior fit (see Appendices 17 and 19); cumulative hazard plots are provided in Appendix 20. Apart from the bathtub model, the models predicted a decreasing hazard on extrapolation (see Appendix 17). For the economic analysis the bathtub model was adopted for both the RS group and the THR group. The predicted requirement for revision at 10, 20 and 30 years is shown in Table 69.
Intervention | Revision at 10 years (%) | Revision at 20 years (%) | Revision at 30 years (%) |
---|---|---|---|
RS | 12.4 | 35.6 | 61.2 |
THR | 4.7 | 13.2 | 25.5 |
Comparison of total hip replacement revision rates according to sex and age: men aged more than 65 years
Figure 54 shows the observed time to revision for male patients aged > 65 years according to category of THR prosthesis. Revision was less frequent for CeCoP (category E) than for other categories. Parametric fits to the observed data are shown in Appendix 17, AIC values for models in Appendix 19 and diagnostic plots in Appendix 20. Visually and by AIC scores the bathtub and log-normal models generated best fits except for the CeCoP (category E) prosthesis for which the bathtub model did not resolve. In view of the advanced age of these patients, after accumulating 9 years of follow-up data, it was considered that an increasing hazard (bathtub) for revision was unlikely and therefore the log-normal model was used for the economic base case. The extrapolations shown in Figure 54 apply for patients aged 70 years.
The model-predicted requirements for revision at 10, 20 and 30 years are summarised in Table 70.
THR category | Revision at 10 years (%) | Revision at 20 years (%) | Revision at 30 years (%) |
---|---|---|---|
CeMoP (category A) | 2.4 | 3.5 | 4.4 |
CeLMoP (category B) | 3.6 | 4.9 | 5.9 |
CeLCoC (category C) | 3.9 | 5.5 | 6.7 |
HyMoP (category D) | 2.5 | 3.7 | 4.6 |
CeCoP (category E) | 1.9 | 2.9 | 3.6 |
Comparison of total hip replacement revision rates according to sex and age: women aged more than 65 years
Figure 55 shows the observed time to revision for female patients aged > 65 years according to category of THR prosthesis. Revision was less frequent for CeCoP (category E) than for other categories. Parametric fits to the observed data are shown in Appendix 17, AIC values for models in Appendix 19 and diagnostic plots in Appendix 20. Visually and by AIC scores the bathtub and log-normal models generated best fits except for the CeCoP (category E) prosthesis for which the bathtub model did not resolve. In view of the advanced age of these patients, after accumulating 9 years of follow-up data, it was considered that an increasing hazard (bathtub) for revision was unlikely and therefore the log-normal model was used for the economic base case. The extrapolations shown in Figure 55 apply for patients aged 70 years. The predicted requirement for revision at 10, 20 and 30 years is summarised in Table 71.
THR category | Revision at 10 years (%) | Revision at 20 years (%) | Revision at 30 years (%) |
---|---|---|---|
CeMoP (category A) | 2.0 | 3.1 | 3.9 |
CeLMoP (category B) | 2.8 | 3.8 | 4.5 |
CeLCoC (category C) | 2.7 | 3.7 | 4.4 |
HyMoP (category D) | 1.9 | 2.7 | 3.3 |
CeCoP (category E) | 1.4 | 2.3 | 3.0 |
Comparison of total hip replacement revision rates according to sex and age: men aged less than 65 years
Figure 56 shows the observed time to revision for male patients aged < 65 years according to category of THR prosthesis. Parametric fits to the observed data are shown in Appendix 17 and AIC values for models are summarised in Appendix 19. Cumulative hazard plots are shown in Appendix 20. Observed revision was less frequent for CeCoP (category E) than for other categories. According to AIC values (and visually), the bathtub model provided a superior fit for categories B, C and D followed by the log-normal model. For categories A and E there were only trivial differences in AIC values between the bathtub and the log-normal models. On extrapolation of the bathtub models the CeMoP category becomes superior to CeCoP after about 25 years’ follow-up. Transition probabilities for the economic analysis were based on bathtub models (base case for the subgroup) and log-normal models were used in sensitivity analysis. The extrapolations of the bathtub models shown in Figure 56 apply to patients aged 50 years.
The bathtub-predicted requirement for revision at 10, 20 and 30 years is summarised in Table 72.
THR category | Revision at 10 years (%) | Revision at 20 years (%) | Revision at 30 years (%) |
---|---|---|---|
CeMoP (category A) | 4.2 | 10.3 | 18.9 |
CeLMoP (category B) | 6.9 | 20.7 | 39.0 |
CeLCoC (category C) | 5.4 | 14.3 | 27.0 |
HyMoP (category D) | 5.3 | 13.8 | 26.0 |
CeCoP (category E) | 2.9 | 8.5 | 19.7 |
Comparison of total hip replacement revision rates according to sex and age: women aged less than 65 years
Figure 57 shows the observed time to revision for female patients aged < 65 years according to category of THR prosthesis. Observed revision was less frequent for CeCoP (category E) than for other categories. Parametric fits to the observed data are shown in Appendix 17 and AIC values for models are summarised in Appendix 19. Cumulative hazard plots are shown in Appendix 20. According to AIC values and visual inspection the bathtub model provided a superior fit to observed data for categories A, C, D and E, but failed to resolve for category B (CeLMoP). Of the tested models for category B, each except for the exponential model generated a decreasing hazard beyond the observed data. For the economic model the bathtub model was selected for all categories except B for which the exponential model was used (this will tend to favour category B over the other categories). The predicted requirement for revision at 10, 20 and 30 years is shown in Table 73.
THR category | Revision at 10 years (%) | Revision at 20 years (%) | Revision at 30 years (%) |
---|---|---|---|
CeMoP (category A) | 4.7 | 14.3 | 28.0 |
CeLMoP (category B) | 4.8 | 9.4 | 13.8 |
CeLCoC (category C) | 5.2 | 14.2 | 27.1 |
HyMoP (category D) | 4.5 | 14.9 | 29.7 |
CeCoP (category E) | 3.1 | 10.0 | 20.3 |
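The exponential extrapolation used for category B can be checked roughly by backing a constant hazard out of the 10-year figure in Table 73 and projecting it forward. This is a minimal sketch that assumes the published 4.8% is the only input (the fitted rate itself is not given here), so agreement with the tabulated 20- and 30-year values is only to within rounding. The function names are illustrative.

```python
import math

def constant_hazard_from_cumulative(cum_prob, years):
    """Annual hazard implied by a cumulative revision probability
    under an exponential (constant-hazard) model."""
    return -math.log(1.0 - cum_prob) / years

def cumulative_revision(hazard, years):
    """Cumulative probability of revision by `years` for a constant hazard."""
    return 1.0 - math.exp(-hazard * years)

# Table 73, CeLMoP (category B), women aged < 65 years: 4.8% revised at 10 years.
hazard_b = constant_hazard_from_cumulative(0.048, 10)

for horizon in (10, 20, 30):
    # Compare with the category B row of Table 73 (4.8, 9.4 and 13.8).
    print(horizon, round(100 * cumulative_revision(hazard_b, horizon), 1))
```

Because the hazard is constant, the cumulative curve rises almost linearly at these low rates, which is why the exponential model keeps category B well below the bathtub-based projections for the other categories at 30 years.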
Comparison of revision rates with the National Institute for Health and Care Excellence benchmark
The two previous TA guidance documents (TA44 25 and TA2 46) suggested a revision rate benchmark of 10% at 10 years for hip replacement interventions. Here we compare the performance of the technologies assessed in this report against this benchmark. It should be noted that the benchmark is derived from an assessment of technologies based on data from approximately 15–20 years ago.
Table 74 summarises our estimates of revision rates at 10 years for the currently examined technologies. It should be noted that these are based on data from the NJR in which follow-up was somewhat short of 10 years so that some extrapolation beyond the observed data was necessary.
Intervention | Population | Revision at 10 years (%) |
---|---|---|
RS | All NJR patients (n = 31,222) | 14.4 |
RS | Matched population (n = 26,643) | 17.2 |
RS | Female matched (n = 9321) | 23.1 |
RS | Male matched (n = 17,322) | 12.4 |
THR | Categories A–E matched to RS (n = 26,643) | 4.7 |
THR | All NJR patients (n = 386,566) | 5.2 |
THR | All CeMoP (category A) (n = 125,285) | 2.8 |
THR | All CeLMoP (category B) (n = 37,874) | 3.9 |
THR | All CeLCoC (category C) (n = 34,754) | 4.7 |
THR | All HyMoP (category D) (n = 28,471) | 3.0 |
THR | All CeCoP (category E) (n = 12,705) | 2.1 |
It is clear that for each of the THR categories A–E, the revision rate at 10 years is less than half the benchmark rate, with the CeCoP (category E) prosthesis performing better than the rest. Category A–E THR patients age matched to RS recipients similarly experienced revision rates that were less than half the benchmark rate, and this also nearly applied to the revision rate observed for all THR patients in the NJR.
In contrast, the revision rate for RS recipients as a whole or for RS patients after age matching with THR recipients for both sexes substantially exceeded the benchmark; the rate for women reached 23.1% and the rate for men reached 12.4%.
These results suggest that a new benchmark of < 10% at 10 years would now appear to be appropriate for THR technologies and that RS technologies may require considerable improvement to meet the 10% benchmark.
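The comparison against the benchmark can be laid out as a small check over the Table 74 values. This is a minimal sketch: the rates are copied from Table 74, while the three labels and the `classify` helper are illustrative and not part of the original analysis.

```python
BENCHMARK = 10.0  # NICE benchmark: % of patients revised at 10 years

# Modelled 10-year revision rates (%) copied from Table 74.
RATES_AT_10_YEARS = {
    "RS, all NJR patients": 14.4,
    "RS, matched population": 17.2,
    "RS, female matched": 23.1,
    "RS, male matched": 12.4,
    "THR, categories A-E matched to RS": 4.7,
    "THR, all NJR patients": 5.2,
    "THR, CeMoP (category A)": 2.8,
    "THR, CeLMoP (category B)": 3.9,
    "THR, CeLCoC (category C)": 4.7,
    "THR, HyMoP (category D)": 3.0,
    "THR, CeCoP (category E)": 2.1,
}

def classify(rate, benchmark=BENCHMARK):
    """Place a 10-year revision rate relative to the benchmark (illustrative labels)."""
    if rate > benchmark:
        return "exceeds benchmark"
    if rate < benchmark / 2:
        return "below half the benchmark"
    return "between half the benchmark and the benchmark"

for label, rate in RATES_AT_10_YEARS.items():
    print(f"{label}: {rate}% -> {classify(rate)}")
```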
Flexible parametric modelling
Several recent analyses of revision rates for patients in the NJR have employed the flexible parametric procedure of Parmar and Lambert. 361 As far as we are aware no economic models for hip replacement have yet employed this approach. We therefore employed flexible parametric modelling in sensitivity analysis of revision rates to determine whether conclusions based on methods described above might be at odds with results from flexible parametric modelling.
In general, flexible parametric models generated good fits to the Kaplan–Meier estimates of observed revision rates (see Appendix 23); in some instances, AIC scores were as good as or better than those for alternative models. With regard to different THR categories, revision rates gradually decreased on extrapolation, and rates were sometimes higher and sometimes lower than those predicted by the Weibull and log-normal models (see Appendix 23); as with the base-case bathtub model and the log-normal model, the CeCoP (category E) prosthesis provided the lowest modelled revision rate. With regard to the comparison between RS and THR, for both men and women, as with the base-case bathtub model, flexible modelling yielded considerably higher rates of revision than the log-normal or Weibull model (see Appendix 23).
Increasing the number of knots in the flexible parametric modelling improved goodness of fit and modified the extrapolated revision rates such that predicted revision beyond the observed data appeared to be more influenced by the tail of the observed data where the observations were subject to greater uncertainty. This did not necessarily appear to offer an advantage over alternative models. Furthermore, there was no obvious way of determining the number of knots likely to generate the most reasonable extrapolation. Therefore, in sensitivity analysis we used three knots.
Discussion of methods of modelling revision rates
In the NJR twice as many men as women received RS, whereas 1.7 times as many women as men received a category A–E THR, with the mean age for RS recipients nearly 15 years lower than that for THR recipients. THR recipients outnumbered RS recipients by about 10 : 1. When observed revision rates over about 9 years of follow-up were compared between the total THR population and the total RS population they were found to be about three times higher for RS. The difference was greater for women than for men (nearly fourfold and about twofold, respectively). When the comparisons were made between RS and the most frequently used categories of THR, these differences were greater.
All THR categories for both men and women had far lower revision rates than that of RS. Because of the age and sex imbalances between the RS population and the THR population we used propensity matching by age and sex to generate a THR population that would allow an equitable comparison between the RS and the THR interventions. This did not disadvantage RS relative to THR because the younger THR matched population exhibited higher rates of revision than did the whole THR population. The revision rate for RS controlled for age was substantially greater than that for THR. This held for both men and women and, when carried through to the economic analysis, this translated to the association of higher costs with RS than with THR.
The number of unique THR prostheses used for NJR patients was large, even without taking into account the variety of manufacturer brands available for the different prosthesis components. It was necessary to reduce these to a smaller number for economic analysis. Selection was based on the frequency of use of different categories of prosthesis and on expert clinical opinion. The selection of the five THR categories was conducted pre hoc and before all analyses of revision rates. Just over 239,000 patients in the NJR received one of the five selected categories of THR prostheses. The observed revision rates were lowest for CeCoP (category E) and highest for CeLCoC (category C) and CeLMoP (category B) THR. This reflects practice over the last 9–10 years.
Age and sex distributions varied between categories; however, when populations were controlled for differences in age and sex, or were stratified by sex and controlled for age, the lower revision rate for CeCoP (category E) THR relative to the other categories was not diminished. Also, when well-fitting models were used that predicted either increasing hazard or decreasing hazard on extrapolation, the superiority of the CeCoP (category E) revision rate was again upheld. There was insufficient information consistently recorded within the NJR for investigation of other potential confounders. Several potentially influential factors might determine the observed differences in revision rates; these include different prosthesis designs, different patients, different surgical performance and different orthopaedic centres. NJR data were complete for patient age and sex on receipt of THR.
For economic modelling we used the revision estimates from Kaplan–Meier analysis. This conforms with the practice of previous hip replacement cost-effectiveness models found in the literature. McMinn et al. 318 aptly define the inference of such analyses as follows: ‘inferences about, and comparisons of, revision rates at any time relate to patients who are not already dead at that time’. This was considered appropriate for the structure of the economic model.
To model revision rates we followed NICE DSU guidance in first exploring exponential, Weibull, Gompertz, log-normal and log-logistic models of observed revision rates based on IPD; these commonly used parametric fits are readily available within statistical packages (such as Stata) and an initial consideration of goodness of fit can be obtained, for example from the AIC and BIC. 365 However, most economic analyses of hip replacement, notably those of Briggs et al.,38 Higashi and Barendregt 273 and Pennington et al.,44 modelled revision rates on the assumption of a U-shaped hazard. In these analyses an assumed high hazard for failure associated with surgery is followed by a decreasing hazard that eventually plateaus during an initial recovery period and is then followed by a gradually increasing hazard as host bone deteriorates with patient age and the prosthesis accumulates wear and tear. The resulting hazard curve forms a ‘U’ shape commonly termed a bathtub. We therefore also explored bathtub models.
The NJR observation period for both RS and THR patients extended to about 9 years. NICE requires a lifetime economic model to capture all benefits (and harms) of interventions; therefore, extrapolation of revision rates beyond the observed data was required. In most of the comparisons undertaken for this report the extrapolation of most models predicted a decreasing rate of revision (i.e. decreasing hazard); however, the bathtub models all described an increasing revision rate beyond the observed period. An increasing hazard of revision appears reasonable for patients who are relatively young at the time of primary hip replacement and who might be expected to live with their prosthesis for ≥ 30 years. For older age groups it may be argued that a model predicting an increasing hazard for revision is unsuitable as, relative to younger generally more active patients, the prosthesis is subject to less wear and tear for a shorter time. The observed rate of revision during the observation period for NJR patients aged > 85 years was very low and minor relative to attrition because of death (see Figure 26). It is clear that for patients of advanced age there is a relative lack of clinical imperative to undertake revision and an extrapolation with an increasing hazard becomes less appropriate.
Published economic models of hip replacement have adopted various solutions for modelling THR revision rates. In common with several of these we modelled revision rates in the base case using a U-shaped (bathtub) hazard assumption. 38,44,273 This was supported by the goodness of fit to the observed data according to visual inspection, information criteria scores and plots of the log-Kaplan–Meier-estimated cumulative hazard compared with the log-modelled cumulative hazard. 359 Published analyses with long-term follow-up of patients also support increasing revision rates beyond 10 years from the primary intervention. Previous studies obtained an overall bathtub hazard by combining a Weibull fit for early failures with a Weibull fit for late failures. 38,44,273 We derived the bathtub hazard directly using the Stata package developed by Crowther and Lambert. 360 This had the advantages of parsimony and of not requiring arbitrary decisions about early and late failures. Higashi and Barendregt273 used long-term follow-up studies for the second Weibull fit to obtain an increasing hazard in the long term; however, this suffers the disadvantage that very different populations were used for the early and late fits. Pennington et al. 44 employed a piece-wise procedure to generate the U-shaped hazard; however, after extrapolation this predicted that > 100% of patients sustained revision and at this point the rate required capping.
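The two-Weibull construction attributed above to earlier studies can be sketched as follows. This is a minimal illustration of one way to obtain a U-shaped hazard, by adding a decreasing Weibull hazard (shape below 1) to an increasing one (shape above 1). It is not the Crowther and Lambert Stata routine used in this report, it does not reproduce the exact construction of the cited studies, and all parameter values are hypothetical.

```python
import math

def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t, early, late):
    """Sum of an early-failure and a late-failure Weibull hazard."""
    return weibull_hazard(t, *early) + weibull_hazard(t, *late)

def cumulative_revision(t, early, late):
    """Cumulative revision probability 1 - exp(-H(t)), where H(t) is the sum
    of the two Weibull cumulative hazards (t / scale) ** shape."""
    cum_hazard = (t / early[1]) ** early[0] + (t / late[1]) ** late[0]
    return 1.0 - math.exp(-cum_hazard)

# Hypothetical (shape, scale) pairs, chosen only to produce a U shape.
EARLY = (0.5, 4000.0)  # shape < 1: hazard falls after surgery
LATE = (3.0, 60.0)     # shape > 1: hazard rises with wear and ageing bone

for years in (0.5, 5, 10, 20, 30):
    print(years,
          round(bathtub_hazard(years, EARLY, LATE), 4),
          round(100 * cumulative_revision(years, EARLY, LATE), 1))
```

With an additive form like this the early and late components are estimated jointly in principle, whereas separate early and late fits require a decision about where one period ends and the other begins.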
For revision rates the unit of analysis was the time to a patient’s first revision. For patients who received THR for both hips simultaneously only the replacement that failed first was included as an event, and for those who received THR for both hips on separate occasions only the first primary intervention entered the analysis.
For RS a wide range of different femoral head sizes are used and revision rates have been reported to vary according to head size. 15 Only a narrow range of different head sizes are used for THR prostheses and expert clinical advice indicated that these are unrelated to RS head sizes so that comparisons between RS and THR according to head size were not undertaken.
Summary
The Kaplan–Meier-estimated rates of revision during approximately 9 years of follow-up of NJR patients indicated that the probability of revision differed between interventions. RS had a considerably higher frequency of revision than THR; this held across both sexes. The five categories of THR selected also differed in the observed revision rate, with CeCoP (category E) tending to have a lower rate of revision than other categories; again, this held generally across age groups and sex.
For all interventions several parametric models generated good fits to the observed data. The differences between models with a good fit over the observation period were minor relative to differences generated on extrapolation. Extrapolations generated from well-fitting models could be broadly divided into those predicting a gradual increase in rate of revision with time (usually, but not always, these were bathtub models) and those predicting a gradual decrease in rate of revision with time. Data summarised in Appendix 24 from several sources (the Swedish registry,95 the RCT of Kim et al. 129 and long-term follow-up observational studies368–372) tended to support the proposition of an increasing hazard, at least for the first decade or so beyond the 9 years of NJR data.
On the other hand it is clear that NJR patients who receive a THR in old age (e.g. > 85 years) have a low probability of surgery for THR revision. In general, it appears likely that revisions beyond the observed data first occur at an increasing rate and later at a decreasing rate. The parametric fits did not capture this putative pattern well and it is difficult to ascertain when rates might change from increasing to decreasing for different age groups. However, the lower rate of revision seen for CeCoP (category E) THR relative to other categories was maintained across models that differed in the direction of the hazard after extrapolation beyond the observed data.
The differences between models in the extrapolation of revision rates require about a decade beyond the observation period before becoming substantial. By that time discounting and higher mortality rates will tend to attenuate the influence of differing extrapolations on the results from an economic model. Therefore, it may be anticipated that, over a lifetime, different modelling approaches to extrapolation (increasing hazard for each intervention or alternatively decreasing hazard for each) might not have a large influence on the economic outcomes for the interventions relative to their observed differences.
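The attenuating effect of discounting can be illustrated with the 3.5% annual rate applied in the economic model. The sketch below only computes the discount weight attached to a cost or QALY falling in a given year; it ignores mortality, which would reduce the weight of late events further. The function name is illustrative.

```python
def discount_factor(year, rate=0.035):
    """Present-value weight of a cost or QALY accruing `year` years from now."""
    return 1.0 / (1.0 + rate) ** year

for year in (10, 20, 30):
    print(year, round(discount_factor(year), 3))
```

At 3.5% the weight is roughly halved by year 20, so differences between extrapolations that only emerge a decade or more after the observation period carry correspondingly less weight in lifetime results.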
Our assessment of THR and RS against the revision rate benchmark from TA2 46 and TA44 25 of 10% at 10 years suggests that a new benchmark of < 10% at 10 years would now appear to be appropriate for THR technologies, but that RS technologies may still require considerable improvement to meet the 10% benchmark.
Chapter 8 Warwick economic assessment
This chapter describes the structure of the economic model, the main assumptions of the model, the scenarios evaluated and the sensitivity analyses. The underlying model is based on that of Fitzpatrick et al.,373 which has been adapted for our decision problem and updated with new data.
Methods
De novo analysis
Patients
We used NJR data to investigate revision rates. Detailed information on this is given in Chapters 5 and 7.
We used propensity matching to match by age and sex NJR THR category A–E patients with RS patients. These matched populations were used to generate modelled revision rates for our economic model for the base case for decision problem (1) (see Chapter 2). Furthermore, we performed subgroup analyses in which RS and THR matched populations were stratified by sex, and models of time to revision were controlled for age. For decision problem (2) (see Chapter 2), in the base case we compared THR categories A–E, irrespective of age and sex. In sensitivity analysis we controlled for age and sex. For subgroup analysis we stratified by age (< 65 years and > 65 years) and by sex, and the modelled time to revision was controlled for age. The selection of the subgroup aged > 65 years reflected a population likely to be considered suitable only for THR and not suitable for RS (see Table 60 for population details).
Model structure
An economic model was developed based on a Markov multistate model, as shown in Figure 58.
In the model, each patient can enter one of four health states following primary surgery:
- Successful primary (RS or THR) surgery (if initial surgery is successful, patients enter this health state).
- Revision surgery arises at the second-year cycle (if initial surgery fails, patients may then require a revision). If necessary, patients can move into this state more than once. Patients stay in this health state for one cycle only.
- Successful revision surgery (if revision surgery is successful, patients enter this health state).
- Death (this is an absorbing health state and patients may enter this state because of operative mortality or because of death from other causes).
For RS compared with THR and for different categories of THR compared with each other, similar models were built (see Figure 58), with different estimates of transition probabilities, utilities and costs.
The cycle length for each model was set at 1 year and transitions between each health state occur at the end of each cycle. Before submission of the final report, a third party who was not directly involved in the assessment cross-checked the inputs to the model and fully rebuilt the model as a structural cross-check. All discrepancies were discussed with the assessment team and the appropriate final set of model inputs and model structure were agreed on for the final report.
Based on the external assessment, it was assumed that all THR events occurred at the start of the annual cycle, with mortality from other causes (non-THR events) occurring at the end of each cycle. We also noticed that the estimates for the first-year revision rates were high over the first several months after implantation of a prosthesis but that for category E this was less pronounced than for other categories. Therefore, the transition from successful primary health state to revision THR was assumed to occur at any time and was not specified as occurring at the start of the second annual cycle.
For both review questions we adopted a 10-year and a lifetime horizon. The 10-year time horizon reflects observed IPD from the NJR, and the lifetime horizon follows the recommendation from NICE that the time horizon should be sufficiently extended to capture all benefits likely to accrue from an intervention. 374 The analysis was conducted from the perspective of the NHS and personal social services (PSS). All costs are in pounds sterling in 2011/12 prices. Health outcomes were measured in QALYs. Results are expressed as incremental cost per QALY gained. An annual discount rate of 3.5% was applied to both costs and outcomes. 374
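The structure described above (four health states, 1-year cycles, transitions at the end of each cycle and 3.5% discounting of costs and QALYs) can be sketched as a simple cohort simulation. This is a minimal illustration under stated assumptions, not the model used in this report: every input value is hypothetical, mortality is held constant rather than age specific, and the timing of events within a cycle is simplified.

```python
def run_cohort(cycles, p_revision, p_rerevision, p_death, p_operative_death,
               state_costs, state_utilities, initial_cost, discount_rate=0.035):
    """Four-state Markov cohort sketch with 1-year cycles.

    States, in order: successful primary, revision surgery,
    successful revision, dead. `p_revision` is a function of the cycle
    number so that a time-varying (e.g. bathtub) risk can be supplied.
    """
    state = [1.0, 0.0, 0.0, 0.0]
    total_cost = initial_cost  # primary surgery, incurred at time zero
    total_qalys = 0.0
    for cycle in range(cycles):
        # Accrue discounted costs and QALYs for the states occupied this cycle.
        weight = 1.0 / (1.0 + discount_rate) ** cycle
        total_cost += weight * sum(s * c for s, c in zip(state, state_costs))
        total_qalys += weight * sum(s * u for s, u in zip(state, state_utilities))

        # Transitions applied at the end of the cycle.
        primary, revision, post_revision, dead = state
        p_rev = p_revision(cycle)
        alive_primary = primary * (1.0 - p_death)
        alive_post = post_revision * (1.0 - p_death)
        state = [
            alive_primary * (1.0 - p_rev),
            alive_primary * p_rev + alive_post * p_rerevision,
            # Revision is occupied for one cycle only.
            revision * (1.0 - p_operative_death) + alive_post * (1.0 - p_rerevision),
            dead + (primary + post_revision) * p_death + revision * p_operative_death,
        ]
    return total_cost, total_qalys, state

# Hypothetical inputs for illustration only; none are taken from this report.
cost, qalys, final_state = run_cohort(
    cycles=30,
    p_revision=lambda cycle: 0.005,
    p_rerevision=0.02,
    p_death=0.03,
    p_operative_death=0.005,
    state_costs=[0.0, 15000.0, 0.0, 0.0],
    state_utilities=[0.80, 0.60, 0.75, 0.0],
    initial_cost=7000.0,
)
print(round(cost), round(qalys, 2), [round(s, 3) for s in final_state])
```

Running the same function with a second set of inputs and dividing the difference in costs by the difference in QALYs would give an incremental cost per QALY gained, which is how results are expressed in this chapter.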
The key features of the analysis are listed in Table 75.
Element of HTA | Reference case | Section in Guide to the Methods of Technology Appraisal374 |
---|---|---|
Defining the decision problem | Clinical effectiveness and cost-effectiveness analysis of different types of THR and RS for the treatment of pain and disability in people with end-stage arthritis of the hip (scope developed by NICE375) | 5.2.5 and 5.2.6 |
Comparator(s) | Different types of primary THR compared with surface replacement for people in whom both procedures are suitable; different types of primary THR compared with each other for people who are not suitable for hip RS | 5.2.5 and 5.2.6 |
Perspective costs | NHS and PSS | 5.2.7–5.2.10 |
Perspective benefits | All health effects on individuals | 5.2.7–5.2.10 |
Type of economic evaluation | Cost-effectiveness analysis | 5.2.11 and 5.2.12 |
Synthesis of evidence on outcomes | Based on NJR database | 5.3 |
Measure of health effects | QALYs | 5.4 |
Source of data for measurement of health-related quality of life | Based on PROMs database (reported directly by patients and carers) | 5.4 |
Source of preference data for valuation of changes in health-related quality of life | Representative sample of the public | 5.4 |
Discount rate | An annual rate of 3.5% on both costs and health effects | 5.6 |
Equity weighting | An additional QALY has the same weight regardless of the other characteristics of the individuals receiving the health benefit | 5.12 |
Base-case analysis
For the base-case analysis we estimated the cost-effectiveness of THR compared with RS for patients who were eligible for both procedures using revision rates modelled using a bathtub model. Utilities for successful implant health states were varied with patient age throughout the model. Costs were based on NHS Supply Chain costs (Dr Philip Lewis, NHS Supply Chain, 2013, personal communication).
Similarly, we estimated the cost-effectiveness of the different categories of THR prostheses using revision rates based on the bathtub model. Utilities for successful implant health states were varied with patient age throughout the model. Again, costs were based on NHS Supply Chain costs (Dr Philip Lewis, NHS Supply Chain, 2013, personal communication).
Structural model assumptions
Transition probabilities
Time to revision was described according to well-fitting parametric models (the base case for the comparison of THR with RS and for the comparison between different THR categories was based on the bathtub model; in sensitivity analysis of THR compared with THR a log-normal parametric model was used, adjusted for age and sex). The risk of rerevision was based on rerevision rates obtained from the manufacturer’s submissions to NICE (sourced from the New Zealand joint registry376 by the manufacturer).
Utilities
Utilities for both models for the base-case analysis were obtained from the PROMs database (see Chapter 6). The mean EQ-5D-3L scores for the successful primary health state and successful revision health state were reduced by the mean EQ-5D-3L scores for the respective age band and sex at the end of each 10-year cycle to represent the impact of ageing on general health-related quality of life. The age-related utilities were assumed to be the same for the comparison of RS with THR and for the comparison of different types of THR. We assumed that at 6 months patients would have fully recovered from the surgery and this assumption was supported by the EQ-5D-3L responses obtained from patients at baseline, 3 months, 6 months and 12 months from Edlin et al. 40
Costs
For the comparison of THR with RS and the comparison between different types of THR, prices of the primary prostheses were based on the list prices obtained from the NHS Supply Chain. We assumed that, for the comparison between THR and RS, if initial RS surgery failed the patient would then be revised with a THR prosthesis and not an RS prosthesis. The prices of the revision prosthesis and the rerevision prosthesis were obtained from Vanhegan et al. 292 based on a weighted average of the mean costs of all revision procedures. For the comparison between different types of THR, we assumed that, if initial THR surgery failed, the same type of prosthesis was used for each category. Hence, we included the mean implant cost from Vanhegan et al. 292 based on a weighted average of the mean costs of all revision procedures.
For both sets of comparisons we included follow-up costs in the first year after surgery and the surgical cost of adverse event(s) resulting in revision surgery but because of a lack of reliable data we were not able to include the cost of other treatments for adverse events in the months following revision surgery. We have also not included end-of-life costs19,373 (Table 76).
Parameter | Assumptions |
---|---|
Transition probabilities | Time to revision was assumed to be described according to well-fitting parametric models. The risk of rerevision was based on the rerevision rate obtained from the manufacturer’s submissions to NICE |
Utilities | Utilities for the base-case analysis were obtained from the PROMs database. The utilities were assumed to be the same for the comparison of RS with THR and the comparison between different types of THR |
Costs | For the comparison between THR and RS and the comparison between different types of THR, the prices of the primary prostheses were based on the list prices obtained from the NHS Supply Chain. The prices of the revision prosthesis and the rerevision prosthesis were obtained from Vanhegan et al.292 based on a weighted average of the mean costs of all revision procedures |
Estimation of model parameters
Resource use and cost inputs
Resource use and associated costs were required for the following health states:
1. successful primary procedure
2. revision procedure
3. successful revision procedure.
Health states 1 and 2 have two phases: a short-term phase with costs associated with surgery and the immediate aftermath of surgery, followed by a more prolonged phase including the costs of maintenance.
Rationale for the choice of parameter values
The process of identifying the relevant literature can be found in Chapter 6. Of the 11 core studies, three cost-effectiveness studies provided data for the economic model. These were the studies by Edlin et al.,40 Vale et al. 19 and Vanhegan et al. 292
Edlin et al. 40 reported a cost–utility analysis of RS compared with THR alongside a RCT using NHS and PSS perspectives and costs were reported as UK pounds in 2009/10 prices. The study used HRG4 reference costs combined with NHS trust finance department list prices for implants and IPD on LOS. Resource use data and personal costs were obtained from patient-reported data. The study reported costs after 12 months by type of hip replacement (THR vs. RS) including the costs of initial operation/care, subsequent inpatient, outpatient, primary and community care, aids and medications, as well as private and social costs.
Vale et al. 19 assessed the clinical effectiveness and cost-effectiveness of RS compared with watchful waiting (i.e. patient monitoring, drug-based treatment and supportive activities including physiotherapy), THR and other bone-conserving treatments. 19 Cost data were reported in UK pounds in 2000/1 prices; costs for THR and revision THR were taken from the literature and prostheses costs for RS were obtained from manufacturers. Cost components for surgical interventions including use of the operating theatre, staff, radiography, outpatient visits and first-year follow-up costs were reported.
Vanhegan et al. 292 investigated the costs of revision THR. Costs were reported in UK pounds in 2007/8 prices and were obtained from the finance department of the tertiary centre and included costs of the implant, materials and augmentation, use of the operating theatre and recovery room, the inpatient stay and laboratory tests, radiology, pharmacy, physiotherapy and occupational therapy. The study provided cost data on 13 different implants and data on resource use and costs by reason for revision (aseptic loosening, deep infection, periprosthetic fracture and dislocation).
All three core studies provided important and relevant costs for THR and RS patients for use in the economic model, with prices updated to 2011/12 prices by applying the projected Health Service Cost Index (HSCI). 377 It is also important to mention that none of the studies identified in the literature included costs per component of prosthesis as grouped in our analysis.
Base-case cost inputs: resurfacing arthroplasty compared with total hip replacement
The cost of the primary THR or RS includes the cost of the prosthesis, the initial operation and the inpatient hospital stay. The cost of the RS prosthesis was obtained from the NHS Supply Chain (Dr Philip Lewis, NHS Supply Chain, 2013, personal communication). The information provided gave the full list price for three suppliers using their most common brands of implant. These data were anonymised by averaging the cost for each component (Table 77). In practice these prices are often discounted (using a discount de-escalator based on the volume of the purchase).
Component | Average unit cost (£) | Supplier 1 list price (£) | Supplier 2 list price (£) | Supplier 3 list price (£) |
---|---|---|---|---|
Acetabular cup, HA coated | 1583 | 1690 | 1535 | 1523 |
Resurfacing head, cemented | 1031 | 1140 | 865 | 1089 |
Mixing bowla | 31 | NA | NA | NA |
Cement (one pack)a | 27 | NA | NA | NA |
Total cost | 2672 |
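The averaging behind Table 77 can be reproduced from the supplier list prices. This is a minimal sketch that assumes each average unit cost is the simple mean of the three list prices, rounded to the nearest pound, which matches the tabulated values; the variable names are illustrative.

```python
# Supplier list prices (£) copied from Table 77.
supplier_prices = {
    "Acetabular cup, HA coated": [1690, 1535, 1523],
    "Resurfacing head, cemented": [1140, 865, 1089],
}
# Items with a single unit cost and no supplier breakdown (NA in Table 77).
single_cost_items = {"Mixing bowl": 31, "Cement (one pack)": 27}

# Assumption: simple unweighted mean across suppliers, rounded to whole pounds.
average_unit_cost = {
    name: round(sum(prices) / len(prices))
    for name, prices in supplier_prices.items()
}

total_rs_prosthesis_cost = (
    sum(average_unit_cost.values()) + sum(single_cost_items.values())
)
print(average_unit_cost, total_rs_prosthesis_cost)
```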
The costs of the THR prostheses were also obtained from the NHS Supply Chain. We obtained the full list price for the five most commonly used suppliers (details of suppliers were anonymised) using their most common brands of implant. We calculated a weighted mean THR cost based on the frequency of use of the different categories of THR (categories A–E) in the RS vs. THR comparison (Table 78).
Category | Number of male patients | Number of female patients | Total number of patients | Mean cost (£) | Weighted cost (£) |
---|---|---|---|---|---|
A | 6080 | 3812 | 9892 | 1557 | 589 |
B | 2177 | 741 | 2918 | 3016 | 336 |
C | 5803 | 2414 | 8217 | 3869 | 1215 |
D | 1104 | 477 | 1581 | 2650 | 160 |
E | 2100 | 1459 | 3559 | 1996 | 271 |
Weighted cost of THR prosthesis | 2571 |
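The weighting in Table 78 can be reproduced with a few lines of arithmetic. The sketch below assumes that each category's weight is simply its share of the total number of patients; the variable and function names are illustrative and not taken from the report.

```python
# Patient counts and mean list prices per THR category, copied from Table 78.
patients = {"A": 9892, "B": 2918, "C": 8217, "D": 1581, "E": 3559}
mean_cost = {"A": 1557, "B": 3016, "C": 3869, "D": 2650, "E": 1996}


def weighted_contributions(patients, mean_cost):
    """Return each category's contribution to the frequency-weighted mean cost."""
    total_patients = sum(patients.values())
    return {
        category: patients[category] / total_patients * mean_cost[category]
        for category in patients
    }


contributions = weighted_contributions(patients, mean_cost)
weighted_mean = sum(contributions.values())

print({category: round(value) for category, value in contributions.items()})
print(round(weighted_mean))  # 2571, the weighted cost of a THR prosthesis
```

Under this assumption the rounded contributions match the "Weighted cost" column and sum to the £2571 used in the model.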
The cost of the surgery itself was assumed to be the same for both THR and RS. The costs of theatre overheads, theatre staff and number of radiographs, etc. were taken from Vale et al. 19 and updated to current prices. 377 The total cost of surgery was estimated at £2805 (Table 79).
Resource use | Primary THR (units) | Total cost (£), 1996 prices | Total cost (£), 2011/12 prices |
---|---|---|---|
Theatre overheads | 134 minutes | 655 | 1799 |
Theatre staff | – | 232 | 637 |
Number of radiographs | 6 | 134 | 368 |
Total cost per patient | | | 2805 |
The average LOS was based on point estimates as reported in Edlin et al. 40 The total cost of the inpatient stay for RS was estimated to be £1628. This was based on an average cost per day of a hospital stay of £296, multiplied by the average LOS of 5.5 days. 40 The average LOS for THR was 5.7 days and the total cost of the inpatient stay for THR was estimated to be £1687. RS was associated with a slightly shorter LOS (5.5 vs. 5.7 days); although this difference was not statistically significant, we assigned this slightly shorter LOS so as not to overestimate the cost of RS.
Cost of a revision procedure (total hip replacement or resurfacing arthroplasty)
The costs of revision were assumed to be the same for both THR and RS. The cost of a revision hip arthroplasty was obtained from Vanhegan et al.;292 the data were based on 305 successive revisions following THR in 286 patients between 1999 and January 2008. In this study, patient-specific resource use data were reported for the implant, materials, use of the theatre, use of the recovery room, inpatient stay, physiotherapy, occupational therapy and pharmacy, radiology and laboratory, with costs based on NHS 2007/8 rates for payment by results (PbR).
Costs were inflated to 2011/12 prices by applying the projected HSCI. 378 Importantly, the study also reported mean costs for revision surgery for aseptic cases, deep infection, periprosthetic fracture and dislocation. Hence, the cost of revision was calculated based on a weighted average of the mean costs of all revision procedures (Table 80).
Indication | Number of patients | Mean cost (£), 2007/8 prices | Mean cost (£), 2011/12 prices |
---|---|---|---|
Aseptic loosening | 194 | 11,897 | 13,226 |
Deep infection | 76 | 21,937 | 24,387 |
Periprosthetic fracture | 24 | 18,185 | 20,216 |
Dislocation | 11 | 10,893 | 12,109 |
Weighted average | 16,517 |
We used this cost because the frequency of aseptic loosening found in this study is comparable to that reported in the NJR in 2006.
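The weighted average in Table 80 can be checked directly from the tabulated counts and 2011/12 mean costs. This is a minimal sketch that assumes the weights are the numbers of revisions per indication; the names are illustrative.

```python
# (indication, number of revisions, mean cost in 2011/12 prices), from Table 80.
revisions = [
    ("Aseptic loosening", 194, 13226),
    ("Deep infection", 76, 24387),
    ("Periprosthetic fracture", 24, 20216),
    ("Dislocation", 11, 12109),
]

total_revisions = sum(count for _, count, _ in revisions)
weighted_average = sum(count * cost for _, count, cost in revisions) / total_revisions

print(total_revisions)          # 305
print(round(weighted_average))  # 16517
```

The counts sum to the 305 revisions reported by Vanhegan et al., and the result rounds to the £16,517 used as the cost of revision.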
Cost of a successful revision procedure (total hip replacement or resurfacing arthroplasty)
The cost of follow-up post primary THR or RS was obtained from the study by Edlin et al.,40 which was based on resource use, using patient-reported data at 3, 6 and 12 months. Cost data on outpatient care, primary and community care, aids and adaptations provided by the NHS/social services, medication (pain relief and other NHS medication) and personal costs (out-of-pocket expenditure such as medicine usage and time off work for either the patient or a carer) were reported for both the THR arm and the RS arm. The NHS and social care costs of follow-up in 2011/12 prices were £394 for the THR arm and £501 for the RS arm at 12 months (Table 81).
Costs | Total cost RS (£), 2009/10 prices | Total cost THR (£), 2009/10 prices | Total cost RS (£), 2011/12 prices | Total cost THR (£), 2011/12 prices |
---|---|---|---|---|
Outpatient care | 360 | 276 | 383 | 294 |
Primary/community care | 63 | 49 | 67 | 52 |
Aids and adaptations | 21 | 21 | 22 | 22 |
Medications | 27 | 24 | 29 | 26 |
Total cost | | | 501 | 394 |
We used cost data from this study because they were based on a RCT and the mean age of RS patients (56.3 years) in this study was comparable to that reported in the NJR database (55 years).
Base-case cost inputs: comparison of different types of hip replacement
Resource use and cost assumptions were mostly the same as for the comparison between THR and RS. The cost of primary THR included the operation cost, prosthesis cost, hospital ward cost and follow-up cost. The cost of the operation was assumed to be the same for all types of prostheses.
The total cost of the inpatient stay was estimated to be £1687, based on the average cost per day of a hospital stay, multiplied by the average LOS (5.7 days), as reported in Edlin et al. 40 The total cost of surgery including radiography, theatre time, staff and overheads was estimated at £2805. 377 Outpatient costs and other follow-up costs were estimated to be £394 based on Edlin et al. 40 (see Table 81).
Prosthesis cost
We were not able to use published costs for the costs of the prostheses because prostheses were grouped as cemented, cementless or hybrid rather than being based on the separately identifiable prosthesis components as categorised in our analysis (categories A–E). Our base-case cost for each category of prosthesis was obtained from the NHS Supply Chain (Dr Philip Lewis, NHS Supply Chain, 2013, personal communication). Anonymised information was available detailing the list price per component for all five categories. The cost data from the five most commonly used suppliers using their most common brands of implant were available and an average cost was calculated. Again, this is subject to a volume de-escalator in price for the NHS (Table 82).
Component | Average unit cost (£) | Supplier 1 (£) | Supplier 2 (£) | Supplier 3 (£) | Supplier 4 (£) | Supplier 5 (£) |
---|---|---|---|---|---|---|
Category A – CeMoP | ||||||
Cemented stem | 701.60 | 625 | 523 | 706 | 798 | 856 |
Metal head | 297.20 | 204 | 231 | 272 | 375 | 404 |
Polyethylene cup – cemented | 249.60 | 164 | 227 | 311 | 332 | 214 |
Cemented stem centraliser | 47.50 | NA | 19 | 76 | NA | NA |
Bone cement plug | 58.38 | 44.5 | 49 | NA | 81 | 59 |
Cemented stem and cup extras | 203.10 | |||||
Total | 1557.38 | |||||
Category B – CeLMoP | ||||||
Cementless HAC stem | 1342.20 | 1370 | 1129 | 1110 | 1816 | 1286 |
Metal stem | 292.20 | 204 | 231 | 226 | 396 | 404 |
Metal cup – cementless HA | 883.40 | 910 | 759 | 892 | 941 | 915 |
Liner – polyethylene | 412.20 | 190 | 447 | 435 | 547 | 442 |
Fixation screw | 85.60 | 82 | 96 | 73 | 74 | 103 |
Total | 3015.60 | |||||
Category C – CeLCoC | ||||||
Cementless HAC stem | 1342.20 | 1370 | 1129 | 1110 | 1816 | 1286 |
Ceramic head | 735.80 | 620 | 764 | 738 | 857 | 700 |
Metal cup – cementless HA | 883.40 | 910 | 759 | 892 | 941 | 915 |
Liner ceramic | 821.80 | 815 | 759 | 789 | 1,046 | 700 |
Fixation screw | 85.60 | 82 | 96 | 73 | 74 | 103 |
Total | 3868.80 | |||||
Category D – HyMoP | ||||||
Cemented stem | 701.60 | 625 | 523 | 706 | 798 | 856 |
Metal head | 297.20 | 204 | 231 | 272 | 375 | 404 |
Metal cup – cementless HA | 883.40 | 910 | 759 | 892 | 941 | 915 |
Liner polyethylene | 412.20 | 190 | 447 | 435 | 547 | 442 |
Cemented stem centraliser | 47.50 | NA | 19 | 76 | NA | NA |
Bone cement plug | 58.38 | 44.5 | 49 | NA | 81 | 59 |
Fixation screw | 85.60 | 82 | 96 | 73 | 74 | 103 |
Cemented stem extras | 163.90 | |||||
Total | 2649.78 | |||||
Category E – CeCoP | ||||||
Cemented stem | 701.60 | 625 | 523 | 706 | 798 | 856 |
Ceramic head | 735.80 | 620 | 764 | 738 | 857 | 700 |
Polyethylene cup – cemented | 249.60 | 164 | 227 | 311 | 332 | 214 |
Cemented stem centraliser | 47.50 | NA | 19 | 76 | NA | NA |
Bone cement plug | 58.38 | 44.5 | 49 | NA | 81 | 59 |
Cemented stem and cup extras | 203.10 | |||||
Total | 1995.98 |
The pricing of a bone cement pack including bone cement, mixing devices and pressuriser was available from one supplier only. We have itemised the cost of a bone cement pack for a cemented stem and cup and a cemented stem only (Table 83).
Pack | Component | Total cost (£) |
---|---|---|
Cemented stem and cup | Cement 40-g and 80-g pack | 203.10 |
 | Cement syringe | |
 | Femoral pressuriser | |
 | Cement mixing pot | |
 | Acetabular pressuriser | |
Cemented stem | Cement 80-g pack | 163.90 |
 | Cement syringe | |
 | Femoral pressuriser | |
 | Cement mixing pot | |
A summary of the transition probabilities, utilities and cost inputs for the cost–utility model
The transition probabilities between health states were based on parametric models of time to revision, and the choice of model was justified using model diagnostic plots, visual goodness of fit and information criteria. Prosthesis costs were obtained from the NHS Supply Chain as alternative sources of information were lacking.
Utilities were calculated from information in the PROMs database. This was justified because it represented patient-centred EQ-5D-3L data in a population appropriate to the decision problem and the NJR database.
Costs used for the elements of the interventions were justified on the basis of our literature search for relevant information. Mortality associated with surgery was adapted from the value common to all other hip replacement models.
The bathtub parameters used to calculate the transition probabilities between health states employed for the base case are summarised in Table 84.
Comparison | Prosthesis | Bathtub alpha | Bathtub beta | Bathtub gamma |
---|---|---|---|---|
RS vs. THR (matched) | ||||
Base case | RS | 0.0030976 | 0.0358272 | 3.971709 |
Base case | THR | 0.0005699 | 0.0123899 | 1.918951 |
THR vs. THR | ||||
Base case | CeMoP (category A) | 0.0003396 | 0.0083374 | 2.163733 |
Base case | CeLMoP (category B) | 0.0004045 | 0.0337383 | 6.832735 |
Base case | CeLCoC (category C) | 0.0005333 | 0.0236369 | 4.051712 |
Base case | HyMoP (category D) | 0.0003642 | 0.0158328 | 4.68618 |
Base case | CeCoP (category E) | 0.0001935 | 0.0039017 | 0.6967542 |
Table 85 provides a summary of the inputs (transition probabilities, utilities and costs) used in the base-case analysis.
Inputs | Mean value | SE | Distribution parameter alpha | Distribution parameter beta | Source |
---|---|---|---|---|---|
Transition probabilities | |||||
Surgical mortalitya | 0.0050 | 0.001 | | | NJR48 |
Risk of rerevision | 0.0326 | NA | | | DePuy submission |
Utilities (beta distribution) | |||||
Age 50–60 years | 0.7529 | 0.004 | 1296 | 488 | PROMs97 |
Age 60–70 years | 0.7789 | 0.002 | 7397 | 2427 | PROMs97 |
Age 70–80 years | 0.7637 | 0.002 | 22,244 | 6315 | PROMs97 |
Age 80+ years | 0.7210 | 0.003 | 28,054 | 8681 | PROMs97 |
Revision surgery | 0.5624 | 0.340 | 9092 | 3518 | PROMs97 |
Costs (£) (gamma distribution) | |||||
RS vs. THR | |||||
RS | |||||
Prosthesis cost | 2778 | NA | NA | NA | NHS Supply Chain |
Surgery costs (excluding prosthesis) | 1485 | NA | NA | NA | Vale et al.19 |
Hospital inpatient stay | 1628 | NA | NA | NA | Edlin et al.40 |
Successful primary RS | 501 | 44 | 130 | 4 | Edlin et al.40 |
Revision surgery | 16,517 | 456 | 1314 | 13 | Vanhegan et al.292 |
Successful revision surgery | 394 | 30 | 169 | 2 | Edlin et al. 201240 |
THR | |||||
Prosthesis cost | 2571 | NA | NA | NA | NHS Supply Chain |
Surgery costs (excluding prosthesis) | 1485 | NA | NA | NA | Vale et al.19 |
Hospital inpatient stay | 1687 | NA | NA | NA | Edlin et al.40 |
Successful primary THR | 394 | 30 | 169 | 2 | Edlin et al.40 |
Revision surgery | 16,517 | 456 | 1314 | 13 | Vanhegan et al.292 |
Successful revision surgery | 394 | 30 | 169 | 2 | Edlin et al. 201240 |
Different types of THR | |||||
Category A – CeMoP | 1557 | NA | NA | NA | NHS Supply Chain |
Category B – CeLMoP | 3017 | NA | NA | NA | NHS Supply Chain |
Category C – CeLCoC | 3869 | NA | NA | NA | NHS Supply Chain |
Category D – HyMoP | 2650 | NA | NA | NA | NHS Supply Chain |
Category E – CeCoP | 1996 | NA | NA | NA | NHS Supply Chain |
Other costs (£) | |||||
Surgery costs (excluding prosthesis) | 1485 | NA | NA | NA | Vale et al.19 |
Hospital inpatient stay | 1687 | NA | NA | NA | Edlin et al.40 |
Successful primary THR | 394 | 30 | 169 | 2 | Edlin et al.40 |
Revision surgery | 16,517 | 456 | 1314 | 13 | Vanhegan et al.292 |
Successful revision surgery | 394 | 30 | 169 | 2 | Edlin et al. 201240 |
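The gamma parameters listed for the cost inputs in Table 85 look consistent with a method-of-moments fit to the mean and SE, although the report does not state how they were derived. The sketch below shows that calculation under this assumption, using the shape/scale parameterisation; the function name and the seed are illustrative.

```python
import random


def gamma_from_mean_se(mean, se):
    """Method-of-moments gamma parameters (shape, scale) for a given mean and SE."""
    shape = (mean / se) ** 2
    scale = se ** 2 / mean
    return shape, scale


# Follow-up cost after a successful primary RS (mean 501, SE 44), from Table 85.
shape, scale = gamma_from_mean_se(501, 44)
print(round(shape), round(scale))  # 130 4

# Draw probabilistic samples; random.gammavariate takes (shape, scale).
rng = random.Random(2014)  # arbitrary seed, for reproducibility only
draws = [rng.gammavariate(shape, scale) for _ in range(1000)]
print(sum(draws) / len(draws))  # sample mean, close to 501
```

The other cost rows reproduce only approximately (for example, a mean of 16,517 with an SE of 456 gives a shape near 1312 rather than the tabulated 1314), which may reflect rounding of the published inputs.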
Cost-effectiveness analysis
The base-case analysis is based on costs and outcomes for all THR and RS patients over two time horizons: 10 years and lifetime.
For the RS compared with THR base-case analysis, the male and female patients who received RS were successfully propensity matched by age with THR patients from THR categories A–E, and transition probabilities were calculated using bathtub model fits (predicting an increasing hazard beyond the 10-year observation period). The bathtub model defines a decreasing followed by an increasing hazard with time according to the equation:
where a, b and g are constant parameters and t is time.
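The equation itself is not reproduced in this text. As an illustration only, the sketch below assumes a Hjorth-type bathtub hazard, h(t) = a·t + b/(1 + g·t), which falls and then rises with time and has three constant parameters. It also assumes that a, b and g correspond to the bathtub alpha, beta and gamma in Table 84 and that t is measured in years. None of these assumptions is confirmed by the report, and the function names are hypothetical.

```python
import math


def bathtub_hazard(t, a, b, g):
    """Assumed Hjorth-type bathtub hazard: h(t) = a*t + b / (1 + g*t)."""
    return a * t + b / (1.0 + g * t)


def cumulative_hazard(t, a, b, g):
    """Integral of the assumed hazard from 0 to t."""
    return 0.5 * a * t ** 2 + (b / g) * math.log(1.0 + g * t)


def transition_probability(t, a, b, g, cycle=1.0):
    """Probability of revision in (t, t + cycle], given no revision by time t."""
    delta = cumulative_hazard(t + cycle, a, b, g) - cumulative_hazard(t, a, b, g)
    return 1.0 - math.exp(-delta)


# Base-case parameters copied from Table 84 (assumed mapping: alpha=a, beta=b, gamma=g).
THR = {"a": 0.0005699, "b": 0.0123899, "g": 1.918951}
RS = {"a": 0.0030976, "b": 0.0358272, "g": 3.971709}

for year in (0, 1, 5, 10, 20):
    p_thr = transition_probability(year, **THR)
    p_rs = transition_probability(year, **RS)
    print(year, round(p_thr, 5), round(p_rs, 5))
```

Under this assumed form the per-cycle probability is derived from the difference in cumulative hazard, so it can be evaluated for any cycle beyond the 10-year observation period.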
For the comparison of different types of THR base-case analysis, transition probabilities were calculated using bathtub model fits for categories A–E.
We report total mean costs and total mean QALYs related to THR and RS, and incremental cost-effectiveness ratios (ICERs), expressed as the incremental cost per QALY gained. The cost-effectiveness model for all THR categories had more than two mutually exclusive comparisons; we report total mean costs and total mean QALYs. The categories were ranked in order of increasing cost. We eliminated categories for which another category was cheaper and more effective (simple dominance). If a linear combination of two other categories was less costly and more effective than a given category, that category was eliminated (extended dominance). With the remaining options, we calculated incremental costs per QALY gained.
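The ranking and elimination steps described above can be sketched as follows. This is an illustrative implementation rather than the authors' code. It uses the rounded deterministic results from Table 94 as inputs, so the ICER it produces for the 10-year horizon (about £166,000) differs slightly from the reported £166,217, presumably because the report used unrounded values.

```python
def cost_effectiveness_frontier(options):
    """options maps a name to (total cost, total QALYs).

    Returns [(name, ICER versus the previous non-dominated option)], with
    None as the ICER of the cheapest option.
    """
    # Rank in order of increasing cost.
    ordered = sorted(options.items(), key=lambda item: item[1][0])

    # Simple dominance: drop options no more effective than a cheaper one.
    frontier = []
    for name, (cost, qalys) in ordered:
        if frontier and qalys <= frontier[-1][2]:
            continue
        frontier.append((name, cost, qalys))

    def icer(lower, upper):
        return (upper[1] - lower[1]) / (upper[2] - lower[2])

    # Extended dominance: ICERs must increase along the frontier.
    i = 1
    while i < len(frontier) - 1:
        if icer(frontier[i - 1], frontier[i]) > icer(frontier[i], frontier[i + 1]):
            del frontier[i]
            i = max(i - 1, 1)
        else:
            i += 1

    results = [(frontier[0][0], None)]
    for lower, upper in zip(frontier, frontier[1:]):
        results.append((upper[0], icer(lower, upper)))
    return results


# Deterministic totals (cost in GBP, QALYs) copied from Table 94.
ten_year = {"A": (9444, 7.4189), "E": (9743, 7.4207), "D": (10588, 7.4182),
            "B": (11155, 7.4156), "C": (12112, 7.4143)}
lifetime = {"E": (14522, 14.7909), "A": (14801, 14.7887), "D": (16040, 14.7881),
            "B": (16804, 14.7861), "C": (18226, 14.7845)}

print(cost_effectiveness_frontier(ten_year))
print(cost_effectiveness_frontier(lifetime))
```

For the 10-year horizon only categories A and E remain; for the lifetime horizon category E dominates every other category.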
We present first the deterministic results, followed by the probabilistic results. To represent the uncertainty in the parameters used in the model and to illustrate sampling uncertainty, we undertook probabilistic analyses using 1000 simulations. The results from these simulations were plotted on a cost-effectiveness plane with 95% CIs. Each point is a simulation from the probabilistic analysis. The plot illustrates the uncertainty surrounding the incremental costs and QALYs for the two groups being compared. We also produced CEACs to illustrate the effect of sampling uncertainty, in which individual model parameters were sampled from the appropriate probability distribution. CEACs were reported for a WTP threshold from £0 to £50,000. The perspective taken is that of the UK NHS and PSS. Costs and benefits were discounted at 3.5% per annum in accordance with UK guidelines. 374
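A CEAC can be derived from probabilistic simulations by counting, at each WTP threshold, the proportion of simulations in which each option has the highest net monetary benefit (threshold × QALYs − cost). The sketch below shows this calculation on a handful of made-up simulations; the strategy names and values are hypothetical and are not results from the model.

```python
def ceac(simulations, thresholds):
    """Probability that each strategy has the highest net monetary benefit.

    simulations: list of dicts mapping strategy name -> (cost, QALYs).
    Returns {threshold: {strategy: proportion of simulations won}}.
    """
    curve = {}
    for wtp in thresholds:
        wins = {name: 0 for name in simulations[0]}
        for sim in simulations:
            best = max(sim, key=lambda name: wtp * sim[name][1] - sim[name][0])
            wins[best] += 1
        curve[wtp] = {name: count / len(simulations) for name, count in wins.items()}
    return curve


# Hypothetical simulations for two illustrative strategies, X and Y.
simulations = [
    {"X": (100, 1.000), "Y": (150, 1.010)},
    {"X": (110, 1.000), "Y": (140, 1.002)},
    {"X": (100, 1.000), "Y": (160, 0.990)},
    {"X": (120, 1.000), "Y": (130, 1.020)},
]

for wtp, probabilities in ceac(simulations, [0, 20000, 50000]).items():
    print(wtp, probabilities)
```

In the analysis reported here the same counting would be applied to the 1000 simulations, with thresholds spanning £0 to £50,000.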
Sensitivity analyses
Sensitivity analyses were conducted by altering base-case inputs to the model. Several types of subgroup and scenario analysis were explored, encompassing changes to the RS/THR comparison and the THR/THR comparison.
Subgroup analysis for the comparison between resurfacing arthroplasty and total hip replacement and the comparison between different types of total hip replacement
- Revision rates were much higher for women receiving RS than for men and, because the revision rate varies according to the age of the patient, subgroup analyses focused on comparing populations stratified by sex and controlled for age. Therefore, in the sensitivity analysis we separately compared the cost-effectiveness of RS and THR for men and women aged 40, 50, and 60 years at the time of the primary implant, using age-matched populations and a bathtub model stratified by sex and controlled for age.
- For THR compared with THR, the modelled time to revision was stratified by age (< 65 years and > 65 years) and sex and models were controlled for age. We undertook these subgroup analyses because the use of different categories of THR prosthesis differed by age and sex and because recipients of hip replacement interventions aged > 65 years approximate a population unlikely to be considered candidates for RS. We compared the cost-effectiveness of different types of THR for patients aged < 65 years (40, 50 and 60 years) using a bathtub model and for patients aged > 65 years (70 and 80 years) using a log-normal model (Table 86).
Sex | Prosthesis | Bathtub alpha | Bathtub beta | Bathtub gamma | Bathtub age coefficient |
---|---|---|---|---|---|
RS vs. THR (matched): bathtub parameters | |||||
Male | RS | 0.0020179 | 0.0370237 | 4.443342 | –0.0380901 |
Male | THR | 0.0006006 | 0.0135972 | 2.384484 | –0.0258836 |
Female | RS | 0.0044984 | 0.0280047 | 2.558539 | –0.0118076 |
Female | THR | 0.0005964 | 0.0099966 | 1.314233 | –0.016463 |
THR vs. THR: bathtub parameters | |||||
Male < 65 years | CeMoP (category A) | 0.0003869 | 0.008084 | 0.7177154 | –0.0207576 |
Male < 65 years | CeLMoP (category B) | 0.0010417 | 0.0245433 | 4.822729 | –0.0024683 |
Male < 65 years | CeLCoC (category C) | 0.0006243 | 0.0212657 | 3.032461 | –0.0110798 |
Male < 65 years | HyMoP (category D) | 0.0005998 | 0.0237569 | 3.576745 | –0.0172004 |
Male < 65 years | CeCoP (category E) | 0.0004695 | 0.0033726 | 1.782609 | –0.0327686 |
Female < 65 years | CeMoP (category A) | 0.0006692 | 0.0132853 | 3.675229 | –0.0293667 |
Female < 65 years | CeLMoP (category B) | Not resolved | |||
Female < 65 years | CeLCoC (category C) | 0.0006154 | 0.0215004 | 3.952961 | –0.0088734 |
Female < 65 years | HyMoP (category D) | 0.00076 | 0.0077105 | 3.21092 | 0.0048101 |
Female < 65 years | CeCoP (category E) | 0.0004703 | 0.0071811 | 3.211915 | –0.0078225 |
THR vs. THR: log-normal parameters | |||||
Sex | Prosthesis | Log-normal mu | Log-normal sigma | Log-normal age coefficient | |
Male > 65 years | CeMoP (category A) | 10.37363 | 4.075863 | 0.0020929 | |
Male > 65 years | CeLMoP (category B) | 10.52551 | 4.554688 | –0.0483328 | |
Male > 65 years | CeLCoC (category C) | 9.611438 | 4.12394 | –0.0448092 | |
Male > 65 years | HyMoP (category D) | 10.31021 | 4.093764 | 0.0126215 | |
Male > 65 years | CeCoP (category E) | 10.54446 | 3.971899 | –0.0407056 | |
Female > 65 years | CeMoP (category A) | 9.815575 | 3.636813 | 0.033098 | |
Female > 65 years | CeLMoP (category B) | 12.10535 | 5.138115 | –0.0241371 | |
Female > 65 years | CeLCoC (category C) | 11.471 | 4.744101 | –0.0287428 | |
Female > 65 years | HyMoP (category D) | 12.18021 | 4.757849 | 0.0504173 | |
Female > 65 years | CeCoP (category E) | 10.13035 | 3.562737 | 0.0631827 |
For subgroup analyses, mean EQ-5D index scores were split by sex and age band (Table 87).
Age group (years) | Mean value | SE | Beta distribution, parameter alpha | Beta distribution, parameter beta | Source |
---|---|---|---|---|---|
Men | |||||
40–50 | 0.736 | 0.0179 | 443 | 159 | PROMs97 |
50–60 | 0.767 | 0.0066 | 3133 | 952 | PROMs97 |
60–70 | 0.762 | 0.0038 | 9112 | 2393 | PROMs97 |
70–80 | 0.790 | 0.0034 | 11,488 | 3054 | PROMs97 |
80+ | | 0.0071 | 2816 | 964 | PROMs97 |
Women | |||||
40–50 | 0.720 | 0.0129 | 872 | 339 | PROMs97 |
50–60 | 0.742 | 0.0058 | 4287 | 1491 | PROMs97 |
60–70 | 0.769 | 0.0032 | 13,128 | 3944 | PROMs97 |
70–80 | 0.747 | 0.0029 | 16,732 | 5667 | PROMs97 |
80+ | 0.710 | 0.0048 | 6305 | 2575 | PROMs97 |
Revision surgery | |||||
Males | 0.575 | 0.009 | 1496 | 1106 | PROMs97 |
Females | 0.553 | 0.007 | 2201 | 1779 | PROMs97 |
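The beta distribution parameters in Table 87 appear consistent with a method-of-moments fit to each mean and SE, although the report does not describe the calculation. The sketch below shows that approach under this assumption; the function name and seed are illustrative.

```python
import random


def beta_from_mean_se(mean, se):
    """Method-of-moments beta parameters (alpha, beta) for a mean and SE."""
    common = mean * (1.0 - mean) / se ** 2 - 1.0
    return mean * common, (1.0 - mean) * common


# Women aged 40-50 years (mean 0.720, SE 0.0129), from Table 87.
alpha, beta = beta_from_mean_se(0.720, 0.0129)
print(round(alpha), round(beta))  # 872 339

# Draw probabilistic utility samples for this subgroup.
rng = random.Random(2014)  # arbitrary seed, for reproducibility only
draws = [rng.betavariate(alpha, beta) for _ in range(1000)]
print(sum(draws) / len(draws))  # sample mean, close to 0.720
```

Other rows are reproduced only approximately from the rounded means and SEs shown in the table.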
Sensitivity analysis around the base-case time to revision for the comparison between resurfacing arthroplasty and total hip replacement
- The bathtub model was controlled for age and sex because the age distributions of the matched populations were somewhat removed from the normal distribution (see Chapter 7). Transition probabilities were then calculated for the average population (35% female, age 55.8 years) (Table 88).
Comparison | Prosthesis | Bathtub alpha | Bathtub beta | Bathtub gamma | Bathtub age coefficient | Bathtub sex coefficient |
---|---|---|---|---|---|---|
RS vs. THR (matched) | ||||||
Sensitivity analysis | RS | 0.00373026 | 0.04400835 | 3.8505838 | –0.02491814 | –0.4098118 |
Sensitivity analysis | THR | 0.00058692 | 0.01189397 | 1.989425 | –0.02238228 | 0.05307551 |
Sensitivity analyses around the base-case time to revision for the comparison between different types of total hip replacement
- The bathtub model was controlled for age and sex. This was carried out because both age and sex differed between categories and both variables influenced the time to revision (see Chapter 9). Transition probabilities were then calculated for the age and sex mix across all five categories (63.5% female, age 71.6 years).
- A log-normal model was used because the information criteria scores and the visual plot for this model showed it to be the next best fit after the bathtub model, while providing a decreasing hazard on extrapolation that may be more suitable for older populations.
- A log-normal model controlled for age and sex was used because both age and sex differed between categories and both variables were associated with time to revision. Transition probabilities were then calculated for the age and sex mix across all five categories (63.5% female, age 71.6 years) (Table 89).
Comparison | Prosthesis | Bathtub alpha | Bathtub beta | Bathtub gamma | Bathtub age coefficient | Bathtub sex coefficient |
---|---|---|---|---|---|---|
THR vs. THR – bathtub model controlled for age and sex | ||||||
Sensitivity analysis | CeMoP (category A) | 0.0003132 | 0.008041 | 2.081738 | –0.0236324 | 0.2120103 |
Sensitivity analysis | CeLMoP (category B) | 0.0003712 | 0.030807 | 6.827069 | 0.0014804 | 0.2144175 |
Sensitivity analysis | CeLCoC (category C) | 0.0004542 | 0.0203098 | 4.028858 | –0.0070475 | 0.1657326 |
Sensitivity analysis | HyMoP (category D) | 0.000317 | 0.0145044 | 4.595129 | –0.019714 | 0.2461955 |
Sensitivity analysis | CeCoP (category E) | 0.0001675 | 0.0034053 | 0.680878 | –0.0149548 | 0.1011695 |
THR vs. THR – log-normal model | ||||||
Comparison | Prosthesis | Log-normal mu | Log-normal sigma | | | |
Sensitivity analysis | CeMoP (category A) | 9.738756 | 3.716562 | |||
Sensitivity analysis | CeLMoP (category B) | 10.71464 | 4.573634 | |||
Sensitivity analysis | CeLCoC (category C) | 9.526446 | 4.034555 | |||
Sensitivity analysis | HyMoP (category D) | 10.66382 | 4.215337 | |||
Sensitivity analysis | CeCoP (category E) | 9.574467 | 3.481879 | |||
THR vs. THR – log-normal model controlled for age and sex | ||||||
Comparison | Prosthesis | Log-normal mu | Log-normal sigma | Log-normal age coefficient | Log-normal sex coefficient | |
Sensitivity analysis | CeMoP (category A) | 9.825973 | 3.730391 | 0.03258 | –0.3417841 | |
Sensitivity analysis | CeLMoP (category B) | 10.84608 | 4.563342 | –0.0077298 | –0.3729022 | |
Sensitivity analysis | CeLCoC (category C) | 9.747396 | 4.036228 | 0.0093327 | –0.2627816 | |
Sensitivity analysis | HyMoP (category D) | 10.85018 | 4.238437 | 0.0314349 | –0.3886501 | |
Sensitivity analysis | CeCoP (category E) | 9.729236 | 3.482196 | 0.01658 | –0.1431533 |
Sensitivity analyses for cost inputs
For these sensitivity analyses we varied the prosthesis cost using the highest and lowest cost estimates from the list prices supplied by the NHS Supply Chain:
- RS vs. THR comparison:
  - highest list price for both RS and THR prostheses
  - lowest list price for both RS and THR prostheses (Table 90).
- THR vs. THR comparison:
  - highest list price for all THR prostheses
  - lowest list price for all THR prostheses (Table 91).
- We assumed a 20% price de-escalator to reflect what NHS trusts would pay for implants in reality (this is usually at a discounted rate based on the volume of purchase):
  - RS vs. THR comparison: the impact of this assumption was not tested
  - THR vs. THR comparison: a 20% reduction in the cost of each category of prosthesis (see Table 91).
Prosthesis | Base-case list price (£) | Highest list price (£) | Lowest list price (£) |
---|---|---|---|
THR | 2571 | 3073 | 2180 |
RS | 2778 | 2994 | 2487 |
Prosthesis | Component | Highest average unit cost (£) | Lowest average unit cost (£) | 20% reduction in prosthesis list price: average unit cost (£) |
---|---|---|---|---|
Category A – CeMoP | Cemented stem | 1789 | 1241 | 1246 |
Metal head | ||||
Polyethylene cup – cemented | ||||
Cemented stem centraliser | ||||
Bone cement plug | ||||
Cemented stem and cup extras | ||||
Category B – CeLMoP | Cementless HAC stem | 3774 | 2662 | 2413 |
Metal stem | ||||
Metal cup – cementless HA | ||||
Liner – polyethylene | ||||
Fixation screw | ||||
Category C – CeLCoC | Cementless HAC stem | 4734 | 3507 | 3095 |
Ceramic head | ||||
Metal cup – cementless HA | ||||
Liner – ceramic | ||||
Fixation screw | ||||
Category D – HyMoP | Cemented stem | 2980 | 2219 | 2120 |
Metal head | ||||
Metal cup – cementless HA | ||||
Liner – polyethylene | ||||
Cemented stem centraliser | ||||
Bone cement plug | ||||
Fixation screw | ||||
Cemented stem extras | ||||
Category E – CeCoP | Cemented stem | 2271 | 1657 | 1597 |
Ceramic head | ||||
Polyethylene cup – cemented | ||||
Cemented stem centraliser | ||||
Bone cement plug | ||||
Cemented stem and cup extras |
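The 20% de-escalator column can be checked by applying the reduction to the mean list price of each category. The sketch below assumes that the reduction was applied to the rounded mean costs shown in Table 78, which reproduces the tabulated values; the names are illustrative.

```python
# Mean list price per THR category (GBP), as shown in Table 78.
list_price = {"A": 1557, "B": 3016, "C": 3869, "D": 2650, "E": 1996}

DE_ESCALATOR = 0.20  # assumed volume discount applied to the list price

discounted = {
    category: round(price * (1 - DE_ESCALATOR))
    for category, price in list_price.items()
}
print(discounted)
```

This gives £1246, £2413, £3095, £2120 and £1597 for categories A to E, respectively.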
Sensitivity analyses for utility inputs
In the base case, utility values were obtained from the PROMs data set. 97 For the sensitivity analysis, utility values were taken from Rolfson et al. 298 This study reported 1-year postoperative utility values for 32,396 patients from the SHAR using a UK EQ-5D tariff. Utility values from the PROMs data set were applied to the re-revision health state as in the base case. The impact of this assumption was tested only for the comparison between different types of THR and not for the comparison between RS and THR (Table 92).
One-way sensitivity analysis for category E compared with category A total hip replacement (tornado diagram)
One-way sensitivity analysis was conducted to examine the impact of individual parameters on the net monetary benefit of category E (CeCoP) compared with category A (CeMoP) THR. All parameters were varied around the base-case values within the plausible ranges as specified.
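The logic behind a tornado diagram can be sketched as follows. This is a simplified illustration that varies only the incremental cost and incremental QALYs of category E versus category A (10-year deterministic values from Table 94) rather than the underlying model inputs. The ±20% ranges and the £20,000 per QALY threshold are assumptions chosen for the example, not the ranges used in the report.

```python
def net_monetary_benefit(inc_cost, inc_qalys, wtp):
    """Incremental net monetary benefit at a given WTP threshold."""
    return wtp * inc_qalys - inc_cost


def tornado(base, ranges, wtp):
    """Vary one parameter at a time; return bars sorted by width (widest first)."""
    bars = []
    for name, (low, high) in ranges.items():
        nmb_low = net_monetary_benefit(wtp=wtp, **dict(base, **{name: low}))
        nmb_high = net_monetary_benefit(wtp=wtp, **dict(base, **{name: high}))
        bars.append((name, nmb_low, nmb_high))
    bars.sort(key=lambda bar: abs(bar[2] - bar[1]), reverse=True)
    return bars


WTP = 20000  # assumed threshold (GBP per QALY)
base = {"inc_cost": 299.0, "inc_qalys": 0.0018}  # E vs. A, 10-year horizon
ranges = {  # hypothetical +/-20% ranges around the base-case values
    "inc_cost": (0.8 * 299.0, 1.2 * 299.0),
    "inc_qalys": (0.8 * 0.0018, 1.2 * 0.0018),
}

base_nmb = net_monetary_benefit(wtp=WTP, **base)
print(round(base_nmb, 1))
for name, nmb_low, nmb_high in tornado(base, ranges, WTP):
    print(name, round(nmb_low, 1), round(nmb_high, 1))
```

Each bar of the tornado diagram corresponds to one tuple, and the parameters with the widest bars are those to which the net monetary benefit is most sensitive.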
Scenario analysis around revision rates using values obtained from clinical trials/registries
We did not feel that it would be appropriate to use data from other clinical trials/registries to check our findings because the clinical effectiveness studies of revision rates that we identified were based on low counts and/or on small trials with a great deal of uncertainty. Overall, across the THR/THR and THR/RS comparisons, trials were often based on selective populations or interventions. Data that could be obtained from studies examining revision rates were inconclusive and often the results had wide CIs.
Results of the cost-effectiveness analysis
We present here the deterministic and probabilistic cost-effectiveness results for the comparison between RS and THR and the comparison between different types of THR.
Base-case results
Resurfacing arthroplasty compared with total hip replacement
In the base-case analysis we compared the cost-effectiveness of primary THR with that of RS for people for whom both procedures are suitable.
Table 93 shows the deterministic and probabilistic results for the 10-year and lifetime horizons. For all scenarios the mean cost of RS was higher than that of THR and the mean QALYs were lower. For all scenarios RS was dominated by THR, that is, THR was cheaper and more effective than RS.
Analysis | RS | THR |
---|---|---|
Deterministic | ||
10-year time horizon | ||
Total mean cost (£) | 22,519 | 11,879 |
Total mean QALYs | 7.2830 | 7.4147 |
Incremental cost (£) | 10,641 | |
Incremental QALYs | –0.1317 | |
ICER (£) | Dominated | |
Lifetime horizon | ||
Total mean cost (£) | 29,603 | 18,113 |
Total mean QALYs | 14.6968 | 14.7846 |
Incremental cost (£) | 11,490 | |
Incremental QALYs | –0.0879 | |
ICER (£) | Dominated | |
Probabilistic | ||
10-year time horizon | ||
Total mean cost (£) | 22,615 | 11,887 |
Total mean QALYs | 7.2823 | 7.4150 |
Incremental cost (£) | 10,729 | |
Incremental QALYs | –0.1327 | |
ICER (£) | Dominated | |
Lifetime horizon | ||
Total mean cost (£) | 29,770 | 18,120 |
Total mean QALYs | 14.6963 | 14.7848 |
Incremental cost (£) | 11,650 | |
Incremental QALYs | –0.0885 | |
ICER (£) | Dominated |
Figure 59a and b shows the cost-effectiveness planes for THR compared with RS for the 10-year and lifetime horizons. The graph clearly shows that THR dominates RS, as the iterations fall in the north-west quadrant of the plane, that is, RS is clearly more costly and less effective than THR. Figure 59c and d shows the CEACs for the two time horizons. For a WTP threshold from £0 to £50,000 per QALY, THR is the more cost-effective option.
Comparison of different categories of total hip replacement
In the base-case analysis, using a bathtub model, we compared the cost-effectiveness of different categories of primary THR with each other for patients who were not suitable for RS. Table 94 shows the deterministic and probabilistic results for the 10-year and lifetime horizons; results are ranked in order of increasing cost. For the 10-year time horizon (both deterministic and probabilistic), category A was cheaper than all of the other categories; however, the QALYs were slightly higher for category E than for the other categories. The ICER for category E compared with category A was £166,217 per QALY gained for the deterministic analysis and £225,225 per QALY gained for the probabilistic analysis. However, for the lifetime horizon (both deterministic and probabilistic), the mean cost for category E was slightly lower and the mean QALYs for category E were slightly higher than the corresponding values for the other categories. Hence, category E dominated the other four categories.
Analysis | Total mean cost (£) | Total mean QALYs | Comparison | Incremental cost (£) | Incremental QALYs | ICER (£) |
---|---|---|---|---|---|---|
Deterministic | ||||||
10-year time horizon | ||||||
A | 9444 | 7.4189 | – | – | – | – |
E | 9743 | 7.4207 | E vs. A | 299 | 0.0018 | 166,217 |
D | 10,588 | 7.4182 | D vs. E | 845 | –0.0025 | Dominated |
B | 11,155 | 7.4156 | B vs. D | 567 | –0.0026 | Dominated |
C | 12,112 | 7.4143 | C vs. B | 957 | –0.0013 | Dominated |
Lifetime horizon | ||||||
E | 14,522 | 14.7909 | – | – | – | – |
A | 14,801 | 14.7887 | A vs. E | 278 | –0.0022 | Dominated |
D | 16,040 | 14.7881 | D vs. A | 1240 | –0.0006 | Dominated |
B | 16,804 | 14.7861 | B vs. D | 764 | –0.0020 | Dominated |
C | 18,226 | 14.7845 | C vs. B | 1422 | –0.0016 | Dominated |
Probabilistic | ||||||
10-year time horizon | ||||||
A | 9449 | 7.4199 | – | – | – | – |
E | 9775 | 7.4213 | E vs. A | 326 | 0.0014 | 225,225 |
D | 10,594 | 7.4192 | D vs. E | 820 | –0.0021 | Dominated |
B | 11,160 | 7.4165 | B vs. D | 566 | –0.0026 | Dominated |
C | 12,121 | 7.4152 | C vs. B | 961 | –0.0014 | Dominated |
Lifetime horizon | ||||||
E | 14,456 | 14.7914 | – | – | – | – |
A | 14,740 | 14.7892 | A vs. E | 284 | –0.0022 | Dominated |
D | 15,975 | 14.7885 | D vs. A | 1234 | –0.0006 | Dominated |
B | 16,730 | 14.7866 | B vs. D | 755 | –0.0019 | Dominated |
C | 18,163 | 14.7850 | C vs. B | 1432 | –0.0016 | Dominated |
Figure 60a and b shows the cost-effectiveness planes with 95% CIs for the comparison between different types of THR. For the 10-year time horizon, although category A is cheaper, category E generates more QALYs. For the lifetime horizon, category E is more cost-effective (i.e. cheaper and more effective) than the other four categories. Figure 60c and d shows the CEACs for the comparison between different types of THR using a bathtub model. For the 10-year time horizon, if the decision-maker is willing to pay £20,000 per QALY, category A has a 95% probability of being the most cost-effective of the five categories. For the lifetime horizon, if a decision-maker is willing to pay anything from £0 to £50,000 per QALY, category E has a > 90% probability of being the most cost-effective.
Sensitivity analysis results
This section presents the results from the deterministic and probabilistic sensitivity analyses.
Subgroup analysis: resurfacing arthroplasty compared with total hip replacement
Tables 95 and 96 show the deterministic and probabilistic results, respectively, for RS compared with THR, presented separately for men and women by age group (40, 50 and 60 years). The incremental cost and the incremental QALY loss of RS relative to THR were larger for women than for men in all age groups. Consistent with the base-case results, RS is dominated by THR, that is, THR is cheaper and more effective than RS.
Analysis | Age 40 years: RS | Age 40 years: THR | Age 50 years: RS | Age 50 years: THR | Age 60 years: RS | Age 60 years: THR |
---|---|---|---|---|---|---|
Women | ||||||
10-year time horizon | ||||||
Total mean cost (£) | 23,230 | 11,877 | 23,142 | 11,665 | 22,967 | 11,427 |
Total mean QALYs | 7.0604 | 7.1891 | 7.1940 | 7.3373 | 7.2501 | 7.4072 |
Incremental cost (£) | 11,353 | | 11,476 | | 11,541 | |
Incremental QALYs | –0.1287 | | –0.1432 | | –0.1571 | |
ICER (£) | Dominated | | Dominated | | Dominated | |
Lifetime horizon | ||||||
Total mean cost (£) | 33,272 | 21,637 | 31,248 | 18,790 | 28,677 | 15,904 |
Total mean QALYs | 16.7060 | 16.8272 | 14.9977 | 15.1024 | 12.6013 | 12.6785 |
Incremental cost (£) | 11,635 | | 12,458 | | 12,773 | |
Incremental QALYs | –0.1212 | | –0.1047 | | –0.0772 | |
ICER (£) | Dominated | | Dominated | | Dominated | |
Men | ||||||
10-year time horizon | ||||||
Total mean cost (£) | 22,100 | 12,022 | 22,019 | 11,671 | 21,820 | 11,307 |
Total mean QALYs | 7.2311 | 7.3407 | 7.4061 | 7.5345 | 7.3816 | 7.5205 |
Incremental cost (£) | 10,078 | | 10,348 | | 10,513 | |
Incremental QALYs | –0.1096 | | –0.1284 | | –0.1389 | |
ICER (£) | Dominated | | Dominated | | Dominated | |
Lifetime horizon | ||||||
Total mean cost (£) | 30,805 | 21,523 | 28,798 | 18,126 | 26,313 | 15,003 |
Total mean QALYs | 16.5899 | 16.6779 | 14.7441 | 14.8238 | 12.1711 | 12.2304 |
Incremental cost (£) | 9283 | | 10,672 | | 11,310 | |
Incremental QALYs | –0.0879 | | –0.0797 | | –0.0593 | |
ICER (£) | Dominated | | Dominated | | Dominated | |
Analysis | Age 40 years: RS | Age 40 years: THR | Age 50 years: RS | Age 50 years: THR | Age 60 years: RS | Age 60 years: THR |
---|---|---|---|---|---|---|
Women | ||||||
10-year time horizon | ||||||
Total mean cost (£) | 23,233 | 11,883 | 23,125 | 11,672 | 22,962 | 11,414 |
Total mean QALYs | 7.0599 | 7.1886 | 7.1937 | 7.3370 | 7.2495 | 7.4069 |
Incremental cost (£) | 11,349 | | 11,453 | | 11,549 | |
Incremental QALYs | –0.1287 | | –0.1433 | | –0.1574 | |
ICER (£) | Dominated | | Dominated | | Dominated | |
Lifetime horizon | ||||||
Total mean cost (£) | 33,291 | 21,720 | 31,247 | 18,802 | 28,669 | 15,883 |
Total mean QALYs | 16.7033 | 16.8251 | 14.9976 | 15.1024 | 12.6010 | 12.6783 |
Incremental cost (£) | 11,570 | | 12,445 | | 12,785 | |
Incremental QALYs | –0.1218 | | –0.1047 | | –0.0773 | |
ICER (£) | Dominated | | Dominated | | Dominated | |
Men | ||||||
10-year time horizon | ||||||
Total mean cost (£) | 22,106 | 12,027 | 22,015 | 11,659 | 21,828 | 11,307 |
Total mean QALYs | 7.2313 | 7.3408 | 7.4061 | 7.5334 | 7.3814 | 7.5204 |
Incremental cost (£) | 10,080 | | 10,357 | | 10,521 | |
Incremental QALYs | –0.1095 | | –0.1284 | | –0.1389 | |
ICER (£) | Dominated | | Dominated | | Dominated | |
Lifetime horizon | ||||||
Total mean cost (£) | 30,765 | 21,533 | 28,778 | 18,143 | 26,314 | 15,022 |
Total mean QALYs | 16.5895 | 16.6775 | 14.7433 | 14.8232 | 12.1706 | 12.2301 |
Incremental cost (£) | 9231 | | 10,635 | | 11,292 | |
Incremental QALYs | –0.0880 | | –0.0799 | | –0.0595 | |
ICER (£) | Dominated | | Dominated | | Dominated | |
The results from Tables 95 and 96 are reflected in the cost-effectiveness planes and CEACs (Figures 61 and 62).
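As a cross-check on the subgroup results, the sketch below recomputes the incremental cost and QALYs of RS relative to THR from the rounded deterministic totals (Table 95) and tests each subgroup for dominance. The data layout and the function name `rs_vs_thr` are illustrative assumptions, not part of the report's model. Small discrepancies from the reported increments reflect rounding of the inputs.

```python
# (sex, horizon, age) -> (RS cost £, RS QALYs, THR cost £, THR QALYs),
# copied from the rounded deterministic totals in Table 95.
totals = {
    ("women", "10-year", 40): (23230, 7.0604, 11877, 7.1891),
    ("women", "10-year", 50): (23142, 7.1940, 11665, 7.3373),
    ("women", "10-year", 60): (22967, 7.2501, 11427, 7.4072),
    ("women", "lifetime", 40): (33272, 16.7060, 21637, 16.8272),
    ("women", "lifetime", 50): (31248, 14.9977, 18790, 15.1024),
    ("women", "lifetime", 60): (28677, 12.6013, 15904, 12.6785),
    ("men", "10-year", 40): (22100, 7.2311, 12022, 7.3407),
    ("men", "10-year", 50): (22019, 7.4061, 11671, 7.5345),
    ("men", "10-year", 60): (21820, 7.3816, 11307, 7.5205),
    ("men", "lifetime", 40): (30805, 16.5899, 21523, 16.6779),
    ("men", "lifetime", 50): (28798, 14.7441, 18126, 14.8238),
    ("men", "lifetime", 60): (26313, 12.1711, 15003, 12.2304),
}


def rs_vs_thr(rs_cost, rs_qalys, thr_cost, thr_qalys):
    """Return (incremental cost, incremental QALYs, dominated) for RS vs. THR."""
    d_cost = rs_cost - thr_cost
    d_qalys = round(rs_qalys - thr_qalys, 4)
    # RS is dominated when it costs more and yields fewer QALYs than THR.
    dominated = d_cost > 0 and d_qalys < 0
    return d_cost, d_qalys, dominated


results = {key: rs_vs_thr(*values) for key, values in totals.items()}

for key, (d_cost, d_qalys, dominated) in results.items():
    print(*key, d_cost, d_qalys, "Dominated" if dominated else "Not dominated")
```

With these inputs RS is dominated in every subgroup, and both the extra cost and the QALY loss are larger for women than for men at each age and time horizon.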
Subgroup analyses: comparison of different types of total hip replacement (patients aged > 65 years)
The deterministic and probabilistic results for the different THR categories over a 10-year time horizon, split by age and sex, are shown in Tables 97 and 98, respectively, along with the corresponding ICERs (when appropriate). For both men and women, at ages 70 and 80 years, category A was cheaper but category E was more effective.
Category | Total mean cost (£) | Total mean QALYs | Comparison | Incremental cost (£) | Incremental QALYs | ICER (£) |
---|---|---|---|---|---|---|
Age 70 years | ||||||
Women | ||||||
A | 9047 | 6.8159 | – | – | – | – |
E | 9364 | 6.8173 | E vs. A | 317 | 0.0014 | 231,970 |
D | 10,134 | 6.8160 | D vs. E | 770 | –0.0013 | Dominated |
B | 10,586 | 6.8150 | B vs. D | 452 | –0.0010 | Dominated |
C | 11,427 | 6.8151 | C vs. B | 841 | 0.0001 | 5,773,991 |
A | 9047 | 6.8159 | – | – | – | – |
E | 9364 | 6.8173 | E vs. A | 317 | 0.0014 | 231,970 |
C | 11,427 | 6.8151 | C vs. E | 2063 | –0.0022 | Dominated |
Men | ||||||
A | 8900 | 6.8903 | – | – | – | – |
E | 9238 | 6.8915 | E vs. A | 338 | 0.0012 | 281,096 |
D | 10,028 | 6.8898 | D vs. E | 790 | –0.0016 | Dominated |
B | 10,506 | 6.8885 | B vs. D | 478 | –0.0013 | Dominated |
C | 11,451 | 6.8874 | C vs. B | 944 | –0.0011 | Dominated |
Age 80 years | ||||||
Women | ||||||
A | 8175 | 5.1980 | – | – | – | – |
E | 8495 | 5.1984 | E vs. A | 320 | 0.0004 | 803,012 |
D | 9263 | 5.1981 | D vs. E | 768 | –0.0003 | Dominated |
B | 9829 | 5.1975 | B vs. D | 566 | –0.0006 | Dominated |
C | 10,681 | 5.1975 | C vs. B | 851 | –0.0000 | Dominated |
Men | ||||||
A | 8035 | 5.0689 | – | – | – | – |
E | 8464 | 5.0690 | E vs. A | 429 | 0.0000 | 12,763,540 |
D | 9138 | 5.0689 | D vs. E | 673 | –0.0001 | Dominated |
B | 9752 | 5.0679 | B vs. D | 615 | –0.0010 | Dominated |
C | 10,695 |