Attention deficit hyperactivity disorder (ADHD) is characterised by excessive levels of hyperactivity, impulsivity and inattention (Wang et al, 2022). Thomas et al (2015) report a worldwide prevalence rate of between 5.29% and 7.2% in children, while prevalence in adults is estimated at 6.76% (Song et al, 2021). National Institute for Health and Care Excellence (NICE, 2018) guidance changes placing ADHD into mental health services, alongside growing recognition of the condition, have put increasing pressure on healthcare providers.
Therefore, there is an urgent need for objective methods to strengthen neurodevelopmental assessment processes and support accurate, more streamlined diagnosis of ADHD (Wang et al, 2022).
The quantified behavioural test (QbTest; Qbtech Ltd) is an objective screening tool to supplement ADHD diagnosis. Hall et al (2018) found the QbTest improves reliability and clinical decision-making speed. Hollis et al (2018) also determined the QbTest had valuable applications in the assessment of ADHD in children, with clinicians 1.44 times more likely to reach a diagnostic decision. Moreover, Bijlenga et al (2019) concluded that the QbTest was a suitable tool for assessing ADHD in older adults.
However, the evidence base is conflicting, with some researchers claiming the QbTest is unable to accurately distinguish ADHD from healthy controls (Brunkhorst-Kanaan et al, 2020). Reh et al (2015) highlighted poor concurrent validity between the QbTest and psychometric assessment tools. Factors significantly affecting QbTest reliability and validity include clinicians' capabilities to accurately interpret data from the QbTest; the extent to which quantitative measurements are reflective of real-life behaviours; and the degree to which QbTesting reliably differentiates symptoms of ADHD from comorbid conditions, such as autism (Johansson et al, 2021; Vogt, 2021).
Despite this, the QbTest is now used across the NHS in England. Arguably, this is due to benefits relating to cost-effectiveness, with QbTesting estimated to save £80 000 per year, per clinic (Qbtech, 2020). Additionally, the NIHR Collaboration for Leadership in Applied Health Research and Care (2017) estimated that the QbTest generates average savings of 32.6%, owing to a reduction in the number of appointments required. However, some researchers have associated financial ties with positive results in the literature (Groom, 2016; Ahn et al, 2017). Consequently, NICE (2023) guidance stipulates there is a need for a systematic review of the current evidence base for QbTesting, and there is a clear requirement for further research examining its effectiveness.
This literature review provides a platform to identify factors affecting diagnostic accuracy and reliability, to improve nursing practices and strengthen neurodevelopmental assessments (Hollis et al, 2018). Diagnostic criteria are shown in Table 1.
Table 1. Diagnostic criteria for ADHD: inattention; hyperactivity and impulsivity
Method
As this project aimed to appraise and synthesise the results of previous research, this review was conducted systematically, using primary research published between 2013 and 2022, to evaluate the reliability and validity of QbTesting. Although all papers used randomised controlled trials (RCTs) providing quantitative data, a thematic analysis was used to identify patterns and establish relationships between data, as recommended by Braun and Clarke (2022).
Consequently, this review summarised the recurring themes across the body of research. Due to the quantitative nature of this review, a population, intervention, comparison and outcome (PICO) tool was used to facilitate the search strategy, identifying relevant articles through cross-database searches (Coughlan and Cronin, 2020).
As literature searching forms the basis of systematic reviews, it was imperative the search strategy was accurate and extensive, as this has a significant impact on the quality of the review process (McGowan et al, 2016).
Thus, the PICO tool was used with caution, due to its limited evidence base and difficulties defining the scope of the topic (Eriksen and Frandsen, 2018). Inclusion and exclusion criteria were identified to refine the search process.
Electronic databases Summon, CINAHL and Medline were used to source articles relevant to this literature review. The databases selected provided access to primary research relevant to nursing, as recommended by Aveyard (2018). Secondary sources, such as the reference lists of selected articles, were also examined. A literature search was conducted to identify relevant articles using the inclusion and exclusion criteria; due to the limited research available, 55 articles were identified (Figure 1). Given time restrictions and the small number of primary studies on QbTesting, seven articles were ultimately selected for review.

Thematic analysis is a method of identifying and analysing recurring themes, patterns and meanings throughout data (Braun and Clarke, 2022). However, thematic analyses have been criticised for their vulnerability to bias, poor coherence and overlap between themes, due to poor research design and ambiguous guidelines for interpreting data (Javadi and Zarea, 2016). Themes were therefore established prior to conducting this review, informed by previous research and experience within this field. Inferential statistics provide a means of accurately assessing the reliability of quantitative data, using tools derived from statistical tests (Ellis, 2019).
Due to the quantitative nature of this review, evidence supporting findings from the thematic analysis was derived from examining inferential statistics to further scrutinise the fidelity of each study's conclusions.
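To make the decision rule explicit (a general statistical convention, not a method specific to any included study): a finding is treated as statistically significant when the probability of observing a result at least as extreme under the null hypothesis falls below 5%, that is:

\[
p = P(\text{result at least this extreme} \mid H_0) < 0.05
\]

where \(H_0\) is the null hypothesis that the QbTest has no effect on the outcome measured. A p value above this threshold does not confirm the null hypothesis; it only indicates the data are compatible with chance variation.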
Data extraction and data synthesis
Guidelines by Coughlan et al (2007) and the Critical Appraisal Skills Programme (CASP, 2018) RCT checklist were used to critically appraise articles. Findings were combined to provide an overview of key themes, using a pragmatic, structured process (Figure 2).

A total of 55 records were retrieved through initial database searches, with 52 abstracts screened after duplicates were removed to assess eligibility. The search yielded seven articles that met the inclusion criteria, conducted in England and Sweden (Table 2).
Paper | Authors | Findings |
---|---|---|
1 | Edebol et al (2013) | Supports the use of QbTest in adult populations, highlighting good specificity at 83% and sensitivity at 86% across all groups. Due to this study's large sample size and standardised procedure, it was deemed useful for review |
2 | Hollis et al (2018) | Concluded QbTest is a useful means of reducing consultation time, with clinicians 1.44 times more likely to reach a diagnostic decision. Hence, it was determined the QbTest increases clinical efficiency without compromising diagnostic accuracy |
3 | Johansson et al (2021) | Results highlighted that the QbTest's ability to correctly classify symptoms of ADHD in children was poor. Discriminant analysis showed sensitivity and specificity were also unsatisfactory |
4 | Hult et al (2018) | Supports the use of QbTest, stating that it was able to identify higher rates of hyperactivity and inattention in children with ADHD. However, scores relating to impulsivity were unaffected, requiring further examination |
5 | Emser et al (2018) | Supports the use of the QbTest when combined with subjective assessment methods. Researchers reported higher rates of accuracy, 79% (adults) and 78% (children), when using the QbTest to detect symptoms of ADHD. This increased when combined with self-report measures. Despite this, it was identified as being an unreliable predictor of ADHD in adults, again requiring further investigation |
6 | Bijlenga et al (2019) | Researchers concluded the QbTest was a suitable means of assessing ADHD in older adults, but this did not apply to impulsivity, similar to paper 4 |
7 | Adamou et al (2022) | The QbTest was unable to differentiate symptoms of ADHD from healthy adult controls. The QbTest demonstrated 70% accuracy when identifying those with a clinical diagnosis, but only 43% specificity when detecting the absence of ADHD in those without a diagnosis |
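Several of the papers in Table 2 quote sensitivity, specificity and accuracy. As a general statistical note (these definitions are standard and not drawn from the reviewed studies themselves), the metrics are calculated from the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN):

\[
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}, \qquad
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]

On these definitions, the 43% specificity reported by Adamou et al (2022) means that fewer than half of the adults without ADHD were correctly classified as negative by the QbTest.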
Characteristics and quality
All articles included were peer reviewed to increase rigour and reduce bias (Coughlan and Cronin, 2020). Key strengths across all papers included the use of experimental designs, increasing reliability. Additionally, all used inferential statistics to examine the accuracy of data. Key limitations related to poor discriminant validity and generalisability, as sample representativeness was not demonstrated.
Discussion
As recommended by Coughlan and Cronin (2020), themes were derived from the inclusion criteria, relevance to the research question and the identification of sub-themes throughout the text, strengthening this review's integrity.
Thematic analysis highlighted that most studies lacked generalisability, were poorly standardised and had weak external validity. Weaknesses related to an absence of clinician experience in interpreting and administering the QbTest, an absence of standardised testing procedures, issues regarding cross-cultural validity and the impact of comorbid conditions. Furthermore, inconsistencies were noted across three papers regarding the QbTest's ability to accurately measure impulsivity.
The ‘gold standard’ of research design is underpinned by the five themes of reliability, validity, accuracy, standardisation and generalisability (McBride, 2020). This review explored the fidelity of QbTesting, comparing and contrasting data sets across several journal articles, in accordance with these themes.
Reliability
QbTest is a reliable means of assessing ADHD in adults, claim Edebol et al (2013). These findings concur with Hollis et al (2018), who concluded the QbTest aided quicker diagnostic decisions. Emser et al (2018) support this, reporting higher rates of accuracy when using the QbTest to detect symptoms of ADHD in adults and children. Thus, as sensitivity increases, so does reliability, indicating the QbTest can yield positive results regarding ADHD symptomatology (Parikh et al, 2008).
Conversely, Hult et al (2018) concluded the QbTest had only moderate ability to identify ADHD when used as a ‘stand-alone’ tool. Despite the use of convenience sampling and issues related to potential researcher bias, findings were consistent with Johansson et al (2021), who concluded the QbTest was an unreliable predictor of ADHD in children, with discriminant analysis indicating sensitivity and specificity were unsatisfactory. Additionally, Adamou et al (2022) highlighted the QbTest was unable to differentiate symptoms of ADHD in adults.
Although Edebol et al (2013) and Hollis et al (2018) used large sample sizes, indicating stronger reliability, it is too simplistic to conclude these studies' findings are more reliable. Interestingly, only Hult et al (2018), Hollis et al (2018) and Johansson et al (2021) reported confidence intervals, suggesting theirs are the only samples whose representativeness of the general population can be assessed (Hanneman et al, 2013). Failure to report confidence intervals increases the likelihood of high variability in samples, indicating a likelihood of bias (Aveyard, 2018).
Thus, although these studies advocate the QbTest is a reliable tool, implications relating to weak experimental designs reduce the clinical rigour and robustness of these claims.
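As context for the confidence-interval point above (a standard formulation, assuming a normal approximation rather than any study-specific method), a 95% confidence interval around a sample mean takes the form:

\[
\bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}
\]

where \(\bar{x}\) is the sample mean, \(s\) the sample standard deviation and \(n\) the sample size. The interval narrows as samples become larger and less variable; when intervals are not reported, readers cannot judge this variability, which is why their absence weakens claims of representativeness.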
Impulsivity measurements
Results regarding the QbTest's ability to identify symptoms of impulsivity were mixed. Bijlenga et al (2019), Hult et al (2018) and Adamou et al (2022) stress the QbTest could not reliably detect impulsivity; hence this was not a valid differentiator for adults and children. Edebol et al (2013) also recognised hyperactivity was the most common feature of ADHD, with impulsivity the least common.
Biederman et al (2000) (cited in Emser et al, 2018) offer a useful explanation, stating that hyperactivity and inattention decline to a greater extent over time. Still, this does not explain why the QbTest was less sensitive to impulsivity in Hult et al's (2018) sample of children, with a mean age of 10 years. This is also inconsistent with findings from Bijlenga et al (2019), claiming QbTesting can accurately identify symptoms of hyperactivity in older adults.
Validity
Discriminant validity
Johansson et al (2021) argued the QbTest cannot differentiate ADHD from comorbid neurodevelopmental conditions, concurring with Sharma and Singh's (2009) claims. Despite issues relating to missing data, findings by Johansson et al (2021) are supported by Vogt (2021), who stated research should measure the QbTest's ability to differentiate ADHD from difficulties related to emotional dysregulation.
Moreover, Edebol et al (2018) reported sensitivity dropped to 36% when using the QbTest to assess ADHD in individuals with personality disorders. This is significant, as ADHD is a heterogeneous disorder, with 51.8% of individuals with ADHD exhibiting at least one comorbid condition (Merrill et al, 2022). Consequently, the use of valid measurement tools to identify ADHD symptoms is vital.
Johansson et al (2021) suggest the QbTest could not differentiate between sub-types of ADHD in children, indicating chance-level to poor validity. Furthermore, Hult et al (2018) and Adamou et al (2022) concluded that, when comparing performance against healthy controls, discriminant validity was poor across adults and children. These findings concur with claims that inconsistencies have been identified in the QbTest's convergent and discriminant validity when used with children (Emser et al, 2018).
Internal validity
All researchers made robust attempts to measure QbTest accuracy, using RCTs or mixed-method approaches, as recommended by Aveyard (2018). Valid and reliable diagnostic tools were used to inform diagnostic decisions, which were compared with QbTest data. Baseline characteristics were accounted for, with QbTest results objectively compared with normative data from children of the same age and gender. This increases accuracy, reducing the likelihood of confounding variables affecting the results (McBride, 2020; Qbtech, 2020).
Ecological validity
All studies were conducted in a healthcare setting, replicating real-life assessment processes; therefore, to some extent, good ecological validity was established. However, the QbTest is conducted in a controlled, artificial setting, without the random distractions present in real-life circumstances. Hence, the extent to which QbTest results are representative of behaviours outside of a laboratory setting, such as school, is questionable (Wang et al, 2022).
Accuracy
Outcomes were comprehensively reported using inferential statistics, with p values objectively determining whether measured differences were due to the intervention rather than chance (Fink, 2019). Conclusions by Hollis et al (2018) supporting the QbTest hold merit due to the use of single-blinding techniques and a robust experimental procedure. However, a significant limitation relates to missing data.
Psychiatrists' diagnoses were made with more than half of participants' information missing, providing an unreliable comparison against QbTest performance. This is problematic, as missing data compromises the accuracy of results and is subject to bias and inter-rater disagreement (Jakobsen et al, 2017). Similarly, despite holding opposing views, Johansson et al (2021) also reported 15 cases of missing data when measuring Qbactivity in children. This questions how sensitive QbTesting is to micro-movements, and to what extent rater bias has an impact on the interpretation of observable behaviours (Brunkhorst-Kanaan et al, 2020). Thus, results from the QbTest measuring activity are subjective and prone to error. This coincides with the findings of Emser et al (2018), who confirmed Qbactivity was not a reliable predictor of ADHD in adults and children.
Little attention has been paid to the impact of extraneous variables, such as anxiety, which may distort QbTest results (Pellegrini et al, 2020). While Hollis et al (2018), Bijlenga et al (2019), Johansson et al (2021) and Edebol et al (2013) included participants with comorbid conditions, the impact of these was not considered, despite evidence confirming anxiety influences inattention and impulsivity in children undertaking continuous performance tasks (Méndez-Freije et al, 2023). The accuracy of the results is therefore unclear, as ADHD and anxiety share similar characteristics. These claims coincide with Söderström et al (2014), who demonstrated poor discriminant validity of the QbTest when comorbid conditions were present. Furthermore, Emser et al (2018) identified that children with ADHD had a lower IQ than controls. Reh et al (2015) support these findings, identifying that children with higher IQ scores were less impulsive. Again, although this association has been established, the impact of IQ on QbTest performance remains unclear.
Standardisation
A key finding was a lack of standardisation across testing procedures. Although there are strict instructions for administering the test, the impact of environmental variables and clinician experience may distort its performance (Vogt, 2021). Implementing consistent testing procedures across all experimental conditions limits the impact of extraneous environmental variables, which may disrupt QbTest performance (McLeod, 2023). However, Hollis et al (2018), Johansson et al (2021) and Bijlenga et al (2019) conducted testing at multiple sites. Consequently, assessments were not standardised across clinics, with researchers failing to specify whether environmental controls were consistently implemented to avoid interference.
Additionally, some clinicians had extensive experience of using the QbTest, while others had very little (Lennox et al, 2020). Furthermore, Hollis et al (2018) recognised lower sensitivity in the group where QbTest data were accessible, indicating clinicians may have been applying more stringent diagnostic criteria. Hult et al (2018) also noted QbTest results were known to some clinicians who contributed to the final diagnosis, highlighting inconsistent assessment processes across clinics.
Therefore, neurodevelopmental assessments must be conducted by competent and skilled clinicians, with careful interpretation of QbTest results undertaken following the ‘gold standard’ of diagnostic procedures (Villagomez et al, 2019).
Generalisability
All studies demonstrated poor cross-cultural validity; therefore, results cannot be generalised to the whole population. Fridman et al (2017) reported that clinicians in other regions, including North America, made significantly quicker diagnostic decisions than those in England. Thus, results are not representative of assessment practices outside of England and Sweden.
Additionally, Chan et al (2022) highlighted differences in diagnostic thresholds and behavioural symptoms when examining ADHD in Asian and British children. Nevertheless, few studies have examined cultural differences which may affect ADHD diagnosis and QbTest performance.
Gender differences relating to QbTest performance are poorly understood. A quantitative study by Slobodin and Davidovitch (2019) recognised that gender differences in ADHD are unclear, due to the limited samples of females used in research. Despite issues with this study's small sample size, the researchers found females largely present with inattention, whereas males display more symptoms of hyperactivity, a pattern also reported by Edebol et al (2013).
Ultimately, QbTest performance varies by gender, requiring further investigation. Recommendations for future practice are summarised in Table 3.
Recommendation | Justification |
---|---|
Use of competent, skilled clinicians with extensive experience in neurodevelopmental assessments and QbTesting | As highlighted by Vogt (2021), there is concern over whether clinicians are competent at interpreting and administering the QbTest. This questions the extent to which data from the QbTest are accurately measured and used appropriately to assist with diagnostic decision-making. As stipulated by Qbtech (2020), the QbTest should not be used as a ‘stand-alone’ diagnostic tool |
Use of a multi-disciplinary team approach to aid more reliable diagnosis of ADHD | NICE (2023) guidance stipulates a multidisciplinary team approach must be used when conducting neurodevelopmental assessments. Use of multidisciplinary approaches aids more comprehensive ADHD assessments and enhances care coordination (McGonnell et al, 2009) |
Exclusion of QbTesting to detect ADHD symptoms in complex presentations | The interaction between ADHD, autism and learning disabilities is not well understood. Separating symptoms of ADHD and intellectual difficulties is challenging, and it is recommended a more holistic approach is adopted, considering all aspects of the patient's development to support diagnostic decision-making (Royal College of Psychiatrists, 2021) |
Use of consistent, standardised testing procedures to reduce the influence of extraneous variables | Despite QbTesting having strict instructions for administering the test, the impact of environmental variables and clinicians' experience still pose a risk of distorting results (Vogt, 2021). Strict controls must be implemented throughout the testing procedure to reduce interference from extraneous variables |
Further research investigating gender differences in ADHD symptomatology and QbTest performance | Gender differences in ADHD presentation are poorly understood (Slobodin and Davidovitch, 2019). Research has not examined gender differences affecting QbTest performance. Moreover, disproportionate samples do not fairly represent female test performance, requiring further examination |
Further research examining QbTest effectiveness at differentiating symptoms of autism and ADHD | Almost half of children with autism suffer from impulsivity, hyperactivity and inattention (Murray, 2010, cited in Hult et al, 2018). Research has highlighted similarities in QbTest performance in children with ADHD and autism; therefore, the extent to which the QbTest can differentiate between the two conditions is unclear (Hall et al, 2018) |
Cross-cultural studies examining the effectiveness of QbTesting outside of Western culture | Evidence supporting QbTesting has largely been conducted across Europe, indicating poor cross-cultural validity. Cultural differences that affect ADHD presentation have not been examined, and the reliability of the QbTest is unclear when used with different populations |
Continued use of RCTs to examine the QbTest's ability to accurately detect impulsivity | Although thematic analysis highlighted the QbTest was unable to identify symptoms of impulsivity, it remains unclear as to why, requiring further examination (Hult et al, 2018; Bijlenga et al, 2019; Adamou et al, 2022) |
More research examining the interaction of IQ and learning disabilities on QbTest performance in adults | Milioni et al (2017) report intellectual ability affects QbTest performance. The relationship between intellectual ability and QbTest performance is poorly understood, raising the question of whether the QbTest measures cognitive abilities rather than symptoms of ADHD (Johansson et al, 2021) |
Bias
While this review aimed to provide a comprehensive overview of QbTesting, it was not possible to include all primary research, due to limited database access. Potential bias relating to the reporting of confidence intervals was noted throughout this review. Only three papers included confidence intervals; failure to report these increases the likelihood of high variability in samples, indicating a likelihood of bias (Aveyard, 2018). Consequently, caution should be employed when interpreting the results of these studies, due to weak experimental design.
Moreover, some studies used small sample sizes, and high drop-out rates were also reported (Emser et al, 2018; Adamou et al, 2022). This is indicative of attrition bias, with Hollis et al (2018) reporting 153 cases of missing data and Johansson et al (2021) 15 cases, limiting the certainty of results. Researchers did not state whether the effects of attrition bias were examined.
Additionally, researchers such as Hult et al (2018) did not implement blinding, increasing the potential for observer bias and demand characteristics, while Bijlenga et al (2019) failed to implement randomisation when using a convenience sample, increasing the likelihood of shared characteristics. This reduces the generalisability of claims and may have led to further bias.
Conclusion
Several papers acknowledged that the QbTest was unable to identify symptoms of impulsivity, but there is no reliable explanation for this. Additionally, the QbTest cannot differentiate symptoms of ADHD from comorbid conditions, with gender differences also poorly understood (Slobodin and Davidovitch, 2019; Johansson et al, 2021). Issues around a lack of standardisation, poor ecological validity and diagnostic accuracy contradict research supporting the QbTest. It must therefore be implemented with caution in neurodevelopmental assessments across healthcare services. Ultimately, further interpretation of, and research into, QbTest reliability is required to strengthen diagnostic accuracy.