What do the new CONSORT and SPIRIT guidelines mean for health AI?

The development of AI tools for healthcare took a significant step forward this week, with the publication of the AI-specific extensions to the CONSORT (Consolidated Standards Of Reporting Trials) and SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) statements. These are intended to set new reporting standards for clinical studies on AI tools. As one of the experts who contributed to the development of these guidelines, I wanted to share some thoughts on what they mean and why they matter.

What are CONSORT and SPIRIT?

The CONSORT and SPIRIT statements, first published in 1996 and 2013 respectively (with CONSORT most recently revised in 2010), are efforts by the clinical research community to come together and agree on what basic facts about clinical trials should be made known. CONSORT focuses on reporting the results of clinical trials, while SPIRIT focuses on publishing the clinical trial protocol before the trial is conducted.

Both take the form of a checklist: anyone reporting the results of a trial, or publishing its protocol, can go through the relevant checklist to make sure they’ve considered and shared all the information that the rest of the community needs to judge whether it is a well-conducted, rigorous study.

Following a hugely collaborative process across the AI community and industry, the new guidelines – which were published on September 9, 2020 – will play an important role in ensuring standardization of clinical AI research, which will in turn help to increase trust in AI as it becomes more widely used in healthcare settings.

I was privileged to be one of the experts recruited to participate in the Delphi process, which was used to arrive at a consensus on the items that need to be included in the extended checklists. The group also included other members of academia, industry, and fellow healthcare professionals, to give a comprehensive view of the issue from those who are working to deliver AI-powered solutions to users. I also reviewed pre-final versions of the guidelines, providing comments and editing for clarity.

Why AI needs its own guidelines

Clinical research is hugely important, but it can only have maximal impact if it is reported transparently and to a high standard. Doctors and patients will not base decisions on the results of research they don’t trust. The CONSORT-AI and SPIRIT-AI extensions do an excellent job of bringing the new field of AI into the orbit of established clinical research, while acknowledging the unique and nuanced challenges it presents. For instance, many forms of AI are dependent on huge datasets. To trust the results, researchers and the community at large need to know how this data is sourced and used in clinical research.

There are always strong incentives for researchers to report only positive results about their clinical trials using AI. Selective reporting could lead to negative patient outcomes and bad policy decisions. The CONSORT-AI and SPIRIT-AI statements encourage teams to conduct rigorous, replicable, and trusted research on the use of AI tools in a clinical setting. This should be the foundation on which the mainstream acceptance of AI in health is built. Moreover, with the changes in medical device regulation in the EU and the UK in the coming years, robust clinical evidence won’t just be a ‘nice-to-have’, but increasingly a regulatory requirement for medical AI tools.

Therefore, these guidelines elevate clinical AI research to the status of ‘normal’ clinical research, by demanding the same high standards of reporting and transparency.

An important milestone for health AI

While it may take some time for the full impact of these guidelines to be felt in frontline practices, the clinical AI community should be hugely excited about this development. It could signal a watershed moment for the field: acceptance of clinical AI research as part of mainstream clinical research. For some others in the healthcare community, this may seem very far from their day-to-day work, but I hope they will appreciate the efforts from many of us working on the use of AI in healthcare to be genuinely scientifically robust in our approach.

Those of us who work in AI for health need to be honest with ourselves – the time for hype is over. We should be focused on interventions that can demonstrably improve users’ health outcomes and healthcare experiences, and interventions that support the effectiveness of healthcare professionals. The only way we can know which those interventions are – and, in doing so, counter the quite understandable skepticism from those in the medical community who worry about these technologies being ‘unproven’ or ‘untested’ – is if we conduct good, strong clinical research.

By standardizing the way we conduct and report clinical research in AI, we can compare different tools, helping clinicians and patients decide which is the best tool for them to use in a particular situation.

Of course, to make the most of these new guidelines, the community will need to use them in the right way. And, as with any guidelines, regulations or standards developed for fast-moving technologies like AI, they will need to be regularly reviewed and updated to ensure that they remain accurate and relevant to the ever-increasing capabilities and applications of AI.

Open collaboration is critical

At Ada, we’ve always had a focus on high medical quality and robust clinical assessment. These are both essential to develop safe products that are acceptable to patients and healthcare professionals. We are also demonstrably committed to cross-industry collaboration to advance the AI sector as a whole. Alongside our modest contribution to the newly updated reporting guidelines, Ada is leading the development of a framework for standardizing the measurement of symptom assessment performance as part of a larger WHO/ITU initiative on AI for Health. I’m also part of a group of experts convened by the World Economic Forum that is working to develop guidelines for the use of conversational agents in healthcare.

My focus as Medical Safety Lead at Ada is on applying the latest thinking on how we develop safe digital health products rapidly and scalably, bringing together the best of software development and healthcare. If robustly developed, rigorously and continuously reviewed, and thoughtfully applied, AI tools have immense potential to deliver better health outcomes for users and healthcare professionals alike.

Guidelines like the CONSORT and SPIRIT statements are powerful tools to help guide this process, and I hope that the rest of the field will join us in embracing them so that user trust in digital health products can continue to grow.