Four ways we’re measuring medical quality

Practicing as a primary care physician in North West London was a formative experience. It helped me understand how meaningful connection, trust, and the quality of the doctor–patient relationship can have a big impact on a patient and those close to them. Since then, I’ve been wrestling with how to bring this thinking into digital health so that we can scale that impact.

It’s been 2 years, but I still remember a chat I had with Dr. Dipesh Gopal while we were working in the UK’s National Health Service. He quoted Paul Freeling, a founder of academic practice for doctors:

“If you're in clinical practice, you can help thousands of patients. If you're teaching medical students and young doctors, then you're going to change the lives of hundreds of thousands of patients. But by doing academia, research, you're changing the way we practice and changing the lives of millions.”

After 5 years working in digital health, I’d add that doctors working in digitization are creating tools that could change the lives of billions. But how can we realize the potential for a huge positive impact on health outcomes while avoiding unintended harm?

At Ada, our medical experts and probabilistic reasoning technology provide high-quality medical information to 10 million people wherever they are, whenever they need it, in 1 of 11 languages and within minutes. Naturally, health organizations that are interested in working with us to accelerate treatment pathways are curious about how we ensure medical quality at this scale.

We’ve noticed this is particularly important for life sciences companies, which operate in a highly regulated and evidence-driven environment. It’s not enough to simply claim in marketing materials that digital products are effective and safe. Providers must prove it. So, in the spirit of building trust in the clinical rigor invested in our digital health solutions, here are some of the initiatives we have in place at Ada to measure, regulate, and continuously optimize medical quality.

Internal tests

We measure medical quality for our symptom assessment technology across 4 factors.

Coverage. Have we covered a comprehensive range of conditions in priority therapeutic areas, such as rheumatology and oncology?
Accuracy. Are the top 3 suggested conditions similar to the list of differential diagnoses a human doctor would provide?
Medical safety. Is the care advice we provide ‘just right’ for the likely condition?
User experience. Has the user told us that this genuinely helped them?
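To make the accuracy factor concrete, here is a minimal sketch of one common way to score it: the share of test cases where the correct condition appears among the top 3 suggestions. This is an illustration only, not Ada's actual implementation. The function name top_k_accuracy, the case format, and the sample cases are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Each case pairs a ranked list of suggested conditions with the correct one.
# The case format is hypothetical and chosen only for this illustration.
Case = Tuple[Sequence[str], str]


def top_k_accuracy(cases: List[Case], k: int = 3) -> float:
    """Return the fraction of cases whose correct condition is in the top k."""
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(1 for suggestions, correct in cases if correct in suggestions[:k])
    return hits / len(cases)


# Invented sample cases, not real clinical data.
cases: List[Case] = [
    (["influenza", "COVID-19", "common cold"], "influenza"),
    (["asthma", "bronchitis", "pneumonia"], "pneumonia"),
    (["migraine", "tension headache", "sinusitis", "cluster headache"], "cluster headache"),
    (["gastroenteritis", "appendicitis", "irritable bowel syndrome"], "appendicitis"),
]

print(top_k_accuracy(cases))  # 3 of 4 cases hit within the top 3
```

A score like this only captures part of the picture. Comparing against the full list of differential diagnoses a doctor would give, rather than a single correct answer, would need a richer case format.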

Ensuring medical quality is a continuous improvement process. That’s why we optimize condition coverage every 2 weeks. Before a new condition is available to users, we conduct a rigorous internal testing process to make sure the change will have a positive effect on accuracy, safety, and user experience, and will not compromise conditions we already cover.

Let’s take COVID-19 as an example. We ensured other common respiratory conditions, such as asthma or influenza, would still be suggested at an accurate rate once we added COVID-19 as a condition suggestion for our users.

A prototype model of a condition such as COVID-19 is first tested against our clinical scenarios. These are brief narratives collected from the scientific and medical literature that describe the initial presentation of signs and symptoms, the differential diagnoses a doctor might consider, and the correct diagnosis.
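The idea of checking that a new condition does not compromise existing ones can be sketched as a simple regression check. This is a hedged illustration under stated assumptions, not a description of Ada's internal tooling: the function find_regressions, the per-condition rates, and the 0.02 tolerance are all hypothetical.

```python
from typing import Dict, Tuple


def find_regressions(
    before: Dict[str, float],
    after: Dict[str, float],
    tolerance: float = 0.02,
) -> Dict[str, Tuple[float, float]]:
    """Flag conditions whose suggestion rate dropped by more than `tolerance`.

    `before` and `after` map a condition name to the fraction of its clinical
    scenarios in which the model suggested it, measured without and with the
    prototype change. Conditions that only exist in `after` are new and are
    not treated as regressions.
    """
    regressions = {}
    for condition, old_rate in before.items():
        # A condition missing from `after` counts as never suggested.
        new_rate = after.get(condition, 0.0)
        if old_rate - new_rate > tolerance:
            regressions[condition] = (old_rate, new_rate)
    return regressions


# Invented rates for illustration only.
before = {"asthma": 0.90, "influenza": 0.85}
after = {"asthma": 0.89, "influenza": 0.78, "COVID-19": 0.80}

print(find_regressions(before, after))  # flags influenza, not asthma
```

In this made-up example, adding COVID-19 would be held back until the drop for influenza is understood and resolved.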

We separate the experts on our Medical Knowledge Team and our Clinical Validation Team to avoid bias or ‘over-fitting’ our model to data gathered from the literature.

Then, user-facing condition information is localized into every language. Clinical oversight is essential to our translation process. It ensures people using Ada can decode medical terms in their own language to accurately report their symptoms. Even just in English, consider the medical nuance between ‘pain’ and ‘tenderness’, then imagine how that nuance needs to be managed across languages and cultures.

That’s why qualified medical professionals who are native speakers of Ada’s 11 languages are responsible for translations. We run usability tests here too. Once the translations have been checked and finalized, a new version of the disease model is released as an update so people can benefit the next time they use Ada.

We also conduct medical verification tests. In a recent example, we worked with 11 independent doctors who created 1,268 clinical cases. We use these to test Ada’s performance before every release to cover a comprehensive range of scenarios.

Clinical validation

As our life sciences partners know, what works in carefully controlled laboratory conditions isn’t always replicated in the real world. To complement internal testing, our Clinical Evaluation Team collaborates with leading doctors, researchers, and healthcare institutions to test our performance. Let’s review some highlights of our published peer-reviewed evidence.

A study we’re particularly proud of was recently published in the British Medical Journal. It compared the performance of 7 primary care doctors with 8 digital symptom assessment tools, including Ada.

The results showed that in the 200 real-life clinical scenarios researchers tested:

Human doctors covered 100% of conditions, and Ada covered 99% – including obstetric, pediatric, mental health, and pre-existing conditions.
Human doctors had 82% accuracy, and Ada was the next closest at 71%.
Human doctors and Ada together provided advice that was safe 97% of the time.

In another peer-reviewed study, our collaborators at Stanford University measured the medical safety of Ada’s digital triage support system for people in the Sutter Health system.

Stanford researchers said Ada’s “triage recommendations were comparable to those of nurse-staffed telephone triage lines.”

Their analysis of 26,646 assessments found that many people were seeking answers outside typical consultation hours, and that over a quarter of assessments indicated a high degree of urgency. You can read the full study in the Journal of Medical Internet Research.

A peer-reviewed study with the UK’s National Health Service measured usefulness to patients in a primary care setting. Over a 3-month period, results from 523 patients showed:

98% found Ada easy to use
85% said they’d recommend Ada to a friend or relative
13% said Ada would have saved them an unnecessary trip to the doctor.

I’m always interested to learn how other healthcare professionals use Ada. A study in a Stuttgart emergency department measured how helpful nurses and doctors found Ada’s suggestions in terms of accuracy and medical safety:

100% of nurses said Ada’s clinical handover report was useful
73% of doctors agreed.

While internal tests and clinical validation help us measure and improve medical quality, regulatory milestones provide an objective stamp of approval that our quality processes are ready for the real world.

Industry regulation

Our life sciences partners understand the necessity of transparent, well-run quality management processes. As the manufacturer of a Class I CE-marked medical device, we take our responsibilities seriously. Moreover, our company-wide quality management and information security systems are regularly audited.

We’re ISO 27001 certified for our information security, and ISO 13485 certified for our quality management systems. At our last ISO 27001 review, the auditors mentioned they were “impressed by how seriously Ada takes information security”.

We’ve been recognized as leaders in the field by global stakeholders. Henry Hoffmann, our Director of Research and a pioneer of Ada's AI, is the topic driver for the symptom assessment group of AI for Health (AI4H). A joint initiative of the WHO and the International Telecommunication Union (ITU), AI4H unites academia, industry, and governments to establish a standardized framework for measuring AI health applications.

The group also brings together several symptom assessment tool providers, including Healthily, Infermedica, and Tanzania's Inspired Ideas, to work towards an independent measurement of medical quality indicators in AI. We see it as a rising tide of digital health stakeholders lifting the performance of all symptom assessment tools.

Building trust in this work requires input from clinical experts around the world. The AI4H clinical evaluation working group, which I support as co-chair, brings these experts together to draw on their knowledge and on the great work already done in the field. It helps ensure the clinical evaluation of AI tools is relevant, inclusive, and implementable.

User feedback

All the testing, validation, and regulatory processes in the world mean nothing if people don’t actually use the product. Our Marketing Team tells me there are more than 400,000 healthcare apps across the App Store and Google Play, but very few have succeeded. Most have fewer than 10,000 downloads.

In such a competitive environment, only technology that genuinely helps people can succeed.

Our medical quality improvements benefit from regular user feedback – gathered from within the assessment and more than 100,000 reviews on the App Store and Google Play.

Proactively learning from this feedback has helped us maintain an average rating of 4.7 stars across more than 350,000 ratings, including 250,000 5-star ratings. We’re a bit shy about it (just kidding), but Ada has recently been within the top 15 medical apps for the UK, Brazil, Germany, and Spain.

We’re continuously learning and striving to improve our medical quality, and we’re committed to delivering better health outcomes for our users and partners. We’re seeking partners who share these values and who are willing to work with us to expand our impact.

If that's you, please get in touch.