Progress towards an MVP to globally benchmark AI symptom assessments


“In the beginning, no one could really imagine how such complex systems like ‘AI symptom checkers’ could be benchmarked, but now this topic group is showing to be actually cracking this nut.” Developing an internationally recognized, standardized benchmarking framework to measure the medical accuracy and performance of AI symptom assessments is a tough nut to crack. Professor Dr. Thomas Wiegand, chair of the focus group on AI for Health (AI4H), hinted at both the inherent challenges and the promising progress at the group’s eighth meeting, held in Brasilia, Brazil, in January 2020.

AI4H is a joint initiative of the World Health Organization (WHO) and the International Telecommunication Union (ITU) that brings together academia, industry, and governmental stakeholders to drive the application of AI in health by establishing an evaluation framework. AI-based symptom assessment benchmarking is one of 18 AI4H topic groups – others focus on detecting malaria, identifying falsified medicine, and predicting cardiovascular disease.

The globally decreasing supply of health workers and the increasing demand for healthcare have made the widespread uptake of AI symptom assessment tools inevitable. Despite the challenges, a benchmarking system will be essential to evaluate these tools’ performance and how effectively they can support users and health systems.

Henry Hoffmann, our Director of Research and one of the pioneers of Ada's AI framework, is the driver of the AI4H symptom assessment topic group. He unites a growing community of stakeholders and coordinates the collective progress towards a standardized benchmarking framework for everyone in the symptom assessment field. The topic group represents a meeting of minds, many with products in the same competitive field, and fuses medicine and computer science: doctors, medical specialists, and public health experts work alongside technologists, software engineers, and data scientists. Following the successful meeting in Brazil, and some months after we introduced Ada’s role in AI4H, it’s time for an update on what the topic group has achieved together so far.

We collectively created a minimum viable benchmarking platform

After hosting two topic group workshops, one in London and one in Berlin, the group worked out the technical and medical details of an initial ‘simple’ benchmarking system that uses test data and test AI models. In practice, our topic group’s medical experts selected several diseases and linked them to symptoms, building a synthetic data set to test the AI models. This simple framework provided important insights for the first benchmarking exercise, which will test group members’ actual AI with real-world data later in 2020.
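To make the idea more concrete, here is a minimal sketch of how a synthetic data set and a toy benchmark might fit together. It is an illustration only, not the topic group's actual platform: the disease-to-symptom map, the two toy models, the function names, and the top-k accuracy metric are all assumptions chosen for this example.

```python
import random

# Hypothetical disease -> symptom map. Illustrative only: this is not
# medical advice and not the topic group's actual data.
DISEASE_SYMPTOMS = {
    "common cold": ["runny nose", "sore throat", "cough"],
    "influenza": ["fever", "cough", "muscle aches"],
    "migraine": ["headache", "nausea", "light sensitivity"],
}


def make_synthetic_cases(n, seed=0):
    """Draw n cases: pick a disease, then present two of its symptoms."""
    rng = random.Random(seed)  # seeded so the data set is reproducible
    cases = []
    for _ in range(n):
        disease = rng.choice(sorted(DISEASE_SYMPTOMS))
        symptoms = rng.sample(DISEASE_SYMPTOMS[disease], 2)
        cases.append({"symptoms": symptoms, "expected": disease})
    return cases


def overlap_model(symptoms):
    """Toy 'AI': rank diseases by how many presented symptoms they explain."""
    def score(disease):
        return len(set(symptoms) & set(DISEASE_SYMPTOMS[disease]))
    return sorted(DISEASE_SYMPTOMS, key=score, reverse=True)


def constant_model(symptoms):
    """Toy baseline: always returns the same ranking, ignoring the symptoms."""
    return sorted(DISEASE_SYMPTOMS)


def top_k_accuracy(model, cases, k=1):
    """Fraction of cases whose expected disease is in the model's top k."""
    hits = sum(case["expected"] in model(case["symptoms"])[:k] for case in cases)
    return hits / len(cases)


if __name__ == "__main__":
    cases = make_synthetic_cases(50, seed=1)
    for model in (overlap_model, constant_model):
        print(model.__name__, top_k_accuracy(model, cases, k=1))
```

In this toy setup the overlap model scores 1.0 by construction, because every case presents two symptoms of a single disease and no two diseases share more than one symptom. Real-world cases are far noisier, which is why the simple system is only a first step towards benchmarking actual AI with real-world data.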

We welcomed an interdisciplinary community of experts

We have grown from being the only member of the AI4H symptom assessment topic group to a group of 14 companies with symptom assessment or clinical decision support tools, and we continue to encourage new members to join. Since the last workshop in Delhi in November 2019, Buoy Health, mfine, 1DOC3, and MyDoctor have joined our group – the latter two representing the group with us at the Brasilia meeting. New community members spanning the fields of medicine, AI, ethics, and data science continue to expand the group’s expertise.

We connected with new public health and ethics experts

The Brazil workshop was an opportunity to see progress from the other AI4H topic groups, and also to forge new connections with local and global health experts interested in contributing to the AI symptom assessment topic group. Dr. Andreas Reis, co-lead of the WHO’s global health ethics team, will advise on ethical implications; Dr. Alejandro Lopez Osornio, a family physician and medical informatics specialist, will consult on defining a joint ontology for benchmarking test data; and Dr. Ana Riviere-Cinnamond, a WHO and Pan American Health Organization public health expert in disease surveillance and prevention, will add a pan-American perspective on AI symptom assessments.

[Image: Henry and Martina standing in front of the event sign.]

The global standardized benchmarking framework aims to provide stakeholders – regulators, the WHO, governments, and others – with an objective measure of AI symptom assessment accuracy and trustworthiness. One of the topic group’s next steps will be to learn, at the International Medical Device Regulators Forum in Singapore in March 2020, what would make the symptom assessment benchmarking initiative a success for regulators. Until then, read how healthtech pioneers and regulators can work together.