One-on-one with Tom Dyer: What it takes to successfully build ML for healthcare.

Lakera
6 min read · Nov 23, 2022


This article was originally posted on our company website. Lakera’s developer platform enables ML teams to ship fail-safe computer vision models.

We recently interviewed Tom Dyer, one of our product users, who brings extensive experience building computer vision solutions for the healthcare industry. We took away plenty of insights and would love to share them with our readers, especially those looking to build AI for healthcare.

If you are in a hurry, below are our top takeaways from the interview:

  • Improving metrics is not enough. Understanding the clinical context and the key medical challenges plays a significant role.
  • ML teams working in development silos is a complete no-go in healthcare; having medical professionals and other stakeholders on the team is key.
  • Building systems that generalize requires understanding context in detail: every hospital may require different classification thresholds, for example.
  • For MLOps tools in healthcare, it’s all about build vs. buy.

What was the biggest culture shock when transitioning from ML research to an industry like healthcare?

In ML research, doubling down on a given metric is well-accepted. For instance, maximizing the accuracy on ImageNet. However, we must step away from exactly that in the healthcare industry. The major challenge in healthcare solutions is to identify the right metric or group of metrics that best describes the desired behavior in the first place.
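As a minimal illustration of that shift, here is a sketch of reporting a panel of clinically relevant metrics instead of a single accuracy score. The function name and default threshold are our own illustration, not something from the interview:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def clinical_metric_panel(y_true, y_score, threshold=0.5):
    """Report several clinically relevant metrics rather than one accuracy number."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases we catch
        "specificity": tn / (tn + fp),  # share of healthy patients we correctly clear
        "ppv": tp / (tp + fp),          # how trustworthy a positive call is
        "npv": tn / (tn + fn),          # how trustworthy a negative call is
    }
```

Which of these matters most, and at what operating point, is exactly the clinical-context question that has no ImageNet-style default answer.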

The common mentality is to throw as many GPUs as possible at a large model. But in practice, simpler algorithms often work best, and the challenges lie elsewhere.

What is important when building a high-functioning team in such settings?

It is critical to understand the problem statement end-to-end and to interact with the broader team, rather than have ML researchers work in development silos. These are vital tasks, but they are also the most challenging to achieve and maintain.

A holistic team with a common goal should be set up that includes clinicians, product managers, data scientists, ML researchers, and regulatory experts who can all contribute to the solution. Only by understanding the problem from all angles can ML teams scope it and solve it in a way that creates deep clinical value.

What do you think is the gap between SOTA research on medical imaging and production systems?

Research often focuses on ‘big ticket’ items that sound impressive for patient outcomes. Academics might receive large amounts of funding to improve the diagnosis of rare or severe conditions, yet the clinical value of the resulting models is often limited.

Why? Context matters. While some diagnoses are incredibly important, the scarcity of positive cases and high patient volumes can often mean clinicians have to sift through false positives. SOTA often focuses on demonstrating that an ML model can perform a given task, but it mostly lacks the context around which the model will be used.
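To make the false positive point concrete, here is a small back-of-the-envelope calculation. The sensitivity, specificity, and prevalence figures are hypothetical, chosen only to illustrate the effect of low prevalence:

```python
# Hypothetical screening scenario: a rare condition at high patient volume.
sensitivity = 0.90   # assumed: model catches 90% of true cases
specificity = 0.90   # assumed: model clears 90% of healthy patients
prevalence = 0.001   # assumed: 1 in 1,000 patients actually has the condition

# Bayes' rule: probability that a flagged patient truly has the condition (PPV).
ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)
print(f"PPV: {ppv:.1%}")  # ~0.9%: roughly 111 false alarms per true positive
```

Even a model that looks strong on paper can bury clinicians in false positives once prevalence and volume enter the picture.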

“SOTA often focuses on demonstrating that an ML model can perform a given task, but it mostly lacks the context around which the model will be used.”

Another issue is that the visual features that are learned by standard models are often not directly useful for clinical tasks. Simply expecting features learned on ImageNet to transfer to the problem at hand is often not enough.

What are the most challenging problems in bridging that gap?

The most challenging problem is identifying the model that will generalize best to the clinic. It is essential to consider not only performance on test data but also the medical context. There is no quick fix for this and no escape from a great deal of hands-on work: you have to iteratively look at edge cases and improve the system accordingly.

A perfect example of how context influences the system’s design is the selection of classification thresholds. It is a challenging task since the threshold depends on the distribution of demographics in a given hospital. For instance, a clinic with a higher average age may require different thresholds than one with mostly younger people in order to hit the required false positive rate. This means that decisions for individuals could differ between hospitals, creating a very interesting tradeoff between group performance (e.g. false positives) and individual performance.
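To illustrate, here is a minimal sketch of calibrating a site-specific threshold to hit a target false positive rate on that hospital's own negative validation cases. The target rate and the two demographic score distributions are hypothetical:

```python
import numpy as np

def threshold_for_target_fpr(scores_negative, target_fpr=0.05):
    """Pick the score cutoff so that at most `target_fpr` of a site's
    negative (healthy) validation cases are flagged as positive."""
    # Scores above the (1 - target_fpr) quantile of negative-case scores
    # are false positives, and they make up ~target_fpr of all negatives.
    return float(np.quantile(scores_negative, 1.0 - target_fpr))

# Two hospitals with different demographics produce different score
# distributions for healthy patients, and hence different thresholds.
rng = np.random.default_rng(0)
site_a_negatives = rng.beta(2, 8, size=5000)  # e.g. younger population
site_b_negatives = rng.beta(3, 6, size=5000)  # e.g. older population
print(threshold_for_target_fpr(site_a_negatives))  # lower cutoff
print(threshold_for_target_fpr(site_b_negatives))  # higher cutoff
```

The same patient could therefore land on different sides of the decision boundary at different hospitals, which is exactly the group-versus-individual tradeoff described above.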

How do you know what to do next when improving your model?

It is essential to look at the model’s performance and identify

  • patterns or characteristics of the failure cases
  • specific dimensions of problematic performance (e.g., missing characteristic X of a positive case, performing worse on subgroup A, etc.)
  • feedback from customers and end users

A typical cycle is to identify the characteristic errors, find similar images, add them to the training set, and iterate until the errors are no longer present.
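Here is a rough sketch of the “find similar images” step, assuming image embeddings are already available for both the failure cases and a pool of unlabeled data. The setup and names are hypothetical, not a description of Tom’s actual pipeline:

```python
import numpy as np

def mine_similar_cases(failure_embeddings, pool_embeddings, pool_ids, k=50):
    """For each failure case, retrieve the k nearest unlabeled images by
    cosine similarity, so they can be annotated and added to training."""
    f = failure_embeddings / np.linalg.norm(failure_embeddings, axis=1, keepdims=True)
    p = pool_embeddings / np.linalg.norm(pool_embeddings, axis=1, keepdims=True)
    sims = f @ p.T                              # cosine similarity matrix
    nearest = np.argsort(-sims, axis=1)[:, :k]  # top-k pool indices per failure
    return {pool_ids[j] for row in nearest for j in row}

# The loop: evaluate -> collect failures -> mine similar images ->
# annotate -> extend the training set -> retrain -> repeat until the
# characteristic errors disappear.
```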

“A key component of improving a model is thus to iteratively improve and refine the test set to better account for the intended use case.”

Building a test set is a challenging and often iterative process. For instance, a lot has been said about having national-level datasets from the NHS in England to evaluate models nationwide. However, this is very challenging because use cases differ so much and developers build different definitions into their products. It is difficult to define the right level of annotation, and every application may define concepts differently. At what granularity does one define a medical condition? What tradeoffs does one aim for with model predictions? It would be challenging to account for all these variations in a unified national dataset. A key component of improving a model is thus to iteratively improve and refine the test set to better account for the intended use case.

Where do you see the role of MLOps tools in this process?

We have given significant thought to MLOps, and it is always a “build or buy” decision. We took a deep dive into active learning and spent a lot of time talking about it. Ultimately, we decided to implement everything ourselves, since the tools are not quite there yet for our medical use case.

Lakera’s MLTest, however, was an exception. It has significantly accelerated the development of our chest x-ray technology by identifying and highlighting important performance issues early on during development. Lakera’s products enable us to rigorously test our computer vision models and prepare for MDR certification — both would have previously consumed a lot more resources from our team.

“Above all, it’s been important for us that these tools fit in with everyday workflows and support our path towards certification.”

Weights & Biases is very useful as well, and we also looked into Voxel51.

Above all, it’s been important for us that these tools fit in with everyday workflows and support our path towards certification.

What are your top tips for folks getting started in the space?

Learn about the clinical pathways and the clinical context. The best algorithms could be discarded if the interests of most stakeholders are not aligned. Setting the right context involves answering a variety of questions: Where is the algorithm going to be used? What are the healthcare costs? How does it impact patients? Who pays for it? It is best practice to have a collection of models to explore these different contexts.

My second piece of advice would be to never work in a silo, isolated from clinicians, regulators, or case-specific experts. Healthcare is a regulated environment, and failing to involve domain and regulatory experts from day one would only delay clinical pilots and discussions and force multiple avoidable iterations. Make sure such conversations are part of the engineering workflow.

This interview was conducted by Mateo. We hope you found it as illuminating as we did. Learn more about how MLTest can help you during development and certification here, head over to our insights blog, or follow us on Medium — curated by top experts in the industry!

For any questions, feel free to reach out to Mateo at mateo@lakera.ai or follow us on LinkedIn, Twitter, and Instagram.
