Health in the Negative Space: What AI Misses Matters Most
- One HealthTech
- Sep 23, 2025
- 6 min read
I spend a fair bit of time around data and data-y people - helping build datasets, testing architectures, and working with others on how to design algorithms. I’ve had the chance to work across lots of different areas - from dementia and human genomics to infectious diseases and NHS records - and if something has columns and contradictions and touches health, I’ve probably tinkered with it at some point. Through those conversations, I keep coming back to a simple question: how do we make visible the invisible - the people, the context, the experiences - so we can say, with some honesty, that data helps care and saves lives?
Designers talk about “negative space” - the empty area around the main subject, the background, white space, the bit that looks like nothing and yet gives everything its shape. Alongside the “positive” space, it defines edges, creates balance, draws the eye to the focal point, and even sets the mood. Health data has negative space too: the bits we don’t collect, code, or count. In health, those gaps aren’t decorative: they nudge decisions, and sometimes outcomes. When AI in medicine seems to miss what matters, it’s often because we’ve drawn the boundaries in the wrong place - the blind spots show you where the system isn’t looking.
A short history of what we didn’t see
The shadows didn’t turn up with AI; they’ve been here for bloody ages, kids. Data grew up alongside big claims and wobbly categories. Francis Galton helped stitch statistics to hierarchy, making it seem reasonable to rank human beings and to treat “race” as if it were biological fact. In Tuskegee, hundreds of Black men were enrolled in a study that withheld treatment so doctors could watch syphilis “naturally” unfold. Henrietta Lacks’s cells were taken without consent and went on to fuel huge advances while her own story barely featured. These aren’t the whole history, of course - just a few reminders of how defaults get set about who gets measured, who gets trusted, and who gets treated. It’s no surprise, then, that over time absences and abuses harden into norms.
A frame I find helpful comes from Ruha Benjamin’s Race After Technology. Data doesn’t arrive pure; it’s shaped by choices - what we collect, how we classify, who we include or leave out, and which outcomes we care about. Once those choices are coded and scaled, assumptions start to look like facts. The problem isn’t evil algorithms so much as obedient ones: systems that follow the brief we’ve written… often without quite noticing we wrote it.
“Data and technologies are socially constructed, and embed design decisions, assumptions, values, and ideologies that can be discriminatory and generate social harm” - Ruha Benjamin
When gaps do damage
During COVID, a very ordinary device reminded us that technology really doesn’t work equally well for everyone. Pulse oximeters - fixtures on wards and in living rooms - can over-read oxygen saturation on darker skin. A number that looks reassuring when it isn’t can slow escalation, and that delay matters. In the UK, this prompted a review into equity in medical devices (commissioned by Sajid Javid during his stint as Health Secretary), because it was recognised that design choices had quietly assumed a default user and invited everyone else to fit around it. It’s the kind of assumption that looks clear and cheap in a lab, but murky and costly at the bedside.
You see the same pattern when a model is pointed at the wrong target - the literature is littered with examples. One that stuck with me is a hospital risk tool in the US, designed to allocate extra support (nurse calls, quicker appointments, home visits and so on) using past healthcare spending as a proxy for need. Because Black patients often have lower recorded spend, largely due to access and affordability, the system read them as healthier at the same score. Same score, sicker patients. When researchers tuned the model to actual health need rather than cost, eligibility for extra care more than doubled for Black patients. The statistics weren’t the problem; the crude, lazy use of a proxy was. Time and again it’s clear that if you aim at the wrong thing, you can hit it with great precision and still miss the point.
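To make the proxy problem concrete, here’s a minimal sketch - synthetic data, hypothetical field names and threshold, not the published model - showing the shape of the failure: two patients with the same need, different recorded spend, and a cut-off that only one of them clears.

```python
# Illustrative sketch (synthetic data, hypothetical fields): why training a
# risk tool on past spending rather than health need can hide sicker patients.
from dataclasses import dataclass

@dataclass
class Patient:
    name: str
    chronic_conditions: int   # rough stand-in for actual health need
    past_spend: float         # recorded healthcare spending (the proxy)

# Two patients with identical need but different recorded spend, e.g. because
# one faced barriers to accessing (and therefore being billed for) care.
patients = [
    Patient("A", chronic_conditions=4, past_spend=12_000),
    Patient("B", chronic_conditions=4, past_spend=4_000),
]

def risk_by_spend(p: Patient) -> float:
    """Proxy target: predicted future cost (here just past spend, rescaled)."""
    return p.past_spend / 1_000

def risk_by_need(p: Patient) -> float:
    """Direct target: a crude count of active chronic conditions."""
    return float(p.chronic_conditions)

THRESHOLD = 8.0  # hypothetical cut-off for enrolment in extra support

for p in patients:
    print(
        f"{p.name}: spend-based score={risk_by_spend(p):.1f} "
        f"(eligible={risk_by_spend(p) >= THRESHOLD}), "
        f"need-based score={risk_by_need(p):.1f}"
    )
# Patient B has the same need as A but a lower spend-based score, so the
# proxy-trained tool quietly leaves them below the enrolment threshold.
```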
Where the boundaries go a bit wobbly
I keep finding the same three places where we quietly lose clarity.
Boxes versus people - how categories help, and how they hurt
Categories exist for practical reasons. They make data entry quicker, audits possible, and research comparable. The trouble starts when a tidy label is treated as biology or destiny. Take race, for example. It’s a social category with a messy history, but it has often been used as if it were a biological variable. That can creep into clinical formulas. The ethnicity “correction” once applied to kidney function (eGFR) assumed, on average, different muscle mass across racial groups and adjusted results accordingly. In practice, that sometimes pushed people (often Black patients) over a treatment threshold and delayed referrals. The model wasn’t “trying” to be unfair; it was faithfully following an assumption that didn’t hold well at the level of the individual.
A general lesson I’ve found useful is to use categories as starting points for questions, not endpoints for decisions. If a category stands in for something causal, say, long-term exposure, access, or a specific genetic variant, try to measure the thing itself. Where that’s hard, be explicit that the category is only a rough proxy and keep an eye on who gets nudged across thresholds because of it.
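To make the threshold-nudging point concrete, here’s a minimal sketch with an illustrative multiplier and referral cut-off - not a clinical calculator, and not the exact historical equation - showing how a multiplicative “correction” can carry the same result across a decision line.

```python
# A minimal sketch (illustrative numbers, not a clinical tool) of how a
# multiplicative ethnicity "correction" can nudge someone across a referral
# threshold. The coefficient and cut-off here are hypothetical, chosen only
# to show the mechanism.
RACE_COEFFICIENT = 1.16      # illustrative multiplier once applied to some patients
REFERRAL_THRESHOLD = 30.0    # hypothetical eGFR below which referral is considered

def adjusted_egfr(base_egfr: float, apply_correction: bool) -> float:
    """Return eGFR with or without the historical-style ethnicity multiplier."""
    return base_egfr * RACE_COEFFICIENT if apply_correction else base_egfr

base = 27.0  # the unadjusted estimate sits below the referral threshold
for corrected in (False, True):
    value = adjusted_egfr(base, corrected)
    print(f"correction={corrected}: eGFR={value:.1f}, "
          f"referred={value < REFERRAL_THRESHOLD}")
# With the multiplier applied, 27.0 becomes ~31.3 and the patient no longer
# crosses the referral threshold - same kidneys, different decision.
```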
Illness versus life - why context is clinical
Hospitals see illness; most of health happens outside the building. This sounds obvious, but it’s easy for data systems to forget. A model trained only on what’s inside the record will miss the reasons people arrive late, deteriorate faster, or don’t bounce back. Think about the everyday path from symptom to clinic: can you get time off work, childcare, and transport? Is there a GP appointment within reach? What is the air quality on your street, the damp in your flat, the food options near home? Each step slightly shifts the odds of when you present, how severe you are, and how well you can follow a plan. That is why the same condition can have very different outcomes across neighbourhoods. For models, the implication is that if we can observe context (continuity of care, language needs, deprivation indices, housing instability), we must bring it into view. If we can’t, we need to say so and caveat that key context is missing - and that caveat isn’t a weakness; it’s safer than a confident number built on half the story.
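One way to make that caveat concrete - sketched below with hypothetical field names rather than any real NHS schema - is to carry explicit missingness flags alongside the contextual features, so a confident-looking number can’t quietly hide half the story.

```python
# A hedged sketch of "say so when context is missing": a feature builder that
# carries an explicit flag for every unobserved contextual field. Field names
# are hypothetical and purely illustrative.
from typing import Optional, TypedDict

class Context(TypedDict, total=False):
    deprivation_decile: Optional[int]   # e.g. an area deprivation decile
    interpreter_needed: Optional[bool]
    housing_instability: Optional[bool]

CONTEXT_FIELDS = ("deprivation_decile", "interpreter_needed", "housing_instability")

def build_features(record: Context) -> dict:
    """Return model features plus a missingness flag per contextual field,
    so downstream users can see how much of the story is actually observed."""
    features = {}
    for field in CONTEXT_FIELDS:
        value = record.get(field)
        features[field] = value
        features[f"{field}_missing"] = value is None
    return features

print(build_features({"deprivation_decile": 2}))
# -> interpreter_needed and housing_instability come back flagged as missing:
#    the caveat, made machine-readable.
```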
Metrics versus meaning - picking targets that match what matters
We tend to measure what’s easy to record: blood pressure, costs, waiting times, because those are standardised and auditable. But patients live in what’s meaningful: pain, function, trust, loneliness, respect. When the recorded metric is only loosely related to the thing that matters, we end up optimising the wrong target.
Knee osteoarthritis is a helpful case. Radiographic severity (the Kellgren–Lawrence grade) is clean and comparable, so it became the default. Yet it explains only a small slice of why some groups report more pain than others. When researchers trained a model on experienced pain (what patients actually feel) rather than the radiographic grade alone, much more of the disparity came into focus. The message wasn’t “patients were exaggerating”; it was that our usual measure was potentially under-seeing severity for many people, especially those with fewer resources. That one study definitely needs more digging to understand what’s going on, but it’s a good example of using AI to help us see where we’re often bad at seeing.
Two practical takeaways from this one. First, if a measure is a proxy, name it as a proxy and test how it behaves across groups. Second, when possible, aim the model at the outcome that matters (pain, function, recovery), not just the convenient stand-in. That makes the model more useful and, often, more fair.
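As a rough illustration of the first takeaway - synthetic numbers, group labels invented purely for shape - here’s a small check of how a proxy behaves across groups compared with the outcome that actually matters.

```python
# A small sketch of "name the proxy and test it across groups". All data and
# group labels are synthetic and illustrative.
from statistics import mean

# Each record: (group, proxy value e.g. a radiographic grade, outcome e.g. reported pain 0-100)
records = [
    ("group_1", 2, 35), ("group_1", 3, 40), ("group_1", 2, 30),
    ("group_2", 2, 55), ("group_2", 3, 65), ("group_2", 2, 50),
]

def proxy_vs_outcome_by_group(rows):
    """Compare mean proxy and mean outcome per group; a proxy that tracks the
    outcome differently across groups is a proxy worth worrying about."""
    summary = {}
    for group in {g for g, _, _ in rows}:
        proxies = [p for g, p, _ in rows if g == group]
        outcomes = [o for g, _, o in rows if g == group]
        summary[group] = {"mean_proxy": mean(proxies), "mean_outcome": mean(outcomes)}
    return summary

for group, stats in sorted(proxy_vs_outcome_by_group(records).items()):
    print(group, stats)
# Here both groups look identical on the proxy (the "clean" measure) while the
# outcome that matters diverges - a sign to aim the model at the outcome itself.
```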
Through the broken window
People often say AI is a mirror. I don’t necessarily see it that way. To me, it’s more like a broken window. It doesn’t give a tidy reflection; it lets you peer through the cracks into the rooms we usually miss. Those fractures tug your attention to thin datasets, proxy targets that don’t quite hold, and places where the boundary lines need redrawing. If we treat them with a bit of care, they’re not really defects but invitations to rebuild the frame.
When we start to fill the negative space (like when we unbox people, widen health, and count what actually counts) we find possibilities the standard metrics never show. We also dodge the most expensive mistake in healthcare… that of sounding certain when we’re not.
I keep a few small reminders nearby: model causes, not categories; context is clinical; if it works at the edges, it probably works in the middle. And I still lean on Adam Rutherford’s line:
“We must always expect science to be misrepresented, overstated and misunderstood, because it is complex, because the data is unending, and because people are strange” - Adam Rutherford
People are, after all, gloriously unpredictable. So we must build for that: keep real conversations in the loop - ask people, then ask them again. When in doubt, close the laptop and talk to a human. Do that and the negative space starts to shrink - not just a subject and a background, but something that looks a little more like everyone in the same frame.