
What Your Wearables Aren't Telling You
Your Apple Watch says you slept seven hours. Oura says six hours twenty-three. WHOOP gives you a recovery score of 64 and a strain of 12.4 that don't quite add up. Three good devices. Three different stories. This is what's going on under the hood.
Your Apple Watch says you slept seven hours. Oura says six hours and twenty-three minutes. WHOOP hands you a recovery score of 64 and a strain of 12.4 that refuse to reconcile with the green ring on a fourth screen. Three good devices, three different stories, and one body underneath them all, doing whatever it actually did last night regardless of how anyone scored it. The disagreement is not a malfunction. It is the most honest thing your wearables will ever tell you, if you know how to hear it.
Something quietly enormous has happened in the last decade. Hundreds of millions of people now sleep, work, and exercise with a sensor pressed against their skin, generating a continuous stream of numbers about the inside of their bodies. For most of human history that information was invisible, available only to a doctor with a machine on the rare day you visited one. Now it arrives every morning, unbidden, before coffee. And almost no one was taught how to read it. We have handed an entire generation a new sense organ and skipped the part where you learn what the signals mean.
That gap matters more than it sounds. A number you cannot interpret does not sit there neutrally. It does something to you. A low recovery score can talk you out of a workout you were ready for. A bad sleep grade can make you feel tired on a morning you woke up fine. People now wake up and check a score before they check in with themselves, and the score frequently wins. The skill of reading your own physiological data, of knowing which numbers to trust and which to shrug off, is becoming a real and unevenly distributed literacy. This essay is an attempt to teach a little of it.
Why your devices disagree
Start with the mechanics, because they explain almost everything downstream. Two devices give you different numbers for two reasons: they are measuring different physical signals, and they are guessing from those signals with different math.
They are not measuring the same thing
A wrist-worn optical sensor (Apple Watch, Fitbit, most Garmins) shines green light into the back of your wrist and reads how the reflection flickers as blood pulses through the tissue. This is photoplethysmography, or PPG, and it is an inference: it watches blood volume change and works backward to when your heart must have beaten. A ring (Oura) does the same trick at the finger, where the arteries sit closer to the surface and the signal is cleaner. A chest strap (Polar H10, Garmin HRM) does something categorically different. It reads the electrical signal of the heart itself, the same voltage a clinical ECG measures, picking up each beat at its source instead of inferring it three steps downstream.
That difference is not cosmetic. An electrical sensor knows exactly when a beat happened. An optical sensor knows when blood arrived at your wrist, which is a slightly blurrier, slightly delayed proxy, and which gets blurrier still when you move, when your hands are cold, when the band is loose, or when your skin tone absorbs more of the green light. None of these devices is lying. They are answering different questions and we keep treating the answers as if they were the same question.
Sleep stages are inferred, not observed
The deeper source of disagreement is that the most cited numbers on your wearable are not measurements at all. No consumer device watches your brain. The clinical method for staging sleep, polysomnography, glues electrodes to your scalp and reads brain waves directly. Your watch reads heart rate, motion, and skin temperature from the outside and runs a model that predicts what your brain was probably doing. It is an educated guess dressed up as a fact.
Each company trains that model on its own data, weights the inputs its own way, and tunes it for its own typical customer. So when Oura and WHOOP disagree about how much deep sleep you got, they are not contradicting each other about something either of them saw. They are offering two different opinions, formed by two different statistical models, about an event neither one witnessed. Understood that way, the disagreement stops being mysterious. It would be strange if they agreed.
How big is the gap, really
The size of the gap is worth knowing, because the honest answer is reassuring in some places and humbling in others. Validation studies that put consumer devices next to polysomnography in a sleep lab tend to land in a consistent range:
- Total sleep time: good devices land within roughly 15 to 35 minutes of the lab measurement on a typical night. Most over-estimate slightly, because lying still with your eyes closed looks a lot like sleep to a motion sensor.
- Sleep efficiency: usually within about 3 percentage points. One of the more trustworthy numbers on the screen.
- Deep sleep: the weakest link, often off by 30 minutes or more. Stage-level guessing is hard, and deep sleep is where the guesses scatter most.
- REM sleep: better than deep but still loose, commonly within about 20 minutes.
- HRV: close to a reference within a few milliseconds when measured at rest, but with a real, device-specific bias that makes cross-device comparison treacherous.
Notice the shape of this. The headline number, did I sleep, is reliable. The fine-grained breakdown, how much of it was deep, is the part people obsess over and the part the device is least sure about. A lot of wearable anxiety is people taking the shakiest number on the screen most seriously.
The variance is real, but it is not random, and that is the key that unlocks the whole problem. Apple Watch tends to over-estimate total sleep time. Oura tends to run conservative on deep sleep. WHOOP leans heavily on movement, so a restless sleeper gets marked down harder there than on Oura. Once you learn the direction of a device's error, its numbers become readable. A biased measurement you understand is far more useful than an unbiased one you do not.
The HRV problem, specifically
Heart-rate variability is the single most useful number a consumer wearable produces, and the single most misread. It is the tiny, beat-to-beat variation in the gap between heartbeats, and it is a genuine window into your autonomic nervous system: higher generally means more recovered and relaxed, lower means more stressed or fatigued. People compare their HRV across devices, panic at the difference, and conclude one device is broken. Two clarifications dissolve almost all of that confusion.
First, RMSSD versus SDNN. There is more than one way to turn a string of heartbeat intervals into a single HRV number. RMSSD (the root mean square of successive differences) measures the jitter between consecutive beats and tracks short-term, parasympathetic activity. SDNN (the standard deviation of normal-to-normal intervals) captures variation over a longer window and reflects more of the whole system. The two are computed from the same heartbeats and still produce different numbers, because they are asking different questions. Most consumer devices that say "HRV" without elaborating mean RMSSD. Oura and WHOOP report RMSSD, and Garmin reports it but also exposes SDNN. Apple Health is the notable exception: it stores HRV as SDNN. Comparing an RMSSD from one device to an SDNN from another and expecting a match is like comparing a sprint time to a marathon time and concluding one stopwatch is wrong.
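To make the difference concrete, here is a minimal sketch of both calculations in Python. The beat-to-beat intervals in `rr` are invented for illustration, and the sketch uses the sample standard deviation for SDNN; conventions vary, and no device's actual pipeline (with its filtering and artifact rejection) is this simple.

```python
import math
import statistics

def rmssd(rr_ms):
    """Root mean square of successive differences between adjacent intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def sdnn(rr_ms):
    """Standard deviation of all normal-to-normal intervals (ms).

    Assumption: sample standard deviation; some tools use the population form.
    """
    return statistics.stdev(rr_ms)

# Hypothetical intervals: the heart slows steadily, with little beat-to-beat jitter.
rr = [800, 810, 820, 830, 840, 850, 860, 870]

print(rmssd(rr))            # 10.0: every successive difference is exactly 10 ms
print(round(sdnn(rr), 1))   # 24.5: the slow drift widens the overall spread
```

Same eight heartbeats, two honest answers more than a factor of two apart. Neither is wrong; they summarize different properties of the same series.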
Second, when the reading was taken. HRV swings enormously across a single day. It is highest in deep sleep, drops as you wake, and lurches around with caffeine, stress, posture, and your last breath before the measurement. Apple Watch samples opportunistically and surfaces daytime values. Oura computes an average across a defined overnight window. WHOOP also reads overnight, in a slightly different slice of sleep. So a midday Apple Watch HRV and an Oura overnight HRV are not two takes on the same quantity. They are the same person at two different times of day, which is a different physiological state. Same tree, different hour, different fruit.
A biased measurement you understand is more useful than an unbiased one you do not. The skill is not collecting more numbers. It is learning to read the ones you already have.
How to read your devices tonight
Here is the practical core, the part you can act on before any app enters the picture. None of it requires extra software. It just requires treating your wearables like instruments with known characters instead of oracles handing down verdicts.
Pick one device per metric, and stop averaging
The instinct, when two devices disagree, is to split the difference. Resist it. Averaging only helps when errors are random and cancel out. These errors are systematic, and they differ in size from device to device, so averaging a device that over-counts deep sleep with one that under-counts it would land on the truth only by coincidence. What it reliably gives you is a new, untraceable number that belongs to neither device. Instead, choose the best instrument for each measurement and let it own that job:
- Sleep stages: a ring or a bedside sensor (Oura, Eight Sleep). They validate against the sleep lab with the tightest margins among consumer products, partly because the finger gives a cleaner pulse than the wrist.
- Nightly HRV: a ring, a chest strap, or WHOOP, read overnight. Skip wrist HRV captured during a busy day; motion turns the optical signal into noise.
- Heart rate during hard workouts: a chest strap is the gold standard, full stop. A wrist optical sensor is fine for steady-state cardio but smears the picture during intervals, exactly when accuracy matters most, because the wrist motion and the rapid heart-rate swings fight each other.
- Steps and general movement: any major device is fine. They differ by a few percent, which is well below the level at which step counts actually change a decision.
- VO2 max estimate: Garmin or Apple Watch, fed by outdoor runs with GPS, give the cleanest estimates, since pace plus heart rate is what the estimate is built from.
Track the trend, not the absolute number
This is the single most freeing idea in wearable data, and the one that does the most to quiet the anxiety. Your HRV from one device, tracked across weeks, is worth far more than any one night's reading. Whether your absolute value is 45 ms or 60 ms barely matters; what matters is whether your own baseline is drifting up or down. Each device's bias is roughly constant, so it cancels out the moment you compare a device only to its own past. The same is true of resting heart rate, sleep duration, and recovery score. Stop asking "is this number good." Start asking "is this number moving, and which way." A person who watches their own trend line will catch a brewing illness or a creeping overtraining pattern days before they feel it, and will also stop flinching at the normal night-to-night scatter that means nothing.
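The claim that a constant bias cancels is easy to check with a few lines of Python. This is a sketch under stated assumptions: the HRV values are made up, each hypothetical device is assumed to read a fixed number of milliseconds high or low, and the 14-night baseline and 3-night recent window are arbitrary choices, not anyone's published method.

```python
from statistics import mean

def baseline_deviation(readings, baseline_days=14, recent_days=3):
    """Average of the most recent nights minus the average of the nights before them."""
    if len(readings) < baseline_days + recent_days:
        raise ValueError("not enough nights to form a baseline")
    recent = readings[-recent_days:]
    baseline = readings[-(baseline_days + recent_days):-recent_days]
    return mean(recent) - mean(baseline)

# Hypothetical "true" overnight HRV: 14 steady nights, then 3 lower ones.
true_hrv = [50, 52, 49, 51, 50, 48, 52, 50, 51, 49, 50, 52, 48, 50, 44, 43, 42]

# Two imaginary devices with different constant biases.
device_a = [v + 8 for v in true_hrv]  # reads 8 ms high
device_b = [v - 5 for v in true_hrv]  # reads 5 ms low

print(device_a[-1], device_b[-1])              # 50 37: absolute values disagree
print(round(baseline_deviation(device_a), 1))  # -7.1
print(round(baseline_deviation(device_b), 1))  # -7.1
```

The two devices never agree on what your HRV is, yet they agree exactly on how far it has dropped. That holds only as long as each device's bias stays roughly constant, which is the assumption the whole trend-watching approach rests on.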
Learn to spot a sensor error
Some numbers are not signals at all. They are the device failing, and recognizing the difference is most of the literacy. If your watch reports a resting heart rate of 102 on a calm Tuesday morning, that is almost never your body. It is a band that slipped, an optical reading through a pocket of air, a sensor that lost the skin. The tell is that a true physiological shift moves in concert with other signals: a real stress response nudges HRV, resting heart rate, and sleep together. A lone number stampeding off on its own while everything else holds steady is a measurement artifact, not a revelation. The mark of someone who can read their own data is that they can look at a scary number and correctly decide to ignore it.
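The "moves in concert" test can be written down as a rough rule. The sketch below is an illustration, not a clinical tool: the metric names, the sample history, and the threshold of three standard deviations are all assumptions chosen to make the idea visible.

```python
from statistics import mean, stdev

def z_score(history, today):
    """How many standard deviations today's value sits from this device's own history."""
    spread = stdev(history)
    return 0.0 if spread == 0 else (today - mean(history)) / spread

def classify_morning(history, today, threshold=3.0):
    """history maps metric -> past values; today maps metric -> this morning's value."""
    flagged = [m for m in today if abs(z_score(history[m], today[m])) >= threshold]
    if not flagged:
        return "normal"
    if len(flagged) == 1:
        # One number bolting while the rest hold steady looks like a measurement problem.
        return "likely sensor artifact: " + flagged[0]
    return "possible physiological shift: " + ", ".join(sorted(flagged))

# Hypothetical week of readings from a single device.
history = {
    "resting_hr": [58, 60, 59, 61, 60, 59, 60],
    "hrv": [52, 50, 51, 49, 53, 50, 51],
    "sleep_hours": [7.1, 6.9, 7.3, 7.0, 7.2, 6.8, 7.1],
}

calm_tuesday = {"resting_hr": 102, "hrv": 51, "sleep_hours": 7.0}
rough_night = {"resting_hr": 66, "hrv": 40, "sleep_hours": 5.5}

print(classify_morning(history, calm_tuesday))  # likely sensor artifact: resting_hr
print(classify_morning(history, rough_night))   # possible physiological shift: hrv, resting_hr, sleep_hours
```

A real morning is messier than this, and a single genuine outlier is possible. The sketch only encodes the heuristic from the paragraph above: one number moving alone deserves suspicion before it deserves worry.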
What good data literacy actually buys you
Zoom out from the gadgets for a moment, because this is the part that outlasts any particular brand. The point of learning to read your own physiology is not to win an argument with your watch. It is what the skill gives you in ordinary life.
It buys you less anxiety. Most wearable distress comes from treating a noisy estimate as a precise grade. Understand the noise and the grade loses its power to ruin a morning. It buys you better decisions: a sustained dip in your own HRV trend is a real reason to back off training or protect your sleep, while a single ugly night is not. And it buys you earlier signals. The body broadcasts trouble, a coming infection, accumulating fatigue, the slow toll of stress, in these numbers before you consciously feel it, but only if you can tell a meaningful drift from random scatter. That early-warning capacity used to belong to clinics and elite sport. The sensors have quietly handed it to everyone. The literacy to use it has not been handed out nearly as evenly, and that gap is the real story of the wearable era.
Where a coach changes the math
Everything above is doable by hand. But doing it by hand, every morning, across three devices, for years, is exactly the kind of work people start strong on and quietly abandon. This is the narrow, honest place where software earns its keep, and it is what Vora is built to do.
The job of reading several devices at once and producing one coherent signal is what we mean by reconciliation. Vora takes what each device reports, weights each by what it is actually good at, accounts for its known bias, and outputs one number per metric per night, the number you would arrive at yourself if you had the time and the reference tables in your head. It does the bias correction so you do not have to memorize that Oura runs conservative and WHOOP overweights movement.
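As a rough illustration of what "weight each device and account for its bias" can mean, here is a minimal sketch of a bias-corrected weighted average. It is not Vora's actual algorithm, which this essay does not spell out. The `Reading` type, the device labels, the bias values, and the weights are all hypothetical, chosen only to show the shape of the calculation.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    device: str    # illustrative label, not a real product profile
    value: float   # reported value, e.g. deep sleep in minutes
    bias: float    # assumed typical error: positive over-counts, negative under-counts
    weight: float  # assumed trust in this device for this metric

def reconcile(readings):
    """Subtract each device's assumed bias, then take a trust-weighted average."""
    total_weight = sum(r.weight for r in readings)
    if total_weight <= 0:
        raise ValueError("at least one reading needs a positive weight")
    return sum(r.weight * (r.value - r.bias) for r in readings) / total_weight

# Made-up night: the ring runs conservative, the band runs generous.
night = [
    Reading("ring", value=70.0, bias=-10.0, weight=3.0),  # corrected to 80
    Reading("band", value=92.0, bias=8.0, weight=1.0),    # corrected to 84
]

print(reconcile(night))  # 81.0
```

Note how this differs from splitting the difference. A plain average of 70 and 92 gives 81 only by coincidence here; the reconciled figure comes from correcting each reading first and then leaning on the instrument that is better at the job.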
It also makes the guidance specific instead of vague. When Oura says recovery is 78 and WHOOP says 64, a good coach does not pick one and bury the other. It reports both, explains why they diverge (Oura forgiving a poor-sleep night, WHOOP penalizing the movement in it), and tells you what to do given the pair. Holding two numbers at once without panic is exactly what a thoughtful human coach does, and it is what reconciliation makes possible at scale.
And it catches the errors. That resting heart rate of 102 on a quiet Tuesday is not a recovery crisis; it is a sensor that slipped. A system that has read every prior Tuesday, and every other signal from that same morning, knows to flag it and move on. A dashboard that simply renders whatever it is handed will show you the 102 and let you spend the day worried about nothing.
The point of all this
The lesson is not that wearables are unreliable. They are, on balance, reliable enough for the questions most people are actually asking them, and they are getting better every year. The lesson is that the device is only half of the instrument. The other half is the person reading it, and that half has been badly under-served. Buying a second wearable does not double your insight. It doubles your numbers, often with contradictions, and the difficulty of holding both is what makes most people give up on either.
Whether you do it in your head or let something do it for you, the skill is the same: know what each sensor is really measuring, trust trends over absolutes, and recognize an error when one stares back at you. A generation has been handed a new sense organ. Learning to read it is the work of the decade, and the good news is that it is learnable, starting tonight, with the devices already on your wrist.
For more on how Vora reconciles multi-device data, see the Technology hub at askvora.com/technology/data-reconciliation. For the full per-device accuracy comparison, see askvora.com/technology/sleep-accuracy.
