Your Performance Review Isn't About Your Performance

Somaditya Roy

It rewards articulation, not work. Only 14% of employees say their review inspires them to improve — and even they already know it.
Every December, somewhere in a tech company that prides itself on rigour, a senior engineer opens a blank self-assessment field, stares at it for forty minutes, and writes the sentence Julia Evans wrote on her blog in April 2019: "wait, what did I do in the last 6 months?"
The engineer is not having a memory failure. Twelve months of nonlinear, multi-threaded contribution does not condense itself into a text field. The fact that it must condense — that the only artefact the system will register is the one the employee produces, on deadline, alone, with no scaffolding — is the actual failure. It is a documentation failure the company has outsourced to the employee, then graded the employee on.
This article is about that outsourcing, and about why it produces the outcomes it does — outcomes that Gallup, Cornell, and Hacker News all describe in roughly the same terms, from roughly the same direction, with the senior professionals affected by them mostly unsurprised.
The Problem Hiding in Plain Sight
The standard performance-review apparatus — annual cycle, blank self-assessment field, manager-led recall, calibration meeting — was built around an assumption that no longer holds: that the people doing the work and the people rating the work share a roughly accurate picture of what happened in the preceding twelve months.
They do not. The picture is fragmented across the employee's own faded memory, the manager's selective attention, the new manager who arrived two quarters ago and "subconsciously," as one Blind commenter put it, does not count anything that happened before their tenure, and the calibration room where ratings get adjusted to fit pre-allocated quotas rather than observed performance. By the time the rating is filed, the year that was actually lived has been reduced to whatever fragments survived the gauntlet.
The system advertises objectivity. The output is the opposite. Per Gallup's Re-Engineering Performance Management report, drawing on US workforce panels, only 14% of employees strongly agree that the performance reviews they receive inspire them to improve. Twenty-nine per cent strongly agree the reviews are fair. Twenty-six per cent strongly agree they are accurate. The remaining majority is filling out the form anyway, because the form is mandatory and the consequences of leaving it blank are worse than the consequences of leaving it shallow.
This is not a story about individual managers being lazy or individual employees being inarticulate. The system, run conscientiously by reasonable people, produces these numbers. Something about its architecture is structurally wrong.
The architecture, on closer inspection, rewards articulation rather than work — while pretending to do the opposite. The signal that exits the calibration room is not a measurement of performance. It is a measurement of how legibly performance was narrated, by whom, to which audience, against which quota. The 14% of employees who say their review inspires improvement are the ones for whom that narration happened to land. The other 86% are not bad performers. They are bad narrators, or were rated by managers who were bad narrators on their behalf, or got caught in the calibration room on the wrong side of a quota.
The Recognition Collapse
The architecture's failure mode is worth naming, because the pieces fail together rather than separately. Call it the Recognition Collapse: the predictable process by which work that was done well becomes invisible between the moment of doing and the moment of being rated. The collapse runs along four axes, each empirically documented, each compounding the others.
The Recall Axis. Employees forget. Managers forget more. The forgetting is not a failure of attention; it is the Ebbinghaus curve operating on knowledge workers who run dozens of parallel threads. Hermann Ebbinghaus showed in 1885 that memory of unrelated items decays steeply within days, a result replicated by Murre and Dros in PLOS ONE in 2015. Applied to workplace recall, DeNisi and Murphy's 2017 Journal of Applied Psychology review documents what they call the persistent idiosyncratic rater effect — the finding that a manager's rating reflects more about the manager (their tendency to rate high or low, their halo biases, their similar-to-me preferences) than about the rated employee. Evans's diagnosis is sharper still: "if you don't remember everything important you did, your manager (no matter how great they are!) probably doesn't either. And they need to explain to other people why you should be promoted." The recall failure cascades up — manager, then VP, then calibration committee — each layer further from the work than the last.
The Glue Work Axis. Tanya Reilly, then Principal Software Engineer at Squarespace, named the second axis in her LeadDevNYC 2019 talk Being Glue. Glue work is the non-coding, non-promotable, coordinative and cultural labour that holds engineering teams together — unblocking colleagues, mentoring juniors, onboarding new hires, planning retreats, running incident reviews, translating between teams. It is real, it is valuable, and it is structurally invisible to promotion packets, which want code or other "quantifiable technical work." Reilly's opening example is canonical: an engineer who received glowing reviews and believed she was on the senior track, but whose company quietly expected technical output she was not producing because she was unblocking everyone else, and whose manager never told her she was over-investing in invisible labour. Babcock, Recalde, Vesterlund and Weingart established the empirical backbone for the gendered version of this pattern in the American Economic Review in 2017: women volunteer for and are asked to do non-promotable tasks significantly more than men. The glue work happens. The promotion does not.
The Calibration Axis. Even the work that survives recall and is named correctly enters a room where ratings get re-negotiated against pre-allocated quotas. An anonymous engineer on Blind described the mechanism without embellishment: "Typically managers enter a room, rank people and then justify their ratings. Almost everyone on those meetings are competing with each other to get their share of the quotas for good scores while throwing the bad ones to others. But all while pretending that this is not what is happening." A Google engineer in the same forum logged a specific instance: "P1 was initially assessed for Outstanding impact and was considered for Transformative Impact. A few days passed and then their rating was lowered to Significant Impact 'due to lack of bla bla bla L4 signal.'" The justification arrives after the rating, not before. Whatever the self-assessment said, by the time the rating exits the room, the document has been overwritten by quota math.
The Self-Promotion Axis. Finally, the system rewards the people who are best at narrating their work — not necessarily the people who do it best. An anonymous Blind user in the thread Promotions as a carrot you can never reach? states it without hedge: "They are the ones that are typically best at three requirements: 1) Constant self promotion 2) Taking personal credit for the work of the team 3) Natural charisma and presentation skills." This is not a moral failure of the people promoted; it is a feature of a system whose final signal is a manager's subjective view, and whose subjective view is heavily weighted toward what the manager can remember and articulate about each report. Bowles, Babcock and Lai's 2007 study added the demographic complication: women face measurable backlash for the same self-promoting behaviour men are rewarded for, which is what makes the self-promotion axis structurally compromised even before the calibration axis touches it.
The four axes are not parallel problems. They compound. Faded recall (Axis 1) hands the rater a thin record. Glue work (Axis 2) ensures the thin record is missing the most relational contributions. Calibration (Axis 3) overwrites the thin record against quota math. Self-promotion (Axis 4) advantages the people whose record was loudest, not whose record was most valuable. The Recognition Collapse is what happens when all four operate simultaneously, which they always do.
What the Public Forums Show
The senior professionals affected by the collapse are not, as a group, surprised. The Hacker News thread Unfortunate things about performance reviews, originally posted in 2021 and reactivated in November 2024, contains the diagnostic comment from user hunter-gatherer: "I have gone from a high performer to something lower in the past 6 months because my team got a new boss, and we are all miserable. No matter what I do I can't seem to get off the shit list." The new-manager problem and the recency-bias problem are described, in a single sentence, as a single mechanism — and the commenter has already concluded that no individual effort can correct it.
The same thread surfaces the rating-scale absurdity that calibration produces. User hinkley, November 4, 2024: "You have a person Tom who is a level 3 but is working at a consistent 3.8. If Tom turns in a 3.9 this year, he deserves an Exceeds, but he's going to get a Meets or Exceeds because he's only improved a little. The whole thing isn't based on objectivity it's based on negging employees." The structure rewards step-changes the employee can narrate, not consistent excellence the manager forgot to log.
On the cascading recall failure up the org chart, a reply in the Blind thread Stop making me justify my paycheck compresses the entire upward problem into one line: "Your manager and TL barely has time to keep track of your day to day, what makes you think your VP will lol?" The self-assessment is not, in practice, a self-assessment. It is a briefing document for a stranger who will decide the writer's compensation.
And on the survival strategy that does work — found across hundreds of HN comments, distilled by user iuafhiuah in December 2022 — "Just submit a list of single sentences that say things like 'Designed and implemented a doodad that saved n hours per day and enabled $BUZZWORD'. List the outcomes of things you did this year, use STAR, whatever you prefer. The best part of doing it this way — if they don't give you a nice bonus/raise, you've just updated your CV ready for your next place of work!" The advice is correct and the framing is exhausted. The advice is also a private workaround, not a system fix. Every senior professional reading it is being told, in essence, that the legitimate response to the Recognition Collapse is to maintain a parallel ledger for the day they leave.
Reilly's coda from Being Glue, then, becomes the unsentimental version of the diagnosis: "Managed deliberately, glue work demonstrates and builds strong technical leadership skills. Left unconscious, it can be extremely career limiting. It pushes people into less technical roles and even out of the industry." The collapse is structural. The skill of working around the collapse is also real. Both things are true at once, which is what makes the situation hard.
The Three Mistakes That Make It Worse
Inside the structural collapse, three individual mistakes recur often enough across the forum corpus to be worth naming. None of them caused the collapse. All of them deepen the damage.
The first is the list-of-tasks self-assessment. Mid-career professionals, having internalised the cultural prohibition against bragging, write self-assessments that read like activity logs and expect the reader to infer impact. Calibration rooms do not infer. The fix iuafhiuah names — single sentences that explicitly state the outcome the activity produced — bypasses the trap. A self-assessment is not a record of work done; it is a piece of persuasive writing addressed to a calibration room. The professional who treats it as a record loses to the professional who treats it as an argument, every time.
The second is doing too much glue work without naming it as glue work. The mistake is not the glue work — teams need it. The mistake is failing to negotiate it explicitly with the manager as part of the role, failing to time-box it, and failing to surface it in the review as leadership rather than as undifferentiated helpfulness. Reilly's framing converts the same work into a promotion case rather than a promotion liability: not "I helped a lot of people this year" but "I unblocked four senior IC initiatives that would otherwise have slipped a quarter." The same work. A different sentence. A different rating.
The third is trusting the manager's memory. The Blind commenter on the Stop making me justify my paycheck thread vents the instinct directly: "I don't want to spend two hours filling out a form about my 'top accomplishments' like I forgot what I worked on all year. You saw the work. You paid me. That was the agreement." The sentiment is correct on every dimension except the one that matters: the manager did see the work, and the manager has now forgotten most of it, and the manager will be required in three weeks to defend a rating to a calibration committee that has never seen the work at all. Evans's recommendation — externalise memory into a brag document the employee owns rather than a memory the manager and employee are both hoping to reconstruct on deadline — solves the immediate problem and entrenches the underlying one. The recommendation should be taken anyway. The entrenchment is a separate fight.
What This Leaves Open
The honest version of the diagnosis has both halves. The Recognition Collapse is real, structural, documented, and not solvable inside the existing performance-review architecture by individual effort. DeNisi and Murphy, surveying a century of Journal of Applied Psychology appraisal research, conclude that the field's role has been "primarily to test ideas and models proposed elsewhere" — that is, that a hundred years of research has not removed the rater from the rating, and may not be able to. The structural argument is sound.
And: the individual skill of articulation is also real, also separable from the structural failure, and also the only lever a mid-career professional has on the day the self-assessment is due. Reilly's framing is the one that holds both halves at once. The system is broken. The professional who learns to name, frame, bound, and narrate their work — including the glue work — converts a career-limiting situation into a career-defining one. The professionals who do not, do not.
What that conversion actually requires — what tools, what cadence, what kind of question, what scaffolding has to exist for a tired senior IC at 9pm on a Sunday in December to write something worth reading — is the next question. The next piece in this series looks at the tools professionals are actually building to close the gap, and at why the most obvious approach (track everything, surface it on demand) keeps producing 70% abandonment within a hundred days.
Read more at somaditya.com.