I'm Building the Performance-Review Tool That Doesn't Exist.

Somaditya Roy

Eight months scanning four tool categories. Lattice tells managers to use ChatGPT. Day One has no schema for work. Here's the wishlist that came out, and why each item is on it.
The first time I really looked at the performance-management software market, what stopped me wasn't the price points or the feature matrices. It was a help article on Lattice's own website — Lattice being the platform SelectSoftwareReviews calls "the gold standard in people success management" — instructing managers on which ChatGPT prompts to use when writing their reviews.
A category-defining vendor, valued in the hundreds of millions, was telling its customers to use a generic chatbot to do the actual writing the platform charges them for. That is not a feature gap. That is an admission.
I had been looking for a tool that could help me, as a mid-career professional with a decade of work across business analysis, delivery leadership and a failed startup, remember and articulate the work I had done — in my own voice, on my own cadence, for my own use, and not for HR's review cycle. The tool I wanted didn't exist. After eight months of scanning the landscape, evaluating four distinct tool categories, and trying most of them, I started building it myself.
This piece is the wishlist that came out of those eight months, and the reason each item is on the list.
The Four Categories, and What Each One Got Wrong
The mid-career professional preparing for a performance review reaches across four uneven tool categories, none of which was designed for what they actually need to do.
Performance-management platforms — Lattice, 15Five, Reflektive, Culture Amp, Leapsome, Bonusly — claim to consolidate reviews, OKRs, 1:1s and engagement. G2's 2025–26 reviews of Lattice cluster around limited customisation (43 mentions), difficult usability (42), missing features (68) and learning curve (46). One verified G2 reviewer described workflows as "click-heavy and time-consuming, especially when you're just trying to make a small update quickly." 15Five at $4–$16 per user per month exhibits a structural decay that ThriveSparrow's 2026 analysis named directly: "After six to eight months of weekly pulses, employees have seen the same prompts dozens of times. G2 and Capterra reviewers describe engagement scores becoming unreliable." Reflektive, with ~450 customers including Blue Origin, Comcast and Instacart, was acquired by PeopleFluent on February 1, 2021 for $14.2M — and with the acquisition came the unilateral end of data portability for everyone who had been depositing their career artefacts inside it. Culture Amp's unit of analysis is the team, not the person; Bonusly's recognition data is trapped inside Bonusly. None of these platforms produce an artefact the employee owns.
Journaling and reflection apps — Day One, Journey, Reflectly, Stoic, Rosebud — are built for memory-keeping or emotional processing, not impact narration. Choosing Therapy's 2024 review of Day One puts the gap plainly: "Day One is a great journaling app, but it does not provide prompts, has no integrated mood check-ins, and offers no professional support." Even Rosebud — the most sophisticated entrant in the category, $6M seed from Bessemer Venture Partners in 2025 — is explicitly architected by co-founder Chrys Bader as a between-therapy-sessions tool, in Cognitive Behavioral Therapy (CBT) and Acceptance and Commitment Therapy (ACT) space, with a deliberate design choice not to be shared with employers. No product in this category has a schema for "what I worked on this week," no taxonomy for projects, stakeholders or outcomes, no export shape that maps to a self-review.
Ad-hoc AI assistants — ChatGPT, Claude, Gemini, Notion AI — produce generic boilerplate. Lattice's own guide concedes that "a generic prompt earned a generic answer, with fluffy boilerplate compliments… more like a Yelp review than an actual performance evaluation." Textio's 2023 experiment found ChatGPT used female pronouns 90% of the time when prompted to write feedback for a receptionist and male pronouns 100% of the time for construction workers, with feedback for women longer and more critical. Peter Cappelli, at Wharton, in Carrier Management in November 2025: "I get a reason to discount this and not pay any attention to it because I don't think it actually came from my boss."
DIY manual systems — Notion templates, Google Docs, brag-doc Gumroad templates, Obsidian, paper — shift all the labour to the professional. Bragdocs.com's own marketing names the failure: "When you're in the thick of a project it's impossible to find the time to sit and write about it. But when it comes to your performance review or job interview, you're stuck with a bad memory and a blank page." The engineer Jeff Morhous: "Trying to do this on a yearly or even quarterly basis is playing life on hard mode."
The articulation use case sits in the gap between all four. None of these tools elicit work-specific content over time, in the user's voice, that the user owns and can export.
The Wishlist
Here is the list I ended up with, item by item. Each one points at a specific failure of the existing landscape.
1. The system asks questions instead of presenting a blank field. Rosebud's design philosophy names the bottleneck correctly: "the biggest source of friction for most people is blank-page anxiety, not willpower." Day One, Journey, Obsidian and every Notion brag-doc template I have seen open with a blank field and a cursor. The tool I want opens with a question — specific enough that the answer is recoverable from memory, narrow enough that the user has something to say in two minutes.
2. It captures glue work and invisible labour by name. Tanya Reilly's Being Glue (LeadDev NY 2019) is the canonical text on the coordinative, mentoring, unblocking labour that "disproportionately falls to women and senior ICs, then is invisible at calibration time." Babcock, Recalde, Vesterlund and Weingart's 2017 American Economic Review paper quantified it: women are 48% more likely to volunteer, 50% more likely to say yes when asked, and 44% more likely to be asked to do low-promotability work. Performance platforms funnel everything into OKR fields that miss this work. The tool I want has a category for it, with its own prompts: who did you unblock this week?
3. The cadence outpaces the forgetting curve. Ebbinghaus's 1885 forgetting curve, replicated by Murre and Dros in PLOS ONE in 2015, documents substantial memory decay within the first 24 hours. Julia Evans built her now-canonical brag-doc method explicitly to counter this, yet the method itself depends on the user remembering to write. The tool I want initiates the nudge. Daily is too aggressive; weekly is the engagement sweet spot before fatigue.
4. It preserves the user's voice rather than averaging to LLM tone. Ericsson and Simon's Protocol Analysis (1980 / 1993) distinguishes three levels of verbalisation. Level 1 and Level 2 (what happened, what you did) preserve the structure of the task being narrated. Level 3 (explanations, justifications, the meaning of what you did) is where the AI offloading risk lives. Cappelli again: "when people are given AI tools, they often offload analysis to the technology and rely less on their own judgment." The tool I want elicits at Levels 1 and 2, and refuses to do Level 3 for the user. The user has to write the meaning themselves.
5. The output is narrative, not numerical rating. This one is empirically anchored. Joonyoung Kim, Caitlin Stroup and Emily Zitek published a study of 1,600 employees across four vignette conditions in Academy of Management Discoveries in December 2025 (DOI 10.5465/amd.2023.0308). Narrative-only feedback was rated as significantly fairer and more motivating than numerical-only or combined numerical-plus-narrative formats. Every performance-management platform I evaluated defaults to numerical templates. The tool I want exports a narrative document the user can paste into the form their employer requires, or send to a recruiter, or keep for themselves. The export is never the rating itself.
6. Voice-first capture for friction reduction. Stanford and Baidu Research's 2017 work on speech-to-text established speech at about 150 words per minute against 30–40 for mobile typing, several times faster. The bottleneck for a mid-career user is not the volume of words they can produce in a vacuum. It is the moment between meeting-end and next-meeting-start, the four-minute window in which capturing what just happened is theoretically possible and practically impossible if it requires opening a laptop. The tool I want listens.
7. User-owned data with export that survives the employer relationship. Reflektive's 450 customers learned this the hard way in 2021. The University of Miami School of Law in April 2026 noted that California's CPRA "expands privacy rights, establishing a 'legal and enforceable constitutional right of privacy' that extends to the workplace." Data pasted into an employer-paid ChatGPT seat may not survive employment changes and is "potentially discoverable in litigation." The tool I want stores nothing the user cannot take with them.
8. It refuses to write the user's narrative for them. This is item four restated as a design constraint. The user owns the analysis; the tool owns the elicitation and the structure. When the AI writes the summary, the summary is generic, biased, and discountable. The tool I want is an interrogation partner, not a ghostwriter.
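Several of the items above (a named category for glue work, answers kept in the user's own words, an export the user owns) can be pictured as a small record schema. What follows is a minimal sketch in Python under stated assumptions, not the data model of any real tool: the names Entry, Category, PROMPTS, export_narrative and export_json are all hypothetical, and the prompts are illustrative apart from "who did you unblock this week?", which comes from item 2.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date
from enum import Enum


class Category(str, Enum):
    """Hypothetical taxonomy; glue work gets its own category (item 2)."""
    DELIVERY = "delivery"
    GLUE = "glue"  # unblocking, mentoring, coordination


# Hypothetical prompts: the tool opens with a question, not a blank field (item 1).
PROMPTS = {
    Category.DELIVERY: ["What did you move forward this week?"],
    Category.GLUE: ["Who did you unblock this week?"],
}


@dataclass
class Entry:
    week_of: date
    category: Category
    prompt: str
    answer: str  # the user's own words, stored verbatim (items 4 and 8)
    stakeholders: list[str] = field(default_factory=list)


def export_narrative(entries):
    """Plain-text export: answers grouped by category, oldest first.

    Nothing is summarised or rewritten; the text is exactly what the user said.
    """
    lines = []
    for category in Category:
        chosen = sorted(
            (e for e in entries if e.category is category),
            key=lambda e: e.week_of,
        )
        if not chosen:
            continue
        lines.append(category.value.upper())
        for e in chosen:
            lines.append(f"Week of {e.week_of.isoformat()}: {e.answer}")
    return "\n".join(lines)


def export_json(entries):
    """Portable export the user can take with them (item 7)."""
    def encode(e):
        record = asdict(e)
        record["week_of"] = e.week_of.isoformat()
        record["category"] = e.category.value
        return record
    return json.dumps([encode(e) for e in entries], indent=2)
```

The design choice worth noticing is what the sketch leaves out: there is no rating field and no function that generates text. Both exports only rearrange what the user already wrote.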
What Drew Houston and Tobi Lütke Knew
The list above is, on inspection, the kind of list you write when you have decided to build something and are talking yourself into the decision. I should be honest about that.
The founder-origin pattern that recurs across the products I most respect is the one Drew Houston described to Fortune in June 2017: "I was going from Boston to New York on the Chinatown bus, forgot my thumb drive, and I was so frustrated — really with myself, because this kept happening… I opened up the editor and started writing some code." Tobi Lütke told the same kind of story on Tim Ferriss's podcast in February 2019. He and Scott Lake launched Snowdevil in 2004 to sell snowboards online, found the existing e-commerce platforms clunky and expensive, and built one themselves in Ruby on Rails over about two months. In his words, "it quickly became obvious that the software was more valuable than the snowboards." Sahil Lavingia built the first version of Gumroad over a weekend at age 19 because he wanted to sell an icon and the existing options involved too much friction. Pieter Levels built Nomad List as a spreadsheet because he wanted to choose his next city and the existing blog posts weren't quantified.
The common pattern: the founder was the first user, the pain was specific and recurring, and the first version was tiny, custom, and made for themselves before it was made for anyone else.
I am the first user. The pain is specific. It recurs in November and again in May, like clockwork.
The Case Against Building This
The strongest argument I have read against doing what I am doing is Oliver Burkeman's. At oliverburkeman.com he describes the "done list" — a single sheet of paper, empty at the start of each day, filled in as the day proceeds. "Each entry is a cheering reminder that you could, after all, have spent the day doing nothing constructive — yet look what you did instead." In Behavioral Scientist in 2024 he warned that "the reward for good time management is more work," and that AI's efficiency gains are no exception to the rule. The implication is direct: a notepad and the discipline to use it daily is enough. Building a specialised tool may be a sophisticated form of avoidance.
Burkeman is probably right that a notepad would work for him. Cal Newport, no less disciplined, tried to build his own bullet-journal system, named it BuJoPro, and abandoned it. "BuJoPro Thoughts… was not a success," he wrote. If the patron saint of disciplined personal systems can't make a custom system stick, the case for a notepad is weaker than it first sounds. Discipline is not the bottleneck. The bottleneck is initiation: the blank page, the forgotten week, the moment between meetings when the system would help but isn't open.
What I am building is not a notepad. It is a question, asked at the right time, in a place where the answer can be spoken in two minutes — and then organised, named, and made retrievable when the November or May review cycle arrives.
What Comes Next
The tool I want refuses to write my narrative for me. It elicits the material; I do the meaning-making. It captures glue work explicitly; it stores nothing it cannot give back to me; it operates on a cadence set by the forgetting curve, not by my employer.
And that is what I have been building. The name is Catalyst. The next pieces in this series cover the specific design decisions — what was kept, what was cut, what is still being argued about with myself.
If you have been writing your own brag document in a Google Doc and wondering whether there is something better, these posts are for you. Follow along at somadityaroy.com.