Making feedback work: Principles of effective employee feedback and 360° reviews
Most feedback is well-intentioned. A great deal of it does nothing. A surprising amount makes things worse. We design feedback that people can actually use to improve.
A leader opens a feedback report and finds a number. 3.2, say, on a five-point scale. Then begins the quiet, familiar work of trying to feel something useful about it. Is that good? Is it a warning? The number sits there, flat and final, telling them almost nothing about the thing they most want to know: How do the people I work with see me, and where do they see me differently from how I see myself?
This is the trouble with most feedback. It arrives as a verdict, and a verdict is one of the least useful things you can hand someone who is trying to grow.
The research is sobering on this point. In a landmark meta-analysis covering 607 effect sizes, Kluger and DeNisi found that feedback improved performance on average, but in more than a third of cases it made performance worse. Feedback is not a neutral good. Its effect depends on where it sends a person's attention. When feedback points attention at the self (your status, your rank, your reputation), it can swallow the very attention needed to improve the work. When it keeps attention on the task, the standard, and the next thing within a person's control, it is far more likely to help.
Multi-rater feedback adds a second problem: the measurement itself. A rating is never a clean reading of the person being rated. It is a blend of what that person did, what the rater happened to see, the rater's own standards, and ordinary error. Research by Scullen, Mount and Goff found that a large share of the variation in ratings comes down to the rater rather than the person being rated — the quirks of who is holding the pen, not the performance on the page. That does not make ratings useless. It makes naive averaging dangerous. A mean score can look objective while hiding the very pattern that matters most: who sees what, from which vantage point, and where their views diverge.
So the organisations we work with come to us with feedback systems that are busy, trusted, and not doing very much. People brace for the annual review the way they brace for a verdict. The reports are full of scores and short on anything a person could act on by Monday. The exercise has become a ritual rather than a tool. Our job is to rebuild it into something useful, and the rebuild rests on a few principles the evidence keeps confirming.
Measure the disagreement, not just the score. Picture two competencies that both average 3.2. In the first, every rater independently landed near 3. They see roughly the same thing. In the second, that 3.2 sits between a cluster of 5s and a cluster of 1s: roughly half the room sees a real strength, the other half sees a real weakness. Same number, completely different problem. The first is a relatively stable signal. The second is a split worth investigating. It may mean the person behaves differently with different groups, that some groups have seen more than others, or that expectations differ across the organisation. We measure agreement alongside the score, so these two situations never hide behind the same number. Where agreement is high, the conversation can move faster. Where it splits, the split is the work.
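To make the two-competencies example concrete, here is a minimal sketch in Python. The article does not say which agreement measure it uses, so the index below is an assumption: a simple r_wg-style score that compares the observed variance with the variance of purely uniform ratings on a five-point scale, with population variance used for simplicity. The function name, the sample ratings, and the clipping at zero are all illustrative.

```python
from statistics import mean, pvariance

SCALE_POINTS = 5  # assumed five-point scale, as in the 3.2 example


def agreement_summary(ratings, scale_points=SCALE_POINTS):
    """Return the mean, the variance, and a 0-1 agreement index for one competency.

    The index is 1 - observed variance / uniform variance, clipped at 0.
    This is one possible choice of measure, not a method taken from the article.
    """
    observed_var = pvariance(ratings)
    # Variance expected if raters picked scale points uniformly at random.
    uniform_var = (scale_points ** 2 - 1) / 12
    agreement = max(0.0, 1 - observed_var / uniform_var)
    return {
        "mean": round(mean(ratings), 2),
        "variance": round(observed_var, 2),
        "agreement": round(agreement, 2),
    }


# Hypothetical ratings: both sets average 3.2.
consensus = [3, 3, 3, 3, 4]      # raters cluster near 3
split = [5] * 11 + [1] * 9       # raters divide into 5s and 1s

print(agreement_summary(consensus))  # mean 3.2, agreement 0.92
print(agreement_summary(split))      # mean 3.2, agreement 0.0
```

Reported side by side, the mean is identical in both cases while the agreement index separates them, which is the distinction the paragraph above describes.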
Treat the self–other gap as evidence, not error. Almost every honest review reveals a distance between how a person rates themselves and how their colleagues rate them. The instinct is to correct it toward the "right" answer. That wastes some of the most useful developmental information in the report. A high self-rating beside lower colleague ratings may signal a blind spot — but it may also reflect different evidence, or unclear expectations. A low self-rating beside higher colleague ratings may reveal an under-claimed strength. The work is not to close the gap. It is to make it precise enough to talk about.
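One way to make the gap precise enough to talk about is to compute it per rater group rather than against a single pooled average. The sketch below assumes hypothetical group names and ratings, and defines the gap as the group mean minus the self-rating. None of these choices comes from the article.

```python
from statistics import mean


def self_other_gaps(self_rating, ratings_by_group):
    """Gap per rater group: group mean minus self-rating.

    Negative values mean the group rates the person lower than they
    rate themselves; positive values suggest an under-claimed strength.
    Groups with no ratings are skipped.
    """
    return {
        group: round(mean(scores) - self_rating, 2)
        for group, scores in ratings_by_group.items()
        if scores
    }


# Hypothetical data for one competency on a five-point scale.
gaps = self_other_gaps(
    self_rating=4,
    ratings_by_group={
        "manager": [4],
        "peers": [3, 3, 4],
        "direct_reports": [2, 3, 2, 3],
    },
)
print(gaps)  # manager 0, peers -0.67, direct_reports -1.5
```

In this invented case the gap is absent for the manager and widest for direct reports, which points the conversation at a specific relationship instead of a general verdict.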
Gather feedback from the angles that see different people. A direct supervisor sees the leader who manages upward. A peer sees the one in the meeting nobody is chairing. A direct report sees the one whose decisions set the conditions for their work. The person themselves sees the one they intend to be. None of these is the true picture; all of them are real. So the choice of who rates whom is a design decision, not an administrative step. Making it deliberately — across reporting lines and alongside self-assessment — is what lets a person see themselves in the round rather than from a single, flattering or unflattering, angle.
End in behaviour, not in a competency framework. Development does not happen in a radar chart. It happens in the meeting someone chooses to run differently, the decision they finally delegate, the question they ask their team for the first time. We translate the perception data into plain behavioural choices: what to start, what to stop, what to continue. A leader cannot act on "collaboration: 3.2." They can act on "in decisions that affect the team, ask for dissenting views before you state your own."
This is also why developmental feedback should be kept separate from appraisal, unless an organisation is prepared to defend the measurement. The moment scores are tied to pay, promotion, or ranking, people read them differently. Raters grow cautious, recipients grow defensive, and the report starts to work less as a learning tool and more as evidence in a judgement. Used well, 360 feedback is not a verdict. It is a structured way to compare perspectives and decide what behaviour should change next.
None of this asks an organisation to give its people more feedback. It asks for feedback designed around how people take information in: specific enough to act on, safe enough to examine, and honest enough to show where perceptions diverge. The point is not to rank someone against a scale. It is to help them see which behaviours are visible to which people, where the pattern holds, where it splits, and what to try next. A number tells a person where they appear to stand. The texture underneath it tells them where to grow. We are far more interested in the second.

