Political surveys often include multi-item scales to measure individual predispositions such as authoritarianism, egalitarianism, or racial resentment. Scholars typically use these scales to examine how such predispositions vary across subgroups, comparing women to men, rich to poor, or Republican to Democratic voters. This research implicitly assumes that, say, Republican and Democratic voters’ responses to the egalitarianism scale measure the same construct in the same metric. Unfortunately, this research rarely evaluates whether that assumption holds. We present a framework to test this assumption and to correct scales when it fails to hold. We apply this framework to 13 commonly used scales from the 2012 and 2016 American National Election Studies (ANES). We find widespread violations of the equivalence assumption and demonstrate that these violations often lead to biased conclusions about the magnitude or direction of theoretically important group differences. These results suggest that researchers should not rely on multi-item scales without first establishing measurement equivalence.