Against cosine distance as a measure of disproportionality

elections
proportionality
Author

Chris Hanretty

Published

May 14, 2024

People who study elections have lots of different ways to measure disproportionality. This is not surprising: measuring disproportionality can be seen as a problem of measuring distance, as a problem of measuring distributional inequality, or even as a problem of measuring association, and there are many ways of measuring each of these things.

One index which has been proposed more than used is cosine distance. Koppel and Diskin propose this in a 2009 paper; their measure has been used in a couple of papers, though I have not done an exhaustive search.

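The cosine distance between two vectors is equal to one minus the cosine similarity between them. The cosine similarity between two vectors can be described in two ways: geometrically, as the cosine of the angle \(\theta\) between them, or algebraically, as their dot product divided by the product of their (Euclidean) lengths.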

Writing this out formally, if we have two vectors, \(\boldsymbol{v}\) for vote shares and \(\boldsymbol{s}\) for seat shares, then

\[ S_{\cos{}}(\boldsymbol{v}, \boldsymbol{s}) = \cos{\theta} = \frac{\boldsymbol{v}\cdot\boldsymbol{s}}{ \lVert{}\boldsymbol{v}\rVert \lVert{}\boldsymbol{s}\rVert{}} \]

In this equation, I use the double vertical bars to indicate the Euclidean norm, so that we can restate this expression as

\[ S_{\cos{}}(\boldsymbol{v}, \boldsymbol{s}) = \frac{\sum_{i=1}^n s_i v_i}{\sqrt{\sum_{i=1}^n v_i^2} \cdot \sqrt{\sum_{i=1}^n s_i^2}} \]

We can then talk about cosine distance (rather than similarity) by defining distance as the maximum possible value of the cosine function (that is, one) minus the cosine similarity.

\[ D_{\cos{}}(\boldsymbol{v}, \boldsymbol{s}) = 1 - S_{\cos{}}(\boldsymbol{v}, \boldsymbol{s}) \]
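As a minimal sketch, the two formulas above can be checked against each other in R, and the similarity converted back into the angle \(\theta\). The three-party shares below are invented purely for illustration, and `euclid_norm()` is a hypothetical helper name.

```r
# Hypothetical three-party vote and seat shares (illustrative values only)
v <- c(0.5, 0.3, 0.2)
s <- c(0.6, 0.3, 0.1)

# Euclidean norm: square root of the sum of squares
euclid_norm <- function(x) sqrt(sum(x^2))

# Vector form: dot product over the product of the two norms
s_cos_vec <- sum(v * s) / (euclid_norm(v) * euclid_norm(s))

# Summation form, written out as in the second equation
s_cos_sum <- sum(s * v) / (sqrt(sum(v^2)) * sqrt(sum(s^2)))

# Cosine distance, and the angle between v and s in degrees
d_cos <- 1 - s_cos_vec
theta_deg <- acos(s_cos_vec) * 180 / pi
c(similarity = s_cos_vec, distance = d_cos, angle = theta_deg)
```

For these made-up shares the similarity is about 0.98, so the distance is about 0.02 and the angle is a little over 11 degrees.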

If you are not used to thinking of angles in \(n\)-dimensional space, or of the “norms” of vectors (and I am not), then the cosine distance might seem quite odd. One way of making it seem less odd is to relate it to other measures that we understand better. Koppel and Diskin do this in their 2009 article, where they connect cosine distance to Gallagher disproportionality. They can do this because (per Wikipedia) cosine distance is equal to half the squared Euclidean distance between two normed vectors, where a normed vector is one where the square root of the sum of squares of the vector values is equal to one; and because Gallagher disproportionality is proportional to the Euclidean distance between the vote shares and the seat shares.
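Both connections can be checked numerically. The sketch below, again with made-up three-party shares, computes the cosine distance directly and then as half the squared Euclidean distance between the normed vectors. It also computes the Gallagher (least-squares) index on the raw shares, assuming the usual definition \(\sqrt{\tfrac{1}{2}\sum_i (v_i - s_i)^2}\) with shares expressed as proportions rather than percentages.

```r
v <- c(0.5, 0.3, 0.2)   # hypothetical vote shares
s <- c(0.6, 0.3, 0.1)   # hypothetical seat shares

euclid_norm <- function(x) sqrt(sum(x^2))

# Cosine distance computed directly
d_cos <- 1 - sum(v * s) / (euclid_norm(v) * euclid_norm(s))

# "Normed" vectors: each rescaled to have Euclidean norm one
v_hat <- v / euclid_norm(v)
s_hat <- s / euclid_norm(s)

# Half the squared Euclidean distance between the normed vectors
half_sq_dist <- sum((v_hat - s_hat)^2) / 2

# Gallagher index on the raw shares: Euclidean distance divided by sqrt(2)
gallagher <- sqrt(0.5 * sum((v - s)^2))

all.equal(d_cos, half_sq_dist)                      # TRUE
all.equal(gallagher, euclid_norm(v - s) / sqrt(2))  # TRUE
```

Multiplying the proportions by 100 would put the Gallagher index on the more familiar percentage-point scale.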

Why (not) use this measure?

There are different reasons why you might want to use cosine distance as a measure of disproportionality. Perhaps you work primarily with text-as-data, where the cosine similarity of document-term vectors is almost de rigueur for calculating document similarity. Perhaps (unlike me) you are convinced by Koppel and Diskin’s arguments that orthogonality is an important criterion for a disproportionality measure.

Unfortunately, the cosine distance measure seems to have one major drawback which I’ve not seen mentioned. This relates to the use of “normed” vectors. Putting vectors on the same scale, which is what norming achieves, can be very important when you are working with text as data. If we are comparing the similarity of War and Peace with The Death of Ivan Ilyich, it’s important to know that War and Peace is very much longer than The Death of Ivan Ilyich, and so working with raw counts of words is not very illuminating.

We typically don’t face these problems of scale with vote and seat shares. Sure, you could work with counts of votes and counts of seats rather than shares, but this would be eccentric. Even if you did work with counts of votes and counts of seats, you would still have to make an argument why it’s most appropriate to use a Euclidean norm to put these on the same scale, rather than simply dividing by the total number of votes and the total number of seats.
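A small sketch with invented vote and seat counts makes the contrast concrete. Dividing by totals gives shares that sum to one; dividing by the Euclidean norm gives vectors whose squares sum to one, and these are generally different vectors. Cosine similarity itself is unaffected by either rescaling, because it divides by the norms anyway.

```r
votes <- c(5000, 3000, 2000)  # hypothetical vote counts
seats <- c(60, 30, 10)        # hypothetical seat counts

cossim <- function(v, s) sum(v * s) / sqrt(sum(v^2) * sum(s^2))

# Rescaling by totals: the usual vote and seat shares
v_share <- votes / sum(votes)
s_share <- seats / sum(seats)

# Rescaling by the Euclidean norm: unit-length vectors
v_unit <- votes / sqrt(sum(votes^2))
s_unit <- seats / sqrt(sum(seats^2))

sum(v_share)  # 1
sum(v_unit)   # greater than 1 unless a single party wins every vote

# Same cosine similarity whether we use counts, shares, or unit vectors
c(counts = cossim(votes, seats),
  shares = cossim(v_share, s_share),
  unit   = cossim(v_unit, s_unit))
```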

What’s worse is that the Euclidean norm is already doing some work in political science, through its close connection to the effective number of parties. The Euclidean norm of the vector of vote shares \(\boldsymbol{v}\) is

\[ \lVert{}\boldsymbol{v}\rVert{} = \sqrt{\sum_{i=1}^n v_i^2} \]

but this is just the reciprocal of the square root of the (Laakso-Taagepera) effective number of vote-winning parties, \(N_V\):

\[ N_V = \frac{1}{\sum_{i=1}^n v_i^2}, \qquad \lVert{}\boldsymbol{v}\rVert{} = \frac{1}{\sqrt{N_V}} \]

This means that when we use cosine distance as a measure of disproportionality, we are saying (in terms more familiar to political scientists) that we should take the dot product of the vote and seat shares, then multiply by the square root of the effective number of vote-winning parties, and then multiply again by the square root of the effective number of seat-winning parties, before subtracting the result from one.
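A quick numerical check of this decomposition, again using made-up shares and taking the effective number to be the Laakso-Taagepera quantity \(1/\sum_i p_i^2\) (the helper name `eff_num()` is hypothetical):

```r
v <- c(0.5, 0.3, 0.2)  # hypothetical vote shares
s <- c(0.6, 0.3, 0.1)  # hypothetical seat shares

# Laakso-Taagepera effective number of parties
eff_num <- function(p) 1 / sum(p^2)
n_v <- eff_num(v)
n_s <- eff_num(s)

# Cosine similarity computed directly ...
s_cos_direct <- sum(v * s) / (sqrt(sum(v^2)) * sqrt(sum(s^2)))
# ... and as the dot product scaled by the square roots of the effective numbers
s_cos_enp <- sum(v * s) * sqrt(n_v) * sqrt(n_s)

all.equal(s_cos_direct, s_cos_enp)  # TRUE
```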

Maybe you don’t see that as a problem. It’s not necessarily a problem for a measure that it incorporates other concepts or measures which have been found to be useful in their own right. But it does mean that cosine distance has a problem with compositions involving perfectly proportional but differentially fragmented shares.

The problem

It’s easiest to illustrate the problem with an example. Let’s imagine we have a (partial) vector of vote shares and a (partial) vector of seat shares. Here it’s important that the sum of shares for both these vectors is the same, so in my example I’ve created my vector of seat shares by reversing the order of the vote shares.

```r
# Partial vote shares; the seat shares are the same values in reverse order
v_part <- c(.01, .02, .05, .1)
s_part <- rev(v_part)
```

These partial vectors sum to 0.18, so I’m going to add on another vector which will bring the sum up to 100%. This vector will be divided into two components, and it’s going to be the same vector for both vote and seat shares.

```r
# Split the remaining share equally between two parties, for votes and seats alike
v_add <- rep((1 - sum(v_part)) / 2, 2)
s_add <- rep((1 - sum(s_part)) / 2, 2)

v_add
#> [1] 0.41 0.41
```

We calculate the cosine distance, that is, one minus the cosine similarity.

```r
# Cosine similarity: dot product divided by the product of the Euclidean norms
cossim <- function(v, s) {
  sum(v * s) / sqrt(sum(v^2) * sum(s^2))
}

newv <- c(v_part, v_add)
news <- c(s_part, s_add)

d_cos <- 1 - cossim(newv, news)
d_cos
#> [1] 0.0257732
```
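Some algebra shows where this number comes from. Because the seat vector is just a reordering of the vote vector, the two vectors have the same Euclidean norm, and since \(\lVert{}\boldsymbol{v} - \boldsymbol{s}\rVert^2 = \lVert{}\boldsymbol{v}\rVert^2 + \lVert{}\boldsymbol{s}\rVert^2 - 2\,\boldsymbol{v}\cdot\boldsymbol{s}\), the cosine distance in this special case reduces to

\[ D_{\cos{}}(\boldsymbol{v}, \boldsymbol{s}) = 1 - \frac{\boldsymbol{v}\cdot\boldsymbol{s}}{\lVert{}\boldsymbol{v}\rVert^2} = \frac{\tfrac{1}{2}\lVert{}\boldsymbol{v} - \boldsymbol{s}\rVert^2}{\lVert{}\boldsymbol{v}\rVert^2} = \frac{\tfrac{1}{2}(0.0081 + 0.0009 + 0.0009 + 0.0081)}{0.3492} = \frac{0.009}{0.3492} \approx 0.0258 \]

The numerator depends only on the partial vectors; how the common block is split affects only the denominator \(\lVert{}\boldsymbol{v}\rVert^2\).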

We now have a measure of how disproportional this mapping between seat and vote shares is. I’m now going to change these seat and vote shares in a very specific way. Whereas before I divided the remaining vote and seat shares into two parts equally, now I’ll divide them into four parts equally. This will, as you might imagine, increase the effective number of vote- and seat-winning parties.

```r
# Now split the remaining share equally among four parties instead of two
v_add <- rep((1 - sum(v_part)) / 4, 4)
s_add <- rep((1 - sum(s_part)) / 4, 4)
s_add
#> [1] 0.205 0.205 0.205 0.205
```
```r
newv <- c(v_part, v_add)
news <- c(s_part, s_add)

d_cos2 <- 1 - cossim(newv, news)
d_cos2
#> [1] 0.0496963
```

Huh… our measure of disproportionality has almost doubled, although (intuitively) it doesn’t seem like it should have changed at all. If we were to write out this intuition in a bit more detail, we might say the following. Suppose we have a measure of disproportionality that can be applied to vote and seat vectors which don’t sum to one. If we concatenate two such pairs of vectors, one of which is perfectly proportional (so that the measure on that pair is zero), then the measure on the concatenated vectors shouldn’t change depending on how fragmented the perfectly proportional part is.
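One way to see that this intuition is reasonable is to compare the cosine distance with a measure that does behave this way. The sketch below rebuilds the example above with two, four and eight equal parties in the common block, and computes both the cosine distance and the Gallagher index (on proportions, assuming the usual \(\sqrt{\tfrac{1}{2}\sum_i (v_i - s_i)^2}\) definition). The helper `build_shares()` is a hypothetical name for the concatenation step.

```r
cossim <- function(v, s) sum(v * s) / sqrt(sum(v^2) * sum(s^2))
gallagher <- function(v, s) sqrt(0.5 * sum((v - s)^2))

v_part <- c(.01, .02, .05, .1)
s_part <- rev(v_part)

# Append k equal parties holding the remaining share; since v_part and s_part
# have the same sum, this block is identical (perfectly proportional) for both
build_shares <- function(part, k) c(part, rep((1 - sum(part)) / k, k))

results <- sapply(c(2, 4, 8), function(k) {
  v <- build_shares(v_part, k)
  s <- build_shares(s_part, k)
  c(k = k, cosine = 1 - cossim(v, s), gallagher = gallagher(v, s))
})
t(results)
```

The Gallagher index stays at about 0.095 for every split, while the cosine distance keeps rising as the common block is divided further: about 0.026 with two parties, about 0.050 with four, and about 0.093 with eight. In this particular construction, where the vote and seat vectors have the same norm, the cosine distance works out to the squared Gallagher index multiplied by the effective number of vote-winning parties, which is why it tracks fragmentation.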

I’ve not seen this criticism of the cosine distance measure as applied to electoral statistics. Perhaps people have decided to pass it over in silence and continue using other measures they prefer. Since I’m starting a summer project on measures of disproportionality, I’ve found it useful to articulate this criticism for my own benefit, but maybe also (if you have read to the end of this post) for yours.