In many settings, people must assign numerical scores to entities from a small discrete set—for instance, rating physical attractiveness from 1–5 on dating sites, or rating papers from 1–10 for conference reviewing. We study the problem of determining when one number of options is preferable to another. We consider generative models in which scores are uniform random and Gaussian. We study computationally when using 2, 3, 4, 5, and 10 options out of a total of 100 is optimal in these models (though our theoretical analysis is for a more general setting with

Humans rate items or entities in many important settings. For example, users of dating websites and mobile applications rate other users’ physical attractiveness, teachers rate the scholarly work of students, and reviewers rate the quality of academic conference submissions. In these settings, the users assign a numerical (integral) score to each item from a small discrete set. However, the number of options in this set can vary significantly between applications, and even within different instantiations of the same application. For instance, for rating attractiveness, three popular sites all use a different number of options. On “Hot or Not,” users rate the attractiveness of photographs submitted voluntarily by other users on a scale of 1–10. These scores are aggregated, and the average is assigned as the overall “score” for a photograph. On the dating website OkCupid, users rate other users on a scale of 1–5 (if a user rates another user 4 or 5, then the rated user receives a notification). Finally, on the mobile application Tinder, users “swipe right” (green heart) or “swipe left” (red X) to express interest in other users (two users are allowed to message each other if they mutually swipe right), which is essentially equivalent to a binary 1–2 scale.

Despite the importance and ubiquity of the problem, there has been little fundamental research done on the problem of determining the optimal number of options to allow in such settings. We study a model in which users have an underlying integral ground truth score for each item in

We then compute the average “compressed” score

We derive a closed-form expression for

One could argue that this model is somewhat “trivial” in the sense that it would be optimal to set

One line of related theoretical research that also has applications to the education domain studies the impact of using finely grained numerical grades (100, 99, 98) vs. coarse letter grades (A, B, C) [

While we are not aware of prior theoretical study of our exact problem, there have been experimental studies on the optimal number of options on a “Likert scale” [

We note that we are not necessarily claiming that our model or analysis perfectly models reality or the psychological phenomena behind how humans actually behave. We are simply proposing simple and natural models that, to the best of our knowledge, have not been studied before. The simulation results seem somewhat counterintuitive and merit study on their own. We admit that further study is needed to determine how realistic our assumptions are for modeling human behavior. For example, some psychology research suggests that human users may not actually have an underlying integral ground truth value [

Some work considers the setting where ratings over

Some prior work has presented an approach for mapping continuous prediction scores to ordinal preferences with heterogeneous thresholds that is also applicable to mapping continuous-valued ‘true preference’ scores [

Suppose scores are given by continuous probability density function (pdf)

Thus,

In general, for
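The compression and error computation described above can be sketched in code. This is a minimal sketch under stated assumptions: scores live on {1, …, 100}, the floor rule with n options maps a score s to ⌊(s − 1)n/100⌋ + 1, and a compressed score is decompressed to the midpoint of its bucket before the error is measured; the paper’s exact mapping may differ in details.

```python
import numpy as np

M = 100  # size of the original score set (assumed)

def compress_floor(s, n):
    """Floor compression: map scores in {1..M} to {1..n} (assumed rule)."""
    return (np.asarray(s) - 1) * n // M + 1

def decompress(c, n):
    """Map a compressed score back to its bucket's midpoint on the 1..M scale."""
    lo = (np.asarray(c) - 1) * M / n + 1   # smallest original score in bucket c
    hi = np.asarray(c) * M / n             # largest original score in bucket c
    return (lo + hi) / 2

def expected_error(pmf, n):
    """|true mean - decompressed mean| for a pmf over scores 1..M."""
    s = np.arange(1, M + 1)
    return abs(float(s @ pmf) - float(decompress(compress_floor(s, n), n) @ pmf))
```

Under this bucket-midpoint decompression, the exactly uniform pmf gives zero error for any n that divides 100, since the midpoints average back to the true mean; the interesting behavior arises for non-uniform distributions.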

Equation (

As an example, we see that

Consider a full distribution that has half its mass right around 30 and half its mass right around 60 (

If we happened to be in the case where both

We can next consider the case where both

For the case

Finally, for
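The case analysis for this bimodal example can also be checked numerically. The sketch below assumes the floor model with scores on {1, …, 100} and bucket-midpoint decompression (an assumption; the paper’s exact mapping may differ), with half the mass at score 30 and half at score 60; which values of n fare best depends on where the bucket boundaries fall relative to the two mass points.

```python
import numpy as np

M = 100
pmf = np.zeros(M)
pmf[29] = 0.5   # score 30
pmf[59] = 0.5   # score 60

s = np.arange(1, M + 1)
errs = {}
for n in (2, 3, 4, 5, 10):
    c = (s - 1) * n // M + 1                      # floor compression to {1..n}
    rep = ((c - 1) * M / n + 1 + c * M / n) / 2   # bucket midpoints on 1..M
    errs[n] = abs(float(s @ pmf) - float(rep @ pmf))
```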

Using

An alternative model we could have considered is to use rounding to produce the compressed scores as opposed to using the floor function from Equation (

In general, this approach would create
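A sketch of the rounding alternative, under the same assumptions as the floor sketch. Note that rounding s·n/M to the nearest integer yields values in {0, …, n}, i.e., n + 1 options, with half-width buckets at the two extremes; the inverse map below is likewise an assumption.

```python
import numpy as np

M = 100  # size of the original score set (assumed)

def compress_round(s, n):
    """Rounding compression: nearest integer to s*n/M, giving n+1 options."""
    return np.rint(np.asarray(s) * n / M).astype(int)   # values in {0..n}

def decompress_round(c, n):
    """Map a rounded score back to the 1..M scale (assumed inverse)."""
    return np.asarray(c) * M / n

def expected_error_round(pmf, n):
    """|true mean - decompressed mean| for a pmf over scores 1..M."""
    s = np.arange(1, M + 1)
    return abs(float(s @ pmf) - float(decompress_round(compress_round(s, n), n) @ pmf))
```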

One might wonder whether the floor approach would ever outperform the rounding approach (in the example above, the rounding approach produced lower error

For three options,

For general

Like for the floor model,

The above analysis leads to the immediate question of whether the example for which

The first generative model we consider is a uniform model in which the values of the pmf for each of the

For our simulations, we used
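The simulation setup can be sketched as follows, under the same assumed floor model (scores on {1, …, 100}, bucket-midpoint decompression). The uniform generative model is taken to draw each pmf value i.i.d. uniformly and normalize; for the Gaussian model we use i.i.d. |N(0, 1)| weights, which is an assumption and may differ from the paper’s exact construction.

```python
import numpy as np

M = 100                      # size of the original score set (assumed)
S = np.arange(1, M + 1)
rng = np.random.default_rng(0)

def floor_error(pmf, n):
    """Floor-compression error under the assumed bucket-midpoint model."""
    c = (S - 1) * n // M + 1                       # compressed scores in {1..n}
    rep = ((c - 1) * M / n + 1 + c * M / n) / 2    # bucket midpoints on 1..M
    return abs(float(S @ pmf) - float(rep @ pmf))

def run_trials(make_weights, n_values=(2, 3, 4, 5, 10), trials=2000):
    """Count, over random pmfs, how often each n achieves the smallest error."""
    wins = dict.fromkeys(n_values, 0)
    avg_err = dict.fromkeys(n_values, 0.0)
    for _ in range(trials):
        w = make_weights()
        pmf = w / w.sum()
        errs = {n: floor_error(pmf, n) for n in n_values}
        wins[min(errs, key=errs.get)] += 1
        for n in n_values:
            avg_err[n] += errs[n] / trials
    return wins, avg_err

# Uniform model: pmf values i.i.d. uniform, then normalized (assumed).
uniform_wins, uniform_err = run_trials(lambda: rng.random(M))
# Gaussian model: i.i.d. |N(0,1)| weights, normalized (assumed construction).
gaussian_wins, gaussian_err = run_trials(lambda: np.abs(rng.normal(size=M)))
```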

In the first set of experiments, we compared performance between using

We next explored the number of victories between just

We next repeated the extreme

We also considered the situation where we restricted the

We next repeated these experiments for the rounding compression function. There are several interesting observations from

The empirical analysis of ranking-based datasets depends on the availability of large amounts of data depicting different types of real scenarios. For our experimental setup, we used two different datasets from the Preflib database [

In the first set of experiments, we used ratings scraped from TripAdvisor. The dataset contains ratings of several different aspects (price, quality of rooms, proximity of location, etc.) as well as an overall rating provided by the users. We compared performance between using
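The per-item error computation for these experiments can be sketched as follows, assuming ratings on a 1..m integer scale (m = 5 for TripAdvisor stars) and the same floor/midpoint model as before. The `ratings` list here is hypothetical, standing in for one item’s ratings from the Preflib data.

```python
import numpy as np

def avg_floor_error(ratings, n, m=5):
    """Error between an item's true mean rating (1..m scale) and the mean of
    its floor-compressed ratings mapped back to bucket midpoints (assumed model)."""
    r = np.asarray(ratings)
    c = (r - 1) * n // m + 1                       # floor compression to {1..n}
    rep = ((c - 1) * m / n + 1 + c * m / n) / 2    # bucket midpoints on 1..m
    return abs(r.mean() - rep.mean())

# Hypothetical example: one hotel's overall ratings, compressed to 2-4 options.
ratings = [5, 4, 4, 3, 5, 2, 4]
errors = {n: avg_floor_error(ratings, n) for n in (2, 3, 4)}
```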

We next explored rounding to generate the ratings (

We next experimented on data from the 2002 French Presidential Election (

We also experimented on anonymous ratings data from the Jester Online Joke Recommender System [

For the TripAdvisor and French election data, the errors decrease, as one would intuitively expect, as the number of choices increases. However, surprisingly, for the Jester dataset we observe that the average errors are very close for all of the options (

Settings in which humans must rate items or entities from a small discrete set of options are ubiquitous. We have singled out several important applications—rating attractiveness for dating websites, assigning grades to students, and reviewing academic papers. The number of available options can vary considerably, even within different instantiations of the same application. For instance, we saw that three popular sites for attractiveness rating use completely different systems: Hot or Not uses a 1–10 system, OkCupid uses a 1–5 “star” system, and Tinder uses a binary 1–2 “swipe” system. Despite the problem’s importance, we are not aware of prior theoretical study of it. Our goal is to select

We performed numerous simulations comparing the performance between different values of

A future avenue is to extend our analysis to better understand specific distributions for which different

Conceptualization, S.G.; data curation, F.B.Y.; formal analysis, S.G.; investigation, S.G. and F.B.Y.; methodology, S.G.; project administration, S.G.; writing—original draft preparation, S.G. and F.B.Y.; writing—review and editing, S.G.; visualization, S.G.; supervision, S.G.

This research received no external funding.

The authors declare no conflict of interest.

Example distribution for which compressing with

Compressed distributions using

Example distribution where compressing with

Compressed distribution for

Number of times each value of

| | 2 | 3 | 4 | 5 | 10 |
|---|---|---|---|---|---|
| Uniform # victories | 5564 | 9265 | 14,870 | 16,974 | 53,327 |
| Uniform average error | 1.32 | 0.86 | 0.53 | 0.41 | 0.19 |
| Gaussian # victories | 3025 | 7336 | 14,435 | 17,800 | 57,404 |
| Gaussian average error | 1.14 | 0.59 | 0.30 | 0.22 | 0.10 |

Results for

| | 2 | 3 |
|---|---|---|
| Uniform number of victories | 36,805 | 63,195 |
| Uniform average error | 1.31 | 0.86 |
| Gaussian number of victories | 30,454 | 69,546 |
| Gaussian average error | 1.13 | 0.58 |

Results for

| | 2 | 10 |
|---|---|---|
| Uniform number of victories | 8253 | 91,747 |
| Uniform average error | 1.32 | 0.19 |
| Gaussian number of victories | 4369 | 95,631 |
| Gaussian average error | 1.13 | 0.10 |

Number of times each value of

| | 2 | 10 |
|---|---|---|
| Uniform number of victories | 32,250 | 67,750 |
| Uniform average error | 1.31 | 0.74 |
| Gaussian number of victories | 10,859 | 89,141 |
| Gaussian average error | 1.13 | 0.20 |

Number of times each value of

| | 2 | 10 |
|---|---|---|
| Uniform number of victories | 93,226 | 6774 |
| Uniform average error | 1.31 | 0.74 |
| Gaussian number of victories | 54,459 | 45,541 |
| Gaussian average error | 1.13 | 1.09 |

Number of times each value of

| | 2 | 3 | 4 | 5 | 10 |
|---|---|---|---|---|---|
| Uniform # victories | 15,766 | 33,175 | 21,386 | 19,995 | 9678 |
| Uniform average error | 0.78 | 0.47 | 0.55 | 0.52 | 0.50 |
| Gaussian # victories | 13,262 | 64,870 | 10,331 | 9689 | 1848 |
| Gaussian average error | 0.67 | 0.24 | 0.50 | 0.50 | 0.50 |

| | 2 | 3 |
|---|---|---|
| Uniform number of victories | 33,585 | 66,415 |
| Uniform average error | 0.78 | 0.47 |
| Gaussian number of victories | 18,307 | 81,693 |
| Gaussian average error | 0.67 | 0.24 |

| | 2 | 10 |
|---|---|---|
| Uniform number of victories | 37,225 | 62,775 |
| Uniform average error | 0.78 | 0.50 |
| Gaussian number of victories | 37,897 | 62,103 |
| Gaussian average error | 0.67 | 0.50 |

| | 2 | 10 |
|---|---|---|
| Uniform number of victories | 55,676 | 44,324 |
| Uniform average error | 0.79 | 0.89 |
| Gaussian number of victories | 24,128 | 75,872 |
| Gaussian average error | 0.67 | 0.34 |

| | 2 | 10 |
|---|---|---|
| Uniform number of victories | 99,586 | 414 |
| Uniform average error | 0.78 | 3.50 |
| Gaussian number of victories | 95,692 | 4308 |
| Gaussian average error | 0.67 | 1.45 |

Average flooring error for hotel ratings.

| Average Error | 2 | 3 | 4 |
|---|---|---|---|
| Overall | 1.04 | 0.31 | 0.15 |
| Price | 1.07 | 0.27 | 0.14 |
| Rooms | 1.06 | 0.32 | 0.16 |
| Location | 1.47 | 0.42 | 0.16 |
| Cleanliness | 1.43 | 0.40 | 0.16 |
| Front Desk | 1.34 | 0.33 | 0.14 |
| Service | 1.24 | 0.32 | 0.14 |
| Business Service | 0.96 | 0.28 | 0.18 |

Number of times each

| Minimal Error | 2 | 3 | 4 |
|---|---|---|---|
| Overall | 235 | 450 | 1157 |
| Price | 181 | 518 | 1143 |
| Rooms | 254 | 406 | 1182 |
| Location | 111 | 231 | 1500 |
| Cleanliness | 122 | 302 | 1418 |
| Front Desk | 120 | 387 | 1335 |
| Service | 140 | 403 | 1299 |
| Business Service | 316 | 499 | 1027 |

Number of times

| # of Victories | 2 vs. 3 | 2 vs. 4 | 3 vs. 4 |
|---|---|---|---|
| Overall | 243, 1599 | 277, 1565 | 5, 1837 |
| Price | 187, 1655 | 211, 1631 | 4, 1838 |
| Rooms | 275, 1567 | 283, 1559 | 10, 1832 |
| Location | 126, 1716 | 122, 1720 | 11, 1831 |
| Cleanliness | 126, 1716 | 141, 1701 | 5, 1837 |
| Front Desk | 130, 1712 | 133, 1709 | 8, 1834 |
| Service | 153, 1689 | 152, 1690 | 11, 1831 |
| Business Service | 368, 1474 | 329, 1513 | 22, 1820 |

Average error using rounding approach.

| Average Error | 2 | 3 | 4 |
|---|---|---|---|
| Overall | 0.50 | 0.28 | 0.15 |
| Price | 0.48 | 0.31 | 0.15 |
| Rooms | 0.48 | 0.30 | 0.16 |
| Location | 0.63 | 0.41 | 0.22 |
| Cleanliness | 0.60 | 0.40 | 0.21 |
| Front Desk | 0.55 | 0.39 | 0.21 |
| Service | 0.52 | 0.36 | 0.18 |
| Business Service | 0.39 | 0.36 | 0.18 |

Number of times

| Minimal Error | 2 | 3 | 4 |
|---|---|---|---|
| Overall | 82 | 132 | 1628 |
| Price | 92 | 74 | 1676 |
| Rooms | 152 | 81 | 1609 |
| Location | 93 | 52 | 1697 |
| Cleanliness | 79 | 44 | 1719 |
| Front Desk | 89 | 50 | 1703 |
| Service | 102 | 29 | 1711 |
| Business Service | 246 | 123 | 1473 |

Number of times

| # of Victories | 2 vs. 3 | 2 vs. 4 | 3 vs. 4 |
|---|---|---|---|
| Overall | 161, 1681 | 113, 1729 | 486, 1356 |
| Price | 270, 1572 | 101, 1741 | 385, 1457 |
| Rooms | 344, 1498 | 173, 1669 | 575, 1267 |
| Location | 275, 1567 | 109, 1733 | 344, 1498 |
| Cleanliness | 210, 1632 | 90, 1752 | 289, 1553 |
| Front Desk | 380, 1462 | 95, 1747 | 332, 1510 |
| Service | 358, 1484 | 109, 1733 | 399, 1443 |
| Business Service | 870, 972 | 278, 1564 | 853, 989 |

# victories and average rounding error,

| | Average Error | # of Victories |
|---|---|---|
| Overall | 0.15, 0.21 | 1007, 835 |
| Price | 0.15, 0.17 | 955, 887 |
| Rooms | 0.15, 0.23 | 1076, 766 |
| Location | 0.22, 0.22 | 694, 1148 |
| Cleanliness | 0.21, 0.19 | 653, 1189 |
| Front Desk | 0.21, 0.17 | 662, 1180 |
| Service | 0.18, 0.18 | 827, 1015 |
| Business Service | 0.18, 0.31 | 1233, 609 |

Average flooring error for French election.

| Average Error | 2 | 3 | 4 | 5 | 8 | 10 |
|---|---|---|---|---|---|---|
| François Bayrou | 3.18 | 1.50 | 0.94 | 0.66 | 0.30 | 0.20 |
| Olivier Besancenot | 1.70 | 0.80 | 0.50 | 0.35 | 0.16 | 0.10 |
| Christine Boutin | 1.15 | 0.54 | 0.34 | 0.24 | 0.11 | 0.07 |
| Jacques Cheminade | 0.64 | 0.30 | 0.19 | 0.13 | 0.06 | 0.04 |
| Jean-Pierre Chevènement | 3.69 | 1.74 | 1.09 | 0.77 | 0.35 | 0.23 |
| Jacques Chirac | 3.48 | 1.64 | 1.03 | 0.72 | 0.33 | 0.21 |
| Robert Hue | 2.39 | 1.12 | 0.70 | 0.49 | 0.22 | 0.14 |
| Lionel Jospin | 5.45 | 2.57 | 1.61 | 1.13 | 0.52 | 0.33 |
| Arlette Laguiller | 2.20 | 1.04 | 0.65 | 0.46 | 0.21 | 0.13 |
| Brice Lalonde | 1.53 | 0.72 | 0.45 | 0.32 | 0.14 | 0.09 |
| Corinne Lepage | 2.24 | 1.06 | 0.67 | 0.47 | 0.22 | 0.14 |
| Jean-Marie Le Pen | 0.40 | 0.19 | 0.12 | 0.08 | 0.04 | 0.02 |
| Alain Madelin | 1.93 | 0.91 | 0.57 | 0.40 | 0.18 | 0.12 |
| Noël Mamère | 3.68 | 1.74 | 1.09 | 0.77 | 0.35 | 0.23 |
| Bruno Mégret | 0.31 | 0.15 | 0.09 | 0.06 | 0.03 | 0.02 |

Average rounding error for French election.

| Average Error | 2 | 3 | 4 | 5 | 8 | 10 |
|---|---|---|---|---|---|---|
| François Bayrou | 1.65 | 0.73 | 0.91 | 0.75 | 0.48 | 0.62 |
| Olivier Besancenot | 3.88 | 2.39 | 2.14 | 1.70 | 1.31 | 1.25 |
| Christine Boutin | 3.87 | 2.39 | 1.84 | 1.50 | 0.90 | 0.86 |
| Jacques Cheminade | 4.34 | 2.72 | 2.07 | 1.65 | 1.02 | 0.88 |
| Jean-Pierre Chevènement | 1.47 | 0.65 | 1.20 | 0.82 | 0.55 | 0.61 |
| Jacques Chirac | 1.64 | 1.00 | 1.13 | 0.88 | 0.55 | 0.64 |
| Robert Hue | 2.51 | 1.27 | 1.14 | 1.09 | 0.67 | 0.77 |
| Lionel Jospin | 0.33 | 0.49 | 0.87 | 0.67 | 0.51 | 0.63 |
| Arlette Laguiller | 2.62 | 1.34 | 1.34 | 1.02 | 0.60 | 0.63 |
| Brice Lalonde | 3.45 | 1.90 | 1.55 | 1.21 | 0.66 | 0.78 |
| Corinne Lepage | 2.89 | 1.59 | 1.56 | 1.16 | 0.79 | 0.87 |
| Jean-Marie Le Pen | 4.92 | 3.26 | 2.55 | 2.06 | 1.39 | 1.20 |
| Alain Madelin | 3.18 | 1.80 | 1.52 | 1.17 | 0.72 | 0.70 |
| Noël Mamère | 2.02 | 1.55 | 1.77 | 1.44 | 1.29 | 1.41 |
| Bruno Mégret | 4.88 | 3.23 | 2.46 | 1.99 | 1.28 | 1.10 |

Average flooring error for Jester dataset.

| Average Error | 2 | 3 | 4 | 5 | 10 |
|---|---|---|---|---|---|
| Joke 5 | 0.57 | 0.53 | 0.52 | 0.51 | 0.50 |
| Joke 7 | 1.32 | 0.88 | 0.74 | 0.66 | 0.54 |
| Joke 8 | 1.51 | 0.97 | 0.80 | 0.71 | 0.56 |
| Joke 13 | 2.52 | 1.45 | 1.09 | 0.91 | 0.61 |
| Joke 15 | 2.48 | 1.43 | 1.08 | 0.91 | 0.62 |
| Joke 16 | 3.72 | 2.01 | 1.44 | 1.16 | 0.69 |
| Joke 17 | 1.94 | 1.18 | 0.92 | 0.80 | 0.58 |
| Joke 18 | 1.51 | 0.97 | 0.79 | 0.71 | 0.56 |
| Joke 19 | 0.80 | 0.64 | 0.58 | 0.56 | 0.51 |
| Joke 20 | 1.77 | 1.10 | 0.87 | 0.76 | 0.57 |

Average rounding error for Jester dataset.

| Average Error | 2 | 3 | 4 | 5 | 10 |
|---|---|---|---|---|---|
| Joke 5 | 0.48 | 0.47 | 0.48 | 0.47 | 0.48 |
| Joke 7 | 1.20 | 1.20 | 1.20 | 1.20 | 1.20 |
| Joke 8 | 1.44 | 1.43 | 1.42 | 1.43 | 1.42 |
| Joke 13 | 2.43 | 2.43 | 2.43 | 2.42 | 2.42 |
| Joke 15 | 2.34 | 2.34 | 2.33 | 2.33 | 2.33 |
| Joke 16 | 3.59 | 3.58 | 3.57 | 3.57 | 3.57 |
| Joke 17 | 1.84 | 1.82 | 1.82 | 1.81 | 1.81 |
| Joke 18 | 1.45 | 1.44 | 1.44 | 1.44 | 1.44 |
| Joke 19 | 0.72 | 0.72 | 0.71 | 0.71 | 0.71 |
| Joke 20 | 1.65 | 1.63 | 1.63 | 1.63 | 1.63 |