"Pure" list removes rating distortion
"Pure" list is computed to remove the distortion that may affect the main rating list.
Distortion appears when several versions or settings of the same engine are included together in the testing study.
Suppose you have engine A and several versions of engine B: B1, B2, B3.
Suppose also that A is particularly strong versus any version of B,
which often happens in real testing because of some characteristics of those engines.
In that case A will get a higher rating than in a study where only one version of B is present,
because its favorable results against B are counted several times.
The reverse may also happen: when A is weak versus B, it gets a lower rating.
To remove that distortion, a separate game database is constructed from games
played only by the best version in each engine "family".
To save space and time, the pure database has all moves stripped out; it contains only PGN headers and results.
Then the "Pure list" is computed from that "pure" database using Bayeselo.
Pure database download
To save space, the pure database has all moves stripped out; it contains only PGN headers and results. It is useful only for rating calculation or similar analysis, since it holds no actual games, only the results.
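As an illustration of what "moves stripped out" means, here is a minimal Python sketch that keeps only the PGN tag lines of each game and writes the value of the Result tag in place of the moves. It is not the tool used to build the download. The names strip_moves and format_game are hypothetical, and the sketch assumes that every tag sits on its own line and that no line inside the moves looks like a tag.

```python
import re

# A PGN tag pair on its own line, e.g. [White "Engine A"]
TAG_LINE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')

def format_game(tags):
    """Render one game as its tag lines followed by the result only."""
    # "*" is the PGN marker for an unknown or unfinished result.
    result = next((value for name, value in tags if name == "Result"), "*")
    header = "\n".join(f'[{name} "{value}"]' for name, value in tags)
    return f"{header}\n\n{result}\n"

def strip_moves(pgn_text):
    """Return the PGN text with all moves removed (hypothetical helper)."""
    games, tags, in_moves = [], [], False
    for raw in pgn_text.splitlines():
        line = raw.strip()
        match = TAG_LINE.match(line)
        if match and in_moves:
            # A tag line after the moves means a new game has started.
            games.append(format_game(tags))
            tags, in_moves = [], False
        if match:
            tags.append(match.groups())
        elif line and tags:
            in_moves = True  # non-empty, non-tag line: part of the moves
    if tags:
        games.append(format_game(tags))
    return "\n".join(games)

sample = (
    '[White "A"]\n[Black "B"]\n[Result "1-0"]\n\n'
    '1. e4 e5 2. Nf3 1-0\n\n'
    '[White "B"]\n[Black "A"]\n[Result "1/2-1/2"]\n\n'
    '1. d4 d5 1/2-1/2\n'
)
stripped = strip_moves(sample)
print(stripped)
```

The output still lists who played whom and how each game ended, which is all that a rating calculation needs.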
Explanation of the columns
"Rank" — 1 is best, 2 is second best, etc.. It's simple.
"Engine" — Name and version of an engine.
"ELO" — Engine rating computed with Bayeselo.
This column also has a number in brackets, which shows the difference
between the "Pure" rating and the rating computed for the complete database.
For example, "2850 (+10)" in the ELO column means that the engine's "pure" rating is 2850, which is
10 points higher than its rating in the complete list.
"+" and "−" — 95% confidence intervals.
For example, if an engine's rating is 2850, "+" is +20 and "−" is −15, it means that
there is only an estimated 5% probability that the engine's "true" rating is outside of the
[2850−15 .. 2850+20] range.
"Score" — Number of points scored by an engine, divided by the number of games.
Win is 1 point, draw is 1/2 of a point, and loss is 0.
Please note that this is computed for the "pure" database, so the numbers differ from those in the main list.
"Average Opponent" — Difference between the rating of engine tested and average of the opponent ratings
for all games played by that engine. (Only games from the "pure" database were counted).
Positive number means that engine was playing with stronger opponents, averagely. Negative number - weaker opponents.
"Draws" — Percentage of games by an engine, that ended in a draw.
(Only games in "pure" database are counted).
"Games" — Total number of games played by an engine.
(Only games in the "pure" database are counted).
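The arithmetic behind the "Score", "Average Opponent", "Draws" and "Games" columns, and the range implied by "+" and "−", can be sketched in Python as follows. This is only an illustration, not the code that produces the list. The function names, the (white, black, result) tuple layout and the sample ratings are assumptions, and "Average Opponent" is taken here as the opponents' average rating minus the engine's own rating, so that a positive value means stronger opponents.

```python
# Points earned by White for each PGN result string.
WHITE_POINTS = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}

def rating_interval(rating, plus, minus):
    """Range given by the "+" and "-" columns (both passed as magnitudes)."""
    return (rating - minus, rating + plus)

def column_stats(engine, games, ratings):
    """Compute the per-engine columns from (white, black, result) tuples.

    `ratings` maps engine name to rating; in the real list these come
    from Bayeselo, here they are just given numbers.
    """
    points, draws, opponent_ratings = 0.0, 0, []
    for white, black, result in games:
        if engine == white:
            opponent, earned = black, WHITE_POINTS[result]
        elif engine == black:
            opponent, earned = white, 1.0 - WHITE_POINTS[result]
        else:
            continue  # the engine did not play this game
        points += earned
        draws += result == "1/2-1/2"
        opponent_ratings.append(ratings[opponent])
    n = len(opponent_ratings)
    if n == 0:
        return None
    return {
        "games": n,
        "score": points / n,          # points divided by games
        "draws": 100.0 * draws / n,   # percentage of drawn games
        "average_opponent": sum(opponent_ratings) / n - ratings[engine],
    }

# Made-up sample data, for illustration only.
games = [("A", "B", "1-0"), ("B", "A", "1/2-1/2"),
         ("A", "C", "0-1"), ("B", "C", "1-0")]
ratings = {"A": 2850, "B": 2800, "C": 2900}
stats = column_stats("A", games, ratings)
low, high = rating_interval(2850, plus=20, minus=15)
```

In this sample, engine A played three games, scored 1.5 points, and faced opponents rated slightly below itself on average, so its "Average Opponent" value is negative.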
A detailed explanation of how we construct the "pure" list:
1. We have to find the best version in each engine family.
We can't use the "Best versions" list for that,
because the "Best versions" list may be affected by the distortion that we are trying to remove.
To find the true best version in a family of engines, we create a separate game database
containing only games between engines from that family.
Then we compute the ratings for that small database and take the highest rated engine as the best one,
to represent that family in the "pure" list.
There is also a requirement that every engine in the "pure" list must have at least 150 games
played against other "pure" engines, and it must be a public release, not a beta or private version.
2. After finding the set of "pure" best versions,
we extract all games in which both engines are from that set,
and those games form the "pure" database.
The pure list is simply a rating list computed for that database using Bayeselo.
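The two steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the actual procedure. All function names and the sample data are hypothetical, the per-family ratings are assumed to have been computed beforehand (by Bayeselo in the real process) and are passed in as a plain dictionary, and the sketch only reports engines that fall short of the 150-game requirement, because the text does not say what happens to them.

```python
from collections import Counter

MIN_GAMES = 150  # minimum number of games against other "pure" engines

def pick_best_versions(families, family_ratings, public_releases):
    """Step 1: highest rated public version in each family.

    `family_ratings` must come from each family's own small database,
    not from the main list. Filtering by public release before picking
    the best version is an assumption of this sketch.
    """
    best = {}
    for family, versions in families.items():
        candidates = [v for v in versions if v in public_releases]
        if candidates:
            best[family] = max(candidates, key=lambda v: family_ratings[v])
    return best

def build_pure_database(games, pure_set):
    """Step 2: keep only games in which both engines are in the pure set."""
    return [g for g in games if g[0] in pure_set and g[1] in pure_set]

def games_per_engine(pure_games):
    """Count the games each engine has in the pure database."""
    counts = Counter()
    for white, black, _result in pure_games:
        counts[white] += 1
        counts[black] += 1
    return counts

def below_threshold(counts, pure_set, min_games=MIN_GAMES):
    """Engines that do not meet the minimum-games requirement."""
    return sorted(e for e in pure_set if counts[e] < min_games)

# Made-up sample data, for illustration only.
families = {"A": ["A 1.0"], "B": ["B1", "B2", "B3"]}
family_ratings = {"A 1.0": 2800, "B1": 2700, "B2": 2760, "B3": 2740}
public_releases = {"A 1.0", "B1", "B3"}  # B2 is treated as a beta here
games = [("A 1.0", "B1", "1-0"), ("A 1.0", "B3", "1-0"),
         ("B3", "A 1.0", "1/2-1/2"), ("B2", "B3", "0-1")]

best = pick_best_versions(families, family_ratings, public_releases)
pure_set = set(best.values())
pure_games = build_pure_database(games, pure_set)
counts = games_per_engine(pure_games)
```

The resulting pure_games list would then be written out and rated with Bayeselo to produce the list itself.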
Features of the pure list
The first thing to realize about the "pure" list is that it is not necessarily
more relevant than the big list of all versions.
The "pure" list removes one kind of distortion: the distortion that may arise from multiple versions of the same engine.
But the price for that is high: the "pure" database is several times smaller than the complete database.
This results in a much larger statistical error, as you can see in the "+" and "−" columns.
Also, the "pure" list can still have other types of distortion,
such as distortion resulting from too few (including zero) or too many games between particular pairs of engines.
So, don't take this list as certainly superior to the "Best versions" list.
This list does not replace the "Best versions" list,
but simply provides a different view for those who are concerned about distortions.
It is possible, though, that in time this list will become clearly superior,
once the "pure" database is large enough.
Please also note that an engine version being listed in the "Best versions"
list does not guarantee that the same version will be listed in the "Pure" list.
Most often that will be the case,
but theoretically it is possible that a different version
will turn out to be the best in the "pure" context.