What is Wilson score?

byte[]

Philomena Contributor
@The_Park
Let’s define the problem.
We want to know in an objective sense some sort of reasonable lower bound on how well people will rate an image on average, and we don’t currently have perfect information about how people will rate the image, just a small sample of people who voted on it. For the sake of convenience (and it is a big stretch), we will assume the sample is randomly selected from independent people who wanted to see that type of image. This gives us a statistical basis to use to start making inferences.
An obvious lower bound for how well people will rate an image is that 100% of them will downvote it. But that isn’t a very useful metric because it assigns the same score to all images, even though people do like some images more than others – it doesn’t provide any ranking information.
To find a more informative lower bound on the proportion of people who will like this image, we need to incorporate some of the information about how people have rated it so far. To do that, we will use our assumption that the people who voted on it so far are a random sample. If you take a bunch of random samples for the same image, with different people voting each time, the proportions of upvotes in those samples will mostly fall in a small range centered around the true value, and their distribution will look approximately normal (bell-shaped) once the samples are large enough. This is called the central limit theorem.
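The clustering described above can be checked with a small simulation. This is only a sketch: the 80% true upvote rate, the sample size of 100 and the helper name `simulate_sample_proportions` are made-up assumptions for illustration, not anything taken from the site.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of n voters and return each sample's upvote proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        upvotes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(upvotes / n)
    return proportions

# Hypothetical image: 80% of interested viewers would upvote it.
TRUE_P = 0.8
SAMPLE_SIZE = 100
props = simulate_sample_proportions(TRUE_P, SAMPLE_SIZE, num_samples=2000)

mean_of_props = statistics.fmean(props)
spread_of_props = statistics.pstdev(props)
# Spread predicted by the normal approximation: sqrt(p(1 - p) / n)
predicted_spread = math.sqrt(TRUE_P * (1 - TRUE_P) / SAMPLE_SIZE)

print(f"mean of sample proportions: {mean_of_props:.3f}")
print(f"observed spread:            {spread_of_props:.3f}")
print(f"predicted spread:           {predicted_spread:.3f}")
```

The mean of the simulated proportions should land close to the true value, and the observed spread should be close to the predicted one.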
[Image: normal distribution curve with standard deviations marked around the mean]
Because the central limit theorem lets us treat these sample averages as a normal distribution, we can use the math of the normal distribution to help us out. To do this, I will introduce some notation.
p̂ (p-hat, p with a circumflex) is the sample proportion. This is the number of upvotes divided by the total votes.
n is sample size, or how many people voted.
p is the population proportion. This is the value we don’t know (the true rating of the image), and the one we would like to find a lower bound for.
μ is the average of all the sample proportions. If our samples are really random and independent, it should be equal to the population proportion p.
σ (Greek letter sigma) is the population standard deviation. In the picture above, we expect 68% of our samples to be between one standard deviation below and one standard deviation above the average.
In the above picture, we can see that a good lower bound for our true rating should be at μ - 3σ, because about 99.85% of all of our samples will be above that value. However, if we knew what μ was, we wouldn’t need any of this stuff, because we could just use that as the rating! So we have to estimate based on p̂. We will naively guess that p̂ = p, and the sample size will control our standard deviation. This should intuitively make sense, because if we choose an enormous sample size, and they are all randomly sampled, it is much more likely that we are close to the true value than if we only have a small number of people.
The formula for the sample standard error is expressed as
`
s = sqrt(p̂(1 - p̂)) / sqrt(n)
`
So our estimate of the value that the true proportion will be above 99.85% of the time, which is the lower bound we were interested in, is
`
p̂ - 3 * sqrt(p̂(1 - p̂)) / sqrt(n)
`
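As a minimal sketch, the formula above can be written out directly in Python. The function name `naive_lower_bound` and the choice to return 0 for an image with no votes are assumptions made for this example.

```python
import math

def naive_lower_bound(upvotes, downvotes, z=3.0):
    """Normal-approximation lower bound: p_hat - z * sqrt(p_hat * (1 - p_hat)) / sqrt(n)."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # assumption: no votes means no evidence, so score 0
    p_hat = upvotes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat)) / math.sqrt(n)
    return p_hat - z * standard_error

print(naive_lower_bound(90, 10))  # plenty of votes, moderate proportion
print(naive_lower_bound(3, 0))    # only upvotes: the error term is zero
print(naive_lower_bound(1, 9))    # few votes, mostly downvotes
```

With 3 upvotes and no downvotes the standard error is zero, so the bound is exactly 1.0 regardless of how few people voted. With 1 upvote and 9 downvotes the result is negative, which is not a valid proportion.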
This almost works as intended, and it is pretty close to what we actually do, except it annoyingly doesn’t work correctly when the sample size is really small (when an image only has a few votes) or when the probability is really extreme (everyone liked or disliked it). This is what Edwin Bidwell Wilson’s scoring interval, which we abbreviate to the Wilson score interval, fixes.
For Wilson, there are modifications to the formula which greatly improve its stability in these extreme cases. If you want to read more about it, hit up the Wikipedia page here.
Note: Derpibooru is actually less pessimistic than this. It uses a lower bound that 99.5% of all possible probabilities are greater than, rather than 99.85%.
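For reference, here is a sketch of the lower end of the Wilson score interval as given in the standard textbook formula. It is not copied from Derpibooru's code. The function name, the handling of zero votes, and reading the 99.5% figure as a one-sided confidence level (z of about 2.576) are all assumptions.

```python
import math
from statistics import NormalDist

def wilson_lower_bound(upvotes, downvotes, confidence=0.995):
    """Lower end of the Wilson score interval for the upvote proportion."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # assumption: an image with no votes scores 0
    # One-sided z value; roughly 2.576 for confidence=0.995.
    z = NormalDist().inv_cdf(confidence)
    p_hat = upvotes / n
    center = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (center - margin) / (1 + z * z / n)

print(wilson_lower_bound(3, 0))    # stays well below 1 with only three votes
print(wilson_lower_bound(1, 9))    # stays above 0
print(wilson_lower_bound(90, 10))
```

Unlike the plain normal approximation, this bound stays between 0 and 1, and an image with only upvotes gets n / (n + z²) rather than a perfect score.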
The Smiling Pony

( ͠° ͟ʖ ͡° )
@agmistry  
The tl;dr is it scores images assuming the current up/down ratio trend continues to infinity, so an image of 300up/20down is actually “liked more” than an image of 400up/60down.
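The two images in that example can be compared with a short sketch. The function below uses the standard Wilson lower bound formula with an assumed one-sided 99.5% confidence level. It is an illustration, not the site's actual code.

```python
import math
from statistics import NormalDist

def wilson_lower_bound(upvotes, downvotes, confidence=0.995):
    """Lower end of the Wilson score interval (standard formula, assumed parameters)."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(confidence)
    p_hat = upvotes / n
    center = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (center - margin) / (1 + z * z / n)

for ups, downs in [(300, 20), (400, 60)]:
    net_score = ups - downs
    bound = wilson_lower_bound(ups, downs)
    print(f"{ups} up / {downs} down: net score {net_score}, Wilson lower bound {bound:.3f}")
```

The 400 up / 60 down image has the higher net score (340 against 280), but the 300 up / 20 down image has the higher Wilson lower bound, so it ranks first under this scheme.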
Rexton the III

One artist, two tags
It’s awesome because now when you see a picture that you personally don’t like in the trending window, your downvote actually matters! You can knock a picture down from a couple of places to several pages with a single click and you can rest easy no one else needs to see it either! It’s an awesome damage boost.
 
I’m being sarcastic of course but I don’t see the logic behind it. With the old system it was the same handful of artists that were competing for those spots. Now it’s 9/10 of the same artists with the odd score 200 picture thrown in that no one thought to downvote on. Personally I always found it something to aspire towards, you know, making something that people really like and getting recognition for it, but whatever.
 
The silver lining is that it does give more people a (slight) chance at the spotlight, even if it’s pretty much random now.
The Smiling Pony

( ͠° ͟ʖ ͡° )
@Rexton the III  
The purpose of the “Trending Images” highlight isn’t a “rich get richer” pit, it’s to expose people to a variety of images they might like. So far we’ve found that the current scoring system provides a higher variety of quality content than the previous system and allows more artists a chance to get some exposure.
 
If you have a complaint, I encourage you to make it with some empirical data and less passive-aggressive snark.
Rexton the III

One artist, two tags
@The Smiling Pony  
Sorry. I was trying to point out a problem, not trying to be a dick.
 
I’m not saying the old way was necessarily better or that this is worse. Just that the new way makes the downvote disproportionately powerful. If you imagine a scenario where two score 200 images are about to trend, but then one gets a downvote and doesn’t make it while the other one doubles in score due to the exposure, well, it’s not exactly “fair”.
 
And these things are always a rich-get-richer-pit, but it’s not something that can or even should be avoided. You can’t really tweak popularity out of a popularity contest.
 
Either way, it’s good to see more variety in the trending window.
Background Pony #97D1
So if I get that right, it is measured across all images similar to the one being posted, right?  
What I always wondered was whether an artist's big following will pollute the voting system, since most people who follow someone will most likely upvote their images whether they like them or not, right? (my assumption)  
Are there numbers for this? Did anyone factor stuff like this in? Does it even make sense to do so or would it be just normalized in the sample data anyway?
 
It’s my subjective observation that popularity pollutes image quality to some degree (and of course other factors like current meme train or whatever too).
 
Anyway, super glad that you try to give people a chance to be seen, it is pretty difficult getting some traction for newbies.
xbi

I am skeptical of the Wilson score for image ranking on Derpi.
 
First, even if an all-knowing oracle gave us the exact downvote/upvote ratio with a super-narrow interval, it would not be very useful for sorting. Suppose a boring image has 5 upvotes and 0 downvotes (and the oracle's exact estimate is 0.013456% downvotes). It is not useful to rank it above an interesting image with 1000 upvotes and 10 downvotes (1% downvotes; here the calculated interval and the oracle's estimate are pretty close).
 
Second, comparing the left bounds of two confidence intervals is not very meaningful; it has no strict mathematical meaning. Comparing one interval's right bound with another's left bound can give some confidence that the values differ, but I don’t know of any meaning for a left-with-left or right-with-right comparison.
 
Third, the downvote count is usually very small for good images (single digits, often zero), so the bottom of the confidence interval becomes very sensitive to the number of downvotes. It ends up as a magic formula that harshly punishes an image's ranking for a small number of downvotes.
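The size of that effect can be measured with a short sketch. It uses the standard Wilson lower bound with an assumed one-sided 99.5% confidence level, and the 200-upvote image is a hypothetical example rather than real site data.

```python
import math
from statistics import NormalDist

def wilson_lower_bound(upvotes, downvotes, confidence=0.995):
    """Lower end of the Wilson score interval (standard formula, assumed parameters)."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(confidence)
    p_hat = upvotes / n
    center = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (center - margin) / (1 + z * z / n)

# Hypothetical image with 200 upvotes and a growing number of downvotes.
UPVOTES = 200
bounds = [wilson_lower_bound(UPVOTES, downs) for downs in range(6)]
for downs, bound in enumerate(bounds):
    drop = bounds[0] - bound
    print(f"{UPVOTES} up / {downs} down: lower bound {bound:.4f} (drop {drop:.4f})")
```

Each extra downvote lowers the bound. Whether that drop is enough to move an image several pages depends on how tightly the competing images are clustered, which this sketch does not model.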
 
Maybe an estimate of upvotes per registered view would be more useful. That value is useful for ranking image quality. If we had another oracle that told us the exact chance of an upvote after a view, it would be much more useful than an oracle that tells us the downvote/upvote ratio, because it would let us judge image quality independently of random popularity boosts (like ‘featured art’ or links from powerful Twitter bloggers).
 
Also, no ranking solves the ‘rich-get-richer’ problem. This specific Wilson score does not solve it at all; it just changes the definition of ‘rich’, favoring images with non-controversial topics and artists who are loved, not hated.
 
Also, Derpibooru does not have as much of a problem with artist popularity boosts as Twitter or DeviantArt. This is specific to Derpi: image statistics depend much more on image content than on artist popularity.
🐴

IRL 🎠 stallion
@xbi  
In regards to your final paragraph, small artists aren’t buried as easily because users here follow tags rather than only following artists. A small artist who draws good art on a popular tag will soon get more views.
Rexton the III

One artist, two tags
Wow, that’s an old thread necro. FWIW I still stand behind my criticism.
 
In the earlier system a score 300 image had a chance at trending depending on the competition. Now for a no-name artist getting 5 or so downvotes pretty much guarantees it can’t get there. More popular artists can eat the hit, so they are largely unaffected.
 
TL;DR it makes downvotes disproportionately powerful. The main upside is that around score 200 images can get more exposure, but only if they receive no downvotes.