In the earlier system, an image with a score around 300 had a chance at trending, depending on the competition. Now, for a no-name artist, getting 5 or so downvotes pretty much guarantees it can’t get there. More popular artists can eat the hit, so they are largely unaffected.
TL;DR: it makes downvotes disproportionately powerful. The main upside is that images around score 200 can get more exposure, but only if they receive no downvotes.
@The_Park
Let’s define the problem. We want to know, in an objective sense, a reasonable lower bound on how well people will rate an image on average. We don’t currently have perfect information about how people will rate the image, just a small sample of people who voted on it. For the sake of convenience (and it is a big stretch), we will assume the sample is randomly selected from independent people who wanted to see that type of image. This gives us a statistical basis for making inferences.

An obvious lower bound for how well people will rate an image is that 100% of them will downvote it. But that isn’t a very useful metric, because it assigns the same score to all images, even though people do like some images more than others – it doesn’t provide any ranking information.

To find a more informative lower bound on the proportion of people who will like the image, we need to incorporate some of the information about how people have voted on it so far. Here our random-sample assumption helps us: if you take a bunch of random samples for the same image, with different people voting each time, the sample proportions of upvotes will mostly converge on a small range of probabilities centered around the true value. This is called the central limit theorem.

Because the central limit theorem lets us treat these sample averages as a normal distribution, we can use the math of the normal distribution to help us out. To do this, I will introduce some notation.

p̂ (p-hat, p with a circumflex) is the sample proportion. This is the number of upvotes divided by the total votes.
n is sample size, or how many people voted.
p is the population proportion. This is the value we don’t know (the true rating of the image), and the one we would like to find a lower bound for.
μ is the average of all the sample proportions. If our samples are really random and independent, it should be equal to the population proportion p.
σ (Greek letter sigma) is the population standard deviation. In the picture above, we expect 68% of our samples to fall between one standard deviation below and one standard deviation above the average.

From the picture, we can see that a good lower bound for our true rating is μ – 3σ, because 99.85% of all of our samples will be above that value. However, if we knew what μ was, we wouldn’t need any of this stuff, because we could just use that as the rating! So we have to estimate based on p̂. We will naively guess that p̂ = p, and let the sample size control our standard deviation. This should intuitively make sense: if we choose an enormous, truly random sample, it is much more likely that we are close to the true value than if we only have a small number of people.

The formula for the sample standard error is expressed as
`SE = sqrt( p̂ × (1 − p̂) / n )`
So our formula for the estimate for the value the true proportion will be above 99.85% of the time, the value we were interested in, is
`lower bound = p̂ − 3 × SE = p̂ − 3 × sqrt( p̂ × (1 − p̂) / n )`
This almost works as intended, and it is pretty close to what we actually do, except it annoyingly doesn’t work correctly when the sample size is really small (when an image only has a few votes) or when the probability is really extreme (nearly everyone liked or disliked it). This is what Edwin Bidwell Wilson’s scoring interval – abbreviated to the Wilson score interval – fixes.

The Wilson score interval modifies the formula in ways that greatly improve its stability in these extreme cases. If you want to read more about it, hit up the Wikipedia page here.

Note: Derpibooru is actually less pessimistic than this, and uses a lower value that 99.5% of all possible probabilities are greater than, rather than 99.85%.
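To make the difference concrete, here is a minimal sketch (not Derpibooru’s actual code – the function names are mine) comparing the naive μ – 3σ estimate with the Wilson lower bound. It uses z = 3 for the 99.85% level from the derivation above; the 99.5% level from the note corresponds to z ≈ 2.576.

```python
import math

def naive_lower_bound(upvotes, total, z=3.0):
    """p-hat minus z standard errors (the naive estimate above).

    Misbehaves when the sample is small or p-hat is extreme.
    """
    if total == 0:
        return 0.0
    p_hat = upvotes / total
    se = math.sqrt(p_hat * (1 - p_hat) / total)
    return p_hat - z * se

def wilson_lower_bound(upvotes, total, z=3.0):
    """Lower bound of the Wilson score interval for the same z."""
    if total == 0:
        return 0.0
    p_hat = upvotes / total
    denom = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total
                           + z * z / (4 * total * total))
    return (centre - margin) / denom

# 9 upvotes out of 10 votes:
print(round(naive_lower_bound(9, 10), 3))   # 0.615
print(round(wilson_lower_bound(9, 10), 3))  # 0.43

# 2 upvotes out of 2 votes: the naive bound is a useless 1.0
# (zero standard error), while Wilson stays properly cautious.
print(round(naive_lower_bound(2, 2), 3))    # 1.0
print(round(wilson_lower_bound(2, 2), 3))   # 0.182
```

Notice how the Wilson bound never pins itself to 0 or 1 on a handful of votes, which is exactly the small-sample instability the plain formula suffers from.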
If you do not specify a field to search over, the search engine will search for posts with a body that is similar to the query's word stems. For example, posts containing the words `winged humanization`, `wings`, and `spread wings` would all be found by a search for `wing`, but `sewing` would not be.
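Stem matching of this sort can be sketched with a toy suffix stripper. A real search engine uses a proper stemming algorithm (e.g. Porter), so `toy_stem` and its suffix list below are illustrative assumptions, not the site's actual code:

```python
def toy_stem(word, suffixes=("ing", "ed", "s")):
    """Strip one common suffix, keeping at least a 3-letter stem.

    Only illustrates the idea; a real stemmer handles far more cases.
    """
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# A search for "wing" matches document words with the same stem:
query_stem = toy_stem("wing")  # stays "wing" (stripping "ing" would leave "w")
for doc_word in ("wings", "winged", "sewing"):
    print(doc_word, toy_stem(doc_word) == query_stem)
# wings True, winged True, sewing False ("sewing" stems to "sew")
```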
Field Selector | Type | Description | Example |
---|---|---|---|
author | Literal | Matches the author of this post. Anonymous authors will never match this term. | author:Joey |
body | Full Text | Matches the body of this post. This is the default field. | body:test |
created_at | Date/Time Range | Matches the creation time of this post. | created_at:2015 |
id | Numeric Range | Matches the numeric surrogate key for this post. | id:1000000 |
my | Meta | my:posts matches posts you have posted if you are signed in. | my:posts |
subject | Full Text | Matches the title of the topic. | subject:time wasting thread |
topic_id | Literal | Matches the numeric surrogate key for the topic this post belongs to. | topic_id:7000 |
topic_position | Numeric Range | Matches the offset from the beginning of the topic of this post. Positions begin at 0. | topic_position:0 |
updated_at | Date/Time Range | Matches the creation or last edit time of this post. | updated_at.gte:2 weeks ago |
user_id | Literal | Matches posts with the specified user_id. Anonymous users will never match this term. | user_id:211190 |
forum | Literal | Matches the short name for the forum this post belongs to. | forum:meta |
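As a usage sketch, field selectors from the table can be combined in one query. Assuming the post search supports the same boolean combinators as the site's image search (`AND`, `OR`, `NOT`), queries might look like:

```
forum:meta AND body:test
subject:time wasting thread AND NOT user_id:211190
updated_at.gte:2 weeks ago AND topic_position:0
```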