Viewing last 25 versions of post by byte[] in topic Feature suggestions and discussion [READ THE FIRST POST]

byte[]
Solar Supporter - Fought against the New Lunar Republic rebellion on the side of the Solar Deity (April Fools 2023).
Non-Fungible Trixie -
Verified Pegasus - Show us your gorgeous wings!
Preenhub - We all know what you were up to this evening~
An Artist Who Rocks - 100+ images under their artist tag
Artist -

Philomena Contributor
"@derpy727":/meta/feature-suggestions-and-discussion/post/2932808#post_2932808

[bq]There's literally no other way to find the original derpibooru page by a given image downloaded from Derpibooru except by heavyweight reverse image search (which doesn’t have a JSON API even).[/bq]Technically, it actually does have a JSON API; POST with the image form-encoded in the image param, and set Accept to @application/json@, or append .json to the URL, or you can use it as a bookmarklet with the GET query param @scraper_url@. However, based on what you wrote, this probably wouldn't be very helpful to you.
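
For anyone who wants to try the first option (POST with the file in the image param and Accept set to @application/json@), here is a rough Python sketch. The endpoint URL is not given in this post, so it is passed in as a placeholder, and the function name and the use of multipart encoding are my assumptions. It only builds the request and does not send it.

```python
import urllib.request
import uuid


def build_reverse_search_request(endpoint: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST carrying `data` in a form field named `image`.

    `endpoint` is a caller-supplied placeholder; the post does not name the URL.
    Multipart encoding is an assumption about what "form-encoded" means here.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        # Ask for the JSON representation instead of the HTML page.
        "Accept": "application/json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")
```

Passing the returned object to @urllib.request.urlopen@ would perform the upload; check the field name and URL against the live site before relying on this.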

[bq]I'm not even sure what is the use-case for hashes of original submissions alone. [...] Maybe you're using it to detect duplicate submissions… no, scratch that.[/bq]Actually, that's correct. SHA-512 prevents exact duplicate copies of images from being uploaded. This catches about 50% of our duplicate uploads; the rest are caught with a heavier-weight perceptual deduplication.
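
To illustrate the exact-duplicate check, a minimal Python sketch follows. It is not the site's actual code; the class name and the in-memory dictionary are hypothetical stand-ins for whatever storage is really used. It only shows the idea: hash the uploaded bytes with SHA-512 and reject the upload if that digest has been seen before.

```python
import hashlib
from typing import Dict, Optional


class ExactDuplicateIndex:
    """Hypothetical in-memory index mapping SHA-512 digests to the first upload seen."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def add(self, name: str, data: bytes) -> Optional[str]:
        """Register an upload.

        Returns the name of the earlier byte-identical upload if there is one,
        otherwise records this upload and returns None.
        """
        digest = hashlib.sha512(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return existing
        self._seen[digest] = name
        return None
```

A check like this only catches byte-for-byte copies; a re-encoded or resized copy gets a different digest, which is why a perceptual pass is still needed for the rest.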

[bq]But if you did, SHA512 of post-optimisation images would be even BETTER for detecting duplicates, because [...][/bq]This is a faulty assertion. Optimized image data do not have a single "normal form":https://en.wikipedia.org/wiki/Normal_form_(mathematics) that they will naturally and obviously be reduced to for every set of input pixels, and the output data and resultant hash are likely to be different for runs on differently-encoded inputs.
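
A small Python sketch of why there is no normal form. Here zlib stands in for an image optimizer (an assumption for illustration only, not the optimizer the site uses), and the byte pattern stands in for pixel data. The same input is encoded two ways; both decode to identical bytes, yet the encoded files and their SHA-512 digests differ.

```python
import hashlib
import zlib

# Stand-in for raw pixel data; any byte string would do.
pixels = bytes(range(256)) * 64

# Two valid encodings of the same data at different compression levels.
fast = zlib.compress(pixels, 1)
small = zlib.compress(pixels, 9)

# Both encodings decode back to exactly the same "pixels"...
same_pixels = zlib.decompress(fast) == zlib.decompress(small) == pixels

# ...but the encoded bytes differ, so a hash over the file differs too.
same_hash = hashlib.sha512(fast).digest() == hashlib.sha512(small).digest()

print("same pixels:", same_pixels)
print("same hash:", same_hash)
```

Two encoders, or one encoder run with different settings or on differently-encoded inputs, can each produce a correct file without producing the same file.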

[bq]And since I'm not quite confident in Derpibooru's reverse search algorithm (which is it, btw?)[/bq]Homegrown. See "here":https://gist.github.com/liamwhite/b023cdba4738e911293a8c610b98f987.




 
**ADDENDUM**: Cloudflare also messes with files downloaded from the site. Don't expect anything except large (>2MB) PNGs to have hashes matching those from our own collection.

As far as I can tell, nothing I can personally do on my side would be helpful for you.
No reason given
Edited by byte[]
byte[]

Philomena Contributor
"@derpy727":/meta/feature-suggestions-and-discussion/post/2932808#post_2932808

[bq]There's literally no other way to find the original derpibooru page by a given image downloaded from Derpibooru except by heavyweight reverse image search (which doesn’t have a JSON API even).[/bq]Technically, it actually does have a JSON API; POST with the image form-encoded in the image param, and set Accept to @application/json@, or append .json to the URL, or you can use it as a bookmarklet with the GET query param @scraper_url@. However, based on what you wrote, this probably wouldn't be very helpful to you.

[bq]I'm not even sure what is the use-case for hashes of original submissions alone. [...] Maybe you're using it to detect duplicate submissions… no, scratch that.[/bq]Actually, that's correct. SHA-512 prevents exact duplicate copies of images from being uploaded. This catches about 50% of our duplicate uploads; the rest are caught with a heavier-weight perceptual deduplication.

[bq]But if you did, SHA512 of post-optimisation images would be even BETTER for detecting duplicates, because [...][/bq]This is a faulty assertion. Optimized image data do not have a single "normal form":https://en.wikipedia.org/wiki/Normal_form_(mathematics) that they will naturally and obviously be reduced to for every set of input pixels, and the output data and resultant hash are likely to be different for runs on differently-encoded inputs.

[bq]And since I'm not quite confident in Derpibooru's reverse search algorithm (which is it, btw?)[/bq]Homegrown. See "here":https://gist.github.com/liamwhite/b023cdba4738e911293a8c610b98f987.



As far as I can tell, nothing I can personally do on my side would be helpful for you.
No reason given
Edited by byte[]