
Ande

Sorry for not replying right away, but I had to fool around with the program for a bit to see if I could come up with any additional input. Right now the only major thing that needs addressing is that the program sometimes gets hung up on a specific file. I’ve seen a file sit on the list for a good half hour or more before I finally had enough, closed the program, and restarted it. So far that appears to be the only way to fix that kind of problem.
 
Also, the program doesn’t download at full speed, so to work around it I run multiple instances of the same program, each downloading different tags. It’s a bit annoying, but after modifying the output folder option, at least the multiple instances share the same pool. Also, as far as readability goes, I tried copying and pasting the settings from the download site, and after running the program, it converted everything back into its standard squashed-together format. Not the worst thing in the world, but it’s still a likely source of trouble, especially if you accidentally delete one of those invisible characters that’s supposed to represent a new line.
 
I did try to bunch a lot of tags together to narrow the search results, and the good news is that it works: the tags just all have to be on the same line, separated by commas, the same as if you were searching for something on the site. I didn’t see this mentioned on the download site or in the .txt file, so mentioning it may be useful. Speaking of which, have you ever heard of the Pixiv batch downloader? Instead of editing a .txt file, it lets you enter the search term in the program itself, which makes things a little easier. For user friendliness, this may be a good thing to implement.
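As a rough illustration of why the comma-separated line works, here is a minimal Python 3 sketch that turns a list of tags into one search query, joined with commas the way the site's search box expects. The `build_search_url` name and the `search.json` endpoint are assumptions for illustration, not derpibooru-dl's actual code.

```python
from urllib.parse import urlencode


def build_search_url(tags, base="https://derpibooru.org/search.json"):
    """Hypothetical helper: join tags with commas into a single q= value.

    The base URL is an assumed endpoint, used here only for illustration."""
    query = ", ".join(tag.strip() for tag in tags)
    # urlencode percent-escapes ':' and ',' and turns spaces into '+'.
    return base + "?" + urlencode({"q": query})


print(build_search_url(["artist:dimwitdog", "safe"]))
```

All the tags end up in one `q` parameter, so the site applies them together as a single narrowed search.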
 
Also, could you please make it so that, instead of closing when it’s done, the program goes back to a main menu that lets the user choose what to do next? That way, instead of reopening the file, it’s easier to supply additional search terms when there are a lot of tags one wants to download (for example, artists).
 
I’m not saying that you have to make your program exactly like the Pixiv batch downloader, but there are certain features of that program that are definitely worth copying into yours. I also believe the programmer of that nifty little program has the source code available if you need to cheat a little, so feel free to take a look at it. Here’s a link to the most current release:
 
https://nandaka.wordpress.com/2014/10/06/pixiv-downloader-20141006/
 
One more program that may or may not be worth looking at is Imgbrd-Grabber. It uses the booru-on-rails system to support Derpibooru, but unfortunately that has never worked, at least not any time I’ve used the program. I’m not sure what use it will be to you, but I decided to mention it just in case.
 
https://code.google.com/p/imgbrd-grabber
 
Anyway, that’s all I could think of to mention right now, and I hope some of it will be of use in future development of this program. So far it’s the only one that shows serious promise of working despite its current flaws; I just hope the stuff I’m mentioning isn’t outstripping your skill set. I’ve actually taken a look at the pony downloader, and to date I can’t even figure out how to get it to work! Programming is just not a strength of mine, despite how good I may be at fixing computers and getting them to run again.
 
https://github.com/NHOrus/ponydownloader
misspelledletter
Emerald -
The End wasn't The End - Found a new home after the great exodus of 2012

@Ande  
I’ve updated derpibooru-dl to include a text-based menu.  
What text editors have you tried using? I use Notepad++, and I know Microsoft Notepad does not work with certain files due to the formatting (it’s probably to do with how new lines are handled).
 
I don’t know how to fix the problem where the script sometimes just stops working when requesting a remote resource, and if anyone has an idea on how to fix it I’d love to hear it.  
As far as I know, it always happens while it’s doing a web request of some sort.
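One common cause of a request that hangs forever is a network call made without a timeout. Below is a minimal Python 3 sketch, assuming the hang really is a stalled socket (not confirmed in this thread): each fetch gets a timeout and a few retries. `fetch_with_retry` is a hypothetical name, and the `opener` parameter exists only so the function can be tested without a network. In Python 2.7, `urllib2.urlopen` also accepts a `timeout` argument.

```python
import socket
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=3, timeout=30, delay=5,
                     opener=urllib.request.urlopen):
    """Hypothetical helper: fetch url, retrying on timeouts and URL errors.

    `timeout` applies to each blocking socket operation (connect, read),
    so a stalled connection raises instead of blocking forever."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except (socket.timeout, urllib.error.URLError) as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay)  # back off briefly before trying again
    raise last_error  # every attempt failed; let the caller log it
```

Re-raising after the last attempt lets the caller log the failure and move on to the next submission, instead of sitting on one file indefinitely.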
 
Yeah I know about pixivutil, I learned how to do settings files from it.
 
About speeding it up:  
The script is designed to be left running in the background, possibly for days or weeks at a time for big jobs.  
It only opens a single connection at a time, and changing this would be pretty difficult.  
I’ve talked to the admins and due to their requests and the difficulty of actually doing it, I have no plans to write code to request more than one URL at a time (Multithreading is hard).  
I don’t plan to do anything to disable the ability to run multiple instances of the script that do different things, but don’t plan to give consideration to it during development either (Either would mean lots more work).
 
Your old settings will be overwritten if you run the new version, so make sure to back them up if you want to copy them over.  
(I changed the layout of the settings file, but each setting should still do the same thing.)
notusingthisanymoreImdonebye

Little question:  
If someone wanted to download most of the images on Derpibooru (for example, everything except memes and screencaps), how much disk space would be needed? I know it would be a lot and would take lots of time, but it would be kind of interesting to know how much space all these images take up.
misspelledletter

@sammy0205  
Here are my output folder stats: around 440 GB.  
I’ve saved pretty much everything on the site.  
The space required will probably be slightly less for normal users, as comment saving is disabled by default.  
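If you want to check the same numbers for your own output folder, a short Python 3 sketch like the following walks the folder and adds up file sizes. `folder_stats` is a hypothetical helper, and `"download"` is only a placeholder path.

```python
import os


def folder_stats(root):
    """Return (file_count, total_bytes) for every file under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
            file_count += 1
    return file_count, total_bytes


if __name__ == "__main__":
    # "download" is a placeholder; point this at your own output folder.
    count, size = folder_stats("download")
    print(f"{count} files, {size / 1024**3:.1f} GiB")
```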
Ande

How do you get the tags to appear in the filenames? I switched to the long filename format, but that didn’t appear to work. Basically, I would at least like to have the artist and rating tags so I can sort through the images better. Also, what is the deleted_submissions.txt file for? A blacklist?
misspelledletter

@Ande  
The long filename option is unsupported, and may not even be implemented.  
At some point I may get around to figuring out how to make it work.  
I use Windows, which means the short filenames are much more reliable.  
Windows normally does not let you have file paths longer than 260 characters (the MAX_PATH limit), e.g. (“C:\Documents and settings\Your Name\My Documents\Derpibooru_dl\Downloads\A_whole bunch_of tags.jpg”)  
This means there could be issues with names being too long, not only during download but also if you tried to move the files.  
Some of the code that detects previously downloaded stuff also relies on the current filename system and would need to be rewritten.
 
deleted_submissions.txt is a record of submissions the script has seen that were deleted, this is to allow them to be skipped if the ID is seen again.
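A skip list like this can be sketched in a few lines of Python 3. The sketch assumes the file holds one submission ID per line (the real format isn't described here), and all function names are hypothetical. Loading the IDs into a set makes each "have I seen this deleted ID?" check a constant-time lookup, with no web request needed.

```python
import os


def load_deleted_ids(path="deleted_submissions.txt"):
    """Read one submission ID per line into a set (assumed file format)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def record_deleted_id(submission_id, known, path="deleted_submissions.txt"):
    """Append a newly seen deleted ID so later runs can skip it."""
    submission_id = str(submission_id)
    if submission_id not in known:
        known.add(submission_id)
        with open(path, "a", encoding="utf-8") as f:
            f.write(submission_id + "\n")


def ids_to_fetch(candidate_ids, known):
    """Drop IDs already recorded as deleted."""
    return [i for i in candidate_ids if str(i) not in known]
```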
 
By default, the script should save to a folder named after the tag/query, located at <The folder derpibooru_dl is in>\download\<Tag>\
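Tag names such as `artist:dimwitdog` contain characters that Windows does not allow in file or folder names (`:` among them), so the per-tag folder name needs cleaning before use. Here is a minimal Python 3 sketch, assuming the `<base>/download/<tag>` layout described above. `tag_folder` and the underscore replacement are illustrative choices, not the script's actual behaviour. Building the path with `os.path.join` lets the same code work on Windows and Unix.

```python
import os
import re

# Characters Windows forbids in file and folder names.
FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def tag_folder(base_dir, query):
    """Hypothetical helper: build <base_dir>/download/<query> safely.

    Forbidden characters become '_', and trailing dots/spaces (which
    Windows also rejects) are stripped. Reserved device names such as
    CON are not handled here."""
    safe_name = FORBIDDEN.sub("_", query).strip(" .")
    return os.path.join(base_dir, "download", safe_name)
```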
 
 
I’ve never made a video tutorial before, is there any software you recommend for recording it?
 
I might try making something using screencaps in a little while?
 
It’s a bit late so I might not get around to doing this for another day or so, hopefully I’ll be able to have a go at addressing these issues tonight though.
PrinceCadance
Not a Llama - Happy April Fools Day!

@misspelledletter  
Hm. Now I’ve got it downloading something. I’m downloading the tag artist:dimwitdog, and it thinks there are only 80 images, but there are about 450. So far I’ve noticed it’s not downloading anything tagged explicit. Does it only download safe and questionable?  
It also thinks my API key is invalid.
misspelledletter

It’s probably the API key causing this; if the key is correct, the options on your account might be hiding explicit images.  
I’m writing some improvements to the key checking code and should be releasing a new version later today when I get back home.  
Also I’ll probably try to do a guide.
JP
Pixel Perfection - I still call her Lightning Bolt
Silly Pony - Celebrated the 13th anniversary of MLP:FIM, and 40 years of MLP!
Shimmering Smile - Celebrated the 10th anniversary of Equestria Girls!
Solar Guardian - Refused to surrender in the face of the Lunar rebellion and showed utmost loyalty to the Solar Empire (April Fools 2023).
Roseluck - Had their OC in the 2023 Derpibooru Collab.
King Sombra - Celebrated the 10th anniversary of The Crystal Empire!
A Lovely Nightmare Night - Celebrated the 12th anniversary of MLP:FIM!
Princess of Love - Extra special version for those who participated in the Canterlot Wedding 10th anniversary event by contributing art.
Elements of Harmony - Had an OC in the 2022 Community Collab
Non-Fungible Trixie -

I miss the show so much
I successfully used derpibooru-dl (derpibooru_dl-2014-10-28.zip) to mass-download some images earlier. I tested it under Windows 7 with Python 2.7.8, directly from the command line, with no GUI. The program works well and even resumes downloading, but here are some ideas for future improvements.
 
  • The Windows and Unix versions of the program could be combined very easily. Most of the differences are simply in how you build pathnames, but you don’t have to do it differently: just always use the os.path functions (such as os.path.join), or simply use the / separator everywhere, which works under Windows just fine.
     
  • If there are genuine platform differences that you must handle, put them in their separate files (like windows.py and unix.py), then detect the platform and import the correct file (import <platform> as foo, then call foo.bar(...)).
     
  • In addition, the Windows and Unix versions seem to do slightly different things. The Unix version does not have a menu, uses different names for different configuration items (the whole config_handler class), and even has bugs (Unix filesystems DO have maximum filename lengths, you need to handle output_long_filenames on the Unix side too). It seems to me that the Windows version is more recent?
     
  • The program could be split into several smaller programs that each do one thing. One downloads by search query, one downloads by range, and so on. The common parts of them should be put in a file that gets imported in each program. Currently some of the tools (sort_dl_list, find_duplicates, etc.) already import derpibooru_dl, so this shouldn’t be a terribly hard thing to do. Plus, small programs would also adhere to the Unix philosophy better :-)
     
  • There should be a database of sorts that stores past downloads and allows you to rerun those queries. The program should keep track of multiple unfinished download jobs and offer me a chance to continue or abort any of them. Instead of relying on separate files to store the download status, you could consider using a real database, like SQLite, to keep track of everything. This would make the program more robust: no million JSON files needed. With SQLite, one could also use any of the several SQLite admin tools to manipulate the database if needed.
     
  • The “press Enter to close window” is useless when running from the command line.
     
  • The filename length limit is a real problem on all systems. It’s impossible to cram everything in there, no matter what. One idea is to prioritize the tags, for example always including the artist name plus the rating, and then adding as many other tags as will fit.
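To make the SQLite suggestion concrete, here is a minimal sketch using Python's built-in `sqlite3` module: one row per download job, recording how far it got and whether it finished, so unfinished jobs can be listed and resumed. The table layout and function names are assumptions for illustration. A real version would probably also store per-image state.

```python
import sqlite3


def open_jobs_db(path=":memory:"):
    """Create (if needed) a table of download jobs and their progress."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               query     TEXT PRIMARY KEY,
               last_page INTEGER NOT NULL DEFAULT 0,
               finished  INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return db


def save_progress(db, query, page, finished=False):
    """Record how far a query got, inserting the job on first use."""
    with db:  # the connection context manager commits on success
        db.execute(
            "INSERT OR REPLACE INTO jobs (query, last_page, finished) "
            "VALUES (?, ?, ?)",
            (query, page, int(finished)),
        )


def unfinished_jobs(db):
    """List (query, last_page) for every job that can be resumed."""
    return db.execute(
        "SELECT query, last_page FROM jobs WHERE finished = 0 ORDER BY query"
    ).fetchall()
```

Because the database is a single file, any SQLite browser can inspect or edit the job list, which is the robustness argument made above.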
Background Pony #88BC
IIRC, JPEG has a metadata field for keywords, so Derpibooru tags could potentially be stored there instead of in the filename. Not sure about PNG, though.
misspelledletter

I’m looking at the cross-platform stuff now, and am trying to figure out how to handle filepath limitations.  
I’m removing the *nix file; the Windows-targeted one should be cross-platform.
 
SQL sounds like a lot of work to implement for little gain, and I intend to put it off as long as I can, if I do it at all.  
SQL would (AFAIK) need every field handled at download time, while with the current method of saving raw JSON, I only need to care about the fields in the future if I want to process the data.  
The JSON method also has the nice advantage of being forward compatible.
 
I’ll have to think about splitting the script into separate modules, as I’m considering adding support for command line arguments.  
I suppose I could do a quick, hacky implementation by writing tiny scripts that each call just one download function, or scripts that edit the settings file to set download options before running in batch mode.
 
Reformatting filenames is something I’ll have to think about to make sure nothing gets broken, but forcing “artist:foo” and “rating:foo” tags to go right after the ID in a filename might be doable.
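Here is one way the "ID first, then artist and rating, then as many other tags as fit" idea could look, as a Python 3 sketch. The 100-character budget, the rating list, and replacing unsafe characters with `_` are all assumptions for illustration, and `build_filename` is a hypothetical name. Tags that would push the name over the limit are skipped, so a shorter tag later in the list can still fit. Note that a very long artist tag could also be skipped.

```python
import re

RATING_TAGS = {"safe", "suggestive", "questionable", "explicit",
               "grimdark", "grotesque"}
# Characters Windows forbids in names, plus whitespace, become underscores.
UNSAFE = re.compile(r'[<>:"/\\|?*\s]')


def build_filename(image_id, tags, extension, max_length=100):
    """Hypothetical helper: ID, then artist: tags, then ratings, then others,
    adding each tag only if the whole name stays within max_length."""
    artists = [t for t in tags if t.startswith("artist:")]
    ratings = [t for t in tags if t in RATING_TAGS]
    others = [t for t in tags
              if t not in RATING_TAGS and not t.startswith("artist:")]

    name = str(image_id)
    suffix = "." + extension
    for tag in artists + ratings + others:
        piece = "_" + UNSAFE.sub("_", tag)
        if len(name) + len(piece) + len(suffix) > max_length:
            continue  # skip tags that no longer fit; shorter ones might
        name += piece
    return name + suffix
```

For example, `build_filename(123, ["solo", "safe", "artist:dimwitdog"], "png", max_length=30)` keeps the artist and rating but drops `solo`.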
 
I believe there is an option in the settings file to disable holding the window open.
 
I have no desire to muck around with changing the internal bits of image files just to put the tags in an internal field. It’s just far too much work for little to no gain. If you really want it, you are welcome to fork the project. (Others have already done just that to add/fix things.)
 
TL;DR:  
Priorities are currently  
0 Keeping it working  
1 Crossplatform filepaths  
2 CLI arguments & code to do with CLI usability  
3 Possible splitting into separate modules  
4 ???  
10 SQL  
9999 Changing image file internal metadata
 
 
I’ve got a lot going on IRL at the moment, so any big features will probably have to wait for a few weeks.
Taivastiuku
The End wasn't The End - Found a new home after the great exodus of 2012

@misspelledletter  
You could store the data in a dictionary while you load it from the API and dump it as a single JSON file when you’re done. Derpibooru data for all the images with a score of more than 10 fits in ~2 GB of memory. This would also help with deduplication of fetched data.
 
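import json
import requests

# DERPIBOORU_API and QUERY_PARAMETERS are placeholders for the endpoint and query
images = {}
response = requests.get(DERPIBOORU_API, params=QUERY_PARAMETERS)
images_data = json.loads(response.content)
for image in images_data:
    images[image["id_number"]] = image  # a repeated ID simply overwrites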
 
If you get the same image twice for some reason it is just overwritten in the dictionary. When you’re done just save the file.
 
with open("derpibooru_data.json", "w") as outfile:
    outfile.write(json.dumps(images))
 
If you have little memory, or download e.g. the whole Derpibooru dataset, you might want to use a database. There are, however, packages like Shove that implement a dictionary-like interface for sqlite3 and other file formats.
 
As for tags I think you should prioritize:  
  1. rating (safe, questionable, suggestive, explicit, grotesque, grimdark)  
  2. artist  
  3. ponies pictured
     
    Here is a set of ponies I’ve compiled from an image tagged with ‘everypony’:  
    ponies = set("ambrosia,angel bunny,apple bloom,apple strudel,applejack,aunt orange,babs seed,berry punch,big macintosh,blinkie pie,bon bon,braeburn,breezie,bulk biceps,button mash,care package,carrot cake,cerberus,cheerilee,cheese sandwich,chief thunderhooves,cindy block,cloudchaser,cloudy quartz,clyde pie,coco pommel,cookie crumbles,cup cake,daisy,daisy jo,daring do,derpy hooves,diamond tiara,discord,dj pon-3,doctor caballeron,doctor whooves,donut joe,dumbbell,featherweight,filthy rich,flam,flash sentry,fleetfoot,fleur-de-lis,flim,flitter,flower wishes,fluttershy,gilda,gizmo,granny smith,gummy,gustave le grande,harry,hoity toity,hondo flanks,hoops,horse md,hugh jelly,hydra,igneous rock,inkie pie,irma,iron will,king sombra,lightning dust,lily,lily valley,limestone pie,little strongheart,lucy packard,lutece twins,lyra,mane goodall,marble pie,maud pie,mayor mare,morton saltworthy,mr breezy,mr. waddle,mr. zippy,ms. harshwhinny,mulia mild,night light,nurse redheart,nurse sweetheart,octavia,opalescence,owlowiscious,pearl,photo finish,pie,pinkie pie,pipsqueak,poindexter,post haste,pound cake,prim hemline,prince blueblood,princess cadance,princess celestia,princess luna,princess twilight,pumpkin cake,queen chrysalis,rainbow dash,rarity,robert lutece,rosalind lutece,roseluck,rover,royal guard,rumble,sapphire shores,scootaloo,screw loose,seabreeze,sheriff silverstar,shining armor,silver shill,silver spoon,snails,snips,snowflake,soarin',spike,spitfire,spot,stormwalker,sue pie,sunset shimmer,suri polomare,sweetie belle,sweetie drops,sweetiecorn,tank,trenderhoof,trixie,truffle shuffle,twilight sparkle,twilight velvet,twist,vinyl scratch,wild fire,wings,winona,zecora,zippoorwhill".split(","))
misspelledletter

@Taivastiuku  
Thanks for the tips, I think I’ll probably look over the stuff you suggested in the next few days.
 
Also any tag that starts with “OC:” can probably be treated as a character tag.  
I think it’d be best to store rules for tags somewhere other than the main code, as they need to be easy for users to change.  
Maybe a .txt file for each rule, with each line holding a tag that the rule is applied for?
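A per-rule .txt file could be as simple as one tag per line. Below is a minimal Python 3 sketch of loading such a list and treating any `oc:` tag as a character tag. The `#` comment convention, lowercasing, and function names are assumptions, not an existing feature of the script.

```python
import os


def load_tag_list(path):
    """Read one tag per line; blank lines and '#' comments are ignored."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f
                if line.strip() and not line.lstrip().startswith("#")}


def is_character_tag(tag, character_tags):
    """A tag counts as a character if it is listed or is an 'oc:' tag."""
    tag = tag.lower()
    return tag in character_tags or tag.startswith("oc:")
```

Keeping the lists in plain text files means users can edit them without touching the code, which is the point raised above.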
 
I think rating and artist tags can be handled without a list, but it might be a good idea to have an extra category of tags that get prioritised in filename restructuring.
 
Character_tags.txt, user_tags.txt, etc.
 
Perhaps assigning a numerical value to each ruleset that can be changed in the settings?
 
i.e.  
[filename generation]  
rating_priority = 1  
artist_priority = 2  
character_priority = 3  
user_list_priority = 4  
unspecified_priority = 5
 
The catch is that users will probably set more than one category to the same integer, and handling that would be a hassle to code.
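One way around the "two categories share a number" problem is to sort by a tuple of (priority, tag name), so that ties are broken alphabetically instead of being an error. Here is a Python 3 sketch using `configparser` with the settings layout shown above. The fallback to `unspecified_priority` and the function names are assumptions for illustration.

```python
import configparser

SETTINGS = """
[filename generation]
rating_priority = 1
artist_priority = 2
character_priority = 3
user_list_priority = 4
unspecified_priority = 5
"""


def tag_priority(category, config):
    """Look up '<category>_priority', falling back to unspecified_priority."""
    section = config["filename generation"]
    default = section.getint("unspecified_priority", fallback=99)
    return section.getint(f"{category}_priority", fallback=default)


def order_tags(categorised_tags, config):
    """Sort (tag, category) pairs by priority, then alphabetically by tag,
    so duplicate priority numbers are harmless."""
    ranked = sorted(categorised_tags,
                    key=lambda pair: (tag_priority(pair[1], config), pair[0]))
    return [tag for tag, _category in ranked]


config = configparser.ConfigParser()
config.read_string(SETTINGS)
print(order_tags([("solo", "unspecified"), ("safe", "rating"),
                  ("artist:foo", "artist")], config))
```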
 
I can’t do dev work from this laptop, I need my nice big screens for reading code and the docs and my IDE to test things to code productively.  
I can barely do forum replies on this thing.
 
There is absolutely no way that I am loading 300+ MB of data into RAM without a good reason.  
I understand that storing the .JSON files as they are has drawbacks, but it does not use extra RAM like your suggestion would. The big advantage of storing data the way I am is low RAM and disk IO overhead. Maybe it would work better to store metadata in a separate location so only one copy is needed, but I’m not breaking backwards compatibility for downloads without a good reason, and changing the metadata handling code could easily do that.  
Maybe storing a few hundred/thousand submissions’ worth of data together would work, but that might be troublesome if you aren’t using the range download mode.  
I’m happy to discuss the pros and cons of each method of storing metadata.  
Even if you’ve got 16 GB of RAM, you don’t want a batch downloader using 10% of it, not to mention the need to load and save the data to/from disk.  
I hope to look at that database library tomorrow.
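For comparison, a middle ground between one JSON file per submission and one giant in-memory dictionary is an append-only "JSON Lines" file: one raw JSON object per line. This is a swapped-in technique, not what the script currently does. Each download appends a line without loading anything else, and reading streams one record at a time, so memory use stays flat and the raw JSON stays forward compatible. The function names are hypothetical.

```python
import json


def append_metadata(path, image):
    """Append one submission's raw JSON as a single line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(image, sort_keys=True) + "\n")


def iter_metadata(path):
    """Yield submissions one at a time without loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```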
 
Also I really need to write some unit tests, so far I’ve done bugger all testing beyond having a look at the logs if something crashes.
 
I don’t promise to do any of this, but ideas and suggestions are welcome.
misspelledletter

@PrinceCadance  
Sorry to hear you’re having problems.  
I plan to write more API key debug code when my main comp is back online.  
(Showing the first and last few letters of the key while keeping the middle secret, for example)  
Until then, I’ve got a few questions that might help fix this.
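Showing only the ends of a key in log messages could look like this minimal Python 3 sketch. `mask_key` and the choice of three visible characters at each end are illustrative assumptions. Short keys are masked completely so nothing useful leaks.

```python
def mask_key(key, visible=3):
    """Show only the first and last `visible` characters of a secret key."""
    if len(key) <= visible * 2:
        return "*" * len(key)
    hidden = len(key) - visible * 2
    return key[:visible] + "*" * hidden + key[-visible:]


print(mask_key("abcdefghij"))  # abc****hij
```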
 
Could you tell me what program you’re using to edit the settings file?
 
Are the first and last characters of the key in the settings file the same as on the site? You may have accidentally missed one at the end.  
According to what the site admins have said, keys are supposed to be a fixed length. If the checks say yours is 19 characters long, this is a likely cause.
 
Does the key have a single space between it and the equals sign, like “api_key = KEYKEYKEY”?  
(I don’t know if it matters, but it might. I can’t check if it does right now)
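If the settings file is read with Python's `configparser` (an assumption; the section name `[login]` and the 20-character length below are also hypothetical), then the space around the equals sign does not matter, because `configparser` strips whitespace from keys and values. A UTF-8 byte order mark, which Windows Notepad may add when saving as UTF-8, does matter: it makes the first section header unreadable, so this sketch strips it first.

```python
import configparser

EXPECTED_KEY_LENGTH = 20  # hypothetical; check the real length on the site


def read_api_key(text):
    """Parse settings text and return the api_key value.

    "api_key = KEY" and "api_key=KEY" give the same result, and a leading
    UTF-8 byte order mark is removed before parsing."""
    config = configparser.ConfigParser()
    config.read_string(text.lstrip("\ufeff"))
    return config["login"]["api_key"].strip()


def check_key(key):
    """Report a length mismatch, e.g. a character lost when copying."""
    if len(key) != EXPECTED_KEY_LENGTH:
        return f"key is {len(key)} characters, expected {EXPECTED_KEY_LENGTH}"
    return "ok"
```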
PrinceCadance

Notepad. And no, I didn’t miss anything. I can hand-type it if that helps. I can PM you the API key, unless it’s supposed to be secret.
misspelledletter

API Keys are definitely supposed to be secret, you should treat them like your password.  
You can’t actually change an API key yourself, an admin needs to do it and I’m not sure how hard it is for them to do.  
That’s why I’m only willing to include small parts of a key in log messages.
dkarm

I’ve just finished updating my script to be easier to use, please try it and tell me if there is anything you find difficult.
(Except the text only stuff, I don’t have the time to learn to make fancy graphical interfaces)
The problem with downloading the .exe version should be fixed.
 
Thanks very much! You are great! It’s very useful and really works!


This topic has been locked to new posts from non-moderators.


Lock reason: Locked by request of original poster.