A python program to locate duplicate files - and do it fast

If you're anything like me, you'll have hundreds of photos, spread over different directories, but you don't know if you've got a file repeated a number of times (e.g. you've copied the contents of an SD card to your hard drive for 'safe keeping'). In my case, this totalled around 300Gb+ of photos and videos etc.

I'd tried other duplicate finding programs, and while these worked, they were a) horrendously slow (e.g. fslint took three days) and b) not that helpful in the output they gave (e.g. fslint just gave a list of duplicate file names - not even the directory where they were). I'd created a duplicate file finding program in GAMBAS a while ago, but I thought it was time to re-code it, this time in Python 3. After some experimentation, the resulting program worked through the 300Gb in around 20 minutes. The initial version of the program is a simple command line python program - I'm intending to put a GUI version here, as well as a compiled windows EXE, in time.

Features:
- It's fast: in tests, it takes around 20 minutes to scan about 300Gb of files (that's around 30,000 files).
- You can specify one or more directories to scan (e.g. -i /my/directory -i /my/second/directory -i /my/third/directory).
- You can 'preserve' one or more directories, see below (e.g. -p /my/directory/savefiles -p /my/second/directory/save).
- The program can save the results of your duplicate finding run in an XML database file - and it can use that data on subsequent runs.
- You can save all the input parameters into a configuration file, so you don't have to keep typing them in :-).
- Matching of files is performed using SHA256, not MD5.
- The program outputs a runnable script that contains the necessary file delete commands, so you can review/amend it before you run it.
- Output is generated while the filesystem is scanned, so there is usable output even if you get a partial run.
- Can produce linux (bash) output or windows (batch file) output. Automatically detects and adapts output for the OS.
- Both GUI and command line versions of the program.
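The features above describe a simple pipeline: walk the scan directories, compute a SHA256 digest per file, group files by digest, and emit a reviewable delete script. This is not fdf_scanner's actual code - just a minimal sketch of that approach, with illustrative function names:

```python
import hashlib
import os
import shlex
from collections import defaultdict

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(roots):
    """Return {digest: [paths]} for every digest shared by 2+ files."""
    by_hash = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    by_hash[sha256_of(path)].append(path)
                except OSError:
                    pass  # unreadable file; skip it
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

def write_delete_script(duplicates, script_path="delete_duplicates.sh"):
    """Keep the first copy of each duplicate set; emit 'rm' commands for the rest."""
    with open(script_path, "w") as out:
        out.write("#!/bin/bash\n")
        for paths in duplicates.values():
            for path in sorted(paths)[1:]:
                out.write(f"rm {shlex.quote(path)}\n")
```

Because the script is written out rather than executed, you get the review/amend step the feature list describes: nothing is deleted until you choose to run the generated file.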
HISTORY

V0.1:
- Added '-w' option to produce output suitable for the windows command prompt.
- Added a ZIP version containing a self-contained windows EXE (built using pyinstaller).
- The database file now only contains entries that have a SHA256 value, rather than all files > 250 bytes. This cuts the database file size quite a bit.
- Automatically detects if linux or windows.
- Running fdf_scanner.py is the command line version.
- Running fdf_scanner_gui.py is the GUI version, using QT5.
- Windows versions are fdf_scanner.exe (command line) and fdf_scanner_gui.exe (GUI version).
- Added an exception handler to catch a crash within the BinaryChopSearch function.
- You can now save the current configuration in the GUI to a configuration file.
- Bugfix (many thanks for the fix from Oliver Kopp).
- Added exception code to deal with invalid timestamps for files under windows.
- Added '-g' switch to force GUI mode, as an alternative to renaming the file (renaming still works).
- Added low memory footprint hash calculation (files > 250Mb).
- Duplicates are located as the directories are scanned, and the output files and database file are written as the scan proceeds (file writes are fast, so there's no major time cost, in comparison to the benefit of having output in case the program crashes part way through processing).
- Improvements: the progress bar now works properly again.
- The progress bar now includes the name and size of the file currently having its hash calculated.
- The output file is (re)generated for every 100 files having a hash calculation.
- The progress bar takes the console size into account.
- Bug fix: crash deriving file names within linux environments.
- Bug fix: renaming the program to 'fdf_scanner_gui' is meant to auto-start it in GUI mode - this was broken, now fixed.
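The '-w' option and the OS auto-detection mentioned in the history amount to choosing which delete-command syntax goes into the generated script. A small sketch of that decision, with illustrative names (not fdf_scanner's actual API):

```python
import platform

def delete_command(path: str, force_windows: bool = False) -> str:
    """Return a delete line in the right syntax for the target OS.

    force_windows stands in for a '-w' style override; otherwise the
    platform is auto-detected, as the changelog describes.
    """
    if force_windows or platform.system() == "Windows":
        return f'del "{path}"'  # windows batch file syntax
    return f"rm '{path}'"       # linux bash syntax
```

Keeping the OS choice in one function like this means the rest of the scanner can stay platform-neutral and only the script writer needs to care about the target shell.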