Tag files with their sha-256 checksums
shatag [-fhlLpqrtuv0] [-d DATABASE] [-n NAME] [-R NAME]... [FILES]...
shatag is a tool for computing and caching SHA-256 file checksums, and for efficiently searching for identical files across systems. Checksums are stored using the POSIX Extended Attributes filesystem facility, and are preserved when files are moved or renamed. Checksums can be fetched from a remote host and stored in an SQLite database for fast lookups.
When invoked with no options, shatag just displays the cached, valid checksums. If no files are specified, it applies to all non-hidden files in the current directory. The output format is identical to that of the sha256sum command.
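As a rough illustration of the output format described above, the following Python sketch computes a SHA-256 digest in chunks and formats it the way sha256sum does in text mode (digest, two spaces, file name). It is not shatag's own code: the helper names sha256_of and format_record are hypothetical, and the extended-attribute cache is left out entirely.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def format_record(checksum, name):
    """Format one record like sha256sum's text mode: digest, two spaces, name."""
    return f"{checksum}  {name}"


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(format_record(sha256_of(name), name))
```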
Instead of outputting one record per line (as sha256sum does), separate records with null characters.
Set the path of the SQLite database to query when using -l, -L or -p. The default path is $HOME/.shatagdb, which can be overridden in the config file.
Instead of a file name, a PostgreSQL database can be specified with a prefix of "pg:" followed by a psycopg2 DSN string, like:
"pg:dbname=shatag user=myuser password=mypassword host=192.168.1.3"
When running with -t or -u, recompute the checksum even if the file modification time has not changed. If the old checksum differs, report the file as corrupted.
Display the help message.
Instead of displaying the checksums, look them up against the local database and indicate whether the file exists elsewhere. A yellow - mark indicates that the file does not exist anywhere else, a green = that the file exists at one or several remote locations, a red + that the file has a duplicate on the local system, and a magenta * that the file is empty.
Instead of displaying the checksums, look them up against the local database. Print all the known remote locations for identical files.
Name of the local storage (defaults to the canonical local host name). This needs to be correct if the local database contains entries for this host itself.
Record found tags in the database, for duplicate detection.
Do not display the valid checksums when they are found.
Recurse through subdirectories.
When using -l or -L, restrict the set of remote names to consider. If present, other storages will be ignored.
Compute new checksums for files that don't have one, or when it is outdated.
Recompute the outdated checksums only. Be aware that this can behave counter-intuitively: outdated checksums will only exist for files that have been appended to or partially modified. Many programs dealing with small files (some well-known text editors, notably) will overwrite the whole file when saving, and the new file will lack a checksum entirely. For these cases, use -t instead.
Report encountered files that have an outdated or missing checksum.
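The interaction between -t, -u and -f may be easier to follow as a small decision function. The Python sketch below is a model built on assumptions, not shatag's implementation: it supposes that each tag stores a checksum together with the modification time at which it was computed, and that a tag counts as outdated when that stored time no longer matches the file's current modification time. The names Tag, tag_state and needs_hashing are hypothetical.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    checksum: str
    mtime: float  # modification time recorded when the checksum was computed


def tag_state(tag: Optional[Tag], current_mtime: float) -> str:
    """Classify a file as 'missing', 'outdated' or 'valid'."""
    if tag is None:
        return "missing"
    if tag.mtime != current_mtime:
        return "outdated"
    return "valid"


def needs_hashing(state: str, tag_new: bool, update: bool, force: bool) -> bool:
    """Decide whether to (re)compute a checksum.

    tag_new models -t, update models -u, force models -f.
    """
    if state == "missing":
        return tag_new
    if state == "outdated":
        return tag_new or update
    # state == "valid": only recompute when forced, to detect corruption
    return force and (tag_new or update)
```

Under this model, a file rewritten from scratch by an editor has no tag at all, so its state is "missing" and only -t picks it up.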
Retag a whole directory and record everything to the database:
shatag -pqrt .
Check files in the current directory for remote duplicates:
shatag -l
Show alternate locations for duplicates of a single file:
shatag -L somefile
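Scripts that consume shatag output may prefer the null-separated form produced by -0, since file names can contain newlines. The following Python sketch splits such output into (checksum, name) pairs. It assumes each record keeps the sha256sum layout of a 64-character hex digest, two separator characters, then the file name; parse_records is a hypothetical helper, not part of shatag.

```python
def parse_records(data: bytes):
    """Split null-separated 'digest  name' records into (digest, name) pairs."""
    records = []
    for raw in data.split(b"\0"):
        if not raw:
            continue  # skip the empty piece after a trailing null
        # 64 hex characters, then two separator characters, then the name.
        digest, name = raw[:64], raw[66:]
        records.append((digest.decode("ascii"), name))
    return records
```

File names are kept as bytes here because their encoding may differ between systems.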
~/.shatagrc
YAML configuration file. Currently has only two possible configuration keys: "database", which sets the database path (by default, ~/.shatagdb), and "name", which sets the volume name in the database (defaults to the canonical host name).
Examples:
database: /var/lib/shatag.db # sqlite3 backend
database: "pg: dbname=shatag host=localhost user=shatag password=xxxsecretpasswordxxx" # postgres backend
database: http://service.com/shatag # http backend
database: insecure-https://service.com/shatag # http backend, skip ssl certificate verification
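The example values above differ mainly in their prefix. As a sketch of how a client might tell them apart, the Python function below maps a "database" value to a backend. The function classify_database and its return values are hypothetical, and shatag's own parsing may differ.

```python
def classify_database(value: str):
    """Map a 'database' setting to a (backend, target, verify_tls) tuple."""
    if value.startswith("pg:"):
        # The remainder is a psycopg2 DSN string; strip padding after 'pg:'.
        return ("postgres", value[3:].strip(), None)
    if value.startswith("insecure-https://"):
        # Same as https, but certificate verification is skipped.
        return ("http", "https://" + value[len("insecure-https://"):], False)
    if value.startswith(("http://", "https://")):
        return ("http", value, True)
    # Anything else is treated as a path to an SQLite file.
    return ("sqlite", value, None)
```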
Support for non-ASCII filenames across systems of different and/or inconsistent encodings has not been fully tested.
Not all option combinations are sensible.
Report shatag bugs to the bugtracker at http://bitbucket.org/maugier/shatag.
shatag-add(1), shatag-mkdb(1)