Some useful Linux day 2 day stuff that I find most useful when managing multiple checkpoints (weights of ML models). Especially tailored to this need:

You run an experiment overnight, wake up in the morning, ssh to your pod, and find a new 100 Gb sitting on your homeā€¦ Time to strat cleaning!

Size of each sub-directories:

$ du -h --max-depth=1 
$ du -h --max-depth=1 | grep 'G' 

Delete all files starting (or ending) with some pattern pattern_X:

$ ls  | grep "^pattern_X" | xargs rm
$ ls  | grep "pattern_X$" | xargs rm  # ending with pattern_X

Only view sub-directories of a current directory:

$ ls -l | grep "^d"

View all files of certain pattern pattern_X, except one pattern_XY:

$ ls | grep "^pattern_X" | grep -v "^pattern_XY" 

View the last added files of some pattern pattern_X

$ ls -t | grep "^pattern_X" | head -n 10 

Example of use: When managing checkpoints while still training, and you would like to delete most your checkpoints, except maybe the last one.

A very very practical use:

$ ls -t check2/ | grep "^256" | head -n 9 | while read -r line; do 
$ 	python owt_eval.py "./check2/$line"; 
$ done

Another way is, now my preferred way:

$ls -t check2/ | grep "^256" | head -n 9 | xargs -I {} python owt_eval.py check2/{}
$ ls -t check2/ | grep "^256" | head -n 9 | while read -r line; do 
$ 	number=${line:31:-3}
$ 	python kron_to_gpt.py $line $number
$ done

My checkpoints are all of the format checkpoint_long_string_number.pt, hence ${line:31:-3} extracts the number before pt.