Last Updated or created 2026-06-02
I often forgot to download important attachments, such as orders and other paperwork.
I use paperless-ngx to make PDFs searchable.
- export (Takeout) mail from, for example, Gmail
- extract all attachments using the script below
- sort them by media type
- use a Dangerzone (dangerzone.rocks) installation to sanitize them (remove malware/viruses)
cat All\ mail\ Including\ Spam\ and\ Trash-002.mbox | formail -des munpack
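Running the extraction in a directory of its own keeps the cleanup loops from touching unrelated files. A minimal sketch, assuming formail and munpack are installed; the function name extract_attachments and the output directory are illustrative and not part of the original notes.

```shell
# Hypothetical helper: extract every attachment from an mbox file into its own directory.
# Usage: extract_attachments MBOX OUTDIR
extract_attachments() {
    mbox=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # Make the mbox path absolute before changing directory.
    case $mbox in
        /*) ;;
        *) mbox=$PWD/$mbox ;;
    esac
    # formail -ds splits the mailbox and pipes each message to munpack;
    # -e accepts messages that are not preceded by an empty line.
    ( cd "$outdir" && formail -des munpack < "$mbox" )
}
```

For example: extract_attachments "All mail Including Spam and Trash-002.mbox" attachments, followed by cd attachments before running the cleanup.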
Cleanup script
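# Remove the .desc files munpack writes alongside the attachments
find . -type f -iname '*.desc' -exec rm -- {} +
# Strip the =X prefix and X= suffix from mangled names, keeping a trailing .N copy number
for f in =X*; do
[ -e "$f" ] || continue # glob matched nothing
new=$(printf '%s\n' "$f" | sed -E 's/^=X//; s/X=(\.[0-9]+)?$/\1/')
mv -- "$f" "$new"
done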
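# Remove numbered copies (name.ext.1, name.ext.2, ...) that are byte-identical to the unnumbered file
for f in *.[0-9]*; do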
base=${f%.[0-9]*}
if [ -f "$base" ] && cmp -s -- "$base" "$f"; then
echo "Removing duplicate: $f"
rm -- "$f"
fi
done
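# Move the copy number in front of the extension: winmail.dat.1 -> winmail-1.dat
for f in *.[0-9]*; do
n=${f##*.} # 1, 2, 3 ...
case $n in ''|*[!0-9]*) continue ;; esac # skip names like report.2020.pdf
base=${f%.*} # winmail.dat
case $base in
*.*) new="${base%.*}-${n}.${base##*.}" ;; # winmail-1.dat
*) new="${base}-${n}" ;; # no extension to keep
esac
mv -- "$f" "$new"
done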
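# Strip every leading dash so the names cannot be mistaken for options
for f in -*; do
[ -e "$f" ] || continue # glob matched nothing
new=$f
while [ "${new#-}" != "$new" ]; do new=${new#-}; done
[ -n "$new" ] && mv -- "$f" "$new"
done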
for f in *X; do
mv -- "$f" "${f%X}"
done
mkdir -p pdf images audio text movies bww zip midi html vcf xml
# sort_into DIR EXT...: move regular files with the given extensions (any letter case) into DIR.
# Matching on ".ext" and on regular files only keeps the target directories
# (pdf, zip, html, ...) from matching their own patterns.
sort_into() {
dir=$1; shift
for ext in "$@"; do
find . -maxdepth 1 -type f -iname "*.$ext" -exec mv -- {} "$dir"/ \;
done
}
sort_into pdf pdf
sort_into images gif jpg jpeg bmp png tif eps svg psd
sort_into audio mp3 wma wav m4a
sort_into text txt wri doc docx xls xlsx ppt pptx
sort_into movies mp4 avi mov mpg
sort_into bww bww abc pio
sort_into zip zip tgz tar rar
sort_into midi mid
sort_into html html htm
sort_into vcf vcf
sort_into xml xml
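After sorting, anything still in the top-level directory has an extension the list does not cover. A small sketch for spotting those, assuming GNU or BSD find; the function name leftover_extensions is illustrative, and file names containing newlines would be miscounted.

```shell
# Count the files still in the current directory, grouped by lower-cased extension.
leftover_extensions() {
    find . -maxdepth 1 -type f -name '*.*' |
        sed 's/.*\.//' |                    # keep only the text after the last dot
        tr '[:upper:]' '[:lower:]' |
        sort | uniq -c | sort -rn           # most common extension first
}
```

Running leftover_extensions after the sort shows which extensions might deserve a directory of their own.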
Next thing to do: sanitize the PDFs
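The sanitizing step could be scripted as a loop over the pdf directory. This is only a sketch: it assumes Dangerzone's command-line tool is installed under the name dangerzone-cli and accepts a document path as its argument, which should be checked against the installed version. SANITIZER and sanitize_pdfs are names made up for this example.

```shell
# Assumption: the sanitizer is called as "<command> <file>".
# Set SANITIZER to whatever command your Dangerzone installation provides.
SANITIZER=${SANITIZER:-dangerzone-cli}

# Usage: sanitize_pdfs [DIR]   (DIR defaults to ./pdf)
sanitize_pdfs() {
    dir=${1:-pdf}
    failed=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip subdirectories and an empty glob
        if ! "$SANITIZER" "$f"; then
            echo "Sanitizing failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]                    # non-zero exit status if anything failed
}
```

The loop keeps going after a failure, so one broken PDF does not block the rest; the failures are listed on stderr.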

Then ingest them with paperless-ngx
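paperless-ngx imports files that are placed in its consumption directory. A minimal sketch, assuming that directory is reachable from this machine; the function name queue_for_paperless and both paths are placeholders. Files are copied rather than moved, since paperless-ngx normally removes a file from the consumption directory once it has been imported.

```shell
# Copy the PDFs from SRC_DIR into a paperless-ngx consumption directory.
# Usage: queue_for_paperless SRC_DIR CONSUME_DIR
queue_for_paperless() {
    src=$1
    consume=$2
    if [ ! -d "$consume" ]; then
        echo "No such directory: $consume" >&2
        return 1
    fi
    for f in "$src"/*.[pP][dD][fF]; do
        [ -f "$f" ] || continue            # glob matched nothing
        cp -- "$f" "$consume"/ || return 1
    done
}
```

For example: queue_for_paperless pdf /path/to/consume, where /path/to/consume stands for whatever consumption directory the paperless-ngx installation is configured with.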
