Scraping podcast which uses a javascript to obfuscate mp3 links

Last Updated or created 2022-06-29

wget-ting the page only gave me flat html, but no readable links.

We need the rendered version, phantomjs wil help

wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2

printsource.js

var system = require('system');
var page   = require('webpage').create();
var url    = system.args[1];
page.open(url, function () {
  console.log(page.content);
  phantom.exit();
});

Run phantomjs

phantomjs-2.1.1-linux-x86_64/bin/phantomjs printsource.js  https://xxxxxxxx/show/xxxx > out

So now i got the rendered page, get mp3’s and titles, for this example

cat out | sed 'N;s/\n/,/' | cut -f2,7 -d\" | while read line ; do
mp3=$( echo $line | cut -f1 -d\")
title=$( echo $line | cut -f3 -d\> | tr -d '/<>[]]\!,;' | tr -d "'" | sed s/CDATA//g | sed s#title##g | sed s/:/-/g )
echo "$mp3 $title"
wget $mp3 -O "$title.mp3"
done

bash downloadscript
done