Scraping authenticated websites

A friend needed to scrape data from an authenticated website.
This needs to be scripted and processed without human intervention.

Following steps are needed to get the correct curl commands (one time only)

Login page
Press F12 or right-click inspect
Click network and reload using ctrl-r
Select the start page and right click
copy as cURL (bash)

next steps

save curl command in a file

remove –compresssion and -H ‘Cookie: JSESSIONID=?????????????????????????????’

add just after curl

-k (no certificate check) and
–cookie-jar tmpcookiefile

excecute this. It will give you a file with a session id and a true field.
(This will change at every login)
but is needed for subsequential requests

Next: use this sessioncookie to get the next authenticated request

So to scrape with login, you need two lines in your script.
One to get the session cookie. (YOUR username/pass will be in here!!)
And the second to get the needed page using the cookie

#!/bin/bash
#authenticate and save sessioncookie
curl -k --cookie-jar part1.cookie 'https://xxx.xxxxx.xxx/site/dologin'   -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7'   -H 'Accept-Language: en-GB,en;q=0.9,nl-NL;q=0.8,nl;q=0.7'   -H 'Cache-Control: max-age=0'   -H 'Connection: keep-alive'   -H 'Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryb1chvkAVZSF3hPSu'    -H 'Origin: https://xxx.xxxxx.xxx'   -H 'Referer: https://xxx.xxxxx.xxx/site/loginform'   -H 'Sec-Fetch-Dest: document'   -H 'Sec-Fetch-Mode: navigate'   -H 'Sec-Fetch-Site: same-origin'   -H 'Upgrade-Insecure-Requests: 1'   -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'   -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"'   -H 'sec-ch-ua-mobile: ?0'   -H 'sec-ch-ua-platform: "Windows"'   --data-raw $'------WebKitFormBoundaryb1chvkAVZSF3hPSu\r\nContent-Disposition: form-data; name="form[username]"\r\n\r\nusername\r\n------WebKitFormBoundaryb1chvkAVZSF3hPSu\r\nContent-Disposition: form-data; name="form[password]"\r\n\r\npassword\r\n------WebKitFormBoundaryb1chvkAVZSF3hPSu\r\nContent-Disposition: form-data; name="form[refname]"\r\n\r\n\r\n------WebKitFormBoundaryb1chvkAVZSF3hPSu\r\nContent-Disposition: form-data; name="form[refid]"\r\n\r\n\r\n------WebKitFormBoundaryb1chvkAVZSF3hPSu\r\nContent-Disposition: form-data; name="form[refmod]"\r\n\r\n\r\n------WebKitFormBoundaryb1chvkAVZSF3hPSu\r\nContent-Disposition: form-data; name="form[csrf_hash]"\r\n\r\ncsrf_ab09f7887d9dacfe1489b68b64fe6a01\r\n------WebKitFormBoundaryb1chvkAVZSF3hPSu--\r\n'
#get data from second page
curl -k -l --cookie part1.cookie  https://xxx.xxxxx.xxx/subscriber/overview

Mikrotik and Home Assistant

A friend of mine remembered me of the Mikrotik addons in HA.
I forgot all about this.

I’ve got two integrations running. (Installed via HACS)

Don´t forget to enable the API on your devices.

/ip services
set api disabled=no port=8728

The addons/integrations extract a lot of information from the MT devices.

This integration will also let you know that your MT Router can be updated.

This is part if the presence detector/Device locator.
My roaming wifi network will give false locations. My laptop never left the same spot!

I will add to this page, after testing for a while

DNS / DHCP

Today i’ve migrated my DNS/DHCP to a Mikrotik router.
See post

I’m planning to replace my main firewall/dhcp/dns/web/irc/mail/ids whatever more .. to virtual machines and a mikrotik router.

Having used bind and isc-dhcp-server for many years, i had to try some alternatives.

  • DNS in FreeIPA
  • Unbound with zone/dns
  • Phpipam with powerdns

And now DNS/DHCP in a Mikrotik router!

I still use Phpipam as cmdb.
So i’m planning to use ansible to synchronize phpipam and the mikrotik.

Nice .. no seach domain needed …
nslookup webserver, resolves webserver.dmz
If there are 2 hosts with the same hostname, you have to add a domain.

Bread from beer waste

Last saturday we were brewing beer, ending up with a lot of spent grain. (Bier borstel in Dutch)
These are the processed grains from the brewing.

Well, let’s bake a delicious bread, very tasty with, for example, salted butter.

Preparation time : 135 minutes
Baking time : 60 minutes

What do you need
225 grams of spent grain from the brewing
400 grams of flour
200 ml beer ( I use Leffe Blond)
7 grams of baker’s yeast
50ml olive oil

Put the brewer’s grains in the food processor. (Removes the sharp edges of the malt.)
Put the yeast and a little sugar in the beer. This causes the yeast to become active.
Knead all ingredients together until it becomes an airy ball.
Then let this dough rise for an hour and a half in a warm place in a bowl that is covered with plastic.
Then knead the dough briefly and let it rest for another half hour.
Preheat the oven to 200º C.
Pour the batter into a greased baking pan.
Then place the dough in the center of the oven.
After about an hour the bread should be ready. You can check this by knocking on the bread. If it sounds hollow, the bread is ready.

Shelly Devices

I love shellies, easy to implement and hacker friendly.

MQTT out of the box.
Curl in and out – you can switch it on/off using curl, but it can send a http command also.
I’ve used this with Domoticz and Home Assistant.
It has a webinterface, with timers, and there is also a client for Android/Iphone
You’ve got some own gpio pins to your disposal and the unit is flash-able!

curl -X POST https://shellydevice/device/relay/control -d "channel=IDHERE&turn=on&id=ID&auth_key=AUTH"

The device is not isolated from the mains. To flash it, the mains must be disconnected.

SHELLY 1

I use this for simple on/off switches around the house.
Using it with a physical switch and MQTT (Nodered)

SHELLY 2 PM

Same as above but this one has a build in power meter

SHELLY DIMMER

Generic dimmer

Dimmer with low voltage rotary encoder!
https://www.instructables.com/Shelly-Dimmer-Wall-Switch-With-Rotary-Knob-and-Hom/

SHELLY RGBW2

A RGBW / 4 Channel controller
You can connect RGB strips, but also dimmable white strips in 4 channels. 12V or 24V.
NOTE! : There is a common 12V connection, and GND will be controlled!

SHELLY PLUG S

I love these small wall plugs, i’ve used these also to find power consuming devices around the house.

For example, i made a nodered flow, to see if the washing machines are running or not.

SHELLY BUTTON

This became my all purpose alarm thinghy

SHELLY DOORSENSOR

A door sensor WITH temperature and light sensor build in!

Home Assistant example

SHELLY WATER SENSOR

This one lays below our washing machine

SHELLY SMOKE ALARM

(Preordered)

SHELLY PM4

A four channel power measure/switch for your fuse box


GENERIC SHELLY STUFF
A smart doorbell schematic i found

https://www.thingiverse.com/thing:5756154
Very nice .. printable cases for your shellies!

At last .. my own lasercutter

I did a lot of lasercutting at Fablab Utrecht, but they stopped a few years ago, and I moved to Hilversum.

I loved making this at this Fablab.

  • Lasercutting : Boxes, A cryptex of my own design, Xmas ornaments, Shogi game, things in acrylic, Rubber stamps
  • Cutting plotter ( Nae Bother Case logo’s )
  • CNC Machines
  • Vacuum form

Then i bought a mini engraver, which you can attach to your 3D printer.

But I really wanted a cutter, so there it is … the Sculptfun S9

First test. crafting paper. No burning and a really high resolution!

This laser module has a new optical design, so it can cut wood as thick as 10+ mm.
It can engrave metal, cut non-transparant acrylic, leather and more.

Software:

I’m using Inkscape, with the lasercutter tool plugin from Jtech.
https://jtechphotonics.com/

I’ve also tested with LaserGRBL, which can be run under linux using Wine.

I’m trail testing LightBurn. (Native Linux App)

I will post my findings and test on this post.

UPDATE: 20230221

A stone slate engraved
Speed3000 mm/s
Laser Power100%
Lines per mm10

UPDATE: 20230306

Calibrate your machine! .. But NOT as found on YouTube using a 10x10mm or 1 inch by 1 inch. square.
As big as you can. This is far more precise!

My list of settings (Work in progress)

I have to check mm/s versus mm/minute!

MaterialSpeedPowerLines/mmPasses
Slate (engrave)
(using lasergrbl)
3000 mm/s100%101
3mm Plywood (cut)20 mm/s90%x8?
Craftpaper1500mm/s70%x1
Cardboard10 mm/s100%x4
Leather
Cork (6mm)

Converting old media

Old files, own created stuff, downloaded stuff. Sometimes there are no viewers anymore, or you thrown devices away.

(I will add to this when i find more information)

These are the tools i’ve used to convert them to a newer format.

NOTE: Converting will always impact the quality

IMAGES

LBM (amiga)

ilbmtoppm < test.LBM | pnmtopng - > test.png

RAM (video files)

ffmpeg -i file.ram file.mp4
or
ffmpeg -i file.ram file.gif

Own DVD’s
NOTE: VOB files are mpg, to keep same quality i use stream copy (-c copy) , besides mpg i´d like to have movies in a web streamable format like MP4, so i DO convert these

ffmpeg -i "concat:VTS_01_1.VOB|VTS_01_2.VOB|VTS_01_3.VOB" -f mpeg -c copy output.mpeg

Flash/SWF

http://www.swffileplayer.com/
https://flasharch.com/en/downloads
.. and record with OBS

AMR ( Adaptive Multi-Rate audio codec )

ffmpeg -i folkband.amr -ar 22050 folkband.mp3

3GP

ffmpeg -i pipes.3gp -vcodec h264 -c:a aac pipes.mp4
If your 3gp is recent and has a x264 tag (check with mediainfo)
then you can convert lossless with
ffmpeg -i 10026202.3gp -c:v copy -c:a copy 10026202.mp4

QTVR (need to get a better solution for this)
I’ve used this for rendered scenes

ffmpeg -i ../test.mov %02d.png
ffmpeg -i %02d.png -vf "tile=1x24,transpose=1" qtvrmovie.png

FLV

ffmpeg -i filename.flv -c:v libx264 -crf 19 -strict experimental filename.mp4

WEBP (Not a old media format, but i dont like webp format.)
This script converts to images or animated gif depending on the number of frames

#!/bin/bash


ls *webp |  while read file; do
duration=$(webpinfo -summary "$file" | grep Duration | head -1 |  sed -e 's/.* \([0-9]*\)$/\1/')
frames=$(webpinfo -summary "$file" | grep frames | head -1 |  sed -e 's/.* \([0-9]*\)$/\1/')
if [ "$duration" == "0" ] ; then duration=50 ;fi
echo "DUR $duration : FRM $frames"
dur=$( echo "$duration / 100" | bc )


if [ $frames == "1" ] ; then convert $file png/$file.jpg 
else 
for i in $(seq -f "%05g" 1 $frames)
do
    webpmux -get frame $i $file -o $file.$i.webp
    dwebp $file.$i.webp -o $file.$i.png
done
fi

convert $file.*.png  -colorspace srgb  +dither  -delay $dur -loop 0 gif/$file.gif
rm $file.[0-9]*.png *webp.[0-9]*.webp
done

Nextcloud Notes

(Old and new notes, i will keep adding info)

A install step stopped .. find a .step file and remove this

[nginx@nextcloud nextcloud]$ rm /mnt/fileserver/updater-ocv0dxmv6vpg/.step

Login page loop: directory rights wrong

This server was running nginx
drwxrwx---.  2 root apache     6 Jan  3 18:19 wsdlcache
drwxrwx---.  2 root apache 16384 Jan  3 18:19 session
drwxrwx---.  2 root apache     6 Jan  3 18:19 opcache

fixed with:
chown root:nginx /var/lib/php/session 
chown root:nginx /var/lib/php/opcache
chown root:nginx /var/lib/php/wsdlcache

Other things to check (Generic)

  • Really slow storage can cause problems, check with iotop
  • Check selinux! (temporary test with : setenforce 0 )
  • Behind a reverse proxy? remove these options in your config.php and check a local connection. This is a test for: reverseproxy, firewall and proxy settings in config.php. Need to test a localhost connection with your browser?
    ssh -L8080:localhost:80 nextcloudserver (and connect to http://localhost:8080 with your browser
  • Check your services! .. memcache, nginx, apache, mysql/mariadb
  • Check logs: journalctl -f, tail -f /var/log/nginx/*log /x/y/z/netcloud.log
  • Storage on a remote server? (Using NFS for example) Check your rights
    • change into your webserver account and test access to files.storage
      ( sudo su – nginx -s /bin/bash )

Migrating users to a complete new nextcloud instance keeping IDs

Get a list of users

su - apache -s /bin/bash  -c " source /opt/rh/rh-php72/enable; cd /var/www/html/nextcloud ;./occ user:list 2>/dev/null | grep -v admin ; exit" > listofusers

Sync and check users
Rsync data and use occ command to scan files into database

#!/bin/bash
cd /root
#set -x
templist=templist.$$
./getlistusers > userlistsvr1
cat userlistsvr1 > /tmp/$templist
cat userlistsvr2.new >> /tmp/$templist
cat /tmp/$templist | cut -f2 -d: | sort  | uniq -c | grep " 2 " | cut -c 10- > /tmp/deze.$$
cat /tmp/deze.$$ | while read ; do
	echo "User $REPLY"
	oldtest=0
	newtest=0
	old=$(cat userlistsvr2 | grep "$REPLY" | awk '{ print $2 }' | sed s/://g)
	new=$(cat userlistsvr1 | grep "$REPLY" | awk '{ print $2 }' | sed s/://g)
	test -d /var/data/owncloud/data/$old/files/ && oldtest=1
	test -d /var/data/owncloud/nextcloud/$new/files/ && newtest=1
	if [ $oldtest -eq 1 ] ; then echo " found old dir - $old" ; fi
	if [ $newtest -eq 1 ] ; then echo " found new dir - $new" ; fi
	if [ $oldtest -eq 1 ] && [ $newtest -eq 1 ] ; then echo "  both found - ready to sync" 
		rsync -va /var/data/owncloud/data/$old/files/ /var/data/owncloud/nextcloud/$new/files/
		echo " RUN occ files:scan $new ############################"
		su - apache -s /bin/bash  -c " source /opt/rh/rh-php72/enable; cd /var/www/html/nextcloud ;./occ files:scan $new"
        fi
	echo " "
 
done
rsync -va  /var/data/owncloud/data/root/files/xyz/ /var/data/owncloud/nextcloud/38842e24-5d0d-11e9-dbdb-0001a4ab950f/files/
su - apache -s /bin/bash  -c " source /opt/rh/rh-php72/enable; cd /var/www/html/nextcloud ;./occ files:scan 38842e24-5d0d-11e9-dbdb-0001a4ab950f"

"If something is worth doing, it's worth overdoing."