I am trying to set up a bash script to download a web page once a day, then run a diff of the last two pages and send an alert if the pages are more than 15% different. I'm not really sure how to approach the selection of the two most recent pages.
The script starts simply enough, just doing a wget of the page and inserting the date into the filename:
wget --output-document=index`date +%Y-%m-%d`.html https://www.example.com
Assuming a couple of those pages have been collected, we run a diff of the two most recent pages. (And this is where I'm lost.)
sdiff -B -b -s index1.html index2.html | wc -l
Any suggestions on how to set this up so it can pull the last two files and run the diff?
I would keep the date as part of the file name when you do the wget, as you already do.
For file comparison, I would go with the solution below.
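Since the filenames already embed the date, you can compute yesterday's filename with GNU `date -d` instead of searching for files. A minimal sketch (the sample pages created with `printf` are just stand-ins for two downloaded copies of the site):

```shell
#!/bin/bash
# Reconstruct the two most recent filenames from the date,
# matching the naming scheme used by the wget in the question.
today="index$(date +%Y-%m-%d).html"
yesterday="index$(date -d "1 day ago" +%Y-%m-%d).html"

# Stand-in pages so this sketch runs on its own; in the real
# script these would be yesterday's and today's downloads.
printf 'line one\nline two\nline three\n' > "$yesterday"
printf 'line one\nsomething else\nline three\n' > "$today"

# Count the lines that differ (-s suppresses identical lines).
changed=$(sdiff -B -b -s "$yesterday" "$today" | wc -l)
echo "$changed differing lines"
```

Note that `date -d "1 day ago"` is a GNU extension; on BSD/macOS you would use `date -v-1d` instead.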
You could replace "1 day ago" with any number of days you want to go back. Doing a file-existence check before the diff would be a good idea too.
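Putting the existence check and the 15% threshold from the question together, a sketch might look like this (the sample pages and the plain `echo` alert are assumptions; in practice you would send the alert with `mail` or similar):

```shell
#!/bin/bash
today="index$(date +%Y-%m-%d).html"
yesterday="index$(date -d "1 day ago" +%Y-%m-%d).html"

# Stand-in pages so this sketch is self-contained: 2 of 10 lines changed.
printf 'a\nb\nc\nd\ne\nf\ng\nh\ni\nj\n' > "$yesterday"
printf 'a\nB\nC\nd\ne\nf\ng\nh\ni\nj\n' > "$today"

# Existence check before diffing.
if [[ ! -f "$today" || ! -f "$yesterday" ]]; then
    echo "Missing one of the pages; skipping comparison." >&2
    exit 1
fi

changed=$(sdiff -B -b -s "$yesterday" "$today" | wc -l)
total=$(wc -l < "$today")

# Alert when more than 15% of today's lines differ
# (integer arithmetic avoids needing bc).
if (( changed * 100 > total * 15 )); then
    echo "ALERT: page changed by more than 15%"
fi
```

Here 2 of 10 lines differ (20%), so the alert fires.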
Check out this link for more date operations: http://ss64.com/
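If you would rather literally "pull the last two files" than compute dates, note that the YYYY-MM-DD naming makes a plain lexicographic sort chronological. A sketch, using a scratch directory and made-up dates for the demo:

```shell
#!/bin/bash
# Self-contained demo: fake a few date-stamped downloads.
mkdir -p demo_pages
touch demo_pages/index2024-01-01.html \
      demo_pages/index2024-01-02.html \
      demo_pages/index2024-01-03.html

# Because the filenames embed YYYY-MM-DD, sorting the names
# orders them chronologically; the two newest come last.
newest=$(ls demo_pages/index*.html | sort | tail -n 1)
previous=$(ls demo_pages/index*.html | sort | tail -n 2 | head -n 1)
echo "comparing $previous against $newest"
```

This avoids any dependency on the current date, so it still works if a day's download was missed.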