I am trying to set up a bash script to download a web page once a day, then run a diff of the last two pages and send an alert if the pages are more than 15% different. I'm not really sure how to approach the selection of the two most recent pages.
The script starts simply enough, just doing a wget of the page and inserting the date into the filename:
wget --output-document=index`date +%Y-%m-%d`.html https://www.example.com
Assuming a couple of those pages have been collected, we run a diff of the two most recent pages. (And this is where I'm lost.)
sdiff -B -b -s index1.html index2.html | wc -l
Any suggestions on how to set this up so it can pull the last two files and run the diff?
I would keep the date as part of the file name when you do the wget, as you already do.
For file comparison, I would go with the solution below.
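Since the filenames already embed the date, you can compute yesterday's filename with GNU `date -d` instead of searching for files. A minimal sketch (the sample pages created with `printf` are just stand-ins for two downloaded copies of the site):

```shell
#!/bin/bash
# Reconstruct the two most recent filenames from the date,
# matching the naming scheme used by the wget in the question.
today="index$(date +%Y-%m-%d).html"
yesterday="index$(date -d "1 day ago" +%Y-%m-%d).html"

# Stand-in pages so this sketch runs on its own; in the real
# script these would be yesterday's and today's downloads.
printf 'line one\nline two\nline three\n' > "$yesterday"
printf 'line one\nsomething else\nline three\n' > "$today"

# Count the lines that differ (-s suppresses identical lines).
changed=$(sdiff -B -b -s "$yesterday" "$today" | wc -l)
echo "$changed differing lines"
```

Note that `date -d "1 day ago"` is a GNU extension; on BSD/macOS you would use `date -v-1d` instead.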
You could replace "1 day ago" with any number of days you want to go back. Doing a file-existence check before the diff would be a good idea too.
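Putting the existence check and the 15% threshold from the question together, a sketch might look like this (the sample pages and the plain `echo` alert are assumptions; in practice you would send the alert with `mail` or similar):

```shell
#!/bin/bash
today="index$(date +%Y-%m-%d).html"
yesterday="index$(date -d "1 day ago" +%Y-%m-%d).html"

# Stand-in pages so this sketch is self-contained: 2 of 10 lines changed.
printf 'a\nb\nc\nd\ne\nf\ng\nh\ni\nj\n' > "$yesterday"
printf 'a\nB\nC\nd\ne\nf\ng\nh\ni\nj\n' > "$today"

# Existence check before diffing.
if [[ ! -f "$today" || ! -f "$yesterday" ]]; then
    echo "Missing one of the pages; skipping comparison." >&2
    exit 1
fi

changed=$(sdiff -B -b -s "$yesterday" "$today" | wc -l)
total=$(wc -l < "$today")

# Alert when more than 15% of today's lines differ
# (integer arithmetic avoids needing bc).
if (( changed * 100 > total * 15 )); then
    echo "ALERT: page changed by more than 15%"
fi
```

Here 2 of 10 lines differ (20%), so the alert fires.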
Check out this link for more date operations: http://ss64.com/
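If you would rather literally "pull the last two files" than compute dates, note that the YYYY-MM-DD naming makes a plain lexicographic sort chronological. A sketch, using a scratch directory and made-up dates for the demo:

```shell
#!/bin/bash
# Self-contained demo: fake a few date-stamped downloads.
mkdir -p demo_pages
touch demo_pages/index2024-01-01.html \
      demo_pages/index2024-01-02.html \
      demo_pages/index2024-01-03.html

# Because the filenames embed YYYY-MM-DD, sorting the names
# orders them chronologically; the two newest come last.
newest=$(ls demo_pages/index*.html | sort | tail -n 1)
previous=$(ls demo_pages/index*.html | sort | tail -n 2 | head -n 1)
echo "comparing $previous against $newest"
```

This avoids any dependency on the current date, so it still works if a day's download was missed.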