urlwatch: Watch web pages and arbitrary URLs for changes

SYNOPSIS

urlwatch [options]

DESCRIPTION

urlwatch watches a list of URLs for changes and prints out unified diffs of the changes. You can filter always-changing parts of websites by providing a "hooks.py" script.

OPTIONS

--version: Show the program's version number and exit
-h, --help: Show the help message and exit
-v, --verbose: Show debug/log output
--urls=FILE: Read URLs from the specified file
--hooks=FILE: Use the specified file as the hooks.py module
-e, --display-errors: Include HTTP errors (404, etc.) in the output

ADVANCED FEATURES

urlwatch includes some advanced features, which you activate by creating a hooks.py file that specifies which URLs each feature applies to. You can also use the hooks.py file to filter trivially-varying elements of a web page.
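
As a minimal sketch of filtering a trivially-varying element, the hooks.py below removes clock-style timestamps from one page before it is compared. It assumes the filter(url, data) hook signature used throughout this page and that data arrives as text; the URL and the TIMESTAMP pattern are hypothetical and only serve as illustration.

```python
import re

# Hypothetical pattern: matches clock-style timestamps such as "12:34:56".
TIMESTAMP = re.compile(r'\b\d{2}:\d{2}:\d{2}\b')


def filter(url, data):
    # Hypothetical URL: replace it with a page you actually watch.
    if url.startswith('http://example.org/status'):
        # Remove the always-changing timestamp so it never shows up in a diff.
        return TIMESTAMP.sub('', data)
    # Every other URL is passed through unchanged.
    return data
```

URLs that do not match the condition are returned untouched, so the hook can be extended with further branches without affecting other watched pages.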

ICALENDAR FILE PARSING

The ical2txt module parses .ics files in iCalendar format and produces a very simplified text-based representation for the diffs. Use it like this in your hooks.py file:

  from urlwatch import ical2txt

  def filter(url, data):
      if url.endswith('.ics'):
          return ical2txt.ical2text(data).encode('utf-8') + data
      # ...you can add more hooks here...
      return data

HTML TO TEXT CONVERSION

There are three methods of converting HTML to text in the current version of urlwatch: "lynx" (default), "html2text" and "re". The first two use the command-line utilities of the same name to convert HTML to text, and the last one uses a simple regex-based tag stripping method (needs no extra tools). Here is an example of using the html2txt module in your hooks.py file:

  from urlwatch import html2txt

  def filter(url, data):
      if url.endswith('.html') or url.endswith('.htm'):
          return html2txt.html2text(data, method='lynx')
      # ...you can add more hooks here...
      return data
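
To give a rough idea of what regex-based tag stripping means, here is a minimal standalone sketch. It is not urlwatch's actual implementation of the "re" method; the function name strip_tags is hypothetical, and the approach ignores comments, scripts and character entities.

```python
import re


def strip_tags(html):
    # Drop anything that looks like an HTML tag.
    text = re.sub(r'<[^>]*>', '', html)
    # Trim each line and discard the lines that are now empty.
    lines = [line.strip() for line in text.splitlines()]
    return '\n'.join(line for line in lines if line)
```

A regex approach like this needs no external tools, but it is far cruder than a real HTML renderer such as lynx, so the resulting text can be noisier.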

FILES

~/.urlwatch/urls.txt: A list of HTTP/FTP URLs to watch (one URL per line)
~/.urlwatch/lib/hooks.py: A Python module that can be used to filter contents
~/.urlwatch/cache/: The state of web pages is saved in this folder

AUTHOR

Thomas Perl <thp.io/about>

WEBSITE

http://thp.io/2008/urlwatch/

urlwatch (1)