CrawlRSS (Heritrix add-on module)


NOTE: This add-on will only work with Heritrix 3.3.0 or later.

  1. Download the code
  2. Run "mvn package". This generates a distribution tar.gz file.
  3. Extract the archive from step #2 into the root directory of a Heritrix (3.3.0+) instance
  4. Start Heritrix as usual
  5. Base your job on the supplied profile "CrawlRSS-Sample-Profile"
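
The build, extract, and start steps above can be sketched as a short shell snippet. This is only an illustration under stated assumptions: HERITRIX_HOME and its default path are placeholders, extract_addon is a hypothetical helper (not part of CrawlRSS or Heritrix), the archive is assumed to land in Maven's default "target" directory, and the admin credentials are examples only.

```shell
# Sketch only: paths and the archive name are assumptions; adjust to your setup.
HERITRIX_HOME="${HERITRIX_HOME:-/opt/heritrix}"   # assumed install location

# Extract a distribution archive into the root of a Heritrix instance.
# extract_addon is a hypothetical helper, not part of CrawlRSS or Heritrix.
extract_addon() {
    archive="$1"
    target="$2"
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    [ -d "$target" ] || { echo "not a directory: $target" >&2; return 1; }
    tar -xzf "$archive" -C "$target"
}

# Typical sequence, run from the CrawlRSS source directory:
#   mvn package
#   extract_addon target/<generated-archive>.tar.gz "$HERITRIX_HOME"
#   "$HERITRIX_HOME/bin/heritrix" -a admin:admin   # example credentials
```

The helper checks that both the archive and the target directory exist before extracting, so a mistyped path fails early instead of unpacking files into the wrong place.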