The DeDuplicator is an add-on module for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
[top]