What is the DeDuplicator?

The DeDuplicator is an add-on module for the Heritrix web crawler. It reduces the amount of duplicate data collected across a series of snapshot crawls.
