Interface | Description |
---|---|
DedupAttributeConstants | Lifted from H1 AdaptiveRevisitAttributeConstants and limited to what DeDuplicator was using. |

Class | Description |
---|---|
CommandLineParser | Print DigestIndexer command-line usage message. |
CrawlDataItem | A base class for individual items of crawl data that should be added to the index. |
CrawlDataIterator | An abstract base class for implementations of iterators that iterate over different sets of crawl data (i.e. |
CrawlLogIterator | An implementation of is.hi.bok.deduplicator.CrawlDataIterator capable of iterating over a Heritrix-style crawl.log. |
DeDupFetchHTTP | An extension of Heritrix's org.archive.crawler.fetcher.FetchHTTP processor for downloading HTTP documents. |
DeDuplicator | Heritrix-compatible processor. |
DigestIndexer | A class for building a de-duplication index. |

Enum | Description |
---|---|
OriginHandling | |

Copyright © 2014 National and University Library of Iceland. All Rights Reserved.