public class CrawlLogIterator extends CrawlDataIterator
An implementation of is.hi.bok.deduplicator.CrawlDataIterator
capable of iterating over a Heritrix-style crawl.log.
Modifier and Type | Field and Description |
---|---|
protected SimpleDateFormat | crawlDataItemFormat: The date format specified by the CrawlDataItem for dates entered into it (and eventually into the index). |
protected SimpleDateFormat | crawlDateFormat: The date format used in crawl.log files. |
protected BufferedReader | in: A reader for the crawl.log file being processed. |
protected CrawlDataItem | next: The next item to be issued (if ready), or null if the next item has not been prepared or there are no more elements. |
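The two date formats in the field summary suggest that timestamps read from crawl.log are re-rendered in the CrawlDataItem format before they are stored. A minimal sketch of such a conversion follows. The two pattern strings, the UTC time zone, and the class and method names are assumptions made for illustration; this page does not state the patterns the library actually uses.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of is.hi.bok.deduplicator.
class CrawlDateConversionSketch {

    // Assumed pattern for crawl.log timestamps (ISO 8601 style with milliseconds).
    static final String ASSUMED_CRAWL_LOG_PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";

    // Assumed compact pattern for dates stored in a CrawlDataItem.
    static final String ASSUMED_ITEM_PATTERN = "yyyyMMddHHmmssSSS";

    // Re-renders a crawl.log style timestamp in the assumed item format.
    static String toItemDate(String crawlLogDate) {
        // SimpleDateFormat is not thread-safe, so fresh instances are created per call.
        SimpleDateFormat crawlDateFormat =
                new SimpleDateFormat(ASSUMED_CRAWL_LOG_PATTERN, Locale.ROOT);
        SimpleDateFormat crawlDataItemFormat =
                new SimpleDateFormat(ASSUMED_ITEM_PATTERN, Locale.ROOT);

        // Use the same fixed zone on both sides so the wall-clock value is preserved.
        TimeZone utc = TimeZone.getTimeZone("UTC");
        crawlDateFormat.setTimeZone(utc);
        crawlDataItemFormat.setTimeZone(utc);

        try {
            return crawlDataItemFormat.format(crawlDateFormat.parse(crawlLogDate));
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "Unparseable crawl.log date: " + crawlLogDate, e);
        }
    }

    public static void main(String[] args) {
        // prints 20140301123045123
        System.out.println(toItemDate("2014-03-01T12:30:45.123Z"));
    }
}
```

Keeping one formatter per direction mirrors the two separate fields listed in the field summary: one describes the input, the other the stored form.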
Constructor and Description |
---|
CrawlLogIterator(String source): Creates a new CrawlLogIterator that reads items from a Heritrix crawl.log. |
Modifier and Type | Method and Description |
---|---|
void | close(): Closes the crawl.log file. |
String | getSourceType(): A short, human-readable string describing the source this iterator uses. |
boolean | hasNext(): Returns true if there are more items available. |
CrawlDataItem | next(): Returns the next valid item from the crawl log. |
protected CrawlDataItem | parseLine(String line): Parses a line in the crawl log. |
protected void | prepareNext(): Readies the next item. |
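The methods in the summary describe a look-ahead iteration scheme: hasNext() and next() rely on prepareNext() to fill the next field, and prepareNext() hands lines to parseLine(String line) until one yields a usable item. The following is a minimal, self-contained sketch of that pattern, not the library's implementation. The class LookAheadLogSketch, the use of a plain String in place of CrawlDataItem, and the rule that a usable line has at least four whitespace-separated fields with the URL in the fourth are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for illustration; not the CrawlLogIterator class itself.
class LookAheadLogSketch {

    protected final BufferedReader in; // reader for the log being processed
    protected String next;             // prepared item, or null if none is ready

    LookAheadLogSketch(BufferedReader in) {
        this.in = in;
    }

    public boolean hasNext() throws IOException {
        if (next == null) {
            prepareNext();
        }
        return next != null;
    }

    public String next() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException("No more items");
        }
        String item = next;
        next = null; // force the following call to prepare a fresh item
        return item;
    }

    // Only called when next == null: reads lines until one parses or input ends.
    protected void prepareNext() throws IOException {
        String line;
        while (next == null && (line = in.readLine()) != null) {
            next = parseLine(line);
        }
    }

    // Assumed rule: at least four fields, URL in the fourth; otherwise reject (null).
    protected String parseLine(String line) {
        String[] fields = line.trim().split("\\s+");
        return fields.length >= 4 ? fields[3] : null;
    }

    public void close() throws IOException {
        in.close();
    }

    // Convenience for the example: collect every accepted item from log text.
    static List<String> readAll(String logText) {
        List<String> items = new ArrayList<>();
        LookAheadLogSketch it =
                new LookAheadLogSketch(new BufferedReader(new StringReader(logText)));
        try {
            while (it.hasNext()) {
                items.add(it.next());
            }
            it.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return items;
    }

    public static void main(String[] args) {
        String log = "2014-01-01T00:00:00.000Z 200 1024 http://example.org/ - - text/html\n"
                + "malformed line\n"
                + "2014-01-01T00:00:01.000Z 200 5 http://example.org/a - - text/plain\n";
        System.out.println(readAll(log)); // the malformed line is skipped
    }
}
```

In a design like this, overriding parseLine() in a subclass changes which lines are accepted without touching the read loop, which matches the extension point described for parseLine(String line).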
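protected final SimpleDateFormat crawlDateFormat
The date format used in crawl.log files.

protected final SimpleDateFormat crawlDataItemFormat
The date format specified by the CrawlDataItem for dates entered into it (and eventually into the index).

protected BufferedReader in
A reader for the crawl.log file being processed.

protected CrawlDataItem next
The next item to be issued (if ready), or null if the next item has not been prepared or there are no more elements.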
public CrawlLogIterator(String source) throws IOException
Creates a new CrawlLogIterator that reads items from a Heritrix crawl.log.
Parameters:
source - The path of a Heritrix crawl.log file.
Throws:
IOException - If errors were found reading the log.

public boolean hasNext() throws IOException
Returns true if there are more items available.
hasNext in class CrawlDataIterator
Throws:
IOException - If an error occurs accessing the crawl data.

public CrawlDataItem next() throws IOException
Returns the next valid item from the crawl log.
next in class CrawlDataIterator
Throws:
IOException - If there is an error reading the item *after* the item to be returned from the crawl.log.
NoSuchElementException - If there are no more items.

protected void prepareNext() throws IOException
Readies the next item.
Note: This method should only be called when next==null.
Throws:
IOException

protected CrawlDataItem parseLine(String line)
Parses a line in the crawl log.
Override this method to change how individual crawl log items are processed and accepted/rejected. This method is called from within the loop in prepareNext().
Parameters:
line - A line from the crawl log. Must not be null.
Returns:
A CrawlDataItem if the next line in the crawl log yielded a usable item, null otherwise.

public void close() throws IOException
Closes the crawl.log file.
close in class CrawlDataIterator
Throws:
IOException - If an error occurs closing access to crawl data.

public String getSourceType()
A short, human-readable string describing the source this iterator uses.
getSourceType in class CrawlDataIterator
Copyright © 2014 National and University Library of Iceland. All Rights Reserved.