public class CrawlDataItem extends Object
Modifier and Type | Field and Description |
---|---|
protected String | contentDigest |
static String | dateFormat - The proper format for timestamps passed to setTimestamp(String) and returned by getTimestamp(). |
protected boolean | duplicate |
protected String | etag |
protected String | mimetype |
protected String | origin |
protected long | size |
protected String | statusCode |
protected String | timestamp |
protected String | URL |
Constructor and Description |
---|
CrawlDataItem() - Constructor. |
CrawlDataItem(String URL, String contentDigest, String timestamp, String etag, String mimetype, String origin, boolean duplicate, long size) - Constructor. |
Modifier and Type | Method and Description |
---|---|
String | getContentDigest() - Returns the document's content digest. |
String | getEtag() - Returns the etag that was associated with the document. |
String | getMimeType() - Returns the mimetype that was associated with the document. |
String | getOrigin() - Returns the "origin" that was associated with the document. |
long | getSize() - Get the size of the CrawlDataItem. |
String | getTimestamp() - Returns a timestamp for when the URL was fetched in the format: yyyyMMddHHmmssSSS |
String | getURL() - Returns the URL. |
boolean | isDuplicate() - Returns whether the CrawlDataItem was marked as duplicate. |
void | setContentDigest(String contentDigest) - Set the content digest. |
void | setDuplicate(boolean duplicate) - Set whether duplicate or not. |
void | setEtag(String etag) - Set a new etag. |
void | setMimeType(String mimetype) - Set new MIME type. |
void | setOrigin(String origin) - Set new origin. |
void | setSize(long size) - Set the size of the CrawlDataItem. |
void | setTimestamp(String timestamp) - Set a new timestamp. |
void | setURL(String URL) - Set the URL. |
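The accessors summarised above can be exercised roughly as follows. This is only a sketch: the nested `CrawlDataItem` here is a stand-in that mirrors a few of the documented signatures so the example compiles on its own; it is not the library class, and `describe` is a hypothetical helper. It illustrates two documented edge cases: `getEtag()` may return null, and a size of -1 means the size is indeterminate.

```java
public class CrawlDataItemUsageSketch {

    /** Stand-in mirroring a subset of the documented accessors; NOT the library class. */
    static class CrawlDataItem {
        protected String URL;
        protected String etag;
        protected boolean duplicate;
        protected long size;

        public String getURL() { return URL; }
        public void setURL(String URL) { this.URL = URL; }
        public String getEtag() { return etag; }
        public void setEtag(String etag) { this.etag = etag; }
        public boolean isDuplicate() { return duplicate; }
        public void setDuplicate(boolean duplicate) { this.duplicate = duplicate; }
        public long getSize() { return size; }
        public void setSize(long size) { this.size = size; }
    }

    /** Hypothetical helper: one-line summary that tolerates a null etag and a size of -1. */
    static String describe(CrawlDataItem item) {
        String etag = item.getEtag() == null ? "(none)" : item.getEtag();
        String size = item.getSize() < 0 ? "unknown" : Long.toString(item.getSize());
        String dup = item.isDuplicate() ? "duplicate" : "original";
        return item.getURL() + " [etag=" + etag + ", size=" + size + ", " + dup + "]";
    }

    public static void main(String[] args) {
        CrawlDataItem item = new CrawlDataItem();
        item.setURL("http://example.org/");
        item.setSize(-1);          // -1: size is indeterminate
        item.setDuplicate(true);   // etag left unset, so getEtag() returns null
        System.out.println(describe(item));
    }
}
```

Callers of the real class would presumably need the same null and -1 checks before using the etag or the size.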
public static final String dateFormat
The proper format for timestamps passed to setTimestamp(String) and returned by getTimestamp().
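The timestamp format documented for getTimestamp(), yyyyMMddHHmmssSSS, reads as a `java.text.SimpleDateFormat` pattern. A minimal sketch of producing and parsing such strings follows. The class name `TimestampFormatSketch`, the choice of UTC, and `Locale.ROOT` are assumptions made for the example; this page does not say which time zone the library uses.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TimestampFormatSketch {

    // Pattern taken from the documented timestamp format.
    static final String PATTERN = "yyyyMMddHHmmssSSS";

    // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
    static SimpleDateFormat newFormat() {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: timestamps are in UTC
        f.setLenient(false);                        // reject out-of-range fields
        return f;
    }

    /** Formats a Date as a 17-digit timestamp string. */
    static String format(Date date) {
        return newFormat().format(date);
    }

    /** Parses a timestamp string; malformed input is reported as an unchecked exception. */
    static Date parse(String timestamp) {
        try {
            return newFormat().parse(timestamp);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a " + PATTERN + " timestamp: " + timestamp, e);
        }
    }

    public static void main(String[] args) {
        String ts = format(new Date(0L));
        System.out.println(ts); // prints 19700101000000000
        System.out.println(parse(ts).getTime());
    }
}
```

A string produced this way could then be handed to setTimestamp(String), assuming the consumer expects the same time zone.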
protected String URL
protected String statusCode
protected String contentDigest
protected String timestamp
protected String etag
protected String mimetype
protected String origin
protected boolean duplicate
protected long size
public CrawlDataItem()
public CrawlDataItem(String URL, String contentDigest, String timestamp, String etag, String mimetype, String origin, boolean duplicate, long size)
URL - The URL for this CrawlDataItem
contentDigest - A content digest of the document found at the URL
timestamp - Date of when the content digest was valid for that URL. Format: yyyyMMddHHmmssSSS
etag - Etag for the URL
mimetype - MIME type of the document found at the URL
origin - The origin of the CrawlDataItem (the exact meaning of the origin is outside the scope of this class and it may be any String value)
duplicate - True if this CrawlDataItem was marked as duplicate
public String getURL()
public void setURL(String URL)
URL - The new URL
public String getContentDigest()
public void setContentDigest(String contentDigest)
contentDigest - The new value of the content digest
public String getTimestamp()
public void setTimestamp(String timestamp)
timestamp - The new timestamp. It should be in the format: yyyyMMddHHmmssSSS
public String getEtag()
If the etag is unavailable, null will be returned.
public void setEtag(String etag)
etag - The new etag
public String getMimeType()
public void setMimeType(String mimetype)
mimetype - The new MIME type
public String getOrigin()
public void setOrigin(String origin)
origin - A new origin.
public boolean isDuplicate()
public void setDuplicate(boolean duplicate)
duplicate - true if duplicate, false otherwise
public long getSize()
public void setSize(long size)
size - The size, or -1 if the size is indeterminate
Copyright © 2014 National and University Library of Iceland. All Rights Reserved.