# Distribution definition

The distribution definition describes a single document, though
some documents may expand to multiple documents if they are
compressed with a utility such as WinZIP or 7ZIP.

* Title
* Document name - Commonly an absolute or relative file name.
  This value could also be a URL or network path.
* Issued date - When the document was made available. The date can include a time.
* Modified - When the document was created or modified. This is a date and time.
* Size of document - The document size in bytes
* Hash of document - The hash of the document, which can be blank, especially for large documents
* Hash algorithm

The following are optional in the distribution section.

* Identifier
* Description
* Download URL
* Version - Document version. The same document could be updated, or this might denote the next version
  of a regular report. For example, a daily extract will have the version number incremented
  every day and provide a new URL. The previous document can be retained.
* Format - If not specified, the consumer will in all likelihood use the document extension or MIME type
* Media Type
* Expiry Date - The date and time at which this document expires and can be removed from the download URL
  location. This is not the document retention period that might be required for archiving.
* Described By - A link to the metadata describing this document's data and format
* Compression - Type of compression used, if any
* Encryption - Type of encryption used, if any

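As a rough illustration, the elements listed above could be modelled as a single record. This is only a sketch: the class name `Distribution`, the attribute names, and the two validation rules are assumptions made for the example and are not taken from the Marti schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Distribution:
    """Hypothetical in-memory model of one distribution entry."""

    # Elements from the first list above.
    title: str
    document_name: str                    # file name, URL, or network path
    issued: datetime
    modified: datetime
    size: int                             # document size in bytes
    hash_value: Optional[str] = None      # may be blank, e.g. for large documents
    hash_algorithm: Optional[str] = None

    # Elements listed as optional above.
    identifier: Optional[str] = None
    description: Optional[str] = None
    download_url: Optional[str] = None
    version: Optional[str] = None
    format: Optional[str] = None
    media_type: Optional[str] = None
    expiry_date: Optional[datetime] = None
    described_by: Optional[str] = None
    compression: Optional[str] = None
    encryption: Optional[str] = None

    def __post_init__(self):
        # Assumption of this sketch: a size cannot be negative.
        if self.size < 0:
            raise ValueError("size must not be negative")
        # Assumption of this sketch: a hash is only useful with its algorithm.
        if self.hash_value and not self.hash_algorithm:
            raise ValueError("hash_algorithm is needed when a hash is given")
```

A daily extract might then be described as `Distribution(title="Daily extract", document_name="extract.csv", issued=..., modified=..., size=1024, version="42")`, with the hash left blank.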
## Compression

Documents can be compressed using a utility. A single compressed document can contain
multiple documents. The Marti definition document applies to the compressed document
and not to its contents, which could be multiple documents.

Inside a compressed document, each data document should be accompanied by a matching
Marti definition document. That is, the number of documents in a
compressed document should always be even.

Compression of documents always occurs before encryption.

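The even-count rule can be checked mechanically. The following is a minimal sketch for ZIP archives only, using Python's standard `zipfile` module; the function name `has_even_entry_count` is made up for this example. An even count is a necessary condition, not a sufficient one: how a definition document is matched to its data document by name is not specified here, so the sketch does not attempt it.

```python
import zipfile


def has_even_entry_count(archive):
    """Return True if a ZIP archive holds an even number of file entries.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Directory entries are not documents, so they are not counted.
        files = [info for info in zf.infolist() if not info.is_dir()]
    return len(files) % 2 == 0
```

A receiving pipeline could call this before unpacking and reject archives where a data document or its definition appears to be missing.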
### Marti definition for Compressed Document

For a compressed document that is not encrypted, the distribution definition will be:

* Title - The compressed document title, which could be a group name
* Document name - Commonly an absolute or relative file name.
  This value could also be a URL or network path.
* Issued date - When the compressed document was made available.
* Modified - When the compressed document was created or modified. This is a date and time
  and is not the modified date of any document inside the compressed document.
* Size of document - The compressed document size in bytes
* Hash of document - The hash of the compressed document, which can be
  blank, especially for large documents
* Hash algorithm

The reason for this approach is that it allows a generic tool to be deployed to
check the validity of the contents without unpacking the received / fetched
document. That is, you can perform load quality pipeline processing.

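A generic check of this kind might look like the sketch below. It assumes the hash is stored as a hexadecimal string and that the hash algorithm is given by a name Python's `hashlib` recognises (for example `sha256`); neither detail is stated in this document, and the function name `verify_document` is hypothetical.

```python
import hashlib
import os


def verify_document(path, expected_size, expected_hash=None, algorithm="sha256"):
    """Check a received file against the size and hash from its definition."""
    if os.path.getsize(path) != expected_size:
        return False
    if not expected_hash:
        # The definition allows a blank hash, so only the size is checked.
        return True
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large documents are not loaded into memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hash.lower()
```

The file is read as opaque bytes and never unpacked, so the same function works for any compression format.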
## Encryption

Encryption of content is always applied after compression, not before, unless
you are using the compression tool's native encryption. WinZIP and 7ZIP
provide encryption as part of the tool itself.

If the compression is TAR or GZIP then you may consider applying GPG
or another form of encryption to the compressed file.

* Title - The encrypted document title
* Document name - Commonly an absolute or relative file name.
  This value could also be a URL or network path.
* Issued date - When the **encrypted** document was made available.
* Modified - When the **encrypted** document was created or modified.
  This is a date and time and is not the modified date of the decrypted document.
* Size of document - The **decrypted** document size in bytes
* Hash of document - The hash of the **decrypted** document, which can be
  blank, especially for large documents
* Hash algorithm

The rationale for using the decrypted document attributes is that an encrypted
document is unlikely to be modified without knowledge of the encryption keys.
Checking the decrypted document attributes is a better check where appropriate.

The reason for this approach is that it allows a generic tool to be deployed to
decrypt and check the validity of the received / fetched document without
needing to understand the contents. That is, you can perform load quality
pipeline processing.
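The decrypt-then-check order described above can be sketched as follows. The actual decryption step depends on the tool in use (GPG, WinZIP, 7ZIP), so it is passed in as a caller-supplied function; `verify_encrypted` and its parameters are hypothetical names for this example, and the hash is again assumed to be a hexadecimal string.

```python
import hashlib


def verify_encrypted(encrypted, decrypt, expected_size, expected_hash=None,
                     algorithm="sha256"):
    """Decrypt bytes, then check the result against the definition.

    `decrypt` is any callable that maps encrypted bytes to decrypted bytes.
    Returns the decrypted bytes, or raises ValueError on a mismatch.
    """
    plain = decrypt(encrypted)
    # Size and hash in the definition describe the decrypted document.
    if len(plain) != expected_size:
        raise ValueError("decrypted size does not match the definition")
    if expected_hash:
        actual = hashlib.new(algorithm, plain).hexdigest()
        if actual != expected_hash.lower():
            raise ValueError("decrypted hash does not match the definition")
    return plain
```

Because compression happens before encryption, the bytes returned here would be the compressed document when both are used, ready to be unpacked in a later pipeline step.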