# Comparison of martiLQ document definition

Metadata definitions are not unique to martiLQ, and examples
exist in many different settings. Some are standard and open,
while others are closed.

Open standards include EXIF data for pictures, SQL DDL definitions
for databases, the XMP definition, and the HTTP response headers sent
before the web content.

The **martiLQ** document definition is intended to cover the situation
where data files are being transferred and reconciliation is required.

The **martiLQ** document definition is modelled on
the [CKAN API metadata](https://docs.ckan.org/en/2.9/api/index.html),
which has been adapted to include additional elements that are relevant when
you are exchanging data files. These include reconciliation elements
such as the number of records and the file hash.
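
To make the reconciliation elements concrete, the following is a minimal sketch of how a sender could compute a record count and file hash, and how a receiver could check a file against the declared values. The function names and dictionary keys (`records`, `hash`, `hash_algorithm`) are assumptions chosen for illustration and are not the actual martiLQ schema or API.

```python
import hashlib
from pathlib import Path


def reconciliation_elements(path, hash_algorithm="sha256"):
    """Return the record count and hash of a data file.

    The key names are hypothetical, not the martiLQ field names.
    """
    data = Path(path).read_bytes()
    digest = hashlib.new(hash_algorithm, data).hexdigest()
    # Count lines as records (a header row, if present, is counted too).
    records = data.count(b"\n")
    # A final line without a trailing newline still counts as a record.
    if data and not data.endswith(b"\n"):
        records += 1
    return {
        "records": records,
        "hash": digest,
        "hash_algorithm": hash_algorithm,
    }


def reconcile(path, declared):
    """Compare a received file against the values declared by the sender."""
    actual = reconciliation_elements(path, declared["hash_algorithm"])
    return (
        actual["records"] == declared["records"]
        and actual["hash"] == declared["hash"]
    )
```

The sender would publish the output of `reconciliation_elements` alongside the file, and the receiver would call `reconcile` after the transfer. A mismatch in either value indicates that the file was truncated or altered in transit.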
As the definition is based on the CKAN API, there are tools to import
a CKAN source into a **martiLQ** document definition and then process the data
through the pipeline as you would for any other data file that has a
**martiLQ** document definition.
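
As a rough illustration of such an import, the sketch below maps a CKAN package dictionary (the `result` object returned by the CKAN `package_show` action) onto a simplified definition. The input keys (`title`, `notes`, `resources`, `name`, `url`, `format`, `hash`) are standard CKAN fields, while the function name and the shape of the output are assumptions and do not reproduce the real martiLQ import tools.

```python
def ckan_package_to_definition(package):
    """Map a CKAN package dict to a simplified, hypothetical definition."""
    resources = []
    for res in package.get("resources", []):
        resources.append({
            "title": res.get("name"),
            "url": res.get("url"),
            "format": res.get("format"),
            # Assumption: the record count has no counterpart in the CKAN
            # resource, so it is left empty until the file is processed.
            "records": None,
            # CKAN resources carry an optional hash that is often blank.
            "hash": res.get("hash") or None,
        })
    return {
        "title": package.get("title"),
        "description": package.get("notes"),
        "resources": resources,
    }
```

In this sketch the reconciliation values that CKAN does not supply stay as `None`, which makes it explicit that they still have to be computed from the data file itself before the definition can be used for reconciliation.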
## Benefit of CKAN and martiLQ

CKAN is excellent at defining the data source details, but it lacks information
about load quality. If you have CKAN deployed in your organization and wish
to exchange or process the data referenced in CKAN, then there are synergies between
CKAN and martiLQ.

Samples of CKAN integration exist.
## Magda and martiLQ

Another source of data is [Magda](https://magda.io/), which has API metadata
definitions. Magda is more about data federation and as such provides
functionality for finding data sources and describing their contents.

The Magda software is able to generate APIs and data content. This does not
address the needs of a data processing pipeline when reconciliation is required.

If you have Magda data sources, then synergies exist between Magda and martiLQ.