More descriptions and refactor names

Adding Acknowledge
draft_specifications
meerkat 2021-12-11 17:27:37 +11:00
parent 6ef1692bfb
commit d5d4df7874
35 changed files with 557 additions and 94 deletions

3
.gitignore vendored

@ -1,4 +1,6 @@
logs/
temp/
target/
pom.xml.tag
pom.xml.releaseBackup
@ -11,6 +13,7 @@ buildNumber.properties
# https://github.com/takari/maven-wrapper#usage-without-binary-jar
.mvn/wrapper/maven-wrapper.jar
test/data
test/powershell/results/
test/golang/results/
test/python/results/


@ -0,0 +1,58 @@
# Acknowledgment
Once the **martiLQ** document is received by a consumer then communicating the receipt, processing,
success or failure completes the feedback loop and builds an extra layer of assurance for the organisation.
The acknowledgement workflow provides the necessary feedback. If an acknowledgement is required as part of the
consumption design, then the following approach is recommended.
1. The publisher provides callback details. For extra security the callback details should be signed.
2. The consumer will acknowledge the receipt of the **martiLQ** document by sending back the same
document to the publisher with some values changed.
3. Change the root ``state`` (not the resource ``state``) from ``active`` to ``receipt``.
4. Change the ``consumer`` value so that it contains only your identifier and no others. This lets the publisher
identify the consumer and associate it with success or failure. The changed consumer value
applies to all subsequent acknowledgement messages.
5. Send the changed **martiLQ** document back using the callback details.
6. On fetching each resource the resource state is changed from ``active`` to ``received``. If any resource
cannot be retrieved the state is changed from ``active`` to ``missing``.
7. The consumer can elect to send the **martiLQ** document back to the publisher after each fetch or at the completion
of all fetches. The recommendation is to send it at the end of all fetches because, if there are issues,
having all the failures together for analysis helps determine the extent of the failure.
8. Once all resources are fetched (or failed), the root state is changed from ``receipt`` to ``received`` if no
errors occurred in retrieving the resources. If one or more errors occurred, then the root state is
changed from ``receipt`` to ``missing``. The updated document is sent back to the publisher using
the callback details.
9. The next stage is to validate and process the resources defined in the **martiLQ** document. This follows
a similar process to fetching the resources.
10. On processing each resource the resource state is changed from ``received`` to ``processed``. If any resource
cannot be processed the state is changed from ``received`` to ``error``. Once again this can be acknowledged
back to the publisher.
11. Once all resources are processed (or failed), the root state is changed from ``received`` to ``processed`` if no
errors occurred in processing the resources. If one or more errors occurred, then the root state is
changed from ``received`` to ``error``. The updated document is sent back to the publisher using
the callback details.
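The fetch stage of the workflow (steps 3, 4, 6 and 8) can be sketched in Python as below. This is a minimal illustration, not part of the martiLQ clients. It assumes the document has already been loaded into a dict (for example with `json.load`) and that ``consumer`` holds a list of identifiers. It also assumes `fetch` is a hypothetical caller-supplied function that returns `True` when a resource was retrieved. Sending the document back through the callback (steps 5 and 8) is omitted. The processing stage (steps 10 and 11) has the same shape, using ``received``, ``processed`` and ``error``.

```python
import datetime


def _now():
    # ISO-8601 timestamp with the local offset, like the stateModified samples
    return datetime.datetime.now().astimezone().isoformat()


def acknowledge_fetch(doc, consumer_id, fetch):
    """Apply the fetch-stage state changes to a martiLQ document held as a dict.

    fetch is a hypothetical callable: fetch(resource) -> bool.
    Identifiers such as uid are never modified.
    """
    # Steps 3-4: root state active -> receipt, consumer reduced to our identifier
    doc["consumer"] = [consumer_id]
    doc["state"] = "receipt"
    doc["stateModified"] = _now()
    # Step 5 (not shown): send doc back using the callback details

    # Step 6: each resource goes active -> received, or active -> missing on failure
    for resource in doc["resources"]:
        ok = fetch(resource)
        resource["state"] = "received" if ok else "missing"
        resource["stateModified"] = _now()

    # Step 8: root goes receipt -> received only if every resource was retrieved
    all_ok = all(r["state"] == "received" for r in doc["resources"])
    doc["state"] = "received" if all_ok else "missing"
    doc["stateModified"] = _now()
    return doc


if __name__ == "__main__":
    doc = {
        "uid": "example-uid",
        "state": "active",
        "consumer": ["consumer-a", "consumer-b"],
        "resources": [
            {"uid": "r1", "documentName": "a.csv", "state": "active"},
            {"uid": "r2", "documentName": "b.csv", "state": "active"},
        ],
    }
    # Pretend that only a.csv could be fetched
    acknowledge_fetch(doc, "consumer-a", lambda r: r["documentName"] == "a.csv")
    print(doc["state"], [r["state"] for r in doc["resources"]])
    # prints: missing ['received', 'missing']
```

Because the resource states are recorded before the root state is derived from them, the single document sent back in step 8 carries every failure. This supports the recommendation in step 7.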
This completes the acknowledgment workflow for the **martiLQ** document. As a consumer, you decide the level of
acknowledgement feedback to implement. Any publisher providing callback details for acknowledgement can also
choose how to act on, and record, any acknowledgments received.
In the above acknowledgement process, you **must not** change the identifiers in the **martiLQ** document and you **should not**
change any data other than ``consumer``, ``state`` and ``stateModified``.
If you are the publisher and expect acknowledgment, then there is an extra scenario you need to cater for: you
do not receive any acknowledgement from the expected consumer(s) within the agreed timeframe. In this situation
the publisher needs to know each consumer and its service level agreement.
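A publisher could detect this scenario with a periodic check like the Python sketch below. It is an illustration under stated assumptions, not a martiLQ client function. The per-consumer SLA mapping is hypothetical. Any acknowledgement received (``receipt`` or later) is treated as meeting the agreement.

```python
import datetime


def overdue_consumers(issued, slas, acknowledged, now):
    """Return the consumers with no acknowledgement within their agreed timeframe.

    issued       -- datetime the martiLQ document was published
    slas         -- hypothetical mapping of consumer id -> allowed timedelta
    acknowledged -- set of consumer ids that have sent any acknowledgement
    now          -- current datetime, passed in so the check is testable
    """
    return sorted(
        consumer
        for consumer, allowed in slas.items()
        if consumer not in acknowledged and now > issued + allowed
    )


if __name__ == "__main__":
    issued = datetime.datetime(2021, 12, 11, 9, 0)
    slas = {
        "consumer-a": datetime.timedelta(hours=2),
        "consumer-b": datetime.timedelta(hours=24),
    }
    now = datetime.datetime(2021, 12, 11, 12, 0)
    # consumer-a's two-hour window has passed, consumer-b still has time
    print(overdue_consumers(issued, slas, acknowledged=set(), now=now))
    # prints: ['consumer-a']
```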
## Compressed file handling
When the **martiLQ** document describes a parent compressed file, e.g. ZIP or 7Z, the resources are expected
to be inside the compressed file. These resources can still be checked for existence and for whether they can be
extracted. The state of each resource is still changed to reflect the processing.
If a file cannot be extracted, either because it was not included or because of a decompression error, then the
same state-based acknowledgement process is used.
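For a ZIP parent, the existence and extraction checks can be sketched with Python's standard `zipfile` module. This is an assumption-laden illustration, not a martiLQ client function. Each resource is assumed to carry its member name in ``documentName``. The state mapping is also an assumption: ``missing`` for a member that is absent or an archive that cannot be opened, and ``error`` for a member that fails to decompress. 7Z archives would need a third-party library and are not covered.

```python
import zipfile
import zlib


def check_zip_resources(zip_path, resources):
    """Set each resource state from whether it exists and extracts from the ZIP.

    resources -- list of dicts with at least "documentName" and "state".
    """
    try:
        with zipfile.ZipFile(zip_path) as archive:
            names = set(archive.namelist())
            for resource in resources:
                name = resource["documentName"]
                if name not in names:
                    resource["state"] = "missing"  # not included in the archive
                    continue
                try:
                    # Stream the member to EOF; zipfile verifies the CRC at the end
                    with archive.open(name) as member:
                        while member.read(1 << 16):
                            pass
                    resource["state"] = "received"
                except (zipfile.BadZipFile, zlib.error, NotImplementedError,
                        RuntimeError, OSError):
                    resource["state"] = "error"  # decompression error
    except (zipfile.BadZipFile, OSError):
        # The parent archive cannot be opened, so nothing can be extracted
        for resource in resources:
            resource["state"] = "missing"
    return resources
```

Streaming each member rather than calling `archive.read(name)` keeps memory use flat for large data files while still triggering zipfile's CRC check.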
## Error situations


@ -117,6 +117,7 @@ sample can be generated using the GOLANG client program with parameters:
"tags": null,
"license": "",
"state": "active",
"stateModified": "2021-11-02T22:44:29.6887001+11:00",
"batch": 1.001,
"describedBy": "",
"landingPage": "",
@ -130,6 +131,7 @@ sample can be generated using the GOLANG client program with parameters:
"modified": "2021-11-02T07:47:13.9410018+11:00",
"expires": "2023-11-02T00:00:00+11:00",
"state": "active",
"stateModified": "2021-11-02T22:44:29.6881663+11:00",
"author": "",
"length": 3654,
"hash": {


@ -7,7 +7,7 @@ contactPoint = Your friendly Meerkat
accessLevel = Confidential
rights = Public
license = MIT
batch = @./config/batch.no
batch = @./conf/batch.no
theme = Documentation


@ -7,7 +7,7 @@ contactPoint = Your friendly Meerkat
accessLevel = Confidential
rights = Public
license = MIT
batch = @./config/batch.no
batch = @./conf/batch.no
theme = Documentation


@ -13,7 +13,7 @@ contactPoint = Your friendly Meerkat
accessLevel = Confidential
rights = Public
license = MIT
batch = @./config/batch.no
batch = @./conf/batch.no
theme = Documentation


@ -23,7 +23,7 @@ contactPoint = Your friendly Meerkat
accessLevel = Confidential
rights = Public
license = MIT
batch = @./config/batch.no
batch = @./conf/batch.no
theme = Documentation


@ -1,3 +1,4 @@
BSBDirectoryNov21-308.csv
BSBDirectoryOct21-307.csv
BSBDirectorySep21-306.csv
BSBDirectoryAug21-305.csv


@ -1,3 +1,4 @@
BSBDirectoryNov21-308.csv
BSBDirectoryOct21-307.csv
BSBDirectorySep21-306.csv
BSBDirectoryAug21-305.csv

162
pattern.md 100644

@ -0,0 +1,162 @@
# Design pattern
## Abstract
**MartiLQ** defines a software pattern (document) for describing data files or documents that are generated from a source
and intended to be consumed by another system component, providing self-describing information with
load assurance metrics.
The consuming system component can be at the same location or a different geographical location, and in
the same organisation or another organisation.
The pattern does not define the format that the data file or document must take or how the data is transferred
or accessed. You choose the data format and transfer method. Once you have made the choice, you can describe
it in the **martiLQ** document.
Describing the format and transfer can be tooled so that the mundane activity is automated and only
the specific nuance or additional assurance aspects need your attention. Sample scripts are provided
to demonstrate generating the **martiLQ** document.
## Name
**martiLQ** documentation standard
## Problem statement
Even though event streaming is a strategic goal for many organisations, legacy processes remain and there
will continue to be a need to transfer data files and other documents from one system to another.
When a handover of a data file or document occurs, the best practice is to include metrics with the transfer
to assure the recipient of provenance and quality of the data file or document. This is the metadata associated
with the data file or document.
A document includes unstructured data, letters, pictures and binary objects, while data files can be thought of
as structured data that describes multiple records.
### Assurance Problem
**How does the recipient know they have received all related files, and have assurance of their provenance,
immutability and quality?**
Many organisations have used the file name as the carrier of this information but this has limits.
### Efficiency Problem
**How can the assurance be described so that it can be tooled and not rely on manual documentation
and custom tooling?**
With the drive to DevSecOps or DataOps, any pattern that is self-describing, or at least mostly
self-describing, will improve the documentation and quality of the process. This boosts the efficiency
of building, testing and maintaining the transfer of data files and documents.
Therefore the objective is to produce a documentation standard that:
1. provides load assurance when transferring data files and documents
2. can be tooled and therefore achieve some level of automation
3. is extensible to give the publisher and consumer control as to the level of assurance
required to match the risk appetite of the organisation
## Context
This pattern is intended to be applied when assurance is required on transferred data files and documents.
The data files or documents are commonly packaged together, and the pattern is not intended for
real-time event processing or single-record processing. The pattern can be used on a
single data file or document if each is considered independent and standalone.
Packaging the related data files and documents is part of data integrity, especially if
referential integrity for foreign keys is required or the documents all relate to
the same case, such as in a workflow.
The assurance includes the following scope and this can be extended to meet changing conditions or
changes in threats or risk.
### Assurance scope
The individual items below are not mandatory, but are provided as part of the standard definition
because they are considered the minimum for best practice:
* Provenance
* Immutable
* Data period or timeline
* Sequence or batch
* Status and expiry date
* Link to data file or document
* Format, encoding, compression
* Data record count
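Two of these items, immutability and the link to the data file, lend themselves to automated checking. The Python sketch below compares a received file with its resource entry. It assumes that ``length`` is the size in bytes, as in the samples. It also assumes that ``hash`` is a dict shaped like `{"algo": "SHA256", "value": "<hex digest>"}`; those field names are an assumption, not confirmed by the specification. The function name `verify_resource` is hypothetical.

```python
import hashlib
import os


def verify_resource(path, resource):
    """Return a list of assurance failures ("length", "hash") for one file.

    Assumes resource["hash"] looks like {"algo": "SHA256", "value": "<hex>"}.
    """
    failures = []
    if os.path.getsize(path) != resource["length"]:
        failures.append("length")

    expected = resource.get("hash") or {}
    # hashlib.new accepts names such as "sha256"; default to SHA256 if unset
    digest = hashlib.new((expected.get("algo") or "SHA256").lower())
    with open(path, "rb") as f:
        # Hash in chunks so large data files are not read into memory at once
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    if digest.hexdigest() != (expected.get("value") or "").lower():
        failures.append("hash")
    return failures
```

An empty list means the file matches its description. Any entry gives the consumer a concrete reason to set the resource state to a failure value during acknowledgement.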
A recommended acknowledgment process is available for confirming processing. See
[acknowledgment](docs/source/acknowledgement.md) for approach details.
## Forces
The qualities that this pattern addresses are described below.
The file transfer pattern is the original method for separate processes to exchange data. The file was stored on magnetic tape and either
loaded back onto the same compute resource (think mainframe) or physically couriered to another location or tape drive. The
reference book [Enterprise Integration Patterns](https://www.enterpriseintegrationpatterns.com/patterns/messaging/FileTransferIntegration.html)
by Hohpe and Woolf recognises this by including the pattern, written by Martin Fowler.
This pattern addresses the issues and concerns that relate to file transfer. Many of these are related to the common
non-functional requirements that architects cover in solution designs.
### Security, robustness, reliability, fault-tolerance
The pattern defines how security and assurance are applied to the data files and documents. The pattern does
not define how to set up a reliable infrastructure, but it can be used to detect failures
in the infrastructure. The fault-tolerance allowance is up to each implementation.
Fault tolerance, and the action taken when it is exceeded, can be dialled from 0% to 100% on a
case-by-case basis.
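As an illustration of dialling the tolerance, a consumer could decide whether to act on a batch by comparing the share of failed resources against a threshold. In this Python sketch the function name, the ratio threshold and the choice of ``missing`` and ``error`` as failure states are all assumptions. The pattern itself leaves the policy to each implementation.

```python
# Resource states treated as failures (an assumption drawn from the acknowledgement states)
FAILED_STATES = {"missing", "error"}


def within_tolerance(resources, tolerance):
    """Return True if the share of failed resources does not exceed tolerance.

    tolerance -- 0.0 accepts no failures (0%), 1.0 accepts any failures (100%).
    """
    if not 0.0 <= tolerance <= 1.0:
        raise ValueError("tolerance must be between 0.0 and 1.0")
    if not resources:
        return True  # nothing to fail
    failed = sum(1 for r in resources if r.get("state") in FAILED_STATES)
    return failed / len(resources) <= tolerance
```

For example, with one ``missing`` resource out of four, a tolerance of `0.0` rejects the batch while `0.25` accepts it.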
### Manageability
### Efficiency, performance, throughput, bandwidth requirements, space utilization
### Scalability (incremental growth on-demand)
The scalability of the pattern is not bound by the size of the data files themselves. The pattern can
scale to include thousands of data files or documents, though the practicality of processing
may be a factor in deciding to break the load into smaller volumes.
### Extensibility, evolvability, maintainability
The **martiLQ** document can be customised and can evolve as market conditions change. Versioning
is built into the definition, and consumers can select which attributes are mandatory for
processing.
### Modularity, independence, re-usability, openness, composability (plug-and-play), portability
### Completeness and correctness
### Ease-of-construction
### Ease-of-use
## Solution
A description, using text and/or graphics, of how to achieve the intended goals and objectives. The description should identify both the solution's static structure and its dynamic behavior - the people and computing actors, and their collaborations. The description may include guidelines for implementing the solution. Variants or specializations of the solution may also be described.
## Resulting Context
The post-conditions after the pattern has been applied. Implementing the solution normally requires trade-offs among competing forces.
This element describes which forces have been resolved and how, and which remain unresolved. It may also indicate other patterns that may be applicable in the new context. (A pattern may be one step in accomplishing some larger goal.) Any such other patterns will be described in detail under Related Patterns.
## Examples
Please refer to the [documentation](docs/source/README.md) and [samples](docs/source/samples/README.md)
## Rationale
An explanation/justification of the pattern as a whole, or of individual components within it, indicating how the pattern actually works, and why - how it resolves the forces to achieve the desired goals and objectives, and why this is "good". The Solution element of a pattern describes the external structure and behavior of the solution: the Rationale provides insight into its internal workings.
## Related Patterns
The relationships between this pattern and others. These may be predecessor patterns, whose resulting contexts correspond to the initial context of this one; or successor patterns, whose initial contexts correspond to the resulting context of this one; or alternative patterns, which describe a different solution to the same problem, but under different forces; or co-dependent patterns, which may/must be applied along with this pattern.
## Known Uses
Known applications of the pattern within existing systems, verifying that the pattern does indeed describe a proven solution to a recurring problem. Known Uses can also serve as Examples.


@ -126,9 +126,9 @@ func findIni() string {
}
if foundPath == "" {
_, err := os.Stat("./config/"+ cIniFileName)
_, err := os.Stat("./conf/"+ cIniFileName)
if err == nil {
foundPath = "./config/"+ cIniFileName
foundPath = "./conf/"+ cIniFileName
}
}


@ -46,7 +46,7 @@ func TestMartiLQ_DirectoryA(t *testing.T) {
SourcePath := currentDirectory
Recursive := false
DefPath := "../test/test_martilq_directoryA.json"
ProcessFilePath("", SourcePath, "", Recursive, DefPath, "")
Make("", SourcePath, "", Recursive, DefPath, "")
}
@ -56,6 +56,6 @@ func TestMartiLQ_DirectoryB(t *testing.T) {
SourcePath := currentDirectory
Recursive := false
DefPath := "../test/test_martilq_directoryB.json"
ProcessFilePath("../config/martilq.ini", SourcePath, "", Recursive, DefPath, "")
Make("../conf/martilq.ini", SourcePath, "", Recursive, DefPath, "")
}


@ -19,7 +19,7 @@ import (
func main() {
bind := flag.String("bind", ":8080", "Bind Http listen to address and port, e.g. localhost:8080 or justy simply :8080")
bind := flag.String("bind", ":8080", "Bind HTTP listen to address and port, e.g. localhost:8080 or simply :8080")
staticDirectory := flag.String("static", "static", "Static directory content")
docsDirectory := flag.String("docs", "", "Document directory content")
dataDirectory := flag.String("data", "", "Data directory content")


@ -24,6 +24,13 @@ function New-MartiDefinition
url = ""
}
$oAcknowledgement = [PSCustomObject]@{
url = ""
algo = ""
value = ""
signed = $false
}
if ($null -eq $ConfigPath -or $ConfigPath -eq "") {
$oConfig = Get-Configuration
@ -42,31 +49,66 @@ function New-MartiDefinition
$lcustom += $oTemplate
[System.Collections.ArrayList]$lresource = @()
[System.Collections.ArrayList]$lconsumer = @()
$today = Get-Date
$dateToday = $today.Tostring($oConfig.dateTimeFormat)
$expires = Set-DefaultExpiryDate -DocumentDate (Get-Date) -Configuration $oConfig
$batch = $oConfig.batch
if ($batch -ne "") {
if ($batch[0] -eq "@") {
if (!(Test-Path -Path $batch.Substring(1))) {
}
# // See if we can locate it in Config INI directory
# _, fileb := filepath.Split(m.config.batch[1:])
# dirc, _ := filepath.Split(ConfigPath)
# _, err := os.Stat(dirc + fileb)
# if err == nil {
# m.config.batch = "@" + dirc + fileb
# }
if (Test-Path -Path $batch -PathType Leaf) {
#readFile, err := os.Open(m.config.batch[1:])
#reader := bufio.NewReader(readFile)
#m.Batch, _ = strconv.ParseFloat(line, 10)
#readFile.Close()
} else {
Write-Log ("Batch file '$($oConfig.batch)' does not exist")
}
} else {
$batch = 1
#m.Batch, _ = strconv.ParseFloat(m.config.batch, 10)
}
}
$oMarti = [PSCustomObject]@{
contentType = "application/vnd.martilq.json"
title = ""
uid = (New-Guid).ToString()
description = ""
issued = Get-Date -f $oConfig.dateTimeFormat
modified = Get-Date -f $oConfig.dateTimeFormat
issued = $dateToday
modified = $dateToday
expires = $expires.Tostring($oConfig.dateTimeFormat)
tags = $oConfig.tags
publisher = $publisher
contactPoint = $oConfig.contactPoint
accessLevel = $oConfig.accessLevel
consumer = $lconsumer
rights = $oConfig.rights
license = $oConfig.license
state = $oConfig.state
batch = $oConfig.batch
stateModified = $dateToday
batch = $batch
describedBy = $oConfig.describedBy
landingPage = $oConfig.landingPage
theme =$oConfig.theme
resources = $lresource
acknowledge = $oAcknowledgement
custom = $lCustom
}
@ -208,10 +250,18 @@ function Get-MartiResource {
function ConvertFrom-Ckan
{
Param(
[Parameter(Mandatory)][String] $InputObject
[Parameter(Mandatory)][String] $InputObject,
[Parameter(Mandatory=$false)][switch] $FetchResource,
[Parameter(Mandatory=$false)][String] $DataPath
)
$oCkan = ConvertFrom-Json -InputObject $InputObject
if ($InputObject.StartsWith("https://") -or $InputObject.StartsWith("http://") -or $InputObject.StartsWith("ftp://")) {
$JsonFileName = Invoke-WebRequest $InputObject
} else {
$JsonFileName = $InputObject
}
$oCkan = ConvertFrom-Json -InputObject $JsonFileName
$oMarti, $oConfig = New-MartiDefinition
@ -235,7 +285,27 @@ Param(
$name = ""
}
$size = $_.size
if ($FetchResource -and $_.url -ne "") {
$localResource = New-LocalTempFile -UrlPath $_.url -Configuration $null -TempPath $DataPath
if (Test-Path -Path $localResource -PathType Leaf) {
Remove-Item -Path $localResource
}
Invoke-WebRequest -Uri $_.url -OutFile $localResource
if ($_.hash -eq "") {
$hash = New-MartiHash -Algorithm "SHA256" -FilePath $localResource -Value $null
} else {
$hash = New-MartiHash -Algorithm "SHA256" -FilePath "" -Value $_.hash
}
if ($size -le 1) {
$size = (Get-Item $localResource).length
}
} else {
$hash = New-MartiHash -Algorithm "SHA256" -FilePath "" -Value $_.hash
}
$expires = (Get-Date).AddYears(7)
$oResource = [PSCustomObject]@{
@ -247,7 +317,7 @@ Param(
expires = $expires.Tostring("yyyy-MM-ddTHH:mm:ss")
state = $_.state
author = $oCkan.result.author
length = $_.size
length = $size
hash = $hash
description = $_.description


@ -26,7 +26,7 @@ function Get-DefaultConfiguration {
dateFormat = "yyyy-MM-dd"
dateTimeFormat = "yyyy-MM-ddTHH:mm:ss"
dataPath = ""
tempPath = ""
tempPath = "temp"
tags = @( "default", "martiLQ")
publisher = ""
@ -61,9 +61,6 @@ function Get-DefaultConfiguration {
loaded = $false
}
#self._Log = mLogging()
#self._Log.SetConfig(self._oConfiguration["logPath"], self.GetSoftwareName())
return $oConfiguration
}
@ -77,14 +74,25 @@ function Import-Configuration {
if ($null -eq $ConfigPath -or $ConfigPath -eq "") {
if (Test-Path "martilq.ini") {
$envPath = $env:MARTILQ_MARTILQ_INI
if (-not [string]::IsNullOrEmpty($envPath) -and (Test-Path -Path $envPath -PathType Leaf)) {
$ConfigPath = $envPath
} else { if (Test-Path "martilq.ini") {
$ConfigPath = "martilq.ini"
} else {
if (Test-Path -Path "conf/martilq.ini" -PathType Leaf) {
$ConfigPath = "conf/martilq.ini"
} else { if (Test-Path -Path ".martilq/martilq.ini" -PathType Leaf) {
$ConfigPath = ".martilq/martilq.ini"
} else {
$homeDir = $env:USERPROFILE
if (Test-Path (Join-Path -Path $homeDir -ChildPath ".martilq/martilq.ini")) {
if (Test-Path -Path (Join-Path -Path $homeDir -ChildPath ".martilq/martilq.ini") -PathType Leaf) {
$ConfigPath = Join-Path -Path $homeDir -ChildPath ".martilq/martilq.ini"
}
}
}
}
}
if ($null -ne $ConfigPath -and $ConfigPath -ne "") {
Write-Log -LogEntry "Using configuration path '$ConfigPath'"
$iConfig = Get-IniFile -Path $ConfigPath


@ -81,15 +81,17 @@ Param(
$lattribute = Set-MartiResourceAttributes -Path $item.FullName -FileType $item.Extension.Substring(1) -ExtendedAttributes:$ExtendAttributes
$expires = Set-DefaultExpiryDate -DocumentDate $item.LastWriteTime -Configuration $Configuration
$version = $Configuration.version
$dateToday = Get-Date -f $Configuration.dateTimeFormat
$oResource = [PSCustomObject]@{
title = Set-DefaultTitle -Document $item.Name -Configuration $Configuration
uid = (New-Guid).ToString()
documentName = $item.Name
issuedDate = Get-Date -f $Configuration.dateTimeFormat
issuedDate = $dateToday
modified = $item.LastWriteTime.ToString($Configuration.dateTimeFormat)
expires = $expires.ToString($Configuration.dateTimeFormat)
state = $Configuration.state
stateModified = $dateToday
author = $Configuration.author
length = $item.Length
hash = $hash


@ -59,3 +59,36 @@ function Close-Log {
Write-Log "* End of processing: [$dateTime]"
Write-Log "***********************************************************************************"
}
function New-LocalTempFile{
Param (
[Parameter(Mandatory)][String] $UrlPath,
$Configuration,
$TempPath
)
# Create temporary file on disk for cases
# where file size, hashing and encryption are required
# This is useful for (1) CKAN file fetch
$parts = $UrlPath.split("/")
$doc_name = $parts[$parts.Length-1]
# Use the supplied configuration, loading the default one only when none is given
$oConfig = $Configuration
if ($null -eq $oConfig){
$oConfig = Get-Configuration
}
# An unsupplied [String] parameter (e.g. -DataPath) arrives as "" rather than $null
if (-not [string]::IsNullOrEmpty($TempPath)){
$temp_dir = $TempPath
}
else {
$temp_dir = $oConfig.tempPath
}
if (!(Test-Path -Path $temp_dir)) {
# Discard New-Item output so that only the path is returned from the function
$null = New-Item -Path $temp_dir -ItemType Directory
Write-Log("Created temp folder : $temp_dir")
}
return Join-Path -Path $temp_dir -ChildPath $doc_name
}


@ -40,6 +40,13 @@ class martiLQ:
"url": ""
}
_oAcknowledgement = {
"url": "",
"algo": "",
"value": "",
"signed": False
}
_MartiErrorId = ""
_oMartiDefinition = None
@ -103,6 +110,7 @@ class martiLQ:
today = datetime.datetime.today()
dateToday = today.strftime("%Y-%m-%dT%H:%M:%S")
expires = self._oConfiguration.ExpireDate(None)
publisher = self._oConfiguration.GetConfig("publisher")
if publisher == "":
@ -125,6 +133,7 @@ class martiLQ:
"description": "",
"issued": dateToday,
"modified": dateToday,
"expires": expires.strftime("%Y-%m-%dT%H:%M:%S"),
"publisher": publisher,
"contactPoint": self._oConfiguration.GetConfig("contactPoint"),
"accessLevel": self._oConfiguration.GetConfig("accessLevel"),
@ -132,12 +141,14 @@ class martiLQ:
"tags": self._oConfiguration.GetConfig("tags"),
"license": self._oConfiguration.GetConfig("license"),
"state": self._oConfiguration.GetConfig("state"),
"stateModified": dateToday,
"batch": self._oConfiguration.GetConfig("batch"),
"describedBy": self._oConfiguration.GetConfig("describedBy"),
"landingPage": self._oConfiguration.GetConfig("landingPage"),
"theme": self._oConfiguration.GetConfig("theme"),
"resources": lresource,
"acknowledge": self._oAcknowledgement,
"custom": lcustom
}
@ -386,28 +397,32 @@ class martiLQ:
return oTestResults, testError
def ConvertFromCkan(CkanPath=None, PackageUrl=None, FetchResource=False):
def ConvertFromCkan(InputObject=None, FetchResource=False, DataPath=None):
if CkanPath is None or CkanPath == "":
if PackageUrl is None and PackageUrl == "":
raise Exception("CKAN file '{}' not supplied nor package Url '{}' ".format(CkanPath, PackageUrl))
else:
if InputObject is None or InputObject == "":
raise Exception("CKAN file '{}' not supplied as file or Url".format(InputObject))
if InputObject.startswith("https://") or InputObject.startswith("http://") or InputObject.startswith("ftp://"):
try:
user_agent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"
headers = {"User-Agent": user_agent}
req = urllib.request.Request(PackageUrl, None, headers=headers, method="GET")
req = urllib.request.Request(InputObject, None, headers=headers, method="GET")
with urllib.request.urlopen(req) as response:
with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
shutil.copyfileobj(response, tmp_file)
jsonFileName = tmp_file.name
except Exception as e:
print(e)
raise Exception("ERROR with: {}".format(PackageUrl))
raise Exception("ERROR with: {}".format(InputObject))
PackageUrl = InputObject
else:
if not os.path.exists(CkanPath):
raise Exception("CKAN file '{}' does not exist".format(CkanPath))
jsonFileName = CkanPath
if not os.path.exists(InputObject):
raise Exception("CKAN file '{}' does not exist".format(InputObject))
jsonFileName = InputObject
PackageUrl = None
jsonFile = open(jsonFileName, "r")
@ -416,6 +431,7 @@ def ConvertFromCkan(CkanPath=None, PackageUrl=None, FetchResource=False):
mlq = martiLQ()
oMarti = mlq.NewMartiDefinition()
mlq.LoadConfig(None)
oMarti["title"] = "Conversion from CKAN"
oMarti["state"] = oCkan["result"]["state"]
@ -447,7 +463,7 @@ def ConvertFromCkan(CkanPath=None, PackageUrl=None, FetchResource=False):
req = urllib.request.Request(resource["url"], None, headers=headers, method="GET")
with urllib.request.urlopen(req) as response:
#with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
tmp_fileName = mUtility.MakeLocalTempFile(resource["url"], None)
tmp_fileName = mUtility.NewLocalTempFile(resource["url"], Configuration=None, TempPath=DataPath)
with open(tmp_fileName, "wb") as tmp_file:
shutil.copyfileobj(response, tmp_file)
@ -458,6 +474,7 @@ def ConvertFromCkan(CkanPath=None, PackageUrl=None, FetchResource=False):
if f_hash is None:
f_hash = local_res["hash"]
if DataPath is None:
os.remove(tmp_fileName)
except Exception as e:
@ -476,6 +493,7 @@ def ConvertFromCkan(CkanPath=None, PackageUrl=None, FetchResource=False):
"modified": resource["last_modified"],
"expires": None, #self._oConfiguration.ExpireDate(item).strftime("%Y-%m-%dT%H:%M:%S%z"),
"state": resource["state"],
"stateModified": resource["created"],
"author": oCkan["result"]["author"],
"length": f_leng,
"hash": f_hash,


@ -95,15 +95,22 @@ class mConfiguration:
else:
self._Log.WriteLog("Configuration path '{}' does not exist".format(ConfigPath))
raise Exception("Configuration path '{}' does not exist".format(ConfigPath))
else:
# Check environment variable
check_ini = os.getenv("MARTILQ_MARTILQ_INI", "")
if check_ini != "" and os.path.exists(check_ini):
ConfigPath = check_ini
else:
# Look in default location and name
home = os.path.expanduser('~')
if os.path.exists(os.path.join(home, ".martilq/martilq.ini")):
ConfigPath = os.path.join(home, ".martilq/martilq.ini")
if os.path.exists("martilq.ini"):
ConfigPath = "martilq.ini"
elif os.path.exists("conf/martilq.ini"):
ConfigPath = "conf/martilq.ini"
elif os.path.exists(os.path.join(home, ".martilq/martilq.ini")):
ConfigPath = os.path.join(home, ".martilq/martilq.ini")
if not ConfigPath is None:
self._Log.WriteLog("Usig configuration path '{}'".format(ConfigPath))
self._Log.WriteLog("Using configuration path '{}'".format(ConfigPath))
config_object.read(ConfigPath)
if config_object.has_section("General"):
@ -168,6 +175,9 @@ class mConfiguration:
self._oConfiguration["signKey_Password"] = os.getenv("MARTILQ_SIGNKEY_PASSWORD", self._oConfiguration["signKey_Password"])
self._oConfiguration["logPath"] = os.getenv("MARTILQ_LOGPATH", self._oConfiguration["logPath"])
self._oConfiguration["dataPath"] = os.getenv("MARTILQ_DATAPATH", self._oConfiguration["dataPath"])
self._oConfiguration["tempPath"] = os.getenv("MARTILQ_TEMPPATH", self._oConfiguration["tempPath"])
self._Log.WriteLog("Configuration load processed")
@ -300,7 +310,7 @@ class mConfiguration:
raise Exception("Expires value '"+ self._oConfiguration["expires"] +"' is invalid")
base = lExpires[0]
if sourcePath == "" or base == "m":
if sourcePath is None or sourcePath == "" or base == "m":
base = "t"
modified = datetime.datetime.today()


@ -104,6 +104,7 @@ class mResource:
"modified": last_modified_date,
"expires": self._oConfiguration.ExpireDate(item).strftime("%Y-%m-%dT%H:%M:%S%z"),
"state": self._oConfiguration.GetConfig("state"),
"stateModified": dateToday,
"author": self._oConfiguration.GetConfig("author"),
"length": os.path.getsize(SourcePath),
"hash": hash,


@ -25,7 +25,7 @@ class mUtility:
self._Log.SetConfig(self._oConfiguration.GetConfig("logPath"), self._oConfiguration.GetSoftwareName())
def MakeLocalTempFile(UrlPath, Configuration):
def NewLocalTempFile(UrlPath, Configuration, TempPath=None):
# Create temporary file on disk for cases
# where file size, hashing and encryption are required
# This is useful for (1) CKAN file fetch
@ -36,7 +36,11 @@ class mUtility:
if Configuration is None:
Configuration = mConfiguration()
if not TempPath is None:
temp_dir = TempPath
else:
temp_dir = Configuration.GetConfig("tempPath")
if not os.path.isdir(temp_dir):
_log = mLogging()
_log.SetConfig(Configuration.GetConfig("logPath"), Configuration.GetSoftwareName())
@ -44,8 +48,3 @@ class mUtility:
_log.WriteLog("Created temp folder : {}".format(temp_dir))
return os.path.join(temp_dir, doc_name)


@ -1,28 +1,26 @@
To execute the PowerShell scripts, please invoke from the root Marti directory and not from
with the current directory set to ``.\test\powershell``
within the directory ``.\test\powershell``
```powershell
# To seed the test data
# Add code for FTP fetch for BSB
.\test\powershell\test_retrievedata.ps1
# To seed the test data of BSB from the internet
.\test\powershell\martiLQ_base_test.ps1
# For creating a martiLQ definition on the files in docs directory
.\test\powershell\martiLQ_docs_test.ps1
# For initial tests
.\test\powershell\test_MartiLQ.ps1
# For converting CKAN definition to martiLQ
.\test\powershell\martiLQ_ckan_test.ps1
#
.\test\powershell\test_MartiLQCkan.ps1
.\test\powershell\martiLQ_data1_test.ps1
#
.\test\powershell\test_MartiLQData1.ps1
.\test\powershell\martiLQ_data2_test.ps1
#
.\test\powershell\test_MartiLQData2.ps1
#
.\test\powershell\test_MartiLQData3.ps1
.\test\powershell\martiLQ_data3_test.ps1
```


@ -1,6 +1,4 @@
. .\source\powershell\MartiLQ.ps1
. .\source\powershell\MartiLQUtilities.ps1


@ -55,4 +55,18 @@ $x = ConvertTo-Json -InputObject $oMarti -Depth 5
Set-Content -Path $outFile -Value $x
Write-Host "Wrote converted definition to: $outFile"
# SG
$outFile = ".\test\powershell\results\test_martiLQ_ckan_SG1.json"
$oMarti = ConvertFrom-Ckan -InputObject "https://data.gov.sg/api/action/package_show?id=e7a00a47-2676-4352-9495-a796124a3453" -FetchResource -DataPath "test/powershell/results/data"
$oMarti.description = "This data has been converted from SG CKAN data source with URL 'https://data.gov.sg/api/action/package_show?id=e7a00a47-2676-4352-9495-a796124a3453'"
$oMarti.tags += "ckan"
$oMarti.tags += "gov"
$oMarti.tags += "sg"
$oMarti.publisher = "Singapore"
$oMarti.landingPage = ""
$x = ConvertTo-Json -InputObject $oMarti -Depth 5
Set-Content -Path $outFile -Value $x
Write-Host "Wrote converted definition to: $outFile"
Write-Host "Execution completed"


@ -91,7 +91,6 @@ try {
$lattribute += $oAttribute
$x = ConvertTo-Json -InputObject $lattribute
$x
}
catch {


@ -0,0 +1,76 @@
import os
import sys
import urllib.request
import shutil
import json
import csv
import zipfile
import datetime
import time
sys.path.insert(0, "./source/python/client")
from martiLQ import *
os.environ["MARTILQ_LOGPATH"] = "./test/python/results/logs"
print("Python base sample/test case")
def HttpList(remote_url):
    files = []
    with open("./docs/source/samples/python/listfiles_bsb_http.txt", "r") as f:
        files = f.read().splitlines()
    return files
remote_url = "http://apnedata.merebox.com.s3.ap-southeast-2.amazonaws.com/au/bsb/"
print("Fetch sample file list")
files = HttpList(remote_url)
test_dir = "./test/python/results"
if not os.path.exists(test_dir):
    os.mkdir(test_dir)
if not os.path.exists(os.path.join(test_dir, "data")):
    os.mkdir(os.path.join(test_dir, "data"))
print("Fetch sample files via HTTP")
for file_name in files:
    if file_name.startswith("BSBDirectory"):
        if file_name.endswith(".csv") | file_name.endswith(".txt"):
            try:
                with urllib.request.urlopen(remote_url + file_name) as resp:
                    last_modified = resp.info()["Last-Modified"]
                    dt_obj = datetime.datetime.strptime(last_modified, '%a, %d %b %Y %H:%M:%S %Z')
                    data_file_name = os.path.join(test_dir, "data", file_name)
                    with open(data_file_name, 'wb') as data_file:
                        shutil.copyfileobj(resp, data_file)
                    modTime = time.mktime(dt_obj.timetuple())
                    os.utime(data_file_name, (modTime, modTime))
            except Exception as e:
                print("error "+ str(e))
                print("error with fetching "+remote_url + file_name)
print("Creating martiLQ definition")
mlq = martiLQ()
oMarti = mlq.NewMartiDefinition()
for file_name in files:
    if file_name.startswith("BSBDirectory"):
        if file_name.endswith(".csv") | file_name.endswith(".txt"):
            oResource = mlq.NewMartiLQResource(os.path.join(test_dir, "data", file_name), "", False, True)
            oMarti["resources"].append(oResource)
mlq.Close()
print("Save martiLQ definition")
jsonFile = open(os.path.join(test_dir, "martiLQ_base_test.json"), "w")
jsonFile.write(json.dumps(oMarti, indent=5))
jsonFile.close()
print("Base sample JSON written: martiLQ_base_test.json")
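The fetch loop in this new test copies the server's `Last-Modified` time onto each downloaded file. `time.mktime` interprets its argument as local time, but `Last-Modified` is always expressed in GMT. As a result, the stored mtime is shifted by the host's UTC offset unless the machine runs in UTC. Parsing `%a`/`%b` with `strptime` also depends on the current locale. The sketch below shows a UTC-safe alternative that uses only the standard library. The function names are illustrative and are not part of martiLQ.

```python
import email.utils
import os


def last_modified_to_epoch(header: str) -> float:
    """Convert an HTTP Last-Modified value such as 'Wed, 21 Oct 2015 07:28:00 GMT' to epoch seconds."""
    # parsedate_to_datetime returns a timezone-aware datetime for a 'GMT' date,
    # so timestamp() does not depend on the local timezone or locale
    return email.utils.parsedate_to_datetime(header).timestamp()


def apply_last_modified(path: str, header: str) -> None:
    """Set the access and modification times of a downloaded file from Last-Modified."""
    ts = last_modified_to_epoch(header)
    os.utime(path, (ts, ts))


print(last_modified_to_epoch("Wed, 21 Oct 2015 07:28:00 GMT"))  # 1445412480.0
```

In the test script, calling `apply_last_modified(data_file_name, last_modified)` after the copy would replace the `strptime`/`mktime` pair.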

@@ -13,9 +13,9 @@ os.environ["MARTILQ_LOGPATH"] = "./test/python/results/logs"
print("Python sample/test case for Singapore CKAN #1")
srcFile = ".\docs\source\samples\json\CKAN_SG_ChargeableIncomeofCompanies.json"
mlq = ConvertFromCkan(CkanPath=srcFile, PackageUrl="https://data.gov.sg/api/action/package_show?id=e7a00a47-2676-4352-9495-a796124a3453")
mlq = ConvertFromCkan(InputObject=srcFile)
saveFile = "./test/python/results/test_martiLQ_ckan_SG1.json"
saveFile = "./test/python/results/martiLQ_ckan_test_SG1.json"
mlq.Save(saveFile)
print("Saved martiLQ document: " + saveFile)
@@ -23,19 +23,19 @@ print("Saved martiLQ document: " + saveFile)
print("Python sample/test case for Singapore CKAN #2")
srcFile = ".\docs\source\samples\json\CKAN_SG_ChargeableIncomeofCompanies.json"
mlq = ConvertFromCkan(CkanPath=None, PackageUrl="https://data.gov.sg/api/action/package_show?id=e7a00a47-2676-4352-9495-a796124a3453", FetchResource=True)
mlq = ConvertFromCkan(InputObject="https://data.gov.sg/api/action/package_show?id=e7a00a47-2676-4352-9495-a796124a3453", FetchResource=True, DataPath="test/python/results/data")
saveFile = "./test/python/results/test_martiLQ_ckan_SG2.json"
saveFile = "./test/python/results/martiLQ_ckan_test_SG2.json"
mlq.Save(saveFile)
print("Saved martiLQ document: " + saveFile)
print("Python sample/test case for Australia CKAN")
srcFile = ".\docs\source\samples\json\CKAN_AU_asic_ckan_api.json"
mlq = ConvertFromCkan(CkanPath=srcFile, PackageUrl="")
mlq = ConvertFromCkan(InputObject=srcFile)
print("Wrote converted definition to: " + srcFile)
saveFile = "./test/python/results/test_martiLQ_ckan_AU1.json"
saveFile = "./test/python/results/martiLQ_ckan_test_AU1.json"
mlq.Save(saveFile)
print("Saved martiLQ document: " + saveFile)
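These hunks replace the separate `CkanPath`/`PackageUrl` arguments with a single `InputObject` that accepts either a local CKAN JSON file or a `package_show` URL. The sketch below shows one way such a parameter could be resolved. `load_ckan_input` is a hypothetical helper and is not the martiLQ implementation. The scheme is checked against `http`/`https` explicitly, rather than testing for the presence of any scheme. This matters because `urlparse` reports a Windows path such as `C:\data\pkg.json` with the one-letter scheme `c`, and that path must still be read as a file.

```python
import json
import urllib.request
from urllib.parse import urlparse


def load_ckan_input(input_object: str) -> dict:
    """Load CKAN package JSON from an http(s) URL or from a local file path."""
    # Only http/https count as remote: urlparse reports a Windows drive letter
    # such as 'C:' as a one-letter scheme, which must stay on the file branch
    if urlparse(input_object).scheme in ("http", "https"):
        with urllib.request.urlopen(input_object) as resp:
            return json.load(resp)
    with open(input_object, "r", encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_ckan_input("./docs/source/samples/json/CKAN_AU_asic_ckan_api.json")` would take the file branch, while the `data.gov.sg` `package_show` URL would be fetched over HTTP.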

@@ -13,40 +13,40 @@ os.environ["MARTILQ_LOGPATH"] = "./test/python/results/logs"
print("Python sample/test case")
mlq = martiLQ()
mlq.LoadConfig()
mlq.LoadConfig(ConfigPath=None)
oMarti = mlq.NewMartiDefinition()
mlq.NewMartiChildItem(SourceFolder= "./docs/*", UrlPath="./docs" , ExcludeHash=False, ExtendAttributes=True)
oMarti["description"] = "Sample execution #1"
saveFile = "./test/python/results/DocsPlain1.json"
saveFile = "./test/python/results/martiLQ_docs_test_DocsPlain1.json"
mlq.Save(saveFile)
print("Saved martiLQ document: " + saveFile)
saveFile = "./test/python/results/DocsPlain2.json"
saveFile = "./test/python/results/martiLQ_docs_test_DocsPlain2.json"
oMarti["description"] = "Sample execution #2"
jsonFile = open(saveFile, "w")
jsonFile.write(json.dumps(oMarti, indent=5))
jsonFile.close()
print("Saved martiLQ document: " + saveFile)
saveFile = "./test/python/results/DocsPlain1.json"
saveFile = "./test/python/results/martiLQ_docs_test_DocsPlain1.json"
print("Load martiLQ document: "+saveFile)
mlq.Load(saveFile)
oMarti = mlq.Get()
print("Definition description is: {}".format(oMarti["description"]))
mlq.CloseLog()
mlq.Close()
configPath = "./docs/source/samples/conf/GEN005.ini"
sourcePath = "./docs/source/*"
saveFile = "./test/python/results/test_proc_docs.json"
ProcessFilePath(ConfigPath=configPath, SourcePath=sourcePath, Filter="", Recursive=True, UrlPrefix="https://localhost/", DefinitionPath=saveFile)
saveFile = "./test/python/results/martiLQ_docs_test_proc.json"
Make(ConfigPath=configPath, SourcePath=sourcePath, Filter="", Recursive=True, UrlPrefix="https://localhost/", DefinitionPath=saveFile)
print("Saved martiLQ document: " + saveFile)
sourcePath = "./docs/source/samples/python/test/http/*"
saveFile = "./test/python/results/test_proc_bsb.json"
ProcessFilePath(ConfigPath=configPath, SourcePath=sourcePath, Filter="", Recursive=True, UrlPrefix="http://apnedata.merebox.com.s3.ap-southeast-2.amazonaws.com/au/bsb/", DefinitionPath=saveFile)
sourcePath = "./test/python/results/data/*"
saveFile = "./test/python/results/martiLQ_docs_test_bsb.json"
Make(ConfigPath=configPath, SourcePath=sourcePath, Filter="BSBDirectory*", Recursive=True, UrlPrefix="http://apnedata.merebox.com.s3.ap-southeast-2.amazonaws.com/au/bsb/", DefinitionPath=saveFile)
print("Saved martiLQ document: " + saveFile)
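The BSB run now passes `Filter="BSBDirectory*"` with `Recursive=True`. The sketch below shows how that selection could be done with the standard library. It assumes that `Filter` is a shell-style wildcard matched against file names, and that an empty filter selects every file (as in the earlier `Filter=""` call). `select_files` is hypothetical and is not martiLQ's own function.

```python
from __future__ import annotations

import fnmatch
import os


def select_files(source_dir: str, pattern: str, recursive: bool) -> list[str]:
    """Return sorted paths under source_dir whose file name matches a wildcard pattern."""
    matches: list[str] = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            # An empty pattern is treated as "select everything" (assumption)
            if not pattern or fnmatch.fnmatchcase(name, pattern):
                matches.append(os.path.join(root, name))
        if not recursive:
            break  # os.walk yields the top-level directory first
    return sorted(matches)
```

`fnmatchcase` is used here so that matching is case-sensitive on every platform. `fnmatch.fnmatch` would instead apply the operating system's case rules, which means it is case-insensitive on Windows.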

@@ -1,9 +1,19 @@
# Tools
A number of tools are provided that can be incorporated into your
projects that are want to use the metadata transfer reconciliation format
projects that want to use the metadata transfer reconciliation format
(martiLQ document).
The Python or PowerShell (Windows or Linux) scripts can be
inserted into your processing pipeline either to pack or
unpack a single file or a set of files. Combining this
with the compress and encrypt facility gives a solution
design that provides both load assurance and security.
If your data is for public or third-party consumption,
the scripts can produce an immutable package that
assures your consumers of its authenticity and provenance.
You can combine these in different ways such as:
* Use Python extract program to generate the martiLQ document