codeberg.org-steko-harris-matrix-data-package_-_2020-03-04_22-46-43
- Publication date
- 2020-03-04
A Data Package specification for archaeological stratigraphy data following the Harris Matrix convention.
Harris Matrix Data Package
This repository contains archaeological stratigraphic datasets in CSV format, following the table schema developed by Thomas S. Dye for the hm Lisp package, together with a Python command-line tool that can check the consistency of data with the format.
Each dataset contains various tables and a data package descriptor (datapackage.json) that enables consistency checks and streamlined data access with the Frictionless Data tools and programming libraries.
Setting up the environment
I installed the Python datapackage and goodtables packages with Pipenv. The repository contains a Pipfile, so it should be enough to run:
pipenv install
Then install the hmdp package with:
pipenv run python setup.py install
This will make the hmdp command available in the virtual environment.
All source code is formatted with Black.
Glossary
In the Frictionless Data glossary:
- data descriptor is a JSON file, named datapackage.json, that is found in the top-level directory of a data package and contains metadata about the entire data package (name, description, creation date, author names, references) together with the data package schema
- resource is a single block of data, such as a CSV table or a JSON data file
In the Harris Matrix Data Package:
- each Harris Matrix is a data package
- there is 1 data descriptor
- there are between 2 and 7 CSV tables
- each CSV table is a resource
The two resources that MUST be present are:
- contexts
- observations
Most often, excavation data will make use of three other resources:
- inferences
- periods
- phases
Only when radiocarbon dates or other absolute chronology data are available should the two remaining resources be used:
- events
- event-order
Resource names are standardized so that the data descriptor can remain largely untouched, except for the specific metadata.
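The resource rules above can be checked against a descriptor with a few lines of Python. This is a minimal sketch that uses only the standard library and is not part of hmdp; the function name check_resource_names is hypothetical, and the sketch assumes that every entry in the descriptor's resources array carries a name field.

```python
import json

# Resource names taken from the glossary above.
REQUIRED = {"contexts", "observations"}
OPTIONAL = {"inferences", "periods", "phases", "events", "event-order"}


def check_resource_names(descriptor):
    """Return (missing, unknown) resource names for a descriptor dict.

    Hypothetical helper for illustration, not part of hmdp.
    """
    names = {res["name"] for res in descriptor.get("resources", []) if "name" in res}
    missing = sorted(REQUIRED - names)
    unknown = sorted(names - REQUIRED - OPTIONAL)
    return missing, unknown


# A made-up descriptor, parsed from JSON text as datapackage.json would be.
example = json.loads('{"resources": [{"name": "contexts"}, {"name": "periods"}]}')
missing, unknown = check_resource_names(example)
print("missing:", missing)
print("unknown:", unknown)
```

To try it on a real dataset, load datapackage.json with json.load and pass the resulting dictionary to the function.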
Using the hmdp program from the command line
hmdp matrix datapackage.json will check stratigraphy data consistency and output a matrix.gv file for processing with Graphviz.
To create a graphical representation of the resulting matrix, the default procedure is to use the dot command, like this:
dot matrix.gv -Tpng -o matrix.png
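For readers unfamiliar with Graphviz, the following sketch shows how stratigraphic relations can be written as a DOT graph that the dot command accepts. It illustrates the file format only and is not the code that hmdp uses; the function name relations_to_dot and the (younger, older) pair layout are assumptions made for the example.

```python
def relations_to_dot(relations):
    """Build DOT source for a directed graph from (younger, older) pairs.

    Hypothetical helper for illustration, not part of hmdp.
    """
    lines = ["digraph matrix {"]
    for younger, older in relations:
        # Quote identifiers so labels with spaces or dashes stay valid DOT.
        lines.append(f'    "{younger}" -> "{older}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


# Made-up relations: context 1 is younger than 2, and 2 is younger than 3.
example_relations = [("1", "2"), ("2", "3")]
print(relations_to_dot(example_relations))
```

Saving the returned text to a .gv file gives an input that dot can render in the same way as the command above.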
In case something goes wrong, but also if you are experimenting with the data format, the check command is a useful shortcut to run all possible automated checks.
hmdp check datapackage.json will perform three checks on the dataset:
- validate the data descriptor without looking at the data (e.g. resources can be missing or broken but the JSON file is well formatted); this is equivalent to running datapackage validate datapackage.json
- validate every resource for internal consistency (e.g. there are column headers, each row has the right number of columns, constraints like integer values, enums, etc. are respected); this is equivalent to running goodtables datapackage.json (but in case of errors the separate command will give more details)
- check the consistency of foreign keys based on the data descriptor, again using the goodtables programming library.
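To make the third check concrete, here is a minimal sketch of what a foreign key consistency check amounts to, written with the standard library only. It is not how hmdp or goodtables implement it. The column names label, younger and older are assumptions made for illustration (the actual field names are defined by the table schema in datapackage.json), and the function name dangling_references is hypothetical.

```python
import csv
import io


def dangling_references(contexts_csv, observations_csv):
    """Return observation values that do not match any context label.

    Hypothetical helper; column names are assumed for illustration.
    """
    labels = {row["label"] for row in csv.DictReader(io.StringIO(contexts_csv))}
    dangling = []
    for row in csv.DictReader(io.StringIO(observations_csv)):
        for field in ("younger", "older"):
            # A value that is not a known context label breaks the foreign key.
            if row[field] not in labels:
                dangling.append(row[field])
    return dangling


# Made-up data: context 3 is referenced but never declared.
contexts = "label\n1\n2\n"
observations = "younger,older\n1,2\n2,3\n"
print(dangling_references(contexts, observations))  # prints ['3']
```

An empty list means every observation refers to a declared context, which is the condition the foreign key check enforces.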
How to cite this work
If you use this software in your research, please provide a citation to the paper introducing it:
Costa, Stefano. “Una proposta di standard per l’archiviazione e la condivisione di dati stratigrafici.” Archeologia e Calcolatori, 30, 2019, pp. 459–62, DOI: https://doi.org/10.19282/ac.30.2019.29
To restore the repository, download the bundle:
wget https://archive.org/download/codeberg.org-steko-harris-matrix-data-package_-_2020-03-04_22-46-43/steko-harris-matrix-data-package_-_2020-03-04_22-46-43.bundle
and run:
git clone steko-harris-matrix-data-package_-_2020-03-04_22-46-43.bundle
Source: https://codeberg.org/steko/harris-matrix-data-package
Uploader: steko
Upload date: 2020-03-04
- Addeddate
- 2021-02-14 17:31:03
- Identifier
- codeberg.org-steko-harris-matrix-data-package_-_2020-03-04_22-46-43
- Pushed_date
- 2020-03-04 22:46:43
- Scanner
- Internet Archive Python library 1.9.9
- Uploaded_with
- iagitup - v1.6.2
- Year
- 2020