A Data Package specification for archaeological stratigraphy data following the Harris Matrix convention.
Harris Matrix Data Package
This repository contains archaeological stratigraphic datasets in CSV format, following the table schema developed by Thomas S. Dye for the hm Lisp package, together with a Python command-line tool that can check consistency of data with the format.
Each dataset contains various tables and a data package descriptor (datapackage.json) that enables consistency checks and streamlined data access with the Frictionless Data tools and programming libraries.
Setting up the environment
I installed the Python datapackage and goodtables packages with Pipenv. The repository contains a Pipfile, so it should be enough to run:
pipenv install
Then install the hmdp package with:
pipenv run python setup.py install
This will make the hmdp command available in the virtual environment.
data descriptor is a JSON file, named datapackage.json, that is found in the top-level directory of a data package, and contains metadata about the entire data package (name, description, creation date, author names, references) together with the data package schema
resource is a single block of data, such as a CSV table or a JSON data file
In the Harris Matrix Data Package:
each Harris Matrix is a data package
there is 1 data descriptor
there are from 2 to 7 CSV tables
each CSV table is a resource
The two resources that MUST be present are:
Most often, excavation data will make use of three other resources:
Only when radiocarbon dates or other absolute chronology are available should these two resources be used:
Resource names are standardized so that the data descriptor can remain largely untouched, except for the specific metadata.
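To make the relationship between data package, data descriptor, and resources more concrete, here is a minimal sketch that builds a descriptor with the Python standard library. The resource names (`contexts`, `observations`), file paths, and column names are placeholders invented for this example and are not necessarily the standardized names used by this specification. Only the general layout (`name`, `resources`, and a `schema` with `fields`, `primaryKey`, and `foreignKeys` per resource) follows the Frictionless Data Package and Table Schema conventions.

```python
import json

# Hypothetical resource names, paths and columns, for illustration only.
descriptor = {
    "name": "example-harris-matrix",
    "title": "Example excavation",
    "resources": [
        {
            "name": "contexts",
            "path": "contexts.csv",
            "schema": {
                "fields": [{"name": "label", "type": "string"}],
                "primaryKey": "label",
            },
        },
        {
            "name": "observations",
            "path": "observations.csv",
            "schema": {
                "fields": [
                    {"name": "younger", "type": "string"},
                    {"name": "older", "type": "string"},
                ],
                # Each observed context must exist in the contexts table.
                "foreignKeys": [
                    {
                        "fields": "younger",
                        "reference": {"resource": "contexts", "fields": "label"},
                    },
                    {
                        "fields": "older",
                        "reference": {"resource": "contexts", "fields": "label"},
                    },
                ],
            },
        },
    ],
}

# Serialize the descriptor as it would appear in datapackage.json.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

In a real dataset only the metadata at the top (name, title, authors and so on) would normally change from one excavation to the next.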
Using the hmdp program from the command line
hmdp matrix datapackage.json will check stratigraphy data consistency and output a matrix.gv file for processing with Graphviz.
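One kind of stratigraphic inconsistency is a cycle, where a context ends up both above and below another one. The sketch below shows how such a cycle could be detected with the standard `graphlib` module (Python 3.9 or later). It is not the implementation used by hmdp, and the `(younger, older)` pairs and the `find_cycle` helper are assumptions made for this example.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(observations):
    """Return the contexts forming a cycle, or None if there is no cycle."""
    graph = {}
    for younger, older in observations:
        graph.setdefault(younger, set()).add(older)
        graph.setdefault(older, set())
    try:
        # prepare() raises CycleError when the relations are circular.
        TopologicalSorter(graph).prepare()
    except CycleError as error:
        # The second element of args is the list of nodes in the cycle.
        return error.args[1]
    return None


# Hypothetical observations: each pair reads "younger lies above older".
consistent = [("1", "2"), ("2", "3")]
broken = [("1", "2"), ("2", "3"), ("3", "1")]

print(find_cycle(consistent))
print(find_cycle(broken))
```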
To create a graphical representation of the resulting matrix, the default procedure is to use the dot command, like this:
dot matrix.gv -Tpng -o matrix.png
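The matrix.gv file is plain text in the Graphviz DOT language. As a rough idea of what the dot command receives, the following sketch turns a list of stratigraphic relations into DOT source. The `to_dot` helper, the graph name, and the node styling are assumptions for this example, and the actual output of hmdp may be organized differently.

```python
def to_dot(observations):
    """Return DOT source with one edge per (younger, older) pair."""
    lines = ["digraph matrix {", "  node [shape=box];"]
    for younger, older in observations:
        # Edges point from the younger context down to the older one.
        lines.append(f'  "{younger}" -> "{older}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical observations: context 1 lies above 2, and 2 above 3.
observations = [("1", "2"), ("2", "3")]
dot_source = to_dot(observations)
print(dot_source)
```

Saving `dot_source` to a file would give something that the dot command shown above can render.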
In case something goes wrong, but also if you are experimenting with the data format, the check command is a useful shortcut to run all possible automated checks.
hmdp check datapackage.json will perform three checks on the dataset:
validate the data descriptor without looking at the data (e.g. resources can be missing or broken but the JSON file is well formatted); this is equivalent to running datapackage validate datapackage.json
validate every resource for internal consistency (e.g. there are column headers, each row has the right number of columns, constraints like integer values, enums, etc. are respected); this is equivalent to running goodtables datapackage.json (but in case of errors the separate command will give more details)
check the consistency of foreign keys based on the data descriptor,again using the goodtables programming library.
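To illustrate what the third check means, here is a simplified foreign key check written with the standard library only. It does not use goodtables and is not how hmdp performs the check. The table contents, the column names (`label`, `younger`, `older`), and the `missing_references` helper are all hypothetical.

```python
import csv
import io

# Hypothetical table contents and column names, for illustration only.
contexts_csv = "label\n1\n2\n3\n"
observations_csv = "younger,older\n1,2\n2,4\n"


def missing_references(contexts_file, observations_file):
    """Return (line number, column, value) for references to unknown contexts."""
    known = {row["label"] for row in csv.DictReader(contexts_file)}
    problems = []
    # Data rows start on line 2 because line 1 holds the column headers.
    for number, row in enumerate(csv.DictReader(observations_file), start=2):
        for column in ("younger", "older"):
            if row[column] not in known:
                problems.append((number, column, row[column]))
    return problems


problems = missing_references(
    io.StringIO(contexts_csv), io.StringIO(observations_csv)
)
print(problems)
```

Here the second observation refers to a context 4 that is not listed in the contexts table, so it is reported as a problem on line 3.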
How to cite this work
If you use this software in your research, please provide a citation to the paper introducing it:
Costa, Stefano. “Una proposta di standard per l’archiviazione e la condivisione di dati stratigrafici.” Archeologia e Calcolatori, 30, 2019, pp. 459–62, DOI: https://doi.org/10.19282/ac.30.2019.29