In search of Canadian HGIS data

Researchers who study Canada have been generating large quantities of geohistorical data for many years. As we reflect on the creation of a national geohistorical infrastructure, it is pertinent to identify datasets at different scales which could become part of such a portal. We are therefore trying to enhance the discoverability of existing and available datasets. In the long run, it would be preferable to enumerate and describe each layer and each attribute table, but for the moment it is not necessary to go into that level of detail. At this stage, we hope to identify collections which have emerged from different research projects or from the online deposit of previously georeferenced digital data, such as:

  • raster geographic maps
  • aerial photographs
  • vector layers
  • attribute data linked to vector layers

We have already identified datasets offered by different types of creators so that we can present diversity in the nature and the type of data which can interest researchers. We have therefore identified:

  • high-quality international data (FAO)
  • data from collaborative mapping projects (OpenStreetMap, Natural Earth)
  • data available on GIS company web sites (ESRI)
  • national data (government of Canada, Géogratis)
  • provincial or territorial data (British Columbia, Yukon, Québec, Nova Scotia, Prince Edward Island, New Brunswick)
  • municipal data (Toronto, Montréal, Sherbrooke)
  • research team data (CIEQ, NICHE, LHPM, MAP, VIHistory)
  • data from map library and archive centres (Scholars’ Geoportal, MADGIC, GéoIndex+)
  • personal initiative data (historical railway lines)

Choosing what type of metadata to associate with each dataset has meant reaching a compromise. An insufficient level of detail would prevent effective searches, while requirements for overly detailed metadata could discourage data creators who are not trained to create metadata which meet international standards. According to Rodolphe Devillers, we can use six criteria to define the quality of a geospatial dataset [1].

i. Definition: Allows the user to evaluate whether the nature of a datum and of the object it describes, i.e. the “what”, meets his or her requirements (semantic, spatial and temporal definitions);

ii. Coverage: Allows the user to evaluate whether the territory and the period for which the data exist, i.e. the “where” and the “when”, meet his or her requirements;

iii. Genealogy: Allows the user to know where the data came from, the project’s objectives when the data were acquired, and the methods used to obtain the data, i.e. the “how” and the “why”, and to verify whether this meets the user’s requirements;

iv. Precision: Allows the user to evaluate the data’s worth and whether it is acceptable for the user’s requirements (semantic, temporal and spatial precision of the object and of its attributes);

v. Legitimacy: Allows the user to evaluate the official recognition and the legal standing of the data and whether they meet the user’s requirements (de facto standards, recognised good practices, legal or administrative recognition by an official agency, legal guarantee by a supplier, etc.);

vi. Accessibility: Allows the user to evaluate how easily the data can be obtained (cost, delays, format, privacy, respect of recognised good practices, copyright, etc.).

A metadata standard which met all of these criteria could seem overwhelming to many people who would like to make their data available. We therefore propose to use the format defined by the Dublin Core Metadata Initiative (DCMI), an international standard whose fields are easier to understand for people less familiar with metadata. We have applied and interpreted the DCMI based upon its general definition available on Wikipedia [2] and on the interpretation of a few fields proposed by the Bibliothèque nationale de France [3]. This approach can certainly be criticised, because it is geared towards simple application rather than perfection. As metadata are entered in this list, we can refine these principles and improve the compromise. The fields do not appear in the same order as in the DCMI and some are subdivided to provide a slightly finer level of granularity.
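The subdivision of fields can be sketched as follows. This is only an illustration, not part of the project's tooling: it assumes that a subdivided field is written as `Element.Refinement`, as in Table 1 below, and that dropping the refinement recovers the simple Dublin Core element. The function names `base_element` and `dumb_down` are hypothetical.

```python
# The 15 elements of simple Dublin Core, in lowercase.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def base_element(field_name: str) -> str:
    """Return the simple Dublin Core element behind a subdivided field name.

    For example, "Description.Methodology" maps to "description".
    """
    base = field_name.split(".", 1)[0].strip().lower()
    if base not in DC_ELEMENTS:
        raise ValueError(f"{field_name!r} does not map to a Dublin Core element")
    return base


def dumb_down(record: dict) -> dict:
    """Group the values of subdivided fields under their base element."""
    simple = {}
    for field_name, value in record.items():
        simple.setdefault(base_element(field_name), []).append(value)
    return simple
```

Under these assumptions, `Coverage.Time` and `Coverage.Space` both collapse to `coverage`. The `Comment` field of Table 1 has no counterpart among the 15 elements, so this sketch rejects it with a `ValueError`; a real exporter would need to decide how to handle such project-specific fields.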

Table 1. List of fields used to describe datasets

Element (French) Element (English) Comment
Créateur Creator The main entity responsible for creating the content of the resource. It can be the name of one or more people, an organisation, or a service.
Format: Last name, First name.
Separate multiple entities with a semicolon.
Optional
Contributeur Contributor Entity responsible for contributing to the content of the resource. It can be the name of one or more people, an organisation, or a service.
Format: Last name, First name.
Separate multiple entities with a semicolon.
Optional
Titre Title Name given to the resource.
The title is generally the formal name under which the resource is known. Indicate the title in the original language of the resource. If the resource does not have a formal title and the title is derived from the content, place the title between square brackets.
Required
Description.Générale Description.General A presentation of the content of the resource. Descriptions are generally free-form text. As much as possible, use the description provided by the creators of the resource.

Optional

Description.Nature-du-projet Description.Project-type A keyword which allows us to categorise projects according to the following typology:

– governmental
– NGO
– academic
– individual
– commercial
– collaborative

Required

Description.Méthodologie Description.Methodology Free form text which describes the process used to create the resource.

Required

Description.Sources Description.Sources List of documents which were used to create the resource. This field is different from the field Source, which is used to identify where a user can acquire the resource.

Optional

Description.Champs Description.Fields List of fields used in the table or database, preferably with a description.

Optional

Date.Publication Date.Published Date on which the resource was originally created. This is not necessarily the date represented by the resource.

Required

Date.Mise-à-jour Date.Updated Date of an update event in the life cycle of the resource.

Optional

Couverture.Temps Coverage.Time Perimeter or domain of the resource, in this case, the date, the year or the period represented by the resource.

Required

Couverture.Espace Coverage.Space Perimeter or domain of the resource, in this case, the territory. It is recommended to use a value from a controlled vocabulary.

Required

Couverture.Niveau Coverage.Level A keyword which identifies the level of the spatial coverage of the resource:

– international
– national
– provincial
– regional
– municipal
– local

Required

Sujet.ISO Subject.ISO A keyword which allows us to link the resource to one of the ISO categories of geospatial data.

– agriculture / farming
– biota / biota
– limites administratives / boundaries
– climatologie / climatology
– économie / economy
– élévation / elevation
– environnement / environment
– information géoscientifique / geoscientific information
– santé / health
– imagerie / imagery
– intelligence / intelligence (military)
– eaux intérieures / inland waters
– localisation / location
– océans / oceans
– urbanisme / planning
– société / society
– structure / structure
– transport / transportation
– services publics / utilities

See: https://geo-ide.noaa.gov/wiki/index.php?title=ISO_Topic_Categories

Required

Sujet Subject One or several keywords which can be used to categorise the resource.

Optional

Format Format The physical or, in this case, the digital manifestation of the resource, i.e. the MIME type of the document:

– shp
– kml
– kmz
– zip
– csv
– other formats used in GIS

Required

Langue Language The language of the intellectual content of the resource.
It is recommended to use a value defined in RFC 3066 [RFC3066] which, with the ISO 639 [ISO639] standard, defines 2 letter primary language codes, as well as optional subcodes.
Examples:
– en
– fr
Required
Type de ressource Type Type of content.
By default, the resources identified as part of this project are of the dataset type.
Required
Droits.Licence Rights.License Brief indication of the type of licence which applies to the data:

– copyright
– CC (or one of its variations)
– public domain
– open

Required

Droits.Accessibilité Rights.Access One of the following terms, which allows us to identify how the data can be accessed:

– free
– one time payment
– free subscription
– paid subscription

Required

Droits.Conditions d’utilisation Rights.Terms of use Text copied and pasted from the web site where the data is deposited to specify the creators’ terms of use.

Optional

Source Source Location from which a user can obtain the resource. This will generally be a URL. A Source.URI field could be added should it become pertinent.

Required

Relation Relation Link to other resources. A resource can be derived from another or can be associated with another as part of a project.
Examples:
isPartOf [other resource number]
isChildOf [other resource number]
isDerivedFrom [other resource number]
Optional
Éditeur Publisher Name of the person, organisation or service which published the document.

Optional

Commentaire Comment Any additional information which can help users better understand the resource.

Optional
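As a rough illustration of how a submitted record could be checked against Table 1, the sketch below tests that every field marked "Required" is filled in and that three of the closed keyword lists are respected. It is not the project's own validation code: the names `REQUIRED_FIELDS`, `VOCABULARIES` and `validate` are hypothetical, the keyword lists use the English terms in lowercase, and the sample record is invented for the example. Looser lists such as Rights.License ("CC or one of its variations") and Subject.ISO are deliberately left out.

```python
# Fields marked "Required" in Table 1 (English element names).
REQUIRED_FIELDS = [
    "Title", "Description.Project-type", "Description.Methodology",
    "Date.Published", "Coverage.Time", "Coverage.Space", "Coverage.Level",
    "Subject.ISO", "Format", "Language", "Type",
    "Rights.License", "Rights.Access", "Source",
]

# Closed keyword lists from Table 1, stored in lowercase for comparison.
VOCABULARIES = {
    "Description.Project-type": {
        "governmental", "ngo", "academic",
        "individual", "commercial", "collaborative",
    },
    "Coverage.Level": {
        "international", "national", "provincial",
        "regional", "municipal", "local",
    },
    "Rights.Access": {
        "free", "one time payment", "free subscription", "paid subscription",
    },
}


def validate(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    # Every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing required field: {field}")
    # Keyword fields, when filled in, must use one of the listed terms.
    for field, allowed in VOCABULARIES.items():
        value = str(record.get(field, "")).strip()
        if value and value.lower() not in allowed:
            problems.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return problems


# Hypothetical, deliberately incomplete record used only for illustration.
example = {
    "Title": "[Example historical railway lines]",
    "Description.Project-type": "individual",
    "Coverage.Level": "continental",  # not in the Coverage.Level list
    "Format": "shp",
    "Language": "en",
}

for problem in validate(example):
    print(problem)
```

Returning a list of problems rather than raising on the first error lets a contributor see everything that needs fixing in one pass, which fits the goal of not discouraging data creators who are unfamiliar with metadata.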

 

A list of identified resources is available here: http://bit.ly/2rlIkRC. Some of the records are incomplete and we are working on completing them. If you would like to propose a dataset, you can fill out the form available here: http://geohist.ca/donnees-sigh-hgis-data-form

1  DEVILLERS, Rodolphe (2004). « Conception d’un système multidimensionnel d’information sur la qualité des données géospatiales », [En ligne], Ph. D., Université Laval <http://theses.ulaval.ca/archimede/fichiers/22242/22242.html>.

2  Collaborateurs de Wikipédia (2016). « Dublin Core » <https://fr.wikipedia.org/wiki/Dublin_Core#Liste_des_.C3.A9l.C3.A9ments_et_raffinements>.

3  Bibliothèque nationale de France, Direction des Services et des Réseaux, Département de l’Information bibliographique et numérique (2008). « Guide d’utilisation du Dublin Core (DC) à la BnF : Dublin Core simple et Dublin Core qualifié, avec indications pour utiliser le profil d’application de TEL », version 2.0 <http://www.bnf.fr/documents/guide_dublin_core_bnf_2008.pdf>.
