Researchers who study Canada have been generating large quantities of geohistorical data for many years. As we reflect on the creation of a national geohistorical infrastructure, it is pertinent to identify datasets at different scales which could become part of such a portal. We are therefore trying to enhance the discoverability of existing and available datasets. Although it would be preferable, in the long run, to enumerate and describe each layer and each attribute table, it is not necessary for the moment to go into that level of detail. At this stage, we hope to identify collections which have emerged from different research projects or from the online deposit of previously georeferenced digital data, such as:
- raster geographic maps
- aerial photographs
- vector layers
- attribute data linked to vector layers
We have already identified datasets offered by different types of creators, in order to show the diversity in the nature and type of data that may interest researchers. These include:
- quality international data (FAO)
- data from collaborative mapping projects (OpenStreetMap, Natural Earth)
- data available on GIS company web sites (ESRI)
- national data (government of Canada, Géogratis)
- provincial or territorial data (British Columbia, Yukon, Québec, Nova Scotia, Prince Edward Island, New Brunswick)
- municipal data (Toronto, Montréal, Sherbrooke)
- research team data (CIEQ, NICHE, LHPM, MAP, VIHistory)
- data from map library and archive centres (Scholars’ Geoportal, MADGIC, GéoIndex+)
- personal initiative data (historical railway lines)
Choosing which metadata to associate with each dataset has meant reaching a compromise. An insufficient level of detail would prevent effective searches, while requirements for overly detailed metadata could discourage data creators who are not trained to create metadata that meets international standards. According to Rodolphe Devillers, six criteria can be used to define the quality of a geospatial dataset [1].
i. Definition: Allows the user to evaluate whether the nature of a datum and of the object it describes, i.e. the “what”, meets his or her requirements (semantic, spatial and temporal definitions);
ii. Coverage: Allows the user to evaluate whether the territory and the period for which the data exists, i.e. the “where” and the “when”, meet his or her requirements;
iii. Genealogy: Allows the user to know where the data came from, the project’s objectives when the data was acquired and the methods used to obtain the data, i.e. the “how” and the “why”, and to verify whether this meets his or her requirements;
iv. Precision: Allows the user to evaluate the data’s worth and whether it is acceptable for his or her requirements (semantic, temporal and spatial precision of the object and of its attributes);
v. Legitimacy: Allows the user to evaluate the official recognition and the legal standing of the data and whether it meets his or her requirements (de facto standards, recognised good practices, legal or administrative recognition by an official agency, legal guarantee by a supplier, etc.);
vi. Accessibility: Allows the user to evaluate how easily he or she can obtain the data (cost, delays, format, privacy, respect of recognised good practices, copyright, etc.).
A metadata standard that met all of these criteria could seem overwhelming to many people who would like to make their data available. We therefore propose to use the format defined by the Dublin Core Metadata Initiative (DCMI), an international standard whose field types are easier to understand for people less familiar with metadata. We have applied and interpreted the DCMI based on its general definition available on Wikipedia [2] and on the interpretation of a few fields proposed by the Bibliothèque nationale de France [3]. This approach can certainly be criticised, because it is geared towards simple application rather than perfection. Depending on how metadata is entered in this list, we can refine these principles to improve the compromise. The fields do not appear in the same order as in the DCMI, and some are subdivided to provide a slightly finer level of granularity.
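As a rough illustration of what this subdivision implies, a subdivided field name such as `Description.Methodology` can be reduced to its parent Dublin Core element by dropping the part after the dot. The sketch below is only one possible way to do this in Python: the helper name `base_element` is hypothetical, and the set lists the fifteen elements of simple Dublin Core.

```python
from typing import Optional

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def base_element(field_name: str) -> Optional[str]:
    """Return the Dublin Core element that a subdivided field belongs to.

    Hypothetical helper: assumes the "Element.Refinement" naming pattern
    (for example "Coverage.Time"). Returns None when the part before the
    dot is not a simple Dublin Core element.
    """
    base = field_name.split(".", 1)[0].strip().lower()
    return base if base in DC_ELEMENTS else None


# Print the parent element for a few field names used in this project.
for name in ("Description.Methodology", "Coverage.Time", "Rights.License", "Comment"):
    print(name, "->", base_element(name))
```

A field such as `Comment`, which has no Dublin Core counterpart, returns `None` and would need to be handled separately.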
Table 1. List of fields used to describe datasets
Element (French) | Element (English) | Comment | Status |
---|---|---|---|
Créateur | Creator | The main entity responsible for creating the content of the resource. It can be the name of one or more people, an organisation, or a service. Format: Last name, First name. Separate multiple entities with a semicolon. | Optional |
Contributeur | Contributor | Entity responsible for contributing to the content of the resource. It can be the name of one or more people, an organisation, or a service. Format: Last name, First name. Separate multiple entities with a semicolon. | Optional |
Titre | Title | Name given to the resource. The title is generally the formal name under which the resource is known. Give the title in the original language of the resource. If the resource does not have a formal title and the title is derived from the content, place the title between square brackets. | Required |
Description.Générale | Description.General | A presentation of the content of the resource. Descriptions are generally free-form text. As far as possible, use the description provided by the creators of the resource. | Optional |
Description.Nature-du-projet | Description.Project-type | A keyword which allows us to categorise projects according to the following typology: – gouvernemental | Required |
Description.Méthodologie | Description.Methodology | Free-form text which describes the process used to create the resource. | Required |
Description.Sources | Description.Sources | List of documents which were used to create the resource. This field is different from the Source field, which identifies where a user can acquire the resource. | Optional |
Description.Champs | Description.Fields | List of fields used in the table or database, preferably with a description. | Optional |
Date.Publication | Date.Published | Date on which the resource was originally created. This is not necessarily the date represented by the resource. | Required |
Date.Mise-à-jour | Date.Updated | Date of an update event in the life cycle of the resource. | Optional |
Couverture.Temps | Coverage.Time | Perimeter or domain of the resource, in this case the date, year or period represented by the resource. | Required |
Couverture.Espace | Coverage.Space | Perimeter or domain of the resource, in this case the territory. It is recommended to use a value from a controlled vocabulary. | Required |
Couverture.Niveau | Coverage.Level | A keyword which identifies the level of the spatial coverage of the resource: – international | Required |
Sujet.ISO | Subject.ISO | A keyword which allows us to link the resource to one of the ISO categories of geospatial data: – agriculture / farming. See: https://geo-ide.noaa.gov/wiki/index.php?title=ISO_Topic_Categories | Required |
Sujet | Subject | One or several keywords which can be used to categorise the resource. | Optional |
Format | Format | The physical or, in this case, digital manifestation of the resource, i.e. the MIME type of the document: – shp | Required |
Langue | Language | The language of the intellectual content of the resource. It is recommended to use a value defined in RFC 3066 [RFC3066] which, with the ISO 639 [ISO639] standard, defines two-letter primary language codes, as well as optional subcodes. Examples: en, fr | Required |
Type de ressource | Type | Type of content. By default, the resources identified as part of this project are of the dataset type. | Required |
Droits.Licence | Rights.License | Brief indication of the type of licence which applies to the data: – copyright | Required |
Droits.Accessibilité | Rights.Access | One of the following terms identifies how the data can be accessed: – free | Required |
Droits.Conditions d’utilisation | Rights.Terms of use | Text copied and pasted from the website where the data is deposited, specifying the creators’ terms of use. | Optional |
Source | Source | Location from which a user can obtain the resource. This will generally be a URL. A Source.URI could be added should it become pertinent. | Required |
Relation | Relation | Link to other resources. A resource can be derived from another or can be associated with another as part of a project. Examples: isPartOf [other resource number]; isChildOf [other resource number]; isDerivedFrom [other resource number] | Optional |
Éditeur | Publisher | Name of the person, organisation or service which published the document. | Optional |
Commentaire | Comment | Any additional information which can help users better understand the resource. | Optional |
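To make the required and optional distinction in Table 1 concrete, here is a minimal Python sketch that checks a record for missing required fields and splits a semicolon-separated Creator or Contributor value. It is not part of the project's tooling: the function names `missing_required` and `split_entities`, the dictionary layout and every value in the example record (including the example.org URL) are hypothetical and serve only as illustration.

```python
# Fields marked "Required" in Table 1, using the English field names.
REQUIRED_FIELDS = (
    "Title",
    "Description.Project-type",
    "Description.Methodology",
    "Date.Published",
    "Coverage.Time",
    "Coverage.Space",
    "Coverage.Level",
    "Subject.ISO",
    "Format",
    "Language",
    "Type",
    "Rights.License",
    "Rights.Access",
    "Source",
)


def missing_required(record: dict) -> list:
    """Return the required field names that are absent or blank in a record."""
    return [
        name for name in REQUIRED_FIELDS
        if not (record.get(name) or "").strip()
    ]


def split_entities(value: str) -> list:
    """Split a Creator or Contributor value on semicolons.

    Each entity keeps its "Last name, First name" form.
    """
    return [part.strip() for part in value.split(";") if part.strip()]


# Hypothetical record: all values are placeholders, not a real dataset.
example_record = {
    "Creator": "Doe, Jane; Roe, Richard",
    "Title": "[Example historical railway lines]",
    "Description.Project-type": "gouvernemental",
    "Description.Methodology": "Placeholder description of the method.",
    "Date.Published": "2017",
    "Coverage.Time": "1900-1950",
    "Coverage.Space": "North America",
    "Coverage.Level": "international",
    "Subject.ISO": "farming",
    "Format": "shp",
    "Language": "en",
    "Type": "dataset",
    "Rights.License": "copyright",
    "Rights.Access": "free",
    "Source": "https://example.org/dataset",
}

print("Missing required fields:", missing_required(example_record))
print("Creators:", split_entities(example_record["Creator"]))
```

A check of this kind only confirms that required fields are filled in. It does not verify that values come from the expected vocabularies, which would need the complete keyword lists.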
A list of identified resources is available here: http://bit.ly/2rlIkRC. Some of the notices are incomplete and we are working on completing them. If you would like to propose a dataset, you can fill out the form available here: http://geohist.ca/donnees-sigh-hgis-data-form
[1] DEVILLERS, Rodolphe (2004). « Conception d’un système multidimensionnel d’information sur la qualité des données géospatiales », [Online], Ph.D. thesis, Université Laval <http://theses.ulaval.ca/archimede/fichiers/22242/22242.html>.
[2] Wikipedia contributors (2016). « Dublin Core » <https://fr.wikipedia.org/wiki/Dublin_Core#Liste_des_.C3.A9l.C3.A9ments_et_raffinements>.
[3] Bibliothèque nationale de France, Direction des Services et des Réseaux, Département de l’Information bibliographique et numérique (2008). « Guide d’utilisation du Dublin Core (DC) à la BnF : Dublin Core simple et Dublin Core qualifié, avec indications pour utiliser le profil d’application de TEL », version 2.0 <http://www.bnf.fr/documents/guide_dublin_core_bnf_2008.pdf>.