ICES TCRE Virtual Research Environment
This VLab supported the ICES Training Course in the R Environment
(TCRENV), held in Copenhagen, Denmark, from 29 February to 4 March 2016.
The objective of the course was to provide participants with a solid
foundation in the efficient use of the R environment, using various typical and
familiar fisheries data sets (landings data, catch data, survey data and
tagging data) as case examples. Emphasis was put on data munging and
literate programming starting with 'raw' data (individual stations,
individual fish measurements) and culminating in the delivery of
publishable output produced from a single coded document file. This VRE
provides access to processing as well as to data preparation and sharing
facilities, including: an online R development environment (RStudio), data
mining algorithms, a tabular data management application, GIS maps of
environmental and biological data, access to biodiversity occurrence
records and taxonomic data from major data providers (e.g. OBIS, GBIF,
WoRMS, Catalogue of Life), state-of-the-art stock assessment models and
sharing tools (Workspace, Social Networking).

In addition to the basic functionalities, such as a workspace for sharing
objects of interest, a social networking area for supporting discussions
among members and a user management facility for managing membership, this
VRE is specifically equipped with the following capabilities:

Tabular Data Management: a facility
enabling users to import, curate and manage tabular data. This feature can
support data managers during the whole life cycle of data management from
data capture to publication and visualisation. It enables data managers to
import and transform datasets (CSV, SDMX, JSON) into tabular resources
(i.e. tabular data whose columns have proper types, possibly
referring to code lists) and reference datasets (code lists) representing
recognized value instances of the elements the dataset is about (e.g.,
species, zones, countries). This functionality guarantees that the tabular
resources are compliant with the defined types and code lists. Besides the
curation, the facility supports the analysis of the data by enabling a user
to perform operations like grouping and filtering, producing charts and GIS
maps (if the data have geographic features) and analysing the data via an R
environment as well as via the data analytics facilities. Finally, the
environment supports the publishing of tabular resources in the
infrastructure by equipping them with rich metadata so that such resources
can be used in other application contexts.

Data Analytics at Scale: a
facility enabling users to benefit from the offering of the DataMiner
service and interactively execute a large array of data analytics tasks on
datasets. The available algorithms range from data clustering and anomaly
detection methods (e.g. DBScan and KMeans) to algorithms for manipulating
datasets from a geospatial perspective (e.g. transforming FAO area codes into
latitude and longitude).

Species Data Discovery: a facility enabling users to
discover and manage species data products (occurrence data and taxonomic
data) from a number of heterogeneous providers (including GBIF and
speciesLink for occurrence data, and ASFIS, BrazilianFlora,
CatalogueOfLife, IRMNG, ITIS, NCBI, WoRDSS, WoRMS for taxonomic data) in a
seamless way. Once discovered, objects can be stored in the workspace for
future use.

Geospatial Data View: a facility enabling users to discover and
visualize GIS layers, e.g. species distribution maps, Sea Surface
Temperature, that have been generated and/or published. This facility
relies on the GeoExplorer portlet and makes it possible to effectively
exploit the generated maps and to compare and analyse the diverse
distributions through map overlay, transect production and value
inspection.
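
To illustrate what the Tabular Data Management facility means by a tabular resource that complies with defined column types and code lists, the idea can be sketched in a few lines of Python. This is only a minimal sketch and not the facility's actual implementation or API: the names `SPECIES_CODES`, `SCHEMA`, `code_from` and `validate`, as well as the sample rows, are hypothetical and chosen purely for illustration.

```python
import csv
import io

# Hypothetical code list of recognised species codes (illustrative values only).
SPECIES_CODES = {"COD", "HAD", "PLE"}


def code_from(code_list):
    """Build a converter that accepts only values present in the code list."""
    def convert(value):
        if value not in code_list:
            raise ValueError(f"{value!r} is not in the code list")
        return value
    return convert


# Hypothetical schema: column name -> converter raising ValueError on bad input.
SCHEMA = {
    "species": code_from(SPECIES_CODES),
    "year": int,
    "landings_t": float,
}


def validate(csv_text, schema):
    """Return (typed_rows, errors) for CSV text checked against the schema."""
    rows, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 because line 1 holds the header.
    for line_no, record in enumerate(reader, start=2):
        typed, row_errors = {}, []
        for column, convert in schema.items():
            try:
                typed[column] = convert(record[column])
            except (KeyError, TypeError, ValueError) as exc:
                row_errors.append((line_no, column, str(exc)))
        if row_errors:
            errors.extend(row_errors)
        else:
            rows.append(typed)
    return rows, errors


# Made-up sample data: one valid row, one unknown code, one badly typed year.
SAMPLE = (
    "species,year,landings_t\n"
    "COD,2015,12.5\n"
    "XXX,2015,3.0\n"
    "HAD,twenty,1.0\n"
)

rows, errors = validate(SAMPLE, SCHEMA)
for line_no, column, message in errors:
    print(f"line {line_no}, column {column}: {message}")
```

In this sketch only the first data row is accepted as typed data; the other two are reported with their line number and offending column, which mirrors the kind of curation feedback a data manager would need before publishing a resource.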