228 items found

Organisations: SoBigData Catalogue

  • Dataset

    Broad Twitter Corpus

    The Broad Twitter Corpus is a named entity-annotated dataset of tweets, collected in order to capture temporal, spatial and social diversity. The goal of the corpus is to...
    • JSON
  • Method

    Gene-specific regularization for COPD partial-correlation estimation

    We introduce a gene-specific regularization factor when computing the Partial Correlation score to make the indeterminate regression feasible. We decided to slightly modify...
  • Dataset

    Estonian public sector electronic services and service providers and consumers

    The dataset contains records of electronic services (aka X-Road services), service providers and consumers harvested in April 2014 from RIHA (https://riha.eesti.ee). The data...
  • Dataset

    Twitter fake followers

    Fake followers are fake accounts created en masse to follow a target account; they can be bought from online markets. In other words, their goal is to increase the...
  • Dataset

    Disease Twitter Dataset

    This Twitter dataset covers two recent outbreaks: Ebola and Zika. About 60 million tweets were collected through query-based access to the Twitter Streaming API, covering...
  • Dataset

    e-MID interbank transactions

    This dataset is an edgelist containing daily interbank transactions as registered in the electronic Market for Interbank Deposits (e-MID) in the period 2010–2014. e-MID is...
  • Method

    EpiCID: A framework for discovering interactions between SNPs

    Epistatic interactions (EIs) of gene loci often determine complex trait phenotypes. EIs may indicate the underlying molecular mechanisms of multifactorial traits and diseases....
  • Dataset

    GeoLife - GPS trajectories dataset

    This entry links to a GPS trajectory dataset collected in the Microsoft Research Asia GeoLife project by 182 users over a period of more than three years (from April 2007 to August 2012)....
    • ZIP
  • Dataset

    Russell 3000 stock prices

    This dataset contains the price and volume of the 3000 stocks belonging to the Russell 3000 Index, roughly corresponding to the 3000 stocks with the largest market capitalization. Traded volume and...
  • Dataset

    .ee Web archive

    A .ee web archive consisting of snapshots from 2015.
  • Dataset

    Mobility index for local quarantines in Chile

    To fight the COVID-19 pandemic, most countries have implemented non-pharmaceutical interventions such as mask wearing, physical distancing, lockdowns, and travel restrictions....
    • CSV
  • Dataset

    GPS Tracks - Tuscany 2011

    This dataset contains GPS trajectories of private vehicles crossing the region of Tuscany in Italy. It comprises about 11 million trips made by 150,000 users, collected in May...
  • Dataset

    Twitter dataset about two premier UK music festivals

    The dataset contains Twitter posts about two premier UK music festivals: Creamfields 2016 (August 25th-28th) and VFestival 2016 (August 20th-21st).
    • Github
  • Dataset

    Food consumption data at the canteens of University of Pisa

    A dataset storing all the meals consumed by students at the canteens of the University of Pisa over a six-year period.
  • Dataset

    Retail Market Data

    This dataset contains retail market data on food products, from 2007, for about 130 shops of an Italian distribution chain. The data cover about 1 million active clients, and...
  • Dataset

    Compas

    The COMPAS dataset contains the features used by the COMPAS algorithm to score defendants by risk (Low, Medium, High) for over 4,000 individuals. We considered...
    • CSV
  • Method

    Twitter preprocessor

    Tokeniser, lemmatiser, and negation extraction. Under development.
    • XLSX
  • Experiment

    Micro Project Experiments: Academic Migration and Academic Networks

    Experiments and results material for the micro project titled "Academic Migration and Academic Networks: Evidence from Scholarly Big Data and the Iron Curtain".
    • HTML
  • Dataset

    Micro Project Datasets: Academic Migration and Academic Networks

    Datasets used in and produced by the micro project titled "Academic Migration and Academic Networks: Evidence from Scholarly Big Data and the Iron Curtain".
    • HTML