245 items found

Groups: sobigdata-eu

  • Dataset

    Multilevel Monitoring of Activity and Sleep in Healthy people

    The Multilevel Monitoring of Activity and Sleep in Healthy people (MMASH) dataset provides 24 hours of continuous beat-to-beat heart data, triaxial accelerometer data, sleep...
  • Dataset

    GPS Origin Destination Matrix in Tuscany

    This dataset is the origin-destination matrix among the municipalities of Tuscany, extracted from GPS tracks of private vehicles collected from 2014-02-10 to...
    • CSV
  • Dataset

    Soccer Events

    This dataset contains data on one full season of soccer games. For each player it records the locations visited (positions on the pitch) and all the events they generated...
    • ZIP
  • TrainingMaterial

    Introduction to Data Curation

    This course is an introduction to data collection, data preparation and transformation, and data analysis. It contains the essential concepts for a researcher in order to...
    • PDF
  • Dataset

    Social Network dataset - LiveJournal

    LiveJournal is a free online blogging community where users declare friendship with each other. LiveJournal also allows users to form groups that other members can then join. We...
    • HTML
  • Dataset

    Call Data Record District of Pisa 2013 October

    The dataset contains mobile phone records collected in the provinces of Pisa, Lucca, Livorno and Firenze in October 2013. It contains about 60 million Call Data Records (CDR),...
  • Dataset

    ClueWeb09

    The ClueWeb09 dataset consists of about 1 billion web pages in ten languages that were collected in January and February 2009. It was created to support research on...
  • Dataset

    Official administrative information of Tuscany

    The data contains the spatial partitioning of Tuscany and some statistical information published by the Italian Statistical Bureau.
    • LOD
  • Method

    A hybrid approach for PPI

    We propose a new framework that can exploit topological and biological information to predict protein-protein interactions. The algorithm relies on the underlying hypothesis...
  • Dataset

    German Credit

    In the German Credit dataset, each of the 1,000 persons is classified as a good or bad creditor according to attributes such as age, sex, checking_account, credit_amount,...
    • CSV
  • Dataset

    Twitter Dumps

    The dataset consists of 10% of the daily stream of tweets produced on Twitter, filtered into three subsets: English, Italian, and geo-referenced. The tweets are a random sample of...
  • Dataset

    Open data from NervousNet

    This dataset contains anonymized proximity information sent by 154 mobile phones (both Android and iPhone) via phone apps. This information is sent by Bluetooth beacons every...
    • ZIP
  • Dataset

    Car sharing dataset

    The dataset comprises pickup and drop-off times and locations of vehicles in 10 European cities for one of the major free-floating car sharing operators. For nine of these...
  • Dataset

    Twitter social bots

    Spambots are automated accounts (i.e., accounts driven by a bot) that repeatedly advertise unsolicited and often harmful content (e.g., malware, URLs to phishing Web sites,...
  • Dataset

    Broad Twitter Corpus

    The Broad Twitter Corpus is a named entity-annotated dataset of tweets, collected in order to capture temporal, spatial and social diversity. The goal of the corpus is to...
    • JSON
  • Method

    Gene-specific regularization for COPD partial-correlation estimation

    We introduce a gene-specific regularization factor when computing the Partial Correlation score to make the indeterminate regression feasible. We decided to slightly modify...
  • Dataset

    Estonian public sector electronic services and service providers and consumers

    The dataset contains records of electronic services (aka X-Road services), service providers and consumers harvested in April 2014 from RIHA (https://riha.eesti.ee). The data...
  • Dataset

    Twitter fake followers

    Fake followers are fake accounts created in bulk to follow a target account, and they can be bought from online markets. In other words, their goal is to increase the...
  • Dataset

    Disease Twitter Dataset

    This Twitter dataset covers two recent outbreaks: Ebola and Zika. About 60 million tweets were collected through query-based access to the Twitter Streaming API, covering...
  • Dataset

    e-MID interbank transactions

    This dataset is an edge list containing daily interbank transactions as registered in the electronic Market for Interbank Deposits (e-MID), in the period 2010-2014. e-MID is...