46 items found

Organisations: SoBigData Catalogue Groups: sobigdata-eu

Filter Results
  • Dataset

    ClueWeb09

    The ClueWeb09 dataset consists of about 1 billion web pages in ten languages that were collected in January and February 2009. It was created to support research on...
  • Dataset

    Twitter Dumps

    The dataset consists of the 10% of the daily stream of tweets produced on Twitter filtered into 3 subsets: English, Italian, geo-referenced. The tweets are a random sample of...
  • Dataset

    Twitter social bots

    Spambots are automated accounts (i.e., accounts driven by a bot) that repeatedly advertise unsolicited and often harmful content (e.g., malware, URLs to phishing Web sites,...
  • Dataset

    Broad Twitter Corpus

    The Broad Twitter Corpus is a named entity-annotated dataset of tweets, collected in order to capture temporal, spatial and social diversity. The goal of the corpus is to...
    • JSON
      The resource: 'Broad Twitter Corpus' is not accessible as guest user. You must login to access it!
  • Dataset

    Twitter fake followers

    Fake followers are fake accounts massively created to follow a target account and that can be bought from online markets. In other words, their goal is that of increasing the...
  • Dataset

    .ee Web archive

    .ee Web archive consisting of snapshots from 2015