-
ClueWeb12
The ClueWeb12 dataset consists of 733,019,372 English web pages, collected between February 10, 2012 and May 10, 2012. It was created to support research on information... -
Brexit Tweets Linked Domains
In this spreadsheet we share domains linked in the UK EU membership referendum tweet collection. Counts for links by leave voters and remain voters are given, enabling sites...-
ODS
-
Brexit Twitter User Vote Intent
A list of users for whom vote intent in the UK EU membership referendum has been established. -
Twitter Newcomers Dataset
Twitter accounts detected right after registration and monitored for 21 days-
ZIP
-
Sheffield NERD Tweet Corpus
The dataset contains 794 tweets annotated with named entities disambiguated against DBpedia, and split into equally sized training and test portions. 400 tweets from 2013 comes...-
FINF
-
DE webarchive
The dataset consists of all the content from the .de top level domain as crawled by the Internet Archive.-
HTML
-
UK General Election Vote Intent
A list of Twitter users for whom party political allegiance/vote intent has been established. -
Facebook Wallpost
Online interactions between users via the wall feature in the New Orleans regional network.-
HTML
-
Twitter Dataset 2013-2014
The dataset was collected by the Archive team through the Twitter Streaming API which provides free access to 1% of public tweets. The covered time period is from January 1st... -
Social Network dataset - LiveJournal
LiveJournal is a free online blogging community where users declare friendship with each other. LiveJournal also allows users to form a group which other members can then join. We...-
HTML
-
ClueWeb09
The ClueWeb09 dataset consists of about 1 billion web pages in ten languages that were collected in January and February 2009. It was created to support research on... -
Twitter Dumps
The dataset consists of 10% of the daily stream of tweets produced on Twitter, filtered into three subsets: English, Italian, and geo-referenced. The tweets are a random sample of... -
Twitter social bots
Spambots are automated accounts (i.e., accounts driven by a bot) that repeatedly advertise unsolicited and often harmful content (e.g., malware, URLs to phishing Web sites,... -
Broad Twitter Corpus
The Broad Twitter Corpus is a named entity-annotated dataset of tweets, collected in order to capture temporal, spatial and social diversity. The goal of the corpus is to...-
JSON
-
Twitter fake followers
Fake followers are fake accounts created in bulk to follow a target account, and they can be bought from online markets. In other words, their goal is to increase the... -
Measurement Expression Annotator
Annotates numbers and measurement expressions in text. This method recognises many types of measurements including length, temperature, time and speed, and calculates their...-
method-engine
-
Digital DNA fingerprinting
The "Digital DNA fingerprinting" is a spambot detection technique based on the "Digital DNA" online behavioral modeling technique. Given a set of Twitter user timelines, it is... -
SWAT
SWAT is an entity-salience system which identifies on the fly the semantic focus of a document, expressed by its Salient Wikipedia Entities. The core of this technology is... -
Twitter Opinion Mining English
This tool recognises opinionated sentences in English tweets and classifies them as positive or negative. It also indicates emotion type, author and target of the opinion,...-
method-engine