-
Private Cybersecurity NER BERT-base-cased model
This method includes a Python script and files of a BERT-base-cased model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that... -
Cybersecurity NER RoBERTa-base model
This method includes a Python script and files of a RoBERTa-base model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that will...
Resources: config (JSON), merges (TXT), model (BIN), model_args (JSON), scheduler (ZIP), special_tokens_map (JSON), tokenizer_config (JSON), training_args (ZIP), tokenizer (JSON), vocab (JSON), optimizer (ZIP), inference (py)
-
Santorini Tweets July-August 2021
This dataset contains 225,501 tweets written by 141,277 users. These tweets are geolocated in Santorini, or they contain the word or the hashtag "santorini" in the text. They...
Resources: tweet_santorini.csv (ZIP)
-
FANCY Dataset
(NLI) FANCY (FActivity, Negation, Common-sense, hYpernimy) is a new dataset with 4000 sentence pairs concerning complex linguistic phenomena such as factivity, negation,... -
SWH Filenames
A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...
Resources: SWH Filenames (ZIP)
-
Quantum Distance-Based Classifier
The Quantum Distance-Based Classifier is a technique inspired by the classical k-Nearest Neighbors that leverages quantum properties to perform prediction. -
CLiQS
CLiQS is a Python software package for summarizing social media texts with a diversified approach. -
Private Distributed W2V
Accelerated training of Word Embeddings for large text corpora. Creates a word2vec-model from an input corpus of tokenized texts through the use of parallel distributed... -
Conversational search dataset with labels
The CAsT 2019 data is split into two files, one for training and one for testing. - Training set: CAsT 2019 conversations from training set and from test set without... -
Dataset for Evaluating Abstractive Summaries of Crisis-Related Social Media
This dataset was created to evaluate summaries generated from social media posts published during five natural disasters. The dataset contains: ground truth reports created by human... -
WIRE dataset
This dataset consists of 503 pairs of Wikipedia entities drawn from the New York Times dataset with a human-assigned relatedness score. The domain experts based their... -
Ariadne English Dendrochronology Entity Recognizer
Identifies terms and phrases in English for analysing archaeological text. The method delivers named entities of archaeological elements, wood material, sample, and date, with...
Resources: Method Engine (method-engine)
-
Ariadne Dutch Dendrochronology Entity Recognizer
Identifies terms and phrases in Dutch for analysing archaeological text. The method delivers named entities of archaeological elements, wood material, sample, and date, with...
Resources: Method Engine (method-engine)
-
Amazon reviews
This (link to the) dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014. This dataset includes reviews...
Resources: Julian McAuley's repository (HTML)
-
Cross-Lingual Dataset of Crisis-Related Social Media
If you use this dataset, please cite the following paper: Fedor Vitiugin, Carlos Castillo: Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive... -
Dictionary creator
This tool creates a dictionary with inverse document frequency (idf) values from the Google NGrams dataset. -
GATE Cloud Chemical Entity Recogniser
This service annotates chemical named entities using the open source OSCAR4 tagger. As well as the names of the detected entities the tagger also returns their structure in...
Resources: Method Engine (method-engine)
-
Ariadne Swedish Archaeology Named Entity Recognizer
Identifies terms and phrases in Swedish for analysing archaeological text. The method delivers named entities of archaeological context, physical object, material, time...
Resources: Method Engine (method-engine)
-
The Italian Music Dataset
The dataset is built by exploiting the Spotify and SoundCloud APIs. It is composed of over 14,500 different songs of both famous and less famous Italian musicians. Each song...
Resources: Dataset (JSON)
-
Ariadne Swedish Dendrochronology Entity Recognizer
Identifies terms and phrases in Swedish for analysing archaeological text. The method delivers named entities of archaeological elements, wood material, sample, and date, with...
Resources: Method Engine (method-engine)