-
Private Cybersecurity NER BERT-base-cased model
This method includes a Python script and files of a BERT-base-cased model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that...
-
Private Cybersecurity NER dataset
Our dataset was created by merging the APTNER and CyNER datasets and contains 13,601 sentences, 347,779 tokens, and 37,684 entities. The split ratio was roughly 70% for training and...
-
Cybersecurity NER SecureBERT model
This method includes a Python script and files of a SecureBERT model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that will be...
Resources (login required): config (JSON), merges (TXT), model (BIN), model_args (JSON), optimizer (ZIP), scheduler (ZIP), special_tokens_map (JSON), tokenizer (JSON), tokenizer_config (JSON), training_args (ZIP), vocab (TXT), inference (text/x-python)
-
Cybersecurity NER RoBERTa-base model
This method includes a Python script and files of a RoBERTa-base model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that will...
Resources (login required): config (JSON), merges (TXT), model (BIN), model_args (JSON), scheduler (ZIP), special_tokens_map (JSON), tokenizer_config (JSON), training_args (ZIP), tokenizer (JSON), vocab (JSON), optimizer (ZIP), inference (py)
-
Private Dynamical Linear Upper Confidence Bound (DynLin-UCB)
The repository contains the code to run DynLin-UCB (Dynamical Linear Upper Confidence Bound). DynLin-UCB is an optimistic regret-minimization algorithm that can be used to...
-
Spotify Tracks Dataset (full)
The dataset was created using the Spotify API and the track IDs provided by the authors of https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset.... The...
-
Private Smart Cities Weather and Pollution conditions
A set of weather and climatic conditions gathered during the Toolsmart PoN project (Open Community PA 2020 – Pon Governance 2014-2020). Data are obtained from IoT based...
-
Spotify track dataset (small)
The dataset was created using the Spotify API and the track IDs provided by the authors of https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset.... The...
Resources (login required): std_small (ZIP)
-
GiveMeSomeCreditSC
The GiveMeSomeCredit dataset (https://www.kaggle.com/c/GiveMeSomeCredit) contains various features of borrowers. The task is to predict the financial distress of a...
Resources (login required): GiveMeSomeCreditSC (ZIP)
-
Santorini Tweets July-August 2021
This dataset contains 225,501 tweets written by 141,277 users. The tweets are either geolocated in Santorini or contain the word or hashtag "santorini" in the text. They...
Resources (login required): tweet_santorini.csv (ZIP)
-
Air Quality Datasets over L'Aquila Region
These datasets have been collected through ESA, CeTEMPS and ARTA. They are a work-in-progress deliverable of a virtual laboratory (VL-Disaster) in the context of SoBigData.
-
HANSEN: Spoken Text Authorship Analysis
HANSEN encompasses meticulous curation of existing speech datasets accompanied by transcripts, alongside the creation of novel AI-generated spoken text datasets....
-
UK election abuse data
The GATE team (gate.ac.uk) at the University of Sheffield have collected 1.4 million tweets sent to and by UK members of parliament in the months leading up to the 2015 and...
Resources (login required): uk-election-abuse.tar.gz (XLS)
-
Physical activity, quality of sleep, and quality of life in Italy: the long t...
From March 2020 to May 2021, several lockdown periods caused by the COVID-19 pandemic limited, with varying degrees of severity, people’s usual activities and mobility in...
Resources (login required): dataset and code (ZIP)
-
Resources (login required): Churn Dataset (CSV)
-
Articles and comments of major Estonian newspapers
The dataset contains articles and comments from four major Estonian news portals, from the early 2000s to 2016.
-
ClueWeb12
The ClueWeb12 dataset consists of 733,019,372 English web pages, collected between February 10, 2012 and May 10, 2012. It was created to support research on information...
-
Medical Dataset
The medical dataset contains a corpus of fully anonymized clinical text. Each document in the corpus is associated with a set of ICD-9 codes which represents the diagnosis...
Resources (login required): Medical Dataset (ZIP)
-
Retail market dataset
The dataset contains purchases by Unicoop Tirreno customers, along with descriptions of and information about the shops (both small shops and supermarkets) and the customers.
-
Global Peace Index data
A dataset of the Global Peace Index (GPI), which ranks 163 independent states and territories according to their level of peacefulness. The GPI covers 99.7 per cent of the...