26 items found

Tags: Text mining

  • Method

    Private Cybersecurity NER BERT-base-cased model

    This method includes a Python script and files of a BERT-base-cased model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that...
  • Dataset

    Supporting data for "CoVEffect: Interactive System for Mining the Effects of ...

    This repository contains the datasets created and extracted for the paper: Giuseppe Serna García, Ruba Al Khalaf, Francesco Invernici, Stefano Ceri, and Anna Bernasconi. 2022....
  • Dataset

    Private Cybersecurity NER dataset

    Our dataset was created by merging the APTNER and CyNER datasets and contains 13,601 sentences, 347,779 tokens, and 37,684 entities. The split ratio was roughly 70% for training and...
  • Method

    Cybersecurity NER SecureBERT model

    This method includes a Python script and files of a SecureBERT model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that will be...
    • JSON: config
    • TXT: merges
    • BIN: model
    • JSON: model_args
    • ZIP: optimizer
    • ZIP: scheduler
    • JSON: special_tokens_map
    • JSON: tokenizer
    • JSON: tokenizer_config
    • ZIP: training_args
    • TXT: vocab
    • PY: inference
  • Method

    Cybersecurity NER RoBERTa-base model

    This method includes a Python script and files of a RoBERTa-base model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that will...
    • JSON: config
    • TXT: merges
    • BIN: model
    • JSON: model_args
    • ZIP: scheduler
    • JSON: special_tokens_map
    • JSON: tokenizer_config
    • ZIP: training_args
    • JSON: tokenizer
    • JSON: vocab
    • ZIP: optimizer
    • PY: inference
  • Dataset

    DNA 12-mers

    A 179 MB dataset containing all the ~14M unique 12-mers in the DNA sequences available in the Pizza&Chili Corpus (https://pizzachili.dcc.uchile.cl/texts.html). This dataset...
    • ZIP: DNA 12-mers
  • Dataset

    Lexical networks from Croatian news articles

    The dataset includes lexical networks centered on keywords related to migration. The networks are built starting from Croatian news articles extracted from the dataset...
    • JSONL: croatian_egoNet_w4
  • Dataset

    Semantic Networks from news articles (Portuguese sample)

    The Semantic Networks from news articles (Portuguese sample) contains semantic networks for a sample of migration-related news articles extracted from the dataset described in...
    • CSV: Portuguese_sampleNet_anonym ...
  • Dataset

    Lexical networks from Swedish news articles

    The dataset includes lexical networks centered on keywords related to migration. The networks are built starting from Swedish news articles extracted from the dataset described...
    • JSONL: swedish_egoNet_w4
  • Dataset

    Semantic Networks from news articles (French sample)

    The Semantic Networks from news articles (French sample) contains semantic networks for a sample of migration-related news articles extracted from the dataset described in...
    • CSV: Semantic Networks from ...
  • Dataset

    Semantic Networks from news articles (Spanish sample)

    The Semantic Networks from news articles (Spanish sample) contains semantic networks for a sample of migration-related news articles extracted from the dataset described in...
    • CSV: Spanish_sampleNet_anonymized
  • Dataset

    Synthetic Datasets for Fine-Grained Fairness Analysis of Abusive Language Det...

    Three synthetic datasets covering different types of bias grouped by target, namely sexism, racism and ableism. The reason for distinguishing the records by abuse targets is...
    • CSV: Synthetic Datasets for ...
  • Dataset

    Semantic Networks from news articles (Romanian sample)

    The Semantic Networks from news articles (Romanian sample) contains semantic networks for a sample of migration-related news articles extracted from the dataset described in...
    • CSV: Romanian_sampleNet_anonymized
  • Dataset

    Private Italian Thesaurus for Tourism domain

    An Italian thesaurus for the tourism domain, comprising 2,684 concepts organized according to semantic relationships (equivalence, hierarchical and associative). The...
  • Dataset

    Lexical networks from Polish news articles

    The dataset includes lexical networks centered on keywords related to migration. The networks are built starting from Polish news articles extracted from the dataset described...
    • JSONL: polish_egoNet_w4
  • Dataset

    Lexical networks from Finnish news articles

    The dataset includes lexical networks centered on keywords related to migration. The networks are built starting from Finnish news articles extracted from the dataset...
    • JSONL: finnish_egoNet_w4
  • Dataset

    Santorini Tweets July-August 2021

    This dataset contains 225,501 tweets written by 141,277 users. These tweets are geolocated in Santorini, or they contain the word or the hashtag "santorini" in the text. They...
    • ZIP: tweet_santorini.csv
  • Dataset

    Lexical networks from Lithuanian news articles

    The dataset includes lexical networks centered on keywords related to migration. The networks are built starting from Lithuanian news articles extracted from the dataset...
    • JSONL: lithuanian_egoNet_w4
  • Dataset

    FANCY Dataset

    (NLI) FANCY (FActivity, Negation, Common-sense, hYpernimy) is a new dataset with 4000 sentence pairs concerning complex linguistic phenomena such as factivity, negation,...
  • Dataset

    Semantic Networks from news articles (Danish sample)

    The Semantic Networks from news articles (Danish sample) contains semantic networks for a sample of migration-related news articles extracted from the dataset described in...
    • CSV: Danish_sampleNet_anonymized