22 items found

Tags: Web data

  • Method

    Cybersecurity NER BERT-base-cased model (private; access required)

    This method includes a Python script and files of a BERT-base-cased model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that...
  • Method

    Cybersecurity NER RoBERTa-base model

    This method includes a Python script and files of a RoBERTa-base model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that will...
    • JSON: config
    • TXT: merges
    • BIN: model
    • JSON: model_args
    • ZIP: scheduler
    • JSON: special_tokens_map
    • JSON: tokenizer_config
    • ZIP: training_args
    • JSON: tokenizer
    • JSON: vocab
    • ZIP: optimizer
    • PY: inference
  • Dataset

    Spotify Tracks Dataset (full)

    The dataset was created using the Spotify API and the track IDs provided by the authors of https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset.... The...
    • std_full
  • Dataset

    Smart Cities Weather and Pollution conditions (private; access required)

    A set of weather and climatic conditions gathered during the Toolsmart PoN project (Open Community PA 2020 – Pon Governance 2014-2020). Data are obtained from IoT based...
  • Dataset

    Spotify track dataset (small)

    The dataset was created using the Spotify API and the track IDs provided by the authors of https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset.... The...
    • ZIP: std_small
  • Dataset

    Santorini Tweets July-August 2021

    This dataset contains 225,501 tweets written by 141,277 users. These tweets are geolocated in Santorini, or they contain the word or the hashtag "santorini" in the text. They...
    • ZIP: tweet_santorini.csv
  • Dataset

    SWH Filenames

    A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...
    • ZIP: SWH Filenames
  • Method

    Quantum Distance-Based Classifier

    The Quantum Distance-Based Classifier is a technique inspired by the classical k-Nearest Neighbors algorithm; it leverages quantum properties to make predictions.
  • Experiment

    Using Computer Vision Techniques to Study Images from the Web [Video Tutorial]

    The video tutorial discusses using computer vision techniques to study online images, including case studies, research methods, challenges faced, and lessons learned.
    • Using Computer Vision ...
  • Experiment

    Analysing Meme Collections with the Computer Vision Network Approach [Video T...

    The video tutorial presents techniques for analysing meme collections with the computer vision network approach, including seven steps for network building and interpretation,...
    • Analysing Meme Collections ...
  • Experiment

    Making meme collections [Video Tutorial]

    The video tutorial discusses making meme collections, including the meme as a technical collection of objects, automated visual analysis, and meme collection distinctiveness.
    • Making meme collections ...
  • Method

    Boilernet (private; access required)

    Deploys an artificial neural network to remove the boilerplate from HTML files. Annotates the text content in the file or extracts the text from the HTML file.
  • Dataset

    Multi-Task Faces (MTF) dataset

    The Multi-Task Faces (MTF) dataset consists of cropped human faces for classification tasks or other research purposes. Each image in the dataset is labelled according to four...
    • ZIP: MTF_dataset_20230701
  • Dataset

    CoPhIR

    The CoPhIR (Content-based Photo Image Retrieval) Test-Collection was developed to enable meaningful tests of the scalability of the SAPIR project infrastructure (SAPIR:...
    • cophir.isti.cnr.it
  • Dataset

    The Italian Music Dataset

    The dataset was built using the Spotify and SoundCloud APIs. It comprises over 14,500 different songs by both famous and less famous Italian musicians. Each song...
    • JSON: Dataset
  • Dataset

    GERDAQ Dataset

    This is a benchmark dataset of annotated search-engine queries. Mentions of entities in search-engine queries are tagged with the entity they refer to. Wikipedia is used as...
    • XML: GERDAQ dataset
  • Method

    ArchiveSpark

    ArchiveSpark is an Apache Spark framework for easy data access, processing, extraction as well as derivation for Web archives and archival collections. It has a simple and...
    • ArchiveSpark on GitHub
  • Dataset

    German Academic Web

    The dataset contains regular crawls of the websites of German academic institutions.
  • Dataset

    MSN Search query log

    The data consists of an MSN Search query log excerpt with 15 million queries, from US users, sampled over one month of activity. Data attributes made available per query: 1)...
  • Dataset

    Product Reviews for Ordinal Quantification

    This data set comprises a labeled training set, validation samples, and testing samples for ordinal quantification. It appears in our research paper "Ordinal Quantification...
    • Zenodo link