Ego Networks of Words in Twitter

This set of dataframes was used in our latest paper:

Ollivier K, Boldrini C, Passarella A, Conti M (2022) Structural invariants and semantic fingerprints in the “ego network” of words. PLoS ONE 17(11): e0277182. https://doi.org/10.1371/journal.pone.0277182

root/
├─ bert_vec/
├─ egonets/
└─ topics_distrib/

Four datasets were extracted from Twitter:

Journalists from the NYT
Science writers
Random users #1
Random users #2

Data files with content derived from these four datasets are generally prefixed with nyt, science, random1, and random2, respectively.

In order to comply with GDPR rules, we fully anonymized the results.

The directory egonets/ contains the ego network structures. Although the words are replaced by ids to maintain anonymity, the main structural results can still be derived from these files.

The directory bert_vec/ contains the BERT embeddings after dimensionality reduction with UMAP.

The directory topics_distrib/ contains the dataframes where the topic distribution vectors are stored.
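
As a rough illustration of how the directory layout and file prefixes described above could be used, the following sketch indexes the files of a local copy by directory and dataset prefix. It is only a minimal sketch under stated assumptions: the helper names `dataset_of` and `index_files` are hypothetical, the location of the local copy is up to the reader, and no file names or file formats are assumed beyond the "generally prefixed" rule, so files that match no prefix are grouped under "other".

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional

# Dataset prefixes and directory names as listed in the description above.
PREFIXES = ("nyt", "science", "random1", "random2")
DIRECTORIES = ("bert_vec", "egonets", "topics_distrib")


def dataset_of(filename: str) -> Optional[str]:
    """Return the dataset prefix that a file name starts with, or None."""
    for prefix in PREFIXES:
        if filename.startswith(prefix):
            return prefix
    return None


def index_files(root: Path) -> Dict[str, Dict[str, List[str]]]:
    """Map directory -> dataset prefix -> sorted file names found under root.

    Files are only "generally" prefixed, so unmatched names go to "other".
    Directories missing from the local copy yield an empty mapping.
    """
    index: Dict[str, Dict[str, List[str]]] = {}
    for directory in DIRECTORIES:
        groups: Dict[str, List[str]] = defaultdict(list)
        folder = root / directory
        if folder.is_dir():
            for path in sorted(folder.iterdir()):
                if path.is_file():
                    groups[dataset_of(path.name) or "other"].append(path.name)
        index[directory] = dict(groups)
    return index
```

Calling `index_files(Path("root"))` on a local copy would return, for each of the three directories, the file names grouped under nyt, science, random1, random2, or other.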


Personal Data Attributes

Description: Personal Data related Information

Field: Value
Anonymisation Methodology: We have completely anonymised the Twitter users (each user id has been replaced by a pseudonym). In the files describing the structure of the ego networks, which contain words extracted from the tweets, we replaced the words with ids to avoid publishing the content of the tweets.
Anonymised: Anonymized
ChildrenData: No
Cross Border Authorised: Yes
General Data: No
Non Personal Data Explanation: The data was obtained from the Twitter timelines of four distinct sets of users. We calculated the ego networks without keeping any personal data. Twitter user IDs are anonymised.
Personal Data: No
Personal data was manifestly made public by the data subject: N/A (not applicable)
Sensitive Data: No
Additional Info
Field: Value
Accessibility: Trans National Access
Accessibility Mode: Download
Availability: On-Line
Basic rights: Download
Creation Date: 2022-11-22
Creator: Ollivier, Kilian, [email protected], orcid.org/0000-0003-2881-5845
Dataset Citation: @article{ollivier2022, title={Structural Invariants and Semantic Fingerprints in the ``Ego Network'' of Words}, author={Ollivier, Kilian and Boldrini, Chiara and Passarella, Andrea and Conti, Marco}, journal={PLoS ONE 17(11): e0277182}, year={2022}, doi={https://doi.org/10.1371/journal.pone.0277182}}
Dataset Re-Use Safeguards: To be reused for research purposes only
Field/Scope of use: Research only
Group: Others
License term: 2023-03-02 12:35/2043-03-02 12:35
Manifestation Type: Original
Processing Degree: Secondary
Retention Period: 2043-03-02 12:35/3043-03-02 12:35
Semantic Coverage: ego-network, nlp, topic mining
Sublicense rights: No
Territory of use: World Wide
Thematic Cluster: Social Network Analysis [SNA]
system:type: Dataset
Management Info
Field: Value
Author: Ollivier Kilian
Maintainer: Ollivier Kilian
Version: 1
Last Updated: 24 June 2023, 01:15 (CEST)
Created: 3 March 2023, 18:05 (CET)