EUR-Lex MOSTA

This dataset contains 4176 non-empty, official, public EU legal judgments finalized between 2008 and 2018. The judgments fall within the case-law sector and the Court of Justice, and each is categorized under one or more of 133 distinct subject matters. Citations were not available as structured data, so we adopted a custom extraction strategy to build the set of citation-based embeddings. In particular, we consulted EUR-Lex to identify the common rules adopted for citations in the legal judgments of this specific dataset. Following their indications, we pre-processed the set of judgments J by: i) lowercasing the text, ii) removing punctuation except for the forward slash and parentheses (commonly used in citations), and iii) removing stop words except for the word "of" (commonly used in citations). Subsequently, we designed custom regular expressions to extract citations of Directives, Decisions, and Regulations, following the numbering rules for articles and sub-levels.
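The pre-processing and extraction steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to build the dataset: the stop-word list, the choice to replace removed punctuation with a space, the regular expression, and the names `preprocess` and `extract_citations` are all hypothetical.

```python
import re
import string

# Hypothetical stop-word list, for illustration only; the list actually used
# to build the dataset is not given in the description.
STOP_WORDS = {"the", "a", "an", "and", "in", "on", "to", "by", "for", "of"}
KEEP_WORDS = {"of"}  # kept because it is commonly used in citations

# Remove every punctuation character except "/", "(" and ")".
# Assumption: removed characters become spaces so adjacent words are not glued.
_PUNCT_TABLE = str.maketrans(
    {c: " " for c in string.punctuation if c not in "/()"}
)


def preprocess(text):
    """Apply steps i)-iii): lowercase, strip punctuation, drop stop words."""
    text = text.lower().translate(_PUNCT_TABLE)
    tokens = [t for t in text.split() if t not in STOP_WORDS or t in KEEP_WORDS]
    return " ".join(tokens)


# Illustrative pattern (an assumption, not the authors' expression): an optional
# "article N(x)(y) of" prefix, the act type, an optional "(ec)"-style body,
# an optional "no", and a slash-separated act number.
CITATION_RE = re.compile(
    r"(?:article\s+(?P<article>\d+[a-z]?(?:\(\w+\))*)\s+of\s+)?"
    r"(?P<kind>directive|decision|regulation)\s+"
    r"(?:\((?P<body>[a-z]+)\)\s+)?"
    r"(?:no\s+)?"
    r"(?P<number>\d+/\d+(?:/[a-z]+)?)"
)


def extract_citations(text):
    """Return (kind, number, article) tuples found in the pre-processed text."""
    cleaned = preprocess(text)
    return [
        (m.group("kind"), m.group("number"), m.group("article"))
        for m in CITATION_RE.finditer(cleaned)
    ]


sample = "Article 7(1)(b) of Directive 2004/38/EC, and Regulation (EC) No 1234/2007."
citations = extract_citations(sample)
```

Keeping the forward slash and parentheses is what lets act numbers such as `2004/38/ec` and sub-levels such as `7(1)(b)` survive pre-processing as single tokens, and keeping "of" preserves the link between an article and the act it belongs to.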

Data and Resources
  • EUR-Lex MOSTA (ZIP)

Personal Data Attributes

Description: Personal Data related Information

Field Value
Anonymised No
ChildrenData No
Cross Border Authorised Yes
General Data Yes
Personal Data No
Personal data was manifestly made public by the data subject N/A (Not applicable)
Sensitive Data No
Additional Info
Field Value
Accessibility Both
Accessibility Mode Download
Associate Project FAIR
Availability On-Line
Basic rights Download
Creation Date 2024-07-09
Creator De Martino, Graziella, [email protected], orcid.org/0000-0002-3492-6317
Dataset Citation Graziella De Martino, Gianvito Pio, Michelangelo Ceci: Multi-view overlapping clustering for the identification of the subject matter of legal judgments. Inf. Sci. 638: 118956 (2023) DOI: https://doi.org/10.1016/j.ins.2023.118956
Dataset Re-Use Safeguards None
Field/Scope of use Non-commercial only
Format .pkl
Group Others
License term 2024-07-09 / 2040-12-31
Manifestation Type Original
Processing Degree Primary
Retention Period 2024-07-09 / 2040-12-31
SoBigData Node SoBigData EU
SoBigData Node SoBigData IT
Sublicense rights No
Territory of use World Wide
Thematic Cluster Other
system:type Dataset
Management Info
Field Value
Author De Martino Graziella
Maintainer De Martino Graziella
Version 1
Last Updated 23 November 2024, 16:08 (CET)
Created 23 November 2024, 16:07 (CET)