Shopping retail synthetic dataset (GaussianCopula)

Synthetic shopping retail consumption data generated with GaussianCopula. The dataset provides monthly information on the spending of synthetic customers belonging to two classes (i.e., native Italians and foreigners residing in Italy). The data was generated from the UniCoop Tirreno dataset [1]. For each expense record, the dataset includes the nationality of the synthetic customer together with both general and specific shopping behaviors. General features include total and average quantity indicators and capture how frequently, on average, a customer makes a purchase over the period. Specific features capture the customer's shopping behavior for each of the various supermarket products. To avoid very sparse data, the products are grouped into categories of similar goods, e.g., bread, pasta, tomatoes, milk.
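The distinction between general and specific features can be sketched as a per-customer aggregation over purchase records. Everything in this example is an assumption for illustration: the `Purchase` record layout, the `CATEGORY_OF` product-to-category map, and the exact feature definitions (here, purchase frequency is taken as the share of months with at least one purchase) are hypothetical and are not the schema or formulas used to build the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical product-to-category map; grouping products avoids very sparse features.
CATEGORY_OF = {
    "baguette": "bread",
    "whole wheat bread": "bread",
    "penne": "pasta",
    "spaghetti": "pasta",
    "whole milk": "milk",
}


@dataclass
class Purchase:
    customer_id: int
    month: int        # index of the month within the observation period (assumed)
    product: str
    quantity: float


def customer_features(purchases, n_months):
    """Return {customer_id: {feature: value}} using assumed feature definitions."""
    by_customer = defaultdict(list)
    for p in purchases:
        by_customer[p.customer_id].append(p)

    features = {}
    for cid, items in by_customer.items():
        total_qty = sum(p.quantity for p in items)
        active_months = {p.month for p in items}
        # General features: totals, averages and purchase frequency.
        row = {
            "total_quantity": total_qty,
            "avg_quantity_per_purchase": total_qty / len(items),
            # Share of months in the period with at least one purchase (assumption).
            "purchase_frequency": len(active_months) / n_months,
        }
        # Specific features: quantity bought per product category.
        for p in items:
            key = "qty_" + CATEGORY_OF.get(p.product, "other")
            row[key] = row.get(key, 0.0) + p.quantity
        features[cid] = row
    return features


rows = [
    Purchase(1, 1, "baguette", 2),
    Purchase(1, 1, "penne", 1),
    Purchase(1, 3, "whole milk", 4),
]
print(customer_features(rows, n_months=4)[1])
```

Products missing from the map fall into an `other` category, so every purchase contributes to exactly one specific feature.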

[1] Guidotti, R., Nanni, M., Giannotti, F., Pedreschi, D., Bertoli, S., Speciale, B., & Rapoport, H. (2021). Measuring immigrants adoption of natives shopping consumption with machine learning. In Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part V (pp. 369-385). Springer International Publishing.

Personal Data Attributes

Description: Personal Data related Information

Field Value
Anonymisation Methodology The dataset contains only synthetic data; fictitious customer IDs are represented by random numbers.
Anonymised Anonymized
ChildrenData No
Cross Border Authorised No
Data Flow Legal Basis The synthetic data was generated with the Synthetic Data Vault (SDV) Python library, starting from the UniCoop Tirreno dataset described in [1].
Data Protection Impact Assessment No
Ethics Committee Approval No
General Data Yes
Informed Consent Template No
Non Personal Data Explanation The dataset provides information relating to behavioral habits, i.e., retail shopping. However, the customers were synthetically generated and thus do not represent or identify real people.
Personal Data No
Personal data was manifestly made public by the data subject N/A (Not applicable)
Sensitive Data No
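The Data Flow Legal Basis row names SDV, and the dataset title names GaussianCopula. As a rough illustration of what a Gaussian copula model does, the sketch below fits one to numeric columns using only the Python standard library: each column is mapped to standard-normal scores through its empirical ranks, the correlation of those scores is estimated, correlated normal vectors are sampled through a Cholesky factor, and the samples are mapped back through each column's empirical quantiles. It is a minimal sketch under these assumptions, with hypothetical function names, and does not reproduce SDV's implementation (for example, its choice of marginal distributions or its handling of categorical columns such as nationality).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sigma 1


def to_normal_scores(column):
    """Map values to standard-normal scores through their empirical ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps u strictly inside (0, 1), where inv_cdf is defined.
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cholesky(m):
    """Lower-triangular L with L @ L^T == m, for a symmetric positive-definite m."""
    k = len(m)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][t] * low[j][t] for t in range(j))
            if i == j:
                # Small floor guards against round-off for near-singular matrices.
                low[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF without interpolation (returns an observed value)."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def fit_and_sample(columns, n_samples, seed=0):
    """Fit a Gaussian copula to numeric columns and draw n_samples synthetic rows."""
    rng = random.Random(seed)
    k = len(columns)
    scores = [to_normal_scores(c) for c in columns]
    corr = [[1.0 if i == j else correlation(scores[i], scores[j]) for j in range(k)]
            for i in range(k)]
    low = cholesky(corr)
    sorted_cols = [sorted(c) for c in columns]
    rows = []
    for _ in range(n_samples):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Correlated standard normals: z = L e.
        z = [sum(low[i][t] * e[t] for t in range(i + 1)) for i in range(k)]
        rows.append([empirical_quantile(sorted_cols[i], STD_NORMAL.cdf(z[i]))
                     for i in range(k)])
    return rows
```

Because this sketch resamples observed values, every synthetic value also appears in the corresponding input column; only the joint combination of values is new, and the rank dependence between columns is approximately preserved.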
Additional Info
Field Value
Accessibility Both
Accessibility Mode Download
Availability On-Line
Basic rights Download
Creation Date 2023-11-28 18:00
Creator Laura Pollacci, orcid.org/0000-0001-9914-1943
Dataset Citation Guidotti, R., Nanni, M., Giannotti, F., Pedreschi, D., Bertoli, S., Speciale, B., & Rapoport, H. (2021). Measuring immigrants adoption of natives shopping consumption with machine learning. In Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part V (pp. 369-385). Springer International Publishing.
Dataset Re-Use Safeguards None
DiskSize 382
Field/Scope of use Non-commercial research only
Format csv
Group Migration Studies
IP/Copyrights University of Pisa
License term 2023-11-28 18:00/2030-11-28 18:00
Manifestation Type Virtual
Ownership and Governance University of Pisa
Processing Degree Primary
Retention Period 2030-11-28
Semantic Coverage shopping retail, synthetic data, human integration
SoBigData Node SoBigData IT
SoBigData Node SoBigData EU
Sublicense rights No
Territory of use World Wide
Thematic Cluster Human Mobility Analytics [HMA]
Time Coverage 2008-01-01/2015-12-31
spatial {"type":"Point", "coordinates":[10.319824330508709,43.46411146223545]}
system:type Dataset
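The `spatial` field is a GeoJSON geometry. GeoJSON (RFC 7946) lists positions as [longitude, latitude], so the point above lies at roughly 43.46° N, 10.32° E. A minimal sketch for reading it with the standard library; the helper name `point_lat_lon` is hypothetical:

```python
import json


def point_lat_lon(geojson_text: str) -> tuple[float, float]:
    """Return (latitude, longitude) from a GeoJSON Point string."""
    geometry = json.loads(geojson_text)
    if geometry.get("type") != "Point":
        raise ValueError(f"expected a Point geometry, got {geometry.get('type')!r}")
    # GeoJSON positions are [longitude, latitude(, altitude)].
    lon, lat = geometry["coordinates"][:2]
    return lat, lon


SPATIAL = '{"type":"Point", "coordinates":[10.319824330508709,43.46411146223545]}'
lat, lon = point_lat_lon(SPATIAL)
print(f"lat={lat:.4f}, lon={lon:.4f}")  # lat=43.4641, lon=10.3198
```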
Management Info
Field Value
Author Pollacci Laura
Maintainer Pollacci Laura
Version 1
Last Updated 9 December 2023, 11:27 (CET)
Created 9 December 2023, 11:27 (CET)