PMSC-UGR is a document collection built from a large subset of MEDLINE/PubMed scientific articles whose authors have been unambiguously identified through a disambiguation process (using ORCID). The collection has also been supplemented with the citations to these articles that are available through Scopus/Elsevier’s API.

It can be used as a test collection for experiments with expert finding, document filtering, publication venue recommendation, expert profiling and text classification.

This paper describes the dataset and the methodology followed when collecting it. Please cite it if you intend to use this dataset:

The version of the collection available here (containing only papers from 2007 to 2016) has been used in the following papers:

  • L.M. de Campos, J.M. Fernández-Luna, J.F. Huete, Publication venue recommendation using profiles based on clustering, IEEE Access, 10:106886-106896, 2022. DOI 10.1109/ACCESS.2022.3212531
  • L.M. de Campos, J.M. Fernández-Luna, J.F. Huete, Use of topical and temporal profiles and their hybridisation for content-based recommendation, User Modeling and User-Adapted Interaction, 33(4):911-937, 2023. DOI 10.1007/s11257-022-09354-7
  • L.M. de Campos, J.M. Fernández-Luna, J.F. Huete, Fusion strategies to combine topical and temporal information for publication venue recommendation, Proceedings of the 2nd Joint Conference of the Information Retrieval Communities in Europe (CIRCLE 2022), Eds. Lynda Tamine, Enrique Amigó, Josiane Mothe, Volume 3178 of CEUR Workshop Proceedings, 2022.