Data Science applied to Arca: development and availability of tools for information retrieval in the Institutional Repository of Fundação Oswaldo Cruz
DOI:
https://doi.org/10.29397/reciis.v11i0.1417
Keywords:
Data Science, Information Storage and Retrieval, Data Mining, Machine Learning, Institutional Repositories.
Abstract
The Arca institutional repository is the main open access instrument of the Oswaldo Cruz Foundation (Fiocruz), with the mission of gathering, hosting, preserving, making available and giving visibility to the institution’s intellectual production. The thematic diversity and institutional complexity of the Foundation pose a methodological challenge for the classification and retrieval of deposited digital objects and for the governance of the metadata recorded by the communities that make up the repository. In 2016, the Arca search engine recorded more than 400,000 queries. The repository needs an Information Retrieval (IR) system that meets its specific indexing requirements and the growing demand for information from users inside and outside Fiocruz. In this work we propose the use of Data Science tools, especially Data Mining and Machine Learning techniques, to improve Information Retrieval through the automatic classification of digital objects deposited in Arca and through the development and release of an IR system based on quality metrics related to precision and recall.
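For readers unfamiliar with the quality metrics named in the abstract, precision and recall for a single query can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function name `precision_recall` and the document identifiers are hypothetical and do not come from Arca.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the search engine
    relevant:  iterable of document ids judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Precision: share of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
retrieved_ids = ["doc1", "doc2", "doc3", "doc4"]
relevant_ids = ["doc2", "doc4", "doc7"]
p, r = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this example two of the four retrieved documents are relevant (precision 0.50), and two of the three relevant documents were retrieved (recall about 0.67).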
License
Author’s rights: The author retains unrestricted rights over their work.
Rights to reuse: Reciis adopts the Creative Commons Attribution-NonCommercial license (CC BY-NC), in accordance with the Policy on Open Access to Knowledge of the Oswaldo Cruz Foundation. Under this license, accessing, downloading, copying, printing, sharing, reusing and distributing articles are allowed, provided that the use is non-commercial and the source is cited, giving proper authorship credit and a reference to Reciis. In such cases, no permission is required from the authors or editors.
Authors’ rights of deposit / self-archiving: Authors are encouraged to deposit the published version, along with the link to their article in Reciis, in institutional repositories.