CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Researcher
Sep. 2020 - Oct. 2021
Niterói, Rio de Janeiro, Brazil

This project used Scikit Multiflow and Scikit Learn, both implemented in Python. Scikit Learn is widely used for machine learning work because it is open source, so it can be reused directly or embedded as a library in systems that need machine learning. Python also makes prototyping quick, which is useful for validating concepts. Scikit Multiflow was inspired by MOA (BIFET et al, 2018) and builds on Scikit Learn. Like Scikit Learn, it is implemented in Python and its source code is open.

The K-NN and SAM K-NN algorithms are implemented in Scikit Multiflow, and those implementations served as the basis for adapting the algorithms so that data can be forgotten. Both tools are machine learning frameworks that cover data pre-processing, learning, and evaluation of the models (concepts) built, so the undergraduate student faces a learning curve with the frameworks. Adapting the algorithm is also not a trivial task, because it requires studying, from a statistical point of view, the impact of removing data from the concept.

This project is part of a larger project that studies the effects of the right to be forgotten on machine learning. Leandro Botelho, the doctoral student I co-supervised, also takes part in the discussions of this work, as it supports his research project on the evolution of predictive models. The broader project is being carried out in partnership with Prof. Albert Bifet, from Paris-Tech University, author of the book Machine Learning for Data Streams with Practical Examples in MOA (BIFET et al, 2018).