Deleting Unethical Data Sets Isn’t Good Enough

Author: Karen Hao

Publisher: MIT Technology Review

Publication Year: 2021

Summary: The article argues that deleting unethical data sets is not enough to meet one's ethical responsibilities, and that “those working in [artificial intelligence (AI)] must also make a long-term commitment to maintaining them and using them ethically.” The author opens with the case of MS-Celeb-1M. Released in 2016 as the largest face database in the world, it was supposed to contain only celebrities’ faces, but the term “celebrity” was applied very loosely, and the database included the faces of a large number of non-celebrities. This is just one example; numerous other databases hold information about individuals who did not consent to having it made public. To address the issue, the author echoes Margaret Mitchell, an AI ethics researcher and a leader in responsible data practices, in stating that data set stewardship organizations should exist for “scraped data that could contain biometric or personally identifiable information or intellectual property.” These organizations would ensure that data sets are maintained legally and with the subjects’ best interests at heart. If data scientists handle every step of the process with this level of care, many of the issues surrounding unethical data usage could be resolved.