Foster Accurate and Fair Data Collection

Overview

Sustainability Dimension: Social
ML Development Phase: Data Collection and Preparation
ML Development Stakeholders: Domain Expert, ML Development, Auditing & Testing

Description

The DP “Foster Accurate and Fair Data Collection” bundles mitigation techniques that are applied during data collection to improve the fairness of the resulting dataset. Several authors show that it is essential to be aware of biases in the underlying data (Ferrara, 2023; Greshgorn, 2018; Holstein et al., 2019). For instance, in cancer care, where the goal is to improve prevention, fair datasets may include additional factors such as ethnicity, disability, and other determinants to better reflect real-world circumstances (Dankwa-Mullan & Weeraratne, 2022).
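
One way to become aware of such biases during collection is to compare how often each group appears in the collected records with the share expected in the target population. The following is a minimal sketch of that idea, not a method prescribed by the cited authors. The attribute name, the toy records, the reference shares, and the 5 percentage point tolerance are all hypothetical and would have to be chosen together with domain experts.

```python
from collections import Counter


def representation_gaps(records, attribute, reference_shares):
    """Return observed share minus reference share for each group.

    records: list of dicts, one per collected data point (hypothetical format).
    attribute: key of the attribute to audit, e.g. "ethnicity".
    reference_shares: expected population share per group (assumed to be known).
    """
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        gaps[group] = observed - expected
    return gaps


def underrepresented(gaps, tolerance=0.05):
    """List groups whose observed share falls short by more than the tolerance."""
    return sorted(group for group, gap in gaps.items() if gap < -tolerance)


# Hypothetical toy data: 100 collected records with a single audited attribute.
records = (
    [{"ethnicity": "A"}] * 60
    + [{"ethnicity": "B"}] * 30
    + [{"ethnicity": "C"}] * 10
)

# Hypothetical reference shares, e.g. taken from census data.
reference_shares = {"A": 0.5, "B": 0.3, "C": 0.2}

gaps = representation_gaps(records, "ethnicity", reference_shares)
flagged = underrepresented(gaps)
print(flagged)
```

A group flagged by such a check is a prompt to collect further data or to review the collection procedure. It does not by itself establish that the dataset is fair, since matching population shares is only one of several possible fairness criteria.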

Sources

  • Ferrara, E. (2023). Fairness And Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, And Mitigation Strategies (arXiv:2304.07683). arXiv.
  • Greshgorn, D. (2018). If AI is going to be the world’s doctor, it needs better textbooks [Newspage]. https://qz.com/1367177/if-ai-is-going-to-be-the-worlds-doctor-it-needs-better-textbooks
  • Holstein, K., Wortman Vaughan, J., Daumé, H., Dudik, M., & Wallach, H. (2019). Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–16. https://doi.org/10.1145/3290605.3300830
  • Dankwa-Mullan, I., & Weeraratne, D. (2022). Artificial Intelligence and Machine Learning Technologies in Cancer Care: Addressing Disparities, Bias, and Data Diversity. Cancer Discovery, 12(6), 1423–1427. https://doi.org/10.1158/2159-8290.CD-22-0373