Understanding Data Bias

Author: Prabhakar Krishnamurthy

Publisher: Medium

Publication Year: 2019

Summary: This article argues that most datasets suffer from bias, which can lead to discriminatory conclusions being drawn from the data. It describes different types of bias and how each may arise. Knowing the sources of bias can help us mitigate their effects or improve the processes used to collect data for modeling. Two additional types of data bias are worth highlighting because the article does not cover them. Response bias can be one source of missing information: an opt-in survey captures only the respondents who choose to reply. Omitted variable bias arises when information that may be relevant is left out of the analysis. Data scientists should be skeptical of conclusions drawn from models that were built with little information.
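
Illustration (not from the article): omitted variable bias can be made concrete with a small simulation. The sketch below is a hypothetical example on synthetic data; the function name, the coefficients (2.0, 3.0, 0.8), and the sample size are all assumptions chosen for illustration. It fits one regression that includes a relevant variable z and one that leaves it out, then returns the estimated effect of x from each.

```python
import numpy as np


def simulate_omitted_variable_bias(n=10_000, seed=0):
    """Return the estimated effect of x with and without the variable z.

    All numbers here are illustrative assumptions, not values from the article.
    """
    rng = np.random.default_rng(seed)

    z = rng.normal(size=n)                        # relevant variable we may fail to record
    x = 0.8 * z + rng.normal(size=n)              # x is correlated with z
    y = 2.0 * x + 3.0 * z + rng.normal(size=n)    # true effect of x on y is 2.0

    # Full model: y ~ intercept + x + z
    full_design = np.column_stack([np.ones(n), x, z])
    full_coef, *_ = np.linalg.lstsq(full_design, y, rcond=None)

    # Short model: y ~ intercept + x (z is omitted)
    short_design = np.column_stack([np.ones(n), x])
    short_coef, *_ = np.linalg.lstsq(short_design, y, rcond=None)

    return full_coef[1], short_coef[1]


if __name__ == "__main__":
    with_z, without_z = simulate_omitted_variable_bias()
    print(f"effect of x, z included: {with_z:.2f}")
    print(f"effect of x, z omitted:  {without_z:.2f}")
```

Under these assumptions, the model that includes z should recover an effect close to the true 2.0, while the model that omits z attributes part of z's influence to x and overstates the effect. This is the sense in which missing information can distort a model's conclusions.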