--

I would also add a question under 'The Ethical Perspective': Why are we creating this AI model, and will the selected features be sufficient to meet that purpose? If, for example, we intend for the model to generalize to the larger population yet want to rely on just two or three features, then we had better make sure those variables yield a highly sensitive model.

As a sociologist, I hesitate to throw out any of the variables in the example given on the basis that we 'may need to explain the model and that may be hurtful.' Quite the contrary: why is it that being a wife, being single, and so on would lead to a disparity in income? These are critical points to explore. We absolutely need to quantify and model income disparities accurately in order to talk about the structure of privilege that produces them. From a statistical viewpoint, we don't want to overwhelm our model with features; in traditional regression models we would exclude variables that offered only marginal model improvement (unless they were considered key to the study at hand). Now, if we're actually talking about whether the features were coded correctly and whether we should recode them, that's another story.
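To make the 'marginal improvement' criterion concrete, here is a minimal sketch of a nested-model comparison in Python with statsmodels. The dataset, column names, and effect sizes are simulated stand-ins (not real income data), and adjusted R², AIC, and an F-test are just one reasonable set of criteria:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated stand-in for an income dataset (hypothetical columns).
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "education_years": rng.normal(14, 2, n),
    "hours_per_week": rng.normal(40, 8, n),
    "relationship_wife": rng.integers(0, 2, n),
})
# Income depends strongly on education/hours, weakly on the flag.
df["income"] = (
    2.0 * df["education_years"]
    + 0.5 * df["hours_per_week"]
    + 1.0 * df["relationship_wife"]
    + rng.normal(0, 5, n)
)

# Baseline model vs. the same model with the candidate variable added.
base_X = sm.add_constant(df[["education_years", "hours_per_week"]])
full_X = sm.add_constant(df[["education_years", "hours_per_week", "relationship_wife"]])
base = sm.OLS(df["income"], base_X).fit()
full = sm.OLS(df["income"], full_X).fit()

# Compare the nested models: adjusted R^2, AIC, and an F-test on the added term.
print(f"adj R^2: {base.rsquared_adj:.4f} -> {full.rsquared_adj:.4f}")
print(f"AIC:     {base.aic:.1f} -> {full.aic:.1f}")
f_stat, p_value, _ = full.compare_f_test(base)
print(f"F-test for added variable: F={f_stat:.2f}, p={p_value:.4f}")
```

If the added variable barely moves adjusted R² or AIC and the F-test is non-significant, a traditional analysis would drop it; if it is central to the research question, as income-disparity variables are here, it stays regardless.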

By the way, any researcher with rigorous training in statistics and the scientific method will first make sure that the model they're building draws on an appropriate database, with culturally relevant variables selected with care, and applies the correct techniques to answer the research questions at hand. We need to take care in our methods, but not to the extent that we lose valuable explanatory power.

--


Written by Danielle Durán, M.S.

Statistician. Co-founder. ESG Advocate.
