Ethics of algorithms or data? Or how they are used?

By now, we are all aware of the potential of AI and, to some extent, the risks associated with its unethical use.

However, I would like to bring attention to a use case that might change the perception of what is ethical and what the definition of ethics entails.

A well-known open dataset from UCI describes the employees of a company. One of its attributes, the status of the employee (attrition: yes or no), can be used as the target variable.

The objective could be to pay more attention to employees who, according to the model, appear to be at higher risk of attrition. This goal might alter the concept of ethics (which, by the way, is not uniform across communities, cultures, or contexts). For example, dataset bias related to attributes like gender or age could, in this case, help focus more on the disadvantaged groups (defined here by those attributes). This is just a different point of view, and it doesn’t necessarily mean the approach is ethical (e.g. somebody may object that a model built on this data would favor retaining only the employees with high scores in performance reviews).

Below is an analysis of the dataset that highlights some interesting aspects, such as attributes that turn out to be more important than intuition suggests, or vice versa. For instance, after removing the employee number, which is only an identifier, monthly salary ranks only fifth in importance, while gender is among the least important attributes and so has minimal influence on the target variable.

A binary tree is built by splitting each node on the attribute that gives the largest reduction in Gini impurity, which for a binary target with class probability p is 2p(1-p). The resulting tree is shown in the figure.
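To make the splitting criterion concrete, here is a minimal sketch in plain Python. It is not the analysis behind the figure: the records are hypothetical toy rows, not taken from the real dataset, and the helper names `gini` and `impurity_decrease` are made up for illustration. It scores two categorical attributes by how much a split on them reduces the Gini impurity and picks the better one, which is what a tree does at each node.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum(p_k^2); for two classes this equals 2p(1-p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(rows, feature, target):
    """Parent impurity minus the size-weighted impurity of the child nodes
    obtained by splitting on one categorical feature."""
    parent = [row[target] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return gini(parent) - weighted

# Hypothetical toy records (assumption: invented for this example only).
rows = [
    {"OverTime": "Yes", "Gender": "F", "Attrition": "Yes"},
    {"OverTime": "Yes", "Gender": "M", "Attrition": "Yes"},
    {"OverTime": "Yes", "Gender": "F", "Attrition": "No"},
    {"OverTime": "No",  "Gender": "M", "Attrition": "No"},
    {"OverTime": "No",  "Gender": "F", "Attrition": "No"},
    {"OverTime": "No",  "Gender": "M", "Attrition": "No"},
    {"OverTime": "No",  "Gender": "F", "Attrition": "Yes"},
    {"OverTime": "No",  "Gender": "M", "Attrition": "No"},
]

# Score each candidate attribute and keep the one with the largest decrease.
scores = {f: impurity_decrease(rows, f, "Attrition") for f in ("OverTime", "Gender")}
best = max(scores, key=scores.get)
print(scores, "-> split on", best)
```

On these toy rows the split on OverTime reduces the impurity more than the split on Gender, so it would be chosen first. Summing such decreases over every node where an attribute is used is one common way to obtain an importance ranking.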

The second variable to look at is OverTime, which captures how attrition depends on the overtime recorded for the employee. In this case we use a CNN and Shapley values to estimate how the target depends on the independent variables.
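As a reminder of what a Shapley value measures, the sketch below computes it exactly for a tiny example by averaging each attribute's marginal contribution over all orderings. It is only an illustration under stated assumptions: the `risk` function is a hypothetical hand-written score, not the CNN used here, its numbers are invented, and real analyses rely on approximations because exact enumeration grows factorially with the number of attributes.

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: the average marginal contribution of each
    feature over all orderings. Only practical for a handful of features."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            totals[f] += value(with_f) - value(coalition)
            coalition = with_f
    return {f: total / len(orderings) for f, total in totals.items()}

def risk(known):
    """Hypothetical attrition risk when only the features in `known` are
    revealed to the model (assumption: invented numbers, not a fitted model)."""
    score = 0.10                      # baseline risk
    if "OverTime" in known:
        score += 0.20
    if "Age" in known:
        score += 0.10
    if "OverTime" in known and "Age" in known:
        score += 0.06                 # interaction, shared equally by both
    return score

features = ["OverTime", "Age", "Gender"]
phi = shapley_values(features, risk)
print(phi)
```

In this toy setting Gender never changes the score, so its Shapley value is zero, while the interaction term is split evenly between OverTime and Age. The values add up to the difference between the full score and the baseline, which is the property that makes them usable as an attribution of the prediction.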

Finally, we must note that age has a strong impact on the decision, but its effect is quite fragmented: it is selected for splits close to the leaves, where it separates the classes very well. Below are two examples of a clear separation between the two classes.

Edited by G.Fruscio