Every Database is Biased

“People generally see what they look for, and hear what they listen for.”

To Kill a Mockingbird, Harper Lee

AI and data science adoption rates are soaring as more organizations pursue a data-driven agenda. But have you stopped to consider the ethics of AI? It’s a complex undertaking, with many businesses struggling to apply ethical considerations in their day-to-day work.

‘Bias’ is a term that often gets thrown around, stalling data-driven initiatives, complicating project implementations and confusing stakeholders. But it’s a key consideration to take into account.

So, how can your organization strike the right balance between the ethics of AI and your business objectives? In this post, I’ll focus on a few important elements of bias and explain how your business can embrace AI and data science in an ethical manner for digital success.

Wikipedia defines bias as:

“Bias is a disproportionate weight in favor of or against an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individual, a group, or a belief. In science and engineering, a bias is a systematic error. Statistical bias results from an unfair sampling of a population, or from an estimation process that does not give accurate results on average.”

Let’s unpack the basics of that statement and explain why they matter in a data science context.

1. Data conditions

Good quality data is not important for good AI, right? Wrong. Ask any experienced data scientist and they’ll tell you the same thing: to make accurate (and therefore ethical) decisions based on your data, the quality of your data is essential.  

Another misconception is that data is objective. In practice, bias within your data can lead to incorrect conclusions or reinforce existing prejudices. As such, the state of your data and your data management efforts are incredibly important. Data privacy and data security are likewise vital boundary conditions for ethical data usage.

From another perspective: you may think all databases are biased since, by their very nature, they are a selection of data (and cannot include everything). However, it is more important to understand the basics of your data sample, including how your selection of data (i.e. your database) and its sub-selections relate to one another.
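One way to make this concrete is to compare how often each group appears in your data with how often you would expect it to appear. The following is a minimal sketch, not a prescribed method: the function name `representation_gaps`, the age bands and the reference shares are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Return (share in sample - reference share) for every group.

    A positive value means the group is over-represented in the sample,
    a negative value means it is under-represented.
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    groups = set(reference_shares) | set(counts)
    return {
        g: counts.get(g, 0) / total - reference_shares.get(g, 0.0)
        for g in groups
    }


# Hypothetical data: the age band of each of 10 records in a database.
sample = ["18-34"] * 6 + ["35-54"] * 3 + ["55+"] * 1
# Hypothetical reference shares for the population the project should serve.
reference = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

gaps = representation_gaps(sample, reference)
for group in sorted(gaps):
    print(f"{group}: {gaps[group]:+.2f}")
```

In this made-up example the youngest band is over-represented and the oldest is under-represented. A gap like this does not prove the resulting model is unfair, but it tells you where to look before drawing conclusions from the data.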

2. Model conditions

You must take data bias and quality into account at the modeling stage. Bias can show up in the data and it can also be introduced when you select attributes for an AI model.

The transparency of your model matters. You must have justifiable reasons to opt for a more powerful but less transparent model. The good news is that transparency is not impossible to achieve. You can increase the transparency of, for example, a complicated neural network model by analyzing its operation or function, or by introducing human supervision.

Either way, an AI model must be auditable, so that its output can be verified and the steps leading to the model can be replicated. An external company or your internal teams can conduct this audit.
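As an illustration of what replicable steps can look like in practice, the sketch below stores a fingerprint of the input data, the random seed and a fingerprint of the output for each run, so an auditor can re-run the step and compare. It is a simplified assumption of how such a record might be kept: `train_toy_model` is a stand-in for a real training step, and all names are hypothetical.

```python
import hashlib
import json
import random


def fingerprint(obj):
    """SHA-256 of a JSON-serializable object, with keys sorted for stability."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train_toy_model(rows, seed):
    """Stand-in for real training: averages a seeded random half of the rows."""
    rng = random.Random(seed)
    chosen = rng.sample(rows, k=len(rows) // 2)
    return sum(chosen) / len(chosen)


def run_with_audit(rows, seed):
    """Run the toy training step and return its output plus an audit record."""
    output = train_toy_model(rows, seed)
    record = {
        "data_sha256": fingerprint(rows),
        "seed": seed,
        "output_sha256": fingerprint(output),
    }
    return output, record


rows = [3, 8, 1, 9, 4, 7]  # hypothetical input data
_, first = run_with_audit(rows, seed=42)
_, second = run_with_audit(rows, seed=42)

# Same data and same seed should reproduce the same record.
print(first == second)
```

If the data changes, or the run cannot be reproduced from the recorded seed, the fingerprints no longer match, which gives the auditor a concrete starting point.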

3. Data scientist conditions

Whatever project you’re working on, it is unethical to act against your existing policies, rules or regulations.

This tenet also applies to data science. To apply it, you must have a clear accountability agreement in place that provides a consistent approach to ethics across your team. Your data scientists must also work in a proportionate and transparent manner, adopting the least intrusive data strategies and clearly documenting your policies, rules and regulations.

4. Impact on stakeholders

Your AI and data science project affects people: both your employees and the data owners.

You should allow employees to provide feedback across the project lifecycle, including after deployment. You should also allow data owners to report any suspected issues. You may also need to make special considerations around the impact of your data project on vulnerable groups.

Accessibility is another consideration: people should be able to access your AI products and services. This safeguards certain groups within society, ensuring they are not discriminated against when your AI-based technologies are used in the wider world.

5. Impact on community

From a social, environmental and democratic perspective, data projects must have a positive impact on our community. Cambridge Analytica’s use of Facebook data during the 2016 US election is a clear example here of what not to do.

You should also apply one final consideration: the headline check. If you cannot easily justify your data project in one simple sentence, you may want to leave it on the drawing board. 

Here are some of the key ethical considerations for every data-driven initiative:

[Figure: Integrated Approach of Data Science and Ethics. Intellerts ©]

Martin Haagoort

m.haagoort@intellerts.com

MD Intellerts
