Impact of Biases in Big Data

Special Session on "Impact of Biases in Big Data" at the 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2018)

Slides of introductory talk

Survey paper

Conference date: April 25 - April 27, 2018

Location: Bruges, Belgium

Important Dates:

Paper submission due: November 20, 2017

Notification of acceptance: January 31, 2018

Early registration deadline: February 16, 2018

Conference: April 25 - April 27, 2018


Since its first edition in 1993, the European Symposium on Artificial Neural Networks (ESANN) has become the reference for researchers on fundamentals and theoretical aspects of artificial neural networks, computational intelligence, machine learning and related topics. Each year, around 100 specialists attend ESANN in order to present their latest results and comprehensive surveys, and to discuss future developments in the field. The ESANN 2018 conference will follow this tradition while adapting its scope to recent developments in the field. The ESANN conferences cover artificial neural networks, machine learning, statistical information processing and computational intelligence. Mathematical foundations, algorithms and tools, and applications are covered.

Special Session on "Impact of Biases in Big Data"

The Big Data paradigm that has dominated machine learning research for about the last decade can be summarized as follows: "It's not who has the best algorithm that wins. It's who has the most data." However, most data sets are biased, and these biases are often ignored in research, which in turn makes the learned models unreliable. Concretely, the two most frequently occurring biases in data sets are class imbalance and sample selection bias.

Class imbalance refers to a data set having substantially different numbers of examples per class. Models trained on such data sets tend to predict the majority class, a failure that is easily masked by inappropriate metrics such as accuracy. Sample selection bias refers to training data and production data having different distributions. This is a common issue in many real-world applications, because we do not have complete control over the data gathering process. Models trained on such training data generalize poorly to the production data. For both class imbalance and sample selection bias, more representative data helps more than simply more data.
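The accuracy pitfall mentioned above can be illustrated with a minimal sketch. The scenario and the helper functions below are hypothetical, not taken from any particular paper in the session: on a data set with a 95:5 class ratio, a trivial classifier that always predicts the majority class scores 95% accuracy, while balanced accuracy (the mean of per-class recalls) reveals that it is useless on the minority class.

```python
def accuracy(y_true, y_pred):
    # Fraction of correct predictions; misleading under class imbalance.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recalls; each class contributes equally,
    # regardless of how many examples it has.
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced data set: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Balanced accuracy of 0.5 is the score of random guessing on two classes, which exposes that the model has learned nothing about the minority class despite its high plain accuracy.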

This special session will be an opportunity for both researchers and practitioners to present and discuss their latest work on the impact of biases in data sets on models and on how these biases can be reduced. Topics include, but are not limited to:

- Quantifying biases

- Class imbalance

- Sample selection bias and covariate shift

- Global and local learners

- Biases in Deep Learning

- Evaluation metrics for biased data sets

- Reweighting and subsampling methods

- Biases in anomaly detection

- Biases in spatial data and time series

- Sociological impact of biased machine learning models, e.g. biases in the criminal justice system

Session Organizers

Patrick Glauner, University of Luxembourg, Luxembourg

Petko Valtchev, University of Quebec at Montreal, Canada

Radu State, University of Luxembourg, Luxembourg