Abstract:
Recent work has shown that deep learning models in NLP are highly sensitive to low-level correlations between simple features and specific output labels, leading to overfitting and poor generalization. To mitigate this problem, a common practice is to balance datasets by adding new instances or by filtering out “easy” instances, culminating in a recent proposal to eliminate single-word correlations altogether. In this talk, I will show that despite these efforts, increasingly powerful models keep exploiting ever-smaller spurious correlations, and that as a result even balancing all single-word features is insufficient to mitigate them. Moreover, a truly balanced dataset may “throw the baby out with the bathwater,” discarding important signals that encode common sense and world knowledge. I will highlight several alternatives to dataset balancing, focusing on enhancing datasets with richer contexts, allowing models to abstain and interact with users, and shifting from large-scale fine-tuning to zero- or few-shot setups.
This is joint work with Gabriel Stanovsky.
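To make the notion of a single-word correlation concrete, below is a minimal sketch of how one might flag words whose label distribution deviates from the label prior, using a simple z-test. The toy dataset, tokenization, and threshold are all hypothetical; this illustrates the general idea behind single-word balancing, not the specific method discussed in the talk.

```python
import math
from collections import Counter

# Toy binary sentiment dataset (hypothetical, for illustration only).
data = [
    ("a truly wonderful film", 1),
    ("wonderful acting throughout", 1),
    ("a dull and boring plot", 0),
    ("boring from start to finish", 0),
    ("the film was fine", 1),
    ("not a good film", 0),
]

n = len(data)
p_pos = sum(label for _, label in data) / n  # prior P(label = 1)

word_total = Counter()
word_pos = Counter()
for text, label in data:
    for w in set(text.split()):  # count each word once per instance
        word_total[w] += 1
        word_pos[w] += label

# Flag words whose conditional label probability deviates from the prior
# by more than a z-test threshold; a balanced dataset would remove or
# rebalance instances until no word is flagged.
Z = 1.0  # deliberately loose threshold for this tiny toy dataset
for w, total in word_total.items():
    p_hat = word_pos[w] / total          # empirical P(label = 1 | word)
    se = math.sqrt(p_pos * (1 - p_pos) / total)
    z = (p_hat - p_pos) / se
    if abs(z) > Z:
        print(f"{w!r}: P(pos|word)={p_hat:.2f}, z={z:+.2f}")
```

On this toy data the test flags “wonderful” and “boring” as spuriously predictive single words, while content-neutral words like “film” pass. The talk's argument is that even eliminating all such flagged words leaves subtler multi-word correlations for powerful models to exploit.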
https://technion.zoom.us/j/94950420992