03.11 13:30 - 14:30
On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations
Roy Schwartz
Hebrew University of Jerusalem
ZOOM

Abstract:

Recent work has shown that deep learning models in NLP are highly sensitive to low-level correlations between simple features and specific output labels, leading to overfitting and poor generalization. To mitigate this problem, a common practice is to balance datasets by adding new instances or by filtering out “easy” instances, culminating in a recent proposal to eliminate single-word correlations altogether. In this talk, I will show that despite these efforts, increasingly powerful models keep exploiting ever-smaller spurious correlations, and as a result even balancing all single-word features is insufficient to mitigate all of these correlations. At the same time, a truly balanced dataset may be bound to “throw the baby out with the bathwater” and miss important signals encoding common sense and world knowledge. I will highlight several alternatives to dataset balancing, focusing on enhancing datasets with richer contexts, allowing models to abstain and interact with users, and turning from large-scale fine-tuning to zero- or few-shot setups.

This is joint work with Gabriel Stanovsky.

Zoom link: https://technion.zoom.us/j/94950420992
