Robust Visual Learning across Class Imbalance and Distributional Shift

  • Emanuel Sánchez Aimar
  • Publisher: Linköping University Electronic Press
  • ISBN-13: 9789181183085
  • ISBN-10: 9181183089

About the book

Robust Visual Learning across Class Imbalance and Distributional Shift is written by Emanuel Sánchez Aimar and published by Linköping University Electronic Press. It is identified by ISBN-13 9789181183085 and ISBN-10 9181183089.

Computer vision aims to equip machines with perceptual understanding—detecting, recognizing, localizing, and relating visual entities to existing sources of knowledge. Machine learning provides the mechanism: models learn representations and decision rules from data and are expected to generalize beyond the training distribution. These systems already support biodiversity monitoring, autonomous driving, and geospatial mapping. In practice, however, textbook assumptions break down: the concept space is vast, data is sparse and imbalanced, many categories are rare, and high-quality annotations are costly. In addition, deployment conditions shift over time—class frequencies and visual domains evolve—biasing models toward frequent scenarios and eroding reliability.

In this work, we develop methods for training reliable visual recognition models under more realistic conditions: class imbalance, limited labeled data, and distribution shift. Our contributions span three themes: (1) debiasing strategies for imbalanced classification that remain reliable under changes in class priors; (2) semi-supervised learning techniques tailored to imbalanced data to reduce annotation cost while preserving minority-class performance; and (3) a unified multimodal retrieval approach for remote sensing (RS) that narrows the domain gap.

In Paper A, we study long-tailed image recognition, where skewed training data biases classifiers toward frequent classes. During deployment, changes in class priors can further amplify this bias. We propose an ensemble of skill-diverse experts, each trained under a distinct target prior, and aggregate their predictions to balance head and tail performance. We theoretically show that the ensemble’s prior bias equals the mean expert bias and that choosing complementary target priors cancels it, yielding an unbiased predictor that minimizes balanced error. With calibrated experts—achieved in practice via Mixup—the ensemble attains state-of-the-art accuracy and remains reliable under label shift.

In Paper B, we investigate long-tailed recognition in the semi-supervised setting, where a small, imbalanced labeled set is paired with a large unlabeled pool. Semi-supervised learning leverages unlabeled data to reduce annotation costs, typically through pseudo-labeling, but the unlabeled class distribution is often unknown and skewed. Naïve pseudo-labeling propagates the labeled bias, reinforcing head classes and overlooking rare ones. We propose a flexible distribution-alignment framework that estimates the unlabeled class mix online and reweights pseudo-labels accordingly, guiding the model first toward the unlabeled distribution to stabilize training and then toward a balanced classifier for fair inference. The proposed approach leverages unlabeled data more effectively, improving accuracy, calibration, and robustness to unknown unlabeled priors.

In Paper C, we move beyond recognition to unified multimodal retrieval for remote sensing—a domain with scarce image–text annotations and a challenging shift from natural images. Prior solutions are fragmented: RS dual encoders lack interleaved input support; universal embedders miss spatial metadata and degrade under domain shift; and RS generative assistants reason over regions but lack scalable retrieval. To overcome these limitations, we introduce VLM2GeoVec, a single-encoder, instruction-following embedder that aligns images, text, regions, and geocoordinates in a shared space. For comprehensive evaluation, we also propose RSMEB, a unified retrieval benchmark that spans conventional tasks (e.g., classification, cross-modal retrieval) and novel interleaved tasks (e.g., visual grounding, spatial localization, semantic geo-localization). In RSMEB, VLM2GeoVec narrows the domain gap relative to universal embedders and matches specialized baselines in conventional tasks in zero-shot settings. It further enables interleaved spatially-aware search, delivering several-fold gains in metadata-aware RS applications.
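
The bias-cancellation argument in Paper A can be illustrated with a small numerical sketch. The code below assumes each expert behaves like a shared unbiased scorer plus an additive log-prior bias toward its own target distribution, and that the ensemble averages logits; the particular choice of forward, uniform, and inverse target priors is illustrative, not necessarily the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10

# A long-tailed training prior (sorted for readability) and shared class
# scores that play the role of an unbiased, balanced predictor.
train_prior = np.sort(rng.dirichlet(np.ones(num_classes)))[::-1]
unbiased_logits = rng.normal(size=num_classes)

# Complementary target priors: forward (train) prior, uniform, and inverse
# prior. Their log-priors average to a class-independent constant, which is
# what makes the ensemble bias cancel.
inverse_prior = (1.0 / train_prior) / (1.0 / train_prior).sum()
target_priors = [train_prior,
                 np.full(num_classes, 1.0 / num_classes),
                 inverse_prior]

# Each expert behaves like the unbiased predictor plus an additive
# log-prior bias toward its own target distribution.
expert_logits = [unbiased_logits + np.log(p) for p in target_priors]

# The ensemble averages logits, so its bias is the mean of the expert biases.
ensemble_logits = np.mean(expert_logits, axis=0)
ensemble_bias = np.mean([np.log(p) for p in target_priors], axis=0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# The mean bias is constant across classes, and a constant shift leaves the
# softmax unchanged: the ensemble matches the unbiased predictor.
print(np.allclose(ensemble_bias, ensemble_bias[0]))                     # True
print(np.allclose(softmax(ensemble_logits), softmax(unbiased_logits)))  # True
```

Because the three log-priors sum to a class-independent constant, the averaged logits differ from the unbiased scores only by a constant shift, which the softmax ignores.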
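
Paper B's distribution alignment can be sketched as a simple reweighting of pseudo-label probabilities. The snippet below follows a generic ReMixMatch-style alignment rule; the class and method names (DistributionAlignment, update, align) and the momentum value are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

class DistributionAlignment:
    def __init__(self, num_classes, momentum=0.999):
        self.momentum = momentum
        # Running estimate of the (unknown) unlabeled class distribution.
        self.estimated_dist = np.full(num_classes, 1.0 / num_classes)

    def update(self, probs):
        """Update the running class-distribution estimate with a batch of
        softmax predictions on unlabeled data (shape: [batch, classes])."""
        batch_mean = probs.mean(axis=0)
        self.estimated_dist = (self.momentum * self.estimated_dist
                               + (1.0 - self.momentum) * batch_mean)

    def align(self, probs, target_dist):
        """Reweight predictions toward target_dist and renormalize before
        they are turned into pseudo-labels."""
        aligned = probs * (target_dist / (self.estimated_dist + 1e-8))
        return aligned / aligned.sum(axis=1, keepdims=True)

# Usage with random stand-in predictions:
rng = np.random.default_rng(0)
da = DistributionAlignment(num_classes=10)
probs = rng.dirichlet(np.ones(10), size=32)     # fake softmax outputs
da.update(probs)
balanced = np.full(10, 0.1)
pseudo_labels = da.align(probs, target_dist=balanced).argmax(axis=1)
print(pseudo_labels)
```

In this scheme, setting target_dist to the running estimate early in training keeps pseudo-labels consistent with the unlabeled pool, while annealing it toward a uniform prior later pushes the model toward a balanced classifier, mirroring the two-stage guidance described in the abstract.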
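
At retrieval time, the unified embedding view of Paper C reduces to encoding interleaved queries and candidates into one vector space and ranking by similarity. The sketch below uses a hypothetical embed function (a pseudo-random stand-in, not VLM2GeoVec's actual interface) purely to show the single-encoder retrieval flow with mixed text, image, and coordinate inputs.

```python
import numpy as np

DIM = 256

def embed(instruction, image=None, text=None, region=None, coords=None):
    """Hypothetical single encoder: maps an interleaved (instruction, image,
    text, region, coordinates) input to one L2-normalized vector. Here it
    hashes the inputs into a pseudo-random unit vector so the retrieval
    logic below can run end-to-end."""
    seed = abs(hash((instruction, image, text, region, coords))) % (2**32)
    vec = np.random.default_rng(seed).normal(size=DIM)
    return vec / np.linalg.norm(vec)

def retrieve(query_vec, candidate_vecs, top_k=5):
    """Rank candidates by cosine similarity (dot product of unit vectors)."""
    scores = candidate_vecs @ query_vec
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Example: metadata-aware search over remote-sensing scenes, mixing a text
# description with geocoordinates in a single query.
query = embed("Retrieve matching remote-sensing scenes.",
              text="harbor with container ships",
              coords=(57.70, 11.97))
candidates = np.stack([embed("Represent this scene for retrieval.",
                             image=f"scene_{i}.tif")
                       for i in range(100)])
top_idx, top_scores = retrieve(query, candidates)
print(top_idx, top_scores)
```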