Ido Galil
I am a Deep Learning Researcher in the Deci group at NVIDIA (formerly Deci AI, acquired by NVIDIA) and a PhD student nearing completion, advised by Prof. Ran El-Yaniv at the Computer Science faculty, Technion.
My work focuses on Neural Architecture Search (NAS) for large language models (LLMs) and generative AI.
In my PhD research, I study deep neural networks’ reliability and safety in computer vision and natural language processing,
with an emphasis on uncertainty estimation, selective prediction, and adversarial robustness.
Email /
Scholar /
LinkedIn
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Authors: Michael Toker · Ido Galil · Hadas Orgad · Rinon Gal · Yoad Tewel · Gal Chechik · Yonatan Belinkov
NAACL, 2025
TL;DR: Our work reveals how text-to-image (T2I) diffusion models use “empty” padding tokens, which can still influence generated images depending on model architecture and training.
In T2I pipelines, prompts are padded to a fixed length with a special pad token, which language models normally ignore. However, we find that T2I models may treat these tokens differently. We develop two causal intervention methods, one in the text encoder and one in the diffusion process, to test whether padding tokens carry semantic information. Experiments on six T2I models reveal scenarios in which padding tokens are ignored, become semantically significant, or even serve as “registers” that store and recall information during diffusion. Our findings underscore the need for more precise handling of padding tokens in T2I model design.
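Below is a minimal sketch of what such a causal intervention can look like, assuming a Hugging Face diffusers Stable Diffusion pipeline (the model name, the empty-prompt substitution, and the code itself are illustrative assumptions, not the paper's implementation): the text-encoder outputs at padded positions are swapped with those of an empty prompt, and the resulting image is compared with the unmodified generation.

```python
# A hedged sketch (not the paper's intervention code): replace the text-encoder
# outputs at padded positions with those of an empty prompt and compare the two
# generations. The pipeline and model name below are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def encode(prompt):
    tok = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )
    with torch.no_grad():
        emb = pipe.text_encoder(tok.input_ids.to("cuda"))[0]
    return emb, tok.attention_mask.to("cuda")

prompt_emb, mask = encode("a photo of a corgi wearing sunglasses")
empty_emb, _ = encode("")

# Causal intervention: swap embeddings only where the prompt was padded.
pad_positions = (mask == 0).unsqueeze(-1)
intervened = torch.where(pad_positions, empty_emb, prompt_emb)

seed = lambda: torch.Generator("cuda").manual_seed(0)
original = pipe(prompt_embeds=prompt_emb, generator=seed(), num_inference_steps=30).images[0]
modified = pipe(prompt_embeds=intervened, generator=seed(), num_inference_steps=30).images[0]
# If the two images differ substantially, the padding positions carried information.
```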
Paper
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Authors: Akhiad Bercovich · Tomer Ronen · Talor Abramovich · Nir Ailon · Nave Assaf ·
Mohammad Dabbah · Ido Galil · Amnon Geifman · Yonatan Geifman · Izhak Golan ·
Netanel Haber · Ehud Karpas · Roi Koren · Itay Levy · Pavlo Molchanov · Shahar Mor ·
Zach Moshe · Najeeb Nabwani · Omri Puny · Ran Rubin · Itamar Schen · Ido Shahaf ·
Oren Tropp · Omer Ullman Argov · Ran Zilberstein · Ran El-Yaniv
ArXiv, 2024
TL;DR: Puzzle accelerates LLM inference on specific hardware by leveraging blockwise local knowledge distillation and mixed-integer programming to preserve model performance while significantly reducing inference costs.
Despite their impressive capabilities, LLMs are often limited by high computational costs at inference time.
Puzzle addresses this by optimizing large-scale models for specific hardware without sacrificing accuracy,
achieving speedups of up to 2.17×.
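As an illustration of the blockwise local distillation idea mentioned above (a hedged sketch only; the block definitions, sizes, and training data here are placeholders, not Puzzle's actual setup), a cheaper candidate block can be trained in isolation to reproduce its parent block's outputs:

```python
# A hedged sketch of blockwise local distillation (illustrative placeholders only):
# a cheaper candidate block learns to mimic its parent block's outputs, decoupled
# from the rest of the network.
import torch
import torch.nn as nn

hidden = 512
teacher_block = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
student_block = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                           dim_feedforward=1024, batch_first=True)
teacher_block.eval()

opt = torch.optim.AdamW(student_block.parameters(), lr=1e-4)
for _ in range(100):
    # Hidden states that would normally arrive from the preceding layer
    # (random here; in practice, cached activations from real data).
    x = torch.randn(8, 128, hidden)
    with torch.no_grad():
        target = teacher_block(x)
    loss = nn.functional.mse_loss(student_block(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each candidate's quality/cost estimate then feeds a mixed-integer program that
# selects one variant per layer under a hardware latency budget.
```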
Paper
Hierarchical Selective Classification
Authors: Shani Goren* · Ido Galil* · Ran El-Yaniv
(*Equal contribution)
NeurIPS, 2024
TL;DR: We extend selective classification to a hierarchical setting, allowing models to reduce the specificity of predictions when uncertain.
Traditional selective classification allows a model either to make a full prediction or to abstain entirely.
Our method uses class hierarchies to offer partial but still valuable predictions (e.g., “malignant tumor” without specifying the subtype), improving calibration and the risk-coverage trade-off.
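A toy sketch of the underlying idea, using a made-up tumor hierarchy and a simple “climb until confident” rule (the paper's inference rules and guarantees are more refined; see the paper and code):

```python
# A hedged sketch: starting from the predicted leaf, move up the class hierarchy
# until the probability mass under the current node passes a confidence threshold.
import numpy as np

# Toy hierarchy: node -> parent (None marks the root).
parent = {
    "tumor": None,
    "benign": "tumor", "malignant": "tumor",
    "carcinoma": "malignant", "sarcoma": "malignant",
}
leaves = ["benign", "carcinoma", "sarcoma"]

def is_under(leaf, node):
    p = leaf
    while p is not None:
        if p == node:
            return True
        p = parent[p]
    return False

def hierarchical_predict(probs, threshold=0.9):
    """probs: softmax over the leaf classes, aligned with `leaves`."""
    node = leaves[int(np.argmax(probs))]
    while node is not None:
        mass = sum(probs[i] for i, l in enumerate(leaves) if is_under(l, node))
        if mass >= threshold:
            return node, mass        # most specific, sufficiently confident node
        node = parent[node]
    return None, 1.0                 # unreachable unless threshold > 1

print(hierarchical_predict(np.array([0.05, 0.55, 0.40])))
# -> 'malignant' with ~0.95 mass: confident it is malignant, not which subtype.
```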
Paper /
Video /
Code
A Framework for Benchmarking Class-out-of-distribution Detection and its Application to ImageNet
Authors: Ido Galil* · Mohammed Dabbah* · Ran El-Yaniv
(*Equal contribution)
ICLR, 2023 (Top 25%)
TL;DR: Introduces a new approach to generate multi-level C-OOD benchmarks for ImageNet classifiers, applied to 500+ models to reveal novel insights in open-set recognition.
Existing OOD benchmarks can be too easy or biased toward a particular model.
Our framework systematically evaluates different detectors across multiple difficulty levels, uncovering how training regimes, architecture choices, and other factors influence performance.
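As a loose illustration of grading OOD classes by difficulty (a hedged sketch with synthetic confidences; the paper's benchmark construction is more involved and aggregates over many models), held-out classes can be bucketed by how confidently a classifier treats their images:

```python
# A hedged sketch: OOD classes on which a classifier is most (over)confident are
# the hardest to detect, so bucket classes into severity levels by mean confidence.
import numpy as np

rng = np.random.default_rng(0)
num_ood_classes, images_per_class = 50, 20

# Max-softmax confidence the in-distribution classifier assigns to each OOD image
# (synthetic here; in practice, obtained by running the model on the OOD data).
conf = rng.uniform(0.2, 1.0, size=(num_ood_classes, images_per_class))

class_difficulty = conf.mean(axis=1)          # higher confidence = harder class
order = np.argsort(class_difficulty)          # easy -> hard

num_levels = 5
for i, cls_ids in enumerate(np.array_split(order, num_levels)):
    print(f"severity {i}: mean confidence {class_difficulty[cls_ids].mean():.2f}")
```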
Paper /
Video /
Code
What Can We Learn From the Selective Prediction and Uncertainty Estimation Performance of 523 ImageNet Classifiers?
Authors: Ido Galil · Mohammed Dabbah · Ran El-Yaniv
ICLR, 2023
TL;DR: Extensive study on selective prediction and uncertainty estimation across 523 ImageNet models, highlighting that distillation and certain training regimes yield superior calibration and ranking.
Metrics such as AUROC, ECE, selective risk, and SAC show that knowledge distillation significantly improves uncertainty estimation.
A subset of ViTs outperforms other architectures, and temperature scaling benefits both calibration and ranking performance more than previously realized.
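For concreteness, here is a small sketch of one of these metrics, selective risk, computed from max-softmax confidences on synthetic data (illustrative only; the paper's evaluation spans hundreds of models and several complementary metrics):

```python
# A hedged sketch: sort predictions by confidence and report the error rate
# (selective risk) among the most confident fraction of samples (the coverage).
import numpy as np

def selective_risk(confidence, correct, coverage):
    """Error rate over the `coverage` fraction of most-confident predictions."""
    n = int(np.ceil(coverage * len(confidence)))
    keep = np.argsort(-confidence)[:n]        # most confident first
    return 1.0 - correct[keep].mean()

rng = np.random.default_rng(0)
confidence = rng.uniform(size=10_000)
# Toy setup in which higher confidence really does mean higher accuracy.
correct = (rng.uniform(size=10_000) < 0.5 + 0.5 * confidence).astype(float)

for cov in (1.0, 0.8, 0.5, 0.2):
    print(f"coverage {cov:.0%}: selective risk {selective_risk(confidence, correct, cov):.3f}")

# Temperature scaling rescales logits (softmax(z / T)); with a tuned T it can improve
# both calibration (ECE) and, as observed in the paper, the confidence ranking itself.
```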
Paper /
Video /
Code
Disrupting Deep Uncertainty Estimation Without Harming Accuracy
Authors: Ido Galil · Ran El-Yaniv
NeurIPS, 2021
TL;DR: ACE (Attack on Confidence Estimation) disrupts a neural network’s uncertainty estimations without affecting its accuracy, making standard selective mechanisms unreliable.
Traditional adversarial attacks push inputs across decision boundaries to harm accuracy.
ACE instead raises or lowers a model's confidence without crossing any decision boundary,
leaving accuracy intact while corrupting the confidence ranking that selective mechanisms rely on, which is dangerous in sensitive scenarios.
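A simplified, PGD-style sketch of the idea (illustrative only, not the paper's ACE implementation): perturb the input to push confidence up or down, discarding any step that would change the predicted label.

```python
# A hedged, simplified sketch of a confidence-only attack: gradient steps on the
# confidence of the already-predicted class, kept only if the label is unchanged.
import torch
import torch.nn.functional as F

def confidence_attack(model, x, eps=0.01, alpha=0.002, steps=10, raise_conf=True):
    """Perturb image batch x (B, C, H, W) within an L-inf ball of radius eps."""
    model.eval()
    y_pred = model(x).argmax(dim=1)                   # the labels we must preserve
    x_adv = x.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        probs = F.softmax(model(x_adv), dim=1)
        conf = probs.gather(1, y_pred[:, None]).squeeze(1)
        loss = conf.sum() if raise_conf else -conf.sum()
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            candidate = x_adv + alpha * grad.sign()   # ascent/descent on confidence
            candidate = torch.clamp(candidate, x - eps, x + eps)
            # Keep the step only where the predicted label did not flip.
            same = (model(candidate).argmax(dim=1) == y_pred).view(-1, 1, 1, 1)
            x_adv = torch.where(same, candidate, x_adv)
    return x_adv.detach()

# Tiny demo on random data with an untrained CNN (purely to show the interface).
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1),
                            torch.nn.AdaptiveAvgPool2d(1),
                            torch.nn.Flatten(),
                            torch.nn.Linear(8, 10))
x = torch.rand(4, 3, 32, 32)
x_adv = confidence_attack(model, x, raise_conf=False)   # lower confidence, same labels
```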
Paper /
Video /
Code
I was interviewed (in Hebrew) about my PhD research and teaching experience.
You can listen to the interview on
Spotify.
I served as a TA for the “Data Structures” course at the Technion for 3.5 years.
All my tutorials and other helpful materials (in Hebrew) are available on my
YouTube channel.
- Riva Dam Foundation Honors Scholarship for Excellence in PhD (2023)
- Teaching Assistant Excellence Award (5 semesters)
- Final's award for excellence in Computer Science (2018)