Ido Galil
I am a Deep Learning Researcher in the Deci group at NVIDIA (formerly Deci AI, acquired by NVIDIA) and a PhD student nearing completion, advised by Prof. Ran El-Yaniv at the Computer Science faculty, Technion.
My work focuses on Neural Architecture Search (NAS) for large language models (LLMs) and generative AI.
In my PhD research, I study deep neural networks’ reliability and safety in computer vision and natural language processing,
with an emphasis on uncertainty estimation, selective prediction, and adversarial robustness.
Email /
Scholar /
LinkedIn
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Authors: Michael Toker · Ido Galil · Hadas Orgad · Rinon Gal · Yoad Tewel · Gal Chechik · Yonatan Belinkov
NAACL, 2025
TL;DR: Our work reveals how text-to-image (T2I) diffusion models use “empty” padding tokens, which can still influence generated images depending on model architecture and training.
In T2I pipelines, prompts are padded to a fixed length with a special pad token, which language models normally ignore. However, we find that T2I models may treat these tokens differently. We develop two causal intervention methods, one in the text encoder and one in the diffusion process, to test whether padding tokens carry semantic information. Experiments on six T2I models reveal scenarios in which padding tokens are ignored, become semantically significant, or even serve as “registers” that store and recall information during diffusion. Our findings underscore the need for more precise handling of padding tokens in T2I model design.
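Below is a minimal sketch of what such a causal intervention can look like, assuming a Hugging Face diffusers Stable Diffusion pipeline (the model name, the empty-prompt substitution, and the code itself are illustrative assumptions, not the paper's implementation): the text-encoder outputs at padded positions are swapped with those of an empty prompt, and the resulting image is compared with the unmodified generation.

```python
# A hedged sketch (not the paper's intervention code): replace the text-encoder
# outputs at padded positions with those of an empty prompt and compare the two
# generations. The pipeline and model name below are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def encode(prompt):
    tok = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )
    with torch.no_grad():
        emb = pipe.text_encoder(tok.input_ids.to("cuda"))[0]
    return emb, tok.attention_mask.to("cuda")

prompt_emb, mask = encode("a photo of a corgi wearing sunglasses")
empty_emb, _ = encode("")

# Causal intervention: swap embeddings only where the prompt was padded.
pad_positions = (mask == 0).unsqueeze(-1)
intervened = torch.where(pad_positions, empty_emb, prompt_emb)

seed = lambda: torch.Generator("cuda").manual_seed(0)
original = pipe(prompt_embeds=prompt_emb, generator=seed(), num_inference_steps=30).images[0]
modified = pipe(prompt_embeds=intervened, generator=seed(), num_inference_steps=30).images[0]
# If the two images differ substantially, the padding positions carried information.
```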
Paper
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Authors: Akhiad Bercovich · Tomer Ronen · Talor Abramovich · Nir Ailon · Nave Assaf ·
Mohammad Dabbah · Ido Galil · Amnon Geifman · Yonatan Geifman · Izhak Golan ·
Netanel Haber · Ehud Karpas · Roi Koren · Itay Levy · Pavlo Molchanov · Shahar Mor ·
Zach Moshe · Najeeb Nabwani · Omri Puny · Ran Rubin · Itamar Schen · Ido Shahaf ·
Oren Tropp · Omer Ullman Argov · Ran Zilberstein · Ran El-Yaniv
ArXiv, 2024
TL;DR: Puzzle accelerates LLM inference on specific hardware by leveraging blockwise local knowledge distillation and mixed-integer programming to preserve model performance while significantly reducing inference costs.
Despite their impressive capabilities, LLMs are often limited by high computational costs at inference time.
Puzzle addresses this by optimizing large-scale models for specific hardware without sacrificing accuracy,
achieving speedups of up to 2.17×.
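As an illustration of the blockwise local distillation idea mentioned above (a hedged sketch only; the block definitions, sizes, and training data here are placeholders, not Puzzle's actual setup), a cheaper candidate block can be trained in isolation to reproduce its parent block's outputs:

```python
# A hedged sketch of blockwise local distillation (illustrative placeholders only):
# a cheaper candidate block learns to mimic its parent block's outputs, decoupled
# from the rest of the network.
import torch
import torch.nn as nn

hidden = 512
teacher_block = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
student_block = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                           dim_feedforward=1024, batch_first=True)
teacher_block.eval()

opt = torch.optim.AdamW(student_block.parameters(), lr=1e-4)
for _ in range(100):
    # Hidden states that would normally arrive from the preceding layer
    # (random here; in practice, cached activations from real data).
    x = torch.randn(8, 128, hidden)
    with torch.no_grad():
        target = teacher_block(x)
    loss = nn.functional.mse_loss(student_block(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each candidate's quality/cost estimate then feeds a mixed-integer program that
# selects one variant per layer under a hardware latency budget.
```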
Paper
Hierarchical Selective Classification
Authors: Shani Goren* · Ido Galil* · Ran El-Yaniv
(*Equal contribution)
NeurIPS, 2024
TL;DR: We extend selective classification to a hierarchical setting, allowing models to reduce the specificity of predictions when uncertain.
Traditional selective classification allows a model either to make a full prediction or to abstain entirely.
Our method uses class hierarchies to offer partial but still valuable predictions (e.g., “malignant tumor” without specifying the subtype), improving calibration and the risk-coverage trade-off.
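A toy sketch of the underlying idea, using a made-up tumor hierarchy and a simple “climb until confident” rule (the paper's inference rules and guarantees are more refined; see the paper and code):

```python
# A hedged sketch: starting from the predicted leaf, move up the class hierarchy
# until the probability mass under the current node passes a confidence threshold.
import numpy as np

# Toy hierarchy: node -> parent (None marks the root).
parent = {
    "tumor": None,
    "benign": "tumor", "malignant": "tumor",
    "carcinoma": "malignant", "sarcoma": "malignant",
}
leaves = ["benign", "carcinoma", "sarcoma"]

def is_under(leaf, node):
    p = leaf
    while p is not None:
        if p == node:
            return True
        p = parent[p]
    return False

def hierarchical_predict(probs, threshold=0.9):
    """probs: softmax over the leaf classes, aligned with `leaves`."""
    node = leaves[int(np.argmax(probs))]
    while node is not None:
        mass = sum(probs[i] for i, l in enumerate(leaves) if is_under(l, node))
        if mass >= threshold:
            return node, mass        # most specific, sufficiently confident node
        node = parent[node]
    return None, 1.0                 # unreachable unless threshold > 1

print(hierarchical_predict(np.array([0.05, 0.55, 0.40])))
# -> 'malignant' with ~0.95 mass: confident it is malignant, not which subtype.
```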
Paper /
Video /
Code
A Framework for Benchmarking Class-out-of-distribution Detection and its Application to ImageNet
Authors: Ido Galil* · Mohammed Dabbah* · Ran El-Yaniv
(*Equal contribution)
ICLR, 2023 (Top 25%)
TL;DR: Introduces a new approach to generate multi-level C-OOD benchmarks for ImageNet classifiers, applied to 500+ models to reveal novel insights in open-set recognition.
Existing OOD benchmarks can be too easy or biased toward a particular model.
Our framework systematically evaluates different detectors across multiple difficulty levels, uncovering how training regimes, architecture choices, and other factors influence performance.
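As a loose illustration of grading OOD classes by difficulty (a hedged sketch with synthetic confidences; the paper's benchmark construction is more involved and aggregates over many models), held-out classes can be bucketed by how confidently a classifier treats their images:

```python
# A hedged sketch: OOD classes on which a classifier is most (over)confident are
# the hardest to detect, so bucket classes into severity levels by mean confidence.
import numpy as np

rng = np.random.default_rng(0)
num_ood_classes, images_per_class = 50, 20

# Max-softmax confidence the in-distribution classifier assigns to each OOD image
# (synthetic here; in practice, obtained by running the model on the OOD data).
conf = rng.uniform(0.2, 1.0, size=(num_ood_classes, images_per_class))

class_difficulty = conf.mean(axis=1)          # higher confidence = harder class
order = np.argsort(class_difficulty)          # easy -> hard

num_levels = 5
for i, cls_ids in enumerate(np.array_split(order, num_levels)):
    print(f"severity {i}: mean confidence {class_difficulty[cls_ids].mean():.2f}")
```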
Paper /
Video /
Code
What Can We Learn From the Selective Prediction and Uncertainty Estimation Performance of 523 ImageNet Classifiers?
Authors: Ido Galil · Mohammed Dabbah · Ran El-Yaniv
ICLR, 2023
TL;DR: Extensive study on selective prediction and uncertainty estimation across 523 ImageNet models, highlighting that distillation and certain training regimes yield superior calibration and ranking.
Metrics such as AUROC, ECE, selective risk, and SAC show that knowledge distillation significantly improves uncertainty estimation.
A subset of ViTs outperforms other architectures, and temperature scaling benefits both calibration and ranking performance more than previously realized.
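For concreteness, here is a small sketch of one of these metrics, selective risk, computed from max-softmax confidences on synthetic data (illustrative only; the paper's evaluation spans hundreds of models and several complementary metrics):

```python
# A hedged sketch: sort predictions by confidence and report the error rate
# (selective risk) among the most confident fraction of samples (the coverage).
import numpy as np

def selective_risk(confidence, correct, coverage):
    """Error rate over the `coverage` fraction of most-confident predictions."""
    n = int(np.ceil(coverage * len(confidence)))
    keep = np.argsort(-confidence)[:n]        # most confident first
    return 1.0 - correct[keep].mean()

rng = np.random.default_rng(0)
confidence = rng.uniform(size=10_000)
# Toy setup in which higher confidence really does mean higher accuracy.
correct = (rng.uniform(size=10_000) < 0.5 + 0.5 * confidence).astype(float)

for cov in (1.0, 0.8, 0.5, 0.2):
    print(f"coverage {cov:.0%}: selective risk {selective_risk(confidence, correct, cov):.3f}")

# Temperature scaling rescales logits (softmax(z / T)); with a tuned T it can improve
# both calibration (ECE) and, as observed in the paper, the confidence ranking itself.
```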
Paper /
Video /
Code
Disrupting Deep Uncertainty Estimation Without Harming Accuracy
Authors: Ido Galil · Ran El-Yaniv
NeurIPS, 2021
TL;DR: ACE (Attack on Confidence Estimation) disrupts a neural network’s uncertainty estimations without affecting its accuracy, making standard selective mechanisms unreliable.
Traditional adversarial attacks push inputs across decision boundaries to harm accuracy.
ACE instead raises or lowers a model's confidence without crossing any decision boundary,
leaving accuracy intact while corrupting the confidence ranking that selective mechanisms rely on, which is dangerous in sensitive scenarios.
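A simplified, PGD-style sketch of the idea (illustrative only, not the paper's ACE implementation): perturb the input to push confidence up or down, discarding any step that would change the predicted label.

```python
# A hedged, simplified sketch of a confidence-only attack: gradient steps on the
# confidence of the already-predicted class, kept only if the label is unchanged.
import torch
import torch.nn.functional as F

def confidence_attack(model, x, eps=0.01, alpha=0.002, steps=10, raise_conf=True):
    """Perturb image batch x (B, C, H, W) within an L-inf ball of radius eps."""
    model.eval()
    y_pred = model(x).argmax(dim=1)                   # the labels we must preserve
    x_adv = x.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        probs = F.softmax(model(x_adv), dim=1)
        conf = probs.gather(1, y_pred[:, None]).squeeze(1)
        loss = conf.sum() if raise_conf else -conf.sum()
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            candidate = x_adv + alpha * grad.sign()   # ascent/descent on confidence
            candidate = torch.clamp(candidate, x - eps, x + eps)
            # Keep the step only where the predicted label did not flip.
            same = (model(candidate).argmax(dim=1) == y_pred).view(-1, 1, 1, 1)
            x_adv = torch.where(same, candidate, x_adv)
    return x_adv.detach()

# Tiny demo on random data with an untrained CNN (purely to show the interface).
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1),
                            torch.nn.AdaptiveAvgPool2d(1),
                            torch.nn.Flatten(),
                            torch.nn.Linear(8, 10))
x = torch.rand(4, 3, 32, 32)
x_adv = confidence_attack(model, x, raise_conf=False)   # lower confidence, same labels
```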
Paper /
Video /
Code
I was interviewed (in Hebrew) about my PhD research and teaching experience.
You can listen to the interview on
Spotify.
I served as a TA for the “Data Structures” course at the Technion for 3.5 years.
All my tutorials and other helpful materials (in Hebrew) are available on my
YouTube channel.
- Riva Dam Foundation Honors Scholarship for Excellence in PhD (2023)
- Teaching Assistant Excellence Award (5 semesters)
- Final's award for excellence in Computer Science (2018)