Ido Galil
I am a Deep Learning Researcher in the Deci group at NVIDIA (formerly Deci AI, acquired by NVIDIA).
My work focuses on improving the inference efficiency of large language models (LLMs) and generative AI.
I completed my PhD under Prof. Ran El-Yaniv at the
Faculty of Computer Science,
Technion.
In my PhD research, I studied the reliability and safety of deep neural networks in computer vision and natural language processing,
with an emphasis on uncertainty estimation, selective prediction, and adversarial robustness.
Email /
Scholar /
LinkedIn
|
FFN Fusion: Rethinking Sequential Computation in Large Language Models
Authors: Akhiad Bercovich · Mohammad Dabbah · Omri Puny · Ido Galil · Amnon Geifman · Yonatan Geifman · Izhak Golan · Ehud Karpas · Itay Levy · Zach Moshe · Najeeb Nabwani · Tomer Ronen · Itamar Schen · Elad Segal · Ido Shahaf · Oren Tropp · Ran Zilberstein · Ran El-Yaniv
NeurIPS, 2025 (Spotlight)
TL;DR: FFN Fusion fuses consecutive FFN layers into larger blocks, reducing sequential depth and accelerating inference with minimal accuracy impact.
We run Puzzle to search for hardware‑aware designs, then fuse consecutive FFNs into larger FFNs, decreasing the model's depth. Across models ranging from tens to hundreds of billions of parameters, FFN Fusion reduces latency and cost while preserving quality, and it complements techniques such as quantization and pruning.
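To make the fusion rule concrete, here is a minimal PyTorch sketch (an illustration only, not the paper's implementation): two consecutive FFN blocks are replaced by one block of twice the intermediate width, built by stacking their weights, so both are computed in a single parallel pass.

```python
import torch
import torch.nn as nn

d, h = 64, 256  # hidden size, FFN intermediate size

class FFN(nn.Module):
    def __init__(self, d, h):
        super().__init__()
        self.up, self.down, self.act = nn.Linear(d, h), nn.Linear(h, d), nn.GELU()
    def forward(self, x):
        return self.down(self.act(self.up(x)))

f1, f2 = FFN(d, h), FFN(d, h)

# Fused block of width 2h whose weights stack f1 and f2.
fused = FFN(d, 2 * h)
with torch.no_grad():
    fused.up.weight.copy_(torch.cat([f1.up.weight, f2.up.weight], dim=0))
    fused.up.bias.copy_(torch.cat([f1.up.bias, f2.up.bias], dim=0))
    fused.down.weight.copy_(torch.cat([f1.down.weight, f2.down.weight], dim=1))
    fused.down.bias.copy_(f1.down.bias + f2.down.bias)

x = torch.randn(8, d)
seq = x + f1(x)
seq = seq + f2(seq)            # original: two FFNs, one after the other
par = x + fused(x)             # fused: one wider FFN, a single sequential step
# The two agree up to f2(x + f1(x)) - f2(x); the approximation is good when
# consecutive blocks depend only weakly on each other, which the search finds.
print((seq - par).abs().max())
```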
Paper
|
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Authors: Michael Toker · Ido Galil · Hadas Orgad · Rinon Gal · Yoad Tewel · Gal Chechik · Yonatan Belinkov
NAACL, 2025 (Oral)
TL;DR: Our work reveals how text-to-image (T2I) diffusion models use “empty” padding tokens, which can still influence generated images depending on model architecture and training.
In T2I pipelines, prompts are padded to a fixed length with a special pad token, which language models are normally trained to ignore. However, we find that T2I models may treat these tokens differently. We develop two causal intervention methods, one in the text encoder and one in the diffusion process, to test whether padding tokens carry semantic information. Experiments on six T2I models reveal scenarios where padding tokens are ignored, become semantically significant, or even serve as “registers” that store and recall information during diffusion. Our findings underscore the need for more precise handling of padding tokens in T2I model design.
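The interventions follow a simple recipe; the sketch below captures the text-encoder variant in spirit (names and shapes are illustrative assumptions, not the paper's code): keep the hidden states of the real prompt tokens, swap the padding-position states with states from another source, then compare the generated images.

```python
import numpy as np

def intervene_on_padding(prompt_states: np.ndarray, n_real: int,
                         replacement: np.ndarray) -> np.ndarray:
    """Keep the real tokens' hidden states, replace the padding positions.
    If generation changes, the padding states carried causal information."""
    out = prompt_states.copy()
    out[n_real:] = replacement[n_real:]
    return out

# Toy usage: a 77-token CLIP-style sequence with 12 real prompt tokens.
rng = np.random.default_rng(0)
states = rng.normal(size=(77, 768))      # encoder output for the prompt
neutral = rng.normal(size=(77, 768))     # e.g., encoding of an empty prompt
patched = intervene_on_padding(states, n_real=12, replacement=neutral)
assert np.allclose(patched[:12], states[:12])   # real tokens untouched
```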
Paper
|
Scaling Up Synthetic Cell Production Using Robotics and Machine Learning Toward Therapeutic Applications
Authors: Noga Sharf-Pauker · Ido Galil · Omer Kfir · Gal Chen · Rotem Menachem · Jeny Shklover · Avi Schroeder · Shanny Ackerman
Advanced Biology, 2025 (Journal Cover)
TL;DR: We couple robotics with machine learning to optimize and monitor synthetic cell production. We use deep neural networks to assess the synthetic cells' quality.
An automated robotics-and-ML pipeline scales synthetic cell production with robust quality control; deep neural networks support quality assessment and assurance toward therapeutic-grade manufacturing.
Paper
|
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Authors: Akhiad Bercovich · Tomer Ronen · Talor Abramovich · Nir Ailon · Nave Assaf ·
Mohammad Dabbah · Ido Galil · Amnon Geifman · Yonatan Geifman · Izhak Golan ·
Netanel Haber · Ehud Karpas · Roi Koren · Itay Levy · Pavlo Molchanov · Shahar Mor ·
Zach Moshe · Najeeb Nabwani · Omri Puny · Ran Rubin · Itamar Schen · Ido Shahaf ·
Oren Tropp · Omer Ullman Argov · Ran Zilberstein · Ran El-Yaniv
ICML, 2025
TL;DR: Puzzle accelerates LLM inference on specific hardware by leveraging blockwise local knowledge distillation and mixed-integer programming to preserve model performance while significantly reducing inference costs.
Despite their impressive capabilities, LLMs are often limited by high computational costs at inference time.
Puzzle addresses this by optimizing large-scale models for specific hardware without sacrificing accuracy,
yielding speedups of up to 2.17×.
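A toy rendition of the selection step (a sketch under simplifying assumptions, not NVIDIA's code): each layer has candidate blocks scored offline for quality loss and latency, and a small mixed-integer program picks one variant per layer under a latency budget.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

n_layers, n_variants = 4, 3
# Per (layer, variant): quality loss and latency, e.g., from blockwise
# distillation scores; variant 0 = original block, 2 = most compressed.
quality_loss = np.array([[0.0, 0.3, 0.9]] * n_layers).ravel()
latency = np.array([[1.0, 0.6, 0.2]] * n_layers).ravel()
budget = 2.8

one_per_layer = np.kron(np.eye(n_layers), np.ones(n_variants))
res = milp(
    c=quality_loss,                                   # minimize total loss
    constraints=[LinearConstraint(one_per_layer, lb=1, ub=1),
                 LinearConstraint(latency[None, :], ub=budget)],
    integrality=np.ones_like(quality_loss),
    bounds=Bounds(0, 1),                              # binary choices
)
print(res.x.reshape(n_layers, n_variants).argmax(axis=1))
# e.g., [1 1 1 0]: compress three layers mildly, keep one intact.
```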
Paper /
Video /
Code
|
Llama-Nemotron: Efficient Reasoning Models
Authors: Akhiad Bercovich · Itay Levy · Izik Golan · Mohammad Dabbah · Ran El-Yaniv · Omri Puny · Ido Galil · Zach Moshe · Tomer Ronen · Najeeb Nabwani · Ido Shahaf · Oren Tropp · Ehud Karpas · Ran Zilberstein · Jiaqi Zeng · Soumye Singhal · Alexander Bukharin · Yian Zhang · Tugrul Konuk · Gerald Shen · Ameya Sunil Mahabaleshwarkar · Bilal Kartal · Yoshi Suhara · Olivier Delalleau · Zijia Chen · Zhilin Wang · David Mosallanezhad · Adi Renduchintala · Haifeng Qian · Dima Rekesh
Additional authors:
Fei Jia · Somshubra Majumdar · Vahid Noroozi · Wasi Uddin Ahmad · Sean Narenthiran · Aleksander Ficek · Mehrzad Samadi · Jocelyn Huang · Siddhartha Jain · Igor Gitman · Ivan Moshkov · Wei Du · Shubham Toshniwal · George Armstrong · Branislav Kisacanin · Matvei Novikov · Daria Gitman · Evelina Bakhturina · Prasoon Varshney · Makesh Narsimhan · Jane Polak Scowcroft · John Kamalu · Dan Su · Kezhi Kong · Markus Kliegl · Rabeeh Karimi Mahabadi · Ying Lin · Sanjeev Satheesh · Jupinder Parmar · Pritam Gundecha · Brandon Norick · Joseph Jennings · Shrimai Prabhumoye · Syeda Nahida Akter · Mostofa Patwary · Abhinav Khattar · Deepak Narayanan · Roger Waleffe · Jimmy Zhang · Bor-Yiing Su · Guyue Huang · Terry Kong · Parth Chadha · Sahil Jain · Christine Harvey · Elad Segal · Jining Huang · Sergey Kashirsky · Robert McQueen · Izzy Putterman · George Lam · Arun Venkatesan · Sherry Wu · Vinh Nguyen · Manoj Kilaru · Andrew Wang · Anna Warno · Abhilash Somasamudramath · Sandip Bhaskar · Maka Dong · Nave Assaf · Shahar Mor · Omer Ullman Argov · Scot Junkin · Oleksandr Romanenko · Pedro Larroy · Monika Katariya · Marco Rovinelli · Viji Balas · Nicholas Edelman · Anahita Bhiwandiwalla · Muthu Subramaniam · Smita Ithape · Karthik Ramamoorthy · Yuting Wu · Suguna Varshini Velury · Omri Almog · Joyjit Daw · Denys Fridman · Erick Galinkin · Michael Evans · Shaona Ghosh · Katherine Luna · Leon Derczynski · Nikki Pope · Eileen Long · Seth Schneider · Guillermo Siman · Tomasz Grzegorzek · Pablo Ribalta · Chris Alexiuk · Joey Conway · Trisha Saar · Ann Guan · Krzysztof Pawelec · Shyamala Prayaga · Oleksii Kuchaiev · Boris Ginsburg · Oluwatobi Olabiyi · Kari Briski · Jonathan Cohen · Bryan Catanzaro · Jonah Alben · Yonatan Geifman · Eric Chung
ICML, 2025 - EXAIT Workshop
TL;DR: Llama‑Nemotron is a family of open reasoning LLMs (8B/49B/253B) that match state‑of‑the‑art reasoning quality while significantly improving inference throughput and memory efficiency, and include a dynamic reasoning toggle for controllable compute.
Llama‑Nemotron introduces heterogeneous reasoning models trained for both quality and efficiency. The recipe combines neural architecture search starting from Llama‑3 models for faster inference, knowledge distillation, and continued pretraining, followed by a reasoning‑focused post‑training stage (supervised fine‑tuning and large‑scale RL). The family (Nano 8B, Super 49B, Ultra 253B) achieves reasoning quality competitive with leading systems while improving throughput and memory use, and supports switching between standard chat and reasoning modes at inference time. The release includes models, a post‑training dataset, and training codebases (NeMo, NeMo‑Aligner, Megatron‑LM).
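Per the model cards, the reasoning toggle is driven by the system prompt; a minimal sketch, assuming an OpenAI-compatible chat endpoint serving the model:

```python
def build_messages(question: str, reasoning: bool) -> list[dict]:
    # "detailed thinking on/off" is the documented toggle phrase.
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]

# Reasoning mode spends extra thinking tokens; chat mode answers directly.
print(build_messages("How many primes are below 100?", reasoning=True))
```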
Paper
|
No Data, No Optimization: A Lightweight Method To Disrupt Neural Networks With Sign-Flips
Authors: Ido Galil* · Moshe Kimhi* · Ran El-Yaniv
(*Equal contribution)
2025
TL;DR: We present a data-free, optimization-free attack that disrupts neural networks by flipping a tiny number of sign bits in their parameters. Flipping just two sign bits in ResNet-50 on ImageNet causes a 99.8% accuracy drop; a single-pass variant further amplifies damage.
Deep Neural Lesion (DNL) locates highly sensitive sign bits that control model behavior and flips them without any data or optimization. We validate effectiveness across diverse CV architectures and datasets, and show that a one-pass variant intensifies disruption beyond the zero-pass setting. Finally, hardening a small subset of vulnerable sign bits mitigates parameter attacks.
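The attack's primitive is a single flip of the IEEE-754 sign bit of a weight; the sketch below illustrates it (the largest-magnitude selection here is a simplified stand-in for DNL's sensitivity analysis):

```python
import numpy as np

def flip_sign_bits(weights: np.ndarray, idx: np.ndarray) -> np.ndarray:
    bits = weights.astype(np.float32).view(np.uint32).copy()
    bits[idx] ^= np.uint32(0x80000000)     # flip only the IEEE-754 sign bit
    return bits.view(np.float32)

w = np.random.randn(1000).astype(np.float32)    # stand-in weight tensor
targets = np.argsort(np.abs(w))[-2:]            # simplistic sensitivity proxy
w_attacked = flip_sign_bits(w, targets)
print(w[targets], w_attacked[targets])          # same magnitude, flipped sign
```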
Paper
|
Hierarchical Selective Classification
Authors: Shani Goren* · Ido Galil* · Ran El-Yaniv
(*Equal contribution)
NeurIPS, 2024
TL;DR: We extend selective classification to a hierarchical setting, allowing models to reduce the specificity of predictions when uncertain.
Traditional selective classification allows a model either to make a full prediction or to refuse entirely.
Our method uses class hierarchies to offer partial but still valuable predictions (e.g., “malignant tumor” without specifying the subtype), improving calibration and risk-coverage trade-offs.
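The inference rule is easy to sketch (toy hierarchy and a simple thresholding rule, in the spirit of the paper rather than its exact algorithm): climb from the predicted leaf toward the root until the subtree's accumulated probability clears the confidence threshold.

```python
import numpy as np

parent = {"siamese": "cat", "persian": "cat", "beagle": "dog",
          "cat": "animal", "dog": "animal", "animal": None}
leaves = ["siamese", "persian", "beagle"]

def is_descendant(leaf: str, node: str) -> bool:
    while leaf is not None:
        if leaf == node:
            return True
        leaf = parent.get(leaf)
    return False

def hierarchical_predict(leaf_probs: np.ndarray, threshold: float) -> str:
    node = leaves[int(np.argmax(leaf_probs))]
    while node is not None:
        mass = sum(p for l, p in zip(leaves, leaf_probs) if is_descendant(l, node))
        if mass >= threshold:
            return node            # most specific prediction that is confident
        node = parent[node]        # not confident enough: generalize one level
    return "entity"                # root: maximally general (or abstain)

# 45% siamese / 35% persian: neither leaf is confident, but "cat" is.
print(hierarchical_predict(np.array([0.45, 0.35, 0.20]), threshold=0.7))
```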
Paper /
Video /
Code
|
A Framework for Benchmarking Class-out-of-distribution Detection and its Application to ImageNet
Authors: Ido Galil* · Mohammed Dabbah* · Ran El-Yaniv
(*Equal contribution)
ICLR, 2023 (Top 25%)
TL;DR: Introduces a new approach for generating C-OOD benchmarks with multiple difficulty levels for ImageNet classifiers, applied to 500+ models to reveal novel insights in open-set recognition.
Existing OOD benchmarks can be too easy or biased toward a particular model.
Our framework systematically evaluates different detectors across multiple difficulty levels, uncovering how training regimes, architecture choices, and other factors influence performance.
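A toy rendition of the grading idea (an assumption-laden sketch, not the benchmark's code): score each candidate OOD class by how severely it fools the model, for example by the model's mean confidence on that class's images, then bucket classes into difficulty levels and evaluate the detector on each level separately.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 50
# Severity proxy per OOD class: the classifier's mean max-softmax confidence
# on that class's images (higher = the model is fooled harder).
severity = rng.uniform(0.2, 0.95, size=n_classes)

order = np.argsort(severity)                 # easy -> hard
levels = np.array_split(order, 5)            # five difficulty levels
for i, lvl in enumerate(levels):
    print(f"level {i}: mean severity {severity[lvl].mean():.2f}")
# A detector's AUROC is then reported per level, exposing methods that only
# look good on easy OOD pools.
```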
Paper /
Video /
Code
|
What Can We Learn From the Selective Prediction and Uncertainty Estimation Performance of 523 ImageNet Classifiers?
Authors: Ido Galil · Mohammed Dabbah · Ran El-Yaniv
ICLR, 2023
TL;DR: Extensive study on selective prediction and uncertainty estimation across 523 ImageNet models, highlighting that distillation and certain training regimes yield superior calibration and ranking.
Metrics such as AUROC, ECE, selective risk, and SAC show that knowledge distillation significantly improves uncertainty estimation.
A subset of ViTs outperforms other architectures, and temperature scaling benefits both calibration and ranking performance more than previously realized.
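Two of the measurement tools are easy to sketch (simplified numpy versions, not the paper's evaluation code): a risk-coverage curve built from max-softmax confidence, and temperature scaling of the logits.

```python
import numpy as np

def risk_coverage(conf: np.ndarray, correct: np.ndarray):
    order = np.argsort(-conf)                    # most confident first
    n = np.arange(1, len(conf) + 1)
    risk = (~correct[order]).cumsum() / n        # selective risk at each coverage
    return n / len(conf), risk

def temperature_scale(logits: np.ndarray, T: float) -> np.ndarray:
    z = logits / T
    z -= z.max(axis=1, keepdims=True)            # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
labels = rng.integers(0, 10, size=1000)
probs = temperature_scale(logits, T=1.5)         # T is tuned on validation data
conf, pred = probs.max(axis=1), probs.argmax(axis=1)
coverage, risk = risk_coverage(conf, pred == labels)
print(f"risk at 50% coverage: {risk[len(risk) // 2]:.3f}")
```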
Paper /
Video /
Code
|
Disrupting Deep Uncertainty Estimation Without Harming Accuracy
Authors: Ido Galil · Ran El-Yaniv
NeurIPS, 2021
TL;DR: ACE (Attack on Confidence Estimation) disrupts a neural network’s uncertainty estimates without affecting its accuracy, making standard selective mechanisms unreliable.
Traditional adversarial attacks cross decision boundaries to harm accuracy.
ACE instead increases confidence for incorrect predictions and decreases it for correct ones without crossing any boundary,
making uncertainty estimates dangerously misleading in sensitive scenarios.
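The principle can be illustrated with a toy linear model (a simplified sketch of the idea, not the paper's procedure): push the input down the confidence gradient, rejecting any step that would flip the prediction, so accuracy is untouched while confidence collapses.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))                  # toy linear classifier
x = rng.normal(size=64)
pred = int(np.argmax(W @ x))                   # original prediction, kept intact

x_adv = x.copy()
for _ in range(200):
    l = W @ x_adv
    runner_up = int(np.argsort(l)[-2])
    g = W[pred] - W[runner_up]                 # gradient of the top-2 margin
    step = x_adv - 0.01 * g / np.linalg.norm(g)
    if int(np.argmax(W @ step)) != pred:
        break                                  # would cross the boundary: stop
    x_adv = step

margin = lambda v: np.diff(np.sort(W @ v)[-2:])[0]
print(f"confidence margin: {margin(x):.2f} -> {margin(x_adv):.2f} (label kept)")
```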
Paper /
Video /
Code
|
I was interviewed (in Hebrew) about my PhD research and teaching experience.
You can listen to the interview on
Spotify.
|
I served as a TA for the “Data Structures” course at the Technion for 3.5 years.
All my tutorials and other helpful materials (in Hebrew) are available on my
YouTube channel.
|
- Riva Dam Foundation Honors Scholarship for Excellence in PhD (2023)
- Teaching Assistant Excellence Award (5 semesters)
- Final's award for excellence in Computer Science (2018)