Teaching

My teaching centers on the systems layer that modern AI now depends on: parallel hardware, accelerators, memory hierarchy, and performance-aware computing. The courses below show how I turn that background into curriculum, materials, and hands-on training.

Courses taught

  • Shared-Memory Parallelism: CPUs, GPUs, and In-Between (Technion)
  • Distributed Systems, seminar (Open University of Israel)
  • Multi-Core Processors and Embedded Processor Systems (Tel Aviv University)
  • Shared-Memory Parallelism: CPUs, GPUs, and In-Between (Ben-Gurion University of the Negev)

Project-based teaching

In 2025 and 2026, I supervised a Technion project course titled Cross-Platform GPU Kernel Translation with LLMs.

The project asked students to use large language models for portable GPU kernel translation across CUDA, OpenCL, SYCL/DPC++, ROCm/HIP, OpenMP offload, and OpenACC, while reasoning about optimization and benchmarking across heterogeneous platforms.

Course thoughts

The same background that feeds my research agenda also shapes my teaching: parallel programming, memory hierarchy, runtime behavior, and systems-level thinking. These topics matter today not as isolated HPC concerns, but because AI itself runs on top of these systems. I try to teach not only APIs and tools, but the actual mechanics behind the computation.