Teaching

My teaching centers on the systems layer that modern AI now depends on: parallel hardware, accelerators, memory hierarchy, and performance-aware computing. The courses below show how I turn that background into curriculum, materials, and hands-on training.

Courses taught

  • Shared-Memory Parallelism: CPUs, GPUs, and In-Between (Technion)
  • Distributed Systems, seminar (Open University of Israel)
  • Multi-Core Processors and Embedded Processor Systems (Tel Aviv University)
  • Shared-Memory Parallelism: CPUs, GPUs, and In-Between (Ben-Gurion University of the Negev)

Project-based teaching

In 2025 and 2026, I supervised a Technion project course titled Cross-Platform GPU Kernel Translation with LLMs.

The project asked students to use large language models for portable GPU kernel translation across CUDA, OpenCL, SYCL/DPC++, ROCm/HIP, OpenMP offload, and OpenACC, while reasoning about optimization and benchmarking across heterogeneous platforms.

Course thoughts

The same background that feeds my research agenda also shapes my teaching: parallel programming, memory hierarchy, runtime behavior, and systems-level thinking. These topics matter today not as isolated HPC concerns, but because AI itself runs on top of these systems. I try to teach not only APIs and tools, but the actual mechanics behind the computation.