AI for Systems
LLMs, agents, and hardware-aware parallel code
I treat code itself as a computational object for AI: something to read, translate, profile, optimize, and ultimately generate in ways that remain grounded in performance and hardware constraints.
- Automatic parallelization and translation across OpenMP, MPI, CUDA, and related settings
- Domain-specific language models for HPC code and tasks
- Performance reasoning, profiling, and hardware-aware code generation