AI Systems / Compute Architect
Full-Time | San Francisco (Hybrid or Remote)
A fast-growing semiconductor startup is developing a next-generation AI acceleration platform focused on dramatically improving efficiency for modern machine learning workloads, particularly large-scale inference. This role will play a key part in defining both the silicon architecture and the broader system stack, with an emphasis on tightly integrated compute, memory, and interconnect.
What You’ll Do
- Define and architect end-to-end compute systems for AI/ML workloads, including hardware requirements and system-level topology.
- Work cross-functionally with architecture, design, and external IP teams to drive development from early concept through production.
- Optimize system performance by balancing compute, memory bandwidth, and interconnect efficiency to improve throughput, latency, and power.
- Collaborate with software, compiler, and ML teams to enable scalable deployment of AI models through effective abstractions and tooling.
- Lead architectural exploration and performance modeling to evaluate tradeoffs and guide key design decisions.
- Stay current with advancements in AI models, compute architectures, and emerging optimization techniques.
What You’ll Bring
- Strong background in modern compute architectures, including CPUs, GPUs, and specialized accelerators.
- Solid understanding of machine learning workloads, particularly transformer-based models and large-scale inference techniques.
- Experience evaluating hardware tradeoffs across power, performance, and area (PPA).
- Familiarity with architectural simulation and performance analysis tools.
- Proficiency in C/C++ and Python for modeling, simulation, or system-level development.
- Exposure to ML frameworks such as PyTorch or JAX is a plus.
