CPU Storage Tech Lead
WFA Digital Insight
The demand for skilled tech leads in AI infrastructure has surged, with a 25% increase in related job postings in the last year. As companies like OpenAI push the boundaries of AI capabilities, the need for experts who can optimize compute and storage systems has become critical. With its innovative approach to AI development, OpenAI stands out as a leader in this field. Candidates should be aware that this role requires not only deep technical expertise but also the ability to drive strategic decisions and collaborate with multiple stakeholders. Before applying, consider how your skills in CPU architectures, storage systems, and systems architecture can contribute to OpenAI's mission.
Job Description
About the Role
The CPU Storage Tech Lead will play a pivotal role in defining and driving the server compute and storage architecture strategy for OpenAI's Stargate infrastructure. This involves owning the technical direction across CPU platforms, memory configurations, local and disaggregated storage systems, and their integration into large-scale AI clusters. The successful candidate will be responsible for evaluating vendor roadmaps, leading platform tradeoff decisions, and ensuring compute and storage systems are optimized for training, inference, and supporting services.
The Stargate team is at the forefront of building the physical infrastructure that powers large-scale AI systems. By designing and delivering next-generation data centers optimized for dense compute clusters, advanced networking, and rapidly evolving hardware platforms, this team translates cutting-edge compute roadmaps into scalable, production-ready environments. The CPU Storage Tech Lead will be part of a team that partners with silicon vendors, server and storage OEMs, networking teams, and data center engineering organizations to bring new capacity online quickly, reliably, and at global scale.
What You Will Do
- Own CPU and storage technical strategy for Stargate compute infrastructure across current and future generations.
- Evaluate CPU platforms across performance, efficiency, memory bandwidth, PCIe topology, cost, and roadmap alignment.
- Define storage architectures for AI environments, including boot media, local NVMe, shared storage, caching tiers, metadata services, and high-performance data pipelines.
- Drive server platform decisions involving CPU, memory, NIC, GPU, and storage subsystem integration.
- Partner with performance modeling teams to quantify tradeoffs across compute, memory, I/O, and storage bottlenecks.
- Work with silicon and hardware vendors on roadmap influence, feature requests, qualification plans, and technical escalations.
- Lead bring-up and validation efforts for new CPU and storage platforms in lab and production environments.
- Partner with networking and cluster architecture teams to optimize end-to-end node design and data movement.
- Support supply chain and sourcing teams with technical vendor assessments and second-source strategies.
- Drive reliability, serviceability, and fleet lifecycle planning for compute and storage platforms.
- Translate future AI workload requirements into infrastructure platform specifications.
- Provide technical leadership across cross-functional stakeholders and executive reviews.
What We Are Looking For
- Bachelor’s degree in Computer Engineering, Electrical Engineering, Computer Science, or related technical field; advanced degree preferred.
- 10+ years of experience in server hardware, systems architecture, data center infrastructure, or hyperscale compute platforms.
- Deep expertise in modern CPU architectures (x86, ARM, accelerator host systems) and server platform design.
- Strong understanding of memory systems, PCIe/CXL fabrics, NUMA behavior, and platform-level performance constraints.
- Experience with storage systems including NVMe, SSD qualification, RAID, distributed storage, object/file systems, or high-performance data pipelines.
- Experience evaluating hardware tradeoffs across performance, cost, power, thermals, and supply availability.
- Familiarity with GPU clusters and AI training/inference infrastructure strongly preferred.
- Experience working directly with OEMs, ODMs, silicon vendors, or storage vendors.
- Strong systems thinking with ability to connect component decisions to fleet-level outcomes.
Nice to Have
- Experience with Excel for data analysis and presentation.
- Knowledge of data center operations and infrastructure management.
- Participation in open-source projects related to compute and storage systems.
Benefits and Perks
- Competitive compensation package.
- Opportunities for professional growth and development in a cutting-edge field.
- Collaborative and dynamic work environment.
- Flexible remote work options.
- Access to cutting-edge technologies and tools.
- Comprehensive health insurance and benefits package.
How to Stand Out
- Highlight your experience with CPU architectures and storage systems, and be prepared to discuss how you've optimized these for AI workloads.
- Showcase your ability to drive technical decisions and collaborate across multiple stakeholders, including vendors and internal teams.
- Prepare examples of how you've evaluated hardware tradeoffs and made strategic decisions based on performance, cost, and other factors.
- Be ready to discuss your understanding of the current AI landscape and how you see compute and storage evolving to meet future workload requirements.
- Emphasize your problem-solving skills and ability to work in a fast-paced, dynamic environment.
- Consider creating a portfolio or examples of your work that demonstrate your expertise in server hardware, systems architecture, and data center infrastructure.
- When discussing salary, be prepared to reference industry standards for similar positions and highlight your unique value proposition.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.