Hardware Operations Technical Program Manager
WFA Digital Insight
The demand for skilled technical program managers in the AI infrastructure space has grown significantly, with a 25% increase in job postings over the past year. As companies like OpenAI continue to push the boundaries of artificial intelligence, the need for professionals who can bridge technical and operational expertise has become paramount. With the global AI market projected to reach
Job Description
About the Role
The Hardware Operations Technical Program Manager will play a critical role in driving the execution of AI infrastructure hardware programs at OpenAI. This will involve working closely with cross-functional teams, including hardware engineering, data center engineering, networking, supply chain, manufacturing, deployment, and operations. The successful candidate will own end-to-end hardware operations readiness programs, develop scalable processes for high-volume infrastructure deployment, and create operational scorecards to measure hardware operational health.
The role requires a deep understanding of hardware systems, as well as the ability to identify operational blockers and drive accountability across teams. The ideal candidate will be comfortable operating at both the technical and programmatic level, with experience driving complex hardware or infrastructure programs from development through production and deployment.
As part of the Stargate team, the Hardware Operations Technical Program Manager will be responsible for building the physical infrastructure that powers OpenAI's largest-scale AI systems. This will involve designing, deploying, and operating next-generation data center infrastructure across a rapidly expanding footprint, bringing together hardware, networking, facilities, supply chain, and deployment execution.
What You Will Do
- Drive end-to-end Hardware Operations readiness programs across AI infrastructure systems, including servers, racks, networking hardware, power and cooling interfaces, and related data center infrastructure
- Develop and operationalize scalable hardware operations processes, workflows, and support models spanning deployment, repair operations, diagnostics, break/fix, escalation management, and sustaining operations
- Lead cross-functional execution of Hardware Operations readiness initiatives, ensuring operational capabilities, tooling, documentation, staffing models, and workflows are established prior to production deployment and operational handoff
- Partner across Hardware Engineering, Manufacturing, Supply Chain, Data Center Operations, Network Operations, Deployment, Reliability Engineering, and external suppliers to ensure alignment on operational requirements, supportability, and readiness milestones
- Develop operational scorecards, reporting frameworks, and metric algorithms to measure hardware operational health, repair performance, deployment quality, readiness status, and execution efficiency
- Identify operational, technical, supplier, tooling, and process risks early; drive mitigation plans, cross-functional alignment, and executive-level communication
- Lead cross-functional issue resolution efforts during hardware deployment, validation, operational ramp, and sustaining operations, ensuring rapid containment, corrective action development, and long-term process improvement
- Create and mature operational governance models, including standardized readiness reviews, action tracking, escalation management, performance reviews, and operational business rhythms
- Ensure operational knowledge sharing and alignment across internal teams, external suppliers, and infrastructure partners to improve execution consistency, issue resolution efficiency, and operational maturity
What We Are Looking For
- Experience driving complex hardware or infrastructure programs from development through production and deployment
- Comfort operating across engineering, manufacturing, supply chain, deployment, and operations teams
- Ability to understand technical system dependencies without needing to be the deepest engineer in every domain
- Experience creating structure in ambiguous, fast-moving environments
- Effective at driving accountability across teams and vendors without direct authority
- Ability to move between tactical execution details and strategic planning
- Strong programmatic skills, including Excel analysis and data visualization
- Experience with hardware operations, including deployment, repair, and maintenance
- Strong communication and collaboration skills, with the ability to work with cross-functional teams
Nice to Have
- Experience working in the AI or machine learning space
- Knowledge of data center infrastructure and operations
- Experience with cloud-based infrastructure and deployment
- Certification in program management or a related field
- Experience with agile development methodologies
Benefits and Perks
- Competitive compensation package
- Equity in a leading AI company
- Comprehensive health insurance
- Retirement savings plan
- Flexible PTO policy
- Remote work stipend
- Professional development opportunities
- Access to cutting-edge technology and tools
- Collaborative and dynamic work environment
- Recognition and rewards for outstanding performance
How to Stand Out
- Tip: Develop a strong understanding of hardware operations and program management to stand out in this role. Focus on creating scalable processes and workflows that can be applied to high-volume infrastructure deployment.
- Tip: Highlight your experience working with cross-functional teams, including engineering, manufacturing, and operations. Emphasize your ability to drive accountability and create operational scorecards to measure hardware operational health.
- Tip: Be prepared to discuss your experience with hardware operations, including deployment, repair, and maintenance. Provide specific examples of how you have driven complex hardware or infrastructure programs from development through production and deployment.
- Tip: Showcase your programmatic skills, including Excel analysis and data visualization. Demonstrate your ability to create operational scorecards and reporting frameworks to measure hardware operational health.
- Tip: Research OpenAI's company culture and values to understand its approach to AI infrastructure and deployment. Be prepared to discuss how you can contribute to the company's mission and goals.
- Tip: Prepare to discuss your experience working in fast-moving environments and creating structure in ambiguous situations. Highlight your ability to operate at both the technical and programmatic level, with a deep understanding of hardware systems and operational blockers.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.