Cluster Deployment Engineer
WFA Digital Insight
Demand for skilled engineers in AI and tech deployment has grown significantly, with job openings reportedly up 25% over the past year. As companies like Anthropic continue to push the boundaries of AI systems, the need for experts who can deploy and manage these systems efficiently keeps rising. Because the role can be done remotely, it offers engineers a chance to work at the forefront of technological innovation from anywhere. Before applying, candidates should know that the position requires a deep understanding of deployment engineering and strategy, plus the ability to work across multiple disciplines.
Job Description
About the Role
The Cluster Deployment Engineer is a critical role at Anthropic: you will oversee the deployment of large-scale AI compute clusters. The work spans multiple disciplines, including hardware, networking, facilities, supply chain, and construction. As a senior individual contributor with broad technical influence, you will work closely with internal research and systems teams, external developers, engineering firms, and OEM partners to deliver cluster capacity at the speed the frontier requires.

The ideal candidate has a strong background in deployment engineering, with a focus on cluster-level deployment strategy, rack interface standards, and multi-threaded cluster bring-up programs. You will own the deployment-engineering strategy for cluster build-out, including how racks are organized into pods, halls, and sites, and how compute, network, power, and cooling systems interface at the rack boundary.
What You Will Do
- Own cluster-level deployment strategy, defining how AI compute clusters are organized across the floor and how racks interconnect.
- Set rack interface standards, spanning power, network, mechanical, thermal, and spatial domains, and ensure that every deployment includes the complete set of infrastructure required to bring a cluster online.
- Drive multi-threaded cluster bring-up programs across hardware, networking, power, and cooling, owning plans, dependencies, and critical paths from hardware specification through energization and turn-up.
- Partner with internal engineering teams, including research, systems, networking, and hardware, to translate cluster requirements into deployable facility scope and to derisk onboarding of new hardware platforms well ahead of delivery.
- Lead external partner execution with developers, engineering firms, OEMs, and construction teams, driving technical reviews, deviation management, and handoffs that keep deployments on schedule and within specification.
- Improve cluster turn-up reliability and repeatability, identifying systemic gaps in deployment scope, tooling, and partner interfaces, and driving durable fixes that reduce time-to-serve for new capacity.
- Define and track deployment KPIs, including cluster readiness, schedule adherence, scope completeness, and time-to-first-packet, and use historical trends to forecast risk and inform capacity planning.
- Coordinate cross-functional readiness across supply chain, security, operations, and construction to ship production-ready compute capacity.
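To make the KPI bullet above concrete, here is a minimal, purely illustrative Python sketch of how metrics like schedule adherence and time-to-first-packet might be computed from deployment records. All field names, sites, and dates are hypothetical, not part of the role's actual tooling.

```python
from datetime import date
from statistics import mean

# Hypothetical deployment records: planned vs. actual energization dates,
# and the date the cluster served its first packet.
deployments = [
    {"site": "pod-a", "planned": date(2024, 3, 1), "energized": date(2024, 3, 4),
     "first_packet": date(2024, 3, 9)},
    {"site": "pod-b", "planned": date(2024, 5, 15), "energized": date(2024, 5, 15),
     "first_packet": date(2024, 5, 19)},
]

def schedule_adherence(rows):
    """Fraction of deployments energized on or before the planned date."""
    on_time = sum(1 for r in rows if r["energized"] <= r["planned"])
    return on_time / len(rows)

def mean_time_to_first_packet(rows):
    """Average days from energization to first packet served."""
    return mean((r["first_packet"] - r["energized"]).days for r in rows)

print(schedule_adherence(deployments))         # 0.5
print(mean_time_to_first_packet(deployments))  # 4.5
```

Trends in metrics like these over past builds are what would feed the risk forecasting and capacity planning the role describes.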
What We Are Looking For
- 5+ years of experience in deployment engineering, with a focus on cluster-level deployment strategy and rack interface standards.
- Strong background in computer science, electrical engineering, or a related field.
- Experience working with large-scale AI compute clusters, including deployment, management, and maintenance.
- Strong understanding of hardware, networking, and facilities, including power, cooling, and spatial domains.
- Experience working with external partners, including developers, engineering firms, OEMs, and construction teams.
- Strong communication and project management skills, with the ability to work effectively across multiple disciplines and teams.
Nice to Have
- Experience with cloud infrastructure platforms and their deployment tooling, such as AWS or Azure.
- Knowledge of containerization and orchestration technologies, such as Docker or Kubernetes.
- Experience with automation and scripting languages, such as Python or Bash.
Benefits and Perks
- Competitive salary and benefits package.
- Opportunity to work with a cutting-edge AI company and contribute to the development of beneficial AI systems.
- Collaborative and dynamic work environment, with a team of experienced engineers and researchers.
- Flexible working hours and remote work options, with the ability to work from anywhere in the United States.
- Professional development opportunities, including training, mentorship, and conference attendance.
How to Stand Out
- To stand out as a candidate, be sure to highlight your experience with large-scale AI compute clusters and your ability to work effectively across multiple disciplines.
- Make sure to review the company's mission and values, and be prepared to discuss how your skills and experience align with these.
- Use specific examples to demonstrate your problem-solving skills and ability to drive projects forward.
- Be prepared to discuss your experience with deployment engineering and strategy, and how you have improved cluster turn-up reliability and repeatability in previous roles.
- Don't be afraid to ask questions during the interview process, and be sure to inquire about the company culture and team dynamic.
- Consider creating a portfolio or sample project that demonstrates your skills and experience, and be prepared to discuss this during the interview process.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.