Data Engineer, Scaling Analytics

OpenAI · Remote (San Francisco)

Software Development

Excel

WFA Digital Insight

As demand for advanced AI infrastructure grows, companies like OpenAI are scaling their operations and need skilled data engineers to support informed decision-making. With the global data analytics market expected to reach $274 billion by 2026, professionals with data engineering expertise are in high demand. This role stands out for its focus on enabling leaders to make strategic decisions across infrastructure deployment, hardware operations, and supply chain management. Before applying, candidates should be prepared for complex operational environments and for translating infrastructure problems into scalable data solutions.

Job Description

About the Role

The Data Engineer, Scaling Analytics role is a critical part of OpenAI's infrastructure organization, responsible for building and scaling the analytical foundations that power the company's operations. The role involves designing, building, and maintaining scalable data pipelines that support infrastructure deployment, operations, capacity planning, and supply chain functions. As a key member of the Scaling Analytics team, the successful candidate will partner closely with cross-functional stakeholders to create reliable data products that support critical operational and strategic decisions.

The Scaling Analytics team serves as the data backbone for OpenAI's infrastructure ecosystem, enabling leaders and operators to make informed decisions across multiple domains. As the company expands its operations across a growing number of global data center campuses, the complexity of managing infrastructure capacity, hardware health, supply flows, and operational performance continues to increase. The ideal candidate will have a strong foundation in data engineering and the ability to navigate ambiguous operational environments, translating complex infrastructure problems into scalable data solutions.

The role is based in San Francisco, and the successful candidate will be expected to work closely with the Hardware Operations, Capacity Planning, Supply Chain, Infrastructure Delivery, Finance, and Engineering teams to create trusted datasets and reporting systems that provide visibility into hardware inventory, deployment status, site readiness, capacity utilization, and operational performance.

What You Will Do

Design, build, and maintain scalable data pipelines supporting infrastructure deployment, operations, capacity planning, and supply chain functions.
Develop trusted datasets and reporting systems that provide visibility into hardware inventory, deployment status, site readiness, capacity utilization, and operational performance.
Partner with cross-functional stakeholders to define metrics, establish data standards, and improve decision-making across infrastructure organizations.
Create scalable data models that enable consistent reporting and analytics across multiple data sources and operational systems.
Improve data quality, lineage, observability, and governance practices across critical infrastructure datasets.
Support executive reporting, operational reviews, forecasting exercises, and strategic planning initiatives through reliable analytical foundations.
Collaborate with engineering teams to integrate new data sources and operational telemetry into existing analytics ecosystems.
Build solutions that reduce manual reporting efforts and improve the speed and accuracy of infrastructure decision-making.
Document systems, processes, and analytical frameworks to improve long-term maintainability and organizational resilience.
Develop and maintain data pipelines using modern data warehouses and orchestration frameworks.
Ensure data pipelines are scalable, reliable, and efficient, with a focus on maintainability, performance, and operational excellence.

What We Are Looking For

5+ years of experience building and maintaining production data pipelines and analytical systems.
Strong proficiency in SQL and experience designing scalable data models.
Proficiency in Python or another programming language commonly used for data engineering.
Experience working with modern data warehouses (e.g., Snowflake, BigQuery, Redshift) and orchestration frameworks (e.g., Airflow, Dagster).
Experience designing reliable ETL/ELT workflows with a focus on maintainability, performance, and operational excellence.
Experience partnering with cross-functional stakeholders to translate business requirements into technical solutions.
Experience implementing data quality checks, monitoring, and observability practices in production environments.
Strong problem-solving skills, with the ability to navigate complex operational environments and translate infrastructure problems into scalable data solutions.
Excellent communication skills, with the ability to collaborate with cross-functional stakeholders and communicate technical concepts to non-technical audiences.

Nice to Have

Experience working with Excel and other data analysis tools.
Knowledge of data governance and data quality best practices.
Experience building scalable, reliable systems in cloud-based environments.
Familiarity with machine learning and AI concepts, and how they apply to infrastructure operations.
Experience adapting to change in a rapidly growing company.

Benefits and Perks

Competitive salary and benefits package.
Opportunity to work with a rapidly growing company at the forefront of AI infrastructure development.
Collaborative and dynamic work environment that encourages innovation and creativity.
Professional development opportunities that support ongoing learning and growth.
Flexible working hours and remote work options that support work-life balance.
Access to cutting-edge technologies and tools, helping you stay current with industry trends and developments.
Comprehensive health and wellness package.
Generous parental leave policy that supports working families.
Employee recognition and reward programs for outstanding performance.

How to Stand Out

To stand out, build a strong foundation in data engineering principles and practices, including hands-on experience with modern data warehouses and orchestration frameworks.
Be prepared to communicate complex technical concepts to non-technical stakeholders, and to partner closely with cross-functional teams to drive business outcomes.
Consider developing skills in machine learning and AI concepts, and how they apply to infrastructure operations, to stay ahead of the curve in this rapidly evolving field.
When applying, be sure to highlight your experience working with scalable data models, data quality checks, and data governance practices, as these are critical components of the role.
In the interview process, be prepared to walk through your experience designing and building data pipelines, and to discuss your approach to navigating complex operational environments and translating infrastructure problems into scalable data solutions.
When negotiating salary, research industry compensation standards for data engineers and emphasize your unique skills and experience to ensure you are fairly compensated.
Be aware of the potential for rapid growth and change at a company like OpenAI, and be prepared to adapt and evolve with the organization as it continues to expand and mature.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.