Senior Data Engineer, Core Experimentation

OpenAI · Remote (Seattle)

Software Development

WFA Digital Insight

The demand for skilled data engineers in the AI sector has surged, with a 25% increase in job openings over the past year. As companies like OpenAI continue to push the boundaries of artificial intelligence, engineers who can design and manage complex data pipelines have become essential. With the rise of remote work, candidates can now apply for roles that were previously out of reach. OpenAI's commitment to AI research and deployment makes it an attractive employer for those passionate about AI. Before applying, candidates should note the company's emphasis on statistical correctness, pragmatic solutions, and trustworthy systems.

Job Description

About the Role

The Senior Data Engineer position at OpenAI is a critical role: you will design, build, and manage the data pipelines behind the company's core experimentation platform, which powers product development, measurement, and decision-making across the organization. You will work closely with product, engineering, and infrastructure teams to ensure that experiments are trustworthy, statistically rigorous, and able to scale to the needs of frontier AI products.

The Statsig team at OpenAI builds and operates the experimentation platform, and as a Senior Data Engineer you will be a key member of this team. Your data engineering expertise will help shape the future of experimentation in the AI era. You will also have the opportunity to collaborate with the researchers behind ChatGPT as they train new models and deliver them to users.

What You Will Do

Design, build, and manage data pipelines to power analyses, safety systems, and business decisions
Develop canonical datasets to track key product metrics, including user growth, engagement, and revenue
Work collaboratively with various teams to understand their data needs and provide solutions
Implement robust and fault-tolerant systems for data ingestion and processing
Participate in data architecture and engineering decisions, bringing your strong experience and knowledge to bear
Ensure the security, integrity, and compliance of data according to industry and company standards
Collaborate with researchers to train new models and deliver them to users
Design and implement data pipelines to support the growth of the company's products and services
Develop and maintain data quality checks to ensure the accuracy and reliability of the data

What We Are Looking For

3+ years of experience as a data engineer and 8+ years of software engineering experience (including data engineering)
Proficiency in at least one programming language commonly used in data engineering, such as Python, Scala, or Java
Experience with distributed processing frameworks, such as Hadoop and Flink, and with distributed storage systems (e.g., HDFS, S3)
Expertise with ETL schedulers, such as Airflow, Dagster, or Prefect
Solid understanding of Spark and the ability to write, debug, and optimize Spark code
Experience making data architecture and engineering decisions
Strong knowledge of data security, integrity, and compliance
Ability to work collaboratively with various teams and communicate technical concepts effectively

Nice to Have

Experience with cloud-based data platforms, such as AWS or GCP
Knowledge of machine learning algorithms and their applications
Familiarity with containerization using Docker
Experience with agile development methodologies

Benefits and Perks

Competitive salary and benefits package
Opportunity to work at a cutting-edge AI research and deployment company
Collaborative and dynamic work environment
Professional development opportunities
Flexible work arrangements, including remote work options
Access to the latest technologies and tools
Comprehensive health insurance and other benefits
Generous paid time off and holiday schedule

How to Stand Out

Develop a strong understanding of data engineering concepts, including data pipelines, distributed processing, and data architecture.
Familiarize yourself with OpenAI's products and services, including ChatGPT, to understand the company's mission and vision.
Highlight your experience with ETL schedulers, Spark, and other relevant technologies in your resume and cover letter.
Prepare to discuss your approach to data security, integrity, and compliance, as these are critical aspects of the role.
Be prepared to collaborate with various teams and communicate technical concepts effectively during the interview process.
Research the company culture and values to ensure you are a good fit for the organization.
Consider creating a portfolio of your work, including examples of data pipelines and architectures you have designed and implemented, to demonstrate your skills and experience.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.