Senior Data Engineer, Core Experimentation

OpenAI · Remote (Seattle)
Software Development

WFA Digital Insight

Demand for skilled data engineers in the AI sector has surged, with a 25% increase in job openings over the past year. As companies like OpenAI continue to push the boundaries of artificial intelligence, experts who can design and manage complex data pipelines have become crucial. With the rise of remote work, candidates can now apply for roles that were previously inaccessible. OpenAI's commitment to AI research and deployment makes it an attractive employer for those passionate about AI. Before applying, candidates should be aware of the company's emphasis on statistical correctness, pragmatic solutions, and building trustworthy systems.

Job Description

About the Role

The Senior Data Engineer at OpenAI designs, builds, and manages data pipelines for the company's core experimentation platform. This platform powers product development, measurement, and decision-making across the organization. In this role, you will work closely with product, engineering, and infrastructure teams to ensure that experiments are trustworthy, statistically rigorous, and scalable to the needs of frontier AI products.

The Statsig team at OpenAI builds and operates the experimentation platform, and as a Senior Data Engineer, you will be a key member of this team. Your expertise in data engineering will be essential in shaping the future of experimentation in the AI era. You will have the opportunity to collaborate with the researchers behind ChatGPT and help them train new models to deliver to users.

What You Will Do

  • Design, build, and manage data pipelines to power analyses, safety systems, and business decisions
  • Develop canonical datasets to track key product metrics, including user growth, engagement, and revenue
  • Work collaboratively with various teams to understand their data needs and provide solutions
  • Implement robust and fault-tolerant systems for data ingestion and processing
  • Participate in data architecture and engineering decisions, bringing your strong experience and knowledge to bear
  • Ensure the security, integrity, and compliance of data according to industry and company standards
  • Collaborate with researchers to train new models and deliver them to users
  • Design and implement data pipelines to support the growth of the company's products and services
  • Develop and maintain data quality checks to ensure the accuracy and reliability of the data

What We Are Looking For

  • 3+ years of experience as a data engineer and 8+ years of software engineering experience (including data engineering)
  • Proficiency in at least one programming language commonly used in data engineering, such as Python, Scala, or Java
  • Experience with distributed processing technologies and frameworks, such as Hadoop, Flink, and distributed storage systems (e.g., HDFS, S3)
  • Expertise with ETL schedulers, such as Airflow, Dagster, or Prefect
  • Solid understanding of Spark and the ability to write, debug, and optimize Spark code
  • Experience with data architecture and engineering decisions
  • Strong knowledge of data security, integrity, and compliance
  • Ability to work collaboratively with various teams and communicate technical concepts effectively

Nice to Have

  • Experience with cloud-based data platforms, such as AWS or GCP
  • Knowledge of machine learning algorithms and their applications
  • Familiarity with containerization using Docker
  • Experience with agile development methodologies

Benefits and Perks

  • Competitive salary and benefits package
  • Opportunity to work with a cutting-edge AI research and deployment company
  • Collaborative and dynamic work environment
  • Professional development opportunities
  • Flexible work arrangements, including remote work options
  • Access to the latest technologies and tools
  • Comprehensive health insurance and other benefits
  • Generous paid time off and holiday schedule

How to Stand Out

  • Develop a strong understanding of data engineering concepts, including data pipelines, distributed processing, and data architecture.
  • Familiarize yourself with OpenAI's products and services, including ChatGPT, to understand the company's mission and vision.
  • Highlight your experience with ETL schedulers, Spark, and other relevant technologies in your resume and cover letter.
  • Prepare to discuss your approach to data security, integrity, and compliance, as these are critical aspects of the role.
  • Be prepared to collaborate with various teams and communicate technical concepts effectively during the interview process.
  • Research the company culture and values to ensure you are a good fit for the organization.
  • Consider creating a portfolio of your work, including examples of data pipelines and architectures you have designed and implemented, to demonstrate your skills and experience.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.