Staff Software Engineer

Twilio · Remote (Ireland)
Software Development

WFA Digital Insight

As demand for real-time data processing grows, companies like Twilio are looking for skilled software engineers to build and optimize their event-aggregation services. With the shift to remote work, demand for professionals with expertise in Java, Kafka Streams, and ClickHouse has reportedly grown by 25% in the last year. Twilio stands out for its remote-first approach and strong culture of connection, making it an attractive option for those seeking a collaborative, dynamic work environment. Before applying, candidates should be prepared to demonstrate their skill in designing resilient, high-performance systems and their experience with DevSecOps practices.

Job Description

About the Role

As a Staff Software Engineer at Twilio, you will play a key role in hardening, optimizing, and scaling the real-time event-aggregation services that power the Observability Insights/Analytics platform. This is a critical component of Twilio's infrastructure, and your work will have a direct impact on the company's ability to provide personalized customer experiences. You will be part of a vibrant team with diverse experiences, working together to make a global impact.

The role entails designing, building, and maintaining high-performance Java microservices with Spring Boot, capable of ingesting over 250K events per second with p99 latencies under 200ms. You will work closely with the engineering team to implement stateful stream-processing pipelines, optimize ClickHouse schemas, and embed OpenTelemetry instrumentation.
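For candidates less familiar with the terminology, a p99 latency target means that 99% of events must complete within the stated bound. The sketch below computes a nearest-rank p99 from a batch of recorded latencies; the class name, sample data, and 200ms threshold here are illustrative, not part of the role.

```java
import java.util.Arrays;

// Illustrative sketch: evaluating a p99 latency target against recorded
// per-event latencies. Names and numbers are hypothetical examples,
// not taken from the job description.
public class LatencyCheck {
    // Nearest-rank p99: sort the samples and take the value at index
    // ceil(0.99 * n) - 1.
    static long p99(long[] samplesMs) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(0.99 * sorted.length) - 1;
        return sorted[rank];
    }

    public static void main(String[] args) {
        long[] samplesMs = new long[100];
        for (int i = 0; i < 100; i++) samplesMs[i] = i + 1; // 1..100 ms
        long p = p99(samplesMs);
        // prints: p99 = 99 ms, within 200 ms target: true
        System.out.println("p99 = " + p + " ms, within 200 ms target: " + (p < 200));
    }
}
```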

Twilio values diverse experiences from all kinds of industries, and this role is an opportunity to bring your skills and expertise to a dynamic and innovative company.

What You Will Do

  • Design, build, and maintain high-performance Java microservices using Spring Boot, capable of ingesting over 250K events per second with p99 latencies under 200ms
  • Implement stateful stream-processing pipelines (Kafka Streams / Apache Flink) with idempotent replays, exactly-once semantics, and schema-evolution tooling
  • Optimize ClickHouse schemas, partitioning, and materialized views to support multi-region, sub-second queries for Early Warning System (EWS) detectors
  • Embed OpenTelemetry instrumentation and ship comprehensive metrics/traces/logs to Datadog and Grafana with SLI/SLO dashboards
  • Champion DevSecOps best practices including Terraform automation, CI/CD pipelines, Kubernetes orchestration, AWS infrastructure (EKS, MSK, S3), and compliance guardrails (HIPAA, SOX, GDPR)
  • Leverage best-in-class development-productivity practices, including AI-powered tooling, to accelerate delivery and improve code quality
  • Mentor junior engineers and participate in rigorous code/design reviews to elevate team standards and foster knowledge sharing
  • Collaborate with the engineering team to design resilient, high-performance distributed systems
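The "idempotent replays" bullet above can be sketched without any Kafka dependency: each event carries a unique ID, and an event redelivered after a restart or rebalance must be applied at most once. This is a minimal, hypothetical illustration; a production pipeline would typically rely on Kafka Streams' exactly-once processing guarantees and a persistent state store rather than an in-memory set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of idempotent event aggregation under replays.
// Hypothetical example; not Twilio's implementation.
public class IdempotentAggregator {
    private final Set<String> seenEventIds = new HashSet<>();
    private final Map<String, Long> countsByKey = new HashMap<>();

    // Returns true if the event was applied, false if it was a duplicate.
    public boolean apply(String eventId, String key) {
        if (!seenEventIds.add(eventId)) {
            return false; // already processed: replay is a no-op
        }
        countsByKey.merge(key, 1L, Long::sum);
        return true;
    }

    public long count(String key) {
        return countsByKey.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        IdempotentAggregator agg = new IdempotentAggregator();
        agg.apply("evt-1", "sms.sent");
        agg.apply("evt-2", "sms.sent");
        agg.apply("evt-1", "sms.sent"); // replayed event: ignored
        System.out.println(agg.count("sms.sent")); // prints 2
    }
}
```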

What We Are Looking For

  • 8+ years of professional Java development experience with mastery of high-performance and low-latency design patterns
  • Production experience with Kafka Streams, Flink, or comparable stream-processing frameworks for building real-time data pipelines
  • Hands-on ClickHouse (or columnar database) performance tuning and SQL optimization experience
  • Experience with DevSecOps practices, including Terraform automation, CI/CD pipelines, and Kubernetes orchestration
  • Strong understanding of AWS infrastructure (EKS, MSK, S3) and compliance guardrails (HIPAA, SOX, GDPR)
  • Excellent coding skills, with the ability to write clean, efficient, and well-documented code
  • Strong communication and collaboration skills, with the ability to work effectively with remote teams

Nice to Have

  • Experience with AI-powered tooling for development productivity
  • Knowledge of OpenTelemetry instrumentation and shipping comprehensive metrics, traces, and logs
  • Experience with Datadog and Grafana for monitoring and visualization
  • Familiarity with Terraform automation and infrastructure as code

Benefits and Perks

  • Competitive salary and benefits package
  • Opportunity to work with a dynamic and innovative company
  • Collaborative and supportive team environment
  • Professional development opportunities, including training and conference sponsorships
  • Flexible working hours and remote work options
  • Access to the latest tools and technologies
  • Comprehensive health insurance and wellness programs
  • Generous paid time off and holidays
  • Stock options and equity participation
  • Remote stipend and home office setup support

How to Stand Out

  • Make sure to highlight your experience with Java, Kafka Streams, and ClickHouse in your resume and cover letter.
  • Be prepared to discuss your approach to designing resilient, high-performance systems and your experience with DevSecOps practices.
  • Showcase your ability to work effectively in a remote team environment, along with your strong communication and collaboration skills.
  • Having a strong understanding of AWS infrastructure and compliance guardrails is a plus, so be sure to review these topics before applying.
  • Prepare to talk about your experience with OpenTelemetry instrumentation and comprehensive metrics/traces/logs, as well as your knowledge of Datadog and Grafana.
  • Don't forget to ask about the company culture, team dynamics, and opportunities for professional growth during the interview.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere. Browse more remote jobs across all categories.