Cloud Infrastructure Engineer (Open LMS) Colombia, Remote

LTG · Remote (Colombia)
Software Development

WFA Digital Insight

As remote work continues to redefine the digital landscape, demand for skilled Cloud Infrastructure Engineers has surged, growing 25% in the last year. Companies like LTG are seeking experts who can navigate the complexities of AWS, Terraform, and Puppet. This role stands out for its focus on multi-tenant SaaS hosting platforms and the opportunity to work with a dynamic team. Candidates should be prepared to showcase their proficiency in distributed systems, Linux, and automation. With the right skills and experience, this role can be a launchpad for a successful career in cloud infrastructure engineering.

Job Description

## About the Role

The Cloud Infrastructure Engineer will play a critical role in designing, building, and maintaining LTG's multi-tenant SaaS hosting platform on AWS. This is a hands-on infrastructure role that requires a deep understanding of Linux systems, distributed systems, and automation. The successful candidate will work across the full stack, from Terraform modules and Puppet manifests to Python automation and observability pipelines. The platform's architecture is built around custom orchestration tooling, distributed service discovery, and infrastructure as code. The team is looking for someone who can reason about distributed systems problems from first principles and has experience with Linux systems, particularly Ubuntu. The role offers real ownership and influence over the platform's architecture and direction as the company continues to grow and evolve the infrastructure.

## What You Will Do

- Design, build, and maintain AWS infrastructure using Terraform (EC2, RDS, S3, SQS, Lambda, ALB, ElastiCache, Route 53, VPC networking)
- Write and maintain Puppet modules to configure and manage fleets of EC2 instances across multiple auto-scaling groups
- Maintain and extend Python-based automation and tooling that supports platform operations
- Operate and improve distributed service discovery and configuration management (etcd)
- Manage and tune a multi-tier caching strategy (Varnish, Redis/Valkey, PHP OPcache)
- Run and scale the observability stack (Prometheus, Grafana, Loki, Fluentd, PagerDuty) and participate in on-call rotations
- Evaluate and implement distributed storage solutions as the platform evolves
- Improve deployment workflows and release processes
- Collaborate with internal teams on API contracts, integration patterns, and operational tooling
- Participate in incident response, root cause analysis, and platform reliability improvements

## What We Are Looking For

- Strong experience with AWS services in production, particularly EC2, RDS, S3, SQS, Lambda, ALB, ElastiCache, Route 53, IAM, and VPC networking
- Proficiency in authoring and maintaining Terraform modules for production infrastructure
- Proficiency in authoring and maintaining Puppet modules (or equivalent agent-based configuration management) for fleet management
- Solid Python skills, with experience writing and maintaining production daemons
- Deep Linux systems knowledge (Ubuntu), with comfort in Apache/Nginx, PHP-FPM, Varnish, systemd, filesystem mounts, and networking fundamentals
- Understanding of distributed systems concepts, including consensus, leader election, distributed locking, eventual consistency, and the tradeoffs involved
- Proficiency in building and maintaining observability pipelines (Prometheus, Grafana, Loki, or equivalent) in production
- Clear communication skills, with the ability to document architectural decisions and explain technical tradeoffs to both technical and non-technical stakeholders

## Nice to Have

- Hands-on experience with distributed storage systems, such as Ceph, GlusterFS, JuiceFS, CubeFS, or AWS EFS
- Familiarity with etcd (or similar distributed key-value stores like Consul or ZooKeeper), including watch APIs, TTL-based locking, and cluster operations
- Experience with Varnish and VCL, especially dynamic backend routing or multi-tenant configurations
- Working knowledge of PHP, with understanding of integration scripts that bridge infrastructure and application layers

## Benefits and Perks

- Competitive salary and benefits package
- Opportunity to work with a dynamic team and contribute to the growth and evolution of the infrastructure
- Remote work arrangement, with flexible working hours and a stipend for remote work setup
- Professional development opportunities, with access to training and conferences
- Health insurance and retirement plan
- Generous paid time off and holiday package
- Access to the latest tools and technologies, with a budget for professional development and continuous learning

How to Stand Out

- Highlight your experience with AWS services, Terraform, and Puppet in your resume and cover letter.
- Be prepared to explain your understanding of distributed systems concepts, including consensus, leader election, and distributed locking.
- Showcase your proficiency in Python and Linux systems, with examples of automation scripts and infrastructure management.
- Research LTG's company culture and values, and be prepared to explain why you're a good fit for the team.
- Don't be afraid to ask about the company's approach to remote work, and what support is available for remote employees.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.