Description:
As a Site Reliability Engineer at Qlik, you’ll sit at the heart of our cloud ecosystem, helping power the reliability, security, and scalability of Qlik and Talend Cloud services used around the world.
This is your opportunity to work on systems operating at serious scale — supporting millions of transactions across a global cloud environment — while shaping how reliability engineering is done across the business.
You won’t just “keep the lights on.” You’ll design, improve, automate, and elevate how modern cloud platforms perform. If you’re motivated by complex distributed systems, Kubernetes at scale, and solving meaningful engineering challenges, this is where you’ll thrive.
What makes this role interesting?
This is a role for engineers who love depth, autonomy, and impact.
You’ll:
- Solve real scale challenges – Work on reliability and performance across a global cloud platform handling millions of transactions.
- Engineer, not just operate – Build tooling, automation, alerts, and scalable infrastructure patterns that prevent problems before they happen.
- Collaborate with highly skilled teams – Partner with Global SRE, Architecture, Platform, and Domain Engineering teams to influence how infrastructure is designed from the ground up.
- Work with modern cloud-native technologies – Kubernetes, IaC, observability tooling, autoscaling, secret management, CI/CD — you’ll be hands-on with today’s most relevant technologies.
- Shape best practices – Help define and champion cloud optimization and reliability standards across the organization.
- Grow your technical influence – Act as a go-to resource for reliability, incident management, cloud engineering, and production operations.
- Continuously evolve – Stay close to emerging tools and practices, contributing to ongoing improvements in our cloud environment.
If you enjoy working on systems that demand thoughtful engineering — not quick fixes — you’ll find this role deeply rewarding.
Here’s How You’ll Be Making An Impact
Your work will directly influence the stability and performance of services relied on by customers worldwide.
You will:
- Increase reliability and availability by implementing resilient infrastructure patterns and performance optimizations.
- Reduce incidents and recovery time through better observability, automation, and proactive engineering.
- Strengthen scalability by designing infrastructure that adapts seamlessly to growth.
- Improve cloud efficiency by driving optimization best practices across AWS and Azure environments.
- Resolve complex system challenges across infrastructure, networking, applications, and distributed systems.
- Support on-call operations by participating in the on-call rotation, including first-line incident response, to maintain the availability and performance of our cloud infrastructure, and by providing regular updates on project status and activities.
- Elevate engineering standards by mentoring peers and embedding reliability-first thinking into development workflows.