
Site Reliability Engineer at Cowrywise

Cowrywise
May 23, 2026
Full-time
On-site
We're looking for a Site Reliability Engineer (SRE) to help build, maintain, and scale the infrastructure powering Cowrywise. You'll work closely with our engineering team to improve reliability, observability, security, and deployment processes across our systems.
Our infrastructure team specializes in four areas: Cloud, Databases, Platform, and Observability. We run primarily on AWS, with some workloads on GCP. For this role, we're particularly interested in someone who can raise the bar on observability, helping us detect issues faster and resolve them with confidence.

What you'll do

In general, members of the infrastructure team do the following:


Design, maintain, and improve cloud infrastructure and internal platforms
Improve system reliability, scalability, and performance across services
Build and maintain CI/CD pipelines and deployment workflows
Implement monitoring, logging, alerting, and observability systems
Respond to incidents, troubleshoot production issues, and lead root cause analysis
Automate operational tasks and infrastructure provisioning
Work with engineering teams to improve service architecture and operational readiness
Improve security posture, access controls, and infrastructure best practices
Manage containerized workloads and orchestration platforms
Maintain disaster recovery, backup, and high availability strategies


What we're looking for

Required


4+ years of experience in an SRE, DevOps, or Platform Engineering role running production systems
Strong hands-on experience with AWS (compute, networking, IAM, storage, managed services)
Deep expertise in observability: designing meaningful metrics, dashboards, alerts, and SLOs that actually catch problems before users do
Hands-on experience with New Relic, Grafana, and Prometheus (or equivalent tooling)
A track record of reducing MTTD and MTTR through better instrumentation, alerting, and incident response practices
Proficiency with Docker and containerized workflows
Solid scripting and automation skills (Python, Bash, Go, or similar)
Experience with infrastructure-as-code (Terraform, Pulumi, or CloudFormation)
Strong Linux fundamentals and networking knowledge
Experience building and maintaining CI/CD pipelines
Comfort leading incident response and writing clear post-mortems


Nice to have


Experience operating Kubernetes in production
Exposure to GCP or multi-cloud environments
Background in one of our specialization areas: Databases (Postgres, MySQL, Redis), Platform engineering, or Cloud architecture
Security-focused experience (IAM hardening, secrets management, compliance frameworks)
Experience in fintech or other regulated, high-availability environments


The people who succeed on this team


People who are proactive and take ownership
Engineers who automate before repeating manual work
People who stay calm and methodical during incidents
Engineers who care about clean systems and operational excellence
Strong collaborators who work well across teams
Curious builders who enjoy learning and improving systems continuously