About The Role
Engineering at Moniepoint is an inspired, customer-focused community dedicated to crafting solutions that redefine our industry. Our infrastructure runs on some of the cool tools that excite infrastructure engineers: Kubernetes, Docker, and more.
We also make business decisions based on the large stream of data we receive daily, so we work with big data, perform data analytics, and build models to make sense of the noise and give our customers the best experience.
We are seeking an experienced Cloud Engineer to design, implement, and manage our multi-cloud infrastructure.
The ideal candidate will have deep expertise in cloud platforms, container orchestration, infrastructure automation, CI/CD pipelines, and observability solutions, ensuring scalable, reliable, and cost-effective cloud operations across multiple cloud providers.
Principal Duties And Responsibilities
Cloud Infrastructure Management:
Design, deploy, and manage multi-cloud infrastructure across Google Cloud Platform (GCP), Amazon Web Services (AWS), Azure, and Oracle Cloud Infrastructure (OCI)
Architect and implement highly available, fault-tolerant, and scalable cloud solutions
Manage cloud resources including compute instances and networking components
Design and implement disaster recovery and business continuity plans for cloud workloads
Migrate on-premises applications and services to cloud environments with minimal disruption
Optimize cloud resource utilization and implement auto-scaling policies
Maintain comprehensive documentation of cloud architectures, configurations, and runbooks
Kubernetes & Container Orchestration:
Design, deploy, and manage production-grade Kubernetes clusters across multiple cloud providers
Implement and maintain container orchestration strategies for microservices architectures
Configure and manage Kubernetes resource objects
Manage Kubernetes cluster upgrades, scaling, and performance optimization
Troubleshoot complex container and orchestration issues in production environments
Implement multi-cluster and multi-region Kubernetes deployments for high availability
Service Mesh & Advanced Networking:
Design, deploy, and manage Istio service mesh for microservices communication and observability
Configure Istio traffic management, including virtual services, destination rules, and gateways
Implement advanced traffic routing (canary deployments, A/B testing, traffic splitting) using Istio
Deploy and manage Istio observability components (telemetry, distributed tracing, service graphs)
Implement circuit breaking, retries, timeouts, and fault injection for resilience testing
Configure Istio ingress and egress gateways for external traffic management
Monitor and optimize service mesh performance and resource utilization
Implement multi-cluster service mesh architectures across different cloud providers
Reverse Proxy & Load Balancing:
Deploy, configure, and manage HAProxy for high-performance load balancing and reverse proxying
Implement HAProxy ACLs, backend routing, health checks, and session persistence
Design and implement Nginx as a reverse proxy for web applications and API gateways
Configure Nginx for rate limiting and request filtering
Implement Nginx load balancing algorithms and upstream health monitoring
Manage Nginx Plus features for advanced traffic management and monitoring
Optimize HAProxy and Nginx performance for high-throughput environments
Infrastructure as Code & Configuration Management:
Develop and maintain infrastructure as code using Terraform
Create reusable, modular Terraform configurations for various cloud resources
Implement Terraform state management and remote backends
Design and implement configuration management solutions using Ansible
Develop Ansible playbooks and roles for automated server provisioning and configuration
Integrate Terraform and Ansible workflows for end-to-end infrastructure automation
Implement infrastructure version control, code review processes, and GitOps practices
Manage infrastructure drift detection and remediation
Create and maintain infrastructure documentation and architecture diagrams
Implement policy-as-code using tools like OPA (Open Policy Agent) or Sentinel
CI/CD Pipeline Management:
Design, implement, and maintain continuous integration pipelines using Jenkins and Harness
Optimize build times and pipeline efficiency
Integrate security scanning (SAST, DAST, container scanning) into CI/CD pipelines
Configure Jenkins jobs, pipelines, and shared libraries for automated builds
Configure build agents, runners, and execution environments
Implement Harness deployment pipelines for cloud-native applications
Integrate CI/CD pipelines with version control systems (Git, GitHub, GitLab)
Implement continuous deployment workflows using ArgoCD for Kubernetes-based applications
Design and implement GitOps workflows with ArgoCD for declarative application delivery
Manage ArgoCD application definitions, sync policies and multi-cluster deployments
Implement progressive delivery strategies (blue-green deployments, canary releases) using ArgoCD
Message Streaming & Event-Driven Architecture:
Deploy and manage Apache Kafka clusters for real-time data streaming and event-driven architectures
Configure Kafka topics, partitions, replication factors, and retention policies
Implement Kafka Connect for data integration with various sources and sinks
Monitor Kafka cluster health, performance metrics, and consumer lag
Optimize Kafka performance for high-throughput and low-latency use cases
Troubleshoot Kafka producer and consumer issues
Database & Proxy Management:
Deploy, configure, and manage ProxySQL for MySQL load balancing and high availability
Implement query routing, caching, and connection pooling strategies using ProxySQL
Optimize database performance through ProxySQL query analysis and optimization
Implement database failover and disaster recovery using ProxySQL
Monitor ProxySQL metrics and troubleshoot connection and performance issues
Integrate ProxySQL with database clusters and replication topologies
Implement database access security and audit logging through ProxySQL
Cloud Networking:
Design and implement cloud networking architectures, including VPCs, subnets, and network segmentation
Configure and manage cloud load balancers (Application Load Balancers, Network Load Balancers, Cloud Load Balancing)
Implement VPN connections, Direct Connect/Interconnect, and hybrid cloud networking solutions
Implement network security controls, including security groups, network ACLs, and firewall rules
Implement network monitoring and traffic analysis
Troubleshoot complex networking issues across multi-cloud environments
Design and implement private connectivity between cloud providers
Secrets Management & Security:
Configure and manage HashiCorp Vault for centralized secrets management across multi-cloud environments
Configure Vault secret engines (KV, database, PKI, AWS, GCP, Azure dynamic secrets)
Manage Vault high availability clusters and disaster recovery procedures
Implement dynamic database credentials and secret rotation strategies
Manage Vault encryption as a service for application-level encryption
Implement Vault agent and sidecar injectors for Kubernetes workloads
Migrate secrets from legacy systems to Vault
Qualifications, Competency & Skills Required
Education & Experience:
Bachelor's degree or diploma in Computer Science, Information Technology, Engineering, or related field
Minimum of 5 years of proven experience in cloud engineering, DevOps, or platform engineering roles
Hands-on experience managing production workloads across multiple cloud platforms
Relevant cloud and technology certifications are highly desirable
Technical Skills
Cloud Platforms (Required):
Google Cloud Platform (GCP): Deep expertise in Compute Engine, GKE, Cloud Storage, Cloud SQL, VPC, Cloud Functions, Cloud Run, IAM
Amazon Web Services (AWS): Proficiency in EC2, EKS, S3, RDS, VPC, Lambda, ECS, CloudFormation, IAM
Microsoft Azure: Experience with Virtual Machines, AKS, Blob Storage, Azure SQL, Virtual Networks, Azure Functions, ARM templates
Oracle Cloud Infrastructure (OCI): Familiarity with Compute, OKE, Object Storage, networking, and OCI-specific services
Multi-cloud architecture design and implementation experience
Cloud migration strategies and execution (lift-and-shift, re-platforming, re-architecting)
Container & Orchestration (Required):
Expert-level Kubernetes knowledge, including cluster architecture, networking, storage, and security
Hands-on experience with managed Kubernetes services (GKE, EKS, AKS)
Proficiency in Docker containerization, image optimization, and registry management
Experience with Helm charts for application packaging and deployment
Knowledge of container runtime environments (containerd, CRI-O)
Service Mesh & Microservices (Required):
Istio: Deep expertise in Istio architecture, deployment, and operations
Istio traffic management (virtual services, destination rules, gateways, service entries)
Istio security features (mTLS, authorization policies, peer authentication, request authentication)
Istio observability and telemetry configuration
Multi-cluster and multi-mesh deployments
Service mesh troubleshooting and performance optimization
Understanding of sidecar proxy patterns and Envoy proxy
Experience with other service mesh solutions (Linkerd, Consul Connect) is a plus
Reverse Proxy & Load Balancing (Required):
HAProxy: Advanced configuration and management for load balancing and high availability
Nginx: Expert-level configuration as a reverse proxy and API gateway
Nginx rate limiting and performance tuning
Nginx load balancing algorithms and upstream configurations
Experience with Nginx modules and custom configurations
High availability configurations using keepalived, VRRP, or similar
Integration with Kubernetes ingress controllers (Nginx Ingress, Istio Ingress)
Infrastructure As Code (Required):
Advanced Terraform skills for multi-cloud infrastructure provisioning
Terraform module development, state management, and workspace strategies
Proficiency in Ansible for configuration management and automation
Ansible playbook development, roles, and inventory management
Experience with version control systems (Git) and GitOps workflows
Infrastructure testing frameworks (Terratest, Kitchen-Terraform)
CI/CD Tools (Required):
Jenkins: Pipeline development (declarative and scripted), shared libraries, plugin management
Harness: Deployment pipeline configuration, workflow creation, approval gates
ArgoCD: GitOps workflows, application synchronization, multi-cluster management
Integration of CI/CD tools with Kubernetes and cloud platforms
Automated testing and deployment strategies
Artifact repository management (Nexus, Artifactory, cloud-native registries)
Messaging & Streaming (Required):
Apache Kafka architecture, cluster management, and operations
Kafka topic design, partitioning strategies, and performance tuning
Kafka Connect experience
Experience with Kafka management tools (Kafka Manager, Cruise Control)
Understanding of event-driven architectures and patterns
Database & Proxy Technologies (Required):
ProxySQL configuration, management, and optimization
MySQL database administration basics
Understanding of database replication and clustering
Observability & Monitoring (Required):
Prometheus metrics collection, PromQL, and alerting rules
Grafana dashboard design and visualization techniques
Log aggregation and analysis
Networking (Required):
Deep understanding of TCP/IP, DNS, HTTP/HTTPS, and network protocols
Cloud networking concepts (VPC, subnets, routing tables, NAT, VPN)
Load balancing strategies and implementations
Service discovery and DNS-based routing
Network security and firewall configuration
Software-defined networking (SDN) concepts
Scripting & Programming:
Proficient in scripting languages: Python, Bash
Go or Python programming basics for tooling development
YAML and JSON for configuration management
Understanding of software development best practices
Secrets Management (Required):
HashiCorp Vault: Advanced knowledge of Vault architecture, deployment, and operations
Vault authentication methods and integration with cloud providers and Kubernetes
Vault secret engines (KV v1/v2, database, transit, cloud dynamic...
What To Expect In The Hiring Process:
A technical interview with the Hiring Manager
A behavioural and technical interview with a member of the Executive team