Basic job summary:
The Senior Data Engineer will be responsible for designing, building, and maintaining data infrastructure and pipelines, including implementing data ingestion frameworks.
Duties & Responsibilities:
Data Pipeline Design and Implementation
Design, implement, and maintain robust data ingestion and processing pipelines for heterogeneous data sources, including soil, weather, agronomic, geospatial, and related contextual datasets.
Develop scalable ETL/ELT workflows to transform raw data into structured, validated, and analytics-ready formats.
Ensure pipelines support both batch and, where required, near-real-time data processing.
Implement data versioning and lineage tracking to support reproducibility and auditability.
Cloud-Based Data Infrastructure
Design and manage cloud-native data architectures, including data lakes, data warehouses, and analytical storage solutions.
Optimize data storage and processing for performance, cost efficiency, and scalability.
Support deployment of data pipelines across development, testing, and pilot environments.
Collaborate with platform teams to ensure infrastructure aligns with DPI principles and interoperability standards.
Data Quality, Governance, and Reliability
Implement automated data quality checks, validation rules, and monitoring to ensure accuracy, completeness, and consistency.
Support enforcement of data governance requirements, including access controls, permissions, and audit logging.
Work with policy and governance partners to ensure technical implementations align with data protection and consent frameworks.
Proactively identify and remediate data reliability risks or bottlenecks.
Enablement of AI and LLM-Based Systems
Prepare and serve data in formats optimized for AI and LLM-based advisory systems, including retrieval-augmented generation (RAG) pipelines and structured knowledge services.
Support model evaluation, benchmarking, and experimentation workflows.
MLOps Support and Operational Readiness
Contribute to MLOps workflows by supporting data versioning, pipeline automation, and integration with model deployment and evaluation processes.
Implement monitoring and logging for data pipelines to support observability and issue diagnosis.
Support reproducible experimentation through consistent data environments and pipeline automation.
Documentation, Collaboration, and Delivery
Produce clear technical documentation covering data architectures, pipeline logic, and operational procedures.
Minimum Academic Qualifications:
Bachelor's degree in Computer Science, Software Engineering, Information Systems, or a closely related technical field.
Experience:
Applicants should possess at least 5 years of professional experience in data engineering, with demonstrated responsibility for designing and operating complex data pipelines and data platforms.
Strong experience designing and implementing data ingestion, transformation, and processing pipelines (ETL/ELT) for large and heterogeneous datasets.
Proficiency in Python and SQL, and experience with data processing frameworks and tools commonly used in modern data engineering environments.