Key Purpose
A Junior Data Scientist who will help develop and integrate natural language processing (NLP) capabilities and large language models (LLMs) to enhance AI frameworks. These capabilities will enable autonomous decision-making on unstructured data, increase operational efficiency, and deliver personalized customer experiences.
Areas of responsibility may include, but are not limited to:
Project Contribution: Collaborate with content specialists, data engineers, and system architects to contribute to the deployment of solutions and the delivery of projects from inception through to business adoption.
Model Development and Integration: Deploy robust and scalable LLMs that are integrated into AI frameworks, enhancing natural language processing capabilities.
Advanced Data Retrieval Integration: Incorporate techniques such as Retrieval-Augmented Generation (RAG), vector databases, and semantic search to improve the precision and relevance of data extraction from large datasets.
Automation Workflows: Design and implement automation workflows that improve the speed and accuracy of processes, reducing manual intervention and operational costs.
Performance Metrics and System Maintenance: Develop performance metrics to consistently evaluate the efficiency and accuracy of AI models. Actively monitor and maintain these systems to ensure they remain effective and adaptable.
Regulatory Compliance: Ensure that all AI models and data handling practices comply with relevant laws and ethical guidelines, preparing documentation and reports as required for regulatory bodies.
Experimentation: Prototype ML systems and AI concepts, particularly those using NLP and LLMs, and evaluate the effects of different models and techniques on AI performance.
Future Trends and Industry Insights: Continuously monitor advancements in AI and LLM technologies and review relevant academic literature and industry releases to ensure our strategies and implementations align with the latest innovations and standards.
Knowledge and Skills
SQL and working with databases.
Python for data science and machine learning.
Competent with TensorFlow, PyTorch, NLP and LLM packages - Advantageous
Familiarity with Azure services - Advantageous
Familiarity with Databricks - Advantageous
Education and Experience
Education:
Matric (Essential)
Bachelor's degree in Computer Science, Mathematics, Applied Mathematics, Statistics, Data Science, Actuarial Science, Operations Research, Industrial Engineering, or a similar quantitative field.
Honours or Master's degree in a relevant field - Advantageous
Minimum Experience:
1-2 years' experience in a data science environment.
Demonstrated ability to implement ML workflows at scale, particularly using LLMs - Advantageous
Experience in handling, analysing, and extracting insights from large and complex datasets, particularly unstructured text data - Advantageous
Previous experience tuning open-source and proprietary large language models - Advantageous