Key Purpose
An intermediate Data Scientist who will help develop and integrate natural language processing (NLP) and advanced large language models (LLMs), with a particular focus on voice models, voice chatbots, and conversational design. These innovative capabilities will enable unstructured autonomous decision-making processes that involve voice, increase operational efficiency, and deliver personalized, audible customer experiences.
Areas of responsibility may include, but are not limited to:
Project Contribution: Collaborate with content specialists, data engineers, and system architects to contribute to the deployment of solutions and the delivery of projects from inception through to business adoption, with a particular focus on Voice AI.
Model Development and Integration: Deploy robust and scalable LLMs that are integrated into AI frameworks, enhancing natural language processing capabilities.
Advanced Data Retrieval Integration: Incorporate techniques like Retrieval-Augmented Generation (RAG), vector databases, and semantic search to improve precision and relevance in data extraction from large datasets.
Automation Workflows: Design and implement automation workflows that improve speed and accuracy of processes, reducing manual intervention and operational costs.
Performance Metrics and System Maintenance: Develop performance metrics to consistently evaluate the efficiency and accuracy of AI models. Actively monitor and maintain these systems to ensure they remain effective and adaptable.
Regulatory Compliance: Ensure that all AI models and data handling practices comply with relevant laws and ethical guidelines, preparing documentation and reports as required for regulatory bodies.
Experimentation: Prototype ML systems and AI concepts, particularly those using NLP and LLMs, and evaluate the effects of different models and techniques on AI performance.
Future Trends and Industry Insights: Continuously monitor advancements in AI and LLM technologies and review relevant academic literature and industry releases to ensure our strategies and implementations align with the latest innovations and standards.
Knowledge and Skills
SQL and working with databases.
Python for data science and machine learning.
Competent with TensorFlow, PyTorch, NLP and LLM packages - Advantageous
Familiarity with Azure services - Advantageous
Familiarity with Databricks - Advantageous
Education and Experience
Education:
Matric (Essential)
Bachelor's degree in Computer Science, Mathematics, Statistics, Data Science, Actuarial Science, Operations Research, Industrial Engineering, Applied Mathematics, or a similar quantitative field.
Honours or Master's degree in a relevant field - Advantageous
Minimum Experience:
2-4 years' experience in a data science environment.
Demonstrated ability to implement ML workflows at scale, particularly using LLMs.
Experience in handling, analysing, and extracting insights from large and complex datasets, particularly unstructured text data.
Previous experience tuning open-source and proprietary large language models - Advantageous
Previous experience with Voice AI models and conversational design - Advantageous