Position A: Geoscience Associate Researcher (Bushveld Knowledge Curation & Geological QA)
Supervisor: Prof. Rais Latypov
Primary focus: Geological validation, controlled terminology/stratigraphic context, geochemical dataset curation, and quality assurance of extracted knowledge.
Position B: Computational Associate Researcher (OCR, Corpus Engineering, Data Pipelines, RAG Workflows, and LLM-Based Knowledge Extraction)
Supervisor: Prof. Glen T. Nwaila
Primary focus: Corpus ingestion, OCR/text normalization, metadata standards, indexing/retrieval workflow, benchmarking, reproducible pipeline implementation, and a front-end retrieval API.
Key Tasks and Responsibilities
Shared responsibilities (both positions):
Contribute to building and maintaining the Bushveld corpus (peer-reviewed papers, theses, technical reports, maps, datasets) with clear provenance.
Implement quality-control routines ensuring accuracy, consistency, and citation fidelity.
Participate in regular review meetings and staged deliverables during the 12-month project.
Position A: Geoscience AR (Bushveld curation & geological QA):
Curate and validate geological content (stratigraphic units, lithologies, marker horizons, terminology).
Curate and structure geochemical datasets including sample metadata and stratigraphic attribution.
Develop and maintain a controlled vocabulary ("Bushveld ontology").
Evaluate RAG outputs and verify that statements are supported by retrieved sources.
Position B: Computational AR (corpus engineering, pipelines & RAG workflow):
Design document processing workflows including OCR, text normalization, segmentation, and context preservation.
Build and maintain metadata standards and reference-management structures.
Implement hybrid retrieval systems (keyword + vector search) with source/page traceability.
Establish benchmarking and QA protocols for retrieval accuracy.
Maintain a modular and reproducible codebase using version control.
Use LLMs for language processing and knowledge graphs.
Required Qualifications
Position A: Geoscience AR
Degree in Geoscience/Geology (BSc Honours, MSc, or PhD in progress/completed).
Training or interest in igneous petrology, layered intrusions, or economic geology.
Strong attention to detail and ability to work independently.
Position B: Computational AR
Degree in Computer Science, Computer Engineering, GeoData Science, Information Science, Geoinformatics/GIS, or related field.
Competence in scientific data management and coding (Python).
Knowledge of open-source LLMs and the tools used to run them, such as Ollama.
Strong organizational skills and reproducible-workflow habits.
Desirable Skills
Position A: Geoscience AR
Familiarity with the Bushveld Igneous Complex.
Experience working with geochemical datasets and data validation.
Reference management experience (Zotero, EndNote, etc.).
Position B: Computational AR
Experience with OCR workflows and metadata schema design.
Familiarity with SQL/databases, Git, and semantic search or RAG systems.
Interest in geoscience problems and willingness to learn Bushveld terminology.