Associate Researcher at University of the Witwatersrand

University of the Witwatersrand
April 18, 2026
Full-time
On-site
Position A: Geoscience Associate Researcher (Bushveld Knowledge Curation & Geological QA)


Supervisor: Prof. Rais Latypov
Primary focus: Geological validation, controlled terminology/stratigraphic context, geochemical dataset curation, and quality assurance of extracted knowledge.


Position B: Computational Associate Researcher (OCR, Corpus Engineering, Data Pipelines, RAG Workflow, and LLMs for Knowledge Extraction)


Supervisor: Prof. Glen T. Nwaila
Primary focus: Corpus ingestion, OCR/text normalization, metadata standards, indexing/retrieval workflow, benchmarking, reproducible pipeline implementation, and a front-end retrieval API.


Key Tasks and Responsibilities

Shared responsibilities (both positions):


Contribute to building and maintaining the Bushveld corpus (peer-reviewed papers, theses, technical reports, maps, datasets) with clear provenance.
Implement quality-control routines ensuring accuracy, consistency, and citation fidelity.
Participate in regular review meetings and staged deliverables during the 12-month project.


Position A: Geoscience AR (Bushveld curation & geological QA):


Curate and validate geological content (stratigraphic units, lithologies, marker horizons, terminology).
Curate and structure geochemical datasets including sample metadata and stratigraphic attribution.
Develop and maintain a controlled vocabulary ("Bushveld ontology").
Evaluate RAG outputs and verify that statements are supported by retrieved sources.


Position B: Computational AR (corpus engineering, pipelines & RAG workflow):


Design document processing workflows including OCR, text normalization, segmentation, and context preservation.
Build and maintain metadata standards and reference-management structures.
Implement hybrid retrieval systems (keyword + vector search) with source/page traceability.
Establish benchmarking and QA protocols for retrieval accuracy.
Maintain a modular, reproducible codebase under version control.
Apply LLMs to language processing and knowledge graphs.


Required Qualifications

Position A: Geoscience AR


Degree in Geoscience/Geology (BSc Honours, MSc, or PhD in progress/completed).
Training or interest in igneous petrology, layered intrusions, or economic geology.
Strong attention to detail and ability to work independently.


Position B: Computational AR


Degree in Computer Science, Computer Engineering, GeoData Science, Information Science, Geoinformatics/GIS, or related field.
Proficiency in scientific data management and coding (Python).
Knowledge of open-source LLMs and tools for running them locally, such as Ollama.
Strong organizational and reproducible workflow habits.


Desirable Skills

Position A: Geoscience AR


Familiarity with the Bushveld Igneous Complex.
Experience working with geochemical datasets and data validation.
Reference management experience (Zotero, EndNote, etc.).


Position B: Computational AR


Experience with OCR workflows and metadata schema design.
Familiarity with SQL/databases, Git, and semantic search or RAG systems.
Interest in geoscience problems and willingness to learn Bushveld terminology.