Lead Data Engineer
Johannesburg, South Africa
Duration: 6 months
Negotiable
Ref: Esme - NS
Starts: ASAP
Opened On: 22/01/2025
Required Skills: data delivery, data architecture, data sources, analytics, Python, algorithms, SQL databases, data retrieval, SSIS, pipelines
Job Description

Job Description Summary

Minimum of 5 years' experience as a data engineer, preferably in the financial sector.

Bachelor’s degree in Information Technology, Computer Science, Software Development, Engineering, or a related field.

Builds and monitors data pipelines covering data retrieval, storage, database design, and the distribution of information assets across the organisation and its cloud solution.

Requirements and Skills:

• 5+ years of experience in data engineering within a production environment

• Advanced knowledge of PySpark, Python and Linux shell scripting

• Experience transforming and loading data using pipelines built in MS SSIS, Databricks and StreamSets

• Experience with SQL/NoSQL databases and Hadoop

• Familiarity with Docker, Kubernetes, and cloud services (Databricks, AWS)

• Bonus: machine learning knowledge

Key accountabilities

• Assemble large, complex data sets that meet functional and non-functional requirements, following big data best practices

• Source data from internal and external data sources, engaging with technical subject matter experts

• Explore, analyse, and profile data from various internal and external data sources, and assist data scientists in preparing data for analytical purposes

• Ensure delivered solutions meet Systems Integration and User Acceptance Testing criteria.

• Productionise solutions and ensure daily data refresh processes run successfully

Data Architecture & Data Engineering

• Understand the technical landscape and bank-wide architecture that is connected to or dependent on the business area supported, in order to effectively design & deliver data solutions (architecture, pipelines, etc.)

• Translate and interpret the data architecture direction and associated business requirements, and apply analytical & creative problem-solving to synthesise data solution designs (building a solution from its components) that go beyond analysis of the problem

• Participate in design thinking processes to successfully deliver data solution blueprints

• Leverage state-of-the-art relational and NoSQL databases, as well as integration and streaming platforms, to deliver sustainable, business-specific data solutions

• Design data retrieval, storage & distribution solutions (or components thereof), contributing to all phases of the development lifecycle, e.g. the design process

• Develop high-quality data processing, retrieval, storage & distribution designs in a test-driven and domain-driven / cross-domain environment

• Build analytics tools that use the data pipeline, quickly producing well-organised, optimised and documented source code & algorithms to deliver technical data solutions

• Create & maintain sophisticated CI/CD pipelines (authoring and supporting pipelines in Jenkins or similar tools, deploying to multi-site environments, and supporting and managing your applications all the way to production)

• Debug existing source code and polish feature sets.

• Assemble large, complex data sets that meet business requirements & manage the data pipeline

• Build infrastructure to automate the delivery of extremely high volumes of data

• Create data tools for analytics and data science teams that help them build and optimise data sets for the benefit of the business

• Ensure designs & solutions support the organisation's technical principles of self-service, repeatability, testability, scalability & resilience

• Apply general design patterns and paradigms to deliver technical solutions

• Inform & support the infrastructure build required for optimal extraction, transformation, and loading of data from a wide variety of data sources

• Support the continuous optimisation, improvement & automation of data processing, retrieval, storage & distribution processes

• Ensure all data solutions are quality-assured and tested in line with the QA Engineering & broader architectural guidelines and standards of the organisation

• Implement & align to the Group Security standards and practices to ensure the indisputable separation, security & quality of the organisation’s data

• Meaningfully contribute to & ensure solutions align with the design & direction of the Group Architecture, in particular its data standards, principles, preferences & practices. Short-term deployments must align with strategic long-term delivery.

• Meaningfully contribute to & ensure solutions align with the design and direction of the Group Infrastructure standards and practices, e.g. OLAs, IaaS, PaaS, SaaS, containerisation, etc.

• Monitor the performance of data solution designs & ensure ongoing optimisation of data solutions

• Stay ahead of the curve on data processing, retrieval, storage & distribution technologies and processes, tracking global best practices & trends

People

• Coach & mentor other engineers

• Conduct peer reviews, testing and problem-solving within and across the broader team

• Build data science team capability in the use of data solutions

Risk & Governance

• Identify and mitigate technical risks (before, during & after deployment)

• Design and update all application documentation in line with the organisation's technical standards and risk / governance frameworks

• Create business cases & solution specifications for various governance processes (e.g. CTO approvals)

• Participate in incident management & DR activity – applying critical thinking, problem solving & technical expertise to get to the bottom of major incidents

• Deliver on time & on budget (always)