Job Description
The Data Engineer will assist with strategic planning and oversee implementation of the Sponsor's cloud-based data environment. This role supports analysts through the provision of large datasets, methodologies, and data visualizations to address pressing intelligence questions.
Key responsibilities include:
- Developing and maintaining a cloud-based data environment to transport and store data, run extract, transform, and load (ETL) processes, and disseminate data solutions
- Mapping data sources and implementing access controls
- Developing code, data models, and documentation to Sponsor standards
- Providing systems administration and programming support for ETL processes and data infrastructure efforts
- Training team members on issues and technologies related to the Sponsor's ETL process, on-premises high-capacity compute cluster, and administrative duties
- Coordinating with external data and platform providers to ensure smooth functioning of systems and data flows
- Supporting the acquisition of new datasets or data management technologies
- Facilitating cross-domain transfer and integration of data
Mandatory Skills
- Demonstrated experience serving as a technical liaison between system engineers, data engineers, data scientists, analysts, and non-technical managers and personnel
- Demonstrated experience with AWS cloud services, including long-term storage options and cloud-based database services such as Databricks or Elastic MapReduce (EMR)
- Demonstrated experience with SQL database structures and mapping between SQL databases
- Demonstrated experience in large-scale data migration efforts
- Demonstrated experience with database architecture, performance design methodologies, and system-tuning recommendations (preference for familiarity with Glue, Hive, and Iceberg or similar)
- Demonstrated experience with Python, Bash, and Terraform
- Demonstrated experience with DevSecOps solutions and tools
- Demonstrated experience implementing CI/CD pipelines using industry-standard processes
Desired Skills
- Demonstrated experience with the Sponsor's data environment and on-premises compute structure
- Demonstrated experience with Data Quality and Data Governance concepts
- Demonstrated experience maintaining, supporting, and improving the ETL process through the implementation and standardization of data flows with Apache NiFi and other ETL tools
- Demonstrated experience with Apache Spark