Spark Scala Engineer at Pyramid Consulting Europe Ltd, Leeds, 6 Months, £Contract Rate

  • Contract Spy
  • Hybrid (Leeds, United Kingdom)
  • Jul 04, 2024
  • 6 Months or more
  • Information Technology

Contract Description

Job Description:

Job Title: Spark Scala Engineer
Location: Leeds, UK (3 days/week, onsite)
Job Type: 6+ months with the possibility of extension

We are looking for an enthusiastic Spark Scala Engineer who will be responsible for designing, building, and maintaining data pipelines using Apache Spark and Scala.

This includes tasks like:

  • Extracting data from various sources (databases, APIs, files)
  • Transforming and cleaning the data
  • Loading the data into data warehouses or data lakes (e.g. BigQuery, Amazon Redshift)
  • Automating data pipeline execution using scheduling tools (e.g. Airflow)
  • Data analysis and modelling: while the primary focus is data engineering, the role may also call for basic data analysis skills, such as writing analytical queries in SQL or Spark SQL against the processed data and building simple data models to understand data relationships (see the sketch after this list).
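
As a rough illustration only (not taken from the client's codebase), the sketch below shows what such an extract-transform-load job in Spark and Scala might look like; the file path, table name, and column names (orders.csv, amount, region, analytics.orders_summary) are hypothetical.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object OrdersPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-pipeline")
      .getOrCreate()

    // Extract: read raw records from a file source (path is hypothetical)
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("gs://example-bucket/raw/orders.csv")

    // Transform: basic cleaning and derivation (column names are hypothetical)
    val cleaned = raw
      .filter(F.col("amount").isNotNull)
      .withColumn("order_date", F.to_date(F.col("order_ts")))

    val summary = cleaned
      .groupBy("region", "order_date")
      .agg(F.sum("amount").alias("total_amount"))

    // Analyse: the processed data can also be queried with Spark SQL
    cleaned.createOrReplaceTempView("orders")
    spark.sql("SELECT region, COUNT(*) AS n FROM orders GROUP BY region").show()

    // Load: write the result to a warehouse/lake table (target name is hypothetical)
    summary.write.mode("overwrite").saveAsTable("analytics.orders_summary")

    spark.stop()
  }
}
```

In practice a scheduler such as Airflow would typically trigger a job like this (for example via spark-submit) on a fixed cadence rather than it being run by hand.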

Work with Big Data technologies: You'll likely work with various Big Data technologies alongside Spark (a streaming sketch follows this list), including:

  1. Hadoop Distributed File System (HDFS) for storing large datasets
  2. Apache Kafka for real-time data streaming
  3. Apache Hive for data warehousing on top of HDFS
  4. Cloud platforms like AWS, Azure, or GCP for deploying and managing your data pipelines
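
For orientation only, here is a minimal Structured Streaming sketch of the common pattern of consuming events from Kafka with Spark and landing them on HDFS as Parquet; the broker address, topic name, and paths are assumptions, and the spark-sql-kafka connector must be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs")
      .getOrCreate()

    // Read a stream of events from Kafka (broker and topic are hypothetical)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    // Append the raw payloads to HDFS as Parquet files once a minute
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()

    query.awaitTermination()
  }
}
```

A Hive table defined over the same HDFS directory would then make the landed data queryable for downstream warehousing.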

Your benefits:

As the Spark Scala Engineer, you will have the opportunity to work with one of the biggest IT landscapes in the world. You can also look forward to being mentored and guided in your career journey by some of the finest in the business.

Your responsibilities:

As a Spark Scala Engineer in the GDT (Global Data Technology) team, you will be responsible for:

  • Designing, building, and maintaining data pipelines using Apache Spark and Scala
  • Working on enterprise-scale cloud infrastructure and cloud services on GCP.

Mandatory Skills:

  • 8+ years of IT experience designing, building, and maintaining data pipelines.
  • 4+ years of experience designing, building, and maintaining data pipelines using Apache Spark and Scala.
  • Programming languages: Proficiency in Scala and Spark is essential; familiarity with Python and SQL is a plus.
  • Big Data technologies: Understanding of HDFS, Kafka, Hive, and cloud platforms is valuable.
  • Data engineering concepts: Knowledge of data warehousing, data pipelines, data modelling, and data cleansing techniques is crucial.
  • Problem-solving and analytical skills: You should be able to analyse complex data problems, design efficient solutions, and troubleshoot issues.
  • Communication and collaboration: The ability to communicate effectively with data scientists, analysts, and business stakeholders is essential.
  • Readiness to work at least three days per week from the Leeds (UK) office and to accept changes in line with customer policies.
  • Ability to walk through and explain the system designs and file formats you have worked with, and why each tool or technology was chosen.

Good-to-have Skills:

Ideally, you should be familiar with:

  • Machine learning libraries: Familiarity with Spark ML or other machine learning libraries in Scala can be advantageous (a brief sketch follows this list).
  • Cloud computing experience: Experience with cloud platforms like AWS, Azure, or GCP for data pipeline deployment is a plus.
  • DevOps tools: Knowledge of DevOps tools like Git, CI/CD pipelines, and containerization tools (Docker, Kubernetes) can be beneficial.
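
As an indicative sketch only, the snippet below shows the general shape of a simple Spark ML pipeline in Scala (feature assembly plus a classifier); the input path, feature columns, and label column are hypothetical.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object SparkMlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-ml-sketch")
      .getOrCreate()

    // Training data with hypothetical numeric feature columns and a binary label
    val training = spark.read.parquet("hdfs:///data/features")

    // Assemble the raw columns into a single feature vector
    val assembler = new VectorAssembler()
      .setInputCols(Array("tenure", "monthly_spend"))
      .setOutputCol("features")

    val lr = new LogisticRegression()
      .setLabelCol("label")
      .setFeaturesCol("features")

    // Chain feature assembly and the classifier into one pipeline and fit it
    val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)

    model.transform(training).select("label", "prediction").show()
    spark.stop()
  }
}
```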