Job Title: Data Lake / Hadoop Developer
Location: London
Work model: Hybrid
Key Responsibilities
Design, build, and manage scalable Data Lakes to support large-scale data processing and analytics.
Develop and maintain Big Data solutions using the Hadoop ecosystem (HDFS, Hive, HBase, Spark, Pig, MapReduce, etc.).
Implement data ingestion pipelines and workflows for structured, semi-structured, and unstructured data.
Optimize data processing and storage to ensure high performance and low latency.
Collaborate with data engineers, analysts, and scientists to provide robust and efficient data access solutions.
Monitor and troubleshoot data pipelines and applications to ensure reliability and accuracy.
Implement data security, governance, and compliance practices across the data lake and Hadoop systems.
Stay updated with emerging Big Data technologies and recommend tools or approaches to enhance the data platform.
Required Skills and Qualifications
Proven experience with the Hadoop ecosystem, including HDFS, YARN, Hive, HBase, MapReduce, and Spark.
Expertise in Data Lake architectures and principles.
Proficiency in programming languages such as Python, Java, or Scala for Big Data processing.
Hands-on experience with ETL tools, data ingestion frameworks, and workflow schedulers (e.g., Apache NiFi, Apache Airflow).
Strong knowledge of cloud platforms such as AWS (S3, EMR, Glue), Azure (Data Lake Storage, Synapse), or Google Cloud (BigQuery, Dataflow).
Familiarity with SQL and distributed query engines such as Hive (HiveQL) or Presto.
Understanding of data governance, security, and compliance (e.g., GDPR, HIPAA).
Excellent problem-solving skills and the ability to debug and resolve issues in distributed systems.
Preferred Qualifications
Experience with containerization and orchestration technologies such as Docker and Kubernetes for Big Data deployments.
Knowledge of streaming frameworks like Kafka, Flume, or Spark Streaming.
Hands-on experience in implementing machine learning workflows in a Big Data environment.
Certifications in Big Data technologies or cloud platforms (e.g., AWS Big Data Specialty, Cloudera Certified Professional).
Familiarity with tools like Databricks, Delta Lake, or Snowflake.