Data Engineer
Role Overview
We are looking for a Data Engineer to take ownership of our data infrastructure and pipelines. You will collect data from diverse sources, build and maintain ETL processes, and ensure data quality and consistency across our systems. This role is ideal for a problem-solver who enjoys turning messy, unstructured data into clean, reliable datasets.
Key Responsibilities
Data Collection & Extraction
- Design, develop, and maintain web scrapers to extract data from websites
- Handle complex navigation, pagination, and authenticated sessions
- Integrate with external APIs to fetch and sync data
- Download and process files from various sources
- Monitor data sources for changes and updates
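To give a flavor of the scraping work above, here is a minimal sketch of parsing one listing page and finding the next-page link. The HTML structure, CSS selectors, and the `parse_listing` helper are illustrative assumptions, not a real target site:

```python
from bs4 import BeautifulSoup

def parse_listing(html):
    """Extract item titles and the next-page link from one listing page.
    Selectors below assume a hypothetical page layout."""
    soup = BeautifulSoup(html, "html.parser")
    items = [a.get_text(strip=True) for a in soup.select("li.item a")]
    next_link = soup.select_one("a.next")
    return items, next_link["href"] if next_link else None

# Hypothetical page markup for demonstration
page = """
<ul>
  <li class="item"><a href="/d/1">Dataset A</a></li>
  <li class="item"><a href="/d/2">Dataset B</a></li>
</ul>
<a class="next" href="/page/2">Next</a>
"""
items, next_url = parse_listing(page)
```

In practice this loop would be wrapped with request retries, rate limiting, and session handling for authenticated pages.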
ETL Pipeline Development
- Build and maintain ETL/ELT pipelines to move data between systems
- Clean, normalize, and transform raw data into structured formats
- Handle data format conversions and schema mappings
- Implement incremental and full data loads
- Schedule and automate data workflows
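A typical transform step in these pipelines looks something like the sketch below: normalize headers, coerce numeric strings, and deduplicate. The column names and sample values are made up for illustration:

```python
import pandas as pd

# Messy raw extract (hypothetical): padded headers, formatted numbers, duplicate rows
raw = pd.DataFrame({
    "Region ": ["North", "North", "south"],
    "Sales": ["1,200", "1,200", "850"],
})

def transform(df):
    """Clean headers, normalize casing, coerce types, drop duplicates."""
    df = df.rename(columns=lambda c: c.strip().lower())
    df["region"] = df["region"].str.title()
    df["sales"] = df["sales"].str.replace(",", "").astype(int)
    return df.drop_duplicates().reset_index(drop=True)

clean = transform(raw)
```

Real pipelines add schema validation and incremental-load logic around a core like this.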
Data Quality & Validation
- Implement validation rules to detect inconsistencies and anomalies
- Build checks for missing fields, duplicates, and data type errors
- Create data quality metrics and monitoring
- Set up logging and alerts for pipeline failures
- Document data issues and implement fixes
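The validation checks above can be sketched as a small report function; the required fields and sample records are assumptions for the example:

```python
import pandas as pd

def validate(df, required=("id", "name")):
    """Return a dict of data-quality issue counts:
    missing required fields, duplicate ids, non-numeric ids."""
    issues = {}
    for col in required:
        issues[f"missing_{col}"] = int(df[col].isna().sum())
    issues["duplicate_ids"] = int(df["id"].duplicated().sum())
    # Values that fail numeric coercion but were not already missing
    coerced = pd.to_numeric(df["id"], errors="coerce")
    issues["non_numeric_id"] = int(coerced.isna().sum() - df["id"].isna().sum())
    return issues

# Hypothetical records with one of each defect
records = pd.DataFrame({
    "id": ["1", "2", "2", "x", None],
    "name": ["a", None, "c", "d", "e"],
})
report = validate(records)
```

A report like this would typically feed the quality metrics and alerting mentioned above.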
Data Governance & Documentation
- Maintain documentation for data sources, schemas, and pipelines
- Define and enforce data standards and naming conventions
- Track data lineage and transformation logic
- Ensure data consistency across multiple systems
Tooling & Automation
- Develop Python scripts and tools for data processing
- Automate repetitive data tasks
- Build internal tools for data monitoring and diagnostics
- Improve existing processes for efficiency and reliability
Technical Requirements
Required Skills
Experience
- 2+ years of experience in Data Engineering, ETL development, or a related field
Programming
- Strong proficiency in Python for data processing and automation
- Experience with data libraries (Pandas, NumPy, or similar)
- Web scraping skills (BeautifulSoup, Scrapy, Selenium, or similar)
Data Processing
- Experience building ETL/ELT pipelines
- Ability to work with various file formats (CSV, Excel, JSON, XML)
- Proficiency in data cleaning and transformation techniques
- Experience handling large datasets efficiently
Soft Skills
- Problem-solving mindset for handling messy, inconsistent data
- Strong attention to detail
- Ability to work independently and take ownership
- Good communication skills for collaborating with team members
- Self-motivated and proactive in identifying improvements
Nice-to-Have Skills
- Experience with workflow orchestration tools (Airflow, Prefect) or cron-based scheduling
- Experience with cloud platforms (AWS, GCP, or Azure)
- Knowledge of data warehousing concepts
- Experience handling multilingual data
- Background in statistical or governmental data sources
- Knowledge of data visualization tools
What You’ll Work On
- Building scrapers to collect data from various websites and sources
- Creating automated pipelines that run on schedule
- Transforming raw data into clean, normalized formats
- Ensuring data quality and consistency across all datasets
- Developing tools to monitor and diagnose data issues
- Collaborating with the team to meet data requirements
How to Apply
Please submit:
- Your resume/CV
- A brief description of a relevant data engineering or scraping project you’ve worked on
- (Optional) Links to GitHub or portfolio