Data Engineer
Role Overview
We are looking for a Data Engineer to take ownership of our data infrastructure and pipelines. You will collect data from diverse sources, build and maintain ETL processes, and ensure data quality and consistency across our systems. This role is ideal for a problem-solver who enjoys turning messy, unstructured data into clean, reliable datasets.
Key Responsibilities
Data Collection & Extraction
- Design, develop, and maintain web scrapers to extract data from websites
- Handle complex navigation, pagination, and authenticated sessions
- Integrate with external APIs to fetch and sync data
- Download and process files from various sources
- Monitor data sources for changes and updates
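To give a flavor of the scraping work above, here is a minimal sketch of parsing one listing page and finding the next-page link. The HTML structure, CSS selectors, and the `parse_listing` helper are illustrative assumptions, not a real target site:

```python
from bs4 import BeautifulSoup

def parse_listing(html):
    """Extract item titles and the next-page link from one listing page.
    Selectors below assume a hypothetical page layout."""
    soup = BeautifulSoup(html, "html.parser")
    items = [a.get_text(strip=True) for a in soup.select("li.item a")]
    next_link = soup.select_one("a.next")
    return items, next_link["href"] if next_link else None

# Hypothetical page markup for demonstration
page = """
<ul>
  <li class="item"><a href="/d/1">Dataset A</a></li>
  <li class="item"><a href="/d/2">Dataset B</a></li>
</ul>
<a class="next" href="/page/2">Next</a>
"""
items, next_url = parse_listing(page)
```

In practice this loop would be wrapped with request retries, rate limiting, and session handling for authenticated pages.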
ETL Pipeline Development
- Build and maintain ETL/ELT pipelines to move data between systems
- Clean, normalize, and transform raw data into structured formats
- Handle data format conversions and schema mappings
- Implement incremental and full data loads
- Schedule and automate data workflows
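A typical transform step in these pipelines looks something like the sketch below: normalize headers, coerce numeric strings, and deduplicate. The column names and sample values are made up for illustration:

```python
import pandas as pd

# Messy raw extract (hypothetical): padded headers, formatted numbers, duplicate rows
raw = pd.DataFrame({
    "Region ": ["North", "North", "south"],
    "Sales": ["1,200", "1,200", "850"],
})

def transform(df):
    """Clean headers, normalize casing, coerce types, drop duplicates."""
    df = df.rename(columns=lambda c: c.strip().lower())
    df["region"] = df["region"].str.title()
    df["sales"] = df["sales"].str.replace(",", "").astype(int)
    return df.drop_duplicates().reset_index(drop=True)

clean = transform(raw)
```

Real pipelines add schema validation and incremental-load logic around a core like this.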
Data Quality & Validation
- Implement validation rules to detect inconsistencies and anomalies
- Build checks for missing fields, duplicates, and data type errors
- Create data quality metrics and monitoring
- Set up logging and alerts for pipeline failures
- Document data issues and implement fixes
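The validation checks above can be sketched as a small report function; the required fields and sample records are assumptions for the example:

```python
import pandas as pd

def validate(df, required=("id", "name")):
    """Return a dict of data-quality issue counts:
    missing required fields, duplicate ids, non-numeric ids."""
    issues = {}
    for col in required:
        issues[f"missing_{col}"] = int(df[col].isna().sum())
    issues["duplicate_ids"] = int(df["id"].duplicated().sum())
    # Values that fail numeric coercion but were not already missing
    coerced = pd.to_numeric(df["id"], errors="coerce")
    issues["non_numeric_id"] = int(coerced.isna().sum() - df["id"].isna().sum())
    return issues

# Hypothetical records with one of each defect
records = pd.DataFrame({
    "id": ["1", "2", "2", "x", None],
    "name": ["a", None, "c", "d", "e"],
})
report = validate(records)
```

A report like this would typically feed the quality metrics and alerting mentioned above.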
Data Governance & Documentation
- Maintain documentation for data sources, schemas, and pipelines
- Define and enforce data standards and naming conventions
- Track data lineage and transformation logic
- Ensure data consistency across multiple systems
Tooling & Automation
- Develop Python scripts and tools for data processing
- Automate repetitive data tasks
- Build internal tools for data monitoring and diagnostics
- Improve existing processes for efficiency and reliability
Technical Requirements
Required Skills
Experience
- 2+ years of experience in Data Engineering, ETL development, or a related field
Programming
- Strong proficiency in Python for data processing and automation
- Experience with data libraries (Pandas, NumPy, or similar)
- Web scraping skills (BeautifulSoup, Scrapy, Selenium, or similar)
Data Processing
- Experience building ETL/ELT pipelines
- Ability to work with various file formats (CSV, Excel, JSON, XML)
- Proficiency in data cleaning and transformation techniques
- Experience handling large datasets efficiently
Soft Skills
- Problem-solving mindset for handling messy, inconsistent data
- Strong attention to detail
- Ability to work independently and take ownership
- Good communication skills for collaborating with team members
- Self-motivated and proactive in identifying improvements
Nice-to-Have Skills
- Experience with workflow orchestration tools (Airflow, Prefect) or cron-based scheduling
- Experience with cloud platforms (AWS, GCP, or Azure)
- Knowledge of data warehousing concepts
- Experience handling multilingual data
- Background in statistical or governmental data sources
- Knowledge of data visualization tools
What You’ll Work On
- Building scrapers to collect data from various websites and sources
- Creating automated pipelines that run on schedule
- Transforming raw data into clean, normalized formats
- Ensuring data quality and consistency across all datasets
- Developing tools to monitor and diagnose data issues
- Collaborating with the team to meet data requirements
How to Apply
Please submit:
- Your resume/CV
- A brief description of a relevant data engineering or scraping project you’ve worked on
- (Optional) Links to GitHub or portfolio