HOSEA KOSGEI
Network Engineer

Network Engineer & Intelligent Systems Developer building solutions for real-world problems.


© —— Hosea Kosgei. All rights reserved.

Data Engineering · In Development

Enterprise Data Warehouse

A scalable, cloud-native data warehouse solution built to consolidate siloed business data sources into a single source of truth, with automated ETL pipelines and executive dashboards refreshed on every scheduled pipeline run.

Year: 2025
Timeline: Ongoing
Team: Solo Developer
Type: Personal Project



Project Overview

A cloud-native data warehouse project built to demonstrate end-to-end data engineering skills — from raw ingestion to business intelligence. The system consolidates multiple siloed data sources into a single source of truth using modern open-source tooling, enabling reliable analytics and executive reporting without expensive proprietary platforms.
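Consolidating sources into a single source of truth can be sketched as follows. Two hypothetical exports, a CSV spreadsheet and a JSON SaaS dump, are mapped onto shared column names and loaded into one table. Python's built-in `sqlite3` is used here only as a stand-in for DuckDB so the sketch runs with the standard library. The source data, column names, and `customers` table are illustrative assumptions, not the project's actual schema.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw exports from two disconnected systems.
CRM_CSV = """customer_id,name,country
1,Acme Ltd,KE
2,Globex,UG
"""

BILLING_JSON = """[
  {"id": 2, "customer_name": "Globex", "country_code": "UG"},
  {"id": 3, "customer_name": "Initech", "country_code": "TZ"}
]"""

Row = tuple[int, str, str, str]  # (customer_id, name, country, source)


def extract_crm(text: str) -> list[Row]:
    """Parse the CSV export and tag each row with its source system."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(r["customer_id"]), r["name"], r["country"], "crm") for r in reader]


def extract_billing(text: str) -> list[Row]:
    """Parse the JSON export, renaming its fields to the shared column names."""
    return [(r["id"], r["customer_name"], r["country_code"], "billing") for r in json.loads(text)]


def load(conn: sqlite3.Connection, rows: list[Row]) -> None:
    """Upsert rows into one unified table keyed on customer_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, source TEXT)"
    )
    # INSERT OR REPLACE: a later load overwrites an earlier one for the same id.
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a DuckDB database file
load(conn, extract_crm(CRM_CSV))
load(conn, extract_billing(BILLING_JSON))
print(conn.execute("SELECT * FROM customers ORDER BY customer_id").fetchall())
```

With `INSERT OR REPLACE`, the last source loaded wins whenever IDs collide. A real warehouse would usually make that precedence rule explicit instead of relying on load order.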

The Challenge

Most organisations sit on mountains of data spread across disconnected systems — spreadsheets, transactional databases, SaaS exports — with no unified layer for analysis. Building a production-grade warehouse that is affordable, reproducible, and maintainable by a single engineer is a non-trivial challenge that this project directly addresses.

The Solution

The warehouse is built on a modern, largely open-source stack:

  • Apache Airflow orchestrates scheduled ETL pipelines from multiple source systems
  • DuckDB serves as the analytical engine: fast, local-first, and free
  • dbt handles data transformation, testing, and documentation
  • AWS S3 stores raw and processed data as a cost-effective data lake layer
  • Apache Superset powers interactive dashboards and executive reporting
  • Python scripts handle custom extraction and loading logic
  • GitHub Actions automates testing and deployment of dbt models
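Airflow runs a pipeline as a directed acyclic graph (DAG) of tasks, and a task starts only after its upstream tasks have succeeded. The sketch below reproduces that dependency ordering with Python's standard-library `graphlib`. It is not Airflow code. The task names are hypothetical stand-ins for the extraction, S3 loading, dbt, and dashboard steps listed above.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_crm": set(),
    "extract_billing": set(),
    "load_raw_to_s3": {"extract_crm", "extract_billing"},
    "dbt_run": {"load_raw_to_s3"},
    "dbt_test": {"dbt_run"},
    "refresh_dashboards": {"dbt_test"},
}


def run_order(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves: every task in a wave could run in parallel,
    because all of its upstream tasks finished in an earlier wave."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking its dependants
    return waves


for i, wave in enumerate(run_order(PIPELINE), start=1):
    print(f"wave {i}: {', '.join(wave)}")
```

In an actual Airflow DAG the same edges would usually be declared between task objects with the `>>` operator. The scheduler then adds timing, retries, and backfills on top of this ordering.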

Results & Impact

  • Consolidates multiple data sources into a single analytical layer
  • Fully automated ETL pipelines with scheduled Airflow DAGs
  • dbt models with built-in data quality tests
  • Interactive dashboards surfacing business KPIs, refreshed on every scheduled pipeline run
  • Open-source core stack (Airflow, DuckDB, dbt, Superset), with managed services limited to AWS S3 storage and GitHub Actions CI, keeping vendor lock-in low
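dbt's generic `not_null` and `unique` tests compile to SQL that selects the offending rows, and a test passes when that query returns nothing. The sketch below imitates the idea with plain SQL against an in-memory `sqlite3` database, which again stands in for DuckDB. The `orders` table and its columns are hypothetical, and this is not dbt's own implementation.

```python
import sqlite3

# Table and column names are interpolated into SQL below; they are trusted,
# hard-coded identifiers here and must never come from user input.


def not_null_failures(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Count rows where the column is NULL (the not_null check)."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def unique_failures(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Count non-NULL values that appear more than once (the unique check)."""
    sql = (
        f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1)"
    )
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 11), (2, 12), (3, None)],  # one duplicated order_id, one NULL customer_id
)

checks = {
    "unique(order_id)": unique_failures(conn, "orders", "order_id"),
    "not_null(customer_id)": not_null_failures(conn, "orders", "customer_id"),
}
for name, failures in checks.items():
    status = "PASS" if failures == 0 else f"FAIL ({failures} offending)"
    print(f"{name}: {status}")
```

Running checks like these in CI, for example on GitHub Actions, blocks a model change whenever a test returns failing rows.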

Project Gallery

  • Analytics Dashboard
  • ETL Pipeline Overview
  • Data Model Diagram

Project Info

Status: In Development

Technologies Used

Orchestration: Apache Airflow, Python
Transformation: dbt, SQL
Storage & Compute: DuckDB, AWS S3
Visualisation: Apache Superset
DevOps: Docker, GitHub Actions
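Raw and processed zones in an S3 data lake are often laid out with Hive-style `key=value` prefixes, so that query engines can skip whole date partitions. The helper below only builds object keys. The bucket name, zone names, and partition scheme are assumptions for illustration. Uploading would be a separate step, for example with boto3's `upload_file`.

```python
from datetime import date
from pathlib import PurePosixPath

BUCKET = "example-edw-lake"  # hypothetical bucket name
ZONES = {"raw", "processed"}  # hypothetical lake zones


def lake_key(zone: str, source: str, run_date: date, filename: str) -> str:
    """Build a Hive-style partitioned object key, for example
    raw/source=crm/ingest_date=2025-01-31/customers.csv"""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    # PurePosixPath joins with "/" regardless of the host operating system.
    path = PurePosixPath(zone, f"source={source}", f"ingest_date={run_date.isoformat()}", filename)
    return str(path)


def s3_uri(key: str, bucket: str = BUCKET) -> str:
    """Turn an object key into an s3:// URI."""
    return f"s3://{bucket}/{key}"


key = lake_key("raw", "crm", date(2025, 1, 31), "customers.csv")
print(s3_uri(key))
```

Keeping the source system and ingestion date in the key makes each ETL run's output easy to locate, reprocess, or delete.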

Interested in Working Together?

I'm actively looking for internship opportunities and open to collaborating on projects in networking, intelligent systems, and data engineering.
