Monde Nkuna Logo

Real-time Electricity Consumption Data Pipeline

An end-to-end real-time data streaming pipeline for electricity consumption in Gauteng using Python, Kafka, Docker, and Google Cloud services.

Technologies Used

Python
Kafka
Docker
Google Cloud
PostgreSQL

Key Features

  • • Real-time Data Streaming
  • • Data Validation
  • • Cloud Integration
  • • Scalable Architecture

Cloud Services

  • • Google Cloud SQL
  • • Compute Engine
  • • Cloud Storage
  • • Cloud Studio

Project Overview

This project demonstrates an end-to-end real-time data streaming pipeline for electricity consumption in Gauteng. It simulates smart meter data generation, performs real-time validation, and persists clean data from SQLite to a managed SQL database, ready for analytics. The pipeline is designed to be scalable, fault-tolerant, and production-ready.

Key Components

  • Producer: Simulates smart meter data generation (datagen-producer.py)
  • Kafka Broker: Manages real-time message streaming
  • Processor: Validates and filters incoming data (streamer.py)
  • Transporter: Writes valid data to database (google_cloud_writer.py)
  • Database: Stores processed data in Google Cloud SQL
  • Analytics: Query and visualize using Cloud Studio

Implementation Details

1. System Architecture

System Architecture Diagram

The architecture diagram above illustrates the end-to-end flow of our real-time electricity consumption data pipeline, from data ingestion to visualization.

2. Pipeline Flow

  • • Data Generator creates synthetic consumption data (city, sector, voltage, power usage)
  • • Kafka Broker manages topics for raw and processed data
  • • Streamer filters valid data (voltage 200–250V, power ≤ 5000 kWh)
  • • Writer persists clean data to Google Cloud SQL
  • • Analytics layer enables querying and visualization

3. Development Journey

  • Local Development: Built and tested Python scripts with SQLite
  • Docker Integration: Containerized all components for consistency
  • Cloud Migration: Deployed to Google Cloud with ARM64 VM
  • System Configuration: Set up systemd for service persistence

4. Data Schema

Table: gp_electric_cons
Field           Type
id             INT
timestamp      TIMESTAMP
city           TEXT
sector         TEXT
power_cons_in_kwh FLOAT
voltage        INT

Challenges & Solutions

  • Docker Compatibility: Customized Dockerfiles for ARM64 architecture
  • Cloud VM Setup: Implemented systemd for persistent services
  • PostgreSQL Access: Configured SSL and IAM bindings with retry logic
  • Network Configuration: Set up VPC for secure communication

Future Enhancements

  • • Integrate monitoring with Prometheus + Grafana
  • • Support batch inserts or BigQuery sink
  • • Expand sectors and cities for richer simulation
  • • Add authentication and access control for Cloud SQL

Interested in Similar Projects?

I'm always open to discussing new projects, creative ideas, or opportunities to be part of your vision.

Get in Touch