Real-time Electricity Consumption Data Pipeline
An end-to-end real-time data streaming pipeline for electricity consumption in Gauteng using Python, Kafka, Docker, and Google Cloud services.
Technologies Used
Key Features
- • Real-time Data Streaming
- • Data Validation
- • Cloud Integration
- • Scalable Architecture
Cloud Services
- • Google Cloud SQL
- • Compute Engine
- • Cloud Storage
- • Cloud Studio
Project Overview
This project demonstrates an end-to-end real-time data streaming pipeline for electricity consumption in Gauteng. It simulates smart meter data generation, performs real-time validation, and persists clean data from SQLite to a managed SQL database, ready for analytics. The pipeline is designed to be scalable, fault-tolerant, and production-ready.
Key Components
- • Producer: Simulates smart meter data generation (datagen-producer.py)
- • Kafka Broker: Manages real-time message streaming
- • Processor: Validates and filters incoming data (streamer.py)
- • Transporter: Writes valid data to database (google_cloud_writer.py)
- • Database: Stores processed data in Google Cloud SQL
- • Analytics: Query and visualize using Cloud Studio
Implementation Details
1. System Architecture

The architecture diagram above illustrates the end-to-end flow of our real-time electricity consumption data pipeline, from data ingestion to visualization.
2. Pipeline Flow
- • Data Generator creates synthetic consumption data (city, sector, voltage, power usage)
- • Kafka Broker manages topics for raw and processed data
- • Streamer filters valid data (voltage 200–250V, power ≤ 5000 kWh)
- • Writer persists clean data to Google Cloud SQL
- • Analytics layer enables querying and visualization
3. Development Journey
- • Local Development: Built and tested Python scripts with SQLite
- • Docker Integration: Containerized all components for consistency
- • Cloud Migration: Deployed to Google Cloud with ARM64 VM
- • System Configuration: Set up systemd for service persistence
4. Data Schema
Table: gp_electric_cons Field Type id INT timestamp TIMESTAMP city TEXT sector TEXT power_cons_in_kwh FLOAT voltage INT
Challenges & Solutions
- • Docker Compatibility: Customized Dockerfiles for ARM64 architecture
- • Cloud VM Setup: Implemented systemd for persistent services
- • PostgreSQL Access: Configured SSL and IAM bindings with retry logic
- • Network Configuration: Set up VPC for secure communication
Future Enhancements
- • Integrate monitoring with Prometheus + Grafana
- • Support batch inserts or BigQuery sink
- • Expand sectors and cities for richer simulation
- • Add authentication and access control for Cloud SQL
Interested in Similar Projects?
I'm always open to discussing new projects, creative ideas, or opportunities to be part of your vision.
Get in Touch