Moringa-wk9: Apache Kafka and Streamlit exploration

Wednesday 26-April Project Brief: Data Streaming with Kafka

Background: Telecommunications Mobile Money Data Engineering with Kafka

In this project, you will work with telecommunications mobile money data to build a Kafka data engineering solution. You will be provided with a dummy JSON file containing sample data that you will use to test your solution.

The project aims to build a Kafka pipeline that can receive real-time data from telecommunications mobile money transactions and process it for analysis. The pipeline should be designed to handle high volumes of data and ensure that the data is processed efficiently.

To complete this project, you will need to follow these steps:
1. Set up a Kafka cluster: You must set up a Kafka cluster that can handle high volumes of data. You can use either a cloud-based or on-premises Kafka cluster.
2. Develop a Kafka producer: You must develop a Kafka producer that can ingest data from telecommunications mobile money transactions and send it to the Kafka cluster. The producer should be designed to handle high volumes of data and ensure that the data is sent to the Kafka cluster efficiently.
3. Develop a Kafka consumer: You must develop a Kafka consumer to receive data from the Kafka cluster and process it for analysis. The consumer should be designed to handle high volumes of data and ensure that the data is processed efficiently.
4. Process the data: Once you have set up the Kafka pipeline, you must process the data for analysis. This may involve cleaning and aggregating the data, performing calculations, and creating visualizations.
5. Test the solution: You must test your solution using the provided dummy JSON file. The file contains sample data that you can use to ensure that your Kafka pipeline is working correctly.
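Step 4 leaves the processing logic open. As one illustration, the sketch below cleans a batch of transactions shaped like the sample record in this brief. It drops incomplete records, non-positive amounts, and repeated transaction IDs, which can appear because Kafka consumers may see a message more than once. It then aggregates the number of transactions and the total amount sent per sender. The cleaning rules and the names `clean` and `totals_by_sender` are assumptions for illustration, not part of the brief.

```python
from collections import defaultdict

# Fields present in the sample transaction record.
REQUIRED_FIELDS = (
    "transaction_id",
    "sender_phone_number",
    "receiver_phone_number",
    "transaction_amount",
    "transaction_time",
)


def clean(transactions):
    """Keep complete, positive-amount records, dropping repeated transaction IDs.

    Assumes transaction_amount is numeric, as in the sample record.
    """
    seen_ids = set()
    cleaned = []
    for t in transactions:
        if any(t.get(field) is None for field in REQUIRED_FIELDS):
            continue  # incomplete record
        if t["transaction_amount"] <= 0 or t["transaction_id"] in seen_ids:
            continue  # invalid amount or duplicate delivery
        seen_ids.add(t["transaction_id"])
        cleaned.append(t)
    return cleaned


def totals_by_sender(transactions):
    """Return {sender: {"count": n, "total_amount": total}} over the cleaned records."""
    summary = defaultdict(lambda: {"count": 0, "total_amount": 0})
    for t in clean(transactions):
        entry = summary[t["sender_phone_number"]]
        entry["count"] += 1
        entry["total_amount"] += t["transaction_amount"]
    return dict(summary)
```

In a streaming setup, the same functions could be applied to each batch of records read by the consumer.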

Here’s the dummy JSON file that represents our mobile money data.
    {
        "transaction_id": "12345",
        "sender_phone_number": "256777123456",
        "receiver_phone_number": "256772987654",
        "transaction_amount": 100000,
        "transaction_time": "2023-04-19 12:00:00"
    }
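A consumer will usually want each JSON message as a typed record rather than a raw dict. The following is a minimal sketch using only the Python standard library. It assumes every message has exactly the five fields shown above and a `YYYY-MM-DD HH:MM:SS` timestamp. The `Transaction` class and `parse_transaction` helper are hypothetical names. Phone numbers are kept as strings so that country codes and leading digits survive.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Timestamp layout used in the sample record, e.g. "2023-04-19 12:00:00".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    sender_phone_number: str
    receiver_phone_number: str
    transaction_amount: int
    transaction_time: datetime


def parse_transaction(raw: str) -> Transaction:
    """Parse one JSON-encoded transaction; raises KeyError or ValueError on bad input."""
    data = json.loads(raw)
    return Transaction(
        transaction_id=str(data["transaction_id"]),
        sender_phone_number=str(data["sender_phone_number"]),
        receiver_phone_number=str(data["receiver_phone_number"]),
        transaction_amount=int(data["transaction_amount"]),
        transaction_time=datetime.strptime(data["transaction_time"], TIME_FORMAT),
    )
```

Applied to the sample record, `parse_transaction` gives an amount of 100000 and a `transaction_time` of 19 April 2023, 12:00:00.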

Steps to set up the pipeline

1- Go to https://confluent.cloud/ and set up a Kafka cluster and a topic.
2- Get the connection details for your cluster instance: the bootstrap server URL, plus an API key and secret (used as the SASL username and password).
3- In the attached .py file, find the code section containing the entries below and update them with the connection details generated for your own Confluent cluster instance.

    bootstrap_servers = '#YOUR_URL#.confluent.cloud:9092'
    security_protocol = 'SASL_SSL'
    sasl_mechanism = 'PLAIN'
    sasl_plain_username = '#YOUR_USERNAME#'
    sasl_plain_password = '#YOUR_PASSWORD#'
    topic = 'my_pipeline'

4- Run the .py file to start the streaming pipeline.
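The setting names in step 3 (`bootstrap_servers`, `security_protocol`, `sasl_mechanism`, `sasl_plain_username`, `sasl_plain_password`) match the keyword arguments of the kafka-python client's `KafkaProducer` and `KafkaConsumer`. The sketch below therefore assumes that library (`pip install kafka-python`). It is not the attached .py file; the function names and the consumer group ID are hypothetical. Messages are JSON-encoded and keyed by `transaction_id`.

```python
import json

# Connection settings from step 3; replace the placeholders with your cluster's values.
KAFKA_SETTINGS = {
    "bootstrap_servers": "#YOUR_URL#.confluent.cloud:9092",
    "security_protocol": "SASL_SSL",
    "sasl_mechanism": "PLAIN",
    "sasl_plain_username": "#YOUR_USERNAME#",
    "sasl_plain_password": "#YOUR_PASSWORD#",
}
TOPIC = "my_pipeline"


def serialize(record: dict) -> bytes:
    """Encode a transaction dict as UTF-8 JSON bytes for the message value."""
    return json.dumps(record).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Decode a message value back into a transaction dict."""
    return json.loads(payload.decode("utf-8"))


def produce(records):
    """Send each transaction dict to TOPIC, keyed by its transaction_id."""
    # Imported here so serialize/deserialize can be used without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(value_serializer=serialize, **KAFKA_SETTINGS)
    for record in records:
        # With the default partitioner, records sharing a key go to the same partition.
        producer.send(TOPIC, key=record["transaction_id"].encode("utf-8"), value=record)
    producer.flush()  # wait until every buffered message has been sent
    producer.close()


def consume(handle):
    """Read transactions from TOPIC indefinitely, passing each decoded dict to handle()."""
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        TOPIC,
        group_id="mobile-money-analytics",  # hypothetical consumer group name
        auto_offset_reset="earliest",  # start from the oldest message if the group has no offset
        value_deserializer=deserialize,
        **KAFKA_SETTINGS,
    )
    for message in consumer:  # blocks, yielding records as they arrive
        handle(message.value)
```

Once the placeholders hold real credentials, calling `consume(print)` in one terminal and `produce([...])` with the sample record in another should show the record arriving.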

========================================================================================

Thursday 27-April Project Brief: Visualizing streaming data with Streamlit

Introduction

In this project, you will create a real-time data visualization dashboard using Streamlit to analyze streaming data from Reddit to identify fraud in telecommunications. The project will involve connecting to Reddit's API, collecting real-time posts, processing the posts to extract useful information, and visualizing the data using Streamlit.

Problem Statement

Fraud in telecommunications is a significant problem that costs the industry billions of dollars annually. Fraudsters use various techniques to exploit telecom infrastructure weaknesses, including hacking into phone systems, stealing identities, and exploiting vulnerabilities in billing systems. The challenge for telecom companies is to detect and prevent fraud in real-time before it causes significant financial damage.

Your task is to develop a real-time data visualization dashboard that monitors Reddit for mentions of telecoms fraud and other related keywords, such as "telecoms scam", "phone fraud", "billing fraud", and "identity theft". You will extract useful information from the posts, such as the post text, user name, subreddit, and date/time, and use this information to analyze the data for patterns and trends related to telecom fraud.
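The keyword monitoring and field extraction described above can be sketched as follows. This assumes the PRAW library for the Reddit API, where `reddit.subreddit(...).stream.submissions()` yields new posts whose `title`, `selftext`, `author`, `subreddit`, and `created_utc` attributes carry the fields we need. Matching here is a simple case-insensitive substring search over the keywords named in the problem statement. `FraudMention`, `extract_mention`, and `stream_mentions` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

# Keywords named in the problem statement.
KEYWORDS = ("telecoms fraud", "telecoms scam", "phone fraud", "billing fraud", "identity theft")


@dataclass
class FraudMention:
    text: str
    username: str
    subreddit: str
    created: datetime
    keywords: list[str]


def matched_keywords(text: str) -> list[str]:
    """Return the keywords found in text, using a case-insensitive substring match."""
    lowered = text.lower()
    return [kw for kw in KEYWORDS if kw in lowered]


def extract_mention(title: str, selftext: str, username: str, subreddit: str,
                    created_utc: float) -> FraudMention | None:
    """Build a FraudMention from one post's fields, or return None if no keyword matches."""
    text = f"{title}\n{selftext}".strip()
    hits = matched_keywords(text)
    if not hits:
        return None
    return FraudMention(
        text=text,
        username=username,
        subreddit=subreddit,
        created=datetime.fromtimestamp(created_utc, tz=timezone.utc),
        keywords=hits,
    )


def stream_mentions(reddit, subreddits: str = "all"):
    """Yield FraudMention records for new PRAW submissions as they are posted."""
    for post in reddit.subreddit(subreddits).stream.submissions(skip_existing=True):
        mention = extract_mention(
            post.title,
            post.selftext,
            str(post.author) if post.author else "[deleted]",  # author is None when deleted
            str(post.subreddit),
            post.created_utc,
        )
        if mention is not None:
            yield mention
```

Here `reddit` would be a `praw.Reddit(client_id=..., client_secret=..., user_agent=...)` instance created with your own API credentials.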

Project Requirements

● Connect to Reddit's API and collect real-time posts related to telecom fraud and other related keywords.
● Process the posts to extract useful information, including the post text, user name, subreddit, and date/time.
● Analyze the data to identify patterns and trends related to telecom fraud and other related keywords.
● Use Streamlit to create an interactive data visualization dashboard that displays real-time information about telecom fraud and other related keywords.
● The dashboard should include at least one chart or graph that displays the data meaningfully, e.g., a bar chart showing the number of fraud mentions by subreddit or a line chart showing the frequency of fraud mentions over time.
● The dashboard should be easy to use and visually appealing, with clear and concise labels and instructions.
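For the example charts in the requirements, the collected posts only need to be counted per subreddit and per time bucket. The sketch below assumes each mention is a dict with a `subreddit` string and a `created` datetime, buckets by hour, and draws a bar chart and a line chart with Streamlit's `st.bar_chart` and `st.line_chart` (pandas is assumed to be installed alongside Streamlit). The function names, titles, and hourly granularity are illustrative choices.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime


def mentions_by_subreddit(mentions: list[dict]) -> dict[str, int]:
    """Count mentions per subreddit, most frequent first."""
    return dict(Counter(m["subreddit"] for m in mentions).most_common())


def mentions_per_hour(mentions: list[dict]) -> dict[datetime, int]:
    """Count mentions per hour, keyed by the start of each hour, oldest first."""
    hours = Counter(
        m["created"].replace(minute=0, second=0, microsecond=0) for m in mentions
    )
    return dict(sorted(hours.items()))


def render_dashboard(mentions: list[dict]) -> None:
    """Draw the dashboard; intended to be called from a script run with `streamlit run`."""
    # Imported here so the counting helpers above work without these packages installed.
    import pandas as pd
    import streamlit as st

    st.title("Telecom fraud mentions on Reddit")
    st.caption("Posts matching fraud keywords such as 'phone fraud' and 'billing fraud'.")
    if not mentions:
        st.info("No matching posts collected yet.")
        return

    st.subheader("Mentions by subreddit")
    st.bar_chart(pd.Series(mentions_by_subreddit(mentions)).to_frame("mentions"))

    st.subheader("Mentions per hour")
    st.line_chart(pd.Series(mentions_per_hour(mentions)).to_frame("mentions"))
```

In streamlit_app.py, calling `render_dashboard(mentions)` at the top level of the script and launching it with `streamlit run streamlit_app.py` would draw both charts.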

Deliverables

● Python script to collect and process real-time posts from Reddit API.
● Interactive data visualization dashboard created using Streamlit.
● Deployment of the dashboard to a cloud-based platform.

Steps to access the dashboard

The application code is in the file streamlit_app.py.
The Python packages that need to be installed to run the dashboard are listed in requirements.txt.
The dashboard is accessible at https://joekibz-moringa-wk9-streamlit-app-sywmbo.streamlit.app/
