This project builds a data warehouse for storing and analyzing data about Ethiopian medical businesses, scraped from public Telegram channels. It includes pipelines for data scraping, data cleaning, object detection with YOLO, and serving the collected data through FastAPI. The system is designed to be scalable and reliable.
- Extract data from Telegram channels using `telethon` and custom Python scripts.
- Target channels include:
  - DoctorsET
  - Chemed Telegram Channel
  - Yetena Weg
  - EAHCI
  - Additional channels from Telegram Stats.
- Collect and store images for object detection.
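The scraping steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names, the `TELEGRAM_API_ID` / `TELEGRAM_API_HASH` environment variables, the session name, and the `images` directory are all assumptions, and the exact channel usernames would need to be confirmed.

```python
import os
from pathlib import Path


def message_to_record(channel, message, image_path=None):
    """Flatten a Telethon message into a plain dict ready for storage."""
    return {
        "channel": channel,
        "message_id": message.id,
        "date": message.date.isoformat() if message.date else None,
        "text": message.text or "",
        "image_path": str(image_path) if image_path else None,
    }


async def scrape_channel(channel, limit=100, image_dir="images"):
    """Fetch recent messages from one public channel and download its photos."""
    # Imported lazily so message_to_record stays usable without Telethon installed.
    from telethon import TelegramClient

    # Assumed environment variable names; the credentials come from my.telegram.org.
    api_id = int(os.environ["TELEGRAM_API_ID"])
    api_hash = os.environ["TELEGRAM_API_HASH"]

    Path(image_dir).mkdir(parents=True, exist_ok=True)
    records = []
    # "scraper_session" is a hypothetical session file name.
    async with TelegramClient("scraper_session", api_id, api_hash) as client:
        async for message in client.iter_messages(channel, limit=limit):
            image_path = None
            if message.photo:
                # download_media returns the saved file path, or None on failure.
                target = Path(image_dir) / f"{channel}_{message.id}.jpg"
                image_path = await message.download_media(file=str(target))
            records.append(message_to_record(channel, message, image_path))
    return records


# Example call (requires valid credentials and network access):
#   import asyncio
#   rows = asyncio.run(scrape_channel("DoctorsET", limit=50))
```

Keeping `message_to_record` separate from the network code makes the record layout easy to check without contacting Telegram.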
- Perform cleaning operations:
  - Remove duplicates.
  - Handle missing values.
  - Standardize formats.
- Transform data using DBT (Data Build Tool) for SQL-based processing.
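The three cleaning operations listed above could look something like the sketch below. It uses only the standard library and assumes each scraped row is a dict with `channel`, `message_id`, `date`, and `text` keys; that record layout, and the choice to drop rows lacking an identifier, are illustrative assumptions rather than the project's documented behavior.

```python
from datetime import datetime, timezone


def clean_records(records):
    """Deduplicate rows, fill missing values, and standardize formats."""
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec.get("channel"), rec.get("message_id"))
        # Remove duplicates: keep the first row per (channel, message_id).
        # Rows without an identifier are dropped (an assumption of this sketch).
        if None in key or key in seen:
            continue
        seen.add(key)

        # Handle missing values: a missing text becomes an empty string.
        text = rec.get("text") or ""
        # Standardize formats: collapse runs of whitespace into single spaces.
        text = " ".join(text.split())

        # Standardize formats: store dates as UTC ISO 8601 strings.
        date = rec.get("date")
        if isinstance(date, str):
            date = datetime.fromisoformat(date)
        if date is not None:
            if date.tzinfo is None:
                # Naive timestamps are assumed to already be in UTC.
                date = date.replace(tzinfo=timezone.utc)
            date = date.astimezone(timezone.utc).isoformat()

        cleaned.append({
            "channel": key[0],
            "message_id": key[1],
            "date": date,
            "text": text,
        })
    return cleaned
```

The cleaned rows can then be loaded into the warehouse, where the DBT models take over the SQL-based transformation.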
- Detect objects in images from Telegram channels using YOLO.
- Process detection results for bounding boxes, confidence scores, and class labels.
- Store extracted insights in the database.
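As a rough illustration of the detection step, the sketch below loads a pretrained YOLOv5 model through `torch.hub` and reduces its output to bounding boxes, confidence scores, and class labels. The function names, the `yolov5s` weights, and the 0.25 confidence threshold are assumptions; the project may use different weights or a local checkout of the repository.

```python
def rows_to_detections(image_path, rows, min_confidence=0.25):
    """Keep bounding box, confidence, and class label for confident detections."""
    detections = []
    for row in rows:
        if row["confidence"] < min_confidence:
            continue
        detections.append({
            "image_path": image_path,
            "class_label": row["name"],
            "confidence": round(float(row["confidence"]), 4),
            # Pixel coordinates in (xmin, ymin, xmax, ymax) order.
            "bbox": [float(row[k]) for k in ("xmin", "ymin", "xmax", "ymax")],
        })
    return detections


def load_model(weights="yolov5s"):
    """Load a pretrained YOLOv5 model; 'yolov5s' is an assumed default."""
    # Imported lazily because torch is a heavy dependency.
    import torch

    return torch.hub.load("ultralytics/yolov5", weights, pretrained=True)


def detect_objects(model, image_path, min_confidence=0.25):
    """Run the model on one image and return storable detection records."""
    results = model(image_path)
    # results.pandas().xyxy[0] is a DataFrame with the columns
    # xmin, ymin, xmax, ymax, confidence, class, name.
    rows = results.pandas().xyxy[0].to_dict(orient="records")
    return rows_to_detections(image_path, rows, min_confidence)


# Example call (downloads weights on first use):
#   model = load_model()
#   detections = detect_objects(model, "images/DoctorsET_7.jpg")
```

Each returned dict maps directly onto one row of a detections table, which keeps the database insert step simple.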
- Centralized storage for cleaned and enriched data.
- Facilitate advanced analytics to identify trends, patterns, and insights.
- RESTful API endpoints for CRUD operations.
- Integrate with SQLAlchemy for database management.
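A minimal sketch of what the CRUD endpoints might look like is shown below. The `/detections` routes, the `create_app` factory, and the `InMemoryStore` class are hypothetical; the in-memory store is only a stand-in for a SQLAlchemy-backed repository so that the example runs without a database.

```python
class InMemoryStore:
    """Stand-in for a SQLAlchemy-backed repository (assumption of this sketch)."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def create(self, data):
        row = {"id": self._next_id, **data}
        self._rows[self._next_id] = row
        self._next_id += 1
        return row

    def get(self, item_id):
        return self._rows.get(item_id)

    def update(self, item_id, data):
        if item_id not in self._rows:
            return None
        self._rows[item_id] = {**self._rows[item_id], **data, "id": item_id}
        return self._rows[item_id]

    def delete(self, item_id):
        return self._rows.pop(item_id, None) is not None


def create_app(store):
    """Build a FastAPI app exposing CRUD routes over the given store."""
    # Imported lazily so the store can be used without FastAPI installed.
    from fastapi import FastAPI, HTTPException

    app = FastAPI()

    @app.post("/detections", status_code=201)
    def create_detection(payload: dict):
        return store.create(payload)

    @app.get("/detections/{item_id}")
    def read_detection(item_id: int):
        row = store.get(item_id)
        if row is None:
            raise HTTPException(status_code=404, detail="Detection not found")
        return row

    @app.put("/detections/{item_id}")
    def update_detection(item_id: int, payload: dict):
        row = store.update(item_id, payload)
        if row is None:
            raise HTTPException(status_code=404, detail="Detection not found")
        return row

    @app.delete("/detections/{item_id}")
    def delete_detection(item_id: int):
        if not store.delete(item_id):
            raise HTTPException(status_code=404, detail="Detection not found")
        return {"deleted": item_id}

    return app


# To serve it with Uvicorn, a module would expose the app at import time:
#   app = create_app(InMemoryStore())
```

Passing the store into the factory keeps the routes independent of the storage layer, so the in-memory class can later be swapped for one that wraps a SQLAlchemy session.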
- Languages: Python
- Libraries and Frameworks:
  - Data Scraping: `telethon`
  - Data Transformation: `DBT`, `SQLAlchemy`
  - Object Detection: YOLO (`PyTorch`, `OpenCV`)
  - API Development: `FastAPI`, `Uvicorn`
- Database: PostgreSQL (or similar relational database)
- Logging & Monitoring: Custom logging for pipeline tracking.
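The custom logging mentioned in the last item is not described further, so the following is only one possible shape for it: a shared logger plus a context manager that records when each pipeline stage starts, finishes, or fails. The names `get_pipeline_logger` and `track_stage` and the log message format are hypothetical.

```python
import logging
import time
from contextlib import contextmanager


def get_pipeline_logger(name="pipeline", log_file=None):
    """Return a logger that writes to log_file, or to stderr if none is given."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


@contextmanager
def track_stage(logger, stage):
    """Log the start, duration, and any failure of one pipeline stage."""
    start = time.perf_counter()
    logger.info("stage=%s status=started", stage)
    try:
        yield
    except Exception:
        # logger.exception also records the traceback; the error is re-raised.
        logger.exception("stage=%s status=failed", stage)
        raise
    else:
        elapsed = time.perf_counter() - start
        logger.info("stage=%s status=finished seconds=%.2f", stage, elapsed)


# Example usage:
#   logger = get_pipeline_logger(log_file="pipeline.log")
#   with track_stage(logger, "scraping"):
#       ...  # run the scraping step here
```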
This solution provides actionable intelligence about Ethiopian medical businesses by:
- Centralizing fragmented data scraped from Telegram channels.
- Enhancing analysis with object detection.
- Supporting fast, reliable decision-making through structured and queryable data.
- Clone the repository:
git clone https://github.com/your-repo-name.git
cd Kara-Solutions-main
- Install dependencies:
pip install -r requirements.txt
- Configure database settings in `database.py`.
- Execute scripts for data scraping, cleaning, and transformation.
- Clone the YOLO repository:
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt
- Start the server:
uvicorn scripts.main:app --reload
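For the database configuration step in the list above, the contents of `database.py` are not shown, so the sketch below is only a guess at a typical layout. The `DB_*` environment variable names, their defaults, the `medical_warehouse` database name, and both function names are assumptions.

```python
import os
from urllib.parse import quote_plus


def build_database_url(env=None):
    """Assemble a PostgreSQL URL from environment-style settings."""
    env = os.environ if env is None else env
    user = env.get("DB_USER", "postgres")
    # Escape special characters such as "@" or "/" in the password.
    password = quote_plus(env.get("DB_PASSWORD", ""))
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "medical_warehouse")  # hypothetical database name
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{name}"


def make_session_factory(url=None):
    """Create a SQLAlchemy session factory bound to the configured database."""
    # Imported lazily so the URL helper works without SQLAlchemy installed.
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    # pool_pre_ping checks each pooled connection before it is reused.
    engine = create_engine(url or build_database_url(), pool_pre_ping=True)
    return sessionmaker(bind=engine, autoflush=False)


# Example usage (requires a reachable PostgreSQL server and psycopg2):
#   SessionLocal = make_session_factory()
#   with SessionLocal() as session:
#       ...  # run queries here
```

Reading the settings from the environment keeps credentials out of version control, which matters if the repository is public.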
- Data scraping and transformation pipelines.
- Object detection insights from Telegram images.
- Scalable data warehouse with ETL/ELT processes.
- RESTful API for data access and management.
Contributions are welcome! Please fork the repository and submit a pull request for feature suggestions or bug fixes.