- Save audio messages from the chat to storage (a DBMS or the hard drive), keyed by the user ID of the account the message is associated with
- Convert all audio messages to .wav format with a 16 kHz sampling rate
- Save only those sent images that contain a face
There are three types of bots: rule-based, AI-powered (using ML algorithms), and hybrid. Hybrid chatbots combine both approaches. This bot is a hybrid: it uses predefined rules for storing and converting audio messages and machine learning for identifying faces in photos.
WhatsApp is the selected channel. Telegram was the other option considered.
- Twilio is a cloud communications platform as a service (CPaaS) company that allows software developers to programmatically make and receive phone calls, send and receive text messages, and perform other communication functions using its web service APIs. With the Twilio API for WhatsApp, you can send notifications, have two-way conversations, or build chatbots.
- Flask is a micro web framework written in Python. It is lightweight and open source, and it offers a small, easily extensible core. It is used primarily to develop minimalistic web applications and REST APIs.
- ngrok creates secure ingress to any app, device, or service. Here it exposes the local Flask server through a public URL so that Twilio can reach it.
- pydub is a Python library that provides easy manipulation of audio files.
- PyTorch offers pre-trained models for various computer vision tasks. One popular detection model is SSD (Single Shot MultiBox Detector), which this project uses for face detection.
python -m venv <env_name>
or
virtualenv <env_name>
pip install -r requirements.txt
pip install twilio flask
Set up a Twilio account
- Sign up for a Twilio account if you don’t have one.
- Obtain your Account SID and Auth Token from the Twilio dashboard and save them in a .env file
- Obtain a temporary telephone number for testing purposes
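As a rough illustration of reading the saved credentials back, the sketch below parses a .env file using only the standard library. The variable names TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN and the helper names are assumptions made for this example, not names taken from the project.

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_twilio_credentials(env):
    """Return (account_sid, auth_token), failing loudly if either is missing."""
    # Assumed variable names; adjust them to match your own .env file.
    required = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("Missing in .env: " + ", ".join(missing))
    return env["TWILIO_ACCOUNT_SID"], env["TWILIO_AUTH_TOKEN"]
```

Failing early on a missing value tends to be easier to debug than a rejected API call later. Keep the .env file out of version control.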
Register and install ngrok
Follow the instructions on the ngrok website.
Create a Flask application that processes media files sent through WhatsApp. It uses the Twilio API to receive the media files, checks whether each received file is an audio file or an image, and then processes it accordingly.
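A minimal sketch of the receiving side might look like the following. It relies on the NumMedia, MediaUrlN, and MediaContentTypeN form fields that Twilio includes in incoming-message webhooks. The function names extract_media and create_app and the reply text are illustrative assumptions, not the project's actual code.

```python
def extract_media(form):
    """Return a list of (url, kind) pairs from Twilio webhook form fields."""
    media = []
    count = int(form.get("NumMedia", 0) or 0)
    for i in range(count):
        url = form.get(f"MediaUrl{i}")
        content_type = form.get(f"MediaContentType{i}", "")
        # Classify by MIME type prefix, e.g. "audio/ogg" or "image/jpeg".
        if content_type.startswith("audio/"):
            kind = "audio"
        elif content_type.startswith("image/"):
            kind = "image"
        else:
            kind = "other"
        media.append((url, kind))
    return media


def create_app():
    """Build the Flask app (third-party imports are kept local to this function)."""
    from flask import Flask, request
    from twilio.twiml.messaging_response import MessagingResponse

    app = Flask(__name__)

    @app.route("/whatsapp", methods=["POST"])
    def whatsapp():
        kinds = [kind for _, kind in extract_media(request.form)]
        reply = MessagingResponse()
        reply.message(f"Received media: {', '.join(kinds) or 'none'}")
        # Twilio expects a TwiML (XML) document as the response body.
        return str(reply)

    return app
```

Calling `create_app().run(port=5000)` would start the server on the port that ngrok forwards to. Downloading the files behind each media URL is left out of this sketch.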
For audio files, it saves the file to a specific directory under a unique filename and converts it to WAV format with a 16 kHz sampling rate.
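The audio step could be sketched roughly as below. It assumes the sender arrives in Twilio's "whatsapp:+<number>" form and that ffmpeg is installed so pydub can decode non-WAV input. The helper names build_audio_path and convert_to_wav_16k are hypothetical.

```python
import re
import uuid
from pathlib import Path


def build_audio_path(sender, directory="audio_files"):
    """Return a unique .wav path that embeds the sender's ID."""
    # Keep only the digits of e.g. "whatsapp:+14155550100" so the ID
    # is safe to use in a filename.
    user_id = re.sub(r"\D", "", sender) or "unknown"
    return Path(directory) / f"{user_id}_{uuid.uuid4().hex}.wav"


def convert_to_wav_16k(source_path, target_path):
    """Convert an audio file to 16 kHz WAV using pydub (needs ffmpeg)."""
    from pydub import AudioSegment  # third-party; imported lazily

    audio = AudioSegment.from_file(str(source_path))
    audio = audio.set_frame_rate(16000)  # resample to 16 kHz
    Path(target_path).parent.mkdir(parents=True, exist_ok=True)
    audio.export(str(target_path), format="wav")
    return target_path
```

Prefixing the filename with the sender's number is one simple way to associate each recording with a user. A random UUID suffix avoids collisions when the same user sends several messages.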
For image files, it uses a pre-trained SSD300 model for face detection. It loads the model, preprocesses the image, and then performs face detection. If any faces are detected, it saves the image with the detected faces.
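The save-or-discard decision can be separated from the model itself. The sketch below assumes the detector's output has already been converted to plain Python lists of boxes and confidence scores. The function names and the default threshold of 0.5 are illustrative assumptions.

```python
def filter_detections(boxes, scores, threshold=0.5):
    """Keep only (box, score) pairs whose score meets the threshold."""
    return [(box, score) for box, score in zip(boxes, scores) if score >= threshold]


def contains_face(scores, threshold=0.5):
    """Return True if at least one detection is confident enough."""
    return any(score >= threshold for score in scores)
```

For example, with scores of 0.9 and 0.3, `contains_face` returns True at a threshold of 0.5 and False at 0.95. Raising the threshold therefore reduces false positives at the cost of missing weaker detections.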
The application should also include error handling to catch any exceptions that occur while the media files are being processed.
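One possible shape for this error handling is a small wrapper around each media handler, so that a failure is logged and reported instead of crashing the webhook. The name process_safely and the reply wording are assumptions for illustration.

```python
import logging

logger = logging.getLogger("bot")


def process_safely(handler, *args, **kwargs):
    """Run a media handler and turn any exception into a user-facing reply."""
    try:
        result = handler(*args, **kwargs)
        return True, f"Processed: {result}"
    except Exception as exc:  # broad on purpose: the webhook should always answer
        # logger.exception records the full traceback for later debugging.
        logger.exception("Media processing failed")
        return False, f"Sorry, the file could not be processed ({type(exc).__name__})."
```

The returned message can then be sent back to the user, while the traceback stays in the server log.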
ngrok http 5000
- Copy the endpoint URL from the ngrok dashboard and paste it into the Twilio Sandbox settings page, adding /whatsapp at the end of the URL (for example, https://123456.ngrok-free.app/whatsapp)
- Save the configuration in Twilio
- Save a copy of the ngrok URL as NGROK_URL in the .env file
- Note that ngrok URLs are dynamically generated and do not remain the same across sessions. The above steps need to be repeated with every new session.
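Because the URL changes with every session, it may help to build the webhook address from the stored value rather than typing it by hand. The helper below is a hypothetical sketch that joins the ngrok base URL and the /whatsapp path.

```python
def build_webhook_url(ngrok_url, path="/whatsapp"):
    """Join the ngrok base URL and the webhook path without doubling slashes."""
    return ngrok_url.rstrip("/") + "/" + path.lstrip("/")
```

The result is the value to paste into the Twilio Sandbox settings page.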
python bot.py
- Send an audio file in WhatsApp to the number provided by Twilio
- Check the terminal for the bot response
- Check that the file has been saved in the audio_files folder
- Send a picture containing a face to the same number in WhatsApp
- Check the terminal for the bot response
- Check that the picture has been saved in the images folder
- Repeat the above steps with an image that does not contain a face
- If false-positive images are saved to the folder, adjust the threshold parameter in the save_image_with_face_detection() function