This project is a microservice for a hypothetical social media analytics platform, implemented in Python using Django. The service provides APIs for creating, retrieving, and analyzing social media posts.
- Installation
- API Endpoints
- Database Configuration
- Cache Configuration
- Rate Limiting
- Running the Application
- Scalability Considerations
- Infrastructure Considerations
## Installation
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/social-media-analytics.git
   ```

2. Install the requirements:

   ```bash
   pip install -r requirements.txt
   ```
## API Endpoints
### Post Creation (`POST /api/v1/posts/`)

Accepts a JSON payload with text content and a unique identifier and creates a new social media post.

Example:

```bash
curl -X POST -H "Content-Type: application/json" \
     -d '{"id": "123", "content": "This is a sample post."}' \
     http://localhost:8000/api/v1/posts/
```
### Get Analysis (`GET /api/v1/posts/{id}/analysis/`)

Returns the number of words and the average word length in a post.

Example:

```bash
curl http://localhost:8000/api/v1/posts/123/analysis/
```
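The analysis itself is simple string processing. A minimal sketch of the word-count and average-word-length computation (the function name `analyze_post` is illustrative, not taken from the codebase):

```python
def analyze_post(content: str) -> dict:
    """Compute word count and average word length for a post body."""
    words = content.split()
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return {"word_count": len(words), "average_word_length": avg_len}
```

Note that with this naive split, punctuation attached to a word (e.g. `post.`) counts toward its length; a production version might strip punctuation first.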
## Database Configuration

Configure your database settings in `settings.py`. The project currently uses MySQL for local development; MySQL can also be used in production given its robustness and scalability.
```python
# settings.py
# Modify these settings to match your database.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'social_media_analytics',
        'USER': 'admin',
        'PASSWORD': 'admin',
        'HOST': 'localhost',
        'PORT': '3306',
        'OPTIONS': {
            'charset': 'utf8mb4',
        },
    }
}
```
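Hardcoded credentials like `admin`/`admin` should not ship to production. One common pattern is to read them from environment variables with local defaults (a sketch; the `DB_*` variable names are illustrative, not defined by the project):

```python
import os

# Read database credentials from the environment, falling back to
# local-development defaults when a variable is not set.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": os.environ.get("DB_NAME", "social_media_analytics"),
        "USER": os.environ.get("DB_USER", "admin"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "3306"),
        "OPTIONS": {"charset": "utf8mb4"},
    }
}
```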
## Cache Configuration

The project uses Django's caching framework. Adjust the cache settings in `settings.py` and apply the `@cache_page` decorator to views. Note that the database cache backend requires the cache table to exist; create it once with `python manage.py createcachetable`.
```python
# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
        'LOCATION': 'analytics_post_cache',
    }
}
```

```python
# views.py
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # cache for 15 minutes (adjust as needed)
def post_analysis(request, post_id):
    ...  # the view to cache
```
## Rate Limiting

Rate limiting is implemented using the django-ratelimit package. Adjust rate limits per view with the `@ratelimit` decorator.
```python
# views.py
from django_ratelimit.decorators import ratelimit  # django-ratelimit >= 4.0

@ratelimit(key='ip', rate='1/s', block=True)
def create_post(request):
    ...  # the view to rate limit
```
## Running the Application
1. Run the migrations:

   ```bash
   python manage.py migrate
   ```

2. Start the development server:

   ```bash
   python manage.py runserver
   ```
The API is then available at http://localhost:8000/.
## Scalability Considerations
### Handling large amounts of post data and high request volumes
- Using a database that scales well with the data requirements, preferably horizontally to keep costs down. PostgreSQL and MySQL are two good choices for this purpose.
- Caching repeated queries to avoid unnecessary round trips to the database.
- Rate limiting to restrict how frequently a given IP may hit the server.
### Parallelizing the analysis computation
- Asynchronous processing to execute multiple queries simultaneously.
- Batch processing to analyze multiple posts at a time instead of a single post.
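Batch processing can be sketched with a thread pool: analyze many posts concurrently instead of handling one per request. This is illustrative, not the service's actual implementation; `analyze` stands in for the real per-post analysis function:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(content: str) -> dict:
    # Stand-in for the real per-post analysis (word count, avg length, ...).
    return {"word_count": len(content.split())}

def analyze_batch(posts: list, workers: int = 8) -> list:
    # Analyze a batch of posts concurrently; results keep the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze, posts))
```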
## Infrastructure Considerations
### Database

Django provides a batteries-included approach with built-in modules for many popular databases such as SQLite, PostgreSQL, and MySQL. Since the key consideration of this project is scalability, we opt for a database with good community support that scales well horizontally. MongoDB and Cassandra offer easy horizontal scaling, whereas PostgreSQL and MySQL are more robust but require more careful planning to scale out.
### Traffic Spikes
- Load testing each update to the service before deploying it to production.
- Using load balancers in production to avoid bottlenecks.
- Techniques such as caching, asynchronous processing, and rate limiting.
- Content compression.
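Content compression, for instance, is available out of the box in Django via `GZipMiddleware`; placing it near the top of the middleware stack compresses eligible responses (a settings sketch; the rest of the middleware list is elided):

```python
# settings.py
MIDDLEWARE = [
    "django.middleware.gzip.GZipMiddleware",  # compress responses first
    # ... the project's other middleware ...
]
```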
### Availability and Fault Tolerance of the Service
- Distributed architecture
- Redundant storage of critical data
- Database replication
- Service redundancy
### Security of the Data
- Authentication and authorization (role based access)
- Data encryption in transit (TLS)
- Input validation and sanity checks
- Rate limiting to prevent DoS attacks
- Regular data backups
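For encryption in transit, Django exposes settings that force HTTPS once TLS is set up (a sketch for a production settings module; the HSTS duration shown is illustrative):

```python
# settings.py (production only)
SECURE_SSL_REDIRECT = True             # redirect plain HTTP to HTTPS
SECURE_HSTS_SECONDS = 31536000         # one year of HSTS
SECURE_HSTS_INCLUDE_SUBDOMAINS = True
SESSION_COOKIE_SECURE = True           # send cookies only over HTTPS
CSRF_COOKIE_SECURE = True
```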
### Logging, Monitoring and Alerting
- Multi-level logging approach with contextual information
- Monitoring system metrics, application metrics, health checks etc.
- System and service availability monitoring
- Threshold based alert triggers
- Anomaly detection alerts
- Severity levels for alerts (e.g. Caution, Warning, Critical)
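Django's `LOGGING` setting is a standard `logging.dictConfig` dictionary; a minimal setup with leveled console output might look like this (handler names and levels are illustrative):

```python
# settings.py
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        # Contextual information: level, timestamp, originating module.
        "verbose": {
            "format": "{levelname} {asctime} {module} {message}",
            "style": "{",
        },
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "verbose"},
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}
```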
### Hosting Providers and Services
- Microsoft Azure and Amazon Web Services are both strong choices, given their wide community support and thorough documentation.
- Personally, I would opt for Microsoft Azure, given the scalability of Blob Storage, the availability of both SQL and NoSQL databases, and its intuitive logging and monitoring interfaces for system performance.