🛡️ API Security: Access Behavior and Anomaly Detection with Boosting Algorithms (AdaBoost, Gradient Boosting, and XGBoost)
Welcome to the Cybersecurity Machine Learning project repository! This project applies boosting-based machine learning models and statistical tests to API access data in order to distinguish normal user behavior from attack patterns.
The primary goal of this project is to use machine learning techniques to analyze cybersecurity datasets, focusing on features such as `inter_api_access_duration(sec)`, `api_access_uniqueness`, `sequence_length(count)`, `vsession_duration(min)`, `ip_type`, `behavior`, `behavior_type`, `num_sessions`, `num_users`, and `num_unique_apis`, among others.
The dataset offers a view into how microservice-based applications interact through their Application Programming Interfaces (APIs), which connect applications to one another and to programmatic functions.
Here, I examine the vulnerabilities within this seemingly robust structure. Attackers often exploit these APIs by manipulating the underlying business logic. The dataset highlights the differences in user behavior, separating normal routines from the subtle yet damaging actions of attackers.
With hundreds of APIs called in varying sequences, variability arises from several sources: browser refreshes, network errors, or session inconsistencies. Together, these sequences form access graphs that reflect user behavior over time. Analyzing these graphs reveals attack patterns and anomalies.
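As a rough illustration of the access-graph idea, the sketch below turns one session's ordered API calls into a weighted directed graph of transitions and derives a few session-level features. The endpoint names and the `build_access_graph` and `session_features` helpers are hypothetical, and the uniqueness formula (distinct APIs divided by total calls) is an assumption about how `api_access_uniqueness` is defined, not something confirmed by the repository.

```python
from collections import Counter


def build_access_graph(calls):
    """Count directed transitions between consecutive API calls in one session."""
    # Each edge (src, dst) is weighted by how often dst was called right after src.
    return Counter(zip(calls, calls[1:]))


def session_features(calls):
    """Derive simple per-session features from an ordered list of API calls."""
    total = len(calls)
    distinct = len(set(calls))
    return {
        "sequence_length": total,
        "num_unique_apis": distinct,
        # Assumed definition: distinct APIs divided by total calls.
        "api_access_uniqueness": distinct / total if total else 0.0,
    }


# Hypothetical session: a browser refresh repeats the /cart call.
session = ["/login", "/catalog", "/cart", "/cart", "/checkout"]

graph = build_access_graph(session)
features = session_features(session)

for (src, dst), weight in sorted(graph.items()):
    print(f"{src} -> {dst}: {weight}")
print(features)
```

A refresh shows up as a self-loop (`/cart -> /cart`), which lowers the uniqueness ratio without adding a new node to the graph.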
AdaBoost
- Best Hyperparameters: `{'learning_rate': 0.2, 'n_estimators': 200}`
- Accuracy: 83%
- ROC-AUC: 0.93

Gradient Boosting
- Best Hyperparameters: `{'learning_rate': 0.1, 'max_depth': 5, 'min_samples_split': 4, 'n_estimators': 200}`
- Accuracy: 86%
- ROC-AUC: 0.94

XGBoost
- Best Hyperparameters: `{'learning_rate': 0.1, 'max_depth': 7, 'min_child_weight': 5, 'n_estimators': 200}`
- Accuracy: 86%
- ROC-AUC: 0.95
All three models reach comparable accuracy. XGBoost has the highest ROC-AUC (0.95) and ties with Gradient Boosting on accuracy (86%), while AdaBoost trails slightly on both metrics (83% accuracy, 0.93 ROC-AUC).
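The hyperparameters above look like the output of a grid search. A minimal sketch of such a search with scikit-learn follows; it runs on synthetic data from `make_classification` rather than the project's dataset, and the parameter grids, the 3-fold cross-validation, and the `roc_auc` scoring choice are all assumptions, so the notebooks may be set up differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real feature matrix and normal/attack labels.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small, assumed grids; the grids actually searched in the notebooks may differ.
searches = {
    "AdaBoost": (
        AdaBoostClassifier(random_state=0),
        {"learning_rate": [0.1, 0.2], "n_estimators": [50, 100]},
    ),
    "Gradient Boosting": (
        GradientBoostingClassifier(random_state=0),
        {"learning_rate": [0.1], "max_depth": [3, 5], "n_estimators": [50, 100]},
    ),
}

results = {}
for name, (estimator, grid) in searches.items():
    # Cross-validated search over the grid, refit on the full training split.
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=3)
    search.fit(X_train, y_train)
    proba = search.predict_proba(X_test)[:, 1]
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, search.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, proba),
    }
    print(name, results[name])
```

If the `xgboost` package is installed, `xgboost.XGBClassifier` follows the same scikit-learn estimator interface and could be added to `searches` with a grid over `max_depth` and `min_child_weight`.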
- Chi-Square Test 🧮
  - `ip_type`: p-value = 6.58e-50
  - `behavior_type`: p-value = 0.0
  - `source`: p-value = 0.0
  - `type_ip`: p-value = 6.58e-50
  - `type_behaviour`: p-value = 0.0
  - `source_type`: p-value = 0.0
- T-Test
  - `inter_api_access_duration(sec)` vs `vsession_duration(min)`: p-value = 1.07e-07
- Linear Regression
  - R-squared: 0.153
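For readers who want to reproduce this kind of test, here is a minimal sketch using SciPy. The contingency counts and the two numeric samples are made up for illustration, and the choice of Welch's two-sample t-test is an assumption; the repository may use a different t-test variant or different groupings.

```python
import numpy as np
from scipy.stats import chi2_contingency, ttest_ind

rng = np.random.default_rng(0)

# Chi-square test of independence between a categorical feature and the label.
# Hypothetical contingency table: rows are ip_type values, columns are
# (normal, outlier) counts. These numbers are invented for the example.
contingency = np.array([
    [120, 30],
    [40, 90],
])
chi2, chi2_p, dof, expected = chi2_contingency(contingency)
print(f"chi2={chi2:.2f}, dof={dof}, p-value={chi2_p:.3g}")

# Two-sample t-test comparing the means of two numeric columns.
# Synthetic stand-ins for the two duration columns.
inter_api_access_duration = rng.normal(loc=2.0, scale=1.0, size=200)
vsession_duration = rng.normal(loc=3.0, scale=1.5, size=200)
# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, t_p = ttest_ind(inter_api_access_duration, vsession_duration, equal_var=False)
print(f"t={t_stat:.2f}, p-value={t_p:.3g}")
```

A p-value printed as exactly 0.0 usually means the true value is smaller than floating-point arithmetic can represent, not that it is literally zero.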
ROC-AUC by model:
- AdaBoost: 0.93
- Gradient Boosting: 0.94
- XGBoost: 0.95
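To make the metric concrete, the sketch below computes ROC-AUC from first principles as the fraction of (positive, negative) pairs that the model ranks correctly, alongside accuracy at a fixed threshold. The labels and scores are invented, and the `roc_auc` and `accuracy` helpers are illustrative only; in practice `sklearn.metrics.roc_auc_score` would normally be used instead.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up labels (1 = attack) and model scores.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.60, 0.45, 0.80, 0.90]

auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
print(f"ROC-AUC={auc:.3f}, accuracy={acc:.3f}")
```

Because ROC-AUC depends only on how examples are ranked, it can be noticeably higher than accuracy measured at a single cut-off, which is consistent with the gap between the two metrics reported for each model.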
- Clone the repository to your local machine.
- Install the dependencies specified in `requirements.txt`.
- Explore the notebooks and Python scripts for analysis and model implementation.
- For more detailed insights, refer to individual files.
The provided information serves as a summary. For a comprehensive understanding, refer to specific notebooks and analysis files available within the repository.