Elie A.'s Projects
A proof of concept converter for Apache Zeppelin notebooks.
A curated list of awesome Apache Spark packages and resources.
A minimal benchmark of various tools (statistical software, databases etc.) for working with tabular data of moderately large sizes (interactive data analysis).
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
MapReduce application for processing raw university data and writing the processed data to Elasticsearch
Code for the RecSys 2018 paper entitled Causal Embeddings for Recommendation.
Complex Networks tools
Approximate Nearest Neighbors in Spark
The Leek group guide to data sharing
An implementation of DBSCAN running on top of Apache Spark
HTML docs for all Elasticsearch projects
My configuration files (.bashrc, .profile, .gitconfig, etc)
Elastica is a PHP client for elasticsearch
Open Source, Distributed, RESTful Search Engine
Elasticsearch real-time search and analytics natively integrated with Hadoop
Mahout Taste-based recommendation on Elasticsearch
Score documents with pure dot product / cosine similarity with ES
ElasticSearch Java Sample
PHP Elasticsearch Indexer using Elastica (this repo is more of an example of how to use Elastica)
A framework for writing algorithms and performing measurements on their execution
PageRank implementation using Apache Giraph graph processing system
A Graph Theory Framework
Example code for running R on Hadoop
This page is a summary to keep track of Hadoop-related projects and other relevant projects in the Big Data scene, focused on the open source, free software environment.