Data Science, Machine Learning, Artificial Intelligence, and Big Data Resources

Welcome!

Here is a non-exhaustive, work-in-progress repository of resources for data science, machine learning, artificial intelligence (AI), big data, and more.

Resources include all files in this repository, including the links given in this file. A hyperlinked table of contents is provided below for your convenience.

Note that links are listed in no particular order of preference or relevance. Well... maybe except for my blog :)

Blogs

- [InnoArchiTech](http://www.innoarchitech.com/)
- [Flowing Data](http://flowingdata.com/)
- [KDnuggets](http://www.kdnuggets.com/)
- [R-bloggers](https://www.r-bloggers.com/)
- [Analytics Vidhya](https://www.analyticsvidhya.com/blog/)
- [Statistical Modeling, Causal Inference, and Social Science](http://andrewgelman.com/)
- [Simply Statistics](http://simplystatistics.org/)
- [Walking Randomly](http://www.walkingrandomly.com/)
- [FastML](http://fastml.com/)
- [No Free Hunch](http://blog.kaggle.com/)
- [Machine Learning Mastery](http://machinelearningmastery.com/)
- [Data Science Weekly](https://www.datascienceweekly.org/)
- [Edwin Chen](http://blog.echen.me/)
- [Harvard Data Science](http://harvarddatascience.com/)
- [OpenAI](https://openai.com/blog/)

GitHub Repos

Cheats

- [GitHub markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)
- [GitHub markdown guide](https://guides.github.com/features/mastering-markdown/)
- [Machine learning algorithm cheat sheet](https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-cheat-sheet/)
- [11 Steps for Data Exploration in R](https://www.analyticsvidhya.com/blog/2015/10/cheatsheet-11-steps-data-exploration-with-codes/)
- [AI Cheat Sheet](http://alexoner.github.io/AI-cheat-sheet/)
- [Data Science Cheat Sheet](http://www.datasciencecentral.com/profiles/blogs/data-science-cheat-sheet)
- [MIT Statistics Cheat Sheet](http://web.mit.edu/~csvoss/Public/usabo/stats_handout.pdf)
- [Machine Intelligence 3.0](https://format-com-cld-res.cloudinary.com/image/private/s--RCb7PzQR--/c_crop,h_1500,w_2000,x_0,y_0/c_fill,g_center,h_855,w_1140/a_auto,dpr_2,fl_keep_iptc.progressive,q_95/v1/19575bcc040a6dcff3097618ec9c585e/MI-Landscape-3_7.png)
- [Common probability distributions](http://blog.cloudera.com/wp-content/uploads/2015/12/distribution.png)

Web Resources

- [Data Science Weekly resources](https://www.datascienceweekly.org/data-science-resources)
- [Data School resources](http://www.dataschool.io/resources/)
- [Open Source Data Science Masters](http://datasciencemasters.org/)
- [Open Source Data Science Masters - GitHub](https://github.com/datasciencemasters)

Datasets

- [Awesome Public Datasets](https://github.com/caesar0301/awesome-public-datasets)
- [AWS Public Datasets](https://aws.amazon.com/datasets/)
- [100+ Interesting Data Sets for Statistics](http://rs.io/100-interesting-data-sets-for-statistics/)
- [Kaggle Datasets](https://www.kaggle.com/datasets)
- [FiveThirtyEight data](https://github.com/fivethirtyeight/data)
- [Google BigQuery Public Datasets](https://cloud.google.com/bigquery/public-data/)
- [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/datasets.html)
- [Stanford Large Network Dataset Collection](http://snap.stanford.edu/data/#!)
- [THE MNIST DATABASE of handwritten digits](http://yann.lecun.com/exdb/mnist/)
- [THE Wikipedia Corpus](http://corpus.byu.edu/wiki/)

IDEs

- [Sublime Text](https://www.sublimetext.com/)
- [RStudio](https://support.rstudio.com/hc/en-us/categories/200035113-Documentation)
- [Rodeo](http://rodeo.yhat.com/docs/)
- [Spyder](https://pythonhosted.org/spyder/)

Programming Languages and OS

- [Python](https://docs.python.org/3/)
- [R](https://cran.r-project.org/manuals.html)
- [JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript)
- [SQL](https://en.wikipedia.org/wiki/SQL)
- [Julia](http://docs.julialang.org/en/release-0.5/)
- [Scala](http://docs.scala-lang.org/)
- [Java](https://docs.oracle.com/javase)
- [C++](http://devdocs.io/cpp/)
- [HTML](https://developer.mozilla.org/en-US/docs/Web/HTML)
- [CSS](https://developer.mozilla.org/en-US/docs/Web/CSS)
- [Bash](http://ss64.com/bash/)
- [Ubuntu](https://help.ubuntu.com/)
- [JSON](http://www.json.org/)
- [JSON-RPC](http://json-rpc.org/)
- [YAML](http://yaml.org/spec/1.2/spec.html)
- [Git](https://git-scm.com/documentation)
- [Octave](https://www.gnu.org/software/octave/doc/interpreter/) - Scientific programming language

Databases, Big Data, and Cloud Services

- [AWS](https://aws.amazon.com/documentation/)
    + [Redshift](https://aws.amazon.com/documentation/redshift/) - Fast, simple, cost-effective data warehousing
    + [DynamoDB](https://aws.amazon.com/documentation/dynamodb/) - Fast and flexible NoSQL database service for any scale
    + [RDS](https://aws.amazon.com/documentation/rds/) - Amazon Relational Database Service
        * [Amazon Aurora](https://aws.amazon.com/rds/aurora/getting-started/) - MySQL-compatible relational database with 5X performance
        * [Oracle](https://docs.oracle.com/en/database/)
        * [Microsoft SQL Server](https://msdn.microsoft.com/en-us/library/mt590198(v=sql.1).aspx)
        * [PostgreSQL](https://www.postgresql.org/docs/)
        * [MySQL](https://dev.mysql.com/doc/)
        * [MariaDB](https://mariadb.org/learn/)
    + [Kinesis](https://aws.amazon.com/documentation/kinesis/) - Real-time streaming data in the AWS cloud
        * Firehose - Easily load real-time streaming data into AWS
        * Analytics - Get actionable insights from streaming data in real-time
        * Streams - Build custom applications that process or analyze streaming data for specialized needs
    + [Amazon EMR](https://aws.amazon.com/documentation/elastic-mapreduce/) - Easily Run and Scale Apache Hadoop, Spark, HBase, Presto, Hive, and other Big Data Frameworks
    + [QuickSight](https://aws.amazon.com/documentation/quicksight/) - Fast, easy to use business analytics
    + [Machine Learning](https://aws.amazon.com/documentation/machine-learning/)
    + [IoT](https://aws.amazon.com/documentation/iot/) - Easily and securely connect devices to the cloud
    + [AWS Data Pipeline](https://aws.amazon.com/documentation/data-pipeline/) - Easily automate the movement and transformation of data
- [Google Cloud Platform](https://cloud.google.com/docs/)
    + [BigQuery](https://cloud.google.com/bigquery/docs/) - Fully managed, petabyte scale, low cost analytics data warehouse
    + [Dataflow](https://cloud.google.com/dataflow/docs/) - A fully-managed cloud service and programming model for batch and streaming big data processing
    + [Dataproc](https://cloud.google.com/dataproc/docs/) - A managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning
    + [Datalab](https://cloud.google.com/datalab/docs/) - An easy to use interactive tool for large-scale data exploration, analysis, and visualization
    + [Machine Learning](https://cloud.google.com/ml/docs/) - Machine Learning on any data, of any size
    + [Prediction API](https://cloud.google.com/prediction/docs/) - A RESTful API to build Machine Learning models
    + [Jobs API](https://cloud.google.com/jobs-api/) - Job search and discovery powered by machine learning
    + [Natural Language API](https://cloud.google.com/natural-language/docs/) - Provides natural language understanding technologies to developers, including sentiment analysis, entity recognition, and syntax analysis
    + [Speech API](https://cloud.google.com/speech/docs/) - Easy integration of Google speech recognition technologies into developer applications
    + [Translate API](https://cloud.google.com/translate/docs/) - Dynamically translate text between thousands of language pairs
    + [Vision API](https://cloud.google.com/vision/docs/) - Easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content
    + [Pub/Sub](https://cloud.google.com/pubsub/docs/) - A fully-managed real-time messaging service that allows you to send and receive messages between independent applications
- [Apache Foundation](https://www.apache.org/)
    + [Apache Projects List (by category)](https://projects.apache.org/projects.html?category)
    + [HBase](https://hbase.apache.org/book.html) - Apache HBase is the Hadoop database, a distributed, scalable, big data store
    + [Hadoop](http://hadoop.apache.org/docs/current/) - Open-source software for reliable, scalable, distributed computing
    + [Spark](http://spark.apache.org/docs/latest/) - A fast and general engine for large-scale data processing
    + [Hive](https://cwiki.apache.org/confluence/display/Hive/LanguageManual) - Data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL
    + [Pig](http://pig.apache.org/docs/r0.16.0/) - A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs
    + [Kylin](http://kylin.apache.org/docs15/) - An open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets, originally contributed by eBay Inc
    + [Lens](http://lens.apache.org/user/index.html) - A unified analytics interface
    + [Ignite](https://apacheignite.readme.io/docs) - A high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies
    + [Brooklyn](https://brooklyn.apache.org/documentation/index.html) - A framework for modeling, monitoring, and managing applications through autonomic blueprints
    + [Apex](https://apex.apache.org/docs.html) - Enterprise-grade unified stream and batch processing engine
    + [Tajo](http://tajo.apache.org/docs/current/index.html) - A robust big data relational and distributed data warehouse system for Apache Hadoop
    + [Tez](https://tez.apache.org/user_guides.html) - An application framework which allows for a complex directed-acyclic-graph of tasks for processing data
    + [Bigtop](http://bigtop.apache.org/) - Project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components
    + [REEF](http://reef.apache.org/introduction.html) - Apache REEF (Retainable Evaluator Execution Framework) is a library for developing portable applications for cluster resource managers such as Apache Hadoop YARN or Apache Mesos
    + [Storm](http://storm.apache.org/index.html) - A free and open source distributed realtime computation system
    + [Kafka](https://kafka.apache.org/) - A distributed streaming platform
    + [Sqoop](http://sqoop.apache.org/) - A tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases
    + [JMeter](http://jmeter.apache.org/) - Java application designed to load test functional behavior and measure performance
- NoSQL
    + Document
        * [MongoDB](https://docs.mongodb.com/) - NoSQL document store
        * [CouchBase](http://developer.couchbase.com/documentation-archive) - A document database with a SQL-based query language that is engineered to deliver performance at scale
        * [CouchDB](http://docs.couchdb.org/en/2.0.0/) - NoSQL document store
        * [DynamoDB](https://aws.amazon.com/documentation/dynamodb/) - A fully managed NoSQL database service that provides fast and predictable performance with seamless scalability
        * [RethinkDB](https://rethinkdb.com/docs/) - RethinkDB is the open-source, scalable database that makes building realtime apps dramatically easier
        * [Azure's DocumentDB](https://docs.microsoft.com/en-us/azure/documentdb/) - A fully managed NoSQL database service built for fast and predictable performance, high availability, elastic scaling, global distribution, and ease of development
    + Column and wide-column (Big-table style)
        * [BigTable](https://cloud.google.com/bigtable/docs/) - Fast, fully managed, massively scalable NoSQL database service
        * [Cassandra](http://cassandra.apache.org/doc/latest/) - Free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
        * [Druid](http://druid.io/docs/latest/design/index.html) - An open source data store designed for OLAP queries on event data
        * [HyperTable](http://www.hypertable.com/documentation/) - A high performance, open source, massively scalable database modeled after Bigtable
        * [HBase](https://hbase.apache.org/book.html) - Apache HBase is the Hadoop database, a distributed, scalable, big data store
        * [SimpleDB](https://aws.amazon.com/documentation/simpledb/) - A highly available, scalable, and flexible non-relational data store that enables you to store and query data items using web services requests
    + Key-value
        * [Redis](http://redis.io/documentation) - An open source (BSD licensed), in-memory data structure store, used as database, cache and message broker
        * [Memcache](https://memcached.org/) - High-performance, distributed memory object caching system
        * [Riak](https://docs.basho.com/) - Distributed NoSQL Database
        * [DynamoDB](https://aws.amazon.com/documentation/dynamodb/) - A fully managed NoSQL database service that provides fast and predictable performance with seamless scalability
        * [Azure Table Storage (ATS)](https://docs.microsoft.com/en-us/azure/storage/)
    + Graph
        * [Neo4j](https://neo4j.com/docs/) - World's fastest and most scalable graph database
        * [Titan](https://github.com/thinkaurelius/titan) - A highly scalable graph database optimized for storing and querying large graphs with billions of vertices and edges distributed across a multi-machine cluster
        * [OrientDB](http://orientdb.com/docs/last/) - A document-graph database, meaning it has full native graph capabilities coupled with features normally only found in document databases
- RDBMS
    + [AWS RDS](https://aws.amazon.com/documentation/rds/) - Amazon Relational Database Service
    + [MySQL](https://dev.mysql.com/doc/) - Open source RDBMS
    + [PostgreSQL](https://www.postgresql.org/docs/) - Open-source Object-Relational DBMS supporting almost all SQL constructs
    + [SQLite](https://www.sqlite.org/docs.html) - A self-contained, high-reliability, embedded, full-featured, public-domain, SQL database engine
    + [Oracle Database](http://www.oracle.com/technetwork/database/index.html)
    + [Microsoft SQL Server](https://msdn.microsoft.com/en-us/library/mt590198(v=sql.1).aspx)
- Static storage
    + [S3](https://aws.amazon.com/documentation/s3/) - Simple, durable, massively scalable object storage
- Search and full-text
    + [ElasticSearch](https://www.elastic.co/guide/index.html) - Service that makes it easy to deploy, operate, and scale Elasticsearch in the AWS Cloud
    + [Apache Lucene](https://lucene.apache.org/core/documentation.html) - Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities
    + [Apache Solr](http://lucene.apache.org/solr/resources.html#documentation) - A high performance search server built using Lucene Core, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface
    + [PyLucene](http://lucene.apache.org/pylucene/features.html) - A Python port of the Lucene Core project
- Cache
    + [Memcache](https://memcached.org/) - High-performance, distributed memory object caching system
    + [Redis](http://redis.io/documentation) - An open source (BSD licensed), in-memory data structure store, used as database, cache and message broker
- Time-series and event data
    + [InfluxDB](https://docs.influxdata.com/influxdb/v1.1/) - A time series database built from the ground up to handle high write and query loads
    + [Prometheus](https://prometheus.io/docs/introduction/overview/) - An open-source systems monitoring and alerting toolkit originally built at SoundCloud
    + [Druid](http://druid.io/docs/0.9.1.1/design/index.html) - An open source data store designed for OLAP queries on event data
- Cloud services and APIs
    + [DataRobot](https://www.datarobot.com/) - Automated Machine Learning
    + [IBM Watson](http://www.ibm.com/watson/developercloud/doc/getting_started/) - Cognitive computing features in your app using IBM Watson's Language, Vision, Speech and Data APIs
    + [Microsoft Machine Learning](Machine Learning) - Powerful cloud based analytics

Platforms, Libraries, and Packages

- Deep learning and neural networks
    + [Torch](http://torch.ch/docs/getting-started.html#_) - A scientific computing framework with wide support for machine learning algorithms that puts GPUs first
    + [Caffe](http://caffe.berkeleyvision.org/) - A deep learning framework made with expression, speed, and modularity in mind
    + [DL4J](https://deeplearning4j.org/) - Open-Source, Distributed, Deep Learning Library for the JVM
    + [Theano](http://deeplearning.net/software/theano/) - Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
    + [TensorFlow](https://www.tensorflow.org/versions/r0.11/api_docs/index.html) - Open source software library for numerical computation using data flow graphs
    + [Amazon Deep Scalable Sparse Tensor Network Engine (DSSTNE)](https://github.com/amznlabs/amazon-dsstne) - An Amazon developed library for building Deep Learning (DL) machine learning (ML) models
    + [Keras: Deep Learning library for Theano and TensorFlow](https://keras.io/) - A high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano
- [Weka](http://www.cs.waikato.ac.nz/ml/weka/documentation.html) - A collection of machine learning algorithms for data mining tasks
- [Anaconda](https://docs.continuum.io/) - Open data science platform powered by Python
- [Python(x,y)](http://python-xy.github.io/) - A free scientific and engineering development software for numerical computations, data analysis and data visualization based on Python programming language, Qt graphical user interfaces and Spyder interactive scientific development environment
- Python
    + [IPython Documentation](http://ipython.readthedocs.io/en/stable/) - Comprehensive environment for interactive and exploratory computing
    + [Jupyter notebook](http://jupyter-notebook.readthedocs.io/en/latest/) - A web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text
    + [Matplotlib](http://matplotlib.org/) - A Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms
    + [Natural Language Toolkit](http://www.nltk.org/) - A leading platform for building Python programs to work with human language data
    + [Numpy](https://docs.scipy.org/doc/) - The fundamental package for scientific computing with Python
    + [Scipy](https://docs.scipy.org/doc/) - A Python-based ecosystem of open-source software for mathematics, science, and engineering
    + [Pandas](http://pandas.pydata.org/pandas-docs/stable/) - An open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language
    + [PyBrain](http://pybrain.org/) - Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library
    + [Scikit-image](http://scikit-image.org/) - A collection of algorithms for image processing
    + [Scikit-learn](http://scikit-learn.org/stable/documentation.html) - A Python module for machine learning
    + [Seaborn](http://seaborn.pydata.org/api.html) - A Python visualization library based on matplotlib
    + [StatsModels](http://statsmodels.sourceforge.net/documentation.html) - A Python module that allows users to explore data, estimate statistical models, and perform statistical tests
    + [Pattern](https://github.com/clips/pattern) - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization
    + [Scrapy](https://scrapy.org/doc/) - An open source and collaborative framework for extracting the data you need from websites
    + [ggplot](http://yhat.github.io/ggplot/docs.html) - A package for plotting in Python
    + [Altair](https://github.com/ellisonbg/altair) - Declarative statistical visualization library for Python
    + [Bokeh](http://bokeh.pydata.org/en/latest/docs/user_guide.html) - A Python interactive visualization library that targets modern web browsers for presentation
    + [Basemap](http://matplotlib.org/basemap/users/index.html) - A library for plotting 2D data on maps in Python
    + [NetworkX](http://networkx.github.io/documentation.html) - A Python language software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks
    + [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) - A Python library for pulling data out of HTML and XML files
    + [Gensim](http://radimrehurek.com/gensim/) - Python framework for fast Vector Space Modelling
    + [Shogun](https://github.com/shogun-toolbox/shogun) - Machine learning toolbox that provides a wide range of unified and efficient Machine Learning (ML) methods
    + [Chainer](http://docs.chainer.org/en/stable/) - A Powerful, Flexible, and Intuitive Framework for Neural Networks
    + [NuPIC](http://numenta.org/#docs) - An open source project based on a theory of neocortex called Hierarchical Temporal Memory (HTM)
    + [Neon](http://neon.nervanasys.com/index.html/) - Python-based deep learning library
    + [PyMC](https://pymc-devs.github.io/pymc/index.html) - A Python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo
    + [Fuel](https://fuel.readthedocs.io/en/latest/) - A data pipeline framework which provides your machine learning models with the data they need
    + [PyMVPA](http://www.pymvpa.org/manual.html) - PyMVPA stands for MultiVariate Pattern Analysis (MVPA) in Python
    + [Deap](http://deap.gel.ulaval.ca/doc/default/index.html) - A novel evolutionary computation framework for rapid prototyping and testing of ideas
    + [Annoy](https://github.com/spotify/annoy) - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
- R
    + [General CRAN List - By task](https://cran.r-project.org/web/views/)
    + [General CRAN List - NLP/Text analytics](https://cran.r-project.org/web/views/NaturalLanguageProcessing.html)
    + [General CRAN List - Machine learning](https://cran.r-project.org/web/views/MachineLearning.html)
    + [ggplot2](http://docs.ggplot2.org/current/) - A plotting system for R
    + [ISLR](https://cran.r-project.org/web/packages/ISLR/index.html) - The collection of datasets used in the book "An Introduction to Statistical Learning with Applications in R"
    + [Rcpp](https://cran.r-project.org/web/packages/Rcpp/index.html) - Provides R functions as well as C++ classes which offer a seamless integration of R and C++
    + [dplyr](https://cran.r-project.org/web/packages/dplyr/index.html) - A fast, consistent tool for working with data frame like objects, both in memory and out of memory
    + [plyr](https://cran.r-project.org/web/packages/plyr/index.html) - A set of tools that solves a common set of problems
    + [stringr](https://cran.r-project.org/web/packages/stringr/index.html) - A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package
    + [shiny](https://cran.r-project.org/web/packages/shiny/index.html) - Easy to build interactive web applications with R
    + [knitr](https://cran.r-project.org/web/packages/knitr/index.html) - A general-purpose tool for dynamic report generation in R using Literate Programming techniques
    + [readr](https://cran.r-project.org/web/packages/readr/index.html) - Read flat/tabular text files from disk (or a connection)
    + [R Markdown](https://cran.r-project.org/web/packages/rmarkdown/index.html) - Convert R Markdown documents into a variety of formats
    + [tidyr](https://cran.r-project.org/web/packages/tidyr/index.html) - Data tidying (not general reshaping or aggregating) and works well with 'dplyr' data pipelines
    + [lubridate](https://cran.r-project.org/web/packages/lubridate/index.html) - Functions to work with date-times and time-spans
    + [lme4](https://cran.r-project.org/web/packages/lme4/index.html) - Fit linear and generalized linear mixed-effects models
    + [nlme](https://cran.r-project.org/web/packages/nlme/index.html) - Fit and compare Gaussian linear and nonlinear mixed-effects models
    + [mime](https://cran.r-project.org/web/packages/mime/index.html) - Guesses the MIME type from a filename extension using the data derived from /etc/mime.types in UNIX-type systems
    + [mda](https://cran.r-project.org/web/packages/mda/index.html) - Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, ...
    + [lasso2](https://cran.r-project.org/web/packages/lasso2/index.html) - Routines and documentation for solving regression problems while imposing an L1 constraint on the estimates
    + [lars](https://cran.r-project.org/web/packages/lars/index.html) - Efficient procedures for fitting an entire lasso sequence with the cost of a single least squares fit
    + [digest](https://cran.r-project.org/web/packages/digest/index.html) - Implementation of a function 'digest()' for the creation of hash digests of arbitrary R objects (using the 'md5', 'sha-1', 'sha-256', 'crc32', 'xxhash' and 'murmurhash' algorithms) permitting easy comparison of R language objects, as well as a function 'hmac()' to create hash-based message authentication code
    + [reshape2](https://cran.r-project.org/web/packages/reshape2/index.html) - Flexibly restructure and aggregate data using just two functions: 'melt' and 'dcast' (or 'acast')
    + [colorspace](https://cran.r-project.org/web/packages/colorspace/index.html) - Carries out mapping between assorted color spaces including RGB, HSV, HLS, CIEXYZ, CIELUV, HCL (polar CIELUV), CIELAB and polar CIELAB
    + [RColorBrewer](https://cran.r-project.org/web/packages/RColorBrewer/index.html) - Provides color schemes for maps (and other graphics)
    + [manipulate](https://cran.r-project.org/web/packages/manipulate/index.html) - Interactive plotting functions for use within RStudio
    + [scales](https://cran.r-project.org/web/packages/scales/index.html) - Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends
    + [labeling](https://cran.r-project.org/web/packages/labeling/index.html) - Provides a range of axis labeling algorithms
    + [proto](https://cran.r-project.org/web/packages/proto/index.html) - An object oriented system using object-based, also called prototype-based, rather than class-based object oriented ideas
    + [randomForest](https://cran.r-project.org/web/packages/randomForest/index.html) - Classification and regression based on a forest of trees using random inputs
    + [glmnet](https://cran.r-project.org/web/packages/glmnet/index.html) - Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression and the Cox model
    + [caret](https://cran.r-project.org/web/packages/caret/index.html) - Misc functions for training and plotting classification and regression models
    + [ggvis](https://cran.r-project.org/web/packages/ggvis/index.html) - An implementation of an interactive grammar of graphics, taking the best parts of 'ggplot2', combining them with the reactive framework of 'shiny' and drawing web graphics using 'vega'
    + [rgl](https://cran.r-project.org/web/packages/rgl/index.html) - Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.)
    + [htmlwidgets](https://cran.r-project.org/web/packages/htmlwidgets/index.html) - A framework for creating HTML widgets that render in various contexts including the R console, 'R Markdown' documents, and 'Shiny' web applications
    + [leaflet](https://cran.r-project.org/web/packages/leaflet/index.html) - Create and customize interactive maps using the 'Leaflet' JavaScript library and the 'htmlwidgets' package
    + [dygraphs](https://cran.r-project.org/web/packages/dygraphs/index.html) - An R interface to the 'dygraphs' JavaScript charting library
    + [googleVis](https://cran.r-project.org/web/packages/googleVis/index.html) - R interface to Google Charts API, allowing users to create interactive charts based on data frames
    + [zoo](https://cran.r-project.org/web/packages/zoo/index.html) - An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors
    + [RCurl](https://cran.r-project.org/web/packages/RCurl/index.html) - A wrapper for 'libcurl'. Provides functions to allow one to compose general HTTP requests and provides convenient functions to fetch URIs, get & post forms, etc. and process the results returned by the Web server
    + [jsonlite](https://cran.r-project.org/web/packages/jsonlite/index.html) - A fast JSON parser and generator optimized for statistical data and the web
    + [bitops](https://cran.r-project.org/web/packages/bitops/index.html) - Functions for bitwise operations on integer vectors
    + [devtools](https://cran.r-project.org/web/packages/devtools/index.html) - Collection of package development tools
    + [magrittr](https://cran.r-project.org/web/packages/magrittr/index.html) - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%
    + [packrat](https://cran.r-project.org/web/packages/packrat/index.html) - Manage the R packages your project depends on in an isolated, portable, and reproducible way
    + [Haven](https://cran.r-project.org/web/packages/haven/index.html) - Import foreign statistical formats into R via the embedded 'ReadStat' C library
    + [DT](https://cran.r-project.org/web/packages/DT/index.html) - Data objects in R can be rendered as HTML tables using the JavaScript library 'DataTables' (typically via R Markdown or Shiny)
    + [MICE](https://cran.r-project.org/web/packages/mice/index.html) - Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm
    + [rpart](https://cran.r-project.org/web/packages/rpart/index.html) - Recursive partitioning for classification, regression and survival trees
    + [party](https://cran.r-project.org/web/packages/party/index.html) - A computational toolbox for recursive partitioning
    + [nnet](https://cran.r-project.org/web/packages/nnet/index.html) - Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models
    + [e1071](https://cran.r-project.org/web/packages/e1071/index.html) - Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, ...
    + [kernlab](https://cran.r-project.org/web/packages/kernlab/index.html) - Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction
    + [gbm](https://cran.r-project.org/web/packages/gbm/index.html) - Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart)
    + [wordcloud](https://cran.r-project.org/web/packages/wordcloud/index.html) - Pretty word clouds
    + [c50](https://cran.r-project.org/web/packages/C50/index.html) - C5.0 decision trees and rule-based models for pattern recognition
    + [class](https://cran.r-project.org/web/packages/class/index.html) - Various functions for classification, including k-nearest neighbour, Learning Vector Quantization and Self-Organizing Maps
    + [neuralnet](https://cran.r-project.org/web/packages/neuralnet/index.html) - Training of neural networks using backpropagation, resilient backpropagation with (Riedmiller, 1994) or without weight backtracking (Riedmiller and Braun, 1993) or the modified globally convergent version by Anastasiadis et al. (2005)
    + [tm](https://cran.r-project.org/web/packages/tm/index.html) - A framework for text mining applications within R
    + [gmodels](https://cran.r-project.org/web/packages/gmodels/index.html) - Various R programming tools for model fitting
    + [rodbc](https://cran.r-project.org/web/packages/RODBC/index.html) - An ODBC database interface
    + [princurve](https://cran.r-project.org/web/packages/princurve/index.html) - Fits a principal curve to a data matrix in arbitrary dimensions
- Analytics
    + [Segment's Analytics.js](https://github.com/segmentio/analytics.js)
    + [Snowplow](http://snowplowanalytics.com/guides/)

Cloud/SaaS/PaaS/IaaS

- [AWS](https://aws.amazon.com/documentation/)
    + [Lambda](https://aws.amazon.com/documentation/lambda/) - Serverless compute. AWS Lambda lets you run code without provisioning or managing servers
    + [EC2](https://aws.amazon.com/documentation/ec2/) - Web service that provides resizable compute capacity in the cloud
    + [Elastic Beanstalk](https://aws.amazon.com/documentation/elastic-beanstalk/) - Deploy and scale web applications and services
    + [ElastiCache](https://aws.amazon.com/documentation/elasticache/) - Web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud
    + [Amazon Simple Notification Service (SNS)](https://aws.amazon.com/documentation/sns/) - Fully managed and highly scalable push messaging
    + [Amazon Simple Email Service (Amazon SES)](https://aws.amazon.com/documentation/ses/) - Reliable, cost-effective email platform
    + [Amazon Simple Queue Service (SQS)](https://aws.amazon.com/documentation/sqs/) - A fast, reliable, scalable, fully managed message queuing service
- [Google Cloud Platform](https://cloud.google.com/docs/)
- [Microsoft](https://docs.microsoft.com/en-us/azure/)
- [Digital Ocean](https://developers.digitalocean.com/documentation/)