
krzjoa / awesome-python-data-science


Probably the best curated list of data science software in Python.

Home Page: https://krzjoa.github.io/awesome-python-data-science

License: Creative Commons Attribution 4.0 International

awesome awesome-list awesome-python scikit-learn python deep-learning machine-learning data-science data-visualization data-analysis statistics



Awesome Python Data Science


Probably the best curated list of data science software in Python


Machine Learning

General Purpose Machine Learning

  • scikit-learn - Machine learning in Python. sklearn
  • PyCaret - An open-source, low-code machine learning library in Python. R inspired/ported lib
  • Shogun - Machine learning toolbox.
  • xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package.
  • cuML - RAPIDS Machine Learning Library. sklearn GPU accelerated
  • modAL - Modular active learning framework for Python3. sklearn
  • Sparkit-learn - PySpark + scikit-learn = Sparkit-learn. sklearn Apache Spark based
  • mlpack - A scalable C++ machine learning library (Python bindings).
  • dlib - Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings).
  • MLxtend - Extension and helper modules for Python's data analysis and machine learning libraries. sklearn
  • hyperlearn - 50%+ faster, 50%+ less RAM usage, GPU-supported rewrite of scikit-learn and Statsmodels. sklearn PyTorch based/compatible
  • Reproducible Experiment Platform (REP) - Machine Learning toolbox for Humans. sklearn
  • scikit-multilearn - Multi-label classification for python. sklearn
  • seqlearn - Sequence classification toolkit for Python. sklearn
  • pystruct - Simple structured learning framework for Python. sklearn
  • sklearn-expertsys - Highly interpretable classifiers for scikit learn. sklearn
  • RuleFit - Implementation of the RuleFit algorithm. sklearn
  • metric-learn - Metric learning algorithms in Python. sklearn
  • pyGAM - Generalized Additive Models in Python.
  • causalml - Uplift modeling and causal inference with machine learning algorithms. sklearn

Gradient Boosting

  • XGBoost - Scalable, Portable, and Distributed Gradient Boosting. sklearn GPU accelerated
  • LightGBM - A fast, distributed, high-performance gradient boosting framework. sklearn GPU accelerated
  • CatBoost - An open-source gradient boosting on decision trees library. sklearn GPU accelerated
  • ThunderGBM - Fast GBDTs and Random Forests on GPUs. sklearn GPU accelerated
  • NGBoost - Natural Gradient Boosting for Probabilistic Prediction.
  • TensorFlow Decision Forests - A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras. keras TensorFlow
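The core idea behind these boosting libraries fits in a short sketch. The plain-Python example below is illustrative only and implies no library API. It performs gradient boosting for squared-error regression: each round fits a one-split regression tree (a "stump") to the current residuals, which are the negative gradients of the squared loss, and adds a shrunken copy of that stump to the ensemble. The function names, the 1-D input restriction, and the default parameters are assumptions made for brevity. XGBoost, LightGBM and CatBoost add deeper trees, regularization and much faster split finding.

```python
# Minimal gradient boosting for squared-error regression on 1-D inputs (illustrative sketch).

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:
            continue  # a split must put points on both sides
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return t, lv, rv


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 0.5 * (y - pred)^2.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]

    def predict(x):
        return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)

    return predict
```

For example, `gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])` returns a predictor that moves steadily toward 1 for small inputs and 5 for large ones. The small learning rate trades more rounds for a smoother fit.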

Ensemble Methods

  • ML-Ensemble - High performance ensemble learning. sklearn
  • Stacking - Simple and useful stacking library written in Python. sklearn
  • stacked_generalization - Library for machine learning stacking generalization. sklearn
  • vecstack - Python package for stacking (machine learning technique). sklearn

Imbalanced Datasets

  • imbalanced-learn - Module to perform under-sampling and over-sampling with various techniques. sklearn
  • imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data. sklearn TensorFlow
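The simplest resampling technique these packages offer is random over-sampling. It duplicates randomly chosen minority-class rows until every class is as large as the majority class. imbalanced-learn provides this as `RandomOverSampler`, alongside smarter methods such as SMOTE. The standalone sketch below does not reproduce that class's API, and the function name `random_oversample` is hypothetical.

```python
import random


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until every class matches the majority count."""
    rng = random.Random(seed)
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(idx)  # sample an existing row of this class with replacement
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out
```

Duplicated rows add no new information, so resample only the training split. Resampling before a train/test split leaks copies of rows into the evaluation data.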

Random Forests

Kernel Methods

  • pyFM - Factorization machines in python. sklearn
  • fastFM - A library for Factorization Machines. sklearn
  • tffm - TensorFlow implementation of an arbitrary order Factorization Machine. sklearn TensorFlow
  • liquidSVM - An implementation of SVMs.
  • scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API. sklearn
  • ThunderSVM - A fast SVM Library on GPUs and CPUs. sklearn GPU accelerated
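A second-order factorization machine (the model behind pyFM, fastFM and tffm) scores a feature vector x as w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, where each feature i has a latent vector v_i of length k. The plain-Python sketch below computes the pairwise term with the standard O(kn) rewrite, 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2], and checks it against the direct double loop. The function names are hypothetical and do not correspond to any of the listed libraries' APIs.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector x.

    w0: global bias, w: per-feature weights, V: one k-dim latent vector per feature.
    Uses the O(k*n) reformulation of the pairwise term.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Direct O(k*n^2) version, kept only to check the fast formula."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score
```

The rewrite matters for sparse, high-dimensional inputs such as one-hot encoded user and item IDs. There, the quadratic loop over feature pairs would dominate the cost.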

Deep Learning

PyTorch

  • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch based/compatible
  • pytorch-lightning - PyTorch Lightning is just organized PyTorch. PyTorch based/compatible
  • ignite - High-level library to help with training neural networks in PyTorch. PyTorch based/compatible
  • skorch - A scikit-learn compatible neural network library that wraps PyTorch. sklearn PyTorch based/compatible
  • Catalyst - High-level utils for PyTorch DL & RL research. PyTorch based/compatible
  • ChemicalX - A PyTorch-based deep learning library for drug pair scoring. PyTorch based/compatible

TensorFlow

  • TensorFlow - Computation using data flow graphs for scalable machine learning by Google. TensorFlow
  • TensorLayer - Deep Learning and Reinforcement Learning Library for Researchers and Engineers. TensorFlow
  • TFLearn - Deep learning library featuring a higher-level API for TensorFlow. TensorFlow
  • Sonnet - TensorFlow-based neural network library. TensorFlow
  • tensorpack - A Neural Net Training Interface on TensorFlow. TensorFlow
  • Polyaxon - A platform that helps you build, manage and monitor deep learning models. TensorFlow
  • tfdeploy - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running NumPy. TensorFlow
  • tensorflow-upstream - TensorFlow ROCm port. TensorFlow Possible to run on AMD GPU
  • TensorFlow Fold - Deep learning with dynamic computation graphs in TensorFlow. TensorFlow
  • TensorLight - A high-level framework for TensorFlow. TensorFlow
  • Mesh TensorFlow - Model Parallelism Made Easier. TensorFlow
  • Ludwig - A toolbox that allows one to train and test deep learning models without the need to write code. TensorFlow
  • Keras - A high-level neural networks API running on top of TensorFlow. Keras compatible
  • keras-contrib - Keras community contributions. Keras compatible
  • Hyperas - Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization. Keras compatible
  • Elephas - Distributed Deep learning with Keras & Spark. Keras compatible
  • qkeras - A quantization deep learning library. Keras compatible

MXNet

  • MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dependency Scheduler. MXNet based
  • Gluon - A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet). MXNet based
  • Xfer - Transfer Learning library for Deep Neural Networks. MXNet based
  • MXNet (HIP port) - HIP port of MXNet. MXNet based Possible to run on AMD GPU

JAX

  • JAX - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.
  • FLAX - A neural network library for JAX that is designed for flexibility.
  • Optax - A gradient processing and optimization library for JAX.

Others

  • transformers - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX. PyTorch based/compatible TensorFlow
  • Tangent - Source-to-Source Debuggable Derivatives in Pure Python.
  • autograd - Efficiently computes derivatives of numpy code.
  • Caffe - A fast open framework for deep learning.
  • nnabla - Neural Network Libraries by Sony.
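Libraries such as autograd and Tangent compute exact derivatives of ordinary numeric code. The sketch below shows the idea with forward-mode automatic differentiation using dual numbers. This is a deliberately simpler technique than the reverse-mode and source-to-source approaches those libraries actually use. The `Dual` class, `sin` and `derivative` helpers are hypothetical names for illustration, and only addition, multiplication and sine are supported.

```python
import math


class Dual:
    """Number value + deriv * eps with eps**2 == 0; `deriv` carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def _lift(self, other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    # Chain rule: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Evaluate f at x + 1*eps; the eps coefficient of the result is f'(x)."""
    return f(Dual(x, 1.0)).deriv
```

For example, `derivative(lambda x: 3 * x * x + sin(x), 2.0)` evaluates 6x + cos(x) at x = 2, giving 12 + cos(2). Forward mode costs one pass per input variable. Reverse mode, as in autograd, gets the full gradient of a scalar function in roughly one backward pass, which is why it is preferred for models with many parameters.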

Automated Machine Learning

  • auto-sklearn - An AutoML toolkit and a drop-in replacement for a scikit-learn estimator. sklearn
  • Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch. PyTorch based/compatible
  • AutoKeras - AutoML library for deep learning. Keras compatible
  • AutoGluon - AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.
  • TPOT - AutoML tool that optimizes machine learning pipelines using genetic programming. sklearn
  • MLBox - A powerful Automated Machine Learning python library.

Natural Language Processing

  • torchtext - Data loaders and abstractions for text and NLP. PyTorch based/compatible
  • gluon-nlp - NLP made easy. MXNet based
  • KerasNLP - Modular Natural Language Processing workflows with Keras. Keras compatible
  • spaCy - Industrial-Strength Natural Language Processing.
  • NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
  • CLTK - The Classical Language Toolkit.
  • gensim - Topic Modelling for Humans.
  • pyMorfologik - Python binding for Morfologik.
  • skift - Scikit-learn wrappers for Python fastText. sklearn
  • Phonemizer - Simple text-to-phonemes converter for multiple languages.
  • flair - Very simple framework for state-of-the-art NLP.

Computer Audition

  • torchaudio - An audio library for PyTorch. PyTorch based/compatible
  • librosa - Python library for audio and music analysis.
  • Yaafe - Audio features extraction.
  • aubio - A library for audio and music analysis.
  • Essentia - Library for audio and music analysis, description, and synthesis.
  • LibXtract - A simple, portable, lightweight library of audio feature extraction functions.
  • Marsyas - Music Analysis, Retrieval, and Synthesis for Audio Signals.
  • muda - A library for augmenting annotated audio data.
  • madmom - Python audio and music signal processing library.

Computer Vision

  • torchvision - Datasets, Transforms, and Models specific to Computer Vision. PyTorch based/compatible
  • PyTorch3D - PyTorch3D is FAIR's library of reusable components for deep learning with 3D data. PyTorch based/compatible
  • gluon-cv - Provides implementations of the state-of-the-art deep learning models in computer vision. MXNet based
  • KerasCV - Industry-strength Computer Vision workflows with Keras. Keras compatible
  • OpenCV - Open Source Computer Vision Library.
  • Decord - An efficient video loader for deep learning with smart shuffling that's super easy to digest.
  • MMEngine - OpenMMLab Foundational Library for Training Deep Learning Models. PyTorch based/compatible
  • scikit-image - Image Processing SciKit (Toolbox for SciPy).
  • imgaug - Image augmentation for machine learning experiments.
  • imgaug_extension - Additional augmentations for imgaug.
  • Augmentor - Image augmentation library in Python for machine learning.
  • albumentations - Fast image augmentation library and easy-to-use wrapper around other libraries.
  • LAVIS - A One-stop Library for Language-Vision Intelligence.

Time Series

  • sktime - A unified framework for machine learning with time series. sklearn
  • darts - A python library for easy manipulation and forecasting of time series.
  • statsforecast - Lightning fast forecasting with statistical and econometric models.
  • mlforecast - Scalable machine learning-based time series forecasting.
  • neuralforecast - Scalable and user-friendly neural forecasting algorithms.
  • tslearn - Machine learning toolkit dedicated to time-series data. sklearn
  • tick - Module for statistical learning, with a particular emphasis on time-dependent modeling. sklearn
  • greykite - A flexible, intuitive, and fast forecasting library.
  • Prophet - Automatic Forecasting Procedure.
  • PyFlux - Open source time series library for Python.
  • bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
  • luminol - Anomaly Detection and Correlation library.
  • dateutil - Powerful extensions to the standard datetime module.
  • maya - Makes it easy to parse strings and change timezones.
  • Chaos Genius - ML-powered analytics engine for outlier/anomaly detection and root cause analysis.
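Simple exponential smoothing is one of the most basic statistical forecasting models. It is a useful baseline before reaching for the libraries above. The level is updated as level_t = alpha * y_t + (1 - alpha) * level_{t-1}, and the flat forecast for every future step equals the final level. The plain-Python sketch below initialises the level with the first observation, which is one common convention among several. The function name is hypothetical and does not mirror any library's API.

```python
def simple_exp_smoothing(series, alpha):
    """Return the flat forecast after exponentially smoothing `series`.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, initialised with the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level
```

With `simple_exp_smoothing([10, 12, 11, 13], 0.5)` the level goes 10, 11, 11, 12, so the forecast is 12.0. A larger alpha reacts faster to recent values, and alpha = 1 simply repeats the last observation.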

Reinforcement Learning

  • Gymnasium - An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym).
  • PettingZoo - An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities.
  • MAgent2 - An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments.
  • Stable Baselines3 - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
  • Shimmy - An API conversion tool for popular external reinforcement learning environments.
  • EnvPool - C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
  • RLlib - Scalable Reinforcement Learning.
  • Tianshou - An elegant PyTorch deep reinforcement learning library. PyTorch based/compatible
  • Acme - A library of reinforcement learning components and agents.
  • Catalyst-RL - PyTorch framework for RL research. PyTorch based/compatible
  • d3rlpy - An offline deep reinforcement learning library.
  • DI-engine - OpenDILab Decision AI Engine. PyTorch based/compatible
  • TF-Agents - A library for Reinforcement Learning in TensorFlow. TensorFlow
  • TensorForce - A TensorFlow library for applied reinforcement learning. TensorFlow
  • TRFL - TensorFlow Reinforcement Learning. TensorFlow
  • Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
  • keras-rl - Deep Reinforcement Learning for Keras. Keras compatible
  • garage - A toolkit for reproducible reinforcement learning research.
  • Horizon - A platform for Applied Reinforcement Learning.
  • rlpyt - Reinforcement Learning in PyTorch. PyTorch based/compatible
  • cleanrl - High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG).
  • Machin - A reinforcement learning library designed for PyTorch. PyTorch based/compatible
  • SKRL - Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym. PyTorch based/compatible
  • Imitation - Clean PyTorch implementations of imitation and reward learning algorithms. PyTorch based/compatible
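Most of the libraries above interact with environments through the Gymnasium API. In that API, `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. The sketch below mimics only those return signatures with a hypothetical toy environment (`CoinFlipEnv`) and runs one episode with a random agent. It does not import gymnasium and does not implement its `Env` base class, spaces or rendering.

```python
import random


class CoinFlipEnv:
    """Toy environment mimicking Gymnasium-style reset/step return values.

    The agent guesses a hidden coin face (0 or 1); reward is 1.0 for a correct guess.
    The episode is truncated after `max_steps` guesses.
    """

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.rng = random.Random()
        self.steps = 0
        self.coin = 0

    def reset(self, seed=None, options=None):
        if seed is not None:
            self.rng.seed(seed)
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return 0, {}  # (observation, info); the observation carries no information here

    def step(self, action):
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)
        terminated = False  # this toy task has no terminal state
        truncated = self.steps >= self.max_steps  # time limit reached
        return 0, reward, terminated, truncated, {}


def run_random_episode(env, seed=0):
    """Roll out one episode with a uniformly random policy and return the total reward."""
    agent_rng = random.Random(seed)
    obs, info = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        action = agent_rng.randint(0, 1)
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        done = terminated or truncated
    return total
```

Keeping `terminated` (the task ended) separate from `truncated` (a time limit cut the episode short) lets learning algorithms decide whether to bootstrap from the final state.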

Graph Machine Learning

  • pytorch_geometric - Geometric Deep Learning Extension Library for PyTorch. PyTorch based/compatible
  • pytorch_geometric_temporal - Temporal Extension Library for PyTorch Geometric. PyTorch based/compatible
  • PyTorch Geometric Signed Directed - A signed/directed graph neural network extension library for PyTorch Geometric. PyTorch based/compatible
  • dgl - Python package built to ease deep learning on graph, on top of existing DL frameworks. PyTorch based/compatible TensorFlow MXNet based
  • Spektral - Deep learning on graphs. Keras compatible
  • StellarGraph - Machine Learning on Graphs. TensorFlow Keras compatible
  • Graph Nets - Build Graph Nets in Tensorflow. TensorFlow
  • TensorFlow GNN - A library to build Graph Neural Networks on the TensorFlow platform. TensorFlow
  • Auto Graph Learning - An autoML framework & toolkit for machine learning on graphs.
  • PyTorch-BigGraph - Generate embeddings from large-scale graph-structured data. PyTorch based/compatible
  • Karate Club - An unsupervised machine learning library for graph-structured data.
  • Little Ball of Fur - A library for sampling graph structured data.
  • GreatX - A graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG). PyTorch based/compatible
  • Jraph - A Graph Neural Network Library in JAX.

Learning-to-Rank & Recommender Systems

  • LightFM - A Python implementation of LightFM, a hybrid recommendation algorithm.
  • Spotlight - Deep recommender models using PyTorch.
  • Surprise - A Python scikit for building and analyzing recommender systems.
  • RecBole - A unified, comprehensive and efficient recommendation library. PyTorch based/compatible
  • allRank - allRank is a framework for training learning-to-rank neural models based on PyTorch. PyTorch based/compatible
  • TensorFlow Recommenders - A library for building recommender system models using TensorFlow. TensorFlow Keras compatible
  • TensorFlow Ranking - Learning to Rank in TensorFlow. TensorFlow
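Learning-to-rank models are commonly evaluated with NDCG@k. This metric discounts each item's gain by its position and divides by the score of the ideal ordering. The sketch below uses the exponential-gain variant, (2^rel - 1) / log2(p + 1) for 1-based position p. Other definitions use the raw relevance as the gain, so values are not always comparable across libraries. The function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top k items, in ranked order.

    Item at 1-based position p contributes (2**rel - 1) / log2(p + 1).
    """
    return sum((2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so p + 1 = rank + 2
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (descending) ordering; 0.0 if nothing is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfectly sorted list such as `[3, 2, 1, 0]` scores 1.0. Putting the only relevant item second, as in `[0, 3]`, scores 1 / log2(3), roughly 0.63.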

Probabilistic Graphical Models

  • pomegranate - Probabilistic and graphical models for Python. PyTorch based/compatible
  • pgmpy - A python library for working with Probabilistic Graphical Models.
  • pyAgrum - A GRaphical Universal Modeler.

Probabilistic Methods

  • pyro - A flexible, scalable deep probabilistic programming library built on PyTorch. PyTorch based/compatible
  • PyMC - Bayesian Stochastic Modelling in Python.
  • ZhuSuan - Bayesian Deep Learning. TensorFlow
  • GPflow - Gaussian processes in TensorFlow. TensorFlow
  • InferPy - Deep Probabilistic Modelling Made Easy. TensorFlow
  • PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
  • sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API. sklearn
  • skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute. sklearn
  • PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch. PyTorch based/compatible
  • emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
  • hsmmlearn - A library for hidden semi-Markov models with explicit durations.
  • pyhsmm - Bayesian inference in HSMMs and HMMs.
  • GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch. PyTorch based/compatible
  • sklearn-crfsuite - A scikit-learn-inspired API for CRFsuite. sklearn
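Many tools in this section rely on Markov chain Monte Carlo (MCMC). The simplest MCMC algorithm is random-walk Metropolis, sketched below for a 1-D unnormalised log density. It is not the affine-invariant ensemble sampler used by emcee or the No-U-Turn sampler used by PyStan. The function name, the Gaussian proposal and the default step size are assumptions for illustration.

```python
import math
import random


def metropolis_hastings(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalised log density."""
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric proposal, so no Hastings correction
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)  # on rejection the current state is repeated
    return samples
```

For example, targeting a standard normal with `metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000, step=2.4)` yields samples whose mean is close to 0 and whose variance is close to 1. Successive samples are correlated, so the effective sample size is smaller than the number of draws.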

Model Explanation

  • dalex - moDel Agnostic Language for Exploration and explanation. sklearn R inspired/ported lib
  • Shapley - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
  • Alibi - Algorithms for monitoring and explaining machine learning models.
  • anchor - Code for "High-Precision Model-Agnostic Explanations" paper.
  • aequitas - Bias and Fairness Audit Toolkit.
  • Contrastive Explanation - Contrastive Explanation (Foil Trees). sklearn
  • yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection. sklearn
  • scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects. sklearn
  • shap - A unified approach to explain the output of any machine learning model. sklearn
  • ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
  • Lime - Explaining the predictions of any machine learning classifier. sklearn
  • FairML - A Python toolbox for auditing machine learning models for bias. sklearn
  • L2X - Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation.
  • PDPbox - Partial dependence plot toolbox.
  • PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
  • Skater - Python Library for Model Interpretation.
  • model-analysis - Model analysis tools for TensorFlow. TensorFlow
  • themis-ml - A library that implements fairness-aware machine learning algorithms. sklearn
  • treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions. sklearn
  • AI Explainability 360 - Interpretability and explainability of data and machine learning models.
  • Auralisation - Auralisation of learned features in CNN (for audio).
  • CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
  • lucid - A collection of infrastructure and tools for research in neural network interpretability.
  • Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
  • FlashLight - Visualization tool for your neural network.
  • tensorboard-pytorch - Tensorboard for PyTorch (and chainer, mxnet, numpy, ...).
  • mxboard - Logging MXNet data for visualization in TensorBoard. MXNet based
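Several explanation tools here, shap in particular, are built on Shapley values. Each feature's attribution is its marginal contribution to the model output, averaged over every order in which features could be added. The brute-force sketch below shows that definition directly. It assumes that a feature "absent" from a coalition takes a fixed baseline value, which is one common convention. It runs in n! time, so it is only practical for a handful of features. shap uses much faster approximations, and this function is not its API.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all feature orderings.

    Features not yet added take their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]  # add feature i to the coalition
            new = model(current)
            phi[i] += new - prev
            prev = new
    return [p / factorial(n) for p in phi]
```

For a linear model 2*z0 + 3*z1 - z2 explained at x = [1, 1, 1] against a zero baseline, the values are [2.0, 3.0, -1.0]. For a pure interaction z0*z1, the credit is split evenly as [0.5, 0.5]. In both cases the values sum to model(x) - model(baseline).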

Genetic Programming

  • gplearn - Genetic Programming in Python. sklearn
  • DEAP - Distributed Evolutionary Algorithms in Python.
  • karoo_gp - A Genetic Programming platform for Python with GPU support. sklearn
  • monkeys - A strongly-typed genetic programming framework for Python.
  • sklearn-genetic - Genetic feature selection module for scikit-learn. sklearn

Optimization

  • Optuna - A hyperparameter optimization framework.
  • Spearmint - Bayesian optimization.
  • BoTorch - Bayesian optimization in PyTorch. PyTorch based/compatible
  • scikit-opt - Heuristic Algorithms for optimization.
  • sklearn-genetic-opt - Hyperparameters tuning and feature selection using evolutionary algorithms. sklearn
  • SMAC3 - Sequential Model-based Algorithm Configuration.
  • Optunity - A library containing various optimizers for hyperparameter tuning.
  • hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
  • hyperopt-sklearn - Hyper-parameter optimization for sklearn. sklearn
  • sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn. sklearn
  • sigopt_sklearn - SigOpt wrappers for scikit-learn methods. sklearn
  • Bayesian Optimization - A Python implementation of global optimization with Gaussian processes.
  • SafeOpt - Safe Bayesian Optimization.
  • scikit-optimize - Sequential model-based optimization with a scipy.optimize interface.
  • Solid - A comprehensive gradient-free optimization framework written in Python.
  • PySwarms - A research toolkit for particle swarm optimization in Python.
  • Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
  • GPflowOpt - Bayesian Optimization using GPflow. TensorFlow
  • POT - Python Optimal Transport library.
  • Talos - Hyperparameter Optimization for Keras Models.
  • nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).
  • OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
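Gradient-free optimizers such as those in PySwarms or scikit-opt only need to evaluate the objective. The sketch below is a minimal global-best particle swarm optimisation in plain Python. Each particle's velocity is pulled toward its own best position and toward the swarm's best position. The parameter defaults (inertia w = 0.7, c1 = c2 = 1.5) are common textbook choices assumed here, positions are not clamped to the initial bounds, and none of this reflects PySwarms' actual API.

```python
import random


def pso_minimize(f, dim, n_particles=30, n_iters=100, bounds=(-5.0, 5.0),
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation of f; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # pull toward swarm best
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

On the 2-D sphere function `lambda p: sum(v * v for v in p)`, the swarm quickly converges close to the minimum at the origin. The inertia weight controls the balance between exploring new regions and settling on the best points found.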

Feature Engineering

General

  • Featuretools - Automated feature engineering.
  • Feature Engine - Feature engineering package with sklearn-like functionality. sklearn
  • OpenFE - Automated feature generation with expert-level performance.
  • skl-groups - A scikit-learn addon to operate on set/"group"-based features. sklearn
  • Feature Forge - A set of tools for creating and testing machine learning features. sklearn
  • few - A feature engineering wrapper for sklearn. sklearn
  • scikit-mdr - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction. sklearn
  • tsfresh - Automatic extraction of relevant features from time series. sklearn
  • dirty_cat - Machine learning on dirty tabular data (especially: string-based variables for classification and regression). sklearn
  • NitroFE - Moving window features. sklearn
  • sk-transformer - A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps. pandas compatible

Feature Selection

  • scikit-feature - Feature selection repository in Python.
  • boruta_py - Implementations of the Boruta all-relevant feature selection method. sklearn
  • BoostARoota - A fast xgboost feature selection algorithm. sklearn
  • scikit-rebate - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. sklearn
  • zoofs - A feature selection library based on evolutionary algorithms.

Visualization

General Purposes

  • Matplotlib - Plotting with Python.
  • seaborn - Statistical data visualization using matplotlib.
  • prettyplotlib - Painlessly create beautiful matplotlib plots.
  • python-ternary - Ternary plotting library for Python with matplotlib.
  • missingno - Missing data visualization module for Python.
  • chartify - Python library that makes it easy for data scientists to create charts.
  • physt - Improved histograms.

Interactive plots

  • animatplot - A python package for animating plots built on matplotlib.
  • plotly - A Python library that makes interactive and publication-quality graphs.
  • Bokeh - Interactive Web Plotting for Python.
  • Altair - Declarative statistical visualization library for Python. Can perform many data transformations directly in the chart code.
  • bqplot - Plotting library for IPython/Jupyter notebooks.
  • pyecharts - A Python interactive visualization library based on ECharts, a charting and visualization library. pyecharts echarts

Map

  • folium - Makes it easy to visualize data on an interactive OpenStreetMap map.
  • geemap - Python package for interactive mapping with Google Earth Engine (GEE).

Automatic Plotting

  • HoloViews - Stop plotting your data - annotate your data and let it visualize itself.
  • AutoViz - Visualize data automatically with 1 line of code (ideal for machine learning).
  • SweetViz - Visualize and compare datasets, target values and associations, with one line of code.

NLP

  • pyLDAvis - Visualize topic models interactively.

Deployment

  • fastapi - Modern, fast (high-performance) web framework for building APIs with Python.
  • streamlit - Makes it easy to deploy machine learning models as web apps.
  • streamsync - No-code in the front, Python in the back. An open-source framework for creating data apps.
  • gradio - Create UIs for your machine learning model in Python in 3 minutes.
  • Vizro - A toolkit for creating modular data visualization applications.
  • datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
  • binder - Enables sharing and executing Jupyter notebooks.

Statistics

  • pandas_summary - Extension to the pandas DataFrame describe function. pandas compatible
  • Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects. pandas compatible
  • statsmodels - Statistical modeling and econometrics in Python.
  • stockstats - Supplies a StockDataFrame wrapper around pandas.DataFrame with inline stock statistics/indicators support.
  • weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
  • scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
  • Alphalens - Performance analysis of predictive (alpha) stock factors.

Data Manipulation

Data Frames

  • pandas - Powerful Python data analysis toolkit.
  • polars - A fast multi-threaded, hybrid-out-of-core DataFrame library.
  • Arctic - High-performance datastore for time series and tick data.
  • datatable - Data.table for Python. R inspired/ported lib
  • cuDF - GPU DataFrame Library. pandas compatible GPU accelerated
  • blaze - NumPy and pandas interface to Big Data. pandas compatible
  • pandasql - Allows you to query pandas DataFrames using SQL syntax. pandas compatible
  • pandas-gbq - pandas Google Big Query. pandas compatible
  • xpandas - Universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute.
  • pysparkling - A pure Python implementation of Apache Spark's RDD and DStream interfaces. Apache Spark based
  • modin - Speed up your pandas workflows by changing a single line of code. pandas compatible
  • swifter - A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner.
  • pandas-log - A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues.
  • vaex - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.
  • xarray - Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines.

Pipelines

  • pdpipe - Easy pipelines for pandas DataFrames.
  • SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.
  • pandas-ply - Functional data manipulation for pandas. pandas compatible
  • Dplython - Dplyr for Python. R inspired/ported lib
  • sklearn-pandas - pandas integration with sklearn. sklearn pandas compatible
  • Dataset - Helps you conveniently work with random or sequential batches of your data and define data processing.
  • pyjanitor - Clean APIs for data cleaning. pandas compatible
  • meza - A Python toolkit for processing tabular data.
  • Prodmodel - Build system for data science pipelines.
  • dopanda - Hints and tips for using pandas in an analysis environment. pandas compatible
  • Hamilton - A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions.
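Pipe-style libraries such as SSPipe build on Python's reflected-operator protocol. When the left operand does not handle `|`, Python calls the right operand's `__ror__`. The sketch below, with a hypothetical `Pipe` class, shows only that mechanism. Types that define their own `|` behaviour (NumPy arrays and pandas objects, for example) may intercept the operator, so this sketch is only reliable for plain Python values and is not SSPipe's implementation.

```python
class Pipe:
    """Wrap a function so that `value | Pipe(func, *args)` calls func(value, *args)."""

    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __ror__(self, value):
        # Invoked for `value | self` when value's own __or__ is missing or returns NotImplemented.
        return self.func(value, *self.args, **self.kwargs)
```

For example, `[3, 1, 2] | Pipe(sorted) | Pipe(lambda xs: xs[0])` evaluates left to right and returns 1. `3.14159 | Pipe(round, 2)` passes the extra argument through and returns 3.14.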

Data-centric AI

  • cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
  • snorkel - A system for quickly generating training data with weak supervision.
  • dataprep - Collect, clean, and visualize your data in Python with a few lines of code.

Synthetic Data

  • ydata-synthetic - A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models. pandas compatible

Distributed Computing

  • Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. TensorFlow
  • PySpark - Exposes the Spark programming model to Python. Apache Spark based
  • Veles - Distributed machine learning platform.
  • Jubatus - Framework and Library for Distributed Online Machine Learning.
  • DMTK - Microsoft Distributed Machine Learning Toolkit.
  • PaddlePaddle - PArallel Distributed Deep LEarning.
  • dask-ml - Distributed and parallel machine learning. sklearn
  • Distributed - Distributed computation in Python.

Experimentation

  • mlflow - Open source platform for the machine learning lifecycle.
  • Neptune - A lightweight ML experiment tracking, results visualization, and management tool.
  • dvc - Data Version Control | Git for Data & Models | ML Experiments Management.
  • envd - 🏕️ machine learning development environment for data science and AI/ML engineering teams.
  • Sacred - A tool to help you configure, organize, log, and reproduce experiments.
  • Ax - Adaptive Experimentation Platform. sklearn

Data Validation

  • great_expectations - Always know what to expect from your data.
  • pandera - A lightweight, flexible, and expressive statistical data testing library.
  • deepchecks - Validation & testing of ML models and data during model development, deployment, and production. sklearn
  • evidently - Evaluate and monitor ML models from validation to production.
  • TensorFlow Data Validation - Library for exploring and validating machine learning data.

Evaluation

  • recmetrics - Library of useful metrics and plots for evaluating recommender systems.
  • Metrics - Machine learning evaluation metrics.
  • sklearn-evaluation - Model evaluation made easy: plots, tables, and markdown reports. sklearn
  • AI Fairness 360 - Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models.

Computations

  • numpy - The fundamental package needed for scientific computing with Python.
  • Dask - Parallel computing with task scheduling. pandas compatible
  • bottleneck - Fast NumPy array functions written in C.
  • CuPy - NumPy-like API accelerated with CUDA.
  • scikit-tensor - Python library for multilinear algebra and tensor factorizations.
  • numdifftools - Solve automatic numerical differentiation problems in one or more variables.
  • quaternion - Add built-in support for quaternions to numpy.
  • adaptive - Tools for adaptive and parallel sampling of mathematical functions.
  • NumExpr - A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results.

Web Scraping

  • BeautifulSoup - The easiest library for beginners to scrape static websites.
  • Scrapy - Fast and extensible scraping library. Lets you write rules and create customized scrapers without touching the core.
  • Selenium - Use the Selenium Python API to access all the functionality of Selenium WebDriver in an intuitive way, like a real user.
  • Pattern - High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also includes NLP, machine learning algorithms, and visualization.
  • twitterscraper - Efficient library to scrape Twitter.
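
For static pages, scraping mostly means parsing HTML and extracting elements. The sketch below uses the standard library's `html.parser` instead of BeautifulSoup so it runs without third-party packages. The `LinkExtractor` class and the sample page are made up for illustration, and fetching the page over the network is left out.

```python
# Extract hyperlinks from a static HTML string using only the standard library.
# BeautifulSoup offers a friendlier API for the same task.
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


page = """
<html><body>
  <a href="https://numpy.org">NumPy</a>
  <p>No link here</p>
  <a href="/docs/intro.html">Intro</a>
  <a>Anchor without href</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['https://numpy.org', '/docs/intro.html']
```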

Spatial Analysis

  • GeoPandas - Python tools for geographic data. pandas compatible
  • PySal - Python Spatial Analysis Library.

Quantum Computing

  • qiskit - Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.
  • cirq - A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
  • PennyLane - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.
  • QML - A Python Toolkit for Quantum Machine Learning.
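
The circuit model these frameworks work with can be illustrated by a toy state-vector simulation in plain Python that prepares a Bell state: a Hadamard gate on qubit 0 followed by a CNOT from qubit 0 to qubit 1. The helper names and the convention that qubit 0 is the least significant bit of the basis-state index are assumptions of this sketch. They are not the API or conventions of qiskit, cirq, or PennyLane.

```python
# Toy two-qubit state-vector simulator preparing the Bell state (|00> + |11>) / sqrt(2).
# Index convention (assumed): bit q of the basis-state index is the value of qubit q.
import math


def apply_hadamard(state: list[complex], qubit: int) -> list[complex]:
    """Apply a Hadamard gate to `qubit`, pairing amplitudes that differ only in that bit."""
    new = list(state)
    mask = 1 << qubit
    inv_sqrt2 = 1 / math.sqrt(2)
    for i in range(len(state)):
        if (i & mask) == 0:
            j = i | mask
            a, b = state[i], state[j]
            new[i] = (a + b) * inv_sqrt2
            new[j] = (a - b) * inv_sqrt2
    return new


def apply_cnot(state: list[complex], control: int, target: int) -> list[complex]:
    """Flip `target` in every basis state where `control` is 1 (swap paired amplitudes)."""
    new = list(state)
    c_mask, t_mask = 1 << control, 1 << target
    for i in range(len(state)):
        if (i & c_mask) and not (i & t_mask):
            j = i | t_mask
            new[i], new[j] = state[j], state[i]
    return new


state = [1 + 0j, 0j, 0j, 0j]        # start in |00>
state = apply_hadamard(state, 0)    # (|00> + |01>) / sqrt(2) in index order
state = apply_cnot(state, 0, 1)     # (|00> + |11>) / sqrt(2)

probabilities = [abs(amplitude) ** 2 for amplitude in state]
print([round(p, 6) for p in probabilities])  # [0.5, 0.0, 0.0, 0.5]
```

A full state vector doubles in size with every added qubit, which is why real frameworks target actual hardware or use more specialized simulators for larger circuits.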

Conversion

  • sklearn-porter - Transpile trained scikit-learn estimators to C, Java, JavaScript, and others.
  • ONNX - Open Neural Network Exchange.
  • MMdnn - A set of tools to help users inter-operate among different deep learning frameworks.
  • treelite - Universal model exchange and serialization format for decision tree forests.
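
The idea behind transpiling a trained tree model can be sketched by turning a decision tree into nested `if`/`else` statements in C. The nested-dict tree format and the `tree_to_c` and `export_c` helpers below are hypothetical. Trained scikit-learn estimators store their trees differently, and tools such as sklearn-porter and treelite handle far more than this sketch does.

```python
# Generate C source from a toy decision tree.
# Hypothetical tree format: internal nodes are dicts with "feature", "threshold",
# "left", and "right"; leaves are plain numbers (the predicted value).


def tree_to_c(node, indent: int = 1) -> str:
    """Recursively emit C statements for a node of the toy tree."""
    pad = "    " * indent
    if not isinstance(node, dict):
        return f"{pad}return {node};\n"
    return (
        f"{pad}if (x[{node['feature']}] <= {node['threshold']}) {{\n"
        + tree_to_c(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_c(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def export_c(tree, name: str = "predict") -> str:
    """Wrap the generated statements in a C function taking a feature array."""
    return f"double {name}(const double *x) {{\n" + tree_to_c(tree) + "}\n"


tree = {
    "feature": 0, "threshold": 2.5,
    "left": 0.0,
    "right": {"feature": 1, "threshold": 1.0, "left": 1.0, "right": 2.0},
}
print(export_c(tree))
```

The generated function has no dependencies at all, which is the main appeal of transpiling: the model can be compiled into environments where Python is unavailable.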

Contributing

Contributions are welcome! 😎
Read the contribution guideline.

License

This work is licensed under the Creative Commons Attribution 4.0 International License - CC BY 4.0
