
razisaji25 / python-data-engineering-resources


This project forked from vajol/python-data-engineering-resources


A handpicked collection of resources for Python developers in data engineering, machine learning, and AI. Inside, you'll discover a neatly arranged selection of frameworks, libraries, and tools crucial for machine learning, ETL, ORM, data/schema validation, database migration, and more, all focused on Python.


Python Data Engineering Resources

Welcome to my repository of Data Engineering with Python resources!

Throughout my time as a Data Engineer, I've gathered many bookmarks and resources that have really helped me learn and do my job. I organized these bookmarks and put them in this repository so they can help others too, whether you're new to Data Engineering with Python or looking to learn more. I hope you find these resources as helpful as they were for me.

Repository Description

This repository is a handpicked collection of resources for Python developers in data engineering, machine learning, and AI. Inside, you'll discover a neatly arranged selection of frameworks, libraries, and tools crucial for machine learning, ETL, ORM, data/schema validation, database migration, and more, all focused on Python.

Each section includes:

  • A concise description of the tools within that category.
  • A list of the most relevant tools found in that category.
  • A guide on selecting the appropriate tool from each category.

Resources Included:

  1. ORMs for Python: Including popular ORMs like SQLAlchemy, Django ORM, Peewee, etc.

  2. Data/Schema Validation: Including libraries like Pydantic, Marshmallow, Cerberus, etc.

  3. Database Migration Tools: Tools like Alembic, Flyway, or Django's own migration system.

  4. Data Wrangling Tools: Libraries that help in cleaning, transforming, and preparing data, such as Pandas, Dask, etc.

  5. ETL (Extract, Transform, Load) Frameworks: Tools that help in the process of extracting data from various sources, transforming it, and loading it into a data store.

  6. Orchestration Tools: Tools such as Apache NiFi, Luigi, Airflow, and Prefect are designed to automate and orchestrate ETL workflows, managing job scheduling and execution. The ETL tasks themselves are typically defined with other dedicated libraries or frameworks.

  7. Data Visualization Libraries: Libraries that can help in visualizing data, such as Matplotlib, Seaborn, Plotly, Bokeh, etc.

  8. Machine Learning Libraries: While not exclusively for data engineering, having resources related to machine learning is useful. This includes libraries like scikit-learn, TensorFlow, and PyTorch.

  9. Big Data Processing Tools: Includes links to resources for tools like Apache Spark, Apache Hadoop, etc.

  10. Streaming Data Processing: Tools and frameworks for processing streaming data, such as Apache Kafka, Apache Flink, and Apache Storm.

  11. Data Modeling Tools: Resources for data modeling tools that can help in designing database schemas, such as dbdiagram.io, ER/Studio, or MySQL Workbench.

  12. API Development Frameworks: Since data engineering often involves API development for data access, this section includes resources for frameworks like Flask, FastAPI, or Django REST Framework.

  13. Data Governance and Metadata Management: Tools and frameworks that help in managing data access, security, and compliance, such as Apache Atlas, Collibra, or Amundsen.

  14. Cloud SDKs for Python: These SDKs, like boto3 for AWS, provide Python developers with the tools necessary to interact with cloud services efficiently, allowing for the automation of resource management and the utilization of cloud services within Python applications.

  15. Cloud Services and Tools: Resources related to cloud services that are widely used in data engineering, like AWS, Azure, and GCP, focusing on their data storage, processing, and analytics services.

  16. Data Storage Solutions: Resources on various data storage solutions like relational databases, NoSQL databases, data lakes, and data warehouses.

  17. Data Quality Tools: Tools that help in ensuring data quality, such as Great Expectations, Deequ, or Pandas Profiling (since renamed ydata-profiling).

  18. Learning Resources: Links to courses, tutorials, blogs, and books that offer in-depth knowledge about data engineering in Python.

  19. Community and Forums: Links to relevant forums and communities where developers can ask questions, share knowledge, and stay updated with the latest trends in data engineering.

  20. Free datasets and APIs: A list of free datasets and APIs for people learning data engineering. These resources are great for getting hands-on experience.
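
To make the extract, transform, load flow from item 5 concrete, here is a minimal sketch that uses only the Python standard library. It is not taken from any of the listed frameworks; the sample data, the `payments` table, and the function names are all hypothetical and chosen purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real source file or API response.
RAW_CSV = """name,amount
alice,10.5
bob,not_a_number
carol,7
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: title-case names and drop rows whose amount is not numeric."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that fail this basic validation
        cleaned.append((row["name"].strip().title(), amount))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run the three stages in order and return the rows that were loaded."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT name, amount FROM payments ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection))
```

In practice, a dedicated ETL framework would replace these hand-written stages, and an orchestration tool (item 6) would handle scheduling and retries for `run_pipeline`.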

Contributing

We welcome contributions to this repository! If you'd like to add a resource, please submit a pull request or open an issue to suggest changes. Please ensure your suggestions align with the contribution guidelines outlined in the CONTRIBUTING file.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Authors

  • Vajo Lukic - Initial work - vajol

Acknowledgments

Giving Back to the Community

During my career as a Data Engineer, I've used many free resources that have really helped me grow. I created this repository to give something back to the community that has helped me so much. I hope these resources will help others just like they helped me. Let's help each other and learn together as we move forward on our learning path!

Contact

For any inquiries or comments about this repository, feel free to connect with me on LinkedIn, follow and reach out on Twitter, or subscribe and send your thoughts via my Substack newsletter.

Your feedback and questions are always welcome!

