kedro-org / kedro-starters

Templates for your Kedro projects.

License: Apache License 2.0

Python 95.95% Gherkin 3.70% Dockerfile 0.10% Makefile 0.25%
kedro project-template


kedro-starters's Issues

Circular dependencies for pyspark starter

Hello, every kedro command returns the following error when run inside a project created from kedro new --starter=pyspark-iris (I haven't checked pyspark, but I assume it has the same issue), in a fresh virtualenv that has only kedro installed (besides pip):

(mm-venv) bash-4.2$ kedro install
Traceback (most recent call last):
File "/you/shall/not/path/ds-workspace/mm-venv/bin/kedro", line 8, in <module>
sys.exit(main())
File "/you/shall/not/path/ds-workspace/mm-venv/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 268, in main
cli_collection = KedroCLI(project_path=Path.cwd())
File "/you/shall/not/path/ds-workspace/mm-venv/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 181, in __init__
self._metadata = bootstrap_project(project_path)
File "/you/shall/not/path/ds-workspace/mm-venv/lib/python3.7/site-packages/kedro/framework/startup.py", line 181, in bootstrap_project
configure_project(metadata.package_name)
File "/you/shall/not/path/ds-workspace/mm-venv/lib/python3.7/site-packages/kedro/framework/project/__init__.py", line 218, in configure_project
_validate_module(settings_module)
File "/you/shall/not/path/ds-workspace/mm-venv/lib/python3.7/site-packages/kedro/framework/project/__init__.py", line 210, in _validate_module
importlib.import_module(settings_module)
File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/private/tmp/new-kedro-project/src/new_kedro_project/settings.py", line 30, in <module>
from new_kedro_project.context import ProjectContext
File "/private/tmp/new-kedro-project/src/new_kedro_project/context.py", line 34, in <module>
from pyspark import SparkConf

As the stack trace shows, the reason is that Kedro tries to validate the project before kedro install has had a chance to run, so every kedro command fails because the project context imports packages that are not yet installed.

There are some inelegant workarounds, but I wonder: is this indeed a bug, or have I misunderstood something?
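One shape such a workaround could take (a sketch, not a fix endorsed in this thread) is deferring the heavy import so that merely importing settings.py does not require pyspark to be installed yet:

```python
import importlib


def lazy_attr(module_name: str, attr: str):
    """Return a zero-argument callable that imports `module_name` and
    resolves `attr` only when called, not when this file is imported."""
    def _resolve():
        return getattr(importlib.import_module(module_name), attr)
    return _resolve


# Instead of `from pyspark import SparkConf` at module level in context.py,
# resolve it lazily the first time a Spark session is actually built:
get_spark_conf = lazy_attr("pyspark", "SparkConf")
```

Because the import only happens inside the returned callable, `kedro` can import the settings module during validation even before `kedro install` has run.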

Spaceflights starter needs updating to match tutorial

Description

An issue in the framework repository tracks updates to the tutorial docs for spaceflights. We should update the starter to match the final output of the tutorial.

Other changes/improvements/updates to the tutorial that affect code should also always be pushed through to the starter so they stay in sync.

There's separate discussion of creating additional tutorials (based on Spaceflights) to cover optional/advanced aspects such as reporting with Plotly, experiment tracking and modular pipelines/namespacing.

Very Minor error on typing for `pyspark-iris`

pyspark-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/data_engineering/nodes.py
Line 72 should be List[DataFrame] (it seems to me, unless I'm misreading the return type of randomSplit).

[screenshot of the annotation on line 72]

I'd raise this small PR myself, but I only have read access.
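For context, randomSplit returns one DataFrame per weight, i.e. a list, which is why the annotation should be List[DataFrame]. The stub below stands in for pyspark.sql.DataFrame purely to illustrate the shape (it is not the starter's code):

```python
from typing import List


class DataFrame:
    """Stub standing in for pyspark.sql.DataFrame."""
    def __init__(self, rows):
        self.rows = rows

    def randomSplit(self, weights: List[float]) -> "List[DataFrame]":
        # pyspark returns one DataFrame per weight; this stub cuts the
        # rows deterministically to keep the example self-contained
        cut = int(len(self.rows) * weights[0])
        return [DataFrame(self.rows[:cut]), DataFrame(self.rows[cut:])]


def split_data(data: DataFrame, ratio: float) -> List[DataFrame]:
    # the corrected annotation: a list of DataFrames, not a single one
    return data.randomSplit([ratio, 1 - ratio])
```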

kedro ipython immediately fails for starter=spaceflights-pyspark-viz

Description

I installed requirements.txt, and immediately after running kedro ipython I get:
kedro.io.core.DatasetError: Class 'spark.SparkDataset' not found, is this a typo?

Is there any additional configuration needed for --starter=spaceflights-pyspark-viz?

I tried this on Windows with Kedro 0.19.1 and 0.19.2; both yield the same issue. kedro-datasets is 1.5.1.

Missing .gitignore when using starters

Description

The .gitignore file is missing from projects created with these starters.

Steps to Reproduce

  • kedro new --starter=pyspark
  • kedro new --starter=pyspark-iris

Environment

Kedro version used (pip show kedro or kedro -V): 0.17.5
Python version used (python -V): Python 3.8.12

Make starters use `OmegaConfigLoader`

Description

Right now, the starters emit a FutureWarning because they use ConfigLoader, which is deprecated:

[08/17/23 11:53:39] INFO     Kedro project test-graph                                        session.py:364
                    WARNING  /Users/juan_cano/.micromamba/envs/kedro310/lib/python3.10/site-packages/kedro/framework/session/session.py:266: warnings.py:109
                             FutureWarning: ConfigLoader will be deprecated in Kedro 0.19.
                             Please use the OmegaConfigLoader instead. To consult the
                             documentation for OmegaConfigLoader, see here:
                             https://docs.kedro.org/en/stable/configuration/advanced_configuration.html#omegaconfigloader
                               warnings.warn(
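Per the Kedro 0.18 configuration docs, switching a starter's settings.py template over is a one-line change (shown here in isolation; each starter's other settings stay as they are):

```python
# settings.py: opt in to OmegaConfigLoader before it becomes the default in 0.19
from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader
```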

Revert changes to `test_requirements.txt` and `requirements.txt` once Kedro 0.19 releases

Description

Following the starters work in #159, we switched to using the develop branch in test_requirements.txt so the starters could use the new add-ons flow from kedro and unblock development. This also meant we had to update kedro-viz to work with develop, so we had to point every requirements.txt in the starters to the kedro-viz main repository instead of PyPI.

Once Kedro 0.19.0 and the next release of Kedro-Viz are out, we can revert these to the default settings.

  • Revert kedro to using the main branch in test_requirements.txt in the framework.
  • Revert the kedro-viz pin in requirements.txt to point to the release version (PyPI).

`spaceflights-pandas` tests do not run successfully

Description

Upon running kedro new --starter=spaceflights-pandas, running python -m pytest will produce two layers of errors, across several versions of Python.


Steps to Reproduce

(This occurs across Python 3.10, 3.11, and 3.12)

  1. Run kedro new --starter=spaceflights-pandas
  2. Run cd spaceflights-pandas
  3. Run pip install -r requirements.txt
  4. Run python -m pytest

Expected Result

All tests that come with the starter should pass without error.

Actual Result

There are two levels of errors:

Level 1: An error occurs, resulting from the tests directory being at project root

The Kedro documentation's Automated Testing page instructs users to run pip install -e .; however, the starter's README makes no mention of this. Thus, upon seeing the tests directory and running python -m pytest, users see this error message:

$ python -m pytest
==================================================================================== test session starts ====================================================================================
platform darwin -- Python 3.11.5, pytest-7.4.4, pluggy-1.3.0
rootdir: /Users/MyUserName/Downloads/spaceflights-pandas
configfile: pyproject.toml
plugins: mock-1.13.0, anyio-3.7.1, cov-3.0.0
collected 1 item / 1 error
/Users/MyUserName/Downloads/spaceflights-pandas/venv/lib/python3.11/site-packages/coverage/control.py:888: CoverageWarning: No data was collected. (no-data-collected)
  self._warn("No data was collected.", slug="no-data-collected")

========================================================================================== ERRORS ===========================================================================================
______________________________________________________________ ERROR collecting tests/pipelines/data_science/test_pipeline.py _______________________________________________________________
ImportError while importing test module '/Users/MyUserName/Downloads/spaceflights-pandas/tests/pipelines/data_science/test_pipeline.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/[email protected]/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/pipelines/data_science/test_pipeline.py:6: in <module>
    from spaceflights_pandas.pipelines.data_science import create_pipeline as create_ds_pipeline
E   ModuleNotFoundError: No module named 'spaceflights_pandas'
===================================================================================== warnings summary ======================================================================================
venv/lib/python3.11/site-packages/pytest_cov/plugin.py:256
  /Users/MyUserName/Downloads/spaceflights-pandas/venv/lib/python3.11/site-packages/pytest_cov/plugin.py:256: PytestDeprecationWarning: The hookimpl CovPlugin.pytest_configure_node uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(optionalhook=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    def pytest_configure_node(self, node):

venv/lib/python3.11/site-packages/pytest_cov/plugin.py:265
  /Users/MyUserName/Downloads/spaceflights-pandas/venv/lib/python3.11/site-packages/pytest_cov/plugin.py:265: PytestDeprecationWarning: The hookimpl CovPlugin.pytest_testnodedown uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(optionalhook=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    def pytest_testnodedown(self, node, error):

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform darwin, python 3.11.5-final-0 ----------
Name                                                            Stmts   Miss  Cover   Missing
---------------------------------------------------------------------------------------------
src/spaceflights_pandas/__init__.py                                 1      1     0%   4
src/spaceflights_pandas/__main__.py                                30     30     0%   4-47
src/spaceflights_pandas/pipeline_registry.py                        7      7     0%   2-16
src/spaceflights_pandas/pipelines/__init__.py                       0      0   100%
src/spaceflights_pandas/pipelines/data_processing/__init__.py       1      1     0%   3
src/spaceflights_pandas/pipelines/data_processing/nodes.py         26     26     0%   1-68
src/spaceflights_pandas/pipelines/data_processing/pipeline.py       4      4     0%   1-7
src/spaceflights_pandas/pipelines/data_science/__init__.py          1      1     0%   3
src/spaceflights_pandas/pipelines/data_science/nodes.py            20     20     0%   1-55
src/spaceflights_pandas/pipelines/data_science/pipeline.py          4      4     0%   1-7
src/spaceflights_pandas/settings.py                                 3      3     0%   27-31
---------------------------------------------------------------------------------------------
TOTAL                                                              97     97     0%

================================================================================== short test summary info ==================================================================================
ERROR tests/pipelines/data_science/test_pipeline.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================================================== 2 warnings, 1 error in 11.80s ===============================================================================

Level 2: KedroContext throws an error

Upon either running pip install -e . or moving the tests directory within src, and then running python -m pytest again, users see a second error:

$ python -m pytest
==================================================================================== test session starts ====================================================================================
platform darwin -- Python 3.11.5, pytest-7.4.4, pluggy-1.3.0
rootdir: /Users/MyUserName/Downloads/spaceflights-pandas
configfile: pyproject.toml
plugins: mock-1.13.0, anyio-3.7.1, cov-3.0.0
collected 4 items

tests/test_run.py E                                                                                                                                                                   [ 25%]
tests/pipelines/data_science/test_pipeline.py ...                                                                                                                                     [100%]

========================================================================================== ERRORS ===========================================================================================
__________________________________________________________________ ERROR at setup of TestProjectContext.test_project_path ___________________________________________________________________

config_loader = OmegaConfigLoader(conf_source=/Users/MyUserName/Downloads/spaceflights-pandas, env=None, config_patterns={'catalog': ['ca... '**/parameters*'], 'credentials': ['credentials*',
'credentials*/**', '**/credentials*'], 'globals': ['globals.yml']})

    @pytest.fixture
    def project_context(config_loader):
>       return KedroContext(
            package_name="spaceflights_pandas",
            project_path=Path.cwd(),
            config_loader=config_loader,
            hook_manager=_create_hook_manager(),
        )
E       TypeError: KedroContext.__init__() missing 1 required positional argument: 'env'

tests/test_run.py:23: TypeError
===================================================================================== warnings summary ======================================================================================
venv/lib/python3.11/site-packages/pytest_cov/plugin.py:256
  /Users/MyUserName/Downloads/spaceflights-pandas/venv/lib/python3.11/site-packages/pytest_cov/plugin.py:256: PytestDeprecationWarning: The hookimpl CovPlugin.pytest_configure_node uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(optionalhook=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    def pytest_configure_node(self, node):

venv/lib/python3.11/site-packages/pytest_cov/plugin.py:265
  /Users/MyUserName/Downloads/spaceflights-pandas/venv/lib/python3.11/site-packages/pytest_cov/plugin.py:265: PytestDeprecationWarning: The hookimpl CovPlugin.pytest_testnodedown uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(optionalhook=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    def pytest_testnodedown(self, node, error):

tests/pipelines/data_science/test_pipeline.py::test_data_science_pipeline
  /Users/MyUserName/Downloads/spaceflights-pandas/venv/lib/python3.11/site-packages/sklearn/metrics/_regression.py:1187: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
    warnings.warn(msg, UndefinedMetricWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform darwin, python 3.11.5-final-0 ----------
Name                                                            Stmts   Miss  Cover   Missing
---------------------------------------------------------------------------------------------
src/spaceflights_pandas/__init__.py                                 1      0   100%
src/spaceflights_pandas/__main__.py                                30     30     0%   4-47
src/spaceflights_pandas/pipeline_registry.py                        7      7     0%   2-16
src/spaceflights_pandas/pipelines/__init__.py                       0      0   100%
src/spaceflights_pandas/pipelines/data_processing/__init__.py       1      1     0%   3
src/spaceflights_pandas/pipelines/data_processing/nodes.py         26     26     0%   1-68
src/spaceflights_pandas/pipelines/data_processing/pipeline.py       4      4     0%   1-7
src/spaceflights_pandas/pipelines/data_science/__init__.py          1      0   100%
src/spaceflights_pandas/pipelines/data_science/nodes.py            20      0   100%
src/spaceflights_pandas/pipelines/data_science/pipeline.py          4      0   100%
src/spaceflights_pandas/settings.py                                 3      3     0%   27-31
---------------------------------------------------------------------------------------------
TOTAL                                                              97     71    27%

================================================================================== short test summary info ==================================================================================
ERROR tests/test_run.py::TestProjectContext::test_project_path - TypeError: KedroContext.__init__() missing 1 required positional argument: 'env'
========================================================================== 3 passed, 3 warnings, 1 error in 14.05s ==========================================================================

The KedroContext documentation states that env should default to "local", but that does not seem to be picked up here. Manually adding env="local" resolves the error:

$ python -m pytest
==================================================================================== test session starts ====================================================================================
platform darwin -- Python 3.11.5, pytest-7.4.4, pluggy-1.3.0
rootdir: /Users/MyUserName/Downloads/spaceflights-pandas/spaceflights-pandas
configfile: pyproject.toml
plugins: mock-1.13.0, anyio-3.7.1, cov-3.0.0
collected 4 items

tests/test_run.py .                                                                                                                                                                   [ 25%]
tests/pipelines/data_science/test_pipeline.py ...                                                                                                                                     [100%]

===================================================================================== warnings summary ======================================================================================
../venv/lib/python3.11/site-packages/pytest_cov/plugin.py:256
  /Users/MyUserName/Downloads/spaceflights-pandas/venv/lib/python3.11/site-packages/pytest_cov/plugin.py:256: PytestDeprecationWarning: The hookimpl CovPlugin.pytest_configure_node uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(optionalhook=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    def pytest_configure_node(self, node):

../venv/lib/python3.11/site-packages/pytest_cov/plugin.py:265
  /Users/MyUserName/Downloads/spaceflights-pandas/venv/lib/python3.11/site-packages/pytest_cov/plugin.py:265: PytestDeprecationWarning: The hookimpl CovPlugin.pytest_testnodedown uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(optionalhook=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    def pytest_testnodedown(self, node, error):

tests/pipelines/data_science/test_pipeline.py::test_data_science_pipeline
  /Users/MyUserName/Downloads/spaceflights-pandas/venv/lib/python3.11/site-packages/sklearn/metrics/_regression.py:1187: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
    warnings.warn(msg, UndefinedMetricWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform darwin, python 3.11.5-final-0 ----------
Name                                                            Stmts   Miss  Cover   Missing
---------------------------------------------------------------------------------------------
src/spaceflights_pandas/__init__.py                                 1      0   100%
src/spaceflights_pandas/__main__.py                                30     30     0%   4-47
src/spaceflights_pandas/pipeline_registry.py                        7      7     0%   2-16
src/spaceflights_pandas/pipelines/__init__.py                       0      0   100%
src/spaceflights_pandas/pipelines/data_processing/__init__.py       1      1     0%   3
src/spaceflights_pandas/pipelines/data_processing/nodes.py         26     26     0%   1-68
src/spaceflights_pandas/pipelines/data_processing/pipeline.py       4      4     0%   1-7
src/spaceflights_pandas/pipelines/data_science/__init__.py          1      0   100%
src/spaceflights_pandas/pipelines/data_science/nodes.py            20      0   100%
src/spaceflights_pandas/pipelines/data_science/pipeline.py          4      0   100%
src/spaceflights_pandas/settings.py                                 3      3     0%   27-31
---------------------------------------------------------------------------------------------
TOTAL                                                              97     71    27%

=============================================================================== 4 passed, 3 warnings in 2.13s ===============================================================================

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.19.5 (though I believe that this happens with older versions, as well, ever since the tests directory was moved to the project root)
  • Python version used (python -V): 3.10, 3.11, 3.12
  • Operating system and version: MacOS Sonoma 14.4.1

Recommendations

  1. Add a README note to run pip install -e ., or else move tests to within src.
  2. Add env="local" to KedroContext in the example test file above.
  3. Optionally, under pyproject.toml's [tool.pytest.ini_options] section, add filterwarnings = ["ignore::DeprecationWarning:.*pytest_cov*"] to suppress the pytest-cov warnings above.

I would be happy to contribute a PR implementing the above, but thought to ask first: would those changes be welcome? Or, specifically regarding the KedroContext error above, is it possible that part of this points to something that needs to be updated in kedro itself?
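Concretely, recommendation 2 would make the fixture look like the sketch below (imports reconstructed as an assumption from the starter's tests/test_run.py; not verified against every Kedro version):

```python
import pytest
from pathlib import Path

from kedro.framework.context import KedroContext
from kedro.framework.hooks import _create_hook_manager


@pytest.fixture
def project_context(config_loader):
    # Pass env explicitly so KedroContext's required positional
    # argument is satisfied on Kedro 0.19.x
    return KedroContext(
        package_name="spaceflights_pandas",
        project_path=Path.cwd(),
        config_loader=config_loader,
        hook_manager=_create_hook_manager(),
        env="local",
    )
```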

`kedro pyspark` starter version `0.18.3` conflicts with `logging.yml`

Description

  • We use the kedro pyspark starter with kedro version 0.18.3. When trying to execute kedro run, it produces the following error:
ConstructorError: while constructing a mapping
  in ".../conf/base/logging.yml", line 43, column 3
found unacceptable key (unhashable type: 'dict')
  • After deleting the logging.yml file, kedro run completes without error

Context


  • We are using the starters to build new kedro projects with alloy, and this error blocks execution

Steps to Reproduce

  1. Use alloy to create a new kedro project with kedro pyspark starter version 0.18.3
  2. Execute kedro run within the created kedro project

Expected Result

kedro run should execute without error

Actual Result


ConstructorError: while constructing a mapping
  in ".../conf/base/logging.yml", line 43, column 3
found unacceptable key (unhashable type: 'dict')

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.18.3
  • Python version used (python -V): 3.9
  • Operating system and version: macOS

Inconsistencies in spaceflights data

Description

This issue is the same as kedro-org/kedro#3110, whose resolution is proposed in PR kedro-org/kedro#3119.

Context

  • When merging rated_shuttles with companies, both dataframes have an id column, and this creates id_x and id_y, which could be avoided by selectively dropping id before merging
  • companies has duplicate rows, so when merging rated_shuttles with companies, some rows are repeated and this might distort the result, which could be avoided by doing a .drop_duplicates() on companies

Possible Implementation

[...]
rated_shuttles = shuttles.merge(reviews, left_on="id", right_on="shuttle_id")
rated_shuttles = rated_shuttles.drop("id", axis=1)
companies = companies.drop_duplicates()
model_input_table = rated_shuttles.merge(
    companies, left_on="company_id", right_on="id"
)
[...]
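Both problems are easy to reproduce on toy data. The sketch below uses made-up frames (not the spaceflights datasets) to show the id_x/id_y clash and the row duplication, and then the proposed fix:

```python
import pandas as pd

# Tiny frames reproducing the two problems: both sides carry an `id`
# column, and `companies` contains a duplicate row.
shuttles = pd.DataFrame({"id": [1, 2], "company_id": [10, 10]})
companies = pd.DataFrame({"id": [10, 10], "name": ["A", "A"]})

# Naive merge: pandas disambiguates the clashing `id` columns as
# id_x/id_y, and the duplicate company row doubles the row count.
clashed = shuttles.merge(companies, left_on="company_id", right_on="id")

# Proposed fix: drop the spent `id` and de-duplicate before merging.
fixed = shuttles.drop("id", axis=1).merge(
    companies.drop_duplicates(), left_on="company_id", right_on="id"
)
```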

In this repository, this should be added in two files:


Kedro new does not work for starter standalone-datacatalog (formerly known as mini-kedro)

Description

When I run the command kedro new -s standalone-datacatalog I get an error saying that the starter is not found.

Just for clarification, the following command works normally as an alternative: kedro new -s https://github.com/kedro-org/kedro-starters/ --checkout 0.18.0 --directory standalone-datacatalog

Context

I was trying to create a minimal project to test how the plugins work. This has an easy workaround, so it does not affect me.

Steps to Reproduce

  1. Run command kedro new -s standalone-datacatalog or kedro new -s standalone-datacatalog --checkout 0.18.0

Expected Result

Kedro should move on to ask the normal questions such as project and repo name.

Actual Result

An error is displayed that says the starter is not found.

kedro.framework.cli.utils.KedroCliError: Kedro project template not found at standalone-datacatalog . Specified tag 0.18.0. The following tags are available: . The aliases for the official Kedro starters are: 
- astro-airflow-iris
- mini-kedro
- pandas-iris
- pyspark
- pyspark-iris
- spaceflights

Run with --verbose to see the full exception
Error: Kedro project template not found at standalone-datacatalog . Specified tag 0.18.0. The following tags are available: . The aliases for the official Kedro starters are: 
- astro-airflow-iris
- mini-kedro
- pandas-iris
- pyspark
- pyspark-iris
- spaceflights

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): kedro, version 0.18.0
  • Python version used (python -V): Python 3.8.5
  • Operating system and version: Windows 10

[KED-3022] Update and simplify all starters for the 0.18 release

To be merged after 0.18.0 has been released.

  • Registration hooks are deleted in 0.18.0, so we should remove all hooks.py files containing registration hooks from our starters.
  • Make sure all starters (including the built-in ones) are up to date with Kedro 0.18

Fix `pyspark` starters

Description

pyspark 3.4.0 was released on the 13th of April and has broken our pyspark-iris starter, and potentially the other pyspark starters. This was initially discovered because the kedro-docker e2e tests that use the pyspark-iris starter were failing.

It's failing on all builds, i.e. Python 3.7, 3.8, 3.9 and 3.10.
The Python 3.7 failure is expected, as support for it is deprecated in Spark 3.4.

The problem seems to occur when the dataset is loaded, but it's not yet clear why.

Error

See the logs here: https://app.circleci.com/pipelines/github/kedro-org/kedro-starters/564/workflows/62a766f3-5e4c-4ad2-8e49-5907ec66a426/jobs/5820

Approach + findings so far

import pyspark.sql
from kedro.io import DataCatalog
from kedro.extras.datasets.spark import SparkDataSet

spark_ds = SparkDataSet(
    filepath="/Users/merel_theisen/Projects/Testing/spark-issue/data/01_raw/iris.csv",
    file_format="csv",
    load_args={"header": True, "inferSchema": True},
    save_args={"header": True},
)
catalog = DataCatalog({"iris": spark_ds})

df = catalog.load("iris")

Additional info

This line: https://github.com/kedro-org/kedro/blob/main/kedro/extras/datasets/spark/spark_dataset.py#L386
results in:
[screenshot of the resulting error]

The main change in 3.4.0 that seems relevant to the methods we use is:

 .. versionchanged:: 3.4.0
        Supports Spark Connect.

Change the directory structure used for starter pipeline tests to match the new structure used on `kedro pipeline create`

Description

With the changes to the directory structure applied in kedro-org/kedro#3731 now merged, @ankatiyar proposed updating the structure of folders created by the Kedro starters to match them.

Context

The current structure has the pipeline tests created by starters placed in <project root>/tests/pipelines, while tests created by the kedro pipeline create command go into <project root>/tests/pipelines/<pipeline name>. The proposal is to put the pipeline tests from starters into their own directories as well.

For example:

[screenshot: the tests/pipelines directory, with test_data_science.py from the starter next to a my_pipeline/ folder created by kedro pipeline create]

In the image above, test_data_science.py, created using the spaceflights-pandas starter, is located directly in the tests directory, while the test file for my_pipeline, created with kedro pipeline create, is in its own directory.

Possible Implementation

The initial idea is to put the pipeline tests generated from starters into their own directories. In the aforementioned example, the structure would change from <project root>/tests/pipelines/test_data_science.py to <project root>/tests/pipelines/data_science/test_pipeline.py.

`standalone-datacatalog` doesn't have Kedro metadata

Description

As per the title, pyproject.toml with the Kedro metadata is missing, which leads to some subcommands not being available, for example kedro jupyter:

https://github.com/kedro-org/kedro/blob/116ddd015e81d2a6930a0dfbe83e630e526634f4/kedro/framework/cli/cli.py#L172-L173
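For reference, the metadata in question lives in the [tool.kedro] table of pyproject.toml. A sketch of what the starter template could ship is below; the exact field names follow the Kedro 0.18 convention and should be double-checked against the target Kedro version:

```toml
[tool.kedro]
package_name = "{{ cookiecutter.python_package }}"
project_name = "{{ cookiecutter.project_name }}"
kedro_init_version = "{{ cookiecutter.kedro_version }}"
```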

Context

I was trying to use the standalone-datacatalog from Jupyter to have a minimal Kedro setup from a notebook, but found that the kedro.ipython extension was not loading.

Steps to Reproduce

  1. kedro new -s standalone-datacatalog
  2. Try kedro jupyter notebook
  3. See it fail because it's not found

Expected Result

kedro jupyter works for all starters.

Actual Result

juan_cano@M-PH9T4K3P3C /t/test-kedro-ipython-mini> kedro jupyter notebook                                          (kpolars310) 
Usage: kedro [OPTIONS] COMMAND [ARGS]...
Try 'kedro -h' for help.

Error: No such command 'jupyter'.
juan_cano@M-PH9T4K3P3C /t/test-kedro-ipython-mini [2]> kedro                                                       (kpolars310) 
Usage: kedro [OPTIONS] COMMAND [ARGS]...

  Kedro is a CLI for creating and using Kedro projects. For more information,
  type ``kedro info``.

Options:
  -V, --version  Show version and exit
  -h, --help     Show this message and exit.

Global commands from Kedro
Commands:
  docs     See the kedro API docs and introductory tutorial.
  info     Get more information about kedro.
  new      Create a new kedro project.
  starter  Commands for working with project starters.

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.18.4
  • Python version used (python -V): 3.10.9
  • Operating system and version: macOS Ventura

Update broken links across all starters

Description

We found that some links in pyspark-iris are broken when reviewing the databricks-iris starter.

It would be good to go through all the links in the starters (across docstrings / READMEs), check whether they are broken and if so, fix them.

[KED-2691] Link `kedro-starters` CI exclusively to `kedro` main branch

(transfer from Jira, created by @ignacioparicio)

Description

kedro-starters CI will fail whenever we update requirements.txt of kedro. This is because:

For now we avoid running pip-compile in order to deal with non-breaking changes in requirements.txt (see #36). A better solution:

  • Would have kedro-starters CI linked just to kedro's main branch, not the latest release
  • Would install just the requirements needed for each starter (status quo)

A possible implementation of this solution would be to patch requirements.txt in the kedro-starters CI so that any kedro[*]=={{ cookiecutter.kedro_version }} (which points to the latest kedro release) points to main instead. This would then enable a plain kedro install command to be run during CI (without the --no-build-reqs flag).
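A minimal sketch of such a patch step, assuming a sed-style rewrite in CI (the requirement lines below are made up for illustration; real starters pin kedro via the cookiecutter variable):

```shell
# Illustrative requirements file standing in for a starter's requirements.txt.
printf 'kedro[pandas.CSVDataSet]==0.18.4\npandas~=1.5\n' > requirements.txt

# Rewrite any pinned kedro requirement to install from the main branch instead.
sed -i.bak -E 's|^kedro(\[[^]]*\])?==.*|kedro @ git+https://github.com/kedro-org/kedro@main|' requirements.txt
cat requirements.txt
```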

If a solution is found, we might consider reverting the changes made in https://github.com/kedro-org/kedro-starters/pull/36/files, as they shouldn't be required anymore.

Update starters to support pandas 2.0

Description

Currently the starters can break if a user has pandas 2.0 installed. Update all starters so they can run fine with pandas 2.0 as well as older versions. This means updating the pin for kedro-datasets to ~=1.0 instead of ~=1.0.0.
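The difference between the two pins can be checked with the packaging library (assuming it is installed): ~=1.0.0 only allows patch releases of 1.0, while ~=1.0 allows any 1.x release, including the kedro-datasets releases that support pandas 2.0:

```python
from packaging.specifiers import SpecifierSet

# ~=1.0.0 means >=1.0.0, ==1.0.* -- patch releases only.
assert "1.0.2" in SpecifierSet("~=1.0.0")
assert "1.1.0" not in SpecifierSet("~=1.0.0")

# ~=1.0 means >=1.0, ==1.* -- any 1.x release.
assert "1.1.0" in SpecifierSet("~=1.0")
assert "2.0.0" not in SpecifierSet("~=1.0")
```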

Context

For example in spaceflights:

This should not be a problem if the user follows the normal workflow, but if they install pandas 2 separately, things break:

> pip install kedro pandas scikit-learn openpyxl pyarrow  # problems incoming
> kedro new --starter=spaceflights
> cd spaceflights
> kedro run  # uh oh
[05/05/23 15:25:57] INFO     Kedro project spaceflights                                                                                                                                                 session.py:360
[05/05/23 15:25:59] WARNING  /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/importlib/__init__.py:126: DeprecationWarning: `kedro.extras.datasets` is deprecated and will be removed in     warnings.py:109
                             Kedro 0.19, install `kedro-datasets` instead by running `pip install kedro-datasets`.                                                                                                    
                               return _bootstrap._gcd_import(name[level:], package, level)                                                                                                                            
                                                                                                                                                                                                                      
[05/05/23 15:26:00] INFO     Loading data from 'companies' (CSVDataSet)...                                                                                                                         data_catalog.py:343
                    INFO     Running node: preprocess_companies_node: preprocess_companies([companies]) -> [preprocessed_companies]                                                                        node.py:329
                    INFO     Saving data to 'preprocessed_companies' (ParquetDataSet)...                                                                                                           data_catalog.py:382
                    INFO     Completed 1 out of 6 tasks                                                                                                                                        sequential_runner.py:85
                    INFO     Loading data from 'shuttles' (ExcelDataSet)...                                                                                                                        data_catalog.py:343
[05/05/23 15:26:04] INFO     Running node: preprocess_shuttles_node: preprocess_shuttles([shuttles]) -> [preprocessed_shuttles]                                                                            node.py:329
                    ERROR    Node 'preprocess_shuttles_node: preprocess_shuttles([shuttles]) -> [preprocessed_shuttles]' failed with error:                                                                node.py:354
                             could not convert string to float: '$1325.0'                                                                                                                                             
                    WARNING  There are 5 nodes that have not run.                                                                                                                                        runner.py:205
                             You can resume the pipeline run from the nearest nodes with persisted inputs by adding the following argument to your previous command:                                                  
                               --from-nodes "preprocess_shuttles_node,create_model_input_table_node"                                                                                                                  
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/juan_cano/.micromamba/envs/_test310/bin/kedro:8 in <module>                               │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/framework/cli/cli. │
│ py:211 in main                                                                                   │
│                                                                                                  │
│   208 │   """                                                                                    │
│   209 │   _init_plugins()                                                                        │
│   210 │   cli_collection = KedroCLI(project_path=Path.cwd())                                     │
│ ❱ 211 │   cli_collection()                                                                       │
│   212                                                                                            │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/click/core.py:1130 in    │
│ __call__                                                                                         │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/framework/cli/cli. │
│ py:139 in main                                                                                   │
│                                                                                                  │
│   136 │   │   )                                                                                  │
│   137 │   │                                                                                      │
│   138 │   │   try:                                                                               │
│ ❱ 139 │   │   │   super().main(                                                                  │
│   140 │   │   │   │   args=args,                                                                 │
│   141 │   │   │   │   prog_name=prog_name,                                                       │
│   142 │   │   │   │   complete_var=complete_var,                                                 │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/click/core.py:1055 in    │
│ main                                                                                             │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/click/core.py:1657 in    │
│ invoke                                                                                           │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/click/core.py:1404 in    │
│ invoke                                                                                           │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/click/core.py:760 in     │
│ invoke                                                                                           │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/framework/cli/proj │
│ ect.py:472 in run                                                                                │
│                                                                                                  │
│   469 │   with KedroSession.create(                                                              │
│   470 │   │   env=env, conf_source=conf_source, extra_params=params                              │
│   471 │   ) as session:                                                                          │
│ ❱ 472 │   │   session.run(                                                                       │
│   473 │   │   │   tags=tag,                                                                      │
│   474 │   │   │   runner=runner(is_async=is_async),                                              │
│   475 │   │   │   node_names=node_names,                                                         │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/framework/session/ │
│ session.py:426 in run                                                                            │
│                                                                                                  │
│   423 │   │   )                                                                                  │
│   424 │   │                                                                                      │
│   425 │   │   try:                                                                               │
│ ❱ 426 │   │   │   run_result = runner.run(                                                       │
│   427 │   │   │   │   filtered_pipeline, catalog, hook_manager, session_id                       │
│   428 │   │   │   )                                                                              │
│   429 │   │   │   self._run_called = True                                                        │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/runner/runner.py:9 │
│ 1 in run                                                                                         │
│                                                                                                  │
│    88 │   │   │   self._logger.info(                                                             │
│    89 │   │   │   │   "Asynchronous mode is enabled for loading and saving data"                 │
│    90 │   │   │   )                                                                              │
│ ❱  91 │   │   self._run(pipeline, catalog, hook_manager, session_id)                             │
│    92 │   │                                                                                      │
│    93 │   │   self._logger.info("Pipeline execution completed successfully.")                    │
│    94                                                                                            │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/runner/sequential_ │
│ runner.py:70 in _run                                                                             │
│                                                                                                  │
│   67 │   │                                                                                       │
│   68 │   │   for exec_index, node in enumerate(nodes):                                           │
│   69 │   │   │   try:                                                                            │
│ ❱ 70 │   │   │   │   run_node(node, catalog, hook_manager, self._is_async, session_id)           │
│   71 │   │   │   │   done_nodes.add(node)                                                        │
│   72 │   │   │   except Exception:                                                               │
│   73 │   │   │   │   self._suggest_resume_scenario(pipeline, done_nodes, catalog)                │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/runner/runner.py:3 │
│ 19 in run_node                                                                                   │
│                                                                                                  │
│   316 │   if is_async:                                                                           │
│   317 │   │   node = _run_node_async(node, catalog, hook_manager, session_id)                    │
│   318 │   else:                                                                                  │
│ ❱ 319 │   │   node = _run_node_sequential(node, catalog, hook_manager, session_id)               │
│   320 │                                                                                          │
│   321 │   for name in node.confirms:                                                             │
│   322 │   │   catalog.confirm(name)                                                              │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/runner/runner.py:4 │
│ 15 in _run_node_sequential                                                                       │
│                                                                                                  │
│   412 │   )                                                                                      │
│   413 │   inputs.update(additional_inputs)                                                       │
│   414 │                                                                                          │
│ ❱ 415 │   outputs = _call_node_run(                                                              │
│   416 │   │   node, catalog, inputs, is_async, hook_manager, session_id=session_id               │
│   417 │   )                                                                                      │
│   418                                                                                            │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/runner/runner.py:3 │
│ 81 in _call_node_run                                                                             │
│                                                                                                  │
│   378 │   │   │   is_async=is_async,                                                             │
│   379 │   │   │   session_id=session_id,                                                         │
│   380 │   │   )                                                                                  │
│ ❱ 381 │   │   raise exc                                                                          │
│   382 │   hook_manager.hook.after_node_run(                                                      │
│   383 │   │   node=node,                                                                         │
│   384 │   │   catalog=catalog,                                                                   │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/runner/runner.py:3 │
│ 71 in _call_node_run                                                                             │
│                                                                                                  │
│   368 ) -> Dict[str, Any]:                                                                       │
│   369 │   # pylint: disable=too-many-arguments                                                   │
│   370 │   try:                                                                                   │
│ ❱ 371 │   │   outputs = node.run(inputs)                                                         │
│   372 │   except Exception as exc:                                                               │
│   373 │   │   hook_manager.hook.on_node_error(                                                   │
│   374 │   │   │   error=exc,                                                                     │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/pipeline/node.py:3 │
│ 55 in run                                                                                        │
│                                                                                                  │
│   352 │   │   # purposely catch all exceptions                                                   │
│   353 │   │   except Exception as exc:                                                           │
│   354 │   │   │   self._logger.error("Node '%s' failed with error: \n%s", str(self), str(exc))   │
│ ❱ 355 │   │   │   raise exc                                                                      │
│   356 │                                                                                          │
│   357 │   def _run_with_no_inputs(self, inputs: Dict[str, Any]):                                 │
│   358 │   │   if inputs:                                                                         │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/pipeline/node.py:3 │
│ 44 in run                                                                                        │
│                                                                                                  │
│   341 │   │   │   if not self._inputs:                                                           │
│   342 │   │   │   │   outputs = self._run_with_no_inputs(inputs)                                 │
│   343 │   │   │   elif isinstance(self._inputs, str):                                            │
│ ❱ 344 │   │   │   │   outputs = self._run_with_one_input(inputs, self._inputs)                   │
│   345 │   │   │   elif isinstance(self._inputs, list):                                           │
│   346 │   │   │   │   outputs = self._run_with_list(inputs, self._inputs)                        │
│   347 │   │   │   elif isinstance(self._inputs, dict):                                           │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/kedro/pipeline/node.py:3 │
│ 75 in _run_with_one_input                                                                        │
│                                                                                                  │
│   372 │   │   │   │   f"{sorted(inputs.keys())}."                                                │
│   373 │   │   │   )                                                                              │
│   374 │   │                                                                                      │
│ ❱ 375 │   │   return self._func(inputs[node_input])                                              │
│   376 │                                                                                          │
│   377 │   def _run_with_list(self, inputs: Dict[str, Any], node_inputs: List[str]):              │
│   378 │   │   # Node inputs and provided run inputs should completely overlap                    │
│                                                                                                  │
│ /private/tmp/spaceflights/src/spaceflights/pipelines/data_processing/nodes.py:45 in              │
│ preprocess_shuttles                                                                              │
│                                                                                                  │
│   42 │   """                                                                                     │
│   43 │   shuttles["d_check_complete"] = _is_true(shuttles["d_check_complete"])                   │
│   44 │   shuttles["moon_clearance_complete"] = _is_true(shuttles["moon_clearance_complete"])     │
│ ❱ 45 │   shuttles["price"] = _parse_money(shuttles["price"])                                     │
│   46 │   return shuttles                                                                         │
│   47                                                                                             │
│   48                                                                                             │
│                                                                                                  │
│ /private/tmp/spaceflights/src/spaceflights/pipelines/data_processing/nodes.py:16 in _parse_money │
│                                                                                                  │
│   13                                                                                             │
│   14 def _parse_money(x: pd.Series) -> pd.Series:                                                │
│   15 │   x = x.str.replace("$", "", regex=True).str.replace(",", "")                             │
│ ❱ 16 │   x = x.astype(float)                                                                     │
│   17 │   return x                                                                                │
│   18                                                                                             │
│   19                                                                                             │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/pandas/core/generic.py:6 │
│ 324 in astype                                                                                    │
│                                                                                                  │
│    6321 │   │                                                                                    │
│    6322 │   │   else:                                                                            │
│    6323 │   │   │   # else, only a single dtype is given                                         │
│ ❱  6324 │   │   │   new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)           │
│    6325 │   │   │   return self._constructor(new_data).__finalize__(self, method="astype")       │
│    6326 │   │                                                                                    │
│    6327 │   │   # GH 33113: handle empty frame or series                                         │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/pandas/core/internals/ma │
│ nagers.py:451 in astype                                                                          │
│                                                                                                  │
│    448 │   │   elif using_copy_on_write():                                                       │
│    449 │   │   │   copy = False                                                                  │
│    450 │   │                                                                                     │
│ ❱  451 │   │   return self.apply(                                                                │
│    452 │   │   │   "astype",                                                                     │
│    453 │   │   │   dtype=dtype,                                                                  │
│    454 │   │   │   copy=copy,                                                                    │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/pandas/core/internals/ma │
│ nagers.py:352 in apply                                                                           │
│                                                                                                  │
│    349 │   │   │   if callable(f):                                                               │
│    350 │   │   │   │   applied = b.apply(f, **kwargs)                                            │
│    351 │   │   │   else:                                                                         │
│ ❱  352 │   │   │   │   applied = getattr(b, f)(**kwargs)                                         │
│    353 │   │   │   result_blocks = extend_blocks(applied, result_blocks)                         │
│    354 │   │                                                                                     │
│    355 │   │   out = type(self).from_blocks(result_blocks, self.axes)                            │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/pandas/core/internals/bl │
│ ocks.py:511 in astype                                                                            │
│                                                                                                  │
│    508 │   │   """                                                                               │
│    509 │   │   values = self.values                                                              │
│    510 │   │                                                                                     │
│ ❱  511 │   │   new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)           │
│    512 │   │                                                                                     │
│    513 │   │   new_values = maybe_coerce_values(new_values)                                      │
│    514                                                                                           │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/pandas/core/dtypes/astyp │
│ e.py:242 in astype_array_safe                                                                    │
│                                                                                                  │
│   239 │   │   dtype = dtype.numpy_dtype                                                          │
│   240 │                                                                                          │
│   241 │   try:                                                                                   │
│ ❱ 242 │   │   new_values = astype_array(values, dtype, copy=copy)                                │
│   243 │   except (ValueError, TypeError):                                                        │
│   244 │   │   # e.g. _astype_nansafe can fail on object-dtype of strings                         │
│   245 │   │   #  trying to convert to float                                                      │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/pandas/core/dtypes/astyp │
│ e.py:187 in astype_array                                                                         │
│                                                                                                  │
│   184 │   │   values = values.astype(dtype, copy=copy)                                           │
│   185 │                                                                                          │
│   186 │   else:                                                                                  │
│ ❱ 187 │   │   values = _astype_nansafe(values, dtype, copy=copy)                                 │
│   188 │                                                                                          │
│   189 │   # in pandas we don't store numpy str dtypes, so convert to object                      │
│   190 │   if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):                 │
│                                                                                                  │
│ /Users/juan_cano/.micromamba/envs/_test310/lib/python3.10/site-packages/pandas/core/dtypes/astyp │
│ e.py:138 in _astype_nansafe                                                                      │
│                                                                                                  │
│   135 │                                                                                          │
│   136 │   if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):                       │
│   137 │   │   # Explicit copy, or required since NumPy can't view from / to object.              │
│ ❱ 138 │   │   return arr.astype(dtype, copy=True)                                                │
│   139 │                                                                                          │
│   140 │   return arr.astype(dtype, copy=copy)                                                    │
│   141                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: could not convert string to float: '$1325.0'

I was about to do a quick demonstration of the spaceflights pipeline, and instead of following the normal process, I installed the dependencies "by hand".
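For context, the failure comes from casting a currency-formatted string straight to float. A minimal, hypothetical sketch of the kind of cleaning step the spaceflights preprocessing performs before casting (the function name here is illustrative, not the starter's actual code):

```python
def parse_money(value: str) -> float:
    # Strip the currency symbol and thousands separators before casting;
    # float("$1325.0") raises ValueError, the cleaned string does not.
    return float(value.replace("$", "").replace(",", ""))

print(parse_money("$1325.0"))  # 1325.0
```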

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.18.8
  • Python version used (python -V): 3.10.10
  • Operating system and version: macOS Ventura

How to use a custom starter?

Hi!

I want to create a Poetry starter with some of my personal setup.
What do I need to set up for kedro new --starter=poetry to recognize it?

Do I need to make a PR here or is it possible to supply the starter in another way? I couldn't find anything about it in the documentation.

Is my only option to use it with cookiecutter directly?
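For what it's worth, kedro new can also take a local path or a Git URL as the starter, which may cover this use case without registering anything. A rough sketch (the path and URL below are placeholders):

```shell
# Use a local template directory as the starter
kedro new --starter=path/to/my-poetry-starter

# Or point at a Git repository, optionally pinning a branch or tag
kedro new --starter=https://github.com/your-org/your-starter.git --checkout=main
```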

[KED-2622] Add Windows CI to kedro-starters

(transfer from Jira, created by @lorenabalan)

Follow-up on #24 (comment)

Getting behave to work on Windows proved fiddly enough that it deserves its own dedicated time and energy. We should change .circleci/config.yml, as well as some of the behave setup (such as the subprocess calls and bin_dir), to make it work.

Add Python 3.11 support to `kedro-starters`

Description

  • We've already added 3.11 support to Kedro and kedro-plugins. For the starters we'll just need to add 3.11 builds to ensure they run properly on Python 3.11.
  • Convert the CircleCI builds into GitHub Actions

CI is broken

#136 made a one-character change in the README, but there were still some test failures.

https://app.circleci.com/pipelines/github/kedro-org/kedro-starters/618/workflows/687e92b1-d6d7-41e1-9348-dc8566bb9c39/jobs/6650

Failing scenarios:
  features/run.feature:26  Run a Kedro project created from pyspark-iris

0 features passed, 1 failed, 0 skipped
4 scenarios passed, 1 failed, 0 skipped
24 steps passed, 1 failed, 0 skipped, 0 undefined
Took 3m56.385s

Exited with code exit status 1

CircleCI received exit code 1

This is the error:

DatasetError: An exception occurred when parsing config for dataset 
'example_classifier':
Dataset type 'kedro.io.memory_dataset.MemoryDataSet' is invalid: all data set 
types must extend 'AbstractDataSet'.

Missing namespaces in spaceflights starter catalog

Now that the spaceflights starter has namespaced pipelines, the catalog entries should be updated to reflect this. Currently several of them are missing the right namespaces.

Also update the tutorial in the docs to match.
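As a purely illustrative sketch (the dataset name and filepath are made up, not the actual starter entries), datasets produced inside a namespaced pipeline are keyed as <namespace>.<dataset> in the catalog:

```yaml
data_processing.preprocessed_companies:
  type: pandas.ParquetDataSet
  filepath: data/02_intermediate/preprocessed_companies.pq
```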

Example Notebook.ipynb in standalone-datacatalog fails with ValueError

Description

Example Notebook.ipynb in standalone-datacatalog passes the wrong arguments to ConfigLoader.

Context

I wanted to try the Example Notebook.ipynb in the standalone-datacatalog starter.

Steps to Reproduce

  1. Open standalone-datacatalog/Example Notebook.ipynb with Jupyter
  2. Execute first cell

Expected Result

It should print the head of the dataframe.

Actual Result

It fails.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_33865/3764053128.py in <cell line: 8>()
      6 
      7 # Load the data catalog configuration from catalog.yml
----> 8 conf_catalog = conf_loader.get("catalog.yml")
      9 
     10 # Create the DataCatalog instance from the configuration

~/.local/share/virtualenvs/kedro-9vwvPpXf/lib/python3.10/site-packages/kedro/config/config.py in get(self, *patterns)
     97 
     98     def get(self, *patterns: str) -> Dict[str, Any]:
---> 99         return _get_config_from_patterns(
    100             conf_paths=self.conf_paths, patterns=list(patterns)
    101         )

~/.local/share/virtualenvs/kedro-9vwvPpXf/lib/python3.10/site-packages/kedro/config/common.py in _get_config_from_patterns(conf_paths, patterns, ac_template)
     93     for conf_path in conf_paths:
     94         if not Path(conf_path).is_dir():
---> 95             raise ValueError(
     96                 f"Given configuration path either does not exist "
     97                 f"or is not a valid directory: {conf_path}"

ValueError: Given configuration path either does not exist or is not a valid directory: conf/base/base

Possible fix

I am new to Kedro, so I don't really understand the ConfigLoader yet.
If I change conf_loader = ConfigLoader("conf/base") to conf_loader = ConfigLoader("conf"), it works, though. Is this just happening on my machine?

Your Environment

  • Kedro version used (pip show kedro or kedro -V): kedro, version 0.18.0
  • Python version used (python -V): Python 3.10.4
  • Operating system and version: Manjaro Linux 21.2.6

Remove linting and test files + setup from all starters

Description

Remove all linting dependencies and configuration:

  • setup.cfg completely
  • linting config from pyproject.toml
  • linting dependencies from requirements.txt: flake8, black, isort

Remove test setup:

  • src/test directory completely
  • pytest dependencies from requirements.txt

Update e2e test setup to still lint the starters by calling flake8, isort and black commands directly instead of kedro lint.

  • To make these checks all pass, we would need to add flake8 configuration to the test setup, because flake8's default max line length is 79 and we've always worked with a max line length of 88.
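Since setup.cfg is being removed from the starters themselves, that override would have to live in the e2e test setup instead; a minimal sketch of the flake8 config (flake8 reads it from setup.cfg, tox.ini, or a .flake8 file) matching the 88-character convention:

```ini
[flake8]
max-line-length = 88
```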

Context

Follow up on: kedro-org/kedro#1849
