avaiga / taipy-core
A Python library to build powerful and customized data-driven back-end applications.
License: Apache License 2.0
@florian-vuillemot commented on Mon Nov 22 2021
We can improve the Taipy Docker image with the following elements:
Since we do not use the unidecode library, we need to remove the dependency as well.
Description
Make the sheet name optional, so that if no sheet name is provided:
Acceptance Criteria
What would that feature address
For every manager method that takes a single scenario as a parameter, create a similar method on the entity itself.
Description of the ideal solution
Both lines should do the same thing:
tp.submit(scenario)
scenario.submit()
Caveats
In order to avoid a circular dependency, the manager should be imported dynamically in the entity method, as sketched below.
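A minimal sketch of that dynamic-import pattern; the manager module path is an assumption for illustration:

class Scenario:
    def submit(self, callbacks=None, force=False):
        # The import happens at call time, not at module load time, so the
        # manager module (which itself imports Scenario) never creates an
        # import cycle. The module path below is an assumption.
        from taipy.core.scenario._scenario_manager import _ScenarioManager

        return _ScenarioManager._submit(self, callbacks=callbacks, force=force)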
Acceptance Criteria
pipeline.subscribe(callback: Callable[[Pipeline, Job], None])
pipeline.unsubscribe(callback: Callable[[Pipeline, Job], None])
pipeline.submit(callbacks: Optional[List[Callable]] = None, force: bool = False)
scenario.subscribe(callback: Callable[[Scenario, Job], None])
scenario.unsubscribe(callback: Callable[[Scenario, Job], None])
scenario.submit(callbacks: Optional[List[Callable]] = None, force: bool = False)
scenario.set_master() (or set_main)
scenario.add_tag(tag: str)
scenario.remove_tag(tag: str)
Do save a Generic data node after reading it.
config_id is missing from the ScenarioModel and PipelineModel models. It must be added so we can retrieve it after the entity is stored.
Description
The problem arises when trying to indicate an environment variable in write_fct_params/read_fct_params. The environment variables are not replaced by their actual values; we just get 'ENV[VAR_NAME]' in the parameter of the write_fct.
How to reproduce
Have a generic data node and an environment variable indicated in the write_fct_params (here the write_fct_params notation is from an old version).
results_blob_storage = tp.configure_generic_data_node(
    id="results_blob_storage",
    write_fct=write_blob_storage,
    write_fct_params={'connect_str': 'ENV[CONNECT_STR]'},
)
Expected behavior
The write_fct should receive the actual environment variable values, not the literal string 'ENV[VAR_NAME]'.
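A minimal sketch of the expected substitution, assuming a hypothetical helper that resolves the ENV[VAR_NAME] token against os.environ:

import os
import re

_ENV_PATTERN = re.compile(r"^ENV\[(\w+)\]$")

def _resolve_env(value):
    # Replace an 'ENV[VAR_NAME]' token with the value of the
    # corresponding environment variable, if it is defined.
    match = _ENV_PATTERN.match(value) if isinstance(value, str) else None
    if match:
        return os.environ.get(match.group(1), value)
    return value

With CONNECT_STR set in the environment, _resolve_env('ENV[CONNECT_STR]') returns its value instead of the raw token.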
Description
For DataNodeConfig, TaskConfig, PipelineConfig, and ScenarioConfig, the name
attribute should be renamed.
The name should change in variables, attributes, functions, docstrings, UTs
It can be done in multiple small PRs to avoid huge PRs and merge conflicts.
Proposals
id
Acceptance Criteria
Rename the name variable in configs and the config_name variables in entities.
@florian-vuillemot commented on Fri Feb 11 2022
To ensure we have the latest version of an object, we currently reload it from disk on each access.
A smarter approach would be to store on each object a timestamp of its latest reload, reflecting the last time it was filled from the content on disk.
If this timestamp matches that of the file containing its data, we do not need to reload it.
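A minimal sketch of that check, using the file's mtime as the timestamp (illustrative names, not the actual taipy-core API):

import json
import os

class _ReloadCache:
    def __init__(self, path):
        self._path = path
        self._loaded_at = None   # file mtime at the last reload
        self._content = None

    def get(self):
        mtime = os.path.getmtime(self._path)
        if self._content is None or self._loaded_at != mtime:
            # The file changed (or was never read): reload it.
            with open(self._path) as f:
                self._content = json.load(f)
            self._loaded_at = mtime
        return self._content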
Acceptance criteria
Description
The problem arises when trying to write to SQL database with a pd.DataFrame.
How to reproduce
Create a pandas dataframe and a SQL database of the same schema, and try to write the pd.DataFrame to the SQL database through a SQL data node.
How to temporarily solve it
Change in sql.py Line 138 (_write):
self.__insert_dicts(data.to_dict(orient="records"), write_table, connection)
->
data.to_sql(self.write_table, con=connection, if_exists='replace', index=False)
This means that self.__insert_dicts will not be called. It is a temporary fix.
Expected behavior
We shouldn't get any errors and the SQL database should be up to date with the values of the pd.DataFrame.
Description
The problem arises when trying to delete a scenario with tp.delete(scenario.id). An error is raised.
How to reproduce
Take the code here: https://github.com/Avaiga/taipy-getting-started. Add tp.delete(scenario.id) at the end of step_6.py and run it.
Expected behavior
We should get our scenario deleted, and all the objects associated with this scenario (and not referenced by another scenario) also deleted without errors.
Runtime environment
Taipy 0.13
This is just my suggestion on the taipy-core architecture, and I would like it to be discussed in the next sprint planning since this is a breaking change.
Suppose I want to build two applications in one project: one for predicting weather and one for predicting stock prices.
Currently, we only have one global config, which is a singleton. This means every entity shares the same environment as all the others, regardless of relatedness.
Config.add_scenario([predict_weather_pipeline])
Config.add_scenario([predict_stock_pipeline])
In the above example, all the data nodes, tasks, pipelines and scenarios of weather and stock will be saved in the same folder, even though they are never used with each other.
What I propose is that we should allow users to create multiple instances of the Taipy application (or Config). For example:
weather_app = Config()
weather_app.add_scenario([predict_weather_pipeline])
stock_app = Config()
stock_app.add_scenario([predict_stock_pipeline])
In the above example, since there are two different configs, we cannot access stock-related entities from the weather app (which is a good thing).
Even though having two different applications in the same project is not a common use case, I think this issue should be discussed to see if it is useful enough to justify the refactoring.
Description
The config attribute clean_entities_enabled should have a default value equal to "ENV[TAIPY_CLEAN_ENTITIES_ENABLED]".
At run time, when reading the value, if no env variable is defined, then the default value should be False.
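A minimal sketch of that runtime resolution, assuming the variable holds a textual boolean:

import os

def _clean_entities_enabled_default():
    # Read TAIPY_CLEAN_ENTITIES_ENABLED if it is defined; otherwise
    # fall back to False, as the issue requires.
    raw = os.environ.get("TAIPY_CLEAN_ENTITIES_ENABLED")
    if raw is None:
        return False
    return raw.strip().lower() in ("1", "true", "yes")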
Acceptance Criteria
Move airflow to taipy-airflow#1
The Airflow configs are still in the Taipy repository.
We should move them out and allow the Taipy config to call them.
Todo:
Acceptance criteria
Description
Hide all fields in entities and expose properties instead.
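A minimal sketch of the pattern, with illustrative names:

class Scenario:
    def __init__(self, config_id):
        # The field is stored under a private name...
        self._config_id = config_id

    @property
    def config_id(self):
        # ...and only exposed through a read-only property.
        return self._config_id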
Acceptance Criteria
Variables, functions, docstrings, UTs
Description
class _JobRepository(_FileSystemRepository[_JobModel, Job]):
    def __init__(self):
        super().__init__(model=_JobModel, dir_name="jobs")

    def _to_model(self, job: Job):
        return _JobModel(
            job.id,
            job._task.id,
            job._status,
            job._force,
            job._creation_date.isoformat(),
            [],
            self.__serialize_exceptions(job.exceptions),
        )

    def _from_model(self, model: _JobModel):
        job = Job(id=model.id, task=_TaskRepository().load(model.task_id))
        job.status = model.status  # type: ignore
        job.force = model.force  # type: ignore
        job.creation_date = datetime.fromisoformat(model.creation_date)  # type: ignore
        # for it in model.subscribers:
        #     try:
        #         job._subscribers.append(load_fct(it.get("fct_module"), it.get("fct_name")))
        #     except AttributeError:
        #         raise InvalidSubscriber(f"The subscriber function {it.get('fct_name')} cannot be load.")
        job._exceptions = [_load_fct(e["fct_module"], e["fct_name"])(*e["args"]) for e in model.exceptions]
        return job
The subscribers are not stored in the job_model.
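A possible shape for that serialization, mirroring the fct_module/fct_name keys used by the commented-out deserialization above (a sketch, not the committed fix):

def _serialize_subscribers(subscribers):
    # Store each callback as its module and function name so it can be
    # re-imported with _load_fct when the model is read back.
    return [{"fct_module": f.__module__, "fct_name": f.__name__} for f in subscribers]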
DataManager can create DataNode based on __DATA_NODE_CLASSES and __DATA_NODE_CLASS_MAP.
__DATA_NODE_CLASSES = {InMemoryDataNode, PickleDataNode, CSVDataNode, SQLDataNode, ExcelDataNode, GenericDataNode}
__DATA_NODE_CLASS_MAP = {ds_class.storage_type(): ds_class for ds_class in __DATA_NODE_CLASSES} # type: ignore
When we add a new DataNode type, we should also add it in the DataManager. But we have already forgotten to do so, because there is no test checking that each DataNode type can be created from the DataManager.
This bug is difficult to catch in the application because the exception is handled, and the error message is irrelevant.
How to reproduce:
class MyDataNode(InMemoryDataNode):
    ...

data_node = MyDataNode()
data_node.write('hello')
data_node.validity
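A sketch of the kind of guard test the issue calls for, using stand-in classes (not taipy-core's actual ones):

class DataNode:
    @classmethod
    def storage_type(cls):
        raise NotImplementedError

class InMemoryDataNode(DataNode):
    @classmethod
    def storage_type(cls):
        return "in_memory"

DATA_NODE_CLASSES = {InMemoryDataNode}
DATA_NODE_CLASS_MAP = {c.storage_type(): c for c in DATA_NODE_CLASSES}

def test_all_subclasses_registered():
    # A forgotten registration fails here instead of surfacing as an
    # irrelevant error message at runtime (checks direct subclasses only).
    for cls in DataNode.__subclasses__():
        assert cls in DATA_NODE_CLASSES, f"{cls.__name__} is not registered"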
Investigate the time spent on each part of the create calls.
Description
tp.create_scenarios takes a long time to execute, and the more scenarios there are, the longer it takes. In a complete application with a lot of scenarios, this causes a performance problem. We can also note that the code in general gets slower.
How to reproduce
Try the Step 9 or Step 13 code of taipy-getting-started: https://github.com/Avaiga/taipy-getting-started
Download the whole repo; each step is based on the previous ones. Then create a lot of scenarios (in the Scenario Manager for Step 13) and check in the log the time taken by this command.
Expected behavior
The time taken by this function should be lower.
Runtime environment
Please specify relevant indications.
Description
Add auto-save for list-type attributes, as sketched below:
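One possible approach, as an illustrative sketch (not the actual taipy-core implementation): a list subclass that notifies its owner on every mutation so the entity can be persisted immediately.

class _AutoSaveList(list):
    def __init__(self, iterable, on_change):
        super().__init__(iterable)
        self._on_change = on_change  # e.g. the manager's set()

    def append(self, item):
        super().append(item)
        self._on_change()

    def remove(self, item):
        super().remove(item)
        self._on_change()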
Acceptance Criteria
Description
Acceptance Criteria
The Job config is based on a class, whereas we could base it on a dict.
TODO:
Acceptance criteria
Description
The goal of this improvement is to delete the pickle files associated with the data nodes that are deleted when calling tp.delete_scenario/tp.delete_pipeline. This way, we can limit storage use when we have a lot of pipelines, scenarios, and data nodes.
Acceptance Criteria
Description
Replace the 3 integer validity fields in the data node and data node config with a single datetime.timedelta field.
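For illustration, the same information carried by three integers fits in one timedelta:

from datetime import timedelta

# Before (illustrative): validity_days, validity_hours, validity_minutes = 1, 2, 30
# After: one field carrying the same information.
validity_period = timedelta(days=1, hours=2, minutes=30)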
Acceptance Criteria
We should make private (with a leading '_' on function and attribute names) everything we want to keep internal.
Once private, these methods will no longer appear in the doc.
Check that the reference manual generation is correct.
Description
Replace subscribe_scenario and subscribe_pipeline in the taipy.py module with a single merged subscribe method. Here is a signature proposal:
def subscribe(callback: Callable[[Union[Scenario, Pipeline], Job], None], entity: Optional[Union[Scenario, Pipeline]] = None):
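A sketch of one possible dispatch behind that signature, assuming the existing per-type functions remain internally; the import paths and the entity=None fallback are assumptions, not decided design:

from typing import Callable, Optional, Union

# Assumed import paths, for illustration only:
from taipy.core import Job, Pipeline, Scenario
from taipy.core import subscribe_pipeline, subscribe_scenario

def subscribe(
    callback: Callable[[Union[Scenario, Pipeline], Job], None],
    entity: Optional[Union[Scenario, Pipeline]] = None,
):
    # Route to the existing per-type functions based on the entity type.
    # Treating entity=None as scenario-level subscription is an assumption.
    if entity is None or isinstance(entity, Scenario):
        return subscribe_scenario(callback, entity)
    return subscribe_pipeline(callback, entity)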
Acceptance Criteria
Description
The purpose is to make the usage of the set method unnecessary for the end user.
Replace useless manager.set calls everywhere they are no longer necessary.
Proposed solution
Create a @property setter for every exposed attribute in every entity.
We will also want a with construct to batch the changes within a ContextManager.
Example
scenario._master_scenario = False
cls.set(master_scenario)
should be replaced by
scenario._master_scenario = False
With the with construct:
with scenario as s:
    s.bla
    s.blo
    s.blu
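A minimal sketch of how the ContextManager could batch changes, with illustrative names (not the actual taipy-core API):

class _Entity:
    def __init__(self):
        self._in_context = False

    def _save(self):
        # Stand-in for the manager's set(): persist the entity once.
        print("persisted to disk")

    def __enter__(self):
        # Inside the block, attribute writes only touch the in-memory object.
        self._in_context = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._in_context = False
        self._save()  # a single save when the block exits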
Acceptance Criteria
Description
The display_name property of a scenario should be renamed to name.
We can also expose it as a direct property of the Scenario class so we get auto-completion.
Acceptance Criteria
Description
What this improvement addresses (performance, API...).
Acceptance Criteria
Description
The tasks of a second pipeline are not executed when the output of the first task has been cached.
How to reproduce
Put 'cacheable=True' in cleaned_dataset_cfg in the code from https://github.com/Avaiga/taipy-getting-started and run step 9, 10, 11, 12, or 13.
Expected behavior
Tasks of the second pipeline should be executed. Here, only the ML prediction should be executed, not the first task clean_data that the first pipeline has already done.
Runtime environment
Taipy 0.13
Description
Replace both List and Dict with Tuple only.
Acceptance Criteria
Implement a data node subscription where a callback is called whenever a data node is written.
The subscription should be very similar to the scenario subscription.
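A minimal sketch of the mechanism, with an illustrative class (not the actual taipy-core DataNode):

from typing import Any, Callable, List

class _DataNodeSketch:
    def __init__(self):
        self._subscribers: List[Callable] = []
        self._data = None

    def subscribe(self, callback: Callable):
        self._subscribers.append(callback)

    def write(self, data: Any):
        self._data = data
        # Notify every subscriber on each write, as requested.
        for callback in self._subscribers:
            callback(self, data)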
Rename the fs_base.__get_model method to a better name like fs_base.__get_model_path.
@jrobinAV commented on Fri Feb 11 2022
The purpose is to have end-to-end testing of taipy-core. We can take advantage of having taipy-rest to make it simpler to implement.
@jrobinAV commented on Thu Sep 23 2021
Define the back-end logging strategy: which logs and levels to expose.
Configuration: apply the log strategy to the various config objects (and propagate).
Run time: apply the log strategy to the various manager objects (and propagate).
Run time: apply the log strategy to the scheduler and the job execution (and propagate).
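A minimal sketch of one way to give each layer its own tunable level with the standard logging module (logger names are illustrative, not taipy-core's):

import logging

logging.basicConfig(level=logging.WARNING)  # global default

# One logger per layer, so each level can be tuned independently.
config_logger = logging.getLogger("taipy.config")
scheduler_logger = logging.getLogger("taipy.scheduler")
scheduler_logger.setLevel(logging.DEBUG)  # verbose job execution only

scheduler_logger.debug("job submitted")
config_logger.warning("missing attribute, using the default value")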
Description
Improve saving the data node's exposed type property to the repository, and add a unit test.
Acceptance Criteria
Description
The problem arises when trying to connect to an Azure SQL database.
This is the error:
(pyodbc.InterfaceError) ('IM002', '[IM002] [Microsoft][Gestionnaire de pilotes ODBC] Source de données introuvable et nom de pilote non spécifié (0) (SQLDriverConnect)')
(In English: data source name not found and no default driver specified.)
How to correct it
return "mssql+pyodbc:///?odbc_connect=" + urllib.parse.quote_plus(
f"DRIVER=FreeTDS;SERVER={host};PORT={port};DATABASE={database};UID={username};PWD={password};TDS_Version=8.0;"
)
->
return "mssql+pyodbc:///?odbc_connect=" + "DRIVER={ODBC Driver 18 for SQL Server};" + urllib.parse.quote_plus(
f"SERVER={host};PORT={port};DATABASE={database};UID={username};")+f"PWD={password};TDS_Version=8.0;"
db_port: int = 143,
->
db_port: int = 1433,
How to improve it
def configure_sql_data_node(
    id: str,
    db_username: str,
    db_password: str,
    db_name: str,
    db_engine: str,
    read_query: str,
    write_table: str,
    db_port: int = 143,
    scope: Scope = DataNodeConfig._DEFAULT_SCOPE,
    **properties,
):
->
def configure_sql_data_node(
    id: str,
    db_username: str,
    db_password: str,
    db_name: str,
    db_engine: str,
    read_query: str,
    write_table: str,
    db_port: int = 1433,
    db_host: str = 'localhost',
    scope: Scope = DataNodeConfig._DEFAULT_SCOPE,
    **properties,
):
How to reproduce
Run the demo 'classification_csv_churn_v2' from 'taipy_0_13' branch present here: https://github.com/Avaiga/demos
An error is raised because of the SQL datanode.
Expected behavior
We shouldn't have an error.
Runtime environment
Windows 11
taipy.delete(cycle) only deletes the cycle and does not propagate the deletion. We should propagate it to the scenarios attached to the cycle being deleted.
@florian-vuillemot commented on Wed Jan 12 2022
The back-end managers share a lot of code. Refactoring can help us remove some of it.
Acceptance criteria
Description
The pipeline model is a DAG representation of the pipeline. It actually holds the DAG as two lists of edges:
How to reproduce
The following unit test fails. It should not.
import taipy.core as tp

g = 0

def do_nothing():
    global g
    g += 1
    print('hello')

def test_submit_scenario_from_tasks():
    task_1_cfg = tp.configure_task("my_task_1", do_nothing)
    scenario_cfg = tp.configure_scenario_from_tasks("my_scenario", [task_1_cfg])
    scenario = tp.create_scenario(scenario_cfg)
    tp.submit(scenario)
    assert g == 1
@jrobinAV commented on Mon Jan 03 2022
Description
The GitHub Actions setup becomes big and can be refactored. Moreover, some bugs can be hiding in it (Avaiga/taipy#584).
pyarrow and python-dotenv packages in setup.py
Acceptance Criteria
@jrobinAV commented on Mon Jan 31 2022
What would that feature address
Implement a tag system on scenarios.
For each tag value, at most one scenario per cycle can have this tag value (see the sketch after this list).
A scenario can have multiple tags.
A tag may have no scenario in a given cycle.
A user can define any possible str value as a tag.
A user can retrieve all the tag values already defined.
A user can select all scenarios by tag.
A user can run all scenarios with a tag.
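A minimal sketch of the per-cycle uniqueness rule, with illustrative names (not a proposed taipy-core implementation):

class _TagRegistrySketch:
    def __init__(self):
        self._by_cycle_and_tag = {}  # (cycle_id, tag) -> scenario_id

    def tag(self, cycle_id, tag, scenario_id):
        # Tagging is a simple overwrite: at most one scenario holds a
        # given tag within a cycle, so re-tagging moves the tag.
        self._by_cycle_and_tag[(cycle_id, tag)] = scenario_id

    def all_tags(self):
        return {t for (_, t) in self._by_cycle_and_tag}

    def scenarios_with(self, tag):
        return [s for (_, t), s in self._by_cycle_and_tag.items() if t == tag]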
Description of the ideal solution
Acceptance Criteria