avaiga / taipy-core
A Python library to build powerful and customized data-driven back-end applications.
License: Apache License 2.0
@florian-vuillemot commented on Mon Nov 22 2021
We can improve the Taipy Docker image with the following elements:
Since we do not use the unidecode library, we need to remove the dependency as well.
Description
Make the sheet name optional, so that if no sheet name is provided:
Acceptance Criteria
What would that feature address
For every manager method that takes a single scenario as a parameter, create a similar method on the entity itself.
Description of the ideal solution
Both lines should do the same thing:
tp.submit(scenario)
scenario.submit()
Caveats
In order to avoid a circular dependency, the manager should be imported dynamically in the entity method, as sketched below.
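A minimal sketch of that dynamic-import pattern; the manager module path is an assumption for illustration:

class Scenario:
    def submit(self, callbacks=None, force=False):
        # The import happens at call time, not at module load time, so the
        # manager module (which itself imports Scenario) never creates an
        # import cycle. The module path below is an assumption.
        from taipy.core.scenario._scenario_manager import _ScenarioManager

        return _ScenarioManager._submit(self, callbacks=callbacks, force=force)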
Acceptance Criteria
pipeline.subscribe(callback: Callable[[Pipeline, Job], None])
pipeline.unsubscribe(callback: Callable[[Pipeline, Job], None])
pipeline.submit(callbacks: Optional[List[Callable]] = None, force: bool = False)
scenario.subscribe(callback: Callable[[Scenario, Job], None])
scenario.unsubscribe(callback: Callable[[Scenario, Job], None])
scenario.submit(callbacks: Optional[List[Callable]] = None, force: bool = False)
scenario.set_master() (or set_main)
scenario.add_tag(tag: str)
scenario.remove_tag(tag: str)
Do save a Generic data node after reading it.
config_id is missing from the ScenarioModel and PipelineModel models. It must be added so we can retrieve it after the entity is stored.
Description
The problem arises when trying to indicate an environment variable in write_fct_params/read_fct_params. The environment variables are not replaced by their actual values; we just get 'ENV[VAR_NAME]' in the parameter of the write_fct.
How to reproduce
Have a generic data node and an environment variable indicated in the write_fct_params (here the write_fct_params notation is from an old version).
results_blob_storage = tp.configure_generic_data_node(
    id="results_blob_storage",
    write_fct=write_blob_storage,
    write_fct_params={'connect_str': 'ENV[CONNECT_STR]'},
)
Expected behavior
The write_fct should receive the actual environment variable values, not the literal string 'ENV[VAR_NAME]'.
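A minimal sketch of the expected substitution, assuming a hypothetical helper that resolves the ENV[VAR_NAME] token against os.environ:

import os
import re

_ENV_PATTERN = re.compile(r"^ENV\[(\w+)\]$")

def _resolve_env(value):
    # Replace an 'ENV[VAR_NAME]' token with the value of the
    # corresponding environment variable, if it is defined.
    match = _ENV_PATTERN.match(value) if isinstance(value, str) else None
    if match:
        return os.environ.get(match.group(1), value)
    return value

With CONNECT_STR set in the environment, _resolve_env('ENV[CONNECT_STR]') returns its value instead of the raw token.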
Description
For DataNodeConfig, TaskConfig, PipelineConfig, and ScenarioConfig, the name
attribute should be renamed.
The name should change in variables, attributes, functions, docstrings, UTs
It can be done in multiple small PRs to avoid huge PRs and merge conflicts.
Proposals
id
Acceptance Criteria
Rename the name variable in configs and the config_name variables in entities.
@florian-vuillemot commented on Fri Feb 11 2022
To ensure we have the latest version of an object, we currently reload it from disk on each access.
A smarter approach would be to store on each object a timestamp of its latest reload, reflecting the last time it was filled from the content on disk.
If this timestamp matches that of the file containing its data, we do not need to reload it.
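A minimal sketch of that check, using the file's mtime as the timestamp (illustrative names, not the actual taipy-core API):

import json
import os

class _ReloadCache:
    def __init__(self, path):
        self._path = path
        self._loaded_at = None   # file mtime at the last reload
        self._content = None

    def get(self):
        mtime = os.path.getmtime(self._path)
        if self._content is None or self._loaded_at != mtime:
            # The file changed (or was never read): reload it.
            with open(self._path) as f:
                self._content = json.load(f)
            self._loaded_at = mtime
        return self._content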
Acceptance criteria
Description
The problem arises when trying to write to SQL database with a pd.DataFrame.
How to reproduce
Create a pandas dataframe and a SQL database of the same schema, and try to write the pd.DataFrame to the SQL database through a SQL data node.
How to temporarily solve it
Change in sql.py Line 138 (_write):
self.__insert_dicts(data.to_dict(orient="records"), write_table, connection)
->
data.to_sql(self.write_table, con=connection, if_exists='replace', index=False)
This means that self.__insert_dicts will not be called. It is a temporary fix.
Expected behavior
We shouldn't get any errors and the SQL database should be up to date with the values of the pd.DataFrame.
Description
The problem arises when trying to delete a scenario with tp.delete(scenario.id). An error is raised.
How to reproduce
Take the code here: https://github.com/Avaiga/taipy-getting-started. Add tp.delete(scenario.id) at the end of step_6.py and run it.
Expected behavior
We should get our scenario deleted, and all the objects associated with this scenario (and not referenced by another scenario) also deleted without errors.
Runtime environment
Taipy 0.13
This is just my suggestion on the taipy-core architecture, and I would like it to be discussed in the next sprint planning since this is a breaking change.
Suppose I want to build two applications in one project: one for predicting weather and one for predicting stock prices.
Currently, we only have one global config, which is a singleton. This means every entity shares the same environment as all the others, regardless of relatedness.
Config.add_scenario([predict_weather_pipeline])
Config.add_scenario([predict_stock_pipeline])
In the above example, all the data nodes, tasks, pipelines and scenarios of weather and stock will be saved in the same folder, even though they are never used with each other.
What I propose is that we should allow users to create multiple instances of the Taipy application (or Config). For example:
weather_app = Config()
weather_app.add_scenario([predict_weather_pipeline])
stock_app = Config()
stock_app.add_scenario([predict_stock_pipeline])
In the above example, since there are two different configs, we cannot access stock-related entities from the weather app (which is a good thing).
Even though having two different applications in the same project is not a common use case, I think this issue should be discussed to see if it is useful enough to justify the refactoring.
Description
The config attribute clean_entities_enabled should have a default value equal to "ENV[TAIPY_CLEAN_ENTITIES_ENABLED]".
At run time, when reading the value, if no env variable is defined, then the default value should be False.
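A minimal sketch of that runtime resolution, assuming the variable holds a textual boolean:

import os

def _clean_entities_enabled_default():
    # Read TAIPY_CLEAN_ENTITIES_ENABLED if it is defined; otherwise
    # fall back to False, as the issue requires.
    raw = os.environ.get("TAIPY_CLEAN_ENTITIES_ENABLED")
    if raw is None:
        return False
    return raw.strip().lower() in ("1", "true", "yes")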
Acceptance Criteria
Move airflow to taipy-airflow#1
The Airflow configs are still in the Taipy repository.
We should move them out and allow the Taipy config to call them.
Todo:
Acceptance criteria
Description
Hide all fields in entities and expose properties instead.
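A minimal sketch of the pattern, with illustrative names:

class Scenario:
    def __init__(self, config_id):
        # The field is stored under a private name...
        self._config_id = config_id

    @property
    def config_id(self):
        # ...and only exposed through a read-only property.
        return self._config_id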
Acceptance Criteria
Variables, functions, docstrings, UTs
Description
class _JobRepository(_FileSystemRepository[_JobModel, Job]):
    def __init__(self):
        super().__init__(model=_JobModel, dir_name="jobs")

    def _to_model(self, job: Job):
        return _JobModel(
            job.id,
            job._task.id,
            job._status,
            job._force,
            job._creation_date.isoformat(),
            [],
            self.__serialize_exceptions(job.exceptions),
        )

    def _from_model(self, model: _JobModel):
        job = Job(id=model.id, task=_TaskRepository().load(model.task_id))
        job.status = model.status  # type: ignore
        job.force = model.force  # type: ignore
        job.creation_date = datetime.fromisoformat(model.creation_date)  # type: ignore
        # for it in model.subscribers:
        #     try:
        #         job._subscribers.append(load_fct(it.get("fct_module"), it.get("fct_name")))
        #     except AttributeError:
        #         raise InvalidSubscriber(f"The subscriber function {it.get('fct_name')} cannot be load.")
        job._exceptions = [_load_fct(e["fct_module"], e["fct_name"])(*e["args"]) for e in model.exceptions]
        return job
The subscribers are not stored in the job_model.
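A possible shape for that serialization, mirroring the fct_module/fct_name keys used by the commented-out deserialization above (a sketch, not the committed fix):

def _serialize_subscribers(subscribers):
    # Store each callback as its module and function name so it can be
    # re-imported with _load_fct when the model is read back.
    return [{"fct_module": f.__module__, "fct_name": f.__name__} for f in subscribers]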
DataManager can create DataNode based on __DATA_NODE_CLASSES and __DATA_NODE_CLASS_MAP.
__DATA_NODE_CLASSES = {InMemoryDataNode, PickleDataNode, CSVDataNode, SQLDataNode, ExcelDataNode, GenericDataNode}
__DATA_NODE_CLASS_MAP = {ds_class.storage_type(): ds_class for ds_class in __DATA_NODE_CLASSES} # type: ignore
When we add a new DataNode type, we should also add it in the DataManager. But we have already forgotten to do so, because there is no test checking that each DataNode type can be created from the DataManager.
This bug is difficult to catch in the application because the exception is handled, and the error message is irrelevant.
How to reproduce:
class MyDataNode(InMemoryDataNode):
    ...

data_node = MyDataNode()
data_node.write('hello')
data_node.validity
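A sketch of the kind of guard test the issue calls for, using stand-in classes (not taipy-core's actual ones):

class DataNode:
    @classmethod
    def storage_type(cls):
        raise NotImplementedError

class InMemoryDataNode(DataNode):
    @classmethod
    def storage_type(cls):
        return "in_memory"

DATA_NODE_CLASSES = {InMemoryDataNode}
DATA_NODE_CLASS_MAP = {c.storage_type(): c for c in DATA_NODE_CLASSES}

def test_all_subclasses_registered():
    # A forgotten registration fails here instead of surfacing as an
    # irrelevant error message at runtime (checks direct subclasses only).
    for cls in DataNode.__subclasses__():
        assert cls in DATA_NODE_CLASSES, f"{cls.__name__} is not registered"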
Investigate the time spent on each part of the create calls.
Description
tp.create_scenarios takes a long time to execute, and the more scenarios there are, the longer it takes. In a complete application with a lot of scenarios, this causes a performance problem. We can also note that the code in general gets slower.
How to reproduce
Try the Step 9 or Step 13 code of taipy-getting-started: https://github.com/Avaiga/taipy-getting-started
Download the whole repo; each step is based on the previous ones. Then create a lot of scenarios (in the Scenario Manager for Step 13) and check in the log the time taken by this command.
Expected behavior
The time taken by this function should be lower.
Runtime environment
Please specify relevant indications.
Description
Add auto-save for list-type attributes, as sketched below:
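One possible approach, as an illustrative sketch (not the actual taipy-core implementation): a list subclass that notifies its owner on every mutation so the entity can be persisted immediately.

class _AutoSaveList(list):
    def __init__(self, iterable, on_change):
        super().__init__(iterable)
        self._on_change = on_change  # e.g. the manager's set()

    def append(self, item):
        super().append(item)
        self._on_change()

    def remove(self, item):
        super().remove(item)
        self._on_change()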
Acceptance Criteria
Description
Acceptance Criteria
The Job config is based on a class, whereas we could base it on a dict.
TODO:
Acceptance criteria
Description
The goal of this improvement is to delete the pickle files associated with the data nodes that are deleted when calling tp.delete_scenario/tp.delete_pipeline. This way, we can limit storage use when we have a lot of pipelines, scenarios, and data nodes.
Acceptance Criteria
Description
Replace the 3 integer validity fields in the data node and data node config with a single datetime.timedelta field.
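For illustration, the same information carried by three integers fits in one timedelta:

from datetime import timedelta

# Before (illustrative): validity_days, validity_hours, validity_minutes = 1, 2, 30
# After: one field carrying the same information.
validity_period = timedelta(days=1, hours=2, minutes=30)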
Acceptance Criteria
We should make private (with a leading '_' on function and attribute names) everything we want to keep internal.
Once private, these methods will no longer appear in the doc.
Check that the reference manual generation is correct.
Description
Replace subscribe_scenario and subscribe_pipeline in the taipy.py module with a single merged subscribe method. Here is a signature proposal:
def subscribe(callback: Callable[[Union[Scenario, Pipeline], Job], None], entity: Optional[Union[Scenario, Pipeline]] = None):
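A sketch of one possible dispatch behind that signature, assuming the existing per-type functions remain internally; the import paths and the entity=None fallback are assumptions, not decided design:

from typing import Callable, Optional, Union

# Assumed import paths, for illustration only:
from taipy.core import Job, Pipeline, Scenario
from taipy.core import subscribe_pipeline, subscribe_scenario

def subscribe(
    callback: Callable[[Union[Scenario, Pipeline], Job], None],
    entity: Optional[Union[Scenario, Pipeline]] = None,
):
    # Route to the existing per-type functions based on the entity type.
    # Treating entity=None as scenario-level subscription is an assumption.
    if entity is None or isinstance(entity, Scenario):
        return subscribe_scenario(callback, entity)
    return subscribe_pipeline(callback, entity)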
Acceptance Criteria
Description
The purpose is to make the usage of the set method unnecessary for the end user.
Replace useless manager.set calls everywhere they are no longer necessary.
Proposed solution
Create a @property setter for every exposed attribute in every entity.
We will also want a with construct to batch the changes within a ContextManager.
Example
scenario._master_scenario = False
cls.set(master_scenario)
should be replaced by
scenario._master_scenario = False
With the with construct:
with scenario as s:
    s.bla
    s.blo
    s.blu
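A minimal sketch of how the ContextManager could batch changes, with illustrative names (not the actual taipy-core API):

class _Entity:
    def __init__(self):
        self._in_context = False

    def _save(self):
        # Stand-in for the manager's set(): persist the entity once.
        print("persisted to disk")

    def __enter__(self):
        # Inside the block, attribute writes only touch the in-memory object.
        self._in_context = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._in_context = False
        self._save()  # a single save when the block exits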
Acceptance Criteria
Description
The display_name property of a scenario should be renamed to name.
We can also expose it as a direct property of the Scenario class so we get auto-completion.
Acceptance Criteria
Description
What this improvement addresses (performance, API...).
Acceptance Criteria
Description
The tasks of a second pipeline are not executed when the output of the first task has been cached.
How to reproduce
Put 'cacheable=True' in cleaned_dataset_cfg in the code from https://github.com/Avaiga/taipy-getting-started and run step 9, 10, 11, 12, or 13.
Expected behavior
Tasks of the second pipeline should be executed. Here, only the ML prediction should be executed, not the first task clean_data that the first pipeline has already done.
Runtime environment
Taipy 0.13
Description
Replace both List and Dict with Tuple only.
Acceptance Criteria
Implement a data node subscription where a callback is called whenever a data node is written.
The subscription should be very similar to the scenario subscription.
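A minimal sketch of the mechanism, with an illustrative class (not the actual taipy-core DataNode):

from typing import Any, Callable, List

class _DataNodeSketch:
    def __init__(self):
        self._subscribers: List[Callable] = []
        self._data = None

    def subscribe(self, callback: Callable):
        self._subscribers.append(callback)

    def write(self, data: Any):
        self._data = data
        # Notify every subscriber on each write, as requested.
        for callback in self._subscribers:
            callback(self, data)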
Rename the fs_base.__get_model method to a better name like fs_base.__get_model_path.
@jrobinAV commented on Fri Feb 11 2022
The purpose is to have end-to-end testing of taipy-core. We can take advantage of having taipy-rest to make it simpler to implement.
@jrobinAV commented on Thu Sep 23 2021
Define the back-end logging strategy: which logs and levels to expose.
Configuration: apply the log strategy to the various config objects (and propagate).
Run time: apply the log strategy to the various manager objects (and propagate).
Run time: apply the log strategy to the scheduler and the job execution (and propagate).
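A minimal sketch of one way to give each layer its own tunable level with the standard logging module (logger names are illustrative, not taipy-core's):

import logging

logging.basicConfig(level=logging.WARNING)  # global default

# One logger per layer, so each level can be tuned independently.
config_logger = logging.getLogger("taipy.config")
scheduler_logger = logging.getLogger("taipy.scheduler")
scheduler_logger.setLevel(logging.DEBUG)  # verbose job execution only

scheduler_logger.debug("job submitted")
config_logger.warning("missing attribute, using the default value")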
Description
Improve saving the data node's exposed type property to the repository, and add a unit test.
Acceptance Criteria
Description
The problem arises when trying to connect to an Azure SQL database.
This is the error:
(pyodbc.InterfaceError) ('IM002', '[IM002] [Microsoft][Gestionnaire de pilotes ODBC] Source de données introuvable et nom de pilote non spécifié (0) (SQLDriverConnect)')
(In English: data source name not found and no default driver specified.)
How to correct it
return "mssql+pyodbc:///?odbc_connect=" + urllib.parse.quote_plus(
f"DRIVER=FreeTDS;SERVER={host};PORT={port};DATABASE={database};UID={username};PWD={password};TDS_Version=8.0;"
)
->
return "mssql+pyodbc:///?odbc_connect=" + "DRIVER={ODBC Driver 18 for SQL Server};" + urllib.parse.quote_plus(
f"SERVER={host};PORT={port};DATABASE={database};UID={username};")+f"PWD={password};TDS_Version=8.0;"
db_port: int = 143,
->
db_port: int = 1433,
How to improve it
def configure_sql_data_node(
    id: str,
    db_username: str,
    db_password: str,
    db_name: str,
    db_engine: str,
    read_query: str,
    write_table: str,
    db_port: int = 143,
    scope: Scope = DataNodeConfig._DEFAULT_SCOPE,
    **properties,
):
->
def configure_sql_data_node(
    id: str,
    db_username: str,
    db_password: str,
    db_name: str,
    db_engine: str,
    read_query: str,
    write_table: str,
    db_port: int = 1433,
    db_host: str = 'localhost',
    scope: Scope = DataNodeConfig._DEFAULT_SCOPE,
    **properties,
):
How to reproduce
Run the demo 'classification_csv_churn_v2' from 'taipy_0_13' branch present here: https://github.com/Avaiga/demos
An error is raised because of the SQL datanode.
Expected behavior
We shouldn't have an error.
Runtime environment
Windows 11
taipy.delete(cycle) only deletes the cycle and does not propagate the deletion. We should propagate it to the scenarios attached to the cycle being deleted.
@florian-vuillemot commented on Wed Jan 12 2022
The back-end managers share a lot of code. Refactoring can help us remove some of it.
Acceptance criteria
Description
The pipeline model is a DAG representation of the pipeline. It actually holds the DAG as two lists of edges:
How to reproduce
The following unit test fails. It should not.
import taipy.core as tp

g = 0

def do_nothing():
    global g
    g += 1
    print('hello')

def test_submit_scenario_from_tasks():
    task_1_cfg = tp.configure_task("my_task_1", do_nothing)
    scenario_cfg = tp.configure_scenario_from_tasks("my_scenario", [task_1_cfg])
    scenario = tp.create_scenario(scenario_cfg)
    tp.submit(scenario)
    assert g == 1
@jrobinAV commented on Mon Jan 03 2022
Description
The GitHub Actions setup becomes big and can be refactored. Moreover, some bugs can be hiding in it (Avaiga/taipy#584).
pyarrow and python-dotenv packages in setup.py
Acceptance Criteria
@jrobinAV commented on Mon Jan 31 2022
What would that feature address
Implement a tag system on scenarios.
For each tag value, at most one scenario per cycle can have this tag value (see the sketch after this list).
A scenario can have multiple tags.
A tag may have no scenario in a given cycle.
A user can define any possible str value as a tag.
A user can retrieve all the tag values already defined.
A user can select all scenarios by tag.
A user can run all scenarios with a tag.
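A minimal sketch of the per-cycle uniqueness rule, with illustrative names (not a proposed taipy-core implementation):

class _TagRegistrySketch:
    def __init__(self):
        self._by_cycle_and_tag = {}  # (cycle_id, tag) -> scenario_id

    def tag(self, cycle_id, tag, scenario_id):
        # Tagging is a simple overwrite: at most one scenario holds a
        # given tag within a cycle, so re-tagging moves the tag.
        self._by_cycle_and_tag[(cycle_id, tag)] = scenario_id

    def all_tags(self):
        return {t for (_, t) in self._by_cycle_and_tag}

    def scenarios_with(self, tag):
        return [s for (_, t), s in self._by_cycle_and_tag.items() if t == tag]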
Description of the ideal solution
Acceptance Criteria