01-edu / branch-ai

🧠 This repository used to contain all the AI Branch content of the @01-edu curriculum (everything has been moved to our public repository)

branch-ai's Issues

Week 2 day 3 Ex 5 Categorical variables: Incorrect order of output.

The first question in the audit asks for the number of unique values in each column. It expects this output:

age             3
menopause      11
tumor-size      7
inv-nodes       2
node-caps       3
deg-malig       2
breast          5
breast-quad     2
irradiat        2
dtype: int64

But looking at the readme that comes with the data, that's not the order of the columns. It appears to be shifted. With the right order the result I get is:

age             6
menopause       3
tumor-size     11
inv-nodes       6
node-caps       2
deg-malig       3
breast          2
breast-quad     5
irradiat        2
dtype: int64

For instance, the possible values of menopause are lt40, ge40, and premeno, so it can have at most 3 unique values.

The rest of the exercise was done with this incorrect interpretation of the data, so it should be corrected accordingly.
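
A minimal sketch of the check, using only the standard library. The three sample rows are made up for illustration, and the column list is an assumption based on the order quoted above, with the class label in the first position. It shows how a names list that is off by one position attributes each count to the wrong column. In the exercise itself, pandas `nunique()` would report the same kind of counts.

```python
import csv
import io

# Hypothetical sample rows: class label first, then the nine attributes.
SAMPLE = """no-recurrence-events,30-39,premeno,30-34,0-2,no,3,left,left_low,no
no-recurrence-events,40-49,premeno,20-24,0-2,no,2,right,right_up,no
recurrence-events,60-69,ge40,15-19,0-2,no,2,left,left_up,yes
"""

# Assumed column order (label first, attributes in the order listed above).
COLUMNS = [
    "class", "age", "menopause", "tumor-size", "inv-nodes",
    "node-caps", "deg-malig", "breast", "breast-quad", "irradiat",
]


def unique_counts(csv_text, columns):
    """Count distinct values per column, pairing names with fields by position."""
    seen = {name: set() for name in columns}
    for row in csv.reader(io.StringIO(csv_text)):
        for name, value in zip(columns, row):
            seen[name].add(value)
    return {name: len(values) for name, values in seen.items()}


correct = unique_counts(SAMPLE, COLUMNS)
shifted = unique_counts(SAMPLE, COLUMNS[1:])  # names off by one position

for name in COLUMNS[1:]:
    print(f"{name:<12} correct={correct[name]} shifted={shifted[name]}")
```

With the shifted names, "menopause" reports the count that belongs to "age", which is the kind of mismatch described above.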

Week 3 day 4 ex2 Inaccurate audit output

When removing the punctuation, the spaces should not be removed as well. The audit file also removes the last space.

Right output would be: Remove this from the sentence , with 2 spaces after from.
Output in the audit is Remove this from the sentence.

If extra spaces should be removed, that should be specified in the description, and in that case the extra space after from should be removed as well. Either way, what the audit file asks for is not correct.
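
The two possible behaviours can be sketched as follows. The input sentence is a hypothetical one chosen to reproduce the situation (punctuation surrounded by spaces), not the exact string from the exercise.

```python
import string

# Translation table that deletes every ASCII punctuation character and
# leaves all other characters, including spaces, untouched.
_DROP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Remove punctuation only; whitespace is preserved as-is."""
    return text.translate(_DROP_PUNCTUATION)


def remove_punctuation_and_extra_spaces(text: str) -> str:
    """Remove punctuation, then collapse runs of whitespace and trim the ends."""
    return " ".join(remove_punctuation(text).split())


sample = "Remove, this from .? the sentence !"  # hypothetical input
print(repr(remove_punctuation(sample)))
# 'Remove this from  the sentence '
print(repr(remove_punctuation_and_extra_spaces(sample)))
# 'Remove this from the sentence'
```

The first function keeps the two spaces after "from" and the trailing space. The second removes both. Neither produces a result where only the trailing space is gone.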

Week 3 day 3 ex 2 The data file is not there and there is no clue where to get it from

I managed to get it from plotly datasets, and then checked Lee's solution and that's the one he used.

It is not clear if that is the data-set or not. A link or some indication of where to get it from, or perhaps providing the actual file could solve this problem.

Week 1 Day 4 Ex4 Missing import

When trying to run the exercise as the audit suggests, the tabulate package is needed to run the function to_markdown().

This package is not listed as a requirement in the readme file when setting up the environment.
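
Until the readme lists the package, a notebook could fail early with a clearer message. The helper below is a hypothetical sketch (the name `require` and the hint text are not part of the exercise); it is demonstrated on a standard-library module so that it runs anywhere.

```python
import importlib


def require(module_name: str, install_hint: str):
    """Import a module, or raise an ImportError that says how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required here; install it with: {install_hint}"
        ) from exc


# Demonstration with a module that is always available.
json_module = require("json", "n/a (standard library)")
print(json_module.dumps([1, 2, 3]))
```

In the exercise, calling `require("tabulate", "pip install tabulate")` before `to_markdown()` would surface the missing dependency with an actionable message.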

week01/day01/ex00

The exercise asks the learner to install Python or conda and packages, assuming that he or she has the access needed to do that, which is not the case on the campus computers.

Even assuming that they have the access (root or sudo) to install Python or conda, the audit needs to be done on the same computer where the learner prepared the environment, and I don't think that is possible in all cases, as the learners are using the campus computers. So it would be better for the exercise to ask for a shell script that automates preparing and installing the environment and runs the jupyter-notebook. Learners can then update the script as they progress through the piscine.

Also, for the first question:
1. Create a virtual environment named ex00, with Python 3.8, with the following libraries: numpy, jupyter.

I think it makes more sense for them to create a piscine_Ai environment and to push the script file along with ex00, which contains the notebook file.

Week 2 day 4 ex4 plot_roc_curve is deprecated

When solving it I get a warning saying that plot_roc_curve is deprecated and will be removed in a future version.

Week 3 day 5 Different results

In this quest there are many exercises where I am getting different results than the ones in the audit file. I don't know if it is because I am doing something wrong, because the pipelines were updated, because there is some random component to the result, or because of another reason.

Checking Lee's results, they are different from mine, and different from the ones in the audit file. So perhaps the reason is randomness to some degree? If this is not made clear, I think it will be a source of issues when learners audit this exercise in the future, since they won't know how to proceed.

Week 2 Day 1 ex 6 question 4 mismatch between result and the one required in audit file

When doing this exercise with the current data set I get different values than the ones required in the audit.

This is probably because the data being used now contains more recent entries that were not there when the exercise was created, about a year ago. Since the data-set is provided by the library, it was likely updated by now. However, I double-checked and the toy data-sets don't get updated, so it might be something else. In any case, if the audit asks for a specific result, this should be addressed.

Perhaps it would be better to provide the data as a file, or to test if the student is accurately getting the expected results in a way that is not dependent on updating the expected values according to new data every given time.
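
If the data were shipped as a file, one way to confirm that learner and auditor work from identical data is to compare a checksum. This is only a sketch: the function names are hypothetical, and the reference digest would have to be published alongside the exercise.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(path, expected_hex):
    """True when the file on disk has the digest published with the exercise."""
    return sha256_of(path) == expected_hex.lower()
```

An audit question could then read "does the data file match the published checksum?" before any numeric results are compared.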

Week 1 Day 5 Ex4 Data is not there

It is not clear whether the data file is supposed to be provided (it is not there) or whether the student should get it from the web.

Week 3 day 1 Ex1 The neuron. Python conventions.

As someone new to Python, I did some research on naming conventions and found that the first argument of a class method should be named cls, and the first argument of an instance method should be named self.

When defining the Neuron Class shouldn’t this convention be applied?
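
For reference, the convention looks like this on a small class. This `Neuron` is a hypothetical illustration (the attribute names, the `zeros` constructor, and the sigmoid activation are assumptions), not the class from the exercise.

```python
import math


class Neuron:
    def __init__(self, weights, bias):
        # Instance methods receive the instance as their first argument: self.
        self.weights = list(weights)
        self.bias = bias

    @classmethod
    def zeros(cls, n_inputs):
        # Class methods receive the class itself as their first argument: cls.
        return cls([0.0] * n_inputs, 0.0)

    def feedforward(self, inputs):
        # Weighted sum of the inputs plus the bias, passed through a sigmoid.
        total = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return 1.0 / (1.0 + math.exp(-total))


neuron = Neuron.zeros(2)
print(neuron.feedforward([1.0, 2.0]))  # sigmoid(0) = 0.5
```

So a class whose methods all operate on an instance uses self throughout, and cls only appears in methods decorated with `@classmethod`.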

Remove all todos

Project 4 !!!NO LEAKAGE!!!

When trying to understand from the point of view of a learner what is required to be done, I found it difficult to understand what is meant by leakage by reading the document or the small guide provided. I don't remember if this was addressed in the piscine.

When doing a Google search I found this definition quite clear: https://en.wikipedia.org/wiki/Leakage_(machine_learning).
Do you think it would be a good idea to add that link, to make it easier to gather the required concepts, as is done in other parts of the branch? Or should it be left to each learner to search and find it by themselves?

Financial data missing project 4

Raid 02 Files missing

The description is mentioning 3 files: train.csv, test.csv, and covtype.data.
The files are not there.
Also the next line mentions covtype.info, it's not clear if this is a mistake or another file.

What data set to use? Week 3 Day 3 ex 5

The description of the exercise says to use the flower data-set. I assumed that it means the Iris data-set from sklearn. A link or a mention of where to get the data-set from, or if it was used in a previous exercise, a reference to that exercise would be good.

Week 2 day 5 ex04 Validation curve and learning curve, plot doesn't look the same

When trying to solve the first question I get a plot that looks a bit different.

I looked at Lee's solution to see if there was something wrong in my approach and his plot is closer to the one in the audit. By comparing the code I realized that it's using RandomForestRegressor instead of what the exercise description appears to suggest to use (RandomForestClassifier).

Would that be the reason why the plot looks a bit different? Should I use RandomForestRegressor instead?

Also, apart from that, if there is some random element in the audit that might make the output not identical, I would suggest making that explicit.
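
The effect of a random element can be illustrated with the standard library alone. The function below is a stand-in for any computation that draws random numbers; scikit-learn estimators such as RandomForestClassifier expose a `random_state` parameter that plays the same role as the seed here.

```python
import random


def noisy_scores(seed, n=5):
    """Stand-in for a computation with a random component."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [round(rng.random(), 3) for _ in range(n)]


# Same seed: identical results on every run.
print(noisy_scores(seed=0) == noisy_scores(seed=0))  # True
# Different seeds: the results generally differ.
print(noisy_scores(seed=0) == noisy_scores(seed=1))
```

If the exercise fixed and stated the seed, the audit could expect identical plots. Otherwise it should say that small differences are acceptable.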

Week 3 day 5 ex3 missing imports

This exercise needs some libraries (sklearn and matplotlib) that are not listed in ex00 when setting up the environment.

week 3 day 5 en_core_web_sm and en_core_web_md installation

These pipelines have to be installed before they can be used. Some indication of this would be good.
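
spaCy pipelines are installed as regular Python packages, so a notebook can check for them before loading. A small sketch, with hypothetical helper names, that prints the download command for each pipeline that is not importable yet:

```python
import importlib.util
import sys

PIPELINES = ("en_core_web_sm", "en_core_web_md")


def missing_pipelines(names=PIPELINES):
    """Return the pipeline packages that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def download_commands(names):
    """Build the spaCy download command for each pipeline name."""
    return [f"{sys.executable} -m spacy download {name}" for name in names]


for command in download_commands(missing_pipelines()):
    print(command)
```

Running the printed commands in a terminal (with spaCy already installed) makes the pipelines available to `spacy.load`.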

Week 3 day 5 ex6 missing file

This exercise can't be solved because the text file is not there, and there aren't any links or hints suggesting how to find it.

Week 3 day 2 ex01 Sequential. Inaccurate output.

The audit is asking for <tensorflow.python.keras.engine.sequential.Sequential object at xxx>, and I get <keras.engine.sequential.Sequential at 0x7fc8eb75c8b0>.
It would be better to rephrase it so that the output only has to end with keras.engine.sequential.Sequential at xxx
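
A tolerant check along those lines could be sketched with a regular expression. The pattern below is an assumption built from the two outputs quoted above; other library versions may print a different module path.

```python
import re

# Accepts both quoted forms: with or without the "tensorflow.python." prefix,
# with or without the word "object", and with any memory address.
_SEQUENTIAL_REPR = re.compile(
    r"keras\.engine\.sequential\.Sequential(?: object)? at \w+>$"
)


def looks_like_sequential(repr_text: str) -> bool:
    """True when the printed representation names a Keras Sequential model."""
    return _SEQUENTIAL_REPR.search(repr_text.strip()) is not None


print(looks_like_sequential(
    "<tensorflow.python.keras.engine.sequential.Sequential object at 0x7fc8eb75c8b0>"
))  # True
print(looks_like_sequential(
    "<keras.engine.sequential.Sequential at 0x7fc8eb75c8b0>"
))  # True
```

Checking the type directly, for example with `isinstance(model, tf.keras.Sequential)`, avoids depending on the printed text at all.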

Audits should be question based

Audits should ask a question that can be answered with a yes or no. Most of the audit files don’t follow this, and the format is wrong.

Also the questions should be phrased in a way that a yes means a pass.

The right markdown to fit the format of the platform should be questions as h6 and instructions as h5

Week 2 Day 1 ex1 question 1

It's not clear how to audit that question. What does it mean by the output? When I try to return or print the regression I get nothing.

Week 3 Day 2 Ex4 Optimize.

sklearn is not included in the libraries when setting up the environment in ex00.

Also, for question 2, the epoch 50 accuracy in my case is 90%, and in Lee's it is 85%, with the same code. Perhaps allow more margin for randomness?
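
A margin like that can be expressed as an explicit tolerance. In this sketch the expected value (0.87) and the margin (0.05) are made-up numbers for illustration; only the 90% and 85% figures come from the runs described above.

```python
import math


def within_tolerance(observed, expected, margin):
    """True when observed is within an absolute margin of expected."""
    return math.isclose(observed, expected, rel_tol=0.0, abs_tol=margin)


EXPECTED_ACCURACY = 0.87  # hypothetical reference value
MARGIN = 0.05             # hypothetical allowed deviation

for accuracy in (0.90, 0.85, 0.70):
    print(accuracy, within_tolerance(accuracy, EXPECTED_ACCURACY, MARGIN))
```

Both 0.90 and 0.85 pass with these numbers, while 0.70 does not. The audit question would then state the accepted range rather than a single value.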

Week 2 Day 2 ex 1 question 1

When I return or print the logistic regression what I get is LogisticRegression(random_state=0). It's not clear what is meant by return in the audit.

Project 3 Link not working, project moved elsewhere

The link - https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_gui/py_video_display/py_video_display.html takes you to a page that says that the project is abandoned. Also making the link easy to copy with one click is a great idea, but it would be better without the - at the beginning so that it can be directly pastable.

Project 4 TODOS on the description

There are some TODO annotations in the file describing the project that haven't been addressed.
