60 Days of Udacity - DataScience Challenge

Hi @all! I am Jens and this is my diary for my 60 Days of Udacity challenge for the Data Track Scholarship. I pledged to work through the foundations course for at least 30 minutes every day, and this Git repository documents my work. I started this challenge on December 11, 2020.

What can I say: When Udacity asked us to take the pledge to study for 30 minutes every day, I had already finished the course. So I decided to do the course again and coded the examples and projects first in Excel, then in Python.

Day 30 - 2021-01-09

💡 I could not learn that much today because one of our pets died. Nevertheless, I learned something about samples and sampling.

💡 I'm still taking the "to p or not to p" course on Coursera.

Day 29 - 2021-01-08

💡 I made progress on the Coursera statistics course. Now I have to wait until my assignment has been graded.

Day 28 - 2021-01-07

💡 Inspired by the work of a fellow scholar, I decided to restructure my Git repository similarly and added a diary. But there was a problem: I forgot that Slack does not let us access older posts, so I had to reconstruct some diary entries from memory. I hope they match the older Slack entries. Shit happens. I started the diary yesterday; now it is up to date.

💡 Today I started the "to p or not to p" course on Coursera.

Day 27 - 2021-01-06

💡 I did the project again with the same result. To rule out errors, I did it once more with sklearn and got the same result. I put the Diamond project, done with Python, statsmodels and sklearn, on my GitHub. You can find it here:
https://github.com/jegali/DataScience/blob/main/Lesson-4/Lesson-4.ipynb

💡 I finally found the time to install KNIME, and so far I have not regretted it. It is as easy to use as Alteryx; not as stylish, but it can do the same things and it is free. Attached you see a screenshot of the diamond project. KNIME gives me the same prediction as my Python code, with the same 0.09% deviation from Alteryx. So I believe Alteryx uses a different algorithm, which may cause the deviation.

Day 26 - 2021-01-05

💡 As promised, I coded the example from lesson 3-22 "Building your First Model in Alteryx" in Python with statsmodels. If you are interested, have a look at the source code at:
https://github.com/jegali/DataScience/blob/main/Lesson-3-22/Lesson-3-22.ipynb

💡 I did the Diamond practice project in Python with statsmodels and think I'm close to the recommended solution: instead of $8,230,695.69 I got $8,223,038.24. That is pretty close: 99.91% of the expected value, a deviation of 0.09%. Tomorrow I will check for a possible error and try to get closer.
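The percentages above are simple ratios. A minimal sketch of the arithmetic, using only the two dollar amounts quoted in this entry:

```python
# The two totals quoted above: recommended solution vs. the statsmodels result
expected = 8_230_695.69
actual = 8_223_038.24

ratio = actual / expected                      # share of the expected total reached
deviation = abs(expected - actual) / expected  # relative deviation

print(f"ratio:     {ratio:.2%}")      # prints "ratio:     99.91%"
print(f"deviation: {deviation:.2%}")  # prints "deviation: 0.09%"
```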

Day 25 - 2021-01-04

💡 Today I learned about Bayes' theorem and Laplace (classical) probability. Besides that, I did the example from Lesson 3-22 in Python with statsmodels and got exactly the same result as before in Alteryx. I will post the Python source code on my GitHub tomorrow.
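A short sketch of both ideas in plain Python. The die example and the screening-test numbers (1% prevalence, 90% sensitivity, 5% false-positive rate) are hypothetical values chosen for illustration, and `laplace_probability` and `bayes` are helper names made up for this sketch:

```python
from fractions import Fraction

def laplace_probability(favorable: int, possible: int) -> Fraction:
    """Classical (Laplace) probability: favorable / equally likely possible outcomes."""
    return Fraction(favorable, possible)

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) from the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Rolling an even number with a fair die: {2, 4, 6} out of 6 outcomes
p_even = laplace_probability(3, 6)

# Hypothetical screening test: P(positive | ill) = 9/10, P(ill) = 1/100,
# P(positive | healthy) = 5/100; result is P(ill | positive)
posterior = bayes(Fraction(9, 10), Fraction(1, 100), Fraction(5, 100))
```

With these assumed numbers, a positive test raises the probability of being ill from 1% to only 2/13 (about 15%), because most positive results come from the much larger healthy group.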

Day 24 - 2021-01-03

💡 This was a day without coding. I read my "Head First Statistics" book by O'Reilly and learned about probabilities.

Day 23 - 2021-01-02

💡 Today I learned the theory behind boxplots and quartiles. Good to know: you can use both to find outliers and to make an educated guess about the standard deviation and variance.
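A minimal sketch of how quartiles, the interquartile range (IQR) and Tukey's 1.5 × IQR rule flag outliers, using the standard library's `statistics.quantiles` on a hypothetical sample. The σ ≈ IQR / 1.349 estimate is a rule of thumb that only holds for roughly normally distributed data:

```python
import statistics

# Hypothetical sample with one suspicious value
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Quartiles as drawn in a boxplot ("inclusive" interpolates like Excel's QUARTILE.INC)
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Tukey's rule: values more than 1.5 * IQR outside the box count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Rule of thumb for roughly normal data: sigma ≈ IQR / 1.349 (variance ≈ sigma ** 2)
sigma_estimate = iqr / 1.349
```

Unlike the sample standard deviation, the IQR is not inflated by the outlier 30, which is why it is useful for this kind of educated guess.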

Day 22 - 2021-01-01

💡 Happy New Year! I found out that there is a relationship between nominal, ordinal and metric data on the one hand and mode, median and arithmetic mean on the other: the mode works for all three scale levels, the median needs at least ordinal data, and the mean needs metric data. I was fascinated by the ability to convert nominal data into ordinal data. Now I better understand the connection between dummy variables and nominal data. This will help me a lot in implementing the linear regression examples in Python.
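A small sketch of this relationship on hypothetical data with Python's `statistics` module; the ordinal grades are mapped to ranks (an assumed ordering) before taking the median:

```python
import statistics

# Hypothetical data, one list per scale level
colors = ["red", "blue", "red", "green", "red"]          # nominal: categories without order
grades = ["poor", "good", "very good", "good", "fair"]   # ordinal: categories with an order
grade_order = ["poor", "fair", "good", "very good"]      # assumed ranking of the grades
weights_kg = [2.5, 3.0, 3.5, 4.0, 7.0]                   # metric: differences are meaningful

# Mode: the most frequent value, usable on every scale level
mode_color = statistics.mode(colors)

# Median: needs an order, so map each grade to its rank and take the middle one
ranks = [grade_order.index(g) for g in grades]
median_grade = grade_order[statistics.median_low(ranks)]

# Arithmetic mean: needs numbers whose differences make sense
mean_weight = statistics.mean(weights_kg)
```

`median_low` is used so that the result is always one of the existing categories, even for an even number of values.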

Day 21 - 2020-12-31

💡 Today I tried something totally different: I had some spare Raspberry Pis and connected them into a cluster. Now I am trying to parallelize the regression examples in Python.

💡 I finished a bootcamp on statistics and data science on Udemy.

Day 20 - 2020-12-30

💡 Today I did a case study in Excel on portfolio management with mean value analysis, dispersion analysis, correlation analysis, rank correlation, box plots, data classification and multiple regression analysis. Quite interesting to see a "real-world scenario".
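Rank correlation (Spearman's rho) is the Pearson correlation computed on ranks instead of raw values. A minimal pure-Python sketch, with tied values getting their average rank; the data is hypothetical and chosen to be monotonic but not linear:

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: monotonic but clearly not linear
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
rho = spearman(x, y)  # 1 up to rounding: the order is perfectly preserved
r = pearson(x, y)     # noticeably below 1: the relationship is not a straight line
```

If SciPy is available, `scipy.stats.spearmanr` computes the same coefficient.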

Day 19 - 2020-12-29

💡 I started a course on descriptive statistics with Excel. In the next few days, I will transfer this knowledge to Python.

Day 18 - 2020-12-28

💡 I signed up to Medium and read a lot of articles. I did some experiments with dummy variables in Python to better understand multiple linear regression, and played around with seaborn for visualization.
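One common way to create dummy variables in Python is `pandas.get_dummies`. A minimal sketch on a hypothetical mini data set (the column names `carat` and `cut` are only borrowed from the diamond example for illustration):

```python
import pandas as pd

# Hypothetical mini data set with one metric and one nominal column
df = pd.DataFrame({
    "carat": [0.5, 0.7, 1.0, 1.2],
    "cut": ["Fair", "Ideal", "Good", "Ideal"],
})

# One 0/1 column per category of "cut". drop_first=True drops the first category
# in sorted order ("Fair"), which becomes the baseline and avoids perfect
# multicollinearity with the regression intercept (the "dummy variable trap").
dummies = pd.get_dummies(df, columns=["cut"], drop_first=True, dtype=int)
print(dummies)
```

Each remaining column is 1 if the row belongs to that category and 0 otherwise; the regression coefficients of `cut_Good` and `cut_Ideal` are then read relative to the dropped baseline category.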

Day 17 - 2020-12-27

💡 Too bad: I realized that my Slack account deletes older messages. I also found out that I cannot buy a Slack subscription just for myself if other people are in the channels too, so Slack wanted to charge me $17,000 for one month. I will have to reconstruct the days from Dec 11 to Dec 26 and my progress from memory.

💡 I still have no idea how to code dummy variables in Python.

💡 Thank god, the next example in the Udacity lesson is a multiple linear regression without nominal data, so I don't need any dummies.
https://github.com/jegali/DataScience/blob/main/lesson-3-12-multi-ticket-sample.ipynb
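A minimal sketch of a multiple linear regression with NumPy's least-squares solver. The data is hypothetical and generated without noise from y = 1 + 2·x1 + 3·x2, so the fitted coefficients should come back as 1, 2 and 3; it is not the ticket data from the notebook:

```python
import numpy as np

# Hypothetical predictors and a noise-free target y = 1 + 2*x1 + 3*x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: the coef that minimizes ||X @ coef - y||^2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict for a new observation x1 = 6, x2 = 2 (leading 1.0 for the intercept)
prediction = np.array([1.0, 6.0, 2.0]) @ coef
```

statsmodels and sklearn solve the same least-squares problem, so on identical inputs they should agree with this result up to floating-point rounding.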

Day 16 - 2020-12-26

💡 Today I did the Udacity Lesson 3 ticket example. I made some progress in using Python libraries that make my life more convenient. Have a look at the code here:
https://github.com/jegali/DataScience/blob/main/lesson-3-9-ticket-sample.ipynb

Day 15 - 2020-12-25

💡 Ho ho ho! I tried to figure out more of the math behind regression and learned about the sklearn and statsmodels packages. I fiddled around with some API calls and really like statsmodels now!

💡 Making progress in the "Head First Statistics" book.

Day 14 - 2020-12-24

💡 It's beginning to look a lot like Christmas! I extended the linear regression tutorial with a second part and did the math for the correlation coefficient and the coefficient of determination on my own. I also learned how to read the values from files instead of hard-coding them in the Python script. Have a look at the results here:
https://github.com/jegali/DataScience/blob/main/linear_regression_2.ipynb
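A minimal sketch of both quantities computed by hand on hypothetical data, with a cross-check that r² equals 1 − SS_res / SS_tot for a simple least-squares line:

```python
import math

# Hypothetical paired observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)  # total sum of squares SS_tot

r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson correlation coefficient
r_squared = r ** 2                 # coefficient of determination (simple regression)

# Cross-check with the least-squares line: R² = 1 - SS_res / SS_tot
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
r_squared_check = 1 - ss_res / s_yy
```

For this data r ≈ 0.775 and both ways of computing R² give 0.6, i.e. the line explains 60% of the variance of y.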

Day 13 - 2020-12-23

💡 I tried some basic statistics in Python, like counting word frequencies and visualizing the results in a bar plot.
https://github.com/jegali/DataScience/blob/main/word_frequency.ipynb
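A minimal word-count sketch with `collections.Counter` on a hypothetical sample sentence (the notebook linked above may do it differently):

```python
import re
from collections import Counter

# Hypothetical sample text
text = "The quick brown fox jumps over the lazy dog. The end."

# Lower-case the text and split it into words (letters and apostrophes only)
words = re.findall(r"[a-z']+", text.lower())

freq = Counter(words)      # word -> number of occurrences
top = freq.most_common(3)  # the three most frequent words with their counts
```

For the bar plot, `labels, counts = zip(*top)` followed by matplotlib's `plt.bar(labels, counts)` would draw the most frequent words.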

💡 I transferred the math behind linear regression to Python and did a first linear regression by hand. I wrote a short tutorial, which you can find here:
https://github.com/jegali/DataScience/blob/main/linear_regression.ipynb
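The closed-form least-squares formulas behind a simple linear regression, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄, fit in a few lines of plain Python. A sketch on hypothetical data; `fit_line` is a name made up for this example:

```python
def fit_line(x, y):
    """Least-squares line y = b0 + b1 * x from the closed-form formulas (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx           # slope
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = b0 + b1 * 10  # insert a new x value into the fitted formula
```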

💡 I finished reading the "Head First Data Analysis" book.

💡 I started the "Head First Statistics" book.

Day 12 - 2020-12-22

💡 I fiddled around with Jupyter Notebook and got a deeper understanding of this interactive way of using Python. I learned about the Markdown language that can be used in notebooks and took some first steps with numpy, scipy, matplotlib and sympy.

Day 11 - 2020-12-21

💡 I have no Alteryx license for my work laptop and will definitely not install software that has not been approved by our CIO. I found out that Python is allowed, so that was another important reason for me to switch from Alteryx to Python. I installed Miniconda, since Anaconda needs a paid license, which I do not have. I did some research on the packages installed with Anaconda and decided to build my own "data science package". Here is what I did: first I downloaded Miniconda, and then I ran these installs in a console window:

# Download Miniconda from: https://docs.conda.io/en/latest/miniconda.html
# After installation (you do not need administrative rights for that),
# open a console window and type these commands
# to download and install the desired packages.
# Installation is interactive: conda asks "Proceed ([y]/n)?", so answer "y" (or add -y to skip the prompt).

conda install -c conda-forge scipy
conda install -c conda-forge numpy
conda install -c conda-forge pandas
conda install -c conda-forge matplotlib
conda install -c conda-forge bokeh
conda install -c conda-forge plotly
conda install -c conda-forge pillow
conda install -c conda-forge statsmodels
conda install -c conda-forge bkcharts
conda install -c conda-forge dbf
conda install -c conda-forge libcurl
conda install -c conda-forge orange3
conda install -c conda-forge qt
conda install -c conda-forge pypi
conda install -c conda-forge pyviz
conda install -c conda-forge seaborn
conda install -c conda-forge spyder
conda install -c conda-forge sympy
conda install -c conda-forge miktex
conda install -c conda-forge vispy
conda install -c conda-forge altair vega_datasets
conda install -c conda-forge panel
conda install -c conda-forge dash
conda install -c conda-forge scikit-learn
conda install -c conda-forge scrapy
conda install -c conda-forge tensorflow
conda install -c conda-forge keras
conda install -c conda-forge pytorch
conda install -c conda-forge theano
conda install -c conda-forge nltk
conda install -c conda-forge xlsxwriter
conda install -c conda-forge xlutils
conda install -c conda-forge xlwings
conda install -c conda-forge jupyterlab

Day 10 - 2020-12-20

💡 I understood the theory behind linear regression. I still do not know why it is called "machine learning". It simply calculates a regression formula into which I can insert values. That has nothing to do with "intelligence" or "learning", I think.

Day 9 - 2020-12-19

💡 I started a data science course on Udemy but found out that I need more basics in statistics. So I decided to learn the math behind linear regression and did the calculation by hand in Excel.

Day 8 - 2020-12-18

💡 I wanted to learn more about Alteryx, so I took another course on Udemy and passed it today. But I have to say, this course is not worth the money. If you get it at a discount, it is OK.

Day 7 - 2020-12-17

💡 I started another Alteryx course with some real-world examples, but I was disappointed.

Day 6 - 2020-12-16

💡 I passed my first Alteryx course on Udemy today. This course is highly recommended!

Day 5 - 2020-12-15

💡 I browsed the Udemy website for courses and, guess what, they actually had discounts on Alteryx courses. So I decided to deepen my knowledge of that tool.

Day 4 - 2020-12-14

💡 I am still reading the "Head First" book.

💡 I searched the web for a Python distribution, came across Anaconda and installed it on my laptop. I fell in love with Jupyter Notebooks and decided to learn more about them.

Day 3 - 2020-12-13

💡 Today I read my "Head First Data Analysis" book, which I started some time ago. I now have a much better understanding of what's going on in the Data Track.

Day 2 - 2020-12-12

💡 Since I had already done all the tasks and the project with Alteryx, I did some research on the O'Reilly website and ordered two books on data science from Amazon. I am looking forward to receiving and reading them.

Day 1 - 2020-12-11

💡 I took the pledge and joined the Slack community for the 60 Days of Udacity challenge. I found a lot of learning material, which I sorted and saved for further reading.
