Hi @all! I am Jens, and this is my diary for the 60 Days of Udacity challenge of the Data Track Scholarship. I pledged to work through the Foundations course for at least 30 minutes every day, and this Git repository documents my work. I started the challenge on December 11, 2020.
What can I say: when Udacity asked us to take the pledge to study for 30 minutes every day, I had already finished the course. So I decided to take it again, coding the examples and projects first in Excel and then in Python.
💡 I could not learn much today; one of our pets died. Nevertheless, I learned something about samples and sampling.
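To make "samples and sampling" a bit more concrete, here is a minimal sketch, assuming a made-up population of the integers 1 to 1000: it draws a simple random sample without replacement and compares the sample statistics with the population mean. The population, the sample size of 50 and the seed are illustrative assumptions, not course data.

```python
import random
import statistics

def draw_sample(population, n, seed=42):
    """Draw a simple random sample of size n without replacement."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(population, n)

# Hypothetical population: the integers 1..1000
population = list(range(1, 1001))
sample = draw_sample(population, 50)

print("population mean:", statistics.mean(population))
print("sample mean:    ", statistics.mean(sample))
# statistics.stdev divides by n - 1, the usual estimator for a sample
print("sample stdev:   ", statistics.stdev(sample))
```

Changing the seed shows how the sample mean scatters around the population mean of 500.5.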
💡 I'm still working through the "to p or not to p" course on Coursera.
💡 I made progress in the Coursera statistics course. Now I have to wait until my assignment has been graded.
💡 Inspired by the work of a fellow scholar, I decided to restructure my Git repository in a similar way. I added a diary but ran into a problem: I forgot that Slack does not let us access older posts, so I will have to write some diary entries from memory. I hope they match the older Slack entries. Shit happens. I started the diary yesterday; now it is up to date.
💡 Today I started the "to p or not to p" course on Coursera.
💡 I redid the project and got the same result. To rule out errors, I did it once more with sklearn and again got the same result.
I put the Diamond Project, done with Python, statsmodels and sklearn, on my GitHub. You can find it here:
https://github.com/jegali/DataScience/blob/main/Lesson-4/Lesson-4.ipynb
💡 I finally found the time to install KNIME, and I have not regretted it so far. It is as easy to use as Alteryx; not as stylish, but it can do the same things and is free. Attached is a screenshot of the Diamond project. KNIME gives me the same prediction as my Python code and shows the same 0.09% deviation from Alteryx. So I believe Alteryx uses a different algorithm, which may cause the deviation.
💡 As promised, I coded the example from Lesson 3-22 "Building your First Model in Alteryx" in Python with statsmodels. If you are interested, have a look at the source code here:
https://github.com/jegali/DataScience/blob/main/Lesson-3-22/Lesson-3-22.ipynb
💡 D26: I did the Diamond practice project in Python with statsmodels and think I'm close to the recommended solution: instead of $8,230,695.69 I got $8,223,038.24. Pretty close; that's 99.91% of the expected value, a deviation of 0.09%. Tomorrow I will check for a possible error and try to get closer.
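The percentages are easy to double-check in Python. This is just a sketch; the helper name `relative_deviation` is hypothetical, and the two amounts are the ones from the Diamond practice project above.

```python
def relative_deviation(expected, actual):
    """Return (actual as % of expected, deviation in %)."""
    ratio = actual / expected * 100
    return ratio, 100 - ratio

# Values from the Diamond practice project
expected = 8_230_695.69   # recommended solution
actual = 8_223_038.24     # my statsmodels result
ratio, deviation = relative_deviation(expected, actual)
print(f"{ratio:.2f}% of the expected value, deviation {deviation:.2f}%")
```

The absolute gap is $7,657.45, which is about 0.093% of the expected value.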
💡 Today I learned Bayes' theorem and Laplace (classical) probability. Besides that, I did the example from Lesson 3-22 in Python with statsmodels and got exactly the same result as before in Alteryx. I will post the Python source code on my GitHub tomorrow.
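Both ideas fit in a few lines. In this minimal sketch, Laplace probability counts favorable outcomes over equally likely possible outcomes, and Bayes' theorem computes P(A|B) from P(B|A), with P(B) taken from the law of total probability. The die example and the test numbers (1% prevalence, 95% sensitivity, 10% false-positive rate) are made-up illustrations, not course material.

```python
from fractions import Fraction

def laplace_probability(favorable, possible):
    """Classical (Laplace) probability: favorable / possible equally likely outcomes."""
    return Fraction(len(favorable), len(possible))

def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes' theorem; P(B) comes from the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Laplace: probability of rolling an even number with a fair die
die = range(1, 7)
even = [x for x in die if x % 2 == 0]
print(laplace_probability(even, die))  # 1/2

# Bayes: hypothetical test with 1% prevalence, 95% sensitivity, 10% false positives
posterior = bayes(Fraction(1, 100), Fraction(95, 100), Fraction(10, 100))
print(posterior, float(posterior))
```

The posterior of 19/217 (about 8.8%) shows why a positive result for a rare condition is far less conclusive than the 95% sensitivity suggests. Fractions keep the arithmetic exact.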
💡 This was a day without coding. I read my O'Reilly "Head First Statistics" book and learned about probability.
💡 Today I learned the theory behind box plots and quartiles. Good to know: you can use both to find outliers and to make an educated guess about the standard deviation and variance.
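Here is a sketch of the box-plot outlier rule, assuming the common convention of fences at 1.5 × IQR beyond the quartiles and, for the educated guess, roughly normal data (for a normal distribution the IQR is about 1.349 standard deviations). The data are made up, and `method="inclusive"` is one of several quartile definitions, so other tools may report slightly different quartiles.

```python
import statistics

def box_plot_summary(data):
    """Quartiles, IQR and outlier fences (1.5 * IQR) for a box plot."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "fences": (lower, upper), "outliers": outliers}

# Made-up measurements with one suspicious value
data = [2, 4, 4, 5, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(data)
print(summary)

# Educated guess, valid only for roughly normal data: IQR ≈ 1.349 * sigma
sigma_guess = summary["iqr"] / 1.349
print("sigma guess:", sigma_guess, "variance guess:", sigma_guess ** 2)
```

Because the quartiles ignore the extreme value 30, this guess is much less affected by the outlier than `statistics.stdev(data)` would be.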
💡 Happy New Year! I found out that there is a relationship between nominal, ordinal and metric data on the one hand and mode, median and arithmetic mean on the other (the mode suits nominal data, the median ordinal data and the mean metric data). I was fascinated by the possibility of converting nominal data into ordinal data. Now I better understand the connection between dummy variables and nominal data. This will help me a lot in implementing the linear regression examples in Python.
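A small sketch of that relationship, with made-up data: the mode for nominal data, the median for ordinal data (after mapping the ordered categories to ranks) and the arithmetic mean for metric data. The category names and the rank mapping `order` are illustrative assumptions.

```python
import statistics

# Nominal: categories without order -> the mode is meaningful
colors = ["red", "blue", "red", "green", "red", "blue"]
print("mode:", statistics.mode(colors))

# Ordinal: ordered categories -> map to ranks, then the median is meaningful
order = {"low": 1, "medium": 2, "high": 3}   # hypothetical rank mapping
rank_names = {rank: name for name, rank in order.items()}
ratings = ["low", "high", "medium", "medium", "high"]
ranks = [order[r] for r in ratings]
median_rank = statistics.median(ranks)
print("median:", rank_names[median_rank])

# Metric: numeric measurements -> the arithmetic mean is meaningful
weights = [0.5, 0.7, 1.0, 1.2, 1.6]   # e.g. carat weights (made up)
print("mean:", statistics.mean(weights))
```

A mean of "red" and "blue" makes no sense, and a mean of ranks assumes equal distances between "low", "medium" and "high", which ordinal data does not guarantee.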
💡 Today I tried something totally different: I had some spare Raspberry Pis and connected them into a cluster. Now I am trying to parallelize the regression examples in Python.
💡 I finished a bootcamp on statistics and data science on Udemy.
💡 Today I did a case study in Excel on portfolio management, with mean value analysis, scatter analysis, correlation analysis, rank correlation, box plots, data classification and multiple regression analysis. Quite interesting to see a "real-world scenario".
💡 I started a course on descriptive statistics with Excel. Over the next few days, I will transfer this knowledge to Python.
💡 I signed up for Medium and read a lot of articles. I did some experiments with dummy variables in Python to better understand multiple linear regression, and played around with seaborn for visualization.
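A minimal pure-Python sketch of dummy coding, assuming a made-up nominal column of diamond cuts: every category except one reference category becomes its own 0/1 column. Dropping the reference avoids perfect collinearity with the intercept (the "dummy variable trap"). In pandas, `pd.get_dummies(df, drop_first=True)` does this job; the helper `dummy_encode` below is hypothetical and only shows the idea.

```python
def dummy_encode(values, reference=None):
    """Turn a nominal column into 0/1 dummy columns, dropping one reference level."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # by default, the first level is the baseline
    kept = [lvl for lvl in levels if lvl != reference]
    # one row per observation, one column per non-reference level
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows

cut = ["Good", "Ideal", "Fair", "Ideal", "Good"]  # made-up diamond cuts
columns, matrix = dummy_encode(cut)
print(columns)
for value, row in zip(cut, matrix):
    print(value, row)
```

"Fair" is the baseline and is encoded as all zeros, so each dummy coefficient in a regression measures the difference from "Fair".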
💡 Too bad: I realized that Slack will delete older messages in my account. I also found out that I cannot buy a paid Slack plan just for myself if other people are in the channels too, so Slack wanted to charge me $17,000 for one month. I will have to reconstruct the days from Dec 11 to Dec 26 and my progress from memory.
💡 I still have no idea how to code dummy variables in Python.
💡 Thank god, the next example in the Udacity lesson is a multiple linear regression without nominal data, so I don't need any dummies.
https://github.com/jegali/DataScience/blob/main/lesson-3-12-multi-ticket-sample.ipynb
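For two numeric predictors, the least-squares coefficients can be written out in closed form from centered sums of squares and cross-products (the normal equations solved with Cramer's rule). This sketch assumes exactly two predictors and made-up data generated from y = 1 + 2·x1 + 3·x2 without noise, so it should recover those coefficients. For more predictors you would solve the full normal equations or let statsmodels do it.

```python
import statistics

def fit_two_predictors(x1, x2, y):
    """OLS for y = b0 + b1*x1 + b2*x2 via centered sums (normal equations)."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Made-up data generated from y = 1 + 2*x1 + 3*x2 (no noise)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(fit_two_predictors(x1, x2, y))
```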
💡 Today I did the Udacity Lesson 3 ticket example. I made some progress with handy Python libraries that make my life easier. Have a look at the code here:
https://github.com/jegali/DataScience/blob/main/lesson-3-9-ticket-sample.ipynb
💡 Ho ho ho! I tried to figure out more of the math behind regression and learned about the sklearn and statsmodels packages. I fiddled around with some API calls and really like statsmodels now!
💡 Making progress in the "Head First Statistics" book.
💡 It's beginning to look a lot like Christmas! I extended the linear regression tutorial with a second part and did the math for the correlation coefficient and the coefficient of determination on my own. I also learned how to read the data from files instead of hard-coding the values in the Python script. Have a look at the results here:
https://github.com/jegali/DataScience/blob/main/linear_regression_2.ipynb
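The correlation coefficient and the coefficient of determination can both be computed from centered sums. This is a sketch with made-up data: R² is computed from the residuals of the least-squares line as 1 − SS_res/SS_tot, and for a simple regression with one predictor it should equal r².

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient from centered sums."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Coefficient of determination of the least-squares line: 1 - SS_res / SS_tot."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up measurements
r = pearson_r(x, y)
print("r  =", r)
print("r² =", r ** 2, " R² =", r_squared(x, y))
```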
💡 I tried some basic statistics in Python, such as counting word frequencies and visualizing the results in a bar plot.
https://github.com/jegali/DataScience/blob/main/word_frequency.ipynb
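Counting word frequencies boils down to normalizing the text, splitting it into words and counting them. Here is a minimal sketch with `collections.Counter`, assuming a simple regular expression that treats runs of letters and apostrophes as words; the text is made up. The counts could then be handed to a bar plot, for example with matplotlib, which is left out here.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Lower-case the text, extract words and count how often each occurs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

text = "To p or not to p: that is the question. To p, or not?"
freq = word_frequencies(text)
print(freq.most_common(3))
```

`most_common` sorts by count; words with equal counts keep the order in which they first appeared.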
💡 I translated the math behind linear regression into Python and worked through a first linear regression example by hand. I wrote a short tutorial, which you can find here:
https://github.com/jegali/DataScience/blob/main/linear_regression.ipynb
💡 I finished reading the "Head First Data Analysis" book.
💡 I started the "Head First Statistics" book.
💡 I fiddled around with Jupyter Notebook and got a deeper understanding of this interactive way of working with Python. I learned about the Markdown language I can use in the notebooks and took my first steps with numpy, scipy, matplotlib and sympy.
💡 I have no Alteryx license for my work laptop and will definitely not install software that has not been approved by our CIO. I found out that Python is allowed, which was another important reason for me to switch from Alteryx to Python. I installed Miniconda, since Anaconda needs a paid license, which I do not have. I researched the packages that come with Anaconda and decided to build my own "data science package". Here is what I did: first I downloaded Miniconda, then I installed the packages from a console window:
```shell
# Download Miniconda from: https://docs.conda.io/en/latest/miniconda.html
# After installation (you do not need administrative rights for that),
# open a console window and type these commands
# to download and install the desired packages.
# Installation is interactive: conda asks you to confirm with "y" (yes) or "n" (no).
conda install -c conda-forge scipy
conda install -c conda-forge numpy
conda install -c conda-forge pandas
conda install -c conda-forge matplotlib
conda install -c conda-forge bokeh
conda install -c conda-forge plotly
conda install -c conda-forge pillow
conda install -c conda-forge statsmodels
conda install -c conda-forge bkcharts
conda install -c conda-forge dbf
conda install -c conda-forge libcurl
conda install -c conda-forge orange3
conda install -c conda-forge qt
conda install -c conda-forge pypi
conda install -c conda-forge pyviz
conda install -c conda-forge seaborn
conda install -c conda-forge spyder
conda install -c conda-forge sympy
conda install -c conda-forge miktex
conda install -c conda-forge vispy
conda install -c conda-forge altair vega_datasets
conda install -c conda-forge panel
conda install -c conda-forge dash
conda install -c conda-forge scikit-learn
conda install -c conda-forge scrapy
conda install -c conda-forge tensorflow
conda install -c conda-forge keras
conda install -c conda-forge pytorch
conda install -c conda-forge theano
conda install -c conda-forge nltk
conda install -c conda-forge xlsxwriter
conda install -c conda-forge xlutils
conda install -c conda-forge xlwings
conda install -c conda-forge jupyterlab
```
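After installing, a quick way to see which packages are importable is `importlib.util.find_spec`, run with the Python interpreter from the Miniconda environment. Several conda package names differ from the module names you import; the small mapping below covers a few of the packages above (scikit-learn → sklearn, pillow → PIL, pytorch → torch) and is not a complete list. The helper `check_packages` is a hypothetical name for this sketch.

```python
import importlib.util

# conda package name -> module name used in `import` (only where they differ)
IMPORT_NAMES = {
    "scikit-learn": "sklearn",
    "pillow": "PIL",
    "pytorch": "torch",
}

def check_packages(packages):
    """Return {package: True/False} depending on whether its module can be found."""
    status = {}
    for pkg in packages:
        module = IMPORT_NAMES.get(pkg, pkg)
        status[pkg] = importlib.util.find_spec(module) is not None
    return status

packages = ["numpy", "scipy", "pandas", "matplotlib", "seaborn",
            "statsmodels", "scikit-learn", "pillow", "pytorch"]
for pkg, ok in check_packages(packages).items():
    print(f"{pkg:15} {'OK' if ok else 'missing'}")
```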
💡 I understood the theory behind linear regression. I still do not know why it is called "machine learning": it simply calculates a regression formula into which I can insert values. That has nothing to do with "intelligence" or "learning", I think.
💡 I started a data science course on Udemy but found out I need more statistics basics. I decided to learn the math behind linear regression and did the calculation by hand in Excel.
💡 I wanted to learn more about Alteryx, so I took another course on Udemy and passed it today. But I have to say, this course is not worth the money. If you get it at a discount, it is OK.
💡 I started another Alteryx course with some real-world examples, but I was disappointed.
💡 I passed my first Alteryx course on Udemy today. This course is highly recommended!
💡 I browsed the Udemy website for courses and, guess what, they actually had discounts on Alteryx courses. So I decided to deepen my knowledge of that tool.
💡 I am still reading the "Head First" book.
💡 I searched the web for a Python distribution and came across Anaconda, which I installed on my laptop. I fell in love with Jupyter Notebooks and decided to learn more about them.
💡 Today I read my "Head First Data Analysis" book, which I started some time ago. I now have a much better understanding of what's going on in the Data Track.
💡 Since I had already done all the tasks and projects with Alteryx, I did some research on the O'Reilly website and ordered two books on data science from Amazon. I am looking forward to receiving and reading them.
💡 I took the pledge and joined the Slack community for the 60 Days of Udacity challenge. I found a lot of learning material, which I sorted and saved for further reading.