Hi @all! I am Jens and this is my diary for my 60 Days of Udacity challenge for the Data Track Scholarship. I pledged to work through the foundations course for at least 30 minutes every day, and this repo will document my work. I started this challenge on December 11, 2020.
What can I say: When udacity asked us to take the pledge to study for 30 minutes every day, I was already through with the course. So I decided to do the course again and coded the examples and projects with Excel first, then with Python.
💡 Completed my statistics course on Udemy.
💡 Attended my statistics course on Udemy.
💡 Attended my statistics course on Udemy.
💡 Shared 6 visualization books with my mates.
💡 I did more exercises on combinatorics and Bayes' theorem.
💡 I did exercises on combinatorics. Sometimes it is like nailing jelly to the wall.
💡 Repeated combinatorics and read about Django REST APIs.
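For anyone else grinding through combinatorics: since Python 3.8 the standard library covers the basic counts directly (the lottery numbers below are just an illustration, not from any exercise set):

```python
from math import comb, perm, factorial

# "n choose k": unordered selections without repetition,
# e.g. drawing 6 numbers out of 49 in a lottery
print(comb(49, 6))   # 13983816

# Ordered selections without repetition
print(perm(5, 2))    # 20

# comb follows the factorial formula: C(n, k) = n! / (k! * (n - k)!)
assert comb(5, 2) == factorial(5) // (factorial(2) * factorial(3))
```

Having these as exact integer functions beats re-deriving the formulas in a spreadsheet every time.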
💡 Today I learned more about Highcharts and all the options the different diagrams have. I also implemented a dark/light theme for Highcharts. In the evening I tried to make the charts dynamic: I loaded real-time data from the web, parsed it and put it in a chart. It worked on my local machine, and then suddenly - boom - it no longer showed up on my webserver. I looked at the code but found no error. It took two episodes of GoT... Guess what? No program error. No coding mistake. Apache choked. After a restart it works again. Remember: it's not always your fault...
💡 I found the misconfiguration and started a fresh install of my webapp. It already has dark/light mode, responsiveness - it works on desktop, iPhone and Huawei P40 Pro - and a first chart, which reads the planetary K-index and displays it as a bar chart. The service is up - only in German at the moment, but localization is the next topic - and can be found at www.aurorafox.de. Yay!
💡 I created the legal pages for my webapp, then I tried to copy the files from my local Django installation to my webserver. This turned out to be not so straightforward and broke the webserver. I will try again tomorrow... it was a long day with a lot of work.
💡 Started a data science bootcamp in Python on Udemy. Making good progress so far.
💡 I learned how to create charts with Highcharts, which I consider superior to Chart.js. And it is free for personal use. I looked up COVID data and tried to re-create a dashboard with different charts from sites like Worldometer and the WHO.
💡 I managed to get a static bar chart (Chart.js) into my Django/HTML page. After that I searched for some information on how to feed it with data from the web and found an excellent video on YouTube. This guy shows how to make a COVID-19 (this word makes me aggressive...) dashboard with Chart.js and highcharts.com. He uses a Jupyter notebook for fiddling and testing and then puts the code into his Django website. Great! Have a look at it: https://www.youtube.com/watch?v=yRjteiImIWw
💡 Part two of my work: I managed to download Kp data from the Space Weather Prediction Center, parsed it and put it in a dynamically created chart. Very satisfying to see what can be done with 12 lines of Python code.
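A minimal sketch of the kind of parsing involved. The sample records below mimic the JSON shape of NOAA SWPC's public K-index feed; the field names and the endpoint URL in the comment are assumptions for illustration, not copied from my actual code:

```python
import json

# Sample records mimicking NOAA SWPC's planetary K-index JSON feed.
# In the real app this list would be fetched from an endpoint such as
# https://services.swpc.noaa.gov/json/planetary_k_index_1m.json
# (field names here are an assumption for illustration).
sample = json.loads("""[
  {"time_tag": "2020-12-28T00:00:00", "kp_index": 2},
  {"time_tag": "2020-12-28T03:00:00", "kp_index": 3},
  {"time_tag": "2020-12-28T06:00:00", "kp_index": 5}
]""")

def to_chart_series(records):
    """Turn raw records into (label, value) pairs for a bar chart."""
    # Slice HH:MM out of the ISO timestamp for compact axis labels
    return [(r["time_tag"][11:16], r["kp_index"]) for r in records]

series = to_chart_series(sample)
print(series)  # [('00:00', 2), ('03:00', 3), ('06:00', 5)]
```

The resulting pairs can then be serialized and handed to Highcharts as the chart's categories and data arrays.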
💡 I added dark/light theme switching to my Django webapp and created custom visualization components in Bootstrap. CSS is still a mystery to me, but it is getting better every day. Now I'm trying to query XLS/SQL to get data for my charts in Python.
💡 Whenever you think you can get started now, something new will get in your way. After I refreshed my CSS knowledge to make my dashboard responsive, I wanted to change some variables globally and stumbled upon SASS/SCSS. So again... back to the drawing board, as Wile E. Coyote would probably say...
💡 Study Jam!
💡 I managed to deploy Django to an Apache2 webserver on a Raspberry Pi installation. It worked locally but always showed the standard config page when I accessed it from another device. Searched the whole evening yesterday for the error. I'm still not sure what the reason is/was: a misleading browser cache or an error in the Apache config. I disabled the standard page this morning and it worked. I then took care of the DNS redirection and voilà - the standard Django page is on the Internet at www.aurorafox.de. Now I will rewrite the application. Troubleshooting is very time-consuming... In order not to run into this trap again, I have documented the process in step-by-step instructions.
💡 I continued fiddling with Django and Python. Got a simple Bootstrap page working and installed it on a Raspberry Pi. Now I'm trying to put all that behind an Apache webserver and point one of my domains at the device. My copy of "How Charts Lie" has been delivered. Something to read for the weekend.
💡 As Monty Python would say: "And now for something completely different"... Today I stumbled upon Django and developed my first Python/Django webapp. In the near future I want to rewrite my Android app AuroraFox (Northern Lights prediction and forecast) as a Python/Django webapp to reach more users and to deepen my Python knowledge. So many things to do and so little time...
💡 I made it through lesson 2 of Udacity's Tableau course: https://classroom.udacity.com/courses/ud1006
💡 I repeated Bayes' theorem and discrete probability.
💡 I wrote a network worm in Python today for my teaching platform. Python is really great.
💡 Learning about visualization today. Some of the statistics basics were repeated in the course, which made the quizzes a no-brainer.
💡 I finished the book "Head First Statistics" and will now start with some lectures on visualization.
💡 I completed my statistics course.
💡 I did it! Submitted my project report for lesson 5. I will post it here as well - maybe you want to have a look at it. It contains both an Alteryx and a Python solution: the Predicting Catalog Demand report.
💡 I also put the Alteryx project in my Git repo.
💡 Since I also did the project with Python, I added the Jupyter notebook as well.
💡 I'm writing the report for the lesson 5 project. Had to reactivate Alteryx for some screenshots. I want to include my Python Jupyter notebook in the report, so I think I need some more time. I think I can publish it tomorrow.
💡 Revisited and recapitulated hypothesis testing. It is much clearer now.
💡 Struggling with hypothesis testing today. Interesting, but very dry.
💡 Could not learn that much today. One of our pets died. Nevertheless, I learned something about samples and sampling.
💡 I'm still attending the "to p or not to p" course on Coursera.
💡 Made progress on the Coursera statistics course. Now I have to wait until my assignment has been evaluated.
💡 I was inspired by the work of a fellow scholar and decided to restructure my repo similarly. I added a diary but had a problem: I forgot that Slack does not let us access older posts, so I will have to reconstruct some diary entries from memory. I hope it fits with the older Slack entries. Shit happens. Yesterday I started the diary, now it is up to date.
💡 Today I started the "to p or not to p" course on Coursera.
💡 I did the project again with the same result. To rule out errors, I then redid it with sklearn and got the same result once more.
I put the Diamond project, done with Python, statsmodels and sklearn, in my GitHub, which you can find here:
https://github.com/jegali/DataScience/blob/main/Lesson-4/Lesson-4.ipynb
💡 Finally I found the time to install KNIME, and I have not regretted it so far. It is as simple to use as Alteryx, not as stylish, but it can do the same things and is free. Attached you see a screenshot of the Diamond project. It gives me the same prediction as my Python code and has the same deviation of 0.09% from Alteryx. So I believe Alteryx uses a different algorithm, which may cause the deviation.
💡 As promised, I coded the example from lesson 3-22, "Building Your First Model in Alteryx", in Python and statsmodels. If you are interested, have a look at the source code at:
https://github.com/jegali/DataScience/blob/main/Lesson-3-22/Lesson-3-22.ipynb
💡 D26: I did the Diamond practice project in Python with statsmodels and think I'm close to the recommended solution. Instead of $8,230,695.69 I got $8,223,038.24. Pretty close - that's 99.91%, a deviation of 0.09%. I will check for a possible error tomorrow and try to get closer.
💡 Today I learned Bayes' theorem and Laplace probability theory. Besides that, I did the example from lesson 3-22 in Python with statsmodels, with exactly the same result as before in Alteryx. I will post the Python source code in my GitHub tomorrow.
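Bayes' theorem in one worked example. All numbers here are made up for illustration: a test with 95% sensitivity and 90% specificity for a condition with 1% prevalence.

```python
# Bayes' theorem: P(D|+) = P(+|D) * P(D) / P(+)
# All numbers below are made-up illustration values, not from any course.
p_d = 0.01    # prevalence P(D)
sens = 0.95   # sensitivity P(+|D)
spec = 0.90   # specificity P(-|not D)

# Law of total probability for a positive test:
p_pos = sens * p_d + (1 - spec) * (1 - p_d)

# Posterior probability of the condition given a positive test:
posterior = sens * p_d / p_pos
print(round(posterior, 3))  # 0.088
```

Despite the "good" test, the posterior is under 9% - the classic base-rate surprise that makes Bayes worth repeating.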
💡 This was a day without coding. I read my "Head First Statistics" book by O'Reilly and learned about probabilities.
💡 Today I learned the theory behind box plots and quartiles. Good to know that you can use both to find outliers and make an educated guess about standard deviation and variance.
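The usual box-plot rule flags points more than 1.5 × IQR beyond the quartiles. A small sketch with only the standard library (the data is made up):

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 100]  # made-up data with one obvious outlier

# Quartiles via the standard library (default "exclusive" method)
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

# The classic 1.5 * IQR fences used for box-plot whiskers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]
print(outliers)
```

Note that different quartile conventions (`method="inclusive"` vs. the default) give slightly different fences on small samples, which is exactly the "educated guess" flavor of this rule.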
💡 Happy new year! I found out that there is a relationship between nominal, ordinal and metric data on the one hand and mode, median and arithmetic mean on the other. I was very fascinated by the ability to encode nominal data numerically. Now I understand better the connection between dummy variables on the one hand and nominal data on the other. This will help me a lot in implementing the linear regression examples in Python.
💡 Today I'm trying something totally different: I have some spare Raspberry Pis and connected them into a cluster. Now I'm trying to parallelize the regression examples in Python.
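I have not published the cluster code; as a single-machine stand-in for the idea, the per-chunk sums that least squares needs can be farmed out with the standard library. On the real cluster each chunk would go to a different Pi instead of a thread; the data and chunk size below are made up:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sums(chunk):
    """Per-chunk sums for a least-squares slope: n, Σx, Σy, Σxy, Σx²."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxy = sum(x * y for x, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    return n, sx, sy, sxy, sxx

# Made-up data on the line y = 2x + 1, split into chunks like a cluster would
points = [(x, 2 * x + 1) for x in range(100)]
chunks = [points[i:i + 25] for i in range(0, 100, 25)]

with ThreadPoolExecutor() as pool:
    parts = list(pool.map(partial_sums, chunks))

# Combine the partial sums and finish slope/intercept on the "head" node
n = sum(p[0] for p in parts); sx = sum(p[1] for p in parts)
sy = sum(p[2] for p in parts); sxy = sum(p[3] for p in parts)
sxx = sum(p[4] for p in parts)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
print(slope, intercept)  # recovers 2.0 and 1.0
```

The nice property is that the combine step only moves five numbers per chunk, not the raw data - which is what makes the regression parallelizable at all.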
💡 I finished a bootcamp on statistics and data science on Udemy.
💡 Today I did a case study in Excel on portfolio management with mean value analysis, scatter analysis, correlation analysis, rank correlation, box plots, data classification and multiple regression analysis. Quite interesting to see a "real world scenario".
💡 I started a course on descriptive statistics with Excel. In the next few days, I will transfer this knowledge to Python.
💡 I signed up for Medium and read a lot of stuff. Did some experiments on dummy variables in Python for a better understanding of multiple linear regression and played around with seaborn for visualization.
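For anyone fiddling with the same thing: pandas can build the dummy columns directly. The toy data below is made up for illustration, not from my notebook:

```python
import pandas as pd

# Toy data with one nominal column (made up for illustration)
df = pd.DataFrame({
    "cut": ["Ideal", "Good", "Ideal", "Premium"],
    "carat": [0.3, 0.5, 0.4, 0.7],
})

# One dummy (0/1) column per category; drop_first=True avoids the
# "dummy variable trap" (perfect collinearity) in linear regression
dummies = pd.get_dummies(df, columns=["cut"], drop_first=True)
print(dummies.columns.tolist())  # ['carat', 'cut_Ideal', 'cut_Premium']
```

Dropping the first category means the remaining dummies are read relative to it, which is how the regression coefficients stay interpretable.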
💡 Too bad - I realized that my Slack account will delete older messages. I also found out that I cannot buy a Slack account for me alone if other people are in the channels too. So Slack wanted to charge me about $17,000 for one month. I will have to reconstruct the days from Dec 11 to Dec 26 and my progress from memory.
💡 I still have no idea how to code dummy variables in Python.
💡 Thank god, the next example from the Udacity lesson is a multiple linear regression, but without nominal data, so I don't need any dummies.
https://github.com/jegali/DataScience/blob/main/lesson-3-12-multi-ticket-sample.ipynb
💡 Today I did the Udacity lesson 3 ticket example. I made some progress in using nice Python libraries that make my life more convenient. Have a look at the code here:
https://github.com/jegali/DataScience/blob/main/lesson-3-9-ticket-sample.ipynb
💡 Ho ho ho! I tried to figure out some more of the math behind regression and learned about the sklearn and statsmodels packages. So I fiddled around with some API calls and really like statsmodels now!
💡 Making progress in the "Head First Statistics" book.
💡 It's beginning to look a lot like Christmas! I extended the linear regression tutorial with a second part and did the math for the correlation coefficient and the coefficient of determination on my own. I also learned how to read the values from files instead of hard-coding them in the Python script. Have a look at the results here:
https://github.com/jegali/DataScience/blob/main/linear_regression_2.ipynb
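The by-hand math boils down to a few sums. A minimal sketch with made-up data (not the tutorial's):

```python
from math import sqrt

# Made-up paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Pearson correlation coefficient: r = Σ(x-x̄)(y-ȳ) / sqrt(Σ(x-x̄)² · Σ(y-ȳ)²)
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
var_x = sum((x - mean_x) ** 2 for x in xs)
var_y = sum((y - mean_y) ** 2 for y in ys)
r = cov / sqrt(var_x * var_y)

# For simple linear regression, the coefficient of determination is r squared
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))
```

Seeing R² fall directly out of r is the satisfying part - no library call needed to connect the two.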
💡 I tried some basic statistics in Python, like word frequency counting, and visualized the results in a bar plot.
https://github.com/jegali/DataScience/blob/main/word_frequency.ipynb
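The counting part needs nothing beyond the standard library; a minimal sketch with a made-up sentence:

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"

# Normalize and split into tokens, then count occurrences
words = text.lower().split()
freq = Counter(words)

# Most frequent words first - these become the tallest bars in the plot
print(freq.most_common(1))  # [('the', 3)]
```

From there, `freq.keys()` and `freq.values()` feed straight into a matplotlib bar plot.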
💡 I transferred the math used for linear regression to Python and did a first example of linear regression by hand. I wrote a short tutorial, which you can find here:
https://github.com/jegali/DataScience/blob/main/linear_regression.ipynb
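For reference, the "by hand" formulas come down to this (the sample data is made up, not the tutorial's):

```python
# Simple linear regression by hand:
# slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²,  intercept a = ȳ - b·x̄
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # made-up data lying exactly on y = 2x + 1

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x
print(f"y = {b} * x + {a}")  # y = 2.0 * x + 1.0
```

On noiseless data the fit recovers the line exactly, which makes it a good sanity check before trying real, noisy values.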
💡 I finished reading the "Head First Data Analysis" book.
💡 I started the "Head First Statistics" book.
💡 I fiddled around with Jupyter notebooks and got a deeper understanding of this interactive version of Python. I learned about the Markdown language I can use in the notebooks and did some first steps with numpy, scipy, matplotlib and sympy.
💡 I have no Alteryx license for my work laptop and will definitely not install software that has not been approved by our CIO. I found out that Python is allowed, so that was another important reason for me to switch from Alteryx to Python. I installed Miniconda, since Anaconda needs a paid license, which I do not have. I did some research on the packages installed with Anaconda and decided to build my own "data science package". Here is what I did: first I downloaded Miniconda, then I ran the installs in a console window:
# Download Miniconda from: https://docs.conda.io/en/latest/miniconda.html
# After installation (you do not need any administrative rights for that),
# open up a console window and type these commands
# to download and install the desired packages.
# Installation is interactive, so sometimes you have to type in "yes" or "no".
conda install -c conda-forge scipy
conda install -c conda-forge numpy
conda install -c conda-forge pandas
conda install -c conda-forge matplotlib
conda install -c conda-forge bokeh
conda install -c conda-forge plotly
conda install -c conda-forge pillow
conda install -c conda-forge statsmodels
conda install -c conda-forge bkcharts
conda install -c conda-forge dbf
conda install -c conda-forge libcurl
conda install -c conda-forge orange3
conda install -c conda-forge qt
conda install -c conda-forge pypi
conda install -c conda-forge pyviz
conda install -c conda-forge seaborn
conda install -c conda-forge spyder
conda install -c conda-forge sympy
conda install -c conda-forge miktex
conda install -c conda-forge vispy
conda install -c conda-forge altair vega_datasets
conda install -c conda-forge panel
conda install -c conda-forge dash
conda install -c conda-forge scikit-learn
conda install -c conda-forge scrapy
conda install -c conda-forge tensorflow
conda install -c conda-forge keras
conda install -c conda-forge pytorch
conda install -c conda-forge theano
conda install -c conda-forge nltk
conda install -c conda-forge xlsxwriter
conda install -c conda-forge xlutils
conda install -c conda-forge xlwings
conda install -c conda-forge jupyterlab
💡 I understood the theory behind linear regression. I still do not know why it is called "machine learning". It simply calculates a regression formula into which I can insert values. It has nothing to do with "intelligent" or "learning", I think.
💡 I started a data science course on Udemy but found out I need some more basics in statistics. I decided to learn about the math behind linear regression and did the calculation by hand in Excel.
💡 I wanted to learn more about Alteryx, so I did another course on Udemy and passed today. But I have to say, this course is not worth the money. If you get it discounted, it is okay.
💡 I started another Alteryx course with some real-world examples, but I was disappointed.
💡 I passed my first Alteryx course on Udemy today. This course is highly recommended!
💡 I surfed the Udemy website for some courses and guess what, they actually had discounts on Alteryx courses. So I decided to deepen my knowledge of that tool.
💡 I am still reading the "Head First" book.
💡 I searched the web for a Python installation and came across Anaconda, which I installed on my laptop. I fell in love with Jupyter notebooks and decided to learn more about them.
💡 Today I read my "Head First Data Analysis" book, which I started some time ago. I now have a much better understanding of what's going on in the Data Track.
💡 Since I did all the tasks and projects with Alteryx before, I decided to do some research on the O'Reilly website and ordered two books on data science from Amazon. I am looking forward to getting and reading them.
💡 I took the pledge and joined the Slack community for the 60 Days of Udacity challenge. I found a lot of learning material, which I sorted and saved for further reading.