Hi @all! I am Jens and this is my diary for my 60 Days of Udacity challenge for the Data Track Scholarship. I pledged to work through the foundations course for at least 30 minutes every day, and this repo will document my work. I started this challenge on December 11, 2020.
What can I say: When udacity asked us to take the pledge to study for 30 minutes every day, I was already through with the course. So I decided to do the course again and coded the examples and projects with Excel first, then with Python.
💡 Completed my statistics course on Udemy.
💡 Attended my statistics course on Udemy.
💡 Attended my statistics course on Udemy.
💡 Shared 6 visualization books with my mates.
💡 I did more exercises on combinatorics and Bayes' theorem.
💡 I did exercises on combinatorics. Sometimes it is like nailing jelly to the wall.
💡 Repeated combinatorics and read about Django REST APIs.
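For anyone else grinding through combinatorics: since Python 3.8 the standard library covers the basic counts directly (the lottery numbers below are just an illustration, not from any exercise set):

```python
from math import comb, perm, factorial

# "n choose k": unordered selections without repetition,
# e.g. drawing 6 numbers out of 49 in a lottery
print(comb(49, 6))   # 13983816

# Ordered selections without repetition
print(perm(5, 2))    # 20

# comb follows the factorial formula: C(n, k) = n! / (k! * (n - k)!)
assert comb(5, 2) == factorial(5) // (factorial(2) * factorial(3))
```

Having these as exact integer functions beats re-deriving the formulas in a spreadsheet every time.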
💡 Today I learned more about Highcharts and all the options the different diagrams have. I also implemented a dark/light theme for Highcharts. In the evening I tried to make the charts dynamic: I loaded real-time data from the web, parsed it and put it in a chart. It worked on my local machine, and then suddenly - boom - it no longer showed up on my webserver. I looked at the code but found no error. It took two episodes of GoT... Guess what? No program error. No coding mistake. Apache choked. After a restart it works again. Remember: it's not always your fault...
💡 I found the misconfiguration and started a fresh install of my webapp. It already has dark/light mode, responsiveness - it works on desktop, iPhone and Huawei P40 Pro - and a first chart, which reads the planetary K-index and displays it as a bar chart. The service is up - only in German at the moment, but localization is the next topic - and can be found at www.aurorafox.de. Yay!
💡 I created the legal pages for my webapp, then I tried to copy the files from my local Django installation to my webserver. This turned out to be not so straightforward and broke the webserver. I will try again tomorrow... it was a long day with a lot of work.
💡 Started a data science bootcamp in Python on Udemy. Making good progress so far.
💡 I learned how to create charts with Highcharts, which I consider superior to Chart.js. And it is free for personal use. I looked up COVID data and tried to re-create a dashboard with different charts from sites like Worldometer and the WHO.
💡 I managed to get a static bar chart (Chart.js) into my Django/HTML page. After that I searched for some information on how to feed it with data from the web and found an excellent video on YouTube. This guy shows how to make a COVID-19 (this word makes me aggressive...) dashboard with Chart.js and highcharts.com. He uses a Jupyter notebook for fiddling and testing and then puts the code into his Django website. Great! Have a look at it: https://www.youtube.com/watch?v=yRjteiImIWw
💡 Part two of my work: I managed to download Kp data from the Space Weather Prediction Center, parsed it and put it in a dynamically created chart. Very satisfying to see what can be done with 12 lines of Python code.
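A minimal sketch of the kind of parsing involved. The sample records below mimic the JSON shape of NOAA SWPC's public K-index feed; the field names and the endpoint URL in the comment are assumptions for illustration, not copied from my actual code:

```python
import json

# Sample records mimicking NOAA SWPC's planetary K-index JSON feed.
# In the real app this list would be fetched from an endpoint such as
# https://services.swpc.noaa.gov/json/planetary_k_index_1m.json
# (field names here are an assumption for illustration).
sample = json.loads("""[
  {"time_tag": "2020-12-28T00:00:00", "kp_index": 2},
  {"time_tag": "2020-12-28T03:00:00", "kp_index": 3},
  {"time_tag": "2020-12-28T06:00:00", "kp_index": 5}
]""")

def to_chart_series(records):
    """Turn raw records into (label, value) pairs for a bar chart."""
    # Slice HH:MM out of the ISO timestamp for compact axis labels
    return [(r["time_tag"][11:16], r["kp_index"]) for r in records]

series = to_chart_series(sample)
print(series)  # [('00:00', 2), ('03:00', 3), ('06:00', 5)]
```

The resulting pairs can then be serialized and handed to Highcharts as the chart's categories and data arrays.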
💡 I added dark/light theme switching to my Django webapp and created custom visualization components in Bootstrap. CSS is still a mystery to me, but it is getting better every day. Now I'm trying to query XLS/SQL to get data for my charts in Python.
💡 Whenever you think you can get started now, something new will get in your way. After I refreshed my CSS knowledge to make my dashboard responsive, I wanted to change some variables globally and stumbled upon SASS/SCSS. So again... back to the drawing board, as Wile E. Coyote would probably say...
💡 Study Jam!
💡 I managed to deploy Django to an Apache2 webserver on a Raspberry Pi installation. It worked locally but always showed the standard config page when I accessed it from another device. Searched the whole evening yesterday for the error. I'm still not sure what the reason is/was: a misleading browser cache or an error in the Apache config. I disabled the standard page this morning and it worked. I then took care of the DNS redirection and voilà - the standard Django page is on the Internet at www.aurorafox.de. Now I will rewrite the application. Troubleshooting is very time-consuming... In order not to run into this trap again, I have documented the process in step-by-step instructions.
💡 I continued fiddling with Django and Python. Got a simple Bootstrap page working and installed it on a Raspberry Pi. Now I'm trying to put all that behind an Apache webserver and point one of my domains at the device. My copy of "How Charts Lie" has been delivered. Something to read for the weekend.
💡 As Monty Python would say: "And now for something completely different"... Today I stumbled upon Django and developed my first Python/Django webapp. In the near future I want to rewrite my Android app AuroraFox (Northern Lights prediction and forecast) as a Python/Django webapp to reach more users and to deepen my Python knowledge. So many things to do and so little time...
💡 I made it through lesson 2 of Udacity's Tableau course: https://classroom.udacity.com/courses/ud1006
💡 I repeated Bayes' theorem and discrete probability.
💡 I wrote a network worm in Python today for my teaching platform. Python is really great.
💡 Learning about visualization today. Some of the statistics basics were repeated in the course, which made the quizzes a no-brainer.
💡 I finished the book "Head First Statistics" and will now start with some lectures on visualization.
💡 I completed my statistics course.
💡 I did it! Submitted my project report for lesson 5. I will post it here as well - maybe you want to have a look at it. It contains both an Alteryx and a Python solution: the Predicting Catalog Demand report.
💡 I also put the Alteryx project in my Git repo.
💡 Since I also did the project with Python, I added the Jupyter notebook as well.
💡 I'm writing the report for the lesson 5 project. Had to reactivate Alteryx for some screenshots. I want to include my Python Jupyter notebook in the report, so I think I need some more time. I think I can publish it tomorrow.
💡 Revisited and recapitulated hypothesis testing. It is much clearer now.
💡 Struggling with hypothesis testing today. Interesting, but very dry.
💡 Could not learn that much today. One of our pets died. Nevertheless, I learned something about samples and sampling.
💡 I'm still attending the "to p or not to p" course on Coursera.
💡 Made progress on the Coursera statistics course. Now I have to wait until my assignment has been evaluated.
💡 I was inspired by the work of a fellow scholar and decided to restructure my repo similarly. I added a diary but had a problem: I forgot that Slack does not let us access older posts, so I will have to reconstruct some diary entries from memory. I hope it fits with the older Slack entries. Shit happens. Yesterday I started the diary, now it is up to date.
💡 Today I started the "to p or not to p" course on Coursera.
💡 I did the project again with the same result. To rule out errors, I then redid it with sklearn and got the same result once more.
I put the Diamond project, done with Python, statsmodels and sklearn, in my GitHub, which you can find here:
https://github.com/jegali/DataScience/blob/main/Lesson-4/Lesson-4.ipynb
💡 Finally I found the time to install KNIME, and I have not regretted it so far. It is as simple to use as Alteryx, not as stylish, but it can do the same things and is free. Attached you see a screenshot of the Diamond project. It gives me the same prediction as my Python code and has the same deviation of 0.09% from Alteryx. So I believe Alteryx uses a different algorithm, which may cause the deviation.
💡 As promised, I coded the example from lesson 3-22, "Building Your First Model in Alteryx", in Python and statsmodels. If you are interested, have a look at the source code at:
https://github.com/jegali/DataScience/blob/main/Lesson-3-22/Lesson-3-22.ipynb
💡 D26: I did the Diamond practice project in Python with statsmodels and think I'm close to the recommended solution. Instead of $8,230,695.69 I got $8,223,038.24. Pretty close - that's 99.91%, a deviation of 0.09%. I will check for a possible error tomorrow and try to get closer.
💡 Today I learned Bayes' theorem and Laplace probability theory. Besides that, I did the example from lesson 3-22 in Python with statsmodels, with exactly the same result as before in Alteryx. I will post the Python source code in my GitHub tomorrow.
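Bayes' theorem in one worked example. All numbers here are made up for illustration: a test with 95% sensitivity and 90% specificity for a condition with 1% prevalence.

```python
# Bayes' theorem: P(D|+) = P(+|D) * P(D) / P(+)
# All numbers below are made-up illustration values, not from any course.
p_d = 0.01    # prevalence P(D)
sens = 0.95   # sensitivity P(+|D)
spec = 0.90   # specificity P(-|not D)

# Law of total probability for a positive test:
p_pos = sens * p_d + (1 - spec) * (1 - p_d)

# Posterior probability of the condition given a positive test:
posterior = sens * p_d / p_pos
print(round(posterior, 3))  # 0.088
```

Despite the "good" test, the posterior is under 9% - the classic base-rate surprise that makes Bayes worth repeating.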
💡 This was a day without coding. I read my "Head First Statistics" book by O'Reilly and learned about probabilities.
💡 Today I learned the theory behind box plots and quartiles. Good to know that you can use both to find outliers and make an educated guess about standard deviation and variance.
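The usual box-plot rule flags points more than 1.5 × IQR beyond the quartiles. A small sketch with only the standard library (the data is made up):

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 100]  # made-up data with one obvious outlier

# Quartiles via the standard library (default "exclusive" method)
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

# The classic 1.5 * IQR fences used for box-plot whiskers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]
print(outliers)
```

Note that different quartile conventions (`method="inclusive"` vs. the default) give slightly different fences on small samples, which is exactly the "educated guess" flavor of this rule.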
💡 Happy new year! I found out that there is a relationship between nominal, ordinal and metric data on the one hand and mode, median and arithmetic mean on the other. I was very fascinated by the ability to encode nominal data numerically. Now I understand better the connection between dummy variables on the one hand and nominal data on the other. This will help me a lot in implementing the linear regression examples in Python.
💡 Today I'm trying something totally different: I have some spare Raspberry Pis and connected them into a cluster. Now I'm trying to parallelize the regression examples in Python.
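I have not published the cluster code; as a single-machine stand-in for the idea, the per-chunk sums that least squares needs can be farmed out with the standard library. On the real cluster each chunk would go to a different Pi instead of a thread; the data and chunk size below are made up:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sums(chunk):
    """Per-chunk sums for a least-squares slope: n, Σx, Σy, Σxy, Σx²."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxy = sum(x * y for x, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    return n, sx, sy, sxy, sxx

# Made-up data on the line y = 2x + 1, split into chunks like a cluster would
points = [(x, 2 * x + 1) for x in range(100)]
chunks = [points[i:i + 25] for i in range(0, 100, 25)]

with ThreadPoolExecutor() as pool:
    parts = list(pool.map(partial_sums, chunks))

# Combine the partial sums and finish slope/intercept on the "head" node
n = sum(p[0] for p in parts); sx = sum(p[1] for p in parts)
sy = sum(p[2] for p in parts); sxy = sum(p[3] for p in parts)
sxx = sum(p[4] for p in parts)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
print(slope, intercept)  # recovers 2.0 and 1.0
```

The nice property is that the combine step only moves five numbers per chunk, not the raw data - which is what makes the regression parallelizable at all.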
💡 I finished a bootcamp on statistics and data science on Udemy.
💡 Today I did a case study in Excel on portfolio management with mean value analysis, scatter analysis, correlation analysis, rank correlation, box plots, data classification and multiple regression analysis. Quite interesting to see a "real world scenario".
💡 I started a course on descriptive statistics with Excel. In the next few days, I will transfer this knowledge to Python.
💡 I signed up for Medium and read a lot of stuff. Did some experiments on dummy variables in Python for a better understanding of multiple linear regression and played around with seaborn for visualization.
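For anyone fiddling with the same thing: pandas can build the dummy columns directly. The toy data below is made up for illustration, not from my notebook:

```python
import pandas as pd

# Toy data with one nominal column (made up for illustration)
df = pd.DataFrame({
    "cut": ["Ideal", "Good", "Ideal", "Premium"],
    "carat": [0.3, 0.5, 0.4, 0.7],
})

# One dummy (0/1) column per category; drop_first=True avoids the
# "dummy variable trap" (perfect collinearity) in linear regression
dummies = pd.get_dummies(df, columns=["cut"], drop_first=True)
print(dummies.columns.tolist())  # ['carat', 'cut_Ideal', 'cut_Premium']
```

Dropping the first category means the remaining dummies are read relative to it, which is how the regression coefficients stay interpretable.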
💡 Too bad - I realized that my Slack account will delete older messages. I also found out that I cannot buy a Slack account for me alone if other people are in the channels too. So Slack wanted to charge me about $17,000 for one month. I will have to reconstruct the days from Dec 11 to Dec 26 and my progress from memory.
💡 I still have no idea how to code dummy variables in Python.
💡 Thank god, the next example from the Udacity lesson is a multiple linear regression, but without nominal data, so I don't need any dummies.
https://github.com/jegali/DataScience/blob/main/lesson-3-12-multi-ticket-sample.ipynb
💡 Today I did the Udacity lesson 3 ticket example. I made some progress in using nice Python libraries that make my life more convenient. Have a look at the code here:
https://github.com/jegali/DataScience/blob/main/lesson-3-9-ticket-sample.ipynb
💡 Ho ho ho! I tried to figure out some more of the math behind regression and learned about the sklearn and statsmodels packages. So I fiddled around with some API calls and really like statsmodels now!
💡 Making progress in the "Head First Statistics" book.
💡 It's beginning to look a lot like Christmas! I extended the linear regression tutorial with a second part and did the math for the correlation coefficient and the coefficient of determination on my own. I also learned how to read the values from files instead of hard-coding them in the Python script. Have a look at the results here:
https://github.com/jegali/DataScience/blob/main/linear_regression_2.ipynb
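The by-hand math boils down to a few sums. A minimal sketch with made-up data (not the tutorial's):

```python
from math import sqrt

# Made-up paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Pearson correlation coefficient: r = Σ(x-x̄)(y-ȳ) / sqrt(Σ(x-x̄)² · Σ(y-ȳ)²)
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
var_x = sum((x - mean_x) ** 2 for x in xs)
var_y = sum((y - mean_y) ** 2 for y in ys)
r = cov / sqrt(var_x * var_y)

# For simple linear regression, the coefficient of determination is r squared
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))
```

Seeing R² fall directly out of r is the satisfying part - no library call needed to connect the two.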
💡 I tried some basic statistics in Python, like word frequency counting, and visualized the results in a bar plot.
https://github.com/jegali/DataScience/blob/main/word_frequency.ipynb
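The counting part needs nothing beyond the standard library; a minimal sketch with a made-up sentence:

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"

# Normalize and split into tokens, then count occurrences
words = text.lower().split()
freq = Counter(words)

# Most frequent words first - these become the tallest bars in the plot
print(freq.most_common(1))  # [('the', 3)]
```

From there, `freq.keys()` and `freq.values()` feed straight into a matplotlib bar plot.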
💡 I transferred the math used for linear regression to Python and did a first example of linear regression by hand. I wrote a short tutorial, which you can find here:
https://github.com/jegali/DataScience/blob/main/linear_regression.ipynb
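For reference, the "by hand" formulas come down to this (the sample data is made up, not the tutorial's):

```python
# Simple linear regression by hand:
# slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²,  intercept a = ȳ - b·x̄
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # made-up data lying exactly on y = 2x + 1

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x
print(f"y = {b} * x + {a}")  # y = 2.0 * x + 1.0
```

On noiseless data the fit recovers the line exactly, which makes it a good sanity check before trying real, noisy values.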
💡 I finished reading the "Head First Data Analysis" book.
💡 I started the "Head First Statistics" book.
💡 I fiddled around with Jupyter notebooks and got a deeper understanding of this interactive version of Python. I learned about the Markdown language I can use in the notebooks and did some first steps with numpy, scipy, matplotlib and sympy.
💡 I have no Alteryx license for my work laptop and will definitely not install software that has not been approved by our CIO. I found out that Python is allowed, so that was another important reason for me to switch from Alteryx to Python. I installed Miniconda, since Anaconda needs a paid license, which I do not have. I did some research on the packages installed with Anaconda and decided to build my own "data science package". Here is what I did: first I downloaded Miniconda, then I ran the installs in a console window:
# Download Miniconda from: https://docs.conda.io/en/latest/miniconda.html
# After installation (you do not need any administrative rights for that),
# open up a console window and type these commands
# to download and install the desired packages.
# Installation is interactive, so sometimes you have to type in "yes" or "no".
conda install -c conda-forge scipy
conda install -c conda-forge numpy
conda install -c conda-forge pandas
conda install -c conda-forge matplotlib
conda install -c conda-forge bokeh
conda install -c conda-forge plotly
conda install -c conda-forge pillow
conda install -c conda-forge statsmodels
conda install -c conda-forge bkcharts
conda install -c conda-forge dbf
conda install -c conda-forge libcurl
conda install -c conda-forge orange3
conda install -c conda-forge qt
conda install -c conda-forge pypi
conda install -c conda-forge pyviz
conda install -c conda-forge seaborn
conda install -c conda-forge spyder
conda install -c conda-forge sympy
conda install -c conda-forge miktex
conda install -c conda-forge vispy
conda install -c conda-forge altair vega_datasets
conda install -c conda-forge panel
conda install -c conda-forge dash
conda install -c conda-forge scikit-learn
conda install -c conda-forge scrapy
conda install -c conda-forge tensorflow
conda install -c conda-forge keras
conda install -c conda-forge pytorch
conda install -c conda-forge theano
conda install -c conda-forge nltk
conda install -c conda-forge xlsxwriter
conda install -c conda-forge xlutils
conda install -c conda-forge xlwings
conda install -c conda-forge jupyterlab
💡 I understood the theory behind linear regression. I still do not know why it is called "machine learning". It simply calculates a regression formula into which I can insert values. It has nothing to do with "intelligent" or "learning", I think.
💡 I started a data science course on Udemy but found out I need some more basics in statistics. I decided to learn about the math behind linear regression and did the calculation by hand in Excel.
💡 I wanted to learn more about Alteryx, so I did another course on Udemy and passed today. But I have to say, this course is not worth the money. If you get it discounted, it is okay.
💡 I started another Alteryx course with some real-world examples, but I was disappointed.
💡 I passed my first Alteryx course on Udemy today. This course is highly recommended!
💡 I surfed the Udemy website for some courses and guess what, they actually had discounts on Alteryx courses. So I decided to deepen my knowledge of that tool.
💡 I am still reading the "Head First" book.
💡 I searched the web for a Python installation and came across Anaconda, which I installed on my laptop. I fell in love with Jupyter notebooks and decided to learn more about them.
💡 Today I read my "Head First Data Analysis" book, which I started some time ago. I now have a much better understanding of what's going on in the Data Track.
💡 Since I did all the tasks and projects with Alteryx before, I decided to do some research on the O'Reilly website and ordered two books on data science from Amazon. I am looking forward to getting and reading them.
💡 I took the pledge and joined the Slack community for the 60 Days of Udacity challenge. I found a lot of learning material, which I sorted and saved for further reading.