Thursday, May 30, 2019

Python First Steps - A Hands-On Tutorial

This month we had a first-steps introduction to Python. It was arranged in response to feedback from members who felt a beginner's introduction would help them explore the Python data science ecosystem of tools and methods.


The slides for the talk are online [link].


Aim

The aim of the session was not to provide comprehensive coverage of Python as a language, nor an exhaustive tour of the ecosystem of libraries and tools.

The aim was for attendees to:

  • see enough of the basics of Python to understand how it works,
  • write their own code,
  • practise the important skill of searching the internet for code syntax and how to use libraries,
  • be able to understand a good amount of Python code that others have written,
  • and most importantly, develop the confidence to continue to learn and explore Python.

Throughout, we emphasised that an encyclopaedic knowledge of a language or a library's options is no longer the mark of a good programmer. Languages and tools are now so numerous and so large that the ability to find the right tool and learn how to use it is a much more important skill. This matters all the more because tools change at an ever faster rate.


Why Python?

We briefly set the scene by looking at several recent charts showing Python as one of the fastest growing languages, already in the top 3 in most market analyses, and far ahead in the fields of data science and especially machine learning.



We reflected on the fact that Python was not initially designed as a numerical language, but its ease of use accelerated its adoption in many fields, including data science.


Notebooks

In the last decade a key innovation has emerged that has made coding easier and friendlier, and that avoids the technical setup that was previously necessary.


That innovation is the notebook. In essence, it is just a web page into which we write our instructions, and see the results of those instructions.

A web page is already very familiar to many people and reduces the barriers to coding.

Today notebooks are both simple and very capable, with the ability to show charts, include animations, and even include control widgets.

GitHub, and other code hosting services, even support previewing uploaded notebooks - here's an example from one of our own meetups:




Getting / Using Python

Most users of Python make extensive use of the healthy and vibrant ecosystem of libraries and tools. The official Python distribution from python.org is fairly capable but doesn't include many of the now popular libraries.

Many data scientists and machine learning researchers use the Anaconda Python distribution, which includes many of the common libraries used in these fields. These libraries are fairly well tested to work together, and the distribution even includes performance optimisations for Intel CPUs. Anaconda Python also includes the standard Jupyter notebook system.

Another good alternative is to use Google's hosted system called Colab. This makes using Python even easier as there is nothing to install. Everything runs in Google's infrastructure, through a web browser. Despite being a test service, it is robust and growing rapidly in popularity. Even better, the service is free, subject to some controls to avoid exploitative use. Most compelling to machine learning researchers is free access to otherwise very expensive GPUs for accelerating computation.


Python Basics

We worked through the following key Python concepts, first discussing them, then seeing some examples, and finally attempting challenges designed to test our understanding of the theme or our ability to find answers on the internet.

  • Variables and Lists
  • Loops and Logic
  • Functions
  • Objects and Classes
  • Visualisation
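
As a rough illustration of the first three themes, here is a minimal sketch. It is not taken from the session notebooks; the names `scores` and `count_above` are made up for this example. Visualisation is left out because it needs a plotting library.

```python
# Variables and lists
name = "Python"
scores = [3, 8, 5, 10]


# A function that combines a loop with some logic
def count_above(values, threshold):
    """Return how many of the values are greater than threshold."""
    count = 0
    for value in values:
        if value > threshold:
            count += 1
    return count


total = sum(scores)
high = count_above(scores, 4)

print(name, "scores:", scores)
print("total:", total, "scores above 4:", high)
```

Changing the numbers in `scores` or the threshold and re-running is a good way to check your understanding of how the loop and the `if` test work together.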


The slides include links to simple notebooks which you can open and explore, and even edit after you save your own copy. The following shows a snippet of the first notebook introducing variables and lists:




The class did very well, working through all the themes. This is quite impressive given that some had never coded before.

The only topic that caused some trouble was the more advanced topic of objects and classes, which could be a topic for an entire class itself.


Object oriented programming is considered an advanced topic, so it is still an achievement if attendees can recognise it in code they look at in future, even if not all the details are immediately clear.
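
For readers who want to recognise the pattern later, here is a small hypothetical sketch of a class and two objects made from it. The `Dog` class is invented for this post and is not from the session material.

```python
class Dog:
    """A class is a template; each object made from it holds its own data."""

    def __init__(self, name):
        # __init__ runs when a new object is created
        self.name = name
        self.tricks = []

    def learn(self, trick):
        # A method is a function that belongs to the object
        self.tricks.append(trick)

    def describe(self):
        return f"{self.name} knows {len(self.tricks)} trick(s)"


# Two separate objects created from the same class
rex = Dog("Rex")
bella = Dog("Bella")

rex.learn("sit")

print(rex.describe())
print(bella.describe())
```

The things to spot in other people's code are the `class` keyword, the `self` parameter, and the dot used to reach an object's data and methods, as in `rex.learn("sit")`.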


Looking At Others' Code

To demonstrate that what we covered was indeed a large proportion of the basic elements from which real-world code is built, we looked at two different examples:

  • a generative adversarial network which uses neural networks that learn to render faces
  • a web application server which runs a Twitter-like service


We noted that the machine learning code was built from now-familiar elements such as variables, functions, imports, loops, classes and objects, and the visualisation of numerical arrays.

The web application code was striking in how small it was, given the service was in essence the same as Twitter. The point of this was to show that, with libraries, many problems can be solved with a very small amount of Python - and very understandable Python at that.
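
To give a feel for how little code a web service can need, here is a minimal sketch of a message board. It is not the example we looked at in the session, and it is far simpler than a real service: it uses only the `wsgiref` module from Python's standard library, keeps messages in memory, and the names `messages`, `app` and `run` are assumptions made for this illustration.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

messages = []  # in-memory store; everything is lost when the program stops


def app(environ, start_response):
    """Store a message on POST, then list all messages, newest first."""
    if environ["REQUEST_METHOD"] == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        text = form.get("text", [""])[0].strip()
        if text:
            messages.append(text)
    body = "\n".join(reversed(messages)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]


def run(port=8000):
    """Serve the app locally until interrupted (not called automatically)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `run()` starts a local server; visiting it in a browser lists the messages, and sending a form field called `text` in a POST request adds a new one. Real applications would normally use a web framework and a database, but the building blocks are the same variables, functions, imports and logic covered above.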


Conclusion

Speaking with attendees afterwards, I was pleased that the session had:

  • demystified coding and Python
  • given some the confidence to explore more, noting that what we covered in class is a large proportion of the foundations on which most code is built
  • underlined the importance of research skills over memorising Python instructions and options