thoughtwisps One commit at a time

Hello and welcome to thoughtwisps! This is a personal collection of notes and thoughts on software engineering, machine learning and the technology industry and community. For my professional website, please see race-conditions. Thank you for visiting!

Notes on setting up Jupyterhub on an EC2 instance (part 1 )


I am working on setting up a shared Python programming environment for PyLadies London beginner and intermediate programming workshops. My last programming workshop on generators and co-routines was less than successful and I think a lot of it had to do with the fact that attendees

  1. had to spend a lot of time setting up an environment
  2. didn’t get direct feedback about whether or not the code they had produced was successful
  3. the material wasn’t great ( let’s be honest )

I hope that I can improve the situation for points 1. and 2. by making a unified PyLadies London Jupyterhub environment. Together with nbgrader, having a unified environment should hopefully make future PyLadies London workshops more enjoyable for the attendees. In this post ( or really ‘note-to-self’ ), I’m going to be listing some notes on setting up a Jupyterhub on an Amazon EC2 instance.

I am broadly following the Deploying JupyterHub on AWS guide. Please note that there may be errors here. If in doubt, always use the official Amazon AWS guides in lieu of whatever I just wrote here.

Setting up an Ubuntu instance on Amazon AWS

  1. Go to the Amazon AWS console
  2. In the ‘Services’ tab, select EC2
  3. Create a new instance and select Ubuntu. I am using the AWS free tier to test drive this deployment, so I’ll be using the t2.micro instance type.
  4. Download the private key (.pem file) you get from the ‘Create Instance’ wizard ( or create your own ) and move it to a safe location on your local machine
  5. Once you have moved the .pem file to a safe location, you have to change the permissions on the file so that only you can read it. Otherwise connecting to your EC2 machine via ssh may not work.

chmod 400 <my-pem-filepath.pem>

  6. Select your instance in the Instances tab and click on the ‘Connect’ button. This will bring up a dialogue with the ssh commands that you need to execute from your local machine’s shell to connect to your Ubuntu server running on EC2

ssh -i <my-pem-filepath.pem> ubuntu@<ec2-instance-dns-name>

  7. Verify the fingerprint. This is, unfortunately, not as straightforward as comparing the fingerprint that appears on the ssh console with the key fingerprint you can find when you go to the Network & Security -> Key Pairs tab in your management console. You can find more information about this by reading the answer to this Stack Overflow question. In short, you have to generate a fingerprint from the .pem file by running the command below and compare the result with the fingerprint shown in the Key Pairs tab

openssl pkcs8 -in <my-pemfilepath>.pem -nocrypt -topk8 -outform DER | openssl sha1 -c

  8. You should now be able to ssh to your Ubuntu server.
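
The permission change on the .pem file can also be done and checked from Python. The following is only a sketch under some assumptions: it runs on a POSIX system, it uses a throwaway temporary file as a stand-in for the real key, and the helper name restrict_to_owner_read is made up for this example.

```python
import os
import stat
import tempfile


def restrict_to_owner_read(path):
    """Same effect as `chmod 400 path`: the owner may read, nobody else may do anything."""
    os.chmod(path, stat.S_IRUSR)  # stat.S_IRUSR == 0o400
    return stat.S_IMODE(os.stat(path).st_mode)


# A throwaway file stands in for the real .pem key (assumption for the demo).
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    fake_key_path = handle.name

mode = restrict_to_owner_read(fake_key_path)
print(oct(mode))

# Clean up the stand-in file.
os.chmod(fake_key_path, stat.S_IRUSR | stat.S_IWUSR)
os.remove(fake_key_path)
```

If the printed mode is anything more permissive than owner-read, ssh is likely to refuse the key.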

Using Let’s Encrypt to secure communications between browser and server

Let’s Encrypt is a free certificate authority (CA), which means that it provides digital certificates to domains. This in turn helps to ensure secure communication between a browser and the domain it is connecting to (yes, this explanation is a bit waffly and that’s because to this day I have been very very clueless about SSL and certificates - that is all about to change. Hopefully by the next blog post I will be a bit more in tune with what all of this means ).

  1. Let’s Encrypt does not work with amazonaws.com domains (the kind of public DNS name an EC2 instance gets by default), so I will have to point a domain that I registered with another domain provider to my EC2 instance.

  2. The Deployment guide does this with Route53, an Amazon AWS DNS service. I opted to go down the Route53 route (heheh), although now that I think about it, I should have just stuck with my own DNS provider. In the end, I decided to go back and use my domain’s original nameservers and add the EC2 instance as a new record into the zone file. The TTL on that record is 14400 seconds, so I suppose I have up to 4 hours to wait and see if anything comes to fruition out of my random rambling (the TTL is how long resolvers may cache a DNS record, so it bounds how long I need to wait before an updated record is reflected in the DNS servers ). I’ll save Route53 for another day. For now, I am too impatient ( and too much of a n00b ).
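
TTL values in a zone file are given in seconds, so it is worth converting the figure before deciding how long to wait. A quick sketch of the arithmetic (the variable names are just for illustration):

```python
from datetime import timedelta

ttl_seconds = 14400  # TTL of the record added to the zone file
ttl = timedelta(seconds=ttl_seconds)
ttl_hours = ttl.total_seconds() / 3600

print(ttl_hours)  # 4.0
```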

Passivate/Reactivate Pattern in SimPy

In this blog post we will explore how to use SimPy events to control when specific SimPy Processes are suspended and when they resume.

Suppose we have a train that travels for a random number of time units ( let’s say between 5 and 10 units) and stops at each station for a random number of time units (2-5 units) to pick up passengers. Those numbers don’t necessarily correspond to any real life time span for a train. I just picked them for the purposes of this example. In later blog posts, I will explore integrating the TfL API with the simulation to provide more realistic data of travelling and boarding times for trains.

To represent this situation, we can have two processes, travel and board. When the train is travelling, the travel process will be active and the board process passive, and vice versa when the train is boarding passengers. We can implement this type of pattern by yielding a SimPy Event, which will suspend the process until the event is successfully triggered.

Let’s take a look at a simple example first.

import simpy
import random

class Train(object):
    def __init__(self, env):
        self.env=env
        self.travel_proc = self.env.process(self.travel())
        self.board_proc = self.env.process(self.board())
        
    def travel(self):
        while True:
            print 'Start travelling at time %d' % self.env.now
            yield self.env.timeout(random.randint(5,10)) # randint includes both endpoints
            print 'Stopping at time %d' % self.env.now
            
    def board(self):
        while True:
            print 'Start boarding at time %d' % self.env.now
            yield self.env.timeout(random.randint(2,5))
            print 'Stop boarding at time %d' % self.env.now
            
env = simpy.Environment()
train = Train(env)
env.run(until=10)

    Start travelling at time 0
    Start boarding at time 0
    Stop boarding at time 2
    Start boarding at time 2
    Stop boarding at time 7
    Start boarding at time 7
    Stopping at time 9
    Start travelling at time 9

In the previous example, we created a Train class with two Processes, travel and board, and we ran the simulation for 10 time units. Note that, since we are using while True in the Processes, it is important to include the until keyword argument in the call to env.run to make the simulation stop at a certain time.

As we can see from the print statements, both the travel and the board processes start at the same time, which of course is not ideal, since the train cannot be both travelling and boarding customers at once. To make sure that the board process is passive when the travel process is active and vice versa, we can introduce Events that control when each of the processes is active. Let’s write an example.

class Train(object):
    def __init__(self, env):
        self.env=env
        self.travel_proc=self.env.process(self.travel())
        self.board_proc=self.env.process(self.board())
        self.boarding_completed=self.env.event()
        self.train_stopping=self.env.event()
        
    def travel(self):
        while True:
            print 'Start travelling at time %d' % self.env.now
            yield self.env.timeout(random.randint(5,10)) # randint includes both endpoints
            print 'Stopping at time %d' % self.env.now
            self.train_stopping.succeed()
            self.train_stopping=self.env.event()
            yield self.boarding_completed
            
    def board(self):
        while True:
            yield self.train_stopping
            print 'Start boarding at time %d' % self.env.now
            yield self.env.timeout(random.randint(2,5))
            print 'Stop boarding at time %d' % self.env.now
            self.boarding_completed.succeed()
            self.boarding_completed=self.env.event()
            
env = simpy.Environment()
train=Train(env)
env.run(until=30)
            
        

    Start travelling at time 0
    Stopping at time 5
    Start boarding at time 5
    Stop boarding at time 9
    Start travelling at time 9
    Stopping at time 19
    Start boarding at time 19
    Stop boarding at time 21
    Start travelling at time 21

If we look at the print statements printed during execution, we can see that now the board process waits for the travel process to complete and vice versa. Activating and passivating processes is controlled by using two SimPy Events, one to signal that the train has stopped (self.train_stopping) and the other to signal that boarding has completed (self.boarding_completed). When we yield either of these events from a process, the process is suspended until the event is triggered successfully. We ensure this by calling Event.succeed() after boarding has completed and after the train has stopped.

The reader should also note that events cannot be recycled. That is, once an event has been triggered (as successful or as failed), it cannot be triggered again. Therefore, after self.train_stopping.succeed() or self.boarding_completed.succeed() is called, we have to assign a new instance of Event to self.train_stopping or self.boarding_completed.
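
To make the ‘trigger once, then replace’ idea concrete without needing SimPy installed, here is a toy sketch. OneShotEvent and Signal are made-up stand-ins written for this note, not SimPy classes. They only mimic the rule that an event which has already been triggered cannot be triggered again.

```python
class OneShotEvent(object):
    """Toy stand-in for an event that may be triggered only once (not SimPy's Event)."""

    def __init__(self):
        self.triggered = False

    def succeed(self):
        if self.triggered:
            raise RuntimeError("event has already been triggered")
        self.triggered = True
        return self


class Signal(object):
    """Holds the current event and swaps in a fresh one after each trigger."""

    def __init__(self):
        self.event = OneShotEvent()

    def fire(self):
        fired = self.event
        fired.succeed()
        self.event = OneShotEvent()  # fresh event for the next round
        return fired


signal = Signal()
first = signal.fire()
second = signal.fire()  # works because fire() replaced the used event

try:
    first.succeed()  # re-triggering a used event is an error
    reuse_failed = False
except RuntimeError:
    reuse_failed = True

print(reuse_failed)  # True
```

The Train class above follows the same shape: call succeed() on the current event, then immediately store a new event in the same attribute.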

Writing your first simulation using SimPy (Part I )

A London Routemaster double decker bus

Image by Oxyman (Wikipedia CC)

SimPy is a Python library that allows you to simulate a variety of discrete-event scenarios. For example, you may want to investigate how the number of available checkout machines influences the length of the customer queue at your local supermarket or how the number of bus stops in a crowded neighbourhood affects your morning commute. SimPy offers you an easy way to build these kinds of simulations. In this blog post series, I am going to give you some introductory examples to help you get started with your own simulations.

Requirements

  1. Knowing a bit about Python generator functions and how to write them (and sort of what they do, though I don’t really delve into the details here )
  2. Python 2.7 with SimPy installed (apologies all Python 3 fans, I will produce a Python 3 version soon )
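
As a quick refresher on the first requirement, here is a small generator function. It is only an illustrative sketch (countdown is a made-up example, written so that it runs on Python 2.7 and Python 3 alike). The point is that the function body pauses at each yield and picks up from the same place the next time it is resumed.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, pausing at every yield."""
    current = start
    while current > 0:
        yield current  # execution pauses here until the generator is resumed
        current -= 1


gen = countdown(3)
first = next(gen)      # runs the body until the first yield
second = next(gen)     # resumes after the yield, loops, and yields again
remaining = list(gen)  # drains whatever is left

print(first)      # 3
print(second)     # 2
print(remaining)  # [1]
```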

Basic SimPy Terminology

A SimPy simulation is constructed using three simple components:

  1. A Process is the actor or active agent in your simulation (for example, customers in a busy store or buses on the road). In SimPy, Processes are Python generator functions.
  2. An Environment is like a container for your simulation, which helps to schedule and coordinate processes
  3. An Event is a way for the Processes in your simulation to communicate with each other.

Hopefully each of these will become clearer once we dive into the practical examples.

A Simple Bus Simulation

Let’s construct a simple SimPy simulation for a bus on a route around a London neighbourhood. The route of the bus includes 15 stops (numbered 0 to 14) and the time it takes to board and offload commuters at each stop is a constant 10 minutes (a simplification, but let’s roll with it for now). Also let’s suppose, for simplicity, that the time the bus drives between each stop stays constant (this, clearly, does not hold in any real-world situation).

The bus will be a SimPy Process. Thus to begin our simulation, we will construct a simple Python generator function to represent our bus. I will show the whole generator function below and then explain how the individual parts of the function work in the context of the SimPy simulation.

import simpy
import random 

def bus(simpy_environment):
    '''Simple Simpy bus simulation'''
    driving_duration = 5 #time taken to drive between two stops in this neighbourhood
    stopping_duration = 10
    for i in range(15):
        print 'Start driving to bus stop %d at time %d' % (i, simpy_environment.now )
        yield simpy_environment.timeout(driving_duration)
        print 'Stopping to pick up commuters at bus stop %d at time %d' % (i, simpy_environment.now )
        yield simpy_environment.timeout(stopping_duration)

There are a few interesting things going on in the bus function. First, notice that we pass an argument simpy_environment into the function. This is a SimPy Environment object that is in charge of scheduling, starting and suspending SimPy processes (the meaning of this will become clear in a moment). Second, notice that we are yielding two timeout events. When a SimPy Process yields an event, the Environment in which the process is running suspends the Process for the duration of the event. The Process (the bus generator function) resumes executing once the event has occurred.

In our case, we ‘suspend’ the Bus process while it is driving and when it is boarding commuters. It might be a bit strange to think about the bus process as suspended when the bus is actually driving or boarding commuters. The key here, in my opinion, is not to think of the process as being idle (ie not doing anything) when it is suspended, but to think about it as being in a state where the process itself need not execute any extra logic to complete the event. For example, in our simple simulation, from the bus process’s perspective boarding commuters is simply an action that takes 10 minutes. There is no extra logic that needs to be executed. Thus, for now, in our simple simulation, we represent ‘boarding commuters’ and ‘driving’ as nothing more than timeouts.
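
If it helps to see the suspend-and-resume idea without any library in the way, below is a toy scheduler built only on the standard library. To be clear, this is not how SimPy is implemented. ToyEnvironment and toy_bus are made-up names, and the process yields plain numbers of minutes instead of timeout events. It only shows that a generator can be parked in a queue until its wake-up time and then resumed.

```python
import heapq
import itertools


class ToyEnvironment(object):
    """Very small stand-in for a simulation environment (not SimPy's class)."""

    def __init__(self):
        self.now = 0
        self._queue = []                 # heap of (wake_time, order, generator)
        self._order = itertools.count()  # tie-breaker so generators are never compared

    def process(self, generator):
        heapq.heappush(self._queue, (self.now, next(self._order), generator))

    def run(self):
        while self._queue:
            self.now, _, generator = heapq.heappop(self._queue)
            try:
                delay = next(generator)  # resume the process until its next yield
            except StopIteration:
                continue                 # this process has finished
            wake_time = self.now + delay
            heapq.heappush(self._queue, (wake_time, next(self._order), generator))


def toy_bus(env, log, stops=3):
    for stop in range(stops):
        log.append(("drive", stop, env.now))
        yield 5    # suspended while driving
        log.append(("board", stop, env.now))
        yield 10   # suspended while boarding


env = ToyEnvironment()
log = []
env.process(toy_bus(env, log))
env.run()
for entry in log:
    print(entry)
```

The recorded times (0, 5, 15, 20, 30, 35) follow the same rhythm as the bus function: five minutes of driving, then ten minutes of boarding.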

Let’s add some more code to allow the SimPy simulation Environment to execute this process.

env = simpy.Environment() #create the SimPy environment
env.process( bus(env) ) # create an instance of the Bus process
env.run()

Start driving to bus stop 0 at time 0
Stopping to pick up commuters at bus stop 0 at time 5
Start driving to bus stop 1 at time 15
Stopping to pick up commuters at bus stop 1 at time 20
Start driving to bus stop 2 at time 30
Stopping to pick up commuters at bus stop 2 at time 35
Start driving to bus stop 3 at time 45
Stopping to pick up commuters at bus stop 3 at time 50
Start driving to bus stop 4 at time 60
Stopping to pick up commuters at bus stop 4 at time 65
Start driving to bus stop 5 at time 75
Stopping to pick up commuters at bus stop 5 at time 80
Start driving to bus stop 6 at time 90
Stopping to pick up commuters at bus stop 6 at time 95
Start driving to bus stop 7 at time 105
Stopping to pick up commuters at bus stop 7 at time 110
Start driving to bus stop 8 at time 120
Stopping to pick up commuters at bus stop 8 at time 125
Start driving to bus stop 9 at time 135
Stopping to pick up commuters at bus stop 9 at time 140
Start driving to bus stop 10 at time 150
Stopping to pick up commuters at bus stop 10 at time 155
Start driving to bus stop 11 at time 165
Stopping to pick up commuters at bus stop 11 at time 170
Start driving to bus stop 12 at time 180
Stopping to pick up commuters at bus stop 12 at time 185
Start driving to bus stop 13 at time 195
Stopping to pick up commuters at bus stop 13 at time 200
Start driving to bus stop 14 at time 210
Stopping to pick up commuters at bus stop 14 at time 215

There we go! Our first bus simulation reaches the last of its 15 stops after 215 minutes! Admittedly, the logic is a bit borked and unrealistic, but we will work on that shortly. Let’s start by implementing (trying to implement :D ) the following things:

  1. A different driving time between each of the bus stops that varies according to a normal distribution to simulate what might happen as traffic varies according to different areas of the neighbourhood.
  2. A random commuter boarding time that varies between 0 minutes (bus did not stop) and 10 minutes (boarding commuters took a really long time)

    import numpy
    random.seed(10)
    def bus_improved(simpy_environment):
        '''Improved Bus simulation'''
        drive_times = [random.randint(5, 15) for _ in range(15)] # 15 driving times, one per stop
        for i in range(15):
            print 'Driving to bus stop %d at time %d' % (i, simpy_environment.now)
            yield simpy_environment.timeout(drive_times[i])
            stop_time = random.randint(0,10) + abs(numpy.random.randn())
            if stop_time>0:
                print 'Stopping at bus stop %d at time %d' % (i, simpy_environment.now)
            else:
                print 'No one stopping at bus stop %d.' % i
            yield simpy_environment.timeout(stop_time)
            
    env = simpy.Environment()
    env.process(bus_improved(env))
    env.run()
        

    Driving to bus stop 0 at time 0
    Stopping at bus stop 0 at time 11
    Driving to bus stop 1 at time 17
    Stopping at bus stop 1 at time 26
    Driving to bus stop 2 at time 31
    Stopping at bus stop 2 at time 42
    Driving to bus stop 3 at time 46
    Stopping at bus stop 3 at time 53
    Driving to bus stop 4 at time 62
    Stopping at bus stop 4 at time 75
    Driving to bus stop 5 at time 81
    Stopping at bus stop 5 at time 95
    Driving to bus stop 6 at time 103
    Stopping at bus stop 6 at time 115
    Driving to bus stop 7 at time 123
    Stopping at bus stop 7 at time 129
    Driving to bus stop 8 at time 132
    Stopping at bus stop 8 at time 142
    Driving to bus stop 9 at time 151
    Stopping at bus stop 9 at time 159
    Driving to bus stop 10 at time 170
    Stopping at bus stop 10 at time 177
    Driving to bus stop 11 at time 187
    Stopping at bus stop 11 at time 202
    Driving to bus stop 12 at time 209
    Stopping at bus stop 12 at time 224
    Driving to bus stop 13 at time 225
    Stopping at bus stop 13 at time 230
    Driving to bus stop 14 at time 230
    Stopping at bus stop 14 at time 244
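
Note that the listing above draws the driving times with random.randint, which is a uniform rather than a normal distribution. A variant that is closer to the first item on the to-do list could sample the times as sketched below. The mean, standard deviation and minimum are assumed values picked for illustration, and the helper names are made up.

```python
import random


def sample_drive_time(rng, mean=10.0, std_dev=2.5, minimum=1.0):
    """Draw a driving time from a normal distribution, clipped so it stays positive."""
    return max(minimum, rng.gauss(mean, std_dev))


def sample_stop_time(rng):
    """Draw a boarding time uniformly between 0 and 10 minutes."""
    return rng.uniform(0, 10)


rng = random.Random(10)  # private generator with a fixed seed, for repeatable runs
drive_times = [sample_drive_time(rng) for _ in range(15)]
stop_times = [sample_stop_time(rng) for _ in range(15)]

print(len(drive_times))
print(len(stop_times))
```

The clipping matters because a normal distribution can produce negative values, and a negative delay makes no sense as a timeout.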

We can see from the information printed out that the drive times and boarding times have been randomized. While this is still not entirely realistic, it is a bit better than the very first simple bus example. I hope you found this tutorial useful! In the next parts of the tutorial, we’ll take a look at how we can add more logic to our Bus process to build a realistic (more or less) simulation of a morning commute on the Isle of Dogs ( a geographic region in East London ).

A year in a blink

I find it hard to believe we are over a quarter through 2016. It seems that just yesterday I took the train from Edinburgh to London to start my job as a software engineer (code monkey/random code person: I think calling myself a software engineer is overreaching, but that’s a post for another time). Since then I’ve spent a lot of time hanging out with the amazing PyLadies London, attending various meetups in the London tech scene and going to conferences.

PyData Paris 2015

On a whim, I decided to apply to speak at PyData Paris 2015. During my several months in London, I had become interested in the operation of the London underground and, in particular, using Python to study the graph properties of the underground transportation network. Looking back at the experience, I’m very glad I went. Although I seriously considered escaping from the auditorium just before I was due to give my talk, I ended up learning a lot about what works and what does not in effective tech talks ( just as a hint, large slabs of code on slides are usually not very engaging ). A special thank you goes out to Dr. Stefan Fermigier, who, a few nights before the conference, sent out an email with some articles on giving engaging and effective technical talks! Thank you for engaging your speakers and putting on a great PyData conference.

PyData London 2015

Dare I say that my experience in Paris left me yearning for more technology community events. In the months after PyData Paris, I had been expanding my study of the London Underground to incorporate some simulations constructed using SimPy, a Python library for discrete event simulations (think about people waiting in line for a coffee at a coffee shop ) and I wanted to present my work to the PyData community. Not only is this one of the largest Python oriented communities in London (the last member count was somewhere between 1000 and 2000), it is also extremely friendly and supportive. Needless to say, I greatly enjoyed my experience at PyData London (though I once again considered running away right before my talk).

PyCon UK 2015 at Coventry

After PyData, it was time to start thinking about attending the largest Python programmer gathering in the UK, PyCon UK, organized in Coventry. I rarely get the chance to venture beyond the M25, so a Pythonic trip to Coventry seemed suitable. I applied to mentor at Django Girls and was accepted. Of all my Python community engagements thus far, Django Girls was by far the most rewarding. I highly recommend signing up to be a mentor. You learn a ton from teaching Django to beginners and you also get to experience the joy of mentoring someone in their first ever programming venture! In addition to Django Girls, I spent my day attending various talks: high performance computing with Numba and an absolutely amazing keynote about the Philae lander, to name a few.

2016

In 2016, I want to devote more time to three things:

  1. Actively participating in the Python open source community and engaging others to help in CPython core development (just take a look at the bug tracker: there is so much to learn and explore )
  2. Becoming a better software developer (and maybe even earning the right to call myself a software engineer).
  3. Spending less time on busy work like answering emails and browsing the internet and embracing deep work: producing high quality, resilient and robust, well-designed and elegant code to solve problems.

Conversations about Python Dicts

What is the main conceptual difference between dictionaries and lists?

Lists in Python store objects by positional offset, and items are fetched by index, whereas in a dictionary, objects are fetched by key.

#fetching entries in a list based on position
securities=['equities', 'bonds', 'options', 'futures']
print securities[0]

>>> equities
#fetching entries in a dictionary based on key

employees={'engineering':['Lisa', 'Ann', 'Bob'],
	   'marketing':['Charlie', 'Mike']}
print employees['engineering']

>>> ['Lisa', 'Ann', 'Bob']
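
The difference also shows up when a lookup fails. The sketch below reuses the two examples above (the out-of-range index and the 'sales' key are made up for the demonstration): a bad position raises IndexError, a missing key raises KeyError, and dict.get offers a way to supply a default instead.

```python
securities = ['equities', 'bonds', 'options', 'futures']
employees = {'engineering': ['Lisa', 'Ann', 'Bob'],
             'marketing': ['Charlie', 'Mike']}

try:
    securities[10]  # no such position in the list
    list_error = None
except IndexError as error:
    list_error = type(error).__name__

try:
    employees['sales']  # no such key in the dictionary
    dict_error = None
except KeyError as error:
    dict_error = type(error).__name__

# get() returns the default instead of raising when the key is missing.
sales_team = employees.get('sales', [])

print(list_error)  # IndexError
print(dict_error)  # KeyError
print(sales_team)  # []
```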

What does it mean for a dictionary to be mutable?

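A variable stores a reference to a dictionary, not a copy. Because a dictionary is mutable, a change made through one variable is visible through every other variable that refers to the same dictionary. Not understanding this difference fully can sometimes lead to silly or rather serious runtime errors.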

icecream={'strawberry':4, 'blueberry':5, 'banana':6}
new_icecream=icecream

The variable new_icecream refers to the exact same dictionary as the variable icecream. We can verify this by adding another item to icecream and then using new_icecream to retrieve that item.

icecream['raspberry']=9
new_icecream['raspberry']
>>>9

Python did not throw a KeyError, which means that there exists a key called raspberry in the dictionary referred to by the variable new_icecream, even though we used the icecream variable to add it to the dictionary.
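
If what you actually want is an independent dictionary, you have to ask for a copy explicitly. A small sketch of the contrast (the names alias and snapshot are just for illustration, and copy() makes a shallow copy, so nested objects would still be shared):

```python
icecream = {'strawberry': 4, 'blueberry': 5, 'banana': 6}
alias = icecream            # a second name for the same dictionary
snapshot = icecream.copy()  # a new dictionary holding the same items

icecream['raspberry'] = 9

print(alias is icecream)        # True
print('raspberry' in alias)     # True
print('raspberry' in snapshot)  # False
```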

What are some alternatives to literals when constructing dictionaries?

The vanilla way to construct a dictionary in Python is to use the literal expression.

vanilla_dictionary={'raspberry':4, 'vanilla':2}

A dictionary can also be constructed by calling dict().

another_dictionary=dict(raspberry=4, vanilla=2)

Alternatively, we can use a list of (key, value) tuples.

dictionary_from_tuples=dict([('raspberry',4),('vanilla',2)])

Sometimes, your functions will give you separate lists for the keys and for the values. In this case, it is useful to employ the zip function to create key-value pairs and then pass the key-value pairs to the dict() function.

flavours=['linux_mint', 'ubuntu','debian','fedora','redhat','scientific_linux']
number_of_users=[40,30,90,100,80,10]
print zip(flavours,number_of_users)
>>>[('linux_mint', 40), ('ubuntu', 30), ('debian', 90), ('fedora', 100), ('redhat', 80), ('scientific_linux', 10)]

As we can see, the zip() function creates tuples. We can then pass the result of zip() to the dict() function to create a dictionary.

nix_users=dict(zip(flavours, number_of_users))
print nix_users
>>>{'scientific_linux': 10, 'fedora': 100, 'redhat': 80, 'linux_mint': 40, 'ubuntu': 30, 'debian': 90}
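
Two more options worth knowing about are dictionary comprehensions and dict.fromkeys. The sketch below reuses the two lists from above. The cut-off of 80 users and the variable names popular and counts are made up for the example.

```python
flavours = ['linux_mint', 'ubuntu', 'debian', 'fedora', 'redhat', 'scientific_linux']
number_of_users = [40, 30, 90, 100, 80, 10]

# A dictionary comprehension builds the pairs and can filter them in one go.
popular = {flavour: users
           for flavour, users in zip(flavours, number_of_users)
           if users >= 80}

# fromkeys() gives every key the same starting value.
counts = dict.fromkeys(flavours, 0)

print(sorted(popular.items()))
print(counts['ubuntu'])
```

A word of caution on fromkeys: the default value is shared between all keys, so it is best kept to immutable defaults such as 0 or None.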

How do I find out about the other methods available for dictionaries?

Execute dir(dict) or help(dict).

dir(dict)
>>>
['__class__',
 '__cmp__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'has_key',
 'items',
 'iteritems',
 'iterkeys',
 'itervalues',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values',
 'viewitems',
 'viewkeys',
 'viewvalues']

As you can see, the list is vast!
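
To cut the list down to the methods you are most likely to call yourself, you can filter out the names that start with an underscore. This is only a sketch, and the exact result depends on your Python version: the listing above comes from Python 2.7, while on Python 3 names such as has_key, iteritems and viewkeys no longer exist.

```python
# Keep only the names that do not start with an underscore.
public_methods = [name for name in dir(dict) if not name.startswith('_')]

print(public_methods)
```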