Hello and welcome to thoughtwisps! This is a personal collection of notes and thoughts on software engineering, machine
learning and the technology industry and community. For my professional website, please see
race-conditions.
Thank you for visiting!
09 Aug 2016
Notes on setting up Jupyterhub on an EC2 instance ( part 1 )
I am working on setting up a shared Python programming environment for PyLadies London
beginner and intermediate programming workshops. My last programming workshop on
generators and co-routines was less than successful and I think a lot of it had to do with
the fact that attendees
- had to spend a lot of time setting up an environment
- didn’t get direct feedback about whether or not the code they had produced was successful
- the material wasn’t great ( let’s be honest )
I hope that I can improve the situation for the first two points by making a unified PyLadies London Jupyterhub
environment. Together with nbgrader, having a unified
environment should hopefully make future PyLadies London workshops more enjoyable for the attendees.
In this post ( or really ‘note-to-self’ ), I’m going to be listing some notes on setting up a Jupyterhub on an
Amazon EC2 instance.
I am broadly following the Deploying JupyterHub on AWS guide.
Please note that there may be errors here. If in doubt, always use the official Amazon AWS guides in lieu of whatever I just wrote here.
Setting up an Ubuntu instance on Amazon AWS
- Go to the Amazon AWS console
- In the ‘Services’ tab, select EC2
- Create a new instance and select Ubuntu. I am using the AWS free tier to test drive this deployment, so I’ll be using t2.micro instance type.
- Download the private key (.pem file) you get from the ‘Create Instance’ wizard (or create your own) and move it to a safe location on your local machine
- Once you move the .pem file to a safe location, you have to change the permissions on the file. Otherwise connecting to your EC2 machine via ssh may not work.
chmod 400 <my-pem-filepath.pem>
- Select your instance in the Instances tab and click on the ‘Connect’ button. This will bring up a dialogue with the ssh commands that you need to execute from your local machine’s shell to connect to your Ubuntu server running on EC2
ssh -i <my-pem-filepath.pem> ubuntu@<ec2-instance-dns-name>
- Verify the fingerprint. This is, unfortunately, not as straightforward as comparing the fingerprint that appears on the ssh console with the key fingerprint you can find when you go to the Network & Security -> Key Pairs tab in your management console. You can find more information about this by reading the answer to this Stack Overflow question. In short, you have to generate a fingerprint from the .pem file by running the command
openssl pkcs8 -in <my-pemfilepath>.pem -nocrypt -topk8 -outform DER | openssl sha1 -c
- You should now be able to ssh to your Ubuntu server.
Using Let’s Encrypt to secure communications between browser and server
Let’s Encrypt is a free certificate authority (CA), which means that it provides digital certificates to domains. This in turn helps to ensure secure communication between a browser and the domain it is connecting to (yes, this explanation is a bit waffly and that’s because to this day I have been very very clueless about SSL and certificates - that is all about to change. Hopefully by the next blog post I will be a bit more in tune with what all of this means ).
- Let’s Encrypt does not work with amazonaws.com domains, so I will have to point a domain that I registered with another domain provider to my EC2 instance.
- The Deployment guide does this with Route53, an Amazon AWS DNS service. I opted to go down the Route53 route (heheh), although now that I think about it, I should have just stuck with my own DNS provider. In the end, I decided to go back and use my domain’s original nameservers and add the EC2 instance as a new record into the zone file. The TTL on that record is 14400 seconds, so I suppose I have up to 4 hours to wait and see if anything comes to fruition out of my random rambling (the TTL is how long DNS resolvers may cache a record, so it bounds how long I need to wait between updating a DNS record and the update being reflected in the DNS servers ). I’ll save Route53 for another day. For now, I am too impatient ( and too much of a n00b ).
17 May 2016
In this blog post we will explore how to use SimPy events to control when specific SimPy Processes start and resume.
Suppose we have a train that travels for a random number of time units ( let’s say between 5 and 10 units) and stops at each station for a random number of time units (2-5 units) to pick up passengers. Those numbers don’t necessarily correspond to any real life time span for a train. I just picked them for the purposes of this example. In later blog posts, I will explore integrating the TfL API with the simulation to provide more realistic data of travelling and boarding times for trains.
To represent this situation, we can have two processes, travel and board. When the train is travelling, the travel process will be active and the board process passive, and vice versa when the train is boarding passengers. We can implement this type of pattern by yielding a SimPy Event, which will suspend the process until the event is successfully triggered.
Let’s take a look at a simple example first.
import simpy
import random

class Train(object):
    def __init__(self, env):
        self.env = env
        self.travel_proc = self.env.process(self.travel())
        self.board_proc = self.env.process(self.board())

    def travel(self):
        while True:
            print 'Start travelling at time %d' % self.env.now
            # randint is inclusive at both ends: travel for 5 to 10 time units
            yield self.env.timeout(random.randint(5, 10))
            print 'Stopping at time %d' % self.env.now

    def board(self):
        while True:
            print 'Start boarding at time %d' % self.env.now
            # board for 2 to 5 time units
            yield self.env.timeout(random.randint(2, 5))
            print 'Stop boarding at time %d' % self.env.now

env = simpy.Environment()
train = Train(env)
env.run(until=10)
Start travelling at time 0
Start boarding at time 0
Stop boarding at time 2
Start boarding at time 2
Stop boarding at time 7
Start boarding at time 7
Stopping at time 9
Start travelling at time 9
In the previous example, we created a Train class with two Processes, travel and board, and we ran the simulation for 10 time units. Note that since we are using while True in the Processes, it is important to include the until keyword argument in the call to env.run to make the simulation stop at a certain time.
As we can see from the print statements, both the travel and the board processes start at the same time, which of course is not ideal, since the train cannot be both travelling and boarding passengers at once. To make sure that the board process is passive when the travel process is active and vice versa, we can introduce an Event that will control when each of the processes is active.
Let’s write an example.
class Train(object):
    def __init__(self, env):
        self.env = env
        # create the events before the processes that wait on them
        self.boarding_completed = self.env.event()
        self.train_stopping = self.env.event()
        self.travel_proc = self.env.process(self.travel())
        self.board_proc = self.env.process(self.board())

    def travel(self):
        while True:
            print 'Start travelling at time %d' % self.env.now
            yield self.env.timeout(random.randint(5, 10))
            print 'Stopping at time %d' % self.env.now
            # wake up the boarding process, then replace the used event
            self.train_stopping.succeed()
            self.train_stopping = self.env.event()
            # wait here until boarding has finished
            yield self.boarding_completed

    def board(self):
        while True:
            # wait here until the train has stopped
            yield self.train_stopping
            print 'Start boarding at time %d' % self.env.now
            yield self.env.timeout(random.randint(2, 5))
            print 'Stop boarding at time %d' % self.env.now
            # wake up the travel process, then replace the used event
            self.boarding_completed.succeed()
            self.boarding_completed = self.env.event()

env = simpy.Environment()
train = Train(env)
env.run(until=30)
Start travelling at time 0
Stopping at time 5
Start boarding at time 5
Stop boarding at time 9
Start travelling at time 9
Stopping at time 19
Start boarding at time 19
Stop boarding at time 21
Start travelling at time 21
If we look at the print statements printed during execution, we can see that the board process now waits for the travel process to complete and vice versa. Activating and passivating processes is controlled by two SimPy Events, one to signal that the train has stopped (self.train_stopping) and the other to signal that boarding has completed (self.boarding_completed). When we yield either of these events from a process, the process is suspended until the event is triggered successfully. We ensure this by calling Event.succeed() after boarding has completed and after the train has stopped.
The reader should also note that events cannot be recycled. That is, once an event has been triggered (as successful or as failed), it cannot be triggered again. Therefore, after self.train_stopping.succeed() or self.boarding_completed.succeed() is called, we have to assign new instances of Event to self.train_stopping or self.boarding_completed.
27 Mar 2016
Image by Oxyman (Wikipedia CC)
SimPy is a Python library
that allows you to simulate a variety of discrete-event scenarios. For example,
you may want to investigate how the number of available checkout machines
influences the length of the customer queue at your local supermarket or how the
number of bus stops in a crowded neighbourhood affects your morning commute.
SimPy offers you an easy way to build these kinds of simulations. In this blog
post series, I am going to give you some introductory examples to help you get
started with your own simulations.
Requirements
- Knowing a bit about Python generator functions and how to write them (and
sort of what they do, though I don’t really delve into the details here )
- Python 2.7 with SimPy installed (apologies all Python 3 fans, I will produce
a Python 3 version soon )
Basic SimPy Terminology
A SimPy simulation is constructed using three simple components:
- A Process is the actor or active agent in your simulation (for example,
customers in a busy store or buses on the road). In SimPy Processes are Python
generator functions.
- An Environment is like a container for your simulation, which helps to
schedule and coordinate processes.
- An Event is a way for the Processes in your simulation to communicate
with each other.
Hopefully each of these will become clearer once we dive into the practical
examples.
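Since Processes are generator functions, it may help to recall how a plain Python generator pauses and resumes. The sketch below uses no SimPy at all, and countdown is just a made-up example function. It is written so that it runs on Python 2.7 and Python 3.

```python
def countdown(start):
    """Generator function: each yield hands a value back and pauses here."""
    current = start
    while current > 0:
        yield current      # execution is suspended at this line
        current -= 1       # and continues from here on the next next() call

counter = countdown(3)     # calling the function only creates the generator
first = next(counter)      # runs until the first yield
second = next(counter)     # resumes after the yield, runs to the next one
remaining = list(counter)  # drains whatever is left
print('%d %d %s' % (first, second, remaining))
```

In a SimPy simulation you never call next() yourself. The Environment does the resuming for you, at the simulated time the yielded event completes.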
A Simple Bus Simulation
Let’s construct a simple SimPy simulation for a bus on a route around a London
neighbourhood. The route of the bus includes 15 stops (numbered 0 to 14), and
boarding and offloading commuters at each stop takes a constant 10 minutes (a
simplification, but let’s roll with it for now). Also let’s suppose, for
simplicity, that the time the bus drives between each stop stays constant (this,
clearly, does not hold in any real-world situation).
The bus will be a SimPy Process. Thus to begin our simulation, we will
construct a simple Python generator function to represent our bus. I will show
the whole generator function below and then explain how the individual parts of
the function work in the context of the SimPy simulation.
import simpy
import random

def bus(simpy_environment):
    '''Simple Simpy bus simulation'''
    driving_duration = 5  # time taken to drive between two stops in this neighbourhood
    stopping_duration = 10  # time taken to board and offload commuters at a stop
    for i in range(15):  # stops are numbered 0 to 14
        print 'Start driving to bus stop %d at time %d' % (i, simpy_environment.now)
        yield simpy_environment.timeout(driving_duration)
        print 'Stopping to pick up commuters at bus stop %d at time %d' % (i, simpy_environment.now)
        yield simpy_environment.timeout(stopping_duration)
There are a few interesting things going on in the bus generator function. First,
notice that we pass an argument simpy_environment into the function. This is a SimPy
Environment object that is in charge of scheduling, starting and suspending
SimPy processes (the meaning of this will become clear in a moment).
Second, notice that we are yielding two timeout events. When a SimPy
Process yields an event, the Environment in which the process is running
suspends the Process until the event has been processed. The Process (the bus
generator function) then resumes executing from the line after the yield.
In our case, we ‘suspend’ the Bus process while it is driving and when it is
boarding commuters. It might be a bit strange to think about the bus process as
suspended when the bus is actually driving or boarding commuters. The key here,
in my opinion, is not to think of the process as being idle (ie not doing
anything) when it is suspended, but to think about it as being in a state where
the process itself need not execute any extra logic to complete the event. For
example, in our simple simulation, from the bus process’s perspective boarding
commuters is simply an action that takes 10 minutes. There is no extra logic
that needs to be executed. Thus, for now, in our simple simulation we
represent ‘boarding commuters’ and ‘driving’ as plain timeouts.
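To make the idea of "suspended until the timeout is over" more concrete, here is a toy scheduler written in plain Python. This is emphatically not how SimPy is implemented. It is a sketch under my own assumptions, and run and toy_bus are invented names. Each process simply yields the number of minutes it wants to wait, and the loop resumes it once that simulated time has passed.

```python
import heapq

def run(processes):
    """Toy event loop: each process yields the number of minutes to wait."""
    log = []
    queue = []  # entries are (wake_up_time, tie_breaker, name, generator)
    for order, (name, generator) in enumerate(processes):
        heapq.heappush(queue, (0, order, name, generator))
    while queue:
        now, order, name, generator = heapq.heappop(queue)
        try:
            delay = next(generator)   # resume the process until its next yield
        except StopIteration:
            log.append((now, name, 'finished'))
            continue
        log.append((now, name, 'waits %d' % delay))
        heapq.heappush(queue, (now + delay, order, name, generator))
    return log

def toy_bus(stops):
    for _ in range(stops):
        yield 5    # driving: nothing to compute, just let time pass
        yield 10   # boarding: likewise

for entry in run([('bus', toy_bus(2))]):
    print(entry)
```

The bus generator does no work while it waits. The scheduler just jumps the clock forward to the next wake-up time, which is the sense in which a suspended process is not idle but simply has nothing to compute.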
Let’s add some more code to allow the SimPy simulation Environment to execute
this process.
env = simpy.Environment() #create the SimPy environment
env.process( bus(env) ) # create an instance of the Bus process
env.run()
Start driving to bus stop 0 at time 0
Stopping to pick up commuters at bus stop 0 at time 5
Start driving to bus stop 1 at time 15
Stopping to pick up commuters at bus stop 1 at time 20
Start driving to bus stop 2 at time 30
Stopping to pick up commuters at bus stop 2 at time 35
Start driving to bus stop 3 at time 45
Stopping to pick up commuters at bus stop 3 at time 50
Start driving to bus stop 4 at time 60
Stopping to pick up commuters at bus stop 4 at time 65
Start driving to bus stop 5 at time 75
Stopping to pick up commuters at bus stop 5 at time 80
Start driving to bus stop 6 at time 90
Stopping to pick up commuters at bus stop 6 at time 95
Start driving to bus stop 7 at time 105
Stopping to pick up commuters at bus stop 7 at time 110
Start driving to bus stop 8 at time 120
Stopping to pick up commuters at bus stop 8 at time 125
Start driving to bus stop 9 at time 135
Stopping to pick up commuters at bus stop 9 at time 140
Start driving to bus stop 10 at time 150
Stopping to pick up commuters at bus stop 10 at time 155
Start driving to bus stop 11 at time 165
Stopping to pick up commuters at bus stop 11 at time 170
Start driving to bus stop 12 at time 180
Stopping to pick up commuters at bus stop 12 at time 185
Start driving to bus stop 13 at time 195
Stopping to pick up commuters at bus stop 13 at time 200
Start driving to bus stop 14 at time 210
Stopping to pick up commuters at bus stop 14 at time 215
There we go! Our first bus simulation reaches the last of its 15 stops (stop 14) at
minute 215!
Admittedly, the logic is a bit borked and unrealistic, but we will work on that
shortly. Let’s start by implementing (trying to implement :D ) the following
things:
- A different driving time between each of the bus stops that varies according
to a normal distribution to simulate what might happen as traffic varies
according to different areas of the neighbourhood.
- A random commuter boarding time that varies between 0 minutes (bus did not
stop) and 10 minutes (boarding commuters took a really long time)
import numpy
random.seed(10)  # seeds Python's random module only, not numpy.random

def bus_improved(simpy_environment):
    '''Improved Bus simulation'''
    # one driving time per stop: uniform integers from 5 to 15 minutes (inclusive)
    drive_times = [random.randint(5, 15) for _ in range(15)]
    for i in range(15):
        print 'Driving to bus stop %d at time %d' % (i, simpy_environment.now)
        yield simpy_environment.timeout(drive_times[i])
        # whole minutes from 0 to 10 plus the absolute value of a standard normal draw
        stop_time = random.randint(0, 10) + abs(numpy.random.randn())
        if stop_time > 0:
            print 'Stopping at bus stop %d at time %d' % (i, simpy_environment.now)
        else:
            print 'No one stopping at bus stop %d.' % i
        yield simpy_environment.timeout(stop_time)

env = simpy.Environment()
env.process(bus_improved(env))
env.run()
Driving to bus stop 0 at time 0
Stopping at bus stop 0 at time 11
Driving to bus stop 1 at time 17
Stopping at bus stop 1 at time 26
Driving to bus stop 2 at time 31
Stopping at bus stop 2 at time 42
Driving to bus stop 3 at time 46
Stopping at bus stop 3 at time 53
Driving to bus stop 4 at time 62
Stopping at bus stop 4 at time 75
Driving to bus stop 5 at time 81
Stopping at bus stop 5 at time 95
Driving to bus stop 6 at time 103
Stopping at bus stop 6 at time 115
Driving to bus stop 7 at time 123
Stopping at bus stop 7 at time 129
Driving to bus stop 8 at time 132
Stopping at bus stop 8 at time 142
Driving to bus stop 9 at time 151
Stopping at bus stop 9 at time 159
Driving to bus stop 10 at time 170
Stopping at bus stop 10 at time 177
Driving to bus stop 11 at time 187
Stopping at bus stop 11 at time 202
Driving to bus stop 12 at time 209
Stopping at bus stop 12 at time 224
Driving to bus stop 13 at time 225
Stopping at bus stop 13 at time 230
Driving to bus stop 14 at time 230
Stopping at bus stop 14 at time 244
We can see from the information printed out that the drive times and boarding
times have been randomized. While this is still not entirely realistic, it is a
bit better than the very first simple bus example.
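One of the goals above was a normally distributed driving time, whereas bus_improved draws its driving times as uniform integers with random.randint. Here is a minimal sketch of the normal variant using only the standard library. The function name sample_drive_time and the mean, standard deviation and minimum are all made-up values for illustration, not measurements.

```python
import random

def sample_drive_time(mean=10.0, std_dev=2.5, minimum=1.0):
    """Draw one drive time in minutes from a normal distribution.

    mean, std_dev and minimum are made-up illustrative values.
    """
    drawn = random.gauss(mean, std_dev)
    return max(minimum, drawn)   # clamp so a drive never takes less than `minimum`

random.seed(10)  # seeding makes the draws repeatable between runs
drive_times = [sample_drive_time() for _ in range(15)]
print(['%.1f' % t for t in drive_times])
```

Clamping at a minimum slightly distorts the distribution, but it keeps negative delays out of the simulation, and a timeout cannot be negative.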
I hope you found this tutorial useful!
In the next parts of the tutorial, we’ll take a look at how we can add more
logic to our Bus process to build a realistic (more or less) simulation of a
morning commute on the Isle of Dogs ( a geographic region in East London ).
24 Mar 2016
I find it hard to believe we are over a quarter through 2016. It seems that just yesterday
I took the train from Edinburgh to London to start my job as a software engineer (code monkey/random
code person: I think calling myself a software engineer is overreaching, but that’s a post for
another time). Since then I’ve spent a lot of time hanging out with the amazing PyLadies London,
attending various meetups in the London tech scene and going to conferences.
PyData Paris 2015
On a whim, I decided to apply to speak at PyData Paris 2015.
During my several months in London, I had become interested in
the operation of the London underground and, in particular, using
Python to study the graph properties of the underground transportation network.
Looking back at the experience, I’m very glad I went.
Although I seriously considered escaping from the auditorium just before I was due
to give my talk, I ended up learning a lot about what works and what does not in
effective tech talks ( just as a hint, large slabs of code on slides are usually not
very engaging ). A special thank you goes out to Dr. Stefan Fermigier, who, a few nights
before the conference sent out an email with some articles on giving engaging and
effective technical talks! Thank you for engaging your speakers and putting on a
great PyData conference.
PyData London 2015
Dare I say that my experience in Paris left me yearning for more
technology community events. In the months after PyData Paris, I had been
expanding my study of the London Underground to incorporate some simulations
constructed using SimPy, a Python library for discrete event simulations
(think about people waiting in line for a coffee at a coffee shop ) and I
wanted to present my work to the PyData community. Not only is this one of the largest
Python oriented communities in London (the last member count was somewhere between 1000 and 2000),
it is also extremely friendly and supportive. Needless to say, I greatly enjoyed my experience
at PyData London (though I once again considered running away right before my talk).
PyCon UK 2015 at Coventry
After PyData, it was time to start thinking about attending the largest
Python programmer gathering in the UK, PyCon UK organized in Coventry.
I rarely get the chance to venture beyond the M25, so a Pythonic trip
to Coventry seemed suitable. I applied to mentor at Django Girls and was accepted.
Of all my Python community engagements thus far, Django Girls was by far the most
rewarding. I highly recommend signing up to be a mentor. You learn a ton from
teaching Django to beginners and you also get to experience the joy of mentoring someone
in their first ever programming venture! In addition to Django Girls, I spent
my day attending various talks: high performance computing with Numba, an absolutely
amazing keynote about the Philae lander to name a few.
2016
In 2016, I want to devote more time to three things:
- Participating actively in the Python open source community and encouraging others to help
in CPython core development (just take a look at the bug tracker: there is so much to learn
and explore )
- Becoming a better software developer (and maybe even earning the right to call myself a
software engineer).
- Devoting less time to busy work like answering emails and browsing the internet
and embracing deep work: producing high quality, resilient and robust, well-designed
and elegant code to solve problems.
21 Mar 2015
What is the main conceptual difference between dictionaries and lists?
Lists in Python store objects at positional offsets and fetch them by
index, whereas dictionaries store objects under keys and fetch them by key.
#fetching entries in a list based on position
securities=['equities', 'bonds', 'options', 'futures']
print securities[0]
>>> equities
#fetching entries in a dictionary based on key
employees={'engineering':['Lisa', 'Ann', 'Bob'],
'marketing':['Charlie', 'Mike']}
print employees['engineering']
>>> ['Lisa', 'Ann', 'Bob']
What does it mean for a dictionary to be mutable?
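A variable stores a reference to a dictionary, not a copy, so a change made
through one variable is visible through every other variable that refers to the
same dictionary. Not understanding this difference fully can sometimes lead
to silly or rather serious runtime errors.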
icecream={'strawberry':4, 'blueberry':5, 'banana':6}
new_icecream=icecream
The variable new_icecream refers to the exact same dictionary as the variable icecream.
We can verify this by adding another item to icecream and then using
new_icecream to retrieve that item.
icecream['raspberry']=9
new_icecream['raspberry']
>>>9
Python did not throw a KeyError, which means that there exists a key called raspberry
in the dictionary referred to by the variable new_icecream, even though
we used the icecream variable to add it to the dictionary.
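If you want an independent dictionary rather than a second name for the same one, dict.copy() is one option. Here is a small sketch. The names alias and snapshot are mine, and the code runs on both Python 2.7 and Python 3.

```python
icecream = {'strawberry': 4, 'blueberry': 5, 'banana': 6}
alias = icecream             # second name for the same dictionary
snapshot = icecream.copy()   # new dictionary with the same key-value pairs

icecream['raspberry'] = 9

print(alias is icecream)         # True: both names refer to one object
print('raspberry' in alias)      # True: the change is visible through the alias
print('raspberry' in snapshot)   # False: the copy was made before the change
```

Note that copy() is shallow: if the values are themselves mutable (lists, say), the copy and the original still share those values.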
What are some alternatives to literals when constructing dictionaries?
The vanilla way to construct a dictionary in Python is to use the literal expression.
vanilla_dictionary={'raspberry':4, 'vanilla':2}
A dictionary can also be constructed by calling dict() with keyword arguments.
another_dictionary=dict(raspberry=4, vanilla=2)
Alternatively, we can use a list of (key, value) tuples.
dictionary_from_tuples=dict([('raspberry',4),('vanilla',2)])
Sometimes, your functions will give you separate lists for the keys and for the
values. In this case, it is useful to employ the zip function to create
key-value pairs and then pass the key-value pairs to the dict() function.
flavours=['linux_mint', 'ubuntu','debian','fedora','redhat','scientific_linux']
number_of_users=[40,30,90,100,80,10]
print zip(flavours,number_of_users)
>>>[('linux_mint', 40), ('ubuntu', 30), ('debian', 90), ('fedora', 100), ('redhat', 80), ('scientific_linux', 10)]
As we can see, the zip() function creates tuples.
We can then pass the result of zip() to the dict() function to create a dictionary.
nix_users=dict(zip(flavours, number_of_users))
print nix_users
>>>{'scientific_linux': 10, 'fedora': 100, 'redhat': 80, 'linux_mint': 40, 'ubuntu': 30, 'debian': 90}
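Two small additions, sketched with a shortened version of the lists above. On Python 3, zip() returns an iterator rather than a list, so you need list() to see the pairs (dict(zip(...)) works the same either way). A dictionary comprehension is another way to build the mapping, and it is handy when you want to transform keys or values on the way in. The name by_upper_name is just an illustration.

```python
flavours = ['linux_mint', 'ubuntu', 'debian']
number_of_users = [40, 30, 90]

# list() is needed on Python 3 to see the pairs; on Python 2.7 it is harmless
pairs = list(zip(flavours, number_of_users))
print(pairs)

# a dictionary comprehension builds the mapping and upper-cases the keys on the way
by_upper_name = {name.upper(): count for name, count in zip(flavours, number_of_users)}
print(by_upper_name['DEBIAN'])
```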
How do I find out about the other methods available for dictionaries?
Execute dir(dict) or help(dict).
dir(dict)
>>>
['__class__',
'__cmp__',
'__contains__',
'__delattr__',
'__delitem__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getitem__',
'__gt__',
'__hash__',
'__init__',
'__iter__',
'__le__',
'__len__',
'__lt__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__setitem__',
'__sizeof__',
'__str__',
'__subclasshook__',
'clear',
'copy',
'fromkeys',
'get',
'has_key',
'items',
'iteritems',
'iterkeys',
'itervalues',
'keys',
'pop',
'popitem',
'setdefault',
'update',
'values',
'viewitems',
'viewkeys',
'viewvalues']
As you can see, the list is vast!
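Most of that list is double-underscore machinery. If you only care about the methods you would normally call yourself, one possible approach is to filter out the names that start with an underscore. This is a sketch, and the exact names you get back depend on your Python version (has_key and the iter*/view* methods, for example, only exist on Python 2).

```python
# keep only the names that do not start with an underscore
public_methods = [name for name in dir(dict) if not name.startswith('_')]
print(public_methods)

# the first line of a method's docstring is usually a one-line summary
summary = dict.get.__doc__.strip().splitlines()[0]
print(summary)
```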