race_conditions Coding and running

What can a technologist do about climate change?

It is 22:14. The 23rd of August, 2016. The days are starting to get shorter, the nights slightly longer. It is 22:14 at night and it is hot. Someone has aimed a hot hair dryer at London and kept it running all day. Even as the sun set for the day, the exhausted, tired air of the city kept shimmering from the heat.

We are breaking records. In sports, in stadiums and in pools. And in statistics. This July was the hottest month recorded since recordkeeping began.

This Tuesday, the tally was 89 to 78. The residents of Shishmaref, an Inupiat community on the island of Sarichef off the coast of Alaska, voted to move, because the land they had inhabited for centuries has become too unstable due to the effects of climate change.

If you are worried about this and wonder what you as a techie/software engineer/concerned technically-minded citizen can do, I encourage you to read Bret Victor's "What can a technologist do about climate change?".

Notes on TLA+ Hyperbook (part 1)

I am currently working through Leslie Lamport's TLA+ Hyperbook and writing up these notes/summaries/review questions to hopefully help me internalise the material.

Notes on TLA+ Hyperbook by Leslie Lamport

What follows are my notes on Leslie Lamport's book TLA+ Hyperbook.

Chapter 1: Introduction

What is concurrent computation? What is parallel computation?

In concurrent computation, things occur at the same time. In parallel computation, a single task is executed concurrently (that is, in chunks that run at the same time). Parallelism is optional (unless running the task sequentially would take prohibitively long) and is usually easier than concurrency, because the programmer controls which chunks of the single task are executed concurrently.

What is a digital system? How to choose a suitable abstraction for a digital system?

A digital system is a system that performs computation as a collection of discrete events. What constitutes a discrete event depends on the abstraction we are using to model the system and on who will be using that abstraction. Lamport gives the example of a digital calculator. For a user of the calculator, pressing the key 5 represents one event, while for the calculator engineer the act of pressing can be two events. The abstraction chosen to model the system must be simple enough to model the system well.

What is the Standard Model?

A system is a collection of behaviours. A behaviour is a sequence of states and represents an execution path of the system. Each state is an assignment of values to variables.

This is interesting! Can we come up with some "real world" example? Let's go with an example of an apple. (To everyone that grows apples, I apologise: my knowledge of orchards is very poor!) An apple is a system that has many different execution paths. It starts its life as a flower on the apple tree and from there can follow any number of execution paths:

  • flower -> frost bite

  • flower -> pollination -> ripped out by storm

  • flower -> pollination -> unripened apple -> eaten by bird

  • flower -> pollination -> unripened apple -> ripe apple

etc. I'm sure you can come up with many more execution paths for an apple.
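To make these definitions a bit more concrete, here is a minimal Python sketch of the apple example. It is only my own illustration: the variable name stage, the apple_system list and the allowed_by helper are assumptions I made up, not anything taken from the Hyperbook. Each state is a dict that assigns a value to a variable, each behaviour is a list of states, and the system is the collection of behaviours.

```python
# Each path below is one execution path of the apple, written as stage names.
paths = [
    ["flower", "frost bite"],
    ["flower", "pollination", "ripped out by storm"],
    ["flower", "pollination", "unripened apple", "eaten by bird"],
    ["flower", "pollination", "unripened apple", "ripe apple"],
]

# A state assigns values to variables; here there is a single,
# hypothetical variable called "stage".
# A behaviour is a sequence of states, and the system is a collection
# of behaviours.
apple_system = [[{"stage": stage} for stage in path] for path in paths]


def allowed_by(system, behaviour):
    """Return True if the behaviour is one of the system's behaviours."""
    return behaviour in system


ripening = [{"stage": s} for s in
            ["flower", "pollination", "unripened apple", "ripe apple"]]
skipping_ahead = [{"stage": "ripe apple"}]

print(allowed_by(apple_system, ripening))        # a listed behaviour
print(allowed_by(apple_system, skipping_ahead))  # not a listed behaviour
```

Listing every behaviour explicitly only works for toy systems like this one, but it shows the three layers (values, states, behaviours) side by side.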

What is a specification? What is a formal specification?

A specification is a description of a model. A formal specification is a description that is written in a precisely defined language.

On the Internet This Week

On the Internet This Week - Week 33

Donald Knuth - All Questions Answered (stanfordonline)

Donald Knuth answers all kinds of questions from faculty and students, as well as questions submitted by internet viewers. I always enjoy watching Don Knuth lecturing or speaking (it must have been amazing to attend Stanford and see Knuth lecturing live!), and this video is no exception. Some of the things that stood out for me from this recording:

  • Literate programming is the greatest thing since sliced bread. Programs should be written for people, not for computers.

  • Look up 'Selected Papers on Fun and Games' by Don Knuth

  • The hardest mathematical problem Knuth has solved is called the 'Birth of a Giant Component in a Random Graph'. If you start with a set of unconnected vertices and start randomly joining vertices together, a giant change happens when the number of connections you have added is approximately one half the number of vertices. By using ideas from complex analysis, it was possible to 'slow this down and watch it happen by measuring time in a different way' and thus it was possible to study this change. When Knuth goes into a problem, he tries to train his brain. The first week is baby steps, after a few weeks giant steps. All this happens by getting familiar with the problem domain.

  • "There is no royal road to software, anymore than there is a royal road to mathematics."
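The giant component claim in the list above is easy to poke at with a small simulation. The sketch below is just my own illustration, not anything from Knuth's paper: the function name, the vertex count and the seed are all made up, and I am using a union-find structure to track component sizes. It joins randomly chosen pairs of vertices and reports the size of the largest connected component.

```python
import random


def largest_component_after(n_vertices, n_edges, seed=0):
    """Join random pairs of vertices; return the largest component size."""
    rng = random.Random(seed)
    parent = list(range(n_vertices))   # union-find parent pointers
    size = [1] * n_vertices            # component size, valid at roots

    def find(v):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    largest = 1
    for _ in range(n_edges):
        a = find(rng.randrange(n_vertices))
        b = find(rng.randrange(n_vertices))
        if a != b:
            # Attach the smaller component to the larger one.
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a
            size[a] += size[b]
            largest = max(largest, size[a])
    return largest


n = 10000
for edges in (n // 4, n // 2, n):
    print(edges, largest_component_after(n, edges))
```

With far fewer than n/2 connections the largest component should stay tiny compared to n, while well past n/2 it should swallow most of the graph. The picks here can join a vertex to itself or repeat an earlier connection, which is a simplification on my part.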

Laurent Luce - Python Threads Synchronization

A great (though slightly outdated - using Python 2.6 for examples ) post about the various ways to synchronize threads in Python. I have very little experience of threading or synchronization and found this post very approachable.
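As a note-to-self, here is a minimal sketch of the simplest of those mechanisms, a lock, written for Python 3 using only the standard library. The counter, the add_many function and the thread and iteration counts are my own made-up example, not code from Laurent's post.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(times):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        # Only one thread at a time can be inside this block, so the
        # read-modify-write on counter cannot interleave with another thread.
        with lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 4 threads x 10000 increments each
```

Without the lock, two threads could read the same old value of counter and one of the increments could be lost.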

Notes on setting up Jupyterhub on an EC2 instance ( part 1 )

I am working on setting up a shared Python programming environment for PyLadies London beginner and intermediate programming workshops. My last programming workshop on generators and co-routines was less than successful and I think a lot of it had to do with the fact that

  1. attendees had to spend a lot of time setting up an environment
  2. attendees didn't get direct feedback about whether or not the code they had produced was successful
  3. the material wasn't great ( let's be honest )

I hope that I can improve the situation for points 1. and 2. by making a unified PyLadies London Jupyterhub environment. Together with nbgrader, having a unified environment should hopefully make future PyLadies London workshops more enjoyable for the attendees. In this post ( or really 'note-to-self' ), I'm going to be listing some notes on setting up a Jupyterhub on an Amazon EC2 instance.

I am broadly following the Deploying JupyterHub on AWS guide. Please note that there may be errors here. If in doubt, always use the official Amazon AWS guides in lieu of whatever I just wrote here.

Setting up an Ubuntu instance on Amazon AWS

  1. Go to the Amazon AWS console
  2. In the 'Services' tab, select EC2
  3. Create a new instance and select Ubuntu. I am using the AWS free tier to test drive this deployment, so I'll be using the t2.micro instance type.
  4. Download the private key (.pem file) you get from the 'Create Instance' wizard (or create your own) and move it to a safe location on your local machine
  5. Once you move the .pem file to a safe location, you have to change the permissions on the file. Otherwise connecting to your EC2 machine via ssh may not work.

chmod 400 <my-pem-filepath.pem>

  6. Select your instance in the Instances tab and click on the 'Connect' button. This will bring up a dialogue showing the ssh command that you need to execute from your local machine's shell to connect to your Ubuntu server running on EC2

ssh -i <my-pem-filepath.pem> ubuntu@<ec2-instance-dns-name>

  7. Verify the fingerprint. This is, unfortunately, not as straightforward as comparing the fingerprint that appears on the ssh console with the key fingerprint you can find when you go to the Network & Security -> Key Pairs tab in your management console. You can find more information about this by reading the answer to this Stack Overflow question. In short, you have to generate a fingerprint from the .pem file by running the command

openssl pkcs8 -in <my-pem-filepath.pem> -nocrypt -topk8 -outform DER | openssl sha1 -c

  8. You should now be able to ssh to your Ubuntu server.

Using Let's Encrypt to secure communications between browser and server

Let's Encrypt is a free certificate authority (CA), which means that it provides digital certificates to domains. This in turn helps to ensure secure communication between a browser and the domain it is connecting to (yes, this explanation is a bit waffly and that's because to this day I have been very very clueless about SSL and certificates - that is all about to change. Hopefully by the next blog post I will be a bit more in tune with what all of this means ).

  1. Let's Encrypt does not work with the default amazonaws.com hostnames that EC2 assigns, so I will have to point a domain that I registered with another domain provider to my EC2 instance.

  2. The Deployment guide does this with Route53, an Amazon AWS DNS service. I initially opted to go down the Route53 route (heheh), although now that I think about it, I should have just stuck with my own DNS provider. In the end, I decided to go back and use my domain's original nameservers and add the EC2 instance as a new record into the zone file. The TTL on that record is 14400 seconds, so I suppose I have up to 4 hours to wait and see if anything comes to fruition out of my random rambling (TTL, or time to live, is how long DNS servers may cache a record, so it is roughly how long I may need to wait before a DNS record update is reflected in the DNS servers). I'll save Route53 for another day. For now, I am too impatient (and too much of a n00b).

Passivate/Reactivate Pattern in SimPy

In this blog post we will explore how to use SimPy events to control when specific SimPy Processes are suspended and when they resume.

Suppose we have a train that travels for a random number of time units ( let's say between 5 and 10 units) and stops at each station for a random number of time units (2-5 units) to pick up passengers. Those numbers don't necessarily correspond to any real life time span for a train. I just picked them for the purposes of this example. In later blog posts, I will explore integrating the TfL API with the simulation to provide more realistic data of travelling and boarding times for trains.

To represent this situation, we can have two processes travel and board. When the train is travelling, the travel process will be active and the board process passive and vice versa when the train is boarding passengers. We can implement this type of pattern by yielding a SimPy Event, which will suspend the process until the event is successfully triggered.

Let's take a look at a simple example first.

import simpy
import random

class Train(object):
    def __init__(self, env):
        # Keep a reference to the environment; travel() and board() use it.
        self.env = env
        self.travel_proc = self.env.process(self.travel())
        self.board_proc = self.env.process(self.board())

    def travel(self):
        while True:
            print('Start travelling at time %d' % self.env.now)
            # randint includes both endpoints: 5 to 10 time units.
            yield self.env.timeout(random.randint(5, 10))
            print('Stopping at time %d' % self.env.now)

    def board(self):
        while True:
            print('Start boarding at time %d' % self.env.now)
            yield self.env.timeout(random.randint(2, 5))
            print('Stop boarding at time %d' % self.env.now)

env = simpy.Environment()
train = Train(env)
# Run for 10 time units; without until, the while True loops never end.
env.run(until=10)

    Start travelling at time 0
    Start boarding at time 0
    Stop boarding at time 2
    Start boarding at time 2
    Stop boarding at time 7
    Start boarding at time 7
    Stopping at time 9
    Start travelling at time 9

In the previous example, we created a Train class with two Processes: travel and board and we ran the simulation for 10 time units. Note, since we are using while True in the Processes, it is important to include the until keyword argument in the call to env.run to make the simulation stop at a certain timeout.

As we can see from the print statements, both the travel and the board processes start at the same time, which of course is not ideal, since the train cannot be both travelling and boarding passengers at once. To make sure that the board process is passive when the travel process is active and vice versa, we can introduce Events that control when each of the processes is active. Let's write an example.

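class Train(object):
    def __init__(self, env):
        self.env = env
        # One event per hand-over between the two processes.
        self.train_stopping = env.event()
        self.boarding_completed = env.event()
        self.travel_proc = env.process(self.travel())
        self.board_proc = env.process(self.board())

    def travel(self):
        while True:
            print('Start travelling at time %d' % self.env.now)
            yield self.env.timeout(random.randint(5, 10))
            print('Stopping at time %d' % self.env.now)
            # Wake up board(), then replace the used-up event with a new one.
            self.train_stopping.succeed()
            self.train_stopping = self.env.event()
            yield self.boarding_completed

    def board(self):
        while True:
            yield self.train_stopping
            print('Start boarding at time %d' % self.env.now)
            yield self.env.timeout(random.randint(2, 5))
            print('Stop boarding at time %d' % self.env.now)
            # Wake up travel(), then replace the used-up event with a new one.
            self.boarding_completed.succeed()
            self.boarding_completed = self.env.event()

env = simpy.Environment()
train = Train(env)
# The run length here is my pick; any until value ends the while True loops.
env.run(until=25)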

    Start travelling at time 0
    Stopping at time 5
    Start boarding at time 5
    Stop boarding at time 9
    Start travelling at time 9
    Stopping at time 19
    Start boarding at time 19
    Stop boarding at time 21
    Start travelling at time 21

If we look at the print statements printed during execution, we can see that now the board process waits for the travel process to complete and vice versa. Activating and passivating processes is controlled by using two SimPy Events, one to signal that the train has stopped (self.train_stopping) and the other to signal that boarding has completed (self.boarding_completed). When we yield either of these events from a process, the process is suspended until the event is triggered successfully. We ensure this by calling Event.succeed() after boarding has completed and after the train has stopped.

The reader should also note that events cannot be recycled. That is, once an event has been triggered (as successful or as failed), it cannot be triggered again. Therefore, after self.train_stopping.succeed() or self.boarding_completed.succeed() is called, we have to assign new instances of Event to self.train_stopping or self.boarding_completed.
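To convince myself of why a yielded event suspends a process and why a used event has to be replaced, I find it helpful to look at a toy version of the mechanism in plain Python. To be clear, this is not how SimPy is implemented: ToyEvent and start are hypothetical names I made up, there is no simulation clock, and waiting processes are resumed immediately rather than being scheduled. It only mimics the two behaviours described above.

```python
class ToyEvent:
    """A one-shot event (a made-up stand-in, not SimPy's Event class)."""

    def __init__(self):
        self.triggered = False
        self.waiters = []   # callbacks that resume suspended processes

    def succeed(self):
        if self.triggered:
            # Mimics the rule that an event cannot be triggered twice.
            raise RuntimeError("event has already been triggered")
        self.triggered = True
        waiters, self.waiters = self.waiters, []
        for resume in waiters:
            resume()


def start(process):
    """Drive a generator; each yielded ToyEvent suspends it until triggered."""
    def resume():
        while True:
            try:
                event = next(process)
            except StopIteration:
                return
            if not event.triggered:
                # Passivate: park the process until the event succeeds.
                event.waiters.append(resume)
                return
    resume()


log = []
train_stopping = ToyEvent()


def board():
    yield train_stopping          # passive until the train stops
    log.append("start boarding")


start(board())
log.append("train stops")
train_stopping.succeed()          # reactivates board()

try:
    train_stopping.succeed()      # a used event cannot be triggered again
except RuntimeError:
    log.append("second trigger rejected")
    train_stopping = ToyEvent()   # so we assign a fresh event instead

print(log)
```

The board generator only gets past its yield once succeed() is called, and the second succeed() on the same event is refused, which is why the Train class above assigns a new event after each trigger.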