race_conditions - Coding and running

Code Mesh 2016

Code Mesh 2016: Part 1

Thanks to the generous support of the Code Mesh scholarship program, I was able to attend the conference, meet interesting people working on non-mainstream technologies and, most importantly, add about 100 new entries to my "Learn More About This" list. What follows is a summary I've constructed to remind myself of the talks and to provide a convenient reference for future technical research projects.

Code Mesh 2016 Day 1

The first day of the conference kicked off with a keynote by Conor McBride on the topic of Space Monads. My brain - a newcomer to the world of functional programming - was catapulted into the unfamiliar terrain of functors, monads, a programming language with peculiar syntax called Agda, and something that eventually compiled to show two moving squares in a console window. What I did (sort of) understand is that Agda allows the programmer to specify the type of a program and then automatically generate a program of that type. I am definitely adding space monads to my "Learn More About This" list, but I feel that I can only gain a solid understanding of the things demonstrated in the keynote after gaining some ground in the functor-monoid-monad territory (Haskell to the rescue?).

After the keynote, the conference attendees separated into three tracks. I attended a talk on 'Gossiping Unikernels' by Andreas Garnaes (Zendesk). Andreas discussed rearchitecting a monolith application (everything lives in the same process) into microservices hosted on unikernels. While the traditional 'server-side' stack involves multiple layers of virtualisation (the application space, the container space, the virtual OS, the hypervisor), unikernels are able to interface directly with the hypervisor. Because unikernels incorporate less code than traditional operating systems, they have a reduced attack surface and are thus (potentially) more secure. The 'potentially' in this paraphrasing of Andreas' talk is my addition - mainly as a footnote, because I am not entirely clear on what the drawbacks of dropping traditional operating system code are. Do unikernels lose some of the traditional OS security protections by including less library code?

Furthermore, Andreas discussed a problem that emerged once the monolith had been broken into microservices: service discovery - in other words, how do the microservices know who is providing what? Two solutions are available: introduce a centralized messaging component, or investigate peer-to-peer solutions. The naive peer-to-peer implementation would be to regularly heartbeat to all members of the microservices cluster, but this approach takes too much bandwidth, since the number of microservices a component needs to ping to assert membership grows quadratically with the number of nodes. Thus there is a need for a distributed membership protocol that eventually detects a faulty node, is fairly fast and does not make high demands on the network. SWIM is such a protocol; it achieves these aims by randomly heartbeating and gossiping. For example, in a three-node system with Alice, Bob and Charlie microservices, Alice pings Charlie to check for failure. If Charlie responds, all is good. If Charlie does not respond within a particular timeframe, Alice pings Bob and asks Bob to ping Charlie. If Charlie responds to Bob, Bob relays this response to Alice. If not, Alice records Charlie as unavailable and propagates this information to Bob (gossiping), who is then aware of Charlie being unavailable and in turn propagates this information to all other nodes in the system. Thus all nodes eventually recognize that Charlie is no longer available.
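
To make that round concrete, here is a toy, single-process sketch of the Alice/Bob/Charlie exchange in Python. It is purely my own illustration - the Node class, its methods and the in-memory 'network' are invented for this post, not Andreas' code or a faithful SWIM implementation.

class Node:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.unavailable = set()      # nodes this node believes have failed

    def ping(self, target):
        # a direct probe; in real SWIM this is a network message with a timeout
        return target.alive

    def ping_req(self, helper, target):
        # ask a helper node to probe the target on our behalf
        return helper.ping(target)

    def gossip(self, peers):
        # piggyback our failure information onto messages to other peers
        for peer in peers:
            peer.unavailable |= self.unavailable

alice, bob, charlie = Node("Alice"), Node("Bob"), Node("Charlie", alive=False)

if not alice.ping(charlie) and not alice.ping_req(bob, charlie):
    alice.unavailable.add(charlie.name)   # no direct or indirect response
    alice.gossip([bob])                   # Bob would in turn gossip onwards

print(alice.unavailable, bob.unavailable)  # both now list Charlie as failed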

After describing the theoretical aspects of SWIM, Andreas gave a demo of a pure implementation of the protocol (once again dipping toes into functional territory) and demonstrated how pure implementations can make the development process easier by allowing different types of I/O interfaces (sync, async, mock, MirageOS) to wrap the core SWIM implementation on demand.
Finally, the talk concluded with a useful tip on how to apply property-based testing to reduce the number of execution paths that have to be tested in a distributed system.
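
As a flavour of what property-based testing over a pure core can look like, here is my own toy example. It assumes the hypothesis library, and merge_views is an invented stand-in for the real, pure membership logic - nothing here is from the talk itself.

from hypothesis import given, strategies as st

def merge_views(local, gossip):
    # pure function: merge two membership views, keeping the freshest
    # (highest incarnation number) status seen for each node
    merged = dict(local)
    for node, (incarnation, status) in gossip.items():
        if node not in merged or incarnation > merged[node][0]:
            merged[node] = (incarnation, status)
    return merged

entries = st.tuples(st.integers(min_value=0), st.sampled_from(["alive", "suspect", "dead"]))
views = st.dictionaries(st.text(min_size=1, max_size=3), entries, max_size=5)

@given(views, views)
def test_merging_the_same_gossip_twice_changes_nothing(local, gossip):
    once = merge_views(local, gossip)
    assert merge_views(once, gossip) == once

if __name__ == "__main__":
    test_merging_the_same_gossip_twice_changes_nothing()

Because the core is a pure function, hypothesis can generate hundreds of random membership views and check the property without any network, timers or mocks.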

After Andreas' talk, I wandered into the main hall to hear Sophie Wilson speak about the 'Future of Microprocessors'. As someone who spends a big part of her day coming up with instructions for a silicon brain to execute, I am ashamed to say that the inner workings of that silicon brain are a bit of a mystery. In spite of my inability to coherently define key terms such as microprocessor or transistor, I had previously heard of Moore's Law ("the number of transistors we can fit on a silicon (chip?)" - a lapse in the note-taking process - "doubles every 2 years"). The law is more of an empirical observation, and the precise wording apparently drifts depending on the chip industry. Sophie showed the original microprocessor designs for the 6502 and the ARM1. All I could do was marvel at the intricate photos and make mental notes to learn more about microprocessors ASAP. In her discussion of FirePath, Sophie demonstrated a set of instructions designed to be truly parallel and remarked that very few modern software development languages are truly parallel and are thus a poor fit for the parallel hardware environment (a theme that was touched upon in Kevin Hammond's ParaFormance talk later in the afternoon).

The number of transistors on a chip is rising, but the ability of devices to utilize this increasing capacity is limited by the cost of etching smaller gates and by the immense heat generated by these power-dense microprocessors. Thus even if more and more cores are added to future devices, some of these cores will have to be kept dark to keep the device from overheating. The 10x performance increases that were witnessed in the early development of microprocessors will not be witnessed again.

In the afternoon, Mark Priestly spoke about the process behind the birth of new programming paradigms - with a specific slant toward the functional programming paradigm. Mark pointed out two ways to study the 'history of a programming paradigm': a backward-looking one, captured by Paul Hudak and David Turner, and a forward-looking one, which is constructed by looking at the prevailing programming problems and challenges of the early 1950s (before the development of the functional paradigm) and working forwards to see which steps and paths of investigation led to the modern notion of the functional programming paradigm. While Hudak and Turner look to Church's lambda calculus as the first functional programming language, the forward-looking history of the functional programming paradigm traces its origin to Herbert Simon and Allen Newell's theorem-proving machine, which gave rise to the list data structure, which in turn gave rise to Lisp.

After learning a bit about the functional programming paradigm, I went to 'A Brief History of Distributed Programming: RPC' by Caitie McCaffrey (Twitter) and Christopher Meiklejohn (Universite catholique de Louvain) - one of the talks on my 'Most Anticipated' list. RPC, as I learned during the talk, stands for Remote Procedure Call and, in a nutshell (which I am hopefully presenting here correctly), allows one computer to execute some functionality on a remote computer. Caitie and Christopher walked the audience through several iterations of the RPC technology, as captured in the Request for Comments memos published by various industry members. The problems with RPC can be boiled down to two main questions: (1) do we treat everything as local, or (2) do we treat everything as remote? RPC has re-emerged in the form of new frameworks, such as Finagle for the JVM. None of these frameworks addresses the original questions ('everything local' or 'everything remote'); they only solve the problem of getting things on and off the wire. Interesting future academic research in this area includes Lasp, Bloom and Spores.
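
To ground the basic idea, here is a minimal sketch of an RPC round trip using Python's built-in xmlrpc modules (my own illustration, not anything from the talk or from Finagle): the client calls proxy.add(2, 3) as if it were a local function, but the work actually happens on the "remote" server - here just a background thread in the same process.

import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):
    return a + b

# the "remote" side: expose add() over HTTP/XML-RPC
server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# the "local" side: the call looks local, but executes on the server
proxy = ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))  # prints 5

The 'local or remote' tension is visible even here: the call site looks local, yet it can fail in ways a local call never would (timeouts, unreachable host), which is exactly the issue the talk said newer frameworks still do not resolve.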

Finally, I went to Neha Narula's (MIT Media Lab) talk on 'The End of Data Silos: Interoperability via Cryptocurrencies'. Her talk provided an excellent summary of the three ideas that come together to make blockchain technology possible: 1) distributed consensus, 2) private key cryptography and 3) common data formats and protocols. In distributed consensus, multiple computers come together to agree on a value. This system has to be tolerant of Byzantine failures - failures where certain nodes in the system start sending false or incorrect data. Existing Byzantine fault tolerance protocols require participants to be known ahead of time, which is not possible in an open system such as a cryptocurrency. This opens up the system to the Sybil attack, in which an attacker creates thousands of subverted nodes to influence the process of distributed consensus. What thwarts this effort in distributed cryptocurrencies is the computational price a new joiner to the blockchain has to pay. A Sybil attack is still possible, but would require a large amount of power to be expended by the attacker.
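
To make that 'computational price' concrete, here is a tiny proof-of-work sketch in Python - my own illustration of the general idea, not Bitcoin's actual parameters or block format. Finding a suitable nonce is deliberately expensive, while checking it is cheap.

import hashlib

def proof_of_work(block_data, difficulty=4):
    # find a nonce such that sha256(block_data + nonce) starts with
    # `difficulty` zero hex digits; higher difficulty means more work
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

nonce = proof_of_work("some transactions")
print(nonce)  # expensive to find...
print(hashlib.sha256(f"some transactions{nonce}".encode()).hexdigest())  # ...cheap to verify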

What can a technologist do about climate change?

It is 22:14. The 23rd of August, 2016. The days are starting to get shorter, the nights slightly longer. It is 22:14 at night and it is hot. Someone has aimed a hot hair dryer at London and kept it running all day. Even as the sun set for the day, the exhausted, tired air of the city kept shimmering from the heat.

We are breaking records. In sports, in stadiums and in pools. And in statistics. This July was the hottest month since recordkeeping began.

This Tuesday, the tally was 89 to 78. The residents of Shishmaref, an Inupiat community on the island of Sarichef off the coast of Alaska, voted to move, because the land they have inhabited for centuries has become too unstable due to the effects of climate change.

If you are worried about this and wonder what you as a techie/software engineer/concerned technically-minded citizen can do, I encourage you to read Bret Victor's "What can a technologist do about climate change?".

Notes on TLA+ Hyperbook (part 1)

I am currently working through Leslie Lamport's TLA+ Hyperbook and writing up these notes/summaries/review questions to hopefully help me internalise the material.

Notes on TLA+ Hyperbook by Leslie Lamport

What follows are my notes on Leslie Lamport's book TLA+ Hyperbook.

Chapter 1: Introduction

What is concurrent computation? What is parallel computation?

In concurrent computation, things occur at the same time. In parallel computation, a single task is executed concurrently (that is, in chunks that occur at the same time). Parallelism is optional (unless sequential execution would be prohibitively slow) and is usually easier than concurrency, because the programmer is in control of which chunks of the single task are executed concurrently.
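
As a minimal Python sketch of the distinction (my own example, not Lamport's): the single task of summing a list is split into chunks that are handed to a pool of workers, and the result is the same as the sequential version.

from concurrent.futures import ThreadPoolExecutor

numbers = list(range(1_000_000))
# split the single task into four chunks
chunks = [numbers[i:i + 250_000] for i in range(0, len(numbers), 250_000)]

with ThreadPoolExecutor() as pool:
    # the chunks are processed by a pool of worker threads (in CPython the GIL
    # limits real speedup for pure-Python work, but the decomposition is the point)
    partial_sums = list(pool.map(sum, chunks))

print(sum(partial_sums) == sum(numbers))  # True: the parallelism was optional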

What is a digital system? How to choose a suitable abstraction for a digital system?

A digital system is a system that performs computation as a collection of discrete events. What constitutes a discrete event depends on the abstraction we are using to model the system and on who will be using that abstraction. Lamport gives the example of a digital calculator: for a user of the calculator, pressing the key 5 represents one event, while for the calculator engineer the act of pressing can be two events. The abstraction chosen must be simple, yet still model the system well.

What is the Standard Model?

A system is a collection of behaviours. A behaviour is a sequence of states and represents one execution path of the system. Each state is an assignment of values to variables.

This is interesting! Can we come up with a "real world" example? Let's go with the example of an apple. (To everyone who grows apples, I apologise - my knowledge of orchards is very poor!) An apple is a system that has many different execution paths. It starts its life as a flower on the apple tree and from there can follow any number of execution paths:

  • flower -> frost bite

  • flower -> pollination -> ripped out by storm

  • flower -> pollination -> unripened apple -> eaten by bird

  • flower -> pollination -> unripened apple -> ripe apple

etc. I'm sure you can come up with many more execution paths for an apple.
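
In Python terms (my own toy encoding, not TLA+ syntax), the apple system might be written down like this: each behaviour is a sequence of states, and each state assigns a value to the single variable stage.

behaviours = [
    [{"stage": "flower"}, {"stage": "frostbitten"}],
    [{"stage": "flower"}, {"stage": "pollinated"}, {"stage": "ripped out by storm"}],
    [{"stage": "flower"}, {"stage": "pollinated"},
     {"stage": "unripened apple"}, {"stage": "eaten by bird"}],
    [{"stage": "flower"}, {"stage": "pollinated"},
     {"stage": "unripened apple"}, {"stage": "ripe apple"}],
]

# print each execution path as a chain of states
for behaviour in behaviours:
    print(" -> ".join(state["stage"] for state in behaviour))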

What is a specification? What is a formal specification?

A specification is a description of a model; a formal specification is a description written in a precisely defined language.

On the Internet This Week

On the Internet This Week - Week 33

Donald Knuth - All Questions Answered (stanfordonline)

Donald Knuth answers all kinds of questions from faculty and students, as well as questions submitted by internet viewers. I always enjoy watching Don Knuth lecture or speak (it must have been amazing to attend Stanford and see Knuth lecture live!), and this video is no exception. Some of the things that stood out for me from this recording:

  • Literate programming is the greatest thing since sliced bread. Programs should be written for people, not for computers.

  • Look up 'Selected Papers on Fun and Games' by Don Knuth

  • The hardest mathematical problem Knuth has resolved is called the 'Birth of a Giant Component in a Random Graph'. If you start with a set of isolated vertices and randomly join pairs of vertices together, a dramatic change happens when the number of connections you have added is approximately one half the number of vertices. By using ideas from complex analysis, it was possible to 'slow this down and watch it happen by measuring time in a different way' and thus study this change (a toy simulation of the phenomenon follows after this list). When Knuth goes into a problem, he tries to train his brain. The first week is baby steps; after a few weeks, giant steps. All this happens by getting familiar with the problem domain.

  • "There is no royal road to software, anymore than there is a royal road to mathematics."

Laurent Luce - Python Threads Synchronization

A great (though slightly outdated - it uses Python 2.6 for its examples) post about the various ways to synchronize threads in Python. I have very little experience with threading or synchronization and found this post very approachable.
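
As the smallest possible taster (my own Python 3 snippet, not taken from the post): a threading.Lock serialises updates to a shared counter, so that four threads incrementing it concurrently still produce the expected total.

import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:       # only one thread may update the counter at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000, because the increments are serialised by the lock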

Notes on setting up Jupyterhub on an EC2 instance (part 1)

I am working on setting up a shared Python programming environment for PyLadies London beginner and intermediate programming workshops. My last programming workshop, on generators and co-routines, was less than successful, and I think a lot of that had to do with the fact that attendees:

  1. had to spend a lot of time setting up an environment
  2. didn't get direct feedback about whether or not the code they had produced worked
  3. weren't given great material (let's be honest)

I hope that I can improve the situation for points 1 and 2 by setting up a unified PyLadies London Jupyterhub environment. Together with nbgrader, a unified environment should hopefully make future PyLadies London workshops more enjoyable for the attendees. In this post (or really 'note-to-self'), I'm going to list some notes on setting up Jupyterhub on an Amazon EC2 instance.

I am broadly following the Deploying JupyterHub on AWS guide. Please note that there may be errors here. If in doubt, always use the official Amazon AWS guides in lieu of whatever I just wrote here.

Setting up an Ubuntu instance on Amazon AWS

  1. Go to the Amazon AWS console
  2. In the 'Services' tab, select EC2
  3. Create a new instance and select Ubuntu. I am using the AWS free tier to test drive this deployment, so I'll be using the t2.micro instance type.
  4. Download the private key (.pem file) you get from the 'Create Instance' wizard (or create your own) and move it to a safe location on your local machine.
  5. Once you have moved the .pem file to a safe location, you have to change the permissions on the file. Otherwise, connecting to your EC2 machine via ssh may not work:

chmod 400 <my-pem-filepath.pem>

  6. Select your instance in the Instances tab and click on the 'Connect' button. This will bring up a dialogue with the ssh command that you need to execute from your local machine's shell to connect to your Ubuntu server running on EC2:

ssh -i <my-pem-filepath.pem> ubuntu@<ec2-instance-dns-name> 
  7. Verify the fingerprint. This is, unfortunately, not as straightforward as comparing the fingerprint that appears on the ssh console with the key fingerprint you can find when you go to the Network & Security -> Key Pairs tab in your management console. You can find more information about this by reading the answer to this Stack Overflow question. In short, you have to generate a fingerprint from the .pem file by running the command

openssl pkcs8 -in <my-pemfilepath>.pem -nocrypt -topk8 -outform DER | openssl sha1 -c

  8. You should now be able to ssh to your Ubuntu server.

Using Let's Encrypt to secure communications between browser and server

Let's Encrypt is a free certificate authority (CA), which means that it provides digital certificates to domains. This in turn helps to ensure secure communication between a browser and the domain it is connecting to (yes, this explanation is a bit waffly, and that's because to this day I have been very, very clueless about SSL and certificates - that is all about to change. Hopefully by the next blog post I will be a bit more in tune with what all of this means).

  1. Let's Encrypt does not issue certificates for amazonaws.com domains (such as the public DNS name of an EC2 instance), so I will have to point a domain that I registered with another domain provider to my EC2 instance.

  2. The Deployment guide does this with Route53, an Amazon AWS DNS service. I opted to go down the Route53 route (heheh), although now that I think about it, I should have just stuck with my own DNS provider. In the end, I decided to go back to my domain's original nameservers and add the EC2 instance as a new record in the zone file. The TTL on that record is 14400 seconds, so I suppose I have about 4 hours to wait and see if anything comes to fruition out of my random rambling (the TTL is how long DNS resolvers may cache a record, so it is roughly the longest I need to wait between updating a DNS record and the update being visible everywhere). I'll save Route53 for another day. For now, I am too impatient (and too much of a n00b).