thoughtwisps
One commit at a time

Hello and welcome to thoughtwisps! This is a personal collection of notes and thoughts on software engineering, machine learning and the technology industry and community. For my professional website, please see race-conditions. Thank you for visiting!

Tales from the Python Trenches

A year and a half ago, serious doubts, along with the occasional bug (or ten), crept into my daily coding life. At the time, I was working for a startup that made specialised analytics products for operations and computer security. Most of my days were spent wielding Python against unruly CSV files with missing, incomplete or corrupted data. Although the job was enjoyable and I was absorbing knowledge, I felt that I was failing to develop an ability to understand how large software systems and products are built and maintained. For some reason, at the time, this seemed like a paramount step in the career of an aspiring software engineer (I hesitated to call myself an engineer back then and still do, even though it has been my designated job title for a while).

Memories of a recent visit to Google added more fuel to the fire. One of my hosts for the day was the lead developer on Google Chrome and I remember how he beamed with pride when he remarked that the best part about his job was ‘riding on the Tube in the morning and seeing people using your product’. I remember thinking how cool this must be: knowing about and working on the internals of a system used by millions of people. Yet, the event also showed how far I was from being able to work as a professional developer on anything remotely as large and complicated as Chrome. My background was in the world of chemistry laboratories and then the chalk and blackboard floors of the mathematics department. I had learned a bit of Java on my own and then Python, but that was about it. My graduate degree blended machine learning and bioinformatics, not systems design and operating systems. Thus when my fellow attendees discussed binary search trees, stacks, caches and the Linux kernel, all I could do was nod politely and add these items to my ‘To-Study’ list.

After a few months of interviewing while working in my role as a Python data analyst, I was able to secure a role in the tech department of Big Corp, where I spent the next year and a half as part of a global team of a hundred or so developers building a large analytics platform in Python. In spite of the barrage of corporate politics and bureaucracy that I encountered, I learned a few very valuable lessons that I would like to note down. Granted, not all of the lessons learned are directly related to Python. Some are bugs with peopleware or process - but important bugs nonetheless!

Invest in Memory Profiling

For junior developers like me, Python’s automatic memory management is a blessing and a curse. A blessing, because with a few lines we can build an entire web application or a server; a curse, because our lack of battle scars with pointers and malloc can easily lead to a support call from hell. This is precisely the situation I found myself in one unlucky Thursday morning. With a coffee in tow to jolt awake my still-sleeping brain cells, I logged into my work terminal at my desk and opened the developer chat. There were several messages marked as urgent from the team in Asia-Pacific. They were linking to server side jobs that were crashing and restarting with out-of-memory errors. Within half an hour or so, the same issues started plaguing the instances serving European users. This had not happened previously and nothing in the environment had changed. There was no monitoring on the data volumes passing through the system, so I tried to eyeball the logfiles to gauge whether there had been a change in volume. Nothing seemed out of the ordinary, yet, like clockwork, the server side process kept ballooning and crashing every hour or so.

Eventually, the root cause was found: an innocent looking code change that had been released into production on the previous day. Since the CI system had no performance regression tests, there was no automated alert or failing build to use as a starting point for finding the root cause. Instead, our saving grace was luck - the engineer who had made the commit was online and saw the conversation on developer chat.

Performance regression testing is hard. How often do you run your regression tests? After every commit? Once a day? What data do you use and how do you predict production loads? After you have discovered a memory leak, how do you find which objects are being retained?
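There is no single answer to the last question, but CPython’s standard library does ship a starting point. Here is a minimal sketch of snapshot diffing with tracemalloc; the ‘leak’ below is invented for illustration:

```python
import tracemalloc

tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()

# Simulate a leak: a module-level cache that is appended to but never cleared
leaky_cache = []
for i in range(10_000):
    leaky_cache.append("row-%d" % i)

snapshot_after = tracemalloc.take_snapshot()

# Compare snapshots to see which source lines allocated the retained memory;
# the biggest positive size_diff points at the leaking allocation site
top_stats = snapshot_after.compare_to(snapshot_before, "lineno")
for stat in top_stats[:3]:
    print(stat)
```

In production, taking a snapshot on a timer (or on a signal) and diffing consecutive snapshots gives you exactly the ‘which objects are being retained’ answer, at the cost of some runtime overhead while tracing is on.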

OOP is Great until it’s Not

At a recent ClojureBridge meeting, I met an engineer whose first programming language was ClojureScript. After spending many years in Java land, it was strange to discuss programming with someone who was ‘functional-first’. Unfortunately (or fortunately), my first language was Java and so my brain will now and probably far into the foreseeable future be, at least to a certain extent, influenced by the principles of OOP.

In any case, OOP is great if you’re working on a small codebase, but without discipline, OOP in large (>50 million lines) codebases becomes one large inheritance forest. It probably makes sense to the developers who have been working in the same code space for a while, but for a new person, the levels of indirection in giant inheritance hierarchies are overwhelming. As are object public interfaces that span hundreds of methods.

Name Your Variables Well

In three weeks’ time, you won’t remember what type of object numberContainer holds. Choose variable names carefully and use them consistently.
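A hypothetical before-and-after (the names and numbers here are invented):

```python
# Unclear: what does numberContainer hold? A list? A dict? Numbers of what?
numberContainer = {"/login": 42.0, "/search": 118.5}

# Clearer: the name states the content, the unit and the mapping
latency_ms_by_endpoint = {"/login": 42.0, "/search": 118.5}

# Three weeks later, this line still explains itself
slowest_endpoint = max(latency_ms_by_endpoint, key=latency_ms_by_endpoint.get)
```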

Build a Robust Incident Response and Documentation culture

What happens after a major bug or production issue is solved? Is the issue logged anywhere? Does anyone write down the steps that were used to diagnose the root cause of the issue? How was the root cause fixed and what testing was done to make sure the fix will work?

This lesson is not Python specific, but important nonetheless. Too often, it’s easy to get sucked into the chase for the root cause and then, after the bug has been successfully fixed, forget everything about it. At least until the next time, when something similar happens and no one at hand remembers the details well enough. I witnessed this happen many, many times. I was one of the engineers who fixed a bug and then went off to have a midday latte instead of being disciplined and writing down a proper postmortem. Set up a documentation system (Jira, Trello, etc.) that is easy to use, allows code snippets, attachments and screenshots, and that can be easily searched. It will be a valuable record of how resilient your system is and how it evolves.

Coupling Client and Server Code

This one is, once again, not strictly Python specific, but it makes every single server or client release or bug fix a nightmare. If you don’t use some kind of code-agnostic messaging format between client and server, but rely on object serialisation, at some point there will be a user somewhere who refuses to update their client program. If your change is not backwards compatible, the server side process will be in trouble when clients running on the old codebase make connections.
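A minimal sketch of the alternative, with an invented price-update message: serialise to JSON and have clients read only the fields they know about, so an old client survives a field the new server added:

```python
import json

def make_update_v2(price, currency):
    # Version 2 of the server message adds a "currency" field
    return json.dumps({"version": 2, "price": price, "currency": currency})

def parse_update_v1(raw):
    # An old client: it reads only the fields it knows about and
    # silently ignores anything extra, so a v2 message still parses
    message = json.loads(raw)
    return message["price"]

raw = make_update_v2(9.99, "GBP")
assert parse_update_v1(raw) == 9.99
```

Contrast this with pickling an object: the moment the class definition changes on one side, deserialisation on the other side breaks outright instead of degrading gracefully.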

Concurrency is Hard

CPython makes it even harder. A. Jesse Jiryu Davis does a better job than I ever could explaining concurrency in CPython in this post.
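One lesson worth sketching here: even with the GIL, a read-modify-write on shared state is not atomic, because the interpreter may switch threads between the read and the write. A small example using the standard threading module:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # "counter += 1" is really a read, an add and a write; the GIL
        # can hand control to another thread between those steps. The
        # lock makes the whole read-modify-write sequence atomic.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock; without it, updates can be lost
```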

Design Your Systems for Monitoring

If monitoring is an afterthought, it will always be an afterthought. In some places, monitoring means a few hastily added debug statements or a few loglines that time the execution of a code block. You might not be motivated during development time to invest in proper monitoring, but come production-issue-time, those hours invested in designing how to monitor a system will pay off. However, pure logging noise (such as logging all messages your program receives) is not helpful to anyone, least of all an on-call engineer who has never committed code into your program.

There will be situations when monitoring is your only hope for figuring out what’s going on. The incidents where a component is crashing are in many ways the easiest, because at least you have a concrete starting point for debugging. What about situations where everything seems to be going well, but a user is telling you that a number is incorrect? Or that a number is updating too slowly? If that number is supplied by one service, you have a starting point. What if that number comes out of a pipeline that depends on 5 services interacting together? None of the services has crashed, so there is no obvious point of failure, no obvious place to begin an investigation.

I always strive to think about the metrics that I would find most useful in an incident debug situation. For example, if there is a critical piece of code that needs to be executed before an update is sent to the client, I make sure to log this. I also log the number of updates received and sent by the application. In the absence of good metric collecting systems, these give me some ballpark estimates of load. Logging can slow your system down, so be mindful of this.
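A minimal sketch of what I mean, with an invented handle_update and an in-memory Counter standing in for a real metrics system:

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("updates")

metrics = Counter()

def handle_update(update):
    metrics["updates_received"] += 1
    # ... business logic would go here ...
    metrics["updates_sent"] += 1

for update in ({"id": i} for i in range(250)):
    handle_update(update)

# Emit a periodic summary instead of one log line per message; this keeps
# the log useful for eyeballing load without drowning it in noise
log.info("received=%d sent=%d",
         metrics["updates_received"], metrics["updates_sent"])
```

The counters cost almost nothing per message, while a log line per message would be exactly the kind of noise the previous paragraph warns about.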

Starting off with what you, the designer of a system, consider useful is not always ideal, because someone else, someone who has never touched your system, may be on-call on the day that it breaks. This is why ‘debug scenarios’ is one of my favourite team activities. Write down several real or hypothetical user complaints (for example, ‘number X is wrong on my display and it’s taking Y minutes to update. I expect the number to be A and the update to be immediate’) and then ask your team members to diagnose the issue. Observe how they interact with the logs of the application and make notes of log statements that are confusing, unhelpful or red herrings.

For a more persuasive argument about the importance of good monitoring, see Nathan Marz’s excellent post “How becoming a pilot made me a better programmer”.

dev/notes/20170713

This is a completely unedited stream-of-consciousness (most certainly not the kind produced by Woolf and Joyce) style braindump of today’s development progress, questions and general wtf moments. Content may not make much sense. You have been warned!

I am back in the world of frontend web development. My goal is to put together a frontend for my London underground simulation. Because I am literally hopeless when it comes to JavaScript (um, Node, Ember, D3, Angular, React - things have moved on quite a bit since late 2014 / early 2015 when I last touched JS), I’m going through some newbie tutorials (mostly examples from D3.js By Example by Michael Heydt) and trying my hand at visualizing [London air quality data](https://data.london.gov.uk/dataset/london-average-air-quality-levels). The examples in D3.js By Example make calls to data hosted in gists, so lo and behold - things don’t work out of the box when I try to call d3.csv on a file stored in my local file system. Chrome surprises with a ‘No “Access-Control-Allow-Origin” header is present’ error.

A few moments of search-engine-ing tells me that perhaps I should not be calling to the file system directly, but using an http server to serve up the raw data file. In goes python -m SimpleHTTPServer and things move along just a bit before grinding to a halt with a ‘Cross-Origin Request Blocked’. As this article explains, a resource within a website may try to load additional resources using a different domain, protocol or port (which is the case when the D3.js script tries to read a file served by the SimpleHTTPServer). This StackOverflow question recommends creating a customised version of the built-in SimpleHTTPServer to send back the correct headers. Solution works, problem avoided, but not fully understood, yet.
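For the record, here is a sketch of that StackOverflow-style fix, written against Python 3’s http.server (the successor of the Python 2 SimpleHTTPServer module used above): subclass the built-in handler and add the CORS header to every response.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Add the CORS header just before the blank line that
        # terminates the header block of every response. Allowing any
        # origin is fine for a local development data server, not for
        # anything production-facing.
        self.send_header("Access-Control-Allow-Origin", "*")
        SimpleHTTPRequestHandler.end_headers(self)

# Serve the current directory, e.g. on port 8000:
#   HTTPServer(("localhost", 8000), CORSRequestHandler).serve_forever()
```

With this running in the data directory, d3.csv can fetch the file from http://localhost:8000/ without the browser blocking the cross-origin read.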

the lights of this city

Stay strong, London.

Sleep does not come easily on a night like this.

What else is an insomniac to do but stare out of the window and listen to the humming of the ventilation.

The view from the apartment opens to the river. In the dark of these early hours, the water is restless, infused with the orange light from the South bank buildings. A bit further to the west, the spires of the city sparkle in gold and red.

And blue. Where moments ago people laughed about the game or a lighthearted joke, the light is now a staccato blue. Ambulance blue and police car blue; their strobe-like pulse echoes in the empty street.

The phone rings at 23:40. A worried voice checks that I am still alive. How fortunate am I? How fortunate am I to have walked the same streets, laughed the same laughs, enjoyed the late evening warmth and companionship and have made it home.

Somewhere a phone rings without an answer.

deactivate

I love(d) twitter like I love Finnish licorice and winegums and soy vanilla lattes that taste just a bit too acrid and peanut butter popcorn.

But I can’t derive any value from the constant info-flood, the constant buzz of the hivemind forking in thousands of different thoughtpaths, only to degenerate into a static whitenoise occasionally punctured by catgifs and witty, angry quips, all packaged into 140 chars.

I hit deactivate. I don’t regret it. The withdrawal symptoms will come later, when the idle brain is looking for fresh distractions. Then, I think I’ll open up that book on OCaml I’ve been meaning to read. Or read about brainf*ck or Piet (esoteric programming langs I’ve been meaning to try out).

If a tree falls in a forest and no one is around to hear it, does it make a sound?

And more importantly, does the tree really give a damn?

the quiet

I come home, open up my laptop, log into Twitter and let my eyes tap into the simultaneous titbits of chatter from hundreds of people I follow. It is quiet in my apartment. A lukewarm silence briefly punctured by signals from a Thames Clipper ferrying tired commuters from the piers of Tower Bridge to Greenwich and onwards or an ambulance siren hurtling to some scene of tragedy. The controlled chaos of the city is subsiding into a calm lull of the evening.

It is quiet in the space, the street, the house, my room but it is a cacophony inside my mind. There are thoughts, words, sarcastic quips and angry retorts, distilled into a 140 character essence chattering about. Usually, when I am quiet, it is my thoughts that take tangible form and sift through the day’s experiences, replaying and reliving, molding and transforming into long term memories to be stored and re-narrated at convenient times.

There is no place for carefully crafted thought in the brain that is busy ingesting as much as it can from the delicious stream of fast food infocalories. There is always more. More tweets to read, more likes to administer, maybe even a retweet every now and then. A notification appears and triggers a cascade of pleasure. This is good, something says. Very good. You are. You exist. You exist in the eye of thousands of other semi-strangers who are consuming this feed. You exist in someone’s eyes. Maybe that gives you some kind of legitimacy or consolation. An illusion of being seen, heard and understood. For sure, the most terrifying thing is not to be alone, but to know it.

I tweet, therefore I am?

I used to eat my feelings. Physically dampen down the chorus of anxiety by flooding the mind with pleasures of cakes and cookies.

A food-junkie’s addiction to the soothing waves of sugar is not unlike our addiction to a constant stream of information. We drown our thoughts with the voices of others, uttering half-formed sentences and ideas until all the mind has to do is become some kind of rating machine, dispensing likes and retweets with Pavlovian efficiency.

Anodyne. Anodyne is the word I am searching for. When you can fill yourself with the words of others, you are absolved. Absolved, relieved, released from the responsibility of living with yourself. It is a temporary pleasure, a temporary relief that turns into something sinister. Where will the narrative of the self come from if not from thoughts and memories shaped in the quiet spaces?