Friday, 22 July 2016

How to dive into a large codebase

Getting to grips with a new codebase can be very difficult. Every software developer has to dive into unfamiliar code on a regular basis, but to my knowledge there are no good guides on how to approach the task. My jobs for the past decade have involved writing code, but more of my time has been spent reviewing code on dozens of active projects, and learning how to quickly dive into an unfamiliar codebase has been crucial.

Many discussions on this topic focus on how to navigate code in a particular editor. In this post I want to focus on the general techniques rather than editor specifics (though I’ll get to my current preferred setup at the end).

Survey the directory structure

Start at the root directory of the project. Most normal projects will have 10 to 20 files and directories in the root. Go through these one by one and make a 1-line note of the purpose and contents of each.

Checklist – at the end of this step you should be able to answer these questions:
  1. What files, if any, provide documentation?
  2. What files drive the build and deployment system? (For projects and languages that don’t strictly have a build system, there’s usually a deployment system; I’ll just refer to this as the build system.)
  3. Where is the source code? If there is a subdirectory structure for source files, what does it represent (libraries, components, executables)?
  4. Where is the test code (usually either commingled with the main source or in a separate subdirectory)?
  5. What external build dependencies are required to build the project?
  6. What are the build targets (usually executables, libraries, tests, documentation)?

Understand and Run the Build and Tests

Even though I’m often reviewing code that I’m never going to modify, I still like to start by verifying that I can successfully build the project outputs and run the main executable and tests (if those things exist). This step helps identify any weird dependencies the project has, and means that when you’re finally ready to edit code you don’t have to break flow to figure out the build system.

If the project has a test suite, figure out how to run it. Test suites vary a lot across languages and projects, and in some cases can be really finicky to get running, but it’s time well spent.

Identify the interfaces, inputs and outputs

Every program is just a way to transform input data into output data. If you “dive” into the middle of a large codebase and try to figure things out from the inside out, you will fail, or at least waste a lot more time than you should. Always start from the outside and work your way in.

Identify what the inputs are, and what the outputs are. Make notes describing them – force yourself to articulate this knowledge.

Projects that implement an “official” API ought to be easier to comprehend, and often they are, but don’t fall into the trap of assuming that all the inputs and outputs are captured by the API. Many APIs provide only a partial account of the I/O, and often you need to understand the backend database interface and the dataflows into the DB in order to really identify all the relevant inputs and outputs.

Make sure that you identify all the inputs and outputs, including log file outputs and configuration inputs. Many projects have logging outputs that give you a very useful and comprehensive picture of what the program does.

Structured Examination of Code

Don’t just “browse” the code. Write down specific questions that you want to investigate, like “How are messages filtered and decrypted?”. Stay focused on the point you are investigating, and try to avoid being distracted by interesting-looking code.

Make notes describing the answer to these questions, including a function call graph and any important data manipulations.

When you open a file, page down through it, all the way to the bottom, spending about 5 seconds skim/scanning the code per screen. I don’t have a good explanation, but I find this really helps me get oriented and get a feel for the size and shape of the code. You obviously can’t absorb much of the detail by doing this, but it answers a lot of high-level questions, like whether the code is repetitive boilerplate, a bunch of simple functions, or a small number of really complicated functions.

Understand the branching structure

Thankfully most modern projects use good distributed version control systems with sane branching policies. You can usually figure out the branching policy quite quickly just by looking at the history, but always check the project documentation for specific information on this.

Spend 20 minutes reading the most recent commit messages and diffs

I time-box this activity because for large, long-running projects you could spend an indefinite amount of time reading the changes. 20 minutes doesn’t sound like much, but it’s more than enough to get a feel for the parts of the codebase that are under active development, which developers are working on those areas, and whether the development is issue-driven or feature-driven.

Making Notes

You have to make notes as you go, otherwise you will flounder and waste an inordinate amount of time. If you need to dip in and out of codebases with weeks or months in between visits, your notes will be invaluable to you the next time through.

I start taking notes in Workflowy. If the notes grow a lot, I switch them to a git repository I’ve called “codenotes” just for this purpose. It has a subdirectory for every project, with cloning instructions, so I know how to get started next time around, along with my notes. If you’re spending a lot of time on one large project, consider writing a readme for developers and adding it to the project’s own wiki or source control.

My Personal Setup

I use Vim to read code. I turned off syntax highlighting a long time ago and am convinced that it’s far easier to quickly read and comprehend code without it. Actually, I use the nofrils color scheme, which has no syntax highlighting but does make comments a slightly different color from the code.

I occasionally use folds (two keystrokes will hide all the code except the toplevel class and function declarations), but they are not crucial. I use the NERDTree plugin to browse the directory structure, but again I don’t think it’s crucial.

I have set up a few keyboard shortcuts that make it quicker to load files and switch between files.

Buffers: nnoremap <Leader>b :ls<CR>:buffer<Space>
Files: nnoremap <Leader>e q:iedit **/*

I’ve used tags on and off over the years. If you work with languages for which tag support is mature, then they’re good, but several of the languages I need to work with are still working out tags support (JavaScript among others), and the time needed to set up the finicky toolchain isn’t worth it in my view. There isn’t enough of a difference between tags and grep for me to spend time on tags that don’t just work out of the box.

I also occasionally use Atom, Visual Studio, VS Code, neovim, and a few other editors and IDEs, and find them all to be perfectly acceptable; I’m just more productive in Vim.

Friday, 17 June 2016

Browsing is Broken Part 3: Privacy

Access Provider Privacy

Whenever you connect to the web, you're connecting via some kind of access provider. Most people will think of their ISP (internet service provider), aka their home broadband provider, but these days we're constantly connecting our phones to wifi at work, in cafes, shops and airports. Many phone network providers are teaming up with wifi networks so your phone will automatically connect to wifi spots around your city, and the latest phones support making calls and texts over the wifi connection.

My privacy requirement is that when I connect to an access point, my web traffic is protected from the access provider, and that they can't see what I'm browsing or read the emails or messages that I download over their wifi. You might feel that this is unnecessary; can't we just trust the access providers? I'm not going to get into that here, other than to point out that you're also trusting every individual tech geek that works at those companies, and a lot of small technology outsourcing companies that they will use for IT installation and support. You're also hoping that they haven't been hacked by malicious individuals, and that they never will be hacked (probability: zero). And regarding your need for privacy, even if you are entirely blameless, consider the possibility that one day a friend or relative sends you a "private" message in which they joke about something that looks illegal or sinister when taken out of context. Their privacy is dependent on your privacy.

VPNs

OK, let's start simple here. Say your company has a bunch of computers in two offices in different cities. Each office has its own private network, connecting just the computers in that office to each other. Naturally, you'll want to be able to connect the two networks together (an 'inter-office-network'!). That's the "N" in VPN. So you connect the two with a cable from one office to another. These days, unlike when the telegraph first arrived, you don't lay the cable yourself, you lease one from the phone company. Everything works great, and the connection is Private (that's the "P"), but leasing a line is really expensive. And since the internet is already available for free, why not use that instead? So you want a private network that goes out over the public internet, which means you need some fancy software that creates a Virtual (that's the "V") simulation of a private network on top of the public internet.

That's where VPNs come from. These days, you can download a VPN client to your phone or laptop, and connect to a cloud-based VPN server. Now, it's as if you have a cable connecting your device to that server directly, through the magic of encryption and internet routing. Any traffic that goes over that tunnel can't be accessed by the real devices in between, such as the wifi router in the cafe, because it's encrypted and only the VPN server knows how to decrypt it.

So, VPN clients are a great solution for maintaining privacy from your access provider, right? It's true they provide a potential solution, but there are pitfalls. The VPN client on your phone can stop running, or need to reconnect to the server, and while this is happening all your web traffic is exposed. Even if the VPN stays up and running all the time, you can't always be sure what traffic is routed over it. Remember our two-office VPN example? Well, in that situation, the IT guys would still route the web traffic from PCs in the office directly to the internet - only traffic destined for the other office's machines would be routed over the VPN.

Most VPN client apps for phones do try to route everything over the VPN, since that's the real reason people use them. But they can still leak information. When you connect to a wifi access point, your device has to talk to it directly in order to get configuration information so that it can actually work (this is called DHCP). If the VPN client refused to let any traffic go to any destination other than over the VPN, you wouldn't be able to connect to the access point in the first place. 

Even if you get your VPN configured as tight as possible, it's quite likely that you still leak DNS lookups (remember those from part 2?). So the access provider can't see exactly what data you're transferring, but they can see all the website addresses that you look up in order to connect to them, which is quite a lot of meta data and certainly doesn't constitute privacy. 

The larger access providers, such as the big home broadband companies, are aware of the use of VPNs and of course they can detect when your VPN client attempts to connect to a VPN server (since they know the DNS names and IP addresses of the popular VPN services). If they wanted to, it's pretty easy for them to cause these connections to fail by blocking the initial connection, so your client can't reach the VPN server to start the whole encryption process. 

It's also possible for the access provider to take the traffic from your VPN client and send it to one of their own servers. This requires some sophisticated NSA-level techniques, but it's entirely feasible. A less sophisticated approach requires the attacker to first hack into the VPN servers and get some decryption keys, but that's not at all infeasible - most OSs have security vulnerabilities and it only takes one server to be unpatched for the attacker to succeed.

Proxies

Now that you understand VPNs, proxies are a cinch, and we already discussed them in a previous post. Essentially a proxy is a server in the cloud that your browser connects to and sends all its web requests to. It's arguably a little simpler than a VPN, and it's focused just on keeping your browser traffic private, unlike the more general-purpose VPN.

Unfortunately, many of the popular browsers and proxies still leak DNS requests. So your web traffic is encrypted, but a snooper can easily tell which sites you're accessing. 


I'm sure when technically minded people read this, they'll suggest many possible ways of securing your web traffic from the access provider, but I've yet to find anything that a person of basic technical ability can be confident they've configured correctly and be sure they won't leak information or leave themselves open to various vulnerabilities. 

Browsing is Broken Part 2: Blocking Unsolicited Content

In part 1, I explained why I want news websites to send me their content directly, instead of passing me off to third-party advert networks that they have no control over. Since that isn't going to happen any time soon, we have to find ways to stop our browsers fetching potentially damaging content from the third-party servers that the media companies refer us to. The most popular options are ad blockers, proxies and blacklists.

Ad Blockers

Adblock is a browser add-on that hides ads. Your browser still fetches and downloads the ad, but then Adblock steps in and stops the ad from being displayed, or the video from being played. For many people, this is a fine solution, and Adblock is justifiably very popular. I used it for a while myself.

The problem with most ad blockers is that your browser still requests and fetches all that ad content in the first place, and then the ad-blocking software runs, taking up more time and CPU, in order to remove the ad. The advertising networks of course know about ad blockers, so they try to circumvent them, disguising ads so the ad blockers let them through. Some ad blockers can really slow down the browser quite a bit, and use up a lot of CPU and memory.

The advertiser vs adblocker battle feels a lot like virus writers vs virus scanners. Both sides have to run as fast as possible just to stay still, and the users never quite know who is in the lead. When the virus writers jump ahead, the consequences are devastating, in part because the Darwinian selection pressure means that a successful virus has to be immensely sophisticated and difficult to eliminate.

There has also been a backlash against ad blockers from technology companies. Google removed Adblock from the Play Store in 2013, and many ad blockers have been removed from the Apple App Store over the years too. So as a user, you can't rely on these apps being available on your device indefinitely.

The big advantage that ad blockers have over the techniques I'll discuss next is that they are really simple to install and use. Pretty much no technical knowledge is required by the user, and if the adblocker stops working for whatever reason, the browser usually works fine.

Proxies

A proxy is a server that a browser uses to access the internet. So the conversation becomes:

mybrowser: hey, proxy, can you get me the nytimes front page please?
proxy: sure, I'll go get it
[proxy talks to nytimes]
proxy: here you go mybrowser
mybrowser: thanks proxy! ok, let's look at this html, ok, gotta get a bunch more stuff
mybrowser: hey proxy, can you get me all this stuff from strange_address_1 through strange_address_200
proxy: sheesh, sure, whatever, coming right up....ok here you go:
mybrowser: thanks! ...ah crap this is huge.

Every browser supports configuring a proxy to talk to, because in many corporate networks, the only way to access the web is via a proxy. This helps IT control and monitor web access, and partition bandwidth so that syncing your iTunes library on your work PC doesn't interfere with corporate email traffic, which goes over a different route.

Once you have all the web traffic going through a proxy server that you control, it's easy to do things like set up a blacklist of sites that the proxy refuses to access. So if a browser requests a blacklisted site, the proxy says "Access forbidden" or something equally sinister. Naturally, it doesn't stop there, and lots of corporations set up their proxies to stop people accessing facebook. The smart ones let them access facebook but have the proxy log every access and track how much time the employee is goofing off.

The functionality developed for corporate proxies is very close to what we need for unsolicited content blocking, and sure enough, there are ad-blocking proxies that do a great job of removing unsolicited and potentially harmful content. 

Cloud proxies

Most proxy services are cloud-based; you get the address details of the proxy, you input them into your browser proxy configuration, and from then on, your browser asks the proxy server in the cloud to fulfill all your requests. 

There are a couple of problems with this. Sometimes, the cloud proxy server is in a different country from you. So you get the Google homepage for Romania (no joke, happens to me a lot when I use privatetunnel), and Google asks you all the time if you want to translate the page into Romanian.

When gmail sees you logging in via a proxy, it will probably have a bit of a fit, and ask you to reauthenticate, and prove that you're a human with one of those squiggly text things. Also, this will keep happening, because when you go through proxy services, the server you go through changes regularly. From gmail's point of view, it looks a lot like someone's trying to hack into your account from dodgy locations that keep changing. 

Another problem with cloud proxies is that your requests have to make the extra trip to the proxy server. Usually this isn't a big overhead, but it can mean that your browsing feels slower. 

Local Proxies

A proxy is just a software program, so you can install one on your PC. One of the best is a program called Privoxy, which works really well and can be configured to do whatever you need. Using Privoxy installed locally on your PC has none of the issues of using a cloud proxy. Websites like gmail don't see any difference when you access them. Unlike cloud based proxies, your web requests don't have to jump through a remote server, so your browsing should feel as fast as it does with no proxy - in fact it might feel a bit faster because Privoxy will filter out ads and other content. 

The downside of Privoxy is that it can require a bit of technical knowledge to set up and maintain. If you configure your browser to use the Privoxy proxy, then if Privoxy isn't running, you'll get a message saying "Proxy server not accepting connections" or something similar. If you've read this far, and installed Privoxy, I'm sure that won't be a problem for you to figure out, but it may not be something you install for non-technical friends and family. 

How DNS Works

Every time your browser loads a page, it starts by converting the "human" name of the website into an IP address. It does this by accessing the domain name system, or DNS, which is like a big database that maps all the web addresses on the internet onto IP addresses. If no entry for the human name is found, the browser doesn't know what IP address to send the request to, and you see an error message like "server not found".

When your browser gets the html for a page from the primary site, it reads the html and fetches any content required to complete the page. The html will have addresses telling the browser where to go. This is how your browser ends up fetching content from an ad server when all you asked for was the news site's front page: the browser asks DNS for the IP address of the ad server's name, and then fetches that anorak ad.

Browsers are very forgiving, and they expect errors to happen. This is good, because if you look at your browser console (a hidden debugging window you can usually access by hitting F12), you'll see that almost every page you load has some errors. Often these errors are due to broken links - the page's author linked to lots of cat pictures on websites that no longer exist. The browser expects this kind of thing, so it just does its best and displays whatever parts of the page it could successfully get.

Hosts File Blacklisting

Before DNS existed, computers had to have a file that listed all the mappings from human names to IP addresses. This is the "hosts" file, and it's still located at /etc/hosts on most unix systems, and C:\Windows\System32\drivers\etc\hosts on Windows. The operating system still checks this file every time the browser makes a DNS request, just in case it has an entry. If it finds an entry, it's much faster than asking a remote DNS server. These days, the hosts file usually contains just one or two entries, but there's no reason you can't add more.

That's how we create a blacklist. We add entries for the ad servers' names into the hosts file and map them to a dead address, e.g. (which means "this machine"). When the browser asks for the address of an ad server, the operating system checks the hosts file and finds the entry mapping it to The browser then tries to fetch content from, which fails, but the browser is built to expect such failures, so the rest of the page loads just fine.

A lot of people work to create these blacklist files, and you can download good ones for free on the web. Although the description of how all this works is a bit technical, installing a hosts file is just a matter of backing up your existing file and copying in the new one to the right location. After that, you might want to get the latest version every few months as new sites are added, but there's really no maintenance required. 

The main catch with using the hosts file for blacklisting is that you need to have administrator access on your device. For PCs this isn't usually a problem (you definitely have admin access on your home PC), but on an Android phone, it means you need root access, which requires some technical knowledge. 


None of the methods we have to avoid unsolicited content is entirely satisfactory. I currently use a hosts blacklist on all my devices, and I really like the results. Ad blocker browser plug-ins are the best solution for non-technical users, but Google and Apple have shown that they are opposed to allowing us to use them. At the time of writing (June 2016), adblock apps are available; let's hope it stays that way.

Privoxy is probably the best solution overall - it gives you complete control, and nobody can stop you installing it on your laptop/PC. Unfortunately, in order to get Privoxy working on your phone, you need the phone to be jailbroken or rooted.

Thursday, 16 June 2016

Browsing is Broken Part 1: Unsolicited Content

The websites of many of the major news outlets that I used to read regularly are now overloaded with ads and content from third parties that I just can't tolerate any more. I started to notice how bad things were getting about three years ago, when visiting a mainstream news site on my Android phone exposed me to malware that made a charge to my mobile bill. It's not just inconvenient, it's insecure, and ultimately it's lose-lose for the media and their audience.

I get the business model that online companies need to sell advertising, and in principle I support that absolutely. Heck, I tried signing this blog up for Adsense on the off chance I can finally make a dollar back off Google (they turned me down). What I don't agree with is the way they implement it. Simplistically, when I load the nytimes front page in my browser, the conversation between the computers involved goes something like this:

myphone: Hey, can I get the front page please?
nytimes: Hang on a sec...just looking you up...
nytimes: Hey adservers, dubhrosa just asked me for my front page, whaddya got for him
adserver_network: oh baby! dubhrosa, I've got a ton of stuff for that guy, I'll send it directly to him if that's ok.
nytimes: Yeah sure, go nuts, I'll send him the headlines and some pictures, I'll leave most of the page for you guys
adserver_network: Great! Last week he bought an anorak from amazon. Maybe he'd like to see a couple more ads for anoraks. Also, a few months ago, he clicked on an ad for septic tank inspection services, it might have been a misclick, and we've shown him about 2000 more of the same ad since then, but hey maybe today's the day. Oh, and he seems to be into cars, so let's put on that video for the new Ford truck that starts to play automatically.
nytimes: ok great, thanks dudes
adserver_network: sure thing nytimes, here's your 0.001c
nytimes: hey thanks! nice tip! you guys are sooo nice!
myphone: ok, here's the html for this nytimes front page, thanks nytimes
nytimes: my pleasure
myphone: ok, in order to display this page, I need to go fetch a crapload of pictures and stuff, let's get that
myphone: hey, strange_name_1 through strange_name_200, can I have this stuff please?
adserver_network: [teehee, they never know it's us] sure! here you go!
myphone: yikes, they're sending me 50 megabytes of crap here, oh well this is gonna hurt my data plan and my battery.

Here's how the conversation should go:

myphone: Hey, can I get the front page please?
nytimes: Hang on a sec...just looking you up...
nytimes: (to self) ok, what ads do I have today that I should show dubhrosa...ok, stick them into the page
nytimes: here you go, this is the front page html
myphone: thanks nytimes
myphone: ok, there's some other stuff I need to download from nytimes to complete the page
myphone: nytimes, give me these pictures and video links please
nytimes: here you go
myphone: thanks!

The key difference is that in this flow, the nytimes is responsible for storing and serving the advertising content to its readers. The ad content is stored on their servers, and their staff have the ability to control that content. They can still target me with ads they think are relevant based on my previous online activity, but they have full control over the content that is sent in response to my request. They can keep the page size below some sensible limit. They can ensure their readers have a nice experience when browsing their site. I don't think any of this is an unreasonable demand. Imagine if a newspaper editor allowed advertisers to scrawl whatever they wanted into the adspace of the newspaper, with absolutely no review by the newspaper staff. Shouldn't media companies, whose brand is so important, take control of what they send to their readers?

Unfortunately, the way the online ad industry has turned out means that this is unlikely to change, and in order to make browsing tolerable, we have to find solutions. I've looked at quite a few, and that's what I'll be talking about in part 2. Read Part 2

Tuesday, 19 January 2016

A Personal Finance Application Wish List

Many years ago I used Microsoft Money, so when I recently looked for a personal finance app, I was surprised to find that Microsoft had end-of-lifed the product. Even more odd was that nothing seems to have come along to replace it. It seems that building an app to track personal finances should be relatively easy, and I’m sure many programmers would have noticed that it’s a “point of pain”, so why isn’t there anything really good out there?

A bit of digging revealed that there are a couple of good apps, like Mint, and some of the other spending trackers from the banks. Many of them are region-centric, though, so if you’re not in the US or the UK, you might be out of luck. One of the leading independent apps appears to be YNAB, and it’s great for what it is, but it’s not a full-featured personal finance application. I now use YNAB (much more on this later).

The problem may be that there are several distinct activities that I want from my PFM (personal finance manager), and these activities get confused all the time. There isn’t a convention that I know of for what these activities are called, so for now I’m calling them Tracking, Budgeting, Planning, and Cashflow Planning.

Tracking

Tracking is pretty simple – you just record all your transactions, add some meaningful categories, and at the end of the month you can look at how much you spent in each category. This was the way I used Microsoft Money a lot of the time. When I was using it, the mere fact that I was tracking everything and analyzing it regularly had a definite impact on my spending. I was much less likely to spend on trivial things. I understood that my bank balance was just a number, and the real picture could only be seen when all the commitments that money had to serve were taken into account.

Tracking is the least complex feature of a PFM, and it probably delivers the bulk of the benefits. Mostly because you’re forming the habit of regularly analyzing your spending, and breaking the illusion that the cash in your bank account is money you can spend. Perhaps part of the reason that there aren’t too many good apps out there is that this is easy enough to do with a spreadsheet, and many banks and credit cards now offer this kind of view of spending.

Budgeting

To most people, this means writing down all the transactions you think are going to happen in the next month or longer. Then you either try to reduce some (if you don’t have enough money to cover them), or you see that you’ve got enough money and you’re done. Then at the end of the month you check that you’re within budget (you are almost certainly not), then you feel bad for a bit, resolve to do better in future, and eventually give up budgeting.

Until I used YNAB for a while, I didn’t realize that my idea of budgeting was nothing like true budgeting. YNAB is a software version of the old “envelope” system, where back in the days of paper paychecks and using cash for everything, the thrifty household would cash their paycheck, then put amounts of cash into envelopes labelled with the categories they were for. An envelope for Groceries, one for Heating Oil, another for Clothing and so on. If you were looking further out, you might have one for Christmas, and another for the Vacation you were planning for next year.

There’s a certain genius to this system, and more than any other, it forces you to break the delusion that the cash you have is the cash you can spend. Even better, if you do spend too much on something trivial, you have to physically take cash out of one of the other envelopes to do it; and it’s kind of tough to let yourself take money out of the Kids College Fund or Vacation to go bowling.

YNAB is a very opinionated piece of software (a good thing, I think). It’s tough to learn how to use it in the intended way without spending a lot of time reading the docs and watching the excellent online videos. I’m thankful I stuck at it; I don’t think I would ever have really understood what budgeting means otherwise. That’s probably why there seems to be a steep learning curve for such a simple piece of software – it forces you to look at budgeting a different way.

Planning

I’ve grown to respect the YNAB approach and I hope I’ll always see budgeting in this new powerful way. However, I also need to plan. The YNAB way is that you don’t allocate money to any category until you actually have that money. So when you’re starting out, you rarely budget more than 1 month out. So let’s say you get paid on the 1st of the month, and your mortgage payment is paid on the 2nd of the month. In the YNAB budget view, you don’t show that mortgage payment on your budget until you have the cash for it. Now I get why this is right, and I get that forcing you not to budget until you’ve got the cash is the best way to be realistic and build up a proper buffer so you’re paying this month’s expenses with the paycheck from a month or two ago.

But. I also want to be able to see a plan or a projection of those transactions. It’s important for me to think beyond the next month and remember that I pay my car insurance annually in March, and that it’s nearly 700. In YNAB, the way you deal with this is by creating a category / envelope for car insurance and putting enough into it each month that you’ll be able to pay the 700 in March without screwing up your cash position. That’s fine. But there isn’t anywhere in YNAB where you can see that car insurance of 700 is due in March, property tax is due in April, school supplies are due in September, and so on.
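The envelope arithmetic a planning view would need is simple enough to sketch. Here’s a minimal Python illustration; the category names, amounts, and due dates are all made up:

```python
from datetime import date

def monthly_set_aside(amount, due, today):
    """Amount to add to the envelope each month so the full
    expense is covered by its due date."""
    months_left = (due.year - today.year) * 12 + (due.month - today.month)
    return amount / max(months_left, 1)

# Hypothetical dated annual expenses a planning view could show
upcoming = [("Car insurance", 700, date(2017, 3, 1)),
            ("Property tax", 400, date(2017, 4, 1))]

today = date(2016, 9, 1)
for name, amount, due in upcoming:
    monthly = monthly_set_aside(amount, due, today)
    print("%s: %d due %s, set aside %.2f/month" % (name, amount, due, monthly))
```

Nothing clever here, which is rather the point: the app already has the amounts and could show both the due-date calendar and the implied monthly contribution.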

I also need a planning system that helps me make big decisions. Like what happens if I switch jobs to something that pays less? A short term budget doesn’t give me what I need here. Should my wife go back to work? What happens if my son gets into an expensive school? These questions require plans that go months and years into the future, and sometimes, the output is along the lines of “get a second job or a big promotion buddy, you need way more cash”. And that’s ok (I’ve done this in the past, many people do), but we should have tools that make it easier to explore the what-if scenarios.

Cashflow Planning

I also need to resort to a spreadsheet in order to figure out how much cash I’m going to have in my checking account, my family joint account, and what the balance is going to be on any credit cards we use. The purist budgeter approach is that this stuff doesn’t really matter – you have income coming in, when you receive that income, you allocate it to your expenses, and when you pay those expenses, it doesn’t matter which account it comes from, what’s really happening is income is being moved to outflows.

Again, this took some getting used to, and when it finally clicked, it made a ton of sense and I can see why it’s the right way to think when budgeting. But. I still need something that tells me when our joint account needs to be topped up so it doesn’t go overdrawn. It would be nice if I could see when that was likely to happen, and since I can input the expenses I’m going to be paying this month and next, along with the dates and amounts, I’m pretty sure the computer could do this for me.

More urgently, when you first adopt YNAB, you’re probably living paycheck to paycheck. In fact, you’re probably living off next month’s paycheck, i.e. spending on a credit card and then paying it off when you get paid, then repeating the cycle. While you work to fix this, and move to living off income you already have in the bank, you really need help managing your credit cards. The most urgent question being: which expenses should I pay with my credit card this month, so that I don’t put my checking account into overdraft? Some expenses can’t be paid with credit card, so it’s not good enough to say that you pay everything out of your checking until that’s used up and then switch back to credit.
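That question is concrete enough to compute. One possible approach is a greedy pass: pay the card-ineligible expenses from checking, then keep the smallest card-eligible ones in checking while the balance allows and push the rest onto the card. A rough sketch with invented expense data (a real tool would also need due dates and the pending card balance):

```python
def plan_payments(checking_balance, expenses):
    """Decide which card-eligible expenses to charge so the
    checking account stays out of overdraft.
    expenses: list of (name, amount, card_ok) tuples."""
    # Card-ineligible expenses must come from checking no matter what.
    checking = checking_balance - sum(amt for _, amt, ok in expenses if not ok)
    from_checking = [name for name, _, ok in expenses if not ok]
    on_card = []
    # Keep the smallest card-eligible expenses in checking while the
    # balance allows; push the rest onto the card.
    for name, amount, _ in sorted((e for e in expenses if e[2]),
                                  key=lambda e: e[1]):
        if checking >= amount:
            checking -= amount
            from_checking.append(name)
        else:
            on_card.append(name)
    return on_card, from_checking, checking
```

With a 1000 balance, rent of 800 that must clear from checking, and card-eligible expenses of 150 and 100, this keeps the 100 in checking, puts the 150 on the card, and leaves a 100 buffer.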

Overspending and Amendments

When you overspend in YNAB, or you add an item to your budget mid-way through the month, you overwrite the budget with this new information. There’s no way to see what your original budget was versus how reality ended up. Other users have pointed this out, and the YNAB response is that it’s just the right way to do things. I think it may be ok to display the current budget without all the clutter of what the original amounts were, but the app should show you how often you underestimate categories and by how much, and how often you completely omit items. Then, when you’re budgeting next month, a helpful tip would be text like “You underestimate the total budget by an average of 15% each month, and regularly overspend in Entertainment by 50%”. I think it would also be a helpful discipline for the user to be asked to “declare” that a budget is final. As it is now, it’s too easy to keep tweaking and there’s no way of tracking what the original intent was.

The Huh…? Factor

The first months I used YNAB, I entered every transaction diligently, I allocated my income, I amended my overspending as recommended, and I created a buffer category to start building toward living off old income. I also cut back on a lot of spending, making pretty significant changes to get things under control. But when I put all this in YNAB, and looked at the reports and the budget screen, I had no clue what it all meant. Was I doing well? Did I have enough cash to cover my main expenses next month, assuming I got paid as usual? I had no idea, and I still think it’s tough to see. Yes, you need to stop thinking of the cash in your bank account as the money you can spend, but it would be nice to be able to move into “budget space” in YNAB, and then have it translate back into “cash and time” space. Putting it simply, if I give the computer my current balances, all the upcoming transactions with dates and amounts, it should be able to tell me what my balances will be in future. I know this isn’t budgeting, but it’s an important part of what I want from a personal finance app, and if I find another app that does this, I’m less likely to enter my transactions and projections into both YNAB and that new app.
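Put another way: given an opening balance and dated upcoming transactions, projecting future balances is a single pass. A minimal sketch of what I mean, with invented dates and amounts:

```python
from datetime import date

def project_balance(opening_balance, transactions, horizon):
    """Running balance from dated transactions up to a horizon date.
    transactions: (date, amount) pairs, outflows negative."""
    balance = opening_balance
    projection = []
    for when, amount in sorted(t for t in transactions if t[0] <= horizon):
        balance += amount
        projection.append((when, balance))
    return projection

# Hypothetical upcoming transactions
txns = [(date(2016, 8, 1), 2500),    # paycheck
        (date(2016, 8, 2), -1200),   # mortgage
        (date(2016, 8, 15), -300)]   # utilities
for when, balance in project_balance(1000, txns, date(2016, 8, 31)):
    print(when, balance)
```

That’s the whole “cash and time” translation I’m asking for: the budget data already contains everything this function needs.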

Commitments, Covered, Discretionary

In a budget app, I’d like to be able to label some future expenses as “Hard commitments”. These are things I’m legally obligated to pay, like rent or mortgage, my internet service, gym fees, anything where I’ve signed a contract and I can’t just decide to reduce my spending in that category in the short term.

When income comes in, the budget app should allocate to the Hard Commitments automatically, in the order that they will have to be paid. If there’s income left over, then I can use that to allocate to the discretionary categories.

One way to resolve the conflict between YNAB’s idea that you only enter expenses you have the income to cover, and the need to plan future cashflows properly, might be to make it possible to enter future months’ forecast outflows, but to clearly mark them as “uncovered”. When income is allocated to outflows, they become “covered”.

So let’s say you’ve got no income yet this month, and a couple of hard commitments in your budget for the month along with some discretionaries. The hard commitments should be colored red, making sure you can’t miss the fact that if you don’t get some source of funds to pay them, you’re in trouble. The discretionaries can be colored amber, indicating that they’re not covered. As income arrives, the hard commitments turn green, as do the discretionaries. The commitments for future months stay red until you’ve got the cash to cover them. This way, the budget view will eventually show you a month, or two, or three, with all categories in green, making it clear that you have a buffer and how far into the future it reaches.
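The coloring rule described above amounts to a priority allocation: commitments first in due order, then discretionaries, with the color showing what the income received so far actually covers. A hypothetical sketch (names, amounts, and due days are invented):

```python
def budget_status(available_income, items):
    """Allocate income to budget items: hard commitments first, in
    due order, then discretionary items. Funded items are green,
    unfunded commitments red, unfunded discretionary items amber.
    items: list of (name, amount, is_commitment, due_day) tuples."""
    # Commitments sort before discretionaries; within each group, by due day.
    ordered = sorted(items, key=lambda it: (not it[2], it[3]))
    statuses = {}
    for name, amount, is_commitment, _ in ordered:
        if available_income >= amount:
            available_income -= amount
            statuses[name] = "green"
        else:
            statuses[name] = "red" if is_commitment else "amber"
    return statuses

# Income so far this month only covers part of the budget
print(budget_status(1300, [("Mortgage", 1200, True, 2),
                           ("Internet", 50, True, 5),
                           ("Dining out", 200, False, 10)]))
```

Run against future months with zero income received, every commitment comes back red, which is exactly the can’t-miss-it warning I want.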


The ideal outcome for me would be that YNAB adds functionality to their app that makes it easier to plan and see a “cash and time” view of my data. Or perhaps other apps get built that can read the YNAB data files, so I don’t have to keep separate records. I love the way YNAB just uses Dropbox to sync and share the data files, and it looks like they’ve designed a good format for the data. Maybe what we need is for this format to be made “open” or somehow standardized by convention, so that app developers can all read the user’s data, no need to rekey or import.

I’m sure there are a bunch of great apps out there that I’m totally unaware of, or perhaps the right thing is to get a professional accounting package like Sage or Quicken and do all of this the way the pros do it. Let me know what I’m missing! 

Friday, 18 July 2014

MH17 flight path history

MH17 crashed in Ukraine on the 17th July 2014, with the loss of all on board. According to reports, the aircraft was flying directly over the disputed territory at an altitude of 33,000 feet.

Eurocontrol reportedly said that they had kept the corridor open even as Ukrainian authorities banned any aircraft flying below 32,000 feet. 

The historical flight paths followed by MH17 show that the flight on the 17th followed a route further to the north than previous flights, taking it over the disputed territory. 

12th July, almost completely avoids Ukrainian airspace, flying to the south over the Black Sea:

13th July, slightly further north, well inside Ukrainian space:

14th July, entered Ukrainian space further north, but routed far to the south of the disputed territory:

15th July, similar to the previous day’s route:

16th July, way farther north, skirting the disputed territory:

17th July, further north still, directly over the disputed territory, with tragic result:

As reported in some media, other airlines have routed away from the disputed territory’s airspace for months
e.g., Air India Delhi to Frankfurt

 All images snapped from

Thursday, 1 August 2013

Good Intentions

The biggest waste of time I know.

I'm one of those programmers who spends a lot of his time wondering "how could I do this better/easier/faster/safer/simpler". Often the conclusion is recognising some common pattern across a codebase or refactoring functions into more meaningful units, the bread and butter of keeping your code in shape. Quite often, however, "this" becomes programming itself, or more accurately, application development. I start out asking how to make a module better, and trace the question back until I'm questioning the overall architecture of applications in general, rather than the particular program I'm working on.

For example, a little project I'm working on at the moment has a data access layer. It's a small file of functions that wrap database calls. There's a certain amount of marshalling and demarshalling, and a few hard coded mappings from query results to types. So naturally, I'm asking "why can't I generate all this code automatically and not even have to think about it, and have something automatically generate typed mappings? I could just give it my types and it should figure out everything else!" And so I spend a few days reading about ORMs, trying them out, debugging exploratory code, getting a buzz from seeing something work "automagically" and then getting battered and bruised by dependencies, versioning issues, performance gotchas that are now opaque to my reason, eventually whimpering back to my simple wrappers, cleaning them up for an hour and moving on. Yeah, yeah, there'll be a few bugs in that module and I'll probably come back to it a few times, but doing it this way is highly likely to be better than downing tools and rearchitecting the entire project to use an ORM. (Ok, I knew about ORMs a long time ago, but it's a typical example).
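(For the curious, the kind of "simple wrapper" I'm talking about looks something like this. The table and type names are hypothetical, and the hand-written mapping from row to type is exactly the part an ORM promises to automate:

```python
import sqlite3
from collections import namedtuple

# Hard-coded mapping from query results to a type
User = namedtuple("User", "id name email")

def get_user(conn, user_id):
    """Plain wrapper: run the query, demarshal the row by hand."""
    row = conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?",
        (user_id,)).fetchone()
    return User(*row) if row else None
```

Hand-rolled and a little repetitive, but completely transparent, which is what I keep whimpering back to.)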

And that's a real time waster. Probably the biggest waste of time in my entire life.

Don't get me wrong, about 5% of the time, I'll learn about a method or library or framework or approach that really is better. The next project I design, I'll be able to make an informed decision as to whether an ORM is appropriate or not. (Disclaimer: I said "informed" not "good"). Sometimes it's even relevant to the project. Once in a while, it's not a completely foregone conclusion or utterly obvious, so I actually get to make the decision.

But 95% of the time, it leads to nothing. Even if I find something interesting, it's not applicable to the current project, or the next one, and the one after that will be running on hexacore smart watches with 6G connectivity, so it's probably better to wait until we're starting work on that before doing a review and making decisions about what stack to build it with.

And speaking of stacks, a lot of the irresolvable flamewars are because fanboi A is arguing about how great his language is, and fanboi B is arguing about how great his entire stack is. They're both right, they're just talking about two completely different things. I'll even propose the humbly named "dubhrosa relation of language elegance": the more elegant a language, the less likely it's embedded in a productive stack. Why? Evolution.

A productive stack is one in which everything important to commercial app development is basically possible and reasonably straightforward, in which you can change code in one part of your stack without creating massive explosions of infeasibility in some other part. Usually these stacks have ugly edges. In some cases, they have entire continents of ugliness, like stacks with PHP in them.

Perhaps this ugliness is a kind of genetic trait that hints at what drove its creation. Take perl, or PHP. They're both ugly. There are pockets of elegance and undoubtedly they are "powerful", but in general, you can tell they grew without any especially well-grounded master design. You can also tell that the people driving the development of these kinds of languages were really focused on making stuff work. I can be fairly confident that perl and PHP will allow me to connect to any major database. There might be a few ugly bits, but I know it's really unlikely that I'll find myself boxed into a corner, like not being able to connect to a major db, or being able to, but ending up rewriting the standard library for it because its performance is so poor. I don't have the same expectation with Haskell or OCaml, for instance.

So, in conclusion, if you're building an app these days, you should pick a stack, a popular, well used one, and stick with it for the duration of the project. Learn to love the one you're with. When you encounter overwhelming ugliness, put it on your workflowy list under "There must be a better way - discuss", write the code that makes you cringe, comment it so everyone knows you're aware of how ugly it is so there's no risk to your programmer-tribe status levels, and move on.

Gratuitous blogpost controversial statements that I probably don't really mean exactly 

Let's put some languages in order of ugliness:

Python, PHP, perl, Java, Ruby, C++, C#, F#, Haskell

(yeah, yeah, Python really is way uglier than C++, just not at first glance, it's the beer-goggles language. Its ugliness is hidden under a layer of cakey makeup and whitespace and bitchy bravado. It tells you how much you deserve list comprehensions and before you know it you're in too deep. You're shouting at the screen "if I wanted a glorified dictionary wrapper masquerading as a programming language I could have built my own! a fast one!". It's uglier than PHP in a deep way. PHP sits in the corner of the library sniffling and covered in acne but is honest and friendly and if you ask him out on a date he says "are you sure?" three or four times. Then he brings you to the fairground and you have a great time up until you fall off the rollercoaster (did he get too excited and push against the guardrail? you'll never know for sure) and break your arms and wake up in hospital and somehow you've caught his acne and his cold but he's there beside your bed with video games and taytos. Python, by contrast, roofies you with syntax and when you see through it, tells you that you're inadequate if you don't understand how great it is, and when you finally leave, he spams your twitwall for months.)

And yes, Haskell is the most elegant, by about a billion miles, don't even try to argue with me on this one. Haskell's community is fantastic, but its ecosystem sucks. Configuration management can be tricky (google cabal hell), and it doesn't fit well in any of the widespread stacks. I've tried using Haskell in a web app, first using the Haskell webapp frameworks, and then just as a CGI within Apache/postgres, pausing only to build a custom json protocol with session oriented connections and a custom client. Oh how I wept. It was like finding the partner of your dreams, perfect in every way, except they wake up and stab you in the back of your head at 3 in the morning on a regular basis. I say this in part because it's sufficiently vague that the absolutely lovely and helpful Haskell guys don't implode in exasperation because I tried version of yesod and the random stabbings were fixed in (and I'd explain I couldn't use that because I was using a version of the bytestring library hand picked to be compatible with a particular version of the vector library that meant I couldn't upgrade to that version of yesod even if I wanted to, oh wait that's fixed now but I need to build ghc from source you say? How about the x64 bug?), and in part because it's entirely true.

There's also the little discussed, but very important, matter of a project's "moron impact factor" that you need to consider when understanding the successful stacks. Think of it this way: every successful project basically never completes, it's a successful application so people are using it, asking for new features, and burdening the system with ever increased load. At some point, you'll have new programmers, old ones might leave, or you're adding because there's just more work to be done. The law of large numbers and cretinous recruitment agents means that at some point some very stupid programmers will work on your application. The amount of damage they can do in a given period (moron detection time) is a characteristic of the stack you're using. This is the hidden value of verbose languages - it takes stupid programmers so long to wade through all the verbose code, that they can't do as much damage, there simply aren't enough hours for them to get to it. What's more, it's easier to make your peace with adding a mediocre programmer to the team to add some feature, because the stack is pretty ugly anyway, so we're all acclimatised to the necessity of ugliness, so we can tolerate ugly but functional code written by a mediocre programmer who was drafted in.

Please let me know when my Haskell stack is ready. Until then I'll be in a darkened office rubbing my temples and sighing.