Workflow Productivity for Dummies
Are you creating a startup? Unsure what kinds of workflows are available? Maybe you’ve only made small projects in your free time, and don’t know how to manage more than two people efficiently? Or maybe you’ve been in IT for a long time, and don’t know what all of this "Agile" and "Scrum" stuff is about. You think tracking bugs, controlling versions and so on is all humbug - you have your team right here, and you can juggle their tasks in your head, right?
This is a guide to, and summary of, the various workflows that exist in the industry, and the various ways to boost efficiency and make code less daunting to work with.
|This is not meant to be a "one stop shop". The purpose of this article is to inform the reader about what kinds of solutions and flows exist, not to go in-depth on each one. It should give the reader enough information to further research the topics they find interesting, and should not be taken as an authoritative guide on any of them.|
Tool sets define what you utilize to organize your efforts. Where do you keep your code? How do you find and fix problems and bugs? How is your code built? How do you maximize reproducibility? Do you utilize any languages for specific purposes? This is what this category (and its subcategories) is about.
Where is your code stored? How is it backed up? This not only defines how you can possibly collaborate, but also is potentially a distribution method.
Under this scheme, each developer has their own tree. You might send tarballs to each other, and keep them around with date stamps in the file names. This is obviously flawed in many ways. For example, if two developers are working on the same piece of code at the same time, merging the two sets of changes becomes extraordinarily hard. It also becomes difficult to identify where a problem came from, and removing changes that turned out to be wrong becomes near-impossible. Unfortunately, this method of storing code is still too common. Whether it’s because the company is full of bad developers unwilling to learn better methods, or because additional organization looks like nothing but a time sink - ultimately, it is simply not worth managing source code this way in any but the smallest projects (100 lines or less).
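Concretely, the entire "system" amounts to something like this (a made-up project, sketched in shell) - the file name is the only version history there is:

```shell
#!/bin/sh
# The tarball "workflow": archive the tree with a date stamp and mail it
# around. Nothing here helps with merging, blaming, or reverting.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir project && echo 'int main(void){return 0;}' > project/main.c
tar czf "project-$(date +%Y-%m-%d).tar.gz" project
stamp_count=$(ls project-*.tar.gz | wc -l | tr -d ' ')   # one more tarball to juggle
```

Two developers doing this in parallel end up diffing two whole trees by hand to reconcile their work, which is exactly the problem version control exists to solve.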
CVS, or "Concurrent Versions System", is an old and rudimentary way to control code. It’s a centralized system, with a single "master" server that developers "check out" from, and later "check in" their changes to. CVS avoids conflicts between several "check in"s by only accepting changes made against the latest version of any file (an implicit form of file locking). Which is to say, if developers A and B work on the same file and then both check in, whoever checks in last has to check the file out again and reapply their changes on their own. It stores changes to text files as diffs, but keeps full copies of every version of binary files. CVS is greatly preferable to nothing, but it does still have a few problems: it’s still too difficult to revert a single change, and iterating over old versions of the project to find the change that caused an issue is still nontrivial. Moreover, none of this is likely to change - the last update was in 2008.
SVN, or "SubVersioN", was started in 2000 (by CollabNet; it has since become an Apache project). The goal was an open source VCS that would work similarly to CVS, but with fewer bugs and more features. As opposed to CVS, commits are truly atomic (an interrupted commit can’t leave the repository half-updated, as it can in CVS), moving and renaming files keeps their revision history, metadata and directories get versioned too, symbolic links are tracked, branching is native, and many more advantages over CVS exist. It also introduces a custom protocol ("svn", over port 3690, plaintext over TCP). SVN has dedicated "properties" - metadata attached to filesystem entries, such as whether something is executable, or what end-of-line style should be applied. SVN does still have issues: it still forces a centralized model (there is the one-true-trunk(tm), and all others are just local copies), it doesn’t handle some filesystems (such as HFS+) very well, and tagging is implemented as snapshot copies of trunk at a given moment. SVN does allow for the file locking mechanism mentioned earlier, which can make merging simpler… or development less parallel. SVN is a solid choice for local teams, or teams with relatively little communication happening between developers. It is still actively maintained, with a new and improved repository format in the works.
Git was written by Linus Torvalds to develop the Linux kernel, because his previous VCS (BitKeeper) was proprietary at the time (it was open-sourced in 2016). The goal was a distributed version control system: fully fledged local repositories, full version tracking, and independence from any central server, without too much performance loss. It’s fully optimized for (though does not force) non-linear (non-waterfall, see later) workflows. It is easily extensible (being more a collection of scripts than a single program), gaining features like better binary file handling over time. Git tracks changes as a series of snapshots, each called a commit. This allows one to segment their changes into logical blocks, which, with git-bisect, allows one to pinpoint the source of a newly introduced bug. File locking is less necessary, thanks to branches, which are far less awkward (they can be purely local) than they are in SVN. Git (and similar systems, see the last section) has become a near universal standard, with GitHub and GitLab existing, and companies such as Google and Facebook actively using them. Most things said here also apply to pieces of software very similar (in functionality) to git - Fossil and Mercurial, which are both adequate alternatives (though with fewer resources the size of GitHub available).
- Mercurial and Fossil
Similar to git, with various (minute) advantages and disadvantages over it.
- Bazaar (bzr)
A GNU project, sponsored by Canonical. Has Launchpad and SourceForge services in place of GitHub/GitLab. Similar to SVN in usage, similar to git in features. Can interoperate with other systems.
Basically a distributed CVS.
Issue tracking is important. Ever wonder how that one problem from 2 years ago was fixed? Want to make sure your workers are actually doing their jobs? Want to measure how efficient each worker is, or how correct their implementations are? Maybe you want to be able to reassign workers on a whim, and not lose a ton of time on re-explaining the problem at hand. Then issue tracking is for you!
Trac is a web-based issue tracker (as are all on this list), written in 2004 and last updated (at the time of writing) in 2015. It integrates with most version control systems, though it only supports SVN and git by default. It is used by projects such as WebKit, FFmpeg and WordPress. It supports a wiki, a blog, multiple projects, user accounts, CI and pastebin integration, among other things. In my experience, Trac feels very "wiki-ish", with bug reports strapped on. Which can be good (as in FFmpeg’s case) or horrible, depending on the project.
Bugzilla is a bug tracker initially released by Netscape Communications, but now developed by Mozilla. It is used by projects like Gentoo, Linux, FreeBSD and LibreOffice. It’s based on Perl 5 and CGI, and is therefore harder to use with some servers (such as nginx). Bugzilla is hard-specialized in tracking software defects, giving up some of the integration that Trac has in exchange for better bug management.
JIRA is the primary competitor to Bugzilla, and is not an open source solution (though it provides free licenses to certain qualifying open source projects). It has tools to migrate from Bugzilla. JIRA’s focus is on optimizing the bug workflow around the various agile development models - allowing for rapid iterations. The only open source project I’m aware of using it prominently is the Qt project, and they do occasionally run into problems. For example, I recall a bug where someone was asked to put their patch code elsewhere, since it couldn’t be merged from JIRA - which rather takes away from the supposed great integration. In proprietary projects with full licenses, such issues shouldn’t arise, though.
While Trac feels like bug reporting stapled onto a wiki system, GitHub and GitLab (both full hosting solutions) feel like bug tracking and a wiki stapled onto source hosting. The bug tracking in both tends to be less featureful than in some other solutions, but it offers very tight integration with the source, and tends to be "enough" in most situations. And since bugs and source are more tightly linked than wikis and docs are (seeing as a wiki is not formal documentation), this (in my experience) works better than Trac.
Ever have that intern push broken code? Like trying to load the entire 2TB database into RAM? How do you avoid these things without outright telling them to just "not do anything"? Code review is the answer! The idea of code review is that people make mistakes, even if the overall idea is correct. To protect against those mistakes, all code added to the project has to be verified by another developer (a peer). This way, one person’s hard-to-catch mistakes become obvious to someone else, and are eliminated before they ever make it in. Since review tends to take less time than figuring out fixes to strange bugs in giant codebases, this is a net win. Solutions for code review tend to be quite similar, so this section will talk about the kinds of workflows that exist around them.
Pre-commit code review means the code gets reviewed before it is ever added to the codebase. Seen in tools like Gerrit, GitLab/GitHub and Phabricator, most of the time this takes the form of "pull requests". To add code to a project, one must clone the project, make a feature branch, and, once they believe it is done, open a PR. The request doesn’t add the code yet, but the code is available for review. Once it passes tests (you use CI, right? (see below)) and is approved by the predetermined number of reviewers, it gets merged in. If the reviewers deem it not good enough, they can comment on the PR, and the pusher then adds code to their branch (which will automatically show up in the PR, to be re-reviewed). The primary advantage of this approach is that when the code is merged, it is very unlikely to cause many problems - it has already passed all the available tests, and been verified by trustworthy individuals over several cycles.
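Stripped of the web UI, the mechanics underneath a pull request are plain git. In this sketch, a local bare repository stands in for the hosting service, and the branch and commit names are made up:

```shell
#!/bin/sh
# Feature-branch-and-merge mechanics behind a PR, with a bare repository
# playing the role of the central server (GitHub/GitLab).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare central.git            # the "server"

git clone -q central.git dev 2>/dev/null  # the contributor's clone
cd dev
git config user.email dev@example.com
git config user.name Dev
trunk=$(git symbolic-ref --short HEAD)    # whatever the default branch is called
echo 'hello' > app.txt
git add app.txt
git commit -qm 'initial code'
git push -q origin "$trunk"

git switch -qc fix-greeting               # the feature branch
echo 'hello, world' > app.txt
git commit -qam 'improve greeting'
git push -q origin fix-greeting           # this is where a PR would be opened

# After CI passes and reviewers approve, a maintainer merges the branch:
git switch -q "$trunk"
git merge -q --no-ff -m 'merge reviewed change' fix-greeting
git push -q origin "$trunk"
history_len=$(git rev-list --count HEAD)  # initial + fix + merge commit = 3
```

Everything the hosting service adds on top of this - comments, approval counts, automatic re-testing of updated branches - is bookkeeping around exactly these commands.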
Mailing list code review is similar to pre-commit. A mailing list is a simple email construct - all messages that come to the mailing list get sent to everyone "subscribed" to it, allowing for mass communication, and threads. Under this system, patches are attached to a mailing list post, and are then reviewed by a central authority. If one patch (of several) is deemed correct, it may be merged immediately, while the others are sent back to be improved. This approach makes a trade-off: partial code merging can happen (so features get into the codebase faster), but it tends to hurt integration. Patchwork is a mailing-list-centric system that analyzes all mailing list entries to catch patches and related data.
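Mechanically, this flow is typically built on `git format-patch` (contributor side) and `git am` (maintainer side). A minimal local sketch, with made-up repositories standing in for the two ends of the list:

```shell
#!/bin/sh
# The patch flow behind mailing-list review: the contributor turns a commit
# into a mail-ready patch file; the maintainer applies the accepted one.
set -e
tmp=$(mktemp -d)
cd "$tmp"

git init -q contributor
cd contributor
git config user.email c@example.com
git config user.name Contributor
echo 'handle empty input' > fix.txt
git add fix.txt
git commit -qm 'fix: handle empty input'
git format-patch -1 -o "$tmp/outbox" >/dev/null   # what gets attached to the post

git init -q "$tmp/maintainer"
cd "$tmp/maintainer"
git config user.email m@example.com
git config user.name Maintainer
echo 'base' > README
git add README
git commit -qm 'initial'
git am -q "$tmp"/outbox/*.patch           # apply the patch "from the list"
applied=$(git log -1 --format=%s)         # the original commit message survives
```

Because each patch file is a self-contained commit, the maintainer can pick some and reject others - which is exactly the partial-merging trade-off described above.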
Post-commit code review is somewhat less immediately obvious. The idea is that people push to upstream as normal, and the code gets reviewed after the fact. Usually, this means having a few "dedicated" reviewers who review all the incoming code over time, or each developer having a set period in the day where they review instead of write. This approach also needs the least setup - simply having access to the VCS history is already "enough". It does mean that bad code can and will make it into upstream, and you simply attempt to catch it before any release. Often, projects following this workflow will do what is called a "feature freeze" before releases, where only review and bugfixing happens, to avoid introducing any new bugs. The advantage is that features do tend to land significantly faster.
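With git, for example, the "review queue" can be nothing more than the range of commits since a marker; the tag name and commit messages below are made up for the sketch:

```shell
#!/bin/sh
# Post-commit review from VCS history alone: a tag marks where the previous
# review pass ended, and everything after it is the queue for the next pass.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email r@example.com
git config user.name Reviewer
echo a > f.txt && git add f.txt && git commit -qm 'reviewed earlier'
git tag last-reviewed                     # end of the previous review pass

echo b > f.txt && git commit -qam 'new work 1'
echo c > f.txt && git commit -qam 'new work 2'

pending=$(git rev-list --count last-reviewed..HEAD)  # 2 commits to look at
git log --oneline last-reviewed..HEAD     # what the reviewer reads
git diff last-reviewed HEAD >/dev/null    # ...and the combined change
git tag -f last-reviewed HEAD >/dev/null  # pass done; advance the marker
```

Nothing upstream changes to accommodate the reviewers, which is why this workflow needs essentially zero setup.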
CI is using a tool to continually build new versions of the software, performing predetermined tasks on the source. These can be anything from building it (and preparing a binary for deployment!) and running tests, to checking how the software behaves in various environments. As an example usage, a CI server could test every commit, and if any commit introduces a failing test, it’ll notify the developers, and expose the culprit who introduced said failure into the system. These tools are (as you can tell) particularly useful if you have tests (you do, right?), and they all tend to operate the same way. Since they’re all similar, and almost all of them support plugins to extend beyond base language support, here is a simple list.
This one’s mostly useful if you’re also deploying to Windows, as it does things in a Windows VM.
Built into GitLab and relatively good, it comes with the disadvantage of being hard to run without GitLab.
Jenkins is an industry standard: open source (MIT), with a beautiful (brand-new) UI (the remake is called Blue Ocean), and it doesn’t come attached to anything. The only disadvantage is that it’s written in Java (7 or above). (Note: Jenkins is a fork of Hudson, whose direction the developers didn’t like for several reasons, and it has since grown apart from it.)
Started in 2012, Strider is one of the earlier examples of Node.js being used for basically everything. Being written in JS does help with having a web UI, however.
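Whatever the tool, the job it runs boils down to the same shape: run predetermined stages against a checkout, stop at the first failure, and report the culprit stage. A minimal sketch in shell - the stage commands here are stand-ins (`true`/`false`), not a real build:

```shell
#!/bin/sh
# The core of any CI job: a sequence of stages, each a command whose exit
# code decides pass/fail. Real pipelines substitute 'make', 'make test', etc.

run_stage() {
  stage=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "stage '$stage': ok"
  else
    echo "stage '$stage': FAILED (this is where developers get notified)"
    return 1
  fi
}

status=passed
run_stage build true       || status=failed   # stand-in for the build step
run_stage unit-tests true  || status=failed   # stand-in for the test suite
run_stage lint false       || status=failed   # deliberately failing stage
echo "pipeline: $status"
```

Triggers (on every push, nightly), notifications, and badges are all layered on top of this loop; the plugin systems mentioned above mostly just add new kinds of stages.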
A development model determines how you approach actually writing the software. All of these are ultimately disconnected from the tools section, but some models are more commonly combined with specific tool sets…
The waterfall model is the one most people tend to gravitate towards if they’re unaware of any alternatives. The waterfall model is a non-iterative design process. You figure out your requirements, you design a way to fulfill them, you implement the solution, you verify that it works, and you leave it for later maintenance. In theory, this allows you to catch issues early (e.g. when computing the requirements), saving money and energy in the long run. This also lets you write documentation as you go along, and have it be functional for the lifetime of the solution, which allows new people to join in relatively quickly. In practice, documentation is often lacking, in an attempt to skip steps towards implementation. Maintenance ends up fixing the same bugs over and over.
Incremental development revolves around creating a prototype of a feature as a proof of concept, and then merging it with the "core" container of features. This allows one to avoid dirtying up the actual project with unproven features, in theory. In practice, this tends to create a lot of spaghetti glue code in the project itself. It’s well suited for very modular programs (such as a chat bot, or projects approached like git), but tends to be unmaintainable for larger, chunkier projects.
RAD (Rapid Application Development)
The RAD approach is somewhat similar to incremental, with a few key differences. RAD explicitly avoids too much up-front planning. The planning phase consists of determining the data workflow and data models - figuring out the structure of the program. Each logical structure is then made into a prototype (partly to surface newfound practical issues), before they are all merged together. This is then iterated upon repeatedly. The goal is to minimize the investment cost, but still deliver a high quality system quickly. This tends to cause the most "alpha software - do not touch" scenarios, since user feedback on early iterations is important - the goal isn’t to create throwaway prototypes. Over time, good software can come out of this, but early on it has all the issues of incremental development, which can interfere with adoption.
Agile is actually not a development model, but rather a family thereof. The rest of these entries are all "Agile". Their common point is being based on incremental development while addressing its issues - by being adaptive rather than predictive, and by preferring quality over speed (speed is instead achieved through short cycles).
Scrum segments work into very structured periods called "sprints". A sprint is a timeboxed period, its length determined at the start of the project (though it can be changed). At the start of each sprint, a meeting (the sprint planning event) is held, with two goals: determine the overall speed of the team using the backlog (what was and wasn’t done during the last sprint), and (with the help of that information!) plan what will be done during the next sprint. At the end of each sprint, a review session is held, to show stakeholders the progress made, and to learn lessons for improving future sprints.

Daily, a scrum meeting (sometimes called a stand-up) is held. Anyone can participate (though it’s usually restricted to the development team). It must always start at the same time every day (even if people are missing), and is limited to 15 minutes. In that time, each team member must explain what they did yesterday towards the sprint’s goals, what they plan to do today, and whether they see any potential roadblocks to achieving the goal. Any such roadblocks should be noted by the project lead, and solved later.

This approach is very efficient at maximizing work efforts. It becomes inefficient when you have very long-term tasks that can’t easily be progressed within a single sprint (though those are usually a sign of bad design). The strongest point of scrum is in business interactions. Most non-iterative, pre-planned models do not deal well with changing requirements, so if an executive runs into the room screaming for everyone to drop what they’re doing and start working on a brand new idea (tm), it is now possible to say "very well, we will discuss this during the next sprint planning event, make sure you attend!". This iterative planning process also allows the software to flexibly change over time to meet ever-adapting requirements.
LSD (Lean Software Development)
Lean Software Development takes an aspect of Toyota’s optimization process, specifically the "eliminate waste" part. The principles are: eliminate as much waste as possible (waste being anything that doesn’t add value to the customer, such as partially done work, or extra features); amplify learning by writing code rather than thinking about it; decide things as late as possible (which forces iterative planning); deliver as fast as possible (which forces fast iteration); empower the team (instead of managers telling developers what to do, developers explain what can, and maybe should, be done, as well as what their specialties are, while managers work by removing impediments and encouraging progress); build integrity in (the opposite of incremental development: giving a sense of "the whole" to the customer); and see the whole - the idea that software isn’t just the sum of its parts, but is defined by their interactions, eliminating defects by decomposing big tasks into smaller tasks (think of a factory). The main problem with this approach is that it focuses so much on development-to-user efficiency that it ultimately becomes inflexible - what if you have a requirement that needs research (such as picture recognition)? This is similar enough (including in its inspiration) to Kanban that I won’t mention the latter further, especially since LSD seems to be the more popular of the two.
Extreme Programming (XP)
Extreme Programming is the concept that anything beneficial in development should be taken to extremes. It advocates very frequent "releases", CI (Continuous Integration), and unit testing as much of the code as possible, similar to scrum. The similarities end there, though. It also prefers reviewing code as much as possible (going as far as continuous pair programming), avoiding programming any feature until it’s actually needed, a flat management structure, simplicity and clarity in code, as well as expecting changes in the customer’s requirements over time (optimizing for communication). The way it achieves this is by having several iterative loops. Every aspect is iterated upon: code is iterated upon in pair programming (iterations are seconds long), unit testing (minute-long iterations), pair negotiation (hour-long iterations), stand-up meetings (day-long iterations), acceptance testing (day-long iterations), iteration plans (week-long iterations) and release planning (month-long iterations). In my experience, this works relatively well, but tends to cause more burnout, and has a tendency to disadvantage introverts (of whom there are more than average in software development).