Michael Eriksson's Blog

A Swede in Germany

Posts Tagged ‘software development’

Where did those millions go?

The last ten days have been … interesting … work-wise, following a production release, and allowing some observations to be made and lessons to be drawn.

(Below, I am deliberately skimpy with contextual details to avoid a direct or indirect leak of internal information relating to my customer, as well as reducing the risk that someone identifies said customer.)

  1. The main event was several million (!) Euros more than intended being paid out to various parties, possibly including some misallocation of money between parties. Correcting this was fortunately possible, but it required a lot of additional effort and caused loss of face for my customer vis-a-vis its customers, the Development vis-a-vis the Payment department, and both the Development and Payment departments vis-a-vis the executives.

    The reason for this was a single line of code* left in place, when it should have been removed, that caused the state of some records to not be updated appropriately, in turn causing them to be processed a handful of times each day (instead of one single time on one single day). How the hell did something like that get past not only the original changer (me), but also the code review and the extensive tests by Quality Assurance? How did the resulting unexpected payment instruction not catch the attention of the (usually pedantic) Payment department until several days had gone by?

    *Specifically, a considerable part of the processing had been moved from using individual sets of database tables for data delivered from individual third parties to using one set of unified tables for all third parties. An SQL UPDATE that should just have altered all records given as input also checked whether a Foreign-Key field was filled—a check that might (cf. below) have once been necessary, but was now redundant. Worse, the test was now faulty, because the old tables had used “-1” to indicate the absence of the Foreign-Key while the new tables used a NULL-value. Since the check still used “-1”, no update took place.
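
The failure mode in this footnote follows from SQL's three-valued logic: a comparison with NULL is neither true nor false, so a WHERE clause that contains it skips the row. The following is a minimal sketch using Python's `sqlite3` module, not the customer's code. The table and column names (`records`, `partner_fk`, `state`) are invented for illustration, and because the exact form of the left-over check is not given, both plausible forms are tried.

```python
import sqlite3


def rows_updated(extra_condition: str = "") -> int:
    """Run the UPDATE against three records whose foreign key is NULL
    and return how many rows the database reports as changed."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical unified table: absence of the foreign key is stored as NULL.
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, partner_fk INTEGER, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO records (id, partner_fk, state) VALUES (?, ?, 'OPEN')",
        [(1, None), (2, None), (3, None)],
    )
    cursor = conn.execute(
        "UPDATE records SET state = 'PROCESSED' WHERE id IN (1, 2, 3) " + extra_condition
    )
    changed = cursor.rowcount
    conn.close()
    return changed


# Compare the plain UPDATE with the two plausible left-over checks and with IS NULL.
for condition in ("", "AND partner_fk = -1", "AND partner_fk <> -1", "AND partner_fk IS NULL"):
    print(repr(condition), rows_updated(condition))
```

Under these assumptions, the statement without the extra check changes all three rows, while both `= -1` and `<> -1` change none, because neither comparison can be true for a NULL value. Only an explicit `IS NULL` matches.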

    As is often the case, there has to be a confluence of a number of unfortunate circumstances. (Read up on the Chernobyl accident and note how many different screw-ups took place leading up to the event.)

    Here the problems include (but are likely not limited to):

    • My making the original change long before the filling of the unified tables was actually implemented (by someone else) and running my developer tests with correspondingly faulty data.
    • The left-in line of code looking very innocuous, possibly even like a safety measure, which would make it far easier to miss for the code reviewer, especially with no change being present compared to the original code: Very often, it is a good idea to write UPDATE statements in a manner that prevents unintentional changes by checking for an old value*. In this case it was not, because irrespective of the value at hand the records had been processed.

      *Consider writing a statement that corrects some incorrect data for a predefined set of records, and assume that, before the statement is actually executed, one of the entries is altered and processed further through some other mechanism or an intervention by someone else. The results can be quite problematic, and it is better to add a check that the records are still in the state they are assumed to be, e.g. in that a “SET a=[new value]” is accompanied by an “AND a = [old value]” in the WHERE clause.

    • The original code, pre-dating my change, was also almost* certainly faulty, because the check had the same dangers as in the previous item. The problem had simply never manifested, due to the Foreign-Key always having had the expected value until now. (Or had it? If the event was rare enough, it might have gone unnoticed…)

      *I do not have the code at hand, and there might conceivably have been some legitimate reason for its presence in the original; however, as I remember the corresponding statement, there was not—and even if there were, it was a sloppy solution to begin with.

    • The corresponding functionality was never tested by QA, limiting the tests to my previous developer tests (cf. above). Why was it not tested? Well, for the entirety of the test phase, this functionality ran into errors due to missing master data in a sub-system, preventing this specific line from even being reached for the test set chosen by QA*. The developer who would normally have been responsible for amending the data was on a month-long leave, and his replacement was his predecessor—who had moved on to project management and was swamped with the administrative parts of the same project. Despite several reminders over weeks, the data never were amended (and QA had their hands full testing other parts).

      The day before the end of the tests, I explicitly asked QA to do at least an approximate test by removing the sub-system from the equation** so that we had at least the equivalent of a smoke test (which would have been enough to discover this particular error). But here is the real kicker: As I come in the next day, QA tells me that the project manager (the very same as above!) had told them to not perform this test…

      *Inputs copied from the production system, intended to ensure that the test was as realistic as possible. Unfortunately, this often requires amending the master data in the QA systems, for which there is currently no way short of manually finding and copying the right set of entries. (I am not an enthusiastic supporter of this approach, but it is not my decision.)

      **Specifically, that we were to temporarily replace the interface to the sub-system with a dummy that delivered some approximation of the data for a very limited set of inputs.

      As an aside, my main contribution to this debacle was not that I made the original mistake—things like that happen every now-and-then and are the reason why code reviews and QA tests are needed. No, my main error was in not insisting that a test took place.

      Unfortunately, this is not the first time that an explainable, harmless error (here the missing master data) has led to QA not discovering another, actually harmful error that was “hidden” by the first error.

    • Due to previous delays and the urgency of the deadlines, there were several complications, including the impossibility to extend the QA phase and the fact that the decision was made to go live even with a QA report clearly stating that there were untested areas (including the above) and that this was a high risk endeavor.
    • Production releases are theoretically only made after the informed approval of at least one of the two main executives, who are responsible for a final risk–benefit decision and whatnot. In practice, they rubberstamp all or almost all requests at a speed that forbids an informed decision.
    • The head of Payments had spotted a (still small) discrepancy on the very first day after the installation—but was so swamped that he, by his own statement, postponed the investigation.
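
The old-value check described in the footnote on UPDATE statements (a “SET a=[new value]” paired with an “AND a = [old value]”) can be sketched as follows. This is an illustration under assumptions, using Python's `sqlite3` module; the table `entries`, the column `a`, and the helper `correct_value` are hypothetical names.

```python
import sqlite3


def correct_value(conn, record_id, old_value, new_value):
    """Set column a to new_value only if it still holds old_value.

    Returns True if exactly one row was changed, False if the record
    was no longer in the assumed state (or does not exist)."""
    cursor = conn.execute(
        "UPDATE entries SET a = ? WHERE id = ? AND a = ?",
        (new_value, record_id, old_value),
    )
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, a TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?)", [(1, "wrong"), (2, "wrong")])

# Between writing the correction and running it, record 2 is changed elsewhere.
conn.execute("UPDATE entries SET a = 'handled elsewhere' WHERE id = 2")

results = {rid: correct_value(conn, rid, "wrong", "right") for rid in (1, 2)}
final_values = dict(conn.execute("SELECT id, a FROM entries ORDER BY id"))
print(results, final_values)
```

Record 1 is corrected, while record 2 is left alone and reported back to the caller instead of being silently overwritten. The guard only helps when the old value really matters for the update. It also never matches when the old value is NULL, since `a = NULL` is not true in SQL.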

    As an aside: I am a great fan of automatic and repeatable tests of various kinds. It is possible that such tests would have helped to find the error. However, they are basically not a topic for this very non-agile customer, there is almost no infrastructure for them (apart from what little I have written myself), and even I rarely actually have the time with the hectic schedules that abound. This is the one thing from the Java world that I miss…
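
As an illustration of the kind of automatic, repeatable test meant in the aside above, the following sketch runs a processing step twice and checks that the second run pays out nothing. Everything here is assumed for the example: the schema, the function `run_payment_batch`, and the `extra_filter` parameter that stands in for a left-over condition. It uses Python's `sqlite3` module and says nothing about how the real system is built.

```python
import sqlite3


def make_test_db():
    """Create an in-memory table with two open records (amounts in cents)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY, partner_fk INTEGER, amount INTEGER, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, NULL, ?, 'OPEN')",
        [(1, 1000), (2, 2500)],
    )
    return conn


def run_payment_batch(conn, extra_filter=""):
    """Pay every OPEN record and mark it PROCESSED; return the total paid.

    extra_filter is appended to the state update and stands in for a
    left-over condition such as the one described in the post."""
    rows = conn.execute("SELECT id, amount FROM records WHERE state = 'OPEN'").fetchall()
    total = 0
    for record_id, amount in rows:
        total += amount
        conn.execute(
            "UPDATE records SET state = 'PROCESSED' WHERE id = ? " + extra_filter,
            (record_id,),
        )
    return total


def test_second_run_pays_nothing():
    conn = make_test_db()
    assert run_payment_batch(conn) == 3500
    assert run_payment_batch(conn) == 0


test_second_run_pays_nothing()
```

With a faulty filter such as `"AND partner_fk = -1"` passed as `extra_filter`, the second run would pay the same 3500 again and the assertion would fail. That is the sort of signal a smoke test could have given before the release.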

  2. In a further complication in the same rough area (and hit by the same instance of not-tested-by-QA) it turns out that some amounts had the wrong sign from the perspective of the next system in the chain of processing.

    An older piece of code (that we knew was faulty) calculated amount A from the absolute value of amount B and the sign of amount C—something that possibly might have been reasonable when the system was originally written, processing a more limited range of business cases from another source, but was now heavily outdated. After investigations and long discussions (including with the domain expert from and co-head of Product Management) the decision was made to ignore amount C and to use the negated sign of amount B instead. This made for a consistent and plausible calculation and preserved compatibility with the old results in the vast majority of cases—after all, if the old results had not been correct most of the time, the problem would have been fixed much earlier.

    Alas, no: The calculated values were only used in a minority of all cases—and this minority required the (unnegated) sign of amount B. The other cases had almost all been the exact opposite of what they should be for many years—but since the values were not actually used, no-one had ever noticed.
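
The three sign rules discussed in this item can be put side by side in a small worked example. The function names and the sample amounts are invented for illustration; only the rules themselves come from the description above.

```python
from decimal import Decimal


def amount_a_old(b: Decimal, c: Decimal) -> Decimal:
    """Old rule: absolute value of amount B with the sign of amount C."""
    return abs(b).copy_sign(c)


def amount_a_released(b: Decimal) -> Decimal:
    """Rule chosen for the release: amount B with its sign negated."""
    return -b


def amount_a_needed(b: Decimal) -> Decimal:
    """What the cases that actually use the value needed: B, sign unchanged."""
    return b


# Hypothetical sample: in the first case the value is calculated but never used,
# in the second case the next system in the chain consumes it.
cases = {
    "unused": (Decimal("100.00"), Decimal("-5.00")),
    "used": (Decimal("100.00"), Decimal("7.00")),
}

for name, (b, c) in cases.items():
    print(name, amount_a_old(b, c), amount_a_released(b), amount_a_needed(b))
```

In the "unused" case the old and the released rule agree (both give -100.00), which is why the change looked compatible. In the "used" case the old rule gave 100.00, as needed, and the released rule flips it to -100.00.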

  3. Generally, it is often very small details that cause problems in software development, probably because they are so easy for both developer and code-reviewer to overlook. Above we have one example with a single line too much and another with a single character (a “-”) too much. A few weeks earlier, a colleague had in three cases used one of two different functions that only differed in that the name of the one contained an “n” (for n-umber or n-umerical) and the other an “a” (for a-scii, i.e. text) and that the one assumed that leading zeroes should be removed from input. One of the three cases correctly used the n-version, the second correctly used the a-version, the third accidentally the a-version when it should have used the n-version. Again: One single character—with very incorrect calculations as a result.
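
To make the one-character difference in item 3 concrete, here is a sketch with two hypothetical functions, `key_n` and `key_a`, that differ only in whether leading zeroes are removed. The names, the lookup table, and the sample input are assumptions. The real functions are not shown in the post.

```python
def key_n(raw: str) -> str:
    """Numeric variant: leading zeroes carry no meaning and are removed."""
    return raw.lstrip("0") or "0"


def key_a(raw: str) -> str:
    """Text variant: the input is kept character for character."""
    return raw


# Hypothetical lookup table keyed by the numeric form of an identifier.
factor_by_id = {"42": 3}

raw_input_id = "0042"
found_with_n = factor_by_id.get(key_n(raw_input_id))
found_with_a = factor_by_id.get(key_a(raw_input_id))
print(found_with_n, found_with_a)
```

The two calls look almost identical in a code review, yet only the n-version finds the entry. The a-version returns nothing for the same input, and whatever default is applied afterwards flows into the calculation.
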
  4. The quite extensive changes made with this production release caused several prior errors that had gone undiscovered, typically for years, to manifest due to harder checking and less leniency.* This included, albeit mostly shortly before the release, the discovery that a third party had incorrectly passed us certain data in a pre-anonymized form that we needed in its original version, anonymization taking place in our system.** The effect was that several things that we needed to calculate based on this data were occasionally miscalculated, including some cash-flow dates.

    *Being lenient with e.g. input from other computer systems (as opposed, possibly, to “from human users”) is usually a big mistake. It is better to “fail fast” so that the problems with the input can actually be fixed, and do not remain hidden and potentially harmful for years.

    **Specifically, a string of characters that needed to be compared to configured ranges of characters to identify certain characteristics. A part of this string was overwritten with zeroes in a blanket manner, implying that the comparison often picked the wrong range or, more rarely, did not find any range at all. The latter case was sufficiently rare that we had hitherto assumed that it was a case of either the third party rarely receiving incorrect input, itself, or slight discrepancies in the range definitions used by various parties. (The ranges all coming from yet another third party on a regular basis, allowing for the possibility of small, temporary differences in definitions among different parties, depending on exactly when the definitions were received and imported.) The value itself, as opposed to the results of the calculation, had not caused any alarm for the simple reason that we normally never see these values: They are used for the calculations and then immediately anonymized, and, because we use the same anonymization algorithm as the third party, it is later impossible to tell that the data had been pre-anonymized.
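
The two footnotes above can be illustrated together: a range lookup that fails fast instead of falling back to a default, and the effect of overwriting part of the input with zeroes before the lookup. The string format, the ranges, the labels, and the masking rule below are all invented for the sketch. The post does not describe the real ones.

```python
# Hypothetical configured ranges: (first, last, label) over six-digit strings.
RANGES = [
    ("400001", "454999", "type X"),
    ("455000", "499999", "type Y"),
]


def classify(value: str) -> str:
    """Return the label of the range containing value; raise if none does."""
    if len(value) != 6 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"malformed input: {value!r}")
    # Equal-length digit strings compare in the same order as their numbers.
    for first, last, label in RANGES:
        if first <= value <= last:
            return label
    raise LookupError(f"no configured range contains {value!r}")


def pre_anonymize(value: str, keep: int = 2) -> str:
    """Overwrite everything after the first `keep` characters with zeroes."""
    return value[:keep] + "0" * (len(value) - keep)


print(classify("457000"), classify(pre_anonymize("457000")))
try:
    classify(pre_anonymize("401234"))
except LookupError as error:
    print("rejected:", error)
```

In this sketch the original value `457000` falls into "type Y", while its zeroed form `450000` silently lands in "type X". The zeroed form of `401234` matches no range at all and is rejected loudly. Only the second kind of problem is visible without comparing against the original data, which fits the observation that the rarer case was the one that was noticed.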

  5. A strictly speaking undramatic and, in itself, unimportant issue was a temporary performance drop after the installation. This was easily resolved on the same day, but contributed (as did the interruption of services needed for the original installation) to Payment having its schedule delayed by several hours and indirectly to Payment not acting on the discrepancy they found (cf. above). The main hitch here is lack of communication: Payment had, as became obvious, no idea of the scope* of the changes going live that day, and were left unprepared for delays that were entirely within the expected from a Development point of view. Had they been sufficiently made aware that this was not just another run-of-the-mill release, things would have gone smoother, the interdepartmental irritation would have been considerably smaller—and we might have discovered the main problem on the first day.

    *A very major project following the outdated “Waterfall” model that required correspondingly very major changes to the code. There is something to be said for the short release cycles of e.g. Scrum…

Written by michaeleriksson

March 4, 2018 at 1:37 am

The absurdities of my current project (and some attempts to draw lessons)

I have spent roughly two years in a row with my current client, to which can be added at least two stints in the past, working on several related projects. Until the beginning of the latest project, roughly six months ago*, I have enjoyed the cooperation, largely through the above-average amount of thinking needed, but also due to my low exposure to bureaucracy, company politics, and the like. With this project, things have changed drastically: problems with and within the project have turned the experience negative, to the point of repeatedly having negative effects on my private life**, and I definitely do not feel like accepting another extension. I am, in fact, at a point where I want to bang my head against the wall and scream my frustration on a regular basis.

*Giving an exact number is tricky, with the unofficial kick-off and preparations taking place long before the official kick-off, combining a long phase of (often failed or useless, cf. below) requirement clarifications and study of various documents with work on other projects.

**When problems of a “political” nature occur or when my work is hindered by the incompetence or lack of cooperation from others, I often have difficulties letting go of these issues in the evening. This is one of the reasons that I prefer working as a contractor—I am less vested than I was as an employee and can (usually…) take a more relaxed attitude towards internal problems of a given client.

To look at a few problems:

(I apologize for the poor structuring below, but it is almost 1 A.M. in Germany and I want to close this topic.)

  1. The originally communicated live date was end of June. This was completely and utterly unrealistic, which I and several other developers communicated from the start. Eventually, the plan was changed to first quarter 2018—not only much more realistic, but something which allowed us to use a high-quality approach, even undo some of the past damage from earlier projects with a too optimistic schedule*. Unfortunately, some months in, with development already long underway based on this deadline, the schedule was revised considerably again**: Now the actual live date was end of October, with a “pilot” release, necessitating almost all the work to be already completed, several weeks before that…

    *What is often referred to as “technical debt”, because you gain something today and lose it again tomorrow—with usurious and ruinous interest… Unfortunately, technical debt is something that non-developers, including executives and project managers, only rarely understand.

    **I will skip the exact reasoning behind this, especially since there are large elements of speculation in what has been communicated to me. However, it falls broadly in the category of external constraints—constraints, however, that were, or should have been, known quite a bit earlier.

    Cue panic and desperate measures, including scrapping many of the changes we had hoped for. Importantly, because a waterfall* model with a separate test phase was followed, we were forced to the observation that from a “critical path” point of view we were already screwed: The projected deadline could only be reached by reducing the test phase to something recklessly and irresponsibly short. As an emergency measure, in order to use test resources earlier and in parallel with development, the decision was made** to divide the existing and future work tasks into smaller groupings and releases than originally intended. On the upside, this potentially saved the schedule; on the downside, it increased the amount of overall work to be completed considerably. Of course, if we had known about the October deadline at the right time and if we had not believed the earlier claims, this could have been done much more orderly and with far less overhead: We almost certainly would have kept the deadline, had this been what was communicated to begin with. As is, it will take a miracle—indeed, even the official planning has since been adjusted again, to allow for several more weeks.

    *I consider waterfall projects a very bad idea, but I do not make the policy, and we have to do what we have to do within the given frame-work.

    **With my support on a reluctant lesser-of-two-evils basis. Now, I am used to working with smaller “packages” of work and smaller releases in a Scrum style from many of my Java projects. Generally, I consider this the better way; however, it is not something that can be easily done “after the fact”. To work well, it has to be done in a continual and controlled manner from the very beginning.

    Planner* lessons: Never shorten a dead-line. Be wary of rushing a schedule—even if things get done this time, future problems can ensue (including delays of future projects). Never shorten a dead-line. Do not set deadlines without having discussed the topic in detail with those who are to perform the work—the result will be unrealistic more often than not. Never shorten a dead-line.

    *For the sake of simplicity, I use the term planner as a catch-all that, depending on the circumstances, might refer to e.g. an executive, administrator, project manager, middle manager, … I deliberately avoid “manager”.

    Developer lessons: Do not trust dead-lines. Even in a waterfall project, try* to divide the work into smaller portions from the beginning. It will rarely hurt and it can often be a help.

    *This is not always possible or easy. In this case, there were at least two major problems: Firstly, poor modularization of the existing code with many (often mutual or circular) dependencies, knowledge sharing, lack of abstraction, … which implied that it was hard to make different feature changes without affecting many of the same files or even code sections. Secondly, the fact that the code pipeline was blocked by another project for half an eternity. (I apologize for the unclear formulation: Explaining the details would take two paragraphs.)

  2. During a meeting as early (I believe) as February, the main product manager (and then project manager) claimed that the requirements document would be completed in about a week—it still is not! In the interim, we have had massive problems getting him to actually clarify or address a great many points, often even getting him to understand that something needs to be clarified/addressed. This for reasons (many of which are not his fault) that include his lack of experience, repeated vacations and/or paternity leaves, uncooperative third parties and internal departments that need consulting, too many other tasks for him to handle in parallel, and, to be frank, his not being overly bright.

    A common problem was questions being asked and a promise given that the next version of the requirements document would contain the answer or that something was discussed in a meeting and met with the same promise—only for the next version to not contain a word on the matter. When asked to remedy this, the answer usually was either a renewed (and often re-broken) promise or a request for a list of the open issues—something which should have been his responsibility. Of course, sending a list with open issues never resulted in more than several of the issues actually being resolved—if any…

    As for the requirements document: I asked him early on to send me the updated version with “track changes” (MS Word being the prescribed tool) activated either once a week or when there had been major changes. There was a stretch of several months (!) when a new version was promised again and again but never arrived. When it did arrive, there were no “track changes”…

    I recall with particular frustration a telephone conversation which went in an ever repeating circle of (with minor variations):

    Me: Please send me the changes that you have already made. I need something to work from.

    Him: Things could change when I get more feedback. It makes no sense to give you my current version.

    Me: I’ll take the risk.

    Him: You can work with the old version. Nothing will change anyway.

    Me: If nothing will change, why can’t you give me your current version?

    Him: Because things could change when I get more feedback. It makes no sense to give you my current version.

    (And, yes, I suspect that the problem was that he simply had left the document unchanged for half an eternity, despite knowing of a number of changes that must be made, and did not want to admit this. He likely simply did not have an interim version to send, because the current state of the document was identical or near identical to the older version. Had he sent this document, he would have been revealed—hence weird evasions and excuses that made no sense. But I stress that this is conjecture.)

    The other product managers are better; however, the only one assigned to the project full-time is a new employee who has started from scratch, with the others mostly helping out during vacations (and doing so with a shallower knowledge of the project and its details).

    Some blame falls back on development, specifically me, for being too trusting, especially through not keeping a separate listing of the open points and asked questions—but this is simply not normally needed from a developer. This is typically done by either product or project management—and he was both at the time. See also a below discussion of what happened when we did keep such a list…

    Planner lessons: Make sure that key roles in an important and complex project are filled with competent and experienced people and that they have enough time to dedicate to the project. (Both with regard to e.g. vacations and hours per typical week.) Do not mix roles like product and project manager—if you do, who guards the guards?

    Developer lessons: Do not trust other people to do their jobs. Escalate in a timely manner. Find out where you need to compensate in time.

    Generic/product manager lesson: If you are over-taxed with your assignments, try to get help. If you cannot get help, be honest about the situation. Do not stick your head in the sand and hope that everything will pan out—it probably will not.

  3. After the first few months, a specialist project manager was added to the team. As far as I was concerned*, he had very little early impact. In particular, he did not take the whip to the main product manager the way I had hoped.

    *Which is not automatically to say that he did nothing. This is a large project with many parties involved, and he could conceivably have contributed in areas that I was rarely exposed to, e.g. contacts with third parties.

    Around the time the October deadline was introduced, a second project leader was added and upper management appeared to develop a strong interest in the project’s progress. Among the effects was the creation of various Excel sheets, Jira tasks, overviews, estimates, … (Many of which, in as far as they are beneficial, should have been created a lot earlier.) At this point, the main task of project management appeared to be related to project reporting and (unproductive types of) project controlling, while other tasks (notably aiming at making the project run more efficiently and effectively) were not prioritized. For instance, we now have the instruction to book time on various Jira tasks, which is a major hassle due to the sheer number*, how unclearly formulated they are, and how poorly they match the actual work done. To boot, a few weeks ago, we were given the directive that the tasks to book time for the “technical specification” had been closed and that we should no longer use these—weird, seeing that the technical specification is still not even close to being done: Either this means that there will be no technical specification or that the corresponding efforts will be misbooked on tasks that are unrelated (or not booked at all).

    *I counted the tasks assigned to me in Jira last week. To my recollection, I had about ten open tasks relating to actual actions within the project—and fourteen (14!) intended solely to book my time on. Do some work—and then decide which of the fourteen different, poorly described booking tasks should be used… It would have been much, much easier e.g. to just book time on the actual work tasks—but then it would have been harder for project management to make a progress report. In contrast, we originally had three tasks each, reflecting resp. meetings, technical specification, and development. Sure, there were border-line cases, but by and large, I could just make a rough approximation at the end of the day that I had spent X, Y, and Z on these three, clearly separated tasks. To that, I had no objections.

    We also had a weekly project/team meeting scheduled for one hour to report on progress. After we exceeded the time allotted, what was the solution for next week? (Seeing that we are on a tight schedule…) Keep a higher tempo? No. Replace the one big meeting with a daily ten minute mini-briefing? No. Skip the meeting and just have the project manager spend five to ten minutes with each individual developer? No. Increase the allotted time by another half hour? Yes!

    By now we have a third project manager: That makes three of them to two product managers, two testers, and seven developers. Call me crazy, but I find these proportions less than optimal…

    At some stage, the decision was made to keep a list of open questions in Confluence to ensure that these actually were resolved. Of course, the responsibility to fill these pages landed on the developers, even when emails with open items had already repeatedly been sent to product management. With barely any progress with the clarifications noticeable in Confluence, this procedure (originally agreed between development and product management) is suddenly unilaterally revoked by project management: Confluence is not to be used anymore, what is present in Confluence “does not exist”, and all open issues should be moved, again by development, to a new tool called TeamWork. Should the developers spend their time developing or performing arbitrary and/or unnecessary administrative tasks? We are on a deadline here…

    Well, TeamWork is an absolute disaster. It is a web based, cloudstyle tool with a poorly thought-through user interface. It does nothing in the areas I have contact with, which could not be done just as well per Jira, Confluence, and email. Open it in just one browser tab, and my processor moves above 50% usage—where it normally just ticks around no more than a few percent*. To boot, it appears** that the company’s data are now resting on a foreign server, on the Internet, with a considerable reduction in data security, and if the Internet is not there, if the cloud service is interrupted or bankrupted, whatnot, the data could immediately become inaccessible. And, oh yes, TeamWork does not work without JavaScript, which implies that I have to lower my security settings accordingly, opening holes where my work computer could be infected by malware and, in a worst case scenario, the integrity of my current client’s entire Intranet be threatened.

    *My private Linux computer, at the time of writing, does not even breach 1 percent…

    **Going by the URL used for access.

    The whole TeamWork situation has also been severely mishandled: Originally, it was claimed that TeamWork was to be used to solve a problem with communications with several third parties. Whether it actually brings any benefit here is something I am not certain of, but it is certainly not more than what could have been achieved with a 1990s style mailing-list solution (or an appropriately configured Jira)—and it forces said third parties to take additional steps. But, OK, let us say that we do this. I can live with the third party communication, seeing that I am rarely involved and that I do not have to log in to TeamWork to answer a message sent per TeamWork—they are cc-ed per email and answers to these emails are distributed correctly, including into TeamWork itself. However, now we are suddenly supposed to use TeamWork for ever more tasks, including those for which we already have better tools (notably Jira and Confluence). Even internal communication per email is frowned upon, with the schizophrenic situation that if I want to ask a product manager something regarding this project, I must use TeamWork, for any other project, I would use email… What is next: No phone calls? I also note that these instructions come solely from project management. I have yet to see any claim from e.g. the head of the IT department or the head of the Software Development sub-department—or, for that matter, of the Product Management department.

    To boot: The main project manager, who appears to be the driving force behind this very unfortunate choice of tool, has written several emails* directed at me, cc-ing basically the entire team, where he e.g. paints a picture of me as a sole recalcitrant who refuses to use the tool (untruthful: I do use it, and I am certainly not the only one less than impressed) and claims that it should be obvious that Confluence was out of the picture, because Confluence was intended to replace the requirements document (untruthful, cf. below), so that product management would be saved the effort to update it—but now it was decided that the requirements document should be updated again (untruthful, cf. below); ergo, Confluence is not needed. In reality, the main intention behind Confluence (resp. this specific use, one of many) was to track the open issues (cf. the problems discussed above). Not updating the requirements document was a secondary suggestion on the basis that “since we already have the answers in Confluence, we won’t need to have the requirements document too”. Not even this was actually agreed upon, however, and I always ran the line that everything that was a requirement belonged in the requirements document. To the best of my knowledge, the head of Software Development has consistently expressed the same opinion. There can be no “again” where something has never stopped.

    *Since I cannot access my email account at my client’s from home, I have to be a little vaguer here than I would like.

    In as far as I have used TeamWork less than I could have, which I doubt, the problem rests largely with him: Instead of sending us a link to a manual (or a manual directly), what does he do? He sends a link to instructional videos, as if we were children and not software developers. There is a load of things to do on this project—wasting time on videos is not one of them. To boot, watching a video with sound, which I strongly suspect would be needed, requires head-phones. Most of the computers on the premises do not have head-phones per default, resulting in an additional effort.

    Drawing lessons in detail here is too complex a task, but I point to the need to hire people who actually know what they are doing; to remember that project management has several aspects, project reporting being just one; and to remember what is truly important, namely getting done on time with a sufficient degree of quality.

    While not a lesson, per se: Cloud-style tools are only very, very rarely acceptable in an enterprise or professional setting, unless they run on the using company’s own servers. (And they are often a bad idea for consumers too: In very many cases, the ones drawing all or almost all benefits compared to ordinary software solutions are the providers of the cloud services, not the users.) If I were the CEO of my client, I would plainly and simply ban the use of TeamWork for any company-internal purposes.

  4. Last year four people were hired at roughly the same time to bolster the IT department, two of whom (a developer and a tester) would have been important contributors to this project, while the other two (application support) would have helped in reducing the work-load on the developers in terms of tickets, thereby freeing resources for the project. They are all (!) out again. One of these was fired for what appears (beware that I go by the rumor mill) to be political reasons. Two others have resigned for reasons of dissatisfaction. The fourth, the developer, has disappeared under weird circumstances. (See excursion below.) This has strained the resources considerably, made planning that much harder, and left that much more work for the rest of us to complete in the allotted time.

    Planner lesson: Try* to avoid hiring in groups that are large compared to the existing staff**. It is better to hire more continuously, because new employees are disproportionally likely to disappear, and a continuous hiring policy makes it easier to compensate. Keep an eye on new (and old…) employees, to make sure that lack of satisfaction is discovered and appropriate actions taken, be it making them satisfied or starting the search for replacements early. Watch out for troublesome employees and find replacements in time. (Cf. the excursion at the end; I repeatedly mentioned misgivings to his team lead even before the disappearance, but they were, with hindsight, not taken sufficiently seriously.)

    *Due to external constraints, notably budget allocations, this is not always possible (then again, those who plan those aspects should be equally careful). It must also be remembered that the current market is extremely tough for employers in the IT sector—beggars can’t be choosers.

    **Above, we saw (in a rough guesstimate) a 30% increase of the overall IT department, as well as a 100% increase of the application support team and a 50% increase of the QA team.

  5. At the very point where everyone intended for the project was finally allotted full-time and the looming deadline dictated that everyone should be working like a maniac, what happens? The German vacation period begins and everyone and his uncle disappears for two to four weeks… There was about a week where I was the single (!) developer present (and I was too busy solving production problems to do anything for the project…). Product management, with many issues yet to be clarified, was just as thin. There was at least one day when not one single product manager was present…

    Lessons: Not much, because there is precious little to be done in Germany without causing a riot. (And I assume that the planning that was done already considered vacations, and was unrealistic for other reasons.)

Excursion on the missing developer: His case is possibly the most absurd I have ever encountered and worthy of some discussion, especially since it affected the project negatively not just through his disappearance but through the inability to plan properly—not to mention being a contributor to my personal frustration. He was hired to start in September (?) last year. Through the first months, he gave a very professional impression, did his allotted task satisfactorily (especially considering that he was often dropped into the deep end of the pool with little preparation), and seemed to be a genuine keeper. The one problem was that he had not actually done any “real” development (coding and such), instead being focused on solving tickets and helping the test department—and that was not his fault, just a result of how events played out with the on-going projects.

However, at some point in the new year, things grew weirder and weirder. First he started to come late or leave early, or not show up at all, for a variety of reasons including (claimed or real) traffic disturbances and an ailing girl-friend, who had to be driven here-and-there. Taken each on its own, these incidents were not necessarily remarkable, but the accumulation was disturbing. To boot, he often made claims along the lines of “it’s just another week, then everything is back to normal”, but when next week came he had some other reason. He also did not take measures to minimize the damage to his employer, e.g. through checking into a hotel* every now-and-then or through cutting down on his lunch breaks**. In particular, barring odd exceptions, he did not stay late on the days he came late or come early on the days he left early. In fact, I quite often came earlier and left later than he did, and I do not take a lunch break at all… A repeated occurrence was to make a promise for a specific day, and then not keep it. I recall especially a Thursday when we had a production release. He explicitly told me that he would be at work early the following day to check the results, meaning that I did not have to. I slept in, eventually arrived at work, and found that he was still not there. Indeed, he did not show the entire day… I do not recall whether there were any actual problems that morning, but if there were, there was no-one else around with the knowledge to resolve them until my arrival (due to vacations). One day, when he did not have any pressing reason to leave early, he upped and left early. Why? He wanted to go ride his horse…

*Hotels can be expensive, but he had a good job and it was his responsibility to get to and from work in a reasonable manner. Problems in this area are his problems, not his employer’s.

**In a rough guesstimate, his lunch breaks averaged 45 minutes a day, even on days when he came late or knew that he would leave early. In his shoes, I would simply have brought a sandwich or two, and cut the lunch break down to a fraction on these days. He did not. There was even one day when he came in at roughly 11:30, went to lunch half-an-hour later, and was gone almost an hour… Practically speaking, he started the workday shortly before 13:00… He may or may not have stayed late, but he did not stay late enough to reach an eight-hour day, because security would have thrown him out too early in the evening…

The claimed problems with his girl-friend grew larger and larger. At some point, he requested a three-day vacation to drive her down to Switzerland for a prolonged treatment or convalescence. He promised that once she was there, there would be a large block of time without any further disturbance. The request was granted—and that was the last we ever saw of him. No, actually to my very, very great surprise, after a long series of sick notes and postponements, he actually showed up for three days, only to disappear again permanently after that, having announced his resignation and a wish to move to Switzerland. During these three days, I briefly talked to him about his situation (keeping fake-friendly in the naive belief that he actually intended to show up for the remainder of his employment) and he claimed that he actually had intended to show up two workdays earlier, feeling healthy, and, in fact, being fully clothed and standing in the door-way, when his girl-friend pointed out that the physician had nominally given him another two days. He then stayed at home, because, by his own claim, he felt that he could use that time better… In this manner, a three-day vacation somehow turned into three workdays—spread over a number of otherwise workless months.

Written by michaeleriksson

September 13, 2017 at 11:58 pm

Focus stealing—one of the deadly sins of software

Experimenting with the (currently very immature) browser Arora, I re-encountered one of the deadly sins of software development: Presumptuous and unnecessary focus stealing.

While I, as a Linux user, am normally not met with many instances of this sin, they are the more annoying when they do occur. Notably, they almost exclusively happen when I am off doing something completely unrelated on a different virtual desktop, with the deliberate intention of finishing one thing and then revisiting the (as it eventually turns out) focus-stealing application once I am done or in five minutes. This re-visiting would include checking any results, answering any queries, giving confirmations, whatnot. Instead, I am pulled back to the focus-stealer mid-work, my concentration is disrupted, I have to switch my own (mental) focus to something new in a disruptive manner, and generally feel as if someone has teleported me from a (typically pleasant) situation to another (typically unpleasant).

There are other very good reasons never to steal focus, including that a typing or mouse-clicking user can accidentally cause an unwanted action to be taken. Consider, e.g., the user who is typing in a document, hits the return key—and sees the return being caught by a focus-stealing confirmation window, which interprets the return key as confirmation. In some cases, the user would have confirmed anyway, but in others he would not—and sometimes the results can be down-right disastrous.

Focus stealing is stealing: If an application steals focus, it takes something that is not its to take. Such acts, just as with physical property, must be reserved for emergencies and duress. Normally criminal acts can be allowable e.g. if they are needed to avert immediate physical danger; in the same way, focus stealing can be allowed for notifications of utmost importance, e.g. that the computer is about to be shut-down and that saving any outstanding work in the next thirty seconds would be an extremely good idea. Cases that are almost always not legitimate include requesting the user’s input; notification that a download is complete or a certain step of a process has been completed; and (above all) spurious focus stealing, without any particular message, because a certain internal state has changed (or similar).

“But some users want to be notified!!!”: This is not a valid excuse—we cannot let non-standard wishes from one group ruin software for another group. If there is a legitimate wish for notification (and most cases of focus stealing I have seen, have not been in situations where such a wish seemed likely to be common—even when allowing for the fact that different users have different preferences) other ways can be found than unwanted focus stealing. Consider e.g. letting the user specifically request focus stealing (more accurately, in this case, “focus taking”) for certain tasks by a checkbox or a configuration option (which, obviously, should be off per default), using a less intrusive notification mechanism (e.g. a notification in a taskbar or an auditory signal; may be on per default, but must be deactivatable), or the sending of an email/SMS (common for very long-running tasks and tasks on other computers; requires separate configuration).
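To make the suggested defaults concrete, here is a minimal sketch in Python. It is not taken from any real toolkit: the names `Channel`, `NotificationPrefs`, and `channels_for` are hypothetical, and the `emergency` flag is my own assumption, standing in for the rare notifications of utmost importance (e.g. an imminent shutdown). The point is merely that focus taking is opt-in and off by default, the taskbar hint is on by default but can be switched off, and email/SMS does nothing until an address has been configured separately.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Channel(Enum):
    """Hypothetical ways in which an application could get the user's attention."""
    FOCUS = auto()    # raise the window and take keyboard focus
    TASKBAR = auto()  # passive hint in a taskbar or similar
    SOUND = auto()    # auditory signal
    MESSAGE = auto()  # email/SMS for long-running or remote tasks


@dataclass
class NotificationPrefs:
    """Per-user settings (illustrative names, not a real API)."""
    take_focus: bool = False               # opt-in only, off by default
    taskbar: bool = True                   # on by default, can be deactivated
    sound: bool = False                    # off unless the user asks for it
    message_address: Optional[str] = None  # requires separate configuration


def channels_for(prefs: NotificationPrefs, emergency: bool = False) -> List[Channel]:
    """Return the channels to use for one notification.

    Focus is taken only if the user explicitly asked for it or if the
    caller flags a genuine emergency (an assumption of this sketch).
    """
    chosen = []
    if emergency or prefs.take_focus:
        chosen.append(Channel.FOCUS)
    if prefs.taskbar:
        chosen.append(Channel.TASKBAR)
    if prefs.sound:
        chosen.append(Channel.SOUND)
    if prefs.message_address is not None:
        chosen.append(Channel.MESSAGE)
    return chosen
```

With untouched defaults, `channels_for(NotificationPrefs())` yields only the taskbar hint; a user who really does want to be interrupted can set `take_focus=True`, without this being inflicted on everyone else.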

As a particularity, if something requires user involvement (e.g. a confirmation) before the application can continue, there is still only rarely a reason for focus stealing. Notably, users working on another desktop will almost always check in regularly; those on the same desktop will usually notice without focus stealing; and there is always the above option of notification by other means. Further, for short-running tasks, it is seldom a problem that the user misses a notification, and for a long-running task he may well have physically left his computer.

Finally, any developer (in particular, those who feel that their own application and situation is important enough to warrant an exception) should think long and hard on the following: He may be about to commit one of the other deadly sins, namely over-estimating how important his application is to others. (Come to think of it, the applications that have stolen focus from me under Linux have usually been those of below average importance—the ones I use every now and then, or only use once or twice to see if they are worth having.)

Written by michaeleriksson

May 13, 2010 at 8:59 pm