Michael Eriksson's Blog

A Swede in Germany

Detection of manipulation of digital evidence / Follow-up: A few points concerning the movie “Anon”


In a recent discussion of the movie “Anon”, I noted, regarding the uselessness of digital evidence, “Whatever is stored […] can be manipulated”, with a footnote on the limitations of write-once storage (an obvious objection to this claim).

A probably more interesting take than write-once storage is the ability to detect manipulation (or accidental change). There are many instances where some degree of protection can be added: say, a check digit for an identifier (e.g. a credit-card number) or a checksum for a larger piece of content (e.g. an executable file), cryptographic verification of the change history in a version-control system (notably Git), or any number of Blockchain applications (originating with Bitcoin). The more advanced uses, including Blockchains, could very well be legitimately relevant even in a court of law in some cases.
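As a minimal sketch of the checksum case in Python (the message contents are invented for illustration): any change to the content yields a different digest, so a trustworthy copy of the original digest exposes the change.

```python
import hashlib

# A checksum detects changes: any edit to the content yields a different digest.
original = b"transfer 100 EUR to account 123"   # invented example content
tampered = b"transfer 900 EUR to account 123"

digest_original = hashlib.sha256(original).hexdigest()
digest_tampered = hashlib.sha256(tampered).hexdigest()

# The digests differ, so a separately stored digest of the original would
# expose the manipulation -- provided that the stored reference digest
# itself is out of the manipulator's reach.
assert digest_original != digest_tampered
```

Note that this only helps as long as the reference digest is stored where the manipulator cannot replace it along with the content.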

In most cases, however, these are unlikely to be helpful—starting with the obvious observation that they only help when used during the manipulation, which (today and for the foreseeable future) will rarely be the case.* Worse, the victim of a manipulation will also need to convince the court that e.g. the planted evidence would necessarily have been covered by such verification mechanisms: Consider e.g. someone who meticulously keeps all his files under version control, but where incriminating evidence is planted outside of it. He can, obviously, claim that any file or change of a file actually owned by him would have been registered in version control. However, how can he prove this claim? How does he defeat the (not at all implausible) counter that he kept all his regular files in version control, but that these specific files were left outside due to their incriminating character, in an attempt to hide them from a search by a third party?

*I note e.g. that the technologies are partly immature; that the extra effort would often be disproportionate; and that a use sufficiently sophisticated to be helpful against hostile law enforcement might require compromises, e.g. to the ability to permanently delete incriminating content, that could backfire severely. In a worst-case scenario, the use of such mechanisms could itself lead to acts that are considered illegal. For instance, assume that someone inadvertently visits a site with a type of pornography illegal in his own jurisdiction, that the contents are cached by the browser, at some point automatically stored in a file-system cache, and that all contents stored in the file system are tracked in such detail that the contents can be retrieved at any future date. Alternatively, consider the same example with contents legal in his jurisdiction, followed by travel with the same computer to a jurisdiction where those contents are illegal. Note that some jurisdictions consider even the presence in a browser cache, even unbeknownst to the user, enough for “possession” to apply; by analogy, this would be virtually guaranteed to extend to the permanent storage discussed here. (This example also points to another practical complication: this type of tracking would currently be prohibitive in terms of disk space for many applications.)

Even when such measures are used and evidence is planted within their purview, however, it is not a given that they will help. Consider (for an unrealistically trivial example) a credit-card number where a single (non-check) digit has been manipulated. A comparison with the check digit will* make it clear that a manipulation has taken place. However, nothing prevents the manipulator from recalculating the check digit… Unless the original check digit had somehow been made public knowledge in advance, or could otherwise be proved, the victim would have no benefit in a court of law. Indeed, he himself might now be unaware of the manipulation. The same principle applies in more advanced/realistic scenarios, e.g. with a Git repository: While a naive manipulation is detectable, a more sophisticated one, actually taking the verification mechanisms into consideration, need not be. When in doubt, a sophisticated manipulator could resort to simply “replaying” all the changes to the repository into a fresh one, making sure that the only deviation in content is the intended one.** If older copies are publicly known, deviations might still be detected by comparison—but how many private repositories are publicly known?*** The victim might still try to point to differences through a comparison with a private backup, but (a) the manipulator can always claim that the backup has been manipulated by the victim, and (b) it is not a given that the victim still has access to his backups (seeing that they are reasonably likely to have been confiscated at the same time as the computer where the repository resides).

*With reservations for some exceptional cases. Note that changing more than one digit definitely introduces a risk that the check digit will match through coincidence. (The check digit is intended as a minor precaution against accidental errors, not against deliberate manipulation.)

**Counter-measures like using time stamps, MAC addresses, some asymmetric-key transfer of knowledge to identify users, …, as input into the calculations of hashes and whatnot can reduce this problem. However, for a sufficiently sophisticated attacker with sufficient knowledge, even this is not an insurmountable obstacle. Notably, as long as we speak of a repository (or ledger, Blockchain, whatnot) that is only ever used from the computer(s) of one person, chances are that all information needed, including private keys, actually would be known to the manipulator—e.g. because he works for law enforcement and has the computer running right in front of him.

***In contrast, many or most Git repositories used in software development (the context in which Git originated) will exist in various copies that are continually synchronized with each other. Here a manipulation, e.g. to try to blame someone else for a costly bug or to remove a historical record of a copyright violation, would be far easier to prove. (But then again, we might not need a verification mechanism for that—it would often be enough to just compare contents.)
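The check-digit example above can be made concrete with the Luhn algorithm, the check-digit scheme actually used for credit-card numbers. A sketch in Python (the card number is an invented example): changing a digit breaks the check, but the manipulator simply recalculates the check digit and the number verifies again.

```python
def luhn_check_digit(digits: str) -> int:
    # Compute the Luhn check digit for a string of digits (excluding the
    # check digit itself): double every second digit from the right,
    # subtract 9 from results above 9, and pick the digit that makes the
    # total a multiple of 10.
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:          # positions that get doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10

body = "424242424242424"                     # invented number, sans check digit
number = body + str(luhn_check_digit(body))

# Manipulating one (non-check) digit: the old check digit no longer matches ...
tampered_body = "424242424242434"
assert luhn_check_digit(tampered_body) != int(number[-1])

# ... but the manipulator recalculates it, and the number verifies again.
forged = tampered_body + str(luhn_check_digit(tampered_body))
```

Unless the original check digit is known from an independent source, the forged number is indistinguishable from a legitimate one.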

Worse: All counter-measures might turn out to be futile with manipulations that do not try to falsify the past. Consider some type of verification system that allows the addition of new data (events, objects, whatnot) and verifies the history of that data. (This will likely be the most typical case.) It might now be possible to verify that a certain piece of data was or was not present at a given time in the past—but there is no automatic protection against the addition of new data here and now. For instance, a hostile with system access could just* as easily plant evidence in e.g. a version-control system (by simply creating a new file through the standard commands of the version-control system) as he can by creating a new file in the file system.

*Assuming, obviously, that he has taken the time to learn how the victim used his system, which should be assumed if someone becomes a high-priority target of a competent law-enforcement or intelligence agency.
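The point can be sketched with a toy hash chain in Python (the entries and file names are invented): rewriting the past breaks verification, but appending a fresh, correctly linked entry does not.

```python
import hashlib
import json

def entry_hash(data: str, prev: str) -> str:
    # Hash an entry's content together with the previous entry's hash,
    # linking the history into a chain.
    return hashlib.sha256(
        json.dumps({"data": data, "prev": prev}, sort_keys=True).encode()
    ).hexdigest()

def add_entry(chain, data):
    prev = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"data": data, "prev": prev, "hash": entry_hash(data, prev)})

def verify(chain):
    # Check that every entry's hash matches its content and that the links
    # are intact -- i.e. that the *past* has not been rewritten.
    prev = "0" * 64
    for e in chain:
        if e["prev"] != prev or e["hash"] != entry_hash(e["data"], e["prev"]):
            return False
        prev = e["hash"]
    return True

chain = []
add_entry(chain, "file report.txt created")
add_entry(chain, "file report.txt edited")
assert verify(chain)

# Rewriting the past is caught ...
chain[0]["data"] = "file secret.txt created"
assert not verify(chain)
chain[0]["data"] = "file report.txt created"   # restore

# ... but a hostile with system access simply appends new, "valid" data.
add_entry(chain, "file incriminating.txt created")
assert verify(chain)                           # the planted entry passes
```

The same asymmetry holds for real systems of this type: the verification covers the integrity of the history, not the legitimacy of new additions.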

Then we have complications like technical skills, actual access to the evidence, and similar: If digital evidence has been planted and a sufficiently skilled investigator looks at the details, possibly including comparisons with backups, he might find enough discrepancies to reveal the manipulation. However, there is no guarantee that the victim of the manipulation has these skills*, can find and afford a technical consultant and expert witness, has access to relevant evidence (cf. above), … To take another trivial and unrealistic example: Assume that a manipulating police employee adds a new file into the file system after a computer has been confiscated. In court, testimony is given of the presence of the file, even giving screen shots** verifying the name, position, and contents of the file—but not the time stamp***! With sufficient access and knowledge, the defense could have demonstrated that the time stamp indicated a creation after the confiscation; without them, it has nothing—no matter what mechanisms were theoretically available.

*And even when he has these skills himself, he would likely still need an expert witness to speak on his behalf, because others might assume that his technical statements are deliberate lies (or be unwilling to accept his own expertise as sufficiently strong).

**I am honestly uncertain how this would be done in practice. With minor restrictions, however, the same would apply even if the computer were run physically in the court room. (But I do note that screen shots, too, can be manipulated or otherwise faked, making any indirect evidence even less valuable.)

***Here the triviality of the example comes in. For instance, even many or most laymen do know that files have time stamps; the time stamp, too, could have been manipulated; if the computer was brought into the court room, the defense could simply have requested that the time stamp be displayed; … In a more realistic example, the situation could be very different.
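The time-stamp check itself is simple; a sketch in Python (the confiscation date is invented, and, as noted in the footnotes above, time stamps can themselves be manipulated):

```python
import datetime
import os
import tempfile

# Invented date of confiscation, for the sake of the example.
confiscation = datetime.datetime(2018, 7, 1)

# Stand-in for the planted file: a freshly created temporary file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"planted contents")
    path = f.name

# st_mtime is the last-modification time; a value after the confiscation
# would indicate that the file was created (or changed) while the computer
# was out of the owner's hands.
modified = datetime.datetime.fromtimestamp(os.stat(path).st_mtime)
suspicious = modified > confiscation
os.remove(path)
```

Since the stand-in file is created on the spot, its time stamp naturally falls after the invented confiscation date.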

Excursion on auditing:
Some of these problems could be reduced through various forms of more detailed user auditing, to see exactly who did what and when. This, however, runs into a similar set of problems, including that such auditing is (at least for now) massive overkill for most computer uses, that auditing might not always be wanted, and that the auditing trail can itself be vulnerable to manipulation*. To boot, if a hostile has gained access to the victim’s user account(s), auditing might not be very helpful to begin with: It might tell us that the user account John.Smith deleted a certain file at a certain time—but it will not tell us whether the physical person John Smith did so. It could equally be someone who has stolen his credentials or otherwise invaded the account (e.g. in the form of a Bundestrojaner).

*To reduce the risk of manipulation, many current users of auditing store audit information on a separate computer/server. This helps when the circumstances are sufficiently controlled. However, when both computers have been confiscated, the circumstances are no longer controlled. To boot, such a solution would be a definite luxury for the vast majority of private computer users.

Excursion on naive over-reliance in the other direction:
Another danger with digital evidence (in the form discussed above or more generally) is that too great a confidence in it could allow skilled criminals to go free, through manipulation of their own data. A good fictional example of this is given in Stephen R. Donaldson’s “Gap Cycle”, where the (believed to be impossible) manipulation of “datacores”* allows one of the characters to get away with horrifying crimes. Real-life examples could include an analogous manipulation of tachographs or auditing systems, if these were given sufficient credibility in court.

*The in-universe name for an “append-only” data store, which plays a similar (but more complex and pervasive) role to current tachographs in tracking the actions taken by a space ship and its crew.

Excursion on digital devices in general:
Above I deal with computers. This is partly because “traditional” computers form the historical main case, partly because most digital devices, e.g. smart-phones, are formally computers, making it easier to use “computer” than some other term. However, the same principles and much of the details apply even in a broader discussion—and for a very large and rising proportion of the population, smart-phones might be more relevant than traditional computers.


Written by michaeleriksson

July 11, 2018 at 2:34 am
