Those Hacked Source Code Breaches At Microsoft, Nissan, Mercedes Are Ominous Lessons For Self-Driving Cars
It is said that software eats the world.
If that is indeed the case, presumably the source code underlying the software must be the atomic core that drives this global consuming beast.
They used to say that no bucks meant no Buck Rogers, while today’s witticism might be that no source code means no supercilious software planetary takeover (or something like that). The effort to craft software requires hours upon hours of laborious and at times mind-numbing concentration to stipulate all the painstaking steps and activities that a system is supposed to undertake. Software developers use a variety of programming languages; their coding can at times be perniciously cryptic and terse, while in other cases the source code is quite elaborately expressed and somewhat self-evident in what it indicates.
There are two major opposing camps, each with heated viewpoints, about how best to treat source code.
One belief is that source code ought to be kept close to the vest.
Keep the source code proprietary, hidden from view, and treat it like a deep dark secret that can only be seen by those that have an absolute need to take a glimpse. The thinking on this is that the source code is a form of revered Intellectual Property (IP) and should be housed under abundant lock-and-key safeguards. This is not simply a matter of retaining legal protections (which certainly seems warranted given the significant labor costs involved in crafting the programming code), but also because the source code might reveal the inner workings of the secret sauce or other vital machinations that should not be widely known (so it is believed).
The moment someone whispers or somehow leaks even the tiniest snippet of your source code, you need to immediately and with a great show of force put a stop to the leakage.
The other side of that secretive mantra is the polar opposite, namely, let all source code be free to roam.
Often referred to as open-source, the assertion is that you’ll never really be able to keep source code tightly under wraps, thus you might as well throw in the towel and make it readily available. Anyone that wants to see the source code is welcome to do so. In a sense, by proclaiming your source code to be free from the bonds of locked room confinement, the otherwise overbearing stress of trying to prevent others from seeing the code is completely deflated and not one iota of concern remains in that regard.
As with most things in life, some depict these two diametrically opposed stances as regrettable and altogether unnecessary extremes on a spectrum that might instead allow for a type of middle ground. These centrist proponents would likely suggest that some of your source code is fine to be padlocked and kept unseen, while other parts of the source should be openly displayed to the world at large. This seems like a potentially reasonable perspective, these mediators assert.
Not so, say the extremists, since this is a worst-of-both-worlds contrivance rather than a best-of-both-worlds compromise.
Opening the door to any of your source code is said to be an invitation for further intrusion. The teasers revealed will simply whet appetites for more to be divulged. Furthermore, it could showcase aspects that make breaking into the rest of the source code a lot easier, or at least allow for mentally reverse engineering what the remaining code consists of. Ergo, per the avid clampdown camp, whatever you do, assuredly do not allow your source code to see the light of day.
The opposing viewpoint by the open-source advocates is that you are never going to be able to prevent your source code from inevitably being seen. It will dribble out, one way or another. You will falsely lull yourself into thinking that you’ve got this caged animal and that there are no apparent means of escape. Instead, there is a likelihood that the creature is already out and about and you just do not know it, since you failed to take sufficient precautions, having foolishly assumed (and continuing blindly to assume) that the enclosure is locked shut as tight as a drum.
Round and round this merry-go-round we go on this contested topic.
Speaking of source code, consider a recent spate of newsworthy reports about high profile source code incursions.
In the headlines recently, Microsoft acknowledged that some of its source code was viewed in an unauthorized manner (per their press release on the matter): “We detected unusual activity with a small number of internal accounts and upon review, we discovered one account had been used to view source code in a number of source code repositories. The account did not have permissions to modify any code or engineering systems and our investigation further confirmed no changes were made. These accounts were investigated and remediated.”
For my coverage of the hacks involved in the SolarWinds cybersecurity mess, see this link here.
You might also recall that recently there have been reports of leaked source code from Nissan, reportedly due to a misconfigured Git server (a Git server is an online facility for hosting repositories of source code and related configuration artifacts).
And, last year, there was a news report that Mercedes had encountered a source code reveal. Apparently, an unauthorized party accessed a Git site operated by Daimler AG and took a look at source code for the Onboard Logic Units (OLU) software used in some models of Mercedes-Benz vans.
There have been mixed reactions to these source code eyeball infiltrations.
To some, this emerging wave of known source code exposures is an obvious sign that trying to keep source code tucked away is beset with issues and ultimately untenable as an approach to handling your source code (keeping in mind that the reported cases are probably just a tiny portion of the actual number of such instances). Others, though, point out that this merely indicates that there are people out there that will undercut the hard work of others and be willing to perform seemingly evil acts. There will always be evildoers, and there will always be a need for steel vaults and electrified fences to keep out intruders.
One aspect that is notable about source code breaches is how readily these incursions tend to be downplayed. The firms so struck are bound to paint a picture that these events are not especially earth-shattering, and thus, in Jedi-mind-trick fashion, attempt to get you to look elsewhere and not become focused on the ramifications of such break-ins.
The general public is oftentimes not quite sure what to make of these matters.
Just because someone was able to see your source code doesn’t seem like anything to write home about and, though certainly disturbing and something that ought not to have taken place, it appears to be a classic instance of no harm, no foul. Well, yes, it was assuredly foul to have dipped into the treasures of another, but merely looking seems harmless and ineffectual. Perhaps it is akin to breaking into a prized art museum and with rapt attention eyeing the works of art on the walls. As long as you don’t mar the artwork or spray graffiti, this seems like a peculiar though meaningless act.
Actually, there is harm being done.
I’ll get to those harms in a moment, and also poke holes in the aforementioned analogy to an art museum. Let’s be aboveboard and acknowledge that there are demonstrable problems associated with the unlawful revealing of proprietary source code.
Furthermore, we can up the ante.
Consider code that is quite serious stuff, namely the source code developed for self-driving cars.
Cars are life-or-death machines that roll around on our highways and byways. I say they are life-or-death because anyone driving a car, whether human or AI, has at their command the capability to determine where the car goes and what it does. Passengers inside a car are at risk, as are occupants of nearby cars, bike riders, and meandering pedestrians.
You probably do not think of driving as a life-or-death matter, but it indeed is, and each time you are at the wheel, you are deciding the fate of others all around you. So are all the other drivers.
Time to unpack all this as it especially relates to self-driving cars.
Understanding The Levels Of Self-Driving Cars
As a clarification, true self-driving cars are ones in which the AI drives the car entirely on its own and there isn’t any human assistance during the driving task.
These driverless vehicles are considered Level 4 and Level 5 (see my explanation at this link here), while a car that requires a human driver to co-share the driving effort is usually considered at a Level 2 or Level 3. The cars that co-share the driving task are described as being semi-autonomous, and typically contain a variety of automated add-ons that are referred to as ADAS (Advanced Driver-Assistance Systems).
There is not yet a true self-driving car at Level 5; we don’t yet even know whether this will be possible to achieve, nor how long it will take to get there.
Meanwhile, the Level 4 efforts are gradually trying to get some traction by undergoing very narrow and selective public roadway trials, though there is controversy over whether this testing should be allowed per se (we are all life-or-death guinea pigs in an experiment taking place on our highways and byways, some contend, see my coverage at this link here).
Since semi-autonomous cars require a human driver, the adoption of those types of cars won’t be markedly different than driving conventional vehicles, so there’s not much new per se to cover about them on this topic (though, as you’ll see in a moment, the points next made are generally applicable).
For semi-autonomous cars, it is important that the public be forewarned about a disturbing aspect that’s been arising lately, namely that despite those human drivers that keep posting videos of themselves falling asleep at the wheel of a Level 2 or Level 3 car, we all need to avoid being misled into believing that the driver can take their attention away from the driving task while driving a semi-autonomous car.
You are the responsible party for the driving actions of the vehicle, regardless of how much automation might be tossed into a Level 2 or Level 3.
Self-Driving Cars And Source Code Handling
For Level 4 and Level 5 true self-driving vehicles, there won’t be a human driver involved in the driving task.
All occupants will be passengers.
The AI is doing the driving.
How does the AI “know” how to drive a car?
For those that assume an AI system that can drive requires human-like sentience, sorry to burst that bubble: the AI is just software (at least for now, though there is a lot of speculation about what the AI of the future might be, see my predictions at the link here).
Underlying the AI-based driving systems is source code that consists of conventional programming, millions upon millions of lines of code. Also, there is the use of Machine Learning and Deep Learning algorithms, which again are based on source code, along with the tons of data used to aid in training the computational pattern matching that is needed for driving a car.
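To give a tiny taste of what that computational pattern matching means in code, here is a toy sketch (in Python, with invented data) of a perceptron learning when to brake based on distance to an obstacle; production driving models are deep neural networks trained on vastly larger datasets, so this merely conveys the flavor:

```python
# Toy sketch of "computational pattern matching": a perceptron learns,
# from labeled examples, when to brake based on distance to an obstacle
# (distances in car lengths; the data is invented for illustration).

examples = [(1.0, 1), (2.0, 1), (4.0, 0), (5.0, 0)]  # (distance, should_brake)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(1000):                     # repeated passes over the data
    mistakes = 0
    for distance, label in examples:
        pred = 1 if w * distance + b > 0 else 0
        error = label - pred              # nonzero only when wrong
        if error:
            w += lr * error * distance    # nudge the decision boundary
            b += lr * error
            mistakes += 1
    if mistakes == 0:                     # converged on these examples
        break

print(f"learned rule: brake when {w:.2f} * distance + {b:.2f} > 0")
```

The point is that the “rule” is learned from data rather than hand-coded, yet it all still bottoms out in ordinary source code.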
Similar to the earlier discussion about the two divergent camps’ approaches to source code, the self-driving car industry is likewise divided.
Some are advocating a decidedly open-source avenue for self-driving cars. Companies are developing open-source code for AI driving systems, and research entities, including university AI labs, are doing so as well (see my coverage at this link here). Nonetheless, by and large, the commercial automakers and self-driving tech firms are currently pursuing the proprietary route more so than the open-source path (that being said, some are doing a mix-and-match of their own private stuff with the added use of open source).
Is the proprietary or private source code akin to artwork in a locked museum and for which any unauthorized incursion is relatively benign if it does not seemingly mar or alter the code in place?
Simply stated, the answer is no.
Here’s why.
Those that get their eyes on the source code are just as likely able to copy it. In that case, they now have the source code in their own hands, separate and apart from wherever the source code was originally housed. With that copy, they can leisurely study it, and then make changes to their heart’s content and try to redeploy the software (more on this in a moment).
In an art museum, you are presumably looking at the originals of the artwork. There is a desire to keep that original artwork pristine and pure, unaltered or damaged in any respect. Generally, you can discern the difference between the true original and any faked or fabricated version.
With source code, there is essentially no ready way to ascertain whether a copy is a copy; it is a complete and indistinguishable duplicate of the original (all else being equal).
Furthermore, source code is malleable in a manner that an artwork is not.
All in all, though news reports seem to suggest that someone only glanced at the source code, the reality is that they could very well have copied it, and could then change it as they might so desire.
Indeed, the intruder might not have changed the so-called original instance, since the source code might be maintained in a read-only status at its point of origin. Even so, there is a potential hidden and unrealized danger for those relying upon the original source code: if the intruder was able to alter the original source code at its normal point of storage, that bodes quite grave concerns about what changes were made, especially if the developers of the source code are unaware of what was altered and are not deliberately seeking to find any such changes.
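As a simple illustration of how such tampering could be detected, a development team might compare a cryptographic hash of each source file against a previously recorded trusted baseline; here is a minimal sketch (the baseline mapping and file layout are hypothetical):

```python
import hashlib
from pathlib import Path

# Minimal sketch of tamper detection: compare each source file's
# SHA-256 digest against a trusted baseline recorded earlier. Any
# silent alteration to the stored source shows up as a mismatch.

def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def find_tampered(baseline: dict, root: Path) -> list:
    # baseline maps relative file paths to their trusted digests
    return [
        name for name, trusted in baseline.items()
        if file_digest(root / name) != trusted
    ]
```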
Okay, so let’s assume that the source code is still intact at its original point of storage (which might not necessarily be so) and that the intruder has “only” grabbed a copy of the source code.
Even if the intruder doesn’t seek to change the code, they can at least inspect the code, doing so for nefarious purposes. They can look to find weaknesses in the source code.
This might allow the intruder to devise a means to crack into the system that is running the software based on that source code. Or it might enable the intruder to find a means to augment the software and get the system to accept a type of Trojan Horse. Even in the simplest form of manipulation, an intruder might discover a means of getting the running software to react however they wish, seemingly by a kind of magic, when it is actually due to knowing what the presumably unrevealed internal mechanisms are doing.
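As a purely hypothetical sketch of the kind of weakness that source inspection can reveal, imagine a leftover diagnostic override buried in a command validation routine (the function, fields, and token below are invented for illustration and are not drawn from any actual AI driving system):

```python
# Hypothetical weakness: a hardcoded maintenance backdoor. Anyone who
# reads this source now knows the "secret" token and can craft inputs
# that bypass the normal safety check.

DIAG_OVERRIDE_TOKEN = "svc-9921-alpha"  # invented token, for illustration

def is_command_authorized(command: dict) -> bool:
    # Normal path: only a vetted set of driving commands is accepted.
    if command.get("type") in {"set_speed", "steer", "brake"}:
        return command.get("value_in_range", False)
    # Hidden path: the leftover override that a code-reading intruder
    # would spot immediately and could exploit at will.
    return command.get("token") == DIAG_OVERRIDE_TOKEN
```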
For self-driving cars, the range of exposures is hopefully kept to less crucial elements, perhaps controlling the air conditioning or whether the entertainment system is working properly. The more dire possibilities include being able to access the driving controls or otherwise confound or redirect the AI driving systems (I won’t go into nitty-gritty details herein, but I’m sure you can envision the possible adverse outcomes, see my analyses in my columns).
Realize that there is source code underlying the software that runs all of the specialized sensors on a self-driving car, including the video cameras, radar, LIDAR, ultrasonic units, thermal imaging, etc. Knowing the intricacies of that source might provide insights about how to confuse a sensor or get it to misreport what has been sensed.
There is the source code that underpins the AI driving system as it brings together the sensory data and attempts to merge and align what the sensors are indicating. This is referred to as Multi-Sensor Data Fusion (MSDF), and usually is source code that is held in tight control and only seen by the programmers responsible for that capability.
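To make the fusion notion concrete, here is a minimal sketch of one textbook approach, inverse-variance weighting of two independent distance estimates (the readings and variances are made up; real MSDF pipelines are vastly more elaborate):

```python
# Minimal sketch of sensor fusion via inverse-variance weighting: the
# fused estimate leans toward whichever sensor is more certain.

def fuse_estimates(x1: float, var1: float, x2: float, var2: float):
    w1, w2 = 1.0 / var1, 1.0 / var2           # weights favor low variance
    fused = (w1 * x1 + w2 * x2) / (w1 + w2)   # weighted average
    fused_var = 1.0 / (w1 + w2)               # combined uncertainty shrinks
    return fused, fused_var

# Hypothetical readings: a noisy camera says 24.0 m, radar says 23.2 m.
distance, variance = fuse_estimates(24.0, 4.0, 23.2, 0.5)
print(f"fused distance: {distance:.2f} m, variance: {variance:.2f}")
```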
The same can be said for the source code that underlies the virtual world capability of the AI driving system, which is used to keep track of real-world sensed objects and try to figure out the surrounding driving environment. There is source code for the AI driving system portion that plans the driving actions to perform. There is source code for the interface of the AI driving system to the driving controls, commanding the accelerator, the brakes, the steering, and the like.
All told, it is a veritable boatload of software and a massive shipload of source code.
Another devious aspect involves rewriting or changing the code and then trying either to put the altered version back into the source code repository, as though it had been there all along, or to see if you can replace the running software with your altered version based on the source code changes that you’ve made.
There are cybersecurity concerns that some evildoers might arrange to be a seemingly everyday passenger for a self-driving car, and yet upon getting inside, would surreptitiously attempt to infiltrate the AI driving system by loading their alternative code into the on-board hardware. Presumably, this would be prevented by hardware and software security precautions, though if the source code has been thoroughly inspected by a bad actor, perhaps they will have found a gap or loophole that can be exploited.
The same qualms can be applied to the use of OTA (Over-The-Air) electronic updating.
We generally think of OTA as a great means of remotely updating the software of a self-driving car, thus quickly and easily keeping the AI driving system up to date (doing so without having to drive over to a dealership for the updating). Unfortunately, the OTA also provides a prime portal for injecting computer viruses and other malware directly into the on-board AI driving system. Various cybersecurity protections are being built into the OTA, but if the bad actors can see what those protections are, this raises the chances of their figuring out tricks or bypasses to let in their verboten code.
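As a simplified illustration of the verify-before-install posture, the on-board updater might check an authentication tag over the update payload before applying anything; this sketch uses an HMAC with a factory-provisioned key (real OTA schemes typically use public-key signatures, secure boot, and more):

```python
import hmac
import hashlib

# Sketch of verify-before-install for an OTA update. The key name and
# provisioning story are assumptions for illustration purposes only.

VEHICLE_KEY = b"factory-provisioned-secret"  # placeholder key

def verify_update(payload: bytes, received_tag: bytes) -> bool:
    expected = hmac.new(VEHICLE_KEY, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking timing clues to attackers.
    return hmac.compare_digest(expected, received_tag)

def install_update(payload: bytes, received_tag: bytes) -> None:
    if not verify_update(payload, received_tag):
        raise ValueError("OTA update rejected: authentication tag mismatch")
    # ... proceed to stage and apply the verified payload ...
```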
In short, being able to get access to proprietary source code opens up numerous potential cybersecurity issues that can subsequently play out at the hands of a determined hostile hacker or evildoer.
Conclusion
The rule of thumb for those that are ardent believers in the proprietary source code approach is that they must always and continually work under the assumption that their source code will get out. Those that take that mantra to heart are fervently bent on ferreting out all the possibilities of how the revealing of their source code could lead to troubles, and thus aim stridently to plug those pitfalls before the code is possibly ever leaked.
Essentially, the default mindset of the software developers should be that the source code has been or will be breached. With that cornerstone assumption, they should be devising the source code so that even if it is seen, the revealed facets will not undercut the security of the resulting system.
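This is essentially Kerckhoffs’s principle from cryptography: a system should remain secure even if everything about it except the key becomes public knowledge. A minimal sketch of that design posture (the environment variable name is hypothetical) keeps the secret out of the source entirely:

```python
import os

# Designing for an assumed breach: nothing in this source is secret.
# The key lives in secure storage (an HSM, secure element, or protected
# environment), so an intruder reading the code learns nothing usable.

def load_signing_key() -> bytes:
    key = os.environ.get("VEHICLE_SIGNING_KEY")  # hypothetical name
    if key is None:
        raise RuntimeError("signing key unavailable from secure storage")
    return key.encode()
```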
Can that mindset be fully realized?
The open-source proponents say that it is foolhardy to make such an assumption. Better to let all eyes see the source code, which also means that the “wisdom of the crowd” will find loopholes and gotchas, rather than relying upon the handfuls of programmers assigned to coding the privately held source.
If the software involved is relatively inconsequential, perhaps a security breach of its source code is not particularly worrisome. When the source code is used for software that has life-or-death consequences, any breach is worthy of substantive attention, and those developing AI driving systems are hopefully and diligently taking the significance to heart.
May the source be with you.
But only if that’s a good thing and not used for wrongdoing.