What you’ll learn:
- The risks of OTA updates.
- How to avoid bad OTA practices.
- Six crucial steps to ensure safer OTA rollouts.
The proliferation of mobile devices and IoT means that over-the-air (OTA) updates—the wireless delivery of new software, firmware, or other data to wireless-enabled devices—have become ubiquitous and crucial. OTAs ensure that remote software in devices stays secure and updated more efficiently than upgrading each device individually, but they’re not without risks.
Fitbit was recently in the news when a seemingly innocent OTA software update designed to fix bugs allegedly left devices “non-functional.” In Rivian’s case, a software update resulted in some of its vehicles needing physical repairs.
The One-Push Saboteur: The Risks of OTA Updates
Unlike traditional manual updates, OTA updates carry particular dangers because they’re done remotely, in the background, and without requiring user oversight. Manual updates via USB and other direct connections, by contrast, provide an added layer of control that’s missing with OTAs.
While one function of an OTA is to fix bugs, without due diligence, the reverse can happen. New bugs related to hardware compatibility can be introduced, resulting in sensor malfunctions, data loss, crashes, and broken key device functionality. At worst, code that hasn’t been exhaustively tested for compatibility could permanently render devices unusable, or “brick” them, even if they remain powered on.
Corrupted firmware or updates that are incompatible with hardware configurations or variants are potential causes. Wirelessly updating devices has the benefit, and the intention, of saving original equipment manufacturers (OEMs) time and money. Conversely, though, the convenience of instantly updating devices means that one false move could brick an entire batch of units, which could run into the thousands or millions.
Less extreme but also problematic, an erroneous OTA may result in runaway processes that drain batteries. Bugs introduced by faulty OTAs can also prevent devices from performing as they should, for example by stopping them from entering sleep states or by causing updates to loop endlessly, both of which deplete batteries.
OTAs typically offer little user control or visibility, with many updates starting and installing themselves automatically. Often, no option exists for deferring an update or selecting which updates to deploy.
6 Approaches to Swerve Bad OTA Practices
To avoid update scenarios that impair device functionality, here are six ways to ensure safer OTA rollouts:
1. Test, test and test again
Rigorous, detail-oriented testing across hardware configurations and the wide range of software permutations that apply to the device in question (e.g., Apple OS versions) lays a solid foundation for the safe delivery of OTAs.
Specifically, test cases need to validate hardware compatibility, the impact of the OTA on battery life, feature functionality, performance, and data issues. Testing should ensure that there’s no bricking, firmware corruption, crashes, or regression, as well as no abnormal battery depletion.
For testing to be considered industry best practice, it should be carried out across the board and encompass unit, integration, system, and regression testing. Edge cases shouldn’t be discounted either: Tests should simulate unreliable network connectivity as well as low device memory and storage.
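As a rough illustration of the unreliable-network edge case, the sketch below uses a test double that drops the connection on scheduled requests and checks that a resumable download still yields an intact image. The names FlakyLink and download_with_resume, the chunk size, and the retry limit are all hypothetical and chosen only for this example; a real OTA client would have its own transport and retry policy.

```python
class FlakyLink:
    """Test double for a network link that drops on scheduled read calls."""

    def __init__(self, payload: bytes, fail_on_calls: set):
        self.payload = payload
        self.fail_on_calls = fail_on_calls  # 1-based call numbers that fail
        self.calls = 0

    def read(self, offset: int, size: int) -> bytes:
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise ConnectionError("simulated drop")
        return self.payload[offset:offset + size]


def download_with_resume(link, total: int, chunk: int = 4,
                         max_retries: int = 3) -> bytes:
    """Fetch `total` bytes chunk by chunk, resuming from the last good offset."""
    received = bytearray()
    retries = 0
    while len(received) < total:
        try:
            received += link.read(len(received), chunk)
            retries = 0  # a successful read resets the retry budget
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise  # give up so the caller can reschedule the update


    return bytes(received)


# Edge case: the link drops three times during a 20-byte download.
image = b"firmware-image-bytes"
link = FlakyLink(image, fail_on_calls={2, 3, 6})
result = download_with_resume(link, total=len(image))
```

The same pattern can be extended to other edge cases mentioned above, such as a storage stub that reports insufficient free space partway through the write.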
2. Ensure image security and validation
To prevent corrupted or malicious updates in the OTA system, techniques such as checksums and digital signing should be applied.
A checksum recorded when the update image was published can be used to confirm that the downloaded image is correct before attempting to apply it. For added security, consider signing updates with the manufacturer’s private key and verifying the signature on the device, so that only official updates received by the OTA system are ever deployed.
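The checksum step might look something like the following minimal sketch, which uses SHA-256 from Python’s standard library. The manifest layout and the function names are assumptions made for illustration. Note that a checksum on its own only detects corruption; protection against malicious images also requires verifying a digital signature over the image or its manifest, which is not shown here.

```python
import hashlib
import hmac


def sha256_of(image: bytes) -> str:
    """Return the hex SHA-256 digest of a firmware image."""
    return hashlib.sha256(image).hexdigest()


def image_matches_manifest(image: bytes, expected_sha256: str) -> bool:
    """Compare the downloaded image's digest with the published digest."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_of(image), expected_sha256.lower())


# Hypothetical build-side step: the digest is recorded when the image is published.
published_image = b"firmware-v2.1.0-payload"
manifest = {"version": "2.1.0", "sha256": sha256_of(published_image)}

# Device-side step: verify the download before attempting to flash it.
good_download = published_image
corrupt_download = published_image[:-1] + b"X"  # last byte damaged in transit

ok_good = image_matches_manifest(good_download, manifest["sha256"])
ok_corrupt = image_matches_manifest(corrupt_download, manifest["sha256"])
```

In this sketch the device would proceed only when the check returns True and would discard the corrupted download.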
3. Widen your beta group
After in-house testing, consider closed beta trials with a subset of users for an additional layer of validation and “real world” data on how an OTA will function in uncontrolled environments to further minimize risk. The beta test group should span geographic areas, levels of tech knowledge, use patterns, ages, and demographics.
Key data on issues experienced by the beta group is provided via telemetry analytics. Any failures missed during QA testing should surface via bug reports.
4. Stagger your rollout
A phased or staggered rollout (see figure), starting from the beta user group, is another way to minimize the risk of OTAs. The phasing would, for example, ensure that the update reaches a small percentage of users first and then expands gradually. Again, monitoring for anomalies would enable OEMs to hit pause on the rollout as issues arise so that they can be tackled before they affect additional users.
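One common way to implement percentage-based phasing is to hash each device ID into a stable bucket and compare it against the current rollout percentage. The sketch below assumes that approach; the phase percentages, the update ID, and the function names are illustrative and not taken from any particular OTA platform.

```python
import hashlib

# Hypothetical schedule: percentage of the fleet eligible at each phase.
PHASES = [1, 5, 25, 100]


def rollout_bucket(device_id: str, update_id: str) -> int:
    """Map a device to a stable bucket from 0 to 99 for a given update."""
    digest = hashlib.sha256(f"{update_id}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_eligible(device_id: str, update_id: str, percent: int) -> bool:
    """A device becomes eligible once the rollout percentage exceeds its bucket."""
    return rollout_bucket(device_id, update_id) < percent


# Simulated fleet: count how many devices each phase would reach.
fleet = [f"device-{n:05d}" for n in range(10_000)]
eligible_per_phase = {
    pct: sum(is_eligible(d, "fw-2.1.0", pct) for d in fleet) for pct in PHASES
}
```

Because a device’s bucket never changes for a given update, any device reached in an early phase stays eligible in later phases, and pausing the rollout simply means holding the percentage where it is.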
5. Track and analyze issues via telemetry
Running granular analytics throughout testing and the staged rollout will provide real-time insights into any issues with OTA updates. Ensure that errors are reported and monitored through analytics so that flaws and faulty updates can be assessed before deciding whether to continue the OTA rollout or pause it.
Furthermore, having a built-in and configurable rollback capability provides safeguards to reverse a rollout if critical issues crop up. Rollbacks can be triggered across all devices or only in specific problematic configurations.
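The continue, pause, or roll back decision can be expressed as a simple gate over per-phase telemetry. The following is only a sketch: the telemetry fields, the threshold values, and the minimum sample size are hypothetical and would need tuning for a real fleet.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    PAUSE = "pause"
    ROLLBACK = "rollback"


@dataclass
class PhaseTelemetry:
    devices_updated: int
    install_failures: int
    crash_reports: int


def evaluate_phase(t: PhaseTelemetry,
                   pause_rate: float = 0.01,
                   rollback_rate: float = 0.05,
                   min_sample: int = 100) -> Decision:
    """Gate the next rollout phase on the observed problem rate."""
    if t.devices_updated < min_sample:
        # Too few reports to trust the rate; hold at the current phase.
        return Decision.PAUSE
    problem_rate = (t.install_failures + t.crash_reports) / t.devices_updated
    if problem_rate >= rollback_rate:
        return Decision.ROLLBACK
    if problem_rate >= pause_rate:
        return Decision.PAUSE
    return Decision.CONTINUE


healthy = evaluate_phase(PhaseTelemetry(1000, 2, 1))    # 0.3% problem rate
shaky = evaluate_phase(PhaseTelemetry(1000, 15, 5))     # 2% problem rate
broken = evaluate_phase(PhaseTelemetry(1000, 60, 20))   # 8% problem rate
```

To trigger rollbacks only for specific problematic configurations, the same gate could be evaluated separately per hardware variant rather than once for the whole fleet.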
The good news is that cloud and connected IoT technologies are enabling designers to collate information from devices in real time. Now, with AI systems more widely in use, information from devices out in the field can be fed to AI systems for early error detection before the rollout progresses to bigger audiences.
6. A/B deployment
We’ve used this technique in the past, where software running from “slot A” downloads the OTA update into “slot B.” In case of a faulty update that affects device functionality, the device can be programmed to revert to the software in slot A, which remains untouched. This significantly improves the reliability of devices operating in the field.
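The slot logic can be modeled in a few lines. This is a simplified simulation rather than bootloader code: the Device class, the health_check callback (standing in for a trial boot plus self-test), and the attempt limit are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

MAX_BOOT_ATTEMPTS = 3  # hypothetical number of trial boots before falling back


@dataclass
class Device:
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"A": "1.0.0", "B": None})
    active: str = "A"  # slot the device currently boots from

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def apply_update(self, version: str,
                     health_check: Callable[[str], bool]) -> bool:
        """Write the update to the inactive slot; switch only if it is healthy."""
        target = self.inactive()
        self.slots[target] = version   # the running slot is never overwritten
        for _ in range(MAX_BOOT_ATTEMPTS):
            if health_check(version):  # stands in for a trial boot and self-test
                self.active = target   # commit: the new slot becomes the default
                return True
        return False                   # fall back: the active slot is unchanged


device = Device()
bad = device.apply_update("2.0.0", lambda v: False)   # faulty update
state_after_bad = (device.active, device.slots[device.active])
good = device.apply_update("2.0.1", lambda v: True)   # healthy update
state_after_good = (device.active, device.slots[device.active])
```

After the faulty update, the device in this model is still running version 1.0.0 from slot A; only the healthy update moves it to slot B.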
Creating Safer OTA Updates
Over-the-air updates come with a multitude of potential risks, as seen in recent news headlines. Exhaustive device testing, beta groups, phased OTA rollouts, and real-time issue identification via telemetry are some of the ways that companies can build in safeguards. With robust planning, rigorous execution, and a spectrum of industry best practices, or by partnering with an experienced firm, companies can deploy OTA updates safely and seamlessly.