Tech alert 2025-01-21: Analytics Agent service may fail to start due to issue in self-update

Background

On Monday, 20 January 2025, Applixure discovered that a published update to the main code file of the Applixure Analytics Agent for Windows contained a logical fault in its structure. This fault prevented the Applixure Agent service from loading the updated code when the service restarted after the self-update had been applied, so the service failed to start. Because the faulty file was the latest update known to the Agent, the Agent did not try to revert to an older known-good version, whether one it had self-updated to earlier or the one shipped with the Agent's installer.

Once the issue was found, the faulty update was withdrawn so that no further devices would self-update to it. Unfortunately, if one or more Windows devices in your Applixure environment had already self-updated to this version, the Applixure Agent service on those machines cannot start until the Agent installation is explicitly repaired, either through management tools or manually on each machine. Because the Applixure Analytics Windows Agent can only self-update while its service is running, Agents on those devices cannot recover through the Agent's own self-update mechanism.

 

How do I know if I'm affected?

If you have affected Applixure Agents, the symptom is that Agent installations that were running and reporting data to your Analytics dashboard before Monday, 20 January now have the Agent's Windows service start and then immediately stop, leaving the Agent in the stopped state. In addition, the Windows Application event log shows an Error-level message from ApxService stating "Core did not initialise, exiting." each time the Agent service has tried to start:

[Screenshot: error_in_eventlog.png]
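The same check can be done from a PowerShell prompt on a suspect machine. The following is a minimal sketch, assuming the event source appears to Get-WinEvent under the provider name ApxService and that the message text matches the wording quoted above; adjust both if your event log shows something different.

```powershell
# Assumption: the provider name is "ApxService" and the message contains
# the text quoted in this article. Run in an elevated PowerShell session.
$events = Get-WinEvent -FilterHashtable @{
    LogName      = 'Application'
    ProviderName = 'ApxService'
    Level        = 2                        # 2 = Error
    StartTime    = [datetime]'2025-01-20'   # only events since the faulty update
} -ErrorAction SilentlyContinue |           # suppress "no events found" errors
    Where-Object { $_.Message -like '*Core did not initialise*' }

if ($events) {
    # Show when the service failed to start and what it logged
    $events | Select-Object TimeCreated, Message
} else {
    Write-Output 'No matching ApxService errors found.'
}
```

An empty result does not prove the Agent is healthy, for example if the log has since rolled over, so treat it as a hint rather than a definitive test.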

 

Another way to find the Agent devices that most likely have this issue is to use the device search in the Analytics Web UI to list devices that were last seen on 20 January or shortly before, but not since. Enter the search query LastSeen <= "2025-01-20" and LastSeen >= "2025-01-16" into the filter box to list all devices in your environment that match the condition:

[Screenshot: filtering_string.png]

 

Please note: Some of those devices may have been offline since that date for unrelated reasons, so a device appearing in this list does not by itself prove that its Agent is failing to start.

Tip for Workflow users: If you also use the Applixure Workflow product and have multiple environments to check, you can use the same filter syntax as the data filter of a new Workflow rule to get this information from multiple environments at once. The resulting work item should include every matching device from all environments synchronized to the Workflow board.

 

If you have determined that one or more devices with the Applixure Analytics Windows Agent are affected by the faulty update, see the following section for the actions needed to remediate the situation and allow the Agent service to start again. Remediation option 2 (see below) is also safe to perform on Agents that are not affected, as it should not negatively impact their reporting of data to Analytics, so you may apply it across your whole installed base of Applixure Agents even if you are unsure whether the condition applies to you.

 

Remediating the issue

There are two ways to correct the issue of Applixure Agents being unable to start their service because of the cached faulty update code file.

Applixure recommends option 2 as a course of action, as that will ensure that in the future the Applixure Agent installations are better guarded against faulty updates (see next section).

 

Option 1: one-time removal of downloaded updates

Removing the directory C:\ProgramData\Applixure\Core from the machine's disk clears out the problematic update and allows the Applixure Agent service to start using the working code from installation time.

This can be done as a mass operation with a simple script that runs on the devices, distributed through the device management software you originally used to install the Applixure Agent or through another management tool that allows file and directory operations on your devices.

Please note that after removing the directory, you must explicitly start the Applixure Agent service (or wait for the next device reboot) for it to get back into a working state.

 

As an example, a simple Windows PowerShell script, run with administrative privileges, to achieve this would be:

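# Delete the cached update directory and its contents (-Recurse is needed for a non-empty directory)
Remove-Item -Path C:\ProgramData\Applixure\Core -Recurse -Force
# Start the Agent service again
Start-Service -Name ApxService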

 

Please note: A file in the directory might be locked by the context process (ApxUserContextHelper.exe) running in the currently logged-on user's session, so the directory itself might not be deleted even though all other files are removed. This is fine: the issue is fixed as long as the files used by the service process (axpcore.XXXXXXX files) are deleted from the directory.
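For mass distribution, a slightly more defensive variant of the script may be useful. The sketch below is only an illustration built from the service name and directory path given in this article. Skipping devices whose service is already running and the 10-second wait before re-checking are assumptions of this example, not Applixure requirements.

```powershell
# Run with administrative privileges.
$coreDir     = 'C:\ProgramData\Applixure\Core'
$serviceName = 'ApxService'

$svc = Get-Service -Name $serviceName -ErrorAction SilentlyContinue
if (-not $svc) {
    Write-Output 'Applixure Agent service not found; nothing to do.'
    return
}
if ($svc.Status -eq 'Running') {
    # Assumption: a running service means the device is not affected.
    Write-Output 'Agent service is running; skipping.'
    return
}

if (Test-Path -LiteralPath $coreDir) {
    # Locked files (see the note above) produce errors that are ignored here.
    Remove-Item -LiteralPath $coreDir -Recurse -Force -ErrorAction SilentlyContinue
}

Start-Service -Name $serviceName

# An affected service stops again right after starting, so re-check after a pause.
Start-Sleep -Seconds 10
$status = (Get-Service -Name $serviceName).Status
Write-Output "Agent service status after remediation: $status"
```

If the reported status is still Stopped, the files used by the service process were probably not removed, and the event log check described earlier is the next place to look.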

 

Option 2: (re)installing Applixure Agent over the existing installation on the device

Another option is to download the latest Applixure Agent installation MSI package from the Analytics Web UI and distribute it to the devices. You can install a newer version of the Agent's package over the existing installation, as this simply upgrades it and, in doing so, brings along a newer code file for the Agent that the service automatically uses instead of the faulty version cached on disk.
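If your distribution tool runs scripts rather than deploying MSI packages directly, a silent install over the existing installation could look roughly like the sketch below. The MSI path and log file name are hypothetical placeholders. Any MSI properties your original deployment passed to the installer are not shown here and would need to be added.

```powershell
# Hypothetical path: point this at the MSI downloaded from the Analytics Web UI.
$msi = 'C:\Temp\ApplixureAgent.msi'
$log = Join-Path $env:TEMP 'ApplixureAgent-install.log'

# /i = install, /qn = no user interface, /norestart = never reboot automatically,
# /l*v = write a verbose log to the given file
$arguments = "/i `"$msi`" /qn /norestart /l*v `"$log`""

$proc = Start-Process -FilePath 'msiexec.exe' -ArgumentList $arguments -Wait -PassThru

# Standard Windows Installer exit codes: 0 = success, 3010 = success but a restart is required
Write-Output "msiexec exit code: $($proc.ExitCode)"
```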

An additional benefit of remediating with this option is that the service process itself, installed in Program Files, is updated to the latest version, which now contains additional recovery logic to prevent a similar issue from occurring in the future. The Agent's self-update mechanism cannot update the service executable installed from the MSI package, so any future update to this code always requires upgrading the Agent's installation itself using software distribution tools. Applixure generally recommends periodically replacing the Agent installer in your environment's software distribution with the latest version, so that new devices receive the latest version possible even before auto-update can occur.

 

How do we plan to prevent this from happening again?

Now that this previously theoretical failure condition in the Agent's update files has actually occurred, Applixure has implemented the following improvements:

Going forward, the build process for updates includes an extra step that automatically validates the build output for logical structural issues that would prevent the Agent's service from loading it. Such issues arise when build tools incorrectly compose the update from several distinct technical code units. This step automatically prevents non-working code from being published in the first place.

Additionally, the Agent's service code has been improved to better guard against failures to load or initialise updated code files. If such a failure happens, the service process explicitly blacklists that update and skips it the next time it starts, effectively rolling back locally to the last-known-good version.
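To illustrate the general idea of skipping a blacklisted update, here is a purely conceptual sketch. It is not Applixure's actual implementation: the function name, the version labels, and the way the blacklist is represented are all hypothetical and exist only to show the fall-back behaviour described above.

```powershell
# Conceptual illustration only; not the Agent's real code.
function Select-LoadableUpdate {
    param(
        # Hypothetical version labels, ordered newest first and ending
        # with the version shipped in the installer.
        [string[]]$Candidates,
        # Hypothetical list of updates that previously failed to load.
        [string[]]$Blacklist
    )
    foreach ($candidate in $Candidates) {
        # Pick the newest candidate that has not been blacklisted
        if ($Blacklist -notcontains $candidate) {
            return $candidate
        }
    }
    return $null   # nothing loadable remains
}

# The newest update failed earlier, so the previous one is selected instead.
Select-LoadableUpdate -Candidates 'update-3', 'update-2', 'shipped' -Blacklist 'update-3'
```

In this example the call returns update-2, the newest entry that is not on the blacklist.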
