Fix UserAgent ANR - Take 2 #14431

Open
wants to merge 9 commits into
base: trunk
from

Conversation

hichamboushaba
Member

@hichamboushaba hichamboushaba commented Aug 5, 2025

Closes WOOMOB-968

Description

This PR is a second attempt at fixing the UserAgent ANR. As a reminder, this ANR occurs when calling WebSettings.getDefaultUserAgent. We call it in UserAgent#init, which runs on app launch, and it sometimes results in a background ANR. This is a known issue for Google: the call is heavy, and they suggest making it on a background thread to avoid blocking the main thread. We tried that in the first attempt, but it resulted in another WebView crash (peaMlT-Tk-p2), and we had to revert the fix.

My unproven theory for the crash is that using a background thread in UserAgent#init increased the chances of the stuck-process situation that's explained here, thus leading to the AwDataDirLock crashes.

Now we need to take a different approach for the fix. I'll list the options we have so we can discuss them and pick the best one:

Option 1: Use two UserAgent variants, one for API requests and one for WebView

My understanding is that for API requests, the most important part of the UserAgent is the app name and version. The other parts matter more when viewing HTML content, where the web server might need to adapt the content to the WebView's capabilities.
Based on this, the idea is to use two UserAgent variants:

  • One for API requests: it will use the VM property http.agent. This is the device's default UserAgent before the WebView parts are added, and it's the default value used by HttpURLConnection.
  • One for the WebView: we'll keep using WebSettings.getDefaultUserAgent, since it will then be called in the foreground while the WebView is being initialized, where it should generally be fine.
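
A minimal sketch of the two-variant idea, not the PR's actual implementation. The class name `UserAgents`, the `appSuffix` parameter, and the `webViewDefaultProvider` lambda are assumptions made for illustration; the lambda stands in for `WebSettings.getDefaultUserAgent(context)` so that the sketch runs without Android.

```kotlin
// Hypothetical class: names and shape are illustrative only.
class UserAgents(
    // App identifier appended to both variants, e.g. "wc-android/22.9-rc-2".
    private val appSuffix: String,
    // Stand-in for WebSettings.getDefaultUserAgent(context) (assumption, so this runs off-device).
    webViewDefaultProvider: () -> String,
    // The http.agent system property; it may be absent on a plain JVM, so it is nullable.
    httpAgent: String? = System.getProperty("http.agent"),
) {
    // Cheap to compute: reads a system property only, no WebView involved.
    val apiUserAgent: String =
        listOfNotNull(httpAgent?.takeIf { it.isNotBlank() }, appSuffix).joinToString(" ")

    // Deferred: the heavy provider is invoked only on first read.
    val webViewUserAgent: String by lazy { "${webViewDefaultProvider()} $appSuffix" }
}
```

On a device, the provider would presumably be `{ WebSettings.getDefaultUserAgent(context) }`, read for the first time when a WebView is about to be shown.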

For comparison, with this change, and with an emulator running Android 15, we'll use the following values:

  • apiUserAgent='Dalvik/2.1.0 (Linux; U; Android 15; sdk_gphone64_arm64 Build/AE3A.240806.043) wc-android/22.9-rc-2'
  • webViewUserAgent='Mozilla/5.0 (Linux; Android 15; sdk_gphone64_arm64 Build/AE3A.240806.043; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/124.0.6367.219 Mobile Safari/537.36 wc-android/22.9-rc-2'

Option 2: Use SharedPreferences for caching the UserAgent.

(This was the initial approach I used in this PR; check it in commit c74f359. After further research on AwDataDirLock and considering the shared theory above, I believe this could cause the same crashes. Sharing for discussion only)
In this approach, SharedPreferences will serve as a cache for the value. The plan is to load the initial value from SharedPreferences and then, after a set delay, update the cache (in case WebView has been updated).
The key factor in this fix is the delay before calling WebSettings.getDefaultUserAgent. We believe the AwDataDirLock crashes mainly happened during app startup, as there were no encrypted logs available (edit: I'm still confused about the lack of logs, but I'm not convinced it necessarily means app startup).

Option 3: Use SharedPreferences for caching the UserAgent 2

(This is a third option that's similar to Option 2 but could be more robust. I didn't implement it only because Option 1 seemed simpler; I can implement it if we believe keeping the same UserAgent value for both API requests and the WebView is beneficial.)
In this option, we'll use SharedPreferences as a cache, but we'll make sure to call WebSettings.getDefaultUserAgent only when the app is going to the foreground. At that point there is less chance of the process getting stuck, as the system gives the app a higher priority. To achieve this, we can use ProcessLifecycleOwner and load the UserAgent when the app reaches the Started state.
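
A rough sketch of the cache-then-refresh flow shared by Options 2 and 3, under stated assumptions. `KeyValueStore`, `InMemoryStore`, `CachedUserAgent`, and the `"user_agent"` key are hypothetical names; the store stands in for SharedPreferences and `loadFromWebView` for `WebSettings.getDefaultUserAgent(context)`, so the sketch runs without Android.

```kotlin
// Hypothetical abstraction standing in for SharedPreferences.
interface KeyValueStore {
    fun getString(key: String): String?
    fun putString(key: String, value: String)
}

// Simple in-memory implementation, used here only to make the sketch runnable.
class InMemoryStore : KeyValueStore {
    private val map = mutableMapOf<String, String>()
    override fun getString(key: String) = map[key]
    override fun putString(key: String, value: String) { map[key] = value }
}

class CachedUserAgent(
    private val store: KeyValueStore,
    // Used until a value has been cached (for example, an http.agent based string).
    private val fallback: String,
    // Stand-in for the heavy WebSettings.getDefaultUserAgent(context) call (assumption).
    private val loadFromWebView: () -> String,
) {
    // Start from the cached value so launch never touches WebView.
    @Volatile
    private var current: String = store.getString(KEY) ?: fallback

    val value: String get() = current

    // Refresh hook. For Option 3 this would be called when the app reaches the
    // Started state, e.g. from a DefaultLifecycleObserver.onStart registered on
    // ProcessLifecycleOwner; for Option 2 it would be called after a delay.
    fun onAppStarted() {
        // If loading fails, keep the previously cached value.
        val fresh = runCatching(loadFromWebView).getOrNull() ?: return
        if (fresh != current) {
            current = fresh
            store.putString(KEY, fresh)
        }
    }

    private companion object {
        const val KEY = "user_agent" // hypothetical preference key
    }
}
```

The design choice illustrated here is that reads never block on WebView: callers always get the last known value, and the refresh only updates the cache when the reported UserAgent actually changed (for example, after a WebView update).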

@JorgeMucientes @malinajirka @wzieba pinging you as you have more context on this issue given the discussions on Linear, please share your thoughts on the suggested approaches.

Testing information

API requests

  1. Use a tool to inspect network requests (App Inspection from Android Studio or Flipper)
  2. Launch the app.
  3. Check some requests and confirm they have the expected UserAgent, in the format <http.agent> wc-android/<version>

WebView

  1. Open Blaze campaign creation screen.
  2. Enter all details and tap on confirm.
  3. Tap on the payment method button.
  4. Tap on Add a new payment method.
  5. Confirm the WebView loads as expected and that no nav bar is shown (I mean the Calypso nav bar)

The tests that have been performed

The above.

  • I have considered if this change warrants release notes and have added them to RELEASE-NOTES.txt if necessary. Use the "[Internal]" label for non-user-facing changes.

@hichamboushaba hichamboushaba added the type: crash The worst kind of bug. label Aug 5, 2025
@dangermattic
Collaborator

dangermattic commented Aug 5, 2025

1 Warning
⚠️ View files have been modified, but no screenshot or video is included in the pull request. Consider adding some for clarity.

Generated by 🚫 Danger

private const val APP_VERSION = "1.0"

@OptIn(ExperimentalCoroutinesApi::class)
@RunWith(RobolectricTestRunner::class)
class UserAgentTest {
Member Author

The unit tests are currently broken: I updated them when I implemented Option 1, but they will fail now. I'll fix them once we agree on the approach.

@wpmobilebot
Collaborator

wpmobilebot commented Aug 5, 2025

📲 You can test the changes from this Pull Request in WooCommerce-Wear Android by scanning the QR code below to install the corresponding build.
App Name: WooCommerce-Wear Android
Platform: ⌚️ Wear OS
Flavor: Jalapeno
Build Type: Debug
Commit: 87ac35a
Direct Download: woocommerce-wear-prototype-build-pr14431-87ac35a.apk

@wpmobilebot
Collaborator

wpmobilebot commented Aug 5, 2025

📲 You can test the changes from this Pull Request in WooCommerce Android by scanning the QR code below to install the corresponding build.

App Name: WooCommerce Android
Platform: 📱 Mobile
Flavor: Jalapeno
Build Type: Debug
Commit: 87ac35a
Direct Download: woocommerce-prototype-build-pr14431-87ac35a.apk

Contributor

@wzieba wzieba left a comment

the idea here is to use two UserAgent variants:

This sounds good to me 👍

I can't say I fully understood the AwDataDirLock issue (I read the attached comment from the tracker, but still) or how moving to the background thread could increase this, but having two user agents sounds to me like a completely valid approach to test.

@malinajirka
Contributor

I forgot to reply yesterday 🤦‍♂️.

Thanks for clearly summarizing your findings @hichamboushaba! I also think testing the two-agents approach is worth a shot.

@JorgeMucientes
Contributor

Same as shared by Wojtek. I didn't really understand why moving the UserAgent initialization to the background last time led to the crashes. In any case, the two-UserAgents approach sounds like a good one to test.

@hichamboushaba
Member Author

hichamboushaba commented Aug 11, 2025

I can't say I fully understood the AwDataDirLock issue (I read the attached comment from the tracker, but still) or how moving to the background thread could increase this

Thank you all for the input. Regarding this point, I'll try to explain my theory further.
In our app, we use WorkManager to handle some background tasks, and these tasks use the NetworkType.CONNECTED constraint. According to my theory, the following could happen when the network is unstable:

  1. WorkManager starts executing a task, which triggers the background thread that fetches the UserAgent.
  2. The network disconnects shortly after; WorkManager stops the Worker, then reschedules it for when the network connects again.
  3. For some reason, the process gets stuck (as discussed in the above issue).
  4. The network connects again, and the Worker is launched.
  5. Android starts a new process, and we launch a new background thread to fetch the UserAgent.
  6. An AwDataDirLock exception is thrown, as we now have two processes accessing the same data dir.

This is just a theory, and I can't prove it, but it seems to match what we had, as all the crashes happened after a NETWORK_AVAILABLE event (as mentioned here peaMlT-Tk-p2#comment-2286).


The PR is now ready for review.

We'll save the user agent to SharedPreferences, and then load it from there on subsequent launches.
We'll keep the value up to date with a lazy call to `WebSettings.getDefaultUserAgent`, hoping this avoids the race conditions leading to the `AwDataDirLock` crash.
We now have two userAgents, one used for API calls and one for the WebView. The one used for API calls is based on the `http.agent` property, to avoid the ANRs caused by `WebSettings.getDefaultUserAgent`.
@hichamboushaba hichamboushaba force-pushed the issue/WOOMOB-968-fix-UserAgent-ANR branch from 4a3f4d6 to 87ac35a Compare August 11, 2025 16:28
@hichamboushaba hichamboushaba added this to the 23.1 milestone Aug 11, 2025
@hichamboushaba hichamboushaba marked this pull request as ready for review August 11, 2025 16:29
@codecov-commenter

Codecov Report

❌ Patch coverage is 0% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.95%. Comparing base (a8c5399) to head (87ac35a).

Files with missing lines                                  Patch %   Lines
...a/org/wordpress/android/fluxc/network/UserAgent.kt     0.00%     15 Missing ⚠️
...erce/android/ui/compose/component/web/WCWebView.kt     0.00%     1 Missing ⚠️
...pplicationpasswords/ApplicationPasswordsNetwork.kt     0.00%     1 Missing ⚠️
...onpasswords/WPApiApplicationPasswordsRestClient.kt     0.00%     1 Missing ⚠️
...pcom/jetpackai/JetpackAITranscriptionRestClient.kt     0.00%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##              trunk   #14431      +/-   ##
============================================
- Coverage     37.95%   37.95%   -0.01%     
+ Complexity     9188     9187       -1     
============================================
  Files          1989     1989              
  Lines        112311   112316       +5     
  Branches      14814    14815       +1     
============================================
- Hits          42630    42629       -1     
- Misses        65799    65804       +5     
- Partials       3882     3883       +1     

@JorgeMucientes JorgeMucientes self-assigned this Aug 12, 2025
}

- override fun toString(): String = userAgent
+ override fun toString(): String = apiUserAgent
Contributor

Is there any reason to keep this? It's unused.

Member Author

Are you sure it's really unused? I left it because I'm not entirely sure it isn't used somewhere. AS's Find Usages doesn't work well here because it's an overridden function.

If we can confirm it's unused, I'd also prefer a better toString implementation here, or removing the implementation completely.

Contributor

@JorgeMucientes JorgeMucientes left a comment

Great work @hichamboushaba, everything works as expected and the code looks good. I just left a minor suggestion, but nothing blocking.
