holly-cummins
Contributor

@holly-cummins holly-cummins commented Jul 17, 2025

As part of the test-classloading rewrite, I had to put in a hack to preload some Kubernetes-related test resources. I was a bit puzzled about why it made a difference, but decided maybe some class initialisation was writing system properties somewhere.

When I looked into it further, I realised the problem happened because dev services were being started on the bad path, and just the presence of the class in the classloader was enough; no system properties were being written. I was extra confused, until I found a guard that uses the presence of the class in the classloader to decide whether to start Kubernetes dev services.

The guard used to work because we'd load test classes, and then load them + augment again before running tests. Now the augmentation happens much earlier, potentially before supporting classes have been loaded. This means a guard based on classes that have already been loaded is too fragile. I've kept the idea, but the guard now attempts to load the class rather than checking whether it is already loaded.
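
A minimal sketch of the difference between the two kinds of guard, not the actual Quarkus code. It assumes the old guard was along the lines of `Package.getPackages()` (which is mentioned later in this thread); the class name `AvailabilityGuards` and the method `isPackageAlreadyLoaded` are hypothetical names for illustration.

```java
class AvailabilityGuards {

    // "Already loaded" style guard (assumed shape of the old check).
    // Package.getPackages() only reports packages known to the caller's
    // classloader and its ancestors, so the result can depend on whether
    // a class from the package happens to have been loaded yet.
    static boolean isPackageAlreadyLoaded(String packageName) {
        for (Package p : Package.getPackages()) {
            if (p.getName().equals(packageName)) {
                return true;
            }
        }
        return false;
    }

    // "Can be loaded" style guard: asks the classloader to load the class,
    // so the answer does not depend on what was loaded earlier.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but not loadable.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isClassAvailable("java.lang.String"));
        System.out.println(isClassAvailable("com.example.DoesNotExist"));
        System.out.println(isPackageAlreadyLoaded("com.example.nope"));
    }
}
```

The second form answers "is this class on the classpath?" rather than "has something already touched this package?", which is the property the guard needs once augmentation runs earlier.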

@holly-cummins holly-cummins changed the title Avoid the need to preload Kubernetes Avoid the need to preload mock Kubernetes server classes for Kubernetes tests to pass Jul 17, 2025

quarkus-bot bot commented Jul 17, 2025

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 14a61d4.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.


private static boolean isClassAvailable(String className) {
    try {
        Class.forName(className);
Contributor

Are you sure this is correct, instead of the variant where we pass the TCCL?

Contributor Author

I wondered about that. The example I copied did in fact use the TCCL, but I decided the TCCL wasn't the right choice, for two reasons. The first is that it would just go back through the FacadeClassLoader, and the FCL is a fairly expensive way to load classes. (It loads each class twice, and inspects the class to decide how to load it.) Because the class we're looking for is a Quarkus class, I decided that the classloader used to load the kube extension would also have good access to the test framework libraries, and could safely load it.
The second reason I didn't use the TCCL is that I wasn't 100% sure what it would be at that point, and whether it would be the FCL or a runtime classloader, and I've been fighting so many (self-inflicted) "TCCL is wrong at this point" chaos-bugs I was nervous to increase our reliance on the TCCL. :)

I think it probably actually doesn't matter, and all three options (FCL, defining classloader, or runtime classloader) would end up delegating to the system classloader. So if you think TCCL is a safer/less surprising choice, happy to change it.

Contributor Author

I've also read the docs (which, admittedly, I should have done first, rather than going with intuition), and Package.getPackages() uses the caller's classloader, and Class.forName also uses the caller's classloader. So I've kept parity.

Class.forName does always initialise the class, which could be a waste if it wasn't used, but I think if it's on the classpath it's likely to be used (because if it's not used, the Kubernetes tests will probably fail :) )
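
To make the trade-off discussed above concrete, here is a hedged sketch of the two `Class.forName` variants. It is not the code from this PR; `ForNameVariants`, `Probe`, and both method names are hypothetical. The single-argument form uses the caller's defining classloader and initialises the class; the three-argument form takes an explicit loader (for example the TCCL) and can skip initialisation.

```java
class ForNameVariants {

    // Set by Probe's static initialiser, so we can observe initialisation.
    static boolean probeInitialised = false;

    static class Probe {
        static {
            probeInitialised = true;
        }
    }

    // Variant discussed in the thread: caller's classloader, class is initialised.
    static boolean availableViaCaller(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Alternative: explicit loader, initialize=false so static initialisers do not run.
    static boolean availableWithoutInit(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A class literal does not trigger initialisation of Probe.
        String probe = Probe.class.getName();
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();

        System.out.println(availableWithoutInit(probe, tccl) + " initialised=" + probeInitialised);
        System.out.println(availableViaCaller(probe) + " initialised=" + probeInitialised);
    }
}
```

On a standard JVM launch, where the TCCL of the main thread is the application classloader, the first call should find the class without running its static initialiser and the second should run it. Whether the TCCL is the right loader at a given point in a Quarkus test run is exactly the uncertainty raised above, so the loader is left as a parameter here.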

@holly-cummins holly-cummins merged commit 450de5b into quarkusio:main Jul 18, 2025
57 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.26 - main milestone Jul 18, 2025
@holly-cummins
Contributor Author

Merging! I am SO glad to get my hack out of the codebase.

@metacosm
Contributor

metacosm commented Aug 5, 2025

This seems to be causing issues with Kubernetes Dev Services, investigating…

@metacosm
Contributor

metacosm commented Aug 5, 2025

More specifically, quarkiverse/quarkiverse#64 (comment) started failing on July 21st. It does not run over the weekend, so this commit from Friday the 18th looked suspicious. I've since confirmed that the commit before it works as expected and that this change is the one that makes the build fail, but I haven't looked at why yet. Locally, this appears to be an issue with the current namespace resolution when the dev service is used in the tests…

@holly-cummins
Contributor Author

> This seems to be causing issues with Kubernetes Dev Services, investigating…

Argh, let me know if I can help. I thought it was a pretty innocuous change, so I'm surprised it broke something. One of the next things on my todo list is to migrate the Kube dev services to the new model, and that will be a bigger, riskier change.

@holly-cummins
Contributor Author

This once again reminds me that I need to do the thing so when an ecosystem CI starts failing it lists all the PRs in that build to alert change-owners that they might have broken something.

The two things that might have changed are:

  • Preloading those classes had some other effect that benefited the operator SDK tests
  • The changed check slightly changed the order in which things are started, causing dev services to start in a different sequence, or (if there are multiple profiles in those tests) causing a different profile to start first

The first would be easy to check by reinstating the pre-load code in FacadeClassLoader. The second is harder to check, and also harder to fix, and would really be a symptom of the bigger "do not start all dev services in the augmentation phase" problem that we're in the process of fixing. I guess one way to check it is to run just the tests for one profile (assuming there's more than one).

@metacosm
Contributor

metacosm commented Aug 5, 2025

> This once again reminds me that I need to do the thing so when an ecosystem CI starts failing it lists all the PRs in that build to alert change-owners that they might have broken something.

That would be neat, indeed… That said, in this case, you fixed something that was broken previously and was only working because things were actually not working as intended. So the feature should also notify the owner of the broken repo, not only the change owners… Having the list of PRs included since the last working build would help narrow things down a lot, though.
