Skip to content

Conversation

pcrespov
Copy link
Member

@pcrespov pcrespov commented Sep 10, 2025

What do these changes do?

This PR addresses some issues discovered while investigating a user report together with @sanderegg

  • ♻️ The RPC server now includes an access log (similar to HTTP APIs) that records the lifecycle of calls on the server side at INFO level. This makes additional logs decorating RPC handlers unnecessary (NOTE: this is on the server side, the client side might still make sense in some cases).
  • ♻️ Unhandled exceptions (i.e., not caught within an RPC handler) are consistently converted into RPCServerError, serialized, transmitted to the client, and re-raised on the client side.
    • These errors are equivalent to HTTP 500 responses and are logged on the server. The logging of these errors has been enhanced for better traceability which includes an OEC.
  • 🐛 @GitHK : Replaced raise web.HTTPFound by an HTTP_404_NOT_FOUND error http response and logging the issue since 1. exception messages must never be added "as is" to the http errors 2. avoids allowed_errors=(StopDynamicServiceTaskError,), in the LRT's registry (i.e. those are logged and returned as ResultField).
    *🐛 @sanderegg : fixes UnboundLocalError in DaskClient.get_task_status (I am surprised mypy did not find this)
            except KeyError as exc:
                # the task does not exist
                _logger.warning(
                    **create_troubleshootting_log_kwargs(
                        f"Task {job_id} not found. State is UNKNOWN.",
                        error=exc,
>                       error_context=log_error_context,
                                      ^^^^^^^^^^^^^^^^^
                        tip="If the task is supposed to exist, the dask-schdeler has probably restarted. Check its status.",
                    ),
                )
E               UnboundLocalError: cannot access local variable 'log_error_context' where it is not associated with a value

Related issue/s

  • maintenance

How to test

Dev-ops

@pcrespov pcrespov self-assigned this Sep 10, 2025
@pcrespov pcrespov added the a:services-library issues on packages/service-libs label Sep 10, 2025
@pcrespov pcrespov added this to the Cheops milestone Sep 10, 2025
Copy link

codecov bot commented Sep 10, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.67%. Comparing base (7913c04) to head (dd620bb).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8346      +/-   ##
==========================================
- Coverage   87.83%   87.67%   -0.17%     
==========================================
  Files        1945     1520     -425     
  Lines       75540    62912   -12628     
  Branches     1312      658     -654     
==========================================
- Hits        66351    55155   -11196     
+ Misses       8794     7524    -1270     
+ Partials      395      233     -162     
Flag Coverage Δ
integrationtests 64.00% <58.33%> (-0.03%) ⬇️
unittests 86.06% <75.00%> (-0.44%) ⬇️
Components Coverage Δ
pkg_aws_library ∅ <ø> (∅)
pkg_celery_library ∅ <ø> (∅)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library 71.10% <100.00%> (+0.01%) ⬆️
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 85.03% <ø> (+0.05%) ⬆️
agent 93.53% <ø> (ø)
api_server 91.91% <ø> (ø)
autoscaling 95.77% <ø> (ø)
catalog 92.36% <ø> (+0.02%) ⬆️
clusters_keeper 99.13% <ø> (ø)
dask_sidecar 91.81% <ø> (ø)
datcore_adapter 97.94% <ø> (ø)
director 75.81% <ø> (ø)
director_v2 90.93% <100.00%> (-0.03%) ⬇️
dynamic_scheduler 96.27% <ø> (ø)
dynamic_sidecar 90.46% <ø> (-0.03%) ⬇️
efs_guardian 89.62% <ø> (ø)
invitations 91.44% <ø> (ø)
payments 92.61% <ø> (ø)
resource_usage_tracker 92.24% <ø> (ø)
storage 86.49% <ø> (ø)
webclient ∅ <ø> (∅)
webserver 87.98% <63.63%> (-0.04%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7913c04...dd620bb. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pcrespov pcrespov added the t:maintenance Some planned maintenance work label Sep 10, 2025
@pcrespov pcrespov marked this pull request as ready for review September 10, 2025 09:20
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves RPC server logging and error handling by adding comprehensive access logs and enhancing error reporting mechanisms. The changes focus on better traceability and consistent error handling across RPC calls.

Key Changes:

  • Enhanced RPC server with access logging similar to HTTP APIs to track call lifecycles at INFO level
  • Improved unhandled exception handling with structured logging and error codes
  • Replaced HTTP exception raising with proper error response creation in dynamic service operations

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
services/web/server/src/simcore_service_webserver/projects/_controller/nodes_rest.py Enhanced error handling in _stop_dynamic_service_task with structured logging and proper error responses
services/catalog/src/simcore_service_catalog/api/rpc/_services.py Removed redundant @log_decorator annotations from RPC methods
packages/service-library/src/servicelib/rabbitmq/_rpc_router.py Added comprehensive RPC access logging and improved exception handling with error codes
packages/service-library/src/servicelib/rabbitmq/_errors.py Updated RPCServerError message template to include error codes

@pcrespov pcrespov enabled auto-merge (squash) September 10, 2025 09:26
@pcrespov pcrespov added the 🤖-automerge marks PR as ready to be merged for Mergify label Sep 10, 2025
Copy link
Contributor

mergify bot commented Sep 10, 2025

🧪 CI Insights

Here's what we observed from your CI run for dd620bb.

✅ Passed Jobs With Interesting Signals

Pipeline Job Signal Health on master Retries 🔍 CI Insights 📄 Logs
CI unit-tests Base branch is broken, but the job passed. Looks like this might be a real fix 💪 Broken 0 View View

@pcrespov pcrespov requested a review from wvangeit September 10, 2025 10:24
Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! looking good. please double check with @GitHK about the allowed_errors

Copy link
Contributor

@wvangeit wvangeit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@pcrespov pcrespov requested a review from GitHK September 11, 2025 07:59
Copy link
Contributor

@bisgaard-itis bisgaard-itis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks a lot

Copy link

@pcrespov pcrespov requested a review from GitHK September 11, 2025 09:18
Copy link
Contributor

@GitHK GitHK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with this

@pcrespov pcrespov merged commit 572751c into ITISFoundation:master Sep 11, 2025
92 of 95 checks passed
@pcrespov pcrespov deleted the mai/remove-rpc-logs branch September 11, 2025 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖-automerge marks PR as ready to be merged for Mergify a:services-library issues on packages/service-libs t:maintenance Some planned maintenance work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants