
Conversation

Contributor

@gaya3-zipstack commented Sep 10, 2025

What

Add safety checks for URLs while configuring

  • adapters and invoking test connections
  • notifications for API/ETL
  • postprocessor webhook URLs in Prompt Studio
  • URLs used in variable replacement

Why

  • To prevent SSRF attacks

How

  • AdapterProcessor: Added validate_adapter_urls() method for URL security validation
  • AdapterInstance Views: Integrated URL validation during adapter creation
  • **

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • Maybe. Explicit checks for backward compatibility of existing adapters are a must.

Database Migrations

  • None required

Env Config

  • Optional: WHITELISTED_ENDPOINTS

Relevant Docs

Related Issues or PRs

Dependencies Versions

SDK version roll is required

Notes on Testing

  • Tested test connection by whitelisting endpoints within docker network (172.16.0.0/12)
  • Tested test connection without whitelisting docker network
  • Tested adapter creation using POST API
  • Tested adapter edit using PUT API
  • Tested configuring notifications for API
  • Tested configuring postprocessor URLs in Prompt Studio
  • Tested using URLs in variable replacement in Prompt Studio

Screenshots

Test connection working with whitelisted endpoints
image

Test connection failing
image

Create adapter failing
image

Update adapter failing
image

API deployment/tool working
image

Postprocessing URL validation working. Check the highlighted areas
image

API webhook notification verification for whitelisted URLs
image

Prompt Studio variable replacement fails when an unsafe URL is given
image

Checklist

I have read and understood the Contribution Guidelines.

Contributor

coderabbitai bot commented Sep 10, 2025

Summary by CodeRabbit

  • New Features

    • Adapter URLs are validated before saving or testing.
    • Webhook and post-processing URLs are validated for safety.
    • Automatic default adapter assignment on creation.
    • Platform-provided key flow updates adapter metadata.
    • URL checks added for dynamic variable fetching.
  • Bug Fixes

    • Clearer, user-facing errors for invalid adapter URLs.
    • Safer variable replacement with proper error propagation and early exits on invalid/missing URLs.
  • Refactor

    • Centralized metadata decryption/validation and improved logging.
  • Chores

    • Sample envs include a WHITELISTED_ENDPOINTS setting for local adapters.

Walkthrough

Adds URL-only adapter validation and integrates it into adapter test/create/update flows; ViewSet centralizes decryption, metadata validation, platform-key handling, and default-assignment; URLValidator added to webhook and prompt-service dynamic/postprocessing flows; WHITELISTED_ENDPOINTS env var added.
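
Below is a minimal, self-contained sketch of the constructor-time validation pattern this walkthrough describes. The ExampleAdapter class, the WHITELISTED_ENDPOINTS set, and the URL checks are illustrative stand-ins for the Adapterkit-resolved adapter classes, not the SDK's actual API.

import ipaddress
import logging
from urllib.parse import urlparse

logger = logging.getLogger(__name__)

WHITELISTED_ENDPOINTS = {"10.68.0.10"}  # mirrors the new env var; illustrative value

class ExampleAdapter:
    """Stand-in adapter whose constructor validates its endpoint URL, so
    instantiating it acts as the SSRF gate without opening a connection."""

    def __init__(self, metadata: dict):
        self.metadata = dict(metadata)  # defensive copy; the caller's dict stays untouched
        host = urlparse(self.metadata.get("url", "")).hostname
        if not host:
            raise ValueError("Adapter metadata is missing a valid URL.")
        try:
            is_private = ipaddress.ip_address(host).is_private
        except ValueError:
            is_private = False  # hostname rather than a literal IP
        if is_private and host not in WHITELISTED_ENDPOINTS:
            raise ValueError(f"URL host '{host}' is private and not whitelisted.")

def validate_adapter_urls(adapter_id: str, adapter_metadata: dict) -> ExampleAdapter:
    """Instantiate the adapter class so constructor-time URL validation runs."""
    try:
        # The real code resolves the class via Adapterkit; a fixed class is used here.
        return ExampleAdapter(adapter_metadata)
    except Exception:
        logger.exception("URL validation failed for adapter %s", adapter_id)
        raise

validate_adapter_urls("local-llm", {"url": "https://10.68.0.10:8080"})  # passes: whitelisted
# validate_adapter_urls("local-llm", {"url": "http://169.254.169.254/"}) would raise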

Changes

Cohort / File(s) Summary
Adapter URL validation utility
backend/adapter_processor_v2/adapter_processor.py
Adds static validate_adapter_urls(adapter_id: str, adapter_metadata: dict) -> Adapter to obtain the adapter class via Adapterkit, instantiate it (triggering URL/config validation without a full connection), defensively copy metadata, log and re-raise errors; includes docstring and minor import/style tweaks.
View-level decryption / validation / platform-key flows
backend/adapter_processor_v2/views.py
Adds imports and private helpers (_decrypt_and_validate_metadata, _validate_adapter_urls, _check_platform_key_usage, _update_metadata_for_platform_key, _set_default_adapter_if_needed, _validate_update_metadata); decrypts adapter_metadata_b, enforces dict type, validates adapter URLs via AdapterProcessor before persisting, supports PLATFORM_PROVIDED_UNSTRACT_KEY (including paid-subscription branch), sets new adapters as user defaults, and surfaces failures as DRF ValidationError.
Notification webhook URL validation
backend/notification_v2/provider/webhook/webhook.py
Calls URLValidator.validate_url during Webhook.validate, logs validation, and raises ValueError with the validator message on invalid URLs (validation now occurs before payload presence check).
Prompt-service postprocessing URL validation
prompt-service/src/unstract/prompt_service/services/answer_prompt.py
Replaces low-level SSRF checks with URLValidator.validate_url for postprocessing webhook URLs; skips postprocessing on invalid URLs and updates metadata with processed highlights when available.
Prompt-service variable replacement defensiveness
prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py
Wraps variable detection/replacement in try/except for BadRequest, logs and publishes an ERROR-level log on failure, and re-raises to halt processing.
Dynamic variable URL validation
prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py and prompt-service/src/unstract/prompt_service/services/variable_replacement.py
Adds URL validation for dynamic variable fetching (URLValidator.validate_url), returns early or raises BadRequest for missing/invalid URLs, and ensures BadRequest propagates from replace_variables_in_prompt.
Environment configuration - backend
backend/sample.env
Adds WHITELISTED_ENDPOINTS (example 10.68.0.10) with comments for whitelisting local/managed adapter endpoints.
Environment configuration - prompt-service
prompt-service/sample.env
Adds WHITELISTED_ENDPOINTS (example 10.68.0.10) with comments for whitelisting local/managed adapter endpoints.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant View as AdapterInstanceViewSet
  participant Crypto as Fernet
  participant Proc as AdapterProcessor
  participant Kit as Adapterkit
  participant Adapter as AdapterClass
  participant DRF as DRF

  Client->>View: POST/PUT /adapters (adapter_id, adapter_metadata_b)
  View->>Crypto: decrypt(adapter_metadata_b)
  Crypto-->>View: decrypted_metadata (dict)
  View->>Proc: validate_adapter_urls(adapter_id, decrypted_metadata)
  Proc->>Kit: get_adapter_cls(adapter_id)
  Kit-->>Proc: AdapterClass
  Proc->>Adapter: instantiate(decrypted_metadata)
  Note right of Adapter: construction triggers URL/config validation (no full connection)
  Adapter-->>Proc: success / raises error
  alt validation error
    Proc--x View: exception
    View->>DRF: raise ValidationError("Error testing '<adapter_name>'. <error>.")
    View--x Client: 400 ValidationError
  else valid
    View->>View: continue create/update (platform-key handling, default assignment)
    View-->>Client: 200/201 OK
  end
sequenceDiagram
  autonumber
  participant Service as PromptService
  participant Validator as URLValidator
  participant Remote as WebhookEndpoint

  Service->>Validator: validate_url(webhook_url)
  alt invalid
    Validator-->>Service: (is_valid=false, error)
    Service->>Service: log and skip postprocessing
  else valid
    Validator-->>Service: (is_valid=true)
    Service->>Remote: POST processed payload
    Remote-->>Service: success/failure
    Service->>Service: update metadata with processed highlights (if any)
  end
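
A corresponding sketch of the two prompt-service consumption patterns shown in this diagram and described in the walkthrough: skip postprocessing on an invalid webhook URL, but raise on an invalid dynamic-variable URL. URLValidator's real checks live in the SDK; validate_url below is a local stand-in and BadRequest a placeholder exception.

class BadRequest(Exception):
    """Placeholder for the prompt-service's BadRequest error."""

def validate_url(url: str) -> tuple[bool, str]:
    # Stand-in for URLValidator.validate_url; the real SSRF checks live in the SDK.
    return (url.startswith("https://"), "only https URLs are allowed")

def run_postprocessing(webhook_url: str) -> bool:
    is_valid, error = validate_url(webhook_url)
    if not is_valid:
        print(f"Skipping postprocessing webhook: {error}")  # log and skip, per the diagram
        return False
    return True  # safe to POST the processed payload

def fetch_dynamic_variable(url: str) -> str:
    is_valid, error = validate_url(url)
    if not is_valid:
        # Fail loudly so replace_variables_in_prompt halts, per the walkthrough.
        raise BadRequest(f"Invalid dynamic variable URL: {error}")
    return "<fetched value>"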

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title is concise and clearly summarizes the primary change—adding URL security validation for adapter configurations—while including the issue key and [FIX] label; it directly matches the main intent and changes described in the PR. It is specific and informative enough for a reviewer scanning history to understand the primary change. Therefore the title meets the repository guidance.
Description Check ✅ Passed The PR description follows the repository template and includes What, Why, How, the "Can this PR break..." section, Database Migrations, Env Config, Related Issues/PRs, Dependencies, Notes on Testing, Screenshots, and the Checklist, with concrete testing evidence and links; this makes the description largely complete and useful for review. The How section appears truncated and should be completed, and Relevant Docs is empty or missing links, which are omissions but non-critical given the provided testing notes and related PR references. Overall the description is sufficiently complete to pass the template check.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between cd129cf and 2f5213f.

📒 Files selected for processing (1)
  • prompt-service/sample.env (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • prompt-service/sample.env


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/adapter_processor_v2/views.py (1)

415-416: Validate URLs on update (backend/adapter_processor_v2/views.py:415–416)
Update() currently skips SSRF URL validation when adapter_metadata_b changes; mirror create() by decrypting and validating URLs before calling super().update.

Apply this diff:

-        # For non-platform-key cases, use the default update behavior
-        return super().update(request, *args, **kwargs)
+        # For non-platform-key cases, pre-validate if metadata is updated
+        adapter = self.get_object()
+        serializer = self.get_serializer(adapter, data=request.data, partial=True)
+        serializer.is_valid(raise_exception=True)
+        adapter_metadata_b = serializer.validated_data.get(AdapterKeys.ADAPTER_METADATA_B)
+        if adapter_metadata_b:
+            from rest_framework.exceptions import ValidationError
+            from cryptography.fernet import Fernet
+            import json
+            try:
+                fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
+                decrypted_json = fernet.decrypt(adapter_metadata_b)
+                decrypted_metadata = json.loads(decrypted_json.decode("utf-8"))
+                AdapterProcessor.validate_adapter_urls(
+                    serializer.validated_data.get(AdapterKeys.ADAPTER_ID),
+                    decrypted_metadata
+                )
+            except SdkError as e:
+                from rest_framework.exceptions import ValidationError
+                adapter_name = (
+                    decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
+                    if isinstance(decrypted_metadata, dict) else "adapter"
+                )
+                raise ValidationError(f"Error testing '{adapter_name}'. {e}") from e
+        return super().update(request, *args, **kwargs)
🧹 Nitpick comments (1)
backend/sample.env (1)

204-207: Tighten ALLOWED_ADAPTER_PRIVATE_ENDPOINTS formatting and scope

  • Remove spaces after commas to prevent parsing issues; for example:
-ALLOWED_ADAPTER_PRIVATE_ENDPOINTS="127.0.0.1, 10.68.0.10"
+ALLOWED_ADAPTER_PRIVATE_ENDPOINTS="127.0.0.1,10.68.0.10"

Ensure your config loader trims whitespace as expected; a minimal parsing sketch follows after this list.

  • Restrict loopback IPs to development only; if you need IPv6 loopback in dev, explicitly add “::1”.
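
A minimal sketch of a loader that tolerates the spacing either way, assuming the allowlist is read from an env var such as WHITELISTED_ENDPOINTS (or ALLOWED_ADAPTER_PRIVATE_ENDPOINTS); the function name is illustrative.

import os

def load_allowlist(var_name: str = "WHITELISTED_ENDPOINTS") -> set[str]:
    """Parse a comma-separated allowlist, trimming whitespace and dropping empty entries."""
    raw = os.environ.get(var_name, "")
    return {entry.strip() for entry in raw.split(",") if entry.strip()}

# "127.0.0.1, 10.68.0.10" and "127.0.0.1,10.68.0.10" both parse to {"127.0.0.1", "10.68.0.10"}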
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4957c58 and 0963f8d.

📒 Files selected for processing (3)
  • backend/adapter_processor_v2/adapter_processor.py (3 hunks)
  • backend/adapter_processor_v2/views.py (3 hunks)
  • backend/sample.env (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/adapter_processor_v2/adapter_processor.py

166-166: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


166-166: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backend/adapter_processor_v2/views.py

204-204: Use explicit conversion flag

Replace with conversion flag

(RUF010)


205-205: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (2)
backend/adapter_processor_v2/views.py (1)

1-1: Imports LGTM

Also applies to: 6-7, 46-46

backend/adapter_processor_v2/adapter_processor.py (1)

139-168: Confirm adapter-level validate_config support and improve logging

  • I didn’t find any validate_config methods in adapter classes—please verify they expose this entrypoint before preferring it, or remove that branch.
  • Switch to logger.exception("URL validation failed for adapter %s: %s", adapter_id, e) to preserve tracebacks and use structured logging rather than f-strings.

@gaya3-zipstack changed the title from "Add URL security validation for adapter configurations" to "USEC-3 [FIX] Add URL security validation for adapter configurations" on Sep 10, 2025
gaya3-zipstack and others added 2 commits September 11, 2025 10:40
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Gayathri <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
backend/adapter_processor_v2/views.py (1)

205-208: Message accuracy and leakage: prefer “validating” over “testing” and consider sanitizing details

Small wording fix; optionally hide raw exception text unless DEBUG to avoid exposing internal network/allowlist details.

-            error_detail = f"Error testing '{adapter_name}'. {e!s}"
+            error_detail = f"Error validating '{adapter_name}' configuration."
+            if settings.DEBUG:
+                error_detail += f" {e!s}"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0963f8d and 4307bf6.

📒 Files selected for processing (1)
  • backend/adapter_processor_v2/views.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/adapter_processor_v2/views.py

191-191: Avoid specifying long messages outside the exception class

(TRY003)


199-199: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (2)
backend/adapter_processor_v2/views.py (2)

183-192: Pre-save SSRF gate on create(): good placement and flow

Validating and decrypting before persistence is the right place to block SSRF. The missing-metadata guard is also correct.


1-7: Config hygiene — add ENCRYPTION_KEY to sample env and docs

settings/base.py requires ENCRYPTION_KEY and multiple modules read settings.ENCRYPTION_KEY (backend/utils/fields.py; backend/adapter_processor_v2/{views,models,serializers,adapter_processor}.py). No sample .env or README entry was found. Add an ENCRYPTION_KEY entry to your sample/.env and document the expected format: a Fernet URL-safe base64 key (generate with cryptography.fernet.Fernet.generate_key()).
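
For reference, a key in that format can be produced with the cryptography package (this is standard Fernet usage, not project-specific code):

from cryptography.fernet import Fernet

# Prints a URL-safe base64 key suitable for the ENCRYPTION_KEY entry in the sample .env
print(f"ENCRYPTION_KEY={Fernet.generate_key().decode()}")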

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/adapter_processor_v2/adapter_processor.py (1)

129-131: Fix potential KeyError when formatting TestAdapterError.
Use .get(...) with a default to avoid masking the original SDK error.

Apply this diff:

-            raise TestAdapterError(
-                e, adapter_name=adapter_metadata[AdapterKeys.ADAPTER_NAME]
-            )
+            raise TestAdapterError(
+                e,
+                adapter_name=adapter_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter"),
+            )
♻️ Duplicate comments (2)
backend/adapter_processor_v2/views.py (2)

436-452: Harden update() decryption/shape/validation and avoid running when metadata isn’t updated.

  • Only decrypt when metadata_b present.
  • Add dict-shape check like create().
  • Make adapter-name extraction robust.

Apply this diff:

-        # Decrypt metadata to get configuration
-        try:
-            fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
-            decrypted_json = fernet.decrypt(adapter_metadata_b)
-            decrypted_metadata = json.loads(decrypted_json.decode("utf-8"))
-        except Exception as e:  # InvalidToken/JSONDecodeError/TypeError/etc.
-            raise ValidationError("Invalid adapter metadata.") from e
-
-        # Validate URLs for this adapter configuration
-        try:
-            _ = AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
-        except Exception as e:
-            # Format error message similar to test adapter API
-            adapter_name = decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
-            error_detail = f"Error testing '{adapter_name}'. {e!s}"
-            raise ValidationError(error_detail) from e
+        if adapter_metadata_b:
+            try:
+                fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
+                decrypted_json = fernet.decrypt(adapter_metadata_b)
+                decrypted_metadata = json.loads(decrypted_json.decode("utf-8"))
+                if not isinstance(decrypted_metadata, dict):
+                    raise ValidationError("Invalid adapter metadata format: expected JSON object.")
+            except Exception as e:  # InvalidToken/JSONDecodeError/TypeError/etc.
+                raise ValidationError("Invalid adapter metadata.") from e
+            try:
+                AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
+            except Exception as e:
+                adapter_name = (
+                    decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
+                    if isinstance(decrypted_metadata, dict)
+                    else "adapter"
+                )
+                error_detail = f"Error testing '{adapter_name}'. {e!s}"
+                raise ValidationError(error_detail) from e

216-236: Remove duplicate URL-validation block.
Validation runs twice; keep one to avoid duplicate work and double-logging.

Apply this diff:

-        # Validate URLs for this adapter configuration
-        try:
-            _ = AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
-        except Exception as e:
-            # Format error message similar to test adapter API
-            adapter_name = (
-                decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
-                if isinstance(decrypted_metadata, dict)
-                else "adapter"
-            )
-            error_detail = f"Error testing '{adapter_name}'. {e!s}"
-            raise ValidationError(error_detail) from e
-
-        # Validate URLs for this adapter configuration
-        try:
-            AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
-        except Exception as e:
-            # Format error message similar to test adapter API
-            adapter_name = decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
-            error_detail = f"Error testing '{adapter_name}'. {e!s}"
-            raise ValidationError(error_detail) from e
+        try:
+            AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
+        except Exception as e:
+            adapter_name = (
+                decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
+                if isinstance(decrypted_metadata, dict)
+                else "adapter"
+            )
+            error_detail = f"Error testing '{adapter_name}'. {e!s}"
+            raise ValidationError(error_detail) from e
🧹 Nitpick comments (6)
backend/adapter_processor_v2/adapter_processor.py (3)

148-177: Use logger.exception and lazy formatting; align docstring with behavior.

  • Prefer logger.exception with %s placeholders (TRY400, RUF010).
  • The docstring says “AdapterError” but the method re-raises generic exceptions; clarify.

Apply this diff:

-        Raises:
-            AdapterError: If URL validation fails due to security violations
+        Raises:
+            Exception: If adapter instantiation/URL validation fails (e.g., security violations)
@@
-        except Exception as e:
-            logger.error(f"URL validation failed for adapter {adapter_id}: {str(e)}")
-            raise
+        except Exception as e:
+            logger.exception("URL validation failed for adapter %s: %s", adapter_id, e)
+            raise

125-125: Remove stale commented code.
# adapter_instance = adapter_class(adapter_metadata) is obsolete after centralizing via validate_adapter_urls.

Apply this diff:

-            # adapter_instance = adapter_class(adapter_metadata)

12-12: Avoid DRF ValidationError in service layer.
Consider raising a domain error (e.g., TestAdapterError) and let views translate to DRF ValidationError to keep layers clean.

backend/adapter_processor_v2/views.py (3)

200-201: Drop inner imports of ValidationError.
Already imported at top; avoid shadowing and repetition.

Apply this diff:

-        from rest_framework.exceptions import ValidationError

414-464: Use appropriate log levels.
The “platform key detected/processing” and “adapter type” logs are flow info, not errors.

Apply this diff:

-            logger.error(f"Platform key flag detected: {use_platform_unstract_key}")
+            logger.info("Platform key flag detected: %s", use_platform_unstract_key)
@@
-            logger.error(f"Adapter type from validated data: {adapter_type}")
+            logger.info("Adapter type from validated data: %s", adapter_type)
@@
-                logger.error("Processing X2TEXT adapter with platform key")
+                logger.info("Processing X2TEXT adapter with platform key")

175-236: DRY: extract “decrypt + shape-check + validate URLs” into a helper.
Both create() and update() repeat the same logic; move into a private method to reduce risk of divergence.

Example (outside the viewset):

def _decrypt_and_validate_metadata(adapter_id: str, token_b: bytes) -> dict:
    fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
    decrypted = json.loads(fernet.decrypt(token_b).decode("utf-8"))
    if not isinstance(decrypted, dict):
        raise ValidationError("Invalid adapter metadata format: expected JSON object.")
    AdapterProcessor.validate_adapter_urls(adapter_id, decrypted)
    return decrypted
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4307bf6 and fb1ea01.

📒 Files selected for processing (2)
  • backend/adapter_processor_v2/adapter_processor.py (4 hunks)
  • backend/adapter_processor_v2/views.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/adapter_processor_v2/adapter_processor.py

175-175: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


175-175: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backend/adapter_processor_v2/views.py

197-197: SyntaxError: Expected an indented block after try statement

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (4)
backend/adapter_processor_v2/adapter_processor.py (3)

9-15: Import reflow looks good.


35-37: Safe optional import pattern retained.


115-126: Good: gate test_connection behind constructor-time URL validation.
This closes the SSRF window on the test path and reuses a single instantiation.

backend/adapter_processor_v2/views.py (1)

121-136: Test endpoint path is fine given Processor now gates instantiation.
No duplication needed here.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/adapter_processor_v2/views.py (1)

207-219: Remove duplicate URL validation call in create()

validate_adapter_urls(...) is invoked twice back-to-back. Keep a single guarded call; also no need to assign to _.

-        # Validate URLs for this adapter configuration
-        try:
-            _ = AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
-        except Exception as e:
-            # Format error message similar to test adapter API
-            adapter_name = (
-                decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
-                if isinstance(decrypted_metadata, dict)
-                else "adapter"
-            )
-            error_detail = f"Error testing '{adapter_name}'. {e!s}"
-            raise ValidationError(error_detail) from e
-
-        # Validate URLs for this adapter configuration
-        try:
-            AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
-        except Exception as e:
-            # Format error message similar to test adapter API
-            adapter_name = decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
-            error_detail = f"Error testing '{adapter_name}'. {e!s}"
-            raise ValidationError(error_detail) from e
+        # Validate URLs for this adapter configuration
+        try:
+            AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
+        except Exception as e:
+            adapter_name = (
+                decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
+                if isinstance(decrypted_metadata, dict)
+                else "adapter"
+            )
+            error_detail = f"Error testing '{adapter_name}'. {e!s}"
+            raise ValidationError(error_detail) from e

Also applies to: 220-227

🧹 Nitpick comments (2)
backend/adapter_processor_v2/views.py (2)

405-405: Use info/debug instead of error for expected control flow logs

These are not error conditions; reduce noise in error logs.

-            logger.error(f"Platform key flag detected: {use_platform_unstract_key}")
+            logger.info(f"Platform key flag detected: {use_platform_unstract_key}")
@@
-            logger.error("Processing adapter with platform key")
+            logger.info("Processing adapter with platform key")
@@
-            logger.error(f"Adapter type from validated data: {adapter_type}")
+            logger.info(f"Adapter type from validated data: {adapter_type}")
@@
-                logger.error("Processing X2TEXT adapter with platform key")
+                logger.info("Processing X2TEXT adapter with platform key")

Also applies to: 445-445, 451-451, 454-454


446-448: Avoid double validation of the same serializer in update()

You already validated earlier; reuse it.

-            serializer = self.get_serializer(adapter, data=request.data, partial=True)
-            serializer.is_valid(raise_exception=True)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between fb1ea01 and 08c5044.

📒 Files selected for processing (1)
  • backend/adapter_processor_v2/views.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/adapter_processor_v2/views.py

192-192: Avoid specifying long messages outside the exception class

(TRY003)


201-203: Abstract raise to an inner function

(TRY301)


201-203: Avoid specifying long messages outside the exception class

(TRY003)


205-205: Avoid specifying long messages outside the exception class

(TRY003)


425-425: Avoid specifying long messages outside the exception class

(TRY003)


433-433: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (2)
backend/adapter_processor_v2/views.py (2)

186-206: Decryption + JSON shape validation LGTM

Good Fernet decrypt, JSON parse, dict-shape check, and 400 mapping.


130-132: Validation centralized — no extra validation needed at this callsite.

AdapterProcessor.test_adapter(...) invokes AdapterProcessor.validate_adapter_urls(adapter_id, adapter_metadata) in backend/adapter_processor_v2/adapter_processor.py.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Gayathri <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/adapter_processor_v2/views.py (1)

220-227: Remove duplicate validation block (same call and message).

This double-validates and may double-log/obscure tracebacks. Keep the earlier block only.

-        # Validate URLs for this adapter configuration
-        try:
-            AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
-        except Exception as e:
-            # Format error message similar to test adapter API
-            adapter_name = decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
-            error_detail = f"Error testing '{adapter_name}'. {e!s}"
-            raise ValidationError(error_detail) from e
#!/bin/bash
# Sanity: ensure there's only one validate_adapter_urls call in create()
rg -nC1 'def create\(' backend/adapter_processor_v2/views.py | sed -n '1,250p'
🧹 Nitpick comments (2)
backend/adapter_processor_v2/views.py (2)

6-7: Import InvalidToken for precise error handling.

Catching the specific Fernet error makes intent clearer and avoids over-broad except Exception.

-from cryptography.fernet import Fernet
+from cryptography.fernet import Fernet, InvalidToken

410-447: DRY the decrypt+validate flow into a helper to reuse across create/update.

Both paths perform identical steps; centralize to reduce risk of drift.

-        if AdapterKeys.ADAPTER_METADATA_B in serializer.validated_data:
-            adapter_id = (
-                serializer.validated_data.get(AdapterKeys.ADAPTER_ID)
-                or adapter.adapter_id
-            )
-            adapter_metadata_b = serializer.validated_data.get(
-                AdapterKeys.ADAPTER_METADATA_B
-            )
-            if not adapter_id or not adapter_metadata_b:
-                raise ValidationError("Missing adapter metadata for validation.")
-            # Decrypt metadata to get configuration
-            try:
-                fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
-                decrypted_json = fernet.decrypt(adapter_metadata_b)
-                decrypted_metadata = json.loads(decrypted_json.decode("utf-8"))
-                if not isinstance(decrypted_metadata, dict):
-                    raise ValidationError(
-                        "Invalid adapter metadata format: expected JSON object."
-                    )
-            except Exception as e:  # InvalidToken/JSONDecodeError/TypeError/etc.
-                raise ValidationError("Invalid adapter metadata.") from e
-            # Validate URLs for this adapter configuration
-            try:
-                AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
-            except Exception as e:
-                adapter_name = (
-                    decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
-                    if isinstance(decrypted_metadata, dict)
-                    else "adapter"
-                )
-                error_detail = f"Error testing '{adapter_name}'. {e!s}"
-                raise ValidationError(error_detail) from e
+        if AdapterKeys.ADAPTER_METADATA_B in serializer.validated_data:
+            adapter_id = serializer.validated_data.get(AdapterKeys.ADAPTER_ID) or adapter.adapter_id
+            _ = self._decrypt_and_validate_metadata(
+                adapter_id, serializer.validated_data.get(AdapterKeys.ADAPTER_METADATA_B)
+            )

Helper (add inside AdapterInstanceViewSet):

def _decrypt_and_validate_metadata(self, adapter_id: str, adapter_metadata_b: bytes) -> dict:
    if not adapter_id:
        raise ValidationError("Missing adapter ID for validation.")
    if not adapter_metadata_b:
        raise ValidationError("Missing adapter metadata for validation.")
    if isinstance(adapter_metadata_b, str):
        adapter_metadata_b = adapter_metadata_b.encode("utf-8")
    try:
        fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
        decrypted_json = fernet.decrypt(adapter_metadata_b)
        decrypted_metadata = json.loads(decrypted_json.decode("utf-8"))
        if not isinstance(decrypted_metadata, dict):
            raise ValidationError("Invalid adapter metadata format: expected JSON object.")
    except Exception as e:
        raise ValidationError("Invalid adapter metadata.") from e
    try:
        AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
    except Exception as e:
        adapter_name = (
            decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
            if isinstance(decrypted_metadata, dict)
            else "adapter"
        )
        raise ValidationError(f"Error testing '{adapter_name}'. {e!s}") from e
    return decrypted_metadata
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 08c5044 and db3acff.

📒 Files selected for processing (1)
  • backend/adapter_processor_v2/views.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/adapter_processor_v2/views.py

192-192: Avoid specifying long messages outside the exception class

(TRY003)


201-203: Abstract raise to an inner function

(TRY301)


201-203: Avoid specifying long messages outside the exception class

(TRY003)


205-205: Avoid specifying long messages outside the exception class

(TRY003)


425-425: Avoid specifying long messages outside the exception class

(TRY003)


432-434: Abstract raise to an inner function

(TRY301)


432-434: Avoid specifying long messages outside the exception class

(TRY003)


436-436: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (4)
backend/adapter_processor_v2/views.py (4)

1-1: LGTM on json import.


20-20: LGTM on DRF ValidationError import.


26-28: LGTM on OrganizationMemberService import.


207-218: URL validation and error shaping look good.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/adapter_processor_v2/views.py (1)

420-451: Re-validate on update() when platform-key path mutates metadata; adjust log level

Mirror create() behavior so URLs are gated after mutation; also use info/debug, not error, for control-flow logs.

-        if use_platform_unstract_key:
-            logger.error("Processing adapter with platform key")
+        if use_platform_unstract_key:
+            logger.info("Processing adapter with platform key")
             adapter_type = serializer.validated_data.get(AdapterKeys.ADAPTER_TYPE)
-            logger.error(f"Adapter type from validated data: {adapter_type}")
+            logger.debug("Adapter type from validated data: %s", adapter_type)
@@
-            self._update_metadata_for_platform_key(
+            updated_b = self._update_metadata_for_platform_key(
                 serializer.validated_data,
                 adapter_type,
                 is_paid_subscription=True,
             )
+            if updated_b is not None:
+                # Validate final, mutated metadata before save
+                final_md = self._decrypt_and_validate_metadata(updated_b)
+                final_adapter_id = (
+                    serializer.validated_data.get(AdapterKeys.ADAPTER_ID)
+                    or adapter.adapter_id
+                )
+                self._validate_adapter_urls(final_adapter_id, final_md)
♻️ Duplicate comments (1)
backend/adapter_processor_v2/views.py (1)

175-192: Accept str or bytes for metadata; normalize before decrypt

Harden decryption input handling and keep 400s instead of 500s on wrong type.

-    def _decrypt_and_validate_metadata(self, adapter_metadata_b: bytes) -> dict[str, Any]:
+    def _decrypt_and_validate_metadata(self, adapter_metadata_b: bytes | str) -> dict[str, Any]:
@@
-        if not adapter_metadata_b:
+        if not adapter_metadata_b:
             raise ValidationError("Missing adapter metadata for validation.")
+        if isinstance(adapter_metadata_b, str):
+            adapter_metadata_b = adapter_metadata_b.encode("utf-8")
@@
-            fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
+            fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
             decrypted_json = fernet.decrypt(adapter_metadata_b)
             decrypted_metadata = json.loads(decrypted_json.decode("utf-8"))
🧹 Nitpick comments (5)
backend/adapter_processor_v2/adapter_processor.py (2)

147-176: Use logging.exception with parameterized args; align docstring with behavior

Leverage lazy formatting and proper traceback; docstring claims AdapterError but you re-raise whatever came from adapter init.

-        Raises:
-            AdapterError: If URL validation fails due to security violations
+        Raises:
+            Exception: Bubbles up adapter initialization/validation errors
@@
-        except Exception as e:
-            logger.error(f"URL validation failed for adapter {adapter_id}: {str(e)}")
-            raise
+        except Exception as e:
+            logger.exception("URL validation failed for adapter %s: %s", adapter_id, e)
+            raise

12-12: Service layer throwing DRF ValidationError couples domains

AdapterProcessor is a service; prefer raising domain exceptions (e.g., ValidationFailed) and map to DRF ValidationError at the API boundary.
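
A hedged sketch of that layering, with a hypothetical domain error translated to DRF's ValidationError only at the view; the names below are illustrative, not the project's actual classes.

from rest_framework.exceptions import ValidationError

class UrlValidationFailed(Exception):
    """Hypothetical domain error raised by the service layer."""

def validate_adapter_urls_service(adapter_id: str, metadata: dict) -> None:
    # Service layer: no DRF imports, just a domain exception.
    if not str(metadata.get("url", "")).startswith("https://"):
        raise UrlValidationFailed(f"Unsafe URL for adapter {adapter_id}")

def create_view_fragment(adapter_id: str, metadata: dict) -> None:
    # API boundary: translate the domain error into a DRF ValidationError (HTTP 400).
    try:
        validate_adapter_urls_service(adapter_id, metadata)
    except UrlValidationFailed as e:
        raise ValidationError(str(e)) from e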

backend/adapter_processor_v2/views.py (3)

204-211: Derive platform-key usage from decrypted metadata, not raw request

Request may omit plaintext metadata; you already decrypt. Recommend using decrypted_metadata in create/update.

Example in create():

-        use_platform_unstract_key = self._check_platform_key_usage(request.data)
@@
-        decrypted_metadata = self._decrypt_and_validate_metadata(adapter_metadata_b)
+        decrypted_metadata = self._decrypt_and_validate_metadata(adapter_metadata_b)
+        use_platform_unstract_key = bool(
+            decrypted_metadata.get(AdapterKeys.PLATFORM_PROVIDED_UNSTRACT_KEY, False)
+        )

Consider deprecating _check_platform_key_usage or changing it to accept decrypted metadata.


212-227: Return updated bytes and signal mutation from _update_metadata_for_platform_key

Today it mutates serializer_validated_data in-place, making revalidation harder to reason about. Prefer returning updated bytes and let callers assign.

-    ) -> None:
+    ) -> bytes | None:
@@
-        if adapter_type == AdapterKeys.X2TEXT:
+        if adapter_type == AdapterKeys.X2TEXT:
             adapter_metadata_b = serializer_validated_data.get(
                 AdapterKeys.ADAPTER_METADATA_B
             )
             updated_metadata_b = AdapterProcessor.update_adapter_metadata(
                 adapter_metadata_b, is_paid_subscription=is_paid_subscription
             )
-            serializer_validated_data[AdapterKeys.ADAPTER_METADATA_B] = updated_metadata_b
+            serializer_validated_data[AdapterKeys.ADAPTER_METADATA_B] = updated_metadata_b
+            return updated_metadata_b
+        return None

289-307: Chain original IntegrityError for better diagnostics

Add exception chaining per Ruff B904.

-        except IntegrityError:
+        except IntegrityError as e:
             raise DuplicateAdapterNameError(
                 name=serializer.validated_data.get(AdapterKeys.ADAPTER_NAME)
-            )
+            ) from e
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between db3acff and 9310c68.

📒 Files selected for processing (2)
  • backend/adapter_processor_v2/adapter_processor.py (4 hunks)
  • backend/adapter_processor_v2/views.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/adapter_processor_v2/adapter_processor.py

174-174: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


174-174: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backend/adapter_processor_v2/views.py

178-178: Avoid specifying long messages outside the exception class

(TRY003)


186-188: Abstract raise to an inner function

(TRY301)


186-188: Avoid specifying long messages outside the exception class

(TRY003)


189-189: Consider moving this statement to an else block

(TRY300)


191-191: Avoid specifying long messages outside the exception class

(TRY003)


268-268: Avoid specifying long messages outside the exception class

(TRY003)


305-307: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (4)
backend/adapter_processor_v2/adapter_processor.py (1)

169-172: Instantiation-as-validation can have side effects

backend/adapter_processor_v2/adapter_processor.py (ln 169–172) returns adapter_class(adapter_metadata). If any adapter __init__ performs network I/O, SSRF checks won't gate it; verify all adapter constructors are side-effect free and only validate config/URLs. If any constructor does I/O, add an SDK method (e.g., validate_config(validate_urls_only=True)) and call that here instead of instantiating; a sketch of that shape follows below.

Auto-scan didn't locate adapter implementations (repo search returned only backend/adapter_processor_v2/* and tool-registry files). Re-check locally for adapter classes and network calls in __init__, e.g.:

rg -n --hidden -S --type py "class .*Adapter"
rg -n --hidden -S --type py "requests|httpx|urllib|socket|aiohttp|urllib3|boto3|botocore" -C3

backend/adapter_processor_v2/views.py (3)

193-203: LGTM: Centralized URL validation and consistent error shaping

Good: funnels adapter errors to DRF ValidationError with safe adapter-name lookup.


228-251: LGTM: Default-adapter bootstrap is correct and idempotent

Mapping and get_or_create flow look good.


252-274: LGTM: Update-time validation only when metadata is present

Nice guard; avoids UnboundLocal errors and unnecessary decryptions.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/adapter_processor_v2/views.py (1)

420-448: Update: use decrypted flag and re-validate after platform-key mutation; lower log level.

Ensures SSRF gate covers mutated config; avoids noisy error logs for normal flow.

Apply:

-        use_platform_unstract_key = self._check_platform_key_usage(request.data)
+        use_platform_unstract_key = self._check_platform_key_usage(request.data)
         adapter = self.get_object()
 
         # Get serializer and validate data
         serializer = self.get_serializer(adapter, data=request.data, partial=True)
         serializer.is_valid(raise_exception=True)
 
-        # Validate metadata if being updated
-        _, _ = self._validate_update_metadata(serializer.validated_data, adapter)
+        # Validate metadata if being updated; capture for platform-key decision
+        adapter_id, decrypted_md = self._validate_update_metadata(
+            serializer.validated_data, adapter
+        )
 
         # Handle platform key updates
-        if use_platform_unstract_key:
-            logger.error("Processing adapter with platform key")
+        # Prefer the flag from decrypted metadata when available
+        if decrypted_md is not None:
+            use_platform_unstract_key = bool(
+                decrypted_md.get(AdapterKeys.PLATFORM_PROVIDED_UNSTRACT_KEY, False)
+            )
+        if use_platform_unstract_key:
+            logger.info("Processing adapter with platform key")
             adapter_type = serializer.validated_data.get(AdapterKeys.ADAPTER_TYPE)
-            logger.error(f"Adapter type from validated data: {adapter_type}")
+            logger.debug("Adapter type from validated data: %s", adapter_type)
 
             # Update metadata for platform key usage
             self._update_metadata_for_platform_key(
                 serializer.validated_data,
                 adapter_type,
                 is_paid_subscription=True,
             )
 
+            # Re-validate mutated metadata before save
+            updated_b = serializer.validated_data.get(AdapterKeys.ADAPTER_METADATA_B)
+            if updated_b and adapter_id:
+                final_md = self._decrypt_and_validate_metadata(updated_b)
+                self._validate_adapter_urls(adapter_id, final_md)
+
             # Save and return updated instance
             serializer.save()
             return Response(serializer.data)
 
         # For non-platform-key cases, use the default update behavior
         return super().update(request, *args, **kwargs)
♻️ Duplicate comments (1)
backend/adapter_processor_v2/views.py (1)

275-303: Create: add adapter_id guard, derive platform-key from decrypted metadata, and re-validate after mutation.

Prevents 500s/false negatives and ensures SSRF gate applies to final saved config. Also chain IntegrityError.

Apply:

-    def create(self, request: Any) -> Response:
-        serializer = self.get_serializer(data=request.data)
-        use_platform_unstract_key = self._check_platform_key_usage(request.data)
+    def create(self, request: Request) -> Response:
+        serializer = self.get_serializer(data=request.data)
 
         serializer.is_valid(raise_exception=True)
 
         # Extract and validate metadata
         adapter_id = serializer.validated_data.get(AdapterKeys.ADAPTER_ID)
         adapter_metadata_b = serializer.validated_data.get(AdapterKeys.ADAPTER_METADATA_B)
+        if not adapter_id:
+            raise ValidationError("Missing adapter ID for validation.")
         decrypted_metadata = self._decrypt_and_validate_metadata(adapter_metadata_b)
 
         # Validate URLs for security
         self._validate_adapter_urls(adapter_id, decrypted_metadata)
 
         try:
             adapter_type = serializer.validated_data.get(AdapterKeys.ADAPTER_TYPE)
 
-            # Update metadata if using platform key
-            if use_platform_unstract_key:
+            # Decide platform-key usage from decrypted metadata to avoid false negatives
+            use_platform_unstract_key = bool(
+                decrypted_metadata.get(AdapterKeys.PLATFORM_PROVIDED_UNSTRACT_KEY, False)
+            )
+            # Update metadata if using platform key, then re-validate the final payload
+            if use_platform_unstract_key:
                 self._update_metadata_for_platform_key(
                     serializer.validated_data, adapter_type
                 )
+                updated_b = serializer.validated_data.get(AdapterKeys.ADAPTER_METADATA_B)
+                if updated_b:
+                    final_md = self._decrypt_and_validate_metadata(updated_b)
+                    self._validate_adapter_urls(adapter_id, final_md)
 
             # Save the adapter instance
             instance = serializer.save()
 
             # Set as default adapter if needed
             self._set_default_adapter_if_needed(instance, adapter_type, request.user.id)
 
-        except IntegrityError:
-            raise DuplicateAdapterNameError(
+        except IntegrityError as e:
+            raise DuplicateAdapterNameError(
                 name=serializer.validated_data.get(AdapterKeys.ADAPTER_NAME)
-            )
+            ) from e
🧹 Nitpick comments (3)
backend/adapter_processor_v2/views.py (3)

175-192: Normalize encrypted payload type before decryption.

Handle str tokens to avoid TypeError when clients send base64 tokens as text; keep behavior identical for bytes.

Apply:

-    def _decrypt_and_validate_metadata(self, adapter_metadata_b: bytes) -> dict[str, Any]:
+    def _decrypt_and_validate_metadata(self, adapter_metadata_b: bytes | str) -> dict[str, Any]:
         """Decrypt adapter metadata and validate its format."""
         if not adapter_metadata_b:
             raise ValidationError("Missing adapter metadata for validation.")
 
         try:
+            if isinstance(adapter_metadata_b, str):
+                adapter_metadata_b = adapter_metadata_b.encode("utf-8")
             fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
             decrypted_json = fernet.decrypt(adapter_metadata_b)
             decrypted_metadata = json.loads(decrypted_json.decode("utf-8"))

204-211: Platform-key detection may miss encrypted-only payloads.

This reads plaintext AdapterKeys.ADAPTER_METADATA from request.data; if clients only send encrypted metadata_b, detection returns False. Prefer deriving from decrypted metadata in create/update (see diffs below).

Would clients ever omit plaintext AdapterKeys.ADAPTER_METADATA in create/update?


252-274: Update validation helper: OK (minor nits).

Logic is sound. Small nit: callers don’t need the tuple; they can ignore the return or capture it (see update() change).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 9310c68 and ea273a1.

📒 Files selected for processing (1)
  • backend/adapter_processor_v2/views.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/adapter_processor_v2/views.py

178-178: Avoid specifying long messages outside the exception class

(TRY003)


186-188: Abstract raise to an inner function

(TRY301)


186-188: Avoid specifying long messages outside the exception class

(TRY003)


189-189: Consider moving this statement to an else block

(TRY300)


191-191: Avoid specifying long messages outside the exception class

(TRY003)


268-268: Avoid specifying long messages outside the exception class

(TRY003)


305-307: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (3)
backend/adapter_processor_v2/views.py (3)

1-1: Imports: looks good.

New imports (Fernet, settings, ValidationError, OrganizationMemberService) are appropriate for the added logic.

Also applies to: 6-7, 20-20, 26-28


193-203: URL validation wrapper is fine.

Catches SDK errors and maps to DRF ValidationError with context; chaining preserved.


228-251: Default-adapter setter: OK.

Simple, idempotent mapping; no concerns.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/adapter_processor_v2/adapter_processor.py (1)

115-125: Pre‑validation is right; avoid mutating caller metadata and guard missing ADAPTER_TYPE.

adapter_metadata.pop(AdapterKeys.ADAPTER_TYPE) will KeyError and you’re mutating the caller’s dict before validation. Make a defensive copy and pop with a default; if absent, raise a clean ValidationError. Also use safe .get when building the TestAdapterError below.

Apply:

 def test_adapter(adapter_id: str, adapter_metadata: dict[str, Any]) -> bool:
   logger.info(f"Testing adapter: {adapter_id}")
   try:
-      if adapter_metadata.pop(AdapterKeys.ADAPTER_TYPE) == AdapterKeys.X2TEXT:
+      # Defensive copy; do not mutate caller dict
+      adapter_metadata = dict(adapter_metadata)
+      adapter_type = adapter_metadata.pop(AdapterKeys.ADAPTER_TYPE, None)
+      if adapter_type == AdapterKeys.X2TEXT:
           if (
               adapter_metadata.get(AdapterKeys.PLATFORM_PROVIDED_UNSTRACT_KEY)
               and add_unstract_key
           ):
               adapter_metadata = add_unstract_key(adapter_metadata)
           adapter_metadata[X2TextConstants.X2TEXT_HOST] = settings.X2TEXT_HOST
           adapter_metadata[X2TextConstants.X2TEXT_PORT] = settings.X2TEXT_PORT
           platform_key = PlatformAuthenticationService.get_active_platform_key()
           adapter_metadata[X2TextConstants.PLATFORM_SERVICE_API_KEY] = str(
               platform_key.key
           )
+      elif adapter_type is None:
+          adapter_name = adapter_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
+          raise ValidationError(
+              f"Missing '{AdapterKeys.ADAPTER_TYPE}' for '{adapter_name}'."
+          )
@@
-      except SdkError as e:
-          raise TestAdapterError(
-              e, adapter_name=adapter_metadata[AdapterKeys.ADAPTER_NAME]
-          )
+      except SdkError as e:
+          raise TestAdapterError(
+              e, adapter_name=adapter_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
+          )
🧹 Nitpick comments (1)
backend/adapter_processor_v2/adapter_processor.py (1)

174-176: Ruff RUF010: avoid str(e) in f-strings.

Use conversion flags or logger’s %s formatting; your updated logger.exception above addresses this.
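
A small illustration of both hints together: the !s conversion flag inside the f-string (RUF010) and logger.exception with lazy %s formatting (TRY400), which preserves the traceback.

import logging

logger = logging.getLogger(__name__)

def demo(adapter_id: str) -> None:
    try:
        raise ValueError("unsafe URL")
    except Exception as e:
        detail = f"Error testing adapter. {e!s}"  # conversion flag instead of str(e)
        logger.exception("URL validation failed for adapter %s", adapter_id)  # keeps traceback
        raise RuntimeError(detail) from e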

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between ea273a1 and 6985184.

📒 Files selected for processing (2)
  • backend/adapter_processor_v2/adapter_processor.py (4 hunks)
  • backend/sample.env (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/sample.env
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/adapter_processor_v2/adapter_processor.py

175-175: Use explicit conversion flag

Replace with conversion flag

(RUF010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (3)
backend/adapter_processor_v2/adapter_processor.py (3)

9-15: Import changes look good.

DRF ValidationError import and the multiline import style are fine.


34-37: Plugin import reflow is fine.

Keeps readability without behavior change.


147-179: Good separation: dedicated URL‑validation path.

Returning a pre‑validated Adapter instance to reuse for test_connection() is a clean design.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Gayathri <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/adapter_processor_v2/views.py (1)

425-456: Add post-mutation URL re-validation in update() for platform-key path

Platform-key mutation may alter URLs. Re-decrypt and re-validate before save to keep SSRF gate intact.

         # Handle platform key updates
         if use_platform_unstract_key:
             logger.error("Processing adapter with platform key")
             adapter_type = serializer.validated_data.get(AdapterKeys.ADAPTER_TYPE)
             logger.error(f"Adapter type from validated data: {adapter_type}")

             # Update metadata for platform key usage
-            self._update_metadata_for_platform_key(
+            updated_b = self._update_metadata_for_platform_key(
                 serializer.validated_data,
                 adapter_type,
                 is_paid_subscription=True,
             )
+
+            # Re-validate post-mutation metadata before persisting
+            reval_b = (
+                updated_b
+                if updated_b is not None
+                else serializer.validated_data.get(AdapterKeys.ADAPTER_METADATA_B)
+            )
+            if reval_b:
+                final_md = self._decrypt_and_validate_metadata(reval_b)
+                self._validate_adapter_urls(
+                    serializer.validated_data.get(AdapterKeys.ADAPTER_ID)
+                    or adapter.adapter_id,
+                    final_md,
+                )

             # Save and return updated instance
             serializer.save()
             return Response(serializer.data)
♻️ Duplicate comments (2)
backend/adapter_processor_v2/views.py (2)

212-227: Bug: helper always returns None; callers expect bytes for re-validation

_update_metadata_for_platform_key mutates serializer_validated_data but returns None. Callers use its return to trigger post-mutation re-validation, which is then silently skipped. Return the updated blob.

-    def _update_metadata_for_platform_key(
+    def _update_metadata_for_platform_key(
         self,
         serializer_validated_data: dict[str, Any],
         adapter_type: str,
         is_paid_subscription: bool = False,
-    ) -> None:
+    ) -> bytes | None:
         """Update adapter metadata when using platform key."""
         if adapter_type == AdapterKeys.X2TEXT:
             adapter_metadata_b = serializer_validated_data.get(
                 AdapterKeys.ADAPTER_METADATA_B
             )
             updated_metadata_b = AdapterProcessor.update_adapter_metadata(
                 adapter_metadata_b, is_paid_subscription=is_paid_subscription
             )
             serializer_validated_data[AdapterKeys.ADAPTER_METADATA_B] = updated_metadata_b
+            return updated_metadata_b
+        return None

275-301: Critical: post-mutation URL validation is currently bypassed in create()

Because the helper returns None, the re-validation never runs. Even after fixing the helper to return bytes, keep the check robust.

         # Update metadata if using platform key
         if use_platform_unstract_key:
-            updated_b = self._update_metadata_for_platform_key(
+            updated_b = self._update_metadata_for_platform_key(
                 serializer.validated_data, adapter_type
             )
-            if updated_b is not None:
-                # Re-validate post-mutation metadata before save
-                final_md = self._decrypt_and_validate_metadata(updated_b)
-                self._validate_adapter_urls(adapter_id, final_md)
+            # Re-validate post-mutation metadata before save (guard either path)
+            reval_b = (
+                updated_b
+                if updated_b is not None
+                else serializer.validated_data.get(AdapterKeys.ADAPTER_METADATA_B)
+            )
+            if reval_b:
+                final_md = self._decrypt_and_validate_metadata(reval_b)
+                self._validate_adapter_urls(adapter_id, final_md)
🧹 Nitpick comments (5)
backend/adapter_processor_v2/views.py (5)

175-192: Normalize metadata to bytes before decrypting (accept str) and keep error surface stable

If adapter_metadata_b arrives as a str (serializer/transport variance), Fernet will raise TypeError. Normalize to bytes for predictable 400s.

 def _decrypt_and_validate_metadata(self, adapter_metadata_b: bytes) -> dict[str, Any]:
     """Decrypt adapter metadata and validate its format."""
     if not adapter_metadata_b:
         raise ValidationError("Missing adapter metadata for validation.")

     try:
-        fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
-        decrypted_json = fernet.decrypt(adapter_metadata_b)
+        if isinstance(adapter_metadata_b, str):
+            adapter_metadata_b = adapter_metadata_b.encode("utf-8")
+        fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
+        decrypted_json = fernet.decrypt(adapter_metadata_b)
         decrypted_metadata = json.loads(decrypted_json.decode("utf-8"))

         if not isinstance(decrypted_metadata, dict):
             raise ValidationError(
                 "Invalid adapter metadata format: expected JSON object."
             )
         return decrypted_metadata
     except Exception as e:
         raise ValidationError("Invalid adapter metadata.") from e

193-203: Tighten exception scope or log original error; include adapter_id in detail

Catching bare Exception obscures failure mode. If feasible, catch the SDK’s specific error type; at minimum, log and include adapter_id in the user-facing detail for quicker triage.

     try:
         AdapterProcessor.validate_adapter_urls(adapter_id, decrypted_metadata)
     except Exception as e:
-        adapter_name = decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
-        error_detail = f"Error testing '{adapter_name}'. {e!s}"
+        adapter_name = decrypted_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
+        error_detail = (
+            f"Error testing '{adapter_name}' (adapter_id={adapter_id}). {e!s}"
+        )
         raise ValidationError(error_detail) from e

252-274: Update-path validation looks good; minor guard suggestion

LGTM on decrypt+URL-validate only when metadata is present. Consider also asserting adapter_id truthiness before calling _validate_adapter_urls for clearer 400s if upstream validation ever loosens.
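
For illustration, a minimal sketch of the suggested guard, assuming the private helper names used in this view (_decrypt_and_validate_metadata, _validate_adapter_urls); the standalone function and its name are hypothetical:

from typing import Any

from rest_framework.exceptions import ValidationError


def guard_and_validate_urls(
    view: Any, adapter_id: str | None, adapter_metadata_b: bytes | None
) -> None:
    """Require an adapter_id before running URL validation on the update path."""
    if not adapter_id:
        # Surface a clear 400 instead of passing None downstream
        raise ValidationError("Missing adapter ID for validation.")
    if adapter_metadata_b:
        decrypted_metadata = view._decrypt_and_validate_metadata(adapter_metadata_b)
        view._validate_adapter_urls(adapter_id, decrypted_metadata)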


308-315: Remove duplicate IntegrityError handler; unreachable and flagged by Ruff (B025/B904)

Two except blocks for the same exception; the second is dead code and lacks exception chaining.

-        except IntegrityError as e:
-            raise DuplicateAdapterNameError(
-                name=serializer.validated_data.get(AdapterKeys.ADAPTER_NAME)
-            ) from e
-        except IntegrityError:
-            raise DuplicateAdapterNameError(
-                name=serializer.validated_data.get(AdapterKeys.ADAPTER_NAME)
-            )
+        except IntegrityError as e:
+            raise DuplicateAdapterNameError(
+                name=serializer.validated_data.get(AdapterKeys.ADAPTER_NAME)
+            ) from e

281-288: Guard adapter_id presence before URL validation

If adapter_id is missing/empty, surfacing a clear 400 is preferable to passing None downstream.

         # Extract and validate metadata
         adapter_id = serializer.validated_data.get(AdapterKeys.ADAPTER_ID)
         adapter_metadata_b = serializer.validated_data.get(AdapterKeys.ADAPTER_METADATA_B)
         decrypted_metadata = self._decrypt_and_validate_metadata(adapter_metadata_b)

         # Validate URLs for security (pre-mutation)
-        self._validate_adapter_urls(adapter_id, decrypted_metadata)
+        if not adapter_id:
+            raise ValidationError("Missing adapter ID for validation.")
+        self._validate_adapter_urls(adapter_id, decrypted_metadata)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 6985184 and 082f56f.

📒 Files selected for processing (1)
  • backend/adapter_processor_v2/views.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/adapter_processor_v2/views.py

178-178: Avoid specifying long messages outside the exception class

(TRY003)


186-188: Abstract raise to an inner function

(TRY301)


186-188: Avoid specifying long messages outside the exception class

(TRY003)


189-189: Consider moving this statement to an else block

(TRY300)


191-191: Avoid specifying long messages outside the exception class

(TRY003)


268-268: Avoid specifying long messages outside the exception class

(TRY003)


312-312: try-except block with duplicate exception IntegrityError

(B025)


313-315: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (1)
backend/adapter_processor_v2/views.py (1)

121-136: Validation confirmed: test_adapter() calls validate_adapter_urls before any network I/O. validate_adapter_urls(...) is invoked at backend/adapter_processor_v2/adapter_processor.py:117–119; adapter_instance.test_connection() (the network call) runs afterward at line 125.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
backend/adapter_processor_v2/adapter_processor.py (1)

102-116: Guard against missing ADAPTER_TYPE; avoid KeyError and make failure user-facing

pop(AdapterKeys.ADAPTER_TYPE) will 500 when the field is absent. Use a safe pop and raise a DRF ValidationError when type is missing; keep X2TEXT branch intact.

-            # Defensive copy; don't mutate caller dict
-            adapter_metadata = dict(adapter_metadata)
-            if adapter_metadata.pop(AdapterKeys.ADAPTER_TYPE) == AdapterKeys.X2TEXT:
+            # Defensive copy; don't mutate caller dict
+            adapter_metadata = dict(adapter_metadata)
+            adapter_type = adapter_metadata.pop(AdapterKeys.ADAPTER_TYPE, None)
+            if adapter_type is None:
+                raise ValidationError("Missing adapter type.")
+            if adapter_type == AdapterKeys.X2TEXT:
backend/adapter_processor_v2/views.py (2)

286-297: Re-validate post-mutation metadata before save (close SSRF gap)
Platform-key mutation can alter URLs. Re-run decrypt + URL validation on the final payload.

         # Validate URLs for security (pre-mutation)
         self._validate_adapter_urls(adapter_id, decrypted_metadata)
 
         try:
             adapter_type = serializer.validated_data.get(AdapterKeys.ADAPTER_TYPE)
 
             # Update metadata if using platform key
             if use_platform_unstract_key:
                 self._update_metadata_for_platform_key(
                     serializer.validated_data, adapter_type
                 )
+                # Re-validate final (mutated) metadata before persisting
+                final_b = serializer.validated_data.get(AdapterKeys.ADAPTER_METADATA_B)
+                final_md = self._decrypt_and_validate_metadata(final_b)
+                self._validate_adapter_urls(adapter_id, final_md)
 
             # Save the adapter instance
             instance = serializer.save()

427-445: Re-validate mutated metadata in platform-key update path
After _update_metadata_for_platform_key, decrypt and validate again prior to saving.

         # Handle platform key updates
         if use_platform_unstract_key:
-            logger.error("Processing adapter with platform key")
+            logger.info("Processing adapter with platform key")
             adapter_type = serializer.validated_data.get(AdapterKeys.ADAPTER_TYPE)
-            logger.error(f"Adapter type from validated data: {adapter_type}")
+            logger.debug("Adapter type from validated data: %s", adapter_type)
 
             # Update metadata for platform key usage
             self._update_metadata_for_platform_key(
                 serializer.validated_data,
                 adapter_type,
                 is_paid_subscription=True,
             )
 
+            # Re-validate final (mutated) metadata before save
+            final_b = serializer.validated_data.get(AdapterKeys.ADAPTER_METADATA_B)
+            adapter_id = serializer.validated_data.get(AdapterKeys.ADAPTER_ID) or adapter.adapter_id
+            final_md = self._decrypt_and_validate_metadata(final_b)
+            self._validate_adapter_urls(adapter_id, final_md)
+
             # Save and return updated instance
             serializer.save()
             return Response(serializer.data)
🧹 Nitpick comments (5)
backend/adapter_processor_v2/adapter_processor.py (3)

117-129: Use try/except/else for clearer flow (silences TRY300)
Return from the else block; keeps adapter_instance scoped to success path.

-            # Validate URLs for this adapter configuration
-            try:
+            # Validate URLs for this adapter configuration
+            try:
                 adapter_instance = AdapterProcessor.validate_adapter_urls(
                     adapter_id, adapter_metadata
                 )
-            except Exception as e:
+            except Exception as e:
                 # Format error message similar to test adapter API
                 adapter_name = adapter_metadata.get(AdapterKeys.ADAPTER_NAME, "adapter")
                 error_detail = f"Error testing '{adapter_name}'. {e!s}"
                 raise ValidationError(error_detail) from e
-            test_result: bool = adapter_instance.test_connection()
-            return test_result
+            else:
+                return adapter_instance.test_connection()

100-101: Prefer parameterized logging
Avoid f-strings with loggers; enables lazy formatting.

-        logger.info(f"Testing adapter: {adapter_id}")
+        logger.info("Testing adapter: %s", adapter_id)

150-181: Tighten typing, fix docstring ‘Raises’, and use logger.exception (RUF010)

  • Use dict[str, Any].
  • Docstring mentions AdapterError but the function raises generic exceptions.
  • Switch to logger.exception("... %s", adapter_id) and drop eager str(e).
-    def validate_adapter_urls(adapter_id: str, adapter_metadata: dict) -> Adapter:
+    def validate_adapter_urls(adapter_id: str, adapter_metadata: dict[str, Any]) -> Adapter:
@@
-        Raises:
-            AdapterError: If URL validation fails due to security violations
+        Raises:
+            Exception: If URL validation fails due to security violations.
         """
         try:
             # Get the adapter class
-            adapterkit = Adapterkit()
-            adapter_class = adapterkit.get_adapter_class_by_adapter_id(adapter_id)
+            adapter_kit = Adapterkit()
+            adapter_class = adapter_kit.get_adapter_class_by_adapter_id(adapter_id)
@@
-        except Exception as e:
-            logger.error(
-                f"URL validation failed for adapter {adapter_id}: {str(e)}",
-                exc_info=True,
-            )
-            raise
+        except Exception:
+            logger.exception("URL validation failed for adapter %s", adapter_id)
+            raise
backend/adapter_processor_v2/views.py (2)

175-191: Normalize str/bytes before decrypt; broaden input type
If clients send base64/utf-8 string tokens, decrypt will fail. Accept bytes | str and encode when needed.

-    def _decrypt_and_validate_metadata(self, adapter_metadata_b: bytes) -> dict[str, Any]:
+    def _decrypt_and_validate_metadata(self, adapter_metadata_b: bytes | str) -> dict[str, Any]:
         """Decrypt adapter metadata and validate its format."""
         if not adapter_metadata_b:
             raise ValidationError("Missing adapter metadata for validation.")
 
         try:
+            if isinstance(adapter_metadata_b, str):
+                adapter_metadata_b = adapter_metadata_b.encode("utf-8")
             fernet = Fernet(settings.ENCRYPTION_KEY.encode("utf-8"))
             decrypted_json = fernet.decrypt(adapter_metadata_b)
             decrypted_metadata = json.loads(decrypted_json.decode("utf-8"))

304-307: Preserve exception context when translating IntegrityError (B904)
Chain the original error.

-        except IntegrityError:
-            raise DuplicateAdapterNameError(
+        except IntegrityError as e:
+            raise DuplicateAdapterNameError(
                 name=serializer.validated_data.get(AdapterKeys.ADAPTER_NAME)
-            )
+            ) from e
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 082f56f and d78e5fb.

📒 Files selected for processing (2)
  • backend/adapter_processor_v2/adapter_processor.py (5 hunks)
  • backend/adapter_processor_v2/views.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/adapter_processor_v2/adapter_processor.py

128-128: Consider moving this statement to an else block

(TRY300)


178-178: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backend/adapter_processor_v2/views.py

178-178: Avoid specifying long messages outside the exception class

(TRY003)


186-188: Abstract raise to an inner function

(TRY301)


186-188: Avoid specifying long messages outside the exception class

(TRY003)


189-189: Consider moving this statement to an else block

(TRY300)


191-191: Avoid specifying long messages outside the exception class

(TRY003)


268-268: Avoid specifying long messages outside the exception class

(TRY003)


305-307: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (1)
backend/adapter_processor_v2/views.py (1)

193-203: Surface adapter name safely in errors — LGTM
Error shaping is consistent with test flow; chaining preserved.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
prompt-service/sample.env (1)

69-71: Default to empty allowlist and document format (IPs/CIDRs/hosts).

Avoid shipping a hardcoded IP in sample.env; show an example instead and keep default safe.

Apply:

-# Whitelisting 10.68.0.10 to allow frictionless adapter connection to
-# managed Postgres for VectorDB
-WHITELISTED_ENDPOINTS="10.68.0.10"
+# Optional: comma-separated allowlist of adapter endpoints (IPs, CIDRs, or hostnames).
+# Example: "10.68.0.10,172.16.0.0/12,db.example.com"
+WHITELISTED_ENDPOINTS=""
prompt-service/src/unstract/prompt_service/services/answer_prompt.py (1)

292-309: Tighten exception handling and make webhook timeout configurable.

Catching Exception is broad (Ruff BLE001). Use specific exceptions and read timeout from env for ops control.

Apply:

-                                processed_data, updated_highlight_data = postprocess_data(
+                                processed_data, updated_highlight_data = postprocess_data(
                                     parsed_data,
                                     webhook_enabled=True,
                                     webhook_url=webhook_url,
                                     highlight_data=highlight_data,
-                                    timeout=60,
+                                    timeout=int(get_env_or_die("POSTPROCESS_WEBHOOK_TIMEOUT", "60")),
                                 )
-                            except Exception as e:
+                            except (requests.exceptions.RequestException, TimeoutError, ValueError) as e:
                                 app.logger.warning(
                                     f"Postprocessing webhook failed: {e}. Using unprocessed data."
                                 )

Add import:

import requests  # at top-level imports
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between d78e5fb and 0b74f6f.

📒 Files selected for processing (4)
  • backend/notification_v2/provider/webhook/webhook.py (2 hunks)
  • backend/sample.env (1 hunks)
  • prompt-service/sample.env (1 hunks)
  • prompt-service/src/unstract/prompt_service/services/answer_prompt.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/sample.env
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/notification_v2/provider/webhook/webhook.py

60-60: Avoid specifying long messages outside the exception class

(TRY003)

prompt-service/src/unstract/prompt_service/services/answer_prompt.py

306-306: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (3)
prompt-service/src/unstract/prompt_service/services/answer_prompt.py (1)

17-17: Verify unstract-sdk contains URLValidator and align version pins

  • Imports using URLValidator found at:
    • prompt-service/src/unstract/prompt_service/services/answer_prompt.py:17
    • backend/notification_v2/provider/webhook/webhook.py:10
  • prompt-service/pyproject.toml pins "unstract-sdk[azure]~=0.77.1" (the top-level project and most services also use ~=0.77.1); tools/structure/requirements.txt contains "unstract-sdk[aws]~=0.74.0".
  • Action: confirm that the runtime package version (v0.77.1) exports unstract.sdk.adapters.url_validator.URLValidator; if it does not, pin/bump the SDK where these services run and update any inconsistent requirement files/CI manifests.
prompt-service/sample.env (1)

68-71: Standardize env var: ALLOWED_ADAPTER_PRIVATE_ENDPOINTS vs WHITELISTED_ENDPOINTS

WHITELISTED_ENDPOINTS is defined in sample envs but ALLOWED_ADAPTER_PRIVATE_ENDPOINTS is not present in the repo; URLValidator is imported from the external unstract SDK (no implementation in-tree), so confirm which env name the SDK reads and either rename the envs or support both.

Locations: prompt-service/sample.env:71, backend/sample.env:207; URLValidator imported at prompt-service/src/unstract/prompt_service/services/answer_prompt.py and backend/notification_v2/provider/webhook/webhook.py.
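
For illustration, a minimal sketch of supporting both names during the transition, assuming the allowlist is a comma-separated string; the helper name is hypothetical and the final env name should follow whatever the SDK actually reads:

import os


def get_endpoint_allowlist() -> list[str]:
    """Read the endpoint allowlist from either env var, preferring WHITELISTED_ENDPOINTS."""
    raw = os.environ.get("WHITELISTED_ENDPOINTS") or os.environ.get(
        "ALLOWED_ADAPTER_PRIVATE_ENDPOINTS", ""
    )
    return [entry.strip() for entry in raw.split(",") if entry.strip()]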

backend/notification_v2/provider/webhook/webhook.py (1)

10-10: Ensure SDK roll includes URLValidator and update dependent pins

backend/notification_v2/provider/webhook/webhook.py and prompt-service/src/unstract/prompt_service/services/answer_prompt.py import URLValidator and both are pinned to unstract-sdk~=0.77.1 — confirm that v0.77.1 on PyPI includes unstract.sdk.adapters.url_validator. (pypi.org)

Repos still on older SDK pins (unstract/workflow-execution ~=0.70.0; tools/structure ~=0.74.0) — if they will use URLValidator, publish a release that contains it or bump those pins.
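
As a quick, illustrative check that the installed SDK actually exposes the validator (run inside the service's environment; the module path follows the import used in this PR):

# Illustrative check only
import importlib

module = importlib.import_module("unstract.sdk.adapters.url_validator")
assert hasattr(module, "URLValidator"), "URLValidator missing from installed unstract-sdk"
print(module.URLValidator)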

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Gayathri <[email protected]>

| filepath | function | passed | SUBTOTAL |
| --- | --- | --- | --- |
| runner/src/unstract/runner/clients/test_docker.py | test_logs | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup_skip | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_client_init | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_exists | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config_without_mount | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_run_container | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_for_sidecar | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_sidecar_container | 1 | 1 |
| TOTAL | | 11 | 11 |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/notification_v2/provider/webhook/webhook.py (2)

55-61: Order of validation operations and logging placement needs adjustment.

Two observations:

  1. The logging statement runs after the validation call, which makes debugging harder.
  2. The URL validation runs before the payload validation, which preserves the security-first ordering (good).

Apply this diff to fix the logging order:

 
+        logger.info("Validating webhook URL.")
         # Validate webhook URL for security
         is_valid, error_message = URLValidator.validate_url(self.notification.url)
-        logger.info("Validating webhook URL.")
         if not is_valid:
             raise ValueError(f"Webhook URL validation failed: {error_message}")

60-60: Consider extracting the error message to avoid the TRY003 lint warning.

The static analysis tool suggests avoiding long messages outside the exception class. While this is a minor style preference, you could extract this to a constant or use a shorter message.

Apply this diff if you want to address the lint warning:

+WEBHOOK_URL_VALIDATION_ERROR = "Webhook URL validation failed"
+
         # Validate webhook URL for security
         is_valid, error_message = URLValidator.validate_url(self.notification.url)
         logger.info("Validating webhook URL.")
         if not is_valid:
-            raise ValueError(f"Webhook URL validation failed: {error_message}")
+            raise ValueError(f"{WEBHOOK_URL_VALIDATION_ERROR}: {error_message}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0b74f6f and ac1d5c5.

📒 Files selected for processing (1)
  • backend/notification_v2/provider/webhook/webhook.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/notification_v2/provider/webhook/webhook.py

60-60: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (3)
backend/notification_v2/provider/webhook/webhook.py (3)

10-10: LGTM! Import follows the pattern established in the codebase.

The URLValidator import from the unstract SDK is consistent with the broader URL validation strategy being implemented across the application.


47-63: Security validation placement is appropriate.

The URL validation is correctly placed after checking for URL presence but before payload validation, which ensures security checks happen early in the validation flow. This aligns well with the SSRF prevention objectives outlined in the PR.


56-60: Verify URLValidator.validate_url signature and return type

Repo search did not find the implementation of URLValidator.validate_url; current code unpacks it as (is_valid, error_message). Usages found at backend/notification_v2/provider/webhook/webhook.py:57 and prompt-service/src/unstract/prompt_service/services/answer_prompt.py:292. Confirm the SDK (unstract.sdk.adapters.url_validator) actually returns that tuple; if not, update these call sites to match the real signature (or add unpacking-safe handling for boolean/exception-based returns).
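
For illustration, a minimal sketch of unpacking-safe handling, assuming the SDK may return either an (is_valid, error_message) tuple or a bare bool, or raise on invalid input; the wrapper name is hypothetical:

from unstract.sdk.adapters.url_validator import URLValidator


def validate_url_safely(url: str) -> tuple[bool, str]:
    """Normalize URLValidator.validate_url() results to an (is_valid, error) pair."""
    try:
        result = URLValidator.validate_url(url)
    except Exception as e:  # surface validator errors as a failed check
        return False, str(e)
    if isinstance(result, tuple):
        is_valid, error_message = result
        return bool(is_valid), error_message or ""
    return bool(result), "" if result else "URL validation failed"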

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (1)

120-128: Excellent URL security validation implementation.

The URL validation logic properly implements SSRF protection by:

  1. Validating that the URL is well-formed and belongs to trusted domains
  2. Raising a descriptive BadRequest exception on validation failure
  3. Preventing the HTTP request from proceeding if validation fails

The commented logging code suggests awareness of potential sensitive data in logs, which is good security practice.

Consider following the static analysis hint to improve exception message handling:

-            raise BadRequest(f"Invalid or unsafe URL: {url} - {error_message}")
+            raise BadRequest("Invalid or unsafe URL detected in dynamic variable")

This avoids including potentially sensitive URL details in the exception message while still providing useful context.
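
For illustration, a minimal sketch of the validate-before-fetch flow described above, assuming URLValidator returns an (is_valid, error_message) tuple; BadRequest here is a stand-in for the prompt-service exception, and the function name is hypothetical:

import requests

from unstract.sdk.adapters.url_validator import URLValidator


class BadRequest(Exception):
    """Stand-in for the prompt-service BadRequest exception."""


def fetch_dynamic_variable(url: str, timeout: int = 10) -> str:
    """Validate the URL first; only make the outbound request if it passes."""
    is_valid, error_message = URLValidator.validate_url(url)
    if not is_valid:
        # Keep the URL itself out of the error to avoid leaking sensitive details
        raise BadRequest(f"Invalid or unsafe URL detected: {error_message}")
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text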

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between ac1d5c5 and cd129cf.

📒 Files selected for processing (3)
  • prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py (3 hunks)
  • prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (4 hunks)
  • prompt-service/src/unstract/prompt_service/services/variable_replacement.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.13.1)
prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py

127-127: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (7)
prompt-service/src/unstract/prompt_service/services/variable_replacement.py (2)

7-7: LGTM: Proper exception import for security validation.

The import of BadRequest exception supports the new URL security validation flow and ensures consistent error handling.


78-82: Improve error handling for dynamic variable URL extraction

  • Location: prompt-service/src/unstract/prompt_service/constants.py:169 (DYNAMIC_VARIABLE_URL_REGEX) and prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (replace_dynamic_variable, ~line 78).
  • Action: Verify the regex covers all expected dynamic-variable formats and make extraction defensive: confirm url_match and group existence before calling url_match.group(0), parse/validate the URL (urllib.parse), and add unit tests with representative inputs. A defensive sketch follows below.
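
For illustration, a minimal sketch of the defensive extraction, using a placeholder pattern; the real one is DYNAMIC_VARIABLE_URL_REGEX in the prompt-service constants referenced above:

import re
from urllib.parse import urlparse

# Placeholder pattern; the real one lives in prompt-service constants.py
DYNAMIC_VARIABLE_URL_REGEX = re.compile(r"https?://\S+")


def extract_dynamic_variable_url(variable_text: str) -> str | None:
    """Return a well-formed http(s) URL from the variable text, or None if absent."""
    url_match = DYNAMIC_VARIABLE_URL_REGEX.search(variable_text)
    if not url_match:
        return None
    url = url_match.group(0)
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    return url
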
prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py (3)

14-16: Minor import formatting enhancement.

The import reorganization improves readability without changing functionality.


91-117: Excellent defensive error handling for variable replacement.

The try-catch block properly handles BadRequest exceptions from URL validation, provides comprehensive logging, and correctly propagates the exception to halt processing when URL security validation fails.

Based on SSRF prevention best practices, validation should "ensure that the data provided is a valid domain name" and "ensure that the domain name provided belongs to one of the domain names of the identified and trusted applications", which this implementation supports through the URLValidator.validate_url() call in the helper.


266-268: Minor import formatting enhancement.

The import reorganization improves readability without changing functionality.

prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (2)

9-11: LGTM: Proper security imports for SSRF protection.

The addition of BadRequest exception and URLValidator imports enables proper URL security validation to prevent SSRF attacks in dynamic variable processing.


27-30: Minor parameter alignment improvement.

The parameter alignment in the function call enhances code readability.
