Document fine sieve in URLInTextProcessor doc string (#186)

adamziel · web-flow · commit f1b965e3e5d7 · 2025-09-12T00:22:52.000+02:00
Updated the class docblock to clarify that URLInTextProcessor implements
both the thick and the fine sieve.
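
For readers new to the two-sieve idea, the following is a minimal sketch of
the general technique, not the code in URLInTextProcessor. The function names
`thick_sieve()` and `fine_sieve()` and the regular expression are hypothetical,
and `filter_var()` stands in for the WHATWG-compliant parser, which it is not.

```php
<?php
// Illustrative sketch only; names and regex are assumptions, not the class's code.

// Thick sieve: deliberately over-match anything that looks URL-like,
// with a protocol (http://, https://, //) or without one (www.wp.org/path).
function thick_sieve( string $text ): array {
	$pattern = '~(?:https?:)?//[^\s<>"]+|(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s<>"]*)?~i';
	preg_match_all( $pattern, $text, $matches );
	return $matches[0];
}

// Fine sieve: keep only the candidates that a stricter validator accepts.
// filter_var() is a stand-in here; the real class uses a WHATWG-compliant parser.
function fine_sieve( array $candidates ): array {
	$urls = array();
	foreach ( $candidates as $candidate ) {
		// Give protocol-less candidates a scheme so the validator can parse them.
		if ( 0 === strpos( $candidate, '//' ) ) {
			$normalized = 'https:' . $candidate;
		} elseif ( ! preg_match( '~^https?://~i', $candidate ) ) {
			$normalized = 'https://' . $candidate;
		} else {
			$normalized = $candidate;
		}
		if ( false !== filter_var( $normalized, FILTER_VALIDATE_URL ) ) {
			$urls[] = $candidate;
		}
	}
	return $urls;
}

$candidates = thick_sieve( 'Visit https://wp.org or www.wp.org/path or //wp.org now' );
$urls       = fine_sieve( $candidates );
```

Splitting the work this way keeps the regular expression simple: it only has
to avoid missing URLs, while the second pass is responsible for rejecting
false positives.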
diff --git a/components/DataLiberation/URL/class-urlintextprocessor.php b/components/DataLiberation/URL/class-urlintextprocessor.php
@@ -6,24 +6,32 @@
 use WP_HTML_Text_Replacement;
 
 /**
- * Finds string fragments that look like URLs and allow replacing them.
- * This is the first, "thick" sieve that yields "URL candidates" that must be
- * validated with a WHATWG-compliant parser. Some of the candidates will be
- * false positives.
+ * Finds string fragments that look like URLs and allows replacing them.
  *
- * This is a "thick sieve" that matches too much instead of too little. It
- * will yield false positives, but will not miss a URL
+ * This class implements two stages of detection:
  *
- * Looks for URLs:
+ * 1. **A "thick" sieve**
+ * 2. **A "fine" sieve**
  *
- * * Starting with http:// or https://
- * * Starting with //
- * * Domain-only, e.g. www.example.com
- * * Domain + path, e.g. www.example.com/path
+ * The thick sieve uses a regular expression to match URL-like substrings. It matches too
+ * much and may yield false positives.
+ *
+ * The fine sieve filters out invalid candidates using a WHATWG-compliant parser so only
+ * real URLs are returned.
+ *
+ * ## URL Detection
+ *
+ * The thick sieve looks for URLs:
+ *
+ * * Starting with http://, https://, or //, e.g. //wp.org.
+ * * With no protocol, e.g. www.wp.org or wp.org/path
+ *
+ * Here's a list of matching-related rules, limitations, and assumptions:
  *
  * ### Protocols
  *
- * As a migration-oriented tool, this processor will only consider http and https protocols.
+ * As a site migration tool, this processor only considers URLs with HTTP
+ * and HTTPS protocols.
  *
  * ### Domain names
  *