Commit f1b965e

Document fine sieve in URLInTextProcessor doc string (#186)
Updated the class docblock to clarify it implements both the thick and the fine sieve.
1 parent f46011c commit f1b965e

1 file changed: 20 additions, 12 deletions
components/DataLiberation/URL/class-urlintextprocessor.php

@@ -6,24 +6,32 @@
 use WP_HTML_Text_Replacement;
 
 /**
- * Finds string fragments that look like URLs and allow replacing them.
- * This is the first, "thick" sieve that yields "URL candidates" that must be
- * validated with a WHATWG-compliant parser. Some of the candidates will be
- * false positives.
+ * Finds string fragments that look like URLs and allows replacing them.
  *
- * This is a "thick sieve" that matches too much instead of too little. It
- * will yield false positives, but will not miss a URL
+ * This class implements two stages of detection:
  *
- * Looks for URLs:
+ * 1. **A "thick" sieve**
+ * 2. **A "fine" sieve**
  *
- * * Starting with http:// or https://
- * * Starting with //
- * * Domain-only, e.g. www.example.com
- * * Domain + path, e.g. www.example.com/path
+ * The thick sieve uses a regular expression to match URL-like substrings. It matches too
+ * much and may yield false positives.
+ *
+ * The fine sieve filters out invalid candidates using a WHATWG-compliant parser so only
+ * real URLs are returned.
+ *
+ * ## URL Detection
+ *
+ * The thick sieve looks for URLs:
+ *
+ * * Starting with http://, https://, or //, e.g. //wp.org.
+ * * With no protocol, e.g. www.wp.org or wp.org/path
+ *
+ * Here's a list of matching-related rules, limitations, and assumptions:
  *
  * ### Protocols
  *
- * As a migration-oriented tool, this processor will only consider http and https protocols.
+ * As a site migration tool, this processor only considers URLs with HTTP
+ * and HTTPS protocols.
  *
  * ### Domain names
  *
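
The two-stage idea described in the new docblock can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in URLInTextProcessor: the function names `thick_sieve()` and `fine_sieve()`, the regular expression, and the sample text are all hypothetical. The fine sieve here is a stand-in that uses PHP's built-in `parse_url()` plus a crude alphabetic-TLD check; `parse_url()` is not a WHATWG-compliant parser, so a real fine sieve would behave differently on edge cases.

```php
<?php
// Illustrative two-stage sketch only. All names below are hypothetical
// and are not taken from the URLInTextProcessor class.

/**
 * Thick sieve: deliberately over-matches URL-like substrings.
 * Accepts an optional http://, https://, or // prefix, a dotted host,
 * and an optional path.
 */
function thick_sieve( string $text ): array {
	$pattern = '~(?:(?:https?:)?//)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:/[^\s<>"\']*)?~i';
	preg_match_all( $pattern, $text, $matches );
	return $matches[0];
}

/**
 * Fine sieve (stand-in): keeps only candidates whose host ends in an
 * alphabetic label. Assumption: a real implementation would call a
 * WHATWG-compliant parser here instead of parse_url().
 */
function fine_sieve( array $candidates ): array {
	$urls = array();
	foreach ( $candidates as $candidate ) {
		// Give protocol-relative and protocol-less candidates a scheme
		// so that parse_url() can locate the host.
		if ( 0 === strpos( $candidate, '//' ) ) {
			$absolute = 'https:' . $candidate;
		} elseif ( preg_match( '~^https?://~i', $candidate ) ) {
			$absolute = $candidate;
		} else {
			$absolute = 'https://' . $candidate;
		}

		$host = parse_url( $absolute, PHP_URL_HOST );
		if ( is_string( $host ) && preg_match( '~\.[a-z]{2,}$~i', $host ) ) {
			$urls[] = $candidate;
		}
	}
	return $urls;
}

$text       = 'Visit https://wp.org/about or www.wp.org, version 1.2.3 is out.';
$candidates = thick_sieve( $text ); // Includes the false positive "1.2.3".
$urls       = fine_sieve( $candidates ); // Drops "1.2.3".
```

In this sketch the thick sieve yields three candidates, including the version string, and the stand-in fine sieve discards the one whose last host label is numeric. The split keeps the regular expression simple and leaves correctness decisions to the second stage.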
