 use WP_HTML_Text_Replacement;

 /**
- * Finds string fragments that look like URLs and allow replacing them.
- * This is the first, "thick" sieve that yields "URL candidates" that must be
- * validated with a WHATWG-compliant parser. Some of the candidates will be
- * false positives.
+ * Finds string fragments that look like URLs and allows replacing them.
  *
- * This is a "thick sieve" that matches too much instead of too little. It
- * will yield false positives, but will not miss a URL
+ * This class implements two stages of detection:
  *
- * Looks for URLs:
+ * 1. **A "thick" sieve**
+ * 2. **A "fine" sieve**
  *
- * * Starting with http:// or https://
- * * Starting with //
- * * Domain-only, e.g. www.example.com
- * * Domain + path, e.g. www.example.com/path
+ * The thick sieve uses a regular expression to match URL-like substrings. It matches too
+ * much and may yield false positives.
+ *
+ * The fine sieve filters out invalid candidates using a WHATWG-compliant parser so only
+ * real URLs are returned.
+ *
+ * ## URL Detection
+ *
+ * The thick sieve looks for URLs:
+ *
+ * * Starting with http://, https://, or //, e.g. //wp.org.
+ * * With no protocol, e.g. www.wp.org or wp.org/path
+ *
+ * Here's a list of matching-related rules, limitations, and assumptions:
  *
  * ### Protocols
  *
- * As a migration-oriented tool, this processor will only consider http and https protocols.
+ * As a site migration tool, this processor only considers URLs with HTTP
+ * and HTTPS protocols.
  *
  * ### Domain names
  *