forked from ashi009/node-fast-html-parser
Open
Description
I work on a project that validates its links using this library. One frequently validated link is the HTML spec at https://html.spec.whatwg.org/. This page is one of the larger HTML files on the web, but before release 5.3.2, node-html-parser parsed it successfully in approximately 23 seconds on my local machine.
Consider this example:
const HTMLParser = require('node-html-parser');
const nFetch = require('node-fetch');

async function parseHTMLSpec() {
  try {
    const response = await nFetch('https://html.spec.whatwg.org/');
    const html = await response.text();
    console.log('Fetched HTML. Attempting to parse...');
    console.time('parseHTMLSpec');
    const parsedHTML = HTMLParser.parse(html);
    console.timeEnd('parseHTMLSpec');
    console.log('HTML parsed successfully.');
    console.log('Title:', parsedHTML.querySelector('title').text);
  } catch (error) {
    console.error('Error occurred:', error);
  }
}

parseHTMLSpec();
With node-html-parser 5.3.1, this outputs the following:
Fetched HTML. Attempting to parse...
parseHTMLSpec: 23.415s
HTML parsed successfully.
Title: HTML Standard
With node-html-parser 5.3.2, this hangs indefinitely, printing only the following even after running for hours:
Fetched HTML. Attempting to parse...
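Because `HTMLParser.parse` is synchronous, a hang blocks the event loop, so a plain `setTimeout` in the same thread would never fire. As a possible stopgap for callers such as link validators, the parse can be moved into a worker thread that is terminated after a deadline. The sketch below is only an illustration: `runInWorkerWithTimeout`, `parseTitleWithTimeout`, and the 60-second default are hypothetical names and values, not part of node-html-parser, and the sketch assumes `node-html-parser` is resolvable from the process's working directory.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper (not part of node-html-parser): runs `workerSource`
// in a worker thread and rejects if no result arrives within `timeoutMs`.
function runInWorkerWithTimeout(workerSource, workerData, timeoutMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData });
    let settled = false;
    let timer = null;

    // Settle the promise at most once and cancel the pending deadline.
    const finish = (settle, value) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      settle(value);
    };

    timer = setTimeout(() => {
      // terminate() stops the worker even if it is stuck in a synchronous loop.
      worker.terminate();
      finish(reject, new Error(`Timed out after ${timeoutMs} ms`));
    }, timeoutMs);

    worker.once('message', (value) => finish(resolve, value));
    worker.once('error', (err) => finish(reject, err));
    worker.once('exit', (code) =>
      finish(reject, new Error(`Worker exited with code ${code} before replying`)));
  });
}

// Worker body: parses the HTML passed as workerData and posts back the title.
const parseWorkerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const { parse } = require('node-html-parser');
  const title = parse(workerData).querySelector('title');
  parentPort.postMessage(title ? title.text : null);
`;

// Hypothetical wrapper; the 60 s default is an arbitrary assumption.
function parseTitleWithTimeout(html, timeoutMs = 60000) {
  return runInWorkerWithTimeout(parseWorkerSource, html, timeoutMs);
}
```

In the reproduction above, `await parseTitleWithTimeout(html)` could stand in for the direct `HTMLParser.parse(html)` call, so that a hang surfaces as a rejected promise rather than a stalled process. This only contains the symptom; it does not address whatever changed between 5.3.1 and 5.3.2.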