@@ -220,6 +221,24 @@ curl https://example.com/page1 | percollate pdf --url=https://example.com/page1
Notice we're using the `url` option to tell percollate the source of our (now-anonymous) HTML it gets on stdin, so that relative URLs on links and images resolve correctly.
+### Web feeds
+
+Percollate has basic support for processing XML web feeds in [Atom](https://datatracker.ietf.org/doc/html/rfc4287) or [RSS](https://www.rssboard.org/rss-specification) format.
+
+When processing a web feed, every entry in the feed becomes its own article, as if percollate received all the entry URLs as operands. The command below produces an EPUB book from the feed contents:
+
+```bash
+percollate epub https://example.com/posts.xml
+```
+
+To produce individual output files for the feed entries, use the `--individual` flag:
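A minimal sketch of such an invocation, assuming the same placeholder feed URL as the example above and that `percollate` is available on the `PATH`. The `feed_url` variable and the `command -v` guard are illustrative additions for this sketch, not part of the original text:

```bash
# Placeholder feed URL, reused from the example above.
feed_url="https://example.com/posts.xml"

# Illustrative guard (an assumption of this sketch): only run the command
# when percollate is installed and on the PATH.
if command -v percollate >/dev/null 2>&1; then
  # With --individual, each feed entry gets its own output file
  # instead of being bundled into one EPUB book.
  percollate epub --individual "$feed_url"
fi
```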
+The content of the articles is read from the feed file rather than fetched anew. The content is passed through the DOM enhancements and sanitized as usual, but it’s not processed with Readability.
+
### The `--css` option

The `--css` option lets you pass a small snippet of CSS to percollate. Here are some common use-cases:
@@ -366,7 +385,7 @@ All export formats follow a common pipeline:
1. Fetch the page(s) using [`node-fetch`](https://github.com/node-fetch/node-fetch)
2. If an AMP version of the page exists, use that instead (disable with `--no-amp` flag)
3. [Enhance](./src/enhancements.js) the DOM using [`jsdom`](https://github.com/jsdom/jsdom)
-4. Pass the DOM through [`mozilla/readability`](https://github.com/mozilla/readability) to strip unnecessary elements
+4. Pass the DOM through [`@mozilla/readability`](https://github.com/mozilla/readability) to strip unnecessary elements
5. Apply the [HTML template](./templates/default.html) and the [stylesheet](./templates/default.css) to the resulting HTML

Different formats then use different tools to produce the final file.
@@ -383,7 +402,7 @@ Markdown files are produced the same way as HTMLs, then processed with a series
Percollate inherits the limitations of two of its main components, Readability and Puppeteer (headless Chrome).

-The imperative approach Readability takes will not be perfect in each case, especially on HTML pages with atypical markup; you may occasionally notice that it either leaves in superfluous content, or that it strips out parts of the content. You can confirm the problem against [Firefox's Reader View](https://blog.mozilla.org/firefox/reader-view/). In this case, consider [filing an issue on `mozilla/readability`](https://github.com/mozilla/readability/issues).
+The imperative approach Readability takes will not be perfect in each case, especially on HTML pages with atypical markup; you may occasionally notice that it either leaves in superfluous content, or that it strips out parts of the content. You can confirm the problem against [Firefox's Reader View](https://blog.mozilla.org/firefox/reader-view/). In this case, consider [filing an issue on `@mozilla/readability`](https://github.com/mozilla/readability/issues).

Using a browser to generate the PDF is a double-edged sword. On the one hand, you get excellent support for web platform features. On the other hand, [print CSS](https://www.smashingmagazine.com/2018/05/print-stylesheets-in-2018/) as defined by W3C specifications is only partially implemented, and it seems unlikely that support will be improved any time soon. However, even with modest print support, I think Chrome is the best (free) tool for the job.
@@ -419,14 +438,12 @@ Contributions of all kinds are welcome! See [CONTRIBUTING.md](./CONTRIBUTING.md)
Here are some other projects to check out if you're interested in building books using the browser: