Commit a88ad2d

Update README
1 parent 051c84d commit a88ad2d

1 file changed (+26, -9 lines changed)

README.md

Lines changed: 26 additions & 9 deletions
@@ -15,6 +15,7 @@ Percollate is a command-line tool that turns web pages into beautifully formatte
 - [Available options](#available-options)
 - [Recipes](#recipes)
 - [Basic bundling](#basic-bundling)
+- [Web feeds](#web-feeds)
 - [The `--css` option](#the---css-option)
 - [The `--style` option](#the---style-option)
 - [The `--template` option](#the---template-option)
@@ -220,6 +221,24 @@ curl https://example.com/page1 | percollate pdf --url=https://example.com/page1
 
 Notice we're using the `url` option to tell percollate the source of our (now-anonymous) HTML it gets on stdin, so that relative URLs on links and images resolve correctly.
 
+### Web feeds
+
+Percollate has basic support for processing XML web feeds in [Atom](https://datatracker.ietf.org/doc/html/rfc4287) or [RSS](https://www.rssboard.org/rss-specification) format.
+
+When processing a web feed, every entry in the feed becomes its own article, as if percollate received all the entry URLs as operands. The command below produces an EPUB book from the feed contents:
+
+```bash
+percollate epub https://example.com/posts.xml
+```
+
+To produce individual output files for the feed entries, use the `--individual` flag:
+
+```bash
+percollate epub --individual https://example.com/posts.xml
+```
+
+The content of the articles is read from the feed file rather than fetched anew. The content is passed through the DOM enhancements and sanitized as usual, but it’s not processed with Readability.
+
 ### The `--css` option
 
 The `--css` option lets you pass a small snippet of CSS to percollate. Here are some common use-cases:
@@ -366,7 +385,7 @@ All export formats follow a common pipeline:
 1. Fetch the page(s) using [`node-fetch`](https://github.com/node-fetch/node-fetch)
 2. If an AMP version of the page exists, use that instead (disable with `--no-amp` flag)
 3. [Enhance](./src/enhancements.js) the DOM using [`jsdom`](https://github.com/jsdom/jsdom)
-4. Pass the DOM through [`mozilla/readability`](https://github.com/mozilla/readability) to strip unnecessary elements
+4. Pass the DOM through [`@mozilla/readability`](https://github.com/mozilla/readability) to strip unnecessary elements
 5. Apply the [HTML template](./templates/default.html) and the [stylesheet](./templates/default.css) to the resulting HTML
 
 Different formats then use different tools to produce the final file.
@@ -383,7 +402,7 @@ Markdown files are produced the same way as HTMLs, then processed with a series
 
 Percollate inherits the limitations of two of its main components, Readability and Puppeteer (headless Chrome).
 
-The imperative approach Readability takes will not be perfect in each case, especially on HTML pages with atypical markup; you may occasionally notice that it either leaves in superfluous content, or that it strips out parts of the content. You can confirm the problem against [Firefox's Reader View](https://blog.mozilla.org/firefox/reader-view/). In this case, consider [filing an issue on `mozilla/readability`](https://github.com/mozilla/readability/issues).
+The imperative approach Readability takes will not be perfect in each case, especially on HTML pages with atypical markup; you may occasionally notice that it either leaves in superfluous content, or that it strips out parts of the content. You can confirm the problem against [Firefox's Reader View](https://blog.mozilla.org/firefox/reader-view/). In this case, consider [filing an issue on `@mozilla/readability`](https://github.com/mozilla/readability/issues).
 
 Using a browser to generate the PDF is a double-edged sword. On the one hand, you get excellent support for web platform features. On the other hand, [print CSS](https://www.smashingmagazine.com/2018/05/print-stylesheets-in-2018/) as defined by W3C specifications is only partially implemented, and it seems unlikely that support will be improved any time soon. However, even with modest print support, I think Chrome is the best (free) tool for the job.
 
@@ -419,14 +438,12 @@ Contributions of all kinds are welcome! See [CONTRIBUTING.md](./CONTRIBUTING.md)
 
 Here are some other projects to check out if you're interested in building books using the browser:
 
-- [weasyprint](https://github.com/Kozea/WeasyPrint) ([website](https://weasyprint.org/))
-- [bindery.js](https://github.com/evnbr/bindery) ([website](https://evanbrooks.info/bindery/))
-- [HummusJS](https://github.com/galkahana/HummusJS)
-- [Editoria](https://gitlab.coko.foundation/editoria/editoria) ([website](https://editoria.pub/))
-- [pagedjs](https://gitlab.pagedmedia.org/tools/pagedjs) ([article](https://www.pagedmedia.org/pagedjs-sneak-peeks/))
-- [Mercury](https://mercury.postlight.com/)
+- [bindery.js](https://github.com/evnbr/bindery) ([website](https://bindery.info/))
 - [Foliojs](https://github.com/foliojs)
+- [Ketty](https://gitlab.coko.foundation/coko-org/products/ketty/ketty) ([website](https://ketty.community/))
 - [Magicbook](https://github.com/magicbookproject/magicbook)
 - [monolith](https://github.com/Y2Z/monolith)
-- [SaraVieira/starter-book](https://github.com/SaraVieira/starter-book)
+- [paged.js](https://github.com/pagedjs/pagedjs/) ([website](https://pagedjs.org/))
+- [Postlight Parser](https://github.com/postlight/parser)
 - [SingleFileZ](https://github.com/gildas-lormeau/SingleFileZ)
+- [weasyprint](https://github.com/Kozea/WeasyPrint) ([website](https://weasyprint.org/))
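
The "Web feeds" section added to the README says that every feed entry becomes its own article and that the article content is read from the feed itself rather than fetched again. The Python sketch below illustrates that idea for an Atom feed only. It is not percollate's implementation (percollate is a Node.js tool); the sample feed, the `entries_to_articles` helper, and the dictionary keys are all hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom (RFC 4287) elements.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example posts</title>
  <entry>
    <title>First post</title>
    <link href="https://example.com/posts/1"/>
    <content type="html">&lt;p&gt;Hello&lt;/p&gt;</content>
  </entry>
  <entry>
    <title>Second post</title>
    <link href="https://example.com/posts/2"/>
    <content type="html">&lt;p&gt;World&lt;/p&gt;</content>
  </entry>
</feed>
"""


def entries_to_articles(feed_xml):
    """Return one dict per Atom entry, taking the HTML from the feed itself."""
    root = ET.fromstring(feed_xml)
    articles = []
    for entry in root.iter(ATOM_NS + "entry"):
        link = entry.find(ATOM_NS + "link")
        articles.append({
            "title": entry.findtext(ATOM_NS + "title", default=""),
            # The entry URL is kept as metadata; nothing is fetched from it.
            "url": link.get("href") if link is not None else None,
            # Escaped HTML in <content> is unescaped by the XML parser.
            "html": entry.findtext(ATOM_NS + "content", default=""),
        })
    return articles


articles = entries_to_articles(SAMPLE_FEED)
for article in articles:
    print(article["url"], article["title"])
```

A real feed reader would also need to handle RSS, entries without inline content, and sanitization of the extracted HTML, none of which this sketch attempts.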
