
Conversation

@rnewson (Member) commented Sep 15, 2025

Overview

Add a plugin to automatically purge deleted documents after a configurable interval has elapsed. The interval should be set high enough that all external consumers of the changes feed will have seen and processed the deleted documents. The plugin uses CouchDB's purge facility, which ensures internal replication and indexes have processed the deletions before the deleted document is purged.
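Not part of the PR, just a sketch for orientation: assuming the section and option names from the docs hunk reviewed below, the plugin's interval lookup could look roughly like this. The module and function names, the seconds unit, and the disabled-when-unset behaviour are all assumptions, not the PR's actual code.

```erlang
-module(auto_purge_sketch).
-export([deleted_document_ttl/0]).

%% Sketch only: read the configured TTL for the auto purge plugin.
%% The section and option names come from the docs change in this PR;
%% treating the value as seconds and unset-as-disabled are assumptions.
deleted_document_ttl() ->
    case config:get("couch_auto_purge_plugin", "deleted_document_ttl") of
        undefined -> disabled;
        Value -> list_to_integer(Value)
    end.
```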

Testing recommendations

Covered by eunit tests.

Related Issues or Pull Requests

N/A

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@rnewson marked this pull request as draft September 15, 2025 11:13
@rnewson (Member, Author) commented Sep 15, 2025

Draft as I still need to do a better database-level config for the override.


```rst
.. config:section:: couch_auto_purge_plugin :: Configure the Auto Purge plugin
.. config:option:: deleted_document_ttl
```
Contributor commented:
I like the name. Previously when discussing this we used the term tombstones, which, while descriptive, would have been a new term for users to learn.

In the future this would also make it easy to add a deleted_conflict_ttl, which could be handled by the same purge mechanism.

@rnewson (Member, Author) replied:

Yes, I like the new name, and it leaves open the possibility of a document_ttl too (for non-deleted documents).

@nickva (Contributor) commented Sep 15, 2025

This looks pretty compact!

> Draft as I still need to do a better database-level config for the override.

I'll look into adding a way to set and get the database-level config. Hopefully as properties in the shard docs. We have some precedent in resharding when we update the shard map:

```erlang
update_shard_map(#job{source = Source, target = Target} = Job) ->
```

@rnewson force-pushed the auto-delete-tseq branch 3 times, most recently from 13ce28b to aeb78a7 on September 17, 2025 13:29
@rnewson (Member, Author) commented Sep 18, 2025

I have added a commit that does the get/set of a database-level override in the _dbs document. I've placed it outside of the "props" object for now, but that's open for discussion. I also deliberately don't add this property to the #shard{} records that mem3 would return, as I'm trying to establish the database-level property as a single value.
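For orientation only, a hypothetical sketch of how such a _dbs shard-map document might look with the override kept outside the "props" object. The key name, its placement, and all values here are illustrative guesses, not necessarily what the commit actually writes.

```erlang
-module(dbs_doc_sketch).
-export([example_dbs_doc/0]).

%% Hypothetical _dbs shard-map document (EJSON) with the database-level
%% override stored outside the "props" object, as described in the comment
%% above. Field values and the override's key name are illustrative only.
example_dbs_doc() ->
    {[
        {<<"_id">>, <<"mydb">>},
        {<<"by_node">>, {[{<<"node1@127.0.0.1">>, [<<"00000000-ffffffff">>]}]}},
        {<<"by_range">>, {[{<<"00000000-ffffffff">>, [<<"node1@127.0.0.1">>]}]}},
        {<<"props">>, {[]}},
        {<<"deleted_document_ttl">>, 86400}
    ]}.
```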

@rnewson marked this pull request as ready for review September 18, 2025 09:11
@rnewson (Member, Author) commented Sep 18, 2025

Noting that get deliberately reads only the local, unsharded _dbs database for its answer, while set tries to ensure that the same node in the cluster (the lowest live node) performs the updates, to avoid conflicts. Any update to _dbs is replicated in a ring to all nodes. I return a 202 status code as a hint that the write was made but is not yet redundantly stored.
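A minimal sketch of the "lowest live node" idea described above, assuming mem3:nodes/0 as the source of cluster membership; the module and helper names are invented for illustration and are not the PR's code.

```erlang
-module(dbs_coordinator_sketch).
-export([lowest_live_node/0]).

%% Sketch: route all _dbs override writes through one deterministic
%% coordinator so that concurrent updates don't create conflicting revisions.
%% mem3:nodes/0 lists the configured cluster nodes; intersecting with the
%% currently connected nodes (plus this node) keeps only live ones.
lowest_live_node() ->
    Live = [node() | nodes()],
    hd(lists:sort([N || N <- mem3:nodes(), lists:member(N, Live)])).
```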

@rnewson force-pushed the auto-delete-tseq branch 2 times, most recently from b1172e3 to 60ddb11 on September 18, 2025 13:24
@nickva (Contributor) commented Sep 18, 2025

Yeah, something like get/set can work, and I agree: I'm not a fan of how props is spread over all the #shard{} copies in the shards cache; ideally it should be something like dbname -> props as a new mem3 cache (an ets table; see the sketch below). But it would be good, I think, to have a general, well-working props mechanism, and maybe, as we discussed in the CouchDB meeting, even use security for it and get a nice optimization boost from not having to deal with get_db any longer.
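A rough sketch of that dbname -> props cache idea; the module, table, and function names are invented for illustration, and nothing like this exists in the PR.

```erlang
-module(mem3_props_cache_sketch).
-export([init_props_cache/0, lookup_props/1, cache_props/2]).

%% Invented-for-illustration sketch of a per-database props cache: one ETS
%% row per database name rather than copying props into every #shard{} record.
init_props_cache() ->
    ets:new(mem3_db_props, [named_table, set, public, {read_concurrency, true}]).

lookup_props(DbName) ->
    case ets:lookup(mem3_db_props, DbName) of
        [{_DbName, Props}] -> {ok, Props};
        [] -> not_found
    end.

cache_props(DbName, Props) ->
    true = ets:insert(mem3_db_props, {DbName, Props}),
    ok.
```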

Props also has the extra benefit that it will automatically show up in the dbs_info result, so users can inspect the flags.

So far I have been trying to extract the "update shard map" bit from mem3_reshard and make it a general utility. It's got some resharding-specific bits in there, and perhaps some extra belt-and-suspenders steps, for instance:

  • before making the change, the leader (first live node) pulls changes from the other nodes
  • after the change, it force-pushes the changes to all the nodes
  • there is a wait-to-propagate step where we wait for the change to take effect on the other nodes

Some of these are there to protect against creating conflicts or to handle the case where the ring may have just broken, but maybe some of it is overkill, too.
