feat: add EmptyDocumentRemover component #8102
Coveralls: Pull Request Test Coverage Report for Build 10151473812
Hi @CarlosFerLo, I thought about your suggestion of a higher abstraction: a DocumentFilter component that accepts a predicate during initialization. Isn't that very similar to a ConditionalRouter? Or an OutputAdapter? What do you think of the following example, which uses an OutputAdapter with a custom filter?

```python
from typing import List

from haystack import Pipeline, component, Document
from haystack.components.converters import OutputAdapter


def filter_empty_docs(docs):
    # Drops only documents whose content is None; Document(content="") is kept
    return [doc for doc in docs if doc.content is not None]


@component
class DocumentProducer:
    # run() returns a list of Documents, so declare List[Document] rather than dict
    @component.output_types(documents=List[Document])
    def run(self):
        return {"documents": [Document(content="haystack1"), Document(content=None), Document(content=""), Document(content="haystack2")]}


pipe = Pipeline()
pipe.add_component(
    name="output_adapter",
    instance=OutputAdapter(
        template="{{ documents | filter_empty_docs }}",
        output_type=List[Document],
        custom_filters={"filter_empty_docs": filter_empty_docs},
    ),
)
pipe.add_component(name="document_producer", instance=DocumentProducer())
pipe.connect("document_producer", "output_adapter")

result = pipe.run(data={})
print(result["output_adapter"]["output"])
```
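For comparison, the DocumentFilter idea mentioned in the comment (a component that takes a predicate at initialization) can be sketched without Haystack. This is a hypothetical, framework-free illustration: `Doc` stands in for `haystack.Document`, and `DocumentFilter` only mimics the `run() -> dict` convention of a Haystack component; a real one would also need the `@component` decorator and `@component.output_types(...)`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for haystack.Document; only the field used here
    content: Optional[str] = None


class DocumentFilter:
    """Hypothetical predicate-based filter: keeps documents for which predicate(doc) is True."""

    def __init__(self, predicate: Callable[[Doc], bool]):
        self.predicate = predicate

    def run(self, documents: List[Doc]) -> Dict[str, List[Doc]]:
        # Follow the Haystack convention of returning outputs in a dict keyed by output name
        return {"documents": [doc for doc in documents if self.predicate(doc)]}


docs = [Doc("haystack1"), Doc(None), Doc(""), Doc("haystack2")]

# Same predicate as filter_empty_docs: only content=None is dropped, "" survives
not_none = DocumentFilter(lambda doc: doc.content is not None)
print([d.content for d in not_none.run(docs)["documents"]])  # ['haystack1', '', 'haystack2']
```

Because the predicate is an arbitrary callable, this overlaps heavily with an OutputAdapter custom filter, and serializing it would likely run into the same deserialization concern raised in the next comment: a lambda cannot be re-imported by name.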
I like reusing existing components 👍🏻 I think deserialization is a problem with your version, @julian-risch. However, judging by this stackoverflow post, everything should be possible with a Jinja template alone, shouldn't it?
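That's correct. We don't need to define a custom filter; Jinja's built-in filters are enough:

```python
# Replaces the OutputAdapter from the previous example; no custom_filters needed
pipe.add_component(
    name="output_adapter",
    instance=OutputAdapter(
        template="{{ documents | rejectattr('content', 'none') | list }}",
        output_type=List[Document],
    ),
)
```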
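In Jinja, `rejectattr('content', 'none')` drops only items whose `content` is `None`, so a document with `content=""` survives. By contrast, `selectattr('content')` with no test evaluates the attribute as a boolean and would drop both `None` and `""`. A pure-Python sketch of what the two filter chains compute, using a hypothetical `Doc` stand-in for `haystack.Document`:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for haystack.Document
    content: Optional[str] = None


def reject_none_content(docs: List[Doc]) -> List[Doc]:
    # Same result as: documents | rejectattr('content', 'none') | list
    return [d for d in docs if d.content is not None]


def select_truthy_content(docs: List[Doc]) -> List[Doc]:
    # Same result as: documents | selectattr('content') | list
    return [d for d in docs if d.content]


docs = [Doc("haystack1"), Doc(None), Doc(""), Doc("haystack2")]
print([d.content for d in reject_none_content(docs)])    # ['haystack1', '', 'haystack2']
print([d.content for d in select_truthy_content(docs)])  # ['haystack1', 'haystack2']
```

Which chain is right depends on whether an empty string should count as an empty document.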
@CarlosFerLo Thanks for opening this PR and thereby bringing the discussion forward! We found that the requested behavior can easily be achieved with an OutputAdapter, so I will close this PR and the related issue, referring to the code snippet above. I'll check with my colleagues how we can better document that the OutputAdapter can do this.
This fails at runtime. Related to #8095.

```python
from typing import List

from haystack import Pipeline, component, Document
from haystack.components.converters import OutputAdapter
from haystack.components.joiners.document_joiner import DocumentJoiner


def filter_empty_docs(docs):
    # Drops only documents whose content is None
    return [doc for doc in docs if doc.content is not None]


@component
class DocumentProducer:
    @component.output_types(documents=List[Document])
    def run(self):
        return {"documents": [Document(content="haystack1"), Document(content=None), Document(content=""), Document(content="haystack2")]}


pipe = Pipeline()
pipe.add_component(name="document_producer", instance=DocumentProducer())
pipe.add_component(
    name="output_adapter",
    instance=OutputAdapter(
        template="{{ documents | filter_empty_docs }}",
        output_type=List[Document],
        custom_filters={"filter_empty_docs": filter_empty_docs},
    ),
)
# Feed the adapter's output into a downstream DocumentJoiner
pipe.add_component(instance=DocumentJoiner(), name="joiner")
pipe.connect("document_producer", "output_adapter")
pipe.connect("output_adapter", "joiner")

result = pipe.run(data={})
print(result)
```
Thanks, @anakin87. I missed that recent change in my local tests, and indeed the solution I proposed does not work from Haystack 2.3.1 onward. I suggest using a custom component instead: #8061 (comment)
Related Issues
EmptyDocumentRemover component #8061

Proposed Changes:
Add a simple new component, EmptyDocumentRemover, as proposed in #8061, that removes empty documents from a list of documents and returns the new list.

How did you test it?
Added a simple unit test, as the implementation is rather simple.
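A minimal sketch of the filtering logic such a component could wrap, together with the kind of check a unit test might make. This is not the code in this PR: `Doc` is a hypothetical stand-in for `haystack.Document`, the class only mimics Haystack's `run() -> dict` convention (a real component would add `@component` and `@component.output_types(documents=List[Document])`), and treating whitespace-only content as empty is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for haystack.Document
    content: Optional[str] = None


class EmptyDocumentRemover:
    """Sketch: drops documents whose content is None, empty, or whitespace-only (assumed meaning of 'empty')."""

    def run(self, documents: List[Doc]) -> Dict[str, List[Doc]]:
        kept = [doc for doc in documents if doc.content is not None and doc.content.strip()]
        return {"documents": kept}


remover = EmptyDocumentRemover()
result = remover.run([Doc("haystack1"), Doc(None), Doc(""), Doc("  \n"), Doc("haystack2")])
print([d.content for d in result["documents"]])  # ['haystack1', 'haystack2']
```

Real `Document` objects can also carry non-text payloads such as a `blob`, so a production version may need a broader notion of "empty" than text content alone.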
Notes for the reviewer
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅