Open
Labels
bug (Something isn't working)
Description
Describe the bug
Function calls that return scalars can be used in SQL VALUES; however, if the returned field carries extension metadata, that metadata is dropped.
To Reproduce
Output (of the program below):
Regular select:
Field { name: "extension", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {"ARROW:extension:metadata": "foofy.foofy"} }
VALUES select:
Field { name: "extension", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }
use std::collections::HashMap;

use datafusion::{
    arrow::datatypes::DataType,
    logical_expr::{ScalarUDFImpl, Signature, Volatility},
    prelude::*,
};

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();
    ctx.register_udf(MakeExtension::default().into());

    let batches = ctx
        .sql("SELECT make_extension('foofy zero') as extension")
        .await
        .unwrap()
        .collect()
        .await
        .unwrap();
    println!("Regular select:");
    println!("{:?}", batches[0].schema().field(0));

    let batches = ctx
        .sql(
            "
            SELECT extension FROM (VALUES
                ('one', make_extension('foofy one')),
                ('two', make_extension('foofy two')),
                ('three', make_extension('foofy three')))
            AS t(string, extension)
            ",
        )
        .await
        .unwrap()
        .collect()
        .await
        .unwrap();
    println!("\nVALUES select:");
    println!("{:?}", batches[0].schema().field(0));
}

#[derive(Debug)]
struct MakeExtension {
    signature: Signature,
}

impl Default for MakeExtension {
    fn default() -> Self {
        Self {
            signature: Signature::user_defined(Volatility::Immutable),
        }
    }
}

impl ScalarUDFImpl for MakeExtension {
    fn as_any(&self) -> &dyn std::any::Any {
        self
    }

    fn name(&self) -> &str {
        "make_extension"
    }

    fn signature(&self) -> &Signature {
        &self.signature
    }

    fn coerce_types(&self, arg_types: &[DataType]) -> datafusion::error::Result<Vec<DataType>> {
        Ok(arg_types.to_vec())
    }

    fn return_type(&self, _arg_types: &[DataType]) -> datafusion::error::Result<DataType> {
        unreachable!("This shouldn't have been called")
    }

    fn return_field_from_args(
        &self,
        args: datafusion::logical_expr::ReturnFieldArgs,
    ) -> datafusion::error::Result<datafusion::arrow::datatypes::FieldRef> {
        Ok(args.arg_fields[0]
            .as_ref()
            .clone()
            .with_metadata(HashMap::from([(
                "ARROW:extension:metadata".to_string(),
                "foofy.foofy".to_string(),
            )]))
            .into())
    }

    fn invoke_with_args(
        &self,
        args: datafusion::logical_expr::ScalarFunctionArgs,
    ) -> datafusion::error::Result<datafusion::logical_expr::ColumnarValue> {
        Ok(args.args[0].clone())
    }
}
Expected behavior
I would have expected the field metadata (if identical for all items) to be propagated to the schema of the VALUES expression. This does raise the question of type equality, but byte-for-byte equality of the metadata hash maps should be safe. A "user defined extension type" (if there ever is one) could define a more lenient equality checker (e.g., JSON object metadata equality for extension types whose serialization is JSON).
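To make the proposed rule concrete, here is a minimal sketch of "propagate the metadata only if it is identical for all items", using nothing but the standard library. The names `common_metadata` and `meta` are hypothetical and are not DataFusion APIs; where such a check would actually live in the planner is not something this issue settles.

```rust
use std::collections::HashMap;

/// Returns the metadata shared by every row of a VALUES column.
/// If the rows disagree, or there are no rows, returns an empty map,
/// which matches the metadata-free field reported above.
/// `common_metadata` is a hypothetical name, not a DataFusion function.
fn common_metadata(rows: &[HashMap<String, String>]) -> HashMap<String, String> {
    match rows.split_first() {
        // HashMap equality compares keys and values exactly, i.e. the
        // byte-for-byte comparison suggested in the text.
        Some((first, rest)) if rest.iter().all(|m| m == first) => first.clone(),
        _ => HashMap::new(),
    }
}

/// Hypothetical helper that builds a one-entry metadata map.
fn meta(key: &str, value: &str) -> HashMap<String, String> {
    HashMap::from([(key.to_string(), value.to_string())])
}
```

Under this sketch, three rows that all carry `meta("ARROW:extension:metadata", "foofy.foofy")` would yield that same map, while a mix of that map and an empty one would yield empty metadata. A more lenient, type-specific comparison could later replace the `m == first` check.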
Additional context
cc @timsaucer (😬 ...I can help with these, I'm just in the process of sorting through test failures and want to make sure anything we find is reported!)