read_parquet Does Not Throw Error for Missing Column #1315

@knoam

Describe the bug

If you call pandas read_parquet and request a column that does not exist in the file, it raises an error, as expected. The corresponding awswrangler call, wr.s3.read_parquet, raises nothing.

How to Reproduce

Pandas code raises an exception:

import pandas as pd
pd.DataFrame({'a': [1], 'b': [2]}).to_parquet('test.parquet')
pd.read_parquet('test.parquet', columns=['a', 'b', 'c'])  # column 'c' does not exist

ArrowInvalid: No match for FieldRef.Name(c) in a: int64 b: int64

But awswrangler does not:

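import awswrangler as wr
# bucket is the name of an S3 bucket you can write to
wr.s3.to_parquet(pd.DataFrame({'a': [1], 'b': [2]}), f's3://{bucket}/test.parquet')
wr.s3.read_parquet(f's3://{bucket}/test.parquet', columns=['a', 'b', 'c'])  # no error raised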

Expected behavior

I expect awswrangler to raise an error for the missing column, as pandas does.
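In the meantime, one possible workaround is to check the requested columns against the file's schema before reading. The following is a minimal sketch, not awswrangler's behavior: `require_columns` is a hypothetical helper, and the commented usage assumes the schema is fetched with `wr.s3.read_parquet_metadata`, which should return the column types as a dict.

```python
from typing import Iterable, List


def require_columns(requested: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the requested columns as a list, or raise if any are missing.

    Hypothetical helper for illustration; not part of awswrangler or pandas.
    """
    requested = list(requested)
    available_set = set(available)
    missing = [c for c in requested if c not in available_set]
    if missing:
        raise ValueError(
            f"Columns not found in file: {missing}; available: {sorted(available_set)}"
        )
    return requested


# Assumed usage against S3 (not executed here; needs AWS credentials and a bucket):
#   import awswrangler as wr
#   path = f"s3://{bucket}/test.parquet"
#   column_types, _ = wr.s3.read_parquet_metadata(path)
#   df = wr.s3.read_parquet(path, columns=require_columns(['a', 'b', 'c'], column_types))

# Local demonstration using the schema from the reproduction above.
schema = {"a": "bigint", "b": "bigint"}
print(require_columns(["a", "b"], schema))
try:
    require_columns(["a", "b", "c"], schema)
except ValueError as exc:
    print(exc)
```

The check costs one extra metadata request per read, so it is a stopgap rather than a substitute for the library raising the error itself.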

Your project

No response

Screenshots

No response

OS

Linux

Python version

3.6.13

AWS DataWrangler version

2.14.0

Additional context

No response

Metadata

Labels

enhancement (New feature or request), question (Further information is requested)
