oleibman
Collaborator

Add some additional support for Intersection and Union in the Calculation Engine. This allows me to reinstate 2 tests which were formerly skipped, without breaking any existing tests. There are almost certainly edge cases which I haven't thought of yet; I will leave this PR as a draft for several weeks before moving it forward.

This change gives some opportunities for users to go wrong. If you place the following formula in a cell:

=B1:B8 A7:D7

Excel will treat the space as an intersection operator, and will return the value in B7, which is where the ranges intersect. PhpSpreadsheet will do so as well. But, if you use the following formula:

=B1:B8,A7:D7

Excel will return `#VALUE!`. This seems like something they forgot to take care of when adding dynamic arrays. The comma should be interpreted as a union operator, and PhpSpreadsheet will now return the union of the ranges. It seems hard to justify having PhpSpreadsheet return an error when the formula is so easily evaluated. Furthermore, if you use the following formula:

=SUM(B1:B8,A7:D7)

Excel evaluates it as you would expect, summing the union of the ranges. So does PhpSpreadsheet. I guess there's no rule requiring Excel to be consistent, but ...
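To make the intersection case above concrete, here is a minimal Python sketch of how the overlap of two A1-style ranges can be computed. It is an illustration only, not PhpSpreadsheet's code; the helper names `parse_range` and `intersect` are hypothetical, and the sketch assumes plain references with no sheet names or `$` markers.

```python
import re


def col_to_num(col):
    """Convert a column label such as 'B' or 'AA' to a 1-based index."""
    n = 0
    for ch in col.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column index back to its label."""
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def parse_range(ref):
    """Return (left, top, right, bottom) for a reference like 'B1:B8'."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", ref)
    if m is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    c1, r1 = col_to_num(m[1]), int(m[2])
    c2, r2 = col_to_num(m[3]), int(m[4])
    return min(c1, c2), min(r1, r2), max(c1, c2), max(r1, r2)


def intersect(a, b):
    """Return the overlapping range of a and b, or None if they do not overlap."""
    l1, t1, r1, b1 = parse_range(a)
    l2, t2, r2, b2 = parse_range(b)
    left, top = max(l1, l2), max(t1, t2)
    right, bottom = min(r1, r2), min(b1, b2)
    if left > right or top > bottom:
        return None  # no common cells
    start = f"{num_to_col(left)}{top}"
    end = f"{num_to_col(right)}{bottom}"
    return start if start == end else f"{start}:{end}"


print(intersect("B1:B8", "A7:D7"))  # B7
```

For the formula `=B1:B8 A7:D7`, the overlap is the single cell B7, which matches the behaviour described above.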

Another way that users might go wrong is by actually entering the union or intersection symbols in a formula rather than comma or space. Excel will not allow this; PhpSpreadsheet needs to change comma or space to the appropriate symbol in order for the rest of this PR to work properly. By the time the parser gets to it, it can't tell whether the symbol was part of the cell's original formula or if PhpSpreadsheet substituted it, so it just has to permit it. I do not expect many users to fall afoul of this problem, at least not more than once.

Just by way of explanation, PhpSpreadsheet has to make the substitution because intersection has a higher priority than union, just as multiplication has a higher priority than addition. If the symbols are not changed beforehand, the parser may discover that it needs to perform an intersection too late, after a lower-priority union has already taken place.
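The precedence argument can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the PR's implementation: the symbols `∩` and `∪` are assumed stand-ins (the actual characters are not given here), `substitute` and `evaluate` are hypothetical names, and the substitution only handles bare range expressions, not commas inside function calls.

```python
import re

# Assumed operator symbols, chosen for illustration only.
INTERSECT, UNION = "∩", "∪"


def substitute(formula):
    """Replace comma with the union symbol and space with the intersection symbol."""
    s = formula.lstrip("=")
    s = re.sub(r"(?<=\d)\s*,\s*(?=[A-Z])", UNION, s)
    s = re.sub(r"(?<=\d)\s+(?=[A-Z])", INTERSECT, s)
    return s


def col_num(col):
    """Convert a column label such as 'B' to a 1-based index."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def cells(ref):
    """Expand 'A1' or 'A1:B3' (top-left to bottom-right) into a set of (column, row) pairs."""
    m = re.fullmatch(r"([A-Z]+)(\d+)(?::([A-Z]+)(\d+))?", ref)
    if m is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    c1, r1 = col_num(m[1]), int(m[2])
    c2, r2 = (col_num(m[3]), int(m[4])) if m[3] else (c1, r1)
    return {(c, r) for c in range(c1, c2 + 1) for r in range(r1, r2 + 1)}


def evaluate(expr):
    """Split on union first, so every intersection is resolved before any union."""
    result = set()
    for term in expr.split(UNION):
        operands = [cells(ref) for ref in term.split(INTERSECT)]
        result |= set.intersection(*operands)
    return result


expr = substitute("=A1:A3,A1:B3 B2:B9")
print(expr)                    # A1:A3∪A1:B3∩B2:B9
print(sorted(evaluate(expr)))  # [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3)]
```

With the intersection applied first, the result is A1:A3 plus B2:B3. Applying the union first would give (A1:A3 ∪ A1:B3) ∩ B2:B9, which is only B2:B3. Note that this set-based model discards duplicate cells, which may not match how a spreadsheet treats overlapping areas.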

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

oleibman changed the title from "WIP Some Additional Support for Intersection and Union" to "Some Additional Support for Intersection and Union" on Sep 9, 2025