Contributor

@danakj danakj commented Dec 13, 2024

We propose to add Core.Array(T, N) as a library type in the Core package. Since arrays are a very frequent type, we propose to privilege use of this type by including it in the prelude library of the package.

We would like to see a shorthand where Core.Array is automatically imported into the file scope, and this proposal includes future work to this effect.

@danakj danakj added the proposal (A proposal) and proposal draft (Proposal in draft, not ready for review) labels Dec 13, 2024
@danakj danakj force-pushed the proposal-array-forwards-to-th branch from e5ce9a2 to 1eac5a0 Compare December 13, 2024 15:02
@danakj danakj force-pushed the proposal-array-forwards-to-th branch 4 times, most recently from 7729072 to 41cdb12 Compare January 8, 2025 19:35
@danakj danakj marked this pull request as ready for review January 8, 2025 19:36
@github-actions github-actions bot added the proposal rfc (Proposal with request-for-comment sent out) label and removed the proposal draft (Proposal in draft, not ready for review) label Jan 8, 2025
@danakj danakj requested review from zygoloid and chandlerc January 8, 2025 19:36
@github-actions github-actions bot requested a review from KateGregory January 8, 2025 19:36
@danakj danakj force-pushed the proposal-array-forwards-to-th branch from 41cdb12 to fca98cc Compare January 8, 2025 19:39
Contributor

clavin commented Jan 13, 2025

Wonderful proposal! 😄 The rationale behind this design is very impressive.

I wanted to highlight this information for your consideration: Swift is currently working on a new standard array type whose size is fixed at compile-time. In the proposal it is called Vector, along with a section discussing the decision behind this name; however, the associated implementation PR that just landed renamed the type to Slab.

One part of that proposal stuck out to me and reminded me of this proposal. They establish that they cannot use the name Swift.Array for this new type as it is already taken. To justify the name Swift.Vector instead, they argue that the naming of std::vector may have been a mistake in hindsight:

A. Stepanov mentions in his book, "From Mathematics to Generic Programming", that using the name std::vector for their dynamically allocated growable array type was perhaps a mistake for this same reason: [...]

However, they clearly acknowledge that this decision is a potential source of confusion, especially for developers coming from C++:

We fully acknowledge that the Swift types, Swift.Array and Swift.Vector, are complete opposites of the C++ ones, std::vector and std::array.

It seems they ultimately walked back this decision, given the rename to Slab; I can only guess that the reason is to avoid this confusion. This was the overwhelming topic of discussion/bikeshedding in the proposal's second review (based on a quick skim), but I couldn't find an official decision anywhere.

This work in Swift may be too recent and volatile to derive any design decisions in Carbon right now, but it could add beneficial discussion to this proposal.

Consider noting this ongoing design work in the Background section for Swift. Also, consider discussing the argument that std::vector and std::array are misnomers in reference to the mathematical terms, strengthening the decision to stick with the term "array" that is familiar to C++ and Rust developers.

@danakj danakj force-pushed the proposal-array-forwards-to-th branch from 3976e27 to 8d90337 Compare January 20, 2025 16:25
@danakj danakj changed the title from "Array forwards to the prelude" to "The Core.Array type for direct storage immutably-sized buffers" Jan 20, 2025
@danakj danakj force-pushed the proposal-array-forwards-to-th branch from 09c2e46 to 7d390dd Compare January 20, 2025 18:49
Contributor Author

@danakj danakj left a comment

Thanks, PTAL

@danakj danakj requested a review from zygoloid January 20, 2025 18:54
Contributor Author

danakj commented Jan 20, 2025

@clavin Thanks for all the links and context regarding the Swift Slab type. I've added this context to the proposal's background, and the confusion around vector as a name to the rationale discussion.

Comment on lines +67 to +68
with arrays. Indirect refers to heap allocation, where the type itself holds
storage of a pointer to the buffer, as with heap-buffers.
Contributor

I don't think it really influences this proposal in either direction, but I'm a bit uncomfortable with establishing a dichotomy between indirect vs. direct here. I'm not sure that's the focal point we want to rely on here.

For example, I worry that would lead towards designs that preclude small-size optimization or other techniques that don't use the heap necessarily...

I lean a bit towards using static vs. dynamic allocation, or static size vs. dynamic immutable size vs. dynamic mutable size, as the classification basis...
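
To make the three size categories concrete, here is a rough C++ sketch (C++ rather than Carbon; `StaticSize`, `FixedBuffer`, and `GrownSize` are names made up for illustration). `std::array` has a size fixed at compile time, a `std::unique_ptr<int[]>` buffer picks its size at run time but never changes it, and `std::vector` can grow after construction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Static size: N is part of the type and known at compile time.
constexpr std::size_t StaticSize() {
  std::array<int, 4> a{};
  return a.size();
}

// Dynamic, immutable size: chosen at run time, then fixed for the lifetime
// of the buffer. (Hypothetical helper type, not a proposed Carbon API.)
struct FixedBuffer {
  explicit FixedBuffer(std::size_t n)
      : size(n), data(std::make_unique<int[]>(n)) {}
  const std::size_t size;
  std::unique_ptr<int[]> data;
};

// Dynamic, mutable size: the element count can change after construction.
std::size_t GrownSize(std::size_t initial) {
  std::vector<int> v(initial);
  v.push_back(0);
  assert(v.size() == initial + 1);
  return v.size();
}
```

Only the first category needs no allocator at all, which is probably the closest match to what the proposal calls an array.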

Contributor Author

This definitely did not mean to weigh in on SSO, but also I would say SSO is an optimization of indirect storage in a type when its value is small, rather than being a direct-storage type.

I don't know that static vs dynamic quite captures the intent here, it feels a bit worse to me. You can have static allocations on the heap and that's not what an array is, so we lose some of the array vs heap-array difference. Would it be enough to just explicitly discuss that I am talking about direct vs indirect storage as the type category and SSO as an optimization doesn't change the type from an indirect-storage type?
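
As a small illustration of the direct vs. indirect distinction, here is a C++ sketch (not Carbon, and `PointsInside` is a made-up helper). It checks whether a container's element pointer lands inside the bytes of the container object itself: it does for `std::array`, and it does not for `std::vector`, which only holds a pointer to a separately allocated buffer.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true when `p` points inside the bytes occupied by `obj` itself.
template <typename T>
bool PointsInside(const T& obj, const void* p) {
  auto begin = reinterpret_cast<std::uintptr_t>(&obj);
  auto addr = reinterpret_cast<std::uintptr_t>(p);
  return addr >= begin && addr < begin + sizeof(T);
}

// Direct storage: the elements are part of the object.
bool ArrayStoresElementsDirectly() {
  std::array<int, 8> a{};
  return PointsInside(a, a.data());
}

// Indirect storage: the object holds a pointer to a buffer elsewhere.
bool VectorStoresElementsIndirectly() {
  std::vector<int> v(8);
  return !PointsInside(v, v.data());
}

void CheckStorageKinds() {
  assert(ArrayStoresElementsDirectly());
  assert(VectorStoresElementsIndirectly());
}
```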

Contributor Author

I've added a paragraph to clarify the intent here.

Contributor

Mostly thinking / musing about this and capturing those thoughts here -- not really trying to make a "this is the direction I think we should go" conclusion reply yet...

> This definitely did not mean to weigh in on SSO, but also I would say SSO is an optimization of indirect storage in a type when its value is small, rather than being a direct-storage type.

There are a number of API designs that distinguish between strictly indirect storage and something that permits SSO -- the implications of moving the object specifically.

> I don't know that static vs dynamic quite captures the intent here, it feels a bit worse to me. You can have static allocations on the heap and that's not what an array is, so we lose some of the array vs heap-array difference.

Hmm... One thing that has been often requested in C and C++ is to have the compiler automatically promote large stack arrays to actually be heap arrays. And I've seen libraries that try to provide this distinction directly in the library, and even expose extra APIs (specifically around moving into something explicitly on the heap) when it ends up on the heap.

I think that too blurs the line between stack vs. heap.

Also, with allocation customization, I could imagine lots of data structures where you're describing them as heap-based actually using global memory reservations, the stack, or something else...

> Would it be enough to just explicitly discuss that I am talking about direct vs indirect storage as the type category and SSO as an optimization doesn't change the type from an indirect-storage type?

Still thinking about this, but some of the things that immediately popped into my head here are above.

Contributor Author

> Mostly thinking / musing about this and capturing those thoughts here -- not really trying to make a "this is the direction I think we should go" conclusion reply yet...
>
> > This definitely did not mean to weigh in on SSO, but also I would say SSO is an optimization of indirect storage in a type when its value is small, rather than being a direct-storage type.
>
> There are a number of API designs that distinguish between strictly indirect storage and something that permits SSO -- the implications of moving the object specifically.

Agree, yeah. Generally in a safe language, I expect you won't be able to hold a pointer into an indirect-storage type through it being destructively moved either, but maybe it could be exposed through the lifetime API somehow. A type with the SSO would then have to act like it's always direct-storage though, which would be more restrictive.

> > I don't know that static vs dynamic quite captures the intent here, it feels a bit worse to me. You can have static allocations on the heap and that's not what an array is, so we lose some of the array vs heap-array difference.
>
> Hmm... One thing that has been often requested in C and C++ is to have the compiler automatically promote large stack arrays to actually be heap arrays. And I've seen libraries that try to provide this distinction directly in the library, and even expose extra APIs (specifically around moving into something explicitly on the heap) when it ends up on the heap.
>
> I think that too blurs the line between stack vs. heap.

I certainly would agree a type can be a hybrid of direct and indirect storage: llvm::SmallVector for example. Automatic promotion of large stack arrays sounds like a different form of optimization that should be transparent from the perspective of code using those arrays, so conceptually I'd expect them to be treated as direct storage, even if the implementation internally does something different. Do these words still provide enough vocabulary flexibility to discuss such a scenario? Personally, I would think so.

> Also, with allocation customization, I could imagine lots of data structures where you're describing them as heap-based actually using global memory reservations, the stack, or something else...

+1, I think we could definitely qualify any mention of heap allocation with "as defined by the allocator" or something, much like Rust's std::vec::Vec does here: https://doc.rust-lang.org/std/vec/struct.Vec.html#guarantees

> > Would it be enough to just explicitly discuss that I am talking about direct vs indirect storage as the type category and SSO as an optimization doesn't change the type from an indirect-storage type?
>
> Still thinking about this, but some of the things that immediately popped into my head here are above.

Thanks, let's think about this a bit more and lmk what changes you'd like to see once we're clear.

Contributor

I still struggle with "direct" and "indirect" here -- I think it suggests similarity between std::vector and std::unique_ptr that we won't want. If we try to apply it to std::string for example, it becomes very problematic because std::string doesn't provide address-stability of the contained data across move, but I think that's a good and reasonable thing to expect from std::unique_ptr.

Fundamentally, I think std::unique_ptr is indirect, and I like the term for that specific case.

But for what you're describing here, I continue to think the distinction is more about static allocation (a local array) vs. dynamic allocation (an owning pointer to an indirect array, or something growable like std::string or std::vector).

(That said, I continue to not think this is blocking -- we can have the wording in the background section of a proposal that doesn't match what we end up with long-term in the design. =])
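
The address-stability point can be sketched in C++ as follows (an illustration only; `MoveReport` and both function names are invented here). Moving a `std::unique_ptr` never relocates the pointee. Moving a short `std::string` relocates the characters whenever the implementation keeps them inside the string object, which common standard libraries do as a small-string optimization, although the standard does not require it.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

struct MoveReport {
  bool stored_inline;   // Did the characters live inside the string object?
  bool address_stable;  // Did data() keep its address across the move?
};

// std::unique_ptr: the pointee stays put when ownership is moved.
bool UniquePtrKeepsAddress() {
  auto owner = std::make_unique<int>(42);
  const int* before = owner.get();
  std::unique_ptr<int> moved = std::move(owner);
  assert(owner == nullptr);  // A moved-from unique_ptr is guaranteed empty.
  return moved.get() == before;
}

// std::string: if the characters sit inside the object, moving the object
// necessarily gives them a new address.
MoveReport MoveShortString() {
  std::string source = "hi";
  auto begin = reinterpret_cast<std::uintptr_t>(&source);
  auto before = reinterpret_cast<std::uintptr_t>(source.data());
  bool stored_inline = before >= begin && before < begin + sizeof(source);
  std::string moved = std::move(source);
  auto after = reinterpret_cast<std::uintptr_t>(moved.data());
  return {stored_inline, after == before};
}
```

On an implementation with a small-string optimization, `MoveShortString()` would be expected to report `stored_inline` as true and `address_stable` as false.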

@danakj danakj requested a review from chandlerc January 21, 2025 18:01
Contributor Author

danakj commented Feb 4, 2025

We discussed this in the toolchain sync a few weeks ago, and at the time it sounded like the leads were all landing on the idea that:

  • Use a lowercase name to indicate it's a thing you have to memorize at the top level, and prioritize that over exactly matching the name it forwards to: array(T, N)
  • Use a builtin specifically, which (as mentioned in comments above, and which I didn't understand while writing the original proposal text) is different than an implicit import from Core, as it disallows using the name array in scopes other than the file scope.

Recording these here for now, but waiting to see if the leads really converge on this outcome or decide to alter it a bit.

@chandlerc
Contributor

> We discussed this in the toolchain sync a few weeks ago, and at the time it sounded like the leads were all landing on the idea that:
>
>   • Use a lowercase name to indicate it's a thing you have to memorize at the top level, and prioritize that over exactly matching the name it forwards to: array(T, N)
>   • Use a builtin specifically, which (as mentioned in comments above, and which I didn't understand while writing the original proposal text) is different than an implicit import from Core, as it disallows using the name array in scopes other than the file scope.
>
> Recording these here for now, but waiting to see if the leads really converge on this outcome or decide to alter it a bit.

So, if you (or anyone else) would like to record a more formal decision, I think the process we use to get there is to file a leads question with roughly the content above and a link here for context if needed. I'm also happy to do that if useful.

We don't always do this -- if folks reach a happy consensus, including the author of a proposal, they can just update the proposal and ask the leads to take another look at it. It's basically an optional process for factoring out a smaller decision from a larger "yes" or "no" on a proposal as a whole.

Sorry if that process wasn't clear!

We propose to add `Core.Array(T, N)` as a library type in the `Core`
package. Since arrays are a very frequent type, we propose to privilege
use of this type by including it in the `prelude` library of the
package.

We would like to see a shorthand where `Core.Array` is automatically
imported into the file scope, and this proposal includes future work to
this effect.

We propose to add `Core.Array(T, N)` as a library type in the `prelude`
library of the `Core` package. Since arrays are a very frequent type,
we propose to privilege use of this type by providing a builtin
`Array(T, N)` type that resolves to the `Core.Array(T, N)` type. Users
can model this as an implicit import of the `Core.Array(T, N)` type
into the global scope, much like the implicit import of the `prelude`
library of the `Core` package.
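
One way to picture "a shorthand that resolves to the library type" is a C++ alias template. This is only an analogy under stated assumptions: the `core` and `user` namespaces and the struct layout are stand-ins invented for the sketch, not Carbon's `Core` package or its prelude mechanism.

```cpp
#include <cstddef>
#include <type_traits>

namespace core {
// Stand-in for the library type; the name and layout are illustrative only.
template <typename T, std::size_t N>
struct Array {
  T elements[N];
};
}  // namespace core

// Models the implicit import: an unqualified name that resolves to the
// library type instead of introducing a new, distinct type.
template <typename T, std::size_t N>
using Array = core::Array<T, N>;

static_assert(std::is_same_v<Array<int, 4>, core::Array<int, 4>>,
              "the shorthand and the library type are the same type");

namespace user {
// An imported name can be shadowed in a nested scope; a keyword cannot.
struct Array {};
inline constexpr bool kShadowed =
    !std::is_same_v<Array, core::Array<int, 4>>;
}  // namespace user
```

The shadowing at the end is where the import model stops matching a true builtin keyword, since a keyword could not be redeclared in an inner scope like this.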
@danakj danakj force-pushed the proposal-array-forwards-to-th branch from 8254427 to e457797 Compare February 13, 2025 14:56
@danakj danakj force-pushed the proposal-array-forwards-to-th branch from 4cd3bcc to 6511346 Compare February 13, 2025 15:52
@danakj danakj changed the title from "The Core.Array type for direct storage immutably-sized buffers" to "The Core.Array type for direct-storage immutably-sized buffers" Feb 13, 2025
Contributor Author

@danakj danakj left a comment

PTAL

Contributor

@chandlerc chandlerc left a comment

The proposal here largely looks good to me. Still pondering some of the background discussion wording, but that also doesn't need to be perfect. Checking with other leads to see if others have any feedback on this iteration....

danakj added a commit to danakj/carbon-lang that referenced this pull request Feb 19, 2025
In line with the proposal in carbon-language#4682, this changes the array syntax to be
array(T, N). `array` is a builtin keyword which must be followed by
parens containing two expressions and a separating comma.

The array type expression is still fully builtin, it does not forward to
a Core.Array library type yet. It merely adds the `ArrayType`
instruction, as was done with the previous syntax.

Followup work will change the instruction to refer to Core.Array,
once the library type exists and can be used directly.
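
The syntactic shape described in this commit message (keyword, parentheses, two expressions, one separating comma) can be sketched as a tiny standalone checker. This is a hypothetical C++ helper written for illustration, not code from the Carbon toolchain; it treats the operands as opaque text and does not allow whitespace between the keyword and the opening parenthesis.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical helper: returns true when `text` looks like
// `array(<expr>, <expr>)`, with two non-empty operands separated by exactly
// one comma that is not nested inside inner parentheses.
bool LooksLikeArrayType(std::string_view text) {
  constexpr std::string_view kKeyword = "array";
  if (text.substr(0, kKeyword.size()) != kKeyword) return false;
  text.remove_prefix(kKeyword.size());
  if (text.size() < 2 || text.front() != '(' || text.back() != ')') {
    return false;
  }
  text = text.substr(1, text.size() - 2);  // Strip the outer parentheses.

  int depth = 0;
  int top_level_commas = 0;
  bool left_has_content = false;
  bool right_has_content = false;
  for (char c : text) {
    if (c == '(') {
      ++depth;
    } else if (c == ')') {
      if (--depth < 0) return false;  // Closes more than it opened.
    } else if (c == ',' && depth == 0) {
      ++top_level_commas;
      continue;
    }
    if (!std::isspace(static_cast<unsigned char>(c))) {
      (top_level_commas == 0 ? left_has_content : right_has_content) = true;
    }
  }
  assert(depth >= 0);  // Negative depth already returned early above.
  return depth == 0 && top_level_commas == 1 && left_has_content &&
         right_has_content;
}
```

Under these assumptions, `array(i32, 4)` and the nested `array(array(i32, 2), 3)` are accepted, while `array(i32)` and `array(1, 2, 3)` are rejected.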
Contributor

@chandlerc chandlerc left a comment

Checked with leads, and we're good to go here, ship it!

(We can keep discussing the best terminology, that's not blocking.)

@danakj danakj added this pull request to the merge queue Feb 20, 2025
Merged via the queue into carbon-language:trunk with commit 4b45caa Feb 20, 2025
8 checks passed
@danakj danakj deleted the proposal-array-forwards-to-th branch February 20, 2025 15:32
github-merge-queue bot pushed a commit that referenced this pull request Feb 21, 2025
In line with the proposal in #4682, this changes the array syntax to be
array(T, N). `array` is a builtin keyword which must be followed by
parens containing two expressions and a separating comma.

The array type expression is still fully builtin, it does not forward to
a Core.Array library type yet. It merely adds the `ArrayType`
instruction, as was done with the previous syntax.

Followup work will change the instruction to refer to Core.Array,
once the library type exists and can be used directly.

---------

Co-authored-by: zygoloid <[email protected]>