The Core.Array type for direct-storage immutably-sized buffers #4682

danakj · 2024-12-13T15:02:12Z

We propose to add Core.Array(T, N) as a library type in the Core package. Since arrays are a frequently used type, we propose to privilege use of this type by including it in the prelude library of the package.

We would like to see a shorthand where Core.Array is automatically imported into the file scope, and this proposal includes future work to this effect.

clavin · 2025-01-13T19:08:47Z

Wonderful proposal! 😄 The rationale behind this design is very impressive.

I wanted to highlight this information for your consideration: Swift is currently working on a new standard array type whose size is fixed at compile-time. In the proposal it is called Vector, along with a section discussing the decision behind this name; however, the associated implementation PR that just landed renamed the type to Slab.

One part of that proposal stuck out to me and reminded me of this proposal. They establish that they cannot use the name Swift.Array for this new type as it is already taken. To justify the name Swift.Vector instead, they argue that the naming of std::vector may have been a mistake in hindsight:

A. Stepanov mentions in his book, "From Mathematics to Generic Programming", that using the name std::vector for their dynamically allocated growable array type was perhaps a mistake for this same reason: [...]

However, they clearly acknowledge that this decision is a potential source of confusion, especially for developers coming from C++:

We fully acknowledge that the Swift types, Swift.Array and Swift.Vector, are complete opposites of the C++ ones, std::vector and std::array.

It seems they ultimately walked back this decision, given the rename to Slab, presumably to avoid this confusion (though that is only my guess). This was the overwhelming topic of discussion/bikeshedding in the proposal's second review (based on a quick skim), but I couldn't find an official decision anywhere.

This work in Swift may be too recent and volatile to derive any design decisions in Carbon right now, but it could add beneficial discussion to this proposal.

Consider noting this ongoing design work in the Background section for Swift. Also, consider discussing the argument that std::vector and std::array are misnomers in reference to the mathematical terms, strengthening the decision to stick with the term "array" that is familiar to C++ and Rust developers.

proposals/p4682.md

danakj

Thanks, PTAL

proposals/p4682.md

danakj · 2025-01-20T18:55:55Z

@clavin Thanks for all the links and context regarding the Swift Slab type. I've added this context to the proposal's background, and added the confusion around vector as a name to the rationale discussion.

chandlerc · 2025-01-21T02:58:12Z

proposals/p4682.md

+with arrays. Indirect refers to heap allocation, where the type itself holds
+storage of a pointer to the buffer, as with heap-buffers.


I don't think it really influences this proposal in either direction, but I'm a bit uncomfortable with establishing a dichotomy between indirect vs. direct here. I'm not sure that's the focal point we want to rely on here.

For example, I worry that it would lead towards designs that preclude small-size optimization or other techniques that don't necessarily use the heap...

I lean a bit towards static vs. dynamic allocation, and static size vs. dynamic immutable size vs. dynamic mutable size, as the classification basis...
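The size classification above can be illustrated with standard C++ types. This is a minimal sketch that assumes std::array stands in for a static size, std::unique_ptr<int[]> for a size chosen at run time but immutable afterwards, and std::vector for a dynamically mutable size. The mapping and the helper names (MakeFixedBuffer, GrowVector) are illustrative assumptions, not part of the proposal.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Static size: N is part of the type, so the size is a compile-time constant.
static_assert(std::array<int, 4>{}.size() == 4);

// Dynamic, immutable size (hypothetical helper): n is chosen at run time, but
// std::unique_ptr<int[]> offers no way to grow or shrink the buffer afterwards.
std::unique_ptr<int[]> MakeFixedBuffer(std::size_t n) {
  return std::make_unique<int[]>(n);  // elements are value-initialized to 0
}

// Dynamic, mutable size (hypothetical helper): the vector can grow after
// construction, possibly reallocating its buffer.
std::size_t GrowVector(std::size_t n) {
  std::vector<int> values(n);
  values.push_back(42);
  assert(values.back() == 42);
  return values.size();
}
```

Note that all three axes are independent of where the memory comes from, which is the point being made about allocation versus storage.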

This definitely did not mean to weigh in on SSO, but I would also say SSO is an optimization of indirect storage in a type when its value is small, rather than making it a direct-storage type.

I don't know that static vs dynamic quite captures the intent here, it feels a bit worse to me. You can have static allocations on the heap and that's not what an array is, so we lose some of the array vs heap-array difference. Would it be enough to just explicitly discuss that I am talking about direct vs indirect storage as the type category and SSO as an optimization doesn't change the type from an indirect-storage type?

I've added a paragraph to clarify the intent here.
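For readers mapping the proposal's terms onto C++, the following sketch checks where elements live relative to the owning object, using std::array as the direct-storage example and std::vector as the indirect-storage one. StoredInside, ArrayIsDirect, and VectorIsIndirect are hypothetical helpers, and the vector result assumes the default allocator, which places the buffer outside the vector object.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <vector>

// True if the bytes of `*element` lie entirely within the bytes of `object`.
// std::less_equal gives a total order over pointers, so comparing addresses
// from unrelated allocations is well-defined.
template <typename Object, typename Element>
bool StoredInside(const Object& object, const Element* element) {
  assert(element != nullptr);
  const auto* obj_begin = reinterpret_cast<const unsigned char*>(&object);
  const auto* obj_end = obj_begin + sizeof(Object);
  const auto* elem_begin = reinterpret_cast<const unsigned char*>(element);
  const auto* elem_end = elem_begin + sizeof(Element);
  std::less_equal<const unsigned char*> le;
  return le(obj_begin, elem_begin) && le(elem_end, obj_end);
}

// Direct storage: the four ints are part of the std::array object itself.
bool ArrayIsDirect() {
  std::array<int, 4> values = {1, 2, 3, 4};
  return StoredInside(values, values.data());
}

// Indirect storage: the std::vector object holds a pointer to a separate
// buffer obtained from its allocator.
bool VectorIsIndirect() {
  std::vector<int> values = {1, 2, 3, 4};
  return !StoredInside(values, values.data());
}
```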

Mostly thinking / musing about this and capturing those thoughts here -- not really trying to make a "this is the direction I think we should go" conclusion reply yet...

This definitely did not mean to weigh in on SSO, but I would also say SSO is an optimization of indirect storage in a type when its value is small, rather than making it a direct-storage type.

There are a number of API designs that distinguish between strictly indirect storage and something that permits SSO -- specifically because of the implications for moving the object.

I don't know that static vs dynamic quite captures the intent here, it feels a bit worse to me. You can have static allocations on the heap and that's not what an array is, so we lose some of the array vs heap-array difference.

Hmm... One thing that has been often requested in C and C++ is to have the compiler automatically promote large stack arrays to actually be heap arrays. And I've seen libraries that try to provide this distinction directly in the library, and even expose extra APIs (specifically around moving into something explicitly on the heap) when it ends up on the heap.

I think that too blurs the line between stack vs. heap.

Also, with allocation customization, I could imagine lots of data structures that you're describing as heap-based actually using global memory reservations, the stack, or something else...

Would it be enough to just explicitly discuss that I am talking about direct vs indirect storage as the type category and SSO as an optimization doesn't change the type from an indirect-storage type?

Still thinking about this, but some of the things that immediately popped into my head here are above.

Mostly thinking / musing about this and capturing those thoughts here -- not really trying to make a "this is the direction I think we should go" conclusion reply yet...

This definitely did not mean to weigh in on SSO, but I would also say SSO is an optimization of indirect storage in a type when its value is small, rather than making it a direct-storage type.

There are a number of API designs that distinguish between strictly indirect storage and something that permits SSO -- specifically because of the implications for moving the object.

Agree, yeah. Generally in a safe language, I expect you won't be able to hold a pointer into an indirect-storage type through it being destructively moved either, but maybe it could be exposed through the lifetime API somehow. A type with SSO would then have to act like it's always direct-storage though, which would be more restrictive.

I don't know that static vs dynamic quite captures the intent here, it feels a bit worse to me. You can have static allocations on the heap and that's not what an array is, so we lose some of the array vs heap-array difference.

Hmm... One thing that has been often requested in C and C++ is to have the compiler automatically promote large stack arrays to actually be heap arrays. And I've seen libraries that try to provide this distinction directly in the library, and even expose extra APIs (specifically around moving into something explicitly on the heap) when it ends up on the heap.

I think that too blurs the line between stack vs. heap.

I certainly would agree a type can be a hybrid of direct and indirect storage: llvm::SmallVector for example. Automatic promotion of large stack arrays sounds like a different form of optimization that should be transparent from the perspective of code using those arrays, so conceptually I'd expect them to be treated as direct storage, even if the implementation internally does something different. Do these words still provide enough vocabulary flexibility to discuss such a scenario? Personally, I would think so.
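As a rough illustration of a hybrid of direct and indirect storage in the spirit of llvm::SmallVector, here is a deliberately minimal sketch. SmallBuffer is a hypothetical type, not LLVM's API: its size is fixed at construction, it keeps up to InlineCapacity elements inside the object, and it falls back to a heap allocation otherwise. Copying and moving are deleted to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical hybrid buffer (not llvm::SmallVector): up to InlineCapacity
// elements are stored inside the object; larger sizes use the heap.
template <typename T, std::size_t InlineCapacity>
class SmallBuffer {
  static_assert(InlineCapacity > 0, "inline storage needs at least one slot");

 public:
  explicit SmallBuffer(std::size_t size)
      : size_(size),
        heap_(size > InlineCapacity ? std::make_unique<T[]>(size)
                                    : std::unique_ptr<T[]>()) {}

  // Copy (and therefore implicit move) is disabled to keep the sketch short.
  SmallBuffer(const SmallBuffer&) = delete;
  SmallBuffer& operator=(const SmallBuffer&) = delete;

  T& operator[](std::size_t i) {
    assert(i < size_);
    return data()[i];
  }
  T* data() { return heap_ ? heap_.get() : inline_; }
  std::size_t size() const { return size_; }

  // True while the elements live inside this object (direct storage).
  bool is_inline() const { return heap_ == nullptr; }

 private:
  std::size_t size_;
  T inline_[InlineCapacity] = {};
  std::unique_ptr<T[]> heap_;
};
```

A full hybrid also has to decide what a move means: moving the inline case relocates the elements, while moving the heap case can simply transfer the pointer, which is the address-stability difference discussed in this thread.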

Also, with allocation customization, I could imagine lots of data structures that you're describing as heap-based actually using global memory reservations, the stack, or something else...

+1, I think we could definitely qualify any mention of heap allocation with "as defined by the allocator" or something, much like Rust's std::vec::Vec does here: https://doc.rust-lang.org/std/vec/struct.Vec.html#guarantees
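To illustrate why "heap allocation" is best read as "whatever the allocator provides", here is a C++17 sketch in which a std::pmr::vector, normally thought of as heap-backed, draws its buffer from a caller-provided arena that lives on the stack. VectorStorageComesFromArena is a hypothetical helper; std::pmr::monotonic_buffer_resource and std::pmr::null_memory_resource are standard library facilities.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory_resource>
#include <vector>

// Hypothetical helper: returns true if the vector's elements were placed
// inside `arena`, a local (stack) buffer, rather than on the global heap.
bool VectorStorageComesFromArena() {
  alignas(std::max_align_t) std::byte arena[256];
  // null_memory_resource() throws std::bad_alloc if the arena is exhausted,
  // so nothing can silently fall back to the global heap.
  std::pmr::monotonic_buffer_resource resource(
      arena, sizeof(arena), std::pmr::null_memory_resource());
  std::pmr::vector<int> values(&resource);  // destroyed before `resource`
  values.reserve(8);
  for (int i = 0; i < 8; ++i) values.push_back(i);
  assert(values.size() == 8);

  const std::byte* begin = arena;
  const std::byte* end = arena + sizeof(arena);
  const auto* first = reinterpret_cast<const std::byte*>(values.data());
  std::less_equal<const std::byte*> le;
  return le(begin, first) && le(first + values.size() * sizeof(int), end);
}
```

The vector's own type and API are unchanged; only the allocator decides where the indirect buffer lives.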

Would it be enough to just explicitly discuss that I am talking about direct vs indirect storage as the type category and SSO as an optimization doesn't change the type from an indirect-storage type?

Still thinking about this, but some of the things that immediately popped into my head here are above.

Thanks, let's think about this a bit more and lmk what changes you'd like to see once we're clear.

I still struggle with "direct" and "indirect" here -- I think it suggests similarity between std::vector and std::unique_ptr that we won't want. If we try to apply it to std::string for example, it becomes very problematic because std::string doesn't provide address-stability of the contained data across move, but I think that's a good and reasonable thing to expect from std::unique_ptr.

Fundamentally, I think std::unique_ptr is indirect, and I like the term for that specific case.
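The address-stability point can be checked directly. This sketch (helper names are hypothetical) shows that moving a std::unique_ptr leaves the pointee where it was, while moving a std::array necessarily produces elements at new addresses because they are part of the object. It deliberately leaves std::string out: whether its characters move depends on the small-string optimization, which the standard does not mandate, so no portable assertion can be made.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <utility>

// std::unique_ptr: the pointee lives outside the owning object, so a move
// transfers ownership without relocating the pointee.
bool UniquePtrKeepsAddressAcrossMove() {
  auto owner = std::make_unique<std::array<int, 4>>();
  const std::array<int, 4>* before = owner.get();
  std::unique_ptr<std::array<int, 4>> new_owner = std::move(owner);
  assert(owner == nullptr);  // a moved-from unique_ptr is guaranteed null
  return new_owner.get() == before;
}

// std::array: the elements are part of the object, so "moving" it creates a
// second object whose elements necessarily live at different addresses.
bool ArrayElementsMoveWithTheObject() {
  std::array<int, 4> source = {1, 2, 3, 4};
  std::array<int, 4> destination = std::move(source);
  return destination.data() != source.data();
}
```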

But for what you're describing here, I continue to think the distinction is more about static allocation (a local array) vs. dynamic allocation (an owning pointer to an indirect array, or something growable like std::string or std::vector).

(That said, I continue to not think this is blocking -- we can have the wording in the background section of a proposal that doesn't match what we end up with long-term in the design. =])

proposals/p4682.md

danakj · 2025-02-04T19:37:13Z

We discussed this in the toolchain sync a few weeks ago, and at the time it sounded like the leads were all landing on the idea that:

Use a lowercase name to indicate it's a thing you have to memorize at the top level, prioritizing that over exactly matching the name of the type it forwards to: array(T, N)
Use a builtin specifically, which (as mentioned in comments above, and which I didn't understand while writing the original proposal text) is different from an implicit import from Core, as it disallows using the name array in scopes other than the file scope.

Recording these here for now, but waiting to see if the leads really converge on this outcome or decide to alter it a bit.

chandlerc · 2025-02-04T20:23:26Z

So, if you (or anyone else) would like to record a more formal decision, I think the process we use to get there is to file a leads question with roughly the content above and a link here for context if needed. I'm also happy to do that if useful.

We don't always do this -- if folks reach a happy consensus, including the author of a proposal, they can just update the proposal and ask the leads to take another look at it. It's basically an optional process for factoring out a smaller decision from a larger "yes" or "no" on a proposal as a whole.

Sorry if that process wasn't clear!

We propose to add `Core.Array(T, N)` as a library type in the `prelude` library of the `Core` package. Since arrays are a very frequent type, we propose to privilege use of this type by providing a builtin `Array(T, N)` type that resolves to the `Core.Array(T, N)` type. Users can model this as an implicit import of the `Core.Array(T, N)` type into the global scope, much like the implicit import of the `prelude` library of the `Core` package.

Imports are typically in the file scope; they don't become part of the package.

danakj

PTAL

chandlerc

The proposal here largely looks good to me. Still pondering some of the background discussion wording, but that also doesn't need to be perfect. Checking with other leads to see if they have any feedback on this iteration...

chandlerc · 2025-02-13T19:41:31Z

chandlerc

Checked with leads, and we're good to go here, ship it!

(We can keep discussing the best terminology, that's not blocking.)

chandlerc · 2025-02-20T01:49:39Z

In line with the proposal in #4682, this changes the array syntax to be array(T, N). `array` is a builtin keyword which must be followed by parens containing two expressions and a separating comma. The array type expression is still fully builtin; it does not forward to a Core.Array library type yet. It merely adds the `ArrayType` instruction, as was done with the previous syntax. Follow-up work will change the instruction to reference Core.Array, once the library type exists and can be used directly.

Co-authored-by: zygoloid

danakj added the proposal and proposal draft labels Dec 13, 2024

danakj force-pushed the proposal-array-forwards-to-th branch from e5ce9a2 to 1eac5a0 December 13, 2024 15:02

danakj force-pushed the proposal-array-forwards-to-th branch 4 times, most recently from 7729072 to 41cdb12 January 8, 2025 19:35

danakj marked this pull request as ready for review January 8, 2025 19:36

github-actions bot added the proposal rfc label and removed the proposal draft label Jan 8, 2025

danakj requested review from zygoloid and chandlerc January 8, 2025 19:36

github-actions bot requested a review from KateGregory January 8, 2025 19:36

danakj force-pushed the proposal-array-forwards-to-th branch from 41cdb12 to fca98cc January 8, 2025 19:39

zygoloid reviewed Jan 13, 2025

proposals/p4682.md (resolved)

proposals/p4682.md (outdated, resolved)

danakj force-pushed the proposal-array-forwards-to-th branch from 3976e27 to 8d90337 January 20, 2025 16:25

danakj changed the title ~~Array forwards to the prelude~~ The Core.Array type for direct storage immutably-sized buffers Jan 20, 2025

danakj force-pushed the proposal-array-forwards-to-th branch from 09c2e46 to 7d390dd January 20, 2025 18:49

danakj commented Jan 20, 2025

proposals/p4682.md (resolved)

proposals/p4682.md (outdated, resolved)

danakj requested a review from zygoloid January 20, 2025 18:54

chandlerc reviewed Jan 21, 2025

danakj requested a review from chandlerc January 21, 2025 18:01

danakj added 4 commits February 13, 2025 09:56

Fix numbering

e24ebcf

Use sub-headings for alternatives considered

8494b3e

danakj and others added 10 commits February 13, 2025 09:56

hyphen

3fd966b

providing-typo

05020fb

mention spanTN vs arrayTN-pointer

63647ce

global scope -> package scope

91e7f70

specify T and N

dc2b7f8

package scope -> file scope

a49f5da

imports are typically in the file scope, they don't become part of the package

Add table, talk about Swift Slab and math vectors

f896d81

Move the forward from file scope to future work

8aba24c

link future work

da7bb75

put T[N] before std::array<T, N>, relocatable=>resizable

e457797

danakj force-pushed the proposal-array-forwards-to-th branch from 8254427 to e457797 February 13, 2025 14:56

array-keyword

6511346

danakj force-pushed the proposal-array-forwards-to-th branch from 4cd3bcc to 6511346 February 13, 2025 15:52

danakj changed the title ~~The Core.Array type for direct storage immutably-sized buffers~~ The Core.Array type for direct-storage immutably-sized buffers Feb 13, 2025

title

1172c45

danakj commented Feb 13, 2025

danakj added 2 commits February 13, 2025 10:55

full-name

d5183a7

remove-period-in-heading

802018e

chandlerc reviewed Feb 13, 2025

danakj mentioned this pull request Feb 19, 2025

Change array syntax from [T; N] to array(T, N) #4981

Merged

chandlerc approved these changes Feb 20, 2025

danakj added this pull request to the merge queue Feb 20, 2025

Merged via the queue into carbon-language:trunk with commit 4b45caa Feb 20, 2025
8 checks passed

danakj deleted the proposal-array-forwards-to-th branch February 20, 2025 15:32
