The Core.Array type for direct-storage immutably-sized buffers #4682
Wonderful proposal! 😄 The rationale behind this design is very impressive. I wanted to highlight this information for your consideration: Swift is currently working on a new standard array type whose size is fixed at compile time. In the proposal it is called … One part of that proposal stuck out to me and reminded me of this proposal. They establish that they cannot use the name …
However, they clearly acknowledge that this decision is a potential source of confusion, especially for developers coming from C++:
It seems they ultimately walked back this decision, given the rename to …
This work in Swift may be too recent and volatile to derive any design decisions in Carbon right now, but it could add beneficial discussion to this proposal. Consider noting this ongoing design work in the Background section for Swift. Also, consider discussing the argument that …
Thanks, PTAL
@clavin Thanks for all the links and context regarding the Swift …
… with arrays. Indirect refers to heap allocation, where the type itself holds storage of a pointer to the buffer, as with heap-buffers.
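The direct/indirect distinction in the quoted text can be illustrated with a small C++ sketch (C++ rather than Carbon, since the discussion leans on C++ types): `std::array` keeps its elements inside the object, while `std::unique_ptr<int[]>` holds only a pointer to a separately allocated buffer. `DirectBuffer`, `IndirectBuffer`, and `MakeIndirect` are hypothetical names chosen for illustration, not part of the proposal.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Direct storage: all 64 elements live inside the object itself, so the
// object's size grows with the element count.
using DirectBuffer = std::array<int, 64>;

// Indirect storage: the object holds only a pointer to a separately
// allocated buffer, so its size does not depend on the element count.
using IndirectBuffer = std::unique_ptr<int[]>;

static_assert(sizeof(DirectBuffer) >= 64 * sizeof(int),
              "direct storage embeds every element");

// Allocates `count` value-initialized (zeroed) ints on the heap and returns
// an owning pointer to them.
IndirectBuffer MakeIndirect(std::size_t count) {
  return std::make_unique<int[]>(count);
}
```

An `IndirectBuffer` owning 1000 elements is the same size as one owning a single element, whereas `DirectBuffer`'s size is tied to its compile-time count.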
I don't think it really influences this proposal in either direction, but I'm a bit uncomfortable with establishing a dichotomy between indirect vs. direct here. I'm not sure that's the focal point we want to rely on here.
For example, I worry that would lead towards designs that preclude small-size optimization or other techniques that don't use the heap necessarily...
I lean a bit towards static vs. dynamic allocation, or static vs. dynamic-immutable vs. dynamic-mutable size, as the classification basis...
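The three size categories suggested here can be sketched in C++ as follows; this is only an illustration of the classification, and `FixedHeapArray` and `GrowTo` are hypothetical helpers, not anything proposed for Carbon.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Static size: the element count is part of the type, fixed at compile time.
using StaticArray = std::array<int, 8>;
static_assert(std::tuple_size<StaticArray>::value == 8,
              "the size is encoded in the type");

// Dynamic, immutable size: chosen at run time, then fixed for the lifetime
// of the object. (Hypothetical helper type for illustration.)
struct FixedHeapArray {
  explicit FixedHeapArray(std::size_t n)
      : size(n), data(std::make_unique<int[]>(n)) {}
  const std::size_t size;
  std::unique_ptr<int[]> data;
};

// Dynamic, mutable size: the element count can change after construction.
std::size_t GrowTo(std::size_t n) {
  std::vector<int> v;
  for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
  return v.size();
}
```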
This definitely did not mean to weigh in on SSO, but also I would say SSO is an optimization of indirect storage in a type when its value is small, rather than being a direct-storage type.
I don't know that static vs dynamic quite captures the intent here, it feels a bit worse to me. You can have static allocations on the heap and that's not what an array is, so we lose some of the array vs heap-array difference. Would it be enough to just explicitly discuss that I am talking about direct vs indirect storage as the type category and SSO as an optimization doesn't change the type from an indirect-storage type?
I've added a paragraph to clarify the intent here.
Mostly thinking / musing about this and capturing those thoughts here -- not really trying to make a "this is the direction I think we should go" conclusion reply yet...
> This definitely did not mean to weigh in on SSO, but also I would say SSO is an optimization of indirect storage in a type when its value is small, rather than being a direct-storage type.
There are a number of API designs that distinguish between strictly indirect storage and something that permits SSO -- specifically, the implications of moving the object.
> I don't know that static vs dynamic quite captures the intent here, it feels a bit worse to me. You can have static allocations on the heap and that's not what an array is, so we lose some of the array vs heap-array difference.
Hmm... One thing that has often been requested in C and C++ is to have the compiler automatically promote large stack arrays to actually be heap arrays. And I've seen libraries that try to provide this distinction directly in the library, and even expose extra APIs (specifically around moving into something explicitly on the heap) when it ends up on the heap.
I think that too blurs the line between stack vs. heap.
Also, with allocation customization, I could imagine lots of data structures where you're describing them as heap-based actually using global memory reservations, the stack, or something else...
> Would it be enough to just explicitly discuss that I am talking about direct vs indirect storage as the type category and SSO as an optimization doesn't change the type from an indirect-storage type?
Still thinking about this, but some of the things that immediately popped into my head here are above.
> Mostly thinking / musing about this and capturing those thoughts here -- not really trying to make a "this is the direction I think we should go" conclusion reply yet...
> This definitely did not mean to weigh in on SSO, but also I would say SSO is an optimization of indirect storage in a type when its value is small, rather than being a direct-storage type.
> There are a number of API designs that distinguish between strictly indirect storage and something that permits SSO -- specifically, the implications of moving the object.
Agree, yeah. Generally, in a safe language I expect you won't be able to hold a pointer into an indirect-storage type across a destructive move either, but maybe it could be exposed through the lifetime API somehow. A type with SSO would then have to act as if it were always direct storage, though, which would be more restrictive.
> I don't know that static vs dynamic quite captures the intent here, it feels a bit worse to me. You can have static allocations on the heap and that's not what an array is, so we lose some of the array vs heap-array difference.
> Hmm... One thing that has often been requested in C and C++ is to have the compiler automatically promote large stack arrays to actually be heap arrays. And I've seen libraries that try to provide this distinction directly in the library, and even expose extra APIs (specifically around moving into something explicitly on the heap) when it ends up on the heap.
> I think that too blurs the line between stack vs. heap.
I certainly would agree a type can be a hybrid of direct and indirect storage: `llvm::SmallVector`, for example. Automatic promotion of large stack arrays sounds like a different form of optimization that should be transparent to code using those arrays, so conceptually I'd expect them to be treated as direct storage, even if the implementation internally does something different. Do these words still provide enough vocabulary flexibility to discuss such a scenario? Personally, I would think so.
> Also, with allocation customization, I could imagine lots of data structures where you're describing them as heap-based actually using global memory reservations, the stack, or something else...
+1, I think we could definitely qualify any mention of heap allocation with "as defined by the allocator" or something, much like Rust's `std::vec::Vec` does here: https://doc.rust-lang.org/std/vec/struct.Vec.html#guarantees
> Would it be enough to just explicitly discuss that I am talking about direct vs indirect storage as the type category and SSO as an optimization doesn't change the type from an indirect-storage type?
> Still thinking about this, but some of the things that immediately popped into my head here are above.
Thanks, let's think about this a bit more and let me know what changes you'd like to see once we're clear.
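The hybrid case mentioned in this thread can be sketched as a small C++ type in the spirit of `llvm::SmallVector` (this is not its actual implementation): up to `N` elements are stored directly inside the object, and larger counts fall back to an indirect heap allocation. `SmallBuffer` and its members are hypothetical names, and the element count is fixed at construction to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical hybrid buffer: inline (direct) storage for small counts,
// heap (indirect) storage otherwise. The count is fixed at construction.
template <typename T, std::size_t N>
class SmallBuffer {
 public:
  explicit SmallBuffer(std::size_t count) : count_(count) {
    if (count_ > N) heap_ = std::make_unique<T[]>(count_);
  }

  T* data() { return heap_ ? heap_.get() : inline_; }
  std::size_t size() const { return count_; }
  // True when the elements currently live in the object's own storage.
  bool is_inline() const { return heap_ == nullptr; }

 private:
  std::size_t count_;
  T inline_[N] = {};           // direct storage, always present
  std::unique_ptr<T[]> heap_;  // indirect storage, only for large counts
};
```

Moving an inline `SmallBuffer` copies its elements to a new address, while moving a heap-backed one keeps them in place, which is why code relying on pointer stability would have to treat such a type as if it were always direct storage.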
I still struggle with "direct" and "indirect" here -- I think it suggests similarity between `std::vector` and `std::unique_ptr` that we won't want. If we try to apply it to `std::string`, for example, it becomes very problematic because `std::string` doesn't provide address-stability of the contained data across move, but I think that's a good and reasonable thing to expect from `std::unique_ptr`.
Fundamentally, I think `std::unique_ptr` is indirect, and I like the term for that specific case.
But for what you're describing here, I continue to think the distinction is more about static allocation (a local array) vs. dynamic allocation (an owning pointer to an indirect array, or something growable like `std::string` or `std::vector`).
(That said, I continue to not think this is blocking -- we can have the wording in the background section of a proposal that doesn't match what we end up with long-term in the design. =])
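The address-stability point can be checked with a short C++ sketch that relies only on guaranteed standard behavior: moving a `std::unique_ptr<int[]>` hands over the same heap buffer, while moving a `std::array` produces a distinct object with its own element storage. `std::string` is deliberately left out, since whether its data pointer survives a move depends on the implementation's small-string optimization. The two function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <utility>

// Moving an owning pointer transfers the same heap buffer, so the element
// address observed before the move is still valid afterwards.
bool UniquePtrKeepsAddress() {
  auto src = std::make_unique<int[]>(4);
  const int* before = src.get();
  auto dst = std::move(src);
  return dst.get() == before && src == nullptr;
}

// Moving a direct-storage array copies the elements into the destination
// object, which has its own storage, so the element address changes.
bool ArrayRelocatesElements() {
  std::array<int, 4> src = {1, 2, 3, 4};
  const int* before = src.data();
  std::array<int, 4> dst = std::move(src);
  return dst.data() != before && dst[3] == 4;
}
```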
We discussed this in the toolchain sync a few weeks ago, and at the time it sounded like the leads were all landing on the idea that:
Recording these here for now, but waiting to see if the leads really converge on this outcome or decide to alter it a bit.
So, if you (or anyone else) would like to record a more formal decision, I think the process we use to get there is to file a leads question with roughly the content above and a link here for context if needed. I'm also happy to do that if useful. We don't always do this -- if folks, including the author of a proposal, reach a happy consensus, the author can just update the proposal and ask the leads to take another look at it. It's basically an optional process for factoring out a smaller decision from a larger "yes" or "no" on a proposal as a whole. Sorry if that process wasn't clear!
We propose to add `Core.Array(T, N)` as a library type in the `Core` package. Since arrays are a very frequent type, we propose to privilege use of this type by including it in the `prelude` library of the package. We would like to see a shorthand where `Core.Array` is automatically imported into the file scope, and this proposal includes future work to this effect.
We propose to add `Core.Array(T, N)` as a library type in the `prelude` library of the `Core` package. Since arrays are a very frequent type, we propose to privilege use of this type by providing a builtin `Array(T, N)` type that resolves to the `Core.Array(T, N)` type. Users can model this as an implicit import of the `Core.Array(T, N)` type into the global scope, much like the implicit import of the `prelude` library of the `Core` package.
Imports are typically in the file scope; they don't become part of the package.
PTAL
The proposal here largely looks good to me. Still pondering some of the background discussion wording, but that also doesn't need to be perfect. Checking with other leads to see if others have any feedback on this iteration...
In line with the proposal in carbon-language#4682, this changes the array syntax to `array(T, N)`. `array` is a builtin keyword that must be followed by parentheses containing two expressions separated by a comma. The array type expression is still fully builtin; it does not forward to a `Core.Array` library type yet. It merely adds the `ArrayType` instruction, as was done with the previous syntax. Follow-up work will change the instruction to reference `Core.Array` once the library type exists and can be used directly.
Checked with leads, and we're good to go here, ship it!
(We can keep discussing the best terminology, that's not blocking.)
In line with the proposal in #4682, this changes the array syntax to `array(T, N)`. `array` is a builtin keyword that must be followed by parentheses containing two expressions separated by a comma. The array type expression is still fully builtin; it does not forward to a `Core.Array` library type yet. It merely adds the `ArrayType` instruction, as was done with the previous syntax. Follow-up work will change the instruction to reference `Core.Array` once the library type exists and can be used directly.
---------
Co-authored-by: zygoloid