Add Thread::Local(T)
#15616
I think both the Hash-based AFAIK the primary problem with the TLS not being supported isn't very common, so I wouldn't want to introduce unnecessary complexity and overhead just for them.

I've built a small POC for using a "Virtual TLS", which basically means allocating a big struct with all thread-local values for a thread. On the few platforms not supporting TLS, we could still use pthread specifics for that one pointer. The same technique could be used to easily create a "Fiber-Local-Storage" (#15088).

POC Code

```crystal
# :nodoc:
module Crystal::ThreadLocalManager
  @[ThreadLocal]
  @@this_thread_data : Void* = Pointer(Void).new(0u64)
  @@containers : Crystal::PointerLinkedList(Container) = Crystal::PointerLinkedList(Container).new
  @@list_lock : Mutex = Mutex.new(:unchecked)

  struct Container
    include Crystal::PointerLinkedList::Node
  end

  # Registers a TLS section for this thread using the GC
  def self.register_self : Nil
    # normally I'd use GC.malloc_uncollectable but crystal doesn't expose that
    container = GC.malloc(sizeof(Container))
    # There's no pointer-based unsafe_construct for struct
    container.as(Container*).value = Container.new
    @@this_thread_data = container
    @@list_lock.synchronize do
      @@containers.push(container.as(Container*))
    end
  end

  # Destroys the TLS section of this thread
  def self.unregister_self
    container = @@this_thread_data
    @@list_lock.synchronize do
      @@containers.delete(container.as(Container*))
    end
    @@this_thread_data = Pointer(Void).new(0u64)
    GC.free(container)
  end

  @[AlwaysInline]
  def self.thread_data : Void*
    @@this_thread_data
  end
end

macro thread_local(var)
  {% unless var.is_a?(TypeDeclaration) %}
    {% var.raise "thread_local requires a type declaration.\n\
                  Example: thread_local pcre_context : Void* = Pointer(Void).null" %}
  {% end %}
  {% unless var.value %}
    {% var.raise "thread_local requires a default value" %}
  {% end %}
  {% if @def %}
    {% var.raise "Cannot use thread_local inside a def" %}
  {% end %}

  struct ::Crystal::ThreadLocalManager::Container
    @%var : {{var.type}}{% if var.value %} = {{var.value}}{% end %}
  end

  def self.{{var.var.id}} : {{var.type}}
    ptr = ::Crystal::ThreadLocalManager.thread_data + offsetof(::Crystal::ThreadLocalManager::Container, @%var)
    ptr.as({{var.type}}*).value
  end

  def self.{{var.var.id}}=(value : {{var.type}})
    ptr = ::Crystal::ThreadLocalManager.thread_data + offsetof(::Crystal::ThreadLocalManager::Container, @%var)
    ptr.as({{var.type}}*).value = value
  end
end

module Test
  thread_local string : String = "test"
end

Crystal::ThreadLocalManager.register_self
Test.string = "thread 1!!"

Thread.new do
  Crystal::ThreadLocalManager.register_self
  puts Test.string # should print "test"
  Test.string = "thread 2?"
  puts Test.string # should print "thread 2?"
  Crystal::ThreadLocalManager.unregister_self
end

sleep 3.seconds
puts Test.string # should print "thread 1!!"
```
EDIT: Since this is a fixed-size data structure, we could even try to allocate it at the stack bottom to remove any GC involvement. Performance-wise, that should be very close to using plain TLS.
Referencing #15395 (comment):
AFAIU from the LLVM bug report (https://bugs.llvm.org/show_bug.cgi?id=19177), the noinline call is only relevant when Fibers change which Thread they run on (work stealing). Since Crystal doesn't do that right now, this could probably be optimized. In the issue, a Dlang developer even stated that they just didn't implement work-stealing because it was a net performance loss. Also, I think we should more cleanly separate the Crystal compiler/language from the stdlib.
I think I must specify that That probably puts some perspective?
Indeed, it usually works, but we have targets where it doesn't: OpenBSD and MinGW64 for example. Android (Termux) also exhibited problems, and I'm pretty sure it can't work with Cosmopolitan (not yet supported).
Using The GC visibility is what led to the
Which is what we introduced in Crystal 1.16: the
@BlobCodes I must say that your PoC is very interesting, and a public We still can't rely on the And
I still think we could reduce the overhead of reading a TLS variable.
Oh my, indeed. It wouldn't even need the linked list + lock anymore, we just need
Yes, the main pain point could be callbacks given to C libraries, being executed in another thread.
It seems like the situation around thread-switching fibers hasn't gotten much better yet: It is apparently possible to communicate to LLVM that the fiber may be re-scheduled to another thread at a given point using I don't know if the If the latter is the case, we probably can't optimize TLS accesses right now.
We can't use `Box(T)` because it allocates in the GC heap, and the only live reference to that memory would sit in the system Thread Local Storage (TLS), which the GC can't reach, so the GC would collect the memory behind the pointer 💥
I fixed this PR (1012b54): we can't use
I thought a lot about this, and here are my personal conclusions:
Unresolved question: with the custom table, we'll have to wrap PCRE2 matchdata and jitstack pointers in a class with a finalizer to free them (no issues), but we'd always have a live reference from the
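To make the "class with a finalizer" part concrete, here is a minimal sketch of the wrapper pattern. The class name `NativeHandle` is hypothetical, and `LibC.malloc` / `LibC.free` only stand in for the PCRE2 allocation and free calls, which this thread does not show.

```crystal
# Hypothetical wrapper (not from the PR): owns a block of native memory and
# frees it when the GC collects the wrapper object.
class NativeHandle
  getter pointer : Void*

  def initialize(size : Int32)
    # LibC.malloc stands in for the real native allocator (assumption).
    pointer = LibC.malloc(LibC::SizeT.new(size))
    raise "malloc failed" if pointer.null?
    @pointer = pointer
  end

  # Called by the GC once no live reference to this object remains.
  def finalize
    LibC.free(@pointer)
  end
end

handle = NativeHandle.new(64)
puts handle.pointer.null? # false while the wrapper is alive
```

The finalizer only runs once the wrapper becomes unreachable, so any table that keeps a live reference to the wrapper also keeps the native memory alive.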
Yeah, this is quite an intriguing concept 🤔 However, that means it's based on global identifiers, thus it can really only work for class properties. It can't be used for instance and local variables.
So, what do we do with this PR? I believe it still stands, at least for:
Yeah, sounds good. We should continue this in parallel to FLS 👍
At fixed points. There is no external preemption of fibers; each fiber reschedules itself through
The `@[ThreadLocal]` annotation only works on some targets and doesn't allow registering a destructor callback that will be invoked when a thread shuts down.

We currently don't have threads shutting down, but with RFC 2 it will start happening (at least isolated contexts are expected to shut down, others should eventually evolve to shut down too), or use the complex `Crystal::ThreadLocalValue` to tie a value to a thread, which in turn requires finalize methods.

Extracted from #15395 and now uses `Box` since #15562 has been merged. We can't use `Box`, see 1012b54.

This PR is to be followed by:

- `Thread::Local(T)` to track the current Thread object (instead of a mix of the annotation or system TLS).
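For context on the hash-based approach mentioned in the description and in the conversation, here is a minimal sketch of tying a value to a thread through a hash keyed by the current `Thread`. It is an illustration under assumptions, not the stdlib implementation of `Crystal::ThreadLocalValue`: the class name `HashLocal` and its `get` method are hypothetical.

```crystal
# Hypothetical sketch (not the stdlib code): one value per thread, stored in a
# shared hash keyed by the Thread object and guarded by a lock.
class HashLocal(T)
  def initialize
    @values = {} of Thread => T
    @mutex = Mutex.new
  end

  # Returns the calling thread's value, building it with the block on first use.
  def get(& : -> T) : T
    thread = Thread.current
    @mutex.synchronize do
      @values.fetch(thread) { @values[thread] = yield }
    end
  end
end

local = HashLocal(String).new
main_value = local.get { "main" }

other_value = ""
Thread.new { other_value = local.get { "other" } }.join

puts main_value              # main
puts other_value             # other
puts local.get { "ignored" } # main (value already set for this thread)
```

Every read takes the lock and does a hash lookup, and the hash keeps each value reachable until something removes its entry, which is where the need for finalize methods comes from in this style of design.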