-
-
Notifications
You must be signed in to change notification settings - Fork 195
Faster TLS v5 #1212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster TLS v5 #1212
Conversation
296e23c
to
f76ecac
Compare
…v/math into feature/faster-ad-tls-v5
@syclik What do you think about point 2 above? I am good with not reverting the commit 236b6d9 as we would then merge exactly what was approved before. Revering that one can still be done whenever we touch that AD system again later (and I plan to overhaul it a bit with parallelisation which is coming). ... or we revert that commit, but then we would have to rerun a few performance benchmarks, I guess, as a sanity check. This is why I am somewhat in favor for leaving that commit in for now... we will anyway benchmark the AD stack thoroughly when we add parallelisation. |
It would be really good to get this merged as is given it was already approved. I just had to fight another merge conflict which is unnecessary resource drain. |
Sorry, missed the question. It would really be better to revert that commit (point 2). That adds another level of complication to this PR; this changes the API when it seems like it's unnecessary. (At least based on what you've said.) Could you revert it? I'll approve it within a day of that happening, and we'll merge when tests pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks for cleaning up the PR so much! It made it a lot easier to review.
@@ -46,6 +46,7 @@ TEST(AgradAutoDiff, gradient_threaded) { | |||
EXPECT_FLOAT_EQ(x_ref(0) * x_ref(0) + 3 * 2 * x_ref(1), grad_fx_ref(1)); | |||
|
|||
auto thread_job = [&](double x1, double x2) { | |||
stan::math::ChainableStack thread_instance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required now when using threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you need an instance of ChainableStack in every thread. Otherwise the AD tape is not initialized. You can have more than one of those instances, but only the first one created is relevant.
@@ -0,0 +1,102 @@ | |||
#ifndef STAN_MATH_REV_MAT_FUNCTOR_MAP_RECT_CONCURRENT_HPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question: why was this logic moved? I didn't see it mentioned in the PR description and it's easier to spot it now in the diff. (Just wondering why it's done this way, I'm not trying to imply that there's anything wrong with this.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was moved due to dependecy reasons. The threaded map_rect
depends now on the include
#include <stan/math/rev/core/chainablestack.hpp>
as we are sticking the chainablestack instance into the lambda doing all the work. This requires us to move essentially the prim version into rev. This is not ideal, but once we get the TBB we can handle the instantiation of chainablestack better (for now we have to make sure via the lambda that an instance is available).
@seantalts & @serban-nicusor-toptal could you please have a look at the (upstream) Jenkins failure on this one? The error looks like garbage to me. Thanks. |
@wds15 Thanks! |
I am running now into this for upstream tests... looks like Jenkins or something-else-out-of-my-control thing?
|
I see a build in progress: http://d1m1s1b1.stat.columbia.edu:8080/job/Math%20Pipeline/view/change-requests/job/PR-1212/13/ I will let that run and if it fails I will look into why that happened. |
Ok... now the downstream cmdstan build 599 fails with:
These Jenkins failures are very frustrating... so help is very much appreciated. |
Good morning, looking into it. |
Thanks for restarting... one more hickup:
One more restart? |
I restarted once more. @seantalts I hope there are plans+ideas to stabilize Jenkins... this PR is now on its way to do it's third repeat on the Jenkins pipeline. The Jenkins config appears fragile and Windows is not stable. If possible we could almost think about automatic restarts for those OS related failures. Anything what makes this more stable would be great. It's work to follow this along and babysit. |
It's pretty strange that this happens, it never did before and there were no changes to how this part worked. Thank you! |
Sure, will keep you posted. Windows problems I have seen in the past: virus scanner, no disk space, permissions, ... |
What is going on now? Jenkins claims that the performance tests failed downstream, but when I go there everything is green. I am not sure where I can look for an error. If in doubt I am going to restart a 4th time as there were changes in between to the Jenkins config. |
Let's take a look. |
That happened because the performance was +15% worse than the last run. |
Thanks for diving into this. (This changeset was merged 2x to develop and reverted 2x...looks like now Jenkins really wants to prevent this presumably possibly "final" merge) |
ahh... that makes sense. This is because the SIR example on develop is now a lot faster and that changeset hasn't been merge to this PR. Where would I have seen this? Is there some doc on where I can see this myself? I mean to me things just look odd. |
It will now complete successfully! As always please keep me updated anytime Jenkins goes nuts. Thanks! Have a great day ! |
…v/math into feature/faster-ad-tls-v5
See my latest commit: stan-dev/performance-tests-cmdstan@00f25e9 |
I am happy to ping you, but I also don't intend to waste your time on these things... so if there is doc about things I should know then just let me know. |
You're not wasting my time. Sadly there isn't a doc about this but I've assigned myself an Issue for this so I'll work on it. It'll take a bit since I have to understand the whole picture too. Expect to start finding stuff in here by the next week or so: https://github.com/stan-dev/jenkins-config/wiki
This is the part that triggered the FAILURE. Which translated to, if a test performs 15% worse, the job will automatically fail and in a later stage notify through email. |
Ok...right now it is not apparent to me to see this 15% being crossed. I would also like to see which of the tests failed. I am usually clicking on red stuff to find out and that did not work this time. It would be good if the doc contained pointers where i can look for helping myself. Apart from that we should consider testing not the pr branch itself, but the tests should be on the pr branch merged to develop (this is anyway what we care about). |
You can go here: http://d1m1s1b1.stat.columbia.edu:8080/job/CmdStan%20Performance%20Tests/job/downstream_tests/24/console And all the way down. You should see the test results that caused the Math Job to FAIL. I think the test logic was fine, the reporting tool was the issue here. We're in Math Console, to see the Stan downstream just do a |
@serban-nicusor-toptal Looks like one more hickup:
This happens in the cmdstan downstream tests. Thanks for looking into this. |
Good morning.
Which in our case is:
And the script is called by the following:
So i have no idea how the -j25 got in the Investigating more |
Whow... after 5x repeats of running Jenkins! |
Oh that was me. I didn't know cmdstan had a limit on j and it passed tests
before I guess because it didn't run on a large machine.
…On Sat, May 11, 2019 at 12:05 wds15 ***@***.***> wrote:
Merged #1212 <#1212> into develop.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1212 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAGET3F7BU76CI5QT3QASTTPU3VFBANCNFSM4HHOIMBA>
.
|
Revert "Merge pull request #1212 from stan-dev/feature/faster-ad-tls-v5"
…-ad-tls-v5"" This reverts commit db1a002.
This reverts commit 07209fb, reversing changes made to 7dfc00c.
This PR reapplies the changes of the PR faster TLS v4 which was reverted from develop after test failures showed up on Windows under the cmd shell.
The test failures of develop seem to be caused by a bug in the
test/unit/math/rev/arr/functor/coupled_ode_system_test.cpp
which should be resolved after merging the branch https://github.com/stan-dev/math/tree/bugfix/issue-1210-coupled_ode_system . The final confirmation is pending as this bug in the test only showed up when this PR was combined with the original code and when run on Windows on the cmd shell.The content of this PR was approved as is in its previous v4 version. In this (hopefully last) iteration the reviewer may consider these points:
Agreement on proceeding with issue Problem with coupled_ode_system #1210Done, it's merged.ChainableStack::instance()
intoChainableStack::instance_->
. I did this as I was anticipating compiler problems on Windows with it which I was hoping to avoid, but as it turns out, this is not the case and the additional abstraction which we had with theinstance()
method is a good one. On the other hand we can keep things simple and just merge this PR as is to avoid further need to validate this PR. We can revert this change later should we think it is needed.Summary
Please refer to the linked v4 PR, #1171 , of this for details.
As I am not a git guru, here is what I did to get to this PR git wise:
Tests
Side Effects
Checklist
Math issue Thread performance penalty #1102 and map_rect with threading broken on windows #1134
Copyright holder: Sebastian Weber
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit
)make test-headers
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested