Make training iterations 0-indexed to display training loss in 0th iter #800
Conversation
(it also improves #796 by displaying the debug outputs before any updates have been made, which may help clarify whether a problem is with initialization vs. training)
Oh, on second thought, I gave a bad example of how it cleans up the logic: instead of a special test before the training loop, we just have to do a special test afterwards, whoops. Still prefer the 0-indexing for the other reasons mentioned though :)
Always thumbs up for 0-indexing :)
src/caffe/solver.cpp (outdated)
`display && param.debug_info()`
May be faster than `?:`.
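For illustration, a minimal self-contained sketch of the two forms (the `Param` struct and the surrounding names are hypothetical stand-ins, not Caffe's actual types):

```cpp
#include <iostream>

// Hypothetical stand-in for the solver's parameter object; only the
// debug_info() name comes from the reviewed line, the rest is made up.
struct Param {
  bool debug_info() const { return true; }
};

int main() {
  Param param;
  bool display = false;

  // Ternary form originally in the PR:
  bool debug_ternary = display ? param.debug_info() : false;

  // && form suggested in the review: same short-circuit behavior
  // (debug_info() is never called when display is false), but the
  // intent reads more directly.
  bool debug_and = display && param.debug_info();

  std::cout << debug_ternary << " " << debug_and << std::endl;  // 0 0
  return 0;
}
```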
Thanks for the review Yangqing! Not sure why I used the ternary operator instead of `&&`; fixed. Also, there was another small problem with what I'd added: I added a final forward pass of training after optimization so that we display the loss @ max_iter as dictated by the display setting. Finally, in the last commit I moved the loss display to right after ForwardBackward, before ComputeUpdateValue. I think it makes sense to print the loss as soon as we know it, but I mainly did it so that the added loss display at the end doesn't look weird.
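A rough sketch of the resulting ordering inside the solver loop; this is a paraphrase assuming Caffe's `Solver` member names (`net_`, `iter_`, `param_`, `bottom_vec`), not the verbatim diff:

```cpp
// Sketch only: the loss is displayed immediately after ForwardBackward,
// before ComputeUpdateValue, so the value printed for iteration N is
// the loss after exactly N weight updates.
while (iter_ < param_.max_iter()) {
  Dtype loss = net_->ForwardBackward(bottom_vec);
  if (param_.display() && iter_ % param_.display() == 0) {
    LOG(INFO) << "Iteration " << iter_ << ", loss = " << loss;
  }
  ComputeUpdateValue();
  net_->Update();
  ++iter_;
}
// Extra forward pass after the loop so the loss at max_iter is also
// displayed (no update follows it).
if (param_.display() && iter_ % param_.display() == 0) {
  Dtype loss = net_->ForwardBackward(bottom_vec);
  LOG(INFO) << "Iteration " << iter_ << ", loss = " << loss;
}
```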
This PR makes the training iterations 0-indexed so that the loss is displayed in the 0th iter (if display is set), which I think makes sense now that we display the test outputs on the "0th" iter. It also makes the logic a bit cleaner/more natural (IMO), e.g. we can just use the one `TestAll` call inside the training loop rather than having a special one before. I personally like the semantics better -- the iteration number is the number of times the weights have been updated. That is actually how the snapshots behaved before, so snapshots should be exactly the same (assuming you use a fixed random seed and all other training inputs are the same), but now the loss displayed as "Iteration 20" will be the loss that used to display as "Iteration 21".

Any thoughts, @Yangqing or anyone else?
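To make the semantics concrete, here is a small self-contained toy (not Caffe code; the dummy loss curve is purely illustrative) showing that with 0-indexing, "Iteration N" reports the loss after N updates, including the untrained loss at iteration 0:

```cpp
#include <iostream>

int main() {
  const int max_iter = 4;  // stands in for SolverParameter::max_iter
  const int display = 2;   // display every 2 iterations

  int updates = 0;         // number of weight updates applied so far

  // 0-indexed loop: the loss printed at "Iteration i" reflects
  // exactly i prior updates, so iteration 0 shows the initial loss.
  for (int iter = 0; iter < max_iter; ++iter) {
    const double loss = 1.0 / (1.0 + updates);  // dummy loss curve
    if (display > 0 && iter % display == 0) {
      std::cout << "Iteration " << iter << ", loss = " << loss << "\n";
    }
    ++updates;  // ComputeUpdateValue + Update in the real solver
  }

  // Final pass so the loss at max_iter is displayed as well.
  if (display > 0 && max_iter % display == 0) {
    std::cout << "Iteration " << max_iter << ", loss = "
              << 1.0 / (1.0 + updates) << "\n";
  }
  return 0;
}
```

Running this prints the initial loss at iteration 0 and the post-training loss at iteration 4, mirroring the display behavior the PR describes.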