
Conversation

jeffdonahue
Contributor

This PR makes the training iterations 0-indexed so that the loss is displayed in the 0th iter (if display is set), which I think makes sense now that we display the test outputs on the "0th" iter. It also makes the logic a bit cleaner/more natural (IMO), e.g. we can just use the one TestAll call inside the training loop rather than having a special one before. I personally like the semantics better -- the iteration number is the number of times the weights have been updated. That's actually how the snapshots behaved before, so snapshots should be exactly the same (assuming you use a random seed and all other training inputs are the same), but the loss now displayed as "Iteration 20" is the loss that used to be displayed as "Iteration 21".
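(A minimal sketch of the 0-indexed loop shape described above -- stand-in calls only, not the actual Solver code; the TestAll/ForwardBackward names follow this thread, everything else is made up for illustration:)

#include <cstdio>

// Stand-in for Net::ForwardBackward(); returns a fake, decreasing loss.
double ForwardBackward(int iter) { return 1.0 / (iter + 1); }

int main() {
  const int max_iter = 100, display = 20, test_interval = 50;
  // iter counts completed weight updates, so iter 0 tests/displays the
  // net before any update has been applied.
  for (int iter = 0; iter < max_iter; ++iter) {
    if (test_interval && iter % test_interval == 0) {
      // TestAll();  -- one call inside the loop; no special pre-loop test
    }
    double loss = ForwardBackward(iter);
    if (display && iter % display == 0) {
      std::printf("Iteration %d, loss = %g\n", iter, loss);
    }
    // ComputeUpdateValue(); net->Update();
  }
  return 0;
}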

Any thoughts, @Yangqing or anyone else?

@jeffdonahue
Contributor Author

(it also improves #796 by displaying the debug outputs before any updates have been made, which may help clarify whether a problem is with initialization vs. training)

@jeffdonahue
Contributor Author

Oh, on second thought, I gave a bad example of how it cleans up the logic; instead of a special test before the training loop, we just have to do a special test afterwards, whoops. Still prefer the 0-indexing for the other reasons mentioned, though :)

@Yangqing
Member

Always thumbs up for 0-indexing :)

@Yangqing
Member


display && param.debug_info()

May be faster than ?:
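(For reference, a minimal sketch of the two equivalent guards -- the variable names here are placeholders, not the solver's actual members:)

#include <cassert>

int main() {
  bool display = true, debug_info = false;

  // Ternary spelling (what the PR originally used):
  bool log_debug_ternary = display ? debug_info : false;

  // Short-circuit && spelling (the suggestion above):
  bool log_debug_and = display && debug_info;

  assert(log_debug_ternary == log_debug_and);  // same result either way
  return 0;
}

Whether one is actually faster will depend on the compiler, but the && form reads more naturally.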

@jeffdonahue
Contributor Author

Thanks for the review, Yangqing! Not sure why I used the ternary operator instead of &&; fixed.

Also, there was another small problem: I'd added iter_ > 0 to the "should we snapshot?" condition, but if the solver was resumed from a solverstate it would do one update and then overwrite the just-loaded snapshot, because iter_ is non-zero after loading. I fixed this by remembering the start_iter after loading a snapshot and changing the check to iter_ > start_iter.

I also added a final forward pass of the training net after optimization so that we display the loss @ max_iter as dictated by the display setting (but no backward pass, since we've already done max_iter iterations).
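(A rough sketch of both fixes together -- the member names, LoadSnapshotIter, and the printed "Snapshotting" line are assumptions for illustration, not Caffe's actual API:)

#include <cstdio>

struct SolverSketch {
  int iter_ = 0;
  int max_iter_ = 100, display_ = 20, snapshot_ = 30;

  int LoadSnapshotIter() { return 40; }  // pretend we resumed at iter 40
  double ForwardBackward() { return 1.0 / (iter_ + 1); }
  double Forward() { return 1.0 / (iter_ + 1); }  // display-only pass

  void Solve(bool resume) {
    if (resume) iter_ = LoadSnapshotIter();
    const int start_iter = iter_;  // remember where we resumed
    for (; iter_ < max_iter_; ++iter_) {
      double loss = ForwardBackward();
      if (display_ && iter_ % display_ == 0)
        std::printf("Iteration %d, loss = %g\n", iter_, loss);
      // iter_ > start_iter (not iter_ > 0), so a freshly loaded snapshot
      // is not overwritten right after a single update.
      if (snapshot_ && iter_ > start_iter && iter_ % snapshot_ == 0)
        std::printf("Snapshotting iteration %d\n", iter_);
      // ComputeUpdateValue(); net->Update();
    }
    // Final display-only pass: iter_ == max_iter_ here and max_iter
    // updates are done, so forward only -- no backward pass.
    if (display_ && iter_ % display_ == 0)
      std::printf("Iteration %d, loss = %g\n", iter_, Forward());
  }
};

int main() {
  SolverSketch s;
  s.Solve(true);  // resume from a (pretend) solverstate
  return 0;
}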

Finally, in the last commit I moved the loss display to right after ForwardBackward, before ComputeUpdateValue. I think it makes sense to print the loss as soon as we know it, but I mainly did it so that the added loss display at the end doesn't look weird:

I0727 11:02:54.009799 17379 solver.cpp:107] Iteration 800, loss = 0.240632
I0727 11:02:54.009887 17379 solver.cpp:281] Iteration 800, lr = 0.00943913
I0727 11:02:55.549463 17379 solver.cpp:107] Iteration 900, loss = 0.159342
I0727 11:02:55.551097 17379 solver.cpp:281] Iteration 900, lr = 0.00937411
I0727 11:02:57.025902 17379 solver.cpp:127] Iteration 1000, loss = 0.0909304

rather than:

I0726 20:44:59.267554  4949 solver.cpp:277] Iteration 800, lr = 0.00943913
I0726 20:44:59.267832  4949 solver.cpp:107] Iteration 800, loss = 0.240632
I0726 20:45:00.714905  4949 solver.cpp:277] Iteration 900, lr = 0.00937411
I0726 20:45:00.715170  4949 solver.cpp:107] Iteration 900, loss = 0.159342
I0726 20:45:02.173050  4949 solver.cpp:123] Iteration 1000, loss = 0.0909304
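(In loop-body terms, the reordering is just the following -- stand-in calls; in the real solver the lr line is logged from within the update step:)

#include <cstdio>

double ForwardBackward() { return 0.240632; }  // stand-in
void ComputeUpdateValue(int iter) {            // stand-in; logs the lr line
  std::printf("Iteration %d, lr = 0.00943913\n", iter);
}

int main() {
  int iter = 800;
  // New order: log the loss the moment ForwardBackward returns...
  double loss = ForwardBackward();
  std::printf("Iteration %d, loss = %g\n", iter, loss);
  // ...then compute/apply the update. The trailing loss-only line at
  // max_iter now matches the pattern instead of dangling after an lr line.
  ComputeUpdateValue(iter);
  return 0;
}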

@jeffdonahue
Contributor Author

(also, sorry for polluting the PR with 4fe16db, which just fixes some bad variable names I came up with in #796. Merging after Travis.)

jeffdonahue added a commit that referenced this pull request Jul 27, 2014
Make training iterations 0-indexed to display training loss in 0th iter
@jeffdonahue jeffdonahue merged commit 4bd9489 into BVLC:dev Jul 27, 2014
@jeffdonahue jeffdonahue deleted the zero-indexed-train-iter branch July 27, 2014 19:23
@jeffdonahue jeffdonahue mentioned this pull request Jul 30, 2014
@shelhamer shelhamer mentioned this pull request Aug 7, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
Make training iterations 0-indexed to display training loss in 0th iter
RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014
Make training iterations 0-indexed to display training loss in 0th iter