AlexandreSinger
Contributor

I found that the annealer's current estimate of the initial temperature is too high when the initial placement is of very good quality (for example, after AP).

This PR adds a new way of estimating the starting temperature: setting it to an estimate of the equilibrium temperature. The equilibrium temperature is the temperature at which the change in cost after an annealing iteration would be 0.

The old way (using the variance of the change in cost) is still the default; however, the new method can be enabled from the command line.
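
To make the definition concrete, here is a minimal sketch of one way such an estimate could be computed. It is not the code in this PR: the function names, the Metropolis acceptance rule exp(-ΔC/T), and the bisection solver are all assumptions chosen for illustration. Given the cost changes of a batch of trial swaps, it searches for the temperature at which the total expected accepted change in cost is zero.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Expected accepted change in cost over a batch of trial swaps at a given
// temperature, ASSUMING Metropolis acceptance: downhill swaps are always
// accepted; an uphill swap with cost change d is accepted with
// probability exp(-d / temperature).
double expected_cost_change(const std::vector<double>& delta_costs, double temperature) {
    double total = 0.0;
    for (double d : delta_costs) {
        total += (d <= 0.0) ? d : d * std::exp(-d / temperature);
    }
    return total;
}

// Hypothetical helper (not a VPR function): the temperature at which
// expected_cost_change() crosses zero, found by bisection.
double estimate_equilibrium_temperature(const std::vector<double>& delta_costs) {
    double downhill_sum = 0.0;
    double uphill_sum = 0.0;
    double max_uphill = 0.0;
    for (double d : delta_costs) {
        if (d < 0.0) {
            downhill_sum += d;
        } else if (d > 0.0) {
            uphill_sum += d;
            max_uphill = std::max(max_uphill, d);
        }
    }

    // No improving swaps: any positive temperature raises the cost.
    if (downhill_sum == 0.0) return 0.0;
    // Improving swaps dominate even if every uphill swap is accepted.
    if (downhill_sum + uphill_sum <= 0.0) return std::numeric_limits<double>::infinity();

    // The expected change grows monotonically with temperature, so bracket
    // the root by doubling the upper bound and then bisect.
    double lo = 0.0;
    double hi = max_uphill;
    while (expected_cost_change(delta_costs, hi) < 0.0 && std::isfinite(hi)) {
        hi *= 2.0;
    }
    for (int iter = 0; iter < 100; ++iter) {
        double mid = 0.5 * (lo + hi);
        if (expected_cost_change(delta_costs, mid) < 0.0) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return 0.5 * (lo + hi);
}
```

For two trial swaps with cost changes -1 and 2, the balance condition -1 + 2·exp(-2/T) = 0 gives T = 2 / ln 2, roughly 2.885, and the sketch converges to that value. Note that nothing in it depends on a tuned scaling factor.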

@github-actions (bot) added the VPR (VPR FPGA Placement & Routing Tool), docs (Documentation), and lang-cpp (C/C++ code) labels on Sep 10, 2025
@AlexandreSinger
Contributor Author

Results on Titan (titan quick qor). Baseline is using the original cost variance approach (current default), the other is using my new equilibrium option (only one command-line change):

| Metric | baseline.txt | equilibrium.txt |
|---|---|---|
| vtr_flow_elapsed_time | 1 | 0.91 |
| num_LAB | 1 | 1 |
| num_DSP | 1 | 1 |
| num_M9K | 1 | 1 |
| num_M144K | 1 | 1 |
| max_vpr_mem | 1 | 1.00 |
| num_pre_packed_blocks | 1 | 1 |
| num_post_packed_blocks | 1 | 1 |
| device_grid_tiles | 1 | 1 |
| pack_time | 1 | 0.99 |
| placed_wirelength_est | 1 | 1.01 |
| place_time | 1 | 0.85 |
| placed_CPD_est | 1 | 0.99 |
| routed_wirelength | 1 | 1.01 |
| critical_path_delay | 1 | 0.99 |
| geomean_nonvirtual_intradomain_critical_path_delay | 1 | 1.00 |
| crit_path_route_time | 1 | 1.01 |

Overall, it looks like these changes improved place time by about 15% and improved CPD by 1%, at the expense of 1% wirelength. That's a very good tradeoff in my opinion! I expect the results with AP to be even better.

Raw results:
comparison_output.xlsx

Looking at direct_rf:
Cost variance:
Screenshot from 2025-09-11 16-33-48

Equilibrium:
Screenshot from 2025-09-11 16-34-26

We can see that this new equilibrium approach is achieving its goal of not setting the temperature too high.

@vaughnbetz What do you think? I think we should not make this default yet; but this at least demonstrates the value of this approach.

@AmirhosseinPoolad FYI

@vaughnb-cerebras

Definitely looks promising!

@AlexandreSinger
Contributor Author

@soheilshahrouz was curious whether the gains we are seeing are just due to this new estimator always scaling down the initial temperature, in which case we could get the same results by simply scaling down the initial temperature.

To counter that point, I got the initial temperatures for each circuit for each estimator:

| Circuit | Baseline Temp | Equilibrium Temp Est | Ratio (Equil / Baseline) |
|---|---|---|---|
| bitcoin_miner_stratixiv_arch_timing.blif | 7.90E-04 | 3.10E-05 | 0.04 |
| bitonic_mesh_stratixiv_arch_timing.blif | 7.10E-04 | 1.60E-04 | 0.23 |
| cholesky_bdti_stratixiv_arch_timing.blif | 6.10E-04 | 9.90E-05 | 0.16 |
| cholesky_mc_stratixiv_arch_timing.blif | 7.00E-04 | 2.30E-04 | 0.33 |
| dart_stratixiv_arch_timing.blif | 5.20E-04 | 1.10E-04 | 0.21 |
| denoise_stratixiv_arch_timing.blif | 6.00E-04 | 6.80E-05 | 0.11 |
| des90_stratixiv_arch_timing.blif | 6.50E-04 | 2.40E-04 | 0.37 |
| directrf_stratixiv_arch_timing.blif | 9.30E-04 | 2.40E-05 | 0.03 |
| gsm_switch_stratixiv_arch_timing.blif | 7.10E-04 | 5.70E-05 | 0.08 |
| LU230_stratixiv_arch_timing.blif | 7.50E-04 | 7.30E-05 | 0.10 |
| LU_Network_stratixiv_arch_timing.blif | 8.60E-04 | 4.20E-05 | 0.05 |
| mes_noc_stratixiv_arch_timing.blif | 4.90E-04 | 2.80E-05 | 0.06 |
| minres_stratixiv_arch_timing.blif | 8.10E-04 | 1.80E-04 | 0.22 |
| neuron_stratixiv_arch_timing.blif | 6.80E-04 | 4.20E-04 | 0.62 |
| openCV_stratixiv_arch_timing.blif | 7.90E-04 | 2.00E-04 | 0.25 |
| segmentation_stratixiv_arch_timing.blif | 9.30E-04 | 2.40E-04 | 0.26 |
| SLAM_spheric_stratixiv_arch_timing.blif | 4.60E-04 | 1.20E-04 | 0.26 |
| sparcT1_chip2_stratixiv_arch_timing.blif | 7.40E-04 | 3.40E-05 | 0.05 |
| sparcT1_core_stratixiv_arch_timing.blif | 4.60E-04 | 1.50E-04 | 0.33 |
| sparcT2_core_stratixiv_arch_timing.blif | 4.10E-04 | 3.50E-05 | 0.09 |
| stap_qrd_stratixiv_arch_timing.blif | 6.60E-04 | 5.80E-05 | 0.09 |
| stereo_vision_stratixiv_arch_timing.blif | 8.10E-04 | 4.40E-04 | 0.54 |
| AVERAGE | | | 0.20 |

We can see that although, on average, the temperature was reduced by about 5x (average ratio 0.20), the ratio for most circuits is not near the average: some circuits are reduced by 10x or more (down to a ratio of 0.03 for directrf), while others are reduced by only about 2x (up to a ratio of 0.62 for neuron). This demonstrates that the new approach adapts to each circuit's initial placement.
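
As a small illustration of that argument, the sketch below computes the spread of the Equil / Baseline ratio for four of the circuits in the table (directrf, neuron, dart, bitcoin_miner). The struct and function names are hypothetical and only the four temperature pairs come from the table. If a fixed scale factor explained the gains, the largest and smallest ratios would be nearly equal.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// One circuit's initial temperatures under the two estimators.
struct TempPair {
    double baseline;     // variance-based estimate
    double equilibrium;  // equilibrium-based estimate
};

// Smallest, largest, and mean Equil / Baseline ratio over a set of circuits.
struct RatioSpread {
    double min_ratio;
    double max_ratio;
    double mean_ratio;
};

RatioSpread ratio_spread(const std::vector<TempPair>& temps) {
    if (temps.empty()) return {0.0, 0.0, 0.0};
    double min_ratio = std::numeric_limits<double>::infinity();
    double max_ratio = 0.0;
    double sum = 0.0;
    for (const TempPair& t : temps) {
        double ratio = t.equilibrium / t.baseline;
        min_ratio = std::min(min_ratio, ratio);
        max_ratio = std::max(max_ratio, ratio);
        sum += ratio;
    }
    return {min_ratio, max_ratio, sum / static_cast<double>(temps.size())};
}

// Four rows copied from the table: directrf, neuron, dart, bitcoin_miner.
const std::vector<TempPair> sample_circuits = {
    {9.30e-4, 2.40e-5},
    {6.80e-4, 4.20e-4},
    {5.20e-4, 1.10e-4},
    {7.90e-4, 3.10e-5},
};
```

Calling `ratio_spread(sample_circuits)` gives a largest ratio more than 20 times the smallest one, which a single constant scaling of the baseline temperature could not reproduce.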

One big bonus of this new flow is that it does not have any magic scaling factors, which will make it more automatic (so we do not need to keep readjusting it using new scaling factors).

@AlexandreSinger
Contributor Author

For @AmirhosseinPoolad, I ran VTR Master to see if there is any run time degradation due to this approach:

| Metric | vtr_master.txt | variance.txt |
|---|---|---|
| vtr_flow_elapsed_time | 1 | 0.965944 |
| num_LAB | 1 | 1 |
| num_DSP | 1 | 1 |
| num_M9K | 1 | 1 |
| num_M144K | 1 | 1 |
| max_vpr_mem | 1 | 0.999895 |
| num_pre_packed_blocks | 1 | 1 |
| num_post_packed_blocks | 1 | 1 |
| device_grid_tiles | 1 | 1 |
| pack_time | 1 | 0.966783 |
| placed_wirelength_est | 1 | 1 |
| place_time | 1 | 0.963469 |
| placed_CPD_est | 1 | 1 |
| routed_wirelength | 1 | 1 |
| critical_path_delay | 1 | 1 |
| geomean_nonvirtual_intradomain_critical_path_delay | 1 | 1 |
| crit_path_route_time | 1 | 0.98467 |

It looks like machine load obscures the results somewhat. But if we compare the change in pack time (0.967) to the change in place time (0.963), which are nearly identical, we see that this change does not noticeably increase run time.

accepted_swaps.reserve(move_lim);
rejected_swaps.reserve(move_lim);
for (int i = 0; i < move_lim; i++) {
    bool manual_move_enabled = false;
Contributor

I'd delete this. No need to enable manual moves during initial temperature calculations -- this is intended to let people directly control the annealer, but just doing that in the main annealer is enough.

There should be code like this at the start of the main annealer -- if it is there, there isn't any need to duplicate it here. If it isn't there, I think we should move this there anyway.

Contributor Author

I agree, this should not be here assuming that the entire purpose of this method is to estimate the starting temperature.

Looking at the code, however, I do not believe that the manual move feature is working at all. It appears to only ever turn on in the initial temperature estimation. I do not believe that this should be resolved in this PR, but I can raise an issue if you would like?

In the meantime, I will remove this from the equilibrium estimator (which I have added), but I will leave it in the default flow for now so we do not regress the feature further.

Contributor Author

@AlexandreSinger AlexandreSinger Sep 16, 2025

I'll raise an issue to look into this!

@amin1377
Contributor

@AlexandreSinger: For the results you reported on Titan and VTR benchmarks, are you using the default placement flow with just the equilibrium approach for init temp, or is this with AP? Also, if the results are for AP, when you say “place time,” are you referring to the total placement time or just the detailed placement (i.e., SA) portion? Thanks!

@amin1377
Contributor

@AlexandreSinger: Just to write down what we discussed earlier. It would also be interesting to try a random initial placement (i.e., use dense placement for all blocks instead of doing centroid and then falling back on dense place) and check the initial temperature reported for both the default and equilibrium approaches. Ideally, with your approach, the temperature should follow this order: random > centroid > AP.

@amin1377
Contributor

@AlexandreSinger: Also, since this PR makes equilibrium the default init temp estimator, I’d suggest running the Koios benchmark as well. It’s become almost standard to run all three (VTR, Titan, Koios) to ensure there are no regressions.

Contributor

@amin1377 amin1377 left a comment

Thanks, Alex!

@AlexandreSinger
Contributor Author

> @AlexandreSinger: Also, since this PR makes equilibrium the default init temp estimator, I’d suggest running the Koios benchmark as well. It’s become almost standard to run all three (VTR, Titan, Koios) to ensure there are no regressions.

That was a typo... The default is variance in the code. I just messed up the docs. Thanks for pointing that out!

@AlexandreSinger
Contributor Author

> @AlexandreSinger: For the results you reported on Titan and VTR benchmarks, are you using the default placement flow with just the equilibrium approach for init temp, or is this with AP? Also, if the results are for AP, when you say “place time,” are you referring to the total placement time or just the detailed placement (i.e., SA) portion? Thanks!

All results presented on this PR are on default VTR (No AP).

@AlexandreSinger
Contributor Author

@vaughnbetz FYI, I read Prof. Rose's paper on equilibrium dynamics over the weekend and found that my work is a special case of the process they proposed: one in which P[ΔC] (used in their integration) is estimated using relative frequency. Instead of starting from theoretical probability, though, my approach arrives at the same solution by reasoning about a trial anneal iteration. It's quite interesting, actually!
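
A hedged sketch of that relationship, in notation introduced here for illustration (it is not quoted from the paper or taken from the PR code) and assuming the standard Metropolis acceptance rule: at the equilibrium temperature $T$ the expected accepted change in cost is zero, and replacing $P[\Delta C]$ with the relative frequency over $N$ sampled trial swaps turns the integral into a sum over the observed $\Delta C_i$.

```latex
\int_{-\infty}^{0} \Delta C \, P[\Delta C] \, d\Delta C
  \;+\; \int_{0}^{\infty} \Delta C \, e^{-\Delta C / T} \, P[\Delta C] \, d\Delta C \;=\; 0
\quad\longrightarrow\quad
\frac{1}{N} \Biggl( \sum_{\Delta C_i \le 0} \Delta C_i
  \;+\; \sum_{\Delta C_i > 0} \Delta C_i \, e^{-\Delta C_i / T} \Biggr) \;=\; 0
```

The first term collects downhill swaps, which are always accepted; the second weights each uphill swap by its acceptance probability. Solving the right-hand condition for $T$ gives the temperature at which one trial iteration is expected to leave the cost unchanged.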

@amin1377 FYI

@AlexandreSinger
Contributor Author

@vaughnbetz I have updated the PR with your comments. Let me know if you have any further comments. Once we merge this in, I want to experiment more with accepting the swaps or not during the estimation.

Contributor

@vaughnbetz vaughnbetz left a comment

LGTM

@vaughnbetz vaughnbetz merged commit 7a678a6 into verilog-to-routing:master Sep 17, 2025
30 checks passed
@AlexandreSinger AlexandreSinger deleted the feature-place-equilibrium-init-t-est branch September 18, 2025 14:17
@AlexandreSinger
Contributor Author

@vaughnbetz I will present these results at the VTR meeting today, but I ran the original temperature estimator (variance) and my proposed equilibrium estimator on a set of placements across the Titan Benchmark Suite:

  • Default VPR's Initial Placement
  • Default VPR's Final Placement
  • AP's Initial Placement
  • AP's Final Placement

And got the final results:

| Estimator | First Iter Cost Ratio Average | First Iter Cost Ratio Variance | First Iter Cost Ratio Average Absolute Error | Geomean Initial Temperature | Geomean Place Time | Geomean Final WL | Geomean Final CPD |
|---|---|---|---|---|---|---|---|
| Variance (default) | 1.4045 | 0.4593 | 0.4048 | 1.00 | 1.00 | 1.00 | 1.00 |
| Equilibrium | 0.9995 | 0.0039 | 0.0319 | 0.22 | 0.81 | 1.00 | 1.01 |

I wanted to post these here to keep things organized. As you can see, across all of these placements, the ratio of the cost before and after the first iteration is MUCH closer to 1.00 with the equilibrium estimator (0.9995 vs. 1.4045 on average), and the variance is much tighter. This results in an initial temperature that is more than 4x lower than the default with little to no loss in WL or CPD.
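
For reference, the three first-iteration statistics can be computed as in the sketch below. This is not the script mentioned in this comment: the names are hypothetical, the variance is assumed to be the population variance, and the "average absolute error" is assumed to be measured against the ideal ratio of 1.00.

```cpp
#include <cmath>
#include <vector>

// Summary of first-iteration cost ratios across a set of placements.
struct CostRatioStats {
    double mean;            // average ratio
    double variance;        // population variance of the ratio (assumed)
    double mean_abs_error;  // average |ratio - 1.0| (assumed definition)
};

CostRatioStats summarize_cost_ratios(const std::vector<double>& ratios) {
    CostRatioStats stats{0.0, 0.0, 0.0};
    if (ratios.empty()) return stats;

    const double n = static_cast<double>(ratios.size());
    for (double r : ratios) {
        stats.mean += r;
    }
    stats.mean /= n;

    for (double r : ratios) {
        stats.variance += (r - stats.mean) * (r - stats.mean);
        // Error is measured against 1.0: a ratio of 1.0 means the first
        // iteration left the cost unchanged.
        stats.mean_abs_error += std::fabs(r - 1.0);
    }
    stats.variance /= n;
    stats.mean_abs_error /= n;
    return stats;
}
```

Under these assumptions, an estimator whose ratios cluster tightly around 1.00 shows up as a mean near 1, a small variance, and a small absolute error, which is the pattern in the Equilibrium row.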

I wrote a script to collect these results automatically, so I am starting to use it to explore variations on my equilibrium approach. Will discuss in the VTR meeting.

@amin1377 FYI

Labels
docs Documentation lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool
5 participants