-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathfast_vert_cache_opt.html
More file actions
1525 lines (1321 loc) · 74.2 KB
/
fast_vert_cache_opt.html
File metadata and controls
1525 lines (1321 loc) · 74.2 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<html>
<head>
<title>Linear-Speed Vertex Cache Optimisation</title>
<style>
<!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman";}
h1
{margin-top:12.0pt;
margin-right:0cm;
margin-bottom:3.0pt;
margin-left:0cm;
page-break-after:avoid;
font-size:16.0pt;
font-family:Arial;}
h2
{margin-top:12.0pt;
margin-right:0cm;
margin-bottom:3.0pt;
margin-left:0cm;
page-break-after:avoid;
font-size:14.0pt;
font-family:Arial;
font-style:italic;}
a:link, span.MsoHyperlink
{color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{color:#606420;
text-decoration:underline;}
p.Code, li.Code, div.Code
{margin:0cm;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}
@page Section1
{size:595.3pt 841.9pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.Section1
{page:Section1;}
/* List Definitions */
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
-->
</style>
</head>
<body lang=EN-GB link=blue vlink="#606420">
<div class=Section1>
<h1>Linear-Speed Vertex Cache Optimisation</h1>
<p class=MsoNormal>Tom Forsyth, RAD Game Tools: <a
href="mailto:tom.forsyth@eelpi.gotdns.org">tom.forsyth@eelpi.gotdns.org</a></p>
<p class=MsoNormal>28<sup>th</sup> September 2006</p>
<p class=MsoNormal> </p>
<h2>Introduction</h2>
<p class=MsoNormal>This paper introduces an algorithm for optimising an indexed
triangle list to be friendly to a wide variety of hardware vertex caches of
unknown size. The method has the advantage of being universally good for a wide
range of cache sizes and replacement policies, of running in linear (and fast) time
and space proportional to the number of triangles in the mesh. It is also
within a few percent performance of the best known alternative algorithms, and
easy to understand and tweak if needed.</p>
<p class=MsoNormal> </p>
<h2>Background</h2>
<p class=MsoNormal>Indexed primitives have replaced strips and fans as the most
common rendering primitive in graphics hardware today. When rendering each
triangle, the vertices it uses must be processed (for example, transformed to
screen space and lit in various ways) before rendering. Indexed rendering hardware
typically uses a small cache of vertices to avoid re-processing vertices that
are shared between recently-rendered triangles. Using a moderately-sized cache
(anywhere between 12 and 24 vertices is common) reduces the load on the vertex
processing pipeline far more than the older non-indexed strips and fans model –
often halving the amount of work required per triangle.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>However, the vertex cache relies on the ordering of the
triangles to be optimised for its characteristics. If the triangles are sent in
a random order, the vertices almost always miss the cache and there is no
benefit obtained. If triangles are sent in clusters so that each triangle is
close to previously-rendered triangles, it is more likely that more of its
vertices are still in the cache.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Thus, finding the correct order to render triangles in is
important to reduce the work of the vertex processing pipeline. The fundamental
problem is that meshes are typically two-dimensional surfaces, and caches are
at their most efficient when streaming one-dimensional data. Thus the ideal ordering
is far from obvious.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>[Hoppe] discussed an algorithm that optimised the order of
triangles for a given cache of a certain size and replacement policy. This is
useful for a hardware target that is fixed and known, such as a games console.
However, if the hardware cache is a different size or type, the algorithm needs
to be re-run for that new configuration.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>On many platforms, such as the PC or Macintosh, and also for
shipping the same art for many different consoles, the exact size and type of
the vertex cache is not known at authoring time, and frequently not even known
at runtime, as the exact structures are often proprietary information. For this
reason, [Bogomjakov] discusses a “universal rendering sequence”, a single
sequence that is favourable to a wide range of cache sizes and types, and has
no particular pathological weaknesses on any particular one of them.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>The usual way to refer to the efficiency of a vertex cache
is to divide the number of vertex-cache misses by the number of triangles in
the mesh. This gives the average cache miss ratio (ACMR) – the average number
of vertices that need to be processed per triangle.</p>
<p class=MsoNormal> </p>
<h2>Outline</h2>
<p class=MsoNormal>The algorithm presented here is based on a cache of vertices
using a least recently used (LRU) replacement policy. Various “scores” are
assigned to each vertex depending on its position within the LRU cache (i.e.
how recently-used it is). Triangles are also assigned a score, which is the sum
of the scores of the vertices it uses. When considering which triangle in the mesh
to add to the sequence next, the triangle with the greatest score is added. The
three vertices for the triangle are then moved or added to the top of the
cache, and the other vertices are shuffled down. The algorithm is greedy and
never backtracks or looks ahead at its choices of triangle.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Although most hardware uses FIFO-replacement caches for
simplicity and speed, here we use an LRU-replacement cache. This cache is
referred to as the “modelled cache”, as opposed to the real “hardware cache”
that an actual graphics card has. The reason the modelled cache is LRU rather
than FIFO is that the single sequence of triangles this algorithm generates is
intended to be usable on a wide range of (usually unknown) hardware cache sizes
and replacement policies, and modelling with a FIFO cache of the incorrect size
causes either pathological “thrashing” of the real hardware cache, or
under-utilisation. In practice, an LRU modelled cache produces a more
consistent sequence, even if it does not match common hardware.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>The modelled cache size chosen merely needs to be “big
enough” to include all likely hardware caches. [Hoppe] shows that it is
unlikely that any hardware will see any significant benefit from a cache larger
than 32 entries, and so 32 was chosen as the modelled cache size here.
Experiments were also performed with a 64-entry modelled cache, but no
significant difference in eventual cache performance was found – the algorithm
simply ran slower.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>As described, the algorithm is a greedy algorithm with no
lookahead or backtracking. Once it has chosen to add a triangle to the
rendering sequence, it keeps that decision. Although the choice of triangle
affects future choices through the vertex and triangle scores, the current
triangle is not chosen in order to influence those future choices. In this way,
the algorithm remains very fast, and not only runs in linear time proportional
to the number of triangles in the mesh, but the linear time is very fast. This
speed is extremely helpful in authoring pipelines, where artists need to have
fast, accurate previews of their models.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>It is also helpful for games that have dynamically generated
content, such as joining together parts of bodies and faces to make composite
characters – the resulting mesh should be optimised as a whole for the hardware
vertex cache for maximum efficiency, but this cannot be performed until all the
parts have been joined into a single mesh. In games such as RPGs or MMOs with a
massive number of potential body-shapes, each specified dynamically, new meshes
can appear every second, and algorithms that take significant processing time
to optimise a mesh are counterproductive.</p>
<p class=MsoNormal> </p>
<h2>Vertex Scores in Detail</h2>
<p class=MsoNormal>The score of a given vertex indicates how likely a triangle
using it will be added to the rendered list next. High scores make it more
likely, low scores make it less. The score depends on a number of things.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>The first is that the more recently the vertex was used, the
higher its score. The score is calculated by taking the vertex’s position in
the LRU cache as a fractional number between zero and one, with zero at the
last (LRU) position and one in the first (MRU) position. This scaled position
is then raised to a power greater than one. The value of the power was chosen
by repeated simulation, and the value 1.5 seems to give good results. Using a
power rather than a simple linear score seems to give a behaviour that is more
scale-independent – a desired trait in an algorithm attempting to optimise for
an unknown size of hardware cache.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>The exception to this rule is that the most recent three
vertices, i.e. those used by the most recently added triangle, have a fixed
score. The reason they are position-independent is that the order in which
those three vertices were used depends entirely upon the order they were
specified in the most recent triangle, and that is subject to the whims of
export pipelines and various other unpredictable factors. Note that the scores
for the rest of the vertices in the cache is performed as if these first three
were not there, i.e. the fourth vertex in the cache gets a “top” score of 1.0.
The surprising twist is that the score the first three are fixed at was also
found by experimentation to be best set to 0.75. This is lower than the score
of the vertex after them in the cache!</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>The reasoning for this is that if the three most-recent
vertices get the highest score, the next triangle to be added is almost certain
to be one that shares an edge (two of the vertices) of the most recently-added
triangle. On the face of it, this sounds like exactly what we want. The problem
is that because this algorithm has no look-ahead capability, it will first form
a long thin strip through the mesh, and then the strip will turn back on itself
for a long way, and then again. Although this sounds fairly good, it is only
getting an ACMR of around 1.0 – one vertex per triangle – far worse than a good
algorithm is capable of. By slightly discouraging immediate reuse of an edge,
the algorithm is forced to look a little further away, and this seems to produce
a more generally acceptable pattern that is not unlike the theorised ideal
shape – a Hilbert curve. Here is a table showing how position in the first half
of the 32-entry LRU cache corresponds to score:</p>
<p class=MsoNormal> </p>
<table class=MsoNormalTable border=1 cellspacing=0 cellpadding=0 width=146
style='width:109.45pt;margin-left:4.9pt;border-collapse:collapse;border:none'>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal><span style='font-size:10.0pt;font-family:Arial'>LRU pos</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border:solid windowtext 1.0pt;
border-left:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal><span style='font-size:10.0pt;font-family:Arial'>Score</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.75</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>1</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.75</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>2</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.75</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>3</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>1.00</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>4</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.99</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>5</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.98</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>6</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.97</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>7</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.95</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>8</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.93</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>9</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.91</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>10</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.88</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>11</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.86</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>12</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.83</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>13</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.80</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>14</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.77</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=73 nowrap valign=bottom style='width:54.5pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>15</span></p>
</td>
<td width=73 nowrap valign=bottom style='width:54.95pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.73</span></p>
</td>
</tr>
</table>
<p><b>EMERGENCY EDIT</b>: Actually, the numbers in the table above are wrong! It's the right sort of shape, but I typed the Excel formula in badly, so the actual numbers aren't correct. However, the code listed below <b>is</b> correct, so keep using that.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Added to this vertex score is a boost to a vertex if its
remaining valence is low. The remaining valence of a vertex is the number of
triangles that use it that have not yet been drawn. A vertex with only one
triangle left to be drawn gets the highest boost, and a vertex with a lot of
triangles yet to be drawn gets a much lower boost. The increase in score is
proportional to the number of triangles still remaining using it, raised to a
negative fractional power (testing showed that -0.5 is most effective), and
then scaled relative to the FIFO score. Again, testing showed that a scale of
2.0 was most effective. Here is a table of vertices with reasonable valences
and their score:</p>
<p class=MsoNormal> </p>
<table class=MsoNormalTable border=1 cellspacing=0 cellpadding=0 width=128
style='width:96.0pt;margin-left:4.9pt;border-collapse:collapse;border:none'>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal><span style='font-size:10.0pt;font-family:Arial'>Valence</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-left:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal><span style='font-size:10.0pt;font-family:Arial'>Score</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>1</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>2.00</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>2</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>1.41</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>3</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>1.15</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>4</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>1.00</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>5</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.89</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>6</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.82</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>7</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.76</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>8</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.71</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>9</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.67</span></p>
</td>
</tr>
<tr style='height:12.75pt'>
<td width=64 nowrap valign=bottom style='width:48.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>10</span></p>
</td>
<td width=64 nowrap valign=bottom style='width:48.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:12.75pt'>
<p class=MsoNormal align=right style='text-align:right'><span
style='font-size:10.0pt;font-family:Arial'>0.63</span></p>
</td>
</tr>
</table>
<p class=MsoNormal> </p>
<p class=MsoNormal>The reasoning behind rewarding low valence is that without
it, the algorithm will still pick out the best path through a mesh, maintaining
a low vertex/triangle ratio. The problem is that it will leave some lone or
small clusters of triangles behind as it does so, because adding them is
slightly less optimal than adding other triangles. By the time the mesh is 90%
done, all that remains are these little bits of “detritus” – isolated fragments
that still need to be drawn. Although at this point the ACMR is still pleasingly
low, adding these remaining fragments is very costly, because they are not
close to each other, and so each incur costs of around 3 vertices per triangle.
This can have a terrible effect on the ACMR for the mesh as a whole, and drives
efficiency down again. Far better is to add these bits of detritus to the
rendering order as the algorithm goes along. Adding them at the time is very
slightly sub-optimal, but adding them later would be very costly. Boosting the
score of a vertex with very few triangles remaining encourages the addition of
the detritus at a sensible time.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Additionally, if the algorithm runs into a “dead end” part
of the mesh, where there are no more nearby triangles to add (for example, at
the end of a finger of a hand), it encourages the algorithm to start again at
triangle that is near another “dead end”, such as the edge of a mesh, rather
than start right in the middle of a patch and create an artificial dead end.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>The two scores – that due to the FIFO position and the one
due to remaining valence – are added together to find the total vertex score.
To find each triangle’s score, the score for each of the three vertices is
added together. And at each stage, the triangle with the highest score is added
to the draw list, and the scores for the vertices are recalculated.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>The full code for calculating the score of a vertex is as
follows:</p>
<p class=MsoNormal> </p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:blue'>const</span><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>float</span> FindVertexScore_CacheDecayPower
= 1.5f;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:blue'>const</span><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>float</span>
FindVertexScore_LastTriScore = 0.75f;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:blue'>const</span><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>float</span>
FindVertexScore_ValenceBoostScale = 2.0f;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:blue'>const</span><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>float</span>
FindVertexScore_ValenceBoostPower = 0.5f;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:blue'>float</span><span style='font-size:10.0pt;
font-family:"Courier New"'> FindVertexScore ( vcache_vertex_data *VertexData )</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'>{</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>if</span> (
VertexData->NumActiveTris == 0 )</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> {</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:green'>// No tri needs
this vertex!</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>return</span>
-1.0f;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> }</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> </span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>float</span> Score =
0.0f;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>int</span>
CachePosition = VertexData->CacheTag;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>if</span> (
CachePosition < 0 )</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> {</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:green'>// Vertex is not
in FIFO cache - no score.</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> }</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>else</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> {</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>if</span> (
CachePosition < 3 )</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> {</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:green'>// This vertex
was used in the last triangle,</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:green'>// so it has a
fixed score, whichever of the three</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:green'>// it's in.
Otherwise, you can get very different</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:green'>// answers
depending on whether you add</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:green'>// the
triangle 1,2,3 or 3,1,2 - which is silly.</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> Score = FindVertexScore_LastTriScore;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> }</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>else</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> {</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> assert ( CachePosition <
MaxSizeVertexCache );</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:green'>// Points for
being high in the cache.</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>const</span> <span
style='color:blue'>float</span> Scaler = 1.0f / ( MaxSizeVertexCache - 3 );</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> Score = 1.0f - ( CachePosition - 3 ) *
Scaler;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> Score = powf ( Score,
FindVertexScore_CacheDecayPower );</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> }</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> }</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> </span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:green'>// Bonus points for
having a low number of tris still to </span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:green'>// use the vert, so we
get rid of lone verts quickly.</span></span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New";color:blue'> float</span><span style='font-size:
10.0pt;font-family:"Courier New"'> ValenceBoost = powf (
VertexData->NumActiveTris,</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'>
-FindVertexScore_ValenceBoostPower );</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> Score += FindVertexScore_ValenceBoostScale *
ValenceBoost;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> </span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'> <span style='color:blue'>return</span> Score;</span></p>
<p class=MsoNormal style='text-autospace:none'><span style='font-size:10.0pt;
font-family:"Courier New"'>}</span></p>
<p class=MsoNormal> </p>
<h2>The Constants</h2>
<p class=MsoNormal>The four constants used in the code above were determined by
a process of guesswork, judgement, and a few days of brute-force annealing of
each one with fairly typical data sets and the results sent through simulations
of fairly typical hardware cache structures, taking the best aggregate scores.
It is entirely possible that slightly better settings and values exist that
have not been discovered yet. It is also entirely possible that using power
functions for the scores is not the appropriate function, and that using other
functions, or possibly fully custom look-up-tables, may yield better results.
However, the results are fairly close to what is considered optimal, and it is
likely that any remaining improvements are small. In summary, these values
appear to be “good enough” for most purposes.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>As a side note, it is curious that the four constants are
all so close to “normal” numbers. The values 1.5, 0.75, 2.0 and 0.5 seem almost
boring. And yet the process of annealing did consider possible values for all
of them that varied in 0.05 increments, out to a range of at least +/- 0.5 from
their current settings. Nor were these the values selected using human
judgement at the start of the annealing process (for example, the valence boost
scale began far smaller than 2.0 – it is somewhat surprising it turned out to
be so important). Nevertheless, these values consistently produced the best
overall scores across the range of caches and meshes.</p>
<p class=MsoNormal> </p>
<h2>Runtime Efficiency</h2>
<p class=MsoNormal>The draw-ordering of the triangles is as given in the
algorithm above. However, there are many ways to implement this algorithm that
run slowly, and one of the key features of this algorithm is speed.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Unlike many conventional mesh topology algorithms, no edge
data is kept, nor is any needed. This avoids the expense of finding edge data
(typically somewhat laborious), and also the problems inherent in pathological
meshes where a single edge is shared by many (which in typical algorithms means
more than two!) triangles.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Instead, each vertex holds the following data:</p>
<ul style='margin-top:0cm' type=disc>
<li class=MsoNormal>Its position in the modelled cache (-1 if it is not in the
cache)</li>
<li class=MsoNormal>Its current score</li>
<li class=MsoNormal>The total number of triangles that use it</li>
<li class=MsoNormal>The number of triangles not yet added that use it</li>
<li class=MsoNormal>The list of triangle indices that use it</li>
</ul>
<p class=MsoNormal>Note that the total number of triangles using the vertex is
not strictly necessary for the algorithm to run, but is useful for debugging.
The list of triangle indices is always this length, but it is ordered so the
triangle indices yet to be added are listed first, followed by the triangle
indices that have already been added to the draw list.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Each triangle in the mesh also stores the following data:</p>
<ul style='margin-top:0cm' type=disc>
<li class=MsoNormal>Whether it has been added to the draw list or not</li>
<li class=MsoNormal>The triangle’s score (the sum of the scores of its
vertices)</li>
<li class=MsoNormal>The indices to the three vertices</li>
</ul>
<p class=MsoNormal> </p>
<p class=MsoNormal>The other data structure is the FIFO cache itself, which is
simply a list of vertex indices ordered from most recently used to least.
Notionally this cache is 32 entries long, though for ease of implementation it
actually grows to three entries longer.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Initialisation of the data is fairly straightforward, with
two passes over the triangle data. The first simply increments the counter of
the number of triangles that use each vertex, the second allocates the lists of
triangles for each vertex and fills them in. After that, the score of each vertex
is found using the above code, and then the score of each triangle is found by
summing the scores of each vertex the triangle uses. These scores are simply
cached values of the computed scores, and are updated when necessary as the
algorithm runs.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Then comes the main body of the algorithm, which picks one
triangle at a time to add to the list of drawn triangles, until there are no
more triangles left to draw. Usually, the algorithm knows from the previous iteration
which triangle has the highest score, and simply picks that one. In some cases,
for example the first time the algorithm runs, it does not already know the
best triangle, or the supposed best triangle has an unusually low score
(implying that there may be others in the mesh that are better). In those rare
circumstances, it runs through all the remaining triangles in the mesh
searching for the best score.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>[<b>HINDSIGHT</b>: Actually, the "unusually low score" clause turns out to have almost no impact on the results. It was useful while I was developing the algorithm, but it turns out that for the final algorithm presented here, it happens for only three triangles in my entire set of test data, so it really has no significant effect on the efficiency.]</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>The best triangle is then added to the draw list. For each
vertex the triangle used, the valence of that vertex (the number of triangles
not yet drawn that use it) is reduced by one, and the list of triangle indices in
the vertex is updated appropriately.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>The three vertices used by the triangle are either moved to
the head of the LRU modelled cache, or added to the head if they were not
already in it. The cache is temporarily grown in size by three vertices to
include all vertices that were previously in the cache, and up to the three new
ones of this triangle. Then the new positions of the vertices in the cache are
updated, their corresponding scores are found using the code given above, and
the scores of all their still-to-be-added triangles are also updated.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>As this is done, the score and index of the highest-scoring
triangle so far are noted. Although only the triangles that use vertices
currently in the cache are scanned for their score, this is a reasonable
estimate of the best score of any triangle in the entire mesh. Debugging code
that exhaustively checks whether this estimate is true found it to be false
only a few times in all the testing done, and even then the score difference
between the correct answer and the approximation was very small. In practice,
this difference is unlikely to make any significant difference to the
cache-performance, and hugely decreases the runtime cost of the algorithm.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Finally, the cache is shrunk back to its normal size, with
up to three vertices falling out of it. Their cache positions have already been
updated appropriately. The algorithm repeats until there are no triangles left
to add.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal>Examining the above carefully, we can see that with a
“normal” mesh where each vertex is used by around 6 triangles, the runtime
speed and memory use of this algorithm is linearly proportional to the number
of vertices in the mesh. This is considerably better than various look-ahead or