Benchmark Data

This data analysis is based on benchmarks from git revision 172ae54 of turnout-petrinet-benchmarks.

Benchmarking Platform

Benchmarks were run on a virtual machine. All resources of the host machine were assigned to the VM to prevent interference from other VMs. The heap space of the JVM was increased to 25 GB and the stack space to 8 MB.

Setting           Value
CPU               Intel(R) Xeon(R) L5420 @ 2.50 GHz (4 cores with 8 threads)
Memory            32 GB
Virtualization    Apache VCL with VMWare (software version 4, hardware version 6)
Operating System  Ubuntu Server 14.04
uname -sr         Linux 3.13.0-24-generic
uname -v          #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014
uname -mpio       x86_64 x86_64 x86_64 GNU/Linux
java -version     java version "1.8.0_111"
                  Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
                  Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
Scala Version     Scala 2.11.8
JAVA_OPTS         -Xmx25G -Xms25G -Xss8M

Data Import

Various statistics about saturation (state space size, EVIDD and MDD node counts, cache sizes, etc.) were generated by the command

JAVA_OPTS="-Xmx25G -Xms25G -Xss8M -XX:+UseConcMarkSweepGC" sbt "petrinet-benchmarks/runMain hu.bme.mit.inf.turnout.petrinet.saturation.SaturationStatistics -o statistics.csv"

This tool simply runs saturation and tangible state calculation, then saves the statistics provided by the decision diagram package turnout-symbolic.

misc_statistics <- read.csv("statistics.csv")

Performance of saturation was measured with the JMH benchmark harness in single-shot mode. For each model, 5 warmup iterations are executed, then the running times of the next 5 iterations are averaged. The benchmark script saves measurements in seconds.

sbt "petrinet-benchmarks/jmh:run  .*StateSpaceExplorationBenchmark.* -rf CSV -rff saturation.csv -jvmArgs \"-Xmx25G -Xms25G -Xss8M -XX:+UseConcMarkSweepGC\""
saturation <- read.csv("saturation.csv")

EVIDD creation time is measured by JMH in average-time mode. Warmup consists of 5 iterations of 10 seconds each, in which EVIDDs for the model are constructed repeatedly; then 5 further iterations are measured (if a single invocation takes more than 10 seconds, the benchmark effectively becomes single-shot). The benchmark script saves measurements in milliseconds.

sbt "petrinet-benchmarks/jmh:run  .*EddCreationBenchmark.* -rf CSV -rff /home/102778/turnout/eddCreation.csv -jvmArgs \"-Xmx25G -Xms25G -Xss8M -XX:+UseConcMarkSweepGC\""

The CSV contains results for both \(\textit{Top}(t)\) nondecreasing and unordered EVIDD handle merging.

edd_creation <- read.csv("eddCreation.csv")

Tangible state space calculation is benchmarked the same way as EVIDD creation. However, an invocation-level @TearDown is used to clear the caches, so that saturation does not have to be rerun before every tangible state space calculation. As a consequence, measurements below 1 millisecond would be unreliable. The benchmark script saves measurements in milliseconds.

sbt "petrinet-benchmarks/jmh:run  .*GetTangibleStatesBenchmark.* -rf CSV -rff /home/102778/turnout/getTangible.csv -jvmArgs \"-Xmx25G -Xms25G -Xss8M -XX:+UseConcMarkSweepGC\""
get_tangible <- read.csv("getTangible.csv")
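
Since the three benchmarks report their scores in different units (seconds for saturation, milliseconds for the other two), it is worth verifying the units before mixing the data. A quick check, assuming JMH's standard CSV schema, in which the measurement unit appears in the Unit column:

# Each entry should be a single unit string: "s/op" for saturation,
# "ms/op" for the EVIDD creation and tangible state benchmarks.
sapply(list(saturation = saturation,
            edd_creation = edd_creation,
            get_tangible = get_tangible),
       function(df) unique(as.character(df$Unit)))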

Table of All Measurements

We will need some packages to wrangle data.

library(dplyr)
library(tidyr)

Now we can join the outputs of the different benchmarks together and create a table showing all measurements.

ordered_edd_creation <- edd_creation %>%
  filter(grepl("BottomToTop", Benchmark))

all_benchmarks <- misc_statistics %>%
  inner_join(ordered_edd_creation %>%
               transmute(modelNameAndSize = Param..modelNameAndSize, eddCreationTime = Score),
             by = "modelNameAndSize") %>%
  inner_join(saturation %>%
               transmute(modelNameAndSize = Param..modelNameAndSize, saturationTime = Score),
             by = "modelNameAndSize") %>%
  inner_join(get_tangible %>%
               transmute(modelNameAndSize = Param..modelNameAndSize, getTangibleTime = Score),
             by = "modelNameAndSize") %>%
  separate(modelNameAndSize, into = c("modelName", "n"), sep = ":") %>%
  mutate(
    # Sum the time taken by the three phases, but first convert everything to seconds.
    totalTime = saturationTime + eddCreationTime / 1000 + getTangibleTime / 1000,
    # Post-process the output of separate().
    n = as.integer(n))

The attributes starting with edd refer to the EVIDD container, while the attributes starting with mdd refer to the MDD container. As no garbage collection is applied to the unique tables and caches (the Lazy strategy), the reported unique table and cache sizes are both final and peak values. When reporting the number of EVIDD and MDD nodes reachable from the roots, terminal nodes are also counted.
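
The table below is simply the contents of the joined data frame; it can be reproduced, for instance, by printing it without row names (the exact rendering used for the report is an assumption):

# Print every model configuration with its size metrics and running times.
print(all_benchmarks, row.names = FALSE)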

modelName n eddNodes eddUniqueTable eddTotalCacheSize nReachable reachableNodes nTangible tangibleNodes mddUniqueTable mddTotalCacheSize eddCreationTime saturationTime getTangibleTime totalTime
philosophers 16 477 3331 2355 4.286648e+07 2751 4.870845e+06 1188 7331 39656 10.015887 0.204366 1.782994 0.2161649
philosophers 30 925 11311 7899 5.408923e+13 5649 3.461453e+12 2364 14775 80966 36.461141 0.350377 3.578311 0.3904165
philosophers 60 1885 44251 26739 3.624730e+26 11859 1.198166e+25 4884 31198 172557 152.984909 0.768443 8.111648 0.9295396
philosophers 90 2845 98791 53563 1.861288e+39 18069 4.147394e+37 7404 48981 265175 329.886881 1.077550 12.634806 1.4200717
philosophers 120 3805 174931 96825 8.542493e+51 24279 1.435601e+50 9924 64045 350990 699.454176 1.543637 18.387891 2.2614791
philosophersRH 16 617 3475 3127 3.778150e+07 3290 4.870845e+06 1188 6413 41677 13.691648 0.208972 1.946802 0.2246104
philosophersRH 30 1205 11595 8559 4.731369e+13 6818 3.461453e+12 2364 13627 90196 39.270428 0.383227 3.994622 0.4264920
philosophersRH 60 2465 44835 28119 3.155665e+26 14378 1.198166e+25 4884 28683 191245 159.971954 0.737416 9.845943 0.9072339
philosophersRH 90 3725 99675 58479 1.617741e+39 21938 4.147394e+37 7404 42560 285799 361.570393 1.103863 13.675012 1.4791084
philosophersRH 120 4985 176115 103411 7.418459e+51 29498 1.435601e+50 9924 58802 396746 669.711136 1.689820 20.657878 2.3801890
philosophersBH 16 4009 6875 16965 3.972379e+07 3821 4.870845e+06 1188 6235 93895 65.530928 0.410923 2.944229 0.4793982
philosophersBH 30 14257 24655 93839 4.990172e+13 8021 3.461453e+12 2364 12929 306902 400.234031 1.494226 6.884014 1.9013440
philosophersBH 60 57337 99707 721059 3.334832e+26 17021 1.198166e+25 4884 27254 1153002 3821.669406 6.196324 14.865872 10.0328593
philosophersBH 90 129217 225175 2144539 1.710768e+39 26021 4.147394e+37 7404 41596 2527638 11045.825981 13.890982 22.749371 24.9595574
philosophersBH 120 229897 401035 4965389 7.847802e+51 35021 1.435601e+50 9924 55947 4428199 25252.002257 25.592277 31.973336 50.8762526
philosophersTH 16 477 3331 4533 3.972379e+07 3482 4.870845e+06 1188 8810 68601 13.272683 0.271268 2.120575 0.2866613
philosophersTH 30 925 11311 15663 4.990172e+13 7220 3.461453e+12 2364 18029 195951 48.049765 0.763002 4.024010 0.8150758
philosophersTH 60 1885 44251 61947 3.334832e+26 15230 1.198166e+25 4884 38184 660218 226.079540 3.012283 10.977812 3.2493404
philosophersTH 90 2845 98791 138843 1.710768e+39 23240 4.147394e+37 7404 59641 1371431 570.135466 6.156579 15.805052 6.7425195
philosophersTH 120 3805 174931 248227 7.847802e+51 31250 1.435601e+50 9924 78564 2352797 1028.760348 11.505685 23.379308 12.5578247
kanban 8 25 37 44 8.048366e+07 474 4.232452e+07 280 1763 6311 0.216289 0.043470 0.949228 0.0446355
kanban 30 25 37 44 6.093551e+12 3389 2.355459e+12 1985 21222 51125 0.236240 0.625804 11.579997 0.6376202
kanban 40 25 37 44 7.656243e+13 5514 2.857761e+13 3240 41427 87015 0.235310 1.128257 22.458401 1.1509507
kanban 50 25 37 44 5.509427e+14 8139 2.012438e+14 4795 71532 134005 0.232801 2.211643 39.858468 2.2517343
FMS 8 28 121 123 2.379298e+07 2102 4.459455e+06 748 5972 27569 0.587470 0.183342 2.089185 0.1860187
FMS 10 28 121 123 1.485073e+08 3289 2.539766e+07 1081 9645 44596 0.569015 0.277304 3.221964 0.2810950
FMS 20 28 121 123 6.464884e+10 13004 8.831322e+09 3646 44910 190891 0.562734 1.388950 17.075358 1.4065881
FMS 40 28 121 123 4.189220e+13 51334 4.969188e+12 13276 231940 805281 0.608593 7.314636 98.108353 7.4133529
FMS 80 28 121 123 3.387329e+16 203594 3.705727e+15 50536 1352000 3461261 0.571674 51.405283 602.989442 52.0088441
polling 5 21 347 234 1.957000e+07 428 5.909651e+06 279 2459 8406 1.154291 0.054097 0.362260 0.0556136
polling 10 46 1442 850 5.019087e+17 2783 9.338803e+16 1604 29160 97316 3.923240 0.719069 3.041551 0.7260338
polling 15 71 3287 1840 1.693998e+29 8588 2.275978e+28 4729 131980 430316 8.716698 3.844660 13.272989 3.8666497
polling 20 96 5882 3207 3.040618e+41 19343 3.196358e+40 10404 392630 1260871 16.100387 11.780375 34.822891 11.8312983
courier 10 69 538 424 1.226648e+10 3236 4.253738e+09 1433 17165 85414 1.855812 0.620738 3.878427 0.6264722
courier 20 69 538 424 6.244343e+12 9346 2.257258e+12 4193 54920 264639 1.927887 2.647023 16.657813 2.6656087
courier 40 69 538 424 5.852142e+15 30566 2.180492e+15 13913 190730 891589 1.903843 14.703267 83.855361 14.7890262
courier 60 69 538 424 3.833196e+17 63786 1.443699e+17 29233 406940 1876539 1.929164 42.618562 226.795738 42.8472869

Comparison with Matrix Diagram Method

Matrix Diagram Benchmark Data

Unfortunately, we were unable to obtain a version of SMART that supports priorities. Therefore, we instead use previously published data and compare only runtime trends rather than absolute values.

matrix_diagram_benchmarks <- read.csv("matrixDiagram.csv") %>%
  mutate(totalTime = nextStateTime + generateTangibleTime)

side_by_side <- rbind(
  all_benchmarks %>%
    # Drop versions of models that were not considered in the original papers.
    filter(!grepl("^philosophers.+", modelName)) %>%
    mutate(algorithm = "EVIDD") %>%
    select(modelName, n, algorithm, totalTime),
  matrix_diagram_benchmarks %>%
    select(modelName, n, algorithm, totalTime))

We will need some packages for visualization.

library(ggplot2)

Plot with absolute values of measurements (log scale):
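
A minimal sketch of how such a figure can be drawn with ggplot2 follows; the helper plot_total_time and the exact aesthetics are assumptions rather than the original plotting code:

# One panel per model; running time on a logarithmic axis so that
# constant-factor differences show up as vertical shifts.
plot_total_time <- function(data, ylab) {
  ggplot(data, aes(x = n, y = totalTime, colour = algorithm)) +
    geom_line() +
    geom_point() +
    scale_y_log10() +
    facet_wrap(~ modelName, scales = "free_x") +
    labs(x = "model size (n)", y = ylab)
}

plot_total_time(side_by_side, "total running time")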

The measurements are not easy to compare directly, because the matrix diagram measurements were performed on a machine that is now very old, with different speeds for the atomic operations. Asymptotically equivalent algorithms should correspond to parallel curves on the log-scale plot above: if \(t_1(n) = c \cdot t_2(n)\) for some constant \(c > 0\), then \(\log t_1(n) = \log t_2(n) + \log c\), so the curves differ only by a constant vertical offset.

We can make this easier to see by normalizing the measurements: each total running time is divided by the minimum running time observed for the same model and algorithm. This has the effect of moving the start of the curves to a common origin.

normalizer <- side_by_side %>%
  group_by(modelName, algorithm) %>%
  summarize(minTotalTime = min(totalTime))

side_by_side_normalized <- side_by_side %>%
  inner_join(normalizer, by = c("modelName", "algorithm")) %>%
  mutate(totalTime = totalTime / minTotalTime) %>%
  select(-minTotalTime)
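
The normalized curves can be drawn with the same sketched helper:

# Every curve now starts at 1, so constant machine-speed factors cancel.
plot_total_time(side_by_side_normalized, "normalized total running time")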

Another way to compare is to calculate the ratio of the total running time of each algorithm to that of our method. In this visualization, algorithms asymptotically equivalent to ours appear as horizontal lines, while asymptotically better algorithms appear as lines with negative slope.

ratio <- side_by_side %>%
  inner_join(all_benchmarks %>%
               transmute(modelName = modelName, n = n, ourTime = totalTime),
             by = c("modelName", "n")) %>%
  mutate(totalTime = totalTime / ourTime)
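
Reusing the sketched helper once more for the ratios:

# Horizontal lines correspond to asymptotically equivalent algorithms,
# negative slopes to asymptotically faster ones.
plot_total_time(ratio, "total running time relative to EVIDD")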