Apple Silicon is looking pretty impressive. I’m impressed enough to replace my 2018 MacBook Pro, the one with the shitty keyboard, with a new M1 MBP. All the benchmarks published so far are useless to me, though, since I’m primarily a .NET and JVM developer who will be running under Rosetta emulation for the foreseeable future. I intend to quantify the performance of the new Macs versus the old Intel ones with a suite of benchmarks specifically targeting the .NET and JVM runtimes.
I’ve eagerly awaited a real non-Intel laptop and desktop option ever since Apple switched over to Intel processors in 2006. I have nothing against Intel per se; I just like having diversity in hardware as well as software. When the rumors of Apple making its own chips started surfacing, I figured they would stay just rumors. The last time a consumer computer company made both its own CPUs and the end products was Commodore, and in the end that was part of what did them in. Back in the 1980s Apple experimented with this exact move in its Project Aquarius but ended up abandoning it, eventually settling on PowerPC instead. Large, expensive workstation companies used to make their own chips too, but they mostly went the way of the dodo: SGI, Sun, DEC. Even IBM is mostly out of the POWER computing business except for ultra-high-end workstations and servers. My skepticism may have been warranted, but it turned out the rumors were true.
The initial performance and power marketing curves looked promising, but what about real results? Tons of people are running benchmarks on the raw CPUs, and some are looking at specific applications. I’m mostly a .NET and JVM developer, though. While those platforms do intend to ultimately support Apple Silicon, it’s going to be some time. So I’m going to run a series of benchmarks focusing on .NET and JVM performance, first under emulation on the M1 and then natively once those options become available on Apple Silicon.
I’ve compiled a set of .NET and JVM programs and benchmarks that I can use to really put the .NET and Java runtimes through their paces. The hardware available to me is my 2018 MacBook Pro and a new 2020 M1 MacBook Pro. I’d love to get others running these benchmarks on other systems, which is why I’m publishing everything I’m doing on GitLab here. The idea is to run each test on each platform and then tabulate the relative performance against present-day Intel Mac hardware. Some of these programs are microbenchmarks; some are compile tasks. The tests break down as follows:
- Java
  - JavaFX Bouncy Balls Benchmark
  - Compilation of the Orekit astrodynamics library
  - Execution of the Orekit test suite
  - The Renaissance JVM benchmark suite
- .NET
  - The Avalonia project’s benchmark programs
  - Compilation of the Avalonia main project from scratch
  - The .NET team’s standard benchmark suite
  - (Potentially) an Avalonia graphics benchmark suite based on the JavaFX Bouncy Balls Benchmark
Most of the work has been done by the developers of the various projects. For the JavaFX bouncy balls benchmark I started with an existing project and polished it up a bit for my own uses (documented in the repository). If I can’t find a .NET graphics benchmark that works, I’ll try to write a Bouncy Balls equivalent in Avalonia. I’m also going to write some data reduction programs that convert the various benchmark output formats to a common one, so the raw data sets can be programmatically compiled into something that is easily graphed and compared.
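As a rough sketch of that data reduction step (the record shape and benchmark names here are hypothetical, not the actual formats any of these suites emit), once each suite’s output is normalized into benchmark-name/mean-runtime pairs, tabulating relative performance is just a ratio per benchmark:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical common format: benchmark name -> mean runtime in milliseconds.
// Given results for the same benchmarks on two machines, compute
// intel-time / m1-time ratios; a ratio > 1.0 means the M1 was faster.
public class RelativePerf {
    static Map<String, Double> ratios(Map<String, Double> intelMs, Map<String, Double> m1Ms) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : intelMs.entrySet()) {
            Double m1 = m1Ms.get(e.getKey());
            if (m1 != null && m1 > 0) {              // skip benchmarks missing on one machine
                out.put(e.getKey(), e.getValue() / m1);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> intel = new LinkedHashMap<>();
        intel.put("orekit-tests", 120.0);            // made-up numbers for illustration
        Map<String, Double> m1 = new LinkedHashMap<>();
        m1.put("orekit-tests", 100.0);
        System.out.println(ratios(intel, m1));       // prints {orekit-tests=1.2}
    }
}
```

The real converters will be messier, since each suite reports in its own format (JSON, CSV, or free-form logs), but they should all funnel into a simple table like this before graphing.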
The first stage of the benchmarking is mostly complete. I’ve compiled the various tests and will finish running them on my reference MBP today, with the exception of the graphical .NET benchmark. Next up is writing the data reduction code and, if necessary (and time allows), the .NET graphics test. I’m presently running the benchmarks against macOS 10.15 (Catalina). I’m hoping to run them again on macOS 11 (Big Sur) as well, since that’s the operating system the M1 MBP will be running. My own M1 MBP arrives sometime next week, at which point I’ll be able to run the same benchmarks on it.
For anyone interested in running these benchmarks and contributing results back, feel free to submit PRs through GitLab and/or reach out to me on Twitter.