Igalia's Brian Kardell and Eric Meyer learn about RISCV and LLVM from their compilers colleague Alex Bradbury

Transcription

  • Eric Meyer: This episode of Igalia Chats is brought to you by Igalia, an open source consultancy with a proven track record of landing features in devices from XR to handheld, deep expertise with embedded devices, advancements in flagship web browsers, kernel contributions, and much more. To get in touch or to learn more, visit Igalia.com.
  • Brian Kardell: Okay. Hi, I am Brian Kardell. I'm a developer advocate at Igalia.
  • Eric Meyer: And I'm Eric Meyer, also a developer advocate at Igalia. And we have a special guest today. Alex, please introduce yourself and let people know what you do.
  • Alex Bradbury: Hi, I'm Alex Bradbury, on the compilers team here at Igalia. I've been working primarily on RISCV LLVM, where I guess I've been working with RISCV for something like 10 years now, and about 15, 14 years with LLVM in some capacity.
  • Eric Meyer: And, sorry, let me just ask, RISCV and LLVM, what do those stand for for people who have never heard of them?
  • Alex Bradbury: Right. So I'll start with LLVM. So LLVM used to stand for low level virtual machine. It now stands for nothing, it's just LLVM. So LLVM is a compiler framework and, as with many of these software projects, it kind of started with a narrower scope and has now grown so large that the initial kind of expansion doesn't really make sense. RISCV is an instruction set architecture. So RISC stands for reduced instruction set computer. And it is an open standard instruction set architecture that's seen increasing adoption over recent years.
  • Brian Kardell: Yeah, I know for me, I got my first computer when I was seven years old. It was a TI-99/4A. Did you have a home computer, Eric?
  • Eric Meyer: In terms of an actual programmable computer, we got a Tandy Color Computer 1, I believe it was revision 1. It had 8K of RAM and 8K of ROM.
  • Brian Kardell: That was sold by RadioShack.
  • Eric Meyer: RadioShack.
  • Brian Kardell: In the States. But there were a bunch. There was the Commodore 64 and Atari had one, and then you had the Apple. And then when I got just a little bit older, I got a PC clone, I saved up and built it. And I guess 8088, it would've been, which is part of the x86 family. Since then we've had 8086, 286, 386. And then we switched to these words like Pentium. And I don't know, I'm embarrassed to say that I generally only think about this when I'm buying hardware. And I've kind of thought about it very generically in these two worlds. There's the PC world and the Mac world. But then there's always been these words that fly around like ARM and RISC. And I guess it's a little bit embarrassing to admit, but I don't really know about them or what they're about, and why should I even care? And every once in a while it gets a little bit confusing for me, because I had a MacBook that had Intel inside. And then there was also this phase where there were Apple PowerPCs and I was like, 'What does this mean?'
  • Eric Meyer: Right?
  • Brian Kardell: So I'm really hoping that Alex can help us demystify some of this. And I don't know, is it clear to you, all these things or is it just me that feels like kind of a dummy with this stuff?
  • Eric Meyer: Well, I figure it's probably clear to Alex.
  • Alex Bradbury: I guess I can try and demystify a little bit. Fundamentally, your instruction set architecture is providing a kind of interface between the code that's running on your machine and the hardware on your processor. So it's defining things like the register sets, the instructions it supports, but importantly how those are encoded and what their semantics are, so what they do when you run them. And it's probably not surprising, it's not something you have to think about all that much, because if you imagine some stack of interfaces or things that make up a modern computer, the instruction set architecture, it's... Well not at the bottom, but somewhere quite a long way below the bits that you'd normally interact with, the web browser, the operating system, Android, and user interface, and the like. So I guess increasingly, and especially now, there's a smaller number of instruction set architectures that are out there in day-to-day use. Which I think is something you were hinting at. It's not something that you think about all that often as to whether you go and buy a system that supports this particular ISA A versus ISA B. You just go and buy an Android phone or a laptop and it just works. Though I guess on both the laptop and server side, that is a case where there's been increasing diversity in recent years, because ARM in both the laptop and server space has been moving forwards, with companies like Amazon deploying quite a lot of ARM CPUs in AWS. And of course you mentioned the MacBook, which in recent years moved over to using an ARM-based design with the M1 and so on.
  • Eric Meyer: And an ARM-based design is what?
  • Brian Kardell: I had two of them.
  • Alex Bradbury: Yeah, so ARM is both a... It's a company who produced an instruction set architecture, the ARM instruction set architecture. And they also license designs that implement that instruction set architecture. And they're what we'd call a fabless semiconductor company, in that they don't produce their own chips, they license it. And then others, say, Qualcomm, or Samsung, or others produce chips using the design. It's a little bit complicated in the ARM case, because there are actually two types of ARM license. You can either license one of the designs that the ARM engineers produced. So you get the latest Cortex-A, whatever, core design that their engineers have produced. You put it on your system on chip. Alternatively, there's the architectural license, which is what Apple has, where they are just taking the instruction set architecture, that interface, but then doing their own implementation. And with Apple, with the M1, when that kind of hit the market, that was a pretty big leap in performance versus what was available otherwise.
  • Brian Kardell: So you're saying Apple with the M1, but Google also has Silicon, right? So is their Silicon also ARM-based?
  • Alex Bradbury: So I think Google has, to my understanding, done some limited end-user available Silicon. I think some of the Pixel phones may have an SoC that was done by Google, but in partnership with another firm. I don't believe that was their own custom core design.
  • Brian Kardell: So would it be ARM-based?
  • Alex Bradbury: So on the Pixel phones, yes, that would be an ARM-based CPU.
  • Brian Kardell: Okay. Yeah, I have one of those actually. I had distilled this down into Apple versus PC kind of in my mind. So to be clear, we have these instruction set architectures, basically there's the two that we've been talking about, right? There's Intel and ARM, and then there's RISC, right? The RISC architecture.
  • Alex Bradbury: RISCV, yeah.
  • Brian Kardell: RISCV. Was there a RISCIV, and a RISCIII, and a RISCII?
  • Alex Bradbury: Yeah, so it was called RISCV because it kind of came from a series of primarily research architectures. And so they kind of looked back and saw that this was actually the fifth one they'd done. It came out of UC Berkeley. But I guess the RISCV, it's a name. It mainly serves to confuse people who aren't sure whether to pronounce it RISC Five or RISC Vee.
  • Brian Kardell: So these are very low-level, though, above the assembly?
  • Alex Bradbury: No, so essentially assembly is basically how you would write something in your target instruction set architecture. In your assembly, you'd use what's called the mnemonics. So the names of the instructions along with the operands that they support. So say in RISCV, one of the most basic instructions would be add, for adding integers. So you might have add, and then the register you want to write the result to, then name the two registers that you want to add, and then you put that in text form into your assembler. The assembler then produces a binary which encodes that instruction in the appropriate 32-bit encoding.
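To make the assembler step concrete, here is a minimal sketch in Python of turning one line of RISCV assembly text into its 32-bit encoding. It handles only the three-register integer `add` and assumes the standard R-type field layout from the RISCV base ISA; the function names (`assemble`, `encode_add`, `reg_number`) and the small register table are made up for this illustration and are not part of any real assembler.

```python
# Illustrative only: a tiny "assembler" for a single RISCV instruction.
# A few ABI register names and the x-registers they refer to.
ABI_REGS = {"zero": 0, "ra": 1, "sp": 2, "a0": 10, "a1": 11, "a2": 12}


def reg_number(name):
    """Map a register name such as 'a0' or 'x10' to its number (0..31)."""
    if name in ABI_REGS:
        return ABI_REGS[name]
    if name.startswith("x") and name[1:].isdigit() and int(name[1:]) < 32:
        return int(name[1:])
    raise ValueError(f"unknown register: {name}")


def encode_add(rd, rs1, rs2):
    """Pack 'add rd, rs1, rs2' into the 32-bit R-type layout."""
    opcode = 0b0110011  # register-register integer operations
    funct3 = 0b000      # selects add/sub within that opcode
    funct7 = 0b0000000  # selects add rather than sub
    return (
        (funct7 << 25)
        | (reg_number(rs2) << 20)
        | (reg_number(rs1) << 15)
        | (funct3 << 12)
        | (reg_number(rd) << 7)
        | opcode
    )


def assemble(line):
    """Assemble one line of text; only the 'add' mnemonic is supported."""
    mnemonic, operands = line.split(None, 1)
    if mnemonic != "add":
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    rd, rs1, rs2 = [op.strip() for op in operands.split(",")]
    return encode_add(rd, rs1, rs2)


print(f"{assemble('add a0, a1, a2'):08x}")  # prints 00c58533
```

A real assembler covers every instruction format, labels, and pseudoinstructions; the point here is only that the mnemonic and operands map onto fixed bit fields.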
  • Eric Meyer: What I understand you're saying is that RISCV is an assembly language?
  • Alex Bradbury: Well, I'd say that to interact directly at the ISA level, you would write assembly language. Of course nowadays we mostly rely on compilers. So you'd write your Hello, World! in C, you put it into a compiler like Clang or GCC. You can ask it to emit the assembly, and then that's where you'd see it's different... Depending on which instruction set architecture you're targeting, you can ask it to emit the assembly for RISCV, or for x86, or for ARM. And you'd see that it's using different instructions at the assembly level, which are targeting the different instruction set architectures.
  • Eric Meyer: Okay. I sort of understand.
  • Brian Kardell: So these have to be made for the hardware, right? The software needs some way to communicate with the hardware. And so there are very few that you have and they're basically these three. But I think also now, we have graphics cards that have their own chips on them, right?
  • Alex Bradbury: Mm-hmm.
  • Brian Kardell: So can you mix them in the same device? Can you have an ARM CPU with a GPU that has RISCV?
  • Alex Bradbury: Yeah, indeed. And I think you said there are maybe these three architectures, but there's probably a few more that are still in fairly common use. But where you really see or have historically seen an explosion in them is all of the extra bits of logic on a larger chip. So you might have an end-user programmable core where you really, really care about the software support story, that the software ecosystem is there. If it's running something like Linux, you want to have a port of Debian for it and all the other major distros, decent compiler support and the like. But then for other bits of the chip like a power management engine or things like that, it's something which isn't directly exposed to the end-user. It might be something that is used only internally by the hardware vendor. And then for them, you have more freedom in the instruction set architecture choice and the possibility to take something like an open standard like RISCV and then make modifications to it when you need to for your own kind of custom needs. That has been or can be quite compelling. And so previously they might've had their own homegrown instruction set architecture for that, whereas now they at least can benefit from the kind of shared investment in baseline RISCV support. And if they need any extra features, they can go and do it themselves. And I think at the RISCV summit towards the end of last year, there was a keynote presentation there from NVIDIA, who spoke a bit about using RISCV in precisely that kind of use case. So if you plug in your graphics card, it'll have a bit of RISCV core on there. You don't directly program it or have any direct interaction with it, but it's providing services as part of the wider GPU. And I think they mentioned a figure of expecting to ship a billion RISCV cores over the course of 2024. You think about buying a laptop and it's got a processor in it, and maybe it's got four cores, or eight cores, or 16 cores. But on a modern SoC, you typically have dozens upon dozens of processors just dotted all around for all these different subsystems on the chip that you don't directly interact with.
  • Eric Meyer: Sorry, SoC stands for?
  • Alex Bradbury: Oh, sorry, the system-on-chip. So it's what you typically call a kind of highly integrated processor.
  • Eric Meyer: Okay. And you said end-user programmable chips, cores?
  • Alex Bradbury: Yeah, I suppose I really mean programmer there. So end-user, not necessarily in the consumer sense, but really what can I, as a developer, get access to? I guess maybe it's better to think of it in terms of levels of abstraction, where there's the documented and exposed instruction set architecture, say, on an Intel chip, x86. But then even on Intel chips, there'll be different bits of that design that have different ISAs purely for internal purposes and they are not exposed to you directly.
  • Brian Kardell: So I think when we first started talking about this, I was asking you questions and you used this thing where you compared them and you said it's RISC versus CISC?
  • Alex Bradbury: So yeah, historically there's been this split between the reduced instruction set architectures, so RISC, versus CISC, which are the complex instruction set architectures, where x86 would be an example. And the difference really is that for a CISC processor design or instruction set architecture, you might have fairly complex instructions that take many, many, many cycles to execute. Whereas RISC is trying to kind of simplify the definition of each instruction. It's more reliant on compiler smarts to try and produce an efficient sequence of instructions. It is also kind of the direction that most instruction set architectures have moved in overall. But you might argue about, particularly with some of the later ARM instruction set extensions, to what degree they're starting to pull in CISC-like features. But I'd say that trying to perform a minimum unit of work within a single instruction has kind of caught on as a main design principle.
  • Brian Kardell: Way back at the beginning of that, you said, 'We're talking about these three, but there are some other ones, they're popular too.' So what are they? Just to get this all straight in my mind, what words am I overloading? Which things am I not understanding? What are some other ones that I might also know?
  • Alex Bradbury: Yeah, so I guess it depends on your definition of popularity, if you like. I suppose MIPS and PowerPC would be two which are waning in popularity, but still there. System Z from IBM, again, that's for a fairly niche use case, but there's a Debian port for it, for instance. More recently there's Loongson, coming out of China, which has similar goals to RISCV really, but taking a slightly different approach. We said ARM, but there's actually multiple instruction set architectures within that. There's the old 32-bit ARM and then there's the 64-bit ARM, which is often called AArch64. But then if you start looking at things like microcontrollers, although those are largely moving towards ARM or sometimes 32-bit RISCV, you still have stuff out there like AVR on some Arduino boards and things like that, so you might've seen some of those.
  • Brian Kardell: In trying to prepare a little bit for this, I found something that was the history of all kinds of different computing stuff. And it went into basically very early computing and it talked about how we wound up with what we wound up with. And I heard in that, I don't know if it's true, because it's just one source of information, but I heard that sort of ultimately, the IBM PC kind of just picked a winner. And they didn't invent it, they needed something that was commodity already and that you could assemble easily. And that it won more because they picked it than because it was the best thing available or something. Which sounds very business believable to me. So I don't know, do you know anything about that history? Because there's a follow-on to this that you can go ahead and also answer, which is why choose one versus the other? Right?
  • Alex Bradbury: Right.
  • Brian Kardell: Is it just PC versus Mac? JavaScript versus TypeScript? It's a lot of preference or are there really solid reasons to choose one versus the other?
  • Alex Bradbury: Okay. So I think there's fairly fundamental reasons to go with one versus another, particularly for RISCV versus the proprietary instruction set architectures. There are many cases throughout computing history where a standard has won out, not because it's the best, but because it was there at the right time and then there's enough buy-in that it's hard for anything else to take over. I think RISCV is a well-designed instruction set architecture. Maybe there's a few things people would change about it if they started again. But even if it wasn't a well-designed one, I think there would be strong reasons for people to move towards it. And it's the fact that it's an open standard, whereas the other instruction set architectures are not. So with ARM, you have to go and pay a licensing fee, at least typically, to license their cores. It's very hard or expensive to get a license to do your own core. And even then, you don't have much ability to add in your own custom extensions and the like. They're very concerned about fragmentation, things like that, so they don't want to have all kinds of different ARM cores from different vendors being mutually incompatible. Same with x86, it's basically just AMD or Intel who have the ability to produce x86 chips. So I guess with AMD, for some historical lawsuit reasons, it ended up with them both getting a license. I think the way that RISCV has been pitched is to compare it to an open standard like HTTP or SQL, so to think in those kind of terms. So yes, there may be open source implementations, but there are also proprietary ones. And the point is that it's a standard interface where you can swap out a different component, either one you built in-house or one that you bought in externally. So you have this ability to multi-source, to change things over time. And as I said before, also the ability to add in your own custom extensions, which often just isn't possible with these other instruction set architectures. There are more niche ones that have had that ability, which have been proprietary. But then they suffer from having a limited amount of investment in things like compilers and tool chains. Whereas the idea of RISCV is that you have a kind of common base, but then a whole bunch of standard extensions which you can opt into if your use case requires it. And then you can define your own custom extensions on top of that if you need to.
  • Eric Meyer: Okay. So the draw of RISCV is that no one person owns it and you can use it essentially at no cost?
  • Alex Bradbury: Yeah, and so you can implement the standard at no cost, but you may very well go and license a core from somebody else or buy a core from somebody else. In fact, I think most shipping RISCV silicon is currently with proprietary cores, even though there's quite a lot of open source core designs available. Prior to Igalia, I co-founded lowRISC, which was working in that kind of space, working on open source RISCV designs. And then I mentioned before, the idea of extensions. If you go and look at the initial motivation behind RISCV, they were very keen that you have a standard which scales down but also scales up. I think part of what we've been discussing and maybe partly one of the reasons it can be a bit difficult to discuss something like instruction set architecture is because it's used across such a wide range of domains. All the way from microcontrollers, from little tiny cores, of course, that you don't really get access to, through to mobile phones, laptops, servers, more specialized use cases. So this is things that might be running Linux all the way down to things that have a couple of kilobytes of memory in. They're running some kind of standard, either a minimal real-time operating system or a basic state machine. And so the idea is that there's a core set of roughly, I think it's about 40, 50 instructions that you need to implement to be RISCV compliant, but then there are a set of standard extensions which a core may choose to implement. And if it's running Linux, it would implement things like single and double precision floating point, integer multiply, a whole bunch of bit manipulation instructions you'd expect to see there. But then if you think back to these really, really teeny tiny cores which have very specialized needs, they don't necessarily need that. But certainly as with picking any component of a larger system, you need to look at requirements, and then what's on the market, and how it can meet them. And the advantage of an open standard is that if there isn't something on the market which meets all the requirements, you may be able to customize something, build your own, or take an open source implementation and modify it. So there are all these options available to you which may not have been available otherwise. We talked about how you often don't really think about the instruction set architecture and typically that's true, you don't. You don't see really any adoption of RISCV on the server side so far, right? You see there are a few startups there who are working in that space. But as it stands right now, I can relatively easily go and rent a server that's got ARM CPUs by the hour. I think Amazon had a stat that 50% of the CPUs deployed within their fleet over the last however many years were using their Graviton ARM-based system. The same isn't yet true for RISCV. And on the one hand, you do have concern about ecosystem compatibility, what software can you run? But then once you have the necessary software ported, you're typically not going to care what the underlying instruction set architecture is. If you're renting something like a database server sold in an infrastructure-as-a-service type role, you just care about how many operations it can run per second. It doesn't really matter what the underlying ISA is. You typically don't interact with machines at that level. So you can see that's kind of a route for alternative ISAs being adopted over time.
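The base-plus-extensions idea Alex describes is usually written down as a short ISA string such as `rv64imafdc`, where `i` is the base integer set and the following letters name standard extensions (for example `m` for integer multiply and divide, `f` and `d` for single and double precision floating point). The following Python sketch shows one way to reason about that. It is a simplified illustration, not the full naming rules from the RISCV specification: it ignores version numbers and shorthand such as `g`, and the function names `parse_isa_string` and `can_run` are hypothetical.

```python
# Illustrative only: a simplified reading of RISCV ISA strings.
def parse_isa_string(isa):
    """Split e.g. 'rv64imafdc_zba' into (64, 'i', {'m','a','f','d','c','zba'})."""
    text = isa.lower()
    if not text.startswith("rv"):
        raise ValueError(f"not a RISCV ISA string: {isa}")
    # Multi-letter extensions are separated by underscores.
    head, *multi_letter = text.split("_")
    digits_end = 2
    while digits_end < len(head) and head[digits_end].isdigit():
        digits_end += 1
    if digits_end == 2 or digits_end == len(head):
        raise ValueError(f"expected a width and a base, e.g. rv64i: {isa}")
    xlen = int(head[2:digits_end])   # register width: 32 or 64
    base = head[digits_end]          # base integer ISA letter
    extensions = set(head[digits_end + 1:]) | set(multi_letter)
    return xlen, base, extensions


def can_run(core_isa, required_isa):
    """True if a core implements everything a binary was compiled to use."""
    core_xlen, core_base, core_ext = parse_isa_string(core_isa)
    need_xlen, need_base, need_ext = parse_isa_string(required_isa)
    return (
        core_xlen == need_xlen
        and core_base == need_base
        and need_ext <= core_ext     # required extensions are a subset
    )


# A Linux-capable core versus a tiny embedded core (example strings).
print(can_run("rv64imafdc", "rv64imfd"))  # True
print(can_run("rv32i", "rv32im"))         # False: no multiply extension
```

The subset check is the interesting part: software built for the common base runs everywhere, while software that opts into an extension needs a core that implements it.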
  • Brian Kardell: You talked about compatibility. But as I understand, we all use high-level languages and then use Clang or something to compile. So you could compile for any of the architectures, right? At least these three, you can cross-compile to any of them. Is it incomplete? Are there things that you just can't do with one and can do with the other?
  • Alex Bradbury: Yeah, so you're totally right in that you can take your C code and compile it for another architecture. Often there'll be a little bit of porting work that's needed. If it did use any inline assembly or things like that, you need to dive in and adjust that. And then largely, the work is a long tail of support work. So you might run into... I'm a compiler developer. It'd be nice if compiler bugs didn't exist, but yep, they do. I've introduced numbers of them myself and fixed a few, hopefully fixed more than I've written. So you will find issues where you try and cross-compile a package, and either the compiler crashes or you try and run it and there's a problem, so that's the porting effort. You mentioned that we're often using high-level languages, those also need to be ported. So things like if you're running your JavaScript JITs, you've got Node.js embedding V8. V8 needs to have a RISCV backend for it. There is at least some RISCV support upstream in V8. How complete it is and how well-performing it is, I don't know. But the same is true, you'd be looking for the same thing with Java and the C# .NET ecosystem and anything else you're looking to support. And then beyond that, there's just the packaging and support. So things like Debian, there's a RISCV port, but it's not yet at the official level, so it doesn't have the same level of support as ARM and x86. So we're hoping that will change in the next release. Canonical have been putting out a bunch of releases for RISCV, but again, it's still a lower tier support architecture versus x86 and ARM. So there's no fundamental barrier. It is just, with quotation marks there, a bunch of engineering work to go through, and actually kick the tires, and make sure all this stuff's working, and do the necessary packaging work.
  • Brian Kardell: Yeah, I guess now that I think about it, it probably depends a lot on what you're building, because you're talking about these things and I wasn't even thinking. But of course, depending on what you're building, you need Red Hat, or Debian, or some kind of high-level operating system. And then a package manager and you have to have all that stuff compiled. And that's not just you doing the compiling, that's other people creating the packages. And so if there already was that ecosystem and you wanted to provide a spreadsheet application, that would be considerably easier than all the rest, right? So it's like getting a lot of that into the world in the first place is a little bit more complicated.
  • Alex Bradbury: Yeah, and with the more kind of embedded systems you were maybe thinking about first, those have been the kind of systems where RISCV has had the most rapid uptake. And probably for that reason, because you don't care about the wider software ecosystem. You have an often internally written piece of firmware for managing your cellular modem, whatever it is, it needs to run that. And if it does that, the chip does everything it needs to, so you don't have to care about all this porting.
  • Brian Kardell: Yeah, that is exactly what I was thinking of. I wasn't thinking of desktops and things, but that's interesting when you bring it back to that, that I understand. So the other thing that I wanted to ask was the fact that you said it hasn't been successful on server side yet, right? You can go spin up a bunch of ARM VMs, whatever, but you don't see that happening with RISCV. That's kind of surprising to me in the sense that... It's not, given this thing that we just talked about, but it is surprising to me in the sense that where it's competing, as I understand, is at hardware level. You're creating some hardware that runs this instruction set. But creating hardware, I think is expensive and I think there are not that many places that do it. And they have lots of built-in things that advantage them, in a way, to be proprietary, would be my guess. And so there's the fact that it's succeeding in those areas and not on the server, where it has a bunch of other challenges that we would have to solve, but they all seem very solvable without those resources and infrastructure that you need to build chips and stuff.
  • Alex Bradbury: Well, I guess fundamentally if you have RISCV on the server, you still need somebody to produce that chip. Which is going to require a huge amount of investment and be a very complicated design, plus needing all of the software infrastructure work. So it's not so much RISCV hasn't been successful in the server, it's more it's too early in the story there. There's multiple startups who are working on devices in that area, Ventana and Rivos being two. But the RISCV stated goal, as Krste put it, who is one of the creators of it, is to become the industry standard ISA for all computing devices. Which is a very lofty goal and it's a long way from there right now, but I don't see a reason why it can't get a foothold within the server market over time. But it is something that does require time and investment. And as you said, it takes a lot of time to produce chips and particularly very high-end server chips are massively more complicated to produce than, say, smaller microcontroller designs. So those are kind of factors that mean that it's going to take a bit longer for people to start producing this stuff and shipping it.
  • Brian Kardell: Maybe there's a disconnect in my head still around what we are and aren't talking about in terms of what does exist in the world. Because like you said, there are these goals to be the standard architecture for all computing devices, which is up there with world peace. It's a lofty aim that probably we're not going to achieve, but we want to aim for. So can you buy a RISCV computer today? If you wanted to build a server today, could you do it?
  • Alex Bradbury: Yeah, so there are multiple things you can buy. So there's multiple development kits like Raspberry Pi-style single board computers. The Banana Pi BPI-F3 is one that a number of us who work on LLVM use, because it has full support for the RISCV vector instruction set extension, which is where a lot of the compiler work is focused right now. But it's kind of something which gives you okay performance, but it's not something you'd be looking to stick in a server chassis, and expect reasonable performance, and expect to get solid results out of versus a higher-end x86. But it's definitely super helpful for initial porting and experimentation work. And then there's some companies out there who have put that chip in a tablet form factor or a laptop form factor, so you definitely can go and buy this stuff. It's just at the level where it's a fun thing to play with as a hobbyist.
  • Brian Kardell: Yeah, because there's not the same kind of end-user ecosystem as we said, right?
  • Alex Bradbury: Both the ecosystem and the core designs themselves are typically not a design point that's... You get higher performance out of Intel or ARM cores for those kinds of devices based on the chips that are currently available.
  • Brian Kardell: Why is that?
  • Alex Bradbury: We kind of talked about the embedded use cases being an area where you have to worry less about compatibility. If you kind of think further up the stack, there's also use cases where, yes, it's running Linux, but maybe it's in a particular environment. Maybe it's in a router, or a set-top box, or something. And so I think that vendors are maybe moving up the stack in that way. So looking at typically, you don't have a system that's quite as high performance versus a laptop or a high-end mobile phone. So I think it's just time really. As these companies are developing their portfolio of products, they're typically not jumping straight to the highest possible performance one. Which would be the highest amount of engineering effort and highest risk, both in terms of engineering risk of producing it in the first place, plus the risk of people actually buying it because of the software ecosystem maybe not being there yet.
  • Brian Kardell: So you also gave examples, I think it was NVIDIA and maybe Qualcomm were examples where they were talking about using RISCV on.
  • Alex Bradbury: Yeah, as part of larger systems. Yeah, I think Qualcomm, it must be many more by now, but towards the end of 2023, they said that they'd shipped 650 million RISCV cores. And then I think the NVIDIA figure I mentioned was a billion RISCV cores shipped in the last year.
  • Brian Kardell: So they must be quite performant for those tasks anyway?
  • Alex Bradbury: Well, performant for the tasks they're used for. Well, I think NVIDIA have maybe said the most about their use case and it's more kind of embedded cores as part of management engines within a larger system. There are multiple companies who are producing high-end RISCV cores. It's just not yet at the point where I can go to Mouser, or Amazon, or whatever and buy one. But there's no fundamental limitation there. It's just not part of the market that's developed to the point in which I can just go, and hit a buy button, and get one of the chips shipped to me.
  • Eric Meyer: One of the things that you mentioned at the beginning was LLVM, how does that relate to all of this?
  • Alex Bradbury: That's a good question. So LLVM is a compiler framework that I've been working with for a long time now. And I'd say the link is basically any of these, whether it's ARM, whether it's Intel, whether it's RISCV, whatever chip you have access to, it's not much good unless you can actually program it and run software on it. And naturally that's where the compiler comes in. You pass in your C or C++, and your compiler generates hopefully high performance code that takes best advantage of the hardware that you have available. And so compilers like LLVM and GCC, they really complete the story by making your chip actually useful rather than just being a lump of silicon sat there on your desk that you can't really do anything with. For LLVM in particular, we've kind of grown quite a large contributor community over the past 9 or 10 years that we've been working on RISCV LLVM things, for a number of reasons really. I'd say LLVM has had more and more interest from both companies and academia as being easier to modify versus GCC. GCC being the GNU compiler, that's often the default on Linux distributions. So for things like working with new, maybe more novel instruction set extensions, the RISCV vector instruction set extension is an example of that, where it has this concept of being length agnostic. So rather than hard coding a particular length of your vector, it's something that you can modify at runtime and which a compiler can try and make use of. There are features like this where having an easier to hack on code base obviously means it's much easier to iterate on designs there and try out new things. Separately, LLVM is under a permissive license, so it's under an Apache-style license versus GCC's GPL, and so that's another reason that there's been more and more corporate interest. And also, LLVM acts as the standard backend for newer programming languages like Rust or Swift. And so if you want to support those on RISCV, then you need to ensure you have solid RISCV LLVM support.
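The length-agnostic idea Alex mentions can be sketched in plain C. This is only an illustration, not real vector code: `set_vl` and `hw_max_lanes` are hypothetical stand-ins for the hardware deciding, on each pass through the loop, how many elements it will handle (on RISCV vector hardware that role is played by the `vsetvli` instruction). The point is that the loop never hard codes a vector length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: how many elements the "hardware" can process at once.
   Real hardware fixes this per implementation; the value is made up. */
static size_t hw_max_lanes = 4;

/* Hypothetical stand-in for asking the hardware for a vector length:
   given the elements still to do, return how many this pass handles. */
static size_t set_vl(size_t remaining) {
    return remaining < hw_max_lanes ? remaining : hw_max_lanes;
}

/* dst[i] = a[i] + b[i], written without assuming a fixed vector length. */
void vla_add(int *dst, const int *a, const int *b, size_t n) {
    size_t i = 0;
    while (i < n) {
        size_t vl = set_vl(n - i);   /* length chosen at run time */
        assert(vl > 0);              /* guarantees the loop makes progress */
        /* This inner loop models one vector instruction over vl elements. */
        for (size_t j = 0; j < vl; j++)
            dst[i + j] = a[i + j] + b[i + j];
        i += vl;
    }
}
```

Calling `vla_add(dst, a, b, 7)` gives the same result whether `hw_max_lanes` is 4 or 16, which is why the same code can run on chips with different vector widths.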
  • Brian Kardell: So because of our work here, we hear people talk about the Clang compiler all the time. And what's the relationship with Clang and LLVM?
  • Alex Bradbury: Yeah, so Clang is a subproject within LLVM. So LLVM, you can think of as being a suite of toolchain components. So it has Clang, which is a C and C++ frontend. So that's the bit which takes in your C or C++ source code and then produces the LLVM intermediate representation, which is kind of the bit which you run all the optimizations on. There are a whole bunch of other subprojects within LLVM these days, such as a linker, LLD, and a debugger. There's even a libc, so a C library, and libc++, which provides things like the C++ headers. And then other newer subprojects like MLIR, which provides a new extensible approach to implementing your own intermediate representations and that kind of thing. So Clang is the user-facing piece that takes your C and C++ and puts it through the rest of the LLVM toolchain. The bit which I work on primarily is on the LLVM side, it's the backend. So you have Clang going from C and C++ through to LLVM IR. You then have a bunch of mostly target-independent passes which apply optimizations to that. And then it passes through to the backend, which translates from LLVM intermediate representation to the assembly, effectively, for your target instruction set architecture, which is where it naturally becomes more and more target-dependent.
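To make those stages concrete, here is a small C function with, in the comments, one way to look at each step. This is a sketch that assumes a Clang/LLVM install built with the RISCV backend; the file names are made up, and the exact target triple may differ on your system.

```c
/* sum.c: a tiny input for walking through the stages described above.
 *
 * Frontend (Clang): C source to LLVM IR, with optimizations applied:
 *     clang -O1 -S -emit-llvm sum.c -o sum.ll
 *
 * Backend: LLVM IR to assembly for a chosen target:
 *     llc -mtriple=riscv64 sum.ll -o sum.s
 *
 * Or both steps in one go, letting Clang drive the backend:
 *     clang -O1 --target=riscv64-unknown-linux-gnu -S sum.c -o sum.s
 */
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

The `.ll` file is the same kind of IR regardless of target; only the last step changes when you swap `riscv64` for another architecture.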
  • Brian Kardell: Okay. So we're going to go back to the... Put my dunce cap on. Because it's not a VM though, is it? Is it really a VM? It's not a VM in the sense of the Java virtual machine, right?
  • Alex Bradbury: No. So the intermediate representation defines a kind of low-level language for things like reading and writing to memory, for performing arithmetic operations. So you could maybe look at that and compare it to something like JVM bytecode or, say, the bytecode implementations that you'd have within a JavaScript JIT. But no, it isn't.
  • Brian Kardell: But isn't it more like WebAssembly?
  • Alex Bradbury: Yeah.
  • Brian Kardell: Is that a better analogy?
  • Alex Bradbury: Yes, I'd say that's probably a better analogy. The focus really is on providing a common intermediate but documented format that your input, whether it be C, Rust, or Fortran or whatever, is converted into. You can then reuse as much as possible between all the different backend targets, so as much as possible is running on that in a target-independent way, perhaps querying the backend every now and again for specifics it needs. And then you pass it through to the backend, where it selects the native instructions, so RISCV, or x86, or whatever.
  • Brian Kardell: But it's very widely used, right? Clang and LLVM?
  • Alex Bradbury: Yes, very much so.
  • Brian Kardell: Much more than RISCV, you would say?
  • Alex Bradbury: Yeah, but I guess by definition, more people will be using Clang and LLVM in some way, because it's used by so many of the other instruction set architectures. Although there's work to have a GCC-based backend for Rust and also a Cranelift-based backend, which is another code generator that's maybe most commonly used with WebAssembly, LLVM is still the primary backend for Rust. And then you have various vendors like... ARM's default compiler is derived from Clang and LLVM. And Clang and LLVM is the default compiler used to build Chromium, I think on all platforms now. Firefox, all platforms now, I think. Though some of the Linux distros still try and build it with GCC. So yes, very widely used and adopted.
  • Brian Kardell: It's used for WebKit as well, isn't it?
  • Alex Bradbury: I believe upstream, yes. I'm not sure if they're still maintaining support for GCC alongside Clang for the WebKit ports for embedded. But I think Clang would be the primary compiler for WebKit as well.
  • Brian Kardell: I understand that ARM64 is related kind of mainly in name to the previous ARM, right? It doesn't actually share an actual lineage somehow?
  • Alex Bradbury: Yeah, essentially a new instruction set architecture, yeah.
  • Brian Kardell: And is it similar with the Intel family at some point? We've not really been just slowly evolving the 8088 instruction set architecture. At some point, we jumped to 64-bit or something, some big...
  • Alex Bradbury: I think that there was less reimagining on the x86 to x86-64 side. There have been more recent proposals from Intel where there would be more radical changes to the x86 ISA. But yeah, it's a different compiler target, the 32-bit versus 64-bit x86.
  • Brian Kardell: Right. So my question is sort of like how long has each one been at it?
  • Alex Bradbury: Yeah, off the top of my head, I couldn't say quite how old x86-64 is. But I think ARM64 was probably announced a few years before RISCV started appearing publicly. I guess it'd been under development for some years prior to that as well. So I think the precise date RISCV started is a bit hard to pin down. I think coming out of the university, there have been students who were working on variants of it for some time. But it's around about 10 years ago when it became public. They first had a public workshop and companies started announcing they were looking at it or doing things with it. So around about a decade there. But yeah, I think you're right that it's hard to look at another architecture and say, 'Well, it took so long for that architecture, it'll take so long for RISCV.' Really, it all comes down to market forces, and investment, and how long it takes these things to coalesce. It depends on use cases or other factors. There's been a very high-profile court case between Qualcomm and ARM about an acquisition they had of the Nuvia CPU design and about whether the architectural license that Nuvia had was transferable to Qualcomm or not. So you see all of these kind of concerns, and some of them may lead to additional investment in alternatives, where you're not reliant on a single company for licensing the design or the ability to produce a design. Where you're more in control of your own destiny and you can just make use of the open standard.
  • Brian Kardell: It's about the same age as React, I guess.
  • Alex Bradbury: Right.
  • Eric Meyer: I think my question is what's next for RISCV?
  • Alex Bradbury: I'm a terrible person with predictions, but it does seem inevitable that RISCV has had that uptake on the embedded side. It seems as though there's only going to be more of that over time, maybe not so much the end-user or the programmer-accessible embedded stuff. So if you go to a standard parts catalog like Mouser, or RS, or Element14, you're going to be inundated with options for ARM microcontrollers and not have many for RISCV. They'll have a few. But there's still all kinds of usage in that area that maybe aren't quite so end-user available. It seems inevitable that with more and more companies working in that space, you'll have more support for higher performance Linux-capable cores. It just seems as though that's a matter of time with various startups trying to produce products in that area. I think what's particularly interesting to me, coming as a compilers person, is how you further expand and mature that instruction set architecture design. There's activities now like a scalar efficiency working group within RISCV International, which is the standards body, trying to bring vendors together, try and look for any gaps within the ISA design. Where if we had a few additional instructions of a certain type, we might be able to generate better code and close any performance gap that might exist between similar designs that use, say, ARM64 versus RISCV. So I think there's more work to be done there and that's quite interesting. There's also things like security extensions, like with CHERI. It started off from the University of Cambridge and other collaborators being defined... It initially used a 20-year-old MIPS design as a base, because that was known to be out of patent. Then they had a version of it for ARM and for RISCV. The ARM one was in collaboration with ARM through a government-funded program. But it looks like the RISCV one is the one that's actually going to ship in the not too distant future and there are companies who are interested. There's a project underway to try and standardize it. And that's an example of the kind of thing which you can do with an open instruction set architecture, because there's not a single gatekeeper who needs to decide is this viable for everybody to adopt? You just need to have enough people interested for it to move forward.
  • Brian Kardell: Well, that's an interesting thing that we didn't talk about, but I would be kind of interested to talk about briefly if we can, which is where is it standardized? Where is that done and how is it done? What does the governance model look like?
  • Alex Bradbury: Sure. So it's at RISCV International, which is the body that's responsible for the ISA specification. They're headquartered in Switzerland. Companies pay to be a member. And if you're an individual, you can be a member for free. And then you collaborate through technical working groups in order to make proposals for changes to the ISA. Or you might propose a new working group, just to try and design a new extension. Ultimately, there's a whole lot of documentation on all the processes that are used. But at the end of the day, once an extension has been completed by the people who want to propose it, it's sent out for a vote, for ratification. The main technical body votes on it, it gets approved by the board of directors, and then it becomes an official RISCV specification.
  • Brian Kardell: And how many extensions have gone through that process roughly? Is it 5? 50? 500?
  • Alex Bradbury: Yeah, it must be somewhere from 50 to a hundred, I would think. Because there's maybe a small number of larger extensions, and then there's a lot of extensions that end up being really quite small. Particularly things like control registers and the like, that operating systems might modify. So there's a large number of them, but a whole range of complexity for different extensions. Some of them are huge, like the vector extension. Whereas there are others which are defining only one or two instructions. Or maybe no instructions, they're just defining a few control registers for things like performance monitoring and that kind of thing.
  • Brian Kardell: So you said some of them are kind of huge, but does that make it CISC?
  • Alex Bradbury: No, I'd say some extensions, by their nature, just have a large number of operations that are still hopefully designed to be operations that are logically distinct and targetable by a compiler rather than needing to be handwritten to reasonably produce them.
  • Brian Kardell: Are there a number of organizations that are part of that, that really would like to help shape these things?
  • Alex Bradbury: Yeah, so if you look at the RISCV members list, then I think you can see a large number of companies there. And you do have companies like Qualcomm, and Google, and NVIDIA, and others who are at the top tier membership. Lots of people want to move their specs in the right direction. It's often difficult to kind of match that up with resources being committed, or particularly resources being committed at the right time. As with any specification work, it's very difficult to predict what your return on investment might be. And you could put in a lot of time, but then if nobody's really ready to engage on that particular idea, it can be kind of lost. Or alternatively, you get a bit stuck with back and forth, where one company has their view of how it should be done, another company has their view. But we all hope that these things work out well over time. It's probably the case that open standards don't develop faster than closed standards. I'd say in the time I've been involved in RISCV, particularly on things like security extensions, ARM has moved a lot faster. But hopefully over time, with an open standard like RISCV, you get to a better endpoint. And also, as things continue to grow and you have more contributors, there's probably scope for more things being standardized that just wouldn't have made the cut when you're reliant on a single gatekeeper.
  • Brian Kardell: It's been a really educational hour for me. So definitely thanks for your patience and our questions, which are probably considerably basic for you. But I learned a lot in the last hour or so. So thanks for coming on, Alex.
  • Alex Bradbury: Thanks very much for having me.
  • Eric Meyer: Yeah, thank you.