
shitty_cpu's Introduction

Shitty CPU

Some time ago, I participated in a CTF and one task piqued my curiosity like no other.

This task was created in memory of John Conway and his research on the Game of Life. As it turns out, the Game of Life is Turing complete, and the CTF organizers created a small CPU which runs entirely inside the Game of Life.

This CPU is made of so many cells that just scrolling to see individual cells takes some time:

scale

This piqued my curiosity enough that I decided to design my own simple CPU in order to better understand how CPUs work under the hood.

Idea

For my purposes I need something as simple as it gets. So do not expect anything fancy like pipelines, interrupts, fancy maths or whatever...

The need for simplicity basically dictated a von Neumann architecture. In order to do any kind of maths, the minimum number of registers required is two, and thus the A and B registers were born. The "maths" I am talking about here is just a simple addition of those two registers. Because this is two's complement addition, adding a negated number gives us subtraction for free. For simplicity I decided to ditch logic operations.
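To make the "subtraction for free" point concrete, here is a small Python sketch of 8 bit two's complement arithmetic. The helper names `negate` and `add8` are made up for this illustration and are not part of the CPU sources.

```python
MASK = 0xFF  # 8 bit registers


def negate(value):
    """Two's complement negation: invert all bits, add one, keep 8 bits."""
    return ((value ^ MASK) + 1) & MASK


def add8(a, b):
    """8 bit addition that wraps around, like a plain 8 bit adder."""
    return (a + b) & MASK


# 7 - 5 computed as 7 + (-5)
print(add8(7, negate(5)))  # 2
```

Note that `negate(1)` is `0xFF`, so adding `0xFF` to a register decrements it by one.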

To be Turing complete I will also need some conditional branching: a jump to some address that is taken only if the result of the addition is zero.

Because I will try to write at least one real program for this CPU (aka blink some LEDs), it would be quite nice to be able to address memory directly "inside" instructions. In the end this "requirement" ended up somewhat complicating CPU design, but, as a result, the instruction set became much more flexible.

Oh, and of course this is an 8 bit CPU. Why would you choose anything else in this scenario?

At first I thought that it would be nice to find some logic simulator and build the whole CPU out of logic elements by manually connecting them. But after some reading, it felt like HDLs are a natural fit for this, even though they operate at a somewhat higher level.

I chose VHDL completely arbitrarily; I have no preference for it over the other languages.

Components

It turns out that (simple) CPUs do not require that many components after all. This CPU basically consists of:

  • A few Multiplexers
  • Some registers
  • Some counters
  • ALU
  • Memory
  • Controller

Multiplexer

This is a simple component which "connects" one of several inputs to a single output. The input is selected by the binary value of the selection signals.

multiplexer

In this example there are four inputs, one output and consequently - two selection signals.

This is a truth table ("sort of"):

sel2  sel1  output
L     L     out == in1
L     H     out == in2
H     L     out == in3
H     H     out == in4

This is an example of a 1 bit multiplexer built out of pure logic elements:

multiplexer_logic

If we would like to build a multiplexer with inputs wider than one bit, we can simply copy this circuit multiple times. The respective selection pins of the separate 1 bit muxes are connected together, and each 1 bit mux handles one bit of every input.

multiplexer2

For our 8 bit CPU we will be using a multiplexer with 8 selectable inputs (and therefore 3 selection signals), each of which is 8 bits wide.
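The multiplexer described above can be sketched in Python. This is only an illustration of the four input truth table and of the "copy the 1 bit circuit per bit" idea; the function names are hypothetical and the gate arrangement is one common way to build it, not necessarily the one in the schematic.

```python
def mux1_gates(in1, in2, in3, in4, sel2, sel1):
    """1 bit, 4 input mux from AND/OR/NOT gates, following the truth table."""
    n2, n1 = 1 - sel2, 1 - sel1  # inverted selection signals
    return ((in1 & n2 & n1) | (in2 & n2 & sel1) |
            (in3 & sel2 & n1) | (in4 & sel2 & sel1))


def mux_wide(words, sel2, sel1, width=8):
    """Wider mux: one 1 bit mux per bit position, sharing the select lines."""
    out = 0
    for bit in range(width):
        bits = [(word >> bit) & 1 for word in words]
        out |= mux1_gates(bits[0], bits[1], bits[2], bits[3], sel2, sel1) << bit
    return out


print(hex(mux_wide([0x11, 0x22, 0x33, 0x44], 1, 0)))  # 0x33
```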

Register

A register is basically a small memory built from logic components (think expensive memory :D).

This CPU will be using so-called "synchronous" registers, meaning that a register updates its value only on the rising clock edge. So our register should have:

  • data input
  • clock input
  • write enable input
  • output

So basically I would like a "memory" which remembers the input value when the clock is on its rising edge and the write enable input is high. The output should always represent the "remembered" value.

This is how we can achieve this for a single bit:

register

This is called a D flip-flop.

Again, a wider register can be achieved by connecting the "control" inputs (clock, write enable) of multiple flip-flops together and by connecting the data input for each bit to a separate flip-flop's data input.

register2
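The register behaviour described in this section can be modelled in a few lines of Python. This is a behavioural sketch only (the `Register` class and its `tick` method are names invented here), not a gate level model of the flip-flops.

```python
class Register:
    """8 bit register that latches its input on a rising clock edge."""

    def __init__(self):
        self.value = 0      # the "remembered" value, always visible on the output
        self._prev_clk = 0  # previous clock level, used to detect the rising edge

    def tick(self, clk, data_in, write_enable):
        """Present new input levels and return the current output."""
        rising = self._prev_clk == 0 and clk == 1
        if rising and write_enable:
            self.value = data_in & 0xFF
        self._prev_clk = clk
        return self.value


reg = Register()
reg.tick(0, 0x15, 1)             # clock low: nothing happens yet
print(hex(reg.tick(1, 0x15, 1)))  # 0x15
```

With write enable low, a rising edge leaves the stored value untouched, which is what lets many registers share one data bus.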

Counter

A counter is basically a register that additionally has an increment signal. The value stored in the register is incremented on each rising edge of the increment signal.

This component is extremely useful for holding the Program Counter value and incrementing it once an instruction is fetched :).

This is a version of a 2-bit counter built with two J-K flip-flops:

counter
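As a rough behavioural sketch in Python, a counter is the register model plus an increment input. The `Counter` class below is hypothetical; in particular, giving the load priority over the increment is an assumption made for this example.

```python
class Counter:
    """Register with an extra increment signal, e.g. for the Program Counter."""

    def __init__(self, width=8):
        self.mask = (1 << width) - 1
        self.value = 0
        self._prev_clk = 0
        self._prev_inc = 0

    def tick(self, clk=0, data_in=0, write_enable=0, increment=0):
        if self._prev_clk == 0 and clk == 1 and write_enable:
            self.value = data_in & self.mask           # load, e.g. for a jump
        elif self._prev_inc == 0 and increment == 1:
            self.value = (self.value + 1) & self.mask  # count up, wrapping around
        self._prev_clk, self._prev_inc = clk, increment
        return self.value


pc = Counter()
for _ in range(3):
    pc.tick(increment=1)  # rising edge of the increment signal
    pc.tick(increment=0)
print(pc.value)  # 3
```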

ALU

In this case the ALU is performing only one operation: adding numbers. So it is probably better to call it an adder ¯\_(ツ)_/¯, but oh well... ALU sounds fancier.

As mentioned before, we would like to have conditional jumps, therefore the ALU needs to output some kind of signal indicating whether or not the jump should be taken. For this CPU that signal indicates whether the sum is zero. Other than that, the ALU is a simple adder.

Let's start with adding two bits (a and b) together:

bit a  bit b  output
L      L      L
H      L      H
L      H      H
H      H      L

Note that summing 1 and 1 together produces zero, as it "wraps around".

The truth table above is just an XOR gate. This would be enough to add two bits together, but we would like to be able to chain these adders in order to add larger numbers (e.g. 8 bit). To achieve this we will be adding three bits together instead of two, and we will also output an additional carry signal.

bit a  bit b  carry in  carry out  output
L      L      L         L          L
H      L      L         L          H
L      H      L         L          H
H      H      L         H          L
L      L      H         L          H
H      L      H         H          L
L      H      H         H          L
H      H      H         H          H

We can achieve this with this schematic:

adder

And chaining them together:

adder2
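The full adder table and the chaining can be checked with a short Python sketch. The gate expressions below are the textbook full adder (XOR for the sum, AND/OR for the carry), which may differ in detail from the schematic above; `full_adder` and `alu_add` are names chosen for this example.

```python
def full_adder(a, b, carry_in):
    """One column of the adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def alu_add(a, b, width=8):
    """Chain full adders bit by bit; returns (sum, zero flag)."""
    result, carry = 0, 0
    for bit in range(width):
        sum_bit, carry = full_adder((a >> bit) & 1, (b >> bit) & 1, carry)
        result |= sum_bit << bit
    zero = 1 if result == 0 else 0  # the signal used for conditional jumps
    return result, zero


print(alu_add(1, 0xFF))  # (0, 1)
```

The last carry out is simply dropped, which is why `1 + 0xFF` wraps around to zero and raises the zero signal.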

Memory

For memory I will not be going into details. I will just describe how to use it and leave it as a black box.

Naturally we need some way to tell it when to write and when to read. This is determined by a write enable signal: when the write enable signal is low the memory is reading, and when it is high the memory is writing.

Reading and writing always happen on the rising clock edge.

And, obviously, the address bus selects the address to read from or write to.

When writing, data is "taken" from the data input bus, and when reading, data is "put" on the data output bus.

memory
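Treated as a black box, the memory interface described above behaves roughly like this Python sketch. The `Memory` class, its 256 byte size and the `tick` method are assumptions made for illustration.

```python
class Memory:
    """Black box memory: reads or writes on the rising clock edge."""

    def __init__(self, size=256):
        self.cells = [0] * size
        self.data_out = 0   # data output bus
        self._prev_clk = 0

    def tick(self, clk, address, data_in=0, write_enable=0):
        if self._prev_clk == 0 and clk == 1:
            if write_enable:
                self.cells[address] = data_in & 0xFF  # writing
            else:
                self.data_out = self.cells[address]   # reading
        self._prev_clk = clk
        return self.data_out


ram = Memory()
ram.tick(1, 0x80, data_in=0x2A, write_enable=1)  # write on a rising edge
ram.tick(0, 0x80)                                # clock goes low again
print(hex(ram.tick(1, 0x80)))                    # 0x2a
```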


By now, only the controller is left. But I believe it is easier to understand what it does, and how, once there is some idea of how the rest of the components are connected together.

Therefore I will leave it for a later stage.

Architecture

Now that we have our components, let's look at how we could connect them together (as mentioned before, I will be leaving out the controller for now).

We know that we would like to have two registers: A and B. We also want an accumulator register to store the ALU result. Our instruction set (see Instruction set) requires an operand register. Obviously we also need an instruction register.

Keeping all that in mind, this is what I came up with: cpu without controller

The idea here is that we have two multiplexers: one for selecting the memory address and the other for selecting the "data bus" value. The inputs to both multiplexers are the outputs of all the registers, counters and memory. So basically we can select any register, counter, etc. for either the address or the data bus via the multiplexer select signals. The output of the address multiplexer is directly connected to the address bus of the memory, so by manipulating the selection bits of that multiplexer we can select the source of the memory address (be it the PC register, the ALU accumulator register or simply register A).

The data bus is a bit more interesting. The outputs of all of the registers are inputs to the data bus multiplexer, just like for the address bus. But the output of the data multiplexer is connected to the inputs of all the various registers, counters, memory, etc. This forms a kind of loop.

Let's simulate what the controller would do in order to execute a simple instruction, for example MOV A, B (i.e. copy the value of register B to register A).

So for that we would like to:

    1. On the first clock, we select data multiplexer to output the value of the register B onto the bus
    2. On the second clock, we set register A write enable signal to high

mov
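The two steps of MOV A, B can be sketched in Python like this. The dictionary of registers and the helper functions are simplifications invented for this example; they stand in for the data multiplexer and the write enable lines.

```python
# Register outputs feed the data multiplexer; its output is the shared data bus.
regs = {"A": 0x00, "B": 0x2A}


def data_bus(select):
    """Data multiplexer: puts the selected register output onto the bus."""
    return regs[select]


def clock_edge(bus_value, write_enable):
    """On a rising edge every register whose write enable is high latches the bus."""
    for name, enabled in write_enable.items():
        if enabled:
            regs[name] = bus_value


# Step 1 (first clock): select B on the data multiplexer, nothing is written yet.
bus = data_bus("B")
clock_edge(bus, {"A": 0, "B": 0})
# Step 2 (second clock): keep B selected and raise the write enable of A.
clock_edge(bus, {"A": 1, "B": 0})
print(regs)  # {'A': 42, 'B': 42}
```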

By this point it should be much clearer that the job of the controller is basically to manipulate the various control signals in the CPU in order to achieve the desired result. But again, before drawing the controller, let's define our instruction set.

Instruction set

Let's define our instruction set in order to be able to create a controller for our CPU and write some programs.

I would like to address memory directly from instructions and to move constants directly into registers but, unfortunately, I cannot fit a full byte of constant value and the opcode into a single byte. So I decided to make all instructions two bytes long: the first byte encodes the instruction itself and the second one holds the operand value. Note that not all instructions use the operand value, which wastes a little bit of memory, but always having this byte is simpler than conditionally fetching the operand from memory depending on the instruction.

0        8        16
+--------+--------+
| Opcode |Operand |
+--------+--------+

The operand is just an arbitrary 8 bit value available during instruction execution.

The opcode, on the other hand, has some structure to it. I decided to divide the 8 bits of the opcode like this:

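  • 3 LSB bits for selection of the first suboperand (S1)
  • 3 middle bits for selection of the second suboperand (S2)
  • 2 MSB bits for selection of the instruction type (there will be four instruction types) (T)

The diagram is drawn with the least significant bits on the left, while the binary examples further down are written MSB first, i.e. as T, S2, S1:

+---+---+--+
|S1 |S2 |T |
+---+---+--+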
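In code, the opcode layout is plain bit packing. The sketch below uses Python, and `encode_opcode` / `decode_opcode` are hypothetical helper names, not something from the CPU sources.

```python
def encode_opcode(t, s2, s1):
    """Pack T (2 MSB bits), S2 (3 middle bits) and S1 (3 LSB bits) into a byte."""
    assert 0 <= t < 4 and 0 <= s2 < 8 and 0 <= s1 < 8
    return (t << 6) | (s2 << 3) | s1


def decode_opcode(opcode):
    """Split an opcode byte back into (T, S2, S1)."""
    return (opcode >> 6) & 0b11, (opcode >> 3) & 0b111, opcode & 0b111


print(format(encode_opcode(0b00, 0b001, 0b000), "08b"))  # 00001000
```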

The mentioned four instruction types are:

  • Register to register - moves data between registers (or performs the addition when the destination is the accumulator register) (T value of 00)
  • Register to memory - moves data from some register to memory (T value of 01)
  • Memory to register - moves data from memory to some register (T value of 10)
  • Conditional - the only conditional instruction supported by this CPU is JZ (jump if the accumulator register value is zero) (T value of 11)

The semantics of the S1 and S2 suboperands differ according to instruction type, therefore I will list them separately:

Register to register

For register to register instructions, type value T is equal to 00.

The second suboperand S2 selects the source register like this:

  • 000 - register a
  • 001 - register b
  • 010 - ALU accumulator
  • 011 - operand byte from instruction
  • 100 - program counter
  • 101 - memory
  • 110 - unused
  • 111 - unused

The first suboperand S1 selects the destination register like this:

  • 000 - register a
  • 001 - register b
  • 010 - acc register (aka performs addition)
  • 100 - program counter register

Examples:

                instr.   operand.
MOV A, B        00001000 00000000
ADD             00000010 00000000 # Note: when writing into the accumulator register, the source register does not matter
MOV B, 0x15     00011001 00010101
JMP 0x15        00011100 00010101

Register to memory

For register to memory instructions, type value T is equal to 01.

Suboperand S1 selects the data source for the memory write. Suboperand S2 selects the address source for the memory write.

The suboperand meanings and values match the ones from the register to register source selection:

  • 000 - register a
  • 001 - register b
  • 010 - ALU accumulator
  • 011 - operand byte from instruction
  • 100 - program counter
  • 101 - memory
  • 110 - unused
  • 111 - unused

Examples:

                instr.   operand.
MOV [0x15], B   01011001 00010101
MOV [A], B      01000001 00000000

Memory to register

For memory to register instructions, type value T is equal to 10.

Suboperand S1 selects the destination register for the data write.

  • 000 - register a
  • 001 - register b
  • 010 - acc register (aka performs addition)
  • 100 - program counter register

Suboperand S2 selects the address source for the memory read. The suboperand meanings and values match the ones from the register to register source selection:

  • 000 - register a
  • 001 - register b
  • 010 - ALU accumulator
  • 011 - operand byte from instruction
  • 100 - program counter
  • 101 - memory
  • 110 - unused
  • 111 - unused

Examples:

                instr.   operand.
MOV B, [0x15]   10011001 00010101
MOV B, [A]      10000001 00000000
JMP [0x15]      10011100 00010101

Conditional

For conditional instructions, type value T is equal to 11.

All other opcode bits are ignored and can therefore be set to anything.

Examples:

                instr.   operand.
JZ              11000000 00000000
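To double check encodings against the tables above, here is a tiny Python disassembler sketch. It follows the T, S2 and S1 meanings exactly as listed in this section; the function name and the output format are made up, and a write to the program counter is printed as `MOV PC, ...` rather than `JMP`.

```python
SOURCES = {0: "A", 1: "B", 2: "ALU", 4: "PC", 5: "memory"}
DESTINATIONS = {0: "A", 1: "B", 2: "ACC", 4: "PC"}


def disassemble(opcode, operand):
    """Turn an (opcode, operand) pair into a mnemonic string."""
    t = (opcode >> 6) & 0b11
    s2 = (opcode >> 3) & 0b111
    s1 = opcode & 0b111

    def source(code):
        # Code 011 means "the operand byte", so print its value directly
        if code == 0b011:
            return "0x%02X" % operand
        return SOURCES.get(code, "?")

    if t == 0b11:
        return "JZ"
    if t == 0b01:  # register to memory: S2 is the address, S1 is the data
        return "MOV [%s], %s" % (source(s2), source(s1))
    dest = DESTINATIONS.get(s1, "?")
    if t == 0b00:  # register to register; writing ACC performs the addition
        return "ADD" if s1 == 0b010 else "MOV %s, %s" % (dest, source(s2))
    return "MOV %s, [%s]" % (dest, source(s2))  # memory to register


print(disassemble(0b00011001, 0x15))  # MOV B, 0x15
```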

Controller

Now finally we have all the pieces required to start talking about the controller: the last part of our CPU.

Sequence generator

Ok, I lied: we still need a sequence generator, but it's a simple one, and I consider it to be part of the controller.

As we have seen before, executing an instruction is basically a two step process. First we need to put "the data" onto the bus, and second we need to write the result somewhere.

So each instruction will be executed in two steps:

  • load - during this stage, data is loaded onto data bus
  • store - during this stage, data is read from the data bus

But wait! We need to know which instruction we are currently executing. And looking back at our ISA, we (sometimes) need to know our operand byte. Therefore in order to execute one instruction we will have to:

  • Fetch instruction byte from memory
  • Fetch operand byte from memory
  • Execute the instruction

Therefore combining the above two, we get these stages:

  1. load instruction byte onto data bus
  2. store instruction byte into instruction register from data bus
  3. load operand byte onto data bus
  4. store operand byte into operand register from data bus
  5. load whatever data is required by the instruction onto data bus
  6. store data wherever required by the instruction

For this we will build a small shift register which will have a signal for each of the stages, and will "shift" the stage on every rising edge of the clock signal:

sequence generator

This way we will have a signal indicating which stage we are at.
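A behavioural Python sketch of such a sequence generator is shown below. It models the shift register as a one-hot list of six bits; the wrap-around from the last stage back to the first is an assumption (the text only says the stage "shifts"), and the class and stage names are invented for this example.

```python
STAGES = [
    "load instruction", "store instruction",
    "load operand", "store operand",
    "load data", "store data",
]


class SequenceGenerator:
    """One-hot shift register with one signal per stage."""

    def __init__(self):
        self.bits = [1, 0, 0, 0, 0, 0]  # state after reset: first stage active

    def clock(self):
        """Rising clock edge: move the single high bit to the next stage."""
        self.bits = [self.bits[-1]] + self.bits[:-1]
        return self.bits

    def stage(self):
        return STAGES[self.bits.index(1)]


seq = SequenceGenerator()
seq.clock()
print(seq.stage())  # store instruction
```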

Mapping controller inputs and outputs

The controller takes the instruction byte and the ALU zero output as its input signals (I am excluding obvious ones like reset and clock). It generates the sequence signals internally :). Therefore we only need to map the input signals (including the sequence signals) to the output signals:

signal map

Note that I have separated execution by instruction type. We already have "signals" for the instruction type, because the two MSB bits of the instruction itself designate it.

Therefore using some combinatorial logic, we get this masterpiece:

controller

Here we just "replicated" the above table with the logic elements.

This is what a single NOP (MOV A, A) instruction looks like "inside" the controller itself:

nop instruction

So let's connect our full CPU:

cpu

Here we just connected all the control signals to the corresponding outputs of the controller and created the ALU zero signal with a very wide AND gate with all of its inputs inverted.

So let's execute some instructions :). This is what MOV A, 0x15 would look like:

cpu instruction
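For readers who prefer code over schematics, here is a rough behavioural model of the whole CPU in Python. It is a sketch under several assumptions: the six stages are collapsed into one `step()` call, the T, S1 and S2 values follow the Instruction set section as written, JZ is assumed to jump to the operand address when the accumulator is zero, and source codes 5 to 7 are not modelled. The `TinyCPU` class is a name made up here and is not the VHDL implementation.

```python
class TinyCPU:
    """Behavioural sketch of the CPU; one step() covers all six stages."""

    def __init__(self, program):
        self.mem = list(program) + [0] * (256 - len(program))
        self.a = self.b = self.acc = self.pc = 0
        self.instr = self.operand = 0

    def source(self, code):
        """Multiplexer input selection, numbered as in the suboperand tables."""
        inputs = [self.a, self.b, self.acc, self.operand, self.pc]
        # Codes 5 to 7 (memory, unused) are not modelled in this sketch
        return inputs[code] if code < len(inputs) else 0

    def write(self, code, value):
        """Raise the write enable of the destination selected by S1."""
        if code == 0:
            self.a = value
        elif code == 1:
            self.b = value
        elif code == 2:
            self.acc = (self.a + self.b) & 0xFF  # writing ACC performs the addition
        elif code == 4:
            self.pc = value

    def fetch(self):
        """Read the byte at PC and increment PC afterwards."""
        byte = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFF
        return byte

    def step(self):
        self.instr = self.fetch()      # stages 1 and 2
        self.operand = self.fetch()    # stages 3 and 4
        t = self.instr >> 6
        s2 = (self.instr >> 3) & 0b111
        s1 = self.instr & 0b111
        if t == 0b00:                  # register to register
            self.write(s1, self.source(s2))
        elif t == 0b01:                # register to memory
            self.mem[self.source(s2)] = self.source(s1)
        elif t == 0b10:                # memory to register
            self.write(s1, self.mem[self.source(s2)])
        elif self.acc == 0:            # JZ (assumed: jump to the operand address)
            self.pc = self.operand


program = [
    0b00011000, 0x15,  # MOV A, 0x15
    0b00011001, 0xFF,  # MOV B, 0xFF (two's complement -1)
    0b00000010, 0x00,  # ADD
    0b00010000, 0x00,  # MOV A, ALU
]
cpu = TinyCPU(program)
cpu.step()
print(hex(cpu.a))  # 0x15
```

Stepping three more times runs the remaining instructions and leaves 0x14 in register A, i.e. the value decremented by one.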

Program

One of my goals was to write a LED blinker on a real FPGA, so I wrote this program:

# Configure GPIO into output mode
0b00011000, 0xFF,   # 0: MOVE A, 0xFF
0b10011000, 0xFC,   # 2: MOVE [0xFC], A

# Initialize our index counter
0b00011000, 0x08,   # 4: MOVE A, 0x08
0b00011001, 39,     # 6: MOVE B, 39

# Output value from memory into GPIO
0b00000010, 0x00,   # 8: ADD
0b01010001, 0x00,   # 10: MOVE B, [ALU]
0b10011001, 0xFD,   # 12: MOVE [0xFD], B

# Decrement index counter
0b00011001, 0xFF,   # 14: MOVE B, 0xFF
0b00000010, 0x00,   # 16: ADD
0b00010000, 0x00,   # 18: MOVE A, ALU
0b11000000, 0x00,   # 20: JZ 0x00

# Wait a little
0b10011000, 0x80,   # 22: MOVE [0x80], A
0b00011000, 0x14,   # 24: MOVE A, 0x14
0b00011001, 0xFF,   # 26: MOVE B, 0xFF
0b00000010, 0x00,   # 28: ADD
0b00010000, 0x00,   # 30: MOVE A, ALU
0b11000000, 36,     # 32: JZ 36
0b00011100, 28,     # 34: J 28
0b01011000, 0x80,   # 36: MOVE A, [0x80]

# continue cycle
0b00011100, 6,      # 38: J 6

# GPIO data:
0x80,
0x40,
0x20,
0x10,
0x08,
0x04,
0x02,
0x01

For this program to work, I mapped the GPIO pins driving the LEDs to the highest addresses of the address space (the program writes the pin direction to 0xFC and the output value to 0xFD). It is somewhat like memory mapped GPIO.

And finally... The CPU ran on the FPGA itself: fpga


shitty_cpu's Issues

Z states don't really make sense in CPU logic

output <= "ZZZZZZZZ";

In general, the Z state is known as high impedance or floating (https://en.wikipedia.org/wiki/High_impedance) and is used for IO ports.

Let's say we have a device which has an I/O voltage of +5 V and we want to control one output pin. If we set it to 1, the output is connected to Vcc (+5 V in this case). If we set it to 0, it gets connected to GND (0 V in this case).

Now let's imagine a theoretical case where a single wire is connected to multiple sensors that sense something important. Once that event is triggered, a sensor sends an interrupt to an input pin of a device in the form of a short +5 V pulse. Logic would dictate that the idle output of the sensors (connected to the wire) should be 0 (hence connected to GND). But that would be wrong, because once a sensor decides to send an interrupt over the wire, its pulse would immediately get shorted to GND, sinking all current directly to ground and potentially damaging the output (the sensor sending the interrupt is effectively connected to +5 V and all remaining sensors on the wire are connected to GND, so a short).

Enter the Z state: a special state of a pin where it is neither high nor low and, for all intents and purposes, disconnected (if you are curious about how this is accomplished and why it is never truly disconnected, hit me up). So the correct implementation of the previous example is that all sensors default not to 0 but to Z, and once the event happens, the pin is set to +5 V.

Of course, the wire connecting all sensors still has to be biased to ground somehow (usually a resistor to GND or Vcc), but that is a topic for another day :)
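The shared wire example can be expressed as a tiny resolution function. This Python sketch is only an illustration of the idea (the `resolve` name and the 'X' marker for a conflict are assumptions, loosely modelled on how resolved logic types behave in simulators).

```python
def resolve(drivers):
    """Resolve one wire driven by several pins, each '0', '1' or 'Z'."""
    active = {level for level in drivers if level != "Z"}
    if not active:
        return "Z"           # nobody drives: floating, hence the pull resistor
    if len(active) == 1:
        return active.pop()  # exactly one level is being driven
    return "X"               # '0' and '1' at the same time: a short


print(resolve(["Z", "Z", "1"]))  # 1
print(resolve(["0", "0", "1"]))  # X
```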

A secret trick to minimize code bloat

alu1 : alu port map(

Here's a secret trick that for some reason is not commonly shown in FPGA examples, but hidden in books and courses:

Currently, to "include" another block in your code, you need to first declare it:

COMPONENT alu IS
port(
	clk    : IN std_logic;
	in1    : IN std_logic_vector(7 downto 0);
	in2    : IN std_logic_vector(7 downto 0);
	sum    : OUT std_logic_vector(7 downto 0);
	zero   : OUT std_logic
);
END COMPONENT;

And then "connect" it:

alu1 : alu port map(
	clk => clk,
	in1 => reg_a_out,
	in2 => reg_b_out,
	sum => alu_out,
	zero => alu_zero
);

Declaring + connecting is good if you are connecting multiple blocks of the same type (like you do with reg1 and 2).

But just for one block, it's a bit pointless and bloaty. You can do this instead:

In line 123, you would write:

alu1 : entity work.alu port map (
		clk => clk,
		in1 => reg_a_out,
		in2 => reg_b_out,
		sum => alu_out,
		zero => alu_zero
);

Or even:

alu1 : entity work.alu port map (clk, reg_a_out, reg_b_out, alu_out, alu_zero);

Using this method, there is no reason to declare component before begin.

Default initialization does not mean what you think it means

SIGNAL clk: std_logic := '0';

This statement would be correct if you were writing software. But in HDL, this may cause some obscure and weird issues. In simulation this is fine: the signal gets its default value before the simulation starts. But on real devices, this kind of default initialization may not be supported (you have to check the FPGA's datasheet on that). Even if it is supported, it may be annoying, because to reset your logic back to its initial state you would have to physically power the board off and on.

To counteract this, it is SUPER common to have reset logic in processes:

  process(clk, n_reset) begin
    if n_reset = '0' then
      -- reset your shit here (asynchronous, active low reset)
    elsif rising_edge(clk) then
      -- do your shit here
    end if;
  end process;

In FPGAs it is also common to have reset hardware, that tugs the n_reset signal on power on and can be wired to some external signals to reset on demand.

How memory is created on physical devices

library ieee;

This is just going to be some observations and advice.

The way you defined your RAM is good for simulation purposes, but this approach may not be the most efficient when performing synthesis, as it will build your memory out of LUTs, which is expensive. It is better to use LUTs for logic. To solve this problem, FPGAs have dedicated memory blocks just for such cases (if you are curious how Intel/Altera does it, search for M20K, M9K, M4K on Google; it will bring you to Intel's docs for more info).

To utilize this memory, there are two common ways:

  • Use a GUI tool to generate an ENTITY file, which handles the memory "allocation" under the hood. Not my favorite method, because it's a bit clunky and most FPGA GUI tools are shit.
  • Inference - which means a specially written file that, once seen by the synthesis toolchain, creates a memory block of your desired size. This is quick, simple and compatible with simulation tools, even giving you correct timing responses (on a digital level). To see how Intel requires you to do it, take a look here: https://www.intel.com/content/www/us/en/programmable/quartushelp/13.0/mergedProjects/hdl/vhdl/vhdl_pro_ram_inferred.htm. This also has the added benefit of giving you the ability to write much more portable code. I'm not going to go into the details of how it is implemented, but it is something along the lines of C++'s if constexpr

Keep track of your labels

END PROCESS;

HDL files have a tendency to balloon out of control (I personally have written 1-2k line files). Process (and other keyword) labels are used to help you keep track of which block your logic is defined in. So if you define a process with a label, step : PROCESS (clk), it is customary to end it the same way: END PROCESS step;. Of course, in such a small block you may just omit them (well, sometimes they do help in FPGA tools as well, but this is a topic for another time).

Mux optimization

step : PROCESS(in1, in2, in3, en, sel)

This can be rewritten as:

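architecture mux_arch of mux is
  signal intermediate : std_logic_vector(7 downto 0) := (others => '0');
begin
  -- case is a sequential statement, so it has to live inside a process.
  -- Assumption: sel is a 2 bit std_logic_vector; adjust the choices to its real type.
  step : process(in1, in2, in3, sel) begin
    case sel is
      when "00" => intermediate <= in1;
      when "01" => intermediate <= in2;
      when "10" => intermediate <= in3;
      when others => intermediate <= (others => '0');
    end case;
  end process step;

  output <= intermediate when en = '1' else (others => '0');

end architecture mux_arch;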

See this: https://insights.sigasi.com/tech/signal-assignments-vhdl-withselect-whenelse-and-case/

Also some points of interest:

  • VHDL is case insensitive, if you are too lazy to press Shift
  • This approach has the potential to synthesize better logic
  • You could also use generic list (https://www.ics.uci.edu/~jmoorkan/vhdlref/generics.html) and array of std_logic_vector to make your mux more generic (with port count set at synthesis, not hardcoded)
  • It is customary to name your architecture block differently than your entity block
