
Fast and flexible Z80/i8080 emulator with C++ and Python APIs

License: MIT License

C++ 54.09% CMake 0.81% Python 44.05% Assembly 1.05%
z80 z80-emulator i8080 i8080a 8080 emulator mit-license python cpu-emulators

z80's Introduction

z80

Fast and flexible Z80/i8080 emulator.


Quick facts

  • Implements accurate machine cycle-level emulation.

  • Supports undocumented instructions, flags and registers.

  • Passes the well-known cputest, 8080pre, 8080exer, 8080exm, prelim and zexall tests.

  • Follows a modular event-driven design for flexible interfacing.

  • Employs compile-time polymorphism for zero performance overhead.

  • Cache-friendly implementation without large code switches and data tables.

  • Offers default modules for breakpoint support and generic memory.

  • Supports multiple independently customized emulator instances.

  • Written in strict C++11.

  • Does not rely on implementation-defined or unspecified behavior.

  • Single-header implementation.

  • Provides a generic Python 3 API and instruments to create custom bindings.

  • MIT license.


Hello world

#include "z80.h"

class my_emulator : public z80::z80_cpu<my_emulator> {
public:
    typedef z80::z80_cpu<my_emulator> base;

    my_emulator() {}

    void on_set_pc(z80::fast_u16 pc) {
        std::printf("pc = 0x%04x\n", static_cast<unsigned>(pc));
        base::on_set_pc(pc);
    }
};

int main() {
    my_emulator e;
    e.on_step();
    e.on_step();
    e.on_step();
}

hello.cpp

Building:

$ git clone git@github.com:kosarev/z80.git
$ cmake z80
$ make
$ make test
$ make hello  # Or 'make examples' to build all examples at once.

Running:

$ ./examples/hello
pc = 0x0000
pc = 0x0001
pc = 0x0002

In this example we derive our custom emulator class, my_emulator, from a mix-in that implements the logic and default interfaces necessary to emulate the Zilog Z80 processor. As you may guess, replacing z80_cpu with i8080_cpu would give us a similar Intel 8080 emulator.
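For instance, a minimal sketch of the i8080 variant, assuming everything else works exactly as in hello.cpp:

#include "z80.h"

class my_i8080_emulator : public z80::i8080_cpu<my_i8080_emulator> {
public:
    typedef z80::i8080_cpu<my_i8080_emulator> base;

    void on_set_pc(z80::fast_u16 pc) {
        std::printf("pc = 0x%04x\n", static_cast<unsigned>(pc));
        base::on_set_pc(pc);
    }
};

int main() {
    my_i8080_emulator e;
    e.on_step();
}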

The on_set_pc() method overrides its default counterpart to print the current value of the PC register before changing it. For this compile-time polymorphism to be able to do its job, we pass the type of the custom emulator to the processor mix-in as a parameter.

The main() function creates an instance of the emulator and asks it to execute a few instructions, thus triggering the custom version of on_set_pc(). The following section reveals what those instructions are and where the emulator gets them from.

Adding memory

Every time the CPU emulator needs to access memory, it calls on_read() and on_write() methods. Their default implementations do not really access any memory; on_read() simply returns 0x00, meaning the emulator in the example above actually executes a series of nops, and on_write() does literally nothing.

Since both the reading and writing functions are considered by the z80::z80_cpu class to be handlers, which we can tell because their names start with the on prefix, we can use the same technique as with on_set_pc() above to override the default handlers so they actually read and write something.

class my_emulator : public z80::z80_cpu<my_emulator> {
public:
    ...

    fast_u8 on_read(fast_u16 addr) {
        assert(addr < z80::address_space_size);
        fast_u8 n = memory[addr];
        std::printf("read 0x%02x at 0x%04x\n", static_cast<unsigned>(n),
                    static_cast<unsigned>(addr));
        return n;
    }

    void on_write(fast_u16 addr, fast_u8 n) {
        assert(addr < z80::address_space_size);
        std::printf("write 0x%02x at 0x%04x\n", static_cast<unsigned>(n),
                    static_cast<unsigned>(addr));
        memory[addr] = static_cast<least_u8>(n);
    }

private:
    least_u8 memory[z80::address_space_size] = {
        0x21, 0x34, 0x12,  // ld hl, 0x1234
        0x3e, 0x07,        // ld a, 7
        0x77,              // ld (hl), a
    };
};

adding_memory.cpp

Output:

read 0x21 at 0x0000
pc = 0x0001
read 0x34 at 0x0001
read 0x12 at 0x0002
pc = 0x0003
read 0x3e at 0x0003
pc = 0x0004
read 0x07 at 0x0004
pc = 0x0005
read 0x77 at 0x0005
pc = 0x0006
write 0x07 at 0x1234

Input and output

Aside from memory, the other major way the processors communicate with the outside world is via input and output ports. If you have read the previous sections, it's now easy to guess that there is a pair of handlers that do that. These are on_input() and on_output().

Note that the handlers have different types of parameters that store the port address, because i8080 only supports 256 ports while Z80 extends that number to 64K.

    // i8080_cpu
    fast_u8 on_input(fast_u8 port)
    void on_output(fast_u8 port, fast_u8 n)

    // z80_cpu
    fast_u8 on_input(fast_u16 port)
    void on_output(fast_u16 port, fast_u8 n)

The example:

class my_emulator : public z80::z80_cpu<my_emulator> {
public:
    ...

    fast_u8 on_input(fast_u16 port) {
        fast_u8 n = 0xfe;
        std::printf("input 0x%02x from 0x%04x\n", static_cast<unsigned>(n),
                    static_cast<unsigned>(port));
        return n;
    }

    void on_output(fast_u16 port, fast_u8 n) {
        std::printf("output 0x%02x to 0x%04x\n", static_cast<unsigned>(n),
                    static_cast<unsigned>(port));
    }

private:
    least_u8 memory[z80::address_space_size] = {
        0xdb,        // in a, (0xfe)
        0xee, 0x07,  // xor 7
        0xd3,        // out (0xfe), a
    };
};

input_and_output.cpp

Accessing processor's state

Sometimes it's necessary to examine and/or alter the current state of the CPU emulator and do that in a way that is transparent to the custom code in overridden handlers. For this purpose the default state interface implemented in the i8080_state<> and z80_state<> classes provides a number of getters and setters for registers, register pairs, interrupt flip-flops and other fields constituting the internal state of the emulator. By convention, calling such functions does not fire up any handlers. The example below demonstrates a typical usage.

Note that there are no such accessors for memory, as memory is external to the processor emulators; they themselves have to use handlers, namely on_read() and on_write(), to deal with it.

class my_emulator : public z80::z80_cpu<my_emulator> {
public:
    ...

    void on_step() {
        std::printf("hl = %04x\n", static_cast<unsigned>(get_hl()));
        base::on_step();

        // Start over on every new instruction.
        set_pc(0x0000);
    }
};

accessing_state.cpp

Modules

By overriding handlers we can extend and otherwise alter the default behavior of CPU emulators. That's good, but what do we do if it's not enough? For example, what if the default representation of the processor's internal state doesn't fit the needs of your application? Say, you might be forced to follow a particular order of registers or you just want to control the way they are packed in a structure because there's some external binary API to be compatible with. Or, what if you don't need to emulate the whole processor's logic, and just want to check if a given sequence of bytes forms a specific instruction?

That's where modules come into play. To understand what they are and how to use them, let's take a look at the definitions of the emulator classes and see what's under the hood.

template<typename D>
class i8080_cpu : public i8080_executor<i8080_decoder<i8080_state<root<D>>>>
{};

template<typename D>
class z80_cpu : public z80_executor<z80_decoder<z80_state<root<D>>>>
{};

Each of these classes is no more than a stack of a few other mix-ins. The root<> template provides helpers that make it possible to call handlers of the most derived class in the hierarchy, D, which is why it takes that class as its type parameter. It also contains dummy implementations of the standard handlers, such as on_output(), so you don't have to define them when you don't need them.

i8080_state<> and z80_state<> have been mentioned in the previous section as classes that define transparent accessors to the processor state, e.g., set_hl(). They also define corresponding handlers, like on_set_hl(), that other modules use to inspect and modify the state.

i8080_decoder<> and z80_decoder<> modules analyze op-codes and fire up handlers for specific instructions, e.g., on_halt().

Finally, the job of i8080_executor<> and z80_executor<> is to implement handlers like on_halt() to actually execute corresponding instructions.

The convention is that modules shall communicate with each other only via handlers. Indeed, if they called the transparent accessors or referred to data fields directly, those accessors wouldn't be transparent anymore and handlers would never be called. This also means that modules are free to define transparent accessors in a way that seems best for their purpose, or even not define them at all.

Any of the standard modules can be used and customized independently of the others. Moreover, any or all of the modules can be replaced with custom implementations. New modules can be developed and used separately or together with the standard ones. In all cases the only requirement is to implement the handlers other modules rely on.
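For illustration, here is a sketch of composing such a custom stack. The my_tracing_state module below is made up for this example; it wraps the standard Z80 state module and logs PC updates, while the standard decoder and executor are reused. It assumes the standard modules are accessible from the z80 namespace, as the z80_cpu<> definition above suggests.

// A sketch only: my_tracing_state forwards everything to the standard
// state module and merely adds logging.
template<typename B>
class my_tracing_state : public z80::z80_state<B> {
public:
    typedef z80::z80_state<B> base;

    void on_set_pc(z80::fast_u16 pc) {
        std::printf("state: pc <- 0x%04x\n", static_cast<unsigned>(pc));
        base::on_set_pc(pc);
    }
};

// The stack mirrors the default z80_cpu<> definition, with the state
// module swapped for the custom one.
template<typename D>
class my_cpu
    : public z80::z80_executor<z80::z80_decoder<my_tracing_state<z80::root<D>>>>
{};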

The root module

template<typename D>
class root {
public:
    typedef D derived;

    ...

    fast_u8 on_read(fast_u16 addr) {
        unused(addr);
        return 0x00;
    }

    void on_write(fast_u16 addr, fast_u8 n) {
        unused(addr, n);
    }

    ...

protected:
    const derived &self() const { return static_cast<const derived&>(*this); }
    derived &self() { return static_cast<derived&>(*this); }
};

The main function of the root module is to define the self() method that other modules can use to call handlers. For example, a decoder could do self().on_ret() whenever it runs into a ret instruction.

Aside from that, the module contains dummy implementations of the standard handlers that do nothing or, if they have to return something, return some default values.
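To make that calling convention concrete, a module might dispatch through self() as in the following sketch (my_module and the on_fetched_opcode handler name are made up purely for illustration):

template<typename B>
class my_module : public B {
public:
    void on_fetched_opcode(z80::fast_u8 op) {
        // Dispatch through self() so that the most derived class's
        // override of on_ret(), if there is one, is the one called.
        if (op == 0xc9)  // ret
            this->self().on_ret();
    }
};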

State modules

template<typename B>
class i8080_state : public internals::cpu_state_base<B> {
public:
    ...

    bool get_iff() const { ... }
    void set_iff(bool f) { ... }

    ...
};

template<typename B>
class z80_state : public internals::cpu_state_base<z80_decoder_state<B>> {
public:
    ...

    void exx_regs() { ... }
    void on_exx_regs() { exx_regs(); }

    ...
};

The purpose of state modules is to provide handlers to access the internal state of the emulated CPU. They also usually store the fields of the state, thus defining its layout in memory.

Regardless of the way the fields are represented and stored, the default getting and setting handlers for register pairs use the access handlers for the corresponding 8-bit registers to obtain or set the 16-bit values. Furthermore, the low half of the register pair is always retrieved and set before the high half. This means that by default the handlers for 8-bit registers get called even when it was the value of a register pair they are part of that was originally queried. Custom implementations of processor states, however, are not required to follow this convention.

    fast_u16 on_get_bc() {
        // Always get the low byte first.
        fast_u8 l = self().on_get_c();
        fast_u8 h = self().on_get_b();
        return make16(h, l);
    }

    void on_set_bc(fast_u16 n) {
        // Always set the low byte first.
        self().on_set_c(get_low8(n));
        self().on_set_b(get_high8(n));
    }

Aside from the usual getters and setters for the registers and flip-flops, both the i8080 and Z80 states have to provide an on_ex_de_hl_regs() handler that exchanges the hl and de registers the same way the xchg and ex de, hl instructions do. The Z80 state additionally has to have an on_exx_regs() handler that swaps register pairs just as the exx instruction does. The default swapping handlers do their work by accessing registers directly, without relying on the getting and setting handlers, similarly to how silicon implementations of the processors toggle internal flip-flops demuxing access to register cells without actually transferring their values.
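For illustration, a custom state that keeps its register pairs in plain fields could implement such a handler as a direct swap without going through any access handlers (the hl and de fields below are hypothetical storage, not the library's own):

template<typename B>
struct my_swapping_state : public B {
    z80::fast_u16 hl = 0, de = 0;  // hypothetical storage

    void on_ex_de_hl_regs() {
        // Swap the stored values directly; no getters, setters or
        // handlers are fired.
        z80::fast_u16 t = hl;
        hl = de;
        de = t;
    }
};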

Because the CPUs have a lot of similarities, processor-specific variants of modules usually share some common code in helper base classes that in turn are defined within internals, which contains entities internal to the implementation of the library. The client code is therefore supposed to be written as if the module classes were derived directly from their type parameters, B.

Note that z80_state has an additional mix-in in its inheritance chain, z80_decoder_state<>, whereas i8080_state is derived directly from the generic base. This is because Z80 decoders are generally not stateless objects; they have to track which of the IX, IY or HL registers has to be used as the index register for the current instruction. The decoder state class stores and provides access to that information.

template<typename B>
class z80_decoder_state : public B {
public:
    ...

    iregp get_iregp_kind() const { ... }
    void set_iregp_kind(iregp r) { ... }

    iregp on_get_iregp_kind() const { return get_iregp_kind(); }
    void on_set_iregp_kind(iregp r) { set_iregp_kind(r); }

    ...
};

In its simplest form, a custom state module can be a structure defining the necessary state fields together with corresponding access handlers.

template<typename B>
struct my_state : public B {
    fast_u16 pc;

    ...

    fast_u16 on_get_pc() const { return pc; }
    void on_set_pc(fast_u16 n) { pc = n; }

    ...

    // These always have to be explicitly defined.
    void on_ex_de_hl_regs() {}
    void on_ex_af_alt_af_regs() {}
    void on_exx_regs() {}
};

custom_state.cpp

Feedback

Any notes on overall design, improving performance and testing approaches are highly appreciated. Please file an issue or use the email given at https://github.com/kosarev. Thanks!

z80's People

Contributors

arpruss, kosarev, simonowen, toniwestbrook


z80's Issues

Fix power-on state

Some CPU state fields are not getting set correctly on power on, hard reset and soft reset. See further details in Simon's comment at #29 (comment).

Inconsistent timing of on_read/write() and on_input/output()

We call on_read() and on_write() at the beginning of the cycle, but seem to aim to call on_input() and on_output() at the point within the cycle where they actually should happen. Now that we are about to introduce something like on_wait() (see #10), which doesn't make much sense if it is not called at the right tick of the cycle, maybe we should fix the timing of the on_read() and on_write() calls as well.

pip install z80 fails

Running setup.py install for z80 did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
running install
C:\Python311\Lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-311
creating build\lib.win-amd64-cpython-311\z80
copying z80_disasm.py -> build\lib.win-amd64-cpython-311\z80
copying z80_disasm_parser.py -> build\lib.win-amd64-cpython-311\z80
copying z80_error.py -> build\lib.win-amd64-cpython-311\z80
copying z80_instr.py -> build\lib.win-amd64-cpython-311\z80
copying z80_machine.py -> build\lib.win-amd64-cpython-311\z80
copying z80_main.py -> build\lib.win-amd64-cpython-311\z80
copying z80_source.py -> build\lib.win-amd64-cpython-311\z80
copying z80_token.py -> build\lib.win-amd64-cpython-311\z80
copying z80_init_.py -> build\lib.win-amd64-cpython-311\z80
running build_ext
building 'z80._z80' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

These C++ build tools take over 1.3 GB to install. Maybe use another compiler, or the VS Code C++ compiler?

odd behavior (via Python API)

Here's the script I ran:

import z80

m = z80.Z80Machine()

image = b'\x00\xdd\x21\xf5\x04\x00'
m.set_memory_block(0, image)

for i in range(5):
    print('-' * 35)
    print(f"m.pc = {m.pc:04X}")

    (instr_asm, instr_nbytes) = m._disasm(image[m.pc: m.pc+4])
    addr_of_next_instr = m.pc + instr_nbytes
    instr_byte_string = ' '.join(
        f"{byte:02X}"
        for byte in image[m.pc: addr_of_next_instr]
    )
    print(f"instr is next {instr_nbytes} bytes (so following instr starts at {addr_of_next_instr:04X})")
    print(f"{instr_byte_string:12} = {instr_asm}")
    print('')

    m.ticks_to_stop = 1
    m.run()

And here's the output I got:

-----------------------------------
m.pc = 0000
instr is next 1 bytes (so following instr starts at 0001)
00           = nop

-----------------------------------
m.pc = 0001
instr is next 4 bytes (so following instr starts at 0005)
DD 21 F5 04  = ld Pix, W0x04f5

-----------------------------------
m.pc = 0002
instr is next 3 bytes (so following instr starts at 0005)
21 F5 04     = ld Phl, W0x04f5

-----------------------------------
m.pc = 0005
instr is next 1 bytes (so following instr starts at 0006)
00           = nop

-----------------------------------
m.pc = 0006
instr is next 1 bytes (so following instr starts at 0007)
             = nop

So:
(1) At the line DD 21 F5 04 = ld Pix, W0x04f5, I would have expected the disasm string to be ld ix, 0x04f5. Instead, the format characters P and W are included along with their expansions. Looking at the code in on_format_char, I can't even figure out how this is possible.
(2) Given that the instruction at address 1 is 4 bytes long, I would have expected the next value of m.pc to be 5. Instead, it's only 2. (But the one after that is 5, so the machine isn't just stepping through the image one byte at a time.)
(3) Am I mis-using the Python API? I couldn't find any documentation.

(I fetched the z80 code today, so it's presumably up-to-date.)

Missing undocumented RETI behaviour

As detailed in the Undocumented Z80, the RETI instruction copies iff2 to iff1, which is the same behaviour as RETN. This isn't mentioned in the official documentation.

I've just tested it on real hardware and can confirm it does restore iff1 from iff2. My test code was as follows:

8000          JR  8000
8002          HALT
8003          DI
8004          LD  A,04
8006          OUT (FE),A
8008          HALT

0038          LD  B,06
003A          DJNZ 003A
003C          EI
003D          RET

0066          LD  HL,8002
0069          PUSH HL
006A          LD  A,02
006C          OUT (FE),A
006E          RETI

It starts with a tight loop with interrupts enabled. The maskable interrupt handler delays long enough for the interrupt to become inactive, then returns. Triggering an NMI sets the border red and uses RETI to restore iff1 and return to a HALT. If interrupts are disabled at this point the border remains red and the CPU is halted. If iff1 is restored to the previous enabled state the HALT waits for the next frame interrupt then continues to set the border green, before halting.

Tests show that both RETI and RETN give the same behaviour on real hardware, with the border turning green. In the emulator RETN works as expected but the border remains red with RETI because iff1 wasn't restored.

The on_reti handler could probably just call on_retn instead of on_return. Though it uses handlers that mention 'retn' (such as set_iff1_on_retn) so it might be better to have separate 'reti' versions?
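In the meantime the behaviour can be approximated from a derived emulator. A rough sketch of such a workaround, assuming the on_get_iff2() and on_set_iff1() handlers are available (this is a workaround idea, not the library's own fix):

class my_emulator : public z80::z80_cpu<my_emulator> {
public:
    typedef z80::z80_cpu<my_emulator> base;

    void on_reti() {
        base::on_reti();
        // Mirror RETN: copy IFF2 back into IFF1 after the return.
        self().on_set_iff1(self().on_get_iff2());
    }
};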

Incorrect halted CPU behaviour

The traditional way to implement HALT has been to keep PC on the same instruction and execute NOPs. PC is incremented as part of acknowledging the next interrupt to step over it. However, this has been shown to not match the real Z80 CPU behaviour, and it's possible to detect the difference in code.

The following article describes it, in the section "Halt and the special reset":
http://www.primrosebank.net/computers/z80/z80_special_reset.htm

When HALT is executed it puts the CPU into a halted state (already implemented in your core). It also advances PC to point to the next instruction. During the halted state the opcode fetch runs on this new PC value but a NOP is executed instead, and PC isn't advanced. When an interrupt occurs the halted state is cleared and execution continues from the current point as normal.

It's possible to detect this behaviour with a HALT in the last byte before a contention boundary, such as 0x7FFF on the ZX Spectrum. The incorrect behaviour reads from 0x7FFF for each NOP executed at the HALT, which has more contention than the correct fetches from 0x8000. The difference in timing can be detected by measuring how much R has changed when the next interrupt is acknowledged.

Support 'bulk' interfaces

We currently only support handlers like on_step() and on_read()/on_write() that mandate the emulators to execute a single instruction or provide a means to access a single memory cell. This prevents us from attempting some kinds of optimisations, e.g.:

  • memmove()-like implementations for LDIR/LDDR (means better performance, mentioned in #16 (comment)).
  • Fast HALT/HLT emulation.
  • #35 requires knowing in advance how much code we can execute without being interrupted.
  • Just-in-time compilation for hot basic blocks. Better performance again.
  • Support port handlers that deal with blocks of input/output data rather than with individual bytes. It would be particularly useful for the Python API to avoid calling handlers on every IN/OUT instruction (then https://github.com/kosarev/zx can be updated to use that).

Performance benefit using fast types with registers?

I'm keeping an execution trace in my emulator debugger, which stores the z80_state structure to preserve the CPU state after each instruction. I noticed this was quite a lot bigger than the old CPU core, with 76 bytes needed by default. It looks like the extra size is due to the 16-bit register pairs being stored as 32-bit values, with the fast_u16 type being uint_fast16_t, which is unsigned int in my environment.

I experimented with using uint_least16_t instead, for a real 16-bit value. That shrinks the state to 44 bytes, and I didn't notice any difference in performance for either x86 or x64 builds. Were the fast versions used because of a performance difference seen in some cases, or just because they should give the best performance for any environment?
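For reference, the widths behind the two type aliases are easy to check in isolation; the result is implementation-defined, so the numbers above are specific to the reporter's environment:

#include <cstdint>
#include <cstdio>

int main() {
    // Shows what the current implementation maps the fast and least
    // 16-bit types to; the sizes vary between platforms.
    std::printf("uint_fast16_t:  %zu bytes\n", sizeof(std::uint_fast16_t));
    std::printf("uint_least16_t: %zu bytes\n", sizeof(std::uint_least16_t));
}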

On a related note, while experimenting with changing fast_u32 to uint_least32_t I noticed that the pf_ari template has a small type mismatch in two uses. Changing them from pf_ari(r32 to pf_ari(r16 should fix it I think.

Breakpoint incorrectly triggers on CALL.

Possibly that's a bug on my side; I'll look into it further.

So, I'm trying to implement breakpoints, in a way similar to one in z80.h.

What I do:

void Machine::run(uint64_t max_ops) {
  for (uint64_t op = 0; op < max_ops; ++op) {
    events_ = 0;
    on_step();
    if (events_ & kEventInterrupt) {
      ++interrupts;
      on_handle_active_int();
    }
    if (event_mask_ & events_) break;
  }
}
void Machine::on_set_pc(z80::fast_u16 addr) {
  if (breakpoints_[addr]) events_ |= kEventBreakpoint;
  base::on_set_pc(addr);
}

Now I have the following code:

...
40147 | CALL 41055
40150 | CALL 40433
40153 | CALL 41602
...
...
41055 | PUSH HL
41056 | LD A,121
...

When I do:

my_machine.breakpoints_[40150] = true;
my_machine.run(1000000);
std::cout << my_machine.get_pc();

I expect it to stop at 40150, but instead 41055 is output.

Revisit the SCF and CCF logic

In an attempt to make sure we do the right thing about the WZ register for the undocumented BIT r, b, (i + d) instructions (#21 and #22), I ran into the PORTAR MSX I/O MAPPING paper, http://datassette.nyc3.cdn.digitaloceanspaces.com/tech/portarmsxiomapping.pdf . It says:

For 'SCF' and 'CCF', flags are calculated as "(A OR F) AND 28h", i.e. the flags remain set if they have been set before.

With our current implementation being changed to that, we still seem to pass ZEXALL:

diff --git a/z80.h b/z80.h
index 8c94574..dd14b32 100644
--- a/z80.h
+++ b/z80.h
@@ -3709,7 +3709,7 @@ public:
     void on_scf() {
         fast_u8 a = self().on_get_a();
         fast_u8 f = self().on_get_f();
-        f = (f & (sf_mask | zf_mask | pf_mask)) | (a & (yf_mask | xf_mask)) |
+        f = (f & (sf_mask | zf_mask | pf_mask)) | ((a | f) & (yf_mask | xf_mask)) |
                 cf_mask;
         self().on_set_f(f); }
     void on_set(unsigned b, reg r, fast_u8 d) {
Preliminary tests complete
Z80all instruction exerciser
<adc,sbc> hl,<bc,de,hl,sp>....  OK
add hl,<bc,de,hl,sp>..........  OK
add ix,<bc,de,ix,sp>..........  OK
add iy,<bc,de,iy,sp>..........  OK
aluop a,nn....................  OK
aluop a,<b,c,d,e,h,l,(hl),a>..  OK
aluop a,<ixh,ixl,iyh,iyl>.....  OK
aluop a,(<ix,iy>+1)...........  OK
bit n,(<ix,iy>+1).............  OK
bit n,<b,c,d,e,h,l,(hl),a>....  OK
cpd<r>........................  OK
cpi<r>........................  OK
<daa,cpl,scf,ccf>.............  OK

This raises some thoughts:

  • Our implementation of SCF and other instructions may be wrong.
  • ZEXALL is not ideal.

Marking this a bug to reflect the severity of the issue.

The emulator is not as fast as it's advertised. :-P

Sorry for the provocative issue title, and not really a bug but just a piece of feedback. :-)

I've checked ~10 Z80 emulation libraries, and most of them claim to be "fast", but it doesn't look like any performance comparison was made for any of them. Possibly anything faster than the original Z80 is considered "fast", but I believe that bar would be too low.

https://github.com/floooh/chips/blob/master/chips/z80.h is an example of something faster than this library (in my benchmarks it's 2.5x faster). But even that, on a modern CPU, is only ~600 times faster than the real Z80. If you count CPU cycles that is impressive ("works as if the Z80 clock was 2.0 GHz"), but given that Z80 instructions took many more cycles than instructions on modern CPUs, I think there may be room to explore ways to make it faster.

I did run a profiler on this library in my experiments (on clang -O3 and on g++ -O3), and as far as I remember and understood the results, the main slowdown seemed to be due to lots of nested function calls, including calling self() just to get this of the correct type, during every instruction decode. One might think that since all function calls are static, the compiler would be clever enough to inline them or optimize them out, but that didn't happen on either clang or g++ (both with -O3).

Unfortunately, I didn't keep the profiler stats, but I can try to recreate them if needed.

As a side note not related to this project, I personally am in search of a really fast emulator, which doesn't have to have any precise timings. Even going as far as using memcpy() when decoding LDIR (and checking the time till the next interrupt, whether BC or HL cross address 0, or whether they overlap the instruction itself) would be great.

Does a method exist to query if interrupts are disabled (python)

Hey

Thanks for the project!

As you know by now, I use the python bindings.

I've come to the point where I need to emulate keyboard interrupts to progress further. So far I do it in a crude way:
periodically I do a 'kbhit()' call. If a key was pressed, I store the key value in a variable, save the current PC on the
stack and set the PC to 0x38.

This barely works: I have been somewhat successful in getting responses from the OS, but it is very flaky. I suspect that
I might trigger 'interrupts' while already in an interrupt or when interrupts are disabled.

Do you have a suggestion for how to query for the ei/di status? Or any other good ideas for how to do this?

Best
Morten

Shuffling nodes affects scf/ccf behaviour

scf/ccf seems to be susceptible to the order nodes are getting updated in during simulation.

Can be reproduced on ec19a48, with seemingly any seed, though one time I observed all tests passing with the shuffling enabled on an early version of the patch that didn't support seeding yet, so presumably not all seeds will do.

Feels like this may have something to do with rlca & Co. not having the expected effect on scf/ccf in our simulation as mentioned in #42 (comment).

The task is to try to minimise the reproducer and determine the specific conditions that trigger the behaviour change.

xref: https://discord.com/channels/654774470652723220/689220116801650811/1031181913878183966

$ pypy3 z80sim.py --seed=568
17:21:43  8/51 cpl                                                                         
17:25:50  1/51 <alu> (hl)
17:25:52  9/51 daa
17:26:14  6/51 bit (hl)
17:26:29  3/51 <alu> {b, c, d, e, h, l, a}
17:27:28  2/51 <alu> n
17:27:42  7/51 call nn
17:29:23  11/51 ei/di
17:30:08  13/51 ex af, af'
17:30:37  5/51 add hl, <rp>
17:31:10  14/51 ex de, hl
17:31:45  15/51 exx
17:34:46  16/51 im/xim n
17:37:06  12/51 ex (sp), hl
17:37:17  4/51 adc/sbc hl, <rp>
17:38:11  19/51 inc/dec (hl)
17:40:30  17/51 in a, (n)/out (n), a
17:41:05  20/51 inc/dec <rp>
17:42:38  22/51 jp hl
17:43:31  21/51 inc/dec {b, c, d, e, h, l, a}
17:43:55  18/51 in/out r, (c)
17:46:51  23/51 jp nn
17:49:30  28/51 ld (hl), {b, c, d, e, h, l, a}
17:50:13  26/51 ld (<rp>), a/ld a, (<rp>)
17:50:16  27/51 ld (hl), n
17:53:53  25/51 jr d
17:55:18  33/51 ld sp, hl
17:57:17  30/51 ld <rp>, nn
18:00:41  31/51 ld a, (nn)/ld (nn), a
18:01:23  34/51 ld {b, c, d, e, h, l, a}, (hl)
18:02:57  29/51 ld <rp>, (nn)/ld (nn), <rp>
18:03:44  35/51 ld {b, c, d, e, h, l, a}, n
18:05:11  36/51 ld {b, c, d, e, h, l, a}, {b, c, d, e, h, l, a}
18:05:21  32/51 ld hl, (nn)/ld (nn), hl
18:05:33  39/51 nop
18:06:11  38/51 neg/xneg
18:10:10  10/51 djnz d
18:11:34  42/51 ret
18:11:55  37/51 ld {i, r}, a/ld a, {i, r}
18:14:01  44/51 reti/retn/xretn
18:15:00  40/51 pop <rp2>
18:15:09  45/51 rlca/rrca/rla/rra
18:15:34  41/51 push <rp2>
17:17:06  
FAILED: scf/ccf reg_f3 (xf)
  before: f_b3
  after: (and (or is_ex_af_af2 a_b3) (or (not is_ex_af_af2) f_b3))
  expected: (and (or f_b3 a_b3) (or (not is_ex_af_af2) f_b3))
  diff: (and f_b3 (not is_ex_af_af2) (not a_b3))
18:17:58  50/51 Traceback (most recent call last):
  File "./z80sim.py", line 2962, in test_instr_seq
    process_instr(seq, state, test=True)
  File "./z80sim.py", line 2836, in process_instr
    token = test_node(instrs, n, at_start, at_end, before, after)
  File "./z80sim.py", line 2315, in test_node
    return check(Bool.ifelse(ignores_f, a, a | f))
  File "./z80sim.py", line 2144, in check
    raise TestFailure()
TestFailure

18:18:42  49/51 rst n
18:18:46  51/51 xnop
18:20:02  24/51 jr cc, d
18:20:54  48/51 rrd/rld
18:39:00  43/51 ret cc
19:18:08  47/51 rot/res/set (hl)
19:27:45  46/51 rot/bit/res/set {b, c, d, e, h, l, a}
FAILED    

Make transistors store their states, not nodes

The original https://github.com/trebonian/visual6502 code maintains separate states for nodes and transistors. 48a84a6 changed that to store transistors' states in their gate nodes, because these were always supposed to be the same. However, making nodes stateless and having the gates themselves store their states might be a better idea.

Firstly, not all nodes are tied to gates. This means we waste time updating them. When a node 'state' is needed, we should be able to compute it on demand without having to store anything.

Secondly, storing gate states in nodes means there is no way for several gates connected to the same node to have different states, which we need to simulate various possible orders of switching transistor states (#51).

Try to avoid using switches

As mentioned in #16, working on #13 revealed a problem with the standard state module utilising switches where it seems we could use indexed accesses for better performance, e.g., in on_get_reg().

Refine evaluation of flags

For every flag, give each way of computing its value a separate function, so it can be seen how many ways of computing each flag we have and what values they depend on. This is supposed to simplify further analysis of the relevant logic.

Support IM0

Aside from #2, this seems to be the only missing feature.

setup.py fails

The indentation of line 26 in setup.py is wrong and the file fails to run with an "opts" not found message.

ED-prefixed Z80 instructions affected by the DD/FD prefixes

Turns out, some of the ED instruction handlers ask for the current iregp and try to work with index registers, despite the disassembling part being correct and always assuming it to be iregp::hl. What a shame!

Caught while falling down the rabbit hole of #21 / #22, checking for duplicate instruction disassemblies. (And yes, we do have duplicates for ld (0x0000), hl, encodings 22xxxx and ed63xxxx, and for ld hl, (0x0000), encodings 2axxxx and ed6bxxxx.)

Complete the support for NMOS/CMOS Z80 differences

https://sinclair.wiki.zxnet.co.uk/wiki/Z80#Differences_between_NMOS_and_CMOS_Z80s says:

LD A,I and LD A,R bug
The NMOS Z80s suffer a problem whereby LD A,I and LD A,R record the state of IFF2 after it has been reset if an interrupt is delivered during that instruction. This behaviour, along with workarounds for this for use in interrupt handlers are documented in the Z80 Family Questions and Answers section of the Zilog Product Specifications Databook, and is useful for detecting the model of Z80 in use, so as to determine whether the CPU (assuming it is a genuine NMOS or CMOS Z80) provides an 'OUT (C),0' instruction (NMOS), or 'OUT (C),255' instead (CMOS).
http://z80.info/zip/ZilogProductSpecsDatabook129-143.pdf
(OCRed version: https://archive.org/stream/Zilog-Z80familyDataBook1989OCR/Zilog-Z80familyDataBook1989OCR_djvu.txt)

From the Q&A clause mentioned:

Q: I don't seem to get the correct state of the interrupts when using the LD A, I and LD A, R instructions to read the state of IFF2. Why is this? How can I get around this?
A: On CMOS Z80 CPU, we've fixed this problem. On NMOS Z80 CPU, in certain narrowly defined circumstances, the Z80 CPU interrupt enable latch, IFF2, does not necessarily reflect the true interrupt status. The two instructions LD A, R and LD A, I copy the state of interrupt enable latch (IFF2) into the parity flag and modifies the accumulator contents (See table 7.0.1 in the Z80 CPU technical manual for details). Thus, it is possible to determine whether interrupts are enabled or disabled at the time that the instruction is executed. This facility is necessary to save the complete state of the machine. However, if an interrupt is accepted by the CPU during the execution of the instruction -- implying that the interrupts must be enabled -- the P/V flag is cleared. This incorrectly asserts that interrupts were disabled at the time the instruction was executed.

Related to #27.

Request CMOS chip behaviour

The undocumented out (c),0 instruction is the original NMOS Z80 behaviour. On newer CMOS Z80 chips it writes 0xff instead. I have an emulator option to control the behaviour, so I can see the effects on software that uses that instruction. I've only just realised that since switching CPU core the option no longer has any effect.

In my local copy I've implemented a handler called on_is_cmos_z80, which returns true if it's CMOS, false if it's NMOS (the default). The core uses the return value to pick either 0xff or 0 for the instruction implementation and the disassembler output.
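A minimal sketch of the idea from the derived emulator's side (on_is_cmos_z80 is the handler name proposed above, not an existing library API):

class my_emulator : public z80::z80_cpu<my_emulator> {
public:
    bool cmos = false;

    // Queried by the core (under this proposal) to choose between the
    // NMOS 'out (c),0' and CMOS 'out (c),255' behaviour.
    bool on_is_cmos_z80() const { return cmos; }
};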

If you think that's how you'd implement it I can put up a small pull request with the changes. If you think there's a better way to do it could I please request it be supported in a future release?

Support undocumented DD/FD CB instructions

Caught by Simon @simonowen, see #21. Subtasks would be:

  • Extend the API and disassembler as necessary to represent the undocumented instructions.
  • Make sure we haven't missed any other undocumented instructions. (Generate full lists of instructions and check that there are no duplicates?)
  • For z80 and i8080.

Symbolic simulation

Having spent some time thinking about possible ways to analyse the properties of the transistor net, and all the difficulties with rewriting the net as a complete set of proper boolean expressions, it suddenly crossed my mind a few days ago that at least theoretically it should be possible to amend the simulator to just propagate signals in symbolic form rather than concrete ones and zeroes, and thus let the formulas develop themselves as necessary. As unrealistic in practical terms as it sounds, I was actually able to get some practical results by leveraging the power of Z3, which look rather promising and probably deserve a separate ticket.

5b95f12 introduces a symbolic simulation mode and does probably the simplest thing -- assigns the pins symbolic values, propagates the states of all nodes and then prints the value of the A register:

$ pypy3 ./z80sim.py --symbolic --no-tests
Round 1, 3535 nodes.
Round 2, 7364 nodes.
Round 3, 4976 nodes.
Round 4, 2146 nodes.
Round 5, 1370 nodes.
Round 6, 1232 nodes.
Round 7, 622 nodes.
Round 8, 446 nodes.
Round 9, 392 nodes.
Round 10, 230 nodes.
Round 11, 190 nodes.
Round 12, 142 nodes.
Round 13, 120 nodes.
Round 14, 122 nodes.
Round 15, 164 nodes.
Round 16, 202 nodes.
Round 17, 154 nodes.
Round 18, 36 nodes.
Round 19, 1152 nodes.
Round 20, 28 nodes.
Round 21, 30 nodes.
Round 22, 56 nodes.
Round 23, 68 nodes.
Round 24, 64 nodes.
Round 25, 40 nodes.
Round 26, 46 nodes.
Round 27, 32 nodes.
Round 28, 24 nodes.
Round 29, 72 nodes.
a: 0x55

It takes some minutes to run on my machine, which is much faster than I expected. More good news is that the process is clearly convergent: after ~30 rounds the system was able to conclude that no further propagation is necessary, meaning it reached the point where all the new gate state expressions are equivalent to their old ones.

The last line is an indication that it has been formally proved that the initial value of the register A does not depend on the state of the pins -- something we probably already knew, but the formality of the knowledge somehow turns it into something exciting.

The convergence doesn't seem to be an accident; with some further changes (to be committed soon) it was possible to perform the initialisation sequence, the reset sequence and then even execute the ld a, imm instruction where imm was represented in a completely symbolic form, so after two more ticks it showed the imm0...imm7 symbols happily landed in A.

Test against the die-level simulator

Goran Devic @gdevic, the author of https://github.com/gdevic/Z80Explorer, has just kindly told me that we have https://github.com/hoglet67/Z80Simulator -- a Linux port of a simulator originally written by Pavel Zima that makes it possible to literally execute die images of the original Z80 chip. I find both the explorer and the simulator completely mind-blowing projects, which I'm sure we can benefit from in a number of ways.

$ ./Z80_Simulator Z80
-------------------------------------------------------
----------------- Starting simulation -----------------
-------------------------------------------------------
       : C// // // // AAAA AA                      
       : LRH MR RW MI 1111 11AA AAAA AAAA DDDD DDDD
       : KSL 1F DR QQ 5432 1098 7654 3210 7654 3210
0000000: 00. .. .. .. .... .... .... .... .... .... PC:ffff IR:ffff SP:ffff WZ:ffff IX:ffff IY:ffff HL:ffff HL':ffff DE:ffff DE':ffff BC:ffff BC':ffff A:ff A':ff F:SZ5H3VNC F':SZ5H3VNC T:1...5. M:1.34. ***** OPCODE FETCH: 0000[21]
0000120: 001 01 11 00 1111 1111 1111 1111 0010 0001 PC:ffff IR:ffff SP:ffff WZ:5555 IX:5555 IY:5555 HL:5555 HL':5555 DE:5555 DE':5555 BC:5555 BC':5555 A:ff A':55 F:SZ5H3VNC F':.Z.H.V.C T:1..... M:.....
0000240: 001 01 11 00 1111 1111 1111 1111 1111 1111 PC:ffff IR:ffff SP:ffff WZ:5555 IX:5555 IY:5555 HL:5555 HL':5555 DE:5555 DE':5555 BC:5555 BC':5555 A:ff A':55 F:SZ5H3VNC F':.Z.H.V.C T:1..... M:.....
0000360: 001 01 11 00 1111 1111 1111 1111 1111 1111 PC:ffff IR:ffff SP:ffff WZ:5555 IX:5555 IY:5555 HL:5555 HL':5555 DE:5555 DE':5555 BC:5555 BC':5555 A:ff A':55 F:SZ5H3VNC F':.Z.H.V.C T:1..... M:.....
0000480: 001 01 11 00 1111 1111 1111 1111 1111 1111 PC:ffff IR:ffff SP:ffff WZ:5555 IX:5555 IY:5555 HL:5555 HL':5555 DE:5555 DE':5555 BC:5555 BC':5555 A:ff A':55 F:SZ5H3VNC F':.Z.H.V.C T:1..... M:.....
...

verbalise instructions example

I just downloaded your emulator which I plan to use for understanding an
ancient computer called Q1. This is part of a Danish computer history project.

I managed to write brief loader code to load the ROMs into memory and
then use the single_step python example to get started.

However it is not exactly clear to me how to customise this to my needs and
maybe you could advise me on this?

The code currently looks like this:

import z80, sys

def load(m, file, address):
        fh = open(file, 'rb')
        block = list(fh.read())
        assert len(block) + address < 65535
        for i in range(len(block)):
            m.memory[address + i] = block[i]
        print(f'loaded {len(block)} bytes from {file} at address {address}')

def main():
    m = z80.Z80Machine()

    load(m, "../../mjcgit/Q1/src/roms/IC25.BIN", 0x0000)
    load(m, "../../mjcgit/Q1/src/roms/IC26.BIN", 0x0400)
    load(m, "../../mjcgit/Q1/src/roms/IC27.BIN", 0x0800)
    load(m, "../../mjcgit/Q1/src/roms/IC28.BIN", 0x0C00)

    while True:
        print(f'PC={m.pc:04X} {m.memory[m.pc]:02X} {m.memory[m.pc+1]:02X} {m.memory[m.pc+2]:02X} {m.memory[m.pc+3]:02X} ;          | SP={m.sp:04X}, BC={m.bc:04X}, DE={m.de:04X}, HL={m.hl:04X}')

        data = m.memory[m.pc] +  (m.memory[m.pc+1] << 8) + (m.memory[m.pc+2] << 16) + (m.memory[m.pc+3] << 24)
        if data == 0:
            print(f'all zeroes at {m.pc:04x}, exiting ...')
            sys.exit()

        # Limit runs to a single tick so each time we execute exactly one instruction.
        m.ticks_to_stop = 1
        m.run()

And produces output like this:

loaded 1024 bytes from ../../mjcgit/Q1/src/roms/IC25.BIN at address 0
loaded 1024 bytes from ../../mjcgit/Q1/src/roms/IC26.BIN at address 1024
loaded 1024 bytes from ../../mjcgit/Q1/src/roms/IC27.BIN at address 2048
loaded 1024 bytes from ../../mjcgit/Q1/src/roms/IC28.BIN at address 3072
PC=0000 C3 E5 01 C3 ;          | SP=0000, BC=0000, DE=0000, HL=0000
PC=01E5 ED 56 3E 04 ;          | SP=0000, BC=0000, DE=0000, HL=0000
PC=01E7 3E 04 D3 01 ;          | SP=0000, BC=0000, DE=0000, HL=0000
etc.

However I'd like to be able to produce output like this:

loaded 1024 bytes from roms/IC25.BIN at address 0
loaded 1024 bytes from roms/IC26.BIN at address 1024
0000 C3 E5 01     ; JP 01E5        | PC:01E5, SP:0000, A:00,  BC:0000, DE:0000 HL:0000, S Z PV N: 0 0 0 0
01E5 ED 56        ; IM1            | PC:01E7, SP:0000, A:00,  BC:0000, DE:0000 HL:0000, S Z PV N: 0 0 0 0
01E7 3E 04        ; LD A,4         | PC:01E9, SP:0000, A:04,  BC:0000, DE:0000 HL:0000, S Z PV N: 0 0 0 0
01E9 D3 01        ; OUT (1),A      | PC:01EB, SP:0000, A:04,  BC:0000, DE:0000 HL:0000, S Z PV N: 0 0 0 0
01EB 11 3F 00     ; LD DE,003F     | PC:01EE, SP:0000, A:04,  BC:0000, DE:003F HL:0000, S Z PV N: 0 0 0 0

which is from an early attempt to write my own emulator. I realised that a) I was probably not smart enough to do this correctly and b) there are plenty of emulators 'out there', this being one of them :-)

But I could not understand from looking at your code how I can adapt the single_step code to print out

  1. just the actually used bytes 1, 2, 3 or 4 according to the opcode and
  2. how to integrate the disassembler to print out the mnemonics

I hope you can help to shed some light on this.

Thanks for making this project available

Best

Morten

Request CPU reset function

Many use cases are likely to want to perform a soft reset of the CPU core. Should there be a reset function to do the appropriate register/state initialisation?

I've been debugging a strange issue that caused the first reset to fail but a second one to succeed. I tracked it down to the CPU being halted (using di;halt) at the point of the reset. This was fixed by manually clearing the halted state in my own reset code.

I'm now using:

        cpu.set_is_halted(false);
        cpu.set_iff1(false);
        cpu.set_pc(0);
        cpu.set_ir(0);

Do you think that's enough to cover a reset, and could it be something I could call instead?

missing OTIR

Hi Ivan. I've been evaluating CPU emulators and was able to get your z80 up and running in an hour or so. Thanks for the great work.

In one of my tests I hit an assert. I suppose that was due to a missing OTIR handler. I added the code below, though I'm unsure if it is entirely correct. No idea about the tick count.

void on_otir() {
  fast_u16 hl = self().on_get_hl();
  fast_u16 bc = self().on_get_bc();
  fast_u8 b = bc >> 8;
  fast_u8 c = bc & 0xff;
  while (b--) {
    fast_u8 t = self().on_read_cycle(hl);
    self().on_output(c, t);
    hl = inc16(hl);
  }
  self().on_set_bc(make16(0,c));
  self().on_set_hl(hl);
  fast_u8 f = self().on_get_f();
  f |= zf_mask;
  f &= ~nf_mask;
  self().on_set_f(f);
}

Wrong mnemonics for 0xed 0x50 (python)

Hi,
The sequence 0xed, 0x50 is decoded (for z80) as

176a 10 fb       ; djnz 0x1767        
176c ed           ; db 0xed            
176d 50           ; ld d, b            
176e 82           ; add a, d

Which is misleading. I believe

176c ed 50 ; in d, (c)

is correct.

Best regards
Morten

Support lazy evaluation of flags for i8080

#5 confirmed that lazy flags are generally implementable and might be beneficial in terms of performance. This task is (to try) to support lazy flags for i8080 and see if it works as a proof of concept and how much speed-up we can have from that. Lazy flags for Z80 is to be addressed with a separate ticket.

Emulator exception on 'ix' instruction

Hey

I am running some old z80 code (from the late 70's). While simulating keyboard input I managed to
cause an exception in the emulator. It looks like the 'ix' instruction is the culprit.

As far as I can tell, the instruction DD BE 0F is valid.

Please let me know if I can assist in debugging this.

Cheers
Morten

--- output from my wrapper

loading program: Emulator exception (ix)
loaded 3 bytes from list at address 2000h
########### HEXDUMP 0x2000 - 0xffff ####################################
icount 0
2000 DD BE 0F FF FF FF FF FF FF FF FF FF FF FF FF FF ................
....
########### HEXDUMP END #################################################
Traceback (most recent call last):
File "/Users/mortenchristensen/projects/mjcgit/Q1/src/emulator.py", line 86, in
main(args)
File "/Users/mortenchristensen/projects/mjcgit/Q1/src/emulator.py", line 40, in main
inst_str, bytes, bytes_str = C.getinst()
^^^^^^^^^^^
File "/Users/mortenchristensen/projects/mjcgit/Q1/src/cpu.py", line 40, in getinst
instr = self.b.build_instr(self.m.pc, bytes(self.m.memory[self.m.pc:self.m.pc + Cpu.MAX_INSTR_SIZE]))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/z80-1.0b2-py3.11-macosx-12-x86_64.egg/z80/_disasm.py", line 291, in build_instr
op = self.__build_op(addr, op_text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/z80-1.0b2-py3.11-macosx-12-x86_64.egg/z80/_disasm.py", line 229, in __build_op
return At(self.__build_op(addr, text[1:-1]))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/z80-1.0b2-py3.11-macosx-12-x86_64.egg/z80/_disasm.py", line 243, in __build_op
op = self.__OPS[text[:2]]
~~~~~~~~~~^^^^^^^^^^
KeyError: 'ix'

Query about contention delays

In the zx project memory contention is applied in places such as on_fetch_cycle and on_read_cycle before any base operations are performed. That would imply that wait states happen even before the address has been placed on the bus. Is that correct?

I would expect T1 to always execute first to place the address on the bus, before device contention is considered. Applying contention relies on the current tick count too, so it seems important to have applied the initial cycle(s) before any rounding up to the next uncontended position. The Zaks book doesn't cover it in detail but other online sources suggest that T2 might effectively repeat as long as WAIT is held low. I'd guess that means the T2 tick is added after the contention.

For example, if contention only allows memory reads at multiples of 4 ticks (0, 4, 8, ...):

class my_emulator : public z80::z80_cpu<my_emulator> {
public:
    typedef z80::z80_cpu<my_emulator> base;
    unsigned cycles = 0;

    my_emulator() {}

    void on_tick(unsigned t) {
        cycles += t;
    }

    void handle_memory_contention(z80::fast_u16) {
        on_tick(((cycles + 3) & ~3) - cycles);
    }

    z80::fast_u8 on_fetch_cycle() {
        handle_memory_contention(get_pc());
        return base::on_fetch_cycle();
    }

    z80::fast_u8 on_read_cycle(z80::fast_u16 addr) {
        handle_memory_contention(addr);
        return base::on_read_cycle(addr);
    }
};

int main() {
    my_emulator e;
    e.on_step();
    e.on_step();
    std::printf("cycles = 0x%04x\n", e.cycles);
}

Executing two NOPs using the current code from T=0 takes 8 cycles, even though the memory access points within the instruction aren't aligned to the access boundaries. I feel the timing should perhaps be 11 cycles: T1 C C C T2 T3 T4 T1 T2 T3 T4.

I may be missing something, but does what I'm asking make sense?

How to enable 'hooks' for out and in instructions

Hey

I am using this emulator to help revive an old Z80 based minicomputer (Q1 Lite).

I have modified the single_stepping.py example to create a disassembler and also use it to run the code (I have 8 ROM images that I can load).

Now it seems like I am likely to get stuck on the out and in instructions.

Do you have a hint on how I could emulate an IO device? For example, by having a Python function called
on every in and out instruction?

Thanks for the project
Morten

B decremented too late in on_block_out()

The block out instructions decrement B before the port write but the current implementation does it afterwards. I was finding my 16-bit port writes were all offset by one when using OTDR.

It seems to just need the existing:

        self().on_output_cycle(bc, r);
        bc = sub16(bc, 0x0100);
        fast_u8 s = get_high8(bc);

changing to:

        bc = sub16(bc, 0x0100);
        fast_u8 s = get_high8(bc);
        self().on_output_cycle(bc, r);

The on_block_in implementation is already correct in decrementing B before the port access.

Should on_handle_active_int check is_hl_iregp()?

In my emulator I'm finding I need to check that an index prefix is not active before calling on_handle_active_int, otherwise interrupts can be accepted mid-instruction. The function itself performs other checks before accepting, so should it also include a && is_hl_iregp() in the condition?
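A sketch of the guard described above as it might look in a run loop (my_machine and int_pending_ are hypothetical names; it assumes is_hl_iregp() is reachable from the derived emulator):

void my_machine::step_with_interrupts() {
    on_step();
    // Only deliver a pending interrupt when no DD/FD prefix is in
    // effect, i.e. HL is the current index register pair.
    if (int_pending_ && is_hl_iregp())
        on_handle_active_int();
}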

Support NMI

Could I request the core support an initiate_nmi for NMI? Here's what I'm using at the moment, which seems to work for me:

    void initiate_nmi() {
        self().on_set_iff1(false);

        fast_u16 pc = self().on_get_pc();

        // Get past the HALT instruction, if halted. Note that
        // HALT instructions need to be executed at least once to
        // be skipped on an interrupt, so checking if the PC is
        // at a HALT instruction is not enough here.
        if(self().on_is_halted()) {
            pc = inc16(pc);
            self().on_set_pc(pc);
            self().on_set_is_halted(false);
        }

        self().on_inc_r_reg();
        self().on_tick(2);
        self().on_push(pc);

        self().on_jump(0x0066);
    }

It's mostly a stripped-down version of initiate_int, but it doesn't clear iff2, has a different fixed handler address, and has reduced timing. It's probably worth checking the details!
