cmuratori / computer_enhance Goto Github PK

Source code for the https://computerenhance.com programming series

License: Other

Assembly 7.45% Batchfile 0.63% C++ 78.76% C 2.58% C# 1.64% Odin 0.51% Python 1.50% Go 1.29% JavaScript 2.46% Zig 1.75% Rust 0.87% Ruby 0.45% HTML 0.11%

computer_enhance's People

jpmckinney irooc craftlinks misterzeus cypressf leanid pankkor chelnov18 madarauchiha ethanfischer meagar sumeet simenlk trumpet63 kiroxas rchopra nilssonmicke robert42 setharchambault chuckrector claytongreen visuallization cybertengu sirrobbe cttillman gautam1168 akmubi jeng ctfhacker jordanpsleeper jersni edvinandersson ayshvab mbger ernesernesto mr-martian phasesync shumatejr jacobstern adamipc phtrivier liamsain brian-woodard shiver r-cpr kjmcclain fluffels daneestar darkgiggs gregorgullwi delittle davidegrayson mr427 rhoens fminoru zang3tsu stevensavold aymanosman tomasz-rozanski rluba karm xrxr strager puremourning dankeyy ryanschneider thegag96 bigjeff96 tomcoadjoint knexator tymfry timokramer juanlumorales santiagocabrera96 pinatamostgrim bbogdn2 odiyan hamu77 charlesastaylor jpjamipark ryanbeatty joshua-alt zhengyqmp orhayat adylanrff rhuibertsjr schwasam hgranthorner thelonelyvulpes umutagil sylvainbouxin minchopaskal brandonmakin evertonse patnebe angorzan roshypoo camilomcatasus satrac goncalo

computer_enhance's Issues

[QUESTION] Listing 57: Is Casey using the displacement and not the calculated address for the counts contributed by transfers?

For reference, this is my code:

std::pair<size_t, size_t> get_clocks_for_ea_and_transfers(
  effective_address_expression ea,
  size_t num_transfers,
  bool from_to_acc
) {
  auto term0_reg_info = ea.Terms[0].Register;
  auto term1_reg_info = ea.Terms[1].Register;
  auto term0_reg_val = term0_reg_info.Index != 0 ? registers[term0_reg_info.Index - 1] : 0;
  auto term1_reg_val = term1_reg_info.Index != 0 ? registers[term1_reg_info.Index - 1] : 0;
  auto disp = ea.Displacement;
  auto addr = memory[term0_reg_val + term1_reg_val + disp];

  // Replacing `addr` with `disp` here yields the same numbers as Casey's listing text file
  auto clocks_transfers = 4 * num_transfers * (addr & 1);
  if (from_to_acc) {
    return {0, clocks_transfers};
  }

  auto clocks_ea = 0;

  if (term1_reg_info.Index == 0 && term0_reg_info.Index == 0 && disp != 0) {
    clocks_ea = 6;
  }

  if (disp == 0 && (term0_reg_info.Index > 0 || term1_reg_info.Index > 0)) {
    clocks_ea = 5;
  }

  if (term0_reg_info.Index > 0 && term1_reg_info.Index == 0 && disp != 0) {
    clocks_ea = 9;
  }

  // bp + di / bx + si
  if (term0_reg_info.Index == 6 && term1_reg_info.Index == 8
    || term0_reg_info.Index == 2 && term1_reg_info.Index == 7) {
    clocks_ea = disp == 0 ? 7 : 11;
  }

  // bp + si / bx + di
  if (term0_reg_info.Index == 6 && term1_reg_info.Index == 7
    || term0_reg_info.Index == 2 && term1_reg_info.Index == 8) {
    clocks_ea = disp == 0 ? 8 : 12;
  }

  return {clocks_ea, clocks_transfers};
}

My output when calculating using addr and not disp:

; ip: 0x0000    ; bx: 0x0000 -> 0x03e8  ; clocks: +4 = 4
mov bx, 1000
; ip: 0x0003    ; bp: 0x0000 -> 0x07d0  ; clocks: +4 = 8
mov bp, 2000
; ip: 0x0006    ; si: 0x0000 -> 0x0bb8  ; clocks: +4 = 12
mov si, 3000
; ip: 0x0009    ; di: 0x0000 -> 0x0fa0  ; clocks: +4 = 16
mov di, 4000
; ip: 0x000c    ; cx: 0x0000 -> 0x0000  ; clocks: +15 (8 + 7ea) = 31
mov cx, [bp + di]
; ip: 0x000e    ; [bx + si]: 0x0000 -> 0x0000   ; clocks: +16 (9 + 7ea) = 47
mov [bx + si], cx
; ip: 0x0010    ; cx: 0x0000 -> 0x0000  ; clocks: +16 (8 + 8ea) = 63
mov cx, [bp + si]
; ip: 0x0012    ; [bx + di]: 0x0000 -> 0x0000   ; clocks: +17 (9 + 8ea) = 80
mov [bx + di], cx
; ip: 0x0014    ; cx: 0x0000 -> 0x0000  ; clocks: +19 (8 + 11ea) = 99
mov cx, [bp + di + 1000]
; ip: 0x0018    ; [bx + si + 1000]: 0x0000 -> 0x0000    ; clocks: +20 (9 + 11ea) = 119
mov [bx + si + 1000], cx
; ip: 0x001c    ; cx: 0x0000 -> 0x0000  ; clocks: +20 (8 + 12ea) = 139
mov cx, [bp + si + 1000]
; ip: 0x0020    ; [bx + di + 1000]: 0x0000 -> 0x0000    ; clocks: +21 (9 + 12ea) = 160
mov [bx + di + 1000], cx
; ip: 0x0024    ; dx: 0x0000 -> 0x0000 | SF: 0 -> 0 ZF: 0 -> 1  ; clocks: +21 (9 + 12ea) = 181
add dx, [bp + si + 1000]
; ip: 0x0028    ; [bp + si]: 0x0000 -> 0x004c | SF: 0 -> 0 ZF: 1 -> 1   ; clocks: +25 (17 + 8ea) = 206
add [bp + si], 76
; ip: 0x002b    ; dx: 0x0000 -> 0x0000 | SF: 0 -> 0 ZF: 1 -> 1  ; clocks: +21 (9 + 12ea) = 227
add dx, [bp + si + 1001]
; ip: 0x002f    ; [di + 999]: 0x4c00 -> 0x4c00 | SF: 0 -> 0 ZF: 1 -> 0  ; clocks: +25 (16 + 9ea) = 252
add [di + 999], dx
; ip: 0x0033    ; [bp + si]: 0x004c -> 0x0097 | SF: 0 -> 0 ZF: 0 -> 1   ; clocks: +33 (17 + 8ea + 8odd) = 285
add [bp + si], 75

Final registers:
        bx: 0x03e8 (1000)
        bp: 0x07d0 (2000)
        si: 0x0bb8 (3000)
        di: 0x0fa0 (4000)
        ip: 0x0036 (54)

Please explain.
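One thing that may explain the difference: in the code above, `memory[term0_reg_val + term1_reg_val + disp]` appears to read the value *stored* at the effective address. That means `addr & 1` tests whether the memory contents are odd, not whether the address is. The 8086 odd-address penalty (4 clocks per word transfer) depends on the address itself. Here is a minimal sketch under that reading. The function names are hypothetical, and the address arithmetic is assumed to wrap at 16 bits:

```cpp
#include <cstdint>

// Hypothetical helper: forms a 16-bit effective address from up to two register
// values and a signed displacement. The sum wraps at 64 KiB, as on the 8086.
uint16_t EffectiveAddress(uint16_t Term0, uint16_t Term1, int16_t Displacement)
{
    return (uint16_t)(Term0 + Term1 + (uint16_t)Displacement);
}

// Hypothetical helper: extra clocks for word transfers that start at an odd
// address. The test is on the address, not on the data stored there.
uint32_t OddAddressPenalty(uint16_t Address, uint32_t WordTransfers)
{
    return (Address & 1) ? 4 * WordTransfers : 0;
}
```

In this listing, every register value (1000, 2000, 3000, 4000) is even. The parity of each address therefore equals the parity of its displacement. For example, `[bp + si + 1001]` is address 6001 (odd), while `[bp + si + 1000]` is 6000 (even). That would explain why substituting `disp` reproduces the reference numbers here.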

Revise `[bp + 0]` effective address cycles from 5 to 9 in listing 56?

In Q+A 21 for question [00:13], you reviewed the microcode and concluded (at around [25:40]) that a displacement of 0 would still go through the motions in an effective address calculation, and thus take 9 cycles instead of 5. Were you planning on changing listing 56 to reflect this (and I guess your simulator as well)?

computer_enhance/perfaware/part1/listing_0056_estimating_cycles.txt

Line 18 in 15e0e7b

mov cx, [bp] ; Clocks: +13 = 62 (8 + 5ea) | ip:0x17->0x1a

should really be

mov cx, [bp] ; Clocks: +17 = 66 (8 + 9ea) | ip:0x17->0x1a

and

computer_enhance/perfaware/part1/listing_0056_estimating_cycles.txt

Line 55 in 15e0e7b

mov cx, [bp] ; Clocks: +17 = 74 (8 + 5ea + 4p) | ip:0x17->0x1a

should really be

mov cx, [bp] ; Clocks: +21 = 78 (8 + 9ea + 4p) | ip:0x17->0x1a

I bring this up because, to match the listings as they currently are, my simulator needed extra logic to check whether the displacement value was 0 and, if so, discard the displacement cycles.

It seems to me that the reference simulator may treat a displacement of 0 and 'no displacement' as the same thing, because the if check only tests whether the value is non-zero rather than whether a displacement exists. Perhaps this is the only reason the reference simulator showed 5 cycles instead of 9 in the first place:

computer_enhance/perfaware/sim86/sim86_cycles.cpp

Lines 65 to 68 in 15e0e7b

 if(Expr.Displacement)
 {
     Result += 4;
 }

I'm fine with "correct, but won't fix," but I just wanted to point this out in case others got confused like I did, and to verify that I'm understanding this correctly.
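Here is a minimal sketch of the distinction drawn above. It uses hypothetical types rather than sim86's. The idea is to record whether the encoding *contained* a displacement field separately from that field's value, and let the field's existence decide the extra 4 clocks. The mod = 00, r/m = 110 encoding is the direct address, so `[bp]` can only be encoded with an explicit 8-bit displacement of 0. Under this approach it therefore costs 9 clocks. The other values follow the 8086 manual's effective-address table, with the segment-override penalty omitted:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register identifiers for the four 8086 EA registers.
enum class ea_reg { None, BX, BP, SI, DI };

// Hypothetical decoded memory operand.
struct ea_operand
{
    ea_reg Base = ea_reg::None;    // BX or BP, if present
    ea_reg Index = ea_reg::None;   // SI or DI, if present
    bool HasDisplacement = false;  // the encoding contained a displacement field
    int16_t Displacement = 0;      // its value, which may legitimately be 0
};

int EstimateEAClocks(const ea_operand &Op)
{
    bool HasBase = Op.Base != ea_reg::None;
    bool HasIndex = Op.Index != ea_reg::None;

    if (!HasBase && !HasIndex)
    {
        assert(Op.HasDisplacement);  // direct address: displacement only
        return 6;
    }

    int Clocks = 5;  // a single base or index register
    if (HasBase && HasIndex)
    {
        // BP+DI and BX+SI are one clock faster than BP+SI and BX+DI.
        bool Fast = (Op.Base == ea_reg::BP && Op.Index == ea_reg::DI) ||
                    (Op.Base == ea_reg::BX && Op.Index == ea_reg::SI);
        Clocks = Fast ? 7 : 8;
    }

    // The existence of the field, not its value, decides the extra 4 clocks.
    if (Op.HasDisplacement) Clocks += 4;
    return Clocks;
}
```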

Call direct intersegment not handled?

So I could totally be misunderstanding, but I don't think the reference decoder handles "Call direct intersegment".

If I give NASM this assembly

bits 16 ; or cpu 8086 - same result
call 999:888

I get these bytes (hex)

9A 78 03 E7 03

Giving this to the reference decoder I get this output

bits 16
call 231
ERROR: Instruction extends outside disassembly region

Unless this isn't an 8086 instruction? But those bytes do look to me like that instruction.

(Built on Windows 10 with MSVC. Other listings work as expected. I believe the same also applies to "Jump direct intersegment".)
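For reference, here is a sketch of how these bytes can be decoded. The helper name is hypothetical. Both the direct intersegment CALL (9A) and the direct intersegment JMP (EA) are 5 bytes long: the opcode, a little-endian 16-bit offset, and then a little-endian 16-bit segment. NASM writes the operand segment first, as in `call 999:888`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: decodes an 8086 direct intersegment CALL (9A) or JMP (EA).
// Returns the number of bytes consumed, or 0 if the bytes don't match.
size_t DecodeDirectIntersegment(const uint8_t *Bytes, size_t Count, std::string &Out)
{
    if (Count < 5) return 0;

    const char *Mnemonic = nullptr;
    if (Bytes[0] == 0x9A) Mnemonic = "call";
    else if (Bytes[0] == 0xEA) Mnemonic = "jmp";
    else return 0;

    // Operands are little-endian; the offset comes before the segment.
    unsigned Offset = Bytes[1] | (Bytes[2] << 8);
    unsigned Segment = Bytes[3] | (Bytes[4] << 8);

    char Buffer[32];
    std::snprintf(Buffer, sizeof(Buffer), "%s %u:%u", Mnemonic, Segment, Offset);
    Out = Buffer;
    return 5;
}
```

With the bytes from this issue, `9A 78 03 E7 03` produces `call 999:888`.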

Opcodes for test and xchg overlap

The test instruction at https://github.com/cmuratori/computer_enhance/blob/main/perfaware/sim86/sim86_instruction_table.inl#L145 should have opcode 1000010 without a D flag, according to table 4-31 in the manual. Table 4-25 (incorrectly) lists it with a different opcode and a D flag, so it's understandable how this got missed. I was playing around with code generation and noticed it confusing xchg and test in a table extracted from the .inl file.
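A small sketch of why the two entries can collide. It assumes the table entry for test matches the fixed bits 100001 followed by D and W bits, which is what an overlap with xchg implies. Matching on 1000010 followed by W alone keeps test (84/85) and xchg (86/87) apart. The helper names are hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: do the top Bits bits of Byte equal Pattern?
bool MatchesOpcodePattern(uint8_t Byte, unsigned Pattern, int Bits)
{
    return (unsigned)(Byte >> (8 - Bits)) == Pattern;
}

// TEST r/m, reg as in table 4-31: 1000010w (7 fixed bits, then W).
bool IsTestRmReg(uint8_t Byte) { return MatchesOpcodePattern(Byte, 0b1000010, 7); }

// XCHG r/m, reg: 1000011w.
bool IsXchgRmReg(uint8_t Byte) { return MatchesOpcodePattern(Byte, 0b1000011, 7); }

// A form with only 6 fixed bits (100001, then D and W) also matches 86/87.
bool IsTestWithDFlag(uint8_t Byte) { return MatchesOpcodePattern(Byte, 0b100001, 6); }
```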

Wrong register in listing 39 - part 1

In the homework, I believe this line of assembly is wrong.

computer_enhance/perfaware/part1/listing_0039_more_movs.asm

Line 36 in b71fef2

mov dx, [bp]

According to the 8086 manual, there is no source address calculation using the register bp alone, since the corresponding encoding is instead used for a direct address.
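For reference, here is a sketch of the ModRM rule in question. The helper name is hypothetical. With mod = 00, r/m = 110 selects a 16-bit direct address rather than `[bp]`. Assemblers such as NASM therefore encode `mov dx, [bp]` as `8B 56 00`. The ModRM byte 0x56 splits into mod = 01, reg = 010 (dx) and r/m = 110, which means `[bp + disp8]` with an 8-bit displacement of 0.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: describes the memory operand selected by the mod and r/m
// fields of an 8086 ModRM byte. mod = 11 (register mode) is not handled here.
std::string DescribeMemoryOperand(uint8_t Mod, uint8_t Rm)
{
    assert(Mod <= 2 && Rm < 8);
    static const char *Terms[8] = {
        "bx + si", "bx + di", "bp + si", "bp + di", "si", "di", "bp", "bx"};

    // mod = 00 with r/m = 110 is a 16-bit direct address, not [bp].
    if (Mod == 0 && Rm == 6) return "[direct address]";

    std::string Result = std::string("[") + Terms[Rm];
    if (Mod == 1) Result += " + disp8";
    if (Mod == 2) Result += " + disp16";
    return Result + "]";
}
```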

Addresses for jumps/calls not printed out

Input bytes E9 39 0A produce jmp with no target, but should output jmp 0xa3c.
Input bytes E8 16 2E produce call with no target, but should output call 0x2e19.
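For reference, here is a sketch of how the expected targets follow from the bytes. It assumes, as the issue appears to, that each instruction sits at offset 0. E8 and E9 carry a 16-bit displacement relative to the address of the *next* instruction, and the sum wraps at 64 KiB. The helper name is hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: target of an 8086 near relative JMP (E9) or CALL (E8).
// Both are 3 bytes: the opcode, then a 16-bit displacement (low byte first)
// that is added to the address of the following instruction, modulo 64 KiB.
uint16_t NearRelativeTarget(uint16_t InstructionAddress, uint8_t Lo, uint8_t Hi)
{
    uint16_t Displacement = (uint16_t)(Lo | (Hi << 8));
    uint16_t NextIP = (uint16_t)(InstructionAddress + 3);
    return (uint16_t)(NextIP + Displacement);
}
```

For `E9 39 0A` at offset 0, this gives 0x0A39 + 3 = 0x0A3C. For `E8 16 2E`, it gives 0x2E16 + 3 = 0x2E19.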

Undefined behavior in circular buffer

The page-mapped circular buffer implementation mentioned in the recent video (which is a great video, btw) behaves inconsistently at different optimization levels, most likely because of undefined behavior. The probable cause is the aliasing rules that modern compilers use aggressively to optimize code. The following code uses the circular buffer defined in perfaware/part3/listing_0121_circular_buffer_main.cpp:

int main(void)
{
    printf("Circular buffer test:\n");
    
    const size_t BUF_SIZE = 64 * 4096;

    circular_buffer Circular = AllocateCircularBuffer(BUF_SIZE, 3);
    
    if(IsValid(Circular))
    {
        u8 *Data = Circular.Base.Data + BUF_SIZE;

        Data[0] = 1;
        Data[BUF_SIZE] = 2;

        printf("%u\n", Data[0]);

        DeallocateCircularBuffer(&Circular);
    }
    else
    {
        printf("  FAILED\n");
    }
    
    // NOTE(casey): Since we do not use these functions in this particular build, we reference their pointers
    // here to prevent the compiler from complaining about "unused functions".
    (void)&IsInBounds;
    (void)&AreEqual;
    (void)&AllocateBuffer;
    (void)&FreeBuffer;
    
    return 0;
}

With optimizations off (cl /Od, g++ -O0, clang++ -O0), every compiler produces the expected output:

Circular buffer test:
2

But it gives the following output when optimizations are on (cl /O2, g++ -O2, clang++ -O2):

Circular buffer test:
1

It seems that compilers assume writing to Data[BUF_SIZE] cannot possibly affect the value of Data[0], so they can safely pass the known value of Data[0] directly to printf.
Here is the assembly generated with g++ -O2 (g++ version 13.1, mingw-w64)

   140007eba:   c6 80 00 00 04 00 01    mov    BYTE PTR [rax+0x40000],0x1   ; write 1 to Data[0]
   140007ec1:   48 8d 0d 8b 21 00 00    lea    rcx,[rip+0x218b]
   140007ec8:   ba 01 00 00 00          mov    edx,0x1                      ; put 1 directly into printf args
   140007ecd:   c6 80 00 00 08 00 02    mov    BYTE PTR [rax+0x80000],0x2   ; write 2 to Data[BUF_SIZE]
   140007ed4:   e8 f7 fd ff ff          call   140007cd0 <_Z6printfPKcz>    ; call printf

And here is the assembly generated with g++ -O0

   140001aec:   c6 00 01                mov    BYTE PTR [rax],0x1           ; write 1 to Data[0]
   140001aef:   48 8b 45 f0             mov    rax,QWORD PTR [rbp-0x10]
   140001af3:   48 05 00 00 04 00       add    rax,0x40000
   140001af9:   c6 00 02                mov    BYTE PTR [rax],0x2           ; write 2 to Data[BUF_SIZE]
   140001afc:   48 8b 45 f0             mov    rax,QWORD PTR [rbp-0x10]
   140001b00:   0f b6 00                movzx  eax,BYTE PTR [rax]           ; read Data[0] again
   140001b03:   0f b6 c0                movzx  eax,al
   140001b06:   89 c2                   mov    edx,eax                      ; put the value of Data[0] into printf args
   140001b08:   48 8d 05 6b 85 00 00    lea    rax,[rip+0x856b]
   140001b0f:   48 89 c1                mov    rcx,rax
   140001b12:   e8 39 68 00 00          call   140008350 <_Z6printfPKcz>    ; call printf

Sorry if this isn't the right place to discuss this, but YouTube comments are disabled, and Computer Enhance comments are for subscribers only. I believe it should be mentioned somewhere that circular buffers of this kind are not really safe to use with modern compilers unless someone figures out how to reliably tell the compiler that this kind of page manipulation is involved.

Is underflow OK?

I have a hard time understanding why you wrote the Tester->TimeAccumulatedOnThisTest -= ReadCPUTimer(); part of your code.
After trying to find a different explanation, I concluded that if Tester->TimeAccumulatedOnThisTest is 0 (which it apparently can be on the first Tester), you will "underflow".
I presume we don't care because, even if the underflow is not consistent across platforms, the behaviour is consistent on the platform it runs on.
But what I don't get is that on the first repetition of a specific Tester, you will get a TSCElapsed for a single read that is not representative of the actual TSC count the read takes, because:

I'm sure that if you did it this way, it's because the code doesn't need to care about it. But I need someone to point out the obvious to me.

In my code I did:

static void BeginTime(repetition_tester *Tester)
{
    Tester->TSCLastRepetition = ReadCPUTimer();
}

static void EndTime(repetition_tester *Tester)
{
    Tester->TSCLastRepetition = ReadCPUTimer() - Tester->TSCLastRepetition; 
}

Thanks to anyone who can help me understand that part of Casey's source code. Since Computer Enhance is about the CPU and not the source code, this is the only place where I can post it.
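Here is a minimal sketch of why the temporary underflow is harmless, assuming the accumulator is an unsigned 64-bit integer. C and C++ define unsigned arithmetic modulo 2^64. Subtracting the start time and later adding the end time therefore leaves exactly the elapsed ticks, however the intermediate value wraps, and this holds on every conforming platform. The names below are hypothetical stand-ins, not the repetition tester's actual fields:

```cpp
#include <cstdint>

// Hypothetical stand-in for the repetition tester's accumulator.
struct timer_accumulator
{
    uint64_t TimeAccumulated = 0;
};

// Start of a timed region: subtract the start timestamp. If the accumulator is
// smaller than Now, this wraps modulo 2^64, which is well defined for unsigned.
void BeginTimer(timer_accumulator &T, uint64_t Now) { T.TimeAccumulated -= Now; }

// End of a timed region: adding the end timestamp undoes the wrap, leaving the
// previous total plus (End - Start).
void EndTimer(timer_accumulator &T, uint64_t Now) { T.TimeAccumulated += Now; }
```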

Direct intersegment jump producing incorrect output

A direct intersegment jump causes a jmp and a push to be generated.

Example input:
JMP 0x5566:0x7788 ; EA 88 77 66 55

Example hexdump:
00000000: ea88 7766 55 ..wfU

sim86 output:

bits 16
jmp 102
push bp

Calls lose far for inter-segment addresses

Both FF 52 C6 and FF 5A C6 print as call word [bp+si-58].

But FF 5A C6 should print as call far word [bp+si-58] to indicate that it is an inter-segment call.

Similarly for jumps:
FF 25 and FF 2D both give jmp word [di],
but the second one should be jmp far word [di], since it is an inter-segment jump.

Consider doing retf for inter-segment returns

CA 98 44 and C2 98 44 both disassemble to the same ret 17560 instruction.
But the first is actually an inter-segment return and the second an intra-segment return; they use different opcodes.
Typically they are written as retn and retf in asm to specify which one you want. NASM knows retn and retf.

Similarly, CB should be retf and C3 should be retn, but sim86 prints ret for both.

Bad size prefix for moving segment reg to memory

Input bytes 8C 40 3B decode and print as mov byte [bx+si+59], es, which is nonsense: you cannot move a byte from a segment register. There should be no byte prefix. Plain mov [bx+si+59], es assembles back to 8C 40 3B correctly.
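One plausible cause, offered as a guess rather than a diagnosis of sim86: 8C is 10001100, so reading its low bit as a W flag would give a byte operand. Here is a sketch of a width rule that special-cases the segment-register moves. The helper name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: size in bytes of the r/m operand of an 8086 MOV between a
// register and r/m (opcodes 88-8C and 8E). For 88-8B the low bit is the W flag.
// 8C and 8E move a segment register, have no W flag, and always transfer a word,
// even though their low bit is 0.
int MovRmOperandSize(uint8_t Opcode)
{
    if (Opcode == 0x8C || Opcode == 0x8E) return 2;
    return (Opcode & 1) ? 2 : 1;
}
```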

Always print immediates as unsigned in the disassembly

I'm perhaps late to the party.

I have a problem with how immediates are printed in the disassembly.

https://github.com/cmuratori/computer_enhance/blob/main/perfaware/part1/listing_0045_challenge_register_movs.txt#L5

Binary:
```
BA 88 88
```
Your reference sim86 disassembles it as:
```
mov dx, 34952
```
https://github.com/cmuratori/computer_enhance/blob/main/perfaware/part1/listing_0047_challenge_flags.txt#L12

Binary:
```
81 C3 40 9C
83 C1 A6
```
Your reference sim86 disassembles it as:
```
add bx, 40000
add cx, -90
```

Since positive and negative immediates can't be distinguished in the binary, the last -90 seems out of place. I would appreciate it if it were printed as unsigned 65446.

Immediate is printed as signed integer here:
https://github.com/cmuratori/computer_enhance/blob/main/perfaware/sim86/sim86_text.cpp#L128

And this is how it's read:
https://github.com/cmuratori/computer_enhance/blob/main/perfaware/sim86/sim86_decode.cpp#L62

When the immediate is wide, 2 bytes are read and stored into the lower word of the 32-bit Result. No sign extension happens even if the word has bit 15 set, so this 16-bit immediate is always printed as a non-negative 32-bit integer.
However, in the sign-extension case, all 32 bits of Result are sign-extended, so this immediate can be printed as a negative 32-bit integer.
I think it would be more appropriate to sign-extend only up to the word boundary and always print the value as unsigned. What do you think?
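Here is a minimal sketch of the proposal, using hypothetical helper names rather than sim86's functions. An 8-bit immediate is sign-extended only as far as 16 bits, and the resulting word is printed as unsigned.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: sign-extends an 8-bit immediate only as far as 16 bits,
// the 8086 word size, instead of to a 32-bit int.
uint16_t SignExtendByteToWord(uint8_t Byte)
{
    return (Byte & 0x80) ? (uint16_t)(0xFF00 | Byte) : (uint16_t)Byte;
}

// Hypothetical helper: formats a word immediate as an unsigned decimal number.
std::string FormatWordImmediate(uint16_t Word)
{
    char Buffer[8];
    std::snprintf(Buffer, sizeof(Buffer), "%u", (unsigned)Word);
    return Buffer;
}
```

With this approach, the A6 immediate of `83 C1 A6` becomes 0xFFA6 and prints as 65446. The wide immediate 0x9C40 still prints as 40000.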

REPNE prefix not handled

repne.binary.txt

$ sim86_clang_debug.exe repne.binary.txt
; repne.binary.txt disassembly:
bits 16
rep cmpsb
rep scasb
rep cmpsw
rep scasw
rep cmpsb
rep scasb
rep movsw
rep cmpsw
rep scasw

$ ndisasm.exe repne.binary.txt
00000000  F3A6              repe cmpsb
00000002  F3AE              repe scasb
00000004  F3A7              repe cmpsw
00000006  F3AF              repe scasw
00000008  F2A6              repne cmpsb
0000000A  F2AE              repne scasb
0000000C  F2A5              repne movsw
0000000E  F2A7              repne cmpsw
00000010  F2AF              repne scasw

For the cmps/scas instructions, REPE is the same byte as REP.

Basically, movs/stos/lods use the REP prefix,
while cmps/scas use REPE/REPZ (which has the same encoding as REP) or REPNE/REPNZ.

Is listing_0101_read_bandwidth_main.cpp in the wrong folder?

I may be wrong, but it seems that you misplaced listing_0101_read_bandwidth_main.cpp in the part 2 folder instead of part 3. Since the TOC in https://www.computerenhance.com/p/table-of-contents is broken for me, I can't be sure. If it were obvious I wouldn't mind, but it's quite confusing for me.
Edit: It's actually duplicated in both folders. As I said, that's fine, but with the TOC broken it was confusing.
Edit: The TOC was repaired today.

Thank you!

DllNotFoundException running C# sim86_test.cs (Ubuntu)

Hi,

When I follow the instructions at the top of sim86_test.cs, i.e. copying sim86_shared_debug.dll next to sim86.cs, sim86.test, etc., and running dotnet run from that directory, I get the following exception:

Exception has occurred: CLR/System.DllNotFoundException
An unhandled exception of type 'System.DllNotFoundException' occurred in sim86.dll: 'Unable to load shared library 'sim86_shared_debug' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: libsim86_shared_debug: cannot open shared object file: No such file or directory'
   at Sim86.Native.Sim86_GetVersion()
   at Sim86.GetVersion() in /home/ethan/repos/computer_enhance/perfaware/sim86/shared/contrib_csharp/sim86.cs:line 291
   at Program.<Main>$(String[] args) in /home/ethan/repos/computer_enhance/perfaware/sim86/shared/contrib_csharp/sim86_test.cs:line 23

I'm new to calling into C++ DLLs, so maybe I'm missing something obvious, but I figured I'd post here in case anyone can help.

I'm running it in vscode on Ubuntu 20.04

I'm probably wrong, but is it swapped? I mean, if it's wide, isn't the MSB 1<<15?

https://github.com/cmuratori/computer_enhance/blob/62ec92d84ff46f6ad9b1eb0132a5d2cf6d3b36aa/perfaware/sim86/sim86_execute.cpp#LL123C55-L123C55
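For reference, here is a sketch of the convention the question assumes, using hypothetical helper names. The sign bit of a word ("wide") operand is bit 15, and that of a byte operand is bit 7. Whether the linked line matches this is exactly what the question asks.

```cpp
#include <cstdint>

// Hypothetical helper: mask of the sign bit for an 8086 operand of the given width.
uint16_t SignBitMask(bool Wide)
{
    return Wide ? (uint16_t)(1u << 15) : (uint16_t)(1u << 7);
}

// Hypothetical helper: is the operand negative when read as a signed value?
bool IsNegative(uint16_t Value, bool Wide)
{
    return (Value & SignBitMask(Wide)) != 0;
}
```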

How to view ASM output - CLion

Hey,

I mostly use CLion for C/C++, and I noticed that the markdown instructions on how to view ASM don't mention it.

This could be covered in the "Using a debugger" section along with Visual Studio, but someone has also made a cool Compiler Explorer plugin.

Would that be helpful/interesting to add in a PR?

Sim DLL doesn't work on Apple M1 running Ventura 13.6

I think the issue is that macOS wants binaries in Mach-O format, which the DLLs aren't. I found a post (here) that seems to confirm this.

I'll try to build it myself in the meantime. If I get it working, would it be helpful to post the macOS build?

Thank you!

Full error for context:

Traceback (most recent call last):
  File "/Users/username/dev/computer-enhance/sim86_test.py", line 1, in <module>
    import sim86
  File "/Users/username/dev/computer-enhance/sim86.py", line 202, in <module>
    dll = ctypes.CDLL(str(pathlib.Path(__file__).parent / "sim86_shared_debug.dll"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(/Users/username/dev/computer-enhance/sim86_shared_debug.dll, 0x0006): tried: '/Users/username/dev/computer-enhance/sim86_shared_debug.dll' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/username/dev/computer-enhance/sim86_shared_debug.dll' (no such file), '/Users/username/dev/computer-enhance/sim86_shared_debug.dll' (not a mach-o file)

`[0]` mistakenly decodes to `[]`

Given the file:

; listing A
bits 16
mov ax, [0]

nasm produces A1 00 00

sim86 decodes that to:

; listing A disassembly
bits 16
mov ax, []

which nasm cannot assemble (error: expression syntax error).

I would expect it to produce listing A.

This happens because PrintEffectiveAddressExpression prints nothing for an effective_address_expression with no terms and a Displacement of 0, due to this conditional:

    if(Address.Displacement != 0)
    {
        fprintf(Dest, "%+d", Address.Displacement);
    }
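Here is a sketch of one possible fix. It uses simplified, hypothetical parameters rather than sim86's effective_address_expression. The displacement is printed when it is non-zero *or* when there are no register terms, so a direct address of 0 comes out as `[0]`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical simplified formatter: Term0 and Term1 are register names,
// or nullptr when absent.
std::string FormatEffectiveAddress(const char *Term0, const char *Term1, int Displacement)
{
    std::string Result = "[";
    if (Term0) Result += Term0;
    if (Term1)
    {
        if (Term0) Result += "+";
        Result += Term1;
    }

    char Buffer[16];
    if (!Term0 && !Term1)
    {
        // No register terms: always print the displacement, even when it is 0.
        std::snprintf(Buffer, sizeof(Buffer), "%d", Displacement);
        Result += Buffer;
    }
    else if (Displacement != 0)
    {
        // With register terms, a zero displacement can be omitted.
        std::snprintf(Buffer, sizeof(Buffer), "%+d", Displacement);
        Result += Buffer;
    }
    return Result + "]";
}
```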

Push instruction decode has operand 0 as Operand_None and operand 1 as Operand_Register

Hi Casey,

When I call Sim86_RegisterNameFromOperand on a push r instruction (e.g. push cx), the decoded instruction has the register in Operands[1], while Operands[0] is Operand_None.

This surprised me when setting up to use your shared library to decode (which otherwise was incredibly easy to use/understand!). Looking at your code I'm afraid I can't tell for sure if this is an issue or if it is expected, so this might not be an issue at all!

Loving the course btw!
