cmuratori / computer_enhance

Source code for the https://computerenhance.com programming series

License: Other

Assembly 7.45% Batchfile 0.63% C++ 78.76% C 2.58% C# 1.64% Odin 0.51% Python 1.50% Go 1.29% JavaScript 2.46% Zig 1.75% Rust 0.87% Ruby 0.45% HTML 0.11%

computer_enhance's People

Contributors

akmubi, bitwitch, charlesastaylor, cmuratori, dankeyy, davidegrayson, ethanfischer, gauravgautamgoldcast, gautam1168, geo-ant, jeng, jpmckinney, kiroxas, knexator, pankkor, penguingovernor, pinatamostgrim, puremourning, rluba, ryanschneider, said6289, santiagocabrera96, setharchambault, shiver, strager, tomasz-rozanski, xrxr

computer_enhance's Issues

[QUESTION] Listing 57: Is Casey using the displacement and not the calculated address for the counts contributed by transfers?

For reference, this is my code:

std::pair<size_t, size_t> get_clocks_for_ea_and_transfers(
  effective_address_expression ea,
  size_t num_transfers,
  bool from_to_acc
) {
  auto term0_reg_info = ea.Terms[0].Register;
  auto term1_reg_info = ea.Terms[1].Register;
  auto term0_reg_val = term0_reg_info.Index != 0 ? registers[term0_reg_info.Index - 1] : 0;
  auto term1_reg_val = term1_reg_info.Index != 0 ? registers[term1_reg_info.Index - 1] : 0;
  auto disp = ea.Displacement;
  auto addr = memory[term0_reg_val + term1_reg_val + disp];

  // Replacing `addr` with `disp` here yields the same numbers as Casey's listing text file
  auto clocks_transfers = 4 * num_transfers * (addr & 1);
  if (from_to_acc) {
    return {0, clocks_transfers};
  }

  auto clocks_ea = 0;

  if (term1_reg_info.Index == 0 && term0_reg_info.Index == 0 && disp != 0) {
    clocks_ea = 6;
  }

  if (disp == 0 && (term0_reg_info.Index > 0 || term1_reg_info.Index > 0)) {
    clocks_ea = 5;
  }

  if (term0_reg_info.Index > 0 && term1_reg_info.Index == 0 && disp != 0) {
    clocks_ea = 9;
  }

  // bp + di / bx + si
  if (term0_reg_info.Index == 6 && term1_reg_info.Index == 8
    || term0_reg_info.Index == 2 && term1_reg_info.Index == 7) {
    clocks_ea = disp == 0 ? 7 : 11;
  }

  // bp + si / bx + di
  if (term0_reg_info.Index == 6 && term1_reg_info.Index == 7
    || term0_reg_info.Index == 2 && term1_reg_info.Index == 8) {
    clocks_ea = disp == 0 ? 8 : 12;
  }

  return {clocks_ea, clocks_transfers};
}

My output when calculating using addr and not disp:

; ip: 0x0000    ; bx: 0x0000 -> 0x03e8  ; clocks: +4 = 4
mov bx, 1000
; ip: 0x0003    ; bp: 0x0000 -> 0x07d0  ; clocks: +4 = 8
mov bp, 2000
; ip: 0x0006    ; si: 0x0000 -> 0x0bb8  ; clocks: +4 = 12
mov si, 3000
; ip: 0x0009    ; di: 0x0000 -> 0x0fa0  ; clocks: +4 = 16
mov di, 4000
; ip: 0x000c    ; cx: 0x0000 -> 0x0000  ; clocks: +15 (8 + 7ea) = 31
mov cx, [bp + di]
; ip: 0x000e    ; [bx + si]: 0x0000 -> 0x0000   ; clocks: +16 (9 + 7ea) = 47
mov [bx + si], cx
; ip: 0x0010    ; cx: 0x0000 -> 0x0000  ; clocks: +16 (8 + 8ea) = 63
mov cx, [bp + si]
; ip: 0x0012    ; [bx + di]: 0x0000 -> 0x0000   ; clocks: +17 (9 + 8ea) = 80
mov [bx + di], cx
; ip: 0x0014    ; cx: 0x0000 -> 0x0000  ; clocks: +19 (8 + 11ea) = 99
mov cx, [bp + di + 1000]
; ip: 0x0018    ; [bx + si + 1000]: 0x0000 -> 0x0000    ; clocks: +20 (9 + 11ea) = 119
mov [bx + si + 1000], cx
; ip: 0x001c    ; cx: 0x0000 -> 0x0000  ; clocks: +20 (8 + 12ea) = 139
mov cx, [bp + si + 1000]
; ip: 0x0020    ; [bx + di + 1000]: 0x0000 -> 0x0000    ; clocks: +21 (9 + 12ea) = 160
mov [bx + di + 1000], cx
; ip: 0x0024    ; dx: 0x0000 -> 0x0000 | SF: 0 -> 0 ZF: 0 -> 1  ; clocks: +21 (9 + 12ea) = 181
add dx, [bp + si + 1000]
; ip: 0x0028    ; [bp + si]: 0x0000 -> 0x004c | SF: 0 -> 0 ZF: 1 -> 1   ; clocks: +25 (17 + 8ea) = 206
add [bp + si], 76
; ip: 0x002b    ; dx: 0x0000 -> 0x0000 | SF: 0 -> 0 ZF: 1 -> 1  ; clocks: +21 (9 + 12ea) = 227
add dx, [bp + si + 1001]
; ip: 0x002f    ; [di + 999]: 0x4c00 -> 0x4c00 | SF: 0 -> 0 ZF: 1 -> 0  ; clocks: +25 (16 + 9ea) = 252
add [di + 999], dx
; ip: 0x0033    ; [bp + si]: 0x004c -> 0x0097 | SF: 0 -> 0 ZF: 0 -> 1   ; clocks: +33 (17 + 8ea + 8odd) = 285
add [bp + si], 75

Final registers:
        bx: 0x03e8 (1000)
        bp: 0x07d0 (2000)
        si: 0x0bb8 (3000)
        di: 0x0fa0 (4000)
        ip: 0x0036 (54)

Please explain.
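
One possible explanation, offered as a reading of the code above and not of the reference simulator: `addr` is assigned `memory[...]`, which is the value stored at the computed location, not the location itself, so `addr & 1` tests the parity of the memory contents. Since bx, bp, si and di all hold even values in this listing, the parity of the displacement happens to equal the parity of the full address, which would explain why `disp` gives matching numbers. A minimal sketch of a parity check on the computed address follows; `OddAddressPenalty` and its parameters are hypothetical names, and the 4 clocks per transfer figure is taken from the code above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: extra clocks for 16-bit transfers at an odd address.
// The address is the 16-bit sum of base, index and displacement, not the
// value stored in memory at that address.
inline std::size_t OddAddressPenalty(std::uint16_t Base, std::uint16_t Index,
                                     std::int16_t Disp, std::size_t NumTransfers)
{
    std::uint16_t Address = static_cast<std::uint16_t>(Base + Index + Disp);
    return 4 * NumTransfers * (Address & 1u);
}
```

With bp = 2000 and si = 3000, `OddAddressPenalty(2000, 3000, 1001, 1)` evaluates to 4, while `OddAddressPenalty(2000, 3000, 1000, 1)` evaluates to 0.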

Revise `[bp + 0]` effective address cycles from 5 to 9 in listing 56?

In Q+A 21 for question [00:13], you reviewed the microcode and concluded (at around [25:40]) that a displacement of 0 would still go through the motions in an effective address calculation, and thus take 9 cycles instead of 5. Were you planning on changing listing 56 to reflect this (and I guess your simulator as well)?

So

mov cx, [bp] ; Clocks: +13 = 62 (8 + 5ea) | ip:0x17->0x1a

should really be

mov cx, [bp] ; Clocks: +17 = 66 (8 + 9ea) | ip:0x17->0x1a 

and

mov cx, [bp] ; Clocks: +17 = 74 (8 + 5ea + 4p) | ip:0x17->0x1a

should really be

mov cx, [bp] ; Clocks: +21 = 78 (8 + 9ea + 4p) | ip:0x17->0x1a

I bring this up because in order to satisfy the listings as they currently are in my simulator code, I had to add extra logic to check if the displacement value was 0, and if so, discard the displacement cycles.

It seems to me that perhaps the reference simulator treats a displacement of 0 and 'no displacement' as the same thing due to the if check simply checking that the value is not 0 rather than checking for existence. Perhaps this is the only reason why the reference simulator showed 5 cycles instead of 9 in the first place:

if(Expr.Displacement)
{
    Result += 4;
}

I'm fine with "correct, but won't fix," but I just wanted to point this out in case others got confused like I did, and to verify that I'm understanding this correctly.
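
To illustrate the distinction between "displacement is 0" and "no displacement", here is a minimal sketch that tracks presence separately from value. The `single_register_ea` struct and `EstimateEAClocks` are hypothetical and cover only the single base or index register form; the 5 and 9 clock figures are the ones discussed above.

```cpp
#include <cstdint>

// Hypothetical stand-in for an address expression with one base or index
// register. HasDisplacement records whether displacement bytes were encoded,
// independently of their value.
struct single_register_ea
{
    bool HasDisplacement;
    std::int16_t Displacement;
};

// 5 clocks for the register alone, plus 4 when a displacement is encoded,
// even if that displacement is 0 (as in [bp], which is encoded as [bp + 0]).
inline int EstimateEAClocks(single_register_ea Expr)
{
    int Result = 5;
    if(Expr.HasDisplacement)
    {
        Result += 4;
    }
    return Result;
}
```

Under this sketch `[bp]` would be represented as `{true, 0}` and cost 9, while `[bx]` would be `{false, 0}` and cost 5.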

Call direct intersegment not handled?

So I could totally be misunderstanding, but I don't think the reference decoder handles "Call direct intersegment".

If I give NASM this assembly

bits 16 ; or cpu 8086 - same result
call 999:888

I get these bytes (hex)

9A 78 03 E7 03

Giving this to the reference decoder I get this output

bits 16
call 231
ERROR: Instruction extends outside disassembly region

Unless this isn't an 8086 instruction? But those bytes do look to me like that instruction.

(Built on Windows 10 with MSVC. Other listings work as expected. I believe the same also applies to "Jump direct intersegment".)
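
For reference, a minimal sketch of how the five bytes above could be decoded, assuming the usual layout of opcode, offset low, offset high, segment low, segment high. `far_pointer` and `DecodeDirectIntersegment` are hypothetical names and not part of sim86.

```cpp
#include <cstddef>
#include <cstdint>

struct far_pointer
{
    std::uint16_t Segment;
    std::uint16_t Offset;
};

// Hypothetical helper: decodes the operand of a direct intersegment call (9A)
// or jump (EA). Returns false if the bytes do not form such an instruction.
inline bool DecodeDirectIntersegment(const std::uint8_t *Bytes, std::size_t Count,
                                     far_pointer *Dest)
{
    if((Count < 5) || ((Bytes[0] != 0x9A) && (Bytes[0] != 0xEA)))
    {
        return false;
    }

    // The opcode is followed by offset-lo, offset-hi, segment-lo, segment-hi.
    Dest->Offset = static_cast<std::uint16_t>(Bytes[1] | (Bytes[2] << 8));
    Dest->Segment = static_cast<std::uint16_t>(Bytes[3] | (Bytes[4] << 8));
    return true;
}
```

Fed the bytes 9A 78 03 E7 03, this yields segment 999 and offset 888, matching the `call 999:888` given to NASM.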

Opcodes for test and xchg overlap

The test instruction at https://github.com/cmuratori/computer_enhance/blob/main/perfaware/sim86/sim86_instruction_table.inl#L145 should have opcode 1000010 without a D flag, according to Table 4-31 in the manual. Table 4-25 (incorrectly) lists it with a different opcode and a D flag, so it is understandable how it got missed. I was playing around with code generation and noticed that a table extracted from the .inl file could not distinguish xchg from test.
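
A small sketch of why the two encodings collide when test is given a D bit. The function names are hypothetical; the opcode values are the 1000010w and 1000011w patterns for the register/memory forms.

```cpp
#include <cstdint>

// TEST register/memory and register: 1000010w (0x84, 0x85).
inline bool IsTestRegMem(std::uint8_t Opcode)
{
    return (Opcode & 0xFE) == 0x84;
}

// XCHG register/memory with register: 1000011w (0x86, 0x87).
inline bool IsXchgRegMem(std::uint8_t Opcode)
{
    return (Opcode & 0xFE) == 0x86;
}

// What a 100001dw pattern for TEST would match: 0x84 through 0x87, which
// swallows both XCHG opcodes.
inline bool MatchesTestWithDBit(std::uint8_t Opcode)
{
    return (Opcode & 0xFC) == 0x84;
}
```

With the seven-bit pattern the two predicates are disjoint; with the six-bit pattern plus D, 0x86 and 0x87 match both instructions.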

Undefined behavior in circular buffer

The circular buffer implementation that uses page mapping, mentioned in the recent video (which is a great video, btw), behaves inconsistently at different optimization levels, which is likely caused by undefined behavior. The probable reason is the aliasing assumptions that modern compilers use aggressively to optimize code. The following code uses the circular buffer defined in perfaware/part3/listing_0121_circular_buffer_main.cpp:

int main(void)
{
    printf("Circular buffer test:\n");
    
    const size_t BUF_SIZE = 64 * 4096;

    circular_buffer Circular = AllocateCircularBuffer(BUF_SIZE, 3);
    
    if(IsValid(Circular))
    {
        u8 *Data = Circular.Base.Data + BUF_SIZE;

        Data[0] = 1;
        Data[BUF_SIZE] = 2;

        printf("%u\n", Data[0]);

        DeallocateCircularBuffer(&Circular);
    }
    else
    {
        printf("  FAILED\n");
    }
    
    // NOTE(casey): Since we do not use these functions in this particular build, we reference their pointers
    // here to prevent the compiler from complaining about "unused functions".
    (void)&IsInBounds;
    (void)&AreEqual;
    (void)&AllocateBuffer;
    (void)&FreeBuffer;
    
    return 0;
}

With optimizations off (cl /Od, g++ -O0, clang++ -O0), this code produces the expected output on each compiler:

Circular buffer test:
2

But it gives the following output when optimizations are on (cl /O2, g++ -O2, clang++ -O2):

Circular buffer test:
1

It seems the compilers assume that writing to Data[BUF_SIZE] cannot possibly affect the value of Data[0], so they pass the known value of Data[0] directly to printf.
Here is the assembly generated with g++ -O2 (g++ version 13.1, mingw-w64):

   140007eba:   c6 80 00 00 04 00 01    mov    BYTE PTR [rax+0x40000],0x1   ; write 1 to Data[0]
   140007ec1:   48 8d 0d 8b 21 00 00    lea    rcx,[rip+0x218b]
   140007ec8:   ba 01 00 00 00          mov    edx,0x1                      ; put 1 directly into printf args
   140007ecd:   c6 80 00 00 08 00 02    mov    BYTE PTR [rax+0x80000],0x2   ; write 2 to Data[BUF_SIZE]
   140007ed4:   e8 f7 fd ff ff          call   140007cd0 <_Z6printfPKcz>    ; call printf

And here is the assembly generated with g++ -O0:

   140001aec:   c6 00 01                mov    BYTE PTR [rax],0x1           ; write 1 to Data[0]
   140001aef:   48 8b 45 f0             mov    rax,QWORD PTR [rbp-0x10]
   140001af3:   48 05 00 00 04 00       add    rax,0x40000
   140001af9:   c6 00 02                mov    BYTE PTR [rax],0x2           ; write 2 to Data[BUF_SIZE]
   140001afc:   48 8b 45 f0             mov    rax,QWORD PTR [rbp-0x10]
   140001b00:   0f b6 00                movzx  eax,BYTE PTR [rax]           ; read Data[0] again
   140001b03:   0f b6 c0                movzx  eax,al
   140001b06:   89 c2                   mov    edx,eax                      ; put the value of Data[0] into printf args
   140001b08:   48 8d 05 6b 85 00 00    lea    rax,[rip+0x856b]
   140001b0f:   48 89 c1                mov    rcx,rax
   140001b12:   e8 39 68 00 00          call   140008350 <_Z6printfPKcz>    ; call printf

Sorry if this is not the right place to discuss this, but YouTube comments are disabled and Computerenhance comments are for subscribers only. I believe it should be mentioned somewhere that this kind of circular buffer is not really safe to use with modern compilers unless someone figures out how to reliably tell the compiler that this kind of page manipulation is involved.
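
One workaround sometimes used for memory that can change behind the compiler's back is to route accesses through volatile-qualified pointers. The sketch below is an assumption on my part, not something the listing does, and it has not been tested against the page-mapped buffer; `LoadByte` and `StoreByte` are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads one byte through a volatile-qualified pointer so
// the compiler has to emit a load instead of reusing a value it stored earlier.
inline std::uint8_t LoadByte(const std::uint8_t *Data, std::size_t Index)
{
    return *static_cast<const volatile std::uint8_t *>(Data + Index);
}

// Hypothetical helper: writes one byte through a volatile-qualified pointer so
// the store is not reordered against other volatile accesses.
inline void StoreByte(std::uint8_t *Data, std::size_t Index, std::uint8_t Value)
{
    *static_cast<volatile std::uint8_t *>(Data + Index) = Value;
}
```

In the test program above, the idea would be to replace the two stores with `StoreByte(Data, 0, 1)` and `StoreByte(Data, BUF_SIZE, 2)` and print `LoadByte(Data, 0)`. This trades away some optimization on every access, so it is a sketch of the idea and not a recommendation.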

Is underflow OK?

I have a hard time understanding why you wrote the Tester->TimeAccumulatedOnThisTest -= ReadCPUTimer(); part of your code.
After trying to find a different answer, I conclude that if Tester->TimeAccumulatedOnThisTest is 0 (which apparently it can be for the first Tester), it will "underflow".
I presume that we don't care because, even if the underflow is not consistent across platforms, the behaviour is consistent on the platform it runs on.
But what I don't get is that on the first repetition of a specific Tester, you will get a TSCElapsed for a single read that is not representative of the actual TSC count the read takes, because:
image

I am certain that if you did it this way, it is because the code doesn't care about it. But I need someone to point out the obvious to me.

In my code I did:

static void BeginTime(repetition_tester *Tester)
{
    Tester->TSCLastRepetition = ReadCPUTimer();
}

static void EndTime(repetition_tester *Tester)
{
    Tester->TSCLastRepetition = ReadCPUTimer() - Tester->TSCLastRepetition; 
}

Thank you to anyone who can help me understand that part of Casey's source code. As Computer Enhance is about the CPU and not the source code, this is the only place where I can post it.
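
A minimal sketch of why the subtract-then-add pattern still produces the elapsed time, assuming the accumulator is an unsigned 64-bit integer (an assumption about the tester's field type). The names below are hypothetical. Unsigned arithmetic in C++ is defined to wrap modulo 2^64, so the intermediate "underflow" cancels out once the end timestamp is added.

```cpp
#include <cstdint>

// Hypothetical accumulator mirroring the "-= at begin, += at end" pattern.
struct time_accumulator
{
    std::uint64_t Accumulated;
};

inline void BeginTimeSketch(time_accumulator *Acc, std::uint64_t Now)
{
    // May wrap below zero; unsigned arithmetic is modulo 2^64, so this is
    // well defined.
    Acc->Accumulated -= Now;
}

inline void EndTimeSketch(time_accumulator *Acc, std::uint64_t Now)
{
    // Wraps back: (Accumulated - Begin) + End == Accumulated + (End - Begin).
    Acc->Accumulated += Now;
}
```

Starting from 0, a begin at timestamp 100 and an end at 250 leaves 150 in the accumulator, and further begin/end pairs keep adding their own elapsed counts, so several timed blocks can be summed within one repetition.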

Calls lose far for inter-segment addresses

Both FF 52 C6 and FF 5A C6 print as call word [bp+si-58].

But for FF 5A C6 it should be call far word [bp+si-58] to indicate that it is an inter-segment call.

Similarly for jumps:
FF 25 and FF 2D both give jmp word [di].
But the second one should be jmp far word [di], as it is an inter-segment jump.
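
The difference between the byte pairs above lies in the reg field of the ModR/M byte that follows FF. A minimal sketch of that check; `IsFarIndirectTransfer` is a hypothetical helper, not a sim86 function.

```cpp
#include <cstdint>

// Hypothetical helper: for an FF-group instruction, the reg field (bits 5..3
// of the ModR/M byte) selects the operation. 2 is an indirect call within the
// segment, 3 an indirect intersegment call, 4 an indirect jump within the
// segment, and 5 an indirect intersegment jump.
inline bool IsFarIndirectTransfer(std::uint8_t ModRM)
{
    std::uint8_t Reg = (ModRM >> 3) & 0x7;
    return (Reg == 3) || (Reg == 5);
}
```

For the examples above, 52 and 25 have reg fields 2 and 4 (near), while 5A and 2D have reg fields 3 and 5 (far), so a printer could emit the far keyword whenever this returns true.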

Consider doing retf for inter-segment returns

CA 98 44 and C2 98 44 disassemble to the same ret 17560 instruction.
But the first is actually an inter-segment return and the second an intra-segment return; they use different opcodes.
They are typically written as retn and retf in asm to specify which one you want. NASM knows retn and retf.

Similarly, CB should be retf and C3 should be retn, but sim86 prints ret for both.
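
A minimal sketch of the opcode-to-mnemonic mapping this would need. `ReturnMnemonic` is a hypothetical helper, not part of sim86.

```cpp
#include <cstdint>

// Hypothetical helper: picks an unambiguous mnemonic for the four return
// opcodes. Returns nullptr for anything else.
inline const char *ReturnMnemonic(std::uint8_t Opcode)
{
    switch(Opcode)
    {
        case 0xC2: // within segment, adding immediate to SP
        case 0xC3: // within segment
            return "retn";
        case 0xCA: // intersegment, adding immediate to SP
        case 0xCB: // intersegment
            return "retf";
        default:
            return nullptr;
    }
}
```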

Bad size prefix for moving segment reg to memory

The input bytes 8C 40 3B decode and print as mov byte [bx+si+59], es, which is nonsense: a byte cannot be moved from a segment register. There should be no byte size prefix. Plain mov [bx+si+59], es assembles back to 8C 40 3B correctly.

Always print immediates as unsigned in the disassembly

I'm perhaps late to the party.

I have a problem with how immediates are printed in the disassembly.

As we can't differentiate between positive and negative immediates in the binary, the last -90 seems out of place. I would appreciate it if it were printed as unsigned 65446.

The immediate is printed as a signed integer here:
https://github.com/cmuratori/computer_enhance/blob/main/perfaware/sim86/sim86_text.cpp#L128

And this is how it's read:
https://github.com/cmuratori/computer_enhance/blob/main/perfaware/sim86/sim86_decode.cpp#L62

When we read wide, 2 bytes are read and stored into the lower word of the 32-bit Result. No sign extension happens if the word has bit 15 set, hence this 16-bit immediate is always printed as a positive 32-bit integer.
However, in the sign-extension case we sign-extend all 32 bits of Result, so this immediate can be printed as a negative 32-bit integer.
I think it would be more appropriate to sign-extend only up to a word boundary and always print as unsigned. What do you think?
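
A minimal sketch of the proposal, with hypothetical names (`ImmediateAsWord`, `PrintImmediate`) that do not exist in sim86: sign extension stops at bit 15, and the result is printed as an unsigned 16-bit value.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: widens an immediate to 16 bits. A byte immediate is
// sign-extended only when the encoding asks for it; nothing reaches past
// bit 15.
inline std::uint16_t ImmediateAsWord(std::uint16_t Raw, bool IsWide, bool SignExtend)
{
    if(IsWide)
    {
        return Raw;
    }

    std::uint8_t Low = static_cast<std::uint8_t>(Raw);
    if(SignExtend)
    {
        return static_cast<std::uint16_t>(static_cast<std::int8_t>(Low));
    }
    return Low;
}

// Prints the 16-bit value as unsigned decimal.
inline void PrintImmediate(std::FILE *Dest, std::uint16_t Value)
{
    std::fprintf(Dest, "%u", static_cast<unsigned>(Value));
}
```

With this, the byte A6 (-90 as a signed byte) sign-extends to FFA6 and prints as 65446, the same text a wide immediate of FFA6 would produce.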

REPNE prefix not handled

repne.binary.txt

$ sim86_clang_debug.exe repne.binary.txt
; repne.binary.txt disassembly:
bits 16
rep cmpsb
rep scasb
rep cmpsw
rep scasw
rep cmpsb
rep scasb
rep movsw
rep cmpsw
rep scasw
$ ndisasm.exe repne.binary.txt
00000000  F3A6              repe cmpsb
00000002  F3AE              repe scasb
00000004  F3A7              repe cmpsw
00000006  F3AF              repe scasw
00000008  F2A6              repne cmpsb
0000000A  F2AE              repne scasb
0000000C  F2A5              repne movsw
0000000E  F2A7              repne cmpsw
00000010  F2AF              repne scasw

REPE is the same as REP for the cmps/scas instructions.

Basically, movs/stos/lods use the REP prefix.
cmps/scas use REPE/REPZ (which has the same encoding as REP) or REPNE/REPNZ.
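
A minimal sketch of how a printer could pick the prefix text. `RepeatPrefixText` is a hypothetical helper; printing F2 as repne even in front of movsw follows the ndisasm output above.

```cpp
#include <cstdint>

// Hypothetical helper: text for a repeat prefix given the string opcode that
// follows it. Returns nullptr if Prefix is not F2 or F3.
inline const char *RepeatPrefixText(std::uint8_t Prefix, std::uint8_t StringOpcode)
{
    // cmps is A6/A7 and scas is AE/AF; only these two test the zero flag.
    bool ComparesFlags = ((StringOpcode & 0xFE) == 0xA6) ||
                         ((StringOpcode & 0xFE) == 0xAE);
    switch(Prefix)
    {
        case 0xF3: return ComparesFlags ? "repe" : "rep";
        case 0xF2: return "repne";
        default:   return nullptr;
    }
}
```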

Is listing_0101_read_bandwidth_main.cpp in the wrong folder?

I may have guessed wrong, but it seems that you misplaced listing_0101_read_bandwidth_main.cpp in the part 2 folder instead of part 3. But as the TOC in https://www.computerenhance.com/p/table-of-contents is broken for me, I cannot be sure. If it were obvious I wouldn't mind, but it is quite confusing for me.
Edit: You actually have it duplicated in both folders. As I said, that's fine, but with the TOC broken it was confusing.
Edit: TOC has been repaired today.

Thank you !

DllNotFoundException running C# sim86_test.cs (Ubuntu)

Hi,

When I follow the instructions at the top of sim86_test.cs, i.e. copying sim86_shared_debug.dll next to sim86.cs and sim86.test, etc., and running dotnet run from that directory, I get the following exception:

Exception has occurred: CLR/System.DllNotFoundException
An unhandled exception of type 'System.DllNotFoundException' occurred in sim86.dll: 'Unable to load shared library 'sim86_shared_debug' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: libsim86_shared_debug: cannot open shared object file: No such file or directory'
   at Sim86.Native.Sim86_GetVersion()
   at Sim86.GetVersion() in /home/ethan/repos/computer_enhance/perfaware/sim86/shared/contrib_csharp/sim86.cs:line 291
   at Program.<Main>$(String[] args) in /home/ethan/repos/computer_enhance/perfaware/sim86/shared/contrib_csharp/sim86_test.cs:line 23

I'm new to calling into C++ DLLs, so maybe I'm missing something obvious, but I figured I'd post here in case anyone can help.

I'm running it in vscode on Ubuntu 20.04

How to view ASM output - CLion

Hey,

I mostly use CLion for C/C++ and I noticed the instruction markdown for how to view ASM does not mention it.

This could be covered inside the "Using a debugger" section along with Visual Studio, but someone also made a cool Compiler Explorer plugin.

Would that be helpful/interesting to add in a PR?

Sim DLL doesn't work on Apple M1 running Ventura 13.6

I think the issue is that macOS wants binary files in Mach-O format, which the DLLs aren't. I found a post that seems to confirm this.

I will try to build it myself in the meantime. If I get it working, would it be helpful to post the macOS DLL?

Thank you!

Full error for context:

Traceback (most recent call last):
  File "/Users/username/dev/computer-enhance/sim86_test.py", line 1, in <module>
    import sim86
  File "/Users/username/dev/computer-enhance/sim86.py", line 202, in <module>
    dll = ctypes.CDLL(str(pathlib.Path(__file__).parent / "sim86_shared_debug.dll"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(/Users/username/dev/computer-enhance/sim86_shared_debug.dll, 0x0006): tried: '/Users/username/dev/computer-enhance/sim86_shared_debug.dll' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/username/dev/computer-enhance/sim86_shared_debug.dll' (no such file), '/Users/username/dev/computer-enhance/sim86_shared_debug.dll' (not a mach-o file)

`[0]` mistakenly decodes to `[]`

Given the file:

; listing A
bits 16
mov ax, [0]

nasm produces A1 00 00

sim86 decodes that to:

; listing A disassembly
bits 16
mov ax, []

which nasm cannot assemble (error: expression syntax error).

I would expect it to produce listing A.

This happens because PrintEffectiveAddressExpression of an effective_address_expression with no terms and a 0 Displacement doesn't print anything due to this conditional:

    if(Address.Displacement != 0)
    {
        fprintf(Dest, "%+d", Address.Displacement);
    }
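
One way the conditional could be adjusted, sketched with a simplified, hypothetical `FormatAddress` in place of the real PrintEffectiveAddressExpression: when there are no register terms, the displacement is the whole address and is always printed.

```cpp
#include <cstdio>
#include <string>

// Hypothetical, simplified stand-in for the printing logic: Terms is the
// already-formatted register part ("" for a direct address).
inline std::string FormatAddress(const std::string &Terms, int Displacement)
{
    std::string Result = "[" + Terms;
    char Buffer[16];
    if(Terms.empty())
    {
        // Direct address: always print the value, even when it is 0.
        std::snprintf(Buffer, sizeof(Buffer), "%d", Displacement);
        Result += Buffer;
    }
    else if(Displacement != 0)
    {
        // Register terms present: print a signed offset only when non-zero.
        std::snprintf(Buffer, sizeof(Buffer), "%+d", Displacement);
        Result += Buffer;
    }
    return Result + "]";
}
```

Under this sketch an expression with no terms and a displacement of 0 formats as `[0]`, while `[bp]` and `[bx+si+59]` are unchanged.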

Push instruction decode has operand 0 as Operand_None and operand 1 as Operand_Register

Hi Casey,

Calling Sim86_RegisterNameFromOperand on a push r instruction, e.g. push cx, gives an instruction that has the register in Operands[1], while Operands[0] is Operand_None.

This surprised me when setting up to use your shared library to decode (which otherwise was incredibly easy to use/understand!). Looking at your code I'm afraid I can't tell for sure if this is an issue or if it is expected, so this might not be an issue at all!

Loving the course btw!
