cmuratori / computer_enhance
Source code for the https://computerenhance.com programming series
License: Other
For reference, this is my code:
std::pair<size_t, size_t> get_clocks_for_ea_and_transfers(
    effective_address_expression ea,
    size_t num_transfers,
    bool from_to_acc
) {
    auto term0_reg_info = ea.Terms[0].Register;
    auto term1_reg_info = ea.Terms[1].Register;
    // Register index 0 means "no register"; real registers are stored at Index - 1
    auto term0_reg_val = term0_reg_info.Index != 0 ? registers[term0_reg_info.Index - 1] : 0;
    auto term1_reg_val = term1_reg_info.Index != 0 ? registers[term1_reg_info.Index - 1] : 0;
    auto disp = ea.Displacement;
    auto addr = memory[term0_reg_val + term1_reg_val + disp];
    // Replacing `addr` with `disp` here yields the same numbers as Casey's listing text file
    auto clocks_transfers = 4 * num_transfers * (addr & 1);
    if (from_to_acc) {
        return {0, clocks_transfers};
    }
    auto clocks_ea = 0;
    // displacement only
    if (term1_reg_info.Index == 0 && term0_reg_info.Index == 0 && disp != 0) {
        clocks_ea = 6;
    }
    // base or index only
    if (disp == 0 && (term0_reg_info.Index > 0 || term1_reg_info.Index > 0)) {
        clocks_ea = 5;
    }
    // displacement + base or index
    if (term0_reg_info.Index > 0 && term1_reg_info.Index == 0 && disp != 0) {
        clocks_ea = 9;
    }
    // bp + di / bx + si
    if ((term0_reg_info.Index == 6 && term1_reg_info.Index == 8)
        || (term0_reg_info.Index == 2 && term1_reg_info.Index == 7)) {
        clocks_ea = disp == 0 ? 7 : 11;
    }
    // bp + si / bx + di
    if ((term0_reg_info.Index == 6 && term1_reg_info.Index == 7)
        || (term0_reg_info.Index == 2 && term1_reg_info.Index == 8)) {
        clocks_ea = disp == 0 ? 8 : 12;
    }
    return {clocks_ea, clocks_transfers};
}
My output when calculating using addr and not disp:
; ip: 0x0000 ; bx: 0x0000 -> 0x03e8 ; clocks: +4 = 4
mov bx, 1000
; ip: 0x0003 ; bp: 0x0000 -> 0x07d0 ; clocks: +4 = 8
mov bp, 2000
; ip: 0x0006 ; si: 0x0000 -> 0x0bb8 ; clocks: +4 = 12
mov si, 3000
; ip: 0x0009 ; di: 0x0000 -> 0x0fa0 ; clocks: +4 = 16
mov di, 4000
; ip: 0x000c ; cx: 0x0000 -> 0x0000 ; clocks: +15 (8 + 7ea) = 31
mov cx, [bp + di]
; ip: 0x000e ; [bx + si]: 0x0000 -> 0x0000 ; clocks: +16 (9 + 7ea) = 47
mov [bx + si], cx
; ip: 0x0010 ; cx: 0x0000 -> 0x0000 ; clocks: +16 (8 + 8ea) = 63
mov cx, [bp + si]
; ip: 0x0012 ; [bx + di]: 0x0000 -> 0x0000 ; clocks: +17 (9 + 8ea) = 80
mov [bx + di], cx
; ip: 0x0014 ; cx: 0x0000 -> 0x0000 ; clocks: +19 (8 + 11ea) = 99
mov cx, [bp + di + 1000]
; ip: 0x0018 ; [bx + si + 1000]: 0x0000 -> 0x0000 ; clocks: +20 (9 + 11ea) = 119
mov [bx + si + 1000], cx
; ip: 0x001c ; cx: 0x0000 -> 0x0000 ; clocks: +20 (8 + 12ea) = 139
mov cx, [bp + si + 1000]
; ip: 0x0020 ; [bx + di + 1000]: 0x0000 -> 0x0000 ; clocks: +21 (9 + 12ea) = 160
mov [bx + di + 1000], cx
; ip: 0x0024 ; dx: 0x0000 -> 0x0000 | SF: 0 -> 0 ZF: 0 -> 1 ; clocks: +21 (9 + 12ea) = 181
add dx, [bp + si + 1000]
; ip: 0x0028 ; [bp + si]: 0x0000 -> 0x004c | SF: 0 -> 0 ZF: 1 -> 1 ; clocks: +25 (17 + 8ea) = 206
add [bp + si], 76
; ip: 0x002b ; dx: 0x0000 -> 0x0000 | SF: 0 -> 0 ZF: 1 -> 1 ; clocks: +21 (9 + 12ea) = 227
add dx, [bp + si + 1001]
; ip: 0x002f ; [di + 999]: 0x4c00 -> 0x4c00 | SF: 0 -> 0 ZF: 1 -> 0 ; clocks: +25 (16 + 9ea) = 252
add [di + 999], dx
; ip: 0x0033 ; [bp + si]: 0x004c -> 0x0097 | SF: 0 -> 0 ZF: 0 -> 1 ; clocks: +33 (17 + 8ea + 8odd) = 285
add [bp + si], 75
Final registers:
bx: 0x03e8 (1000)
bp: 0x07d0 (2000)
si: 0x0bb8 (3000)
di: 0x0fa0 (4000)
ip: 0x0036 (54)
Please explain.
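One thing that may be worth checking: the 8086 manual adds 4 clocks for each 16-bit word transfer at an odd address, so the term depends on the address itself. The posted code instead tests the low bit of the value loaded from memory[...]. Below is a minimal sketch of the address-based version. The function name and parameters are hypothetical, and whether the reference listing derives the penalty from the full effective address or only from the displacement is not settled by this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not taken from the reference simulator.
// Returns the extra clocks for word transfers at an odd address:
// 4 clocks per 16-bit transfer when the address is odd, otherwise 0.
inline std::size_t odd_address_penalty(std::uint16_t effective_address,
                                       std::size_t num_word_transfers)
{
    // Test the low bit of the address, not of the byte stored at that address.
    return 4 * num_word_transfers * (effective_address & 1u);
}
```

With this helper, an access at address 1001 with one transfer costs 4 extra clocks, and any access at an even address costs none.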
In Q+A 21 for question [00:13], you reviewed the microcode and concluded (at around [25:40]) that a displacement of 0 would still go through the motions in an effective address calculation, and thus take 9 cycles instead of 5. Were you planning on changing listing 56 to reflect this (and I guess your simulator as well)?
So
should really be
mov cx, [bp] ; Clocks: +17 = 66 (8 + 9ea) | ip:0x17->0x1a
and
mov cx, [bp] ; Clocks: +21 = 78 (8 + 9ea + 4p) | ip:0x17->0x1a
I bring this up because in order to satisfy the listings as they currently are in my simulator code, I had to add extra logic to check if the displacement value was 0, and if so, discard the displacement cycles.
It seems to me that perhaps the reference simulator treats a displacement of 0 and 'no displacement' as the same thing due to the if check simply checking that the value is not 0 rather than checking for existence. Perhaps this is the only reason why the reference simulator showed 5 cycles instead of 9 in the first place:
computer_enhance/perfaware/sim86/sim86_cycles.cpp
Lines 65 to 68 in 15e0e7b
I'm fine with "correct, but won't fix," but I just wanted to point this out in case others got confused like I did, and to verify that I'm understanding this correctly.
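To make the "existence versus value" distinction concrete, here is a rough sketch that tracks whether a displacement was encoded at all, separately from its value. The struct and function are hypothetical and much simpler than the reference simulator's types; only the single-register cases from the 8086 EA table (5 clocks for base or index only, 9 clocks for displacement plus base or index) are covered.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical operand for illustration only.
// An empty optional means "no displacement bytes were encoded".
struct single_register_ea
{
    std::optional<std::int16_t> displacement;
};

// Clock estimate for an EA made of one base or index register.
inline int single_register_ea_clocks(const single_register_ea &ea)
{
    // Presence decides the cost, so [bp + 0] is charged as base + displacement.
    return ea.displacement.has_value() ? 9 : 5;
}
```

Under this model an encoded displacement of 0 yields 9 clocks, while an operand with no displacement at all yields 5.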
So I could totally be misunderstanding, but I don't think the reference decoder is handling "Call direct intersegment"
If I give NASM this assembly
bits 16 ; or cpu 8086 - same result
call 999:888
I get these bytes (hex)
9A 78 03 E7 03
Giving this to the reference decoder I get this output
bits 16
call 231
ERROR: Instruction extends outside disassembly region
Unless this isn't an 8086 instruction? But those bytes do look to me like that instruction.
(Built on Windows 10 with MSVC. Other listings work as expected. I believe the same also applies to "Jump direct intersegment".)
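For reference, a minimal sketch of how those five bytes break down, assuming the 8086 layout of opcode 9A followed by a little-endian 16-bit offset and then a little-endian 16-bit segment. The type and function names are made up for this example and are not part of sim86.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical result type for this sketch.
struct far_pointer
{
    std::uint16_t segment;
    std::uint16_t offset;
};

// Decodes "call direct intersegment": 9A, offset-lo, offset-hi, seg-lo, seg-hi.
// Returns an empty optional if the opcode or length does not match.
inline std::optional<far_pointer> decode_far_call(const std::uint8_t *bytes,
                                                  std::size_t count)
{
    if (count < 5 || bytes[0] != 0x9A)
    {
        return std::nullopt;
    }
    far_pointer result;
    result.offset = static_cast<std::uint16_t>(bytes[1] | (bytes[2] << 8));
    result.segment = static_cast<std::uint16_t>(bytes[3] | (bytes[4] << 8));
    return result;
}
```

Fed the bytes 9A 78 03 E7 03, this gives segment 999 and offset 888, which matches the call 999:888 that NASM was given.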
The test instruction at https://github.com/cmuratori/computer_enhance/blob/main/perfaware/sim86/sim86_instruction_table.inl#L145 should have opcode 1000010 without a D flag, according to table 4-31 in the manual. Table 4-25 (incorrectly) has it with a different opcode and a D flag, so it is understandable how it got missed. I was playing around with code generation and noticed it getting confused distinguishing xchg and test in a table that was extracted from the .inl file.
In the homework, I believe this line of assembly is wrong.
According to the 8086 manual, there is no source address calculation using the register bp on its own, since the corresponding encoding is instead used for a direct address.
Input bytes E9 39 0A produce jmp, but should output jmp 0xa3c
Input bytes E8 16 2E produce call, but should output call 0x2e19
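The expected targets follow from the usual rule for near direct jumps and calls: the 16-bit displacement is added to the address of the next instruction. A small sketch, assuming a 3-byte instruction (opcode plus two displacement bytes) located at the given offset; the function name is hypothetical.

```cpp
#include <cstdint>

// Target of a near direct jmp (E9) or call (E8) with a 16-bit displacement.
// instruction_offset is where the opcode byte sits; arithmetic wraps at 16 bits.
inline std::uint16_t near_branch_target(std::uint16_t instruction_offset,
                                        std::uint8_t disp_lo,
                                        std::uint8_t disp_hi)
{
    const std::uint16_t displacement =
        static_cast<std::uint16_t>(disp_lo | (disp_hi << 8));
    // IP already points past the 3-byte instruction when the displacement is applied.
    const std::uint16_t next_ip = static_cast<std::uint16_t>(instruction_offset + 3);
    return static_cast<std::uint16_t>(next_ip + displacement);
}
```

With the instruction at offset 0, E9 39 0A gives 0x0a39 + 3 = 0xa3c and E8 16 2E gives 0x2e16 + 3 = 0x2e19.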
The circular buffer implementation that involves page mapping, mentioned in the recent video (which is a great video btw), behaves inconsistently at different optimization levels, which is likely caused by undefined behavior. The reason is probably the aliasing rules that modern compilers use aggressively to optimize code. The following code uses the circular buffer defined in perfaware/part3/listing_0121_circular_buffer_main.cpp:
int main(void)
{
    printf("Circular buffer test:\n");
    const size_t BUF_SIZE = 64 * 4096;
    circular_buffer Circular = AllocateCircularBuffer(BUF_SIZE, 3);
    if(IsValid(Circular))
    {
        u8 *Data = Circular.Base.Data + BUF_SIZE;
        Data[0] = 1;
        Data[BUF_SIZE] = 2;
        printf("%u\n", Data[0]);
        DeallocateCircularBuffer(&Circular);
    }
    else
    {
        printf(" FAILED\n");
    }
    // NOTE(casey): Since we do not use these functions in this particular build, we reference their pointers
    // here to prevent the compiler from complaining about "unused functions".
    (void)&IsInBounds;
    (void)&AreEqual;
    (void)&AllocateBuffer;
    (void)&FreeBuffer;
    return 0;
}
This code outputs the following (which is the expected result) on each compiler with optimizations off (cl /Od, g++ -O0, clang++ -O0):
Circular buffer test:
2
But it gives the following output when optimizations are on (cl /O2, g++ -O2, clang++ -O2):
Circular buffer test:
1
It seems like compilers assume that writing to Data[BUF_SIZE] could not possibly affect the value of Data[0], so they can safely put the known value of Data[0] directly into printf.
Here is the assembly generated with g++ -O2 (g++ version 13.1, mingw-w64):
140007eba: c6 80 00 00 04 00 01 mov BYTE PTR [rax+0x40000],0x1 ; write 1 to Data[0]
140007ec1: 48 8d 0d 8b 21 00 00 lea rcx,[rip+0x218b]
140007ec8: ba 01 00 00 00 mov edx,0x1 ; put 1 directly into printf args
140007ecd: c6 80 00 00 08 00 02 mov BYTE PTR [rax+0x80000],0x2 ; write 2 to Data[BUF_SIZE]
140007ed4: e8 f7 fd ff ff call 140007cd0 <_Z6printfPKcz> ; call printf
And here is the assembly generated with g++ -O0
140001aec: c6 00 01 mov BYTE PTR [rax],0x1 ; write 1 to Data[0]
140001aef: 48 8b 45 f0 mov rax,QWORD PTR [rbp-0x10]
140001af3: 48 05 00 00 04 00 add rax,0x40000
140001af9: c6 00 02 mov BYTE PTR [rax],0x2 ; write 2 to Data[BUF_SIZE]
140001afc: 48 8b 45 f0 mov rax,QWORD PTR [rbp-0x10]
140001b00: 0f b6 00 movzx eax,BYTE PTR [rax] ; read Data[0] again
140001b03: 0f b6 c0 movzx eax,al
140001b06: 89 c2 mov edx,eax ; put the value of Data[0] into printf args
140001b08: 48 8d 05 6b 85 00 00 lea rax,[rip+0x856b]
140001b0f: 48 89 c1 mov rcx,rax
140001b12: e8 39 68 00 00 call 140008350 <_Z6printfPKcz> ; call printf
Sorry if this is not the right place to discuss this, but YouTube comments are disabled, and Computerenhance comments are for subscribers only. But I believe it should be mentioned somewhere that this kind of circular buffer is not really safe to use with modern compilers unless someone figures out how to reliably tell the compiler that this kind of page manipulation is involved.
I have a hard time understanding why you did the Tester->TimeAccumulatedOnThisTest -= ReadCPUTimer(); part of your code.
To me, and after trying to find a different answer, I conclude that if Tester->TimeAccumulatedOnThisTest is 0 (apparently it can be on the first Tester) you will "underflow".
I presume that we don't care because even if the underflow is not consistent across platforms, the behaviour is consistent on the platform it runs on.
But what I don't get is that on the first repetition of a specific Tester, you will get a TSCElapsed for a single read that is not representative of the actual TSC it takes to read, because:
I am certain that if you did it this way, the reason is that the code doesn't care about it. But I need someone to point out the obvious to me.
In my code I did:
static void BeginTime(repetition_tester *Tester)
{
    Tester->TSCLastRepetition = ReadCPUTimer();
}
static void EndTime(repetition_tester *Tester)
{
    Tester->TSCLastRepetition = ReadCPUTimer() - Tester->TSCLastRepetition;
}
Thank you to anyone who can help me understand that part of Casey's source code. As Computer Enhance is about the CPU and not the source code, this is the only place where I can post it.
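One way to see why the subtract-then-add pattern works: with an unsigned 64-bit accumulator, arithmetic wraps modulo 2^64, which is well-defined in C++, so the temporary "underflow" after the subtraction is cancelled by the later addition. The sketch below illustrates the idea only. The struct and function names are hypothetical, and the timestamps are passed in as plain numbers instead of being read from ReadCPUTimer().

```cpp
#include <cstdint>

// Hypothetical accumulator standing in for TimeAccumulatedOnThisTest.
struct timer_accumulator
{
    std::uint64_t elapsed = 0;
};

// Subtract the start timestamp; the value may wrap around, which is fine
// because unsigned arithmetic is defined to wrap modulo 2^64.
inline void begin_time(timer_accumulator &timer, std::uint64_t now)
{
    timer.elapsed -= now;
}

// Add the end timestamp; the net change is (end - start).
inline void end_time(timer_accumulator &timer, std::uint64_t now)
{
    timer.elapsed += now;
}
```

Starting from 0, a begin at 100 and an end at 150 leaves 50 in the accumulator, and a second begin/end pair at 1000 and 1025 brings the total to 75. The intermediate wrapped value is never read, so several timed blocks can be summed into one field without a separate "start" variable.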
A direct intersegment jump causes a jmp and a push to be generated.
Example input:
JMP 0x5566:0x7788 ;ea 66 55 88 77
Example hexdump:
00000000: ea88 7766 55 ..wfU
sim86 output:
bits 16
jmp 102
push bp
Both FF 52 C6 and FF 5A C6 print as call word [bp+si-58]
But for FF 5A C6 it should be call far word [bp+si-58] to indicate that it is an inter-segment call.
Similarly for jumps:
FF 25 and FF 2D give jmp word [di]
But the second one should be jmp far word [di], as it is an inter-segment jump.
CA 98 44 and C2 98 44 disassemble to the same ret 17560 instruction.
But actually the first (CA) is an inter-segment return and the second (C2) is an intra-segment return; they use different opcodes.
Typically they are written as retn and retf in asm to specify which one you want. NASM knows retn and retf.
Similarly, CB should be retf and C3 should be retn, but sim86 prints ret for both.
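A small sketch of the mapping being asked for, based on the 8086 opcode assignments (C2 and C3 are the within-segment returns, CA and CB the intersegment ones). The function name is hypothetical and this is not how sim86 is structured.

```cpp
#include <cstdint>
#include <string>

// Picks a NASM-style mnemonic for the four 8086 return opcodes.
// Returns an empty string for any other opcode.
inline std::string return_mnemonic(std::uint8_t opcode)
{
    switch (opcode)
    {
        case 0xC2: // near return, pops imm16 extra bytes
        case 0xC3: // near return
            return "retn";
        case 0xCA: // far return, pops imm16 extra bytes
        case 0xCB: // far return
            return "retf";
        default:
            return "";
    }
}
```

With this mapping, CA 98 44 would print as retf 17560 and C2 98 44 as retn 17560.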
Input bytes 8C 40 3B decode and print as mov byte [bx+si+59], es, which is nonsense: a byte cannot be moved from a segment register. There should be no byte prefix. Just mov [bx+si+59], es assembles back to 8C 40 3B correctly.
I'm perhaps late to the party.
I have a problem with how immediates are printed in the disassembly.
Binary:
BA 88 88
Your reference sim86 disassembles it as:
mov dx, 34952
Binary
81 C3 40 9C
83 C1 A6
Your reference sim86 disassembles it as:
add cx, 40000
add cx, -90
As we can't differentiate between positive and negative immediates in the binary, the last -90 seems to be out of place. I would appreciate it if it were printed as unsigned 65446.
Immediate is printed as signed integer here:
https://github.com/cmuratori/computer_enhance/blob/main/perfaware/sim86/sim86_text.cpp#L128
And this is how it's read:
https://github.com/cmuratori/computer_enhance/blob/main/perfaware/sim86/sim86_decode.cpp#L62
When we read wide, 2 bytes are read and stored into lower word of 32-bit Result. No sign extension happens if the word has bit 15 set, hence this 16-bit immediate would be always printed as a positive 32-bit integer.
However in case of sign extension we sign extend all 32-bits of Result. This immediate could be printed out as a negative 32-bit integer.
I think it would be more appropriate to sign extend only up to a word boundary and always print as unsigned. What do you think?
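As an illustration of the proposal, here is a minimal sketch that sign-extends an 8-bit immediate only up to 16 bits and keeps the result unsigned. The function name is made up for this example, and it is not the code in sim86_decode.cpp.

```cpp
#include <cstdint>

// Sign-extends a byte immediate to a 16-bit word and returns it unsigned,
// so the printed value never goes negative or beyond 65535.
inline std::uint16_t sign_extend_byte_to_word(std::uint8_t value)
{
    // If bit 7 is set, fill the upper byte with ones; otherwise leave it zero.
    return (value & 0x80u) ? static_cast<std::uint16_t>(0xFF00u | value)
                           : static_cast<std::uint16_t>(value);
}
```

For the A6 byte in 83 C1 A6, this yields 0xFFA6, which prints as 65446 instead of -90.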
$ sim86_clang_debug.exe repne.binary.txt
; repne.binary.txt disassembly:
bits 16
rep cmpsb
rep scasb
rep cmpsw
rep scasw
rep cmpsb
rep scasb
rep movsw
rep cmpsw
rep scasw
$ ndisasm.exe repne.binary.txt
00000000 F3A6 repe cmpsb
00000002 F3AE repe scasb
00000004 F3A7 repe cmpsw
00000006 F3AF repe scasw
00000008 F2A6 repne cmpsb
0000000A F2AE repne scasb
0000000C F2A5 repne movsw
0000000E F2A7 repne cmpsw
00000010 F2AF repne scasw
REPE is the same as REP for the cmps/scas instructions.
Basically movs/stos/lods use REP prefix.
cmps/scas use REPE/REPZ (which has same encoding as REP) or REPNE/REPNZ.
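A rough sketch of a prefix-naming rule that would reproduce the ndisasm output above, assuming the standard string opcodes A6/A7 (cmps) and AE/AF (scas). The function name is hypothetical, and printing F2 as repne even in front of movs simply mirrors what ndisasm shows.

```cpp
#include <cstdint>
#include <string>

// Chooses the textual name of a repeat prefix (F2 or F3) based on the
// string instruction opcode that follows it.
inline std::string rep_prefix_name(std::uint8_t prefix, std::uint8_t opcode)
{
    // cmpsb, cmpsw, scasb, scasw are the instructions that test ZF.
    const bool compares = (opcode == 0xA6 || opcode == 0xA7 ||
                           opcode == 0xAE || opcode == 0xAF);
    if (prefix == 0xF2)
    {
        return "repne";
    }
    if (prefix == 0xF3)
    {
        return compares ? "repe" : "rep";
    }
    return ""; // not a repeat prefix
}
```

For example, F3 A6 becomes repe cmpsb, F2 AF becomes repne scasw, and F3 A5 stays rep movsw.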
I may have guessed wrong, but it seems that you misplaced listing_0101_read_bandwidth_main.cpp in the part 2 folder instead of part 3. But as the TOC in https://www.computerenhance.com/p/table-of-contents is broken for me, I cannot be sure. If it was obvious I wouldn't mind, but it's quite confusing for me.
Edit: You have it duplicated in both folders actually, as I said it's fine, but as the TOC is broken it was confusing
Edit: TOC has been repaired today.
Thank you !
Hi,
When I follow the instructions at the top of sim86_test.cs, ie. copying the sim86_shared_debug.dll next to sim86.cs and sim86.test, etc, and running dotnet run from that directory, I get the following exception:
Exception has occurred: CLR/System.DllNotFoundException
An unhandled exception of type 'System.DllNotFoundException' occurred in sim86.dll: 'Unable to load shared library 'sim86_shared_debug' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: libsim86_shared_debug: cannot open shared object file: No such file or directory'
at Sim86.Native.Sim86_GetVersion()
at Sim86.GetVersion() in /home/ethan/repos/computer_enhance/perfaware/sim86/shared/contrib_csharp/sim86.cs:line 291
at Program.<Main>$(String[] args) in /home/ethan/repos/computer_enhance/perfaware/sim86/shared/contrib_csharp/sim86_test.cs:line 23
I'm new to calling into C++ DLLs so maybe I'm missing something obvious, but I figured I'd post here in case anyone can help.
I'm running it in vscode on Ubuntu 20.04
Hey,
I mostly use CLion for C/C++ and I noticed the instruction markdown for how to view ASM does not mention it.
This could be covered inside the "Using a debugger section" along with Visual Studio, but someone also made a cool Compiler Explorer plugin.
Would that be helpful/interesting to add in a PR?
I think the issue is that MacOS wants binary files in "Mach-O" format, which the DLLS aren't. Found a post (here) that seems to confirm.
I will try & build myself in the mean time. If I get it working, would it be helpful to post the MacOS DLL?
Thank you!
Full error for context:
Traceback (most recent call last):
File "/Users/username/dev/computer-enhance/sim86_test.py", line 1, in <module>
import sim86
File "/Users/username/dev/computer-enhance/sim86.py", line 202, in <module>
dll = ctypes.CDLL(str(pathlib.Path(__file__).parent / "sim86_shared_debug.dll"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(/Users/username/dev/computer-enhance/sim86_shared_debug.dll, 0x0006): tried: '/Users/username/dev/computer-enhance/sim86_shared_debug.dll' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/username/dev/computer-enhance/sim86_shared_debug.dll' (no such file), '/Users/username/dev/computer-enhance/sim86_shared_debug.dll' (not a mach-o file)
Given the file:
; listing A
bits 16
mov ax, [0]
nasm produces A1 00 00
sim86 decodes that to:
; listing A disassembly
bits 16
mov ax, []
which nasm cannot assemble (error: expression syntax error).
I would expect it to produce listing A.
This happens because PrintEffectiveAddressExpression of an effective_address_expression with no terms and a 0 Displacement doesn't print anything due to this conditional:
if(Address.Displacement != 0)
{
    fprintf(Dest, "%+d", Address.Displacement);
}
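One possible shape for a fix, sketched as a standalone formatter: when there are no register terms, always print the displacement (without a sign), and otherwise print it only when it is nonzero. The function below is hypothetical and takes register names as strings; it is not the signature of PrintEffectiveAddressExpression.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Formats an effective address such as [bx+si+59], [di] or [0].
inline std::string format_effective_address(const std::vector<std::string> &terms,
                                            int displacement)
{
    std::string text = "[";
    for (std::size_t i = 0; i < terms.size(); ++i)
    {
        if (i != 0)
        {
            text += "+";
        }
        text += terms[i];
    }
    if (terms.empty())
    {
        // Direct address: the displacement is the whole expression, even when 0.
        text += std::to_string(displacement);
    }
    else if (displacement != 0)
    {
        // Mirror the "%+d" behaviour: explicit plus sign for positive values.
        if (displacement > 0)
        {
            text += "+";
        }
        text += std::to_string(displacement);
    }
    text += "]";
    return text;
}
```

With no terms and a displacement of 0 this produces [0], so mov ax, [0] would round-trip through nasm.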
Hi Casey,
Calling Sim86_RegisterNameFromOperand on a push r instruction, e.g. push cx, gives an instruction that has the register in Operands[1], while Operands[0] is Operand_None.
This surprised me when setting up to use your shared library to decode (which otherwise was incredibly easy to use/understand!). Looking at your code I'm afraid I can't tell for sure if this is an issue or if it is expected, so this might not be an issue at all!
Loving the course btw!