Comments (8)
Good idea. I tried adding it to the project, but OpenWatcom can't compile it correctly. We need to port it to OpenWatcom before we can use it 😢
from fastdoom.
It seems to be mostly due to the C99 feature of mixing declarations with statements. I am working on a fix.
Also, there is no 128-bit integer support in OpenWatcom.
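For context, libdivide's 64-bit paths rely on the high half of a 64x64-bit product, which is where a 128-bit type is normally used. A minimal sketch of doing without one, assuming `<stdint.h>` and a 64-bit `unsigned long long` are available (the name `mulhi_u64` is made up for this example and is not taken from the attached port). Declarations sit at the top of the block, so it also avoids the C99 mixing issue:

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product x * y, built from 32-bit halves.
   mulhi_u64 is a hypothetical helper name used only in this sketch. */
static uint64_t mulhi_u64(uint64_t x, uint64_t y)
{
    uint64_t x0 = x & 0xFFFFFFFFu, x1 = x >> 32;
    uint64_t y0 = y & 0xFFFFFFFFu, y1 = y >> 32;
    uint64_t p00 = x0 * y0;   /* contributes to bits 0..63   */
    uint64_t p01 = x0 * y1;   /* contributes to bits 32..95  */
    uint64_t p10 = x1 * y0;   /* contributes to bits 32..95  */
    uint64_t p11 = x1 * y1;   /* contributes to bits 64..127 */
    /* Sum of the middle column; its upper half is the carry into bit 64. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFu) + (p10 & 0xFFFFFFFFu);
    return p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

This is only needed for 64-bit dividers; the signed 32-bit case discussed in this thread needs nothing wider than `long long`.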
from fastdoom.
Well, this one works (still a work in progress).
libdivide-wc.zip
With gcc, the performance difference between plain division and fastdiv is HUGE, but on OpenWatcom fastdiv is only around 20% faster.
I guess this is because of inlining...
I guess we can extract only the signed int32/int32 functions, build them with gcc, and then inline the generated assembly code.
Maybe we also need unsigned divisions and 16-bit divisions?
Unsigned divisions are faster...
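To illustrate why the unsigned case tends to be cheaper, here is a small sketch of division by the constant 10 using the well-known multiply-and-shift constants. The function names are made up for this example, and the signed version assumes `>>` on a negative value is an arithmetic shift, which is implementation-defined in C but true of the compilers discussed here:

```c
#include <stdint.h>

/* Unsigned x / 10: one multiply and one shift.
   0xCCCCCCCD is ceil(2^35 / 10). */
static uint32_t div10_u32(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Signed x / 10, truncating toward zero: the same idea, plus a
   correction that adds 1 when x is negative.
   Assumes arithmetic right shift of negative values. */
static int32_t div10_s32(int32_t x)
{
    int32_t q = (int32_t)(((int64_t)x * 0x66666667) >> 34);
    return q + (int32_t)((uint32_t)x >> 31);
}
```

The signed path carries the extra sign fix-up, and a generic runtime divider such as `libdivide_s32_t` also has to handle negative divisors, which adds a few more instructions.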
from fastdoom.
A more minimal libdiv.h:
libdiv.zip
No SSE, AVX, etc., and no 128-bit integers.
from fastdoom.
Here is the minimal version with inline asm for OpenWatcom, C89-compatible.
The generator code can still be improved.
#define int32_t long
#define uint32_t unsigned long
#define uint8_t unsigned char
#define uint64_t unsigned long long
enum {
    LIBDIVIDE_16_SHIFT_MASK = 0x1F,
    LIBDIVIDE_32_SHIFT_MASK = 0x1F,
    LIBDIVIDE_64_SHIFT_MASK = 0x3F,
    LIBDIVIDE_ADD_MARKER = 0x40,
    LIBDIVIDE_NEGATIVE_DIVISOR = 0x80
};
struct libdivide_s32_t {
    int32_t magic;
    uint8_t more;
};
// TODO: use BSR inline asm for WATCOM
static int32_t libdivide_count_leading_zeros32(uint32_t val)
{
    int32_t result = 8;
    uint32_t hi = 0xFFU << 24;
    if (val == 0) return 32;
    while ((val & hi) == 0) {
        hi >>= 8;
        result += 8;
    }
    while (val & hi) {
        result -= 1;
        hi <<= 1;
    }
    return result;
}
// libdivide_64_div_32_to_32: divides a 64-bit uint {u1, u0} by a 32-bit
// uint {v}. The result must fit in 32 bits.
// Returns the quotient directly and the remainder in *r
static uint32_t libdivide_64_div_32_to_32(uint32_t u1, uint32_t u0, uint32_t v, uint32_t *r)
{
#if (defined(LIBDIVIDE_i386) || defined(LIBDIVIDE_X86_64)) && defined(LIBDIVIDE_GCC_STYLE_ASM)
    uint32_t result;
    __asm__("divl %[v]" : "=a"(result), "=d"(*r) : [v] "r"(v), "a"(u0), "d"(u1));
    return result;
#else
    uint64_t n = ((uint64_t)u1 << 32) | u0;
    uint32_t result = (uint32_t)(n / v);
    *r = (uint32_t)(n - result * (uint64_t)v);
    return result;
#endif
}
// generate the pseudo-inverse to pass to libdiv_s32_do(x, div)
struct libdivide_s32_t libdiv_s32_gen(int32_t d)
{
    struct libdivide_s32_t result;
    // If d is a power of 2, or the negative of a power of 2, we have to use
    // a shift. This is especially important because the magic algorithm
    // fails for -1. To check if d is a power of 2 or its negation, it
    // suffices to check whether its absolute value has exactly one bit set.
    // This works even for INT_MIN, because abs(INT_MIN) == INT_MIN, and
    // INT_MIN has one bit set and is a power of 2.
    uint32_t ud = (uint32_t)d;
    uint32_t absD = (d < 0) ? -ud : ud;
    uint32_t floor_log_2_d = 31 - libdivide_count_leading_zeros32(absD);
    // check if exactly one bit is set,
    // don't care if absD is 0 since that's divide by zero
    if ((absD & (absD - 1)) == 0) {
        result.magic = 0;
        result.more = (uint8_t)(floor_log_2_d | (d < 0 ? LIBDIVIDE_NEGATIVE_DIVISOR : 0));
    } else {
        // LIBDIVIDE_ASSERT(floor_log_2_d >= 1);
        uint8_t more;
        int32_t magic;
        // the dividend here is 2**(floor_log_2_d + 31), so the low 32 bit word
        // is 0 and the high word is 2**(floor_log_2_d - 1)
        uint32_t e;
        uint32_t rem, proposed_m;
        proposed_m = libdivide_64_div_32_to_32((uint32_t)1 << (floor_log_2_d - 1), 0, absD, &rem);
        e = absD - rem;
        // We are going to start with a power of floor_log_2_d - 1.
        // This works if e < 2**floor_log_2_d.
        if (e < ((uint32_t)1 << floor_log_2_d)) {
            // This power works
            more = (uint8_t)(floor_log_2_d - 1);
        } else {
            // We need to go one higher. This should not make proposed_m
            // overflow, but it will make it negative when interpreted as an
            // int32_t.
            const uint32_t twice_rem = rem + rem;
            proposed_m += proposed_m;
            if (twice_rem >= absD || twice_rem < rem) proposed_m += 1;
            more = (uint8_t)(floor_log_2_d | LIBDIVIDE_ADD_MARKER);
        }
        proposed_m += 1;
        magic = (int32_t)proposed_m;
        // Mark if we are negative.
        if (d < 0) {
            more |= LIBDIVIDE_NEGATIVE_DIVISOR;
            magic = -magic;
        }
        result.more = more;
        result.magic = magic;
    }
    return result;
}
#undef int32_t
#undef uint32_t
#undef uint8_t
#undef uint64_t
// Build from
int libdiv_s32_do(int x, struct libdivide_s32_t *div);
#pragma aux libdiv_s32_do = \
"mov esi, ecx", \
"mov bl, BYTE PTR [edx+4]", \
"mov eax, DWORD PTR [edx]", \
"mov cl, bl", \
"and ecx, 31", \
"test eax, eax", \
"jne L2", \
"sar bl, 7", \
"movsx ebx, bl", \
"mov eax, esi", \
"sar eax, cl", \
"xor eax, ebx", \
"sub eax, ebx", \
"jmp ENDPROC", \
"L2:", \
"imul esi", \
"mov eax, edx", \
"test bl, 64", \
"je L4", \
"sar bl, 7", \
"movsx ebx, bl", \
"xor esi, ebx", \
"add eax, esi", \
"sub eax, ebx", \
"L4:", \
"mov edx, eax", \
"sar edx, cl", \
"shr eax, 31", \
"add eax, edx", \
"ENDPROC:" parm [ecx] [edx] value [eax] modify exact [ecx edx eax ebx esi]
I tried it in a loop in OpenWatcom and got divisions ~5 times faster than idiv.
I am sure this could help FastDoom's FixedDiv, though extra bit shifting would be needed.
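For readers who want to check the inline assembly against plain C, here is a portable sketch of the same generate-once, divide-many scheme. It follows upstream libdivide's signed 32-bit algorithm. The names `s32_divider`, `s32_gen` and `s32_do` are made up for this example, it assumes `<stdint.h>` and arithmetic right shift of negative values, and it uses a plain 64-bit division in the generator instead of `libdivide_64_div_32_to_32`. Its power-of-two path includes upstream's round-toward-zero adjustment for negative dividends, which the assembly listing above does not appear to perform.

```c
#include <stdint.h>

enum { SHIFT_MASK = 0x1F, ADD_MARKER = 0x40, NEGATIVE_DIVISOR = 0x80 };

/* Hypothetical mirror of libdivide_s32_t for this sketch. */
struct s32_divider { int32_t magic; uint8_t more; };

/* Precompute the magic number and shift for divisor d (d != 0). */
static struct s32_divider s32_gen(int32_t d)
{
    struct s32_divider r;
    uint32_t ud = (uint32_t)d;
    uint32_t absd = (d < 0) ? 0u - ud : ud;
    uint32_t log2d = 0, t = absd;
    while (t >>= 1) log2d++;                 /* floor(log2(absd)) */
    if ((absd & (absd - 1)) == 0) {          /* power of two: shift only */
        r.magic = 0;
        r.more = (uint8_t)(log2d | (d < 0 ? NEGATIVE_DIVISOR : 0));
    } else {
        uint64_t n = (uint64_t)1 << (log2d + 31);
        uint32_t m = (uint32_t)(n / absd);
        uint32_t rem = (uint32_t)(n % absd);
        uint8_t more;
        if (absd - rem < ((uint32_t)1 << log2d)) {
            more = (uint8_t)(log2d - 1);     /* the smaller power suffices */
        } else {
            uint32_t twice_rem = rem + rem;  /* go one power higher */
            m += m;
            if (twice_rem >= absd || twice_rem < rem) m += 1;
            more = (uint8_t)(log2d | ADD_MARKER);
        }
        m += 1;
        if (d < 0) { more |= NEGATIVE_DIVISOR; m = 0u - m; }
        r.magic = (int32_t)m;
        r.more = more;
    }
    return r;
}

/* Compute x / d, truncating toward zero, using the precomputed divider. */
static int32_t s32_do(int32_t x, const struct s32_divider *div)
{
    uint8_t more = div->more;
    uint32_t shift = more & SHIFT_MASK;
    uint32_t sign = (more & NEGATIVE_DIVISOR) ? 0xFFFFFFFFu : 0u;
    uint32_t uq;
    int32_t q;
    if (div->magic == 0) {
        uint32_t mask = ((uint32_t)1 << shift) - 1;
        /* Bias negative dividends so the shift rounds toward zero. */
        uq = (uint32_t)x + (((uint32_t)x >> 31) ? mask : 0u);
        q = (int32_t)uq >> shift;
        return (int32_t)(((uint32_t)q ^ sign) - sign);  /* negate if d < 0 */
    }
    uq = (uint32_t)(((int64_t)div->magic * x) >> 32);   /* high half of product */
    if (more & ADD_MARKER) uq += ((uint32_t)x ^ sign) - sign;
    q = (int32_t)uq >> shift;
    return q + (q < 0);
}
```

The intended pattern is to call `s32_gen` once outside a loop and `s32_do` inside it, so the scheme only pays off when the same divisor is reused many times.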
from fastdoom.
Also, OpenWatcom can handle C99 with the -za99 and -aa flags. They are not required here; we only need long long.
from fastdoom.
Oh, I forgot to reply to this commit in the past! I did try this division method. It works, but I didn't find any loop where DIV or IDIV is called every time with the same divisor; ID Software already tried to remove all possible divisions. I'll look again to see if something can be optimized.
from fastdoom.
You are right. I thought I could optimize R_ClearPlanes with your idea of iprojection, but I get even more rounding problems and cannot measure any overall performance gain.
from fastdoom.