fay59 / x86doc



HTML representation of the Intel x86 instructions documentation.

Home Page: http://www.felixcloutier.com/x86

License: The Unlicense

Python 99.49% CSS 0.51%


x86doc's Issues

Footnotes not separated out to the end of the Description section

For example, http://felixcloutier.com/x86/MOVDQU.html

This instruction can be used to load a YMM register from a 256-bit memory

If alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception (#AC) may or may not be generated (depending on processor implementation) when the operand is not aligned on an 8-byte boundary.

location, to store the contents of a YMM register into a 256-bit memory location, or to move data between two YMM registers.

It should read "This instruction can be used to load a YMM register from a 256-bit memory location, to store ...", with the "1. If alignment checking..." footnote moved to the end of the whole section.

Intel's PDF puts footnotes at the end of a page, with a horizontal line separating them from the rest of the text.

This is especially confusing for the MOV to/from segment registers entry (which isn't currently on the site at all), because the footnote text looks as though it could be a continuation of the interrupted main text.

I found the footnotes in the PDF by searching for "1. " (with a trailing space).
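A minimal sketch of the requested reordering, assuming the extracted section arrives as a list of paragraph strings and that footnotes still carry the "1. " marker used in the search above. The name move_footnotes is hypothetical, and the "number, period, space" test is only a heuristic: a genuine numbered list item would match too. When a footnote interrupts a sentence, as in the MOVDQU example, the two halves are re-joined.

```python
import re

# Heuristic from the issue: SDM footnotes begin with "1. ", "2. ", ...
# A genuine numbered list item would also match, so treat this as a sketch.
FOOTNOTE_RE = re.compile(r"^\d+\. ")
SENTENCE_END = (".", ":", "?", "!")


def move_footnotes(paragraphs):
    """Return the paragraphs with footnote-like entries moved to the end.

    If a footnote interrupted a sentence (the paragraph before it does not
    end with sentence punctuation), the next body paragraph is re-joined
    to the one before the footnote.
    """
    body, notes = [], []
    join_next = False
    for para in paragraphs:
        if FOOTNOTE_RE.match(para):
            notes.append(para)
            join_next = bool(body) and not body[-1].rstrip().endswith(SENTENCE_END)
            continue
        if join_next:
            body[-1] = body[-1].rstrip() + " " + para.lstrip()
            join_next = False
        else:
            body.append(para)
    return body + notes
```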

Print PDF issue

Hi,
In step 3 of "How To Run", you say that we need to print the PDF before running the extract.py script.
I used CutePDF to print volumes A and B to PDF, but I got an error when running extract.py.
Could you please explain step 3 in more detail?

Regards.

Is this repo still maintained?

Error encountered in python extract.py vol2a.pdf vol2b.pdf

Another extraction failure (and an offer)

Offer

If you fix my error, I will convert the output to a set of Python dictionaries.

A dictionary keyed by opcode hex value, giving the opcode name and info.
A dictionary keyed by opcode name, giving the opcode hex values and info.

Hopefully, you will want to pull my dictionaries into your offering.
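A sketch of the two dictionaries described in the offer. The sample records and the build_indexes name are illustrative, not output from extract.py. Because several opcodes can share a mnemonic (near and far RET, for instance), the name-keyed dictionary maps each mnemonic to a list. Real x86 opcodes are not always plain bytes (some include a /digit ModR/M extension), so the key here is assumed to be the full text of the opcode column.

```python
from collections import defaultdict

# Illustrative sample records: (opcode, mnemonic, description).
# Real records would come from the extracted opcode tables.
SAMPLE_RECORDS = [
    ("90", "NOP", "No operation."),
    ("C3", "RET", "Near return to calling procedure."),
    ("CB", "RET", "Far return to calling procedure."),
]


def build_indexes(records):
    """Build the two lookups described in the offer.

    by_opcode: opcode string -> {"name": ..., "info": ...}
    by_name:   mnemonic -> list of {"opcode": ..., "info": ...}
    """
    by_opcode = {}
    by_name = defaultdict(list)
    for opcode, name, info in records:
        by_opcode[opcode] = {"name": name, "info": info}
        by_name[name].append({"opcode": opcode, "info": info})
    return by_opcode, dict(by_name)


by_opcode, by_name = build_indexes(SAMPLE_RECORDS)
```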

Question

Is it unreasonable to run extract.py on the full converted PDFs rather than on PDFs clipped to start at AAA?

Error seen during extraction

Writing to html/FCLEX:FNCLEX.html
[<OpenTag p >, <OpenTag em >, <OpenTag sup >]
[<OpenTag p >, u'W', u'h', u'e', u'n', u' ', u'o', u'p', u'e', u'r', u'a', u't', u'i', u'n', u'g', u' ', u'a', u' ', u'P', u'e', u'n', u't', u'i', u'u', u'm', u' ', u'o', u'r', u' ', u'I', u'n', u't', u'e', u'l', u'4', u'8', u'6', u' ', u'p', u'r', u'o', u'c', u'e', u's', u's', u'o', u'r', u' ', u'i', u'n', u' ', u'M', u'S', u'-', u'D', u'O', u'S', u'*', u' ', u'c', u'o', u'm', u'p', u'a', u't', u'i', u'b', u'i', u'l', u'i', u't', u'y', u' ', u'm', u'o', u'd', u'e', u',', u' ', u'i', u't', u' ', u'i', u's', u' ', u'p', u'o', u's', u's', u'i', u'b', u'l', u'e', u' ', u'(', u'u', u'n', u'd', u'e', u'r', u' ', u'u', u'n', u'u', u's', u'u', u'a', u'l', ' ', u'c', u'i', u'r', u'c', u'u', u'm', u's', u't', u'a', u'n', u'c', u'e', u's', u')', u' ', u'f', u'o', u'r', u' ', u'a', u'n', u' ', u'F', u'N', u'C', u'L', u'E', u'X', u' ', u'i', u'n', u's', u't', u'r', u'u', u'c', u't', u'i', u'o', u'n', u' ', u't', u'o', u' ', u'b', u'e', u' ', u'i', u'n', u't', u'e', u'r', u'r', u'u', u'p', u't', u'e', u'd', u' ', u'p', u'r', u'i', u'o', u'r', u' ', u't', u'o', u' ', u'b', u'e', u'i', u'n', u'g', u' ', u'e', u'x', u'e', u'c', u'u', u't', u'e', u'd', u' ', u't', u'o', u' ', u'h', u'a', u'n', u'd', u'l', u'e', u' ', u'a', u' ', u'p', u'e', u'n', u'd', u'i', u'n', u'g', u' ', u'F', u'P', u'U', u' ', u'e', u'x', u'c', u'e', u'p', u'-', u't', u'i', u'o', u'n', u'.', u' ', u'S', u'e', u'e', u' ', u't', u'h', u'e', u' ', u's', u'e', u'c', u't', u'i', u'o', u'n', u' ', u't', u'i', u't', u'l', u'e', u'd', u' ', u'\u201c', u'N', u'o', u'-', u'W', u'a', u'i', u't', u' ', u'F', u'P', u'U', u' ', u'I', u'n', u's', u't', u'r', u'u', u'c', u't', u'i', u'o', u'n', u's', u' ', u'C', u'a', u'n', u' ', u'G', u'e', u't', u' ', u'F', u'P', u'U', u' ', u'I', u'n', u't', u'e', u'r', u'r', u'u', u'p', u't', u' ', u'i', u'n', u' ', u'W', u'i', u'n', u'd', u'o', u'w', u'\u201d', u' ', u'i', u'n', u' ', u'A', u'p', u'p', u'e', u'n', u'd', u'i', u'x', u' ', u'D', u' ', u'o', u'f', u' ', u't', u'h', u'e', 
u' ', <OpenTag em >, u'I', u'n', u't', u'e', u'l', <OpenTag sup >, u'\xae', ' ', u'6', u'4', u' ', u'a', u'n', u'd', u' ', u'I', u'A', u'-', u'3', u'2', u' ', u'A', u'r', u'c', u'h', u'i', u't', u'e', u'c', u't', u'u', u'r', u'e', u's', u' ', u'S', u'o', u'f', u't', u'w', u'a', u'r', u'e', u' ', u'D', u'e', u'v', u'e', u'l', u'o', u'p', u'e', u'r', u'\u2019', u's', u' ', u'M', u'a', u'n', u'u', u'a', u'l', u',', u' ', u'V', u'o', u'l', u'u', u'm', u'e', u' ', u'1', <CloseTag em>, u',', u' ', u'f', u'o', u'r', u' ', u'a', u' ', u'd', u'e', u's', u'c', u'r', u'i', u'p', u't', u'i', u'o', u'n', u' ', u'o', u'f', u' ', u't', u'h', u'e', u's', u'e', u' ', u'c', u'i', u'r', u'c', u'u', u'm', u's', u't', u'a', u'n', u'c', u'e', u's', u'.', u' ', u'A', u'n', ' ', u'F', u'N', u'C', u'L', u'E', u'X', u' ', u'i', u'n', u's', u't', u'r', u'u', u'c', u't', u'i', u'o', u'n', u' ', u'c', u'a', u'n', u'n', u'o', u't', u' ', u'b', u'e', u' ', u'i', u'n', u't', u'e', u'r', u'r', u'u', u'p', u't', u'e', u'd', u' ', u'i', u'n', u' ', u't', u'h', u'i', u's', u' ', u'w', u'a', u'y', u' ', u'o', u'n', u' ', u'a', u' ', u'P', u'e', u'n', u't', u'i', u'u', u'm', u' ', u'4', u',', u' ', u'I', u'n', u't', u'e', u'l', u' ', u'X', u'e', u'o', u'n', u',', u' ', u'o', u'r', u' ', u'P', u'6', u' ', u'f', u'a', u'm', u'i', u'l', u'y', u' ', u'p', u'r', u'o', u'c', u'e', u's', u's', u'o', u'r', u'.']
Traceback (most recent call last):
  File "extract.py", line 41, in <module>
    result = main(sys.argv)
  File "extract.py", line 33, in main
    parser.process_page(page)
  File "/Users/jlettvin/Desktop/github/x86doc/x86manual.py", line 303, in process_page
    self.end_page(page)
  File "/Users/jlettvin/Desktop/github/x86doc/x86manual.py", line 255, in end_page
    self.flush()
  File "/Users/jlettvin/Desktop/github/x86doc/x86manual.py", line 239, in flush
    self.__output_file(displayable)
  File "/Users/jlettvin/Desktop/github/x86doc/x86manual.py", line 354, in __output_file
    file_data = self.__output_page(displayable).encode("UTF-8")
  File "/Users/jlettvin/Desktop/github/x86doc/x86manual.py", line 373, in __output_page
    text.append(self.__output_html(element))
  File "/Users/jlettvin/Desktop/github/x86doc/x86manual.py", line 385, in __output_html
    result = self.__output_text(element)
  File "/Users/jlettvin/Desktop/github/x86doc/x86manual.py", line 574, in __output_text
    text.autoclose()
  File "/Users/jlettvin/Desktop/github/x86doc/htmltext.py", line 56, in autoclose
    raise Exception("autoclose mismatch")
Exception: autoclose mismatch
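The first printed list, [<OpenTag p >, <OpenTag em >, <OpenTag sup >], looks like the stack of still-open tags: a sup (around the ® sign) was opened inside the em span, and the next close tag in the dump is <CloseTag em>, with no close for sup in between. A writer that only allows closing the innermost open tag would reject that. The sketch below assumes, hypothetically, that htmltext.py works this way; TagStack and its methods are invented names. It contrasts a strict close, which fails like the traceback, with a lenient one that first closes any tags still open inside.

```python
class TagStack:
    """Hypothetical open-tag tracker; not the actual htmltext.py code."""

    def __init__(self):
        self.open_tags = []

    def open(self, tag):
        self.open_tags.append(tag)

    def close_strict(self, tag):
        # Only the innermost open tag may be closed.
        if not self.open_tags or self.open_tags[-1] != tag:
            raise Exception("autoclose mismatch")
        self.open_tags.pop()

    def close_lenient(self, tag):
        """Close tag, first closing any tags still open inside it.

        Returns the closed tags, innermost first, so the caller can emit
        matching end tags (e.g. </sup></em>).
        """
        if tag not in self.open_tags:
            raise ValueError("%r is not open" % tag)
        closed = []
        while True:
            top = self.open_tags.pop()
            closed.append(top)
            if top == tag:
                return closed
```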

Extraction fails

$ python2 extract.py vol2a.pdf vol2b.pdf
Processing page 1
[...]
Processing page 670
Processing page 671
Processing page 672
Writing to html/Intel® 64 and IA.html
Traceback (most recent call last):
  File "extract.py", line 40, in <module>
    result = main(sys.argv)
  File "extract.py", line 34, in main
    parser.flush()
  File "x86manual.py", line 239, in flush
    self.__output_file(displayable)
  File "x86manual.py", line 354, in __output_file
    file_data = self.__output_page(displayable).encode("UTF-8")
  File "x86manual.py", line 373, in __output_page
    text.append(self.__output_html(element))
  File "x86manual.py", line 385, in __output_html
    result = self.__output_text(element)
  File "x86manual.py", line 539, in __output_text
    elif element.font_name() == "NeoSansIntel" and self.__title_stack[-1] == "operation":
IndexError: list index out of range
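The IndexError comes from evaluating self.__title_stack[-1] while the stack is empty. The output name "Intel® 64 and IA.html" suggests the parser had reached a page whose heading is not an instruction title. A defensive sketch that checks for an empty stack before indexing; current_section and in_operation_section are hypothetical helpers, not code from x86manual.py.

```python
def current_section(title_stack):
    """Innermost section title, or None before any title has been seen.

    title_stack[-1] on an empty list raises IndexError, as in the traceback.
    """
    return title_stack[-1] if title_stack else None


def in_operation_section(font_name, title_stack):
    # Hypothetical stand-in for the condition that failed in __output_text.
    return font_name == "NeoSansIntel" and current_section(title_stack) == "operation"
```

A guard like this only avoids the crash; it does not stop non-instruction pages from being treated as entries, which is one reason to clip the input PDFs.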

Tables with sub-column headers (like in CMPPD) get messed up

http://felixcloutier.com/x86/CMPPD.html is pretty messed up: the column headers seem to repeat everything. The main table body seems to be OK.

It's a tricky table because one "main" column has 4 sub-columns, each with its own header.
https://github.com/HJLebbink/asm-dude/wiki/CMPPD formats the headers correctly, but the table is so wide that it needs a scroll bar. It's usable if you click in the table and scroll sideways with the left/right arrow keys, without having to leave your place to click on the scroll bar itself.
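One way to render such a header without repeated cells is a two-row thead: the grouped column gets colspan over its sub-columns, and every other column gets rowspan="2". This is a sketch in plain Python; the function name is hypothetical, and the column titles in COLUMNS are an illustrative approximation of the CMPPD comparison-predicate table, not taken from the generator.

```python
from html import escape

# Illustrative columns: (title, sub-column titles); [] means no sub-columns.
COLUMNS = [
    ("Predicate", []),
    ("imm8 Encoding", []),
    ("Description", []),
    ("Result", ["A > B", "A < B", "A = B", "Unordered"]),
    ("Signals #IA on QNaN", []),
]


def two_level_header(columns):
    """Build a <thead> with grouped sub-column headers."""
    has_subs = any(subs for _, subs in columns)
    top, bottom = [], []
    for title, subs in columns:
        if subs:
            # Group header spans all of its sub-columns...
            top.append('<th colspan="%d">%s</th>' % (len(subs), escape(title)))
            # ...and each sub-column gets its own cell in the second row.
            bottom.extend("<th>%s</th>" % escape(s) for s in subs)
        elif has_subs:
            # Ordinary column: span both header rows so nothing is repeated.
            top.append('<th rowspan="2">%s</th>' % escape(title))
        else:
            top.append("<th>%s</th>" % escape(title))
    rows = ["<tr>%s</tr>" % "".join(top)]
    if bottom:
        rows.append("<tr>%s</tr>" % "".join(bottom))
    return "<thead>\n%s\n</thead>" % "\n".join(rows)
```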

ImportError: No module named pdfminer.pdfdocument

No go, even though the module is installed.
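If the module really is installed, one frequent cause (an assumption here, not confirmed in the issue) is that it was installed for a different interpreter than the one running extract.py; the extraction commands in these issues run python2 explicitly, while pip may install for Python 3. This small check, runnable under Python 2.7 or 3, prints which interpreter is running and whether it can import the module. can_import is a hypothetical helper.

```python
import importlib
import sys


def can_import(module_name):
    """Return True if *this* interpreter can import module_name."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("Interpreter: %s" % sys.executable)
    print("Version: %s" % sys.version.split()[0])
    print("pdfminer.pdfdocument importable: %s" % can_import("pdfminer.pdfdocument"))
```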

Opcode tables messed up for some instructions (formatted as non-table)

Some entries have clean tables for the various forms of the instruction, like 66 0F C2 /r ib CMPPD xmm1, xmm2/m128, imm8.

But others don't use table formatting at all and are a total mess: in http://felixcloutier.com/x86/IMUL.html, every cell becomes a separate paragraph.
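If the cells do come out in reading order, they can be regrouped mechanically once the column count is known. A sketch, assuming a six-column opcode table (Opcode, Instruction, Op/En, 64-Bit Mode, Compat/Leg Mode, Description) with no merged cells; regroup_cells is a hypothetical helper and the sample IMUL row is abbreviated.

```python
def regroup_cells(cells, n_columns):
    """Split a flat, row-major list of cell strings into rows of n_columns."""
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    if len(cells) % n_columns:
        # A leftover means merged or missing cells; fail loudly instead of guessing.
        raise ValueError("%d cells do not split evenly into %d columns" % (len(cells), n_columns))
    return [cells[i:i + n_columns] for i in range(0, len(cells), n_columns)]


# Illustrative input: a header row plus one abbreviated IMUL row.
CELLS = [
    "Opcode", "Instruction", "Op/En", "64-Bit Mode", "Compat/Leg Mode", "Description",
    "F6 /5", "IMUL r/m8", "M", "Valid", "Valid", "AX := AL * r/m byte.",
]
```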

Only 1 of the 3 MOV entries is present (and it's the debug-register one, not regular integer)

http://felixcloutier.com/x86/MOV.html is the entry for MOV r32, DR0–DR7.

In a fork of this project, https://github.com/HJLebbink/asm-dude/wiki/MOV is regular GP-register mov, like MOV r/m32,r32.

But HJLebbink's fork seems to have lost the debug-register and control-register forms. IIRC, http://felixcloutier.com/x86 used to have all 3 separate entries from the Intel PDF:

MOV—Move
MOV—Move to/from Control Registers
MOV—Move to/from Debug Registers

(which appear in that order in the PDF).

So HJLebbink's fork kept the first entry, and this revision kept the last one?

Note that the problem isn't present for MOVQ: the index has 2 entries for MOVQ. But one of them is actually MOVD/MOVQ, so the HTML pages have different URLs.
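A plausible explanation, assumed rather than confirmed from x86manual.py: if the output file name is built only from the mnemonic before the dash in each entry title, all three MOV entries map to MOV.html, and each one written overwrites the previous, leaving only the last (the debug-register entry). That would also fit MOVQ surviving, since the MOVD/MOVQ entry yields a different name. A sketch of collision-safe naming; unique_filenames and the -2/-3 suffix scheme are hypothetical.

```python
EM_DASH = "\u2014"  # the SDM separates mnemonic and title with an em dash


def unique_filenames(titles):
    """Map entry titles to output file names without ever reusing a name."""
    seen = {}
    names = []
    for title in titles:
        base = title.split(EM_DASH)[0].strip()  # e.g. "MOV"
        count = seen.get(base, 0) + 1
        seen[base] = count
        names.append("%s.html" % base if count == 1 else "%s-%d.html" % (base, count))
    return names
```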

Much post-processing needed

I've ported the repo to Python 3 and pdfminer.six.
Apart from some minor issues it seems to parse OK, but a lot of manual or scripted post-processing is still needed. Could you please describe the post-processing steps you used to create the website from the files produced by the parser?

Regards

Missing instruction "setne"

First, this website is really very good! Instructions are much easier to read and find here than in Intel's PDF. But I cannot find "setne".
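Intel's manual documents SETNE as one form of SETcc (Set Byte on Condition), alongside SETE, SETG and the rest, so a search for "setne" has to land on the family page. A sketch of such a lookup for SETcc, Jcc and CMOVcc; family_page is a hypothetical helper, not part of the site.

```python
# Condition-code suffixes shared by Jcc, SETcc and CMOVcc in Intel's SDM.
CONDITION_CODES = {
    "A", "AE", "B", "BE", "C", "E", "G", "GE", "L", "LE",
    "NA", "NAE", "NB", "NBE", "NC", "NE", "NG", "NGE", "NL", "NLE",
    "NO", "NP", "NS", "NZ", "O", "P", "PE", "PO", "S", "Z",
}
FAMILY_PREFIXES = ("CMOV", "SET", "J")


def family_page(mnemonic):
    """Return the page name that documents mnemonic, e.g. SETNE -> SETcc."""
    upper = mnemonic.upper()
    for prefix in FAMILY_PREFIXES:
        suffix = upper[len(prefix):]
        if upper.startswith(prefix) and suffix in CONDITION_CODES:
            return prefix + "cc"
    # Not a condition-code form: the mnemonic is its own page.
    return upper
```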

Bad HTML output for PSRLW/PSRLD/PSRLQ instructions.

Hi,
It seems that this page, http://www.felixcloutier.com/x86/PSRLW:PSRLD:PSRLQ.html, wasn't generated correctly. The HTML output looks more or less like plain text.

Regards,
Mahdi.
