galoisinc / elf-edit

The elf-edit library provides a datatype suitable for reading and writing Elf files.

License: Other

Haskell 96.26% Makefile 0.16% C 0.16% Roff 3.42%

elf-edit's People

Contributors

benjaminselfridge, dagit, dmwit, elliottt, erikcharlebois, joehendrix, kquick, langston-barrett, ntc2, ptival, ryanglscott, simonjwinwood, thebendavis, tomahawkins, tommd, travitch, yav

elf-edit's Issues

Correct e_phoff

The Linux function for loading binaries, load_elf_binary, uses the e_phoff field in two ways:

  1. As an offset from the beginning of the file, used to locate the program header table.
  2. To calculate the virtual address of the program header table, which is passed to the binary in the AT_PHDR entry of the auxiliary vector used to initialize the binary.

Existing libc implementations parse the program header table found at the address in (2) to initialize TLS and other state. If this address is incorrect, the program may crash.

This matters because the Linux kernel makes certain assumptions about the layout of the ELF file. In particular, it computes the "load address" load_addr of the program, which in the common case is the virtual address of the first program segment. It then adds the value of e_phoff to this address to obtain the AT_PHDR value.

To check that binaries will run, the simplest approach is to verify that the program header table in executables lies within the first segment.
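
The check proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not elf-edit code: the PhdrTable and Segment records and the functions phdrTableInSegment and auxPhdr are hypothetical names that mirror the raw ELF fields (e_phoff, e_phentsize, e_phnum, p_offset, p_vaddr, p_filesz), and auxPhdr follows the simplified "common case" described in this issue, where load_addr is the virtual address of the first segment.

```haskell
import Data.Word (Word16, Word64)

-- | Hypothetical, simplified view of the ELF header fields that locate the
-- program header table (these are not elf-edit's own types).
data PhdrTable = PhdrTable
  { phOff     :: Word64  -- ^ e_phoff: file offset of the table
  , phEntSize :: Word16  -- ^ e_phentsize: size of one entry in bytes
  , phNum     :: Word16  -- ^ e_phnum: number of entries
  }

-- | Hypothetical, simplified view of the first program segment.
data Segment = Segment
  { segOffset   :: Word64  -- ^ p_offset: file offset of the segment
  , segVirtAddr :: Word64  -- ^ p_vaddr: virtual address of the segment
  , segFileSize :: Word64  -- ^ p_filesz: bytes of the file covered
  }

-- | True when the whole program header table lies inside the file range
-- covered by the given segment. Arithmetic is done on Integer so that
-- corrupt headers cannot cause wrap-around.
phdrTableInSegment :: PhdrTable -> Segment -> Bool
phdrTableInSegment tbl seg = segStart <= tblStart && tblEnd <= segEnd
  where
    tblStart = toInteger (phOff tbl)
    tblEnd   = tblStart + toInteger (phEntSize tbl) * toInteger (phNum tbl)
    segStart = toInteger (segOffset seg)
    segEnd   = segStart + toInteger (segFileSize seg)

-- | The AT_PHDR value in the common case described above:
-- load_addr (taken here as the first segment's virtual address) + e_phoff.
auxPhdr :: PhdrTable -> Segment -> Word64
auxPhdr tbl seg = segVirtAddr seg + phOff tbl
```

If phdrTableInSegment is False for the first segment, the address computed by auxPhdr would not point at a mapped copy of the table, which is the failure mode this issue is concerned with.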

Identity transformation may corrupt file for some usecases

I wrote a small program that applies the identity transformation to test this library. The code is very simple and is modeled on the tests:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RankNTypes #-}
module Main
  (
    main,
    strip_ghc_symbols,
  ) where


import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import qualified Data.ElfEdit as EE
import qualified System.IO as IO

in_filename :: FilePath
in_filename = "./in.elf"

out_filename :: FilePath
out_filename = "./out.elf"


strip_ghc_symbols :: FilePath -> FilePath -> IO ()
strip_ghc_symbols in_path out_path =
  -- use the function arguments rather than the global file names
  IO.withBinaryFile in_path IO.ReadMode $ \in_h -> do
    bs <- B.hGetContents in_h
    withElf bs $ \e -> do
      -- identity transformation: the parsed ELF is written back unchanged
      let e_final = e
      -- write to output
      IO.withBinaryFile out_path IO.WriteMode $ \out_h -> do
        B.hPut out_h $ L.toStrict (EE.renderElf e_final)


main :: IO ()
main = do
  putStrLn "hello, kitty"
  strip_ghc_symbols in_filename out_filename


--
withElf :: B.ByteString -> (forall w . EE.Elf w -> IO ()) -> IO ()
withElf bs f =
  case EE.parseElf bs of
    EE.Elf32Res err e32
      | null err  -> f e32
      | otherwise -> fail ("Failed to parse elf file: " ++ show err)
    EE.Elf64Res err e64
      | null err  -> f e64
      | otherwise -> fail ("Failed to parse elf file: " ++ show err)
    EE.ElfHeaderError _ e -> fail $ "Failed to parse elf file: " ++ show e

Unfortunately, the generated file does not appear to be as well-formed as the original: objdump accepts in.elf but rejects out.elf.

lastg-mbp:fix-ghc-symbols lastg$ objdump -t in.elf > /dev/null
lastg-mbp:fix-ghc-symbols lastg$ echo $?
0
lastg-mbp:fix-ghc-symbols lastg$ objdump -t out.elf
/Applications/Xcode_9.3.0_fb.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/objdump: 'out.elf': Invalid data was encountered while parsing the file
lastg-mbp:fix-ghc-symbols lastg$ echo $?
1
lastg-mbp:fix-ghc-symbols lastg$ objdump -version
Apple LLVM version 9.1.0 (clang-902.0.39.1)
  Optimized build.
  Default target: x86_64-apple-darwin17.5.0
  Host CPU: skylake

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_be - AArch64 (big endian)
    arm        - ARM
    arm64      - ARM64 (little endian)
    armeb      - ARM (big endian)
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64

At the same time, out.elf appears to be valid according to both readelf and eu-readelf, and it still runs.

bins.zip
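
Since the identity transformation is expected to reproduce the input, one way to narrow this down might be to find the first byte at which the two files differ. The helper below is a minimal sketch for that; firstDifference is a hypothetical name and is not part of elf-edit.

```haskell
import qualified Data.ByteString as B
import Data.List (findIndex)

-- | Offset of the first byte at which the two inputs differ, or Nothing if
-- they are identical. If one input is a strict prefix of the other, the
-- length of the shorter input is reported.
firstDifference :: B.ByteString -> B.ByteString -> Maybe Int
firstDifference a b =
  case findIndex (uncurry (/=)) (B.zip a b) of
    Just i -> Just i
    Nothing
      | B.length a == B.length b -> Nothing
      | otherwise                -> Just (min (B.length a) (B.length b))
```

It could be applied to the contents of in.elf and out.elf (for example, after reading both with B.readFile), and the reported offset compared against the section and program header layout printed by readelf.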

`dynSymEntry` doesn't consult `VersionDef`s to find symbol versions

Here is a test harness program which loads /lib/x86_64-linux-gnu/libz.so.1.2.11, retrieves its dynamic symbol table, and prints each symbol table entry along with its version information:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RankNTypes #-}
{-# OPTIONS_GHC -Wall #-}
module Main (main) where

import           Control.Monad ( guard )
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BSC
import qualified Data.Foldable as F
import           Data.Maybe ( listToMaybe )
import qualified Data.Vector as DV
import           Data.Word ( Word32 )

import qualified Data.ElfEdit as DE

withElfHeader :: BS.ByteString -> (forall w . DE.ElfHeaderInfo w -> r) -> r
withElfHeader bs f =
  case DE.decodeElfHeaderInfo bs of
    Left (_,err) -> error ("Failed to parse ELF file: " ++ show err)
    Right (DE.SomeElf e) -> f e

elfDynamicSymbolTable ::
  DE.ElfHeader w ->
  BS.ByteString ->
  -- ^ File contents
  DV.Vector (DE.Shdr Word32 (DE.ElfWordType w)) ->
  -- ^ Section header table
  Maybe (DV.Vector (DE.SymtabEntry BS.ByteString (DE.ElfWordType w)))
elfDynamicSymbolTable hdr contents shdrs = DE.elfClassInstances (DE.headerClass hdr) $ do
  guard (DE.headerType hdr == DE.ET_DYN)
  symtab <-
    case DV.toList $ DV.filter (\s -> DE.shdrType s == DE.SHT_DYNSYM) shdrs of
      [symtab] -> Just symtab
      _        -> Nothing
  let strtabIdx = DE.shdrLink symtab
  strtab <- shdrs DV.!? fromIntegral strtabIdx
  let shdrData shdr = DE.slice (DE.shdrFileRange shdr) contents
  let symtabData = shdrData symtab
  let strtabData = shdrData strtab
  case DE.decodeSymtab cl dta strtabData symtabData of
    Left _ -> Nothing
    Right entries -> Just entries
  where
    cl  = DE.headerClass hdr
    dta = DE.headerData hdr

main :: IO ()
main = do
  bs <- BS.readFile "/lib/x86_64-linux-gnu/libz.so.1.2.11"
  withElfHeader bs $ \hdrInfo -> do
    let hdr = DE.header hdrInfo
    let cl  = DE.headerClass hdr
    let dta = DE.headerData hdr
    let (_, elf) = DE.getElf hdrInfo
    let elfPhdrs = DE.headerPhdrs hdrInfo
    let elfShdrs = DE.headerShdrs hdrInfo
    let elfBytes = DE.headerFileContents hdrInfo
    let mbRes = DE.elfClassInstances cl $ do
          entries <- elfDynamicSymbolTable hdr elfBytes elfShdrs
          vam <- DE.virtAddrMap elfBytes elfPhdrs
          rawDynSec <- listToMaybe $ DE.findSectionByName (BSC.pack ".dynamic") elf
          let dynBytes = DE.elfSectionData rawDynSec
          dynSec <- case DE.dynamicEntries dta (DE.headerClass hdr) dynBytes of
            Left _dynErr -> Nothing
            Right dynSec -> return dynSec
          verReqMap <- case DE.dynVersionReqMap dynSec vam of
            Left _dynErr -> Nothing
            Right vrm -> return vrm
          Just $ DV.imap (\symIdx ste -> ( DE.steName ste
                                         , case DE.dynSymEntry dynSec vam verReqMap (fromIntegral symIdx) of
                                             Left x -> show x
                                             Right (_, DE.VersionLocal) -> "VersionLocal"
                                             Right (_, DE.VersionGlobal) -> "VersionGlobal"
                                             Right (_, DE.VersionSpecific vi) -> "VersionSpecific: " ++ show vi
                                         )) entries
    F.for_ mbRes $ \res ->
      DV.forM_ res $ \(nm, ver) ->
        putStrLn $ BSC.unpack nm ++ ", " ++ ver

Unfortunately, GitHub won't let me attach /lib/x86_64-linux-gnu/libz.so.1.2.11, but what makes this particular .so file so interesting is its output:

, VersionLocal
__snprintf_chk, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.3.4"}
free, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
__errno_location, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
_ITM_deregisterTMCloneTable, VersionLocal
write, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
strlen, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
__stack_chk_fail, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.4"}
snprintf, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
memset, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
close, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
memchr, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
read, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
__gmon_start__, VersionLocal
memcpy, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.14"}
malloc, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
__vsnprintf_chk, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.3.4"}
open, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
lseek64, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
_ITM_registerTMCloneTable, VersionLocal
strerror, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
__cxa_finalize, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
ZLIB_1.2.2, Symbol ZLIB_1.2.2 has unresolvable version requirement index 5.
inflateEnd, VersionGlobal
inflateInit2_, VersionGlobal
crc32_z, Symbol crc32_z has unresolvable version requirement index 14.
deflate, VersionGlobal
deflateTune, Symbol deflateTune has unresolvable version requirement index 6.
deflatePrime, Symbol deflatePrime has unresolvable version requirement index 4.
gzerror, VersionGlobal
inflateReset, VersionGlobal
ZLIB_1.2.0.2, Symbol ZLIB_1.2.0.2 has unresolvable version requirement index 3.
gztell, VersionGlobal
gzflush, VersionGlobal
inflateMark, Symbol inflateMark has unresolvable version requirement index 9.
inflateSyncPoint, VersionGlobal
ZLIB_1.2.9, Symbol ZLIB_1.2.9 has unresolvable version requirement index 14.
adler32_z, Symbol adler32_z has unresolvable version requirement index 14.
deflateSetHeader, Symbol deflateSetHeader has unresolvable version requirement index 5.
ZLIB_1.2.0.8, Symbol ZLIB_1.2.0.8 has unresolvable version requirement index 4.
inflateInit_, VersionGlobal
inflateBackEnd, Symbol inflateBackEnd has unresolvable version requirement index 2.
gzbuffer, Symbol gzbuffer has unresolvable version requirement index 10.
deflateGetDictionary, Symbol deflateGetDictionary has unresolvable version requirement index 14.
adler32, VersionGlobal
gzseek, VersionGlobal
gzfread, Symbol gzfread has unresolvable version requirement index 14.
ZLIB_1.2.5.1, Symbol ZLIB_1.2.5.1 has unresolvable version requirement index 11.
deflateResetKeep, Symbol deflateResetKeep has unresolvable version requirement index 12.
ZLIB_1.2.5.2, Symbol ZLIB_1.2.5.2 has unresolvable version requirement index 12.
crc32, VersionGlobal
zError, VersionGlobal
gzfwrite, Symbol gzfwrite has unresolvable version requirement index 14.
gzread, VersionGlobal
deflateCopy, VersionGlobal
inflateGetHeader, Symbol inflateGetHeader has unresolvable version requirement index 5.
gzputc, VersionGlobal
gzgetc, VersionGlobal
gzwrite, VersionGlobal
gzvprintf, Symbol gzvprintf has unresolvable version requirement index 13.
deflateReset, VersionGlobal
gzeof, VersionGlobal
inflate, VersionGlobal
gzopen64, Symbol gzopen64 has unresolvable version requirement index 8.
inflateReset2, Symbol inflateReset2 has unresolvable version requirement index 9.
crc32_combine64, Symbol crc32_combine64 has unresolvable version requirement index 8.
deflateInit_, VersionGlobal
zlibCompileFlags, Symbol zlibCompileFlags has unresolvable version requirement index 3.
gzdirect, Symbol gzdirect has unresolvable version requirement index 6.
ZLIB_1.2.2.3, Symbol ZLIB_1.2.2.3 has unresolvable version requirement index 6.
deflateInit2_, VersionGlobal
deflatePending, Symbol deflatePending has unresolvable version requirement index 11.
ZLIB_1.2.2.4, Symbol ZLIB_1.2.2.4 has unresolvable version requirement index 7.
gzseek64, Symbol gzseek64 has unresolvable version requirement index 8.
gzputs, VersionGlobal
gzgets, VersionGlobal
inflateResetKeep, Symbol inflateResetKeep has unresolvable version requirement index 12.
gzoffset64, Symbol gzoffset64 has unresolvable version requirement index 10.
compressBound, Symbol compressBound has unresolvable version requirement index 2.
deflateParams, VersionGlobal
inflateGetDictionary, Symbol inflateGetDictionary has unresolvable version requirement index 13.
get_crc_table, VersionGlobal
ZLIB_1.2.7.1, Symbol ZLIB_1.2.7.1 has unresolvable version requirement index 13.
inflateBack, Symbol inflateBack has unresolvable version requirement index 2.
gzprintf, VersionGlobal
inflateValidate, Symbol inflateValidate has unresolvable version requirement index 14.
inflateSetDictionary, VersionGlobal
inflatePrime, Symbol inflatePrime has unresolvable version requirement index 7.
adler32_combine, Symbol adler32_combine has unresolvable version requirement index 5.
gzrewind, VersionGlobal
gzclose, VersionGlobal
gzclose_r, Symbol gzclose_r has unresolvable version requirement index 10.
gzdopen, VersionGlobal
zlibVersion, VersionGlobal
gzclearerr, Symbol gzclearerr has unresolvable version requirement index 3.
inflateBackInit_, Symbol inflateBackInit_ has unresolvable version requirement index 2.
ZLIB_1.2.3.3, Symbol ZLIB_1.2.3.3 has unresolvable version requirement index 8.
adler32_combine64, Symbol adler32_combine64 has unresolvable version requirement index 8.
compress, VersionGlobal
gzclose_w, Symbol gzclose_w has unresolvable version requirement index 10.
ZLIB_1.2.3.4, Symbol ZLIB_1.2.3.4 has unresolvable version requirement index 9.
ZLIB_1.2.3.5, Symbol ZLIB_1.2.3.5 has unresolvable version requirement index 10.
gzopen, VersionGlobal
deflateBound, Symbol deflateBound has unresolvable version requirement index 2.
compress2, VersionGlobal
gztell64, Symbol gztell64 has unresolvable version requirement index 8.
uncompress, VersionGlobal
gzoffset, Symbol gzoffset has unresolvable version requirement index 10.
deflateSetDictionary, VersionGlobal
inflateCopy, Symbol inflateCopy has unresolvable version requirement index 2.
gzgetc_, Symbol gzgetc_ has unresolvable version requirement index 12.
deflateEnd, VersionGlobal
crc32_combine, Symbol crc32_combine has unresolvable version requirement index 5.
inflateCodesUsed, Symbol inflateCodesUsed has unresolvable version requirement index 14.
gzsetparams, VersionGlobal
gzungetc, Symbol gzungetc has unresolvable version requirement index 3.
uncompress2, Symbol uncompress2 has unresolvable version requirement index 14.
inflateUndermine, Symbol inflateUndermine has unresolvable version requirement index 8.
ZLIB_1.2.0, Symbol ZLIB_1.2.0 has unresolvable version requirement index 2.
inflateSync, VersionGlobal

There are quite a few results where elf-edit claims that a symbol has an unresolvable version requirement index. readelf, on the other hand, disagrees with this assessment, as it is able to find version numbers for each of these problematic symbols:

$ readelf --dyn-syms /lib/x86_64-linux-gnu/libz.so.1.2.11  

Symbol table '.dynsym' contains 120 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __snprintf_chk@GLIBC_2.3.4 (15)
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND free@GLIBC_2.2.5 (16)
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __errno_location@GLIBC_2.2.5 (16)
     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
     5: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND write@GLIBC_2.2.5 (16)
     6: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strlen@GLIBC_2.2.5 (16)
     7: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __stack_chk_fail@GLIBC_2.4 (17)
     8: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND snprintf@GLIBC_2.2.5 (16)
     9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND memset@GLIBC_2.2.5 (16)
    10: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND close@GLIBC_2.2.5 (16)
    11: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND memchr@GLIBC_2.2.5 (16)
    12: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND read@GLIBC_2.2.5 (16)
    13: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    14: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND memcpy@GLIBC_2.14 (18)
    15: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND malloc@GLIBC_2.2.5 (16)
    16: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __vsnprintf_chk@GLIBC_2.3.4 (15)
    17: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND open@GLIBC_2.2.5 (16)
    18: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND lseek64@GLIBC_2.2.5 (16)
    19: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
    20: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strerror@GLIBC_2.2.5 (16)
    21: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.2.5 (16)
    22: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.2
    23: 000000000000c1e0   154 FUNC    GLOBAL DEFAULT   15 inflateEnd
    24: 0000000000009c80   240 FUNC    GLOBAL DEFAULT   15 inflateInit2_
    25: 0000000000003280    12 IFUNC   GLOBAL DEFAULT   15 crc32_z@@ZLIB_1.2.9
    26: 0000000000005800  6240 FUNC    GLOBAL DEFAULT   15 deflate
    27: 0000000000005550   170 FUNC    GLOBAL DEFAULT   15 deflateTune@@ZLIB_1.2.2.3
    28: 0000000000005430   282 FUNC    GLOBAL DEFAULT   15 deflatePrime@@ZLIB_1.2.0.8
    29: 0000000000010cf0    83 FUNC    GLOBAL DEFAULT   15 gzerror
    30: 0000000000009b40    82 FUNC    GLOBAL DEFAULT   15 inflateReset
    31: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.0.2
    32: 0000000000010c50     9 FUNC    GLOBAL DEFAULT   15 gztell
    33: 0000000000012b80   237 FUNC    GLOBAL DEFAULT   15 gzflush
    34: 000000000000ccd0   135 FUNC    GLOBAL DEFAULT   15 inflateMark@@ZLIB_1.2.3.4
    35: 000000000000c8c0    94 FUNC    GLOBAL DEFAULT   15 inflateSyncPoint
    36: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.9
    37: 0000000000002340  1752 FUNC    GLOBAL DEFAULT   15 adler32_z@@ZLIB_1.2.9
    38: 00000000000052f0   138 FUNC    GLOBAL DEFAULT   15 deflateSetHeader@@ZLIB_1.2.2
    39: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.0.8
    40: 0000000000009d70    19 FUNC    GLOBAL DEFAULT   15 inflateInit_
    41: 0000000000008e00    62 FUNC    GLOBAL DEFAULT   15 inflateBackEnd@@ZLIB_1.2.0
    42: 0000000000010950    70 FUNC    GLOBAL DEFAULT   15 gzbuffer@@ZLIB_1.2.3.5
    43: 0000000000004fd0   240 FUNC    GLOBAL DEFAULT   15 deflateGetDictionary@@ZLIB_1.2.9
    44: 0000000000002a20    11 FUNC    GLOBAL DEFAULT   15 adler32
    45: 0000000000010c00     9 FUNC    GLOBAL DEFAULT   15 gzseek
    46: 00000000000118d0   292 FUNC    GLOBAL DEFAULT   15 gzfread@@ZLIB_1.2.9
    47: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.5.1
    48: 00000000000050c0   310 FUNC    GLOBAL DEFAULT   15 deflateResetKeep@@ZLIB_1.2.5.2
    49: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.5.2
    50: 0000000000003290    11 FUNC    GLOBAL DEFAULT   15 crc32
    51: 0000000000010180    25 FUNC    GLOBAL DEFAULT   15 zError
    52: 0000000000012640   129 FUNC    GLOBAL DEFAULT   15 gzfwrite@@ZLIB_1.2.9
    53: 0000000000011710   444 FUNC    GLOBAL DEFAULT   15 gzread
    54: 0000000000007760   646 FUNC    GLOBAL DEFAULT   15 deflateCopy
    55: 000000000000c550   106 FUNC    GLOBAL DEFAULT   15 inflateGetHeader@@ZLIB_1.2.2
    56: 00000000000126d0   391 FUNC    GLOBAL DEFAULT   15 gzputc
    57: 0000000000011a00   195 FUNC    GLOBAL DEFAULT   15 gzgetc
    58: 00000000000125d0    99 FUNC    GLOBAL DEFAULT   15 gzwrite
    59: 00000000000128c0   507 FUNC    GLOBAL DEFAULT   15 gzvprintf@@ZLIB_1.2.7.1
    60: 0000000000005200   227 FUNC    GLOBAL DEFAULT   15 deflateReset
    61: 0000000000010cd0    28 FUNC    GLOBAL DEFAULT   15 gzeof
    62: 0000000000009e90  9032 FUNC    GLOBAL DEFAULT   15 inflate
    63: 00000000000108b0    17 FUNC    GLOBAL DEFAULT   15 gzopen64@@ZLIB_1.2.3.3
    64: 0000000000009ba0   214 FUNC    GLOBAL DEFAULT   15 inflateReset2@@ZLIB_1.2.3.4
    65: 00000000000032b0     9 FUNC    GLOBAL DEFAULT   15 crc32_combine64@@ZLIB_1.2.3.3
    66: 0000000000007730    39 FUNC    GLOBAL DEFAULT   15 deflateInit_
    67: 0000000000010170    10 FUNC    GLOBAL DEFAULT   15 zlibCompileFlags@@ZLIB_1.2.0.2
    68: 0000000000011d90    83 FUNC    GLOBAL DEFAULT   15 gzdirect@@ZLIB_1.2.2.3
    69: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.2.3
    70: 0000000000007430   764 FUNC    GLOBAL DEFAULT   15 deflateInit2_
    71: 0000000000005380   161 FUNC    GLOBAL DEFAULT   15 deflatePending@@ZLIB_1.2.5.1
    72: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.2.4
    73: 0000000000010a60   402 FUNC    GLOBAL DEFAULT   15 gzseek64@@ZLIB_1.2.3.3
    74: 0000000000012860    96 FUNC    GLOBAL DEFAULT   15 gzputs
    75: 0000000000011c20   360 FUNC    GLOBAL DEFAULT   15 gzgets
    76: 0000000000009a50   234 FUNC    GLOBAL DEFAULT   15 inflateResetKeep@@ZLIB_1.2.5.2
    77: 0000000000010c60    94 FUNC    GLOBAL DEFAULT   15 gzoffset64@@ZLIB_1.2.3.5
    78: 0000000000010310    34 FUNC    GLOBAL DEFAULT   15 compressBound@@ZLIB_1.2.0
    79: 0000000000007060   665 FUNC    GLOBAL DEFAULT   15 deflateParams
    80: 000000000000c280   168 FUNC    GLOBAL DEFAULT   15 inflateGetDictionary@@ZLIB_1.2.7.1
    81: 0000000000003270    12 FUNC    GLOBAL DEFAULT   15 get_crc_table
    82: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.7.1
    83: 0000000000007ae0  4882 FUNC    GLOBAL DEFAULT   15 inflateBack@@ZLIB_1.2.0
    84: 0000000000012ac0   181 FUNC    GLOBAL DEFAULT   15 gzprintf
    85: 000000000000cc50   117 FUNC    GLOBAL DEFAULT   15 inflateValidate@@ZLIB_1.2.9
    86: 000000000000c330   540 FUNC    GLOBAL DEFAULT   15 inflateSetDictionary
    87: 0000000000009d90   146 FUNC    GLOBAL DEFAULT   15 inflatePrime@@ZLIB_1.2.2.4
    88: 0000000000002a30   228 FUNC    GLOBAL DEFAULT   15 adler32_combine@@ZLIB_1.2.2
    89: 00000000000109a0   187 FUNC    GLOBAL DEFAULT   15 gzrewind
    90: 0000000000010520    43 FUNC    GLOBAL DEFAULT   15 gzclose
    91: 0000000000011df0   152 FUNC    GLOBAL DEFAULT   15 gzclose_r@@ZLIB_1.2.3.5
    92: 00000000000108d0   124 FUNC    GLOBAL DEFAULT   15 gzdopen
    93: 0000000000010160    12 FUNC    GLOBAL DEFAULT   15 zlibVersion
    94: 0000000000010d50   113 FUNC    GLOBAL DEFAULT   15 gzclearerr@@ZLIB_1.2.0.2
    95: 00000000000079f0   238 FUNC    GLOBAL DEFAULT   15 inflateBackInit_@@ZLIB_1.2.0
    96: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.3.3
    97: 0000000000002b20   228 FUNC    GLOBAL DEFAULT   15 adler32_combine64@@ZLIB_1.2.3.3
    98: 0000000000010300    15 FUNC    GLOBAL DEFAULT   15 compress
    99: 0000000000012dd0   363 FUNC    GLOBAL DEFAULT   15 gzclose_w@@ZLIB_1.2.3.5
   100: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.3.4
   101: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.3.5
   102: 0000000000010890    17 FUNC    GLOBAL DEFAULT   15 gzopen
   103: 0000000000005600   376 FUNC    GLOBAL DEFAULT   15 deflateBound@@ZLIB_1.2.0
   104: 00000000000101c0   317 FUNC    GLOBAL DEFAULT   15 compress2
   105: 0000000000010c10    56 FUNC    GLOBAL DEFAULT   15 gztell64@@ZLIB_1.2.3.3
   106: 0000000000010500    28 FUNC    GLOBAL DEFAULT   15 uncompress
   107: 0000000000010cc0     9 FUNC    GLOBAL DEFAULT   15 gzoffset@@ZLIB_1.2.3.5
   108: 0000000000004d10   696 FUNC    GLOBAL DEFAULT   15 deflateSetDictionary
   109: 000000000000c920   709 FUNC    GLOBAL DEFAULT   15 inflateCopy@@ZLIB_1.2.0
   110: 0000000000011ad0     9 FUNC    GLOBAL DEFAULT   15 gzgetc_@@ZLIB_1.2.5.2
   111: 0000000000007300   302 FUNC    GLOBAL DEFAULT   15 deflateEnd
   112: 00000000000032a0     9 FUNC    GLOBAL DEFAULT   15 crc32_combine@@ZLIB_1.2.2
   113: 000000000000cd60   104 FUNC    GLOBAL DEFAULT   15 inflateCodesUsed@@ZLIB_1.2.9
   114: 0000000000012c70   347 FUNC    GLOBAL DEFAULT   15 gzsetparams
   115: 0000000000011ae0   318 FUNC    GLOBAL DEFAULT   15 gzungetc@@ZLIB_1.2.0.2
   116: 0000000000010340   448 FUNC    GLOBAL DEFAULT   15 uncompress2@@ZLIB_1.2.9
   117: 000000000000cbf0    86 FUNC    GLOBAL DEFAULT   15 inflateUndermine@@ZLIB_1.2.3.3
   118: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS ZLIB_1.2.0
   119: 000000000000c5c0   768 FUNC    GLOBAL DEFAULT   15 inflateSync

What accounts for this discrepancy? I believe it has to do with the way that dynSymEntry is implemented. When looking up symbol versions, it will only consult the required symbol version definitions in the .gnu.version_r section (by way of its VersionReqMap argument). /lib/x86_64-linux-gnu/libz.so.1.2.11, on the other hand, doesn't just have a .gnu.version_r section—it also has symbol version definitions in a .gnu.version_d section. (See here for a more in-depth explanation of .gnu.version_d and .gnu.version_r.)
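
The lookup order this suggests can be sketched in isolation as follows. This is a minimal illustration, not elf-edit's implementation: the Version and Resolved types and the resolveVersion function are hypothetical names, and the two maps stand in for the tables parsed from .gnu.version_r and .gnu.version_d. It assumes the usual GNU symbol versioning conventions, in which index 0 means local, index 1 means global, and the top bit of a .gnu.version entry is a "hidden" flag rather than part of the index.

```haskell
import Data.Bits ((.&.))
import qualified Data.Map as Map
import Data.Word (Word16)

-- | Hypothetical stand-in for a resolved version (not elf-edit's VersionId).
data Version = Version { versionFile :: String, versionName :: String }
  deriving (Eq, Show)

-- | Hypothetical result of resolving one .gnu.version entry.
data Resolved
  = Local               -- ^ index 0
  | Global              -- ^ index 1
  | Specific Version    -- ^ found in a requirement or a definition
  | Unresolved Word16   -- ^ index present in neither table
  deriving (Eq, Show)

-- | Resolve a raw .gnu.version entry by consulting the version requirements
-- first and falling back to the version definitions.
resolveVersion
  :: Map.Map Word16 Version  -- ^ entries from .gnu.version_r
  -> Map.Map Word16 Version  -- ^ entries from .gnu.version_d
  -> Word16                  -- ^ raw entry from .gnu.version
  -> Resolved
resolveVersion reqs defs raw
  | idx == 0  = Local
  | idx == 1  = Global
  | otherwise =
      case Map.lookup idx reqs of
        Just v  -> Specific v
        Nothing -> maybe (Unresolved idx) Specific (Map.lookup idx defs)
  where
    idx = raw .&. 0x7fff  -- drop the "hidden" flag bit
```

With a requirement map containing only index 16 (libc.so.6, GLIBC_2.2.5) and a definition map containing index 5 (libz.so.1, ZLIB_1.2.2), index 5 resolves through the definitions instead of being reported as unresolvable. This sketch is separate from the test harness program shown earlier in this issue.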

In this example, the version information we need lives in the .gnu.version_d section. As a proof of concept, if you tweak the test harness program to consult .gnu.version_d:

$ diff -ru Bug.hs Bug2.hs
--- Bug.hs      2022-04-14 21:36:51.096314470 -0400
+++ Bug2.hs     2022-04-14 21:35:37.867868893 -0400
@@ -8,9 +8,11 @@
 import qualified Data.ByteString as BS
 import qualified Data.ByteString.Char8 as BSC
 import qualified Data.Foldable as F
+import qualified Data.List as L
+import qualified Data.Map as Map
 import           Data.Maybe ( listToMaybe )
 import qualified Data.Vector as DV
-import           Data.Word ( Word32 )
+import           Data.Word ( Word16, Word32 )
 
 import qualified Data.ElfEdit as DE
 
@@ -67,8 +69,18 @@
           verReqMap <- case DE.dynVersionReqMap dynSec vam of
             Left _dynErr -> Nothing
             Right vrm -> return vrm
+          verDefs <- case DE.dynVersionDefs dynSec vam of
+            Left _dynErr -> Nothing
+            Right defs -> return defs
+          verDefMap <- case versionDefMap verDefs of
+            Left _dynErr -> Nothing
+            Right vdm -> return vdm
           Just $ DV.imap (\symIdx ste -> ( DE.steName ste
                                          , case DE.dynSymEntry dynSec vam verReqMap (fromIntegral symIdx) of
+                                             Left x@(DE.UnresolvedVersionReqAuxIndex _ idx) ->
+                                               case Map.lookup idx verDefMap of
+                                                 Just vi -> "VersionSpecific: " ++ show vi
+                                                 Nothing -> show x
                                              Left x -> show x
                                              Right (_, DE.VersionLocal) -> "VersionLocal"
                                              Right (_, DE.VersionGlobal) -> "VersionGlobal"
@@ -77,3 +89,32 @@
     F.for_ mbRes $ \res ->
       DV.forM_ res $ \(nm, ver) ->
         putStrLn $ BSC.unpack nm ++ ", " ++ ver
+
+type VersionDefMap = Map.Map Word16 DE.VersionId
+
+versionDefMap :: [DE.VersionDef] -> Either DE.DynamicError VersionDefMap
+versionDefMap []   = Right Map.empty
+versionDefMap defs = do
+  m0 <- F.foldlM insVersionDef Map.empty defs
+  base_def <-
+    case L.find (\def -> DE.vd_flags def == DE.ver_flg_base) defs of
+      Just def -> Right def
+      Nothing  -> Left $ error "TODO RGS"
+  file <-
+    case Map.lookup (DE.vd_ndx base_def) m0 of
+      Just def -> Right $ DE.vd_string def
+      Nothing  -> Left $ error "TODO RGS"
+  Right $ fmap (\def -> DE.VersionId { DE.verFile = file
+                                     , DE.verName = DE.vd_string def
+                                     })
+               m0
+  where
+    insVersionDef ::
+      Map.Map Word16 DE.VersionDef ->
+      DE.VersionDef ->
+      Either DE.DynamicError (Map.Map Word16 DE.VersionDef)
+    insVersionDef m def =
+      let ndx = DE.vd_ndx def in
+      case Map.lookup ndx m of
+        Nothing -> Right $! Map.insert ndx def m
+        Just{}  -> Left $! DE.DupVersionReqAuxIndex ndx

Then it will successfully discover versions for all symbols:

, VersionLocal
__snprintf_chk, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.3.4"}
free, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
__errno_location, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
_ITM_deregisterTMCloneTable, VersionLocal
write, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
strlen, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
__stack_chk_fail, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.4"}
snprintf, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
memset, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
close, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
memchr, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
read, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
__gmon_start__, VersionLocal
memcpy, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.14"}
malloc, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
__vsnprintf_chk, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.3.4"}
open, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
lseek64, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
_ITM_registerTMCloneTable, VersionLocal
strerror, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
__cxa_finalize, VersionSpecific: VersionId {verFile = "libc.so.6", verName = "GLIBC_2.2.5"}
ZLIB_1.2.2, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.2"}
inflateEnd, VersionGlobal
inflateInit2_, VersionGlobal
crc32_z, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.9"}
deflate, VersionGlobal
deflateTune, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.2.3"}
deflatePrime, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0.8"}
gzerror, VersionGlobal
inflateReset, VersionGlobal
ZLIB_1.2.0.2, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0.2"}
gztell, VersionGlobal
gzflush, VersionGlobal
inflateMark, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.4"}
inflateSyncPoint, VersionGlobal
ZLIB_1.2.9, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.9"}
adler32_z, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.9"}
deflateSetHeader, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.2"}
ZLIB_1.2.0.8, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0.8"}
inflateInit_, VersionGlobal
inflateBackEnd, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0"}
gzbuffer, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.5"}
deflateGetDictionary, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.9"}
adler32, VersionGlobal
gzseek, VersionGlobal
gzfread, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.9"}
ZLIB_1.2.5.1, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.5.1"}
deflateResetKeep, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.5.2"}
ZLIB_1.2.5.2, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.5.2"}
crc32, VersionGlobal
zError, VersionGlobal
gzfwrite, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.9"}
gzread, VersionGlobal
deflateCopy, VersionGlobal
inflateGetHeader, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.2"}
gzputc, VersionGlobal
gzgetc, VersionGlobal
gzwrite, VersionGlobal
gzvprintf, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.7.1"}
deflateReset, VersionGlobal
gzeof, VersionGlobal
inflate, VersionGlobal
gzopen64, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.3"}
inflateReset2, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.4"}
crc32_combine64, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.3"}
deflateInit_, VersionGlobal
zlibCompileFlags, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0.2"}
gzdirect, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.2.3"}
ZLIB_1.2.2.3, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.2.3"}
deflateInit2_, VersionGlobal
deflatePending, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.5.1"}
ZLIB_1.2.2.4, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.2.4"}
gzseek64, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.3"}
gzputs, VersionGlobal
gzgets, VersionGlobal
inflateResetKeep, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.5.2"}
gzoffset64, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.5"}
compressBound, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0"}
deflateParams, VersionGlobal
inflateGetDictionary, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.7.1"}
get_crc_table, VersionGlobal
ZLIB_1.2.7.1, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.7.1"}
inflateBack, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0"}
gzprintf, VersionGlobal
inflateValidate, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.9"}
inflateSetDictionary, VersionGlobal
inflatePrime, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.2.4"}
adler32_combine, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.2"}
gzrewind, VersionGlobal
gzclose, VersionGlobal
gzclose_r, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.5"}
gzdopen, VersionGlobal
zlibVersion, VersionGlobal
gzclearerr, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0.2"}
inflateBackInit_, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0"}
ZLIB_1.2.3.3, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.3"}
adler32_combine64, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.3"}
compress, VersionGlobal
gzclose_w, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.5"}
ZLIB_1.2.3.4, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.4"}
ZLIB_1.2.3.5, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.5"}
gzopen, VersionGlobal
deflateBound, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0"}
compress2, VersionGlobal
gztell64, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.3"}
uncompress, VersionGlobal
gzoffset, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.5"}
deflateSetDictionary, VersionGlobal
inflateCopy, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0"}
gzgetc_, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.5.2"}
deflateEnd, VersionGlobal
crc32_combine, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.2"}
inflateCodesUsed, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.9"}
gzsetparams, VersionGlobal
gzungetc, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0.2"}
uncompress2, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.9"}
inflateUndermine, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.3.3"}
ZLIB_1.2.0, VersionSpecific: VersionId {verFile = "libz.so.1", verName = "ZLIB_1.2.0"}
inflateSync, VersionGlobal

I propose that elf-edit take .gnu.version_d into account somehow in dynSymEntry. I'm not entirely sure whether it's more correct to first look in .gnu.version_r and then .gnu.version_d or vice versa.

Hackage release?

This library looks great. How mature is it? Time to put it up on Hackage? We'd be interested in using it in inline-java and sparkle.

Is it the case that renderElf . parseElf == id always?

Error parsing ELF files with a zero physical size BSS (but non-zero logical size)

Parsing fails when the input ELF file has a .bss with zero physical size but non-zero logical size. Two errors are reported: Invalid region .bss. and Invalid region PT_LOAD segment.. At least part of the problem seems to be in the Data.ElfEdit.Get.insertAtOffset function: while scanning for the place to put the bss region, insertAtOffset never seems to find a suitable location. This might be related to how it computes the size of the bss: I'm not clear on whether this function should be considering the logical or physical size.

There is a (currently failing) test in the test suite covering this case.

`dynNeeded` returns entries in the opposite order

If you compile this program with gcc fmax.c -o fmax.elf -lm:

#include <math.h>
#include <stdio.h>

float my_fmax(float a, float b) {
  return fmax(a, b);
}

int main() {
  printf("%f\n", my_fmax(1.0, 2.0));
  return 0;
}

Then fmax.elf will have two DT_NEEDED entries, as reported by readelf -d:

$ readelf -d tests/fmax.elf | grep "NEEDED"
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Note the order in which the entries appear is libm.so.6 then libc.so.6. elf-edit's dynNeeded function, on the other hand, will return the opposite order! This is because of this code:

insertDynamic :: Dynamic v -> Map.Map ElfDynamicTag [v] -> Map.Map ElfDynamicTag [v]
insertDynamic (Dynamic tag v) = Map.insertWith (++) tag [v]

The use of insertWith (++) means that new entries in the list will be placed at the front, and since buildDynamicMap repeatedly calls insertDynamic as it processes the DT_NEEDED entries from left to right, this will cause them to be inserted in the opposite order in the DynamicMap.

We could technically fix this by changing (++) to flip (++), but this would change the time complexity of buildDynamicMap from linear to quadratic, since we'd be appending an element to the end of the list at each step rather than prepending it. A better approach, in my opinion, would be to keep insertDynamic as it is but to reverse the list at the end of buildDynamicMap. This is the same approach that gnuLinkedList takes:

gnuLinkedList readFn d = go []
  where readNextVal b = (,) <$> readFn b <*> getWord32 d
        go prev 0 _ = return (reverse prev)
        go prev cnt b =
          case strictRunGetOrFail (readNextVal b) b of
            Left (_,_,msg) -> Left msg
            Right (_,_,(d',next)) -> do
              go (d':prev) (cnt-1) (B.drop (fromIntegral next) b)
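
The reverse-at-the-end idea can be sketched for buildDynamicMap as follows. This is only an illustration under simplifying assumptions: the Dynamic type here is a hypothetical stand-in that is polymorphic in the tag, and buildDynamicMap takes a plain list rather than whatever input the real function consumes.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical simplification of elf-edit's Dynamic type: a tag plus a value.
data Dynamic tag v = Dynamic tag v

-- Same shape as the insertDynamic quoted above: each new value is placed at
-- the front of the list stored under its tag, which is cheap per insertion.
insertDynamic :: Ord tag => Dynamic tag v -> Map.Map tag [v] -> Map.Map tag [v]
insertDynamic (Dynamic tag v) = Map.insertWith (++) tag [v]

-- Fold over the entries in file order, then reverse every per-tag list once,
-- so that values come out in the order they appeared in the dynamic section.
buildDynamicMap :: Ord tag => [Dynamic tag v] -> Map.Map tag [v]
buildDynamicMap = Map.map reverse . foldl' (flip insertDynamic) Map.empty
```

With this sketch, building a map from two entries under the same tag, "libm.so.6" followed by "libc.so.6", yields the list ["libm.so.6", "libc.so.6"] for that tag, matching the readelf -d order. Each per-tag list is reversed exactly once, so the overall cost stays linear in the number of entries.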

Generalize `relocTargetBits` to work with non-word-aligned relocation addends (as in PowerPC and RISC-V)

While working on #35, I realized that the relocTargetBits method of IsRelocationType is not capable of handling certain relocation types in PowerPC code. To recap:

-- | Return the number of bits that the rel entry should use for the addend.
--
-- This is commonly the size of a pointer, but may be smaller for some
-- relocation types. This is used for rel entries, where to compute
-- the addend the next @ceiling (bits/8)@ bytes is read out of memory
-- as a list of bytes and the low @bits@ are interpreted as a signed
-- integer (where low uses the elf's endianness) and sign extended.
relocTargetBits :: tp -> Int

Currently, relocTargetBits assumes that the bits needed for the relocation addend will always start at a word alignment. For instance, here are the different addend sizes in the x86-64 System V ABI:

[Image: elf-edit-1 (relocation field sizes in the x86-64 System V ABI)]

Each relocation entry is 64 bits, but certain relocation types may only use 8, 16, or 32 bits of the entry (starting from the LSBs). Regardless, the addend calculations will always happen at 64-bit-sized word boundaries, which greatly simplifies the bitwise arithmetic needed to compute the addends (see this code in macaw).

Things aren't quite so simple in PowerPC, however. For instance, here are the different addend sizes in the PPC32 ABI:

[Image: elf-edit-2 (relocation field sizes in the PPC32 ABI)]

This time, each relocation entry is 32 bits. What is unusual is that the low24 and low14 relocation fields do not start from the LSBs of a 32-bit word, but rather from somewhere in the middle of the word. In the ABI's bit numbering, where bit 0 is the most significant bit, a low24 field occupies bits 6 through 29 of the word and a low14 field occupies bits 16 through 29, so both fields end two bits above the LSB. This means that we cannot use the addend-computing code in macaw as-is with low24 or low14 relocation fields, as this code does not account for the bit offsets required.

In order to fix this, we will likely need to augment relocTargetBits with additional information about bit offsets (the PPC32 ABI refers to this as "displacement"). Luckily, low24/low14 relocation fields aren't super common, so I am opting not to fix this issue for now, instead leaving this here as a reminder to revisit the issue should we need to support such relocations properly.
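
To make the bit-offset idea concrete, here is a rough sketch of reading a signed field that does not start at the LSB. The function name and its parameters are hypothetical and are not part of elf-edit or macaw; the offset and width for a given relocation type would have to come from the ABI's field definitions.

```haskell
import Data.Bits (shiftL, shiftR)
import Data.Int (Int32)
import Data.Word (Word32)

-- | Hypothetical helper (not part of elf-edit or macaw): read a signed
-- @bits@-wide field whose lowest bit sits @offset@ bits above the LSB of a
-- 32-bit word. Requires @bits >= 1@ and @offset + bits <= 32@.
extractSignedField :: Int -> Int -> Word32 -> Int32
extractSignedField offset bits w =
  let -- Move the field's top bit up to bit 31, discarding anything above it.
      aligned = fromIntegral (w `shiftL` (32 - offset - bits)) :: Int32
  in -- shiftR on Int32 is an arithmetic shift, so this sign-extends the field.
     aligned `shiftR` (32 - bits)
```

For example, extractSignedField 2 24 0x4BFFFFFC evaluates to -1, and extractSignedField 0 32 behaves like the existing word-aligned case. A generalized relocTargetBits could return both numbers instead of just the width, though that would be an API change.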

`decodeRelaEntries` sometimes returns fewer relocations than `readelf -r`

While adding a test case for #35, I discovered that elf-edit's decodeRelaEntries function will return different results for PPC32 binaries versus PPC64 binaries. To pick a concrete example, let's look at this simple C program:

int main(void) {
  return 0;
}

I used a musl-based PPC32 cross-compiler (obtained from here) and a PPC64 cross-compiler (obtained from here) to compile this program into a PPC32 binary named ppc32-relocs.elf and a PPC64 binary named ppc64-relocs.elf, respectively. I can use readelf -r to determine the relocations contained in each binary's RELA relocation table:

$ readelf -r ppc32-relocs.elf 

Relocation section '.rela.dyn' at offset 0x2ec contains 17 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
0001fecc  00000016 R_PPC_RELATIVE               5d0
0001fed0  00000016 R_PPC_RELATIVE               54c
0001fed4  00000016 R_PPC_RELATIVE               6d0
0001fed8  00000016 R_PPC_RELATIVE               3e8
0001fedc  00000016 R_PPC_RELATIVE               61c
0001fee0  00000016 R_PPC_RELATIVE               20014
0001fee8  00000016 R_PPC_RELATIVE               20014
0001fef0  00000016 R_PPC_RELATIVE               20014
0001fef8  00000016 R_PPC_RELATIVE               20010
0001ff00  00000016 R_PPC_RELATIVE               734
0001ff08  00000016 R_PPC_RELATIVE               20018
00020010  00000016 R_PPC_RELATIVE               20010
0001fee4  00000501 R_PPC_ADDR32      00000000   _ITM_deregisterTM[...] + 0
0001feec  00000401 R_PPC_ADDR32      00000000   _ITM_registerTMCl[...] + 0
0001fef4  00000201 R_PPC_ADDR32      00000000   __cxa_finalize + 0
0001fefc  00000301 R_PPC_ADDR32      00000000   __deregister_fram[...] + 0
0001ff04  00000701 R_PPC_ADDR32      00000000   __register_frame_info + 0

Relocation section '.rela.plt' at offset 0x3b8 contains 4 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
00020000  00000215 R_PPC_JMP_SLOT    00000000   __cxa_finalize + 0
00020004  00000315 R_PPC_JMP_SLOT    00000000   __deregister_fram[...] + 0
00020008  00000615 R_PPC_JMP_SLOT    00000000   __libc_start_main + 0
0002000c  00000715 R_PPC_JMP_SLOT    00000000   __register_frame_info + 0
$ readelf -r ppc64-relocs.elf 

Relocation section '.rela.dyn' at offset 0x430 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000001fd10  000000000016 R_PPC64_RELATIVE                     7f0
00000001fd18  000000000016 R_PPC64_RELATIVE                     760
000000020030  000000000016 R_PPC64_RELATIVE                     20030
00000001ff08  000600000026 R_PPC64_ADDR64    0000000000000000 _ITM_deregisterTM[...] + 0
00000001ff10  000500000026 R_PPC64_ADDR64    0000000000000000 _ITM_registerTMCl[...] + 0
00000001ff18  000300000026 R_PPC64_ADDR64    0000000000000000 __cxa_finalize + 0
00000001ff20  000400000026 R_PPC64_ADDR64    0000000000000000 __deregister_fram[...] + 0
00000001ff28  000800000026 R_PPC64_ADDR64    0000000000000000 __register_frame_info + 0

Relocation section '.rela.plt' at offset 0x4f0 contains 4 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000020010  000300000015 R_PPC64_JMP_SLOT  0000000000000000 __cxa_finalize + 0
000000020018  000400000015 R_PPC64_JMP_SLOT  0000000000000000 __deregister_fram[...] + 0
000000020020  000700000015 R_PPC64_JMP_SLOT  0000000000000000 __libc_start_main + 0
000000020028  000800000015 R_PPC64_JMP_SLOT  0000000000000000 __register_frame_info + 0

Note that each binary's RELA relocation table is divided into two sections, .rela.dyn and .rela.plt. This will be important later.

If I use elf-edit's decodeRelaEntries function on ppc32-relocs.elf, it will return all of the relocations that readelf -r reports. On the other hand, if I use decodeRelaEntries on ppc64-relocs.elf, it will only report a subset of the relocations:

[ (0x000000000001fd10, R_PPC64_RELATIVE)
, (0x000000000001fd18, R_PPC64_RELATIVE)
, (0x0000000000020030, R_PPC64_RELATIVE)
, (0x000000000001ff08, R_PPC64_ADDR64)
, (0x000000000001ff10, R_PPC64_ADDR64)
, (0x000000000001ff18, R_PPC64_ADDR64)
, (0x000000000001ff20, R_PPC64_ADDR64)
, (0x000000000001ff28, R_PPC64_ADDR64)
]

Notably, all of the R_PPC64_JMP_SLOT relocations (contained exclusively within the .rela.plt section) are absent!

This happens because the two binaries ascribe different semantics to the RELASZ tag. For example, let's look at the output of readelf -d ppc32-relocs.elf:

$ readelf -d ppc32-relocs.elf 

Dynamic section at offset 0xff0c contains 25 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libc.so]
 0x0000000c (INIT)                       0x3e8
 0x0000000d (FINI)                       0x6d0
 0x00000019 (INIT_ARRAY)                 0x1fecc
 0x0000001b (INIT_ARRAYSZ)               4 (bytes)
 0x0000001a (FINI_ARRAY)                 0x1fed0
 0x0000001c (FINI_ARRAYSZ)               4 (bytes)
 0x00000004 (HASH)                       0x150
 0x6ffffef5 (GNU_HASH)                   0x18c
 0x00000005 (STRTAB)                     0x250
 0x00000006 (SYMTAB)                     0x1b0
 0x0000000a (STRSZ)                      154 (bytes)
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000015 (DEBUG)                      0x0
 0x00000003 (PLTGOT)                     0x20000
 0x00000002 (PLTRELSZ)                   48 (bytes)
 0x00000014 (PLTREL)                     RELA
 0x00000017 (JMPREL)                     0x3b8
 0x00000007 (RELA)                       0x2ec
 0x00000008 (RELASZ)                     252 (bytes)
 0x00000009 (RELAENT)                    12 (bytes)
 0x70000000 (PPC_GOT)                    0x1fff4
 0x6ffffffb (FLAGS_1)                    Flags: PIE
 0x6ffffff9 (RELACOUNT)                  12
 0x00000000 (NULL)                       0x0

elf-edit determines what part of the binary corresponds to the RELA relocation table by:

  1. Jumping to the address denoted by RELA (0x2ec), and
  2. Reading a number of bytes equal to RELASZ (252)

In the ppc32-relocs.elf example, this works beautifully. There are 17 entries in the .rela.dyn section and 4 entries in the .rela.plt section for a total of 21 entries overall in the relocation table. RELAENT tells us that each entry is 12 bytes in size, and 21 * 12 = 252, which is exactly the value of RELASZ.

Things get stranger with ppc64-relocs.elf, however:

$ readelf -d ppc64-relocs.elf 

Dynamic section at offset 0xfd20 contains 26 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so]
 0x000000000000000c (INIT)               0x550
 0x000000000000000d (FINI)               0x8c4
 0x0000000000000019 (INIT_ARRAY)         0x1fd10
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x1fd18
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x0000000000000004 (HASH)               0x220
 0x000000006ffffef5 (GNU_HASH)           0x260
 0x0000000000000005 (STRTAB)             0x390
 0x0000000000000006 (SYMTAB)             0x288
 0x000000000000000a (STRSZ)              154 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x20000
 0x0000000000000002 (PLTRELSZ)           96 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x4f0
 0x0000000070000000 (PPC64_GLINK)        0x894
 0x0000000070000003 (PPC64_OPT)          0x0
 0x0000000000000007 (RELA)               0x430
 0x0000000000000008 (RELASZ)             192 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffb (FLAGS_1)            Flags: PIE
 0x000000006ffffff9 (RELACOUNT)          3
 0x0000000000000000 (NULL)               0x0

Here, we have a RELASZ of 192 bytes. There are 8 entries in the .rela.dyn section and 4 entries in the .rela.plt section for a total of 12 entries overall in the relocation table. Moreover, RELAENT is 24 bytes. But note that 12 * 24 = 288, which exceeds the value of RELASZ! In this particular example, RELASZ only covers the size of the .rela.dyn section, and it does not cover anything in the .rela.plt section, which explains why all of the relocations from the .rela.plt section were omitted. (If you add PLTRELSZ, the size of the .rela.plt section, to RELASZ, then you do in fact get 288 bytes.)


What should we do here? The cross-compilers I am using for PPC32 and PPC64 appear to ascribe different semantics to the RELASZ tag, which makes it questionable whether RELASZ is a reliable way to gauge the overall size of the RELA relocation table. Perhaps we should instead count the number of table entries and multiply it by RELAENT?
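
One other option, sketched below, is to compare the byte ranges described by the tags and decode the JMPREL range separately only when RELASZ does not already cover it. This is a different approach from counting entries, and it is only a sketch: the type and function names are hypothetical, the tag values are passed in as plain integers, PLTREL is assumed to be RELA, and a partial overlap between the two ranges is not handled.

```haskell
-- | The four dynamic-section values involved, as plain integers.
data RelaTags = RelaTags
  { tagRela     :: Integer  -- ^ DT_RELA
  , tagRelaSz   :: Integer  -- ^ DT_RELASZ
  , tagJmpRel   :: Integer  -- ^ DT_JMPREL
  , tagPltRelSz :: Integer  -- ^ DT_PLTRELSZ
  }

-- | True when [JMPREL, JMPREL + PLTRELSZ) lies inside [RELA, RELA + RELASZ).
pltInsideRela :: RelaTags -> Bool
pltInsideRela t =
  tagJmpRel t >= tagRela t
    && tagJmpRel t + tagPltRelSz t <= tagRela t + tagRelaSz t

-- | (address, size) ranges to decode so that each relocation is seen once.
relaRanges :: RelaTags -> [(Integer, Integer)]
relaRanges t
  | pltInsideRela t = [(tagRela t, tagRelaSz t)]
  | otherwise       = [(tagRela t, tagRelaSz t), (tagJmpRel t, tagPltRelSz t)]

-- Values copied from the two readelf -d listings above.
ppc32, ppc64 :: RelaTags
ppc32 = RelaTags 0x2ec 252 0x3b8 48
ppc64 = RelaTags 0x430 192 0x4f0 96
```

For ppc32 the PLT range ends at 0x3e8, the same place the RELA range ends, so a single 252-byte range is returned. For ppc64 the PLT range starts at 0x4f0, exactly where the 192-byte RELA range ends, so two ranges totalling 288 bytes are returned.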

Replace all `String`-based error messages with structured error types

Generally speaking, elf-edit adheres to the convention of using structured data types to represent its error messages, such as with the DynamicError data type. Unfortunately, there are still some parts of elf-edit that use raw Strings to represent errors:

  • -- | Parses a linked list
    gnuLinkedList :: (B.ByteString -> Get a) -- ^ Function for reading.
                  -> ElfData
                  -> Int -- ^ Number of entries expected.
                  -> B.ByteString -- ^ Buffer to read.
                  -> Either String [a]
  • -- | Get values of DT_NEEDED entries
    dynNeeded :: forall w . DynamicSection w -> VirtAddrMap w -> Either String [B.ByteString]
  • -- | Attempt to convert a section to a GOT.
    elfSectionAsGOT :: (Bits w, Num w)
                    => ElfSection w
                    -> Either String (ElfGOT w)
  • transShdr :: Integral w
              => B.ByteString -- ^ Contents fof file.
              -> B.ByteString -- ^ String table for sectionnames
              -> Word16 -- ^ Index of section
              -> Shdr Word32 w
              -> Either String (FileRange w, ElfSection w)
  • -- | Return relocation entries from byte string.
    decodeRelEntries :: forall tp
                     .  IsRelocationType tp
                     => ElfData -- ^ Endianess of encodings
                     -> B.ByteString -- ^ Relocation entries
                     -> Either String [RelEntry tp]
  • -- | Return relocation entries from byte string.
    decodeRelaEntries :: forall tp
                      .  IsRelocationType tp
                      => ElfData -- ^ Endianess of encodings
                      -> B.ByteString -- ^ Relocation entries
                      -> Either String [RelaEntry tp]

We should replace these uses of Either String with Either <structured error type>, where the specific structured error type is specific to the function in question. We might be able to use existing error types for some functions; for instance, dynNeeded could likely use Either DynamicError, just like other functions in Data.ElfEdit.Dynamic do.
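
As a rough illustration of the pattern, here is a toy decoder that reports failures through a dedicated error type instead of a String. Everything in it is hypothetical: EntryDecodeError, chunkEntries, and renderEntryDecodeError are made-up names, and the decoder works on a list of bytes rather than a ByteString to stay self-contained.

```haskell
import Data.Word (Word8)

-- | Hypothetical structured error for a fixed-size entry decoder.
data EntryDecodeError
  = InvalidEntrySize !Int
    -- The requested entry size was not positive.
  | TruncatedEntry !Int !Int !Int
    -- Entry index, bytes expected, bytes actually available.
  deriving (Eq, Show)

-- | Split a buffer into fixed-size entries, failing with a structured error
-- if the size is invalid or the final entry is incomplete.
chunkEntries :: Int -> [Word8] -> Either EntryDecodeError [[Word8]]
chunkEntries size buf
  | size <= 0 = Left (InvalidEntrySize size)
  | otherwise = go 0 buf
  where
    go _ [] = Right []
    go i bs =
      let (entry, rest) = splitAt size bs
          found = length entry
      in if found < size
           then Left (TruncatedEntry i size found)
           else (entry :) <$> go (i + 1) rest

-- | Callers that still want text can render the error at the boundary.
renderEntryDecodeError :: EntryDecodeError -> String
renderEntryDecodeError (InvalidEntrySize n) =
  "Invalid entry size: " ++ show n
renderEntryDecodeError (TruncatedEntry i want got) =
  "Entry " ++ show i ++ " is truncated: expected " ++ show want
    ++ " bytes but found " ++ show got
```

The benefit for callers is that they can pattern match on the failure (for example, to treat a truncated final entry differently from a bad size) while a rendering function keeps the human-readable message available.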

I don't have a pressing need to fix this right now, especially since fixing this would require API changes. This does seem like a goal that we should work towards in the long term, however.

Missing AArch64 relocation types

The Data.ElfEdit.Relocations.ARM32 module has a very thorough list of AArch32 ELF relocations, but the list of AArch64 relocations in Data.ElfEdit.Relocations.AArch64 is quite limited by comparison. We should strive to make the AArch64 relocation list more complete by drawing from this document. At a minimum, we should strive to have all of the AArch64 counterparts to the existing AArch32 relocations—for instance, I was recently surprised to discover that elf-edit has R_ARM_COPY but not R_AARCH64_COPY.
