Tuesday, March 28, 2006

Q&A: How OS X Executes Applications

After writing my previous article, How OS X Executes Applications, I received quite a few comments and e-mails with some good questions. I will attempt to answer some of them here, and continue to update this entry as questions arise.

Question 1. What is libSystem.B.dylib?

evil:~/Temp mohit$ otool -L /bin/ls
/bin/ls:
/usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.0.0)

The System library, found in /usr/lib/libSystem.dylib, is simply a collection of core libraries that are used by most Darwin applications. A few libraries worth mentioning that are in libSystem.dylib are:
  • libc : The standard C library.
  • libdl : The dynamic loader library.
  • libm : The math library.
  • libpthread : The POSIX threads library.
  • libinfo : The NetInfo library.
To get a complete list of modules and symbols within the library, use the -Tv switch of the otool command.

Question 2. Is there an objdump for OS X?

Yes, there is, and it supports Mach-O binaries. It's just not distributed with Darwin / OS X. This link on this site tells you which systems objdump is distributed with.

Question 3. Are executable code and read-only data in the same __TEXT segment? If so, how can they mark part of it executable and part not executable (normal security practice nowadays)?

I actually updated the article with the answer to this, but it's a good question, and I'll answer it here again.

Segments may be sub-divided into sections. Within the __TEXT segment, only certain sections, e.g., __text, or __picsymbol_stub, can contain executable code.

To determine which sections contain executable code, use the -lv parameter with otool, and look at the field named attributes in each section's entry.
evil:~/Temp mohit$ otool -lv /bin/ls | egrep '(sectname|attributes)'
sectname __text
attributes PURE_INSTRUCTIONS SOME_INSTRUCTIONS
sectname __picsymbol_stub
attributes PURE_INSTRUCTIONS
sectname __symbol_stub
attributes PURE_INSTRUCTIONS
sectname __picsymbolstub1
attributes PURE_INSTRUCTIONS SOME_INSTRUCTIONS
sectname __cstring
attributes (none)
sectname __symbol_stub1
attributes PURE_INSTRUCTIONS SOME_INSTRUCTIONS
sectname __literal8
attributes (none)
sectname __eh_frame
attributes NO_TOC STRIP_STATIC_SYMS LIVE_SUPPORT
sectname __data
attributes (none)
sectname __nl_symbol_ptr
attributes (none)
sectname __la_symbol_ptr
attributes (none)
sectname __dyld
attributes (none)
sectname __common
attributes (none)
sectname __bss
attributes (none)

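The sections whose attributes include PURE_INSTRUCTIONS contain executable code; that attribute marks a section as holding only machine instructions, while SOME_INSTRUCTIONS marks a section that holds some machine instructions.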
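
As a rough illustration, the filtered otool output above can be post-processed with a short script. This is only a sketch: the function name executable_sections is made up for this example, and it assumes the alternating sectname / attributes layout produced by the egrep filter shown above.

```python
def executable_sections(otool_text):
    """Return the names of sections whose attributes include PURE_INSTRUCTIONS.

    Assumes the input alternates 'sectname NAME' and 'attributes FLAGS' lines,
    as in the filtered otool -lv output (an assumption of this sketch).
    """
    result = []
    current = None
    for line in otool_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "sectname" and len(fields) > 1:
            current = fields[1]
        elif fields[0] == "attributes" and current is not None:
            if "PURE_INSTRUCTIONS" in fields[1:]:
                result.append(current)
            current = None
    return result


# A few lines copied from the output above.
sample = """\
sectname __text
attributes PURE_INSTRUCTIONS SOME_INSTRUCTIONS
sectname __cstring
attributes (none)
sectname __symbol_stub1
attributes PURE_INSTRUCTIONS SOME_INSTRUCTIONS
"""
print(executable_sections(sample))  # ['__text', '__symbol_stub1']
```

To run it against a real binary, the text could come from otool -lv piped through the same egrep filter.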

Question 4. How do I decipher the constants in the otool output?

There are two ways to do this: One way is to examine the header files in /usr/include/mach and /usr/include/mach-o; and the other, simpler, way is to just add -v to your otool commands.
evil:~/Temp mohit$ otool -vh /bin/ls      
/bin/ls:
Mach header
magic cputype cpusubtype filetype ncmds sizeofcmds flags
MH_MAGIC PPC ALL EXECUTE 11 1608 NOUNDEFS DYLDLINK TWOLEVEL
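
For the curious, here is roughly what the first approach (reading the header files) amounts to. The sketch below decodes a 32-bit Mach header by hand, using a handful of constants as I understand them from mach-o/loader.h and mach/machine.h. The tables are deliberately incomplete, the function name decode_mach_header is invented for this example, and fat (universal) binaries are not handled. The sample bytes are reconstructed from the otool output above, not read from a real file.

```python
import struct

MH_MAGIC = 0xFEEDFACE                      # 32-bit Mach-O magic number
CPU_TYPES = {7: "I386", 18: "PPC"}         # partial table
FILE_TYPES = {1: "OBJECT", 2: "EXECUTE", 6: "DYLIB", 8: "BUNDLE"}  # partial
FLAGS = {0x1: "NOUNDEFS", 0x4: "DYLDLINK", 0x80: "TWOLEVEL"}       # partial


def decode_mach_header(data):
    """Decode the seven 32-bit fields of a struct mach_header."""
    # Try big-endian first (PowerPC), then little-endian (Intel).
    for endian in (">", "<"):
        fields = struct.unpack(endian + "7I", data[:28])
        if fields[0] == MH_MAGIC:
            break
    else:
        raise ValueError("not a 32-bit Mach-O header")
    magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags = fields
    return {
        "magic": "MH_MAGIC",
        "cputype": CPU_TYPES.get(cputype, str(cputype)),
        "cpusubtype": cpusubtype,
        "filetype": FILE_TYPES.get(filetype, str(filetype)),
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "flags": [name for bit, name in sorted(FLAGS.items()) if flags & bit],
    }


# Header rebuilt from the values otool printed for /bin/ls above.
sample_header = struct.pack(">7I", 0xFEEDFACE, 18, 0, 2, 11, 1608, 0x85)
print(decode_mach_header(sample_header))
```

In practice otool -v is still the simpler route; this just shows where the symbolic names come from.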

Question 5. What are Two-Level Namespaces?

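Two-level namespaces are a feature, available since OS X 10.1, that prevents collisions between symbol names in dynamic libraries. It works by recording, alongside each symbol a program imports, the name of the library expected to provide it. This happens at build time, when the program is linked.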

Suppose you have an application that is linked against libfirst and libsecond. libfirst exports a function called dothis(). At a later time, a new version of libsecond comes out with its own dothis() function. Now, the application may execute whichever dothis() function it loads first, which may not be the one that was intended.

With two-level namespaces (enabled by default), the linker associates dothis() with libfirst at build time, so the application keeps calling the dothis() in libfirst. This reduces the chance of symbol collisions in future versions of linked libraries.
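
The libfirst / libsecond scenario can be mimicked with a toy model. This is purely a simulation of the lookup rules described above, not how dyld is actually implemented; the function names flat_lookup and two_level_lookup and the dictionary layout are assumptions made for this sketch.

```python
# Toy model: each library is represented by the set of symbols it exports.
def flat_lookup(symbol, load_order, libraries):
    """Flat namespace: the first loaded library exporting the symbol wins."""
    for lib in load_order:
        if symbol in libraries[lib]:
            return lib
    raise LookupError(symbol)


def two_level_lookup(symbol, recorded_lib, libraries):
    """Two-level namespace: only the library recorded at link time is searched."""
    if symbol in libraries[recorded_lib]:
        return recorded_lib
    raise LookupError(symbol)


libraries = {
    "libfirst": {"dothis"},
    "libsecond": {"dothis"},  # the newer libsecond now also exports dothis()
}

# Flat namespace: the answer depends on which library happens to load first.
print(flat_lookup("dothis", ["libsecond", "libfirst"], libraries))  # libsecond
# Two-level namespace: the link-time record (libfirst) decides.
print(two_level_lookup("dothis", "libfirst", libraries))            # libfirst
```

The point of the model is that the second lookup never consults load order, so a new dothis() in libsecond cannot hijack the call.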

Question 6. Is Steve Jobs going to have you executed for reverse engineering this information?

Yes he is.

Seriously though, all this information is public knowledge. I did not "reverse engineer" anything. All I did was put together the most relevant parts of the documents mentioned at the end of the article, and I would suggest reading them for a deeper understanding of the OS X runtime environments.

9 comments:

  1. Thanks for the answers. It's interesting that Darwin bundles a whole set of libraries into libSystem.

    Doesn't that break build scripts that expect these libraries?

  2. No it does not. (Well... mostly)

    This is because common libraries like libc, libm, or libpthread are symbolically linked to libSystem in /usr/lib.

    See for yourself:

    $ ls -l /usr/lib/libc.dylib /usr/lib/libpthread.dylib

  3. Some really hacker stuff going on here :)

    Welcome to the blogger front page!

  4. This comment has been removed by a blog administrator.

  5. Awesome name for a blog. I'm surprised it wasn't taken already.

  6. I am guessing you don't get out a lot.

    God bless you though. I don't have half the brain it takes to figure this stuff out.

  7. Congratulations on getting Blogger's front page spotlight.

  8. Now that Macs are Intel, has anyone written anything allowing you to load ELF binaries?

  9. I am trying to find the physical entry point to a MACH-O executable.

    When I look at the section marked as PURE_INSTRUCTIONS, I see the addr, size and offset.

    Using a hex editor I look at the physical offset pointed to by addr. My initial response was that it should point to the executable's entry point for execution, but it seems that that initial assessment is wrong.

    Sometimes it does, sometimes not at all.

    I need to calculate the entry point in the file on disk, not when executing, but in the physical file.

    I can't find documentation on how to calculate that entry point. Do you perhaps know how to do this?

    Thanks!
