Re: Self-modifying code, Function pointers & { Safety, Security}

Martin Ward <martin@gkc.org.uk>
Fri, 14 Mar 2014 14:15:13 +0000

          From comp.compilers


From: Martin Ward <martin@gkc.org.uk>
Newsgroups: comp.compilers
Date: Fri, 14 Mar 2014 14:15:13 +0000
Organization: Compilers Central
References: 14-03-015
Keywords: code, comment
Posted-Date: 14 Mar 2014 13:47:06 EDT

On 07/03/14 16:04, John wrote:
> [The major reasons for self-modifying code historically were for
> address modification for indexing and subroutine returns. Index
> registers and indirect addressing provide instruction modification as
> the instruction is executed, making those modifications go away.... -John]


The IBM mainframe instruction set includes an "Execute" instruction
(EX) which takes a register and an address. The instruction at the
given address is fetched, its second byte is ORed with the low byte
of the register, and the resulting modified instruction is executed.
The copy of the instruction in storage is not changed.


The most common use for Execute is to modify the length field of an
instruction to give a variable length move or compare.
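
As a rough illustration of the variable-length move idiom, here is a
small Python model of EX applied to an MVC instruction. It is a sketch
under stated assumptions, not real mainframe code: the names execute
and mvc, the storage layout and the register contents are all invented
for the example, and the special case where the EX register field is 0
(no modification at all) is left out.

```python
def execute(storage, reg_value, target_addr, length=6):
    """Model of EX: fetch the target instruction and OR its second byte
    with the low byte of the register. Only the fetched copy is changed;
    `storage` itself is left untouched."""
    instr = bytearray(storage[target_addr:target_addr + length])
    instr[1] |= reg_value & 0xFF
    return bytes(instr)


def mvc(storage, instr, regs):
    """Model of MVC (opcode 0xD2): copy L+1 bytes, left to right."""
    assert instr[0] == 0xD2
    count = instr[1] + 1  # the length field holds the byte count minus one

    def address(hi, lo):
        base = hi >> 4                      # base register number
        disp = ((hi & 0x0F) << 8) | lo      # 12-bit displacement
        return (regs[base] if base else 0) + disp

    dst = address(instr[2], instr[3])
    src = address(instr[4], instr[5])
    for i in range(count):
        storage[dst + i] = storage[src + i]


# Hypothetical layout: the MVC lives at address 0 with a length field of 0,
# i.e. MVC 0(1,R2),0(R3) once assembled.
storage = bytearray(64)
storage[0:6] = bytes([0xD2, 0x00, 0x20, 0x00, 0x30, 0x00])
storage[16:21] = b"HELLO"

regs = [0] * 16
regs[2] = 32   # destination address
regs[3] = 16   # source address

n = 5                                   # number of bytes to move
modified = execute(storage, n - 1, 0)   # EX with a register holding n-1
mvc(storage, modified, regs)
```

After this runs, the five bytes have been copied to address 32, while
the MVC in storage still has a length field of zero.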


As is common with many IBM instructions, the Execute instruction
includes an index register field, so an instruction can be selected
for execution from a table of instructions, depending on the value in
the given index register.


Despite the presence of Execute, and the fact that more recent
mainframes now include variable length move and compare instructions,
self-modifying code is still quite common in assembler code which is
currently in production.


One of the most common cases is overwriting a NOP (branch never)
instruction to convert it into a B (unconditional branch) instruction.
This is typically used to set up a "first time through" switch, for
example:


NOPINSTR NOP   LAB1                 branch never (BC 0,LAB1)
         MVI   NOPINSTR+1,X'F0'     set the mask to 15: NOP becomes B
         initialisation code...
LAB1     ...


The first time through, the NOP instruction does not branch, so
control falls through to the MVI instruction (move immediate) which
overwrites part of the NOP instruction, turning it into an
unconditional branch. Subsequent executions will then skip the
initialisation code. The advantage of this approach (over setting and
testing a flag) is that it saves five whole bytes of memory: one byte
for the flag and four more bytes for the compare instruction. It also
saves executing one instruction.
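
The byte-level effect of the switch can be modelled in a few lines of
Python. This is only a sketch: make_nop, run_once and the displacement
value are invented for the illustration, the base register nibble is
simply set to zero, and only the two mask values that occur in this
idiom (0 and 15) are handled.

```python
BC_OPCODE = 0x47  # BC (branch on condition), RX format


def make_nop(target_disp):
    """NOP LAB1 assembles as BC 0,LAB1: opcode, mask nibble 0 plus index
    nibble 0, then a base nibble (0 here) and a 12-bit displacement."""
    return bytearray([BC_OPCODE, 0x00,
                      (target_disp >> 8) & 0x0F, target_disp & 0xFF])


def run_once(code, log):
    """One pass through the routine. `code` is patched in place, which is
    what the MVI does to the instruction in storage."""
    mask = code[1] >> 4
    if mask == 0xF:
        # Mask 15 branches on every condition code: unconditional B.
        log.append("skip")
        return
    # Mask 0 never branches, so control falls through to the MVI.
    code[1] = 0xF0            # MVI NOPINSTR+1,X'F0'
    log.append("init")        # stands in for the initialisation code


code = make_nop(0x010)        # hypothetical displacement of LAB1
log = []
for _ in range(3):
    run_once(code, log)
```

Only the first pass records "init"; the two later passes take the
branch, and the second byte of the instruction is left as X'F0'.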


The other common cases for self-modifying code are modifying the
length field (directly in the instruction, rather than indirectly via
an Execute instruction), modifying one or more displacement fields
(i.e. modifying the address of the data that the instruction operates
on) and modifying the data field of an immediate data instruction.
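
To make the distinction between these cases concrete, the following
Python sketch classifies a one-byte store into an instruction by the
byte offset it hits. It is an illustration only, not a description of
any real analysis tool: the table INSTRUCTIONS, the function classify
and the category names are assumptions, and only five common opcodes
are covered.

```python
# opcode -> (mnemonic, instruction length in bytes, meaning of byte 1)
INSTRUCTIONS = {
    0x47: ("BC", 4, "branch mask"),      # RX: mask nibble + index nibble
    0x92: ("MVI", 4, "immediate data"),  # SI: immediate byte
    0x95: ("CLI", 4, "immediate data"),  # SI: immediate byte
    0xD2: ("MVC", 6, "length"),          # SS: length minus one
    0xD5: ("CLC", 6, "length"),          # SS: length minus one
}


def classify(opcode, offset):
    """Name the field that a store to `offset` bytes into the
    instruction would modify."""
    mnemonic, length, byte1_field = INSTRUCTIONS[opcode]
    if not 0 <= offset < length:
        raise ValueError(f"offset {offset} is outside {mnemonic}")
    if offset == 0:
        return "opcode"
    if offset == 1:
        return byte1_field
    # Bytes 2-3 (and 4-5 in SS format) hold a base register nibble
    # followed by a 12-bit displacement.
    return "displacement"


results = [
    classify(0x47, 1),   # MVI into NOPINSTR+1
    classify(0xD2, 1),   # store into the length byte of an MVC
    classify(0x92, 1),   # store into the immediate byte of an MVI
    classify(0xD5, 4),   # store into the second operand address of a CLC
]
```

Note that a store into byte 2 or byte 4 also changes the base register
nibble, so "displacement" here is a simplification.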


I recently carried out a survey of several million lines of current
production assembler in three organisations. Self-modifying code
appeared in all code bases, with the amounts varying considerably
between organisations. The numbers below are per million lines of
executable instructions:


Organisation A:


     24 EXecute instructions
  2,685 modified branch instructions
  3,451 modified length fields
    490 modified immediate data fields
  1,578 modified displacements


Organisation B:


    722 EXecute instructions
  1,236 modified branch instructions
    626 modified length fields
     74 modified immediate data fields
    429 modified displacements


Organisation C:


  1,222 EXecute instructions
     76 modified branch instructions
    107 modified length fields
      0 modified immediate data fields
     90 modified displacements


Any automated assembler analysis or migration tool therefore needs to
be able to detect and handle all these cases of self-modifying code
(as ours does, of course!)


As well as saving memory space, self-modifying code was used to
improve execution speed. Ironically, on modern mainframes,
self-modifying code will dirty the instruction cache, forcing the CPU
to re-load from main memory and killing performance!


I can only think of two cases (out of all the production assembler code
we have ever processed) in which a block of instructions is written
to memory and subsequently executed.


--
Martin


Dr Martin Ward STRL Principal Lecturer & Reader in Software Engineering
martin@gkc.org.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
Mirrors: http://www.gkc.org.uk and http://www.gkc.org.uk/gkc


[Thanks, this is quite interesting. Mainframe sort programs compile
the sort instructions into machine code that does the record editing
and comparison, but they write the code out as an object file on disk,
and then use the link editor or loader to combine it with any user
exit routines and bring it into memory, where the main sort program
can call it. Given the large size of most sort jobs, the extra time for
the linkedit is insignificant. -John]

