8088 ASSEMBLER  -  MANUAL                           3/5/89  AJD
           -------------------------


       1.  INTRODUCTION

       2.  THE SOURCE CODE FILE

       3.  LABELS

       4.  ASSEMBLER DIRECTIVES

       5.  USING THE ASSEMBLER PROGRAM

       6.  USING THE ASSEMBLE COMMAND LINE OPTIONS

       7.  HEXADECIMAL EXPRESSIONS

       8.  ADDRESSING MODES

       9.  BUGS



    1. INTRODUCTION
    ---------------

    ASSEMBLE is a program which takes in symbolic 8088 source code and
    translates it to an executable COM file.

    The format of instructions used by ASSEMBLE is similar to that used by
    DEBUG (e.g. MOV AL,[BP+DI+05] ), but there are a number of minor
    differences. There are two major differences between ASSEMBLE and DEBUG
    which made it necessary to write ASSEMBLE in the first place:

       (1) ASSEMBLE takes input from a file stored on disk, while DEBUG
    takes input only from the keyboard. It is easier to edit files for
    input to ASSEMBLE than it is to attempt to edit programs under DEBUG.

       (2) ASSEMBLE supports the use of symbolic labels and symbolic
    constants, while DEBUG does not. Labels are especially useful when
    forward references must be made to lines that you do not yet know the
    location of, or when you wish to insert code without recalculating the
    new addresses of all the locations that have been shifted forwards by
    the insertion. Symbolic constants are useful when a value that appears
    in many places in your code may need to be changed at some time in the
    future.

    Files used and created by ASSEMBLE are:

    filename.ASM   -  Your specified source code file.
    filename.DMP   -  A dump of what appears on the screen during assembly.
    filename.COM   -  The executable file produced by the assembly process.

    ASSEMBLE.COM   -  The program which you run to assemble your programs.
    ASSEMBL2.EXE   -  A program which is loaded and used by ASSEMBLE.COM.
                      Do NOT run this program yourself.

    Section 5 of this manual provides instructions on how to assemble a
    program.



    2. THE SOURCE CODE FILE
    -----------------------

    Your source code file contains the symbolic source code, which you can
    enter using any available text editor, as long as each line is no
    longer than 80 characters and is terminated with a CR and/or LF.
    Always save your source code file with a suffix of '.ASM'; otherwise
    the assembler will not be able to read it.

    Each line in your source file consists of a number of different fields
    which may or may not have anything in them.  These fields are:

    (1) THE LABEL FIELD. This contains the name of the label for the
    current line or the name of a constant which you are defining the value
    of. If you are either providing a label name or defining a symbolic
    constant, you must put the first character of the name in the very
    first column of the line.

    (2) THE INSTRUCTION FIELD. This field starts at any position after the
    first column. It may contain either an 8088 mnemonic for assembly or a
    directive to the assembler (directives begin with a '.').

    (3) THE OPERANDS FIELD. If the line contains an instruction field, then
    this field immediately follows it. This field contains operands for
    8088 instructions and may contain references to labels.

    (4) THE COMMENTS FIELD. This field may begin at any column, but it must
    start with a ';' to show that it is a comment. After a ';', the
    assembler ignores the rest of the line.

    AN EXAMPLE SOURCE PROGRAM:

    ; This is an example of a source code file.
    ;
    PEN = 0321   ; This is a symbolic constant
    ;
            JMP %CONT
            .DB 'This is a table',0
            .DB 48,45,4C,4C,4F
            .DB %CONT,00,05,6,FF/5,03,56,%label
    This may be useful if you want to load a byte register (e.g. AL) with
    a value from a label (e.g. MOV AL,%DOTTY %>SPOTTY %>CONST1 ).
    After the source file has been assembled, a table of the labels used in
    the program and their values is displayed. If the value of any label is
    unknown, then '????' is displayed instead of a value.

    There is one more important feature of label names to be aware of: a
    label name typed in capitals is taken to be a different label name from
    the same name typed in lower-case. Therefore, it may be wise to always
    type your source files in the same case, even though the assembler
    accepts both lower and upper case.



    4. ASSEMBLER DIRECTIVES
    -----------------------

    Assembler directives are commands which are not 8088 instructions.
    Instead, they are commands which tell the assembler to perform various
    functions. Directives start in the instruction field (i.e. any column
    after the first) and are denoted by a leading '.' . ASSEMBLE recognises
    three directives:

    (1) SET LOCATION POINTER (SET ORIGIN). This directive has the form
    '.ORG x', where x is any hexadecimal expression (which may contain
    substitute labels) which evaluates to a 16-bit number. The effect of
    this directive is to set the location pointer (the number displayed in
    the first four columns of the assembly printout) to the value
    specified.  ('ORG' stands for origin). Instructions after this
    directive will be assembled starting from this new location.  Normally,
    the location pointer starts at 0100 when assembly is commenced (0100H
    is the normal offset in the code segment for the start of a COM
    program).  A '.ORG' directive may be used at any time during the
    assembly to change the value of the location pointer. At the end of
    the assembly, the locations between the lowest value and the highest
    value that the location pointer held during assembly will be written to
    the .COM file.  Note that only locations between 0000H and 4FFFH
    inclusive may be filled with code or data in this version of ASSEMBLE.

    Note: Do not use forward references to labels in the expression for the
    .ORG directive or the results will be chaotic. References to labels
    whose values are already known are quite okay.
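
    The behaviour of the location pointer described above can be modelled
    with a short Python sketch. This is not ASSEMBLE's own code: the class
    and method names are hypothetical, and two details are assumptions
    made for illustration (unused locations inside the written range are
    filled with 00, and the highest pointer value is treated as one past
    the last byte stored).

```python
# Hypothetical model of the location pointer; not ASSEMBLE's actual code.
HIGHEST_ALLOWED = 0x4FFF   # last location this version of ASSEMBLE can fill


class LocationPointer:
    def __init__(self, start=0x0100):
        self.value = start      # assembly normally starts at 0100
        self.lowest = start     # lowest value the pointer has held
        self.highest = start    # highest value the pointer has held
        self.memory = {}        # address -> stored byte

    def _track(self):
        self.lowest = min(self.lowest, self.value)
        self.highest = max(self.highest, self.value)

    def org(self, address):
        """Model of '.ORG x': move the location pointer to x."""
        self.value = address & 0xFFFF
        self._track()

    def emit(self, data):
        """Store bytes at the location pointer, advancing it each time."""
        for byte in data:
            if not 0x0000 <= self.value <= HIGHEST_ALLOWED:
                raise ValueError("location %04X is outside 0000-4FFF" % self.value)
            self.memory[self.value] = byte & 0xFF
            self.value += 1
            self._track()

    def com_image(self):
        """Bytes from the lowest to the highest location (gaps assumed 00)."""
        return bytes(self.memory.get(a, 0) for a in range(self.lowest, self.highest))
```

    In this model, storing two bytes at 0100, issuing '.ORG 0110' and then
    storing one more byte gives an image that covers 0100 to 0110
    inclusive, with the skipped locations in between included in the file.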


    (2) DEFINE DATA BYTES. This directive's function is to store data bytes
    directly into the object code to create tables or reserve memory. It
    may take any of the following formats (and many more):
       .DB aa bb cc
       .DB aa,bb,cc
       .DB 'ssssss'
       .DB 'ssssss',aa,bb,'ssssss',cc
       .DB aa Rzzzz
       .DB 'sssss',aa,bb Rzzzz

    Where,
          aa, bb, cc  are hexadecimal expressions which evaluate to 8 bit
                      numbers. They may be separated by commas or spaces.
          ssssss      is a string of characters of any length which
                      contains no quote (') symbols and no '%' symbols.
          zzzz        is a hexadecimal expression which evaluates to a 16
                      bit number. This is called the multiplier.

    The letter R has a special function in the define data bytes directive.
    The 'R' stands for Repeat. It denotes that the hexadecimal expression
    following it is a multiplier which says how many times the string or
    list of bytes which occurs before it will be repeated in memory. This
    feature may be useful for filling or reserving memory. Note that any
    data after the multiplier expression will be ignored.

    Note that if a 16 bit hexadecimal expression is included as one of the
    data byte expressions then only the least significant byte of the
    evaluated number will be stored in memory using the .DB directive.

    For examples of the use of this directive see the example program in
    section 2 above.
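
    The rules above (one byte per string character, only the low byte of
    each expression kept, and the whole list repeated by the multiplier)
    can be sketched in Python. The function name and its already-parsed
    input format are hypothetical; ASSEMBLE's real parser is not shown in
    this manual.

```python
def define_bytes(items, repeat=1):
    """Expand an already-parsed .DB line into the bytes it would store.

    items  -- ints (evaluated hexadecimal expressions) and strs (strings)
    repeat -- the multiplier that follows 'R' (1 when no R is given)
    """
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out.extend(item.encode("ascii"))   # one byte per character
        else:
            out.append(item & 0xFF)            # keep the least significant byte
    return bytes(out) * repeat                 # R repeats everything before it


# Roughly what ".DB 'AB',00 R2" describes: the list 41 42 00, twice.
table = define_bytes(["AB", 0x00], repeat=2)
```

    A 16-bit item such as 1234 contributes only the byte 34, and a
    multiplier of 10 applied to a single 00 reserves sixteen zero bytes.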

    (3) DEFINE DATA WORDS. This directive is used in exactly the same way
    as the define data bytes directive except that it stores words (16 bit
    numbers) in memory instead of bytes. The form used is '.DW' followed by
    a series of word expressions and/or strings. The Repeat option (R) may
    also be used.

    Note that if strings are included in the define data words directive,
    each character will be written to memory as a word with the high
    byte set to 20H (the ASCII value for a space). This may cause some
    spaced-out results (pardon the pun).
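
    The "spaced-out" effect is easy to see in a small Python model. The
    function name is hypothetical. The sketch relies on the fact that the
    8088 stores the low byte of a word first, so each character is
    followed in memory by its 20H high byte.

```python
def define_words(items, repeat=1):
    """Expand an already-parsed .DW line into the bytes it would store."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            for code in item.encode("ascii"):
                out += bytes([code, 0x20])     # low byte = character, high byte = 20H
        else:
            out += (item & 0xFFFF).to_bytes(2, "little")   # low byte first
    return bytes(out) * repeat


spaced = define_words(["HI"])   # the bytes 48 20 49 20, i.e. "H I "
```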



    5. USING THE ASSEMBLER PROGRAM
    ------------------------------

    Once you have created your source file, type ASSEMBLE followed by the
    name of your file (Note: do NOT include the .ASM suffix  -  ASSEMBLE
    assumes that all source files have a .ASM suffix).  This will commence
    the assembly of your program.

    When the assembly process begins, the assembly printout will scroll up
    the screen showing each instruction's location in the program, the
    bytes of code it generates, the corresponding text from your source
    code, and any warnings or errors generated. If you wish to read the
    assembly printout one line at a time as it is generated, turn on SCROLL
    LOCK.  This will make ASSEMBLE wait for you to press a key after each
    line of printout it generates.

    When the assembly is finished, a list of labels used in the program and
    their values is produced (the labels are listed in the order of their
    first appearance in the source file).  Following this is a list of
    errors caused by references to non-existent labels (if any).

    Finally, the name of the object file generated, the start offset of the
    program, the number of bytes in the file, and the total number of
    errors encountered during assembly are displayed. If no errors were
    encountered by ASSEMBLE, then you can test the generated object file
    immediately by executing it.

    Note: Errors in the assembly printout are surrounded by asterisks
    ( **** ) , while warnings (i.e. non-fatal errors) are surrounded by
    minus signs ( ---- ). A source program that produces only warnings may
    be safe to run, but a program that produces any errors should not be
    run.



    6. USING THE ASSEMBLE COMMAND LINE OPTIONS
    ------------------------------------------

    The format for executing the assemble program is:

          ASSEMBLE  [-O] [-F] PATHNAME

    PATHNAME is in the standard DOS format for paths and may specify the
             disk, sub-directory, and prefix of your source file. Do not
             include the '.ASM' suffix here because ASSEMBLE adds it on
             automatically. Obviously, the source file may be located in
             any sub-directory. You should note, however, that any output
             files produced by ASSEMBLE (e.g. filename.DMP, filename.COM)
             will be stored in the same sub-directory as your source file
             (filename.ASM) and not in the current directory.

    -O       OUTPUT. This optional parameter tells ASSEMBLE to dump all
             output that appears on the screen during assembly to a file
             with the same name as your source file but with a '.DMP'
             suffix.

    -F       FAST MODE. This option prevents ASSEMBLE from displaying the
             normal assembly output on the screen. Only the changing value
             of the location pointer and any errors or warnings are
             displayed. This mode is slightly faster than the normal mode
             and has the advantage of showing you only the errors and where
             they occurred - not all the other unnecessary data. This mode
             cannot be used in conjunction with the -O option for obvious
             reasons.
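
    The command line rules above can be summarised in a Python sketch.
    The function name is hypothetical, and three points are assumptions
    rather than documented behaviour: that options may be typed in either
    case, that they may appear in any order, and that combining -O with
    -F is rejected outright.

```python
def plan_run(args):
    """Return the files one ASSEMBLE run would read and write."""
    options = {a.upper() for a in args if a.startswith("-")}
    paths = [a for a in args if not a.startswith("-")]
    if len(paths) != 1:
        raise ValueError("exactly one PATHNAME is expected")
    if not options <= {"-O", "-F"}:
        raise ValueError("unknown option")
    if options == {"-O", "-F"}:
        raise ValueError("-F cannot be combined with -O")
    base = paths[0]                      # given without the .ASM suffix
    files = {"source": base + ".ASM", "object": base + ".COM"}
    if "-O" in options:
        files["dump"] = base + ".DMP"    # copy of the screen output
    return files


files = plan_run(["-O", r"C:\WORK\DEMO"])
```

    Because every suffix is appended to the same PATHNAME, the output
    files land beside the source file rather than in the current
    directory.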



    7. HEXADECIMAL EXPRESSIONS
    --------------------------

    This version of ASSEMBLE recognises two types of expressions: simple
    expressions and complex expressions.  All expressions must use
    hexadecimal numbers, because decimal numbers are not supported.

    (1) SIMPLE EXPRESSIONS:  These are strings of hexadecimal numbers and
    '+' and '-' signs which are added and subtracted together to produce a
    single word or byte value (e.g. 09A4+876-543+1 produces 0CD8). Such
    expressions are the only type that may be used within the operands of
    the 8088 instructions with ASSEMBLE. Negative numbers and substitute
    labels are also supported in simple expressions. For example:
           MOV AX,[BX+0105+3-2]
           CMP [BX-50],AL
           JMP %LOOP1+3
      Note that the normal register names used in the addressing modes (BX,
      SI, DI, BP) are not affected by the evaluation of these expressions
      because they are register names  -  not numbers or labels.
      Do NOT include spaces within the operands of the 8088 instructions or
      the operands will not be read correctly.
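
    The arithmetic of a simple expression can be checked with a short
    Python sketch. The function name is hypothetical, substitute labels
    are left out for brevity, and wrapping the result to 16 bits (so that
    a negative result appears in two's complement form) is an assumption.

```python
import re

SIMPLE = re.compile(r"[+-]?[0-9A-Fa-f]+([+-][0-9A-Fa-f]+)*")


def eval_simple(expr):
    """Add and subtract hexadecimal terms, wrapping the result to 16 bits."""
    if not SIMPLE.fullmatch(expr):
        raise ValueError("not a simple expression: %r" % expr)
    total = 0
    for sign, digits in re.findall(r"([+-]?)([0-9A-Fa-f]+)", expr):
        value = int(digits, 16)
        total += -value if sign == "-" else value
    return total & 0xFFFF


result = "%04X" % eval_simple("09A4+876-543+1")   # "0CD8"
```

    Note that the sketch rejects any expression containing a space, in
    keeping with the rule above.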

    (2) COMPLEX EXPRESSIONS: These expressions are similar to simple
    expressions except that two more operations are allowed: multiplication
    (*) and division (/). Such expressions are allowed in the definition of
    constants (i.e. LABEL = expression), and as the data for any of the
    assembler directives (i.e. .ORG, .DB, and .DW).  Note: do not use
    spaces between the operators and/or the numbers with .DB and .DW
    because these directives use spaces to delimit data items.  Spaces are
    okay with .ORG and with the definition of constants.

    ASSEMBLE knows the correct order of operations for these expressions,
    so: 4+3C*2-8/2  produces  0078 hex.  Note however, that brackets are
    not recognised in the evaluation of complex expressions. This should
    not cause too much difficulty because most expressions you are likely
    to need will not be overly complex - if they are then you can get
    around the problem by either multiplying the expression out before you
    type it in (e.g. x*(y+z) = x*y + x*z ) or by defining an extra constant
    whose only purpose is to evaluate parts of an expression, e.g:
       DUMMY1 = %CONSTY + %CONSTZ
       ANSWER = %CONSTX * %DUMMY1
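
    The order of operations, and the extra-constant workaround for
    brackets, can be sketched in Python. The function name and the symbol
    table are hypothetical. Whole-number division, 16-bit wrap-around and
    the lack of support for a leading minus sign are assumptions of this
    sketch, not documented behaviour.

```python
import re

TOKEN = re.compile(r"%(\w+)|([0-9A-Fa-f]+)|([-+*/])")


def eval_complex(expr, symbols=None):
    """Evaluate + - * / on hexadecimal numbers and %labels, no brackets."""
    symbols = symbols or {}
    text = expr.replace(" ", "")
    values, operators, pos = [], [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected character at %d" % pos)
        label, number, operator = match.groups()
        if operator:
            operators.append(operator)
        elif label:
            values.append(symbols[label])
        else:
            values.append(int(number, 16))
        pos = match.end()
    if len(values) != len(operators) + 1:
        raise ValueError("malformed expression")
    # Fold * and / into their left-hand term first, then apply + and -.
    terms, signs = [values[0]], []
    for operator, value in zip(operators, values[1:]):
        if operator == "*":
            terms[-1] *= value
        elif operator == "/":
            terms[-1] //= value
        else:
            terms.append(value)
            signs.append(operator)
    total = terms[0]
    for sign, term in zip(signs, terms[1:]):
        total = total + term if sign == "+" else total - term
    return total & 0xFFFF


# x*(y+z) without brackets, using an extra constant as described above.
symbols = {"CONSTX": 0x3, "CONSTY": 0x4, "CONSTZ": 0x5}
symbols["DUMMY1"] = eval_complex("%CONSTY + %CONSTZ", symbols)
symbols["ANSWER"] = eval_complex("%CONSTX * %DUMMY1", symbols)
```

    With these hypothetical values ANSWER comes out as 3*(4+5) = 1B hex,
    and eval_complex("4+3C*2-8/2") gives 0078 hex as stated above.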


    8. ADDRESSING MODES
    -------------------

    If you are unfamiliar with the addressing modes available with the 8088
    (or the instructions themselves) then it would be advisable to find
    yourself a reference sheet or book on the 8088. The formats of the
    addressing modes used by ASSEMBLE are similar to those used by DEBUG.
    The differences which exist are explained below.

    (1) BYTE OR WORD POINTERS: DEBUG has the annoying feature of forcing
    you to type such instructions as:  MOV BYTE PTR [BX+0985],87    when
    the addressing mode used does not make it obvious whether a byte or a
    word MOV was desired. This ambiguity only occurs if neither of the
    operands is a straight reference to a register.

    ASSEMBLE produces a warning if you do not specify whether a byte or
    word operation is desired in such cases - but it assembles the
    instruction anyway. It displays a message to show what mode it has
    assumed the instruction to be in (this assumption is based on the size
    of the numeric value which is the second operand). If you wish to
    specify the mode of such instructions in the source code and avoid the
    warning messages then this is simple to do. Place a 'B' after the
    brackets if a byte value is to be used or place a 'W' after the
    brackets if a word value is to be used. For example:
          MOV [BX+0985]B,87  ; 87 is the byte value to be moved.
          MOV [BX+0985]W,87  ; 0087 is the word value to be moved.
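
    The difference between the two forms shows up in the generated code.
    The Python sketch below encodes only this one addressing form
    (MOV [BX+disp16],immediate) using the standard 8088 opcodes C6 and
    C7. The function name is hypothetical, and the rule used when no
    suffix is given (a value that fits in 8 bits is taken as a byte) is
    an assumption about how ASSEMBLE makes its guess.

```python
def mov_bx_disp_imm(disp, value, suffix=None):
    """Encode MOV [BX+disp16],imm; returns (code bytes, warning or None)."""
    warning = None
    if suffix is None:
        # Assumed rule: a value that fits in 8 bits is treated as a byte.
        suffix = "B" if value <= 0xFF else "W"
        warning = "assumed %s operation" % ("byte" if suffix == "B" else "word")
    modrm = 0x87   # mod=10 (16-bit displacement), reg=000, r/m=111 ([BX])
    disp_bytes = (disp & 0xFFFF).to_bytes(2, "little")
    if suffix == "B":
        code = bytes([0xC6, modrm]) + disp_bytes + bytes([value & 0xFF])
    else:
        code = bytes([0xC7, modrm]) + disp_bytes + (value & 0xFFFF).to_bytes(2, "little")
    return code, warning


byte_form, _ = mov_bx_disp_imm(0x0985, 0x87, "B")   # C6 87 85 09 87
word_form, _ = mov_bx_disp_imm(0x0985, 0x87, "W")   # C7 87 85 09 87 00
```

    The word form is one byte longer and writes to two memory locations
    instead of one, which is why the choice should not be left to a guess
    in code that matters.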

    (2) FAR ADDRESSING: When instructions which may reference FAR locations
    (i.e. CALL, JMP, RET) are used and it is not clear that you intend to
    use the FAR mode, you can suffix the instruction with an 'F' (i.e.
    CALLF, JMPF, RETF). If CALL or JMP is used with a double word
    addressing mode (e.g. CALL 0100:0230) then it is obvious that the FAR
    mode is desired, so the CALLF and JMPF instructions are not valid with
    the double word addressing mode.

    (3) BRACKETS & PARENTHESES:  ASSEMBLE sees brackets and parentheses as
    equivalent for addressing modes (i.e. '[' = '(' and ']' = ')' ).

    (4) FORWARD REFERENCES: If a label value is unknown when an instruction
    is assembled then the value of 8080H is substituted for that label
    (unless the instruction is a conditional jump or loop, in which case
    the current value of the location pointer is substituted). This may
    mean that when the value of the label is found and the instruction is
    reassembled, the instruction is shorter than it was after the original
    assembly. If this is the case then the instruction space is padded out
    with NOPs (hopefully the instruction never becomes longer upon
    reassembly!).  This represents some inefficiency, but mostly it is not
    significant (in the case of the JMP instruction, which may have either
    a byte or a word displacement, this only means that one byte of memory
    space may be wasted and no execution time will be lost).
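
    The JMP case can be worked through in Python. The sketch uses the
    standard 8088 encodings (EB with a byte displacement, E9 with a word
    displacement, 90 for NOP). The function names are hypothetical and
    the two-step flow is a simplified model of what the text describes,
    not ASSEMBLE's actual code.

```python
NOP = 0x90


def encode_jmp(target, location):
    """Encode JMP with a byte displacement if it fits, else a word one."""
    short = target - (location + 2)          # relative to the next instruction
    if -128 <= short <= 127:
        return bytes([0xEB, short & 0xFF])
    near = (target - (location + 3)) & 0xFFFF
    return bytes([0xE9]) + near.to_bytes(2, "little")


def reassemble(first, second):
    """Fit the re-assembled code into the space reserved the first time."""
    if len(second) > len(first):
        raise ValueError("instruction grew on reassembly")
    return second + bytes([NOP]) * (len(first) - len(second))


# JMP to a forward label at location 0100: the label is unknown at first,
# so 8080H stands in for it and the three-byte form is reserved.
first = encode_jmp(0x8080, 0x0100)           # E9 7D 7F
# Suppose the label later turns out to be 0110 (a made-up value).
final = reassemble(first, encode_jmp(0x0110, 0x0100))   # EB 0E 90
```

    The trailing NOP is the one wasted byte. The jump skips over it, so
    it is never executed.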

    (5) ORDER OF REGISTERS IN ADDRESSING MODES: In a reference to a memory
    location that uses two registers (e.g. [BX+SI] ), the BX or BP
    register must come first, followed by the SI or DI register. If this
    convention is not followed then ASSEMBLE will tell you that the format
    of the operand is invalid. Numbers and substitute labels may be freely
    interleaved with the register names and the result of the offset
    evaluation will still be correct (e.g. [09+BX-5+SI+098] is the same as
    [BX+SI+09-5+098], although the latter is more readable).
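
    The ordering rule and the offset evaluation can be sketched in
    Python. The function name is hypothetical and substitute labels are
    left out. Treating a register preceded by a minus sign as an error is
    an assumption of the sketch.

```python
import re

BASES = ("BX", "BP")
INDEXES = ("SI", "DI")


def parse_memory_operand(text):
    """Split [..] or (..) into its register list and 16-bit offset."""
    if len(text) < 2 or text[0] not in "[(" or text[-1] not in "])":
        raise ValueError("not a memory operand")
    registers, offset = [], 0
    for sign, term in re.findall(r"([+-]?)([0-9A-Z]+)", text[1:-1].upper()):
        if term in BASES + INDEXES:
            if sign == "-":
                raise ValueError("a register cannot be subtracted")
            registers.append(term)
        else:
            value = int(term, 16)            # numbers are always hexadecimal
            offset += -value if sign == "-" else value
    if len(registers) > 2:
        raise ValueError("invalid operand format")
    if len(registers) == 2 and not (registers[0] in BASES and registers[1] in INDEXES):
        raise ValueError("invalid operand format")
    return registers, offset & 0xFFFF


mixed = parse_memory_operand("[09+BX-5+SI+098]")
tidy = parse_memory_operand("[BX+SI+09-5+098]")
```

    Both calls give the registers BX and SI with an offset of 009C hex,
    while an operand such as [SI+BX] is rejected.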



    9. BUGS
    -------

    Since only preliminary tests have been performed on ASSEMBLE so far,
    there is no guarantee that all instructions will assemble correctly or
    that the program will not crash.  If a program assembled with no
    errors, seems free from programming errors, and yet does not execute
    correctly, I suggest you use DEBUG to verify the correctness of the
    assembled code.

    Please note any faults or errors encountered with ASSEMBLE and inform
    me of them so that they may be corrected.

    If you feel inclined to take a shot at fixing a bug yourself, you will
    be able to use the source files included. These are ASSEMBLE.ASM which
    must be assembled using ASSEMBLE itself, and ASSEMBL2.PAS which must be
    compiled using TURBO PASCAL 4 or a later version. These files
    correspond to ASSEMBLE.COM and ASSEMBL2.EXE respectively.

    ASSEMBLE.COM - Loads ASSEMBL2.EXE and then executes it, but it also
    remains resident in memory and is called by ASSEMBL2.EXE. It contains
    the code which assembles an 8088 instruction.

    ASSEMBL2.EXE - Is really the main part of the program which organises
    the assembly process. It reads the source file, handles labels,
    accepts assembler directives, and calls the code to assemble 8088
    instructions (located in ASSEMBLE.COM).