Win32ASM - 61
1 INTRODUCTION
1.1 WHAT THIS DOCUMENT IS NOT ABOUT
1.2 WHAT THIS DOCUMENT IS ABOUT (AND PREREQUISITES)
1.3 KEEP THE BALL ROLLING
2.2 CHOOSING A LINKER
2.3 CHOOSING A DEBUGGER
2.4 CHOOSING A GUI IDE
2.4.1 The bad news is
2.4.2 The good news is
2.4.2.1 Programmer's IDE for Windows 95/NT v2.3
2.4.2.2 Watcom's 10.6 IDE
3.1.4 MASM options
3.1.5 Miscellaneous OS and systems issues
3.1.5.1 Beware of the CLI
3.1.5.2 Beware of the STD
3.2.4 Linking a DLL file
3.2.5 Advanced linking techniques
3.2.5.1 Grouped Sections
3.2.5.2 DLL forwarders
3.2.5.3 Weak Externals
3.3 DEBUGGING AN ASSEMBLY LANGUAGE WIN32 APPLICATION
5 WIN32ASM TOOLKIT
5.1 THE EXAMPLE FILES
5.2 THE INCLUDE FILES
5.2.1 General Include files
5.2.1.1 Win32Inc.equ
5.2.1.1.1 UnicAnsi.equ
5.2.1.1.1.1 The UnicAnsiExtern macro
5.2.1.1.1.2 The String macro
5.2.1.1.2 Win32Types.equ
5.2.1.1.3 Win32Defs.equ
5.2.1.1.4 Win32Strs.equ
5.2.1.2 Win32Res.equ
6 BIBLIOGRAPHY
6.1 [BOOTH, 96.01]
6.2 [BRAIN, 96.01]
6.3 [INTEL, 95.01]
6.4 [PETZOLD 96.01]
6.5 [PIETREK 95.01]
6.6 [RECTOR & AL, 96.01]
6.7 [RICHTER 96.01]
Disclaimer
This documentation and associated files are provided "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the author be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.
Distribution
Since this documentation and associated files are declared as Public Domain, you are allowed to distribute them without any restrictions on any storage or communications media, as long as you use the distribution self-extracting file without any modification, using the same file name (Win32ASM) as the original file.
Trademarks
All brand names and product names used in this documentation are trademarks, registered trademarks, or trade names of their respective holders.
1 Introduction
Microsoft never documented the way to develop applications for Win32 Intel platforms using assembly language. The only assembly language documentation that Microsoft ever produced on the topic is the Win95 DDK, dedicated to the development of virtual device drivers (VxDs) in the Win95 environment, and it scarcely covers any Ring 3 application programming matter.
In addition, development in a Win32 environment requires a large amount of reference data, such as function prototypes, structures, type and constant definitions, macros and other data. Microsoft released these items as C header files (.H files) in the Win32 SDK, but no equivalent files were ever published for assembly language.
This complete lack of documentation and tools propagates the illusion that developing in assembly language for Win32 simply cannot be done. This is aggravated by answers commonly provided by Microsoft's developer support staff, claiming that assembly language programming for Win32 is "not supported by Microsoft" and, even more often, that "No, it cannot be done".
The truth is very different. Programming in assembly language for Win32 is indeed very possible, and it is just as simple to achieve as with a High Level Language (HLL). There is nothing magic about it, the tools are the same, and with some initial explanations and road-mapping, the considerable C-oriented documentation that Microsoft (and others) have released can be used to program in assembly language.
Moreover, programming in the Win32 environment is paradoxically considerably easier now than it has ever been:
- The Win32 API provides a vastly improved equivalent to the standard runtime library that assembly language programmers never had before.
- The new operating environment, with its true multi-tasking services, provides a new context and new challenges for high performance assembly language applications.
- The Win32 environment offers the assembly language programmer additional debugging aids, tools and protection that never existed before.
Two critical keys are missing today to make the above facts obvious: assembly-specific documentation, and include files describing various symbols such as function prototypes, typedefs, structures and constant definitions. This quick documentation tries to fill a part of the first shortcoming, and we hope the accompanying set of include files will be one step in the right direction toward remedying the second.
The bottom line is: Programming in assembly language for the Intel PC platform has never been easier than it is today in the Win32 environment and we hope this document will help you take advantage of the new opportunities this opens.
It is intended for assembly language programmers who:
- are already fluent in Intel (32-bit) assembly language,
- know enough Win32 programming to write (or at least read and fully understand) a Win32 program in the C language,
- have access to and understanding of Microsoft's Win32 SDK documentation and tools,
- are looking for all the useful details that Microsoft carefully refrained from explicitly documenting.
The official information about Win32 programming can be found in the Microsoft Developer Network (MSDN). The Win32 SDK is delivered with MSDN Level 2 (a.k.a. Professional) and upper level subscriptions. Building applications using the techniques indicated in this documentation requires access to Microsoft's Win32 SDK documentation and Microsoft's Win32 import library (.LIB) files. Both are available as part of MSDN. Both are also distributed with various 32-bit compiler sets, but as this document assumes and documents the use of the Microsoft Assembler, you must ensure that the import .LIB files you are using are compatible with the regular Microsoft programming tools.
Talking about documentation, there is one public domain piece of documentation that anyone interested in Win32 programming should get: in March 1996, Sven B. Schreiber released to the public domain a wonderful set of tools, with complete source and documentation. The whole set is available on the Internet at Sven's site, as file ftp://ftp.orgon.com/pub/asm/WALK32_1.ZIP
It is also available from several other sources (try your favorite search engine and/or Archie program), and on Compuserve: GO PCPROG, Library 1 (Assembler).
I could not use Sven's tools, first because I found them too late and then because I needed Microsoft's COFF format compatibility and symbolic debugging support. But I still wish I had found WALK32 earlier than I did. In addition to the collection of tools it provides, WALK32 includes remarkable documentation about Win32 and Win32 programming that can be used in any context. The documentation exposes many sound and clean programming techniques, including a few that I wish Microsoft had thought about when they originally designed the Windows API. Whatever tools you end up using, there are a lot of ideas, techniques and code that you can reuse from WALK32, and you shouldn't pass it up.
Sven also published an article in the November 96 issue of Dr. Dobb's Journal that shows a special application of WALK32 in NetWare programming. The source code (including a mini WALK32 environment) can be found on www.ddj.com
Lots of other information can be found in any good computer book shop, as Win32 is a fashionable enough topic these days. But keep in mind that:
- Many books claim to cover Win32 programming, but not so many cover it properly.
- Many books are mere copying and/or rephrasing of Microsoft documentation and examples.
- No book so far really covers Win32 programming for assembly language.
So assembly language programmers have to read at least one other programming language in order to find the information they need about Win32. The other language that is needed is C. The official Win32 SDK by Microsoft describes most of the interface to Win32 in terms of the C language and C data structures. All function prototypes, structure descriptions and constant definitions are described in .H (C header) files in the MS SDK.
There are many other books about Win32 and other languages (C++, Delphi and Visual Basic come to mind). We do not recommend attempting to use any of these books for our particular purpose, as the languages they cover offer a higher level of abstraction than C. As such, they tend to hide interfacing details away from the programmer and to make transposition to assembly language harder. It often becomes nearly impossible to relate the examples to the underlying machine (assembly) implementation. Be particularly careful when selecting new books about Win32 programming, as an increasing number of books cover the topic exclusively through C++ and MFC without ever mentioning this fact on the cover.
The short bibliography at the end of this document mentions a few reference books and magazines we found useful (and sometimes more) in discovering assembly language issues for Win32.
If we had to pick only ONE third party Win32 book, it would likely be Advanced Windows (Third edition) [Richter 97.01]. This book clearly explains all the important mechanisms in Win32 (with C code examples), covers most differences between the Win95 and NT (including NT 4.0) implementations, and exposes many, many of the pitfalls and oddly documented aspects you have to know when programming for Win32. Be warned: this book is probably not for the beginning Windows programmer. Beginning Windows programmers might want to start with Programming Windows 95 [Petzold 96.01]. Finally, Matt Pietrek's Windows 95 System Programming Secrets [Pietrek 95.01] covers many aspects of Windows 95 internals and contains lots of invaluable information about many Win32 topics.
For those readers who own a Win32 compatible HLL compiler, another source of information could sometimes be the .ASM files that are delivered with the source of their runtime library. Some interesting pieces can be found there, possibly bringing information about some advanced topics like Structured Exception Handling (SEH).
Last but not least, this document refers to a number of advanced (and sometimes not so well known) features of MASM 6.1x. It assumes that you have access to the MASM Programmer's Guide, either in its paper form or its electronic form. The electronic form is available in MSDN Archive Edition, Product Documentation/ Languages/ Macro Assembler 6.1 (16-bit). Do not be fooled by the 16-bit mention in the table of contents of the electronic documentation: this documentation is the image of the latest printed documentation and does handle the 32-bit features of MASM as well.
It is very unfortunate that Microsoft decided to bury the MASM documentation in the archive CD-ROM rather than in the MSDN Library where it belongs, since MASM 6.1 is a dual, 16-bit and 32-bit product, and its 32-bit part is alive and well. It is especially inconsistent, now that Microsoft has brought MASM back to the MSDN Universal CD-ROMs, that the MASM documentation has not been restored as well. At the time of this writing, there is a section about MASM in the MSDN Library [Product Documentation \ Languages \ Macro Assembler 6.11 for Windows NT (32bit)], but it unfortunately contains no more than a few release notes about MASM 6.11.
For those using the electronic documentation in MSDN Archive Edition, and since the electronic documentation does not provide a table of contents and/or page numbers, we will reference the MASM documentation through its hierarchical path, that is, the tree organization that appears in MSDN's INFOVIEW viewer. References to the documentation will thus look like this: Chapter 1 Understanding Global Concepts/ Language Components of MASM/ Statements. Unless specified otherwise, all references will be to the Programmer's Guide, in MSDN Archive Edition, Product Documentation/ Languages/ Macro Assembler 6.1 (16-bit).
At this time, we (unfortunately) do not know of any third party book that could be considered as an exhaustive (and, one would hope, improved) replacement for the MASM Programmer's Guide.
2 Product choices
2.1 Choosing an assembler
2.1.1 MASM 6.11a, 6.11d and 6.12
2.1.1.1 MASM availability (and uncertain future)
At the time of this writing, and to the best of our knowledge, MASM 6.11a is the latest commercially available incarnation of Microsoft Macro Assembler (MASM). MASM 6.11a is not the latest release of the software, though: it can be patched to 6.11c, 6.11d and 6.12 (see below).
Until recently (September/October 1997), one could question Microsoft's willingness to keep on supporting MASM:
- MASM did not appear anywhere in Microsoft's Web site, not even in the developer product list.
- MASM did not appear as a product in any of the MSDN Universal CD-ROM disks, although MSDN Universal contains by definition all of the current Microsoft development products.
- The documentation for MASM only appears in the MSDN Archive edition, Product Documentation/ Languages/ Macro Assembler 6.1 (16-bit). Since the MSDN Archive CD-ROM is dedicated to obsolete products, one can question the actual position of MASM in the Microsoft product line.
- Visual Studio (aka Developer Studio), Microsoft's universal IDE (Integrated Development Environment), has provision for supporting nearly all Microsoft translators. Nearly, that is, except for MASM, even though the current version of MASM supports just about everything it needs to in terms of IDE prerequisite functions: debugging, local / global symbol handling, COFF code format, etc. (more about this later).
Then by the end of August 1997, several things happened:
- Microsoft posted a patch on the Microsoft Web Site. This patch turns any MASM 6.11, 6.11a, or 6.11d into the brand new MASM 6.12. The patch is available at:
  http://support.microsoft.com/support/kb/articles/Q173/1/68.asp
- At about the same time, MASM 6.11a was rehabilitated as a Microsoft product, as can now be seen at:
  http://www.microsoft.com/products/developer.htm
  http://www.microsoft.com/products/prodref/450_ov.htm
- Finally, MASM 6.1 was (at last!) included in the MSDN Universal Edition (Level 4), starting with the October 97 issue. The CD-ROM version also contains the patches required to build a MASM 6.12 image.
So at the time of this writing, MASM 6.12 is the latest version of the Microsoft assembler. According to the README.TXT file delivered with the 6.12 patch, 6.12 corrects a number of the bugs that plagued 6.11, and brings Pentium Pro (a.k.a. 686) and MMX instruction set support to MASM. Most of this document was written during the MASM 6.11d era, so a few of the bugs and limitations that we mention here might have been fixed in 6.12. Likewise, new bugs that could have been introduced in MASM 6.12 are not covered yet.
MASM is far from being perfect, though. It would certainly require a few improvements in some areas (more on this later). But it is a very good tool as it is.
The WATCOM linker might work too, but since the Microsoft linker is widely available and did what I expected, there was little incentive to look for something else, and I didn't check it further. At the time of this writing, other linkers I know of either don't support the CodeView debugging format (Borland) or only support the OMF object format (Symantec, Borland).
Those deciding to favor the OMF object format might want to consider the excellent, feature-rich and superfast OPTLINK linker. As far as I know, it is not sold anymore as a stand-alone product but now belongs to the Symantec C++ development suite.
The article presents four pitiful kludges requiring the user to do most of the work by hand for each project to be managed through the IDE. But unfortunately, Microsoft didn't bite the bullet and incorporate MASM inside Developer (Visual) Studio. Someone at Microsoft has still to realize that the whole purpose of the IDE is to allow the programmer to merely click or drag'n'drop source files in the module tree, not to impose further configuration chores.
to tailor the Watcom IDE to make it use MASM, LINK and all their options instead of native Watcom tools. Once this had been done, replacing the support for the Watcom debugger with that of the Developer Studio or SoftICE was not such a complex matter. The benefits are:
- The resulting IDE uses its native WATCOM make program and provides a graphical interface to it (no fiddling with yet another disgusting make syntax).
- After proper configuration, the Watcom IDE gives access to all of the options for all tools (MASM, LINK, ...) as radio buttons, checkboxes, edit boxes, etc. These are maintained individually on a file by file basis and can be changed through the GUI.
- Finally, the Watcom IDE supports multiple targets and cascaded dependencies, like a set of libraries and several .EXE files. If you modify one of the source files for a library, and this library is used to link an .EXE file, rebuilding the .EXE will first automatically rebuild the library (and recompile the source file).
Unfortunately, there does not seem to be support for non-Watcom include files.
WATCOM supplies a text editor, but I didn't like it too much. I personally use American Cybernetics' Multi-Edit for Windows (MEW). It has off-the-shelf support for all the language translators I ever heard of, plus many more I never thought could even exist. It does in-editor compiles either synchronously or asynchronously. It has myriads of customization capabilities, all accessible through nice hierarchical menus. And if that were not enough, it has a full macro language, and macros are provided with sources so you can change them to your will (I actually hardly ever needed this). Last but not least, MEW has built-in support for several IDEs from various vendors, including the WATCOM one. Practically, this means that clicking on an error line reported by MASM to the WATCOM IDE brings up the MEW editor with its cursor pointing to the right line number. Pressing the build button in the IDE automatically directs the editor to save the source files before the IDE launches its make session. Etc.
Last time I checked, you could get an evaluation copy of MEW 7.1x from www.amcyber.com. And NO, they don't bribe me for plugs!
Alternately, here is a freeware bargain. There are dozens of text editors available on the Internet as substitutes for NotePad and/or WordPad, but not so many that are good programmer's editors. Here is the best we found: Programmer's File Editor (PFE32), written by Alan Phillips. PFE32 is a full-fledged programmer's editor and supports a compiler / assembler, keyboard macros and other features. Last but not least, there is even an Alpha and a PowerPC version. PFE32 can be downloaded from http://www.lancs.ac.uk/people/cpaap/pfe and many other sites (hint: use an Archie search). At the time I'm writing this, the name of the file is PFE0701I.ZIP, but this is likely to change as new versions are published. Go to the URL above if in doubt. The author can also be reached at A.Phillips@lancaster.ac.uk
It was a lot of boring work to put all the IDE pieces together (especially customizing the Watcom IDE), but in retrospect, the result was probably worth the effort. I just wish that one day the good people at American Cybernetics would bite the bullet and add to their great editor a fully configurable GUI make, as well as the minimal additional support required to interface with the most common linkers and debuggers. This would turn a great editor into the first fast, compiler-independent GUI IDE, something the development world is missing. If anyone out there has found any useful, fully GUI combination of tools covering the same (or a larger) area, I'd love to hear about it!
Another thing you need to do to complete your quest for MASM documentation is to search the whole MSDN library CD for the string "6.11". This will pull various items that include READMEs and Knowledge Base articles, detailing a number of small improvements, limitations and features that are not mentioned anywhere else. Some of these Knowledge Base articles are either obsolete, or wrong, or both, but heck, you can't win all the time. Finally, the README file included in the MASM 6.12 patch contains a number of documentation updates and clarifications (some of them identical to previously published Knowledge Base articles). Read them carefully.
MASM will generate the appropriate PUSHes upon entry and the required POPs automatically before each subsequent RET instruction in the PROC. Of course, since you are undoubtedly programming in a structured way, there will be a single exit point to your function, a single RET instruction, and this will never generate more than the minimum number of POPs.
EBP is a special case that you will not often find in the USES list. If your function uses any local data (aka dynamic data, defined on the stack through the LOCAL statement), or if your function is called with stack parameters (declared in the PROTO / PROC definition), then MASM will generate the appropriate stack frame. It will set EBP as its addressing base to access both local variables and parameters. In this situation, MASM will also automatically generate code to save and restore EBP, so it would be a waste to mention the EBP register in the USES list.
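As a minimal sketch of a PROC that combines a USES list, stack parameters and a LOCAL variable (the function SumBytes and all of its names are hypothetical and only serve as an illustration), one could write:

```asm
        .386
        .MODEL  FLAT, STDCALL

        .CODE

; SumBytes (hypothetical): adds up cbBuffer bytes starting at pBuffer
; and returns the total in EAX.
SumBytes PROC USES esi ebx, pBuffer:DWORD, cbBuffer:DWORD
        LOCAL   dwTotal:DWORD           ; stack variable, addressed through EBP

        mov     dwTotal, 0
        mov     esi, pBuffer            ; parameters are addressed through EBP too
        mov     ecx, cbBuffer
        jecxz   Done                    ; empty buffer: nothing to add
NextByte:
        movzx   ebx, BYTE PTR [esi]
        add     dwTotal, ebx
        inc     esi
        dec     ecx
        jnz     NextByte
Done:
        mov     eax, dwTotal
        ret                             ; single exit: MASM restores EBX, ESI and EBP here
SumBytes ENDP

        END
```

EBP does not appear in the USES list because the stack frame generated for the parameters and the LOCAL already saves and restores it. Assembling such a file with the /Fl /Sg switches (for instance ml /c /coff /Fl /Sg on a hypothetical sumbytes.asm) produces a listing in which the generated entry and exit code can be inspected.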
Be especially careful to NOT change EBP in any PROC that is defined as taking parameters and/or using local data or, at the very least, use EBP very carefully in such cases. Using the command line switches /Fl /Sg creates an expanded listing file showing all generated instructions and helps in mastering these delicate situations.
The above rules about register preservation might or might not apply to your own functions, though. The general rules are as follows:
- Win32 functions that you call from your own code do not care about the entry contents of the EBX, EDI, ESI and EBP registers. Win32 functions currently do care about the content of the segment registers, though: they assume them to follow the holy FLAT model, and don't bother reloading them upon entry into system code. In other words, if you ever happen to play with segment registers, Win32 is very likely to expect the segment registers upon function entry to be in the same state as they were when your process initially got control.
- When calling functions that you wrote from your own code, it is obviously your own business to decide whether the calling function, the called function or neither of them will save and restore registers. You might be able to spare quite a few cycles by only saving registers when you know it is really needed: the fact that MASM gives you some HLL-like facilities is no reason to start generating code like that of a compiler!
- On the other hand, functions that you wrote and that you register as a callback with any Win32 or other function external to your code should always play by the rules: the upper level Win32 function that will eventually call your code certainly doesn't expect you to change any of EBX, EDI, ESI, EBP and the segment registers before returning control. So callback functions that you write should always return with these registers unaltered.
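The callback rule can be sketched as follows. This is only an illustration: CountWindowsProc is a hypothetical callback with the two-DWORD shape expected by an enumeration function such as EnumWindows, and it assumes that the caller passes the address of a DWORD counter as the second parameter.

```asm
        .386
        .MODEL  FLAT, STDCALL

        .CODE

; Hypothetical enumeration callback: hWnd is the window being enumerated,
; lParam is assumed to carry the address of a DWORD counter.
CountWindowsProc PROC USES esi, hWnd:DWORD, lParam:DWORD
        mov     esi, lParam             ; ESI is modified here ...
        inc     DWORD PTR [esi]         ; one more window seen
        mov     eax, 1                  ; nonzero (TRUE): keep enumerating
        ret                             ; ... and restored by the generated POP ESI
CountWindowsProc ENDP

        END
```

ESI is listed in USES because the code changes it, and EBP is preserved by the frame that MASM builds for the parameters. A scratch register such as ECX would have avoided the save altogether; ESI is used here only to show the preservation mechanism.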
The very same rule applies to the DLL functions you export: the code calling your DLL expects your code to play by the rules and respect EBX, EDI, ESI and EBP.
A quick one about the segment registers: you will probably not need to do anything with the segment registers. As you certainly know, Win32 uses the FLAT model, where all segment registers contain descriptors mapping the whole logical address space your application (process) sees. The way the flat model was implemented by Microsoft is too restrictive in my opinion, and deprives the programmer of a very useful and efficient native CPU mechanism, as is explained below in "The absence of LDT support in Intel-based platforms", page 44. But this is unfortunately the way it is today. Unless proved otherwise, it's a no-no to change DS, ES, CS, SS or FS. The GS register doesn't seem to be used, but to my knowledge, nothing has been published on the topic so far and I have not investigated any further personally. If you ever try to play with GS, you should at the very least realize that you are doing it at your own risk.
In any case, in the great tradition, although Microsoft didn't let the Ring 3 application programmer use segmentation and segment registers, they permitted themselves to use it, even in your own Ring 3 code: if you look at the FS register, you'll notice that at any time, the FS register contains a valid selector. This has been documented in [Pietrek 95.01]. The selector in FS actually points to a Thread Information Block (TIB). The TIB contains various thread-dependent items. The contents of the TIB are used by many Win32 syscalls, and changing the FS register is very likely to crash your process automagically and very soon.
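Reading through FS (as opposed to changing it) is harmless. The sketch below is a hypothetical helper, not something taken from the toolkit; it relies on the commonly described TIB layout in which the DWORD at offset 18h holds the flat address of the TIB itself. That offset is an assumption brought in from outside this document and should be checked against [Pietrek 95.01].

```asm
        .386
        .MODEL  FLAT, STDCALL
        ASSUME  fs:NOTHING              ; .MODEL FLAT assumes FS:ERROR by default

        .CODE

; GetTibAddress (hypothetical): returns in EAX the flat address of the
; current thread's TIB. FS itself is only read, never modified.
GetTibAddress PROC
        mov     eax, fs:[18h]           ; assumed offset of the TIB self pointer
        ret
GetTibAddress ENDP

        END
```

Without the ASSUME line, MASM rejects any reference through FS in a FLAT model module, which is a useful safety net in its own right.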
Defining a given default calling convention in .MODEL still allows you to override the default for any given function. The calling convention you explicitly state in specific PROC and/or PROTO directives overrides the one in .MODEL.
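A short sketch of such an override follows. The prototypes are written by hand here for illustration rather than taken from the toolkit's include files; wsprintfA is used because it is a Win32 function that takes a variable number of parameters and therefore uses the C convention. ShowTicks and the two data items are hypothetical names.

```asm
        .386
        .MODEL  FLAT, STDCALL           ; STDCALL is the default for the module

; No convention stated: the STDCALL default from .MODEL applies.
GetTickCount    PROTO

; Convention stated explicitly: C overrides the default for this function only.
wsprintfA       PROTO C :DWORD, :DWORD, :VARARG

        .DATA
szFormat        BYTE    "Ticks: %lu", 0
szBuffer        BYTE    64 DUP (0)

        .CODE

ShowTicks PROC
        INVOKE  GetTickCount
        INVOKE  wsprintfA, ADDR szBuffer, ADDR szFormat, eax
        ret
ShowTicks ENDP

        END
```

INVOKE takes the convention from each PROTO, so the second call is followed by generated code that removes the three parameters from the stack, while the first call needs none.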
The calling convention defines two different aspects of function interfacing:
- Naming
- Parameter passing
STDCALL is a hybrid of the C convention, for the naming and the order of parameters on the stack, and of the PASCAL convention, for the removal of parameters from the stack.
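The difference in stack cleanup can be made visible by writing the call sequences by hand instead of using INVOKE. In this sketch, SumStd, SumC and CallBoth are hypothetical names chosen for the illustration:

```asm
        .386
        .MODEL  FLAT, STDCALL

        .CODE

SumStd  PROC STDCALL dwFirst:DWORD, dwSecond:DWORD
        mov     eax, dwFirst
        add     eax, dwSecond
        ret                             ; assembled as RET 8: the callee removes the parameters
SumStd  ENDP

SumC    PROC C dwFirst:DWORD, dwSecond:DWORD
        mov     eax, dwFirst
        add     eax, dwSecond
        ret                             ; assembled as a plain RET
SumC    ENDP

CallBoth PROC
        push    2                       ; dwSecond: pushed first (right to left, as in C)
        push    1                       ; dwFirst
        call    SumStd                  ; STDCALL: nothing to clean up afterwards

        push    2
        push    1
        call    SumC
        add     esp, 8                  ; C: the caller removes the parameters
        ret
CallBoth ENDP

        END
```

Both calls push their parameters in the same order; only the party responsible for releasing the 8 bytes differs.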
by the assembler (since two DWORDs generate 8 bytes of parameters). This trick is used by the linker to perform some brute force (but very useful) parameter consistency check against the Win32 import libraries. If you inadvertently specified the wrong number of parameters in the PROC and PROTO definition, MASM would generate the wrong @x postfix value and the link would fail with an undefined reference, pointing at the error. Believe me, this is much better than getting some unpredictable behavior at runtime because of a parameter error, disguised as a stack error.
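As an illustration of this check, consider the KERNEL32 function Beep, which takes two DWORD parameters. The prototype below is written by hand for the sketch (it is not copied from the toolkit's include files), and MakeNoise is a hypothetical caller:

```asm
        .386
        .MODEL  FLAT, STDCALL

; Two DWORD parameters: the external reference is emitted as _Beep@8,
; which is the name the KERNEL32 import library defines.
Beep    PROTO   :DWORD, :DWORD

; A wrong declaration with a single parameter would be emitted as _Beep@4.
; No import library defines that name, so the link would stop with an
; unresolved external instead of producing a program that corrupts its stack.
; Beep  PROTO   :DWORD

        .CODE

MakeNoise PROC
        INVOKE  Beep, 750, 300          ; frequency in Hz, duration in ms
        ret
MakeNoise ENDP

        END
```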
Both MASM and TASM come with utilities to convert C header files (.H files) to include files (.INC files). The Microsoft one is H2INC.EXE, while the TASM one is H2ASH32.EXE. I ended up not using them, because:
- Neither is able to properly compile original unmodified .H files. Both spit out tens of errors compiling WINBASE.H, for instance. And manually tweaking copies of the original .H files to get them to properly convert is a pain.
- I decided I preferred to know what I imported in my files rather than blindly importing whatever the converter converted. There are many things in the .H files that might have some historical value (for 16-bit portability, for instance), but do not mean a thing in a new Win32 assembly context.
- I didn't like the twisted syntax the Microsoft tool (H2INC) generates for function prototypes: a TYPEDEF with an artificial name (PROTO_<sequence number>) followed by a PROTO referencing the TYPEDEF.
As a result, I chose to manually create include files for each of the libraries I needed, and to add function prototypes, structures and equate definitions along the way, as I needed them. The ideal solution to this problem could be an assembler that would properly compile and interpret a large subset of the standard .H files, smartly enough to swallow irrelevant errors. Hey, one can dream a little! On the dark side, this would likely slow down assembly by a large amount. Alternately, an H2INC-style utility that would be able to properly and cleanly compile any of the existing .H files to .ASM source would certainly help.
directive ahead of the include file that contains the PROTO definitions for the KERNEL32 functions, for instance, and MASM will automatically generate a linker directive (embedded in your object code) adding KERNEL32.LIB to the list of libraries to search at link time. MS Link is able to recognize and process the embedded directive. I didn't check with other linkers, but I would assume they understand the embedded directive the same way.
By using this technique, the only other thing that the linker needs to resolve external references is a LIBPATH switch on the command line: a command-line switch like /LIBPATH:G:\WinSDK\LIB tells MS Link where the Win32 import library files (that you might name using INCLUDELIB) can be found. If you have several groups of libraries, use the /LIBPATH switch several times on the command line.
One of the MASM-related Knowledge Base articles claims that INCLUDELIB is not supported with LINK. Don't believe it: INCLUDELIB works just great, at least with the 3 most recent MS LINK implementations I checked.
Additional tricks:
If you need to include several libraries, use several INCLUDELIB directives.
INCLUDELIB can be used to pass other directives to the linker. What INCLUDELIB does is embed a -defaultlib: directive in the special .drctve pseudo-section (see Microsoft's definition of the COFF format for more information on that special section). The (dirty) kludge we use here is that INCLUDELIB passes everything that follows it as is to the linker. So a line such as
as an additional parameter line to the linker.
The END directive acts about the same as the INCLUDELIB directive. It only generates an -entry parameter rather than a -defaultlib one. One can regret that Microsoft did not make provision for a generic LINKDIRECTIVE verb instead of (or in addition to) creating the specialized INCLUDELIB and END directives.
might result in severely degraded performance. The variable someDWORD and the following DWORDs are not properly aligned, which can slow operations down considerably if they are accessed very frequently. There are several ways to handle this:
Manually inserting ALIGN DWORD directives after each non-DWORD data definition,
Grouping data items by size, and prefixing each group with an ALIGN <size> directive,
Creating additional sections (possibly by defining macros to define .DATABYTE and .DATAWORD sections). This is a costly solution, though (sections are allocated with a 4K page granularity).
Likewise, when coding structures, always group data items in such a way that:
DWORDs are always aligned on a DWORD boundary,
WORDs are always aligned on a WORD boundary,
When using bytes, group them by 4, or by 2 with an adjacent word, etc.
Finally, do not forget to mention structure alignment in your structures, such as this:
Foo     STRUCT DWORD        ;Equivalent to STRUCT 4
DWFoo0  DWORD  1
BBar    BYTE   1            ;Padding will be added before
DWFoo1  DWORD  1            ;DWFoo1 to achieve proper alignment.
Foo     ENDS
The alignment rules for structures are documented in the MASM Programmer's Guide, Chapter 5/ Structures and Unions/ Declaring Structures and Unions/ Alignment Value and Offsets for Structures, page 119. One thing that I have not seen documented is that the alignment specification after the STRUCT keyword can be a type specifier (like the DWORD in the example above) instead of a number (1, 2 or 4). Look at the code generated by the above structure: MASM pads item BBar with zeroes to respect the alignment inside the structure.
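The padding in the Foo structure above can be checked with a C counterpart. This is a sketch in portable C, not MASM output: it assumes a mainstream compiler that aligns a 32-bit integer on a 4-byte boundary, and the helper foo_padding_after_bbar is a hypothetical name used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* C counterpart of the Foo STRUCT DWORD example: with natural alignment
   the compiler inserts padding after BBar so that DWFoo1 lands on a
   4-byte boundary. */
struct Foo {
    uint32_t DWFoo0;   /* offset 0 */
    uint8_t  BBar;     /* offset 4 */
    uint32_t DWFoo1;   /* offset 8 on mainstream ABIs, after the padding */
};

/* Number of padding bytes inserted between BBar and DWFoo1. */
size_t foo_padding_after_bbar(void)
{
    return offsetof(struct Foo, DWFoo1)
         - (offsetof(struct Foo, BBar) + sizeof(uint8_t));
}
```

On the usual 32-bit and 64-bit x86 ABIs this yields 3 padding bytes and a 12-byte structure, which matches the layout MASM produces for STRUCT DWORD.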
Since VC++4 / VC++5, Visual C++ defaults to aligning structures on QWORD boundaries. There is no real hardware reason to follow this rule with the current machines, as the cache lines for the Intel 486 and Pentium processors are 16 bytes and 32 bytes, respectively. Other Intel recommendations suggest aligning data the following way:
WORD data should not cross a DWORD boundary,
DWORD data should be aligned on a DWORD boundary,
QWORD data (double precision reals) should be aligned on an 8-byte boundary.
So in our 32-bit Intel world, I currently see no real reason to align on QWORDs. I suspect this change is nothing more than Microsoft's anticipation of the use of 64-bit machines, where a performance hit might occur when the native word format (QWORD) alignment is not respected. On 64-bit machines, it is safe to assume that the C integer will be 64 bits, and that simply aligning structures to QWORDs would solve the alignment issues. Previous experience suggests that portability issues are not limited in any way to simple word size and alignment matters, but this is yet another story.
The exact architecture of the future Intel machines is not known at this time, anyway, and the only points that have been disclosed tend to indicate that they will be dual-mode machines, able to execute either in 32-bit or in 64-bit mode. Now, considering that:
assembly language is not that portable anyway,
we don't know what the 32-bit code performance will be like on these machines, but might assume that Intel will do their best to make it look excellent to ease the transition, and
nobody expects the 32-bit machines and code to disappear overnight (heck, most of the world is still mostly running 16-bit DOS code!),
the bottom line is, I am not sure this is worth bothering with at this point.
A last word on alignment: Sven B. Schreiber brought to my attention that alignment is not only a performance issue: it is also a reliability issue. Since NT 3.51, some APIs will simply crash your process if you pass them parameters that are not aligned on DWORD boundaries.
Sven makes a special mention of resource data such as BITMAP structures that must always be DWORD aligned.
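Since a misaligned parameter can be fatal, it can be worth checking addresses and sizes before handing them to an API. The two helpers below are a minimal sketch in portable C; their names are hypothetical and they are not part of any Win32 header.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p sits on a 4-byte (DWORD) boundary. */
int is_dword_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Round a size or offset up to the next multiple of 4. */
size_t dword_align_up(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}
```

The same mask-and-round arithmetic is what an ALIGN DWORD directive performs on the location counter at assembly time.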
32-Bit Flat Memory Model MASM Code for Windows NT, Article ID: Q94314, Revision Date: 23-JAN-1995.
Do NOT believe this article: it is all wrong. There is a bug in the way the END directive is implemented, but it can be worked around easily. Here is the scoop: MASM 6.11d does process the Start label in the END directive and generates an embedded /ENTRY directive in the object code. The only problem is that it does not generate the right label in the /entry: directive it writes to the .OBJ file. If an STDCALL directive is in effect, and the entry procedure is Start, the "END Start" directive will generate an inline
-Entry:_Start
directive. Now, note that LINK will in turn decorate the name it gets in the entry parameter and internally change it at link time to "__Start", which it expects to be a PUBLIC in the object file. Strike 2: There can be neither a _Start nor a __Start externally defined in the MASM module, because Win32 requires the use of STDCALL, and STDCALL will change the "Start" in the source code into _Start@0. So MASM should be consistent with itself (and with LINK), and apply the default interface convention to the END directive, thus generating an inline
/Entry:Start@0
link directive for everything to work fine. Since MASM is inconsistent, we have to fix the problem ourselves. This can be done by replacing the END directive with an ENTRY directive, defined by the following macro:
ENTRY   MACRO EntryPoint:REQ
        LOCAL EntryPoint
        IF @Version GE 611
        ALIAS <_&EntryPoint&@0>=<&EntryPoint>
        ENDIF
        END &EntryPoint
        ENDM
With this macro used in your code in place of the END directive, the code line
ENTRY Start
should work just fine, by instructing the linker to use the right label in place of the one it can't find. See the explanation of the ALIAS directive below (Use of ALIAS, page 34) for more details on this magic.
expect it the least: If you suddenly decide to reorganize your code modules (by breaking up a very large code file, for instance), symbols that used to be internal might need to become external, and this will suddenly reveal discrepancies in case usage. Since code restructuring is already a delicate operation, you usually don't want this extra problem to surface at that time.
You will likely want to use the /c command-line parameter, too: this prevents MASM from automatically attempting to launch LINK. For all but the smallest projects, you will want to control the launching of LINK by other means.
Use /COFF code generation in ML. LINK only knows COFF natively, and this way you won't need to undergo an OMF-to-COFF conversion. The call to the OMF-to-COFF converter is handled by LINK (by shelling out to an external conversion program), but this only slows down the linking process.
Use /Zi to get symbolic debugging information. This should work with any debugger directly supporting Microsoft's CodeView debugging information. We successfully tested Microsoft's MSDEV IDE debugger, Watcom's 10.6 debugger, and SoftICE for Win95 version 3.x.
Using the /W3 option to turn the maximum warning level on might be useful too.
You might want to use /Sc to get instruction timings in the listing file. Timings depend on the target CPU you declared (.386, .486, .586, etc.). This facility is quite useful, but:
Keep in mind that this only shows brute-force timings, and does not take into account other factors like CPU pipes, AGI stalls, cache side effects, alignment effects, etc. See [Booth, 96.01] or [Intel, 95.01] for more details on timings and processor-dependent optimization.
MASM unfortunately doesn't show timings (or cumulated timings) for the code it generates through structured programming directives and/or macros.
pile of sources and bingo, found the offending line. Everything went back to normal after I removed it... When I told this story to Sven, he quickly pointed out that this Win95 behavior is carefully documented in Unauthorized Windows 95 [Schulman 94.01], pages 319-331. Schulman documents the effect of CLI in Win95 under various conditions. The most worrisome ones are when the CLI (with no matching STI) happens in a regular 16-bit DOS program running in the Intel CPU's V86 mode. The effect I saw happens when running in protected mode, and affects only the faulty process. Under the same circumstances, NT would quite simply trap the process with a Privileged Instruction exception.
the above will define an array, Foo, of 10 uninitialized structures of type Item. There is more information about this topic in the README file for the MASM 6.12 patch (c.f. supra.)
(Do not forget the alignment issues!) This funny syntax is documented in the MASM Programmer's Guide, but you are likely to miss it unless you look very carefully at the code fragment in Chapter 7/ Procedures/ Creating Local Variables Automatically, page 192: the example to look at is the aproc procedure, and the way it defines a local array of words...
You can't initialize a local variable. The whole LOCAL space is merely allocated all at once on the stack at runtime upon proc entry, and you have to MOV any initial value there all by yourself (look at the code the LOCAL statement generates). But you can of course use symbolic addressing to do so, and the assembler will automatically generate the corresponding EBP-based addressing.
It is probably useful here to mention a little-known but very important characteristic of Win32 stack management: Sven B. Schreiber brought to my attention the sequence of code that the C compilers generate to probe the stack when allocating large LOCAL blocks. Large is defined here as near or above 4K (a VM page). The problem seemed to be that if a program unexpectedly hit a stack page that had not been committed, the whole process would suddenly disappear with no warning or message. Not even a "Poof!" or a cloud of smoke... This did not seem very clear at that point, as the stack probe would not seem to accomplish more than what the faulty program would do, i.e. merely touch the uncommitted page.
I finally found the whole story in Richter's Advanced Windows [Richter 97.01], in the chapter titled Using Virtual Memory in Your Own Applications, subtitle A Thread's Stack. The problem is that the stack normally grows and shrinks sequentially. The OS tracks the current bottom of the stack by trapping accesses to the lowest page that is currently committed, the guard page. If an access is done in that page, the OS commits yet another page immediately below, and this new page becomes the guard page. Trouble occurs if an application allocates more than 4K (a page) of data at one time and manages to directly access one page below the guard page, effectively jumping over the guard page and defeating the stack growth logic. One would expect this to trigger a regular access violation, but it instead silently kills the process, just like a stack overflow would do.
This problem is taken care of by the stack checking logic of the compilers: when the compiler detects that a function is allocating more than 4K of local data, it generates code that touches the allocated data sequentially, from top to bottom, 4K at a time. Whenever the guard page is touched, a new page is committed 4K below it and becomes the new guard page, so the stack probe routine prevents a stack fault from occurring. Once again, this is fully documented with all the gory details in the aforementioned Richter book.
The bad news is that this case is not currently handled by the prologue code that MASM generates in STDCALL functions.
But there is good news too:
One is that this problem is limited to functions that allocate 4K or more of local variables at a time. There are probably not so many such functions, and it is easy for the programmer to track these manually and add a call to a stack touch loop in these functions.
More good news: MASM makes provision for fully customizing the prologue and epilogue code that are generated upon PROC entry and exit. The prologue is the right place to write the stack touch logic. The prologue code has access to everything it needs, such as the size of the LOCALs for the function, the function's .MODEL, etc. The way to achieve this is documented in the MASM Programmer's Guide, Chapter 7/ Procedures/ Generating Prologue and Epilogue Code, page 198. One of the include files provided with MASM, PROLOGUE.INC, gives an example of customized prologue code in a 16-bit environment.
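To see why touching the pages in order matters, here is a toy model of the guard page logic in portable C. It is a simulation under simplifying assumptions, not OS or compiler code: pages are numbered from the top of the stack downward, and the names sim_stack, sim_touch and sim_probe are hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u

/* Toy model of a thread stack: pages [0, guard) are committed and
   page number `guard` is the current guard page. */
struct sim_stack {
    size_t guard;
};

/* Touch the byte located `depth` bytes below the top of the stack.
   Returns 1 on success, 0 if the access lands below the guard page
   (the case that silently kills the process). */
int sim_touch(struct sim_stack *s, size_t depth)
{
    size_t page = depth / PAGE_SIZE;
    if (page > s->guard)
        return 0;            /* jumped over the guard page */
    if (page == s->guard)
        s->guard++;          /* guard page hit: commit it, guard moves down */
    return 1;
}

/* Stack probe: before using `local_bytes` of locals below depth `sp`,
   touch one address per page, from top to bottom. */
int sim_probe(struct sim_stack *s, size_t sp, size_t local_bytes)
{
    size_t end = sp + local_bytes;
    for (size_t d = sp; d < end; d += PAGE_SIZE)
        if (!sim_touch(s, d))
            return 0;
    return sim_touch(s, end);    /* finally touch the new stack bottom */
}
```

In this model, a function that allocates 10000 bytes and writes straight to the far end of its locals fails, while the same access succeeds once sim_probe has walked the guard page down one page at a time. A customized MASM prologue would do the equivalent with a short loop that subtracts 4096 from a pointer and reads a DWORD at each step.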
Without parameters, CALL to a DWORD variable can be used: it is more straightforward to code (no TYPEDEF, etc), and supports forward references.
(MASM Programmers Guide, Chapter 8/ Declaring Symbols Public and External / Using EXTERNDEF/ (look at label codelabel in the example) page 215) Remember that most local labels can be eliminated by the use of structured programming directives.
MyItem Item <>
One way to do this is by fully qualifying the structure path, such as
MOV EAX,MyItem.Foo
Finally, one can use the ASSUME directive. This way is particularly convenient in the most common case, when several members of the same structure are manipulated in the same code area:
MOV    EBX,OFFSET MyItem    ;Get access to MyItem.
ASSUME EBX:PTR Item         ;Tell MASM what EBX points to.
MOV    EAX,[EBX].Foo        ;Works even if labels Foo and Bar are
MOV    ECX,[EBX].Bar        ;used in many distinct structs.
ASSUME EBX:Nothing          ;Tell MASM we're done with EBX.
This form has the added benefit of checking for erroneous base register use. MASM will flag as an error any attempt to use the wrong register for addressing the structure.
Alternately, it is possible to disable checking through the OPTION OLDSTRUCTS directive, but this is not recommended: it prevents the use of identical member names in two distinct structures, the nesting of structures and many other useful features. This is likely to bite you in some of the numerous Win32 structures, where identical symbols are often used in distinct structures.
All of this (and some more) is documented in the MASM Programmer's Guide, but the information is once again oddly split in two parts:
under Chapter 5/ Structures and Unions/ Referencing Structures, Unions and Fields, page 126 (where you would expect to find it),
under Appendix A, Differences between MASM 6.1 and MASM 5.1, OPTION OLDSTRUCTS (text and examples), pages 370-371 (where you would probably not expect to find it).
For details about SIZEOF, LENGTHOF and TYPE, see page 108 of the MASM Programmer's Guide.
3.1.7 MASM bugs and shortcomings
3.1.7.1 Invalid code generation in INVOKE using 16 bit parameters (or a mix of 16 and 32 bit)
Note: This 6.11d bug has not yet been checked in MASM 6.12.
When one declares an 8- or 16-bit parameter in an INVOKE list, MASM gets very confused: it tries to be smart and to generate code extending the parameter on the stack to 32 bits, but gets hopelessly confused about the data size of the set of PUSHes and POPs that it generates. The exact error depends on the exact code pattern being assembled, but the net result is always inconsistent generated code, and a stack structure that doesn't match the instructions MASM generated (look carefully at the extended listing). The result is a GPF that seems to strike from nowhere. This occurs when the current code segment is a 32-bit one (USE32), which is always the case when programming for Win32.
When the GPF happens, it looks like it can't be tracked: ESP gets loaded with an invalid value, so when the process crashes, you have completely lost the stack context and/or the value of EIP, and don't know anymore where the error came from. Even single stepping through the code is confusing the first time one experiences this, as the fault appears to happen in some unrelated and unknown portion of code.
A similar incorrect set of instructions is generated if you mention segment registers in an INVOKE list (not that usual, though). You have been warned!
Well, forget it; the comments count in the 512 bytes, so the whole logical line above (which doesn't use TAB characters but spaces) doesn't fit in the 512-byte buffer.
MASM will flag it as an error (Booo). However long the INVOKE is (and some of them ARE very long), it has to fit in 512 bytes. So you have to remove comments and leading spaces, pack several parameters per line, etc. You can NOT get away with using the continuation character (\) or any other trick. A similar problem is likely to happen with macros generating long byte strings. VERY frustrating.
but there are many other potential (and very benign) causes for a fatal error. Unfortunately, the Fatal Error prevents the .LST (listing) file from being generated. This is particularly painful when debugging complex macros, since the only way to debug macros is precisely to generate a listing with full code expansion. So one ends up with a macro that generates offending code that can't be seen because the generated code can't be listed. Yet another very frustrating situation.
that generates
00000052  2     83 F8 00   *        cmp    eax, 000h
00000055  7m,3  76 06      *        jbe    @C0006
00000057  5     83 3B 00   *        cmp    dword ptr [ebx], 000h
0000005A  7m,3  75 01      *        jne    @C0006
0000005C  3     90                  nop
                                    .ENDIF
0000005D                   *@C0006:
But quite often, one needs to test preexisting condition codes, such as those resulting from an arithmetic operation (e.g. SUB EAX,EBX), or those returned by a routine.
To handle these cases, the authors of MASM created some special (and somewhat redundant) symbols for directly testing preexisting condition flags: ZERO?, CARRY?, OVERFLOW?, SIGN? and PARITY?. They can be used as in
.IF CARRY? ;Generates a JNC
The MASM authors apparently didn't think about the obvious: simply deriving condition symbols from all the existing Intel J<cond> mnemonics for simple condition testing in their structured programming directives. Allowing expressions such as .IF Z?, .WHILE C?, .UNTIL S?, .BREAK .IF P?, .CONTINUE .IF AO?, etc. would have been simple, intuitive and exhaustive. This oversight is unfortunate for two reasons: First,
.IF !CARRY? ; The only direct way to generate a JC
is not as readable as
.IF ABOVE? ; Is more mnemonic.
Unlike MASM, the Intel mnemonics define various synonyms for the same conditions to improve code readability. For instance, Intel defines both a JZ (Zero) and a JE (Equal), which are exactly the same instruction. But testing for ZERO makes sense after a subtraction, while testing for EQUAL is intuitive after a comparison. Ditto for JL and JNGE, JGE and JNL, etc.
But the most annoying part of this story is that the existing predefined symbols don't allow generation of some of the less usual combo jumps, those that test for multiple conditions at once, such as G, GE, BE, LE, and their negations. I have not found any way to solve this one. Even trying to use combined flags does not work: a JL, for instance, takes a jump when the SIGN? flag is not equal to the OVERFLOW? flag. Let's try this:
.IF Sign? == Overflow? ;This should generate a "JL"
Ooops!
error A2154: syntax error in control-flow directive
00000060  7m,3  75 01      *        jne    @C0008
(Boooo!) Another example: A JA takes its jump if both Carry and Zero are false. The inverse of a JA is a JBE, and is taken when either Carry or Zero is True. So the following expression should generate a JA:
.IF CARRY? || ZERO? ;Should generate a JA
But instead, it generates the right logical sequence in the most inefficient way.
00000060  7m,3  72 02      *        jb     @C0009
00000062  7m,3  75 01      *        jne    @C0008
The bottom line is that we have no way to generate any of the jump instructions that test combined flags, nor to redefine the right mnemonics for them.
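For reference, the flag combinations that the combined-condition jumps test can be written out explicitly. The sketch below models them in portable C: cmp32 computes the flags a 32-bit CMP a,b would leave, and the cond_* functions give the condition under which each jump is taken. All names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Flags left by a 32-bit CMP a,b (which computes a - b). */
struct flags {
    int cf;   /* CARRY?    : unsigned borrow        */
    int zf;   /* ZERO?     : result is zero         */
    int sf;   /* SIGN?     : top bit of the result  */
    int of;   /* OVERFLOW? : signed overflow        */
};

struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct flags f;
    f.cf = a < b;
    f.zf = (r == 0);
    f.sf = (int)(r >> 31);
    /* Signed overflow on subtraction: the operands have different signs
       and the sign of the result differs from the sign of a. */
    f.of = (int)(((a ^ b) & (a ^ r)) >> 31);
    return f;
}

/* Conditions tested by the jumps that combine several flags. */
int cond_jl (struct flags f) { return f.sf != f.of; }
int cond_jge(struct flags f) { return f.sf == f.of; }
int cond_jle(struct flags f) { return f.zf || f.sf != f.of; }
int cond_jg (struct flags f) { return !f.zf && f.sf == f.of; }
int cond_ja (struct flags f) { return !f.cf && !f.zf; }
int cond_jbe(struct flags f) { return f.cf || f.zf; }
```

Each of these is a single instruction on the CPU, which is why the two-jump sequence MASM emits for CARRY? || ZERO? is a loss compared with a plain JA.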
The use of some older features of the macro language additionally precludes the use of the quote, double quote, backslash, percent and ampersand symbols.
The macro syntax does not allow one to parse the label field of a macro invocation as a parameter and use it as such to generate a label somewhere in the generated code.
Etc.
As an example, here is a problem to solve: Use the macro language to define a directive that would transparently generate Unicode strings using a syntax exactly compatible with that of the native BYTE directive, for instance. Go to The String macro, page 51, for the best solution we found so far. And remember that the result does not reflect in any way the pain it was to achieve it. The challenge is open, by the way: anyone with a better solution to the problem, please email!
The way things are, I don't see any way this situation could be fixed by enhancing the macro language again. If MASM ever goes back to the development cycle one day, or if a new MASM-compatible assembler is ever developed, I would gladly vote for the creation of a brand new, incompatible but consistent macro language, and rewrite all my existing macros from scratch so they can still compile the old source code syntax. I guess such an enhanced MASM could also ensure backward compatibility by taking an OPTION OLDMACROS directive to hide the new syntax and re-enable the existing brain-damaged 6.1x syntax.
The only other thing you need to realize to connect the above pieces is that the name of the entry point function is defined through the /ENTRY: directive of the LINK utility (the same directive that is used to define the program/process entry point in an .EXE). At this point, you will have about all the first-level information you need to know about DLLs, and especially about the entry point function.
The rest of what you need explains how to tell LINK to generate your DLL, and this is documented in the MSDN Library CD-ROM: Product Documentation / Languages / Visual C++ x.y / User's Guides / Visual C++ User's Guide / LINK Reference / Module-Definition (.DEF) Files.
The functions your DLL exposes are defined in the EXPORTS section of the .DEF file. Alternately, exports can be defined through a command-line switch (/EXPORT:). For any large project, I tend to like the .DEF file approach better, but this is largely a matter of personal preference (and of the build tools one uses).
As you can see, we picked the documentation of the VC++ linker from recent MSDN Library documentation. About the same documentation applies to several earlier versions of the 32-bit LINK utility, probably back to its earliest release, which used to be known as LINK32.
Using this feature, the linker can be used to construct tables of related objects. The related objects can be declared in different modules, but the linker will be able to consolidate / concatenate them in an orderly way in the resulting image, building structures that the program will be able to use at run time. An example of this use can be found in Runtime Initialization / Termination Macros, page 56.
The trick is a very simple one: A DLL forwarder is created by placing the address of the target function as the optional internalname, such as:
SomeFunc=OtherDLL.SomeOtherFunc
When defining exports from the command line, the syntax is:
/export:SomeFunc=OtherDLL.SomeOtherFunc
I have found that, at least with my MASM / linker couple, using this feature forced me to specify decorated names in the .DEF file where the forwarder was defined. This might or might not be true of other versions of the software. Without forwarders, you don't have to bother, as the linker handles the decoration automatically. If you get undefined symbols corresponding to names involved in forwarders, try decorating the names manually. This will likely propagate the error up the DLL chain, back to the main .EXE, and you will have to fix the upstream .DEF files accordingly.
Now for the bad news: DLL forwarders are not implemented in the Win95 loader. You will get an OS-generated runtime error complaining about a missing DLL symbol if you try to use them.
Weak externals can be implemented in MASM 6.1x using the ALIAS directive (only very poorly described in a MASM 6.11 release note). In the example described above, the directive would be:
ALIAS <sym2> = <sym1>
Beware: Any syntax error in an ALIAS definition (and/or a reference to a missing symbol) usually triggers a page fault in MASM 6.11d (oh well).
Note: The README.TXT file for MASM 6.12 claims that a number of Access Violation causes have been fixed, but we have not checked this one at this time.
If Visual Studio (aka Developer Studio) is loaded on your machine, you can use it as a mere debugger, even if you never use its IDE to develop your MASM application: if you want to debug your great new FUBAR.EXE application on the current drive, and provided you installed VC++ on drive G: under \MSDEV, just run:
G:\MSDEV\BIN\MSDEV.EXE FUBAR.EXE
This will launch the Visual Studio IDE right into the debugger and allow you to start debugging. All symbolic facilities should be there.
The best tool I have found so far for debugging Win32 applications is NuMega's SoftICE 3.x, and the best setup I found for it was to run it on a single machine, using a second video controller and dedicating a small alternate screen to the debugger. Unfortunately, although it claims to fully support MASM, SoftICE doesn't support the complete legal character set that MASM allows (as defined in the MASM Programmer's Guide, Chapter 1/ Language Components of MASM/ Identifiers, page 9): as a result, labels including special characters such as $ and ? are not supported and cannot be accessed or used in expressions with SoftICE. This is quite unfortunate, since MASM itself generates labels with ? symbols for local symbols such as those generated by the structured programming macros. In addition, there are quite a few libraries that use the perfectly legal $ and ? characters, and debugging code using these libraries is very awkward.
4 Various gripes
4.1 The absence of LDT support in Intel-based platforms
Microsoft decided a few years ago that since NT was to be multi-platform, the only MS-blessed way of programming was to use C (or C++). From that day, Microsoft seemingly stopped caring about assembly language, to the point of mostly ignoring it as they do today. What MS might not have anticipated is that the MIPS people, soon followed by the PowerPC folks, would drop out of the NT market, and that the remaining non-Intel NT platform (the DEC Alpha) would represent a minuscule part of the market, making the whole portability issue a very moot point.
The MS folks went as far as preventing use of a key feature of the Intel CPU, seemingly because it didn't have any exact counterpart on other (RISC) platforms. Did you ever notice that there is no way to benefit from segmentation in user mode under WIntel32 platforms? I know that the forced use of segmentation with 16-bitness in previous times gave a terrible reputation to segmentation: it was a nightmare in 16BitLand to manipulate large pieces of data broken into 64K segments. But segments have another use and benefit: they provide a very efficient way to implement multi-instantiation. By simply changing the place where the data segment registers point, it is possible to implement code reentrancy and re-instantiation in a fully code-transparent way. And as the segment registers are part of the CPU context, the OS can automatically keep a separate data context for each thread running off the same piece of object code. So by initially manipulating the segment registers, a process could launch many threads running the same code, with each thread running off its own data area (materialized by its own descriptor). The benefit is that there is no addressing restriction or inefficiency in the available addressing modes. This segment mechanism was explicitly designed into the CPU as the simplest and one of the most efficient ways to program a piece of code running a single application for many users simultaneously, for instance.
Well, the problem is that Win32 provides no documented way to let a Ring 3 (user mode) process allocate an LDT entry. This means one can't allocate a bunch of memory from the OS, ask the OS to create a selector for it and start a new instance of a thread that would use this new data area as its data segment.
The only mechanism that Microsoft offers to achieve multi-instantiation is what they call Thread Local Storage (TLS). It is implemented in two different ways, Dynamic TLS and Static TLS (see [Richter 97.01] for details).
Dynamic TLS allows the programmer to use an OS-allocated 64-DWORD array to maintain thread-specific pointers. This implies:
that the number of data items that can be tracked this way is limited to 64,
that all instantiated memory accesses are done at best through an indirection,
that access to the array is accomplished by system calls, making the mechanism even less efficient.
The least inefficient way, static TLS, is still hardly acceptable: compile-time storage is allocated in the .TLS section, and the OS replicates the TLS segment for each new thread that is started.
This means that each thread in the process gets a block of TLS storage the size of all the TLS data from all the threads. In other words, the TLS section is allocated as if it were global for all the threads, and the threads that don't need any instantiation data still get it. In addition, and as pointed out by [Richter 97.01], on an x86 CPU, three additional machine instructions are generated for every reference to a static TLS variable. This is something most C programmers don't see, since the extra overhead only appears in the code generated by the compiler, which most C programmers don't look at or really care about. The problem is completely hidden at the C source level. But it certainly isn't hidden from the assembly language programmer, and the efficiency of the resulting code is obviously much lower.
Furthermore, using TLS, the programmer loses the automatic inter-thread protection that the use of segmentation would have provided: an attempt to access data outside a memory segment is something an Intel CPU automatically traps.
Finally, static TLS can only be used with implicitly loaded DLLs. No Win32 OS is able to properly initialize TLS storage for explicitly loaded DLLs (loaded via LoadLibrary). For details, see Static Thread-Local Storage, [Richter 97.01].
The bottom line is that, at least for the Intel implementation, TLS is hardly more than a dirty and very inefficient kludge.
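The bookkeeping behind dynamic TLS, as described above, can be modeled in a few lines of portable C. This is a simulation only, meant to show the 64-slot limit and the extra indirection; the sim_* names are hypothetical and are not the Win32 API, whose real entry points take no thread argument because the OS finds the current thread by itself.

```c
#include <stddef.h>

#define SIM_TLS_SLOTS 64    /* slot count quoted in the text */

/* Which slot indexes are reserved: one map shared by the whole process. */
struct sim_process {
    unsigned char in_use[SIM_TLS_SLOTS];
};

/* Each thread owns its own array of pointer-sized slots. */
struct sim_thread {
    void *slot[SIM_TLS_SLOTS];
};

/* Reserve a slot index for the process; returns -1 once all 64 are taken. */
int sim_tls_alloc(struct sim_process *p)
{
    for (int i = 0; i < SIM_TLS_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Store and fetch a thread's pointer: every access to the per-thread
   data goes through this extra indirection. */
void sim_tls_set(struct sim_thread *t, int index, void *value)
{
    t->slot[index] = value;
}

void *sim_tls_get(const struct sim_thread *t, int index)
{
    return t->slot[index];
}
```

Two threads that use the same slot index get back two different pointers, which is the whole point of the mechanism, but each access costs a call plus an indirection where a segment override would have cost nothing.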
Apart from TLS, the only other official way to achieve multi-instantiation in the Win32 world is to create multiple processes (rather than multiple threads). This belongs to the steam-hammer action category though, and doesn't compete for efficiency with the lightweight multi-threaded way:
- each process requires a separate .EXE file,
- each process requires reloading / remapping the same memory image for each instantiation of the process,
- changing process context is more costly than changing thread context (all process-local items are part of the process context and don't need to be changed when switching thread context inside the same process),
- using separate processes precludes the use of any of the lightweight intra-process synchronization and data sharing mechanisms such as global memory items, critical sections, etc.,
- and since the inter-process synchronization / communication mechanisms have to cross process boundaries, they are much more costly than the intra-process ones.
Whenever we had a chance to ask, we only got two explanations so far from Microsoft personnel about this missing feature.

The first one is the "flat model" dogma: Win32 uses the flat model, and this model precludes the use of segmentation. This is simply not true. Using the flat model never prevented the use of segmentation, as the Intel CPU documentation clearly states. There is no such thing in the Intel CPU as a "flat model" bit, and using a flat model is a pure programming convention / convenience. No CPU-inherent technical limitation prevents a programmer from occasionally using a segment register for any reason. As we mentioned above, the best evidence is that Microsoft themselves use segmentation in the Win32 world: in any Win32 thread, the FS register always contains a special descriptor that doesn't follow the flat model rules and is used to access the TIB (Thread Information Block, see [Pietrek 95.01]). The lack of access to segmentation from Ring 3 code only comes from an OS design decision.

The second explanation Microsoft commonly gives about the lack of access to segmentation from Ring 3 code is the need for portability, and the lack of hardware mechanisms to implement segmentation on non-Intel platforms.
As we already mentioned, this looks to us like a very moot point, as
- portability is non-existent in the Win9x world anyway, since there is no such thing as a non-Intel-compatible Win9x platform, and conversely Win9x-specific features are only supported on Wintel32 platforms,
- portability is of little use in the NT world, and it is even diminishing:
  - cumulative sales of NT on non-Intel platforms are a milli- or micro-market, still shrinking now that the MIPS and the PowerPC contenders have thrown in the towel,
  - there is only one non-Intel machine left in the arena (the Digital Alpha), it is currently marginal, and this situation doesn't seem likely to improve: AutoDesk, makers of AutoCAD, one of the major-selling applications for the Alpha, recently decided to drop support for the NT/Alpha platform: there was not enough demand,
- from an API point of view, it would only take two or three Intel-specific NT system calls to implement this mechanism,
- and last but not least, it's ultimately the responsibility of the people designing application software to decide whether portability is (or is not) a more desirable goal to achieve than efficiency on any given platform they decide to choose.
We think there is clearly a case here for Microsoft implementing a few Intel-specific syscalls and allowing proper thread instantiation through the use of selectors and Ring 3 LDT manipulation.
Since we are not likely to see Microsoft change their position on this point in the short term (!), here are some alternate solutions for those assembly language programmers that the TLS kludges do not satisfy:
One way could be to restrict all data to local (stack based) storage, but there are some problems with that approach:
- it severely limits the addressing modes available to the programmer. This is particularly limiting when complex, nested structures / arrays need to be accessed,
- it imposes severe architectural constraints on the programmer: there are many programming situations where global data access is required, especially in time-critical real-time applications, when the database is large enough. A number of Win32 constructs actually require global data access.
Another variant could be:
1. program the thread to instantiate, grouping its instance data together,
2. compute at runtime the size of the static RAM database for the thread,
3. allocate a chunk of memory of the same size,
4. initialize the new chunk as needed (possibly by a mere memory move from the original to the new chunk),
5. compute the offset between the original memory block and the new chunk,
6. load a base register with the result and
7. address each and every instantiated variable through based addressing, using a different chunk (with a different base) for each thread to instantiate.
This solution is slightly better: it provides a global database with no size limitation, while still leaving stack based addressing to parameter passing and true local storage. But it is still far from ideal:
- it sacrifices a precious and scarce base register for the whole life of the thread,
- it precludes any access to the direct addressing mode, the simplest and least error prone of all,
- it prevents the use of base + index addressing inside the thread instantiated data block (since base is already used to maintain access to the data block), seriously complicating access to complex data structures,
- and if the programmer forgets to use base addressing on any single instruction, the program will access the RAM instance of the original thread rather than that of the thread it's running in (ouch), creating very hard to track bugs. But everything being relative, keep in mind that the same kind of error is even more likely to happen using the much more twisted TLS addressing ways.
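For readers who think more easily in C, the based-addressing variant might be sketched roughly as follows. This is only an illustration under stated assumptions: the InstanceData layout and all function names are hypothetical, and where the assembly version loads a base register with the offset between the original block and the new chunk (steps 5 and 6), the C version simply keeps a pointer to the new chunk, since C has no portable way to express the distance between two unrelated memory blocks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical instance data: everything one thread instance owns,
   grouped together (step 1). */
typedef struct {
    unsigned long msg_count;
    char          name[16];
} InstanceData;

/* The "original" block, holding the initial values. */
static const InstanceData instance_template = { 0, "idle" };

/* Steps 2 to 4: size the block, allocate a chunk of the same size and
   initialize it by a plain memory copy. Returns NULL on failure. */
InstanceData *instance_create(void)
{
    InstanceData *base = malloc(sizeof instance_template);
    if (base != NULL)
        memcpy(base, &instance_template, sizeof instance_template);
    return base;
}

/* Step 7: every access goes through the base pointer, which plays the
   role of the reserved base register. */
unsigned long instance_count_message(InstanceData *base)
{
    assert(base != NULL);
    return ++base->msg_count;
}

/* Release one instance block. */
void instance_destroy(InstanceData *base)
{
    free(base);
}
```

Each thread would call instance_create() once and pass the returned pointer to every routine it runs. The weakness listed above shows up here too: any routine that reads instance_template directly instead of going through its base pointer silently works on the original data.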
The irony is that nearly the same logic as we described above would apply if LDT selector allocation were allowed: step 5 above (and the following ones) would be replaced by something like
5. allocate an LDT selector,
6. load a segment register with the result,
7. address each and every variable just as you would if this thread were alone: it actually is the only one accessing this memory segment.
The main difference is that each memory access could then be achieved safely and efficiently using any and all of the addressing methods the CPU offers.
5 Win32ASM Toolkit
The toolkit contains an undefined number of files. Undefined, because
- the toolkit is a never-ending work,
- the documentation always tends to lag behind,
- we have already postponed the release of this document too much, waiting for planned enhancements that did not make it,
- it is probably better to provide a file with no documentation (or an obsolete documentation) than no file at all,
- these files should be considered as work in progress.
For these reasons, we cannot guarantee that files associated with this documentation match exactly what is described, nor can we ensure that they are fully stable. In other words, and as stated in the disclaimer ahead of this document, you are using this code at your own risk.
5.2.1.1.1 UnicAnsi.equ
This include file handles the character set issues. At the time I am writing this, I am far from having covered, or even started studying, the topic: all my current work has to be Win95 compatible, and thus I have to stick to ANSI representation (Win95 does not support Unicode format). So the only part of UnicAnsi.equ I am actually using today is the UnicAnsiExtern macro (see below). Most of the other material in this file comes directly from Sven B. Schreiber's Walk32 work, mentioned in this document, and has not yet been used or tested in the Win32ASM environment. I could even have broken it while reshuffling it around and not have realized it yet.
in the Win32SDK: if you spelled it properly (case included), you might have hit a function that is character set dependent. In this case, add a UnicAnsiExtern entry with the name of the new function ahead of the equate file, before your PROTO definition.
5.2.1.1.3 Win32Defs.equ
This file contains miscellaneous Win32 EQUate and TEXTEQU definitions.
5.2.1.1.4 Win32Strs.equ
This file contains a number of Win32 structure definitions.
5.2.1.2 Win32Res.equ
This file contains numerous EQUates related to resource definitions.
programming is more oriented toward console (service) applications, not that much of the Windows API is covered at this point. This situation should slowly improve with time, as I keep on adding new functions, structures and reorganizing this file set as needed.
5.2.2.1 CommCtl32.equ
5.2.2.2 CommDlg32.equ
5.2.2.3 GDI32.equ
5.2.2.4 Kernel32.equ
5.2.2.5 TAPI32.equ
5.2.2.6 User32.equ
5.2.2.7 WinMM.equ
5.2.2.8 WinSpool.equ
    .BLOCK                          ;We just sent DLE ++ DLE 0.
        CALL    FBIIGetParms        ;get other end's parms,
        .BREAK .IF CARRY?           ;Drop out if error.
        CALL    FBIISendParms       ;queue our parameters packet,
        .BREAK .IF CARRY?
        CALL    FBIxWait4Tx         ;Wait until we're acked to go to data
        .BREAK .IF CARRY?
        CALL    FBIxRelTxSem        ;Release the Tx semaphore we just used.
        CALL    FBTxQuotePatch      ;Update the quote table.
        CALL    FBTxInit1Init       ;Initiator, end of init, Rxer and Txer.
        CALL    FBRxInit1Init
        CLC
    .ENDBLOCK
5.3.1.1.2 FOREVER
.FOREVER is a termination for a .REPEAT loop that unconditionally jumps back to the head of the loop. It is a synonym for .UNTIL 0, but here again, readability is the key: .REPEAT / .FOREVER is simply more explicit than .REPEAT / .UNTIL 0.
    .REPEAT
        INVOKE  GetMessage, OFFSET winMsg,  ;Address of Msg structure,
                0,                          ;Window to get msg from,
                0,                          ;Filter min,
                0                           ;filter max.
        .BREAK .IF (EAX == 0)               ;Can this ever happen here?...
        INC     StatTAPIMsgs                ;Count messages we see.
        $Display 'Got a Win message',$EOL
        INVOKE  DispatchMessage,            ;Dispatch msg to proper winproc, and
                OFFSET winMsg               ;loop again.
    .FOREVER
5.3.1.3 UnusedParm
The UnusedParm macro is useful in PROCs where some entry parameters are defined but not used. This happens very often in CALLBACK procedures, for instance. In this case, if the warning level is set to the maximum value, as it should be, MASM generates a warning.
UnusedParm allows you to disable the warning (and to document the fact that not using the parameter is not a bug).

    MSGPROC WinProcCMD_ID_HELP_ABOUT
        INVOKE  DialogBoxParam,
                hInst,                  ;Process instance,
                IDD_ABOUTBOX,           ;"About" box template resource,
                hWnd,                   ;owner window,
                OFFSET AboutDlgProc,    ;dialog box procedure,
                0                       ;lparam for WM_INITDIALOG message.
        XOR     EAX,EAX
        RET
        UnusedParm wMsg
        UnusedParm wParam
        UnusedParm lParam
    WinProcCMD_ID_HELP_ABOUT ENDP
5.3.1.4.1 MUSTBE
This routine does not take any message. The only data the FatalError routine has are the contents of the registers and condition codes, and the address where the problem occurred (sitting at the top of the stack).
5.3.1.4.2 MUSTBEM
This routine takes an additional, optional parameter, an error message string. The message is generated in the .CONST section, headed by a length byte, and an INVOKE FatalError is generated, with a pointer to the aforementioned message.
    CMP     hCall,0                 ;Check call handle:
    MUSTBEM E,'FoolineMakeCall: Call handle already active ?!'
5.3.1.4.3 MUSTBEMGLE
Same as MUSTBEM, but the macro invokes a variation of the FatalError routine, FatalErrorGLE. FatalErrorGLE invokes GetLastError, formats the corresponding OS error message and presents that information together with all the relevant information available to the regular FatalError routine.
    INVOKE  SetConsoleCtrlHandler, ADDR BruteForceExit, TRUE
    OR      EAX,EAX
    MUSTBEMGLE NZ,'FooMain: SetConsoleControlHandler failed'
5.3.1.4.4 SHOULDBE
This macro is in essence equivalent to the MUSTBE macro, with a major difference: it invokes a Warning routine instead of a FatalError routine, and Warning is supposed to return to the point it was called from without changing any register or condition code. Although I have had the SHOULDBE macro around forever, I never got around to implementing the Warning routine. The FatalError routine ended up being sufficient for my own use.
    ENUM WORD
    ENUMITEM                ;TXer does plenty of nothing.
    ENUMITEM                ;About to send 'B', second header byte.
    ENUMITEM                ;About to send a data byte.
    ENUMITEM                ;TxEr about to send CRC-8 (negotiation).
    ENUMITEM                ;TxEr about to send CRC-16, LSB
    ENUMITEM                ;TxEr about to send CRC-16, MSB
    ENUMITEM                ;TXer about to send DLE'd data.
    ENUMITEM                ;Transmitting from ring buffer.
    ENUMITEM                ;Txer about to check for chained
                            ;transmission.
    ENUMEND
point. And then, later, during testing, you might figure out that you forgot to call the termination routine and some cleanup action never gets properly performed. Wasteful, because if one day, you stop invoking a given routine in your program, you will likely forget to remove the initialization and/or termination routine, and even if this does no harm, this will result in the linker still pulling the whole library member in your code, because the initialization code is still invoked there.
The solution to this is the four Init/Exit macros. You code the $InitRoutine macro in the same module as the library routine itself, together with the initialization code for the routine. Ditto for the $ExitRoutine macro. So the library routine and everything related to its initialization and termination can be coded in the same source module, that of the runtime library, and only there. From then on, you don't have to think about initializing and terminating a library routine anymore. If you call a library routine from within your code, this will pull in its initialization code at the same time. And the initialization code of all library routines will be invoked all at once, in the startup code of your program, but you will not even have to know which routines actually require initialization and which do not. Stop using some library routine, and it will not be pulled into the .EXE file anymore. Nor will its initialization code.

So where is the trick? The trick is in the linker (and its Grouped Section feature mentioned above) and in four rather simple macros. The macros create two sections. One is used to build a table containing the addresses of all the initialization routines. The other one is used to build a table containing the addresses of all the termination routines.

The $InitRoutine macro is used to declare an entry point to an initialization routine. Each $InitRoutine macro will define the address of its initialization routine and put it in the initialization section. Ditto for each $ExitRoutine, which will put the address of its exit routine in the termination section. If convenient, a single module might have as many $InitRoutine and $ExitRoutine macro calls as needed. When the linker pulls in a library routine, it will find its initialization and termination sections (if any) and concatenate them with those of the other library routines used by the program.
The only thing that the application startup code will need to do is to invoke the $RunInitRoutines macro, before any thread is created. $RunInitRoutines does not take any parameter and does not even know whether there is any initialization routine to call in the whole project. If there is no initialization to perform, the initialization section will be empty. $RunInitRoutines generates a very simple loop that calls in turn each address in the initialization section. Ditto for the $RunExitRoutines macro, which is invoked by the application in its final code, presumably after all active threads have terminated.
An example is probably appropriate at this point. The modules requiring initialization and / or termination look like this.

In a module defining memory pool handling routines:
    ; Declare all routines that require initialization.
    $InitRoutine MPInitialize       ;Initialize memory pools
    $InitRoutine MPCritSectInit     ;Initialize critical section
                                    ;for the mempool routines.
    $ExitRoutine MPTerminate        ;Cleanup / release memory pools

    MPInitialize PROC
        ; Initialization code here
        RET
    MPInitialize ENDP

    MPCritSectInit PROC
        ; Initialization code here
        RET
    MPCritSectInit ENDP

    MPTerminate PROC
        ; Termination code here
        RET
    MPTerminate ENDP
The memory pool handling routines above need both initialization and termination.

In a module defining message queue handling routines:
    ; Declare the routine that requires initialization.
    $InitRoutine MQInitialize       ;Initialize message queue

    MQInitialize PROC
        ; Initialization code here
        RET
    MQInitialize ENDP
The message queue routines above require an initialization routine but no termination routine.
The main code for a process using the initialization macros looks like this:
    .CODE
    MainProc PROC
    ; At this point, no other thread is running, so none of the
    ; initialization routines incurs the risk of a race condition.
        $RunInitRoutines            ;Run all initialization routines.
                                    ;Carry will be set if some init
                                    ;routine failed. In this case,
                                    ;execution of init routines will
                                    ;stop after the failing routine.
                                    ;If no initialization error,
                                    ;the main code is here
        .IF !CARRY?
            CALL    MyMainCode
        .ENDIF
        SAVE    EAX                 ;Now execute all the registered
        $RunExitRoutines            ;Exit routines.
        RESTORE EAX
        INVOKE  ExitProcess,        ;All done,
                EAX                 ;pass exit retcode.
    MainProc ENDP
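As a rough illustration of what the loop generated by $RunInitRoutines does, here is a minimal C sketch. It is not the macro expansion itself. It assumes that the table ends with a zero entry (the under the hood description below mentions a DWORD 0 end of table marker) and that a routine reports failure through its return value, where the original uses the carry flag. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A routine returns 0 on success and nonzero on failure; the return
   value stands in for the carry flag used by the original macros. */
typedef int (*routine_fn)(void);

/* Call each routine in a table that ends with a NULL entry.
   Stops at the first failing routine and returns its code,
   or returns 0 when every routine succeeded. */
int run_routines(const routine_fn *table)
{
    assert(table != NULL);
    for (; *table != NULL; ++table) {
        int rc = (*table)();
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Hypothetical routines, for illustration only. */
int calls;                              /* counts routines executed */
int init_ok(void)   { ++calls; return 0; }
int init_fail(void) { ++calls; return 1; }
```

With a table such as { init_ok, init_fail, init_ok, NULL }, the third routine is never called, which mirrors the "stop after the failing routine" behaviour described in the comments of the listing above. An empty table (just the NULL marker) does nothing and reports success.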
Here is the detailed, under the hood view of the four macros. You do not need to know exactly how this works to use it, anyway. If you don't care, just skip to the next paragraph.

The $InitRoutine macro allows automatic, application-wide registering of the initialization routines. The $ExitRoutine macro accomplishes the same for termination routines. $InitRoutine is used in any module containing initialization routine(s) that must be executed before the main logic of a program can be started.

$InitRoutine declares a section, naming it @Init$<module name>. $ExitRoutine declares a section, naming it @Exit$<module name>. So if invoked in .ASM module FOO, $InitRoutine and $ExitRoutine will create segments named @Init$FOO and @Exit$FOO respectively. Both macros are called with the name of a routine. The macro will generate a DWORD pointer to the routine, and place this pointer in the @Init$ (or @Exit$) segment/section.

When a section name contains a '$' sign, a PE linker processes it specially, as mentioned in Grouped Sections, page 41. As a result, the contents of all "@Init$<modulename>" segments will be concatenated in the "@Init" section and the contents of all "@Exit$<modulename>" sections will be concatenated in the "@Exit" section. The linker sorts the section fragments by alphabetical order of <modulename>. So the $InitRoutine macros of all modules contribute to the construction of a global table containing all the addresses of the initialization routines and located in section "@Init".
Likewise, the $ExitRoutine macros contribute to the construction of a global table containing all the addresses of the Exit routines and located in section "@Exit".

The $RunInitRoutines and $RunExitRoutines macros in the startup / exit code of the application put all of this together: they create the "@Init$" and "@Exit$" segments (that will end up ahead of all other @Init and @Exit segments in alphabetical collating sequence and contain the label at the top of the table), and "@Init$zzzzzzzz" / "@Exit$zzzzzzzz" (that will hopefully end up alphabetically after all other @Init and @Exit segments and contain a DWORD 0 as an end of table marker). Finally, both the $RunInitRoutines and the $RunExitRoutines macros generate a short code loop that goes down the associated list and calls each address in the list.

In case some Init and/or Exit routines must execute in a different order, it is possible to pass a second (optional) parameter to the $InitRoutine ($ExitRoutine) declaration. This second parameter is inserted in the segment name between the @Init$ (@Exit$) prefix and the <module name>. It allows one to change the linking order inside the grouped section and force the Init (Exit) routines to execute in any suitable order (rather than by alphabetical order of modules). For instance, a "Console Log" routine might need initialization before any other routine so the other initialization routines might use the Console Log PROCs to log what they did. Passing a second parameter of "0" (lowest in collating sequence) might force the console log init routine to move to the head of the list (provided no other less important module uses this same value and no module is named "0.ASM").
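To see how the naming scheme orders the table, the small C sketch below sorts a handful of section names the way the text describes the linker ordering them. It assumes a plain byte-wise comparison, as done by strcmp; the exact collating rule of a given linker should be checked against its documentation. The module names MEMPOOL, MSGQUEUE and CONLOG are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte-wise comparison of two section names. */
static int by_name(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort section fragment names alphabetically. All names share the
   same "@Init$" prefix, so comparing whole names orders them by the
   part that follows the '$' sign. */
void sort_fragments(const char **names, size_t count)
{
    assert(names != NULL);
    qsort(names, count, sizeof names[0], by_name);
}
```

Sorting the names "@Init$zzzzzzzz", "@Init$MEMPOOL", "@Init$", "@Init$0CONLOG" and "@Init$MSGQUEUE" this way puts "@Init$" (the table label) first, then "@Init$0CONLOG" (the console log routine declared with a second parameter of "0"), then "@Init$MEMPOOL" and "@Init$MSGQUEUE", and finally "@Init$zzzzzzzz" (the end of table marker). Under this byte-wise assumption every lowercase letter sorts after every uppercase one, so a lowercase module name would still land before "zzzzzzzz" only if it compares lower than that string.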
6 Bibliography
6.1 [Booth, 96.01]
Rick Booth, Inner Loops, Addison-Wesley Developers Press