Pascal Newsletter #3
The full source code examples of this issue are available for download.
INDEX

 1. A FEW WORDS FROM THE EDITOR
 2. FIND FILE: ADDING A CONTEXT MENU
 3. PORTING ISSUES: UTF-8 STRINGS
    String types in Delphi
    MultiByte Character Strings (MBCS) in Windows
    Length of an ANSI string
    Introduction to UTF-8 (UCS Transformation Format)
    UTF-8 encoding
    Length of a UTF-8 string
 4. LINKS
________________________________________________________________________

1. A FEW WORDS FROM THE EDITOR

If you visited our web site in the last week or two, you must have seen the new look. If not, then take a look at

   http://www.latiumsoftware.com/en/index.php

I would like to thank a friend of mine for this nice job. Please let us know if you have any trouble viewing it with your browser.

We have added a few articles about Delphi. Most of them have been featured in past issues of this newsletter (and its predecessors), but there are also new things:

   Determining the short name (DOS name) of a file
   http://www.latiumsoftware.com/en/delphi/00007.php

   Accessing hidden properties
   http://www.latiumsoftware.com/en/delphi/00008.php

   Adding new methods and properties without registering new components
   http://www.latiumsoftware.com/en/delphi/00009.php

We will keep you informed about new additions to the site.

Please remember that if you have doubts or questions regarding the articles of this newsletter, or any other question about Delphi programming, you can post them to our mailing list. We would like to hear about your programming needs to see if we can cover them in this newsletter.

Regards,

Ernesto De Spirito
eds2004 @ latiumsoftware.com
________________________________________________________________________

JfControls Library. Multi-language. Multi-appearance. Skins. Privileges. More than 40 integrated and customizable components. Impressive GUI. Centralized resources administration. Multiple programming problems solved. For Delphi 3-7 and C++ Builder 3-6.
http://www.jfactivesoft.com/
________________________________________________________________________

2. FIND FILE: ADDING A CONTEXT MENU

In this article we continue building the Find File application we started in the former Delphi Newsletter. This time we are going to add a context menu to the file list, so the user can choose to open the file or the folder where the file is located.

Adding a context menu to a control is easy:

 1) Drop a TPopupMenu component on the form;
 2) Edit its Items property, adding menu items with their corresponding OnClick event handlers; and
 3) Assign the menu object to the control by setting the PopupMenu property of the control.

This way, the context menu will pop up whenever the user right-clicks on the control (or presses the Apps key on Windows 95 keyboards).

For our purpose we are going to make it a little more difficult by skipping step 3) and calling the popup menu ourselves with the Popup method when we need it. We have to do it "by hand" because we need to differentiate between keyboard and mouse invocation, basically to determine which file or folder is the one we should open. We would also like to set the default menu option to "Open" if the user right-clicks on a file name, or to "Open folder" if the user right-clicks on a folder.

Ok, enough introduction. Now let's work!
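For comparison, the standard three-step approach described above can also be carried out at run time instead of in the form designer. The unit below is only a sketch: the unit name ContextMenuSketch and the function AttachContextMenu are made up for illustration and are not part of the Find File source code.

```pascal
unit ContextMenuSketch;

interface

uses
  Classes, Menus, ComCtrls;

// Hypothetical helper (not part of the newsletter's source code): builds a
// popup menu with the two items used in this article and attaches it to the
// given list view, that is, steps 1) to 3) performed in code.
function AttachContextMenu(AListView: TListView;
  OnOpen, OnOpenFolder: TNotifyEvent): TPopupMenu;

implementation

function AttachContextMenu(AListView: TListView;
  OnOpen, OnOpenFolder: TNotifyEvent): TPopupMenu;
var
  Item: TMenuItem;
begin
  // Step 1: create the menu; the list view owns it and will free it
  Result := TPopupMenu.Create(AListView);

  // Step 2: add the menu items with their OnClick event handlers
  Item := TMenuItem.Create(Result);
  Item.Caption := 'Open';
  Item.OnClick := OnOpen;
  Result.Items.Add(Item);

  Item := TMenuItem.Create(Result);
  Item.Caption := 'Open folder';
  Item.OnClick := OnOpenFolder;
  Result.Items.Add(Item);

  // Step 3: assign the menu to the control, so that a right-click or the
  // Apps key shows it automatically
  AListView.PopupMenu := Result;
end;

end.
```

A form could call it from its OnCreate handler, for example AttachContextMenu(ListView1, Open1Click, OpenFolder1Click). With the menu attached this way the VCL decides when and where to show it, which is exactly the control we give up, and the reason this article skips step 3).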
Drop a TPopupMenu component on the form and edit its Items property, adding two menu items with the following properties:

  1) Name    = 'Open1'          2) Name    = 'OpenFolder1'
     Caption = 'Open'              Caption = 'Open folder'
     OnClick = 'Open1Click'        OnClick = 'OpenFolder1Click'

Add the following code to the event handlers:

  procedure TForm1.Open1Click(Sender: TObject);
  begin
    // SubItems[0] holds the folder and Caption the file name.
    // ShellExecute (unit ShellAPI) returns a value greater than 32
    // on success.
    if ShellExecute(Self.Handle, nil,
      PChar(SelectedItem.SubItems.Strings[0] + SelectedItem.Caption),
      nil, nil, SW_SHOWMAXIMIZED) <= 32 then
    begin
      Application.MessageBox(cstrCouldNotExecApp, 'Error',
        MB_ICONEXCLAMATION);
    end; // if
  end;

  procedure TForm1.OpenFolder1Click(Sender: TObject);
  begin
    // The 'explore' verb opens the folder in Windows Explorer
    if ShellExecute(Self.Handle, 'explore',
      PChar(SelectedItem.SubItems.Strings[0]),
      nil, nil, SW_SHOWMAXIMIZED) <= 32 then
    begin
      Application.MessageBox(cstrCouldNotExecApp, 'Error',
        MB_ICONEXCLAMATION);
    end; // if
  end;

These methods open a file and a directory respectively, exactly as we have seen in past issues. The only difference is that we assume SelectedItem is a variable or property of type TListItem that references the item of the TListView object (ListView1) that was selected before calling up the menu.

So, the first thing to do is to declare SelectedItem. We have declared it as a private field of the form:

  type
    TForm1 = class(TForm)
      ...
    private
      { Private declarations }
      SelectedItem: TListItem;
      ...
Now we should capture the OnMouseDown and OnKeyDown events of ListView1 to set the value of SelectedItem and invoke the popup:

  procedure TForm1.ListView1MouseDown(Sender: TObject;
    Button: TMouseButton; Shift: TShiftState; X, Y: Integer);
  var
    Col: Integer;
  begin
    Last.X := X;
    Last.Y := Y;
    if Shift = [ssRight] then
    begin
      SelectedItem := TListViewX(ListView1).GetItemAtX(X, Y, Col);
      // Show the menu only if an item was hit, since the OnClick
      // handlers dereference SelectedItem
      if SelectedItem <> nil then
      begin
        // Column 0 (file name) makes "Open" the default item,
        // column 1 (folder) makes "Open folder" the default item
        if Col <= 1 then
          PopupMenu1.Items[Col].Default := True;
        PopupMenu1.Popup(
          Left + ListView1.Left + X + 10,
          Top + ListView1.Top + Y + 20);
      end;
    end;
  end;

  procedure TForm1.ListView1KeyDown(Sender: TObject; var Key: Word;
    Shift: TShiftState);
  begin
    // Apps key or Shift+F10
    if (Key = VK_APPS)
      or ((Shift = [ssShift]) and (Key = VK_F10)) then
    begin
      SelectedItem := ListView1.ItemFocused;
      if SelectedItem <> nil then
      begin
        PopupMenu1.Items[0].Default := True;
        PopupMenu1.Popup(
          Left + ListView1.Left + SelectedItem.Position.X + 20,
          Top + ListView1.Top + SelectedItem.Position.Y + 35);
      end;
    end;
  end;

The Popup method expects the coordinates of the menu. These coordinates are relative to the screen, so to the coordinates of the focused item or the mouse position (which are relative to the control) we add the coordinates of the form and of the control, plus a small offset.

And that's it! Now you can try it... As usual, the full source code is available at our site:

   http://www.latiumsoftware.com/download/p0003.zip
________________________________________________________________________

3. PORTING ISSUES: UTF-8 STRINGS

This article is aimed mainly at programmers who will be working in the Linux environment, and presents some of the differences that will exist in string processing between Windows and Linux.

String types in Delphi
======================

A string (as you probably know by now :-) is a sequence of characters. Delphi has three types of strings:

* Short strings

  Short strings are declared using the ShortString type.
  This string type comes from the old times of Turbo Pascal and is supported for backwards compatibility. A short string variable normally uses 256 bytes in total, although its length (stored in the first byte) can vary from 0 to 255. For example:

    var
      s: shortstring;
    begin
      s := 'Hello!';

  The string s takes 256 bytes. s[0] holds the length of the string, so in the example its value would be #6. Although a short string lets you access s[0] directly, you should use Length and SetLength instead. s[1] is the first character ('H'), s[2] is the second character ('e'), and so on. From s[7] to s[255] the values would be undefined.

* ANSI strings

  Usually called "long strings", ANSI strings are declared using the AnsiString type. ANSI strings are actually pointers to a data structure consisting of two integers (which hold the length of the string and the reference count) followed by the sequence of bytes allocated for the string, which can range from 1 byte to almost 2 GB (provided you have enough memory). For example:

    var
      s: ansistring;
    begin
      s := 'Hello!';

  The variable s itself takes 4 bytes (a 32-bit pointer). The data structure it points to takes 8 bytes for the two integers and in this case 6 bytes for the 6 characters, giving 14 bytes in total (15 if you count the terminating null character that Delphi appends so the string can be cast to a PChar). As with the short string, s[1] is the first character ('H'), and so on.

* Wide strings

  Wide strings, also called UNICODE strings, are special strings where each character (of type WideChar) takes two bytes (a word). In the UNICODE character set, the first 256 values correspond to the ISO 8859-1 (Latin-1) character set, which matches the Windows ANSI character set of Western locales except in the range #128..#159. Wide strings are pointers, like ANSI strings, but they are not reference counted, so when you make an assignment between two wide-string variables the string is actually copied (in the case of ANSI strings only the reference count is incremented). This makes them inefficient in comparison, but the COM and OLE APIs use this type of string, and so do ActiveX objects.
  For example:

    var
      s: widestring;
    begin
      s := 'Hello!';

  Here, the variable s takes 4 bytes for the pointer, and the data structure takes 4 bytes for the length and 12 bytes for the 6 characters (2 bytes each), giving 16 bytes in total. s[1] is the first character ('H'), except that it is of type WideChar instead of AnsiChar and takes two bytes instead of one. s[2] is the second character ('e') and starts at the third byte (the first two bytes are for s[1]).

The type String is mapped by default to AnsiString. Char is mapped to AnsiChar, and PChar is mapped to PAnsiChar.

MultiByte Character Strings in Windows (MBCS)
=============================================

When working with ANSI strings, we normally assume that each character occupies one byte, which is true for Western European languages, but for most Asian languages 256 characters are simply not enough. One possible solution is using wide strings, and another is encoding some characters in one byte and others in two (DBCS: Double-Byte Character Strings). For this to work, there must be a way to know whether a byte in a string is a character by itself or the "lead byte" of a two-byte character.

Delphi defines a character set named LeadBytes that contains the characters that are lead bytes in the current Windows locale. For Western locales this set is empty (there are no lead bytes since bytes and characters are equivalent). For other locales, in general, if the value of a byte ranges from 0 to 127 it is an ASCII character, and if it is greater than 127 it is a lead byte and the byte that follows it is called the "trail byte" (which may range from 0 to 255).

For reasons of efficiency and backwards compatibility, Delphi comes with different versions of the string functions for SBCS (Single-Byte Character Strings) and DBCS.
For SBCS (one byte = one character) there is no point in going through the overhead of checking whether each byte is a character or a lead byte (since there are no lead bytes), so for SBCS you can use the standard functions like Pos, LowerCase, etc., while for DBCS you should use functions like AnsiPos, AnsiLowerCase, etc., which take into account that some characters may be represented by more than one byte (and thus these functions are slower).

Length of an ANSI string
========================

Indexing a DBCS can be tricky, since s[i] represents the i-th byte, which is not necessarily the i-th character because previous characters may have taken two bytes.

The number of bytes in a string returned by the Length function may or may not be the actual number of characters contained in a DBCS. To determine this number you can use a function like the following:

  function AnsiLength(const s: string): integer;
  var
    i, n: integer;
  begin
    Result := 0;
    n := Length(s);
    i := 1;
    while i <= n do
    begin
      inc(Result);
      if s[i] in LeadBytes then
        inc(i);  // lead byte: the character takes one extra byte
      inc(i);
    end;
  end;

Introduction to UTF-8 (UCS Transformation Format)
=================================================

Windows can work with Unicode strings, as well as SBCS and DBCS, but the Linux kernel works with UTF-8 strings, where one character may take up to six bytes (normally one or two bytes in Western languages and from one to three in Asian languages).

UTF-8 is a multibyte character encoding that can accommodate all the characters of the UCS (Universal Character Set), which contains 31-bit characters that can represent practically all the characters of known languages, living and dead, as well as scripts like Hiragana, Katakana, etc. It also leaves space for more languages, scripts and hieroglyphics, so in the future we can expect to be able to read Klingon poetry, the Ferengi Acquisition Rules and Bajoran prophecies in their original versions... :-)

UTF-8 has these important features:

* Variable-length encoding for UCS characters

  UTF-8 can encode UCS (ISO 10646) characters in up to 6 bytes.

* Transparency and uniqueness for ASCII characters

  7-bit ASCII characters (#0..#127) are encoded as plain 7-bit ASCII (1 byte per character). All non-ASCII characters (above #127) are represented purely with non-ASCII 8-bit values (#128..#255), so non-ASCII characters cannot be mistaken for ASCII characters, and ASCII-based text processing tools can be used on UTF-8 text as long as they pass 8-bit characters through without interpretation.

* Null character

  Character #0 (ASCII NUL) only appears where a NUL is intended. It can't be a trail byte, for instance.

* Self-synchronization for fast processing

  The high-bit patterns disambiguate character boundaries and make it easy to know whether a byte is a single-byte character (0xxxxxxx), a lead byte (11yyyyyx) or a fill byte (10xxxxxx). This feature is very important because it allows UTF-8 string processing functions to be far more efficient than their Windows DBCS counterparts. For example, a UTF-8 string can be parsed backwards, and a search for a multibyte character (which begins with a lead byte) will never match starting at a fill byte in the middle of an unwanted multibyte character. And since the lead byte announces the length of the multibyte character, you can quickly tell how many bytes to skip for fast forward parsing.

* Processor-friendliness

  UTF-8 can be read and written quickly with simple bitmask and bitshift operations, without any multiplication or division (which are slow CPU operations).

* Reasonable compression

  UTF-8 is not as compact as Windows DBCS, but for Western languages it is more compact than two-byte Unicode, and in the worst case (Eastern languages) it is no worse than UCS-4.

* Canonical sort-order

  UTF-8 preserves the sort ordering for plain 8-bit comparison routines like strcmp (a C standard function).
* Flag characters

  The octets #$FE and #$FF never appear, so you can use them as flags to signal a special meaning (avoiding the possibility of mistaking a flag for a real character).

* Detectability

  It's easy to detect UTF-8 input with high probability if you see the UTF-8 signature #$EF#$BB#$BF ('ï»¿' when viewed as Latin-1) or if you see valid UTF-8 multibyte characters, since it is very unlikely that they appear by accident in ISO 8859-1 (Latin-1) text.

UTF-8 encoding
==============

This is the general format used to encode UCS characters in UTF-8:

  Bits  Bytes  Representation
    7     1    0xxxxxxx
   11     2    110xxxxx 10xxxxxx
   16     3    1110xxxx 10xxxxxx 10xxxxxx
   21     4    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   26     5    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
   31     6    1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

Notice that the number of leading 1 bits in the lead byte is the number of bytes in a multibyte sequence.

The copyright sign ('©' = #169 = #$A9) in binary would be 10101001, and since it needs 8 bits, we have to use two bytes:

  110xxxxx 10xxxxxx

We have to fill 11 bits (x), so we add three zeroes to the left of 10101001 and split the result into groups of 5 and 6 bits:

     00010   101001

The UTF-8 representation of the copyright character would then be:

  11000010 10101001

It could also be represented with more bytes than needed, in what is called an overlong sequence. For example, with four bytes it would be:

  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
       000   000000   000010   101001
  -----------------------------------
  11110000 10000000 10000010 10101001

Overlong sequences are usually used to "camouflage" characters in order to cheat UTF-8 substring tests. For example, if you look for the copyright sign exactly as 11000010 10101001 (the shortest possible encoding), you won't find it in a string where it was encoded with an overlong sequence.

Length of a UTF-8 string
========================

In Delphi for Linux, long strings will be in UTF-8 format, while wide strings will remain as two-byte Unicode, although they will be reference counted.
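Before counting characters, it may help to see the encoding table of the previous section expressed in code. The program below is only a sketch: the function name EncodeUTF8 is made up for illustration, and the 5- and 6-byte forms follow the table in this article (the later RFC 3629 restricts UTF-8 to sequences of at most four bytes). It always produces the shortest encoding, so it never generates overlong sequences.

```pascal
program Utf8EncodeSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of the newsletter's source code): encodes
// one UCS value as the shortest UTF-8 sequence, following the bit layout
// of the table in the "UTF-8 encoding" section.
function EncodeUTF8(CodePoint: Cardinal): AnsiString;
var
  Count, i: Integer;
  Lead: Byte;
begin
  // 7 bits or fewer: plain ASCII, one byte
  if CodePoint < $80 then
  begin
    Result := AnsiChar(Byte(CodePoint));
    Exit;
  end;

  // Choose the sequence length and the lead-byte prefix
  if CodePoint < $800 then
    begin Count := 2; Lead := $C0; end        // 110xxxxx, 11 bits
  else if CodePoint < $10000 then
    begin Count := 3; Lead := $E0; end        // 1110xxxx, 16 bits
  else if CodePoint < $200000 then
    begin Count := 4; Lead := $F0; end        // 11110xxx, 21 bits
  else if CodePoint < $4000000 then
    begin Count := 5; Lead := $F8; end        // 111110xx, 26 bits
  else if CodePoint < $80000000 then
    begin Count := 6; Lead := $FC; end        // 1111110x, 31 bits
  else
    raise Exception.Create('Not a 31-bit UCS value!');

  SetLength(Result, Count);
  // Fill bytes (10xxxxxx), from the last one to the second one,
  // taking 6 bits of the value each time
  for i := Count downto 2 do
  begin
    Result[i] := AnsiChar(Byte($80 or (CodePoint and $3F)));
    CodePoint := CodePoint shr 6;
  end;
  // The remaining bits go into the lead byte
  Result[1] := AnsiChar(Byte(Lead or CodePoint));
end;

var
  s: AnsiString;
begin
  // The copyright sign (#$A9) worked out by hand in the text
  s := EncodeUTF8($A9);
  WriteLn(IntToHex(Ord(s[1]), 2), ' ', IntToHex(Ord(s[2]), 2)); // prints C2 A9
end.
```

The two bytes printed, C2 A9, are the 11000010 10101001 obtained by hand for the copyright sign.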
To know the number of characters stored in a UTF-8 string we could use a function like the following:

  function UTF8Length(const s: string): integer;
  var
    i, n: integer;
    c: byte;
  begin
    Result := 0;
    n := Length(s);
    i := 1;
    while i <= n do
    begin
      inc(Result);
      c := byte(s[i]);
      // The lead byte tells how many bytes the character takes
      if (c and $80) = 0 then
        inc(i)
      else if (c and $E0) = $C0 then
        inc(i, 2)
      else if (c and $F0) = $E0 then
        inc(i, 3)
      else if (c and $F8) = $F0 then
        inc(i, 4)
      else if (c and $FC) = $F8 then
        inc(i, 5)
      else if (c and $FE) = $FC then
        inc(i, 6)
      else
        raise Exception.Create('Not an UTF-8 string!');
    end;
    // The last character announced more bytes than the string has
    if i > n + 1 then
      raise Exception.Create('Not an UTF-8 string!');
  end;

Of course this function should be written using pointers and a bit of assembler to improve its performance, but let's leave that for the pros... :)
________________________________________________________________________

4. LINKS

* Torry's Delphi Pages
  http://www.torry.ru

* Delphi Programming Source Code
  http://ssapcs.hispeed.com/index.html

* Swiss Delphi Center
  http://www.swissdelphicenter.ch

* Delphi Downloads Web Page
  http://members.xoom.com/sandbrook/downloads/Download.htm

* AlphaCom, Inc.
  http://alphacom.hypermart.net

* Top Delphi Sites
  http://ssapcs.hispeed.com/topsites/index.html

* Advanced Delphi Developer's Guide to ADO
  http://d5ado.homepage.com

* Central Iowa Delphi Users Group
  http://www.bigcreek.com/delphi

* The Delphi Cafe
  http://www.geocities.com/ResearchTriangle/6201

* Natalia Elmanova
  http://www.geocities.com/SiliconValley/way/9281
________________________________________________________________________

YOU CAN HELP US

We need your help to keep this newsletter going and growing.
You can help by referring the newsletter to your colleagues:

   http://www.latiumsoftware.com/en/pascal/delphi-newsletter.php

Or you can help by voting for us in some or all of these rankings to give more visibility to our web site and thus increase the number of subscriptions to this newsletter:

   http://www.sandbrooksoftware.com/cgi-bin/TopSite2/rankem.cgi?id=latium
   http://news.optimax.com/delphi/links/links.exe/click?id=70C517ECAE6E
   http://www.programmingpages.com/?r=latiumsoftwarecomenpascal
   http://www.top219.org/cgi-bin/vote.cgi?delphi&83
   http://top100borland.com/in.php?who=20
   http://top200.jazarsoft.com/delphi/rank.php3?id=latium
   http://213.65.224.200/cgi-bin/toplist.cgi/hits?Id=80

It's just a few seconds for you that REALLY mean a lot to us.
________________________________________________________________________

If you haven't received the full source code examples for this issue, you can get them from

   http://www.latiumsoftware.com/download/p0003.zip
________________________________________________________________________

This newsletter is provided "AS IS" without warranty of any kind. Its use implies the acceptance of our licensing terms and disclaimer of warranty you can read at http://www.latiumsoftware.com/en/legal.php where you will also find a note about legal trademarks.

Articles are copyright of their respective authors and they are reproduced here with their permission.

You can redistribute this newsletter as long as you do it in full (including copyright notices), without changes, and gratis.
________________________________________________________________________

Main page:          http://www.latiumsoftware.com/en/pascal/delphi-newsletter.php
Group home page:    http://groups.yahoo.com/group/pascal-newsletter/
Subscribe/join:     pascal-newsletter-subscribe@yahoogroups.com
Unsubscribe/leave:  pascal-newsletter-unsubscribe@yahoogroups.com

Problems with your subscription? eds2004 @ latiumsoftware.com
________________________________________________________________________

Latium Software
http://www.latiumsoftware.com/en/index.php

Copyright (c) 2000 by Ernesto De Spirito. All rights reserved.
________________________________________________________________________






