Determining the actual length of an ANSI string (MBCS)
Copyright © 2000 Ernesto De Spirito
Introduction
The Length function returns the length of a string, but it
behaves differently according to the type of the string. For the
old short strings (ShortString) and for long strings
(AnsiString), Length returns the number of bytes
they take, while for wide (Unicode) strings (WideString)
it returns the number of wide characters (WideChar), that is,
the number of bytes divided by two.
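As a minimal sketch of the behavior just described, the following console program prints what Length reports for each string type. The program and variable names are illustrative only, and plain ASCII text is used so that the result does not depend on the system code page.

```pascal
program LengthDemo;
{$APPTYPE CONSOLE}

var
  sh: ShortString;
  a: AnsiString;
  w: WideString;
begin
  sh := 'Delphi';
  a := 'Delphi';
  w := 'Delphi';
  WriteLn(Length(sh));                    // 6 (bytes)
  WriteLn(Length(a));                     // 6 (bytes)
  WriteLn(Length(w));                     // 6 (wide characters)
  WriteLn(Length(w) * SizeOf(WideChar));  // 12 (bytes taken by the text)
end.
```

For ASCII text the three lengths coincide; only the wide string takes twice as many bytes as Length reports.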
With short and long strings, a character takes one byte in Western
languages, while in some Asian languages, for example, some characters
take one byte and others take two. For this reason there are two
versions of almost all string functions: a fast one that only works
correctly with single-byte character strings (SBCS), and a slower one
that also handles strings in which a character can take one or two
bytes (DBCS), as needed by applications distributed internationally.
Thus we have Pos, LowerCase and UpperCase on one side, and AnsiPos,
AnsiLowerCase and AnsiUpperCase on the other. Curiously, there is no
AnsiLength function that returns the number of characters in a DBCS
string.
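To illustrate why the two families of functions can give different answers, here is a hedged sketch. It assumes a Delphi version in which string is AnsiString (as when this article was written) and a system whose ANSI code page is Shift-JIS (932), where the bytes $83 $5C encode a single character whose second byte coincides with the ASCII code of the backslash. The program name is illustrative.

```pascal
program PosDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;  // AnsiPos

var
  s: string;  // assumed to be AnsiString (Delphi versions before 2009)
begin
  // 'A', then the two bytes $83 $5C (one character in Shift-JIS), then 'B'
  s := 'A' + #$83#$5C + 'B';

  // Pos compares bytes: it finds $5C at byte position 3,
  // which is in the middle of a double-byte character.
  WriteLn(Pos('\', s));

  // AnsiPos takes lead and trail bytes into account, so its result
  // depends on the system code page.
  WriteLn(AnsiPos('\', s));
end.
```

Under these assumptions Pos reports 3, while on a Shift-JIS system AnsiPos should reject the match and return 0. On a single-byte code page no byte is a lead byte, so both calls should return 3.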
AnsiLength (Draft)
Here is a function that returns the number of characters in a double-byte character string:
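```pascal
uses SysUtils;  // declares LeadBytes

function AnsiLength(const s: string): Integer;
var
  i, n: Integer;
begin
  Result := 0;
  n := Length(s);  // length in bytes
  i := 1;
  while i <= n do
  begin
    Inc(Result);  // one more character found
    // A lead byte is followed by a trail byte: skip both.
    if s[i] in LeadBytes then
      Inc(i);
    Inc(i);
  end;
end;
```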
AnsiLength (Final)
Naturally, this function is not optimized. We are not going to mess with assembler, but at least we can use pointers:
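```pascal
uses SysUtils;  // declares LeadBytes

function AnsiLength(const s: string): Integer;
var
  p, q: PChar;
begin
  Result := 0;
  p := PChar(s);       // first byte of the string
  q := p + Length(s);  // one past the last byte
  while p < q do
  begin
    Inc(Result);  // one more character found
    if p^ in LeadBytes then
      Inc(p, 2)   // double-byte character: skip lead and trail byte
    else
      Inc(p);     // single-byte character
  end;
end;
```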
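Because LeadBytes depends on the system code page, it is empty on a single-byte system and AnsiLength then simply equals Length. As a sketch for checking the logic on any machine, the variant below takes the lead-byte set as a parameter. The names MbcsLength, TLeadByteSet and SjisLeadBytes are hypothetical, and the Shift-JIS lead-byte ranges $81..$9F and $E0..$FC are supplied here as an assumption for the demonstration; none of this is part of the original function.

```pascal
program MbcsLengthDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TLeadByteSet = set of AnsiChar;

const
  // Assumed lead-byte ranges for Shift-JIS (code page 932)
  SjisLeadBytes: TLeadByteSet = [#$81..#$9F, #$E0..#$FC];

// Same pointer loop as AnsiLength, with an explicit lead-byte set.
function MbcsLength(const s: AnsiString; const Leads: TLeadByteSet): Integer;
var
  p, q: PAnsiChar;
begin
  Result := 0;
  p := PAnsiChar(s);
  q := p + Length(s);
  while p < q do
  begin
    Inc(Result);
    if p^ in Leads then
      Inc(p, 2)
    else
      Inc(p);
  end;
end;

var
  s: AnsiString;
begin
  // Build the bytes 'A', $83, $5C, 'B' explicitly, so that the result
  // does not depend on how the compiler encodes string literals.
  SetLength(s, 4);
  s[1] := 'A';
  s[2] := AnsiChar($83);
  s[3] := AnsiChar($5C);
  s[4] := 'B';

  WriteLn(Length(s));                     // 4 bytes
  WriteLn(MbcsLength(s, SjisLeadBytes));  // 3 characters
end.
```

Passing the set explicitly is only a testing convenience; in an application the LeadBytes variable from SysUtils already reflects the code page of the system.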