#137965 - 2005-04-14 11:04 AM
Whitespace characters
Richard H.
Administrator
Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
Quote:
btw - is there a formal definition (somewhere) on what constitutes "white-space", or have you captured them all.
The de facto definition of a whitespace character is an ASCII character whose purpose is to move the cursor/printhead without producing any visible output, hence the name.
Chr(0) is NUL, which does nothing, so it is not whitespace. Anything above Chr(127) is not ASCII (which is a 7-bit code), is not intended to cause non-printing movement, and is not whitespace.
This leaves you with the following list:
- Horizontal Tab - HT - Chr(9)
- Line Feed (New Line) - LF/NL - Chr(10)
- Vertical Tab - VT - Chr(11)
- Form Feed - FF - Chr(12)
- Carriage Return - CR - Chr(13)
- Space - SP - Chr(32)
All other characters below Chr(33) are control characters whose action is not intended to move the printhead/cursor, so they are not whitespace.
You might think backspace (BS - Chr(8)) is whitespace, but it is primarily a "cancel last character" action rather than a true whitespace character.
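For comparison, the same definition can be sketched in Python. The helper name `is_ascii_whitespace` is hypothetical and simply tests membership in the six codes listed above. Python's standard `string.whitespace` constant happens to contain exactly these six characters, which is also the set C's `isspace()` accepts in the default "C" locale.

```python
import string

# The six ASCII whitespace codes from the list above: HT, LF/NL, VT, FF, CR, SP
ASCII_WHITESPACE_CODES = {9, 10, 11, 12, 13, 32}

def is_ascii_whitespace(ch: str) -> bool:
    """Hypothetical helper: True only for the six ASCII whitespace characters."""
    return len(ch) == 1 and ord(ch) in ASCII_WHITESPACE_CODES

# string.whitespace is ' \t\n\r\x0b\x0c', i.e. the same six characters
assert {ord(c) for c in string.whitespace} == ASCII_WHITESPACE_CODES

# NUL (0), backspace (8), DEL (127) and 255 are not whitespace under this definition
for code in (0, 8, 127, 255):
    assert not is_ascii_whitespace(chr(code))
```

Python's `str.isspace()` is broader than this 7-bit definition: for Unicode strings it also returns True for characters such as U+00A0 (no-break space), so it is not a drop-in replacement.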
#137966 - 2005-04-15 12:44 AM
Re: Whitespace characters
Allen
KiX Supporter
Registered: 2003-04-19
Posts: 4545
Loc: USA
Richard, I'm not sure what you consider Chr(255), but I've always thought of it as the spacebar's little hidden brother. I have often used it in passwords, and have even seen it in the output of programs from time to time. For those who don't know, you can enter this character, or any other Chr(n), by holding down Alt and typing the character's number on the numeric keypad. This works in most programs.
Anyway, I was just playing around and found something interesting.
Code:
$inflated="hello this is a test"
? join(split($inflated," "),"")
? join(split($inflated,chr(255)),"")
Provided the board software does not strip it, the first space after "hello" is actually the Alt+255 character. When I split and join using that literal character, it is removed correctly. However, the second split/join, which uses Chr(255), does not remove it.
Any ideas why?
#137970 - 2005-04-15 09:50 AM
Re: Whitespace characters
Richard H.
Administrator
Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
Quote:
Richard, I'm not sure what you consider Chr(255) but I've always thought of it as the spacebar's little hidden brother
Don't forget that Shawn was asking for a definition of "whitespace", not a definition of "all characters you might want to remove from strings".
To answer your question, Chr(255) is an unprintable (or invalid) character. What happens when your rendering device (printer, VDU, Windows graphics renderer) encounters it is undefined: it might ignore it, draw a special character, or explode. The same applies to the other "extended ASCII" characters.
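One plausible explanation for the Chr(255) puzzle, sketched in Python: the same byte means different things in different 8-bit code pages. This assumes that Alt+255 typed on the keypad (without a leading 0) is interpreted in the OEM code page 437, and that the script file and KiXtart's Chr() use the Windows ANSI code page 1252. Neither is stated in the thread, so treat it as an assumption about a typical US/Western European setup.

```python
import unicodedata

raw = bytes([255])  # the single byte 0xFF

oem = raw.decode("cp437")    # OEM code page 437 (assumed for Alt+nnn keypad entry)
ansi = raw.decode("cp1252")  # Windows ANSI code page 1252 (assumed for the script)

print(unicodedata.name(oem))   # NO-BREAK SPACE
print(unicodedata.name(ansi))  # LATIN SMALL LETTER Y WITH DIAERESIS

# Saved as cp1252 text, the no-break space becomes byte 160, not 255
print(oem.encode("cp1252")[0])  # 160

# Hence splitting on chr(255) finds nothing to split on
text = "hello" + oem + "this is a test"
print(text.split(chr(255)) == [text])  # True
```

If that assumption holds, the character in the script is byte 160, so splitting on Chr(255) has nothing to match, while splitting on the literal character works.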
The 7-bit characters and their effects are defined. The "S" in the ASCII acronym is "Standard", and this is where the whitespace definition comes from.
BTW, for those who are wondering, the eighth bit was used for parity in the bad old days of unreliable serial interfaces.
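A minimal Python sketch of that scheme, assuming even parity (links could equally be configured for odd parity): the eighth bit is set so that the total number of 1 bits in the byte is even. The function names are illustrative only.

```python
def add_even_parity(code: int) -> int:
    """Return an 8-bit byte: the 7-bit ASCII code plus an even-parity bit in bit 7."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2  # 1 if the 7 data bits hold an odd number of 1s
    return code | (parity << 7)

def parity_ok(byte: int) -> bool:
    """A received byte passes an even-parity check if it has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0

print(add_even_parity(ord("A")))  # 'A' = 1000001, two 1s, parity bit 0 -> 65
print(add_even_parity(ord("C")))  # 'C' = 1000011, three 1s, parity bit 1 -> 195
```

A single flipped bit makes the count odd and `parity_ok()` fails, although two flipped bits would go unnoticed.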
Perhaps you need two UDFs: ASCIIPrintable() to get rid of the extended characters and the non-whitespace control characters, and Deflate() (or whatever you want to call it) to get rid of the ASCII whitespace.
Here is an ASCIIPrintable():
Code:
$s="foo"+Chr(5)+@CRLF+Chr(140)+"bar"+@CRLF
"BEFORE:" ? $s
"AFTER:" ? ASCIIPrintable($s) ?

Function ASCIIPrintable($s)
    Dim $iChar,$aiWhitespace
    $aiWhitespace=Split("9 10 11 12 13 32")
    $ASCIIPrintable=""  ; Return value should be a string
    While $s<>""
        $iChar=Asc($s)
        If 128 & $iChar
            ; Ignore extended ASCII (high bit set)
        Else
            If ($iChar<33 Or $iChar=127) And NOT(1+AScan($aiWhitespace,$iChar))
                ; Ignore ASCII control characters (including DEL) which are not whitespace
            Else
                $ASCIIPrintable=$ASCIIPrintable+Chr($iChar)
            EndIf
        EndIf
        $s=SubStr($s,2)
    Loop
    Exit 0
EndFunction
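For anyone doing the same outside KiXtart, here is a rough Python equivalent. The name `ascii_printable` is hypothetical. It keeps printable ASCII (codes 33 to 126) plus the six whitespace codes and drops everything else, including DEL (127) and all extended characters.

```python
# The six ASCII whitespace codes: HT, LF, VT, FF, CR, SP
WHITESPACE = {9, 10, 11, 12, 13, 32}

def ascii_printable(s: str) -> str:
    """Keep printable ASCII (33-126) and ASCII whitespace; drop everything else."""
    return "".join(
        ch for ch in s
        if 33 <= ord(ch) <= 126 or ord(ch) in WHITESPACE
    )

s = "foo" + chr(5) + "\r\n" + chr(140) + "bar" + "\r\n"
print(repr(ascii_printable(s)))  # 'foo\r\nbar\r\n'
```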
...and here is a Deflate() which, by default, replaces each run of whitespace with a single space character or, when the delete flag is set, removes all whitespace. Leading and trailing whitespace is dropped in both modes:
Code:
$Space=Chr(32)+Chr(9)+Chr(9)+Chr(32)
$s=$Space+"foo"+$Space+"foo"+$Space+@CRLF+$Space+"bar bar "
"BEFORE='"+$s+"'"+@CRLF
"DEFLATED='"+Deflate($s)+"'"+@CRLF
"DEFLATED WITH DELETE FLAG='"+Deflate($s,1)+"'"+@CRLF

Function Deflate($s, Optional $bDelete)
    Dim $iChar,$aiWhitespace,$sSpace
    $Deflate=""
    $aiWhitespace=Split("9 10 11 12 13 32")
    While $s<>""
        $iChar=Asc($s)
        If (1+AScan($aiWhitespace,$iChar))
            ; Character is whitespace: remember a pending separator
            ; (none at the start of the output, or when deleting)
            If (CInt($bDelete) OR $Deflate="")
                $sSpace=""
            Else
                $sSpace=" "
            EndIf
        Else
            ; Emit any pending separator, then the character itself
            $Deflate=$Deflate+$sSpace+Chr($iChar)
            $sSpace=""
        EndIf
        $s=SubStr($s,2)
    Loop
    Exit 0
EndFunction
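And a rough Python counterpart of Deflate(), again with a hypothetical name, and a `delete` parameter standing in for `$bDelete`. It mirrors the UDF's approach of remembering a pending separator and only emitting it before the next non-whitespace character, which is why leading and trailing whitespace never survive.

```python
# The six ASCII whitespace codes: HT, LF, VT, FF, CR, SP
WHITESPACE = {9, 10, 11, 12, 13, 32}

def deflate(s: str, delete: bool = False) -> str:
    """Collapse each run of ASCII whitespace to one space, or remove it if delete=True."""
    out = []
    sep = ""  # separator waiting to be emitted before the next non-whitespace char
    for ch in s:
        if ord(ch) in WHITESPACE:
            # Leading whitespace, or delete mode: nothing is pending for this run
            sep = "" if (delete or not out) else " "
        else:
            out.append(sep + ch)
            sep = ""
    # A separator still pending here belongs to trailing whitespace, so it is dropped
    return "".join(out)

space = " \t\t "
s = space + "foo" + space + "foo" + space + "\r\n" + space + "bar bar "
print(repr(deflate(s)))               # 'foo foo bar bar'
print(repr(deflate(s, delete=True)))  # 'foofoobarbar'
```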