Richard H.
Administrator
(KiX Supporter)
2005-04-14 11:04 AM
Whitespace characters

Quote:

btw - is there a formal definition (somewhere) on what constitutes "white-space", or have you captured them all?




The de facto definition of a whitespace character is an ASCII control character whose purpose is to move a cursor / printhead while producing no visible output, hence the name.

Chr(0) is NUL, which does nothing and so is not whitespace. Anything above Chr(127) is not ASCII (which is a 7-bit code), is not intended to cause non-printing movement, and so is not whitespace.

This leaves you with the following list:
  • Horizontal Tab - HT - Chr(9)
  • Line Feed (New Line) - LF - Chr(10)
  • Vertical Tab - VT - Chr(11)
  • Form Feed - FF - Chr(12)
  • Carriage Return - CR - Chr(13)
  • Space - SP - Chr(32)


All other characters <33, plus DEL (Chr(127)), are control characters whose action is not intended to move the printhead/cursor, so they are not whitespace.

You might think backspace (BS - Chr(8)) is whitespace, but it is primarily a "cancel last character" action rather than a true whitespace character.
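
The definition above can be sketched as a small test function. This is only a minimal sketch: the name IsWhitespace is made up for illustration and is not an existing UDF, and it looks only at the first character of whatever it is given.

```kixtart
; IsWhitespace is a hypothetical name, not an existing UDF.
; Returns 1 if the first character of $sChar is one of the six
; ASCII whitespace characters listed above, otherwise 0.
Function IsWhitespace($sChar)
    Dim $iCode
    $IsWhitespace=0
    If Len($sChar)>0
        $iCode=Asc($sChar)
        ; 9 to 13 covers HT, LF, VT, FF and CR; 32 is the space character
        If (($iCode>=9) And ($iCode<=13)) Or ($iCode=32)
            $IsWhitespace=1
        EndIf
    EndIf
EndFunction

; Usage: a tab, a space, a backspace and a letter
"HT:    " + IsWhitespace(Chr(9)) ?
"Space: " + IsWhitespace(Chr(32)) ?
"BS:    " + IsWhitespace(Chr(8)) ?
"A:     " + IsWhitespace("A") ?
```

A range check is used here instead of a lookup table because five of the six codes happen to be contiguous; a table scanned with AScan() would be the more flexible choice if the list ever needed to change.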


Allen
Administrator
(KiX Supporter)
2005-04-15 12:44 AM
Re: Whitespace characters

Richard, I'm not sure what you consider Chr(255) to be, but I've always thought of it as the spacebar's little hidden brother. I often used it for passwords, and even found it used in the output of programs from time to time. For those of you who don't know, you can enter this character or any other Chr(?) by holding Alt and typing the number of the character on the number pad. This works in most programs.

Anyway, I was just playing around and found something interesting.

Code:
 
$inflated="hello this is a test"
? join(split($inflated," "),"")
? join(split($inflated,chr(255)),"")



Provided the board software does not remove it, the first space after hello is actually chr(255). Doing the joinsplitting on the string with the actual 255 space, it replaces the character properly. However, the second joinsplit does not work when using CHR(255).

Any ideas why?


NTDOC
Administrator
(KiX Master)
2005-04-15 01:30 AM
Re: Whitespace characters

Not sure why it doesn't work, Al, but I use it in my posts all the time.

The board won't allow a blank line with just CRLF or a space in code posted between the code tags, so I put in ALT-255, which the board then recognizes and allows to be a blank line.
 


Allen
Administrator
(KiX Supporter)
2005-04-15 02:08 AM
Re: Whitespace characters

This appears to be a limitation in KiXtart or, as Richard mentioned above, has something to do with characters above 127. KiXtart cannot split on Chr(###) of anything above 127, but does work with its actual text counterpart.
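
One way to test this is to check which code the typed character really has. A possible explanation, offered as an assumption and not a confirmed diagnosis: Alt+255 typed without a leading zero is looked up in the OEM code page, where 255 is a non-breaking space, and Windows stores that character as ANSI code 160. In that case the character in the string would never match Chr(255). The sketch below just prints the code KiXtart reports for every character, so the value can be read off and tried in Chr().

```kixtart
; Paste the original string (with the Alt-255 character in it) between the quotes.
; The board software may have replaced that character in the copy shown here.
$inflated="hello this is a test"

; Print the position and reported code of every character
For $i=1 To Len($inflated)
    "Position " + $i + " = " + Asc(SubStr($inflated,$i,1)) ?
Next
```

If position 6 reports something other than 255, splitting on Chr() of the reported value would be the next thing to try.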

Les
(KiX Master)
2005-04-15 02:20 AM
Re: Whitespace characters

Ja, I use 255 in place of &nbsp; when posting on the board to get double spaces in. Before I got a laptop, I used it in passwords too. When people look over your shoulder trying to see what password you enter, they concentrate on what you are hammering on the numberpad and don't notice the alt key the other hand is covering.

Richard H.
Administrator
(KiX Supporter)
2005-04-15 09:50 AM
Re: Whitespace characters

Quote:

Richard, I'm not sure what you consider Chr(255) but I've always thought of it as the spacebar's little hidden brother




Don't forget that Shawn was asking for a definition of "whitespace", not a definition of "all characters you might want to remove from strings".

To answer your question, Chr(255) is an unprintable (or invalid) character. What happens when your rendering device (printer, VDU, Windows graphic renderer) encounters it is undefined. It might ignore it, draw a special character or explode. The same applies to the other "extended ASCII" characters.

The 7-bit characters and their effects are defined. The "S" in the ASCII acronym is "Standard", and this is where the whitespace definition comes from.

BTW for those who are wondering, the eighth bit was used for parity in the bad old days of unreliable serial interfaces.

Perhaps you need two UDFs, ASCIIPrintable() to get rid of the extended characters and non-printing control characters, and Deflate() or whatever to get rid of ASCII printable whitespace.

Here is an ASCIIPrintable():
Code:
$s="foo"+Chr(5)+@CRLF+Chr(140)+"bar"+@CRLF
"BEFORE:" ?
$s
"AFTER:" ?
ASCIIPrintable($s) ?

Function ASCIIPrintable($s)
    Dim $iChar,$aiWhiteSpace
    $aiWhiteSpace=Split("9 10 11 12 13 32") ; HT LF VT FF CR Space
    $ASCIIPrintable="" ; Return value should be a string
    While $s<>""
        $iChar=Asc($s) ; Code of the first remaining character
        If (128 & $iChar) Or ($iChar=127)
            ; Ignore Extended ASCII and DEL, which is also a control character
        Else
            If $iChar<33 And Not(1+AScan($aiWhiteSpace,$iChar))
                ; Ignore ASCII control characters which are not whitespace
            Else
                $ASCIIPrintable=$ASCIIPrintable+Chr($iChar)
            EndIf
        EndIf
        $s=SubStr($s,2) ; Drop the character just examined
    Loop
    Exit 0
EndFunction



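...and here is a Deflate() which will either remove all whitespace (when the optional delete flag is set) or replace each run of whitespace with a single space character (default). In both modes leading and trailing whitespace is dropped: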

Code:
$Space=Chr(32)+Chr(9)+Chr(9)+Chr(32)
$s=$Space+"foo"+$Space+"foo"+$Space+@CRLF+$Space+"bar bar "

"BEFORE='"+$s+"'"+@CRLF
"DEFLATED='"+Deflate($s)+"'"+@CRLF
"DEFLATED WITH DELETE FLAG='"+Deflate($s,1)+"'"+@CRLF

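Function Deflate($s, Optional $bDelete)
    Dim $iChar,$aiWhiteSpace,$sSpace
    $Deflate=""
    $aiWhiteSpace=Split("9 10 11 12 13 32") ; HT LF VT FF CR Space
    While $s<>""
        $iChar=Asc($s) ; Code of the first remaining character
        If (1+AScan($aiWhiteSpace,$iChar))
            ; Character is whitespace: remember at most one pending space,
            ; and none at all in delete mode or before the first word
            If (CInt($bDelete) Or $Deflate="")
                $sSpace=""
            Else
                $sSpace=" "
            EndIf
        Else
            ; Emit any pending space, then the character itself
            $Deflate=$Deflate+$sSpace+Chr($iChar)
            $sSpace=""
        EndIf
        $s=SubStr($s,2) ; Drop the character just examined
    Loop
    Exit 0
EndFunction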