#137965 - 2005-04-14 11:04 AM Whitespace characters
Richard H. Administrator Offline
Administrator
*****

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
Quote:

btw - is there a formal definition (somewhere) on what constitutes "white-space", or have you captured them all.




The de facto definition of a whitespace character is an ASCII control character whose purpose is to move a cursor / printhead while producing no visible output, hence the name.

Chr(0) is NUL, which does nothing and so is not whitespace. Anything above Chr(127) is not ASCII (which is a 7-bit code), is not intended to cause non-printing movement, and so is not whitespace.

This leaves you with the following list:
  • Horizontal Tab - HT - Chr(9)
  • New Line - NL - Chr(10)
  • Vertical Tab - VT - Chr(11)
  • Form Feed - FF - Chr(12)
  • Carriage Return - CR - Chr(13)
  • Space - Space - Chr(32)


All other characters below Chr(33) are control characters whose action is not intended to move the printhead/cursor, so they are not whitespace.

You might think backspace (BS - Chr(8)) is whitespace, but it is primarily a "cancel last character" action rather than a true whitespace character.
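
As a rough illustration of the list above, here is a minimal sketch in Python (not KiXtart, purely to make the definition concrete). The names `ASCII_WHITESPACE` and `is_ascii_whitespace` are made up for this example. It happens that Python's own `string.whitespace` constant contains exactly the same six characters.

```python
import string

# The six ASCII whitespace codes from the list above: HT, NL, VT, FF, CR, Space.
ASCII_WHITESPACE = {9, 10, 11, 12, 13, 32}


def is_ascii_whitespace(ch: str) -> bool:
    """Return True if ch is a single character whose code is in the list above."""
    return len(ch) == 1 and ord(ch) in ASCII_WHITESPACE


# Cross-check against Python's built-in definition of ASCII whitespace.
same_as_python = {ord(c) for c in string.whitespace} == ASCII_WHITESPACE

print(is_ascii_whitespace("\t"))    # tab is in the list
print(is_ascii_whitespace("\x08"))  # backspace is not
print(same_as_python)
```

Under this definition NUL, backspace and anything above Chr(127) are all rejected.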

#137966 - 2005-04-15 12:44 AM Re: Whitespace characters
Allen Administrator Online   shocked
KiX Supporter
*****

Registered: 2003-04-19
Posts: 4545
Loc: USA
Richard, I'm not sure what you consider Chr(255) to be, but I've always thought of it as the spacebar's little hidden brother. I have often used it in passwords, and have even found it in the output of programs from time to time. For those of you who don't know, you can enter this character, or any other Chr(?), by holding down Alt and typing the number of the character on the number pad. This works in most programs.

Anyway, I was just playing around and found something interesting.

Code:
 
$inflated="hello this is a test"
? join(split($inflated," "),"")
? join(split($inflated,chr(255)),"")



Provided the board software does not remove it, the first space after hello is actually Chr(255). Splitting and joining the string on the literal 255 character removes it properly. However, the second split/join, which uses Chr(255) as the delimiter, does not work.

Any ideas why?

#137967 - 2005-04-15 01:30 AM Re: Whitespace characters
NTDOC Administrator Offline
Administrator
*****

Registered: 2000-07-28
Posts: 11623
Loc: CA
Not sure why it doesn't work Al but I use it in my posts all the time.

The board won't allow a blank line with just CRLF or a space in code posted between the code tags, so I put in ALT-255, which the board then recognizes and allows to be a blank line.
 

#137968 - 2005-04-15 02:08 AM Re: Whitespace characters
Allen Administrator Online   shocked
KiX Supporter
*****

Registered: 2003-04-19
Posts: 4545
Loc: USA
This appears to be a limitation in KiXtart or, as Richard mentioned above, has something to do with characters above 127. KiXtart cannot split on Chr(###) for anything above 127, but it does work with the literal character itself.
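
One possible explanation, which nobody in this thread confirms, is a code page mismatch rather than a Split() limitation. On Windows, Alt+255 typed without a leading zero is looked up in the OEM code page, where 255 is a no-break space, and that character sits at 160 in the ANSI code page. The Python sketch below only illustrates that mapping. It assumes OEM code page 437 and ANSI code page 1252, and it assumes that KiXtart's Chr() works with ANSI codes.

```python
# Assumption: OEM code page 437 and ANSI code page 1252 (both vary by locale).
typed = bytes([255]).decode("cp437")    # character produced by Alt+255
ansi_code = typed.encode("cp1252")[0]   # its code in the ANSI code page
chr255 = bytes([255]).decode("cp1252")  # what ANSI code 255 actually is

print(hex(ord(typed)))  # 0xa0, i.e. U+00A0 NO-BREAK SPACE
print(ansi_code)        # 160
print(repr(chr255))     # 'ÿ'
```

If that is what is happening here, the literal character in the string would be Chr(160), so splitting on Chr(255) would simply never find a match.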
#137969 - 2005-04-15 02:20 AM Re: Whitespace characters
Les Offline
KiX Master
*****

Registered: 2001-06-11
Posts: 12734
Loc: fortfrances.on.ca
Ja, I use 255 in place of &nbsp; when posting on the board to get double spaces in. Before I got a laptop, I used it in passwords too. When people look over your shoulder trying to see what password you enter, they concentrate on what you are hammering on the numberpad and don't notice the alt key the other hand is covering.
_________________________
Give a man a fish and he will be back for more. Slap him with a fish and he will go away forever.

#137970 - 2005-04-15 09:50 AM Re: Whitespace characters
Richard H. Administrator Offline
Administrator
*****

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
Quote:

Richard, I'm not sure what you consider Chr(255) but I've always thought of it as the spacebar's little hidden brother




Don't forget that Shawn was asking for a definition of "whitespace", not a definition of "all characters you might want to remove from strings".

To answer your question, Chr(255) is an unprintable (or invalid) character. What happens when your rendering device (printer, VDU, Windows graphics renderer) encounters it is undefined. It might ignore it, draw a special character or explode. The same applies to the other "extended ASCII" characters.

The 7-bit characters and their effects are defined. The "S" in the ASCII acronym is "Standard", and this is where the whitespace definition comes from.

BTW for those who are wondering, the eighth bit was used for parity in the bad old days of unreliable serial interfaces.

Perhaps you need two UDFs, ASCIIPrintable() to get rid of the extended characters and non-printing control characters, and Deflate() or whatever to get rid of ASCII printable whitespace.

Here is an ASCIIPrintable():
Code:
$s="foo"+Chr(5)+@CRLF+Chr(140)+"bar"+@CRLF
"BEFORE:" ?
$s
"AFTER:" ?
ASCIIPrintable($s) ?

Function ASCIIPrintable($s)
	Dim $iChar,$aiWhiteSpace
	$aiWhiteSpace=Split("9 10 11 12 13 32")
	$ASCIIPrintable="" ; Return value should be a string
	While $s<>""
		$iChar=ASC($s)
		If 128 & $iChar
			; Ignore Extended ASCII
		Else
			If $iChar<33 And NOT(1+ASCAN($aiWhiteSpace,$iChar))
				; Ignore ASCII control characters which are not whitespace
			Else
				$ASCIIPrintable=$ASCIIPrintable+Chr($iChar)
			EndIf
		EndIf
		$s=SubStr($s,2)
	Loop
	Exit 0
EndFunction
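
For anyone who wants to compare the filtering logic against another language, here is a rough Python sketch of the same idea. It is an illustration only, not a replacement for the UDF, and the name `ascii_printable` is invented for this example. Like the KiXtart version, it keeps DEL (127), since that is neither above 127 nor below 33.

```python
# The six ASCII whitespace codes: HT, NL, VT, FF, CR, Space.
ASCII_WHITESPACE = {9, 10, 11, 12, 13, 32}


def ascii_printable(text: str) -> str:
    """Drop extended characters and non-whitespace ASCII control characters."""
    kept = []
    for ch in text:
        code = ord(ch)
        if code >= 128:
            continue  # ignore anything outside 7-bit ASCII
        if code < 33 and code not in ASCII_WHITESPACE:
            continue  # ignore control characters which are not whitespace
        kept.append(ch)
    return "".join(kept)


# Same sample data as the KiXtart test lines: Chr(5) and Chr(140) are removed.
sample = "foo" + chr(5) + "\r\n" + chr(140) + "bar" + "\r\n"
print(repr(ascii_printable(sample)))  # 'foo\r\nbar\r\n'
```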



...and here is a Deflate() which will either remove all whitespace, or replace each run of whitespace with a single space character (the default). In both modes leading and trailing whitespace is dropped:

Code:
$Space=Chr(32)+Chr(9)+Chr(9)+Chr(32)
$s=$Space+"foo"+$Space+"foo"+$Space+@CRLF+$Space+"bar bar "

"BEFORE='"+$s+"'"+@CRLF
"DEFLATED='"+Deflate($s)+"'"+@CRLF
"DEFLATED WITH DELETE FLAG='"+Deflate($s,1)+"'"+@CRLF

Function Deflate($s, Optional $bDelete)
	Dim $iChar,$aiWhiteSpace, $sSpace
	$Deflate=""
	$aiWhiteSpace=Split("9 10 11 12 13 32")
	While $s<>""
		$iChar=ASC($s)
		If (1+AScan($aiWhiteSpace,$iChar))
			; Character is whitespace
			If (CInt($bDelete) OR $Deflate="")
				$sSpace=""
			Else
				$sSpace=" "
			EndIf
		Else
			$Deflate=$Deflate+$sSpace+Chr($iChar)
			$sSpace=""
		EndIf
		$s=SubStr($s,2)
	Loop
	Exit 0
EndFunction
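
The pending-space trick used here (remember that whitespace was seen, but only emit a single space once the next non-whitespace character arrives) can be sketched in Python as follows. This is an illustration under the same six-character whitespace definition, and the names `deflate` and `pending` are made up for the example.

```python
# The six ASCII whitespace codes: HT, NL, VT, FF, CR, Space.
ASCII_WHITESPACE = {9, 10, 11, 12, 13, 32}


def deflate(text: str, delete: bool = False) -> str:
    """Collapse each whitespace run to one space, or remove it if delete is set."""
    out = []
    pending = ""  # separator to emit before the next non-whitespace character
    for ch in text:
        if ord(ch) in ASCII_WHITESPACE:
            # No separator when deleting, or when nothing has been output yet.
            pending = "" if (delete or not out) else " "
        else:
            out.append(pending + ch)
            pending = ""
    return "".join(out)  # a trailing pending space is never emitted


# Same sample data as the KiXtart test lines.
space = chr(32) + chr(9) + chr(9) + chr(32)
sample = space + "foo" + space + "foo" + space + "\r\n" + space + "bar bar "
print(repr(deflate(sample)))        # 'foo foo bar bar'
print(repr(deflate(sample, True)))  # 'foofoobarbar'
```

Because the separator is only written when another character follows, leading and trailing whitespace disappear without any separate trimming step.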

