#109672 - 2003-12-05 04:13 PM UNICODE support for READLINE
Sealeopard Offline
KiX Master
*****

Registered: 2001-04-25
Posts: 11164
Loc: Boston, MA, USA
Currently, READLINE does not support UNICODE-formatted text files. These files are read in on a character-by-character basis instead of a line-by-line basis. Additionally, the CR and LF characters are dropped, which makes it nearly impossible to reassemble the text file unless one treats two consecutive empty 'lines' as an indicator of a CRLF combination.
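
To make the symptom concrete, here is a small illustration in Python (not KiXtart, and not a description of how READLINE is actually implemented). It assumes the file is stored as UTF-16 little-endian with a byte order mark, which is what Windows tools usually mean by "Unicode", and it assumes a reader whose strings end at Chr(0). The helper names are made up for this sketch.

```python
def utf16le_bytes(text):
    """Encode text as a BOM followed by UTF-16 LE code units."""
    return b"\xff\xfe" + text.encode("utf-16-le")


def nul_terminated_pieces(raw):
    """Split on NUL bytes, mimicking a reader whose strings stop at Chr(0).

    This is an assumption made for illustration, not READLINE's real logic.
    """
    return raw.split(b"\x00")


raw = utf16le_bytes("AB\r\n")

# Every second byte is zero, and CR and LF each become their own two-byte unit.
print(raw.hex(" "))

# Each character ends up in its own fragment; CR and LF arrive as separate pieces.
print(nul_terminated_pieces(raw))
```

Under those assumptions each character comes back as its own "line", and the CR and LF show up as two separate pieces, which matches the behaviour described above.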

Suggestion triggered by the following post: http://www.kixscripts.com/forum/tm.asp?m=3472
_________________________
There are two types of vessels, submarines and targets.

#109673 - 2003-12-05 04:59 PM Re: UNICODE support for READLINE
Howard Bullock Offline
KiX Supporter
*****

Registered: 2000-09-15
Posts: 5809
Loc: Harrisburg, PA USA
This is a good request. Maybe something like supporting chr(0) in strings would be sufficient.

http://www.kixtart.org/ubbthreads/showthreaded.php?Cat=&Number=66178
_________________________
Home page: http://www.kixhelp.com/hb/

#109674 - 2003-12-05 07:12 PM Re: UNICODE support for READLINE
Allen Administrator Offline
KiX Supporter
*****

Registered: 2003-04-19
Posts: 4545
Loc: USA
IIRC, I ran into the same problem while writing the Addprinter() UDF. The way I got around it was to shell out and TYPE the file to another file.

Code:
; TYPE the Unicode INF into a temp file that ReadLine can read line by line
shell '%comspec% /c type "$driverinf">%temp%\addprinter.txt'

Once the new file was created readline worked normally. However, you are right, it would be nice to be able to just read the unicode directly.



#109675 - 2003-12-08 11:37 AM Re: UNICODE support for READLINE
Richard H. Administrator Offline
Administrator
*****

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
Quote:

Maybe something like supporting chr(0) in strings would be sufficient.

Supporting wide character/double-byte/unicode is a good idea and I believe will become more and more important. I think that simply changing the internal support for basic strings is not going to be sufficient to support Unicode.

Unicode characters are not simply an ASCII character preceded by a Chr(0). That Chr(0) is there for a reason; see the Unicode home page for the full spec.

This means that you need to be careful to preserve the character set information when reading, writing, substringing, searching (InStr), concatenating, testing and converting strings.

The better solution is to have a new "wide string" basic type and either update the string functions to auto-magically support both or add wide string functions.

Conversion between wide and non-wide strings would be automatic in the same way as (say) between strings and number types. The extra byte would of course be lost when converting from a wide to a non-wide string, and would have to be set to '0' when converting from a non-wide to a wide string.

Specifying characters is also an interesting task. "Simple Latin" isn't a problem as it corresponds to 7-bit ASCII (&0000 - &007F), so the automatic conversion could handle that. How do you specify Cyrillic, Greek, box-drawing or mathematical characters though, especially when the high-order byte is often a non-printable character? Perhaps a new conversion function, so if you wanted the currency symbol for the Euro you would use:
Code:
$wsEuroSymbol=CWStr(&20AC)



This is going to be a lot of work, so a short-term measure would be:
  • Update "OPEN", to recognise wide character files, and allow the specification of wide character when creating files.
  • Update "Readline" to silently drop the leading byte of each wide character.
  • Update "Writeline" to prefix each character with Chr(0).


While this doesn't actually provide Unicode support, it will allow the reading and writing of files which contain only the "Basic Latin" and "Latin-1 Supplement" codes which may be sufficient for administration purposes.
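
For anyone who wants to see what that short-term measure amounts to at the byte level, here is a minimal sketch in Python (not KiXtart). It assumes little-endian UTF-16 with a byte order mark, which is the layout Windows tools typically write; in that layout the zero byte follows the character byte rather than preceding it. The names narrow, widen, read_lines and write_lines are invented for this sketch.

```python
import os
import tempfile

BOM_UTF16_LE = b"\xff\xfe"


def narrow(raw):
    """Drop the high-order byte of every UTF-16 LE code unit (keeps U+0000..U+00FF)."""
    if raw.startswith(BOM_UTF16_LE):
        raw = raw[2:]
    return raw[0::2]


def widen(data):
    """Pad every single-byte character with a zero high-order byte and prepend a BOM."""
    out = bytearray(BOM_UTF16_LE)
    for byte in data:
        out += bytes((byte, 0))
    return bytes(out)


def read_lines(path):
    """Read a wide-character file as if it were a single-byte text file."""
    with open(path, "rb") as handle:
        text = narrow(handle.read()).decode("latin-1")
    lines = text.split("\r\n")
    if lines[-1] == "":
        lines.pop()  # a trailing CRLF leaves one empty element at the end
    return lines


def write_lines(path, lines):
    """Write single-byte lines out as a wide-character file with CRLF line ends."""
    data = "".join(line + "\r\n" for line in lines).encode("latin-1")
    with open(path, "wb") as handle:
        handle.write(widen(data))


# Round trip through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "wide.txt")
write_lines(path, ["caf\u00e9", "line 2"])
print(read_lines(path))
```

The limitation is easy to demonstrate: the Euro sign U+20AC is stored as the bytes AC 20, so narrow() keeps only AC and the character comes back as U+00AC, the "not" sign. Anything outside the first 256 code points is silently corrupted in the same way.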

#109676 - 2005-09-20 10:08 PM Re: UNICODE support for READLINE
Les Offline
KiX Master
*****

Registered: 2001-06-11
Posts: 12734
Loc: fortfrances.on.ca
Any chance of KiXing this up a notch? Today, more and more data is written in Unicode, and while we could call upon the FSO gods, it would be nice to support it natively.

While you are at it, any ideas on a way for KiX to detect whether a file is Unicode or not?
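
One common way to detect this is to look for a byte order mark at the start of the file. The sketch below is in Python rather than KiXtart and is only meant to show the check itself; detect_bom is a made-up name, and a file saved without a BOM will not be recognised by this approach.

```python
import codecs


def detect_bom(raw):
    """Return an encoding name if the data starts with a known BOM, else None."""
    known = (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    )
    for bom, name in known:
        if raw.startswith(bom):
            return name
    return None


# Only the first few bytes of a file are needed for the check.
print(detect_bom(b"\xff\xfeA\x00"))
print(detect_bom(b"plain text"))
```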
_________________________
Give a man a fish and he will be back for more. Slap him with a fish and he will go away forever.

#109677 - 2005-09-20 10:32 PM Re: UNICODE support for READLINE
NTDOC Administrator Offline
Administrator
*****

Registered: 2000-07-28
Posts: 11623
Loc: CA
Well, there is a UDF to detect it, so I'm sure doing it natively would be a walk in the park.
#109678 - 2005-09-20 10:42 PM Re: UNICODE support for READLINE
Les Offline
KiX Master
*****

Registered: 2001-06-11
Posts: 12734
Loc: fortfrances.on.ca
Yes, the question was directed at Ruud. I know there is a UDF.

I was thinking maybe Open() could return a different code for Unicode.
_________________________
Give a man a fish and he will be back for more. Slap him with a fish and he will go away forever.
