Quote:

Actually his readfile() UDF is using the OPEN() and ReadLine() commands... so I doubt that the CPU usage will drop when dealing with such a LARGE file.

No, that's not correct. Look at the UDF more closely and you will see the obvious problems.

To elaborate, there are two major inefficiencies in the ReadFile() UDF.

The most obvious is where you specify a positive limit on the number of lines to be returned. In this case the entire file is read before the code checks whether there is a limit. This means that if you need only the first 10 lines of your 10,000-line file, all 10,000 lines are read anyway.
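
A minimal sketch of the fix, using a hypothetical ReadFirstLines() with illustrative $file and $limit parameter names (this is not the posted UDF):

[code]
; Sketch only: an illustrative ReadFirstLines(), not the posted UDF.
; $file and $limit are hypothetical parameter names.
Function ReadFirstLines($file, $limit)
   Dim $line, $count
   $count = 0
   If Open(1, $file) = 0
      $line = ReadLine(1)
      ; Stop as soon as $limit lines are collected, instead of
      ; reading the whole file and trimming afterwards.
      While @ERROR = 0 And ($limit <= 0 Or $count < $limit)
         $ReadFirstLines = $ReadFirstLines + Chr(10) + $line
         $count = $count + 1
         $line = ReadLine(1)
      Loop
      $ReadFirstLines = Split(SubStr($ReadFirstLines, 2), Chr(10))
   Else
      Exit @ERROR
   EndIf
   $line = Close(1)
EndFunction
[/code]

Note that this still concatenates into a single string - the second problem, below - so it fixes only the wasted reads, not the string handling.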

The second, less obvious problem is how ReadFile() manipulates the data.

As each line is read from the file buffer, it is concatenated to the end of a string. This string gets longer and longer until it eventually reaches the size of your file. As you can imagine, string manipulation on data of this size gets slower as the string grows, and with an 800KB text file there will be a large number of iterations of the ReadLine() loop.

The string is then converted to an array with Split() - again, not a trivial task for a string of this length.
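
To make the cost concrete, this is roughly the shape of the pattern in question (a simplified sketch, not the UDF verbatim; the path is a placeholder):

[code]
; Simplified shape of the costly pattern; path is a placeholder.
Dim $buffer, $line, $lines
If Open(1, "C:\big.txt") = 0
   $line = ReadLine(1)
   While @ERROR = 0
      ; Each pass copies the ever-growing $buffer to append one line,
      ; so the cost per line rises as $buffer approaches the file size.
      $buffer = $buffer + Chr(10) + $line
      $line = ReadLine(1)
   Loop
   $line = Close(1)
EndIf
; A second full pass over the now file-sized string:
$lines = Split(SubStr($buffer, 2), Chr(10))
[/code]

Every line read pays for another copy of everything read so far, and then Split() walks the whole thing one more time.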

All this string manipulation occurs in core memory and requires repeated creation and destruction of temporary variable / stack space in the script's private process space.

The string manipulation is where the majority of the CPU cycles are burned, not the file IO.

The reason for moving the ReadLine() activity into the main script is that the temporary string is never created, so we lose that massive overhead. As we need to iterate over the result anyway, we also save the overhead of the Do..Until loop in the UDF.
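
A minimal sketch of that approach, with a placeholder path and a trivial display standing in for whatever you do with each line:

[code]
; Sketch: read and handle each line directly in the main script.
Dim $line, $rc
$rc = Open(1, "C:\big.txt")           ; placeholder path
If $rc = 0
   $line = ReadLine(1)
   While @ERROR = 0
      ? $line                          ; stand-in for your per-line work
      $line = ReadLine(1)
   Loop
   $rc = Close(1)
Else
   ? "Open failed with error " + CStr($rc)
EndIf
[/code]

No buffer, no Split(), and the loop you would have written to consume the array anyway now does the reading too.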

However, FSO is a good option too. In this case, even with the Split(), it may well be quicker than using KiXtart file IO.
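
For completeness, a sketch of the FSO route (the path is a placeholder; 1 is the standard FileSystemObject ForReading constant):

[code]
; Sketch: slurp the file in one FSO call, then split once into an array.
Dim $fso, $ts, $text, $lines
$fso = CreateObject("Scripting.FileSystemObject")
$ts = $fso.OpenTextFile("C:\big.txt", 1)   ; placeholder path; 1 = ForReading
$text = $ts.ReadAll                        ; whole file in a single read
$ts.Close
$lines = Split($text, @CRLF)               ; one split, one array
[/code]

Whether this actually beats KiXtart's own file IO on a given file is worth testing, but ReadAll avoids the line-by-line concatenation entirely.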

[edit]
Contribution to the Secret Long Line Policeman's Ball to follow...
[/edit]


Edited by Richard H. (2005-01-06 09:15 AM)