#147083 - 2005-09-04 10:34 PM log file pruning / tail
iffy Offline
Starting to like KiXtart

Registered: 2005-05-29
Posts: 149
Loc: The Netherlands
A little 'challenge'

I'm in need of a routine to prune log files, sorta like you can with the Unix tail command. That is, I have a few hundred (ASCII) log files with 500-50000 lines each (around 10000 on average, average line length 80). I want to run the pruning daily and keep only the last 5000 lines.

Before actually starting to code I was thinking of an efficient way to do this, but I'm not coming up with anything imaginative besides the plain approach: read the whole file into memory and replace the file on disk with only the last x lines. If it was a binary file I could do a fileseek/position. Since I don't desperately need exact numbers, I was thinking of cheating a little by doing a calculation based on file size and average line length.
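A rough sketch of that size-based estimate, written in Python rather than KiXtart purely so the arithmetic can be run anywhere. The function name `bytes_to_skip` and its parameters are made up for illustration, and the 2-byte line ending (CRLF) is an assumption about the log files.

```python
def bytes_to_skip(file_size, keep_lines, avg_line_len=80, eol_len=2):
    """Estimate how many leading bytes to drop so about keep_lines lines remain."""
    # Bytes the wanted lines are expected to occupy (an estimate, not exact).
    keep_bytes = keep_lines * (avg_line_len + eol_len)
    # Never return a negative offset for files that are already short enough.
    return max(0, file_size - keep_bytes)


# Numbers from the post: about 10000 lines of about 80 characters, keep 5000.
size = 10000 * (80 + 2)            # 820000 bytes, assuming CRLF line endings
print(bytes_to_skip(size, 5000))   # 410000
```

The result is only as good as the average line length, so the pruned file ends up with roughly, not exactly, the requested number of lines, and the first kept line will usually be a partial one that needs trimming.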

Or I could cheat by calling an external routine like the tail.exe Microsoft provides in their Resource Kit but I prefer 'native code'.

Anyway, I was hoping one of you got an idea for a nice algorithm.

Thanks in advance,
Alex


BTW: I don't need a fully coded example, just a pointer in the right direction

#147084 - 2005-09-04 11:24 PM Re: log file pruning / tail
Shawn Administrator Offline
Administrator
*****

Registered: 1999-08-13
Posts: 8611
ok, here's some ideas and a few code snippets for a nice algorithm (not sure if there is already one in the UDF library, might want to check) ... but basically create an array of lines (buffer) the same size as the number of tail lines you want to keep, and as you read the file, store these lines in your buffer. When done reading, go back and write out your saved lines.

Create a $variable that will hold your "numbers of lines to keep", kinda like this:

$KEEP = 5

Create an array dimensioned that size, like this:

Dim $Array[$KEEP]

and create another variable that will be your indexer into this array, as you read the lines in the file:

$Index = 0

Create a variable called "Rollover" that will indicate whether you ran past the end of this array when reading the file:

$RollOver = 0

Open your file and start reading the lines into your buffer:

Code:

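; read the first line, then keep reading until ReadLine() sets @ERROR (end of file)
$array[$index] = readline(1)

while @ERROR = 0
    $index = $index + 1
    if $index > $KEEP
        ; ran past the last slot: flag it and wrap around to slot 0
        $RollOver = 1
        $index = 0
    endif
    $array[$index] = readline(1)
loop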



Note that if you blow past the buffer size ($index > $KEEP) you wrap around to the beginning of the buffer, BUT - you set the RollOver flag so that later you will know this.

After you're done reading the file, close it, then pump out what you got. You would have to take into consideration whether you rolled over or not. Maybe something like:

Code:

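if $RollOver = 0
    ; never wrapped: the lines are in slots 0 .. $index - 1
    for $i = 0 to $index - 1
        ? $array[$i]
    next
else
    ; wrapped: slot $index holds the final (failed) read, so the oldest
    ; kept line sits in the slot right after it
    for $i = $index + 1 to $keep
        ? $array[$i]
    next
    for $i = 0 to $index - 1
        ? $array[$i]
    next
endif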
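
For comparison, here is the same rollover idea as a small Python sketch (Python rather than KiXtart only so it can be run and checked easily). The name `tail_lines` is hypothetical, and unlike the KiXtart snippets above it takes any iterable of lines instead of reading a file handle, and assumes `keep` is at least 1.

```python
def tail_lines(lines, keep):
    """Return the last `keep` items of an iterable using a fixed-size ring buffer."""
    buf = [None] * keep
    index = 0              # next slot to write
    rolled_over = False    # set once the buffer has wrapped at least once
    for line in lines:
        buf[index] = line
        index += 1
        if index == keep:
            # ran past the last slot: wrap around and remember that we did
            index = 0
            rolled_over = True
    if not rolled_over:
        # buffer never filled: the lines are simply slots 0 .. index - 1
        return buf[:index]
    # buffer wrapped: the oldest kept line is at `index`, the newest just before it
    return buf[index:] + buf[:index]


print(tail_lines(["a", "b", "c", "d", "e", "f", "g"], 3))   # ['e', 'f', 'g']
```

Memory stays bounded by `keep` lines no matter how big the file is, but the whole file is still read once, so this does not reduce the read I/O. In Python itself `collections.deque(lines, maxlen=keep)` gives the same result; the explicit version is shown only because it mirrors the index and rollover flag.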


#147085 - 2005-09-04 11:37 PM Re: log file pruning / tail
iffy Offline
Starting to like KiXtart

Registered: 2005-05-29
Posts: 149
Loc: The Netherlands
Hmm... why didn't I think of this rollover thingy.. I vaguely remember actually doing something like that a long long time ago. Better than what I came up with so far. What I'm really hoping for of course is if someone can cut the file i/o down to less than sourcefilesize + outputfilesize.

I'm now wondering how the unix tail command works, those tools are usually quite golfed down. Anyone around fluent enough in C, with a unix distro handy and some time on their hands?

/me is off to do some Googling

#147086 - 2005-09-05 12:08 AM Re: log file pruning / tail
iffy Offline
Starting to like KiXtart

Registered: 2005-05-29
Posts: 149
Loc: The Netherlands
Ok, looking thru some version of tail.c (http://minnie.tuhs.org/UnixTree/V7/usr/src/cmd/tail.c.html) I remembered a technique I used almost 15 years ago to speed up searching thru large (100,000+ lines) text files: do binary reads of large blocks into a buffer, search for the EOL markers, count those, and combine that with some efficient memcpy stuff. Back then it was worth it cause the file search was almost as fast as sequential disk i/o would allow. I seem to remember that large buffers in a multiple of the disk block size really helped back then. Might be a bit of a hassle nowadays with them blazing fast puters; I might end up with 3x the lines of code and only a 5% speed increase. And it's not speed I'm interested in to begin with, it's file i/o I'd like to reduce. Hmmm... while typing I'm thinking... read blocks backwards... but then I'd need a seek function... /me is confusing himself.. need sleep..
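
A minimal sketch of the "read blocks backwards and count EOL markers" idea, in Python since it has a seek function built in. This is not how any particular tail.c does it; the name `tail_offset` and the 4096-byte default block size are assumptions for illustration.

```python
import os


def tail_offset(path, keep_lines, block_size=4096):
    """Return the byte offset at which the last keep_lines lines start."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        if keep_lines <= 0 or pos == 0:
            return pos
        # Ignore a final newline so it is not counted as a line separator.
        f.seek(pos - 1)
        if f.read(1) == b"\n":
            pos -= 1
        remaining = keep_lines
        while pos > 0:
            # Read the block that ends at `pos`, moving towards the file start.
            start = max(0, pos - block_size)
            f.seek(start)
            block = f.read(pos - start)
            # Walk the newlines in this block from last to first.
            i = block.rfind(b"\n")
            while i != -1:
                remaining -= 1
                if remaining == 0:
                    return start + i + 1   # first byte after that newline
                i = block.rfind(b"\n", 0, i)
            pos = start
        # Fewer than keep_lines lines in the file: keep everything.
        return 0
```

With the offset in hand, only the bytes from there to the end need to be read and written back, so the read side drops from the whole file to roughly the size of the kept tail.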

Edited by iffy (2005-09-05 12:09 AM)

#147087 - 2005-09-05 12:17 AM Re: log file pruning / tail
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
seek you have if you want.
filestream scanning/searching, no problem.
just ask and you shall have.
it's all in the binary access pack.

and when you start to play with files with thousands of lines, you should see increase in speed, no matter how fast... well, none of the current desktop computers are fast enough for you to not see the diff.
_________________________
!

download KiXnet

#147088 - 2005-09-05 12:34 AM Re: log file pruning / tail
iffy Offline
Starting to like KiXtart

Registered: 2005-05-29
Posts: 149
Loc: The Netherlands
Binary access pack? Which can be found where? And again, it's not about speed it's about minimizing file i/o.
#147089 - 2005-09-05 12:38 AM Re: log file pruning / tail
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
minimizing file I/O equals speed, imho.

anyways, the binary access pack is available via personal contacts only at this point.
#147090 - 2005-09-05 12:43 AM Re: log file pruning / tail
iffy Offline
Starting to like KiXtart

Registered: 2005-05-29
Posts: 149
Loc: The Netherlands
You are right of course that a file i/o minimized routine would probably be the fastest (unless the files are on a ramdisk, which unfortunately they aren't). It's more like over 2Mbit WAN links.
#147091 - 2005-09-05 12:57 AM Re: log file pruning / tail
Allen Administrator Offline
KiX Supporter
*****

Registered: 2003-04-19
Posts: 4567
Loc: USA
I'm not sure about all this binary stuff... but I slightly modified the wshpipe udf to dump out the last X number of lines. Very raw as it doesn't do much error checking... but might get you started...

Code:
 
Function Tail($file, $last)
    Dim $oExec, $Output, $out[$last], $i
    ; shell out to "type" and capture everything it prints
    $oExec = CreateObject("WScript.Shell").Exec('%comspec% /c type "' + $file + '"')
    If Not VarType($oExec)=9 $Tail="WScript.Shell Exec Unsupported" Exit 10 EndIf
    $Output = $oExec.StdOut.ReadAll + $oExec.StdErr.ReadAll
    ; strip the CRs, then split on LF to get one array element per line
    $output=Split(Join(Split($Output,CHR(13)),''),CHR(10))
    ; copy the last elements of $output into $out
    for $i=0 to $last
        $out[$i]=$output[ubound($output)+$i-$last]
    next
    $tail=$out
    Exit($oExec.ExitCode)
EndFunction


#147092 - 2005-09-05 01:07 AM Re: log file pruning / tail
iffy Offline
Starting to like KiXtart

Registered: 2005-05-29
Posts: 149
Loc: The Netherlands
The double split/join could be done as $output=split($output, @crlf) but otherwise it sure looks neat and simple. Funny thing is I grew up with 64k or less of ram doing assembly code and after a zillion years I still don't consider doing a split on a file a few mb large. Silly me of course typing this on a laptop with 1Gb of ram... Makes me think that the lonkonizer was born 1 or 2 decades too late... Them good old days, bloody fights over 1 byte or 2 cpu cycles...
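
To illustrate when the two ways of splitting agree, here is a small Python sketch. It assumes KiXtart's Split() treats @CRLF as one two-character delimiter, the way Python's `str.split` does; the function names are made up.

```python
def split_strip_cr(text):
    # Same shape as Split(Join(Split($Output, Chr(13)), ''), Chr(10)):
    # remove every CR, then split on LF.
    return "".join(text.split("\r")).split("\n")


def split_crlf(text):
    # Same shape as Split($Output, @CRLF): split only on the CR+LF pair.
    return text.split("\r\n")


crlf_only = "one\r\ntwo\r\nthree"
mixed = "one\r\ntwo\nthree"      # second line break is a bare LF

print(split_strip_cr(crlf_only) == split_crlf(crlf_only))   # True
print(split_strip_cr(mixed))    # ['one', 'two', 'three']
print(split_crlf(mixed))        # ['one', 'two\nthree']
```

So the single split is equivalent as long as every line ends in CRLF; the double split/join is only more forgiving when a file mixes in bare LF line endings.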
#147093 - 2005-09-05 01:18 AM Re: log file pruning / tail
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
well, I don't see this udf by allen being anything other than slower than direct kixtart readline() and writeline() code.
#147094 - 2005-09-05 01:27 AM Re: log file pruning / tail
Shawn Administrator Offline
Administrator
*****

Registered: 1999-08-13
Posts: 8611
Are we still talking a "pure kixtart" solution here ? Are add-ins allowed ? If so, I would just get me some tail (tail.exe I mean) ;0)
#147095 - 2005-09-05 01:31 AM Re: log file pruning / tail
iffy Offline
Starting to like KiXtart

Registered: 2005-05-29
Posts: 149
Loc: The Netherlands
a simple for /f %i in ('dir /b') do @tail.exe %i ... etc would do the job but... I'm interested in a 'pure kix' solution since I want to keep dependencies down to an absolute minimum (within reason of course).
#147096 - 2005-09-05 01:34 AM Re: log file pruning / tail
Shawn Administrator Offline
Administrator
*****

Registered: 1999-08-13
Posts: 8611
You mean a pure kix solution (ie. could use COM), or a pure-pure-pure kix solution ?
#147097 - 2005-09-05 01:37 AM Re: log file pruning / tail
iffy Offline
Starting to like KiXtart

Registered: 2005-05-29
Posts: 149
Loc: The Netherlands
LOL.. anything without shelling, spawning, forking or otherwise calling an external file. Thus COM is allowed of course, since that's (now) considered a very welcome addition to KiX.
#147098 - 2005-09-05 01:39 AM Re: log file pruning / tail
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
so...
the tail above is not nice as it shells...

hmm...
what about dll's? things like kixforms.dll?
#147099 - 2005-09-05 01:44 AM Re: log file pruning / tail
iffy Offline
Starting to like KiXtart

Registered: 2005-05-29
Posts: 149
Loc: The Netherlands
Preferably no dll's which are not in a standard winxp or w2k3 server installation so no kixforms either, besides wouldn't calling any external file only cause more file i/o and thus sorta conflict with the task at hand?
#147100 - 2005-09-05 02:08 AM Re: log file pruning / tail
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
well, if the external file is not huge and it handles the read properly, surely the file I/O would be reduced by it.

say, a proper tail.exe does a search for the EOL's and prints/reads only the needed lines thus minimizing the I/O.

anyway, with the above type example, you would get more I/O than simply reading the whole file with kix and replacing it with the new one with just some of the lines in it.

anyway, you could do an approximation by the size of the file and calculate how many lines you wanna strip.
then call more with it:
shell "%comspec% /c more myfile.old +"+$amountToSkip+" > myfile.new"


[edit]
hmm...
more is not good.
it doesn't have switch for batch-mode...


Edited by Lonkero (2005-09-05 02:12 AM)
#147101 - 2005-09-05 05:02 PM Re: log file pruning / tail
Stevie Offline
Starting to like KiXtart
*****

Registered: 2002-01-09
Posts: 199
If COM is allowed, then if the file is 100K (assuming ANSI 1 byte/char.) then there are ~100,000 characters in the file. If you just want the last 10K of the file or so, you can use the TextStream object from the scripting library. It contains a 'Skip' method which takes a number of characters to skip (90,000 in this example). Then just call ReadAll and you'll get the rest of the text from the file.
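
A rough Python analogue of the Skip-then-ReadAll approach, only to show the shape of it. This is not the TextStream API itself: the name `skip_and_read_all` is hypothetical, the file is assumed to be single-byte text, and dropping the partial first line is an extra step the description above does not mention.

```python
def skip_and_read_all(path, skip_chars):
    """Discard skip_chars characters, read the rest, drop the partial first line."""
    # latin-1 maps one byte to one character; newline="" keeps CRLF untouched.
    with open(path, "r", encoding="latin-1", newline="") as f:
        if skip_chars <= 0:
            return f.read()
        f.read(skip_chars)        # stands in for Skip(n)
        rest = f.read()           # stands in for ReadAll()
    # The skip most likely landed mid-line, so cut up to the first line break.
    cut = rest.find("\n")
    return rest[cut + 1:] if cut != -1 else rest
```

For the 100K example that would be `skip_and_read_all(path, 90000)`. Whether skipping this way actually saves any read I/O depends on how the skip is implemented underneath, which this sketch does not settle.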
_________________________
Stevie

#147102 - 2005-09-05 08:57 PM Re: log file pruning / tail
Bryce Offline
KiX Supporter
*****

Registered: 2000-02-29
Posts: 3167
Loc: Houston TX
Quote:

If COM is allowed, then if the file is 100K (assuming ANSI 1 byte/char.) then there are ~100,000 characters in the file. If you just want the last 10K of the file or so, you can use the TextStream object from the scripting library. It contains a 'Skip' method which takes a number of characters to skip (90,000 in this example). Then just call ReadAll and you'll get the rest of the text from the file.




you mean like this UDF?

Tail()


Edited by Bryce (2005-09-05 09:02 PM)
