#147083 - 2005-09-04 10:34 PM
log file pruning / tail
iffy
Starting to like KiXtart
Registered: 2005-05-29
Posts: 149
Loc: The Netherlands
A little 'challenge'
I need a routine to prune log files, sort of like what you can do with the Unix tail command. I have a few hundred (ASCII) log files with 500 to 50,000 lines each (around 10,000 on average, with an average line length of 80 characters). I want to run the pruning daily and keep only the last 5000 lines.
Before actually starting to code I was thinking about an efficient way to do this, but I'm not coming up with anything more imaginative than the plain approach: read the whole file into memory and replace the file on disk with only the last x lines. If it were a binary file I could seek to a position. Since I don't desperately need exact numbers, I was thinking of cheating a little by calculating a starting point from the file size and the average line length.
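The file-size "cheat" can be sketched like this. It is Python rather than KiXtart, purely to illustrate the idea. The function name and the 80-byte default for avg_line_len (which should include the line terminator) are assumptions. The result is approximate: it holds about keep_lines lines only if the real line lengths are close to the average.

```python
import os


def approximate_tail(path, keep_lines, avg_line_len=80):
    """Return roughly the last keep_lines lines of a text file, as bytes.

    The start position is estimated as keep_lines * avg_line_len bytes
    before end-of-file, so the line count is only approximate.
    """
    size = os.path.getsize(path)
    offset = max(0, size - keep_lines * avg_line_len)
    with open(path, "rb") as f:
        if offset > 0:
            # Step back one byte so that a line starting exactly at the
            # offset is kept, then discard the (probably partial) line
            # we landed in.
            f.seek(offset - 1)
            f.readline()
        return f.read()
```

With 5000 lines and an 80-byte average, this reads about 400,000 bytes per file instead of the whole file.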
Or I could cheat by calling an external tool like the tail.exe Microsoft provides in its Resource Kit, but I prefer 'native code'.
Anyway, I was hoping one of you has an idea for a nice algorithm.
Thanks in advance, Alex
BTW: I don't need a fully coded example, just a pointer in the right direction
#147084 - 2005-09-04 11:24 PM
Re: log file pruning / tail
Shawn
Administrator
   
Registered: 1999-08-13
Posts: 8611
OK, here are some ideas and a few code snippets for a nice algorithm (I'm not sure whether there is already one in the UDF library, so you might want to check). Basically, create an array of lines (a buffer) the same size as the number of tail lines you want to keep, and as you read the file, store the lines in that buffer. When you're done reading, go back and write out the saved lines.
Create a $variable that will hold your "number of lines to keep", kinda like this:
$KEEP = 5
Create an array dimensioned that size, like this:
Dim $Array[$KEEP]
and create another variable that will be your index into this array as you read the lines of the file:
$Index = 0
Create a variable called $RollOver that will indicate whether you ran past the end of this array while reading the file:
$RollOver = 0
Open your file and start reading the lines into your buffer:
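Code:
; assumes the log is already open as file handle 1, e.g. via Open(1, $file)
$array[$index] = readline(1)
while @ERROR = 0
  $index = $index + 1
  if $index > $KEEP     ; ran past the end of the buffer:
    $RollOver = 1       ; remember that we wrapped...
    $index = 0          ; ...and start over at slot 0
  endif
  $array[$index] = readline(1)
loop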
Note that if you blow past the buffer size ($index > $KEEP) you wrap around to the beginning of the buffer, BUT you also set the $RollOver flag so that you know about it later.
After you're done reading the file, close it, then pump out what you got. You have to take into account whether you rolled over or not. Maybe something like:
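Code:
; the slot at $index holds the final (failed) readline, so it is skipped
if $RollOver = 0
  for $i = 0 to $index - 1
    ? $array[$i]
  next
else
  ; oldest lines first: from just after $index to the end, then wrap to the start
  for $i = $index + 1 to $KEEP
    ? $array[$i]
  next
  for $i = 0 to $index - 1
    ? $array[$i]
  next
endif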
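For comparison, here is the same ring-buffer idea in Python (not KiXtart). It uses a buffer of exactly keep slots, an index that wraps around, and a rolled_over flag. The name tail_lines is hypothetical.

```python
def tail_lines(lines, keep):
    """Return the last `keep` items of an iterable using a fixed-size ring buffer."""
    if keep <= 0:
        return []
    buffer = [None] * keep
    index = 0
    rolled_over = False
    for line in lines:
        buffer[index] = line
        index += 1
        if index == keep:  # ran past the end: wrap to the start
            index = 0
            rolled_over = True
    if not rolled_over:
        return buffer[:index]
    # The oldest surviving item sits at `index`: read to the end, then wrap.
    return buffer[index:] + buffer[:index]
```

Passing an open text file object as `lines` streams it one line at a time, so memory use stays at `keep` lines. Python's standard library also has collections.deque(iterable, maxlen=keep), which drops old items the same way.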
#147085 - 2005-09-04 11:37 PM
Re: log file pruning / tail
iffy
Hmm... why didn't I think of this rollover thingy? I vaguely remember doing something like that a long, long time ago. It's better than what I came up with so far. What I'm really hoping for, of course, is that someone can cut the file I/O down to less than source file size + output file size.
I'm now wondering how the Unix tail command works; those tools are usually quite golfed down. Is anyone around fluent enough in C, with a Unix distro handy and some time on their hands?
/me is off to do some Googling
#147086 - 2005-09-05 12:08 AM
Re: log file pruning / tail
iffy
OK, looking through a version of tail.c (http://minnie.tuhs.org/UnixTree/V7/usr/src/cmd/tail.c.html), I remembered a technique I used almost 15 years ago to speed up searching through large (100,000+ line) text files. You do binary reads of large blocks into a buffer, search for the EOL markers and count them, and combine that with some efficient memcpy stuff. Back then it was worth it because the file search ran almost as fast as sequential disk I/O would allow. I seem to remember that large buffers sized as a multiple of the disk block size really helped. It might be a bit of a hassle nowadays with these blazing-fast computers: I might end up with 3x the lines of code and only a 5% speed increase. And it's not speed I'm interested in to begin with; it's file I/O I'd like to reduce. Hmm... while typing I'm thinking... read blocks backwards... but then I'd need a seek function... /me is confusing himself.. need sleep..
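The "read blocks backwards" idea can be sketched in Python, where file objects do have seek(). This illustrates the technique only; it is not the V7 tail.c code. The function name tail_backwards and the 4096-byte default block size are assumptions.

```python
import os


def tail_backwards(path, keep_lines, block_size=4096):
    """Return the last keep_lines lines of a file by reading blocks from the end.

    Only the blocks that hold those lines are read, so the I/O is roughly
    the size of the tail rather than the size of the file. Lines are split
    on LF bytes, which also covers CR/LF files.
    """
    if keep_lines <= 0:
        return b""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
            # The final byte may be the last line's own terminator, so it is
            # not counted; keep_lines earlier newlines mean we have enough.
            if data.count(b"\n", 0, len(data) - 1) >= keep_lines:
                break
    # Walk back over keep_lines newlines, again ignoring the final byte.
    start = len(data) - 1
    for _ in range(keep_lines):
        start = data.rfind(b"\n", 0, start)
        if start == -1:
            return data  # fewer lines than requested: whole file
    return data[start + 1:]
```

Writing the pruned file back still costs one write of the kept part. The saving is on the read side, which stops once enough newlines have been found.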
Edited by iffy (2005-09-05 12:09 AM)
#147088 - 2005-09-05 12:34 AM
Re: log file pruning / tail
iffy
Binary access pack? Where can that be found? And again, it's not about speed, it's about minimizing file I/O.
#147090 - 2005-09-05 12:43 AM
Re: log file pruning / tail
iffy
You are right, of course, that a routine that minimizes file I/O would probably also be the fastest (unless the files were on a RAM disk, which unfortunately they aren't). They're actually accessed over 2 Mbit WAN links.
#147091 - 2005-09-05 12:57 AM
Re: log file pruning / tail
Allen
KiX Supporter
   
Registered: 2003-04-19
Posts: 4567
Loc: USA
I'm not sure about all this binary stuff... but I slightly modified the WshPipe UDF to dump out the last X lines. It's very raw and doesn't do much error checking... but it might get you started...
Code:
Function Tail($file, $last)
  Dim $oExec, $Output, $out[$last], $i
  $oExec = CreateObject("WScript.Shell").Exec('%comspec% /c type "' + $file + '"')
  If Not VarType($oExec) = 9
    $Tail = "WScript.Shell Exec Unsupported"
    Exit 10
  EndIf
  $Output = $oExec.StdOut.ReadAll + $oExec.StdErr.ReadAll
  ; strip every CR, then split on LF
  $Output = Split(Join(Split($Output, CHR(13)), ''), CHR(10))
  If UBound($Output) < $last   ; file shorter than requested: return all of it
    $Tail = $Output
    Exit($oExec.ExitCode)
  EndIf
  For $i = 0 to $last
    $out[$i] = $Output[UBound($Output) + $i - $last]
  Next
  $Tail = $out
  Exit($oExec.ExitCode)
EndFunction
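The same read-everything-then-slice approach looks like this as a Python sketch. The name tail_read_all is hypothetical. It also strips every CR and splits on LF, so it handles both CR/LF and LF-only files. Unlike the UDF, it drops the empty string that a trailing newline would otherwise leave at the end.

```python
def tail_read_all(path, keep_lines):
    """Return the last keep_lines lines of a text file as a list of strings.

    Reads the whole file into memory, so its I/O cost is the full file size.
    """
    if keep_lines <= 0:
        return []
    with open(path, "r", newline="") as f:  # newline="": no EOL translation
        text = f.read()
    lines = text.replace("\r", "").split("\n")
    if lines[-1] == "":
        lines.pop()  # a trailing newline does not start another line
    return lines[-keep_lines:]
```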
#147092 - 2005-09-05 01:07 AM
Re: log file pruning / tail
iffy
The double split/join could be done as $output = Split($output, @CRLF) (as long as every line ends in CR/LF), but otherwise it sure looks neat and simple. Funny thing is, I grew up doing assembly code with 64K of RAM or less, and after a zillion years I still wouldn't consider doing a Split on a file a few MB in size. Silly me, of course, typing this on a laptop with 1 GB of RAM... Makes me think the lonkonizer was born a decade or two too late... The good old days, bloody fights over 1 byte or 2 CPU cycles...
#147095 - 2005-09-05 01:31 AM
Re: log file pruning / tail
iffy
A simple for /f %i in ('dir /b') do @tail.exe %i ... etc. would do the job, but... I'm interested in a 'pure KiX' solution since I want to keep dependencies down to an absolute minimum (within reason, of course).
#147097 - 2005-09-05 01:37 AM
Re: log file pruning / tail
iffy
LOL.. anything without shelling, spawning, forking or otherwise calling an external file. COM is allowed, of course, since that's (now) considered a very welcome addition to KiX.
#147099 - 2005-09-05 01:44 AM
Re: log file pruning / tail
iffy
Preferably no DLLs that aren't in a standard WinXP or W2K3 Server installation, so no KiXforms either. Besides, wouldn't calling any external file only cause more file I/O and thus sort of conflict with the task at hand?
#147102 - 2005-09-05 08:57 PM
Re: log file pruning / tail
Bryce
KiX Supporter
   
Registered: 2000-02-29
Posts: 3167
Loc: Houston TX
Quote:
If COM is allowed and the file is 100K (assuming ANSI, 1 byte/char), then there are ~100,000 characters in the file. If you just want the last 10K or so of the file, you can use the TextStream object from the scripting library. It has a 'Skip' method that takes the number of characters to skip (90,000 in this example). Then just call ReadAll and you'll get the rest of the text in the file.
you mean like this UDF?
Tail()
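The Skip-then-ReadAll idea, plus writing the result back, might look like this in Python. This is an analogue, not the Scripting.TextStream API: seek() plays the role of Skip on a 1-byte-per-character (ANSI) file. The function name prune_keep_last_bytes and its keep_bytes parameter are hypothetical. The new content goes to a temporary file in the same folder and is then swapped in with os.replace(), so the original is not truncated before the pruned copy is fully written.

```python
import os
import tempfile


def prune_keep_last_bytes(path, keep_bytes):
    """Rewrite path so it keeps (roughly) only its last keep_bytes bytes.

    The first, probably partial, line after the skip point is dropped so the
    pruned file starts on a line boundary.
    """
    keep_bytes = max(0, keep_bytes)
    size = os.path.getsize(path)
    if size <= keep_bytes:
        return  # already small enough; leave the file alone
    with open(path, "rb") as src:
        src.seek(size - keep_bytes - 1)  # skip everything before the tail
        src.readline()  # finish the line we landed in
        tail = src.read()
    # Write next to the original, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as dst:
            dst.write(tail)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

For the 5000-line target at about 80 bytes per line, keep_bytes would be about 400,000.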
Edited by Bryce (2005-09-05 09:02 PM)