#158407 - 2006-03-07 01:26 AM Compare two textfiles line per line
therob Offline
Starting to like KiXtart

Registered: 2005-05-19
Posts: 150
Loc: Frankfurt/M., Germany
Hi,
I have a script that runs weekly and basically compares an external user-account database to my Active Directory.
Changes can be made on both sides, so I have to check in each direction.
But at the moment I do that in quite a "brute force" way, I think.

And this is what I do:
First I get a text export of the external user DB with all account names and a few important details
(e.g. email), separated by "#" characters.
Then I export the related AD group, also to a text file.
Now I read every line in the first *.txt and search for it in the second file.
If it is not found, I write an entry to a delta file containing the changes to make.
After that I do the same with the second file compared to the first.

At the moment I've got 3000 user accounts, so the script takes 10 to 15 minutes to
run. That's not nice but acceptable. But in the near future I will have approx.
15000 - 20000 accounts to compare and the runtime will be too long.
Apart from that, I open the second file 3000 times and read every line... ugly scripting, I think...

I've got a fast server with enough memory for the task, so I was thinking about reading the files into memory
and doing the checking there, but I don't know how.



A piece of the script for better understanding (I hope):
Code:


;... Compare 1of4: RD-Database -> AD

$rc = Open (1,"$datapath\$externalDBfile",2)

Do
    $lineA1 = ReadLine (1)
    If $lineA1 <> ""
        $arrayA1 = Split ($lineA1,"#",4)
        $loginA1 = $arrayA1[0]
        probarplus (5)

        $rc = Open (5,"$datapath\$ADexpAll",2)
        Do
            $lineB1 = ReadLine (5)
            If $lineB1 <> ""
                $arrayB1 = Split ($lineB1,"#",2)
                $loginB1 = $arrayB1[1]
                If $loginB1 = $loginA1 $rc = Close (5) Goto A1next EndIf
            EndIf
        Until @ERROR
        $rc = Close (5)

        ; User not found
        $wlineA1 = $arrayA1[0]+"#"+$arrayA1[1]+"#"+$arrayA1[2]+"#$Remarks#Create_User@crlf"
        $rc = WriteLine (4,$wlineA1)
        $deltaA1 = $deltaA1 + 1
    EndIf
    :A1next
Until @ERROR



any ideas?

thanks, rob

[Edited by NTDOC to remove long lines.]


Edited by NTDOC (2006-03-07 09:41 AM)

#158408 - 2006-03-07 02:10 AM Re: Compare two textfiles line per line
NTDOC Administrator Offline
Administrator
*****

Registered: 2000-07-28
Posts: 11623
Loc: CA
Well, I don't think your current methodology will work correctly.
You're comparing line by line, but once one file has an entry that the other one doesn't, then from that
point forward ALL entries from both files will not match.

Though I'm sure something could be coded with KiX I think I might go with a tool designed for such comparison work.

Not a free solution, but a quite cheap and very powerful comparison tool.

ExamDiff Pro
http://www.prestosoft.com/ps.asp?page=edp_examdiffpro

Here is another such tool. I like ExamDiff Pro better, though this one may be better some day.

Ultra Compare by authors of UltraEdit
http://www.ultraedit.com/index.php?name=Content&pa=showpage&pid=34


ps.. Please edit your post and remove the long lines to make reading easier without
needing to scroll the window.


Edited by NTDOC (2006-03-07 02:17 AM)

#158409 - 2006-03-07 03:33 AM Re: Compare two textfiles line per line
Glenn Barnas Administrator Offline
KiX Supporter
*****

Registered: 2003-01-28
Posts: 4396
Loc: New Jersey
Wellll...

Here's a process where named arrays (Hashes) would be wonderful. Each named element would contain a 2 element array - a flag indicating the status from group 1 and group 2. Once you create your hash, you just enumerate it to find the items with only one element defined. If the flag is in element 1, it needs to be added to group 2. If it is only in element 2, it needs to be deleted from group 1. The HASH only contains an index of what exists in each group - you need to use this to find the data to move from group 1 to group 2. You could put all the data into the hash, but that might make it more difficult to manage.
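A minimal sketch of that flag idea, assuming the Scripting.Dictionary COM object is used as the hash and a single number stands in for the 2-element flag array (1 = seen in group 1, 2 = seen in group 2, 3 = seen in both). The sample lists and the MarkSeen() helper are made up for illustration; treat this as untested.

```kixtart
; Sketch only: one dictionary keyed by account name, value = status flag.
; 1 = only in group 1, 2 = only in group 2, 3 = in both groups.
; $aGroup1, $aGroup2 and MarkSeen() are hypothetical, for illustration.
Dim $dict, $aGroup1, $aGroup2, $keys, $name, $flag

$aGroup1 = "alice", "bob", "carol"
$aGroup2 = "bob", "carol", "dave"

$dict = CreateObject("Scripting.Dictionary")

For Each $name In $aGroup1
    MarkSeen($dict, $name, 1)
Next
For Each $name In $aGroup2
    MarkSeen($dict, $name, 2)
Next

; Enumerate the hash and report every name that was seen on one side only.
$keys = $dict.Keys
For Each $name In $keys
    $flag = $dict.Item($name)
    Select
        Case $flag = 1
            $name + " exists only in group 1" ?
        Case $flag = 2
            $name + " exists only in group 2" ?
    EndSelect
Next

Function MarkSeen($d, $key, $group)
    Dim $flag
    $flag = $group
    If $d.Exists($key)
        If $d.Item($key) <> $group
            $flag = 3              ; already seen in the other group
        EndIf
        $d.Remove($key)            ; Add fails on an existing key, so remove first
    EndIf
    $d.Add($key, $flag)
EndFunction
```

The dictionary only holds the index of what exists where; the data to move would still be looked up in the source files.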

Howard Bullock has a HASH library - a collection of UDFs that might give you this feature. Something to look into.

You can test the logic with an INI file.. I have a situation where I need to gather data that might have many duplicates - I write each item as an element in a section of an ini file, then just enumerate that section to get a unique list. Sorting isn't important, but lack of dups is, although QSort would make quick work if I needed them sorted.

In another case, I need to track how many times object X exists in a log file, and gather 5 pieces of data about that object. Again, I use an INI file, this time with a separate section for each object, and the values of the section representing the data fields. I do a read/increment/write of the COUNT value to keep a running tally of how many times it occurred. Named arrays would be faster, but I typically have only 1-200 items in the first example, and generally less than 15,000 in the second. When processing a source file with 15,000 records, it completes on my P4-3000 in 30-40 seconds. (actually, I process 11 files ranging from 100 lines to 15,000 lines, and all 11 complete in 43 seconds. Have never timed a single file.)

You could try the INI method by having a section define the account name, a set of values that are read from the master, and a flag entry that indicates it exists in the target. Simply enumerate the INI file, and every time you read the flag entry and it comes up null, you know you need to read the rest of the values from that section and send them to the target environment.
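A rough sketch of that INI approach, assuming KiXtart's WriteProfileString() and ReadProfileString(), and relying on ReadProfileString() returning the section names separated by line feeds when the section argument is empty. The paths, the value names ("Name", "Email", "InTarget") and the login#name#email layout are assumptions, and the code is untested.

```kixtart
; Sketch only: an INI file used as a simple keyed store.
; All paths, value names and the field layout are assumptions.
Dim $ini, $line, $f, $sections, $login, $rc

$ini = "C:\DATA\compare.ini"
If Exist($ini)
    Del $ini                       ; start from an empty INI file
EndIf

; Pass 1: one section per account found in the master file.
If Open(1, "C:\DATA\master.txt") = 0
    $line = ReadLine(1)
    While @ERROR = 0
        $f = Split($line, "#")
        If UBound($f) >= 2
            $rc = WriteProfileString($ini, $f[0], "Name", $f[1])
            $rc = WriteProfileString($ini, $f[0], "Email", $f[2])
        EndIf
        $line = ReadLine(1)
    Loop
    $rc = Close(1)
EndIf

; Pass 2: flag every account that also exists in the target file.
If Open(2, "C:\DATA\target.txt") = 0
    $line = ReadLine(2)
    While @ERROR = 0
        If $line <> ""
            $f = Split($line, "#")
            $rc = WriteProfileString($ini, $f[0], "InTarget", "1")
        EndIf
        $line = ReadLine(2)
    Loop
    $rc = Close(2)
EndIf

; Pass 3: a section whose flag comes up empty is missing from the target.
$sections = Split(ReadProfileString($ini, "", ""), Chr(10))
For Each $login In $sections
    If $login <> ""
        If ReadProfileString($ini, $login, "InTarget") = ""
            $login + " is missing from the target, email: " + ReadProfileString($ini, $login, "Email") ?
        EndIf
    EndIf
Next
```

Accounts that exist only in the target end up as sections holding nothing but the flag, so this sketch covers one direction only. How well the section enumeration copes with many thousands of sections is something to test before relying on it.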

Anyway, just some ideas.. the INI file method is easy to implement without complex coding or UDFs, and lets you look at the "array" by editing the file itself.

I can share a few code fragments if you want a head start on the INI method - just email me.

Glenn
_________________________
Actually I am a Rocket Scientist! \:D

#158410 - 2006-03-07 03:39 AM Re: Compare two textfiles line per line
NTDOC Administrator Offline
Administrator
*****

Registered: 2000-07-28
Posts: 11623
Loc: CA
See and now we await Richard and Howard on this subject


I know there are ways to code it, but I suppose for $39 it's been faster / easier to purchase a tool to do this. But by coding it one learns much more about coding and potentially has a tool that might be even more flexible.

 

#158411 - 2006-03-07 03:42 AM Re: Compare two textfiles line per line
Glenn Barnas Administrator Offline
KiX Supporter
*****

Registered: 2003-01-28
Posts: 4396
Loc: New Jersey
Sun Sparc & diff - $1,000's
Commercial diff tools - $40
Cygwin diff - free

Learning how to do it, and getting EXACTLY what you need...

Priceless, even if it is a pain in the wrist!
_________________________
Actually I am a Rocket Scientist! \:D

#158412 - 2006-03-07 03:44 AM Re: Compare two textfiles line per line
NTDOC Administrator Offline
Administrator
*****

Registered: 2000-07-28
Posts: 11623
Loc: CA
Oh, and there is this method too.

fc
Compares two files or sets of files and displays the differences between them.

FC [/A] [/C] [/L] [/LBn] [/N] [/OFF[LINE]] [/T] [/W] [/nnnn]
   [drive1:][path1]filename1 [drive2:][path2]filename2
FC /B [drive1:][path1]filename1 [drive2:][path2]filename2

  /A         Displays only first and last lines for each set of differences.
  /B         Performs a binary comparison.
  /C         Disregards the case of letters.
  /L         Compares files as ASCII text.
  /LBn       Sets the maximum consecutive mismatches to the specified
             number of lines.
  /N         Displays the line numbers on an ASCII comparison.
  /OFF[LINE] Do not skip files with offline attribute set.
  /T         Does not expand tabs to spaces.
  /U         Compare files as UNICODE text files.
  /W         Compresses white space (tabs and spaces) for comparison.
  /nnnn      Specifies the number of consecutive lines that must match
             after a mismatch.
  [drive1:][path1]filename1
             Specifies the first file or set of files to compare.
  [drive2:][path2]filename2
             Specifies the second file or set of files to compare.



and WinDiff
http://www.grigsoft.com/download-windiff.htm

#158413 - 2006-03-07 07:06 AM Re: Compare two textfiles line per line
Howard Bullock Offline
KiX Supporter
*****

Registered: 2000-09-15
Posts: 5809
Loc: Harrisburg, PA USA
Hashes would make a nice solution. I already did something similar in Perl. Don't use my all-KiX hash UDFs. They will be too slow. Use your VBS help file and look up Dictionary Objects. These are M$'s associative arrays (HASHes in Perl). They will be able to provide you a fairly elegant solution.

There is some KiXtart code that uses this COM object in the following thread: http://www.kixtart.org/ubbthreads/showflat.php?Cat=&Board=UBB13&Number=154245

Basically you will load each set of data into its own Dictionary Object. Then you can cycle through each one to determine your ADDs, DELETEs, and Changes. I can assist you with the coding if you run into trouble.
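For anyone who has not used the object before, here is a small sketch of the basic Dictionary calls from KiXtart (Add, Exists, Item, Count, Keys, CompareMode). The account lines are made-up sample data and the snippet is untested.

```kixtart
; Sketch only: basic Scripting.Dictionary usage with made-up sample data.
Dim $dict, $keys, $key

$dict = CreateObject("Scripting.Dictionary")
$dict.CompareMode = 1       ; 1 = text compare, so "JSmith" and "jsmith" are the same key
                            ; (has to be set while the dictionary is still empty)

; Key = account name, value = the whole line, so more than one detail travels with the key.
$dict.Add("jsmith", "jsmith#John Smith#Sales")
$dict.Add("mjones", "mjones#Mary Jones#Finance")

If $dict.Exists("JSMITH")
    "Found: " + $dict.Item("jsmith") ?
EndIf

"Entries: " + $dict.Count ?

; Keys returns an array that For Each can walk.
$keys = $dict.Keys
For Each $key In $keys
    $key + " works in " + Split($dict.Item($key), "#")[2] ?
Next
```

Note that Add raises an error when the key already exists, so either check Exists first or remove the old entry before adding.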
_________________________
Home page: http://www.kixhelp.com/hb/

#158414 - 2006-03-07 07:38 AM Re: Compare two textfiles line per line
NTDOC Administrator Offline
Administrator
*****

Registered: 2000-07-28
Posts: 11623
Loc: CA
See, and now we're just waiting on Richard's reply

Not that others could not step in as well, but I know that this type of coding was in Howard's area of specialty and I'm guessing that Richard might have done something in this area as well.

#158415 - 2006-03-07 09:22 AM Re: Compare two textfiles line per line
Richard H. Administrator Offline
Administrator
*****

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
Personally I'd code something up using ports of sort and diff (or sdiff) from *nix - there are freely available ports of the textutils package to do this kind of thing.

I also posted a KiXtart compare script which could be leveraged to do this task a short while ago. It only reads each file once, so it doesn't suffer from the repeated-read problem (re-reading the second file for every line of the first). It also does not keep either of the files in memory, so file size should not be a big issue.

Have a search of the board, and if you cannot locate it post again and I'll see if I can dig it out.

The only pre-requisite is that the files must be sorted first, which is not normally a problem.
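A sketch of what such a single-pass compare of two sorted files could look like. This is not the script referred to above, just an illustration of the technique. It assumes the key is the first "#" field, that neither file contains blank lines, and that both files are sorted in the same order the "<" operator uses for strings. File names are placeholders and the code is untested.

```kixtart
; Sketch only: walk two SORTED files in step, reading each file once.
; File names are placeholders; key = first "#" field; no blank lines assumed.
Dim $lineA, $lineB, $eofA, $eofB, $keyA, $keyB, $rc

If Open(1, "C:\DATA\sortedA.txt") = 0 And Open(2, "C:\DATA\sortedB.txt") = 0
    $lineA = ReadLine(1)
    $eofA = @ERROR
    $lineB = ReadLine(2)
    $eofB = @ERROR

    While $eofA = 0 Or $eofB = 0
        If $eofA = 0
            $keyA = Split($lineA, "#")[0]
        EndIf
        If $eofB = 0
            $keyB = Split($lineB, "#")[0]
        EndIf

        Select
            Case $eofB <> 0 Or ($eofA = 0 And $keyA < $keyB)
                ; B has ended, or A is behind: this key exists only in A
                $keyA + " is missing from file B" ?
                $lineA = ReadLine(1)
                $eofA = @ERROR
            Case $eofA <> 0 Or $keyB < $keyA
                ; A has ended, or B is behind: this key exists only in B
                $keyB + " is missing from file A" ?
                $lineB = ReadLine(2)
                $eofB = @ERROR
            Case 1
                ; keys match: advance both files
                $lineA = ReadLine(1)
                $eofA = @ERROR
                $lineB = ReadLine(2)
                $eofB = @ERROR
        EndSelect
    Loop
EndIf
$rc = Close(1)
$rc = Close(2)
```

Only one line per file is held at any time, so memory use stays flat however large the files get.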

#158416 - 2006-03-07 09:40 AM Re: Compare two textfiles line per line
NTDOC Administrator Offline
Administrator
*****

Registered: 2000-07-28
Posts: 11623
Loc: CA
I think this is the post in question.

Compare contents directory
http://www.kixtart.org/ubbthreads/showflat.php?Cat=0&Number=156159
 

#158417 - 2006-03-07 09:42 AM Re: Compare two textfiles line per line
NTDOC Administrator Offline
Administrator
*****

Registered: 2000-07-28
Posts: 11623
Loc: CA
Do you have links to these apps you're suggesting Richard?
#158418 - 2006-03-07 10:03 AM Re: Compare two textfiles line per line
Richard H. Administrator Offline
Administrator
*****

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
Quote:

I think this is the post in question.

Compare contents directory
http://www.kixtart.org/ubbthreads/showflat.php?Cat=0&Number=156159




Yes, that is the one that I meant but I see that I'm comparing lists in memory. Somewhere I have a script for comparing files without loading them in memory first - I'll have a dig around if I get time.

Having said that, the compare-in-memory UDF will probably work just as well for files of a few MB.

#158419 - 2006-03-07 01:13 PM Re: Compare two textfiles line per line
Howard Bullock Offline
KiX Supporter
*****

Registered: 2000-09-15
Posts: 5809
Loc: Harrisburg, PA USA
From the original post I gathered that we are comparing two different account databases. Most corporate user account names that I have come across have been 10 characters or less. The files mentioned most likely could be parsed and only two pieces of data from each line would need to be stored in memory for the compare. So I do not see the file sizes as an issue. 20,000 accounts can easily be stored in memory.

He also mentions group membership, which clouds the issue of exactly how the logic would flow. More detail in this area is needed in my opinion. Adding group memberships may require a re-reading of the file or some additional memory data structure.
_________________________
Home page: http://www.kixhelp.com/hb/

#158420 - 2006-03-07 01:37 PM Re: Compare two textfiles line per line
Sealeopard Offline
KiX Master
*****

Registered: 2001-04-25
Posts: 11164
Loc: Boston, MA, USA
What about loading the data into a database and then comparing the records? It might be as simple as getting the data into an Excel spreadsheet and then using ADODB to run some SQL statements on it.
_________________________
There are two types of vessels, submarines and targets.

#158421 - 2006-03-07 02:26 PM Re: Compare two textfiles line per line
Richard H. Administrator Offline
Administrator
*****

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
This is an example using a modified version of the UDF mentioned earlier:
Code:
$aArray1="abc#123#456","def#123#456","ghi#123#456","xyz#123#456"
$aArray2="abc#123#456","ghi#999#999","xyz#123#456"

"Results of compare:"+@CRLF
udfCompareArray($aArray1,$aArray2,"#")

Function udfCompareArray($avList1,$avList2,Optional $sDelimiter)
    Dim $iIndex1,$iIndex2
    $iIndex1=0
    $iIndex2=0
    If Not $sDelimiter $sDelimiter=@CRLF EndIf
    While $iIndex1<=UBound($avList1) OR $iIndex2<=UBound($avList2)
        Select
            Case $iIndex1>UBound($avList1)
                $udfCompareArray=$udfCompareArray+@CRLF+$avList2[$iIndex2]+" missing from array 1"
                $iIndex2=$iIndex2+1
            Case $iIndex2>UBound($avList2)
                $udfCompareArray=$udfCompareArray+@CRLF+$avList1[$iIndex1]+" missing from array 2"
                $iIndex1=$iIndex1+1
            Case Split($avList1[$iIndex1],$sDelimiter)[0] > Split($avList2[$iIndex2],$sDelimiter)[0]
                $udfCompareArray=$udfCompareArray+@CRLF+$avList2[$iIndex2]+" missing from array 1"
                $iIndex2=$iIndex2+1
            Case Split($avList2[$iIndex2],$sDelimiter)[0] > Split($avList1[$iIndex1],$sDelimiter)[0]
                $udfCompareArray=$udfCompareArray+@CRLF+$avList1[$iIndex1]+" missing from array 2"
                $iIndex1=$iIndex1+1
            Case "Match"
                If $avList2[$iIndex2]<>$avList1[$iIndex1]
                    $udfCompareArray=$udfCompareArray+@CRLF+"'"+$avList1[$iIndex1]+"' values do not match '"+$avList2[$iIndex2]+"'"
                EndIf
                $iIndex1=$iIndex1+1
                $iIndex2=$iIndex2+1
        EndSelect
    Loop
    $udfCompareArray=SubStr($udfCompareArray,3)
EndFunction



The UDF has been modified to allow an optional field delimiter.

To use this compare function:
  • Load each of the files into an array using something like the ReadFile() UDF
  • The key field must be the first field.
  • The arrays must be sorted in ascending alphanumeric order. You may either sort the files first, or use the quick sort QS() UDF to sort the arrays once they have been loaded.
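As a rough illustration of the first step, a file could be loaded into an array along these lines. LoadLines() is a made-up helper written for this sketch, not the ReadFile() UDF itself, and it is untested. It skips empty lines and grows the array in chunks.

```kixtart
; Sketch only: read a text file into an array, one element per non-empty line.
; LoadLines() is a hypothetical helper, not the ReadFile() UDF.
Function LoadLines($sFile)
    Dim $aLines[99], $iCount, $sLine, $iHandle, $rc
    $iCount = 0
    $iHandle = FreeFileHandle()        ; first unused file handle, 0 if none is free
    If $iHandle > 0
        If Open($iHandle, $sFile) = 0
            $sLine = ReadLine($iHandle)
            While @ERROR = 0
                If $sLine <> ""
                    If $iCount > UBound($aLines)
                        ; array is full: roughly double its size
                        ReDim Preserve $aLines[UBound($aLines) * 2 + 1]
                    EndIf
                    $aLines[$iCount] = $sLine
                    $iCount = $iCount + 1
                EndIf
                $sLine = ReadLine($iHandle)
            Loop
            $rc = Close($iHandle)
        EndIf
    EndIf
    If $iCount > 0
        ReDim Preserve $aLines[$iCount - 1]   ; trim the unused tail
        $LoadLines = $aLines
    EndIf
EndFunction

; Example call (placeholder path):
; $aExternal = LoadLines("C:\DATA\FILE1.TXT")
```

The arrays returned this way still have to be sorted before they are passed to udfCompareArray().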

#158422 - 2006-03-07 02:51 PM Re: Compare two textfiles line per line
Howard Bullock Offline
KiX Supporter
*****

Registered: 2000-09-15
Posts: 5809
Loc: Harrisburg, PA USA
Quote:

Then, i export the related AD-group, also to a text file.



What does that mean? Please give more details about your business process. Are you only comparing this external user DB to an OU, a security group, or all the user accounts in the AD? Your answer will definitely impact my suggested methodology.
_________________________
Home page: http://www.kixhelp.com/hb/

#158423 - 2006-03-07 03:16 PM Re: Compare two textfiles line per line
Howard Bullock Offline
KiX Supporter
*****

Registered: 2000-09-15
Posts: 5809
Loc: Harrisburg, PA USA
Untested. This is the basic program flow using VBS Dictionary Objects.
Code:

Dim $objDictionary1, $objDictionary2
$objDictionary1 = CreateObject("Scripting.Dictionary")
$objDictionary2 = CreateObject("Scripting.Dictionary")


; open and read file #1 (External Users DB)
Dim $Array, $x, $rc, $colKeys, $strKey
IF Open(3, "C:\DATA\FILE1.TXT") = 0
    $x = ReadLine(3)
    WHILE @ERROR = 0
        $Array = split($x,"#")

        ; Use the array element for the user account as the KEY.
        ; (Adjust the index to whichever field holds the account name.)
        ; The value at this time does not matter so set it to 1
        ; This may change as the program evolves
        IF UBound($Array) >= 1    ; skip blank or short lines
            DictionarySetValue($objDictionary1, $Array[1], "1")
        ENDIF

        $x = ReadLine(3)
    LOOP
    $rc = Close(3)    ; free handle 3 so it can be reused for file #2
ENDIF

; open and read file #2 (AD users)
IF Open(3, "C:\DATA\FILE2.TXT") = 0
    $x = ReadLine(3)
    WHILE @ERROR = 0
        $Array = split($x,"#")

        ; Same as above: account name as the KEY, value set to 1
        IF UBound($Array) >= 1    ; skip blank or short lines
            DictionarySetValue($objDictionary2, $Array[1], "1")
        ENDIF

        $x = ReadLine(3)
    LOOP
    $rc = Close(3)
ENDIF


; Which external users exist in AD
$colKeys = $objDictionary1.Keys
For Each $strKey in $colKeys
    If $objDictionary2.Exists($strKey)
        "Specified External User ($strKey) Exists in AD" ?
    Else
        "Specified External User ($strKey) DOES NOT Exist in AD" ?
        ;do something?
    EndIf
Next
??

; Which AD users exist in the external User DB
$colKeys = $objDictionary2.Keys
For Each $strKey in $colKeys
    If $objDictionary1.Exists($strKey)
        "Specified AD User ($strKey) Exists in the External User DB" ?
    Else
        "Specified AD User ($strKey) DOES NOT Exist in the External User DB" ?
        ;do something?
    EndIf
Next
??



function DictionarySetValue($DictionaryObject, $key, $value)
    if vartypename($DictionaryObject) <> "Object"
        exit 13
    endif
    if vartypename($key) = "Empty"
        exit 13
    endif

    ; Add fails on an existing key, so drop the old entry first
    if $DictionaryObject.Exists($key)
        $DictionaryObject.Remove($key)
    endif

    $DictionaryObject.Add($key, $value)
    exit @error
endfunction

_________________________
Home page: http://www.kixhelp.com/hb/

#158424 - 2006-03-07 07:05 PM Re: Compare two textfiles line per line
therob Offline
Starting to like KiXtart

Registered: 2005-05-19
Posts: 150
Loc: Frankfurt/M., Germany
Wow, I didn't expect that many replies that fast. Thanks.

The task in a little more detail (forgive my bad English):
Basically I have four files filled with user-account details (not sorted). Two files are generated by an external user database.
Two files are generated by me, and each of the two contains all members of a specified AD group.

Each of the four files has a counterpart from the other database. So I have two "pairs".

Now I have to compare each "pair" against each other. So I take a line (an account) from one file (say, File 1) and check whether it is in the counterpart file (say, File 2). If yes, it's OK; if no, I have to generate a "delta" file which contains the username and the action "create user". And so on. When I reach the end of File 1, I do the same with File 2: take every line and check whether it is in File 1 as well...

As said before, the script is finished and works well. But I would like to speed up the compare part because the files will grow much bigger. I would like to stick with KiXtart, because if I use another tool for the comparison now, I fear I will have to rewrite lots of my code (the whole script is about 2000 lines because, apart from the GUI, it contains a lot of functions for creating and handling the delta files, creating and deleting AD accounts, etc.)

OK, now I will check out the suggestions above in detail...

cheers, rob
_________________________
Eternity is a long time, especially towards the end. - W.Allan

#158425 - 2006-03-08 06:58 PM Re: Compare two textfiles line per line
therob Offline
Starting to like KiXtart

Registered: 2005-05-19
Posts: 150
Loc: Frankfurt/M., Germany
Howard, I'm speechless. Your method does the job in under one second!
Great idea and thanks for the code! Thanks for ALL suggestions, btw.

A few questions though:
Until now I have only used ADSI COM scripting, so I'm not totally new to the usage of objects, but I didn't know the 'dictionary object'. What is that? Is it possible to assign more than one value to a key? How do I read the value(s) of a key then?

I don't understand why you remove the key if it already exists (middle part of the function). Why not just 'exit' if it already exists?


cheers, rob


Edited by robweg (2006-03-08 07:02 PM)
_________________________
Eternity is a long time, especially towards the end. - W.Allan

#158426 - 2006-03-08 08:13 PM Re: Compare two textfiles line per line
NTDOC Administrator Offline
Administrator
*****

Registered: 2000-07-28
Posts: 11623
Loc: CA
Quote:

Your method does the job in under one second!





Holy Smoke Batman - not sure I have a need for it, but dang if it can reduce the run time by that type of margin that's amazing.

Might have to look into that further.
