#178123 - 2007-07-20 10:25 PM
Convert TXT file from UTF-8 to ANSI
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
Hi folks,
I read some old topics but couldn't find a solution to my problem. I'm downloading a CSV file from a website (normally the file won't be larger than 500 KB) that comes in UTF-8 encoding. What I want to do via Kix is, I think, pretty simple:
$Filename = 'C:\report.csv'
Open the file in Notepad, change the encoding to ANSI, and save the file, overwriting the old one.
Thanks !
#178124 - 2007-07-20 11:08 PM
Re: Convert TXT file from UTF-8 to ANSI
[Re: green78]
|
Allen
KiX Supporter
Registered: 2003-04-19
Posts: 4545
Loc: USA
|
I've used the following to convert Unicode text files to ANSI... not sure about the UTF-8 format though:
shell '%comspec% /c type ' + $filename + '>' + $newfilename
#178125 - 2007-07-20 11:22 PM
Re: Convert TXT file from UTF-8 to ANSI
[Re: Allen]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
Hi Allen,
Sorry, but I tried it and it doesn't work for UTF-8 :(. There are some Cyrillic characters inside.
Basically I will be happy if I have something like:
Open the file with Notepad, change the option to ANSI, Save. That's it.
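As an aside, here is a minimal Python sketch (not Kix, and not something used in this thread) of why a plain byte copy cannot be enough. It assumes the redirect leaves the UTF-8 bytes as they are and that the target ANSI code page is cp1251. The function name misread_as_ansi is made up for illustration.

```python
def misread_as_ansi(text, ansi_codec="cp1251"):
    """Show what UTF-8 bytes look like when read as a single-byte code page."""
    raw = text.encode("utf-8")
    # errors="replace" covers byte values that the code page leaves undefined.
    return raw.decode(ansi_codec, errors="replace")


sample = "Привет"
garbled = misread_as_ansi(sample)
# Every Cyrillic letter is two bytes in UTF-8, so it shows up as two characters.
print(len(sample), len(garbled))
```

Plain ASCII text survives such a copy unchanged, which may be why the problem only shows up on the Cyrillic parts of the file.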
#178128 - 2007-07-20 11:49 PM
Re: Convert TXT file from UTF-8 to ANSI
[Re: Allen]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
Yes, I know SendKeys, but since this script should work on a locked PC, I don't think it will be of much use... and if I lose focus for a second...
Thanks anyway, it seems this is a bigger problem than I thought.
#178138 - 2007-07-21 10:04 AM
Re: Convert TXT file from UTF-8 to ANSI
[Re: NTDOC]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
Hi NTDOC,
Actually I was looking across the web for a solution that would involve nothing more than Notepad and Kix.
Doing this manually is so easy that I can't believe it's so hard to automate. What I do to convert the file manually is:
1. Open the file in "Notepad"
2. Select "File" --> "Save As"
3. Select Encoding "ANSI"
4. Choose Filename and path....
That's it.
It will probably be easier to do this via CreateObject("Word.Application") and then Save As with the proper encoding. I will check whether this is possible and post here if something works out.
#178139 - 2007-07-21 11:49 AM
Re: Convert TXT file from UTF-8 to ANSI
[Re: green78]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
OK folks, here's what I came up with; it pretty much suits me:
It is a combination of VB and Kix (VB plus MS Word does most of the job).
The Kix code only starts MS Word with the macro switch:
====================================== Kix file: Transform.kix ======================================
$WordApp = "C:\Program Files\Microsoft Office\Office11\Winword /mMacro2"
SHELL $WordApp
Macro2() in Visual Basic in your Normal.dot in MS Word would be:
Sub Macro2()
'
' Macro2 Macro
' Macro recorded 21.7.2007 by BOBBYD
'
ChangeFileOpenDirectory "C:\"
Documents.Open FileName:="report.csv", ConfirmConversions:=False, ReadOnly _
:=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _
:="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _
, Format:=wdOpenFormatAuto, XMLTransform:="", Encoding:=65001
ChangeFileOpenDirectory "C:\"
ActiveDocument.SaveAs FileName:="reportANSI.csv", FileFormat:=wdFormatText _
, LockComments:=False, Password:="", AddToRecentFiles:=True, _
WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
False, Encoding:=1251, InsertLineBreaks:=False, AllowSubstitutions:=False _
, LineEnding:=wdCRLF
Application.Quit
End Sub
It's pretty simple and can convert any TXT file from UTF-8 to ANSI (and other encodings, of course)... still not as clean as I wanted, but if nothing better comes up I will use it.
Edited by green78 (2007-07-21 12:00 PM)
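For comparison, the same re-encoding step (code page 65001 in, 1251 out, as in the macro above) can be sketched without Word. This is only an illustration in Python, not Kix. The function name convert_utf8_to_ansi, the default codec and the choice to replace unmappable characters with "?" are assumptions, not something taken from the thread.

```python
from pathlib import Path


def convert_utf8_to_ansi(src, dst, ansi_codec="cp1251"):
    """Re-encode a UTF-8 text file into a single-byte Windows code page."""
    # "utf-8-sig" strips a leading BOM if there is one and accepts files without it.
    text = Path(src).read_bytes().decode("utf-8-sig")
    # errors="replace" writes "?" for characters the code page cannot represent.
    Path(dst).write_bytes(text.encode(ansi_codec, errors="replace"))
```

Working on bytes rather than in text mode keeps the line endings of the source file as they are. A call would look like convert_utf8_to_ansi("C:/report.csv", "C:/reportANSI.csv").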
#178145 - 2007-07-21 11:05 PM
Re: Convert TXT file from UTF-8 to ANSI
[Re: Witto]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
"In your solution, you have to rely on the existence of a macro and the filenames always have to be the same. Maybe with some COM scripting, you could create a more flexible and reusable script."
I would, Witto, the only problem is that I can't :).
I'm a newbie in Kix, and my knowledge stops here. I only found what suits me at the moment: fast conversion of a text file from UTF-8 to ANSI (cp1251 in my case). I needed a fast and fully automated solution.
I know it's rubbish as a composite, but that's why I posted it, in case someone more experienced can turn it into a neater solution.
My guess is CreateObject("Word.Application"), open the document... SaveAs....
And I stop here; I don't know how to set the encoding to what I want.
Maybe a UDF would be perfect here, but I don't know if it would be useful for many people, and yes, you must have MS Word installed, which is another dependency.
#178146 - 2007-07-22 01:49 AM
Re: Convert TXT file from UTF-8 to ANSI
[Re: green78]
|
Witto
MM club member
Registered: 2004-09-29
Posts: 1828
Loc: Belgium
|
Here is some explanation about the constants: Word 2003 VBA Language Reference: Word Enumerated Constants. You can reuse the macro you created. With some imagination and AdminScriptEditor, which reveals all the arguments, you could write something like:
;*************************************************************************
; Script Name:
; Author: green78
; Date: 21/07/2007
; Description:
;*************************************************************************

;Script Options
If Not @LOGONMODE
    Break On
Else
    Break Off
EndIf
Dim $RC
$RC = SetOption("Explicit", "On")
$RC = SetOption("NoMacrosInStrings", "On")
$RC = SetOption("NoVarsInStrings", "On")
If @SCRIPTEXE = "KIX32.EXE"
    $RC = SetOption("WrapAtEOL", "On")
EndIf

;Declare variables
Dim $strSrcFile, $strDstFile
Dim $wdOpenFormatAllWord, $wdOpenFormatAuto, $wdOpenFormatDocument, $wdOpenFormatEncodedText
Dim $wdOpenFormatRTF, $wdOpenFormatTemplate, $wdOpenFormatText, $wdOpenFormatUnicodeText
Dim $wdOpenFormatWebPages
Dim $wdFormatDocument, $wdFormatDOSText, $wdFormatDOSTextLineBreaks, $wdFormatEncodedText
Dim $wdFormatFilteredHTML, $wdFormatHTML, $wdFormatRTF, $wdFormatTemplate, $wdFormatText
Dim $wdFormatTextLineBreaks, $wdFormatUnicodeText, $wdFormatWebArchive, $wdFormatXML
Dim $wdCRLF, $wdCROnly, $wdLFCR, $wdLFOnly, $wdLSPS
Dim $objWord

;Initialize variables
$strSrcFile = "C:\test\test-UDF-8.txt"
$strDstFile = "C:\test\test-ANSI.txt"

$wdOpenFormatAllWord = 6
$wdOpenFormatAuto = 0
$wdOpenFormatDocument = 1
$wdOpenFormatEncodedText = 5
$wdOpenFormatRTF = 3
$wdOpenFormatTemplate = 2
$wdOpenFormatText = 4
$wdOpenFormatUnicodeText = 5
$wdOpenFormatWebPages = 7

$wdFormatDocument = 0
$wdFormatDOSText = 4
$wdFormatDOSTextLineBreaks = 5
$wdFormatEncodedText = 7
$wdFormatFilteredHTML = 10
$wdFormatHTML = 8
$wdFormatRTF = 6
$wdFormatTemplate = 1
$wdFormatText = 2
$wdFormatTextLineBreaks = 3
$wdFormatUnicodeText = 7
$wdFormatWebArchive = 9
$wdFormatXML = 11

$wdCRLF = 0
$wdCROnly = 1
$wdLFCR = 3
$wdLFOnly = 2
$wdLSPS = 4

$objWord = CreateObject("Word.Application")
If @ERROR
    ? "Error creating Word object"
    ? "Error " + @ERROR + ": " + @SERROR
    Exit @ERROR
EndIf

;$objWord.Visible = True ;No need to make this visible
;$RC = $objWord.Documents.Open($strSrcFile,0,0,0,"","",0,"","",$wdOpenFormatAuto,65001,,,,,"")
$RC = $objWord.Documents.Open($strSrcFile) ;You will see this is enough
;$RC = $objWord.ActiveDocument.SaveAs($strDstFile,$wdFormatText,0,"",1,"",0,0,0,0,0,1252,0,0,$wdCRLF)
$RC = $objWord.ActiveDocument.SaveAs($strDstFile,$wdFormatText) ;You will see this is enough
$objWord.Application.Quit
Here is the essential and simplified part:
Break ON
$strSrcFile = "C:\test\test-UDF-8.txt"
$strDstFile = "C:\test\test-ANSI.txt"
$objWord = CreateObject("Word.Application")
$RC = $objWord.Documents.Open($strSrcFile)
$RC = $objWord.ActiveDocument.SaveAs($strDstFile,2)
$objWord.Application.Quit
#178150 - 2007-07-22 11:53 AM
Re: Convert TXT file from UTF-8 to ANSI
[Re: Witto]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
Witto,
This is great. Many thanks for your effort. This is exactly what it has to be: simple and effective :).
Using Word is inevitable in this case, since I don't see enumerations for Notepad :).
This will do the job perfectly, many thanks once again.
#201265 - 2010-12-22 08:43 PM
Re: Convert TXT file from UTF-8 to ANSI
[Re: green78]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
Hi guys,
Would it be possible to extend the script below with a ReplaceAll step? The part from "With Selection.Find" through "Selection.Find.Execute" (shown in blue in the original post) is VBScript. I hope it can be translated so that the whole thing can be done via Kix only?
Thank you!
Break ON
$strSrcFile = "C:\test\test-UDF-8.txt"
$strDstFile = "C:\test\test-ANSI.txt"
$objWord = CreateObject("Word.Application")
$RC = $objWord.Documents.Open($strSrcFile)
With Selection.Find
    .Text = "~"
    .Replacement.Text = ","
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
$RC = $objWord.ActiveDocument.SaveAs($strDstFile,2)
$objWord.Application.Quit
Edited by green78 (2010-12-22 09:04 PM)
#201268 - 2010-12-22 11:56 PM
Re: Convert TXT file from UTF-8 to ANSI
[Re: green78]
|
ShaneEP
MM club member
Registered: 2002-11-29
Posts: 2125
Loc: Tulsa, OK
|
Well, I gave it my best shot, but I can't find the correct syntax for the .Execute statement. If I just do a simple .Execute(), it seems to find the text successfully, but I can't get the Replace parameter right. Hopefully someone with more COM experience will come along soon and help you out.
Break ON
$strSrcFile = "C:\test\test-UDF-8.txt"
$strDstFile = "C:\test\test-ANSI.txt"
$strFind = "~"
$strReplace = ","
$objWord = CreateObject("Word.Application")
$RC = $objWord.Documents.Open($strSrcFile)
$objWord.Selection.Find.Text = $strFind
$objWord.Selection.Find.Replacement.Text = $strReplace
$objWord.Selection.Find.Forward = 1
$objWord.Selection.Find.Format = 0
$objWord.Selection.Find.MatchCase = 0
$objWord.Selection.Find.MatchWholeWord = 0
$objWord.Selection.Find.MatchWildcards = 0
$objWord.Selection.Find.MatchSoundsLike = 0
$objWord.Selection.Find.MatchAllWordForms = 0
$RC = $objWord.Selection.Find.Execute(Replace:=wdReplaceAll)
$RC = $objWord.ActiveDocument.SaveAs($strDstFile,2)
$RC = $objWord.Application.Quit
Edited by CitrixMan (2010-12-22 11:57 PM)
#201269 - 2010-12-23 12:47 AM
Re: Convert TXT file from UTF-8 to ANSI
[Re: ShaneEP]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
Thanks a lot, CitrixMan... this gives me some guidance.
I'm trying to figure it out from here (the Execute method as it relates to Find):
http://msdn.microsoft.com/en-us/library/aa171990(v=office.11).aspx
but can't get it right.
Edit: Woo-hoo, I made it :), finally... here it is:
Break ON
$strSrcFile = "C:\kix\test1.txt"
$strDstFile = "C:\kix\test2.txt"
$strFind = "~"
$strReplace = ","
$objWord = CreateObject("Word.Application")
$RC = $objWord.Documents.Open($strSrcFile)
$RC = $objWord.ActiveDocument.Content.Find.Execute($strFind, , , , , , , , , $strReplace, 2)
$RC = $objWord.ActiveDocument.SaveAs($strDstFile,2)
$RC = $objWord.Application.Quit
Edited by green78 (2010-12-23 12:53 AM) Edit Reason: I made it :)
#201270 - 2010-12-23 01:03 AM
Re: Convert TXT file from UTF-8 to ANSI
[Re: green78]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
Basically, the idea of the whole exercise was to open a "CSV" file that uses ~ (tilde) as the separator and is UTF-8 encoded, get rid of all commas in it (the code below replaces them with periods), then change the tildes to commas and save it as an ANSI-encoded, pure CSV file without excess commas.
Here is the full code, in case someone is interested in such a scenario or in just replacing a string in a text file:
Break ON
$strSrcFile = "C:\kix\test1.txt"
$strDstFile = "C:\kix\test2.txt"
$strFind = ","
$strReplace = "."
$strFind2 = "~"
$strReplace2 = ","
$objWord = CreateObject("Word.Application")
$RC = $objWord.Documents.Open($strSrcFile)
$RC = $objWord.ActiveDocument.Content.Find.Execute($strFind, , , , , , , , , $strReplace, 2)
$RC = $objWord.ActiveDocument.Content.Find.Execute($strFind2, , , , , , , , , $strReplace2, 2)
$RC = $objWord.ActiveDocument.SaveAs($strDstFile,2)
$RC = $objWord.Application.Quit
Edited by green78 (2010-12-23 01:06 AM)
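The order of the two replacements above matters, and a small Python sketch (again not Kix, purely for illustration) makes that visible. The function name tilde_csv_to_comma_csv and the sample string are made up.

```python
def tilde_csv_to_comma_csv(text):
    """Turn tilde-separated text into comma-separated text."""
    # Existing commas go first; otherwise they could not be told apart
    # from the separators created in the second step.
    text = text.replace(",", ".")
    return text.replace("~", ",")


if __name__ == "__main__":
    print(tilde_csv_to_comma_csv("Sofia, BG~1,5~ok"))  # prints: Sofia. BG,1.5,ok
```

Swapping the two steps would turn every field separator into a period as well, which is presumably why the script runs the comma replacement before the tilde replacement.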
#212537 - 2017-06-03 09:32 PM
Re: Convert TXT file from UTF-8 to ANSI
[Re: green78]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
Bringing this topic up again as I have a new challenge. So far, with your help, I managed to automate the conversion of a TXT file from UTF-8 to ANSI through MS Word. The problem is that MS Office applications shouldn't be used for server-side automation, as error messages may pop up that need interaction.
Long story short, I need to convert a TXT file from UTF-8 encoding to ANSI without using MS Office applications. What I managed to find online is a VBScript that uses ADODB.Stream, but when I convert it to a Kix script it doesn't work properly.
Can someone take a look and advise what I am doing wrong? I'm getting an error of type: "Expected ")" at line
$stream = CreateObject("ADODB.Stream")
$stream.Open
$stream.Type = 2 'text
$stream.Charset = "utf-8"
$stream.LoadFromFile "C:\input.txt"
$text = stream.ReadText
$stream.Close
$fso = CreateObject("Scripting.FileSystemObject")
$f = fso.OpenTextFile("C:\output.txt", 2, True, True)
$f.Write $text
$f.Close
#212538 - 2017-06-04 04:59 PM
Re: Convert TXT file from UTF-8 to ANSI
[Re: green78]
|
green78
Fresh Scripter
Registered: 2007-05-02
Posts: 34
|
After some more searching on the web, it looks like PowerShell can be an easy solution (for the output file I use the machine's Default encoding instead of "ascii", as I need to preserve the Cyrillic characters). I guess I will go with the below:
gc -en utf8 utf8.txt | Out-File -en default out.txt
or this
[io.file]::ReadAllText("c:\kix\utf8.txt", [System.Text.Encoding]::utf8) | %{[io.file]::WriteAllText("c:\kix\out.txt", $_, [System.Text.Encoding]::Default)}
#212539 - 2017-06-06 07:30 PM
Re: Convert TXT file from UTF-8 to ANSI
[Re: green78]
|
ShaneEP
MM club member
Registered: 2002-11-29
Posts: 2125
Loc: Tulsa, OK
|
$stream = CreateObject("ADODB.Stream")
$stream.Open()
$stream.Type = 2
$stream.Charset = "utf-8"
$stream.LoadFromFile("C:\input.txt")
$text = $stream.ReadText()
$stream.Close()
$fso = CreateObject("Scripting.FileSystemObject")
$f = $fso.OpenTextFile("C:\output.txt", 2, 1, 0) ; -2 for Default, -1 for Unicode, 0 for ASCII
$f.Write($text)
$f.Close()