How to Replace Texts in a File before Importing It Into Access

As part of our ongoing series of articles covering importing data into MS Access we’re going to cover another common aspect of getting a file ready to be imported – namely, what do you do when you have files to import that have header lines or footers with page numbers on them? Again, as common as this is, MS Access doesn’t have a built in method for handling these files so traditionally you’re left with having to use text processing tools like Monarch Pro to pre-process the files before you import them. But what if you don’t have access to those tools? Or don’t want the added complication of maintaining yet another software package? In this article we’ll show you how, using a little VBA, you can remove those types of lines from the file before you start your import.

An example – removing page footers or headers before importing your file

Import Your File Into AccessFor this article, we’ll use a very common situation for our example – namely, we have a file that contains footer text that has the page number on it. Naturally we don’t want to attempt to import those lines into our table as it won’t be in the correct format, so what we need to do is to remove all of those lines from the file before we import it. Before we get stuck in to putting the solution together, always keep in mind that even best intentions for data cleaning can potentially cause issues if the file you’re importing has any major issues that you’re not aware of – the results of importing a damaged or incorrectly formatted file could even potentially corrupt Access file – so do take the time to make sure you know what needs to be cleaned from any file you’re importing before you begin!

The approach

How we’re going to do this is pretty straight forward. First we will read the whole file into a string, we’ll then split that string into an array that we can loop over, and finally we’ll loop over that array, line by line, and write any line except those that match the pattern we’re searching for back out to a file. The code below will allow you to use either create a new cleaned file, or overwrite the existing file by passing the same file name for both the fileIn and fileOut parameters.

The code

Sub CleanFile(fileIn As String, fileOut As String, removePattern As String, Optional positionOfText As String = "anywhere")
    Dim fs As Variant
    Dim filObj As Variant
    Dim fileString As String
    Dim fileArray As Variant
    Dim doWrite As Boolean
    
    'open the file for reading
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set filObj = fs.OpenTextFile(fileIn, 1)
    'read the whole file in one go
    fileString = filObj.readall
    'and split into an array
    fileArray = Split(fileString, vbNewLine)
    
    filObj.Close
    
    'create our output file - overwrite if it already exists
    Set filObj = fs.CreateTextFile(fileOut, True)
    For Each ln In fileArray
        doWrite = False
        
        'only output the line if it does NOT contain the string we're looking for
        Select Case UCase(positionOfText)
            
            Case "START"
                If Left(Trim(ln), Len(removePattern)) <> removePattern Then doWrite = True
            
            Case "END"
                If Right(Trim(ln), Len(removePattern)) <> removePattern Then doWrite = True
            
            Case "ANYWHERE"
            If InStr(ln, removePattern) = 0 Then doWrite = True
            Case Else
                MsgBox "Invalid parameter given - valid options are: START, END, or ANYWHERE", vbExclamation, "Incorrect Parameter given"
        
        End Select
        
        If doWrite Then filObj.writeline ln

    Next
    
End Sub

Explaining the code

VBA CodeSo that we can use the code in many different situations, we’ve added an option to say where on the line you’re looking for text – i.e. the end of the line (the date a report is ran is often found there for example), the start of the line (which for example could be the report name or the page number), or anywhere on the line. The “Select Case” section of code checks in the relevant position, and if the text we’re looking for doesn’t exist then the line won’t be written out to the new file.

Using the code

Going back to our example, let’s say each page on the report/file we want to import has a footer that reads:

Page 1 of 30

What we need to do here is when we are looping through the individual lines in our code array, we obviously need to ignore all lines that start with “Page “ – there may or may not be spaces at the start of the line, but these are stripped out using the Trim() function in the code.

To remove those lines from our file we would simply use the following command:

CleanFile(“H:\Input.txt”, “H:\Output.txt”, “Page “, “START”)

Once ran, the Output.txt file would contain all lines except the page footers – which is exactly what we want.

Author Introduction:

Mitchell Pond is a data recovery expert in DataNumen, Inc., which is the world leader in data recovery technologies, including repair SQL Server db and excel recovery software products. For more information visit www.datanumen.com

Leave a Reply

Your email address will not be published. Required fields are marked *