As part of our ongoing series of articles covering importing data into MS Access we’re going to cover another common aspect of getting a file ready to be imported – namely, what do you do when you have files to import that have header lines or footers with page numbers on them? Again, as common as this is, MS Access doesn’t have a built in method for handling these files so traditionally you’re left with having to use text processing tools like Monarch Pro to pre-process the files before you import them. But what if you don’t have access to those tools? Or don’t want the added complication of maintaining yet another software package? In this article we’ll show you how, using a little VBA, you can remove those types of lines from the file before you start your import.
An example – removing page footers or headers before importing your file
For this article, we’ll use a very common situation for our example – namely, we have a file that contains footer text that has the page number on it. Naturally we don’t want to attempt to import those lines into our table as it won’t be in the correct format, so what we need to do is to remove all of those lines from the file before we import it. Before we get stuck in to putting the solution together, always keep in mind that even best intentions for data cleaning can potentially cause issues if the file you’re importing has any major issues that you’re not aware of – the results of importing a damaged or incorrectly formatted file could even potentially corrupt Access file – so do take the time to make sure you know what needs to be cleaned from any file you’re importing before you begin!
How we’re going to do this is pretty straight forward. First we will read the whole file into a string, we’ll then split that string into an array that we can loop over, and finally we’ll loop over that array, line by line, and write any line except those that match the pattern we’re searching for back out to a file. The code below will allow you to use either create a new cleaned file, or overwrite the existing file by passing the same file name for both the fileIn and fileOut parameters.
Sub CleanFile(fileIn As String, fileOut As String, removePattern As String, Optional positionOfText As String = "anywhere") Dim fs As Variant Dim filObj As Variant Dim fileString As String Dim fileArray As Variant Dim doWrite As Boolean 'open the file for reading Set fs = CreateObject("Scripting.FileSystemObject") Set filObj = fs.OpenTextFile(fileIn, 1) 'read the whole file in one go fileString = filObj.readall 'and split into an array fileArray = Split(fileString, vbNewLine) filObj.Close 'create our output file - overwrite if it already exists Set filObj = fs.CreateTextFile(fileOut, True) For Each ln In fileArray doWrite = False 'only output the line if it does NOT contain the string we're looking for Select Case UCase(positionOfText) Case "START" If Left(Trim(ln), Len(removePattern)) <> removePattern Then doWrite = True Case "END" If Right(Trim(ln), Len(removePattern)) <> removePattern Then doWrite = True Case "ANYWHERE" If InStr(ln, removePattern) = 0 Then doWrite = True Case Else MsgBox "Invalid parameter given - valid options are: START, END, or ANYWHERE", vbExclamation, "Incorrect Parameter given" End Select If doWrite Then filObj.writeline ln Next End Sub
Explaining the code
So that we can use the code in many different situations, we’ve added an option to say where on the line you’re looking for text – i.e. the end of the line (the date a report is ran is often found there for example), the start of the line (which for example could be the report name or the page number), or anywhere on the line. The “Select Case” section of code checks in the relevant position, and if the text we’re looking for doesn’t exist then the line won’t be written out to the new file.
Using the code
Going back to our example, let’s say each page on the report/file we want to import has a footer that reads:
Page 1 of 30
What we need to do here is when we are looping through the individual lines in our code array, we obviously need to ignore all lines that start with “Page “ – there may or may not be spaces at the start of the line, but these are stripped out using the Trim() function in the code.
To remove those lines from our file we would simply use the following command:
CleanFile(“H:\Input.txt”, “H:\Output.txt”, “Page “, “START”)
Once ran, the Output.txt file would contain all lines except the page footers – which is exactly what we want.
Mitchell Pond is a data recovery expert in DataNumen, Inc., which is the world leader in data recovery technologies, including repair SQL Server db and excel recovery software products. For more information visit www.datanumen.com