Remove and replace special characters in VBA

Insert Symbol

While pulling some XML from the web I ran into some special characters that display on screen, but the MSXML parser refuses to accept. They need to be converted to normal (i.e. unaccented) characters before we can parse the XML.

I'm referring to the accented characters that look like this: é or ñ

At first I just replaced the one or two characters I find. For example, in the New York State Legislature, there's only one member with a name that needs fixing: Senator José Serrano.

But I need a better solution. What if the next XML response from another API returns a different special character? I can't work on a case by case basis, replacing each character as it comes.

So I went searching and found this solution: Excel – Replace accented chars with regular chars

Here is a slight adaptation of that code. Just pass in a string you want to scrub; any foreign characters will be replaced with their non-accented equivalent. (Did I miss any?) You can use it as a UDF on the worksheet as well as in VBA.

It has some slight changes; for example, instead of always looping through the accented characters list, it loops through the input string if it's shorter. I listed the most likely scenario first (the input string is longer) because right now the strings I'm passing are large XML responses more likely to be longer than the list of accented characters.

Function ConvertAccent(ByVal inputString As String) As String
' http://www.vbforums.com/archive/index.php/t-483965.html

Const AccChars As String = _
    "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
Const RegChars As String = _
    "SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"

Dim i As Long, j As Long
Dim tempString As String
Dim currentCharacter As String
Dim found As Boolean
Dim foundPosition As Long

  tempString = inputString

  ' loop through the shorter string
  Select Case True
    Case Len(AccChars) <= Len(inputString)
      ' accent character list is shorter (or same)
      ' loop through accent character string
      For i = 1 To Len(AccChars)

        ' get next accent character
        currentCharacter = Mid$(AccChars, i, 1)

        ' replace with corresponding character in "regular" array
        If InStr(tempString, currentCharacter) > 0 Then
          tempString = Replace(tempString, currentCharacter, Mid$(RegChars, i, 1))
        End If

      Next i
    Case Len(AccChars) > Len(inputString)
      ' input string is shorter
      ' loop through input string
      For i = 1 To Len(inputString)

        ' grab current character from input string and
        ' determine if it is a special char
        currentCharacter = Mid$(inputString, i, 1)
        found = (InStr(AccChars, currentCharacter) > 0)

        If found Then

          ' find position of special character in special array
          foundPosition = InStr(AccChars, currentCharacter)

          ' replace with corresponding character in "regular" array
          tempString = Replace(tempString, currentCharacter, _
    Mid$(RegChars, foundPosition, 1))

        End If
      Next i
  End Select

  ConvertAccent = tempString
End Function

If the list of accent characters is shorter (the more likely scenario) then we loop through it and assign each character to a String variable. If it exists in the input string, the accented character is replaced with the corresponding character in the regular string list, even if it is found multiple times (the Replace function). So regardless of the size of the input string, the loop will only execute 60 times (the number of accented characters) per string.

If the list of accented characters is longer, then we loop through the input string. Each character is grabbed and assigned to a String variable. If it is found in the list of accented characters, it is replaced with the corresponding character in the regular string list. Even though the list of accented characters contains 60 characters, the loop will only execute n times (the number of characters in the input string).

The reason I did this is because in the original function, you always loop through the accented characters (60 times), even if the string is very short. In my function, instead of a minimum 60 loop function, this one can potentially loop less times (if the input string is less than 60 characters). I hope that makes sense.

Sample usage

Sub TestConvert()
  Debug.Print ConvertAccent("Jimmy Peña")
End Sub

Note that ASAP Utilities can do the same thing.

About JP

I'm just an average guy who writes VBA code for a living. This is my personal blog. Excel and Outlook are my thing, with a sprinkle of Access and Word here and there. Follow this space to learn more about VBA. Keep Reading »


Related Articles:


Share This Article:

Share and bookmark this articledelicious buttonfacebook buttonlinkedin buttonstumbleupon buttontwitter button

comment bubble 6 Comment(s) on Remove and replace special characters in VBA:

  1. Rick Rothstein (MVP - Excel) writes:

    I believe this will execute much quicker than your code even if inputString is much longer than the AccChars string constant. Instr and Mid (used both as a function and a statement) are exceptionally fast whereas the Replace function is not. Sure, for extremely long inputString values containing few accented characters, your code might (note I said "might") win out (I'm not really sure), but I would expect the following code to really shine in the majority of text strings that one could throw at it. If you have some long text files to test both fucntions on, the result of a set of timed test could prove interesting. Anyway, here is my function for your and your reader's consideration…

    Function ConvertAccent(ByVal inputString As String) As String
      ' http://www.vbforums.com/archive/index.php/t-483965.html
      Dim X As Long, Position As Long
      Const AccChars As String = _
            "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
      Const RegChars As String = _
            "SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"
      For X = 1 To Len(inputString)
        Position = InStr(AccChars, Mid(inputString, X, 1))
        If Position Then Mid(inputString, X) = Mid(RegChars, Position, 1)
      Next
      ConvertAccent = inputString
    End Function
  2. Hi Guys,

    I have been looking all over the web for a method to convert the european notation of numbers to the US notation, e.g. 35,00 becomes 35.00. Changing the country settings wasn't possible because it has to work on different workstations and shoud be fool-proof. Also, using the format (number, "o.oo") function didn't work because excel would convert back to european notation immediately.

    By reducing your code (using bedrag1 as my variable) i finally found the solution, so it's only fair to share:

    For X = 1 To Len(bedrag1)
    position = InStr(",", Mid(bedrag1, X, 1))
    If position Then Mid(bedrag1, X) = Mid(".", position, 1)
    Next

    Given that i am a n00b at VBA (started 2 days ago with "hello world") i was surprised it works in such a simple way, but it does :-)

  3. Rick Rothstein (MVP - Excel) writes:

    @Hugo,

    For only two days programming… very good Hugo! And very good of you to notice, since InStr returns only 0 or positive integers, that you do not have to test position>0 in your If..Then statement as VB considers any non-zero value to be True when testing it as logical expression. As you learn more and more about VB and the various functions available, you will start to find there are usually more than one way to do thing in code, some more concise than others.. Just to try to spur you on to read up on all the functions available in VB, here is an alternate way to do what your posted code does…

    bedrag1 = Replace(bedrag1, ",", ".")

  4. Que tal JP, me ha servido mucho tu código aunque necesito algo más, te explico, tengo nombres propios en los cuales tengo caracteres raros por ejemplo :
    MONTAムEZ
    SALDA・A

    los cuales al querer descubrir su ASCII obtengo esto:

    c =  Asc(Mid(ActiveCell, b, 1))
    MsgBox c
    

    en el Msgbox obtengo un "?" y el ASCII es siempre 63 y al intentar cambiarlo nunca resulta, ¿alguna idea?, Slds

This article is closed to any future comments.
learn excel dashboards