Showing posts with label Email. Show all posts
Showing posts with label Email. Show all posts

Tuesday, May 27, 2014

Extracting E-Mail addresses from a file or a web page using Microsoft Excel

Recently I needed to extract E-mail addresses from a range of cells in an Excel file. I looked around for a solution on Internet.

After going through number of solutions, I hit upon a very elegant solution that could be used easily for extracting Email addresses not only from Excel file but from any web page or a word file or anything that can be selected and copied and pasted in to a Excel sheet. The solution is described below.

It's essentially a short VBA (Visual Basic for Applications) code (also called a program) to create a custom function in Excel named FindEmailAddresses(Cell1:Cell2). Where range Cell1:Cell2 would contain the text from where we wish to extract Email addresses.

Assuming we have already created the function (explained later below) - here is a description of how to use it.
1. Assuming cells A1 to C5 contains the text from where we want to extract email ids
2. Insert a new column at A1 - so that original content of A1 to C5 will shift to B1 to D5.
3. Now, select Cells A1 to A4 - where the extracted Email ids would come. Here, we are selecting only 4 rows because we know that number of extracted Email ids will be less than 4. If they are more then select more number of rows.
4. While keeping the selection, in Cell A1 type the function =FindEmailAddresses(B1:D5), as shown below.
5. Now press Ctrl + Shift + Enter. Note that this step is crucial. By doing this, formula in A1 will get copied to all cells up to A4.



A
B
C
D
1
=FindEmailAddresses(B1:C5)
Hello world
Well just a empty cell
Nothing here
2
=FindEmailAddresses(B1:C5)
There is no email here
There is an email id:prakash@gmail.com
Here also nothing
3
=FindEmailAddresses(B1:C5)
Excel is very good tool
Practically this is useless

4
=FindEmailAddresses(B1:C5)
Her email is meena@yahoo.com but it changed
Nothing here
Well this Is good
5

Nothing

Theoretically this is empty

You will the extracted Email ids from B1:D5 in A1:A4 as shown below.
 
A
B
C
D
1
prakash@gmail.com
Hello world
Well just a empty cell
Nothing here
2
meena@yahoo.com
There is no email here
There is an email id:prakash@gmail.com
Here also nothing
3
#NA
Excel is very good tool
Practically this is useless

4
#NA
Her email is meena@yahoo.com but it changed
Nothing here
Well this Is good
5

Nothing

Theoretically this is empty

If all cells A1:A4 shows #VALUE! then save the file and upon opening the file you will see a warning saying 'Macros are disabled' Click on 'Enable Macro' and you will see the Emails ids in A1 to A4. If number of Email ids are less than 4 then you will see #NA in remaining cells. If they are more then select A1:A4 and press 'Del' key. Now, select more number of cells in step-3 above and continue. 
 
Note that if the range B1:D5 has only one Email id then all cells A1:A4 gets filled up with same Email id.

How do I extract Email addresses from my Gmail or Outlook application?
Well, this technique can be used only for extracting Email addresses from a single page. If you wish to extract Email addresses from your Email application - you will have to get programs that are developed specifically to do this work.

How to add user defined function FindEmailAddresses
Before you can use above function, you will have to add the user defined function using following procedure in Excel. It works with all versions of Excel from 2003 to 2013.

  1. Press Alt-F11 to open Visual Basic editor (while you are in the file containing the Email addresses)
  2. On top menu under Insert click on Module.
  3. Copy and paste the user defined function (given below) into new module window.
  4. Exit Visual Basic editor by clicking on 'x'
  5. Use the FindEmailAddresses function in Excel file, as explained above.

Visual Basic for Applications (VBA) Code:
Copy and Paste following code (program) in Visual Basic editor of Excel.
-----------------------------------------
Function FindEmailAddresses(rng As Range) As Variant()
Dim Temp As String, Cell As Range, EM() As Variant
ReDim EM(0)
For Each Cell In rng
Temp = Cell.Value
Do While InStr(Temp, "@")
   EM(UBound(EM)) = GetEmailAddress(Temp)
   Temp = Replace(Temp, "@", "", 1, 1)
   ReDim Preserve EM(UBound(EM) + 1)
   Loop
Next
ReDim Preserve EM(UBound(EM) - 1)
FindEmailAddresses = WorksheetFunction.Transpose(EM)
End Function

Function GetEmailAddress(ByVal S As String) As String
Dim x As Long, AtSign As Long
Dim Locale As String, DomainPart As String
Locale = "[A-Za-z0-9.!#$%&'*/=?^_`{|}~+-]"
Domain = "[A-Za-z0-9._-]"
AtSign = InStr(S, "@")
For x = AtSign To 1 Step -1
    If Not Mid(" " & S, x, 1) Like Locale Then
       S = Mid(S, x)
    If Left(S, 1) = "." Then S = Mid(S, 2)
       Exit For
    End If
Next x
AtSign = InStr(S, "@")
For x = AtSign + 1 To Len(S) + 1
    If Not Mid(S & " ", x, 1) Like Domain Then
       S = Left(S, x - 1)
    If Right(S, 1) = "." Then S = Left(S, Len(S) - 1)
       GetEmailAddress = S
       Exit For
    End If
Next x
End Function
----------------------------------------
Explanatory Notes
This notes are for those who wants to know more about technique used in this solution and further improvements.
1.  VBA code uses simple string comparison to find out presence of @ within the content of a cell. If it found one it found out word before and after it and extracted it as a Email address. This is done in GetEmailAddress function
2. It uses a clever Array Formula technique to store the extracted Email addresses one each in each cell. This is done by Ctrl+Shift+Enter key combination.
3. With some modifications this technique can be used to sort the extracted Email address domain wise (@yourcompany.com).

If you find it useful or have any suggestions - do write back.

Covid Resource for Second Wave

Initial part of this blog has resources useful for all of India. Later part is dedicated to resources for Mumbai city.