Monday, April 21, 2014

How to extract information from a text string into columns in Excel.

In this example the person had rows of information similar to the following in an Excel spreadsheet and wanted to sort the rows in distance order and then select those rows/locations between five and ten kilometres.

Albion Heights (7050) 8.2 km
Battery Point (7004) 1.1 km
Bellerive (7018) 3.2 km
Bonnet Hill (7053) 9.9 km

Each row of text was in a single cell. To sort the rows by distance we need to extract the distance into a separate column. Then we can sort by the distance column and either delete the rows before and after the desired distance, or copy the wanted cells to another sheet.

When I look at data like this I look for patterns I can use. What is unique I can use. I then use the text functions provided by Excel to extract the text I want.

For example, if we wanted only the suburb, notice that after the suburb is a space followed by the left bracket. That means we want the characters before the space and the left bracket. For this we can use  the LEFT() function which allows us to extract the left characters of the text. In this case however the left bracket can appear in a variable location. Excel provides the FIND() function which allows us to find the position of the left bracket no matter where it is located. We can thus use the FIND() function to tell us where the left bracket is located and the end of the suburb is then two characters before.

Let's say the first cell is A1. In the column B1 we can then write the following formula.

=LEFT(A1,FIND("(",A1,1)-2)

Let's look at this formula more closely. We use LEFT() to find the left most characters of the text. The A1 is the cell where we find the text. After the comma we have the FIND() function. In the FIND() function we say we want to find the left bracket (we place the bracket in quotes), we then tell the FIND() function to use the text in cell A1, and the value 1 says to look from the start of the text until the left bracket is found. We then have the position of the left bracket. Now finally, since we don't want the space and the left bracket, we subtract 2 so the LEFT() function only returns the text from the start of A1 to the character before the space and the left bracket.

OK. That was the easy one. We really want the distance. Notice the pattern. There is a right bracket followed by a space before the number. After the number is a space followed by km.

I'll first present the solution and then go through the solution and show how it works.  But first I'll mention something that is very important. You can't assume anything about your data. You need to test your data to make sure there are no exceptions that could cause a problem. If there is an issue you have to decide either to change your solution, or sometimes, fudge your data temporarily so your solution works and then change your data back later. Why have I said this you may ask. There is an issue when writing the formula in this case. First we want to locate the right bracket. We need to check our data for right brackets that might occur elsewhere. If this data is for all postcodes across Australia there are two entries which include two sets of right brackets. They are for Cocos (Keeling) Islands. For these two locations the rows of data will contain two right brackets. There are only two such entries so changing the brackets around Keeling temporarily to say X would remove this issue and the approach will work.

So before using this solution, first change the brackets in Cocos (Keeling) to read Cocos XKeelingX. Then you can use this formula and the distance will be shown. Again we assume the first row of data starts in cell A1.

Here now is one solution.

=TRIM(SUBSTITUTE(RIGHT(A1,LEN(A1)-FIND(")",A1,1)),"km","",1))

In this solution I introduce four more text functions. The LEN() function which allows us to determine the length of the text. The RIGHT() function which allows us to take a number of characters from the right of the text, the SUBSTITUTE() function which I use to remove the characters km from the text, and the TRIM() function which removes spaces to the left and right of the text. The easiest way to understand this formula is to work from the inside out.

First we want to take the right hand characters of the string starting after the right bracket. We use RIGHT(A1,LEN(A1)-FIND(")",A1,1)). We use the text in cell A1. We determine the number of characters by subtracting the position of the right bracket from the length of the text.

Now that we have the right hand characters from the end of the text we have the distance followed by a space and the letters km. A way to get rid of the characters km is to use the SUBSTITUTE() function and replace km with text of zero length.

Finally we use the trim value to remove and spaces from the start and end of the text. That leaves us with just the number and that is what we were after.

To use the solution.

To use this solution, if your data is in column A and starts in cell A1, then copy the solution to B1. You need to do this first. Then copy the contents in B1 to all of the B column cells required. If you have your data in different cells you need to change A1 to match the first cell for your data.

Now to sort your data.

Finally we are now ready to sort the data. Select all your data, select custom sort and sort by the column containing the above formula. You can either copy the range of locations to a new sheet or you can delete the rows you don't want. Make sure you are however working on a copy of your data.

For completeness however, in case you'd like the postcode as well in a separate column, then use the following formula.

=MID(A1,FIND("(",A1,1)+1,4)

The MID() function enables you to extract the text from the middle of the string. To the left of the postcode is a bracket and and the length of a postcode is always four characters. That means you extract four characters starting at the character after the left bracket.

Hopefully you can now see the text functions is Excel are available for your use. Yes they can be tricky but they are also very powerful.

Kelvin Eldridge
www.OnlineConnections.com.au
Call 0415 910 703 for IT support.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.