a scholarly smorgasbord: excel

Showing posts with label excel. Show all posts

Thursday, August 9, 2012

Paste special and transpose in Excel

There are a few ways to do this, but if you're looking for a non-macro shortcut, try this:

Select the cells you want to copy. Ctrl-C to copy. Select the cells where you want to paste. Holding down Alt, type E S E. Choose the options you want in the paste special box (i.e. check values, transpose). Click OK.

Many thanks to MrExcel.com (John) for this tip! He also spells out the macros if you want to go that route here.

Wednesday, August 8, 2012

Separating a text string to cells, columns in Excel

I received a string of numbers in an email and needed to manipulate them, so I copied them to a text document and imported the text into Excel 2010 (under the "data" tab, click on Get External Data: From Text). I chose the "Delimited" option instead of the default "Fixed width", clicked "Next", chose "Delimiters: Space" instead of the default "Tab" (note that you can chose multiple delimiter options), clicked "Next", set the column formats, and clicked "Finish".

Source

Friday, August 19, 2011

Calculating age in Excel

If you have an event date and a date of birth, Excel will calculate the age-at-event for you through this formula:

=INT((A1-B1)/365.25)

Where A1 is the cell with the event date, and B1 is the cell with the birthdate.

Shortcut courtesy of http://excel-formulas.blogspot.com/2005/08/calculate-age-in-excel.html

Tuesday, July 19, 2011

Identifying duplicate records in Excel, ArcGIS

Let me start by saying that I have no doubt there is an easier method. This is just the way I attacked this problem today, and it seemed to work, so I thought I'd share.

There was a question as to why I didn't just use Excel's highlight or delete duplicate action. This is a great addition to the Excel package (I believe it introduced with Excel 2007) and you can read about it here: http://www.lytebyte.com/2008/06/10/highlight-remove-delete-duplicates-excel-2007/. But since I didn't want to blindly delete duplicates, I needed to find a quicker way than eyeballing highlighted cells to investigate and ultimately delete duplicate records.

Problem: I have four layers of patient data, each with over 3000 records. An eyeball review suggests that each layer has duplicate records, defined as records with the same Patient ID and same diagnosis (Dx) date. How do I identify these duplicates so that I can delete them?

Roundabout solution: First, I exported the records for one layer (i.e. 2005 cases) into Excel. I deleted all fields except patient ID, dx date, and STFID. I translated all columns into numbers with no decimal places (select the cells, hit the exclamation point drop down and select "convert to number").

I created a concatenated column named "ID_and_Dx". By using the formula "=CONCATENATE(A2,C2)", I created a record number of patient ID (located in A2) and dx date in numerical form (located in C2).

The important next step: I sorted the data by the concatenated column (ID_and_Dx).

I created a column called "Dup_ID_Dx". This is my indicator variable for duplicate fields. I used the formula "@if(D3=D2,"Dup", "Not Dup"). Filling this formula down through the sheet will mark the first record in a duplicate set as "Dup".

The important next step: I copied the Dup_ID_Dx column, highlighted the next blank column, right click and selected "Paste special", and chose to paste values. This will create a column of "Dup" or "Not Dup" without the formula. I then deleted the formula version column.

One more problem: I noticed that some records have the same patient ID and dx date, but different STFID. I would like to be able to investigate these records later in ArcGIS. How can I denote these kind-of-duplicates in Excel before I bring them back into ArcGIS?

And back to the circuitous solutions: I created another concatenated column and named it "ID_and_Dx_and_STFID". I used the concatenate function as described above to mash up my previously created ID_and_Dx field and my STFID field. Note that if a records couldn't geocode and therefore didn't have an associated STFID, this ID_and_Dx_and_STFID cell only contains the ID_and_Dx value.

I then made another duplicate indicator column, called "ID_Dx_STFID_Dup" and used the formula "@if(F3=F2,"Dup", "Not Dup")" to fill down and identify duplicates. Note that duplicates in this column indicate records that have the same patient ID, Dx date, and STFID. I am actually interested in records that have the same Patient ID and Dx date but different STFID. I will get to these kind-of-duplicates later.

Then it was time to bring it in to ArcGIS10. I created a patient ID field type double and named it "Pat_ID_dbl" on the 2005 case layer on ArcGIS and used the field calculator to populate it with the patient ID. I did this because my duplicate investigation xls document has patient ID listed as a number.

I joined the 2005 layer to the duplicate investigation xls document on the patient ID field. I sorted by the Dup_ID_Dx_ field and selected all duplicate records (using "select by attribute") to calculate my % duplication. I then used "switch selection" to select the non-duplicate records and created a new layer from these selected features (right click on layer name -> data -> export data and make sure you're exporting "selected features"). This will be the layer I will use for my analysis because it contains no duplicate records.

Now, back to the issue of kind-of-duplicates (same patient ID, same Dx date, different STFIDs, indicating that something went quite wrong in the data entry). I used my original, joined layer to select duplicate records (Dup_ID_dx). I created a new layer from this selection (right click layer -> selection -> new layer from selected features). In this layer, I sorted by the kind-of-duplicate field (sort by "ID_Dx_STFID_Dup") so I can investigate the "Not Dup" within these records to try to figure out what's going on with these records. In my case, I noticed all of the patients with the same patient ID and Dx date had the same birthday (DOB). So even though the two kind-of-duplicate records have different home addresses and therefore different STFID's, they seem to indicate that they are true duplicates of the same patient and diagnosis.

Thursday, June 23, 2011

Excel tricks: Parsing out the date

If you have a column with the date in the month/day/year format and you want to create separate columns for day, month, and year, Excel is nice enough to do that for you:

In cell A1, you have 01/02/2001
In cell B1, type =DAY(A1) and Excel will update B1 to read "2"
In cell C1, type =MONTH(A1) and Excel will update C1 to read "1"
In cell D1, type =YEAR(A1) and Excel will update D1 to read "2001"

Thursday, June 9, 2011

ArcGIS 10: Exporting part of an attribute table and renaming fields

Q: I want to create a dbf file containing a few columns from my layer's attribute table to send to a collaborator. I also want to rename some of the fields so she doesn't have to translate my shorthand. Please help!

A: Gladly!

First, you want to turn off the fields you are not interested in exporting: Open your attribute table, right click the field of (un)interest, and select "turn field off". This should remove it from view, but don't worry... you can undo this command by clicking the table options in the upper right corner of the attribute table (the icon looks like a bulleted paper)and selecting "Turn all fields on".

Back to the problem at hand... repeat turning off all of the fields you aren't interested in sharing. Now export the data as a dbf (check this post for info). Use Excel to open the dbf file.

Note that you can of course export your entire attribute table to Excel and remove unwanted fields by deleting the columns in Excel instead of hiding them in ArcGIS, but sometimes there are exportation restrictions on sensitive data, so I wanted to outline how to avoid importing prohibited data into a new program.

The file still has your original field names. We didn't change those in ArcGIS 10 because field aliases are lost when you export the data, so now we just change the column names in Excel.

And now for the silliest step... Excel 2007 won't save files as dbf,so we have to save the Excel file in the old .xls format (Save As Excel 97-2003 Workbook) and re-import it into ArcGIS (notes here) to then export it again as dBase (dbf). Horray for hoops!

Friday, May 6, 2011

Joining insanity in ArcGIS 10, Excel 2010

Problem: I haven't been able to join the Excel file that contains demographic data by census tract to the census tract shape file.

My specific issue is that when I open the Join window, the joinable field names are not selectable... if the CT shapefile field name is selected, the corresponding field name in the Excel file is not available to select, and vice versa.

I downloaded both sets of info from the census website... so why the technical trubs???

After renaming, resaving, repopulating, and refilling my beer, I finally thought to check the field properties for my problem fields in both layers (open attribute table, highlight column of interest, right click column heading, select "Properties", check "Type"). Surprise! My field of interest in my shapefile layer is type "string", while the corresponding field of interest in my Excel file layer is type "double".

Solution: I created a same-type field to complete the join.

Since I couldn't seem to add a string type field to the Excel layer in ArcGIS, I ended up adding a double type field to the shapefile layer: Open attribute table, select "Add field...", named it "GeoId10d" and type "double" leaving the default for precision and scale. Then I started Editor (button that looks like a pencil connecting the dots), selected the appropriate layer to edit, opened the attribute table, right clicked on my new GeoID10d column, selected "Field Calculator" and just double clicked the "GeoID10" column (telling ArcGIS that GeoID10d = [GeoId10]... in English, setting the new GeoID10d column to copy the values from the old GeoID10 column and store those values as type double).

Once I had a matching type field, the join worked! Horray!

Friday, April 29, 2011

Excel notes

Merging two text columns into a single column (concatenation, concatenate):

"=A2&B2"

Remember you can convert numbers into text to accomplish this if needed.

To include a space between text from different columns:

=A2&" "&B2

To convert all caps to capitalize first letter only:

=PROPER(A2&" "&B2)

To convert a formula column into a static column (ie text or number):
Highlight the column of interest, Copy, then Paste Special into a new column-> Value. Can then select all cells and format into appropriate type.

Reference: http://www.mrexcel.com/tip074.shtml

Importing Excel 2007 into ArcGIS 10

Save table as old Excel file .xls. Check for the following (from webhelp@esri):

1. Make sure the first row of the worksheet is properly formatted, since it will be used for the field names in ArcGIS. You should follow these general best practices for field naming, particularly if you want to join an Excel table to another table in ArcMap:

Field names must start with a letter.
Field names must contain only letters, numbers, and underscores.
Field names must not exceed 64 characters.

2. If you have cells with numeric data, dates, and so on, make sure the content is consistently formatted. ArcMap will scan the first eight rows to determine the field type that should be used. If there are other types of data in those rows, the field will be converted to text when the table is in ArcMap.

3. ArcMap can only read the first 255 characters of a cell. If you have more characters than that, ArcMap converts the field to a BLOB type and you won't be able to read its contents.

Exporting ArcGIS 10 attribute table to Excel 2007

Open attribute table. Table options in upper left (button looks like bulleted paper) -> export -> export: all records; output table: save as type dBASE Table in desired folder.

Open folder and manually select Open With: Excel

Friday, March 18, 2011

Importing Excel data into R

First save Excel into text file (see earlier post).
EX: I created a 2x2 table with no headers and saved it to E drive (dat path E:\Table2forR.txt)

In R, check working directory
> getwd()

If it's not correct (in this case, not set to E drive), reset it:
> setwd("E:")

Set R to read your table and rename it (renamed from "Table2forR" to "mydata" here)
> mydata = read.table("Table2forR.txt")

Check it:
> mydata

Should get output:
V1 V2
1 336 355
2 955 947

Saving Excel as Tab delimited for R

Excel:
Save As -> Other formats (may have to scroll down to see this option) -> Save as type: Text (tab delimited) (scroll down to bottom for this option)