Biostatistics Guidelines for Files

Guidelines for the Excel Files to Be Used for Data Analysis

1. Place the variable names in the first row. Make sure the names follow these rules:

  • Variable names can’t be more than 8 characters long
  • Variable names must start with a letter
  • Variable names may only have letters, numbers, or underscores in them
  • Do not use the following characters in variable names: %, $, #, @, !, +, *, ~, ", ., -
  • Do not include any blanks in variable names
  • Be sure that each variable name is unique (no duplicate variable names)
  • Be sure variable names are on the first row only
  • Provide a detailed variable dictionary (as a Word document) to help the statistician understand what each variable represents and what each value means
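
The naming rules above can be checked automatically before a file is submitted. The following Python sketch is one possible check, not a required tool: the function name check_variable_names and the sample header are hypothetical, and treating names that differ only in case as duplicates is an assumption (the guidelines only say names must be unique).

```python
import re

# Rules from the list above: starts with a letter, then only letters,
# digits, or underscores, and at most 8 characters in total.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def check_variable_names(names):
    """Return (name, problem) pairs; an empty list means every name passes."""
    problems = []
    seen = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append((name, "invalid name"))
        # Assumption: names differing only in case count as duplicates.
        key = name.lower()
        if key in seen:
            problems.append((name, "duplicate name"))
        seen.add(key)
    return problems


# Hypothetical first row of a spreadsheet.
header = ["id", "visit", "sys_bp", "dia bp", "2ndvisit", "weight_in_lb", "ID"]
for name, problem in check_variable_names(header):
    print(f"{name!r}: {problem}")
```

A name can fail for several reasons at once (too long, a blank, a leading digit); this sketch reports them all simply as "invalid name".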


2. Make sure the data is in rectangular form: each row represents one observation and each column holds the values of one variable. When a subject has measurements from multiple visits, record one row per visit and add a column for the visit variable. When a measure is taken from each eye, record one row per eye and add a column indicating which eye the measure came from.
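
If the data was first recorded with one row per subject and one column per visit, it can be rearranged into the one-row-per-visit layout described above. The Python sketch below illustrates the idea only; the column names (id, iop_v1, iop_v2), the values, and the function wide_to_long are all hypothetical.

```python
# Hypothetical wide-format records: one row per subject, one column per visit.
wide_rows = [
    {"id": 1, "iop_v1": 15, "iop_v2": 17},
    {"id": 2, "iop_v1": 21, "iop_v2": 19},
]


def wide_to_long(rows, id_col, value_cols, visit_name="visit", value_name="iop"):
    """Turn one row per subject into one row per subject and visit."""
    long_rows = []
    for row in rows:
        # Visit numbers are assigned by position in value_cols (1, 2, ...).
        for visit, col in enumerate(value_cols, start=1):
            long_rows.append(
                {id_col: row[id_col], visit_name: visit, value_name: row[col]}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, "id", ["iop_v1", "iop_v2"])
for row in long_rows:
    print(row)
```

The same approach works for measurements taken from each eye: use an eye column (for example with hypothetical codes such as "R" and "L") in place of the visit number.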

3. Include only the raw data; please do NOT include summarized data. Don’t include extraneous material in your Excel file, such as row or column totals, graphs, comments, or annotations.

4. Include a unique identifying number for each case. If you need more than one identifier, such as Household ID and Subject ID, place these in separate columns. If you have several spreadsheets containing data on the same individuals, include their identifier(s) on each sheet.

5. Only include one value per cell. Don’t enter data such as “120/80” for blood pressure. Enter systolic blood pressure as one variable, and diastolic blood pressure as another variable. Don’t enter data as “A, C, D” or “BDF” if there are three possible answers to a question. Include a separate column for each answer.
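
Cells that already hold combined values can be split with a short script. The Python sketch below is only an illustration: both function names are hypothetical, the answer choices A to F are made up, and coding each answer as 1 (chosen) or 0 (not chosen) is an assumption, since the guideline asks only for a separate column per answer.

```python
def split_blood_pressure(text):
    """Split a reading such as '120/80' into (systolic, diastolic) integers."""
    systolic, diastolic = text.strip().split("/")
    return int(systolic), int(diastolic)


def answers_to_indicators(text, choices):
    """Turn 'A, C, D' or 'BDF' into one 0/1 value per possible answer.

    Assumes every answer is a single letter.
    """
    chosen = {ch for ch in text.upper() if ch.isalpha()}
    return {choice: int(choice in chosen) for choice in choices}


print(split_blood_pressure("120/80"))
print(answers_to_indicators("A, C, D", "ABCDEF"))
```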

6. For a measurement with units, such as “120 lb”, use two columns: one for the number (120) and another for the unit (lb).
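
Where measurements were typed with their units attached, the number and the unit can be separated as in the Python sketch below. The function split_measurement and the pattern it uses are hypothetical; the pattern assumes a plain decimal number followed by a unit made of letters, "%", or "/".

```python
import re

# Assumed layout: optional sign, digits with an optional decimal part,
# optional spaces, then the unit.
MEASUREMENT = re.compile(r"\s*([-+]?\d+(?:\.\d+)?)\s*([A-Za-z%/]+)\s*")


def split_measurement(text):
    """Split text such as '120 lb' into a (number, unit) pair."""
    match = MEASUREMENT.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot split {text!r} into number and unit")
    return float(match.group(1)), match.group(2)


number, unit = split_measurement("120 lb")
print(number, unit)
```

Raising an error for text that does not fit the pattern, rather than guessing, makes unusual entries visible so they can be corrected by hand.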

7. Don’t leave blank rows or columns in the data.

8. Don’t mix numeric and character values (e.g. names and ID numbers) in the same column.
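
A column that mixes numbers and text can be spotted with a simple check. The Python sketch below assumes the cell values have been read as text (for example from a CSV export); the function names are hypothetical, and blank cells are ignored rather than counted as either kind.

```python
def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def column_is_mixed(values):
    """Return True if a column holds both numeric and non-numeric entries."""
    non_blank = [v for v in values if v.strip() != ""]
    kinds = {is_number(v) for v in non_blank}
    return len(kinds) > 1


print(column_is_mixed(["101", "102", "Smith"]))
print(column_is_mixed(["101", "", "103"]))
```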

9. Date values are best entered in three columns: one for month, one for day, one for year.
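
Dates that were already entered in a single cell can be split into the three columns with a few lines of Python. This sketch assumes the dates were typed in month/day/year order (for example "03/15/2021"); the function name split_date is hypothetical, and the format string must be changed if the dates use a different order.

```python
from datetime import datetime


def split_date(text, fmt="%m/%d/%Y"):
    """Split a date string into separate month, day, and year values."""
    parsed = datetime.strptime(text, fmt)
    return {"month": parsed.month, "day": parsed.day, "year": parsed.year}


print(split_date("03/15/2021"))
```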

10. If you have missing values, indicate them with a numeric code, such as 99 or 999, or leave the cell blank. Be sure that the missing value code is not confused with a “real” data value.
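
One rough way to check that a missing value code cannot be mistaken for real data is to confirm that it lies outside the range of the observed values. The Python sketch below is a heuristic only; the function name and the sample ages are hypothetical, and blank cells are represented as None.

```python
def missing_code_is_safe(values, code):
    """Return True if `code` lies outside the range of the real values."""
    # Drop blanks (None) and the code itself before looking at the range.
    observed = [v for v in values if v is not None and v != code]
    if not observed:
        return True
    return not (min(observed) <= code <= max(observed))


# Hypothetical age column: 99 could be a real age here, 999 could not.
ages = [34, 99, 102, None, 57]
print(missing_code_is_safe(ages, 99))
print(missing_code_is_safe(ages, 999))
```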

11. Save the spreadsheet with values only – not formulas.

12. Do not underline text, or use boldface or italics.

