Stat 445: Introduction to Exploratory Data Analysis

This is archived information for Stat 445 Sect 201 (Spring, 2005).

Rough Marking Guidelines

In grading Midterm Project #1, I used the following table of marking guidelines. I'll probably use a similar marking scheme for the second project (and probably also the final project), though it might change once I've reviewed the reports.

Component                                            Marks

Statistical Content
  Description of data and "problem"                     10
  Exploratory data analysis                             20
  Confirmatory analysis of select questions             15
  Conclusions                                           10
  Open issues, limitations of data or "study"            5
  Total                                                 60

Report Style and Presentation
  Level appropriate to audience                         10
  Organization and structure                            10
  Presentation of analysis                              10
  Clarity of direction, presentation, and conclusions   10
  Total                                                 40

Detailed Explanation

Statistical Content

About 60% of the marks will be for actual statistical content. It's a little hard to be specific here, since telling you exactly what kind of analysis I want to see would pretty much defeat the purpose of you doing the project. There are a few things I'll be looking for, though:

A description of the dataset and the "problem".

The description here can be brief, but someone unfamiliar with this dataset should be able to read your report without getting lost. For this project, there's no clear-cut "problem" to solve (unlike with, say, Forbes' Alps data, where the idea was to use the boiling point of water to estimate air pressure). However, you should make clear what sorts of questions you expect the data to answer, based on what you think the audience of your report would expect to learn from this dataset.

Use of exploratory and confirmatory data analysis techniques.

As we've mentioned in class, there's not a clear distinction between these two phases of data analysis, so you may or may not find it helpful to actually have formal "Exploratory" and "Confirmatory" data analysis sections in your report. However, I'd like to see evidence that you've explored the data enough to find avenues of inquiry suggested by the data itself and that you've then applied appropriate statistical techniques to more directly answer the questions raised.

Clear and relevant conclusions and interpretation.

Your statistical conclusions should be clearly stated, and they should be interpreted for your audience so they can understand the relevance of your findings.

Open issues and data/study limitations.

You'll also want to discuss the ways in which the dataset itself or the way the data were collected might affect the scope of your conclusions.

Throughout, I'll also be looking for the use of appropriate statistical techniques, checks for violations of assumptions, appropriate treatment of missing data or outliers (if any), and, above all, the use of the statistical consultant's most powerful weapon: common sense.

Report Style and Presentation

The report style and presentation will be worth about 40% of your project mark. Here, I want to see:

Writing at an appropriate level.

You should be writing this report as if to a client who may not have much statistical training. You should avoid using jargon or including too many formulas. You should also be careful not to get so involved in the intricacies of a cool, new statistical technique that you lose sight of your client's practical interest in the data.

At the same time, your analysis should be described in sufficient detail that another statistical consultant reading your report can understand and duplicate your results. This can be a difficult balancing act.

Organization and structure of your report.

There should be a clear and concise executive summary that states the report's key findings and conclusions. The body of the report should be organized logically and preferably divided into sections that reflect that organization.

Effective presentation of your analysis.

The results of your analysis should be presented in a clear and effective manner. In particular, graphs and tables should make sense, should be relevant, and should do a good job highlighting whichever aspect of the data you're interested in. Their meaning and relevance should be clearly explained, and they should be referenced in a logical way in the main text of the report.

Also, it is generally a bad idea to include R output in your report (unless you think there's some important reason to do so). Your client doesn't want to learn about R. He or she wants to learn about the data.

Clarity of direction, presentation, and conclusions.

Throughout your report, it should be clear what you are trying to accomplish, how your analysis supports your goals, and how your conclusions follow from your analysis.
