Graphics are a fundamental part of data analysis, used for initial data inspection and exploration, for model building and checking, and for communicating information. In this course we will teach the basics of static graphics and then move on to newer developments in direct manipulation and dynamic graphics that facilitate exploratory data analysis. The methods taught are readily available in open source software, so all participants can reproduce, extend and use them with their own data after the workshop.
This course is targeted at anyone interested in learning a new way of looking at their data or learning about tools that make producing graphics easier. We will use R to demonstrate static graphics and to link analysis and exploratory graphics, so a basic knowledge of R will be helpful, but not necessary. Ideally, you should have read a book by Bill Cleveland, Naomi Robbins, Stephen Few or Edward Tufte, as these authors all touch on important themes in statistical graphics.
If you are already familiar with GGobi or ggplot2, this course may be too basic, although you will receive expert hands-on instruction that you wouldn't otherwise get.
Please bring your own laptop. Closer to the course we'll let you know what you need to install beforehand.
The course will be split into two roughly equal parts: static graphics, and direct manipulation/dynamic graphics. We will alternate between instructional and hands-on components. The presentations will provide a solid foundation in the use of graphics, and the hands-on components will give you the practical skills needed to apply these techniques to your own data.
You will learn how to create a wide variety of static graphics using the ggplot2 R package. In particular, you will learn:
With these basic tools in hand, we will explore how we can apply ggplot2 to different problem domains:
Data examples will include diamond prices, movie ratings, and automobile fuel economy, with sizes ranging from 200 to 50,000 rows.
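To give a flavour of what a static graphic in ggplot2 looks like, here is a minimal sketch. It assumes that the `mpg` (fuel economy, 234 rows) and `diamonds` (diamond prices, 53,940 rows) datasets bundled with ggplot2 resemble the course data; the choice of variables, scales and transparency is illustrative only and is not taken from the course materials.

```r
library(ggplot2)

# Assumption: the datasets bundled with ggplot2 stand in for the course data.
# Small data: engine size against highway fuel economy, coloured by car class.
p_small <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

# Larger data: transparency and log scales help with heavy overplotting.
p_large <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  scale_x_log10() +
  scale_y_log10()

print(p_small)
print(p_large)
```

The same grammar (data, aesthetic mappings, geoms, scales) is used for both plots; only the pieces that need to change for the larger dataset are swapped.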
The day will conclude with a discussion of inference for data graphics. Inference for graphics helps you to confirm that you've found something real with your exploratory graphics, not just a random fluctuation. This is an important tool, and creating these graphics will tie together many of the themes of the day.
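One way to make this idea concrete is a "lineup": the plot of the real data is hidden among plots of data generated under a null hypothesis, and if a viewer can pick out the real one, the pattern is unlikely to be a random fluctuation. The sketch below is an assumption about how such a graphic might be built, not necessarily the method used in the course. It uses the `mpg` data bundled with ggplot2 and shuffles one variable to create the null plots; the names `n_panels`, `true_pos` and `lineup` are hypothetical.

```r
library(ggplot2)

set.seed(1)  # fixed only so that the sketch is repeatable
n_panels <- 20
true_pos <- sample(n_panels, 1)  # panel that will hold the real data

# One data frame per panel: the real data in one panel, and hwy shuffled
# against displ (which breaks any association) in all the others.
panels <- lapply(seq_len(n_panels), function(i) {
  d <- data.frame(displ = mpg$displ, hwy = mpg$hwy)
  if (i != true_pos) d$hwy <- sample(d$hwy)
  d$panel <- i
  d
})
lineup <- do.call(rbind, panels)

p <- ggplot(lineup, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ panel)
print(p)

# true_pos holds the answer; look at the plot before inspecting it.
```

If the real panel stands out from the shuffled ones, that is informal evidence that the relationship between engine size and fuel economy is not just noise.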
Direct manipulation and dynamic graphics will be demonstrated with GGobi and the R package rggobi, which provides access to GGobi from R. GGobi is an open source visualization program for exploring high-dimensional data. It provides highly interactive and dynamic graphics, such as linked windows and tours, built on familiar displays: scatterplots, bar charts and parallel coordinate plots. Direct manipulation of the plots includes scaling, moving points, linked brushing and identification using categorical variables. In this section of the course you will learn about:
These techniques will be applied to multiple application areas including: