Salt Lake City, 28 Jul 2007
Workshop details
Graphics are a fundamental part of data analysis, used in initial data inspection and exploration, in model building and checking, and in communicating information. In this course we will teach the basics of static graphics and then move on to new developments in direct manipulation and dynamic graphics that facilitate exploratory data analysis. The methods taught are readily available in open source software, enabling all participants to reproduce, extend and use them with their own data after the workshop.
Who should take this course?
This training session is targeted at anyone interested in learning a new way of looking at their data or learning about new tools that make producing graphics easier. We will use R to demonstrate static graphics and to link analysis and exploratory graphics, so a basic knowledge of R will be helpful, but is not necessary. Ideally, attendees should have read one of Bill Cleveland's books (e.g. Visualizing Data), as these introduce important themes in statistical graphics. If you are already familiar with GGobi or ggplot, this course may be too basic, although you will receive expert hands-on instruction that you would not otherwise get.
What will you learn?
The course will be split into two roughly equal parts, static graphics and direct manipulation/dynamic graphics, which will form the morning and afternoon segments. We will alternate between instructional and hands-on components. The presentations will provide a solid foundation in the use of graphics, and the hands-on components will give attendees the practical skills needed to apply these techniques to their own data.
Static graphics
Static graphics will be illustrated using the ggplot R package. The static graphics section contains the following parts:
- Building blocks of a plot. Description of a plot as a mapping from data to visual properties of graphical objects. Analysis of several plots to determine their components.
- Basic grammar/syntax. How the components of a plot can be described using a formal grammar, presented using the declarative syntax from the Grammar of Graphics, and the functional syntax of ggplot.
- Geometric objects and statistics. These control exactly what data is displayed and what it looks like.
- Scales. Scales adjust how data values are mapped to aesthetic values. We will cover default scales, adjusting scales, and defining your own scales.
- Facetting. Displaying the same graph for different subsets of your data is often useful, and is called facetting (or conditioning, or trellising). How can we do this with ggplot?
- What to plot. We have discussed in depth how to create a pre-planned plot, but not how to choose that plot. To conclude the static graphics section we will discuss some guidelines of Tukey, Chambers and Cleveland.
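The ideas in this outline (a plot as a mapping from data to visual properties, scales that turn data values into aesthetic values, and facetting into subsets) can be sketched in a few lines. The workshop itself uses R and ggplot; the sketch below is plain Python, not ggplot code, and is only meant to illustrate the concepts. The function names (linear_scale, facet, build_points) and the toy data are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def linear_scale(domain, out_range):
    """Return a function mapping data values in `domain` linearly onto `out_range`."""
    (d0, d1), (r0, r1) = domain, out_range

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


def facet(rows, by):
    """Split the rows into subsets (panels) keyed by the value of column `by`."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    return dict(panels)


def build_points(rows, mapping, scales):
    """Turn each data row into a point with one scaled value per aesthetic."""
    return [
        {aes: scales[aes](row[col]) for aes, col in mapping.items()}
        for row in rows
    ]


# Hypothetical toy data, invented for this illustration only.
data = [
    {"weight": 2.0, "mpg": 30.0, "gear": "manual"},
    {"weight": 3.0, "mpg": 20.0, "gear": "auto"},
    {"weight": 4.0, "mpg": 10.0, "gear": "auto"},
]

# The mapping says which data column drives which aesthetic.
mapping = {"x": "weight", "y": "mpg"}

# Each aesthetic gets a scale from data units to plot units (here 0 to 100).
scales = {
    "x": linear_scale((2.0, 4.0), (0.0, 100.0)),
    "y": linear_scale((10.0, 30.0), (0.0, 100.0)),
}

# Facetting: the same mapping and scales are applied to each subset.
panels = {
    key: build_points(rows, mapping, scales)
    for key, rows in facet(data, "gear").items()
}
```

Sharing one set of scales across all panels is what keeps facetted plots comparable: the same data value lands at the same position in every panel.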
Direct manipulation/dynamic graphics
Direct manipulation and dynamic graphics will be demonstrated with GGobi and the R package rggobi, which provides access to GGobi from R. GGobi is an open source visualization program for exploring high-dimensional data. It provides highly interactive and dynamic graphics, such as linked windows and tours, built on the familiar scatterplot, barchart and parallel coordinates plots. Direct manipulation on the plots includes scaling, moving points, linked brushing and identification using categorical variables. In this section of the course you will learn about:
- The toolbox, which contains a collection of basic plot types, ways to link multiple plots and tour methods for examining multivariate data.
- How to use direct manipulation and dynamic graphics to rapidly explore data and uncover new and unexpected features.
- Several application areas:
  - Missing values: How are missing values distributed in the data? Are they missing at random, completely at random or not at random? Do the imputed values match the distribution of the complete data?
  - Supervised classification: How can we explore the class structure in a labelled data set in multiple dimensions? How do we check that the data is consistent with the assumptions of the classification method? How do we assess the results of black box methods such as support vector machines (SVM) and neural networks using graphics?
  - Cluster analysis: How do we examine the cluster structure in multivariate data? How can we compare the results from several clustering algorithms? Does the model parameterization in model-based clustering match the variance-covariance present in the data? How do self-organizing maps (SOM) compare to multidimensional scaling (MDS) as a method for summarizing the interpoint distances?
  - Multivariate longitudinal data analysis: Using functional data analysis tools in R, we will explore how to connect data modelling and exploration.
- Inference for graphics, so that we can check that what we see is really there.
More questions?
If you have any more questions, please do not hesitate to contact us at h.wickham@gmail.com