*Kickstarting* R - Adding lines to a plot
Many lines that are added to plots are just straight lines that span the plot.
`abline()` is a good choice for this type of line. Say that we wished
to add a vertical line at 2.5 on the x axis to the plot to divide the women who
completed high school from those who didn't.

`> abline(v=2.5,col=3,lty=3)`

This would produce a green, dotted, vertical line across the plot. To divide
the other axis, say that age 33 was to be marked.

`> abline(h=33,col=4,lty=2)`

would draw a blue, dashed, horizontal line at 33 on the y axis. We can also
display regression lines.

`> abline(lm(infert$age~as.numeric(infert$educ)),col=2,lty=1)`

This draws a solid, red line illustrating the regression of age on education
(with the education factor treated as a numeric code).
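
As a rough sketch of the same idea, the fitted model can be stored first, so that its coefficients can be inspected before the line is drawn. The names `educ_num` and `fit` are introduced here purely for illustration; `abline()` accepts either the model object or an explicit intercept `a` and slope `b`.

```r
data(infert)

# Convert the education factor to its numeric level codes (1, 2, 3)
educ_num <- as.numeric(infert$education)

plot(educ_num, infert$age,
     xlab = "Education (numeric level code)", ylab = "Age")

# Regress age on the numeric education code
fit <- lm(infert$age ~ educ_num)
coef(fit)    # intercept and slope

# Either call draws the same solid red line
abline(fit, col = 2, lty = 1)
abline(a = coef(fit)[1], b = coef(fit)[2], col = 2, lty = 1)
```

Keeping the model in a variable also makes it easy to call `summary(fit)` before deciding whether the line deserves a place on the plot.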

## Hypothetical distribution curves

Sometimes a hypothetical distribution curve for the data illustrated will give
the viewer a better notion of how the distribution in the population might look
(we sincerely hope). If you can write down the function that describes the
distribution you think underlies the data, you can use `curve()` to
add it to your plot. Using the `airquality` data, plot a histogram of
`airquality$Ozone`. Suppose you think that the probability of a given
concentration of ozone on any day is described by two linear functions, one
valid for the range 0 to 120, and the other for 120 and up.

> data(airquality)
> airhist<-hist(airquality$Ozone)
> curve(40-(x/3.3+1),from=0,to=120,add=TRUE)
> curve(6.6-(x/30),from=120,to=180,add=TRUE)
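
The two pieces can also be written as one function and drawn with a single call. The sketch below is only an illustration: `ozone_guess` is a made-up name, and the breakpoint at 120 is taken from the text above.

```r
data(airquality)
airhist <- hist(airquality$Ozone)

# Hypothetical piecewise-linear "distribution":
# one line below 120, another from 120 up
ozone_guess <- function(x) {
  ifelse(x < 120, 40 - (x / 3.3 + 1), 6.6 - (x / 30))
}

# curve() evaluates the expression over a grid of x values
curve(ozone_guess(x), from = 0, to = 180, add = TRUE)
```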

This might impress an uncritical audience, but it is completely
fabricated. When you are at a loss for what the underlying distribution might
be, it may be better to just smooth the data and plot the result.

> library(plotrix)  # assumed source of rescale(), which is not in base R
> airhist<-hist(airquality$Ozone)
> airspline<-spline(airhist$counts)
> lines(rescale(airspline$x,range(airhist$mids)),airspline$y)

There are a number of smoothing algorithms available in R, including
`spline()`. Producing smoothed curves for `hist()` or `barplot()`
is a common problem, partly because the horizontal
axis on these plots is not scaled in an obvious way. As you can see,
`hist()` returns a list that contains the midpoints of the bars (`mids`),
while `barplot()` returns the bar midpoints directly. The function `rescale()`
does a simple linear transformation of one
vector of values into a new scale. Here the spline's x values are bar indices,
while the midpoints are in ozone units, so the scaling was by about a
factor of 20.
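
To make the rescaling step concrete, here is a minimal base R sketch of a linear rescale. `rescale_linear` is a hypothetical stand-in written for this example, not the `rescale()` used above, and it assumes the input vector is not constant.

```r
# Hypothetical helper: map x linearly so that min(x) and max(x)
# land on newrange[1] and newrange[2]
rescale_linear <- function(x, newrange) {
  xr <- range(x)
  (x - xr[1]) / (xr[2] - xr[1]) * (newrange[2] - newrange[1]) + newrange[1]
}

data(airquality)
airhist <- hist(airquality$Ozone)

# spline() with a single vector uses the indices 1, 2, ... as x values
airspline <- spline(airhist$counts)

# Put the spline's x values on the same scale as the bar midpoints
smooth_x <- rescale_linear(airspline$x, range(airhist$mids))
lines(smooth_x, airspline$y)
```

For instance, `rescale_linear(1:9, c(10, 170))` gives 10, 30, and so on up to 170: each step of 1 in the index becomes a step of 20 on the new scale.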

For more information, see __An Introduction to R__: Examining the
distribution of a set of data.
