R-project pdf output

Many aspects of the LaTeX template used to create PDF documents can be customized using top-level YAML metadata. Note that these options do not appear underneath the output section, but rather at the top level along with title, author, and so on. A few available metadata variables are displayed in Table 3. By default, citations are processed through pandoc-citeproc, which works for all output formats. By default, PDF documents are rendered using pdflatex; the available engines are pdflatex, xelatex, and lualatex.

The main reasons you may want to use xelatex or lualatex are: (1) they support Unicode better; (2) it is easier to make use of system fonts. See some posts on Stack Overflow for more detailed explanations.

For R's own pdf graphics device, the paper argument defaults to "special".

For the encoding argument, see postscript for details; it defaults to "default". The colormodel argument defaults to "srgb". Should small circles be rendered via the Dingbats font (useDingbats)? Defaults to TRUE, which produces smaller and better output. Setting this to FALSE can work around font display problems in broken PDF viewers: although this font is one of the 14 guaranteed to be available in all PDF viewers, that guarantee is not always honoured. Should kerning corrections be included in setting text and calculating string widths (useKerning)?

Should PDF streams be generated with Flate compression? The default color model "srgb" is sRGB. Model "gray" or "grey" maps sRGB colors to greyscale using perceived luminosity biased towards green. Also available for backwards compatibility is model "rgb" which uses uncalibrated RGB and corresponds to the model used with that name in R prior to 2. Some viewers may render some plots in that colorspace faster than in sRGB, and the plot files will be smaller.

Circles of any radius are allowed. Except on Windows, it is possible to print directly from pdf by piping the output to a print command (an approach appropriate for a CUPS printing system). All arguments except file default to values given by pdf.options. The ultimate defaults are quoted in the arguments section.
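The defaults quoted above read like the arguments of R's pdf device. A minimal sketch of a call that sets them explicitly, assuming they map onto the paper, colormodel, useDingbats, useKerning and compress arguments of grDevices::pdf; the output path and the plot are made up for illustration:

```r
# Assumption: the quoted defaults belong to grDevices::pdf().
out <- tempfile(fileext = ".pdf")   # hypothetical output path

pdf(file = out,
    width = 7, height = 7,          # page size in inches
    paper = "special",              # page size is taken from width and height
    colormodel = "srgb",            # "gray" is the greyscale alternative
    useDingbats = FALSE,            # work around viewers that mishandle Dingbats
    useKerning = TRUE,              # include kerning corrections
    compress = TRUE)                # Flate-compress the PDF streams
plot(1:10, main = "pdf() device sketch")
dev.off()                           # close the device so the file is written

file.exists(out)

# A file name containing a C integer format is expanded per page, as sprintf does:
sprintf("Rplot%03d.pdf", 1L)        # "Rplot001.pdf"
```

Setting useDingbats = FALSE here is only a choice for the sketch; the text above explains the trade-off.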

The file argument is interpreted as a C integer format as used by sprintf, with integer argument the page number. The default gives files whose names begin with Rplot.

As noted above, a matrix is just an array with two subscripts.

However it is such an important special case it needs a separate discussion. R contains many operators and functions that are available only for matrices. For example t(X) is the matrix transpose function, as noted above. The functions nrow(A) and ncol(A) give the number of rows and columns in the matrix A respectively.

An n by 1 or 1 by n matrix may of course be used as an n -vector if in the context such is appropriate. Conversely, vectors which occur in matrix multiplication expressions are automatically promoted either to row or column vectors, whichever is multiplicatively coherent, if possible, although this is not always unambiguously possible, as we see later.

If the second argument to crossprod is omitted it is taken to be the same as the first. The meaning of diag depends on its argument. diag(v), where v is a vector, gives a diagonal matrix with the elements of v as the diagonal entries. On the other hand diag(M), where M is a matrix, gives the vector of main diagonal entries of M. Also, somewhat confusingly, if k is a single numeric value then diag(k) is the k by k identity matrix!
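A short sketch of the matrix helpers discussed so far. The matrices A and B are made-up examples:

```r
A <- matrix(1:6, nrow = 2)          # 2 x 3 example matrix
B <- matrix(6:1, nrow = 2)          # another 2 x 3 matrix

dim(t(A))                           # transpose is 3 x 2
nrow(A)                             # 2
ncol(A)                             # 3

# crossprod(A, B) is t(A) %*% B; with one argument the second defaults to the first
cp2 <- crossprod(A, B)
cp1 <- crossprod(A)

D <- diag(c(1, 2, 3))               # vector argument: diagonal matrix
d <- diag(cp1)                      # matrix argument: its main diagonal
I3 <- diag(3)                       # single number k: k by k identity matrix
```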

The function eigen(Sm) calculates the eigenvalues and eigenvectors of a symmetric matrix Sm. The result of this function is a list of two components named values and vectors. Had we only needed the eigenvalues, we could have extracted the values component directly from the result. If the call is used by itself as a command, the two components are printed with their names. For large matrices it is better to avoid computing the eigenvectors if they are not needed.
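The eigen calls described above might look as follows. The matrix Sm is a small made-up symmetric matrix, and only.values is the eigen argument that skips the eigenvector computation:

```r
Sm <- matrix(c(2, 1, 1, 3), nrow = 2)   # hypothetical symmetric matrix

ev <- eigen(Sm)                     # list with components values and vectors
ev$values                           # eigenvalues, largest first
ev$vectors                          # matching eigenvectors, one per column

# Eigenvalues only; for large matrices this avoids computing the eigenvectors
evals <- eigen(Sm, only.values = TRUE)$values
```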

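The function svd(M) takes an arbitrary matrix argument, M, and calculates the singular value decomposition of M. This consists of matrices U and V with orthonormal columns and a diagonal matrix D of non-negative entries such that M is the product of U, D and the transpose of V. D is actually returned as a vector of the diagonal elements. The result of svd(M) is actually a list of three components named d, u and v, with evident meanings.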

If M is in fact square, the absolute value of its determinant can be calculated as the product of its singular values. If this calculation were needed often with a variety of matrices it could be defined as an R function. As a further trivial but potentially useful example, you might like to consider writing a function, say tr, to calculate the trace of a square matrix. (Hint: look again at the diag function.) R has a builtin function det to calculate a determinant, including the sign, and another, determinant, to give the sign and modulus (optionally on log scale).
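A sketch of the singular value decomposition and the two small functions suggested above. The matrix M is made up, and the names absdet and tr are illustrative choices rather than built-ins:

```r
M <- matrix(c(2, 0, 1,
              1, 3, 0,
              0, 1, 4), nrow = 3)   # hypothetical 3 x 3 matrix

s <- svd(M)                         # list with components d, u and v
M_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)

# Absolute value of the determinant as the product of the singular values
absdet <- function(M) prod(svd(M)$d)

# Trace of a square matrix, using diag as hinted in the text
tr <- function(M) sum(diag(M))

absdet(M)
det(M)                              # builtin, includes the sign
determinant(M, logarithm = TRUE)    # sign and log-modulus
tr(M)
```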

The function lsfit returns a list giving results of a least squares fitting procedure. It is called with the design matrix X and the vector of observations y, and its result is usually assigned to a variable.

See the help facility for more details, and also for the follow-up function ls.diag for, among other things, regression diagnostics. Note that a grand mean term is automatically included and need not be included explicitly as a column of X.

Further note that you will almost always prefer using lm to lsfit for regression modelling. Another closely related function is qr and its allies, which compute the QR decomposition of X and, from it, the coefficient vector, the fitted values (the orthogonal projection of y onto the range of X) and the residuals (the projection onto the orthogonal complement). It is not assumed that X has full column rank. Redundancies will be discovered and removed as they are found. This alternative is the older, low-level way to perform least squares calculations. Although still useful in some contexts, it would now generally be replaced by the statistical models features, as will be discussed in Statistical models in R.
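The two least squares routes described above can be compared directly. This is a sketch on simulated data; the variable names and the data are assumptions made for the example:

```r
set.seed(1)
X <- cbind(x1 = rnorm(20), x2 = rnorm(20))      # made-up design matrix
y <- 1 + 2 * X[, "x1"] - X[, "x2"] + rnorm(20, sd = 0.1)

# lsfit adds the grand mean (intercept) column itself
ans <- lsfit(X, y)
ans$coefficients

# The low-level route: the intercept column must be supplied explicitly
Xplus <- qr(cbind(1, X))
b   <- qr.coef(Xplus, y)            # coefficient vector
fit <- qr.fitted(Xplus, y)          # projection of y onto the range of X
res <- qr.resid(Xplus, y)           # projection onto the orthogonal complement
```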

As we have already seen informally, matrices can be built up from other vectors and matrices by the functions cbind and rbind. Roughly cbind forms matrices by binding together matrices horizontally, or column-wise, and rbind vertically, or row-wise. If some of the arguments to cbind are vectors they may be shorter than the column size of any matrices present, in which case they are cyclically extended to match the matrix column size or the length of the longest vector if no matrices are given.

The function rbind does the corresponding operation for rows. In this case any vector argument, possibly cyclically extended, is of course taken as a row vector. Suppose X1 and X2 have the same number of rows. To combine these by columns into a matrix X, together with an initial column of 1s, we can call cbind with 1, X1 and X2 as its arguments. The result of rbind or cbind always has matrix status. Hence cbind(x) and rbind(x) are possibly the simplest ways explicitly to allow the vector x to be treated as a column or row matrix respectively.
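The binding operations above, sketched with made-up matrices X1 and X2:

```r
X1 <- matrix(1:4, nrow = 2)         # hypothetical 2 x 2 matrices
X2 <- matrix(5:8, nrow = 2)

# The scalar 1 is cyclically extended to a full column of 1s
X <- cbind(1, X1, X2)
dim(X)                              # 2 5

x <- 1:3
dim(cbind(x))                       # 3 1: x as a column matrix
dim(rbind(x))                       # 1 3: x as a row matrix
```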


It should be noted that whereas cbind and rbind are concatenation functions that respect dim attributes, the basic c function does not, but rather clears numeric objects of all dim and dimnames attributes.

This is occasionally useful in its own right. The official way to coerce an array back to a simple vector object is to use as.vector. However a similar result can be achieved by using c with just one argument, simply for this side-effect. There are slight differences between the two, but ultimately the choice between them is largely a matter of style (with the former being preferable).
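Both routes can be seen on a small made-up array, assuming the coercion function meant above is as.vector:

```r
A <- array(1:6, dim = c(2, 3))      # hypothetical array

v1 <- as.vector(A)                  # the official coercion
v2 <- c(A)                          # c() used only for its side effect

dim(v1)                             # NULL: the dim attribute is gone
identical(v1, v2)                   # TRUE for this simple case
```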

Recall that a factor defines a partition into groups. Similarly a pair of factors defines a two way cross classification, and so on. The function table allows frequency tables to be calculated from equal length factors. If there are k factor arguments, the result is a k -way array of frequencies. Suppose, for example, that statef is a factor giving the state code for each entry in a data vector.
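For a factor such as statef, a one-way frequency table could be computed as below. The state codes are made up, and the tapply call shows the longer equivalent form:

```r
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa")  # made-up codes
statef <- factor(state)

statefr <- table(statef)            # frequencies, labelled by the factor levels
statefr

# Equivalent, but less convenient
statefr2 <- tapply(statef, statef, length)
```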

The frequencies are ordered and labelled by the levels attribute of the factor. This simple case is equivalent to, but more convenient than, applying tapply with the function length to the factor grouped by itself.

An R list is an object consisting of an ordered collection of objects known as its components.

There is no particular need for the components to be of the same mode or type, and, for example, a list could consist of a numeric vector, a logical value, a matrix, a complex vector, a character array, a function, and so on.

A list is made with the function list. Components are always numbered and may always be referred to as such. Thus if Lst is the name of a list with four components, these may be individually referred to as Lst[[1]], Lst[[2]], Lst[[3]] and Lst[[4]].

If, further, Lst[[4]] is a vector subscripted array then Lst[[4]][1] is its first entry. If Lst is a list, then the function length(Lst) gives the number of top level components it has. Components of lists may also be named, and in this case the component may be referred to either by giving the component name as a character string in place of the number in double square brackets, or, more conveniently, by giving an expression of the form name$component_name.

This is a very useful convention as it makes it easier to get the right component if you forget the number. Additionally, one can also use the names of the list components in double square brackets, i.e., Lst[["name"]] is the same as Lst$name. This is especially useful when the name of the component to be extracted is stored in another variable. It is very important to distinguish Lst[[1]] from Lst[1]. The operator [[ ]] selects a single element, whereas [ ] is a general subscripting operator. Thus the former is the first object in the list Lst, and if it is a named list the name is not included.

The latter is a sublist of the list Lst consisting of the first entry only. If it is a named list, the names are transferred to the sublist. The names of components may be abbreviated down to the minimum number of letters needed to identify them uniquely. The vector of names is in fact simply an attribute of the list like any other and may be handled as such. Other structures besides lists may, of course, similarly be given a names attribute also. New lists may be formed from existing objects by the function list.
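The ways of referring to list components described above, sketched on a made-up list with four components:

```r
Lst <- list(name = "Fred", wife = "Mary",
            no.children = 3, child.ages = c(4, 7, 9))   # illustrative values

Lst[[1]]                            # first component, by number
Lst$name                            # the same component, by name
Lst[["name"]]                       # name given as a character string
x <- "name"
Lst[[x]]                            # name stored in another variable

Lst[[4]][1]                         # first entry of the fourth component
length(Lst)                         # number of top level components

sub <- Lst[1]                       # a sublist, names carried over
Lst$wi                              # names may be abbreviated if unambiguous
```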

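A call to list with arguments of the form name_1=object_1, ..., name_m=object_m sets up a list of m components, using object_1, ..., object_m for the components and giving them the names specified by the argument names. If these names are omitted, the components are numbered only. The components used to form the list are copied when forming the new list and the originals are not affected. Lists, like any subscripted object, can be extended by specifying additional components.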

When the concatenation function c is given list arguments, the result is an object of mode list also, whose components are those of the argument lists joined together in sequence.

Recall that with vector objects as arguments the concatenation function similarly joined together all arguments into a single vector structure. In this case all other attributes, such as dim attributes, are discarded.

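A data frame is a list with class "data.frame". There are restrictions on lists that may be made into data frames: in particular, the components must be vectors, factors, numeric matrices, lists, or other data frames, and vector components must all have the same length (matrix components the same number of rows). A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes.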

It may be displayed in matrix form, and its rows and columns extracted using matrix indexing conventions. Objects satisfying the restrictions placed on the columns (components) of a data frame may be used to form one using the function data.frame. A list whose components conform to the restrictions of a data frame may be coerced into a data frame using the function as.data.frame.
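A minimal sketch of both constructions, assuming the functions meant above are data.frame and as.data.frame. The data frame accountants and all values in it are made up:

```r
accountants <- data.frame(
  home = factor(c("tas", "sa", "qld", "sa")),   # made-up values
  loot = c(60, 49, 40, 61),
  shot = c("high", "low", "low", "high")
)

accountants[2, "loot"]              # matrix-style indexing: 49
rich <- accountants[accountants$loot > 50, ]    # row selection

# A conforming list can be coerced to a data frame
lst <- list(a = 1:3, b = c("x", "y", "z"))
df2 <- as.data.frame(lst)
```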

The simplest way to construct a data frame from scratch is to use the read.table function to read an entire data frame from an external file. This is discussed further in Reading data from files. A useful facility would be somehow to make the components of a list or data frame temporarily visible as variables under their component name, without the need to quote the list name explicitly each time.

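The attach function takes a database such as a list or data frame as its argument and places it in the search path at position 2, so that its components are available as variables in their own right. Suppose lentils is a data frame with a component u that has been attached in this way. At this point an assignment to u does not replace the component u of the data frame, but rather masks it with another variable u in the workspace at position 1 on the search path. To make a permanent change to the data frame itself, assign to lentils$u. However the new value of component u is not visible until the data frame is detached and attached again. To detach a data frame, use the function detach. More precisely, a call to detach with no arguments detaches from the search path the entity currently at position 2. Entities at positions greater than 2 on the search path can be detached by giving their number to detach, but it is much safer to always use a name, for example by detach(lentils) or detach("lentils").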

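Note: In R lists and data frames can only be attached at position 2 or above, and what is attached is a copy of the original object. You can alter the attached values via assign, but the original list or data frame is unchanged. A useful convention that allows you to work with many different problems comfortably together in the same working directory is to gather all variables for any well defined and separate problem in a data frame under a suitably informative name, to attach that data frame at position 2 while working on the problem, and to detach it before leaving the problem.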

In this way it is quite simple to work with many problems in the same directory, all of which have variables named x, y and z, for example. In particular any object of mode "list" may be attached in the same way. Anything that has been attached can be detached by detach, by position number or, preferably, by name. The function search shows the current search path and so is a very useful way to keep track of which data frames and lists (and packages) have been attached and detached.

Initially it lists the global environment followed by the packages attached at startup.

Large data objects will usually be read as values from external files rather than entered during an R session at the keyboard. R input facilities are simple and their requirements are fairly strict and even rather inflexible. There is a clear presumption by the designers of R that you will be able to modify your input files using other tools, such as file editors or Perl, to fit in with the requirements of R.

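Generally this is very simple. If variables are to be held mainly in data frames, as we strongly suggest they should be, an entire data frame can be read directly with the read.table function. There is also a more primitive input function, scan, that can be called directly. For read.table the file will normally have a first line giving a name for each variable, and each additional line will have a row label as its first item followed by the values for each variable. If the file has one fewer item in its first line than in its second, this arrangement is presumed to be in force.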

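So the first few lines of a file to be read as a data frame consist of a header line of variable names followed by lines that each begin with a row label. By default numeric items (except row labels) are read as numeric variables, and non-numeric variables, such as Cent.heat in the example, are read as factors or character vectors, depending on the version of R.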

This can be changed if necessary. Often you will want to omit including the row labels directly and use the default labels. In this case the file may omit the row label column, and the header option of read.table is set to TRUE to indicate that the first line is a line of headings.
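Both file layouts can be tried without creating a file by passing the contents through the text argument of read.table. The house data and the row labels h1 to h3 below are made up for illustration:

```r
# Layout 1: the header has one fewer item than the data lines,
# so the first column is taken as row labels
txt1 <- "Price Floor Area Rooms Age Cent.heat
h1 52.00 111.0  830 5 6.2 no
h2 54.75 128.0  710 5 7.5 no
h3 57.50 101.0 1000 5 4.2 no"
HousePrice <- read.table(text = txt1)

# Layout 2: no row label column, so the header must be declared explicitly
txt2 <- "Price Floor Area Rooms Age Cent.heat
52.00 111.0  830 5 6.2 no
54.75 128.0  710 5 7.5 no
57.50 101.0 1000 5 4.2 no"
HousePrice2 <- read.table(text = txt2, header = TRUE)

str(HousePrice)
```

With a real file, the text argument would be replaced by the file name as the first argument.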

Suppose the data vectors are of equal length and are to be read in parallel. Further suppose that there are three vectors, the first of mode character and the remaining two of mode numeric, and that the file is input.dat. The first step is to use scan to read in the three vectors as a list.

The second argument is a dummy list structure that establishes the mode of the three vectors to be read. The result, held in inp, is a list whose components are the three vectors read in. To separate the data items into three separate vectors, assign the components inp[[1]], inp[[2]] and inp[[3]] to variables.

More conveniently, the dummy list can have named components, in which case the names can be used to access the vectors read in. If you wish to access the variables separately they may either be re-assigned to variables in the working frame, or the list may be attached at position 2 of the search path. If the second argument is a single value and not a list, a single vector is read in, all components of which must be of the same mode as the dummy value.
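The scan usage described above, sketched with the text argument standing in for a file. The data and the component names id, x and y are assumptions made for the example:

```r
txt <- "a 1.5 10
b 2.5 20
c 3.5 30"                           # made-up contents of the input file

# The dummy list fixes the modes: character, numeric, numeric
inp <- scan(text = txt, what = list(id = "", x = 0, y = 0), quiet = TRUE)

label <- inp$id                     # named components give direct access
x <- inp$x
y <- inp$y

# A single dummy value reads one vector, here reshaped into a matrix
X <- matrix(scan(text = "1 2 3 4 5 6", what = 0, quiet = TRUE),
            ncol = 3, byrow = TRUE)
```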

A number of datasets are supplied with R (in package datasets), and others are available in packages (including the recommended packages supplied with R). To see the list of datasets currently available, call data with no arguments. All the datasets supplied with R are available directly by name. However, many packages still use the obsolete convention in which data was also used to load datasets into R by name. In most cases this will load an R object of the same name. However, in a few cases it loads several objects, so see the on-line help for the object to see what to expect.

If a package has been attached by library, its datasets are automatically included in the search.

When invoked on a data frame or matrix, edit brings up a separate spreadsheet-like environment for editing. This is useful for making small changes once a data set has been read.

One convenient use of R is to provide a comprehensive set of statistical tables.

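For each distribution, functions are provided with the distribution name prefixed by d for the density, p for the cumulative distribution function, q for the quantile function and r for simulation (random deviates). The first argument is x for dxxx, q for pxxx, p for qxxx and n for rxxx (except for rhyper, rsignrank and rwilcox, for which it is nn). In not quite all cases is the non-centrality parameter ncp currently available: see the on-line help for details. The pxxx and qxxx functions all have logical arguments lower.tail and log.p. This allows, for example, upper-tail probabilities and results on the log scale to be obtained directly. In addition there are functions ptukey and qtukey for the distribution of the studentized range of samples from a normal distribution, and dmultinom and rmultinom for the multinomial distribution.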
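A few calls illustrating the d, p, q and r naming scheme and the lower.tail and log.p arguments. The particular numbers are arbitrary:

```r
# Two-tailed p-value for a t statistic of -2.43 on 13 degrees of freedom
p_two <- 2 * pt(-2.43, df = 13)

# Upper 1% point of an F(2, 7) distribution
f_upper <- qf(0.01, 2, 7, lower.tail = FALSE)

# Upper-tail probability directly, without computing 1 - p
p_upper <- pnorm(1.96, lower.tail = FALSE)

# Log of the survival function; its negative is the cumulative hazard
cum_hazard <- -pexp(2, rate = 1, lower.tail = FALSE, log.p = TRUE)

dnorm(0)                            # density at 0
set.seed(123)
rnorm(3)                            # three random deviates
```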

Further distributions are available in contributed packages, notably SuppDists.

Given a univariate set of data we can examine its distribution in a large number of ways. The simplest is to examine the numbers. A stem-and-leaf plot is like a histogram, and R has a function hist to plot histograms. More elegant density plots can be made by density, and we added a line produced by density in this example.

We can plot the empirical cumulative distribution function by using the function ecdf. This distribution is obviously far from any standard distribution. How about the right-hand mode, say eruptions of longer than 3 minutes? Let us fit a normal distribution and overlay the fitted CDF. Quantile-quantile (Q-Q) plots can help us examine this more carefully. Let us compare this with some simulated data from a t distribution. We can make a Q-Q plot against the generating distribution by using qqplot with quantiles of that distribution.
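The steps above can be sketched as follows, assuming the data are the eruption durations in the built-in faithful dataset. The plots are drawn on a null pdf device so that no file is written:

```r
eruptions <- faithful$eruptions     # assumption: the data discussed above
long <- eruptions[eruptions > 3]    # the right-hand mode

pdf(NULL)                           # null device: draw without creating a file

Fn <- ecdf(long)                    # empirical CDF as a function
plot(Fn, do.points = FALSE, verticals = TRUE)

# Overlay the CDF of a normal distribution fitted by mean and variance
x <- seq(3, 5.4, 0.01)
lines(x, pnorm(x, mean = mean(long), sd = sqrt(var(long))), lty = 3)

# Q-Q plot against the normal distribution, with a reference line
qqnorm(long)
qqline(long)

dev.off()
```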

Finally, we might want a more formal test of agreement with normality (or not). R provides the Shapiro-Wilk test and the Kolmogorov-Smirnov test. Note that for the latter the distribution theory is not valid here, as we have estimated the parameters of the normal distribution from the same sample.

So far we have compared a single sample to a normal distribution. A much more common operation is to compare aspects of two samples. To test for the equality of the means of the two samples, we can use an unpaired t-test via t.test.

By default the R function does not assume equality of variances in the two samples (in contrast to the similar S-PLUS t.test function). We can use the F test to test for equality in the variances, provided that the two samples are from normal populations. All these tests assume normality of the two samples. The two-sample Wilcoxon (or Mann-Whitney) test only assumes a common continuous distribution under the null hypothesis. Note the warning: there are several ties in each sample, which suggests strongly that these data are from a discrete distribution (probably due to rounding).
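The tests mentioned above might be run as below. The two samples A and B are illustrative values chosen to contain ties, not data taken from this document:

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04,
       79.97, 80.05, 80.03, 80.02, 80.00, 80.02)       # illustrative sample
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

welch  <- t.test(A, B)                    # default: unequal variances
ftest  <- var.test(A, B)                  # F test for equal variances
pooled <- t.test(A, B, var.equal = TRUE)  # classical pooled-variance test

# Ties in the data trigger a warning that the exact p-value is unavailable
wil <- wilcox.test(A, B)

welch$p.value
pooled$p.value
```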

There are several ways to compare graphically the two samples. We have already seen a pair of boxplots. The two empirical CDFs can also be plotted together, and qqplot will produce a Q-Q plot of the two samples.

R is an expression language in the sense that its only command type is a function or expression which returns a result. Even an assignment is an expression whose result is the value assigned, and it may be used wherever any expression may be used; in particular multiple assignments are possible.

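Commands may be grouped together in braces, in which case the value of the group is the result of the last expression in the group evaluated. Since such a group is also an expression it may, for example, be itself included in parentheses and used as part of an even larger expression, and so on.

There is a vectorized version of the if/else construct, the ifelse function. This has the form ifelse(condition, a, b) and returns a vector of the same length as condition, with elements a[i] if condition[i] is true, otherwise b[i] (where a and b are recycled as necessary).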

As an example, suppose ind is a vector of class indicators and we wish to produce separate plots of y versus x within classes. One possibility here is to use coplot, which will produce an array of plots corresponding to each level of the factor.

Another way to do this, now putting all plots on the one display, is to split x and y by ind and to loop over the resulting lists with for. Note the function split, which produces a list of vectors obtained by splitting a larger vector according to the classes specified by a factor. This is a useful function, mostly used in connection with boxplots. See the help facility for further details. Warning: for loops are used in R code much less often than in compiled languages. The break statement can be used to terminate any loop, possibly abnormally. This is the only way to terminate repeat loops.
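A small sketch of ifelse, split, a for loop over the split groups, and a repeat loop ended by break. All data are made up, and group means stand in for the plots mentioned above:

```r
x <- c(-2, 5, 0, 7)
sign_label <- ifelse(x > 0, "positive", "non-positive")  # vectorized if/else

ind <- factor(c("a", "b", "a", "b", "a"))   # class indicators
y <- c(2, 4, 6, 8, 10)
yc <- split(y, ind)                         # list of vectors, one per class

# Explicit loop over the classes
group_means <- numeric(length(yc))
for (i in seq_along(yc)) {
  group_means[i] <- mean(yc[[i]])
}

# The whole-object alternative that usually replaces such loops
group_means2 <- sapply(yc, mean)

# repeat can only be left through break
k <- 0
repeat {
  k <- k + 1
  if (k >= 3) break
}
```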

Control statements are most often used in connection with functions which are discussed in Writing your own functions , and where more examples will emerge. As we have seen informally along the way, the R language allows the user to create objects of mode function. These are true R functions that are stored in a special internal form and may be used in further expressions and so on.

In the process, the language gains enormously in power, convenience and elegance, and learning to write useful functions is one of the main ways to make your use of R comfortable and productive.

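It should be emphasized that most of the functions supplied as part of the R system, such as mean, var, postscript and so on, are themselves written in R and thus do not differ materially from user written functions. A function is defined by an assignment of the form name <- function(arg_1, arg_2, ...) expression, where the expression is an R expression (usually a grouped expression) that uses the arguments to calculate a value. The value of the expression is the value returned for the function. As a first example, consider a function to calculate the two sample t-statistic, showing all the steps. This is an artificial example, of course, since there are other, simpler ways of achieving the same end.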

With such a function defined under the name twosam, you could perform two sample t-tests using a call such as twosam(data$male, data$female). As a second example, consider a function to emulate directly the MATLAB backslash command, which returns the coefficients of the orthogonal projection of the vector y onto the column space of the matrix, X. This is ordinarily called the least squares estimate of the regression coefficients.
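One way the two sample t-statistic function could be written is sketched below. The name twosam and the pooled-variance form are assumptions, and the data are made up:

```r
# Two sample t-statistic with pooled variance, step by step
twosam <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2)
  yb1 <- mean(y1);  yb2 <- mean(y2)
  s1 <- var(y1);    s2 <- var(y2)
  s <- ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)   # pooled variance
  (yb1 - yb2) / sqrt(s * (1 / n1 + 1 / n2))              # value of the function
}

male   <- c(5.1, 4.9, 5.6, 5.8, 6.0)        # made-up samples
female <- c(4.6, 4.8, 5.0, 4.4)
tstat <- twosam(male, female)
tstat
```

The result agrees with the statistic reported by t.test when var.equal = TRUE, which is the simpler way of achieving the same end.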

This would ordinarily be done with the qr function; however this is sometimes a bit tricky to use directly and it pays to have a simple function to use it safely. The classical R function lsfit does this job quite well, and more. It in turn uses the functions qr and qr.coef to do this part of the calculation. Hence there is probably some value in having just this part isolated in a simple to use function if it is going to be in frequent use.
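A minimal sketch of such a wrapper, here called bslash (an assumed name), together with a binary operator form. The operator name %!% and the data are illustrative choices:

```r
# Least squares coefficients of y on the columns of X via the QR decomposition
bslash <- function(X, y) {
  X <- qr(X)
  qr.coef(X, y)
}

# The same function bound to a user-defined binary operator; note the quotes
"%!%" <- function(X, y) bslash(X, y)

X <- cbind(1, c(1, 2, 3, 4))        # made-up design matrix with intercept column
y <- c(2.1, 3.9, 6.2, 7.8)

b1 <- bslash(X, y)
b2 <- X %!% y
```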

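If so, we may wish to make it a matrix binary operator for even more convenient use. Suppose, for example, we choose ! for the internal character, so that the operator is written %!%. The function definition would then assign a function of two arguments to the quoted name "%!%". Note the use of quote marks. The function could then be used as X %!% y. The backslash symbol itself is not a convenient choice as it presents special problems in this context.

If arguments to called functions are given in the name=object form, they may be given in any order. Furthermore the argument sequence may begin in the unnamed, positional form, and specify named arguments after the positional arguments.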

In many cases arguments can be given commonly appropriate default values, in which case they may be omitted altogether from the call when the defaults are appropriate.

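For example, if fun1 were defined with the arguments data, data.frame, graph=TRUE and limit=20, it could be called as fun1(d, df), relying on both defaults, or as fun1(d, df, limit=10), which changes one of the defaults. It is important to note that defaults may be arbitrary expressions, even involving other arguments to the same function; they are not restricted to be constants as in our simple example here. Another frequent requirement is to allow one function to pass on argument settings to another.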
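A sketch of named arguments and defaults. The body of fun1 is hypothetical (it only reports what it received), and rescale is an invented helper showing a default that depends on another argument:

```r
# Hypothetical body: just report the argument values in use
fun1 <- function(data, data.frame, graph = TRUE, limit = 20) {
  list(n = length(data), graph = graph, limit = limit)
}

d  <- 1:5                           # made-up inputs
df <- data.frame(a = 1:5)

r1 <- fun1(d, df)                   # both defaults used
r2 <- fun1(d, df, limit = 10)       # one default changed
r3 <- fun1(limit = 10, data.frame = df, data = d)   # named, in any order

# A default may be an expression involving other arguments
rescale <- function(x, upper = max(x)) x / upper
rescale(c(1, 2, 4))                 # 0.25 0.50 1.00
```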

For example many graphics functions use the function par, and functions like plot allow the user to pass on graphical parameters to par to control the graphical output. (See Permanent changes: The par function, for more details on the par function.) This can be done by including an extra argument, literally ..., in the function definition, which may then be passed on.

The expression list(...) evaluates all such arguments and returns them in a named list.

Note that any ordinary assignments done within the function are local and temporary and are lost after exit from the function. To understand completely the rules governing the scope of R assignments the reader needs to be familiar with the notion of an evaluation frame. This is a somewhat advanced, though hardly difficult, topic and is not covered further here.

If global and permanent assignments are intended within a function, then either the superassignment operator <<- or the function assign can be used. See the help document for details. The semantics of <<- are discussed further in Scope.

As a more complete, if a little pedestrian, example of a function, consider finding the efficiency factors for a block design. Some aspects of this problem have already been discussed in Index matrices. A block design is defined by two factors, say blocks (b levels) and varieties (v levels).

It is numerically slightly better to work with the singular value decomposition on this occasion rather than the eigenvalue routines. The result of the function is a list giving not only the efficiency factors as the first component, but also the block and variety canonical contrasts, since sometimes these give additional useful qualitative information.

For printing purposes with large matrices or arrays, it is often useful to print them in close block form without the array names or numbers. Removing the dimnames attribute will not achieve this effect, but rather the array must be given a dimnames attribute consisting of empty strings.

For example, to print a matrix X in this way, copy it, give the copy dimnames consisting of empty strings, and print the copy. This can be much more conveniently done using a function, no.dimnames, as a wrap-around to achieve the same result. It also illustrates how some effective and useful user functions can be quite short. This is particularly useful for large integer arrays, where patterns are the real interest rather than the values.

Functions may be recursive, and may themselves define functions within themselves.
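A sketch of such a wrap-around function, assuming the name no.dimnames. The matrix X is made up:

```r
# Return a copy of the array whose dimnames are all empty strings
no.dimnames <- function(a) {
  d <- list()
  l <- 0
  for (i in dim(a)) {
    d[[l <- l + 1]] <- rep("", i)   # one vector of empty labels per dimension
  }
  dimnames(a) <- d
  a
}

X <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
no.dimnames(X)                      # prints in close block form
```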

Note, however, that such functions, or indeed variables, are not inherited by called functions in higher evaluation frames as they would be if they were on the search path. The example below shows a naive way of performing one-dimensional numerical integration.

The integrand is evaluated at the end points of the range and in the middle. If the one-panel trapezium rule answer is close enough to the two panel, then the latter is returned as the value. Otherwise the same process is recursively applied to each panel. The result is an adaptive integration process that concentrates function evaluations in regions where the integrand is farthest from linear. There is, however, a heavy overhead, and the function is only competitive with other algorithms when the integrand is both smooth and very difficult to evaluate.
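The recursive scheme described above might be written as below. The names area and fun1, the tolerance eps and the recursion limit lim are assumptions made for this sketch:

```r
# Adaptive trapezium rule: refine a panel only where one panel and
# two panels disagree by more than eps (or until the depth limit is hit)
area <- function(f, a, b, eps = 1.0e-06, lim = 10) {
  fun1 <- function(f, a, b, fa, fb, a0, eps, lim, fun) {
    d <- (a + b) / 2                # midpoint of the panel
    h <- (b - a) / 4
    fd <- f(d)
    a1 <- h * (fa + fd)             # left half-panel
    a2 <- h * (fd + fb)             # right half-panel
    if (abs(a0 - a1 - a2) < eps || lim == 0) {
      return(a1 + a2)               # two-panel answer is good enough
    }
    fun(f, a, d, fa, fd, a1, eps, lim - 1, fun) +
      fun(f, d, b, fd, fb, a2, eps, lim - 1, fun)
  }
  fa <- f(a)
  fb <- f(b)
  a0 <- ((fa + fb) * (b - a)) / 2   # one-panel trapezium estimate
  fun1(f, a, b, fa, fb, a0, eps, lim, fun1)
}

area(function(x) x^2, 0, 1)         # close to 1/3
```

The inner function receives itself as the argument fun, which is how the recursion is carried through without relying on the search path.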

The discussion in this section is somewhat more technical than in other parts of this document. The symbols which occur in the body of a function can be divided into three classes; formal parameters, local variables and free variables.

The formal parameters of a function are those occurring in the argument list of the function. Their values are determined by the process of binding the actual function arguments to the formal parameters. Local variables are those whose values are determined by the evaluation of expressions in the body of the functions. Variables which are not formal parameters or local variables are called free variables. Free variables become local variables if they are assigned to.

Consider a function f, with argument x, whose body assigns 2*x to y and then prints x, y and z. In this function, x is a formal parameter, y is a local variable and z is a free variable. In R the free variable bindings are resolved by first looking in the environment in which the function was created. This is called lexical scope. First we define a function called cube, inside which a second function sq is defined. The variable n in the function sq is not an argument to that function. Therefore it is a free variable and the scoping rules must be used to ascertain the value that is to be associated with it.

Under lexical scope (as in R) it is the parameter to the function cube, since that is the active binding for the variable n at the time the function sq was defined.
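A sketch of the cube example, assuming sq simply squares the free variable n:

```r
cube <- function(n) {
  sq <- function() n * n            # n is free in sq
  n * sq()                          # sq finds n in cube's environment
}

n <- 10                             # a global n, ignored under lexical scope
cube(2)                             # 8
```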

Lexical scope can also be used to give functions mutable state. In the following example we show how R can be used to mimic a bank account. A functioning bank account needs to have a balance or total, a function for making withdrawals, a function for making deposits and a function for stating the current balance.

We achieve this by creating the three functions within account and then returning a list containing them. When account is invoked it takes a numerical argument total and returns a list containing the three functions. Because these functions are defined in an environment which contains total, they will have access to its value. The special assignment operator, <<-, is used to change the value associated with total. This operator looks back in enclosing environments for an environment that contains the symbol total and, when it finds such an environment, it replaces the value in that environment with the value of the right hand side.
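The bank account idea can be sketched as below. The component names deposit, withdraw and balance and the account holders are assumptions made for the example:

```r
account <- function(total) {
  list(
    deposit = function(amount) {
      if (amount <= 0) stop("Deposits must be positive!")
      total <<- total + amount      # updates total in account's environment
      invisible(total)
    },
    withdraw = function(amount) {
      if (amount > total) stop("You don't have that much money!")
      total <<- total - amount
      invisible(total)
    },
    balance = function() total
  )
}

ross   <- account(100)              # each call creates its own total
robert <- account(200)

ross$withdraw(30)
ross$balance()                      # 70
robert$deposit(50)
robert$balance()                    # 250
```

The two accounts do not interfere with each other because each call to account creates a fresh environment holding its own total.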

If the global or top-level environment is reached without finding the symbol total then that variable is created and assigned to there.

Users can customize their environment in several different ways.

There is a site initialization file and every directory can have its own special initialization file. Finally, the special functions .First and .Last can be used. The location of the site initialization file is taken from the value of the R_PROFILE environment variable. If that variable is unset, the file Rprofile.site in the R home subdirectory etc is used. This file should contain the commands that you want to execute every time R is started under your system.

A second, personal, profile file named .Rprofile can be placed in any directory. If R is invoked in that directory then that file will be sourced. This file gives individual users control over their workspace and allows for different startup procedures in different working directories.

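If no .Rprofile file is found in the startup directory, then R looks for a .Rprofile file in the user's home directory and uses that (if it exists). If the environment variable R_PROFILE_USER is set, the file it points to is used instead of the .Rprofile files. Any function named .First in either of the two profile files or in the .RData image has a special status. It is automatically performed at the beginning of an R session and may be used to initialize the environment.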

Thus, the sequence in which files are executed is Rprofile.site, the user profile, .RData and then .First. A definition in later files will mask definitions in earlier files. Similarly a function .Last, if defined, is normally executed at the very end of the session.

The class of an object determines how it will be treated by what are known as generic functions. Put the other way round, a generic function performs a task or action on its arguments specific to the class of the argument itself.

If the argument lacks any class attribute, or has a class not catered for specifically by the generic function in question, there is always a default action provided.

An example makes things clearer. The class mechanism offers the user the facility of designing and writing generic functions for special purposes. Among the other generic functions are plot for displaying objects graphically, summary for summarizing analyses of various types, and anova for comparing statistical models. The number of generic functions that can treat a class in a specific way can be quite large.

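For example, the functions that can accommodate in some fashion objects of class "data.frame" include the subsetting operators, mean, plot and summary; a complete list can be got by calling the methods function with the class argument. Conversely the number of classes a generic function can handle can also be quite large.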

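For example the plot function has a default method and variants for objects of classes "data.frame", "density", "factor", and more. A complete list can be got again by using the methods function, as in methods(plot). For many generic functions the function body is quite short, for example coef, whose body is just a call to UseMethod. The presence of UseMethod indicates this is a generic function. To see what methods are available we can use methods, as in methods(coef). In this example there are six methods, none of which can be seen by typing its name. We can read these by either of getAnywhere or getS3method. A function named gen.cl will be invoked by the generic gen for class cl, so do not name functions in this style unless they are intended to be methods. The reader is referred to the R Language Definition for a more complete discussion of this mechanism.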
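A sketch of the mechanism. The generic describe, its methods and the class "account" are hypothetical names invented for this example, while coef and its aov method come from the stats package:

```r
# A hypothetical generic with a default method and one class-specific method
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "an object with no special method"
describe.account <- function(x, ...) paste("an account holding", x$total)

acct <- structure(list(total = 100), class = "account")
describe(acct)                      # dispatches to describe.account
describe(1:3)                       # falls back to describe.default

# Inspecting an existing generic and one of its hidden methods
coef                                # the body is a call to UseMethod
methods(coef)                       # lists the available methods
m <- getS3method("coef", "aov")     # retrieves a method that is not exported
```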

This section presumes the reader has some familiarity with statistical methodology, in particular with regression analysis and the analysis of variance. Later we make some rather more ambitious presumptions, namely that something is known about generalized linear models and nonlinear regression. The requirements for fitting statistical models are sufficiently well defined to make it possible to construct general tools that apply in a broad spectrum of problems.

R provides an interlocking suite of facilities that make fitting statistical models very simple. As we mention in the introduction, the basic output is minimal, and one needs to ask for the details by calling extractor functions. The template for a statistical model is a linear regression model with independent, homoscedastic errors. Suppose y, x, x0, x1, x2, … are numeric variables, X is a matrix and A, B, C, … are factors.

The following formulae specify statistical models as described beneath each group.

y ~ x
y ~ 1 + x
    Both imply the same simple linear regression model of y on x. The first has an implicit intercept term, and the second an explicit one.

y ~ 0 + x
y ~ -1 + x
y ~ x - 1
    Simple linear regression of y on x through the origin (that is, without an intercept term).

log(y) ~ x1 + x2
    Multiple regression of the transformed variable, log(y), on x1 and x2 (with an implicit intercept term).

y ~ poly(x,2)
y ~ 1 + x + I(x^2)
    Polynomial regression of y on x of degree 2. The first form uses orthogonal polynomials, and the second uses explicit powers, as basis.

y ~ X + poly(x,2)
    Multiple regression of y with model matrix consisting of the matrix X as well as polynomial terms in x to degree 2.

y ~ A
    Single classification analysis of variance model of y, with classes determined by A.

y ~ A + x
    Single classification analysis of covariance model of y, with classes determined by A, and with covariate x.

y ~ A*B
y ~ A + B + A:B
y ~ B %in% A
y ~ A/B
    Two factor non-additive model of y on A and B. The first two specify the same crossed classification and the second two specify the same nested classification. In abstract terms all four specify the same model subspace.

y ~ (A + B + C)^2
y ~ A*B*C - A:B:C
    Three factor experiment but with a model containing main effects and two factor interactions only. Both formulae specify the same model.

y ~ A * x
y ~ A/x
y ~ A/(1 + x) - 1
    Separate simple linear regression models of y on x within the levels of A, with different codings. The last form produces explicit estimates of as many different intercepts and slopes as there are levels in A.

y ~ A*B + Error(C)
    An experiment with two treatment factors, A and B, and error strata determined by factor C. For example a split plot experiment, with whole plots (and hence also subplots) determined by factor C.
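Several of the equivalences stated above can be checked by fitting the models with lm. This is a sketch on simulated data, assuming the usual formula syntax (y ~ x, y ~ 1 + x, and so on):

```r
set.seed(42)
x <- runif(30)                               # made-up covariate
A <- factor(rep(c("a", "b", "c"), each = 10))   # made-up factor
y <- 1 + 2 * x + rnorm(30, sd = 0.2)

fit1 <- lm(y ~ x)                   # implicit intercept
fit2 <- lm(y ~ 1 + x)               # explicit intercept, same model
fit0 <- lm(y ~ 0 + x)               # regression through the origin

p1 <- lm(y ~ poly(x, 2))            # orthogonal polynomial basis
p2 <- lm(y ~ 1 + x + I(x^2))        # explicit powers, same fitted values

s1 <- lm(y ~ A * x)                 # separate lines within levels of A,
s2 <- lm(y ~ A / x)                 # with a different coding

coef(fit1)
coef(fit0)
```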


