
Showing posts with label R Language.

Thursday, November 3, 2011

R Language Quick Tips

Built-in Functions

Almost everything in R is done through functions. Here I'm only referring to numeric and character functions that are commonly used in creating or recoding variables.

NUMERIC FUNCTIONS

Function                  Description
abs(x)                    absolute value
sqrt(x)                   square root
ceiling(x)                ceiling(3.475) is 4
floor(x)                  floor(3.475) is 3
trunc(x)                  trunc(5.99) is 5
round(x, digits=n)        round(3.475, digits=2) is 3.48
signif(x, digits=n)       signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x)    also acos(x), cosh(x), acosh(x), etc.
log(x)                    natural logarithm
log10(x)                  common logarithm
exp(x)                    e^x
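The difference between ceiling(), floor() and trunc() only becomes visible with negative numbers. Here is a short sketch you can paste into an R session; the input values are arbitrary examples chosen for illustration:

```r
# Arbitrary example values: one positive, one negative
x <- c(3.475, -3.475)

ceiling(x)  # smallest integer not less than x:   4 -3
floor(x)    # largest integer not greater than x: 3 -4
trunc(x)    # drops the fractional part (toward zero): 3 -3

signif(3.475, digits = 2)  # 3.5 (two significant digits)
abs(-2.5)                  # 2.5
sqrt(16)                   # 4
log(exp(2))                # 2: log() and exp() undo each other
log10(1000)                # 3
```

So trunc() behaves like floor() for positive numbers and like ceiling() for negative ones.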

CHARACTER FUNCTIONS

substr(x, start=n1, stop=n2)
    Extract or replace substrings in a character vector.
    x <- "abcdef"
    substr(x, 2, 4) is "bcd"
    substr(x, 2, 4) <- "22222" makes x "a222ef" (only as many characters as the substring holds are replaced)

grep(pattern, x, ignore.case=FALSE, fixed=FALSE)
    Search for pattern in x. If fixed=FALSE, pattern is a regular expression; if fixed=TRUE, pattern is a literal text string. Returns the indices of the matching elements.
    grep("A", c("b","A","c"), fixed=TRUE) returns 2

sub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE)
    Find the first match of pattern in each element of x and replace it with the replacement text. If fixed=FALSE, pattern is a regular expression; if fixed=TRUE, pattern is a literal text string.
    sub("\\s",".","Hello There") returns "Hello.There"

strsplit(x, split)
    Split the elements of character vector x at split. The result is a list with one element per element of x.
    strsplit("abc", "") returns a list containing the 3-element vector "a","b","c"

paste(..., sep=" ")
    Concatenate strings, separating them with the sep string (a single space by default).
    paste("x",1:3,sep="") returns c("x1","x2","x3")
    paste("x",1:3,sep="M") returns c("xM1","xM2","xM3")
    paste("Today is", date())

toupper(x)
    Uppercase

tolower(x)
    Lowercase
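Since these functions are mostly used for recoding variables, here is a small sketch that combines several of them. The data (names stored as "Last, First") and the variable names are hypothetical:

```r
# Hypothetical example data: names stored as "Last, First"
people <- c("Smith, Anna", "Jones, Bob")

# strsplit() returns a list with one character vector per input string
parts <- strsplit(people, ", ", fixed = TRUE)
last  <- sapply(parts, function(p) p[1])
first <- sapply(parts, function(p) p[2])

# Build new variables with substr(), toupper() and paste()
initials <- paste(substr(first, 1, 1), substr(last, 1, 1), sep = "")
display  <- paste(first, toupper(last))

initials  # "AS" "JB"
display   # "Anna SMITH" "Bob JONES"

# grep() gives the positions of the matching elements
grep("^J", people)  # 2
```

Because strsplit() returns a list, sapply() is used to pull the same piece out of every element.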

STATISTICAL PROBABILITY FUNCTIONS

The following table describes functions related to probability distributions. For the random number generators below, you can call set.seed(1234) (or use some other integer) first to create reproducible pseudo-random numbers.

dnorm(x)
    normal density function (by default mean=0, sd=1)
    # plot standard normal curve
    x <- pretty(c(-3,3), 30)
    y <- dnorm(x)
    plot(x, y, type='l', xlab="Normal Deviate", ylab="Density", yaxs="i")

pnorm(q)
    cumulative normal probability for q (area under the normal curve to the left of q)
    pnorm(1.96) is 0.975

qnorm(p)
    normal quantile: the value at the p percentile of the normal distribution
    qnorm(.9) is 1.28 # 90th percentile

rnorm(n, mean=0, sd=1)
    n random normal deviates with the given mean and standard deviation sd
    # 50 random normal variates with mean=50, sd=10
    x <- rnorm(50, mean=50, sd=10)

dbinom(x, size, prob), pbinom(q, size, prob), qbinom(p, size, prob), rbinom(n, size, prob)
    binomial distribution, where size is the number of trials and prob is the probability of success on each trial (for a coin, the probability of heads)
    # prob of exactly 0, 1, ..., 5 heads of fair coin out of 10 flips
    dbinom(0:5, 10, .5)
    # prob of 5 or less heads of fair coin out of 10 flips
    pbinom(5, 10, .5)

dpois(x, lambda), ppois(q, lambda), qpois(p, lambda), rpois(n, lambda)
    Poisson distribution, whose mean and variance both equal lambda
    # probability of 0, 1, or 2 events with lambda=4
    dpois(0:2, 4)
    # probability of at least 3 events with lambda=4
    1 - ppois(2, 4)

dunif(x, min=0, max=1), punif(q, min=0, max=1), qunif(p, min=0, max=1), runif(n, min=0, max=1)
    uniform distribution; follows the same pattern as the normal distribution above
    # 10 uniform random variates
    x <- runif(10)
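The d/p/q/r prefixes are related to each other, which gives a quick way to check your understanding of a distribution. A short sketch using the same parameters as the examples in the table:

```r
# pnorm() and qnorm() undo each other
p <- pnorm(1.96)   # about 0.975: area to the LEFT of 1.96
qnorm(p)           # 1.96 again

# P(X <= 5) equals the sum of the point probabilities for 0..5
sum(dbinom(0:5, 10, 0.5))   # 0.6230469
pbinom(5, 10, 0.5)          # 0.6230469

# P(X >= 3) for a Poisson with lambda = 4, computed two ways
1 - ppois(2, 4)                    # about 0.762
ppois(2, 4, lower.tail = FALSE)    # same value

# set.seed() makes the "random" draws reproducible
set.seed(1234); a <- runif(3)
set.seed(1234); b <- runif(3)
identical(a, b)   # TRUE
```

The lower.tail = FALSE form avoids writing 1 - p by hand and is the usual way to get upper-tail probabilities.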

OTHER STATISTICAL FUNCTIONS

Other useful statistical functions are provided in the following table. Most of them (mean, sd, var, mad, median, quantile, range, sum, min, max) have an na.rm option to strip missing values before the calculation; otherwise missing values lead to an NA result (or, for quantile(), an error). Here x is a numeric vector; for a data frame, apply the function to each column, for example with sapply().

mean(x, trim=0, na.rm=FALSE)
    mean of x
    # trimmed mean, removing any missing values and
    # 5 percent of highest and lowest scores
    mx <- mean(x, trim=.05, na.rm=TRUE)

sd(x)
    standard deviation of x. See also var(x) for the variance and mad(x) for the median absolute deviation.

median(x)
    median

quantile(x, probs)
    quantiles, where x is the numeric vector whose quantiles are desired and probs is a numeric vector of probabilities in [0,1]
    # 30th and 84th percentiles of x
    y <- quantile(x, c(.3,.84))

range(x)
    minimum and maximum, returned as a 2-element vector

sum(x)
    sum

diff(x, lag=1)
    lagged differences, with lag indicating which lag to use

min(x)
    minimum

max(x)
    maximum

scale(x, center=TRUE, scale=TRUE)
    column center or standardize a matrix
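To see how na.rm changes the result, and how to handle a data frame, here is a small sketch with made-up data (the values of x and the columns of dat are illustrative only):

```r
# Example data with one missing value
x <- c(2, 4, 4, 5, NA, 7, 9)

mean(x)                   # NA: the missing value propagates
mean(x, na.rm = TRUE)     # 31/6, about 5.17
median(x, na.rm = TRUE)   # 4.5
range(x, na.rm = TRUE)    # 2 9

diff(c(1, 4, 9, 16))      # 3 5 7 (diff() has no na.rm argument)

# scale() returns a one-column matrix of z-scores
as.vector(scale(c(1, 2, 3)))   # -1 0 1

# For a data frame, apply the function to each column
dat <- data.frame(a = c(1, 2, NA), b = c(4, 5, 6))
sapply(dat, mean, na.rm = TRUE)   # a = 1.5, b = 5
```

Extra arguments given to sapply(), such as na.rm = TRUE here, are passed on to the function being applied.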

OTHER USEFUL FUNCTIONS

seq(from, to, by)
    generate a sequence
    indices <- seq(1,10,2)
    # indices is c(1, 3, 5, 7, 9)

rep(x, times=n)
    repeat x n times
    y <- rep(1:3, 2)
    # y is c(1, 2, 3, 1, 2, 3)

cut(x, n)
    divide a continuous variable into a factor with n levels
    y <- cut(x, 5)
Note that while the examples on this page apply functions to individual variables, many can be applied to vectors and matrices as well.
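A few variations on these functions that are handy when recoding. The ages vector and the group labels below are hypothetical examples:

```r
seq(1, 10, by = 2)          # 1 3 5 7 9
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

rep(1:3, times = 2)         # 1 2 3 1 2 3
rep(1:3, each = 2)          # 1 1 2 2 3 3

# cut() with explicit break points and labels instead of a number of intervals
ages <- c(5, 17, 30, 64, 80)
groups <- cut(ages, breaks = c(0, 17, 64, Inf),
              labels = c("0-17", "18-64", "65+"))
table(groups)   # 2 in 0-17, 2 in 18-64, 1 in 65+
```

By default the intervals are closed on the right, so an age of exactly 17 falls into the first group and 64 into the second.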

Thursday, October 13, 2011

RApache Configuration

In the last post we installed Rapache, but didn't get to configuring it.  Let's now create the "rapache.conf" file and some "Hello World" examples to check that Rapache works.

Configuration

As was the case for PHP and Python, we create a file in the "/etc/httpd/conf.d" directory with configuration information.  We will name this "rapache.conf" and start out with the following text:


LoadModule R_module /etc/httpd/modules/mod_R.so

ROutputErrors

<Location /RApacheInfo>
  SetHandler r-info
</Location>

<Directory /var/www/html/rscripts>
  SetHandler r-script
  RHandler sys.source
</Directory>

<Directory /var/www/html/brew>
  SetHandler r-script
  RHandler brew::brew
</Directory>



The "LoadModule" statement tells Apache to load the "mod_R.so" shared library and associate it with the "R_module" set of directives.

The "ROutputErrors" statement indicates that R errors should be displayed in the browser.

The "Location" statement creates a location "RApacheInfo" that displays information about the running rapache module. We can test that rapache has loaded and is running correctly by browsing to the link:


http://localhost/RApacheInfo


The first "Directory" statement indicates that all files in the "rscripts" subdirectory will be processed by the "sys.source()" function. This will execute the file as an R script.

The second "Directory" statement indicates that all files in the "brew" subdirectory will be processed by the "brew" function that is in the "brew" package. This function takes a file containing a mix of HTML and R code, executes the R code, and places the results within the HTML that is returned. This is analogous to the mixture of HTML and code in PHP and PSP.

Hello World: R Script

We can test the "R Script" handling with some simple code that generates HTML. This code also uses the "setContentType" function to indicate that the result should be treated as HTML, and finishes with the "DONE" statement indicating the script has finished without error.


setContentType("text/html")
cat("<HTML><BODY><H1>")
cat("Hello from R!")
cat("</H1></BODY></HTML>")
DONE



If we save this to the file "test.R" in "/var/www/html/rscripts" we will see "Hello from R!" displayed in the Header 1 font when we browse to:


http://localhost/rscripts/test.R


Hello World: Brew

Instead of writing out all of the HTML directly with "cat()" commands, we can create an "rhtml" file containing a mix of R and HTML. This then gets processed by the "brew" function to create the HTML response.

As an example, we create the file "test.rhtml" in "/var/www/html/brew" containing:


<HTML>
<BODY>
<H1>
<% cat("Hello from Brew!") %>
</H1>
</BODY>
</HTML>



Browsing to "http://localhost/brew/test.rhtml" will display "Hello from Brew!" in the Header 1 font.

Beyond Hello

Rapache provides rich capabilities for accessing all of the information in the HTTP request from within R, and for setting information as part of the HTTP response.

It also provides ways to pass information other than HTML back as the response, and can even support the uploading of files to the server as part of the client's request.

This is described in the Rapache manual and illustrated by the examples available from the Rapache web site.
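As a taste of reading request data, here is a sketch of a variant of test.R that greets a name passed in the query string (for example a hypothetical "hello.R?name=Ann" in the rscripts directory). It assumes that rapache exposes the query-string variables as a list named GET; check the rapache manual for your version. The escape_html() helper is hypothetical, and the exists() guards are only there so the script can also be run in a plain R session for testing:

```r
# Assumption: rapache sets GET to a list of query-string variables.
# In a plain R session GET does not exist, so fall back to an empty list.
params <- if (exists("GET") && is.list(GET)) GET else list()
name <- if (!is.null(params$name)) params$name else "World"

# Hypothetical helper: escape characters that are special in HTML
# before echoing user input back to the browser
escape_html <- function(s) {
  s <- gsub("&", "&amp;", s, fixed = TRUE)  # must be replaced first
  s <- gsub("<", "&lt;", s, fixed = TRUE)
  s <- gsub(">", "&gt;", s, fixed = TRUE)
  s
}

# paste(sep = "") rather than paste0(), which R 2.10 does not have
page <- paste("<HTML><BODY><H1>Hello, ", escape_html(name),
              "!</H1></BODY></HTML>", sep = "")

if (exists("setContentType")) setContentType("text/html")
cat(page)
if (exists("DONE")) DONE
```

Escaping the input matters because anything placed in the query string would otherwise be written straight into the returned HTML.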

Installing R Language

To run R from Apache, we need to install:

    * R built for use as a shared library
    * rapache to connect R with Apache
    * The "brew" package for processing HTML files with embedded R code
    * Other packages we want to be available from R

Installing Dependencies

In order for Apache to load R, the R application needs to be built as a shared library.  As we are building R from the source, it's convenient to make sure we have installed other C libraries commonly needed by R first.

Recall that in the CentOS machine we configured there is a user "r-user" with password "r-passwd".  The system password is "r-lamp".  Start up CentOS and log in as "r-user".

Open a terminal and use the "su" command to gain administrative rights.  Then use "yum" to install the following packages:



yum install gcc-gfortran
yum install gcc-c++
yum install readline-devel
yum install libpng-devel libX11-devel libXt-devel
yum install texinfo-tex
yum install tetex-dvips
yum install docbook-utils-pdf
yum install cairo-devel
yum install java-1.6.0-openjdk-devel
yum install libxml2-devel


It isn't strictly required to install all of these.  For example, the "openjdk" libraries are only needed if the "rJava" package is going to be used.  However, it's much easier to install them all now than to rebuild R later when the desire to use "rJava" occurs.

Installing R

The first step is to retrieve the source "tar.gz" file from CRAN.  The source for the latest release is available from the main page of the CRAN web site, such as:

http://cran.r-project.org/src/base/R-2/R-2.10.1.tar.gz

Use Firefox to download this file, and move it to the home directory for "r-user".

At this point we need to follow the R installation instructions carefully.  Two important points:

    * In order for the files to be readable by others, we need to set the appropriate default permissions for newly created files.  For example, set "umask 022".
    * When configuring the R build, include the flag "--enable-R-shlib".

The following commands will unpack the files and build R 2.10.1, which is the most recent version when this was written:



umask 022
tar xf R-2.10.1.tar.gz
cd R-2.10.1
./configure --enable-R-shlib
make


For a quick check on whether R built correctly:



make check


When you are happy with the build, install R with:



make install
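Because rapache depends on R having been built with "--enable-R-shlib", it can be worth confirming that the shared library was actually produced before going further. A minimal sketch, run from the newly installed R; it assumes a Linux-style layout where the library is named libR.so and lives in the "lib" directory under R's home, and the has_shlib() name is hypothetical:

```r
# Check for the R shared library (assumes <R_HOME>/lib/libR.so on Linux)
has_shlib <- function() {
  lib <- file.path(R.home(), "lib",
                   paste("libR", .Platform$dynlib.ext, sep = ""))
  file.exists(lib)
}

has_shlib()  # should be TRUE if --enable-R-shlib took effect
```

If this returns FALSE, re-run "./configure --enable-R-shlib" and rebuild before installing rapache.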


Installing R Packages

We will go ahead and install various packages that are likely to be of use with web applications. For security reasons you will probably want to prevent R scripts run from Apache from installing packages, so the required packages should be installed in advance.

To install the packages, first use "cd" to change back to the home directory of "r-user", then start R with "R". Package installation may fail if you start "R" from within the directory where we ran the "make" command.

Use the "install.packages()" function to install the packages, such as:



install.packages(c("brew", "XML", "rjson", "RMySQL", "RJDBC", "rJava","Cairo", "Hmisc"))


The only package that's required for use with "rapache" is "brew".
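If you later add packages to the list, a small helper can report which ones are still missing so that only those are installed. A sketch; the missing_packages() name is hypothetical, and the install step is guarded by interactive() so the check can also run from a script without touching CRAN:

```r
# Packages listed above; only "brew" is required by rapache
wanted <- c("brew", "XML", "rjson", "RMySQL", "RJDBC", "rJava", "Cairo", "Hmisc")

# Return the names in pkgs that are not yet installed
missing_packages <- function(pkgs) {
  pkgs[!(pkgs %in% rownames(installed.packages()))]
}

todo <- missing_packages(wanted)
todo  # character(0) once everything is installed

# Only install from an interactive R session
if (interactive() && length(todo) > 0) install.packages(todo)
```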

Installing Rapache

The procedure for downloading, building, and installing Rapache is similar to that for R.  The main detail is you need to include "--with-apache2-apxs=/usr/sbin/apxs" when doing the configuration.

Download the "tar.gz" from the rapache web site:

http://rapache.net/files/rapache-1.1.9.tar.gz

Move the file to the home directory for "r-user" and run the following commands:



tar xf rapache-1.1.9.tar.gz
cd rapache-1.1.9
./configure --with-apache2-apxs=/usr/sbin/apxs
make install


Rapache is now installed.

Configuring Apache

The next step is configuring Apache HTTPD to load the R module.

Create a file "rapache.conf" in "/etc/httpd/conf.d" with this basic configuration information:



LoadModule R_module /etc/httpd/modules/mod_R.so

ROutputErrors

<Location /RApacheInfo>
  SetHandler r-info
</Location>


LoadModule tells Apache to load rapache.  ROutputErrors tells it to direct error messages from the R engine to the browser rather than just putting them in a log file.  SetHandler maps the "RApacheInfo" location to an action returning information about the rapache configuration.

To test this out, browse to "http://localhost/RApacheInfo".  Information about rapache should be displayed.

At this point we have rapache working, but Apache is not yet configured to process "R" or "RHTML" files.  That configuration and example test scripts will be covered in a later post.  If you are anxious to get that configuration in place, see the rapache manual for details.