
How to Configure and Install RHadoop (R Streaming in Hadoop)

Installing and configuring the RHadoop framework requires Hadoop (version 2.6.0 or later, installed on every machine if you are running a cluster) and RStudio. For single-node and multi-node Hadoop setup, you can refer to Michael G. Noll's blog or Chalpritam's blog. These installation and configuration steps have been tested on Ubuntu 14.04 LTS (32-bit). If any step gives you an error, please let me know in the comments.
1. Switch to the root user so that all the RHadoop libraries are installed globally:
    
sudo su
2. Start the R terminal with the command below:
  
R
3. Install the libraries that the RHadoop packages depend on with the following commands:

install.packages(c("codetools", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "rJava"))
install.packages(c("dplyr", "R.methodsS3"))
install.packages(c("Hmisc"))
install.packages(c("caTools"))
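Re-running step 3 on a machine that already has some of these packages installs them again. The sketch below assumes you only want to install what is missing: it compares the same dependency list against installed.packages(). The names rhadoop_deps and missing_packages are hypothetical helpers for illustration, not part of RHadoop.

```r
# Dependency list from step 3 (hypothetical name, not part of RHadoop)
rhadoop_deps <- c("codetools", "Rcpp", "RJSONIO", "bitops", "digest",
                  "functional", "stringr", "plyr", "reshape2", "rJava",
                  "dplyr", "R.methodsS3", "Hmisc", "caTools")

# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# To install only what is missing (needs network access):
# to_install <- missing_packages(rhadoop_deps)
# if (length(to_install) > 0) install.packages(to_install)
```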
4. Set the Hadoop environment variables from within the R session (adjust the paths to match your installation):
  • Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
  • Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
  • Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-<version>.jar")
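Because the streaming jar's file name contains the Hadoop version, a typo in HADOOP_STREAMING is easy to make. The helpers below are a sketch that assumes the Hadoop 2.x directory layout used in this post and a plain Apache release name such as hadoop-streaming-2.6.0.jar. find_streaming_jar and check_hadoop_env are hypothetical names: the first looks the jar up under HADOOP_HOME, the second reports which of the three variables point at paths that exist.

```r
# Hypothetical helper: locate hadoop-streaming-<version>.jar under HADOOP_HOME.
# Assumes the share/hadoop/tools/lib layout and a plain numeric version.
find_streaming_jar <- function(hadoop_home = Sys.getenv("HADOOP_HOME")) {
  lib_dir <- file.path(hadoop_home, "share", "hadoop", "tools", "lib")
  jars <- list.files(lib_dir, pattern = "^hadoop-streaming-[0-9.]+\\.jar$",
                     full.names = TRUE)
  if (length(jars) == 0) stop("no hadoop-streaming jar found in ", lib_dir)
  jars[1]
}

# Hypothetical helper: TRUE for each variable that is set and names an existing path
check_hadoop_env <- function(vars = c("HADOOP_HOME", "HADOOP_CMD",
                                      "HADOOP_STREAMING")) {
  vals <- Sys.getenv(vars)
  ok <- nzchar(vals) & file.exists(vals)
  names(ok) <- vars
  ok
}

# Example usage on a configured machine:
# Sys.setenv(HADOOP_STREAMING = find_streaming_jar())
# check_hadoop_env()
```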

5. Download the rmr2 and rhdfs source packages and install them with the following commands:

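# replace path_to_rmr2package and path_to_rhdfspackage with the paths to the downloaded .tar.gz files
install.packages(path_to_rmr2package, repos = NULL, type = "source")
install.packages(path_to_rhdfspackage, repos = NULL, type = "source")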
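To confirm that step 5 worked without starting a job, the sketch below uses base R's requireNamespace() to report whether each package's namespace can be loaded, without attaching it. pkgs_loadable is a hypothetical helper name. rhdfs expects HADOOP_CMD to be set when it loads, so run this after the Sys.setenv() calls from step 4.

```r
# Hypothetical helper: TRUE for each package whose namespace loads cleanly
pkgs_loadable <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# After step 5 on a configured machine, both entries should be TRUE:
# pkgs_loadable(c("rmr2", "rhdfs"))
```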
6. After installing these packages, switch to RStudio and run the test code below. Make sure you have run Hadoop's start-all.sh script in a separate terminal before executing it.

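# Hadoop locations (adjust to match your installation)
Sys.setenv(HADOOP_HOME = "/usr/local/hadoop")
Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")
# JDK path; on a 32-bit system this directory is typically java-7-openjdk-i386
Sys.setenv(JAVA_HOME = "/usr/lib/jvm/java-7-openjdk-amd64")
Sys.setenv(HADOOP_STREAMING = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar")
library("rmr2")
library("rJava")
library("rhdfs")
# Connect rhdfs to HDFS
hdfs.init()
# Write the integers 1 to 100 to HDFS
ints = to.dfs(1:100)
# Map-only job: each map call returns a two-column matrix of v and 2 * v
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
# Read the job output back from HDFS into R
from.dfs(calc)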
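If the job runs but you are unsure what the output should contain, the map step can be emulated in plain R with no Hadoop or rmr2 involved. This is a sketch under one assumption: rmr2 hands the input to the map function in chunks (25 values per chunk here is an arbitrary choice) and the per-chunk results are stacked by row. Because this map works row by row, the chunking does not change which rows come out. map_fn and chunks are illustrative names only.

```r
# Plain-R emulation of the test job's map step (no Hadoop or rmr2 needed)
map_fn <- function(k, v) cbind(v, 2 * v)

ints <- 1:100
# Split the input into chunks of 25 values (arbitrary, for illustration)
chunks <- split(ints, ceiling(seq_along(ints) / 25))

# Apply the map function to each chunk (keys are NULL, as in the job)
# and stack the resulting two-column matrices
result <- do.call(rbind, lapply(chunks, function(v) map_fn(NULL, v)))

dim(result)   # 100 2
head(result, 3)
```

On the cluster, the values returned by from.dfs(calc) (for example via rmr2's values() function) should hold the same 100 rows, though the row order across map tasks is not guaranteed. rmr2 also offers a local backend, rmr.options(backend = "local"), for trying jobs without a running cluster.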
