hadoop fs -cmd <args>
hadoop fs -ls /
hadoop fs -lsr /
hadoop fs -mkdir xxx/xxx
hadoop fs -put file.txt .
hadoop fs -cat file.txt | head
hadoop fs -tail file.txt
hadoop fs -rm file.txt
Friday, 25 October 2013
Compiling and running a Hadoop Java program
create classes from .java code file
javac -classpath hadoop-core-1.2.1.jar:lib/commons-cli-1.2.jar -d code/classes code/WordCount.java
create .jar from classes
jar -cvf code/wordcount.jar -C code/classes/ .
run the program
hadoop jar code/wordcount.jar org.apache.hadoop.examples.WordCount inputdir outputdir
view the results
hadoop fs -cat output/*
Wednesday, 23 October 2013
Hadoop Pseudo Distributed Mode
The pseudo-distributed mode is running Hadoop in a “cluster of one” with all daemons running on a single machine. This mode complements the standalone mode for debugging your code, allowing you to examine memory usage, HDFS input/output issues, and other daemon interactions.
edit conf/hadoop-env.sh
so that the line below points correctly to your java installation:
#The java implementation to use. Required.
export JAVA_HOME=/usr/local/jdk1.7.0_45
Edit the XML files in the conf dir:
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.</description>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<description>The host and port that the MapReduce job tracker runs at.</description>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>The actual number of replications can be specified when the file is created.</description>
</property>
</configuration>
In core-site.xml and mapred-site.xml we specify the hostname and port of the NameNode and the JobTracker, respectively.
In hdfs-site.xml we specify the default replication factor for HDFS, which should be one because we’re running on only one node.
We must also specify the location of the Secondary NameNode in the masters file and the slave nodes in the slaves file. Make sure you are in the conf dir:
echo localhost > masters
echo localhost > slaves
While all the daemons are running on the same machine, they still communicate with each other using the same SSH protocol as if they were distributed over a cluster. Section 2.2 has a more detailed discussion of setting up the SSH channels, but for single-node operation simply check to see if your machine already allows you to ssh back to itself.
ssh localhost
If it does, then you’re good. Otherwise try entering the following:
sudo apt-get install openssh-server
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
You are almost ready to start Hadoop. But first you’ll need to format your HDFS by using the command
bin/hadoop namenode -format
We can now launch the daemons using the start-all.sh script. The Java jps command lists the running JVMs, letting you verify the setup was successful.
bin/start-all.sh
jps
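If the daemons started cleanly, jps on a Hadoop 1.x pseudo-distributed node should report one JVM per daemon. A typical listing (PIDs will differ) looks like:

```
4825 NameNode
4937 DataNode
5063 SecondaryNameNode
5147 JobTracker
5261 TaskTracker
5350 Jps
```

If any of these are missing, check the corresponding log file in the logs directory before going further.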
If ssh localhost fails with the error
connect to host localhost port 22: Connection refused
and you are using Ubuntu or Mint, the SSH server is not installed by default (only the client is), so run the following and this should fix the problem:
sudo apt-get install openssh-server
Saturday, 19 October 2013
Install Sun Java on Mac OS X and set JAVA_HOME path
First, go here:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
Click on JDK download box. This takes you to a new page with all the downloads on it.
Download the correct installation for your machine. For this tutorial, we will use the one that says Mac OS X x64. Download the dmg file and, when it completes, double-click it and install like any other software.
What we want to do now is tell the operating system where this installation lives. Once we do, any other software that needs Java can use this operating system variable to find our Java installation.
These "operating system variables" are known as Environment Variables.
Run these two lines in your terminal:
export JAVA_HOME=/Library/Java/Home
export PATH=$PATH:$JAVA_HOME/bin
Now, to check that it has been set correctly, you can run:
echo $JAVA_HOME
and the output should be the location that you just set above.
To see the version of java you have installed, run
java -version
thanks
Monday, 14 October 2013
How do I move the tempdb location in SQL Server
This is how to move your tempdb location.
Why?
Maybe the files have grown too big and the existing drive does not have enough space.
Moving them can also improve disk read speed, as files on separate drives can be read in parallel.
Below are instructions to move the existing tempdb files to new drives, d and e.
Open management studio and connect to your server. Start a new query, and run the following sql to get the current location of your tempdb files:
USE TempDB
GO
EXEC sp_helpfile
GO
The results should show the name of the files and their location.
The names of the files are usually tempdev and templog. Location should be something like:
C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdb.mdf
Remember the names, they will be used in the next statement:
Run the SQL below to move the mdf and ldf tempdb files.
USE master
GO
ALTER DATABASE TempDB MODIFY FILE (NAME = tempdev, FILENAME = 'd:\data\tempdb.mdf')
GO
ALTER DATABASE TempDB MODIFY FILE (NAME = templog, FILENAME = 'e:\data\templog.ldf')
GO
Then restart the SQL Server service for the change to take effect; tempdb is recreated at the new location the next time the service starts.
zoomquilt is pretty amazing
Take a look here: http://zoomquilt.org/
Sunday, 13 October 2013
How can I see where python is installed? Linux - Ubuntu - Mint
To see which directories make up your Python installation, you can type:
whereis python
Simple as that.
You can type
which python
to see the default interpreter on your PATH,
and you can type
python --version
to print the version.
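Note that on newer distributions the bare python command may not exist at all, with only python3 shipped. command -v is a portable built-in alternative to which; a small fallback check, assuming at least one of the two names is installed:

```shell
# Print the path of the first interpreter name that exists on this system.
command -v python || command -v python3
```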
Saturday, 12 October 2013
VMware Workstation: "path is not a valid path" to the generic kernel headers
Before installing VMware Workstation you need to install build-essential and the Linux kernel headers:
sudo apt-get install build-essential linux-headers-$(uname -r)
and then
sudo ln -s /usr/src/linux-headers-$(uname -r)/include/generated/uapi/linux/version.h /usr/src/linux-headers-$(uname -r)/include/linux/version.h
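Both commands rely on shell command substitution: $(uname -r) expands to the running kernel's release string, so the matching headers package and directory are selected automatically. A quick way to see what it resolves to:

```shell
# uname -r prints the running kernel release; the headers package
# name is simply that string with a "linux-headers-" prefix.
uname -r
echo "linux-headers-$(uname -r)"
```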
Now install VMware Tools.
Wednesday, 9 October 2013
Install Sun Java on Mint, Ubuntu, or any linux dist
First, go here:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
Click on JDK download, and download the correct installation for your machine. For this tutorial, we will use the tar.gz version.
So if you are using a 64-bit Linux distribution, the file will look something like: jdk-7u40-linux-x64.tar.gz (version permitting)
What we want to do is extract the contents, then tell the operating system where this installation lives. Once we do, any other software that needs Java can use this operating system variable to find our Java installation.
These "operating system variables" are known as Environment Variables.
Once you have downloaded the file, open terminal, cd to where the file is located. Good practice is to first move the file to a better location than Downloads. So:
sudo mv jdk-7u40-linux-x64.tar.gz /usr/local
then
cd /usr/local
Now you are in the correct directory, and so is your Java download. To extract the contents, issue the command:
tar xzf jdk-7u40-linux-x64.tar.gz
Now, add or change these two lines in your ~/.profile to point to the installation (the tarball extracts to a directory named jdk1.7.0_40):
export JAVA_HOME=/usr/local/jdk1.7.0_40
export PATH=$PATH:$JAVA_HOME/bin
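To pick the change up in the current shell, re-export the variables and confirm the result; the path below assumes the tarball extracted to /usr/local/jdk1.7.0_40, so adjust it to your version:

```shell
# These mirror the ~/.profile lines; the JDK path is an example.
export JAVA_HOME=/usr/local/jdk1.7.0_40
export PATH="$PATH:$JAVA_HOME/bin"
echo "$JAVA_HOME"
```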
Wednesday, 2 October 2013
SQL Server Express through firewall on AWS
This is just to help anyone that gets caught out.
Unlike the full edition, SQL Server Express does not listen on port 1433 by default.
It is typically configured to use a dynamic (random) port.
To find the port your instance is running on:
Open SQL Server Configuration Manager.
Under Protocols for SQLEXPRESS, enable TCP/IP (enabling all protocols also works).
In TCP/IP Properties, go to the IP Addresses tab, scroll to the IPAll section at the bottom, and change or note the TCP port.
Now open the AWS firewall (security group) for that port and you are ready to go :)