Friday, 25 October 2013

HDFS commands

hadoop fs -cmd <args>

hadoop fs -ls /

hadoop fs -lsr /

hadoop fs -mkdir xxx/xxx

hadoop fs -put file.txt

hadoop fs -cat file.txt | head

hadoop fs -tail file.txt

hadoop fs -rm file.txt


Compiling and running a Hadoop Java program

create classes from .java code file

javac -classpath hadoop-core-1.2.1.jar:lib/commons-cli-1.2.jar -d code/classes code/WordCount.java

create .jar from classes 

jar -cvf code/wordcount.jar -C code/classes/ .

run the program

hadoop jar code/wordcount.jar org.apache.hadoop.examples.WordCount inputdir outputdir
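
Note that Hadoop will refuse to run the job if outputdir already exists in HDFS, so if you are re-running it, remove the old output first:

hadoop fs -rmr outputdir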

view the results

hadoop fs -cat outputdir/*
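
If the job completed successfully, outputdir will contain one part-r-NNNNN file per reducer (part-r-00000 when there is a single reducer) plus an empty _SUCCESS marker.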

Wednesday, 23 October 2013

Hadoop Pseudo Distributed Mode

The pseudo-distributed mode is running Hadoop in a “cluster of one” with all daemons running on a single machine. This mode complements the standalone mode for debugging your code, allowing you to examine memory usage, HDFS input/output issues, and other daemon interactions.

Edit conf/hadoop-env.sh so that the line below points correctly to your Java installation:

#The java implementation to use. Required.
export JAVA_HOME=/usr/local/jdk1.7.0_45
Edit the XML files in the conf dir

core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.</description>
</property>
</configuration>

mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<description>The host and port that the MapReduce job tracker runs at.</description>
</property>
</configuration>

hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>The actual number of replications can be specified when the file is created.</description>
</property>
</configuration>

In core-site.xml and mapred-site.xml we specify the hostname and port of the NameNode and the JobTracker, respectively. 

In hdfs-site.xml we specify the default replication factor for HDFS, which should only be one because we’re running on only one node. 

We must also specify the location of the Secondary NameNode in the masters file and the slave nodes in the slaves file. Make sure you are in the conf dir:

echo localhost > masters

echo localhost > slaves
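
A quick check that the files contain what you expect:

cat masters

cat slaves

Both should simply print localhost.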

Even though all the daemons run on the same machine, Hadoop’s start and stop scripts still use SSH to launch them, just as they would on a real cluster. For single-node operation, simply check whether your machine already allows you to ssh back to itself:

ssh localhost

If it does, then you’re good. Otherwise try entering the following:

sudo apt-get install openssh-server
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
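
Then try logging in to yourself again; it should no longer prompt for a password:

ssh localhost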

You are almost ready to start Hadoop. But first you’ll need to format your HDFS by using the command

bin/hadoop namenode -format

We can now launch the daemons using the start-all.sh script. The Java jps command will list the running daemons so you can verify the setup was successful.

bin/start-all.sh

jps
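
If everything came up, jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker alongside Jps itself. When you are finished, bin/stop-all.sh shuts the daemons down again.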

If ssh localhost fails with "connect to host localhost port 22: Connection refused" and you are using Ubuntu or Mint, the SSH server is not installed by default (only the client is), so run the following and this should fix the problem:

sudo apt-get install openssh-server

Saturday, 19 October 2013

Install Sun Java on Mac OS X and set JAVA_HOME path

First, go here:

http://www.oracle.com/technetwork/java/javase/downloads/index.html

Click on JDK download box. This takes you to a new page with all the downloads on it.

Download the correct installer for your machine. For this tutorial, we will use the one that says Mac OS X x64. Download the .dmg file, then, when it finishes downloading, double-click it and install it like any other software.

What we want to do now is let our operating system know where this installation lives. Once we do this, any other software that wants to use Java will be able to read this operating system variable to find our Java installation.

These "operating system variables" are known as Environment Variables.

Run these two lines in your terminal:

export JAVA_HOME=/Library/Java/Home
export PATH=$PATH:$JAVA_HOME/bin
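
These two exports only last for the current terminal session. To make them permanent, you can append them to your shell profile; the lines below are a sketch assuming the default bash shell, and they use the /usr/libexec/java_home helper in case /Library/Java/Home does not point at the JDK you just installed:

echo 'export JAVA_HOME=$(/usr/libexec/java_home)' >> ~/.bash_profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.bash_profile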

Now, to check that it has been set correctly, you can run:

echo $JAVA_HOME

and the output should be the location that you just set above.

To see the version of java you have installed, run

java -version

thanks

Monday, 14 October 2013

How do I move tempdb location SQL Server

This is how to move your tempdb location.

Why?

Maybe the tempdb files have grown too big and the existing drive does not have enough space.

Moving them to separate drives can also improve disk read speed, as the files can then be read in parallel.

Below are instructions to move the existing tempdb files to new drives (d and e in this example).

Open Management Studio and connect to your server. Start a new query and run the following SQL to get the current location of your tempdb files:

USE TempDB
GO
EXEC sp_helpfile
GO

The results should show the name of the files and their location.

The names of the files are usually tempdev and templog. Location should be something like:

C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdb.mdf

Remember the names; they are used in the next statements.

Run the SQL below to move the tempdb mdf and ldf files:

USE master
GO

ALTER DATABASE TempDB MODIFY FILE (NAME = tempdev, FILENAME = 'd:\data\tempdb.mdf')
GO

ALTER DATABASE TempDB MODIFY FILE (NAME = templog, FILENAME = 'e:\data\templog.ldf')
GO
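
The move only takes effect after the SQL Server service is restarted, and the target folders (d:\data and e:\data here) must already exist and be writable by the SQL Server service account. From an elevated command prompt, assuming the default instance name MSSQLSERVER (adjust for a named instance):

net stop MSSQLSERVER
net start MSSQLSERVER

After the restart, sp_helpfile should show the new locations and the old tempdb files on C: can be deleted.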

zoomquilt is pretty amazing

Take a look here: http://zoomquilt.org/

Sunday, 13 October 2013

How can I see where Python is installed? Linux - Ubuntu - Mint

To see which directories make up your Python installation, you can type:

whereis python

simple as that.

you can type

which python

to see which Python executable is used by default (the first one on your PATH)

and you can run

python

on its own to start the interactive interpreter, which prints the version in its startup banner
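
To print just the version without opening the interpreter, you can also run:

python --version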

Saturday, 12 October 2013

vmware workstation path is not a valid path generic kernel headers

Before installing VMware Workstation you need to install build-essential and the Linux kernel headers:

sudo apt-get install build-essential linux-headers-$(uname -r)

and then

sudo ln -s /usr/src/linux-headers-$(uname -r)/include/generated/uapi/linux/version.h /usr/src/linux-headers-$(uname -r)/include/linux/version.h

Now install VMware Tools.

Wednesday, 9 October 2013

Install Sun Java on Mint, Ubuntu, or any linux dist

First, go here:

http://www.oracle.com/technetwork/java/javase/downloads/index.html

Click on JDK download, and download the correct installer for your machine. For this tutorial, we will use the tar.gz version.

So if you are using a 64bit linux distribution, the file will look something like: jdk-7u40-linux-x64.tar.gz (version permitting)

What we want to do is extract the contents, then let our operating system know where this installation lives. Once we do this, any other software that wants to use Java will be able to read this operating system variable to find our Java installation.

These "operating system variables" are known as Environment Variables.

Once you have downloaded the file, open a terminal and cd to where the file is located. Good practice is to first move the file to a better location than Downloads. So:

sudo mv jdk-7u40-linux-x64.tar.gz /usr/local

then

cd /usr/local

Now you are in the correct directory, and so is your Java download. To extract the contents, issue the command:

tar xzf jdk-7u40-linux-x64.tar.gz
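
This should leave a directory named something like jdk1.7.0_40 (matching the version you downloaded) inside /usr/local; a quick check:

ls /usr/local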

Now, add or change these two lines in your ~/.profile to point to the installation:

export JAVA_HOME=/usr/local/jdk1.7.0_40

export PATH=$PATH:$JAVA_HOME/bin
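
To pick up the change without logging out and back in, and to confirm everything is wired up:

source ~/.profile

java -version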

Wednesday, 2 October 2013

SQL Server Express through firewall on AWS

This is just to help anyone that gets caught out.

SQL Server Express does not, by default, listen on port 1433 the way the full SQL Server does.

Out of the box it uses a dynamic (random) port instead.

To find the port your instance is running on:

Open SQL Server Configuration Manager

Under Protocols for SQLEXPRESS, enable TCP/IP (enabling all of the protocols also works).

In the TCP/IP Properties dialog, go to the bottom of the IP Addresses tab (the IPAll section) and note or change the TCP port.
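
After changing the port, restart the SQL Server (SQLEXPRESS) service so the new TCP settings take effect.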

Now open up that port in the instance's AWS security group and you are ready to go :)