Getting Started with Hadoop

Setting up the Ubuntu environment
192.168.1.64 HNClient
192.168.1.65 HNName

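In vi on SUSE and Ubuntu, the Backspace key does not delete text.
To delete a character, press ESC and then x.
To insert text, press i.
To open a new line below the current one, press o.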

On HNClient
norman@HNClient:~$ sudo vi /etc/hostname
HNClient
norman@HNClient:~$ sudo apt-get install openssh-server

norman@HNClient:~$ sudo vi /etc/hosts
192.168.1.64 HNClient
192.168.1.65 HNName
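
The two host entries can also be appended from a script instead of through vi. The following is a minimal sketch, assuming a hypothetical helper named add_host_entry and a scratch file standing in for /etc/hosts (writing the real file needs root):

```shell
#!/bin/sh
# add_host_entry FILE IP NAME: append "IP NAME" to FILE unless that exact
# line is already present. Hypothetical helper, for illustration only.
add_host_entry() {
    file=$1; ip=$2; name=$3
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$ip $name" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

hosts_file=$(mktemp)            # stand-in for /etc/hosts
add_host_entry "$hosts_file" 192.168.1.64 HNClient
add_host_entry "$hosts_file" 192.168.1.65 HNName
add_host_entry "$hosts_file" 192.168.1.65 HNName   # second call is a no-op
cat "$hosts_file"
```

Because the helper skips lines that already exist, it is safe to run on both machines more than once.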

norman@HNClient:~$ ssh-keygen (press Enter at each prompt below to accept the defaults)
Generating public/private rsa key pair.
Enter file in which to save the key (/home/norman/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/norman/.ssh/id_rsa.
Your public key has been saved in /home/norman/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:rj3kM5OeqxceqGP6DcofXa+hZFReLQmKqksqoYL+YH4 norman@HNClient
The key's randomart image is:
+---[RSA 2048]----+
| . |
| . . . o |
| . . . + . |
| . o . . |
| . ..S |
|.. o.o+. |
|+= o.++o+. |
|Xo.E+ +X+ |
|oo.=+ |
+----[SHA256]-----+

norman@HNClient:~$ ssh localhost (ssh localhost still asks for a password)
norman@localhost's password:
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

  • Documentation: https://help.ubuntu.com
  • Management: https://landscape.canonical.com
  • Support: https://ubuntu.com/advantage

251 packages can be updated.
79 updates are security updates.

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 23:14:08 2018 from 192.168.1.65

norman@HNClient:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

norman@HNClient:~$ ssh localhost (ssh localhost no longer asks for a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

  • Documentation: https://help.ubuntu.com
  • Management: https://landscape.canonical.com
  • Support: https://ubuntu.com/advantage

251 packages can be updated.
79 updates are security updates.

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 23:18:02 2018 from 127.0.0.1

norman@HNClient:~$ ssh HNName (ssh HNName still asks for a password)
norman@hnname's password:

norman@HNClient:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub norman@HNName

norman@HNClient:~$ ssh HNName (logging in to HNName now works without a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

  • Documentation: https://help.ubuntu.com
  • Management: https://landscape.canonical.com
  • Support: https://ubuntu.com/advantage

254 packages can be updated.
79 updates are security updates.

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 23:23:21 2018 from 192.168.1.64
norman@HNName:~$
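
Both cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys and ssh-copy-id come down to appending a public key line to the target account's authorized_keys file. The sketch below illustrates that step under stated assumptions: authorize_key is a hypothetical helper, the key is a placeholder string, and a throwaway directory stands in for ~/.ssh. The chmod calls reflect that sshd, with its default StrictModes setting, rejects key files that other users can write to.

```shell
#!/bin/sh
# authorize_key SSH_DIR PUBKEY_LINE: add one public key line to
# SSH_DIR/authorized_keys, skipping it if it is already there.
authorize_key() {
    ssh_dir=$1; key=$2
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
    if ! grep -qxF "$key" "$ssh_dir/authorized_keys"; then
        printf '%s\n' "$key" >> "$ssh_dir/authorized_keys"
    fi
}

demo_dir=$(mktemp -d)/.ssh                             # stand-in for ~/.ssh
demo_key='ssh-rsa AAAAB3placeholder norman@HNClient'   # placeholder, not a real key
authorize_key "$demo_dir" "$demo_key"
authorize_key "$demo_dir" "$demo_key"                  # repeated call adds nothing
wc -l < "$demo_dir/authorized_keys"
```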

On HNName

norman@HNName:~$ sudo vi /etc/hosts
192.168.1.64 HNClient
192.168.1.65 HNName

norman@HNName:~$ ssh-keygen (press Enter at each prompt below to accept the defaults)
Generating public/private rsa key pair.
Enter file in which to save the key (/home/norman/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/norman/.ssh/id_rsa.
Your public key has been saved in /home/norman/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:YXrPGdhKYkPsAroDlIZJ4sYdbrpHyvaMQccMV3GJn9I norman@HNName
The key's randomart image is:
+---[RSA 2048]----+
|.. . oo.. |
|.+ oo.. |
|oO.= = + |
|+.B. + E + |
|oo =. B S o |
|+.= o = + o |
|o . . + |
|..* |
| . o |
+----[SHA256]-----+
norman@HNName:~$ ssh localhost (ssh localhost still asks for a password)
norman@localhost's password:
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

  • Documentation: https://help.ubuntu.com
  • Management: https://landscape.canonical.com
  • Support: https://ubuntu.com/advantage

251 packages can be updated.
79 updates are security updates.

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 22:55:29 2018 from 127.0.0.1

norman@HNName:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
norman@HNName:~$ ssh localhost (ssh localhost no longer asks for a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

  • Documentation: https://help.ubuntu.com
  • Management: https://landscape.canonical.com
  • Support: https://ubuntu.com/advantage

254 packages can be updated.
79 updates are security updates.

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 23:00:28 2018 from 127.0.0.1

norman@HNName:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub norman@hnclient
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/norman/.ssh/id_rsa.pub"
The authenticity of host 'hnclient (192.168.1.64)' can't be established.
ECDSA key fingerprint is SHA256:w5dwBrXor00JfFtpGXc0G/+deJJwmAxKmjXE32InhgA.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
norman@hnclient's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'norman@hnclient'"
and check to make sure that only the key(s) you wanted were added.

norman@HNName:~$ ssh hnclient (logging in to hnclient now works without a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

  • Documentation: https://help.ubuntu.com
  • Management: https://landscape.canonical.com
  • Support: https://ubuntu.com/advantage

251 packages can be updated.
79 updates are security updates.

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 23:05:13 2018 from 192.168.1.58
norman@HNClient:~$ exit

norman@HNName:~$ sudo apt-get install openjdk-7-jdk
[sudo] password for norman:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package openjdk-7-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'openjdk-7-jdk' has no installation candidate

This happens because the default Ubuntu 16.04 package sources no longer include openjdk-7, so the repository has to be added manually, as follows:

norman@HNName:~$ sudo add-apt-repository ppa:openjdk-r/ppa (adds the OpenJDK PPA as a package source; add-apt-repository ppa:xxx/ppa registers the given Personal Package Archive with apt and imports its public key automatically)
norman@HNName:~$ sudo apt-get update
norman@HNName:~$ sudo apt-get install openjdk-7-jdk
norman@HNName:~$ java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-3)
OpenJDK Client VM (build 24.95-b01, mixed mode, sharing)
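
Since this walkthrough runs Hadoop 1.2.0 on Java 7, it can be handy to check the Java version from a script. The sketch below parses the first line of java -version output (which the JVM writes to stderr); java_major is a hypothetical helper, and the sample text is copied from the output above.

```shell
#!/bin/sh
# java_major: read `java -version` text on stdin and print the major
# version (7 for "1.7.0_95", 11 for "11.0.2").
java_major() {
    sed -n '1s/.*version "\([^"]*\)".*/\1/p' |
        awk -F. '{ if ($1 == 1) print $2; else print $1 }'
}

sample='java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-3)'
major=$(printf '%s\n' "$sample" | java_major)
echo "Java major version: $major"
# On a real machine: java -version 2>&1 | java_major
```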

norman@HNName:~$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0-bin.tar.gz
norman@HNName:~$ tar -zxvf hadoop-1.2.0-bin.tar.gz
norman@HNName:~$ sudo cp -r hadoop-1.2.0 /usr/local/hadoop
norman@HNName:~$ dir /usr/local/hadoop
bin hadoop-ant-1.2.0.jar hadoop-tools-1.2.0.jar NOTICE.txt
build.xml hadoop-client-1.2.0.jar ivy README.txt
c++ hadoop-core-1.2.0.jar ivy.xml sbin
CHANGES.txt hadoop-examples-1.2.0.jar lib share
conf hadoop-minicluster-1.2.0.jar libexec src
contrib hadoop-test-1.2.0.jar LICENSE.txt webapps

norman@HNName:~$ sudo vi $HOME/.bashrc (append the following at the end)
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin

norman@HNName:~$ exec bash
norman@HNName:~$ echo $PATH

norman@HNName:~$ sudo vi /usr/local/hadoop/conf/hadoop-env.sh
(The java implementation to use. Required.)
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386

(Extra Java runtime options. Empty by default. Set here to prefer the IPv4 stack, which keeps Hadoop off IPv6.)
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
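
If the JDK directory is not known in advance, a candidate for JAVA_HOME can be derived from the location of the java binary. This is only a sketch: java_home_from_bin is a hypothetical helper, and the example path is an assumption about where the openjdk-7 package places the binary.

```shell
#!/bin/sh
# java_home_from_bin PATH_TO_JAVA: strip the trailing /bin/java (and an
# optional /jre before it) to get a JAVA_HOME candidate.
java_home_from_bin() {
    printf '%s\n' "$1" | sed -e 's#/bin/java$##' -e 's#/jre$##'
}

# Example path (assumed); on a real machine use:
#   java_home_from_bin "$(readlink -f "$(command -v java)")"
java_home_from_bin /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java
```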

Installing Apache Hadoop (Single Node)
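norman@HNName:~$ sudo vi /usr/local/hadoop/conf/core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://HNName:10001</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>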

norman@HNName:~$ sudo vi /usr/local/hadoop/conf/mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>HNName:10002</value>
</property>
</configuration>
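
Hadoop 1.x site files are XML documents whose configuration root element holds property elements, each with a name and a value. As a sketch of how the two files above could be generated instead of typed in vi, the following uses a hypothetical write_site_xml helper and writes into a scratch directory rather than /usr/local/hadoop/conf:

```shell
#!/bin/sh
# write_site_xml FILE NAME VALUE [NAME VALUE ...]: write a Hadoop site file
# containing one <property> per NAME/VALUE pair.
write_site_xml() {
    out=$1; shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        while [ $# -ge 2 ]; do
            echo '  <property>'
            echo "    <name>$1</name>"
            echo "    <value>$2</value>"
            echo '  </property>'
            shift 2
        done
        echo '</configuration>'
    } > "$out"
}

conf_dir=$(mktemp -d)           # stand-in for /usr/local/hadoop/conf
write_site_xml "$conf_dir/core-site.xml" \
    fs.default.name hdfs://HNName:10001 \
    hadoop.tmp.dir /usr/local/hadoop/tmp
write_site_xml "$conf_dir/mapred-site.xml" \
    mapred.job.tracker HNName:10002
cat "$conf_dir/core-site.xml"
```

The same helper can be reused on HNClient, since both machines point at the same NameNode and JobTracker addresses.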

norman@HNName:~$ sudo mkdir /usr/local/hadoop/tmp
norman@HNName:~$ sudo chown norman /usr/local/hadoop/tmp
norman@HNName:~$ hadoop namenode -format (the line below indicates success)
18/11/01 19:07:36 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.

norman@HNName:~$ hadoop-daemons.sh start namenode (fails with the following errors)
localhost: mkdir: cannot create directory '/usr/local/hadoop/libexec/../logs': Permission denied
localhost: chown: cannot access '/usr/local/hadoop/libexec/../logs': No such file or directory
localhost: starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 137: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory
localhost: head: cannot open '/usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out' for reading: No such file or directory
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 147: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 148: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory

norman@HNName:~$ ll /usr/local
total 44
drwxr-xr-x 11 root root 4096 Nov 1 02:02 ./
drwxr-xr-x 11 root root 4096 Feb 28 2018 ../
drwxr-xr-x 2 root root 4096 Feb 28 2018 bin/
drwxr-xr-x 2 root root 4096 Feb 28 2018 etc/
drwxr-xr-x 2 root root 4096 Feb 28 2018 games/
drwxr-xr-x 15 root root 4096 Nov 1 20:05 hadoop/
drwxr-xr-x 2 root root 4096 Feb 28 2018 include/
drwxr-xr-x 4 root root 4096 Feb 28 2018 lib/
lrwxrwxrwx 1 root root 9 Jul 26 23:29 man -> share/man/
drwxr-xr-x 2 root root 4096 Feb 28 2018 sbin/
drwxr-xr-x 8 root root 4096 Feb 28 2018 share/
drwxr-xr-x 2 root root 4096 Feb 28 2018 src/

norman@HNName:~$ sudo chown norman /usr/local/hadoop

norman@HNName:~$ ll /usr/local
total 44
drwxr-xr-x 11 root root 4096 Nov 1 02:02 ./
drwxr-xr-x 11 root root 4096 Feb 28 2018 ../
drwxr-xr-x 2 root root 4096 Feb 28 2018 bin/
drwxr-xr-x 2 root root 4096 Feb 28 2018 etc/
drwxr-xr-x 2 root root 4096 Feb 28 2018 games/
drwxr-xr-x 15 norman root 4096 Nov 1 20:05 hadoop/
drwxr-xr-x 2 root root 4096 Feb 28 2018 include/
drwxr-xr-x 4 root root 4096 Feb 28 2018 lib/
lrwxrwxrwx 1 root root 9 Jul 26 23:29 man -> share/man/
drwxr-xr-x 2 root root 4096 Feb 28 2018 sbin/
drwxr-xr-x 8 root root 4096 Feb 28 2018 share/
drwxr-xr-x 2 root root 4096 Feb 28 2018 src/

norman@HNName:~$ hadoop-daemons.sh start namenode
localhost: starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out
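
The first start attempt failed because /usr/local/hadoop was copied with sudo and therefore belonged to root, so the daemon script could not create its logs directory as norman. A rough pre-flight check along these lines can catch that before any daemon is started; can_create_logs is a hypothetical helper, and a temporary directory stands in for the Hadoop install directory.

```shell
#!/bin/sh
# can_create_logs HADOOP_DIR: succeed if the current user can already write
# to HADOOP_DIR/logs, or could create it inside HADOOP_DIR.
can_create_logs() {
    if [ -d "$1/logs" ]; then
        [ -w "$1/logs" ]
    else
        [ -d "$1" ] && [ -w "$1" ]
    fi
}

demo_home=$(mktemp -d)          # stand-in for /usr/local/hadoop
if can_create_logs "$demo_home"; then
    echo "logs directory can be created under $demo_home"
else
    echo "fix ownership first, e.g. sudo chown \$USER $demo_home"
fi
```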

norman@HNName:~$ start-all.sh
norman@HNName:~$ jps
23297 DataNode
23610 TaskTracker
23484 JobTracker
23739 Jps
23102 NameNode
23416 SecondaryNameNode
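
On a single-node Hadoop 1.x setup, jps should list five Hadoop daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The sketch below checks captured jps output for them; missing_daemons is a hypothetical helper and the sample is the output shown above.

```shell
#!/bin/sh
# missing_daemons: read jps output on stdin and print every expected
# Hadoop 1.x daemon name that does not appear in it.
missing_daemons() {
    jps_out=$(cat)
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare against the whole second field so that "NameNode"
        # is not satisfied by "SecondaryNameNode".
        printf '%s\n' "$jps_out" |
            awk -v n="$d" '$2 == n { found = 1 } END { exit !found }' ||
            echo "$d"
    done
}

sample='23297 DataNode
23610 TaskTracker
23484 JobTracker
23739 Jps
23102 NameNode
23416 SecondaryNameNode'
missing=$(printf '%s\n' "$sample" | missing_daemons)
if [ -z "$missing" ]; then echo "all daemons running"; fi
# On a real node: jps | missing_daemons
```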

norman@HNName:~$ dir /usr/local/hadoop/bin
hadoop hadoop-daemon.sh rcc start-all.sh start-dfs.sh start-mapred.sh stop-balancer.sh stop-jobhistoryserver.sh task-controller
hadoop-config.sh hadoop-daemons.sh slaves.sh start-balancer.sh start-jobhistoryserver.sh stop-all.sh stop-dfs.sh stop-mapred.sh

http://192.168.1.65:50070/dfshealth.jsp

http://192.168.1.65:50030/jobtracker.jsp

http://192.168.1.65:50060/tasktracker.jsp

Managing HDFS
http://www.gutenberg.org/files/2600/2600-0.txt (download the text file)
Copy the page content into war_and_peace.txt
https://www.ncdc.noaa.gov/orders/qclcd/ (download any data set)
For example QCLCD201701.zip and QCLCD201702.zip, then extract 201701hourly.txt and 201702hourly.txt

On HNClient
Put war_and_peace.txt in /home/norman/data/book
Put 201701hourly.txt and 201702hourly.txt in /home/norman/data/weather

norman@HNClient:~$ sudo mkdir -p /home/norman/data/book
norman@HNClient:~$ sudo mkdir -p /home/norman/data/weather
norman@HNClient:~$ sudo chown norman /home/norman/data/weather
norman@HNClient:~$ sudo chown norman /home/norman/data/book

norman@HNClient:~$ sudo add-apt-repository ppa:openjdk-r/ppa
norman@HNClient:~$ sudo apt-get update
norman@HNClient:~$ sudo apt-get install openjdk-7-jdk
norman@HNClient:~$ java -version
norman@HNClient:~$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0-bin.tar.gz
norman@HNClient:~$ tar -zxvf hadoop-1.2.0-bin.tar.gz
norman@HNClient:~$ sudo cp -r hadoop-1.2.0 /usr/local/hadoop
norman@HNClient:~$ sudo vi $HOME/.bashrc (append the following at the end)
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin

norman@HNClient:~$ exec bash
norman@HNClient:~$ echo $PATH
norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/hadoop-env.sh
(The java implementation to use. Required.)
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
(Extra Java runtime options. Empty by default. Set here to prefer the IPv4 stack, which keeps Hadoop off IPv6.)
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

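norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://HNName:10001</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>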

norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>HNName:10002</value>
</property>
</configuration>

norman@HNClient:~$ hadoop fs -mkdir test
norman@HNClient:~$ hadoop fs -ls
Found 1 items
drwxr-xr-x - norman supergroup 0 2018-11-02 01:17 /user/norman/test
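
The listing shows that the relative path test ended up at /user/norman/test: relative HDFS paths are resolved against the user's HDFS home directory, /user/<username>. The sketch below only mimics that rule for illustration; hdfs_resolve is a hypothetical helper, not Hadoop code.

```shell
#!/bin/sh
# hdfs_resolve USER PATH: mimic how a relative HDFS path is resolved
# against /user/USER; absolute paths are returned unchanged.
hdfs_resolve() {
    case $2 in
        /*) printf '%s\n' "$2" ;;
        *)  printf '/user/%s/%s\n' "$1" "$2" ;;
    esac
}

hdfs_resolve norman test          # relative path
hdfs_resolve norman /data/small   # absolute path
```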

norman@HNClient:~$ hadoop fs -mkdir hdfs://hnname:10001/data/small
norman@HNClient:~$ hadoop fs -mkdir hdfs://hnname:10001/data/big

Open http://192.168.1.65:50070 in a browser

http://192.168.1.65:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/

norman@HNClient:~$ hadoop fs -rmr test (test deletion)
Deleted hdfs://HNName:10001/user/norman/test

norman@HNClient:~$ hadoop fs -moveFromLocal /home/norman/data/book/war_and_peace.txt hdfs://hnname:10001/data/small/war_and_peace.txt

The uploaded file can now be seen under /data/small in the web UI.

norman@HNClient:~$ hadoop fs -copyToLocal hdfs://hnname:10001/data/small/war_and_peace.txt /home/norman/data/book/war_and_peace.bak.txt (test copying back to the local file system)

norman@HNClient:~$ hadoop fs -put /home/norman/data/weather hdfs://hnname:10001/data/big

The uploaded weather files can now be seen under /data/big in the web UI.

norman@HNClient:~$ hadoop dfsadmin -report
Configured Capacity: 19033165824 (17.73 GB)
Present Capacity: 13114503168 (12.21 GB)
DFS Remaining: 12005150720 (11.18 GB)
DFS Used: 1109352448 (1.03 GB)
DFS Used%: 8.46%
Under replicated blocks: 19
Blocks with corrupt replicas: 0
Missing blocks: 0


Datanodes available: 1 (1 total, 0 dead)

Name: 192.168.1.65:50010
Decommission Status : Normal
Configured Capacity: 19033165824 (17.73 GB)
DFS Used: 1109352448 (1.03 GB)
Non DFS Used: 5918662656 (5.51 GB)
DFS Remaining: 12005150720(11.18 GB)
DFS Used%: 5.83%
DFS Remaining%: 63.07%
Last contact: Fri Nov 02 01:49:43 GMT-08:00 2018
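
For monitoring scripts, individual figures can be pulled out of the report with awk. A minimal sketch, assuming a hypothetical report_field helper; the sample lines are copied from the report above.

```shell
#!/bin/sh
# report_field NAME: read `hadoop dfsadmin -report` text on stdin and print
# the value from the first line whose label is exactly NAME.
report_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

sample='Configured Capacity: 19033165824 (17.73 GB)
Present Capacity: 13114503168 (12.21 GB)
DFS Remaining: 12005150720 (11.18 GB)
DFS Used: 1109352448 (1.03 GB)
DFS Used%: 8.46%
Under replicated blocks: 19'
used_pct=$(printf '%s\n' "$sample" | report_field 'DFS Used%')
under=$(printf '%s\n' "$sample" | report_field 'Under replicated blocks')
echo "DFS used: $used_pct, under-replicated blocks: $under"
# On a real client: hadoop dfsadmin -report | report_field 'DFS Used%'
```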

norman@HNClient:~$ hadoop dfsadmin -safemode enter (safe mode is needed during an upgrade)
Safe mode is ON

norman@HNClient:~$ hadoop dfsadmin -safemode leave
Safe mode is OFF

On HNName
norman@HNName:~$ hadoop fsck -blocks
Status: HEALTHY
Total size: 1100586452 B
Total dirs: 13
Total files: 4
Total blocks (validated): 19 (avg. block size 57925602 B)
Minimally replicated blocks: 19 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 19 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 38 (200.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Fri Nov 02 01:54:46 GMT-08:00 2018 in 1049 milliseconds

The filesystem under path '/' is HEALTHY

norman@HNName:~$ hadoop fsck /data/big
Status: HEALTHY
Total size: 1097339705 B
Total dirs: 2
Total files: 2
Total blocks (validated): 17 (avg. block size 64549394 B)
Minimally replicated blocks: 17 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 17 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 34 (200.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Fri Nov 02 19:33:55 GMT-08:00 2018 in 14 milliseconds

The filesystem under path '/data/big' is HEALTHY
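
Both fsck runs report every block as under-replicated because the default replication factor is 3 while the cluster has a single DataNode, so each block has only 1 of its 3 intended replicas. The missing replica counts follow directly from that, as this small worked calculation shows (missing_replicas is a hypothetical helper):

```shell
#!/bin/sh
# missing_replicas BLOCKS TARGET ACTUAL: replicas still needed when every
# block has ACTUAL copies but TARGET copies are wanted.
missing_replicas() {
    echo $(( $1 * ($2 - $3) ))
}

missing_replicas 19 3 1    # whole file system: 19 blocks
missing_replicas 17 3 1    # /data/big: 17 blocks
```

The results, 38 and 34, match the "Missing replicas" lines in the two reports, and 38 missing over 19 existing replicas gives the 200.0 % shown.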

