docker pull harisekhon/hbase:1.3
docker run -d --name hbase001 -p 16010:16010 harisekhon/hbase:1.3
Enter the container and start the HBase shell
docker exec -it hbase001 bash
hbase shell
Search for cells by a specific value
hbase(main):003:0> scan 'students',{FILTER=>"ValueFilter(=,'binary:20202200')"}
ROW COLUMN+CELL
2 column=info:snum, timestamp=1668598464963, value=20202200
1 row(s) in 0.0130 seconds
hbase(main):004:0> scan 'students',{FILTER=>"ValueFilter(=,'binary:19911005')"}
ROW COLUMN+CELL
5 column=info:snum, timestamp=1668598985356, value=19911005
1 row(s) in 0.0110 seconds
hbase(main):005:0> scan 'students',{FILTER=>"ValueFilter(=,'binary:20030127')"}
ROW COLUMN+CELL
3 column=info:snum, timestamp=1668599030042, value=20030127
1 row(s) in 0.0140 seconds
hbase(main):006:0> scan 'students',{FILTER=>"ValueFilter(=,'binary:20192200')"}
ROW COLUMN+CELL
1 column=info:snum, timestamp=1668598491261, value=20192200
1 row(s) in 0.0120 seconds
hbase(main):007:0> scan 'students',{FILTER=>"ValueFilter(=,'binary:19940411')"}
ROW COLUMN+CELL
4 column=info:snum, timestamp=1668599000006, value=19940411
1 row(s) in 0.0110 seconds
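The scans above all follow one pattern: ValueFilter with the = operator and a binary: comparator keeps only the cells whose value is byte-for-byte equal to the given string. A minimal Python sketch of that behaviour on an in-memory stand-in for the students table follows. The dictionary layout and the function name value_filter_scan are hypothetical and not part of any HBase client API; the snum values are the ones shown in the scan output.

```python
# In-memory stand-in for the 'students' table: row key -> {column: value bytes}.
# Hypothetical layout for illustration; values are taken from the scan output above.
STUDENTS = {
    "1": {"info:snum": b"20192200"},
    "2": {"info:snum": b"20202200"},
    "3": {"info:snum": b"20030127"},
    "4": {"info:snum": b"19940411"},
    "5": {"info:snum": b"19911005"},
}


def value_filter_scan(table, wanted):
    """Return (row, column, value) for every cell whose value equals `wanted`."""
    target = wanted.encode("utf-8")
    hits = []
    for row in sorted(table):  # a scan returns rows in row-key order
        for column, value in sorted(table[row].items()):
            if value == target:  # '=' with a 'binary:' comparator: exact byte equality
                hits.append((row, column, value.decode("utf-8")))
    return hits


if __name__ == "__main__":
    for row, column, value in value_filter_scan(STUDENTS, "20202200"):
        print(row, f"column={column}, value={value}")
```

The filter is evaluated per cell, not per row, which is why each scan prints only the matching column of the matching row.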
HBase get commands (t is a table reference, for example t = get_table 'students')
hbase> t.get 'r1'
hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
hbase> t.get 'r1', {COLUMN => 'c1'}
hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
hbase> t.get 'r1', 'c1'
hbase> t.get 'r1', 'c1', 'c2'
hbase> t.get 'r1', ['c1', 'c2']
hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE'}
hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
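The TIMESTAMP, TIMERANGE and VERSIONS options in the get commands above narrow down which stored versions of a cell come back. A rough Python sketch of that selection, assuming a get returns one version by default and that TIMERANGE is a half-open interval [ts1, ts2). The helper get_versions and the timestamps in the usage lines are hypothetical.

```python
def get_versions(cell_versions, versions=1, timestamp=None, timerange=None):
    """Pick versions of one cell, newest first.

    cell_versions: list of (timestamp, value) pairs stored for the cell.
    versions:      maximum number of versions to return (VERSIONS).
    timestamp:     keep only the version written at exactly this time (TIMESTAMP).
    timerange:     (start, end) pair, start inclusive and end exclusive (TIMERANGE).
    """
    selected = sorted(cell_versions, key=lambda tv: tv[0], reverse=True)
    if timestamp is not None:
        selected = [tv for tv in selected if tv[0] == timestamp]
    if timerange is not None:
        start, end = timerange
        selected = [tv for tv in selected if start <= tv[0] < end]
    return selected[:versions]


if __name__ == "__main__":
    cell = [(100, "a"), (200, "b"), (300, "c")]  # made-up timestamps and values
    print(get_versions(cell))                                   # newest version only
    print(get_versions(cell, versions=4, timerange=(100, 300)))  # versions inside the range
```

How many versions are available in the first place is bounded by the VERSIONS setting of the column family.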
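Modifying a column family
A column family's parameters can be changed, for example the number of versions it keeps. Take the Student table above: suppose its Grades column family has VERSIONS set to 1, but the 3 most recent versions actually need to be kept. The following command makes that change: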
alter 'Student', {NAME => 'Grades', VERSIONS => 3}
To change the parameters of several column families at once, use the same form as in the create command.
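What VERSIONS => 3 means for stored data can be sketched in a few lines of Python. The class VersionedCell is hypothetical and only models the retention rule. Real HBase removes surplus versions lazily during compaction, whereas this sketch trims eagerly on every put for simplicity.

```python
class VersionedCell:
    """Toy model of one cell in a column family with a VERSIONS limit."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self._versions = []  # (timestamp, value) pairs, newest first

    def put(self, timestamp, value):
        # A put with an existing timestamp replaces that version.
        self._versions = [tv for tv in self._versions if tv[0] != timestamp]
        self._versions.append((timestamp, value))
        self._versions.sort(key=lambda tv: tv[0], reverse=True)
        # Keep only the newest `max_versions` entries.
        del self._versions[self.max_versions:]

    def get(self, versions=1):
        return self._versions[:versions]


if __name__ == "__main__":
    grade = VersionedCell(max_versions=3)  # like VERSIONS => 3
    for ts, value in [(1, "60"), (2, "70"), (3, "80"), (4, "90")]:  # made-up data
        grade.put(ts, value)
    print(grade.get(versions=5))  # at most three versions survive
```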
Adding a column family
To add a new column family named hobby to the student table, use:
alter 'student', 'hobby'
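Deleting a column family
To remove an existing column family, either of the following two commands works:
alter 'student', 'delete' => 'hobby'
alter 'student', {NAME => 'hobby', METHOD => 'delete'}
Note that an HBase table must contain at least one column family, so the last remaining column family of a table cannot be deleted.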
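- Use the describe command to view basic information about the student table: describe 'student'
- Enter the container: docker exec -it hbase001 bash
- Enter the shell environment: hbase shell
- Type exit to leave the HBase shell
- Press Ctrl+D to leave the container and return to the Linux virtual machine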
Setting up a Hadoop environment with Docker
Pull the image
docker pull kiwenlau/hadoop:1.0
Create the network
docker network create --driver=bridge hadoop
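If that fails (for example on a Windows container host, where the bridge driver may not be available), try the nat driver instead: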
docker network create --driver=nat hadoop
Start the hadoop-master container node
docker run -itd --net=hadoop -p 50070:50070 -p 8088:8088 -p 9000:9000 --name hadoop-master --hostname hadoop-master kiwenlau/hadoop:1.0
Start the hadoop-slave1 container node
docker run -itd --net=hadoop --name hadoop-slave1 --hostname hadoop-slave1 kiwenlau/hadoop:1.0
Start the hadoop-slave2 container node
docker run -itd --net=hadoop --name hadoop-slave2 --hostname hadoop-slave2 kiwenlau/hadoop:1.0
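The three docker run commands above differ only in the node name and the published ports, so they can be generated rather than typed. A small Python sketch follows. The function docker_run_commands is hypothetical, it only builds the argument lists and does not start anything. The image's Hadoop configuration may expect exactly two slaves, so changing the count is an assumption that would need checking.

```python
import shlex


def docker_run_commands(slaves=2, image="kiwenlau/hadoop:1.0", network="hadoop"):
    """Build the docker run argument lists for one master and `slaves` slave nodes."""
    commands = [[
        "docker", "run", "-itd", f"--net={network}",
        # Only the master publishes ports: NameNode UI, YARN UI, HDFS RPC.
        "-p", "50070:50070", "-p", "8088:8088", "-p", "9000:9000",
        "--name", "hadoop-master", "--hostname", "hadoop-master", image,
    ]]
    for i in range(1, slaves + 1):
        name = f"hadoop-slave{i}"
        commands.append([
            "docker", "run", "-itd", f"--net={network}",
            "--name", name, "--hostname", name, image,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review; pass each list to subprocess.run to execute it.
    for cmd in docker_run_commands():
        print(shlex.join(cmd))
```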
Enter the hadoop-master container and start Hadoop
docker exec -it hadoop-master bash
./start-hadoop.sh
List the Hadoop processes
jps
Create a directory in HDFS (named after yourself)
hdfs dfs -mkdir -p /jyj
Create text files locally
echo "Hello world" > jyj1.txt
echo "Hello Hadoop" > jyj2.txt
Upload the two files to HDFS and list the directory
hdfs dfs -put jyj1.txt jyj2.txt /jyj
hdfs dfs -ls /jyj
Open the web management pages (NameNode on port 50070, YARN ResourceManager on port 8088)
http://localhost:50070
http://localhost:8088
2. Setting up a Spark environment with Docker
Pull the image
docker pull bde2020/spark-master
Start Spark
docker run -itd --name spark-master -h spark-master -e ENABLE_INIT_DAEMON=false bde2020/spark-master
Enter the Spark container
docker exec -it spark-master bash
cd /spark
echo "Hello world,Hello docker,Hello spark">word.txt
cd bin
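./spark-shell
A simple word-count program in Spark (Scala, run in spark-shell)
val textFile = sc.textFile("file:///spark/word.txt")
textFile.first()
textFile.count()
// word.txt contains "spark" in lowercase, and contains() is case-sensitive
val linesWithSpark = textFile.filter(line => line.contains("spark"))
linesWithSpark.count()
// split on spaces and commas, since the words in word.txt are separated by both
val wordCounts = textFile.flatMap(line => line.split("[ ,]+")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
wordCounts.collect()
Press Ctrl+D to exit spark-shell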
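For readers who prefer Python, the same flatMap / map / reduceByKey pipeline can be sketched as follows. Both function names are hypothetical. word_counts_local mirrors the pipeline in plain Python, and word_counts_spark assumes it is called from a ./pyspark session, where the SparkContext sc is predefined. Splitting on both spaces and commas is a choice made here because word.txt separates words with both.

```python
import re
from collections import Counter


def split_words(line):
    """Split a line on spaces and commas, dropping empty strings."""
    return [w for w in re.split(r"[ ,]+", line) if w]


def word_counts_local(lines):
    """Plain-Python mirror of flatMap -> map -> reduceByKey."""
    counts = Counter()
    for line in lines:                  # flatMap: each line yields several words
        for word in split_words(line):
            counts[word] += 1           # map to (word, 1), then sum per key
    return dict(counts)


def word_counts_spark(sc, path="file:///spark/word.txt"):
    """Same pipeline on an RDD; `sc` is the SparkContext provided by pyspark."""
    return dict(
        sc.textFile(path)
        .flatMap(split_words)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
        .collect()
    )


if __name__ == "__main__":
    print(word_counts_local(["Hello world,Hello docker,Hello spark"]))
```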
A simple Spark SQL program
cd /spark/bin
./spark-shell
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
val df = spark.read.json("file:///spark/examples/src/main/resources/people.json")
df.show()
df.select(df("name"),df("age")+1).show()
df.sort(df("age").desc).show()
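To see what the select and sort calls above compute without starting Spark, the two operations can be mirrored in plain Python. This sketch assumes people.json holds the three records shipped with Spark's examples, inlined here as PEOPLE_JSON. The function names are hypothetical, and the null handling (a missing age stays null, nulls sort last in descending order) is modelled on Spark's default behaviour.

```python
import json

# Assumed content of examples/src/main/resources/people.json (one JSON object per line).
PEOPLE_JSON = """\
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
"""


def load_rows(text):
    """Parse JSON Lines text into a list of dicts, like spark.read.json."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def select_name_age_plus_one(rows):
    """Mirror of df.select(df("name"), df("age") + 1); a missing age stays None."""
    return [
        (r["name"], r["age"] + 1 if r.get("age") is not None else None)
        for r in rows
    ]


def sort_by_age_desc(rows):
    """Mirror of df.sort(df("age").desc); rows without an age go last."""
    return sorted(
        rows,
        key=lambda r: (r.get("age") is not None, r.get("age") or 0),
        reverse=True,
    )


if __name__ == "__main__":
    rows = load_rows(PEOPLE_JSON)
    print(select_name_age_plus_one(rows))
    print([r["name"] for r in sort_by_age_desc(rows)])
```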