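NoSQL Database Principles and Applications Capstone Project: MongoDB
Contents
- NoSQL Database Principles and Applications Capstone Project: MongoDB
- 0. Preface
- 1. Importing local or HDFS data into MongoDB
- 2. MongoDB collection operations
- 2.1 Connecting to MongoDB with the Java API
- 2.2 Querying data
- 2.3 Inserting data
- 2.4 Updating data
- 2.5 Deleting data
- 3. Data analysis on the MongoDB collection
- 3.1 Counting books by type
- 3.2 Counting selected topics among computer books
- 3.3 Counting books by well-known authors
- 4. Connecting to MongoDB (Linux) remotely from Windows
- 5. Data and source code
- 6. Summary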
0. Preface
- Windows version: Windows 10
- Linux version: Ubuntu Kylin 16.04
- JDK version: Java 8
- Hadoop version: Hadoop-2.7.1
- HBase version: HBase-1.1.5
- ZooKeeper version: the ZooKeeper bundled with HBase
- Redis version: Redis-3.2.7
- MongoDB version: MongoDB-3.2.7
- Neo4j version: Neo4j-3.5.12 Community
- IDE: IDEA 2020.2.3
- IDE: PyCharm 2021.1.3
1. Importing local or HDFS data into MongoDB
- Code:
Create the db_books database and the tb_books collection:
> use db_books;
switched to db db_books
> db
db_books
> db.createCollection('tb_books')
{ "ok" : 1 }
> show collections;
tb_books
To make the import into MongoDB easier, the txt file exported from MySQL is first converted to JSON. Python is used for this step; the code is as follows:
import json

# Column order of the tab-separated export.
FIELDS = ['id', 'type', 'name', 'author', 'price', 'discount',
          'pub_time', 'pricing', 'publisher', 'crawler_time']

records = []
with open("E:\\tb_book.txt", encoding='utf-8') as file:
    for line in file:
        values = line.rstrip("\n").split("\t")
        # Pair each column with its field name; every value stays a string.
        records.append(dict(zip(FIELDS, values)))

with open("E:\\tb_book_out.txt", "w", encoding='utf-8') as out_file:
    # Write one JSON array; ensure_ascii=False keeps Chinese text readable.
    json.dump(records, out_file, ensure_ascii=False)
- Import the data into MongoDB:
zhangsan@node01:/usr/local/mongodb-3.2.7/bin$ ./mongoimport --db db_books --collection tb_books --type json --jsonArray --file /home/zhangsan/windowsUpload/data/tb_books.json
- Figure: successful run, showing the total number of records
- Figure: result
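Before running mongoimport with --jsonArray, it can help to confirm that the converted file really is a single JSON array whose records carry the expected fields. The following is a minimal sketch of such a check, not part of the original project: the helper name check_json_array and the demo records are made up for illustration, and the field names are the ones used by the conversion script.

```python
import json
import os
import tempfile

# Field names taken from the conversion script in this section.
EXPECTED_FIELDS = {"id", "type", "name", "author", "price", "discount",
                   "pub_time", "pricing", "publisher", "crawler_time"}


def check_json_array(path):
    """Return (record_count, bad_indexes) for a JSON-array file.

    A record counts as bad when it is not an object with exactly the
    expected fields. (Hypothetical helper, for illustration only.)
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be an array for --jsonArray")
    bad = [i for i, rec in enumerate(data)
           if not isinstance(rec, dict) or set(rec) != EXPECTED_FIELDS]
    return len(data), bad


# Self-contained demo with made-up records (not rows from the real dataset).
sample = [
    {name: "x" for name in EXPECTED_FIELDS},
    {"id": "2", "type": "demo"},  # deliberately incomplete
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    json.dump(sample, tmp, ensure_ascii=False)
count, bad = check_json_array(tmp.name)
os.remove(tmp.name)
print(count, bad)
```

Pointing check_json_array at the real output file gives a record count that can be compared with the total reported by mongoimport.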
2. MongoDB collection operations
2.1 Connecting to MongoDB with the Java API
- The driver's console logging is set to Level.SEVERE; see the init() method for the details.
- Code:
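// Imports assumed for this snippet (MongoDB Java driver 3.x, legacy MongoClient API).
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import java.util.logging.Level;
import java.util.logging.Logger;

private static String dbName = "db_books";
private static String collectionName = "tb_books";
private static MongoClient client;
private static MongoDatabase db;
private static MongoCollection<Document> collection;

public static void init() {
    try {
        client = new MongoClient("10.125.0.15");
        // client = new MongoClient("localhost");
        db = client.getDatabase(dbName);
        collection = db.getCollection(collectionName);
        // Only show SEVERE driver messages on the console.
        Logger mongoLogger = Logger.getLogger("org.mongodb.driver");
        mongoLogger.setLevel(Level.SEVERE);
    } catch (Exception e) {
        e.printStackTrace();
    }
}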
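2.2 Querying data
- Query the first 5 documents
/**
 * Query the first 5 documents.
 */
public static void queryTop5() {
    FindIterable<Document> documents = collection.find().limit(5);
    for (Document document : documents) {
        System.out.println(document);
    }
    // Closes the shared client; call init() again before the next operation.
    client.close();
}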
- Query by field value
/**
 * Query documents whose field equals the given value.
 * @param fieldName  name of the field to match
 * @param fieldValue value the field must equal
 */
public static void queryOne(String fieldName, String fieldValue) {
    Bson filter = Filters.eq(fieldName, fieldValue);
    FindIterable<Document> findIterable = collection.find(filter);
    MongoCursor<Document> cursor = findIterable.iterator();
    try {
        if (!cursor.hasNext()) {
            System.out.println("No matching documents found.");
        }
        while (cursor.hasNext()) {
            // Reformat the Document.toString() output by hand. Splitting on ","
            // assumes that no field value itself contains a comma.
            String next = cursor.next().toString();
            String[] split = next.split(",");
            for (int i = 0; i < split.length; i++) {
                if (i == 0) {
                    // First piece: "Document{" prefix, then the _id value.
                    String prefix = split[i].substring(0, 9);
                    System.out.println(prefix);
                    System.out.println(" " + split[i].substring(10, 14) + "ObjectId(\"" +
                            split[i].substring(14) + "\")" + ",");
                } else if (i < split.length - 1) {
                    System.out.println(split[i] + ",");
                } else {
                    // Last piece: drop the trailing "}}".
                    System.out.println(split[i].substring(0, split[i].length() - 2));
                }
            }
            // Print a separator comma after every document except the last one.
            if (!cursor.hasNext()) {
                System.out.println("}");
            } else {
                System.out.println("},");
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        cursor.close();
    }
    client.close();
}
2.3 Inserting data
- Insert one document
/**
 * Insert one document.
 * @param values the 10 field values, in the order id, type, name, author, price,
 *               discount, pub_time, pricing, publisher, crawler_time
 */
public static void insertOne(String[] values) {
    try {
        Document document = new Document()
                .append("id", values[0])
                .append("type", values[1])
                .append("name", values[2])
                .append("author", values[3])
                .append("price", values[4])
                .append("discount", values[5])
                .append("pub_time", values[6])
                .append("pricing", values[7])
                .append("publisher", values[8])
                .append("crawler_time", values[9]);
        collection.insertOne(document);
        System.out.println("document insert successfully!");
    } catch (MongoException me) {
        System.err.println("Unable to insert due to an error: " + me);
    }
    client.close();
}
- Insert multiple documents
/**
 * Insert documents in batches.
 * @param valuesList one String[] of 10 field values per document
 */
public static void insertBatch(List<String[]> valuesList) {
    int size = 0; // number of documents actually inserted
    List<Document> list = new ArrayList<Document>();
    for (String[] values : valuesList) {
        list.add(new Document()
                .append("id", values[0])
                .append("type", values[1])
                .append("name", values[2])
                .append("author", values[3])
                .append("price", values[4])
                .append("discount", values[5])
                .append("pub_time", values[6])
                .append("pricing", values[7])
                .append("publisher", values[8])
                .append("crawler_time", values[9])
        );
        // Flush whenever a batch of 2 documents is ready.
        if (list.size() == 2) {
            try {
                collection.insertMany(list);
                size += list.size();
                list.clear();
            } catch (MongoException me) {
                System.err.println("Unable to insert due to an error: " + me);
            }
        }
    }
    // Insert whatever is left over after the last full batch.
    if (!list.isEmpty()) {
        try {
            collection.insertMany(list);
            size += list.size();
            list.clear();
        } catch (MongoException me) {
            System.err.println("Unable to insert due to an error: " + me);
        }
    }
    System.out.println("Inserted " + size + " documents successfully!");
    client.close();
}
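The batching logic of insertBatch (send full batches, then the remainder, and count what was actually sent) can be sketched independently of the driver. The version below is only an illustration: insert_in_batches is a hypothetical helper, and its insert_many argument is a stand-in callback for a call such as collection.insertMany, so the example runs without a MongoDB server.

```python
def insert_in_batches(docs, insert_many, batch_size=2):
    """Pass docs to insert_many in chunks of batch_size.

    Returns the number of documents handed over. insert_many is a
    hypothetical callback standing in for a real bulk-insert call.
    """
    inserted = 0
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        insert_many(batch)
        inserted += len(batch)  # count documents, not batches
    return inserted


# Demo: collect the batches in a list instead of writing to a database.
received = []
total = insert_in_batches([{"id": str(i)} for i in range(5)], received.append)
print(total, [len(batch) for batch in received])
```

Counting with len(batch) keeps the reported total equal to the number of documents sent, whatever the batch size and however many documents are left over at the end.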
2.4 Updating data
- Update one document
/**
 * Update one document.
 * @param fieldName  name of the field to match and update
 * @param fieldValue current value of the field
 * @param newValue   value to set
 */
public static void updateOne(String fieldName, String fieldValue, String newValue) {
    // Match the first document whose field equals fieldValue.
    Bson filter = eq(fieldName, fieldValue);
    try {
        collection.updateOne(filter, new Document("$set", new Document(fieldName, newValue)));
        System.out.println("update one document successfully!");
    } catch (MongoException me) {
        System.err.println("Unable to update due to an error: " + me);
    }
    client.close();
}
- Update multiple documents
/**
 * Update multiple documents.
 * @param list one String[] per update: {fieldName, fieldValue, newValue}
 */
public static void updateMulti(List<String[]> list) {
    // Total number of documents modified across all updates.
    int size = 0;
    for (String[] line : list) {
        String fieldName = line[0];
        String fieldValue = line[1];
        String newValue = line[2];
        Bson filter = eq(fieldName, fieldValue);
        UpdateResult updateResult = collection.updateMany(filter, new Document("$set", new Document(fieldName, newValue)));
        size += updateResult.getModifiedCount();
    }
    System.out.println("update " + size + " document successfully!");
    client.close();
}
2.5 Deleting data
- Delete one document
/**
 * Delete one document.
 * @param fieldName  name of the field to match
 * @param fieldValue value the field must equal
 */
public static void dropOne(String fieldName, String fieldValue) {
    // Example: fieldName = name, fieldValue = 老舍
    // Declare the delete filter.
    Bson filter = eq(fieldName, fieldValue);
    try {
        DeleteResult result = collection.deleteOne(filter);
        System.out.println("Deleted document count: " + result.getDeletedCount());
    } catch (MongoException me) {
        System.err.println("Unable to delete due to an error: " + me);
    }
    client.close();
}
- Delete multiple documents
/**
 * Delete multiple documents.
 * @param list one String[] per delete: {fieldName, fieldValue}
 */
public static void dropMany(List<String[]> list) {
    // Total number of documents deleted across all filters.
    int size = 0;
    for (String[] line : list) {
        String fieldName = line[0];
        String fieldValue = line[1];
        Bson filter = Filters.eq(fieldName, fieldValue);
        try {
            DeleteResult result = collection.deleteMany(filter);
            size += result.getDeletedCount();
            // System.out.println("Deleted document count: " + result.getDeletedCount());
        } catch (MongoException me) {
            System.err.println("Unable to delete due to an error: " + me);
        }
    }
    System.out.println("Deleted document count: " + size);
    client.close();
}
3. Data analysis on the MongoDB collection
3.1 Counting books by type
> db.tb_books.aggregate([ { $group : { _id : "$type", total: { $sum: 1 } } } ])
{ "_id" : "type", "total" : 1 }
{ "_id" : "小说", "total" : 2740 }
{ "_id" : "计算机", "total" : 3035 }
{ "_id" : "社会科学", "total" : 2268 }
{ "_id" : "文学", "total" : 2761 }
{ "_id" : "科普百科", "total" : 2714 }
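The $group stage with { $sum: 1 } counts how many documents share each value of the grouping field. The group { "_id" : "type", "total" : 1 } in the output above suggests that the header line of the txt file was imported as a document too; that is an inference from the output, not something the source states. The sketch below mimics the grouping in plain Python on made-up documents to show the effect; count_by_field is a hypothetical helper.

```python
from collections import Counter


def count_by_field(docs, field):
    """Count documents per value of field.

    Documents that lack the field are grouped under None, much as
    $group places them under a null _id. (Hypothetical helper.)
    """
    return Counter(doc.get(field) for doc in docs)


# Made-up documents; the first one imitates a header row imported as data.
docs = [
    {"type": "type", "name": "name"},
    {"type": "小说", "name": "A"},
    {"type": "小说", "name": "B"},
    {"type": "计算机", "name": "C"},
]
totals = count_by_field(docs, "type")
print(totals)

# Dropping the header-like document before counting removes the stray group.
cleaned = count_by_field([d for d in docs if d.get("type") != "type"], "type")
print(cleaned)
```

If the header document is unwanted in the real collection, it could be left out during conversion or deleted afterwards; the counts of the five real categories are unaffected either way.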
3.2 Counting selected topics among computer books
> db.tb_books.find({name:{$regex:"java"}}).count()
5
> db.tb_books.find({name:{$regex:"算法"}}).count()
10
> db.tb_books.find({name:{$regex:"数据库"}}).count()
149
> db.tb_books.find({name:{$regex:"C++"}}).count()
761
> db.tb_books.find({name:{$regex:"Web"}}).count()
39
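Two details of these $regex queries are worth illustrating. First, + is a regex quantifier, so the pattern "C++" is not the literal text C++; in PCRE-style engines a doubled ++ is read as a possessive quantifier, which means the unescaped pattern probably matches any title containing "C" and may explain the comparatively large count of 761. Second, the match is case-sensitive, so "java" does not match "Java". The sketch below shows both points with Python's re module on made-up titles; contains_literal is a hypothetical helper, and Python's regex dialect is used here only as a stand-in for the server's.

```python
import re


def contains_literal(text, keyword, ignore_case=False):
    """Return True if keyword occurs in text as literal characters."""
    pattern = re.escape(keyword)  # "C++" becomes "C\+\+"
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None


# Illustrative titles, not rows from the dataset.
titles = ["C++ Primer", "C Programming Language", "Java核心技术", "深入理解java虚拟机"]

cpp_hits = [t for t in titles if contains_literal(t, "C++")]
java_exact = [t for t in titles if contains_literal(t, "java")]
java_any_case = [t for t in titles if contains_literal(t, "java", ignore_case=True)]
print(cpp_hits, java_exact, java_any_case)
```

In the mongo shell, the corresponding queries would escape the plus signs, as in {$regex: "C\\+\\+"}, and add $options: "i" for a case-insensitive match; the counts they return on this dataset are not shown here because they were not measured.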
3.3 Counting books by well-known authors
> db.tb_books.find({author:{$regex:"老舍"}}).count()
28
> db.tb_books.find({author:{$regex:"冰心"}}).count()
13
> db.tb_books.find({author:{$regex:"鲁迅"}}).count()
40
> db.tb_books.find({author:{$regex:"东野圭吾"}}).count()
6
> db.tb_books.find({author:{$regex:"太宰治"}}).count()
3
> db.tb_books.find({author:{$regex:"史铁生"}}).count()
6
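4. Connecting to MongoDB (Linux) remotely from Windows
The MongoDB configuration must be correct before connecting. The most important setting is the IP address that mongod binds to (net.bindIp in the configuration file, or --bind_ip on the command line), which has to be reachable from the Windows machine. Also check the firewall on the Linux host: it must be turned off or allow the MongoDB port (27017 by default).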
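A quick way to separate network problems (bind address, firewall) from driver problems is to test whether the MongoDB port accepts TCP connections at all. The following is a minimal sketch using only the Python standard library; can_connect is a hypothetical helper, and the demo talks to a throwaway local listener so that it runs without a MongoDB server.

```python
import socket


def can_connect(host, port=27017, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demo against a temporary local listener instead of a real mongod.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

reachable = can_connect("127.0.0.1", demo_port)
listener.close()
unreachable = can_connect("127.0.0.1", demo_port, timeout=1.0)
print(reachable, unreachable)
```

From the Windows machine, calling can_connect("10.125.0.15") with the server address used in init() would show whether the port is reachable; a False result points to the bind address or the firewall rather than to the Java code.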
5. Data and source code
- GitHub
- Gitee
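6. Summary
Because the dataset holds only a little over 10,000 records, importing it into MongoDB is not difficult: the mongoimport command does it directly. Pay attention to the data format when importing; the default is JSON, although CSV can be used as well.
When deleting or updating data, the main point is to match the target documents first, which Filters.eq() handles, and then run the corresponding operation. Note that the data passed to the collection should be a document, i.e. new Document(). The methods used include:
- find()
- insertOne()
- insertMany()
- updateOne()
- updateMany()
- deleteOne()
- deleteMany()
The end!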