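What is Protobuf?
Protobuf (Protocol Buffers) is a data serialization format developed by Google. Data is serialized with it before being stored or transmitted. The Protobuf encoding is binary, so it is not human-readable and is hard to edit by hand, which makes the data harder to analyze or modify. The encoding is also very compact (for example, small integers are stored as variable-length varints), which improves transmission efficiency. Put simply, Protobuf does the same job as JSON serialization; only the implementation differs.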
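To make "binary and compact" concrete, here is a minimal sketch of how Protobuf encodes one integer field, following the varint rules in the Protobuf encoding documentation. The field number 1 and the JSON key "id" are made-up examples, not taken from the target site.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # MSB set: more bytes follow
        else:
            out.append(byte)         # MSB clear: this is the last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """One varint field: tag = (field_number << 3) | wire_type, with wire type 0 (VARINT)."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


pb = encode_int_field(1, 150)                                  # field 1 = 150
js = json.dumps({"id": 150}, separators=(",", ":")).encode()   # the same value as JSON
print(pb.hex(" "), len(pb))
print(js, len(js))
```

The Protobuf bytes are 08 96 01 (3 bytes) against 10 bytes of JSON: only the field number goes on the wire, never the field name, which is also why the raw bytes look like gibberish.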
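Installing Protobuf
Download the protoc release for your platform (prebuilt binaries are published on the protobuf GitHub releases page), extract it, and add its bin directory to the PATH environment variable.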
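A quick way to confirm the installation worked is to ask protoc for its version. This is a small sketch using only the standard library; the function name protoc_version is our own.

```python
import shutil
import subprocess
from typing import Optional


def protoc_version() -> Optional[str]:
    """Return protoc's version string, or None if protoc is not on PATH."""
    exe = shutil.which("protoc")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()  # something like "libprotoc <version>"


if __name__ == "__main__":
    print(protoc_version() or "protoc not found: check that its bin directory is on PATH")
```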
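Serialization and deserialization
To use Protobuf, the developer defines the message formats in a .proto file, then runs the Protobuf compiler (protoc) to generate message-handling code for the chosen language (it can also be generated with one click on the official website). The generated code is what performs serialization and deserialization.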
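The .proto → protoc → generated code workflow can be scripted. The sketch below writes a hypothetical demo schema (Ping, unrelated to the target site), runs protoc with --python_out inside a directory, and imports the resulting module by path. It assumes protoc is on PATH; importing the generated file additionally needs the protobuf Python package.

```python
import importlib.util
import shutil
import subprocess
from pathlib import Path
from types import ModuleType

# Hypothetical demo schema, not the target site's real message format.
DEMO_PROTO = """syntax = "proto3";
package demo;

message Ping {
  string name = 1;
  int32 seq = 2;
}
"""


def generate_python_module(proto_text: str, stem: str, out_dir: Path) -> Path:
    """Write <stem>.proto into out_dir and run `protoc --python_out=.` there."""
    if shutil.which("protoc") is None:
        raise FileNotFoundError("protoc is not on PATH")
    (out_dir / f"{stem}.proto").write_text(proto_text, encoding="utf-8")
    subprocess.run(["protoc", "--python_out=.", f"{stem}.proto"], cwd=out_dir, check=True)
    return out_dir / f"{stem}_pb2.py"  # protoc names the Python output <stem>_pb2.py


def load_module(path: Path) -> ModuleType:
    """Import a generated *_pb2.py by file path (requires the protobuf runtime package)."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

With both installed, loading the generated module should give you demo_pb2.Ping(name="x", seq=1).SerializeToString() to serialize and demo_pb2.Ping.FromString(data) to deserialize.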
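The following example shows how to deserialize such data by reverse engineering the site's JavaScript. Target URL (Base64-encoded): aHR0cHM6Ly93d3cueGlhb2hvbmdzaHUuY29tL2V4cGxvcmUvNjRkYzg2OGEwMDAwMDAwMDBhMDFiZDgz.
Open the target page and capture traffic with DevTools (F12). The request payload of the collect endpoint is Base64-encoded; decoded, it looks like this: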
춐]
6discovery-undefined0.0.00:
xhs-pc-webB3.5.2pm
5bc331f43e6e73244d2b51c2999b1e02HyYjdqDYqjyF8yYjdqDYq2I24qyKAfI4WlxWh7idWx1y1vK28SqduD0888yW2yWj8DDiqd0qy"
61c3e3e9000000001000d0df*264dc868a000000000a01bd83p:B
$2cd55f67-ae5a-446a-9571-cb81e171d8360J167Xຊִx˅1BJ
$9bab7cd2-3eae-4469-9553-06cc2e5c8492oMozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36<https://www.xiaohongshu.com/explore/64dc868a000000000a01bd83"/explore/:noteId*Lhttps://www.xiaohongshu.com/explore/64e19dc2000000000103c666?m_source=pinpaiZ"
64dc868a000000000a01bd83rlink
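Binary data printed as text, as in the dump above, hides most bytes. A hex dump shows every byte and is easier to match against the wire format. This is a generic sketch; the captured string is a made-up placeholder, so paste the real Base64 payload from DevTools in its place.

```python
import base64


def hexdump(data: bytes, width: int = 16) -> str:
    """Offset / hex / ASCII view; non-printable bytes are shown as '.'."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}}  {text_part}")
    return "\n".join(lines)


# Hypothetical placeholder: replace with the Base64 request body you captured.
captured = "CgVoZWxsbxCWAQ=="
print(hexdump(base64.urlsafe_b64decode(captured)))
```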
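As you can see, it contains some garbled bytes, but at this point you still cannot tell whether Protobuf serialization is used. On some sites you can check the Content-Type request header; the request shown in the figure below uses Protobuf.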
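The header check can be automated. There is no single registered media type for Protobuf bodies, so the two values below are common conventions we assume, not an exhaustive list.

```python
# Media types commonly used for raw Protobuf bodies: a convention, not a standard list.
PROTOBUF_MEDIA_TYPES = {"application/x-protobuf", "application/protobuf"}


def is_protobuf_content_type(content_type: str) -> bool:
    """Compare only the media type, ignoring parameters such as charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in PROTOBUF_MEDIA_TYPES


print(is_protobuf_content_type("application/x-protobuf"))
# A Base64-wrapped Protobuf body sent as text gives no hint at this level:
print(is_protobuf_content_type("text/plain;charset=UTF-8"))
```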
However, the target site Base64-encodes the serialized result, so its Content-Type header is the same as that of an ordinary request.
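In this case you have to find out what the data really is through dynamic debugging: inspect the call stack, locate the suspicious code, and set a breakpoint there.
Step into that code and set a breakpoint at the position shown in the figure.
Keep stepping until you reach the key location. Here the signs are unmistakable: keywords such as "proto" and "serializeBinary" are hallmarks of Protobuf (serializeBinary is the method that code generated by protobuf-javascript uses to serialize a message).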
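Before stepping through the debugger, you can also search a downloaded JS bundle for the same keywords. This sketch counts method and class names used by protobuf-javascript (google-protobuf) generated code; the marker list is a heuristic of ours, and minification may rename some of them.

```python
import re
from collections import Counter

# Names typical of protobuf-javascript generated code (heuristic, not exhaustive).
MARKERS = [
    "serializeBinary",
    "deserializeBinary",
    "serializeBinaryToWriter",
    "deserializeBinaryFromReader",
    "jspb.BinaryWriter",
    "jspb.BinaryReader",
]


def find_protobuf_markers(js_source: str) -> Counter:
    """Count whole-word occurrences of each marker in a (possibly minified) JS file."""
    counts = Counter()
    for marker in MARKERS:
        # \b stops "serializeBinary" from matching inside "serializeBinaryToWriter"
        counts[marker] = len(re.findall(rf"\b{re.escape(marker)}\b", js_source))
    return +counts  # unary + drops markers that were not found


sample = "proto.App.serializeBinaryToWriter=function(a,b){}; x.serializeBinary();"
print(find_protobuf_markers(sample))
```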
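Next, you can write your own .proto file by following the patterns in the source code. Before that, you need to know the .proto syntax and its data types; for reasons of space, please refer to other tutorials, as this article focuses only on the reverse-engineering part.
Writing the .proto file
As shown in the figure below, the target site's message format is a Tracker message that contains many sub-messages, such as APP, Mobile, and Device.
From this we can write the outermost message: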
syntax = "proto3";
package xhs;

message Tracker {
  repeated APP app = 1;
  repeated Mobile mobile = 2;
  repeated Device device = 3;
  repeated User user = 4;
  repeated Network network = 5;
  repeated Page page = 6;
  repeated Event event = 7;
  repeated Browser browser = 9;
  repeated NoteTarget noteTarget = 11;
  repeated NoteCommentTarget noteCommentTarget = 12;
  repeated TagTarget tagTarget = 13;
  repeated UserTarget userTarget = 14;
  repeated MallBannerTarget mallBannerTarget = 15;
  repeated MallGoodsTarget mallGoodsTarget = 16;
  repeated MallVendorTarget mallVendorTarget = 17;
  repeated MallCouponTarget mallCouponTarget = 18;
  repeated SearchTarget searchTarget = 30;
  repeated BrandingUserTarget brandingUserTarget = 40;
  repeated BrowserTarget browserTarget = 51;
  repeated ChannelTabTarget channelTabTarget = 100;
  repeated MessageTarget messageTarget = 151;
  repeated AdsTarget adsTarget = 152;
  repeated HeyTarget heyTarget = 153;
  repeated DebugTarget debugTarget = 154;
  repeated ActivityTarget activityTarget = 157;
  repeated LiveTarget liveTarget = 164;
  repeated CircleTarget circleTarget = 167;
  repeated GrowthPetTaskTarget growthPetTaskTarget = 195;
  repeated HideType hideType = 197;
  repeated WebTarget webTarget = 219;
}
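When matching a hex dump against this message, it helps to know which key byte(s) announce each field. Every Tracker field is an embedded message, so it is written with wire type 2 (length-delimited), and the key is the varint of (field_number << 3) | 2. A small sketch following the Protobuf encoding rules:

```python
WIRE_TYPE_LEN = 2  # length-delimited: strings, bytes and embedded messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag_bytes(field_number: int, wire_type: int = WIRE_TYPE_LEN) -> bytes:
    """The key that precedes a field on the wire: varint((field_number << 3) | wire_type)."""
    return encode_varint((field_number << 3) | wire_type)


# A few Tracker fields from the .proto above (all embedded messages, so wire type 2).
for name, number in [("app", 1), ("browser", 9), ("searchTarget", 30), ("webTarget", 219)]:
    print(f"{name:>12} = {number:<3} -> {tag_bytes(number).hex(' ')}")
```

Field numbers 1 to 15 fit in a one-byte key (0a for app), while 30 and 219 need two bytes (f2 01 and da 0d). A repeated message field with a single element is encoded exactly like a singular one, so declaring these fields repeated does not change how a single occurrence is parsed.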
Then step into proto.App.serializeBinaryToWriter and write the .proto for App:
message APP {
  enum NameTracker {
    DEFAULT_1 = 0;
    IOST = 1;
    ANDRT = 2;
    RNT = 3;
    MPT = 4;
    WAPT = 5;
    WXMPT = 6;
    BDMPT = 7;
    TTMPT = 8;
    QQMPT = 9;
    APMPT = 10;
    MINI_ANDRT = 11;
  }
  NameTracker nameTracker = 1;
  string AppVersion = 2;
  string TrackerVersion = 3;
  string SessionId = 4;
  string AppMarket = 5;
  enum Platform {
    DEFAULT_13 = 0;
    IOS = 1;
    ANDROID = 2;
    REACTNATIVE = 3;
    MOBILEBROWSER = 4;
    WECHATBROWSER = 5;
    WECHATMINIPROGRAM = 6;
    PC = 7;
    IOSBROWSER = 8;
    ANDROIDBROWSER = 9;
    FLUTTER = 10;
  }
  Platform platform = 6;
  string ArtifactName = 7;
  string ArtifactVersion = 8;
  enum AppMode {
    app_mode = 0;
  }
  AppMode appMode = 9;
  string LaunchId = 10;
  string MpScene = 11;
  string AppStartMode = 12;
  string BuildVersion = 13;
  int32 EventSeqIdInSession = 14;
  bool DarkMode = 15;
  string StartupId = 16;
  enum Orientation {
    DEFAULT_60 = 0;
    PORTRAIT = 1;
    LANDSCAPE = 2;
    LANDSCAPE_SPLIT = 3;
    PORTRAIT_SPLIT = 4;
    PORTRAIT_SPLIT_MAGIC = 5;
    LANDSCAPE_SPLIT_MAGIC = 6;
    LANDSCAPE_MAGIC = 7;
    PORTRAIT_MAGIC = 8;
  }
  Orientation orientation = 17;
  string BuildId = 1001;
  string Package = 1002;
  string AppName = 1003;
  string SdkName = 1004;
  string SdkVersion = 1005;
  enum Environment {
    DEFAULT_64 = 0;
    ENVIRONMENT_DEVELOP = 1;
    ENVIRONMENT_RELEASE = 2;
  }
  Environment environment = 1006;
  int64 ColdStartId = 1007;
  bool IsTeenagerMode = 1008;
  string DeviceType = 1009;
}
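An enum type predefines the set of values a field can take; you can find these preset values in the source code by searching for the keywords. Note that proto3 requires the first value of every enum to be 0, which the DEFAULT_* = 0 (and app_mode = 0) entries above satisfy.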
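On the wire, an enum field is just a varint holding the number, so the name only appears once you map it through the enum definition. This sketch mirrors APP.Platform from the .proto above and decodes one field; the input bytes b"\x30\x07" (field 6, value 7) are constructed for illustration.

```python
from enum import IntEnum
from typing import Tuple, Type, Union


class Platform(IntEnum):
    """Mirror of APP.Platform from the .proto above."""
    DEFAULT_13 = 0
    IOS = 1
    ANDROID = 2
    REACTNATIVE = 3
    MOBILEBROWSER = 4
    WECHATBROWSER = 5
    WECHATMINIPROGRAM = 6
    PC = 7
    IOSBROWSER = 8
    ANDROIDBROWSER = 9
    FLUTTER = 10


def read_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_enum_field(data: bytes, enum_type: Type[IntEnum]) -> Tuple[int, Union[IntEnum, int]]:
    """Decode one varint field (key + value) and map the value onto enum_type."""
    tag, pos = read_varint(data)
    field_number, wire_type = tag >> 3, tag & 0x07
    if wire_type != 0:
        raise ValueError(f"field {field_number}: expected wire type 0 (VARINT), got {wire_type}")
    value, _ = read_varint(data, pos)
    try:
        return field_number, enum_type(value)
    except ValueError:
        return field_number, value  # unknown number: keep it raw, as proto3 open enums do


print(decode_enum_field(b"\x30\x07", Platform))  # field 6 (platform) holding 7
```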
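Following the same pattern, you can write the complete .proto file and then generate message-handling code for any language. Taking Python as an example, once the file is written, run the command "protoc --python_out=. ./collect.proto"; it produces a Python file, collect_pb2.py. Now test the deserialization: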
import base64

# collect_pb2.py is the module protoc generated from collect.proto (kept in a utils package here)
from utils import collect_pb2

# Base64 request payload captured from the collect endpoint
a = 'jgXsthBdCjYIBRITZGlzY292ZXJ5LXVuZGVmaW5lZBoFMC4wLjAwBzoKeGhzLXBjLXdlYkIFMy41LjJwgwESABptCiA1YmMzMzFmNDNlNmU3MzI0NGQyYjUxYzI5OTliMWUwMooBSHlZamRxRFlxanlGOHlZamRxRFlxMkkyNHF5S0FmSTRXbHhXaDdpZFd4MXkxdksyOFNxZHVEMDg4OHlXMnlXajhERGlxZDBxeSIaChg2MWMzZTNlOTAwMDAwMDAwMTAwMGQwZGYqAggEMh0I+xcSGDY0ZGM4NjhhMDAwMDAwMDAwYTAxYmQ4MzpECiRjMThhYzliYS1mY2JiLTQ3YTYtOTMwOC1hMTM4MGVmZTQ1YzIgATAfSgMxMzFY4NOG49T0gAN4z8UBiALs1eGxojFKtQIKJDliYWI3Y2QyLTNlYWUtNDQ2OS05NTUzLTA2Y2MyZTVjODQ5MhJvTW96aWxsYS81LjAgKFdpbmRvd3MgTlQgMTAuMDsgV2luNjQ7IHg2NCkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzExNS4wLjAuMCBTYWZhcmkvNTM3LjM2GjxodHRwczovL3d3dy54aWFvaG9uZ3NodS5jb20vZXhwbG9yZS82NGRjODY4YTAwMDAwMDAwMGEwMWJkODMiEC9leHBsb3JlLzpub3RlSWQqTGh0dHBzOi8vd3d3LnhpYW9ob25nc2h1LmNvbS9leHBsb3JlLzY0ZTE5ZGMyMDAwMDAwMDAwMTAzYzY2Nj9tX3NvdXJjZT1waW5wYWlaIgoYNjRkYzg2OGEwMDAwMDAwMDBhMDFiZDgzEAFyBGxpbms='
# urlsafe_b64decode accepts both the URL-safe ('-', '_') and standard ('+', '/') characters
b = base64.urlsafe_b64decode(a)
tracker = collect_pb2.Tracker()
# skip the 4 extra bytes the site prepends before the Protobuf payload
tracker.ParseFromString(b[4:])
print(tracker)
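Deserialization now succeeds. Two details matter here. First, decode with base64.urlsafe_b64decode: it maps the URL-safe characters "-" and "_" back to "+" and "/" and still accepts "+" and "/", whereas b64decode by default silently discards "-" and "_", which yields wrong bytes or a padding error. What decides this is the Base64 alphabet the site uses, not whether the original data contains URLs. Second, the first 4 bytes of the decoded data are dropped, because the site prepends four useless bytes to the payload when encoding.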
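Both details can be wrapped in one helper. The name decode_collect_payload is hypothetical; prefix_len=4 follows the observation above about the four leading bytes, and the padding repair is a defensive assumption for captures whose trailing "=" was stripped.

```python
import base64


def decode_collect_payload(encoded: str, prefix_len: int = 4) -> bytes:
    """Base64-decode a captured request body and drop the bytes before the Protobuf payload."""
    padded = encoded + "=" * (-len(encoded) % 4)  # restore padding if it was stripped
    raw = base64.urlsafe_b64decode(padded)        # accepts '-'/'_' as well as '+'/'/'
    return raw[prefix_len:]


# Made-up payload: 4 zero bytes followed by field 1 = "hello"
print(decode_collect_payload("AAAAAAoFaGVsbG8").hex(" "))
```

The result can be passed straight to ParseFromString.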
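Many tutorials suggest capturing the request with Fiddler, saving the body as a .bin file, running "protoc --decode_raw < 1.bin" to dump the Protobuf structure, and writing the .proto from that structure. Because decode_raw has no schema, it shows only field numbers and raw values, with no names or types, so this approach suits experienced reversers; beginners who start from such tutorials can easily get stuck.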
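For reference, a schema-less decoder in the spirit of "protoc --decode_raw" fits in a few dozen lines of Python. This is our own sketch, not protoc's algorithm: it walks keys and wire types, recurses into length-delimited values that parse cleanly, and keeps the rest as bytes. Like decode_raw, it has to guess, so a short string can occasionally be misread as a sub-message.

```python
from typing import List, Tuple, Union

Field = Tuple[int, Union[int, bytes, list]]


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, position after it)."""
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint longer than 10 bytes")


def take(data: bytes, pos: int, size: int) -> Tuple[bytes, int]:
    """Slice size bytes at pos, failing loudly on truncated input."""
    if pos + size > len(data):
        raise ValueError("truncated field")
    return data[pos:pos + size], pos + size


def decode_raw(data: bytes) -> List[Field]:
    """Return (field_number, value) pairs; nested lists are sub-messages."""
    fields: List[Field] = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if field_number == 0:
            raise ValueError("field number 0 is invalid")
        if wire_type == 0:                  # VARINT
            value, pos = read_varint(data, pos)
        elif wire_type == 1:                # I64 (fixed64, double), little-endian
            chunk, pos = take(data, pos, 8)
            value = int.from_bytes(chunk, "little")
        elif wire_type == 2:                # LEN (string, bytes, sub-message)
            length, pos = read_varint(data, pos)
            chunk, pos = take(data, pos, length)
            try:
                value = decode_raw(chunk) if chunk else chunk
            except ValueError:
                value = chunk               # not a valid message: keep the raw bytes
        elif wire_type == 5:                # I32 (fixed32, float), little-endian
            chunk, pos = take(data, pos, 4)
            value = int.from_bytes(chunk, "little")
        else:                               # 3/4 are deprecated groups, 6/7 are invalid
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, value))
    return fields


def format_raw(fields: List[Field], indent: int = 0) -> str:
    """Render decode_raw output as indented text, roughly like protoc's."""
    pad = "  " * indent
    lines = []
    for number, value in fields:
        if isinstance(value, list):
            lines += [f"{pad}{number} {{", format_raw(value, indent + 1), f"{pad}}}"]
        else:
            lines.append(f"{pad}{number}: {value!r}")
    return "\n".join(lines)
```

Feeding it the bytes left after stripping the prefix gives the same kind of field-number tree as decode_raw, which you can then line up with the serializeBinaryToWriter code.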
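This article is for learning and exchange only, and key information has been anonymized. If anything here infringes your rights, please contact me and it will be removed.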