【他山之石】优化 JavaScript 的乐趣与价值(上)

news2025/1/9 2:03:55

前言
这是前几天偶然看到的一篇硬核推文。作者一口气分了 12 个主题探讨了 JavaScript 在优化时应该注意的要点,读后深受启发。由于篇幅较长,分两篇发表。本篇为上篇。

文章目录

  • Optimizing Javascript for fun and for profit
    • 0. Avoid work
    • 1. Avoid string comparisons
            • About benchmarks
    • 2. Avoid different shapes
            • A note on terminology
        • What the eff should I do about this?
            • Number representation
    • 3. Avoid array/object methods
    • 4. Avoid indirection
    • 5. Avoid cache misses
      • 5.1 Prefetching
        • What the eff should I do about this?
      • 5.2 Caching in L1/2/3
        • What the eff should I do about this?
            • About immutable data structures
            • About the { ...spread } operator

Optimizing Javascript for fun and for profit


by romgrk

I often feel like javascript code in general runs much slower than it could, simply because it’s not optimized properly. Here is a summary of common optimization techniques I’ve found useful. Note that the tradeoff for performance is often readability, so the question of when to go for performance versus readability is a question left to the reader. I’ll also note that talking about optimization necessarily requires talking about benchmarking. Micro-optimizing a function for hours to have it run 100x faster is meaningless if the function only represented a fraction of the actual overall runtime to start with. If one is optimizing, the first and most important step is benchmarking. I’ll cover the topic in the later points. Be also aware that micro-benchmarks are often flawed, and that may include those presented here. I’ve done my best to avoid those traps, but don’t blindly apply any of the points presented here without benchmarking.

I have included runnable examples for all cases where it’s possible. They show by default the results I got on my machine (brave 122 on archlinux) but you can run them yourself. As much as I hate to say it, Firefox has fallen a bit behind in the optimization game, and represents a very small fraction of the traffic for now, so I don’t recommend using the results you’d get on Firefox as useful indicators.

0. Avoid work

This might sound evident, but it needs to be here because there can’t be another first step to optimization: if you’re trying to optimize, you should first look into avoiding work. This includes concepts like memoization, laziness and incremental computation. This would be applied differently depending on the context. In React, for example, that would mean applying memo(), useMemo() and other applicable primitives.

1. Avoid string comparisons

Javascript makes it easy to hide the real cost of string comparisons. If you need to compare strings in C, you’d use the strcmp(a, b) function. Javascript uses === instead, so you don’t see the strcmp. But it’s there, and a string comparison will usually (but not always) require comparing each of the characters in the string with the ones in the other string; string comparison is O(n). One common JavaScript pattern to avoid is strings-as-enums. But with the advent of TypeScript this should be easily avoidable, as enums are integers by default.

// No
enum Position {
  TOP    = 'TOP',
  BOTTOM = 'BOTTOM',
}
// Yeppers
enum Position {
  TOP,    // = 0
  BOTTOM, // = 1
}

Here is a comparison of the costs:

// 1. string compare
const Position = {
  TOP: 'TOP',
  BOTTOM: 'BOTTOM',
}
 
let _ = 0
for (let i = 0; i < 1000000; i++) {
  let current = i % 2 === 0 ?
    Position.TOP : Position.BOTTOM
  if (current === Position.TOP)
    _ += 1
}
// 2. int compare
const Position = {
  TOP: 0,
  BOTTOM: 1,
}
 
let _ = 0
for (let i = 0; i < 1000000; i++) {
  let current = i % 2 === 0 ?
    Position.TOP : Position.BOTTOM
  if (current === Position.TOP)
    _ += 1
}

benchmark1

About benchmarks

Percentage results represent the number of operations completed within 1s, divided by the number of operations of the highest scoring case. Higher is better.

As you can see, the difference can be significant. The difference isn’t necessarily due to the strcmp cost as engines can sometimes use a string pool and compare by reference, but it’s also due to the fact that integers are usually passed by value in JS engines, whereas strings are always passed as pointers, and memory accesses are expensive (see section 5). In string-heavy code, this can have a huge impact.

For a real-world example, I was able to make this JSON5 javascript parser run 2x faster just by replacing string constants with numbers. Unfortunately it wasn’t merged, but that’s how open-source is.

2. Avoid different shapes

Javascript engines try to optimize code by assuming that objects have a specific shape, and that functions will receive objects of the same shape. This allows them to store the keys of the shape once for all objects of that shape, and the values in a separate flat array. To represent it in javascript:

const objects = [
  {
    name: 'Anthony',
    age: 36,
  },
  {
    name: 'Eckhart',
    age: 42
  },
]
const shape = [
  { name: 'name', type: 'string' },
  { name: 'age',  type: 'integer' },
]
 
const objects = [
  ['Anthony', 36],
  ['Eckhart', 42],
]
A note on terminology

I have used the word “shape” for this concept, but be aware that you may also find “hidden class” or “map” used to describe it, depending on the engine.

For example, at runtime if the following function receives two objects with the shape { x: number, y: number }, the engine is going to speculate that future objects will have the same shape, and generate machine code optimized for that shape.

function add(a, b) {
  return {
    x: a.x + b.x,
    y: a.y + b.y,
  }
}

If one would instead pass an object not with the shape { x, y } but with the shape { y, x }, the engine would need to undo its speculation and the function would suddenly become considerably slower. I’m going to limit my explanation here because you should read the excellent post from mraleph if you want more details, but I’m going to highlight that V8 in particular has 3 modes, for accesses that are: monomorphic (1 shape), polymorphic (2-4 shapes), and megamorphic (5+ shapes). Let’s say you really want to stay monomorphic, because the slowdown is drastic:

// setup
let _ = 0
// 1. monomorphic
const o1 = { a: 1, b: _, c: _, d: _, e: _ }
const o2 = { a: 1, b: _, c: _, d: _, e: _ }
const o3 = { a: 1, b: _, c: _, d: _, e: _ }
const o4 = { a: 1, b: _, c: _, d: _, e: _ }
const o5 = { a: 1, b: _, c: _, d: _, e: _ } // all shapes are equal
// 2. polymorphic
const o1 = { a: 1, b: _, c: _, d: _, e: _ }
const o2 = { a: 1, b: _, c: _, d: _, e: _ }
const o3 = { a: 1, b: _, c: _, d: _, e: _ }
const o4 = { a: 1, b: _, c: _, d: _, e: _ }
const o5 = { b: _, a: 1, c: _, d: _, e: _ } // this shape is different
// 3. megamorphic
const o1 = { a: 1, b: _, c: _, d: _, e: _ }
const o2 = { b: _, a: 1, c: _, d: _, e: _ }
const o3 = { b: _, c: _, a: 1, d: _, e: _ }
const o4 = { b: _, c: _, d: _, a: 1, e: _ }
const o5 = { b: _, c: _, d: _, e: _, a: 1 } // all shapes are different
// test case
function add(a1, b1) {
  return a1.a + a1.b + a1.c + a1.d + a1.e +
         b1.a + b1.b + b1.c + b1.d + b1.e }
 
let result = 0
for (let i = 0; i < 1000000; i++) {
  result += add(o1, o2)
  result += add(o3, o4)
  result += add(o4, o5)
}

benchmark2

What the eff should I do about this?

Easier said than done but: create all your objects with the exact same shape. Even something as trivial as writing your React component props in a different order can trigger this.

For example, here are simple cases I found in React’s codebase, but they already had a much higher impact case of the same problem a few years ago because they were initializing an object with an integer, then later storing a float. Yes, changing the type also changes the shape. Yes, there are integer and float types hidden behind number. Deal with it.

Number representation

Engines can usually encode integers as values. For example, V8 represents values in 32 bits, with integers as compact Smi (SMall Integer) values, but floats and large integers are passed as pointers just like strings and objects. JSC uses a 64 bit encoding, double tagging, to pass all numbers by value, as SpiderMonkey does, and the rest is passed as pointers.

3. Avoid array/object methods

I love functional programming as much as anyone else, but unless you’re working in Haskell/OCaml/Rust where functional code gets compiled to efficient machine code, functional will always be slower than imperative.

const result =
  [1.5, 3.5, 5.0]
    .map(n => Math.round(n))
    .filter(n => n % 2 === 0)
    .reduce((a, n) => a + n, 0)

The problem with those methods is that:

  1. They need to make a full copy of the array, and those copies later need to be freed by the garbage collector. We will explore more in details the issues of memory I/O in section 5.
  2. They loop N times for N operations, whereas a for loop would allow looping once.
// setup:
const numbers = Array.from({ length: 10_000 }).map(() => Math.random())
// 1. functional
const result =
  numbers
    .map(n => Math.round(n * 10))
    .filter(n => n % 2 === 0)
    .reduce((a, n) => a + n, 0)
// 2. imperative
let result = 0
for (let i = 0; i < numbers.length; i++) {
  let n = Math.round(numbers[i] * 10)
  if (n % 2 !== 0) continue
  result = result + n
}

benchmark3

Object methods such as Object.values(), Object.keys() and Object.entries() suffer from similar problems, as they also allocate more data, and memory accesses are the root of all performance issues. No really I swear, I’ll show you in section 5.

4. Avoid indirection

Another place to look for optimization gains is any source of indirection, of which I can see 3 main sources:

const point = { x: 10, y: 20 }
 
// 1.
// Proxy objects are harder to optimize because their get/set function might
// be running custom logic, so engines can't make their usual assumptions.
const proxy = new Proxy(point, { get: (t, k) => { return t[k] } })
// Some engines can make proxy costs disappear, but those optimizations are
// expensive to make and can break easily.
const x = proxy.x
 
// 2.
// Usually ignored, but accessing an object via `.` or `[]` is also an
// indirection. In easy cases, the engine may very well be able to optimize the
// cost away:
const x = point.x
// But each additional access multiplies the cost, and makes it harder for the
// engine to make assumptions about the state of `point`:
const x = this.state.circle.center.point.x
 
// 3.
// And finally, function calls can also have a cost. Engine are generally good
// at inlining these:
function getX(p) { return p.x }
const x = getX(p)
// But it's not guaranteed that they can. In particular if the function call
// isn't from a static function but comes from e.g. an argument:
function Component({ point, getX }) {
  return getX(point)
}

The proxy benchmark is particularly brutal on V8 at the moment. Last time I checked, proxy objects were always falling back from the JIT to the interpreter, seeing from those results it might still be the case.

// 1. proxy access
const point = new Proxy({ x: 10, y: 20 }, { get: (t, k) => t[k] })
 
for (let _ = 0, i = 0; i < 100_000; i++) { _ += point.x }
// 2. direct access
const point = { x: 10, y: 20 }
const x = point.x
 
for (let _ = 0, i = 0; i < 100_000; i++) { _ += x }

benchmark4

I also wanted to showcase accessing a deeply nested object vs direct access, but engines are very good at optimizing away object accesses via escape analysis when there is a hot loop and a constant object. I inserted a bit of indirection to prevent it.

// 1. nested access
const a = { state: { center: { point: { x: 10, y: 20 } } } }
const b = { state: { center: { point: { x: 10, y: 20 } } } }
const get = (i) => i % 2 ? a : b
 
let result = 0
for (let i = 0; i < 100_000; i++) {
  result = result + get(i).state.center.point.x }
// 2. direct access
const a = { x: 10, y: 20 }.x
const b = { x: 10, y: 20 }.x
const get = (i) => i % 2 ? a : b
 
let result = 0
for (let i = 0; i < 100_000; i++) {
  result = result + get(i) }

benchmark5

5. Avoid cache misses

This point requires a bit of low-level knowledge, but has implications even in javascript, so I’ll explain. From the CPU point of view, retrieving memory from RAM is slow. To speed things up, it uses mainly two optimizations.

5.1 Prefetching

The first one is prefetching: it fetches more memory ahead of time, in the hope that it’s the memory you’ll be interested in. It always guesses that if you request one memory address, you’ll be interested in the memory region that comes right after that. So accessing data sequentially is the key. In the following example, we can observe the impact of accessing memory in random order.

// setup:
const K = 1024
const length = 1 * K * K
 
// Theses points are created one after the other, so they are allocated
// sequentially in memory.
const points = new Array(length)
for (let i = 0; i < points.length; i++) {
  points[i] = { x: 42, y: 0 }
}
 
// This array contains the *same data* as above, but shuffled randomly.
const shuffledPoints = shuffle(points.slice())
// 1. sequential
let _ = 0
for (let i = 0; i < points.length; i++) { _ += points[i].x }
// 2. random
let _ = 0
for (let i = 0; i < shuffledPoints.length; i++) { _ += shuffledPoints[i].x }

benchmark6

What the eff should I do about this?

This aspect is probably the hardest to put in practice, because javascript doesn’t have a way of placing objects in memory, but you can use that knowledge to your advantage as in the example above, for example to operate on data before re-ordering or sorting it. You cannot assume that objects created sequentially will stay at the same location after some time because the garbage collector might move them around. There is one exception to that, and it’s arrays of numbers, preferably TypedArray instances:

// from this
const points = [{ x: 0, y: 5 }, { x: 0, y: 10 }]
 
// to this
const points = new Int64Array([0, 5, 0, 10])

For a more detailed example, see this link* .
*Note that it contains some optimizations that are now outdated, but it’s still accurate overall.

5.2 Caching in L1/2/3

The second optimization CPUs use is the L1/L2/L3 caches: those are like faster RAMs, but they are also more expensive, so they are much smaller. They contain RAM data, but act as an LRU cache. Data comes in while it’s “hot” (being worked on), and is written back to the main RAM when new working data needs the space. So the key here is use as little data as possible to keep your working dataset in the fast caches. In the following example, we can observe what are the effects of busting each of the successive caches.

// setup:
const KB = 1024
const MB = 1024 * KB
 
// These are approximate sizes to fit in those caches. If you don't get the
// same results on your machine, it might be because your sizes differ.
const L1  = 256 * KB
const L2  =   5 * MB
const L3  =  18 * MB
const RAM =  32 * MB
 
// We'll be accessing the same buffer for all test cases, but we'll
// only be accessing the first 0 to `L1` entries in the first case,
// 0 to `L2` in the second, etc.
const buffer = new Int8Array(RAM)
buffer.fill(42)
 
const random = (max) => Math.floor(Math.random() * max)
// 1. L1
let r = 0; for (let i = 0; i < 100000; i++) { r += buffer[random(L1)] }
// 2. L2
let r = 0; for (let i = 0; i < 100000; i++) { r += buffer[random(L2)] }
// 3. L3
let r = 0; for (let i = 0; i < 100000; i++) { r += buffer[random(L3)] }
// 4. RAM
let r = 0; for (let i = 0; i < 100000; i++) { r += buffer[random(RAM)] }
What the eff should I do about this?

Ruthlessly eliminate every single data or memory allocations that can be eliminated. The smaller your dataset is, the faster your program will run. Memory I/O is the bottleneck for 95% of programs. Another good strategy can be to split your work into chunks, and ensure you work on a small dataset at a time.

For more details on CPU and memory, see this link.

About immutable data structures

Immutability is great for clarity and correctness, but in the context of performance, updating an immutable data structure means making a copy of the container, and that’s more memory I/O that will flush your caches. You should avoid immutable data structures when possible.

About the { …spread } operator

It’s very convenient, but every time you use it you create a new object in memory. More memory I/O, slower caches!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2143354.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

网络安全-LD_PRELOAD,请求劫持

目录 一、环境 二、开始做题 三、总结原理 四、如何防护 一、环境 我们这里用蚁剑自带的靶场第一关来解释 docker制作一下即可 二、开始做题 首先环境内很明显给我们已经写好了webshell 同样我们也可以访问到 我们使用这个蚁剑把这个webshell连上 我们发现命令不能执行&am…

机器学习-点击率预估-论文速读-20240916

1. [经典文章] 特征交叉: Factorization Machines, ICDM, 2010 分解机&#xff08;Factorization Machines&#xff09; 摘要 本文介绍了一种新的模型类——分解机&#xff08;FM&#xff09;&#xff0c;它结合了支持向量机&#xff08;SVM&#xff09;和分解模型的优点。与…

【C++语言】C/C++内存管理

一、C/C内存分布 我们先来看一看C/C中有哪些区域&#xff0c;为什么C/C中区分这些区域呢&#xff1f;&#xff1f;不同的数据有不同的存储需求&#xff0c;各个区域满足不同的需求。我们有临时用的数据&#xff0c;该数据是存储在栈帧区域的&#xff1b;在一些数据结构中&#…

Text-to-SQL技术升级 - 阿里云OpenSearch-SQL在BIRD榜单夺冠方法

Text-to-SQL技术升级 - 阿里云OpenSearch-SQL在BIRD榜单夺冠方法 Text-to-SQL 任务旨在将自然语言查询转换为结构化查询语言(SQL),从而使非专业用户能够便捷地访问和操作数据库。近期,阿里云的 OpenSearch 引擎凭借其一致性对齐技术,在当前极具影响力的 Text-to-SQL 任务…

【C++11 —— 异常】

C —— 异常 C语言传统的处理错误的方式C异常概念异常的使用异常的抛出和捕获异常的重新抛出异常安全异常规范 自定义异常体系自定义异常体系的目的 C标准库的异常体系异常的优缺点 C语言传统的处理错误的方式 在C语言中&#xff0c;错误处理通常依赖于返回值和全局变量的方式…

简单了解微服务--黑马(在更)

认识微服务 单体架构 不适合大型复杂项目 微服务架构 将单体结构的各个功能模块拆分为多个独立的项目 拆取的独立项目分别开发&#xff0c;在部署的时候也要分别去编译打包&#xff0c;分别去部署&#xff0c;不同的模块部署在不同的服务器上&#xff0c;对外提供不同的功能…

算法导论(第3版)

目录 第一部分 基础知识第2章 算法基础2.1 插入排序 第二部分 排序和顺序统计量第三部分 数据结构第四部分 高级设计和分析技术第五部分 高级数据结构第六部分 图算法第七部分 算法问题选编第八部分 附录&#xff1a;数学基础知识 第一部分 基础知识 第2章 算法基础 2.1 插入…

【智路】智路OS 服务组件开发

https://airos-edge.readthedocs.io/zh/latest/dev_guide/service_dev.html 1 总览 1.1 功能 感知服务包含感知的整体pipeline&#xff0c;主要模块包括单相机感知和融合。 单相机感知模块 主要功能为接收IP相机RTSP视频流&#xff0c;解码成RGB图片&#xff0c;通过算法识…

【黑马点评】已解决java.lang.NullPointerException异常

Redis学习Day3——黑马点评项目工程开发-CSDN博客 问题发现及描述 在黑马点评项目中&#xff0c;进行到使用Redis提供的Stream消息队列优化异步秒杀问题时&#xff0c;我在进行jmeter测试时遇到了重大的错误 发现无论怎么测试&#xff0c;一定会进入到catch中&#xff0c;又由…

DRS部署(DM8-DM8)

DRS部署 一、规划端口二、设置环境变量三、开启源数据库的归档和逻辑日志四、配置DDL同步五、创建用户六、 DRS服务部署&#xff08;DM8目的端&#xff09;6.1 部署 drs 服务6.2启动drs服务 七、 DRS 服务部署&#xff08;DM8 源端&#xff09;7.1 部署 DRS服务7.2 启动dmhs服务…

C++第七节课 运算符重载

一、运算符重载 并不是所有情况下都需要运算符重载&#xff0c;要看这个运算符对这个类是否有意义&#xff01; 例如&#xff1a;日期减日期可以求得两个日期之间的天数&#xff1b;但是日期 日期没有意义&#xff01; #include<iostream> using namespace std; clas…

SpringBoot启动成功,但端口启动失败

目录 一、问题展示 二、问题分析 2.1.端口与Tomcat的关系 2.2.问题分析 三、SpringBoot常见知识记录 3.1.SpringBoot项目常用jar包 3.1.1.必要性jar包 3.1.2.选择性jar包 3.2.标签的作用及取值 3.2.1.compile&#xff08;编译范围&#xff09; 3.2.2.provided…

爵士编曲:爵士鼓编写 爵士鼓笔记 底鼓和军鼓 闭镲和开镲 嗵鼓

底鼓和军鼓 底鼓通常是动的音色&#xff0c;军鼓通常是大的音色。 “动”和“大”构成基础节奏。“动大”听着不够有连接性&#xff0c;所以可以加入镲片&#xff01; 开镲 直接鼓棒敲击是开镲音色 闭镲 当脚踩下踏板&#xff0c;2个镲片合并&#xff0c;然后用鼓棒敲击&am…

Koa安装和应用

文章目录 1、Koa21.1 简介1.2 安装1.3 简单使用1.4 使用脚手架创建Koa项目 1、Koa2 1.1 简介 Koa 是一个新的 web 框架&#xff0c;由 Express 幕后的原班人马打造&#xff0c; 致力于成为 web 应用和 API 开发领域中的一个更小、更富有表现力、更健壮的基石。 通过利用 async…

rust快速创建Tauri App ——基于create-tauri-app

Tauri App Tauri是一个工具包&#xff0c;可以帮助开发人员使用现有的几乎任何前端框架为主要桌面平台制作应用程序。核心是用Rust构建的&#xff0c;CLI利用Node.js使Tauri成为创建和维护优秀应用程序的真正多语言方法。 cargo install create-tauri-appcreate-tauri-app&am…

多版本node管理工具nvm

什么是nvm&#xff1f; 在项目开发过程中&#xff0c;使用到vue框架技术&#xff0c;需要安装node下载项目依赖&#xff0c;但经常会遇到node版本不匹配而导致无法正常下载&#xff0c;重新安装node却又很麻烦。为解决以上问题&#xff0c;nvm&#xff1a;一款node的版本管理工…

FSFP——专为蛋白质工程设计的少样本学习策略

论文地址&#xff1a;通过小样本学习&#xff0c;以最少的湿实验室数据提高蛋白质语言模型的效率 参考文献&#xff1a;AI蛋白质设计“新引擎”:FSFP驱动大模型超低采样学习,少量数据显著提升蛋白质语言模型的性能 前言介绍&#xff1a;上海交通大学自然科学研究院洪亮教授课…

在STM32工程中使用Mavlink与飞控通信

本文讲述如何在STM32工程中使用Mavlink协议与飞控通信&#xff0c;特别适合自制飞控外设模块的项目。 需求来源&#xff1a; 1、增稳云台里的STM32单片机需要通过串口接收飞控传来的云台俯仰、横滚控制指令和相机拍照控制指令&#xff1b; 2、自制的有害气体采集器需要接收飞…

PCL 曲线点云提取

文章目录 一、简介二、实现代码三、实现效果参考文献一、简介 提取曲线点云的方法主要分为两种:参数化与非参数化,其中参数化是指事先直线曲线的形状,反之,非参数化则是不依赖与曲线的参数,通常是一种聚类的行为。这里我们采用非参数方法(TriplClust),将点集划分为一个未…

Java ETL - Apache Beam 简介

基本介绍 Apache Beam是一个用于大数据处理的开源统一编程模型。它允许用户编写一次代码&#xff0c;然后在多个批处理和流处理引擎上运行&#xff0c;如Apache Flink、Apache Spark和Google Cloud Dataflow等。Apache Beam提供了一种简单且高效的方式来实现数据处理管道&…