| 1 | +# "Chrome V8 Source Code" 25. The Toughest Bone to Chew: Builtin! |
| 2 | + |
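| 3 | +# Preface |
| 4 | +The next several articles form a series on Builtin. Builtin implements a large share of V8's core functionality, which shows how important it is. Most Builtins, however, are written in CSA (CodeStubAssembler) and TQ (Torque). Both resemble assembly, which makes the source hard to read, and worse, Builtins cannot be debugged while V8 is running, which makes them harder still to learn. This series therefore explains in detail how to study and debug Builtins, in the hope of offering a useful starting point. |
| 5 | +# 1 Abstract |
| 6 | +This article is the first in the Builtin series. It explains what Built-in Functions (Builtin) are and how they are initialized. As V8's built-in facilities, Builtins implement many important features, such as Ignition, the bytecode handlers, and the JavaScript API. Learning Builtin therefore helps in understanding V8's execution logic: for example, you can see how bytecode is executed and how the string method substring is implemented. The article covers how Builtins are implemented (Section 2) and how they are initialized (Section 3). |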
| 7 | + |
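| 8 | +# 2 How Builtins Are Implemented |
| 9 | +Builtins can be implemented in platform-dependent assembly language, C++, JavaScript, CodeStubAssembler, and Torque. The five approaches differ markedly in ease of use and in performance. The official documentation describes them as follows: |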
| 10 | +**(1)** Platform-dependent assembly language: can be highly efficient, but need manual ports to all platforms and are difficult to maintain. |
| 11 | +**(2)** C++: very similar in style to runtime functions and have access to V8’s powerful runtime functionality, but usually not suited to performance-sensitive areas. |
| 12 | +**(3)** JavaScript: concise and readable code, access to fast intrinsics, but frequent usage of slow runtime calls, subject to unpredictable performance through type pollution, and subtle issues around (complicated and non-obvious) JS semantics. Javascript builtins are deprecated and should not be added anymore. |
| 13 | +**(4)** CodeStubAssembler: provides efficient low-level functionality that is very close to assembly language while remaining platform-independent and preserving readability. |
| 14 | +**(5)** V8 Torque: is a V8-specific domain-specific language that is translated to CodeStubAssembler. As such, it extends upon CodeStubAssembler and offers static typing as well as readable and expressive syntax. |
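| 15 | +Torque is an improved take on CodeStubAssembler: it aims to lower the difficulty of use as far as possible without sacrificing performance, which makes Builtins easier to develop. |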
| 16 | + |
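| 17 | +Figure 1 (from the official documentation) illustrates how a Builtin is created with Torque. |
| 18 | +First, the file.tq written by the developer is translated by the Torque compiler into *-tq-csa.cc/.h files; |
| 19 | +Next, *-tq-csa.cc/.h are compiled into the mksnapshot executable; |
| 20 | +Finally, mksnapshot generates snapshot.bin, which stores the binary sequence of the Builtins. |
| 21 | +**To stress the point again:** *-tq-csa.cc/.h are the Builtin source files that the Torque compiler generates from file.tq. |
| 22 | +When V8 loads the snapshot file through deserialization, no symbol table is available, so the Torque Builtin source is not visible while debugging V8. CodeStubAssembler Builtins are also stored in snapshot.bin, so their source is not visible during debugging either. For one debugging approach, see mksnapshot; my own approach is described below. |
| 23 | +# 3 Builtin Initialization |
| 24 | +A note before walking through the source: the debugging method here uses V8 7.9 built with v8_use_snapshot = false. Newer versions no longer support v8_use_snapshot = false, so Builtin initialization cannot be debugged there. Setting v8_use_snapshot = false disables snapshot.bin, which means V8 creates and initializes the Builtins from C++ source at startup, and that is exactly what we want to look at. |
| 25 | +I consider the C++, CodeStubAssembler, and Torque Builtins the most important, because core features such as Ignition, the bytecode handlers, and the JavaScript API are largely implemented with these three kinds. They are described in detail below. The entry point of Builtin initialization is: |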
| 26 | +```c++ |
| 27 | +bool Isolate::InitWithoutSnapshot() { return Init(nullptr, nullptr); } |
| 28 | +``` |
| 29 | +As the name `InitWithoutSnapshot()` suggests, snapshot.bin is disabled. `InitWithoutSnapshot()` runs the following code: |
| 30 | +```c++ |
| 31 | +1. bool Isolate::Init(ReadOnlyDeserializer* read_only_deserializer, |
| 32 | +2. StartupDeserializer* startup_deserializer) { |
| 33 | +3. //..............omitted............... |
| 34 | +4. bootstrapper_->Initialize(create_heap_objects); |
| 35 | +5. if (FLAG_embedded_builtins && create_heap_objects) { |
| 36 | +6. builtins_constants_table_builder_ = new BuiltinsConstantsTableBuilder(this); |
| 37 | +7. } |
| 38 | +8. setup_delegate_->SetupBuiltins(this); |
| 39 | +9. if (FLAG_embedded_builtins && create_heap_objects) { |
| 40 | +10. builtins_constants_table_builder_->Finalize(); |
| 41 | +11. delete builtins_constants_table_builder_; |
| 42 | +12. builtins_constants_table_builder_ = nullptr; |
| 43 | +13. CreateAndSetEmbeddedBlob(); |
| 44 | +14. } |
| 45 | +15. //..............omitted............... |
| 46 | +16. return true; |
| 47 | +17. } |
| 48 | +``` |
| 49 | +Line 8 above enters `SetupBuiltins()`, which calls `SetupBuiltinsInternal()` to perform the Builtin initialization. The source of `SetupBuiltinsInternal()` is: |
| 50 | +```c++ |
| 51 | +1. void SetupIsolateDelegate::SetupBuiltinsInternal(Isolate* isolate) { |
| 52 | +2. Builtins* builtins = isolate->builtins(); |
| 53 | +3. //omitted................... |
| 54 | +4. int index = 0; |
| 55 | +5. Code code; |
| 56 | +6. #define BUILD_CPP(Name) \ |
| 57 | +7. code = BuildAdaptor(isolate, index, FUNCTION_ADDR(Builtin_##Name), #Name); \ |
| 58 | +8. AddBuiltin(builtins, index++, code); |
| 59 | +9. #define BUILD_TFJ(Name, Argc, ...) \ |
| 60 | +10. code = BuildWithCodeStubAssemblerJS( \ |
| 61 | +11. isolate, index, &Builtins::Generate_##Name, Argc, #Name); \ |
| 62 | +12. AddBuiltin(builtins, index++, code); |
| 63 | +13. #define BUILD_TFC(Name, InterfaceDescriptor) \ |
| 64 | +14. /* Return size is from the provided CallInterfaceDescriptor. */ \ |
| 65 | +15. code = BuildWithCodeStubAssemblerCS( \ |
| 66 | +16. isolate, index, &Builtins::Generate_##Name, \ |
| 67 | +17. CallDescriptors::InterfaceDescriptor, #Name); \ |
| 68 | +18. AddBuiltin(builtins, index++, code); |
| 69 | +19. #define BUILD_TFS(Name, ...) \ |
| 70 | +20. /* Return size for generic TF builtins (stub linkage) is always 1. */ \ |
| 71 | +21. code = \ |
| 72 | +22. BuildWithCodeStubAssemblerCS(isolate, index, &Builtins::Generate_##Name, \ |
| 73 | +23. CallDescriptors::Name, #Name); \ |
| 74 | +24. AddBuiltin(builtins, index++, code); |
| 75 | +25. #define BUILD_TFH(Name, InterfaceDescriptor) \ |
| 76 | +26. /* Return size for IC builtins/handlers is always 1. */ \ |
| 77 | +27. code = BuildWithCodeStubAssemblerCS( \ |
| 78 | +28. isolate, index, &Builtins::Generate_##Name, \ |
| 79 | +29. CallDescriptors::InterfaceDescriptor, #Name); \ |
| 80 | +30. AddBuiltin(builtins, index++, code); |
| 81 | +31. #define BUILD_BCH(Name, OperandScale, Bytecode) \ |
| 82 | +32. code = GenerateBytecodeHandler(isolate, index, OperandScale, Bytecode); \ |
| 83 | +33. AddBuiltin(builtins, index++, code); |
| 84 | +34. #define BUILD_ASM(Name, InterfaceDescriptor) \ |
| 85 | +35. code = BuildWithMacroAssembler(isolate, index, Builtins::Generate_##Name, \ |
| 86 | +36. #Name); \ |
| 87 | +37. AddBuiltin(builtins, index++, code); |
| 88 | +38. BUILTIN_LIST(BUILD_CPP, BUILD_TFJ, BUILD_TFC, BUILD_TFS, BUILD_TFH, BUILD_BCH, |
| 89 | +39. BUILD_ASM); |
| 90 | +40. //omitted........................... |
| 91 | +41. } |
| 92 | +``` |
| 93 | +The three core parts of `SetupBuiltinsInternal()` are explained below: |
| 94 | +**(1)** BUILD_CPP, BUILD_TFJ, BUILD_TFC, BUILD_TFS, BUILD_TFH, BUILD_BCH, and BUILD_ASM classify the Builtins by function, as the following comments describe: |
| 95 | +```c++ |
| 96 | +// CPP: Builtin in C++. Entered via BUILTIN_EXIT frame. |
| 97 | +// Args: name |
| 98 | +// TFJ: Builtin in Turbofan, with JS linkage (callable as Javascript function). |
| 99 | +// Args: name, arguments count, explicit argument names... |
| 100 | +// TFS: Builtin in Turbofan, with CodeStub linkage. |
| 101 | +// Args: name, explicit argument names... |
| 102 | +// TFC: Builtin in Turbofan, with CodeStub linkage and custom descriptor. |
| 103 | +// Args: name, interface descriptor |
| 104 | +// TFH: Handlers in Turbofan, with CodeStub linkage. |
| 105 | +// Args: name, interface descriptor |
| 106 | +// BCH: Bytecode Handlers, with bytecode dispatch linkage. |
| 107 | +// Args: name, OperandScale, Bytecode |
| 108 | +// ASM: Builtin in platform-dependent assembly. |
| 109 | +// Args: name, interface descriptor |
| 110 | +``` |
| 111 | +**(2)** BUILTIN_LIST, on line 38 of `SetupBuiltinsInternal()`, defines all the Builtins. Its source is: |
| 112 | +```c++ |
| 113 | +1. #define BUILTIN_LIST(CPP, TFJ, TFC, TFS, TFH, BCH, ASM) \ |
| 114 | +2. BUILTIN_LIST_BASE(CPP, TFJ, TFC, TFS, TFH, ASM) \ |
| 115 | +3. BUILTIN_LIST_FROM_TORQUE(CPP, TFJ, TFC, TFS, TFH, ASM) \ |
| 116 | +4. BUILTIN_LIST_INTL(CPP, TFJ, TFS) \ |
| 117 | +5. BUILTIN_LIST_BYTECODE_HANDLERS(BCH) |
| 118 | +6. //================separator================================= |
| 119 | +7. #define BUILTIN_LIST_FROM_TORQUE(CPP, TFJ, TFC, TFS, TFH, ASM) \ |
| 120 | +8. //...............omitted............................ |
| 121 | +9. TFJ(StringPrototypeToString, 0, kReceiver) \ |
| 122 | +10. TFJ(StringPrototypeValueOf, 0, kReceiver) \ |
| 123 | +11. TFS(StringToList, kString) \ |
| 124 | +12. TFJ(StringPrototypeCharAt, 1, kReceiver, kPosition) \ |
| 125 | +13. TFJ(StringPrototypeCharCodeAt, 1, kReceiver, kPosition) \ |
| 126 | +14. TFJ(StringPrototypeCodePointAt, 1, kReceiver, kPosition) \ |
| 127 | +15. TFJ(StringPrototypeConcat, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \ |
| 128 | +16. TFJ(StringConstructor, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \ |
| 129 | +17. TFS(StringAddConvertLeft, kLeft, kRight) \ |
| 130 | +18. TFS(StringAddConvertRight, kLeft, kRight) \ |
| 131 | +19. TFJ(StringPrototypeEndsWith, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \ |
| 132 | +20. TFS(CreateHTML, kReceiver, kMethodName, kTagName, kAttr, kAttrValue) \ |
| 133 | +21. TFJ(StringPrototypeAnchor, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \ |
| 134 | +22. TFJ(StringPrototypeBig, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \ |
| 135 | +23. TFJ(StringPrototypeIterator, 0, kReceiver) \ |
| 136 | +24. TFJ(StringIteratorPrototypeNext, 0, kReceiver) \ |
| 137 | +25. TFJ(StringPrototypePadStart, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \ |
| 138 | +26. TFJ(StringPrototypePadEnd, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \ |
| 139 | +27. TFS(StringRepeat, kString, kCount) \ |
| 140 | +28. TFJ(StringPrototypeRepeat, 1, kReceiver, kCount) \ |
| 141 | +29. TFJ(StringPrototypeSlice, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \ |
| 142 | +30. TFJ(StringPrototypeStartsWith, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \ |
| 143 | +31. TFJ(StringPrototypeSubstring, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \ |
| 144 | +``` |
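| 145 | +Read together, BUILTIN_LIST and BUILTIN_LIST_FROM_TORQUE show the names of all the Builtins. Lines 9-31 list the Builtins that implement the string methods; for example, the Builtin for substring is StringPrototypeSubstring. |
| 146 | +**(3)** The seven macros (BUILD_CPP, BUILD_TFJ, and so on) work together with BUILTIN_LIST to initialize all the Builtins. Taking BUILD_CPP in `SetupBuiltinsInternal()` as an example, the source is: |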
| 147 | +```c++ |
| 148 | +1. int index = 0; |
| 149 | +2. Code code; |
| 150 | +3. #define BUILD_CPP(Name) \ |
| 151 | +4. code = BuildAdaptor(isolate, index, FUNCTION_ADDR(Builtin_##Name), #Name); \ |
| 152 | +5. AddBuiltin(builtins, index++, code); |
| 153 | +//...................separator................. |
| 154 | +// FUNCTION_ADDR(f) gets the address of a C function f. |
| 155 | +#define FUNCTION_ADDR(f) (reinterpret_cast<v8::internal::Address>(f)) |
| 156 | +``` |
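| 157 | +index starts at 0. code is a HeapObject-based address pointer that holds the address of the generated Builtin. `FUNCTION_ADDR(Builtin_##Name)` produces the address of the Builtin's C++ function, and `BuildAdaptor()` uses that address when it creates the Builtin. The source of `BuildAdaptor()` is: |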
| 158 | +```c++ |
| 159 | +Code BuildAdaptor(Isolate* isolate, int32_t builtin_index, |
| 160 | + Address builtin_address, const char* name) { |
| 161 | + HandleScope scope(isolate); |
| 162 | + // Canonicalize handles, so that we can share constant pool entries pointing |
| 163 | + // to code targets without dereferencing their handles. |
| 164 | + CanonicalHandleScope canonical(isolate); |
| 165 | + constexpr int kBufferSize = 32 * KB; |
| 166 | + byte buffer[kBufferSize]; |
| 167 | + MacroAssembler masm(isolate, BuiltinAssemblerOptions(isolate, builtin_index), |
| 168 | + CodeObjectRequired::kYes, |
| 169 | + ExternalAssemblerBuffer(buffer, kBufferSize)); |
| 170 | + masm.set_builtin_index(builtin_index); |
| 171 | + DCHECK(!masm.has_frame()); |
| 172 | + Builtins::Generate_Adaptor(&masm, builtin_address); |
| 173 | + CodeDesc desc; |
| 174 | + masm.GetCode(isolate, &desc); |
| 175 | + Handle<Code> code = Factory::CodeBuilder(isolate, desc, Code::BUILTIN) |
| 176 | + .set_self_reference(masm.CodeObject()) |
| 177 | + .set_builtin_index(builtin_index) |
| 178 | + .Build(); |
| 179 | + return *code; |
| 180 | +} |
| 181 | +``` |
| 182 | +In the code above, `Generate_Adaptor` and `Factory::CodeBuilder` create the Builtin, and code represents the Builtin's address. |
| 183 | +Returning to `#define BUILD_CPP(Name)` and stepping into `AddBuiltin`, the source is: |
| 184 | +```c++ |
| 185 | +void SetupIsolateDelegate::AddBuiltin(Builtins* builtins, int index, |
| 186 | + Code code) { |
| 187 | + DCHECK_EQ(index, code.builtin_index()); |
| 188 | + builtins->set_builtin(index, code); |
| 189 | +} |
| 190 | +//..............separator....................... |
| 191 | +void Builtins::set_builtin(int index, Code builtin) { |
| 192 | + isolate_->heap()->set_builtin(index, builtin); |
| 193 | +} |
| 194 | +//.............separator.......................... |
| 195 | +void Heap::set_builtin(int index, Code builtin) { |
| 196 | + DCHECK(Builtins::IsBuiltinId(index)); |
| 197 | + DCHECK(Internals::HasHeapObjectTag(builtin.ptr())); |
| 198 | + // The given builtin may be completely uninitialized thus we cannot check its |
| 199 | + // type here. |
| 200 | + isolate()->builtins_table()[index] = builtin.ptr(); |
| 201 | +} |
| 202 | +``` |
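| 203 | +In the code above, `Builtins::set_builtin()` calls `Heap::set_builtin()`, which stores the Builtin in `isolate()->builtins_table()`. `builtins_table()` is declared as `V8_INLINE Address*`, that is, it exposes an array of addresses; `index` is the array subscript, and the array holds all the Builtins. This completes Builtin initialization. Figure 2 shows the function call stack. |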
| 204 | + |
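The initialization pattern walked through above (a list macro, a per-entry build macro, a running index, and a table of addresses) can be reproduced in a few lines outside V8. The following is only a minimal sketch under stated assumptions: `DEMO_LIST`, `DemoAdd`, `DemoSub`, `DemoMul`, `AddDemo`, `SetupDemoTable`, `CallDemo`, and `demo_table` are hypothetical names invented for illustration, and plain C++ functions stand in for the generated `Code` objects. Only the shape of `FUNCTION_ADDR` and of the `BUILD_CPP` expansion follows the source quoted above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for V8's Address type.
using Address = std::uintptr_t;

// Hypothetical "builtins": ordinary functions instead of generated code.
int DemoAdd(int a, int b) { return a + b; }
int DemoSub(int a, int b) { return a - b; }
int DemoMul(int a, int b) { return a * b; }

// X-macro list: every entry is wrapped in a caller-supplied macro, the same
// way BUILTIN_LIST wraps its entries in CPP, TFJ, TFS and so on.
#define DEMO_LIST(CPP) \
  CPP(DemoAdd)         \
  CPP(DemoSub)         \
  CPP(DemoMul)

// Same shape as the FUNCTION_ADDR macro quoted in the article.
#define FUNCTION_ADDR(f) (reinterpret_cast<Address>(f))

// Count the entries by expanding each one to "+1".
#define COUNT_ENTRY(Name) +1
constexpr int kDemoCount = 0 DEMO_LIST(COUNT_ENTRY);
#undef COUNT_ENTRY

// Plays the role of the builtins table: one address per index.
std::array<Address, kDemoCount> demo_table{};

// Plays the role of AddBuiltin(): range-check the index, then store.
void AddDemo(int index, Address code) {
  assert(index >= 0 && index < kDemoCount);
  demo_table[index] = code;
}

// Plays the role of SetupBuiltinsInternal(): expanding the list with
// BUILD_CPP emits one AddDemo() call per entry, in list order.
void SetupDemoTable() {
  int index = 0;
#define BUILD_CPP(Name) AddDemo(index++, FUNCTION_ADDR(&Name));
  DEMO_LIST(BUILD_CPP)
#undef BUILD_CPP
}

// Look an entry up by index and call it through its stored address.
int CallDemo(int index, int a, int b) {
  using Fn = int (*)(int, int);
  return reinterpret_cast<Fn>(demo_table[index])(a, b);
}
```

After `SetupDemoTable()` runs, `CallDemo(0, 5, 3)` dispatches to `DemoAdd` and `CallDemo(2, 5, 3)` to `DemoMul`. The point of the sketch is that an entry's index is nothing more than its position in the list, which is why expanding the list is enough to learn the index of any Builtin.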
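| 205 | +The Builtin debugging method is summarized as follows: |
| 206 | +**(1)** Expand the BUILTIN_LIST macro to obtain the index of each Builtin. The VS2019 preprocessor can be used to expand the macro. |
| 207 | +**(2)** Set a conditional breakpoint on index. Figure 3 shows how to trace Builtin number 12. |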
| 208 | + |
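As an alternative to reading preprocessor output by hand, the same list macro can be expanded twice, once into an enum and once into a name table, so that a name can be mapped to its index programmatically. This is a hedged sketch, not a V8 API: `DEMO_LIST`, `DemoId`, `kDemoNames`, and `DemoIndexOf` are hypothetical names, and the three entries stand in for the real contents of BUILTIN_LIST.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical list; in V8 the real list is BUILTIN_LIST.
#define DEMO_LIST(V) \
  V(Alpha)           \
  V(Beta)            \
  V(Gamma)

// Expanding each entry to an enumerator numbers the entries in list order,
// starting at 0. kDemoCount ends up equal to the number of entries.
enum DemoId : int {
#define DEF_ENUM(Name) k##Name,
  DEMO_LIST(DEF_ENUM)
#undef DEF_ENUM
  kDemoCount
};

// Expanding each entry with the stringizing operator gives a parallel
// table of names, indexed the same way as the enum.
const char* const kDemoNames[] = {
#define DEF_NAME(Name) #Name,
  DEMO_LIST(DEF_NAME)
#undef DEF_NAME
};

// Returns the index of the named entry, or -1 if the name is not listed.
int DemoIndexOf(const char* name) {
  assert(name != nullptr);
  for (int i = 0; i < kDemoCount; ++i) {
    if (std::strcmp(kDemoNames[i], name) == 0) return i;
  }
  return -1;
}
```

With such a helper, the value returned for a given name is the number to put into the breakpoint condition on the index.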
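| 209 | +Setting a breakpoint in the Builtin's own source is the simplest and most direct approach. If you do not know which way a Builtin is built (for example `BUILD_CPP` or `BUILD_TFS`), set a conditional breakpoint in each of the build functions. Figure 4 shows a breakpoint set in the Substring source. |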
| 210 | + |
| 211 | +**Technical summary** |
| 212 | +**(1)** Use a 7.x version of V8 to debug Builtins; later versions no longer have v8_use_snapshot; |
| 213 | +**(2)** When building V8, set v8_optimized_debug = false to turn off compiler optimizations; |
| 214 | +**(3)** Because builtin_index is an int32_t, write the breakpoint condition with (int)builtin_index. |
| 215 | + |
| 216 | +That's all for today. See you next time. |
| 217 | + |
| 218 | +**Corrections, criticism, and suggestions from readers are very welcome** |
| 219 | +**WeChat: qq9123013 (add the note "v8交流", i.e. V8 discussion) Zhihu: https://www.zhihu.com/people/v8blink** |
| 220 | + |
| 221 | +This article is an original work by 灰豆 |
| 222 | +Source for reprints: https://www.anquanke.com/post/id/260029 |
| 223 | +安全客 (Anquanke) - thoughtful new media for security |
| 224 | + |
| 225 | + |
| 226 | + |