<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Manjusaka</title>
  
  <subtitle>A Hong Kong journalist who writes code</subtitle>
  <link href="https://www.manjusaka.blog/atom.xml" rel="self"/>
  
  <link href="https://www.manjusaka.blog/"/>
  <updated>2026-03-29T17:00:43.280Z</updated>
  <id>https://www.manjusaka.blog/</id>
  
  <author>
    <name>Manjusaka</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>How do you trace your SQL?</title>
    <link href="https://www.manjusaka.blog/posts/2026/02/22/how-to-tracing-your-sql/"/>
    <id>https://www.manjusaka.blog/posts/2026/02/22/how-to-tracing-your-sql/</id>
    <published>2026-02-22T10:49:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>The Lunar New Year holiday is a great time to grind through work or read papers. After a few days at the grindstone and a few days of papers, though, I felt like building something for fun.</p><p>So this post is about a classic topic: how do you trace your SQL?</p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="Main Text"></a>Main Text</h2><p>Since the OpenTelemetry ecosystem started to spread, we have had a fairly mature solution for tracing the whole lifecycle of our code, including all kinds of auto-instrumentation libraries that let us trace the entire call chain without modifying the code.</p><p>But one dark cloud still hangs over the edifice of the tracing world: how do we pull SQL execution out of its black box the same way we do for business code?</p><p>At this stage, introducing tracing is all about addition: we build a context, inject some metadata, and let every part of the code read that context. One question remains, though: how do we do addition in SQL?</p><p>The most intuitive idea is to try a similar mechanism:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">begin</span>;</span><br><span class="line"><span class="keyword">set</span> saka.tracing_id <span class="operator">=</span> <span class="string">&#x27;1234567890&#x27;</span>;</span><br><span class="line"><span class="keyword">select</span> <span class="operator">*</span> <span class="keyword">from</span> users <span class="keyword">where</span> id <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line"><span class="keyword">commit</span>;</span><br></pre></td></tr></table></figure><p>We can set something like a tracing_id inside the SQL session, which pins down the context of a transaction. The problem is that this changes the pattern in which we use SQL: every SQL statement has to be preceded by something like set saka.tracing_id = '1234567890', which means modifying a large number of SQL statements in our code. That is clearly not feasible.</p><p>The other approach is to inject the tracing information into the SQL statement itself, something like this:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> <span class="operator">*</span> <span class="keyword">from</span> users <span class="keyword">where</span> id <span class="operator">=</span> <span 
class="number">1</span> <span class="comment">/* tracing_id: 1234567890 */</span>;</span><br></pre></td></tr></table></figure><p>So how do we do that?</p><p>Google proposed a general-purpose scheme, or call it a de facto standard, named <a href="https://google.github.io/sqlcommenter/">SQLCommenter</a>. Its core idea is to hook the ORM (or use similar means) so that we can inject comments into SQL as easily as possible. These comments carry tracing and custom tag information, which keeps the cost of injecting metadata to a minimum.</p><p>Let's follow the SQLCommenter approach and see how to implement this in Python.</p><p>Taking pymysql as an example, the implementation is very simple:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> typing <span class="keyword">import</span> <span class="type">Any</span></span><br><span class="line"><span class="keyword">from</span> pymysql.cursors <span class="keyword">import</span> SSCursor, SSDictCursor</span><br><span class="line"><span class="keyword">from</span> flask <span class="keyword">import</span> g</span><br><span class="line"><span class="keyword">from</span> main.utils.context <span class="keyword">import</span> INJECT_SQL_COMMENT, TRACE_ID</span><br><span class="line"><span 
class="keyword">from</span> main.utils.ip_utils <span class="keyword">import</span> local_ip</span><br><span class="line"></span><br><span class="line"><span class="keyword">try</span>:</span><br><span class="line">    CURRENT_IP = local_ip()</span><br><span class="line"><span class="keyword">except</span> Exception:</span><br><span class="line">    CURRENT_IP = <span class="string">&quot;127.0.0.1&quot;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">inject_meta_info</span>(<span class="params">query: <span class="built_in">str</span></span>) -&gt; <span class="built_in">str</span>:</span><br><span class="line">    <span class="keyword">if</span> INJECT_SQL_COMMENT.get():</span><br><span class="line">        <span class="keyword">if</span> TRACE_ID.get() != <span class="string">&quot;None&quot;</span>:</span><br><span class="line">            trace_id = <span class="built_in">getattr</span>(g, <span class="string">&quot;TRACE_ID&quot;</span>)</span><br><span class="line">            sql_comment = <span class="string">f&quot;/*X-Amzn-Trace-Id=<span class="subst">&#123;trace_id&#125;</span>*/&quot;</span></span><br><span class="line">            query = <span class="string">f&quot;<span class="subst">&#123;sql_comment&#125;</span> <span class="subst">&#123;query&#125;</span>&quot;</span></span><br><span class="line">    query = <span class="string">f&quot;/*source_ip=<span class="subst">&#123;CURRENT_IP&#125;</span>*/ <span class="subst">&#123;query&#125;</span>&quot;</span></span><br><span class="line">    <span class="keyword">return</span> query</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">CustomSSCursor</span>(<span class="title class_ inherited__">SSCursor</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title 
function_">execute</span>(<span class="params">self, query: <span class="built_in">str</span>, args: <span class="type">Any</span> = <span class="literal">None</span></span>) -&gt; <span class="built_in">int</span>:</span><br><span class="line">        <span class="keyword">return</span> <span class="built_in">super</span>().execute(inject_meta_info(query), args)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">CustomSSDictCursor</span>(<span class="title class_ inherited__">SSDictCursor</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">execute</span>(<span class="params">self, query: <span class="built_in">str</span>, args: <span class="type">Any</span> = <span class="literal">None</span></span>) -&gt; <span class="built_in">int</span>:</span><br><span class="line">        <span class="keyword">return</span> <span class="built_in">super</span>().execute(inject_meta_info(query), args)</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>Then, at the point of use, we just plug in the custom Cursor (for example through the cursorclass argument of pymysql.connect).</p><p>Life in the Python ecosystem is still good. Unfortunately, I am now forced to write Node.js, and the Node.js ecosystem is not nearly as pleasant. For historical reasons, we use the extremely delicious Prisma as our ORM, so let's look at how to inject SQL comments with Prisma.</p><p>First, in the latest Prisma v7.x releases, Prisma itself implements SQLCommenter support, something like this:</p><figure class="highlight ts"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> &#123; queryTags, 
withQueryTags &#125; <span class="keyword">from</span> <span class="string">&quot;@prisma/sqlcommenter-query-tags&quot;</span>;</span><br><span class="line"><span class="keyword">import</span> &#123; <span class="title class_">PrismaClient</span> &#125; <span class="keyword">from</span> <span class="string">&quot;../prisma/generated/client&quot;</span>;</span><br><span class="line"><span class="keyword">const</span> prisma = <span class="keyword">new</span> <span class="title class_">PrismaClient</span>(&#123;</span><br><span class="line">  adapter,</span><br><span class="line">  <span class="attr">comments</span>: [<span class="title function_">queryTags</span>()],</span><br><span class="line">&#125;);</span><br><span class="line"><span class="comment">// Wrap your queries to add tags</span></span><br><span class="line"><span class="keyword">const</span> users = <span class="keyword">await</span> <span class="title function_">withQueryTags</span>(&#123; <span class="attr">route</span>: <span class="string">&quot;/api/users&quot;</span>, <span class="attr">requestId</span>: <span class="string">&quot;abc-123&quot;</span> &#125;, <span class="function">() =&gt;</span></span><br><span class="line">  prisma.<span class="property">user</span>.<span class="title function_">findMany</span>(),</span><br><span class="line">);</span><br></pre></td></tr></table></figure><p>This ends up generating SQL like the following:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">SELECT</span> ... 
<span class="keyword">FROM</span> &quot;User&quot; <span class="comment">/*requestId=&#x27;abc-123&#x27;,route=&#x27;/api/users&#x27;*/</span></span><br></pre></td></tr></table></figure><p>OK, very nice, this is the expected behavior.</p><p>The problem is that Prisma v7 is an utterly crappy release, and we have no way to savor this feature the way one savors 母鸡卡. Because of performance problems (v7 is 30%-40% slower than v6), we cannot use v7 in production at all, so the only option is to implement this feature on v6.</p><p>In Prisma v6, the data-mapping layer and the core Query Engine are completely separate. They communicate through NAPI-RS, using a custom JSON-based wire protocol. A sample message looks like this:</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;modelName&quot;</span><span class="punctuation">:</span> <span class="string">&quot;User&quot;</span><span class="punctuation">,</span>                    <span class="comment">// optional; not needed for raw queries</span></span><br><span class="line">  <span class="attr">&quot;action&quot;</span><span class="punctuation">:</span> <span class="string">&quot;findMany&quot;</span><span class="punctuation">,</span>                   <span class="comment">// operation type</span></span><br><span class="line">  <span class="attr">&quot;query&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span>                              <span class="comment">// query details</span></span><br><span class="line">    <span class="attr">&quot;arguments&quot;</span><span class="punctuation">:</span> <span 
class="punctuation">&#123;</span>                        <span class="comment">// where/orderBy/take, etc.</span></span><br><span class="line">      <span class="attr">&quot;where&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span> <span class="attr">&quot;email&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span> <span class="attr">&quot;contains&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prisma.io&quot;</span> <span class="punctuation">&#125;</span> <span class="punctuation">&#125;</span></span><br><span class="line">    <span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;selection&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span>                        <span class="comment">// field selection</span></span><br><span class="line">      <span class="attr">&quot;$scalars&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span></span><br><span class="line">      <span class="attr">&quot;$composites&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span></span><br><span class="line">      <span class="attr">&quot;posts&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span>                          <span class="comment">// nested relation query</span></span><br><span class="line">        <span class="attr">&quot;arguments&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;selection&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span> <span 
class="attr">&quot;$scalars&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span> <span class="punctuation">&#125;</span></span><br><span class="line">      <span class="punctuation">&#125;</span></span><br><span class="line">    <span class="punctuation">&#125;</span></span><br><span class="line">  <span class="punctuation">&#125;</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><p>This time, I want to implement semantics similar to the official ones:</p><figure class="highlight ts"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">return</span> <span class="variable language_">this</span>.<span class="property">prisma</span>.<span class="property">post</span>.<span class="title function_">findMany</span>(&#123;</span><br><span class="line">  <span class="attr">take</span>: <span class="number">100</span>,</span><br><span class="line">  <span class="attr">sqlComments</span>: <span class="title function_">getSqlComments</span>(req),</span><br><span class="line">&#125;);</span><br></pre></td></tr></table></figure><p>OK, let's use a flowchart to lay out the implementation flow:</p><pre><code class="highlight mermaid">flowchart TD
    subgraph TS[&quot;TypeScript (prisma repo)&quot;]
        A[&quot;User code&lt;br/&gt;prisma.user.findMany(&#123;sqlComments: &#123;...&#125;, where: &#123;...&#125;&#125;)&quot;]
        B[&quot;getPrismaClient.ts :: _executeRequest()&lt;br/&gt;calls serializeJsonQuery()&quot;]
        C[&quot;serializeJsonQuery.ts :: serializeJsonQuery() &lt;b&gt;[Change 1]&lt;/b&gt;&lt;br/&gt;extracts sqlComments into the top level of JsonQuery&lt;br/&gt;output: &#123;modelName, action, query, sqlComments&#125;&quot;]
        D[&quot;RequestHandler.ts :: singleLoader / batchLoader&lt;br/&gt;calls _engine.request(protocolQuery, &#123;traceparent&#125;)&quot;]
        E[&quot;LibraryEngine.ts :: request()&lt;br/&gt;JSON.stringify(query) + JSON.stringify(&#123;traceparent&#125;)&lt;br/&gt;engine.query(queryStr, headerStr, txId)&quot;]
    end
    subgraph FFI[&quot;C ABI / NAPI FFI boundary&quot;]
        F((&quot;FFI&quot;))
    end
    subgraph Rust[&quot;Rust (prisma-engines repo)&quot;]
        G[&quot;QueryEngine::query()&lt;br/&gt;parses body_str → RequestBody&lt;br/&gt;extracts traceparent from the header&quot;]
        H[&quot;RequestHandler::handle() &lt;b&gt;[Change 2]&lt;/b&gt;&lt;br/&gt;body.into_doc() → (QueryDocument, SqlCommentsVec)&lt;br/&gt;builds QueryContext &#123;traceparent, sql_comments&#125;&quot;]
        I[&quot;Single query&lt;br/&gt;QueryContext::new(traceparent, sql_comments#0)&quot;]
        J[&quot;Batch query&lt;br/&gt;each operation gets its own QueryContext&lt;br/&gt;shared traceparent, separate sql_comments&quot;]
        K[&quot;JsonBody::into_doc() &lt;b&gt;[Change 3]&lt;/b&gt;&lt;br/&gt;JsonSingleQuery gains a sql_comments field&lt;br/&gt;extract_sql_comments() → Vec&amp;lt;(String,String)&amp;gt;&quot;]
        L[&quot;QueryExecutor::execute() &lt;b&gt;[Change 4]&lt;/b&gt;&lt;br/&gt;traceparent → query_context: QueryContext&lt;br/&gt;passed down the pipeline: execute_operation → interpreter → read/write&quot;]
        M[&quot;interpreter :: read.rs / write.rs &lt;b&gt;[Change 5]&lt;/b&gt;&lt;br/&gt;query_context.sql_trace() → SqlTrace &#123;traceparent, sql_comments&#125;&quot;]
        N[&quot;ReadOperations / WriteOperations &lt;b&gt;[Change 6]&lt;/b&gt;&lt;br/&gt;traceparent: Option&amp;lt;TraceParent&amp;gt; → trace: SqlTrace&quot;]
        O[&quot;sql-query-connector :: connection.rs &lt;b&gt;[Change 7]&lt;/b&gt;&lt;br/&gt;Context::new(&amp;connection_info, trace)&quot;]
        P[&quot;sql-query-builder :: read/write/select &lt;b&gt;[Change 8]&lt;/b&gt;&lt;br/&gt;builds the Quaint AST&lt;br/&gt;calls .add_trace_id(ctx)&quot;]
        Q[&quot;sql_trace.rs :: add_trace_id() &lt;b&gt;[Change 9]&lt;/b&gt;&lt;br/&gt;build_trace_comment(sql_comments, traceparent)&lt;br/&gt;1. sort sql_comments alphabetically by key&lt;br/&gt;2. 
URL-encode key and value, format: key=&#x27;value&#x27;&lt;br/&gt;3. append traceparent if it is sampled&lt;br/&gt;4. join with commas, then call .comment(result)&quot;]
        R[&quot;Quaint AST :: .comment(...)&lt;br/&gt;appends /* ... */ to the end of the SQL at render time&quot;]
    end
    S[&quot;Final SQL&lt;br/&gt;SELECT ... FROM &amp;quot;User&amp;quot; WHERE ...&lt;br/&gt;/* controller=&#x27;UserController&#x27;,route=&#x27;%2Fapi%2Fusers&#x27; */&quot;]
    A --&gt; B --&gt; C --&gt; D --&gt; E
    E --&gt; F --&gt; G --&gt; H
    H --&gt; I --&gt; L
    H --&gt; J --&gt; L
    H -.-&gt;|internal call| K
    L --&gt; M --&gt; N --&gt; O --&gt; P --&gt; Q --&gt; R --&gt; S</code></pre><p>OK, once the FFI is adjusted and the Query Engine is extended, all that is left is to adjust the Prisma code-generation parts.</p><p>Then we can use it in business code like this:</p><figure class="highlight ts"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span 
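class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> &#123; <span class="title class_">Body</span>, <span class="title class_">Controller</span>, <span class="title class_">Get</span>, <span class="title class_">Param</span>, <span class="title class_">Post</span>, <span class="title class_">Req</span> &#125; <span class="keyword">from</span> <span class="string">&quot;@nestjs/common&quot;</span>;</span><br><span class="line"><span class="keyword">import</span> &#123; context, trace &#125; <span class="keyword">from</span> <span class="string">&quot;@opentelemetry/api&quot;</span>;</span><br><span class="line"><span class="comment">// Assumption: the default Express adapter; PrismaService is the app&#x27;s own provider (import not shown)</span></span><br><span class="line"><span class="keyword">import</span> <span class="keyword">type</span> &#123; <span class="title class_">Request</span> &#125; <span class="keyword">from</span> <span class="string">&quot;express&quot;</span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">function</span> <span class="title function_">getSqlComments</span>(<span class="params"><span class="attr">req</span>: <span class="title class_">Request</span></span>): <span class="title class_">Record</span>&lt;<span class="built_in">string</span>, <span class="built_in">string</span>&gt; &#123;</span><br><span class="line">  <span class="keyword">const</span> <span class="attr">comments</span>: <span class="title class_">Record</span>&lt;<span class="built_in">string</span>, <span class="built_in">string</span>&gt; = &#123;&#125;;</span><br><span class="line"></span><br><span class="line">  <span class="comment">// Request 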
info</span></span><br><span class="line">  comments.<span class="property">route</span> = req.<span class="property">path</span>;</span><br><span class="line">  comments.<span class="property">method</span> = req.<span class="property">method</span>;</span><br><span class="line">  <span class="keyword">if</span> (req.<span class="property">route</span>?.<span class="property">path</span>) &#123;</span><br><span class="line">    comments[<span class="string">&quot;route_pattern&quot;</span>] = req.<span class="property">route</span>.<span class="property">path</span>;</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="comment">// OpenTelemetry trace context (W3C traceparent: version-traceId-spanId-flags, flags as 2 hex digits)</span></span><br><span class="line">  <span class="keyword">const</span> span = trace.<span class="title function_">getSpan</span>(context.<span class="title function_">active</span>());</span><br><span class="line">  <span class="keyword">if</span> (span) &#123;</span><br><span class="line">    <span class="keyword">const</span> ctx = span.<span class="title function_">spanContext</span>();</span><br><span class="line">    comments.<span class="property">traceparent</span> = <span class="string">`00-<span class="subst">$&#123;ctx.traceId&#125;</span>-<span class="subst">$&#123;ctx.spanId&#125;</span>-<span class="subst">$&#123;ctx.traceFlags.toString(16).padStart(2, &quot;0&quot;)&#125;</span>`</span>;</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="keyword">return</span> comments;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="meta">@Controller</span>()</span><br><span class="line"><span class="keyword">export</span> <span class="keyword">class</span> <span class="title class_">AppController</span> &#123;</span><br><span class="line">  <span class="title function_">constructor</span>(<span class="params"><span class="keyword">private</span> <span 
class="keyword">readonly</span> <span class="attr">prisma</span>: <span class="title class_">PrismaService</span></span>) &#123;&#125;</span><br><span class="line"></span><br><span class="line">  <span class="meta">@Get</span>()</span><br><span class="line">  <span class="title function_">getHello</span>(): <span class="built_in">string</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="string">`Hello World!`</span>;</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="meta">@Get</span>(<span class="string">&quot;posts&quot;</span>)</span><br><span class="line">  <span class="title function_">getPosts</span>(<span class="params"><span class="meta">@Req</span>() <span class="attr">req</span>: <span class="title class_">Request</span></span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="variable language_">this</span>.<span class="property">prisma</span>.<span class="property">post</span>.<span class="title function_">findMany</span>(&#123;</span><br><span class="line">      <span class="attr">take</span>: <span class="number">100</span>,</span><br><span class="line">      <span class="attr">sqlComments</span>: <span class="title function_">getSqlComments</span>(req),</span><br><span class="line">    &#125;);</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="meta">@Get</span>(<span class="string">&quot;posts/:id&quot;</span>)</span><br><span class="line">  <span class="title function_">getPostsById</span>(<span class="params"><span class="meta">@Param</span>(<span class="string">&quot;id&quot;</span>) <span class="attr">id</span>: <span class="built_in">string</span>, <span class="meta">@Req</span>() <span class="attr">req</span>: <span class="title class_">Request</span></span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span 
class="variable language_">this</span>.<span class="property">prisma</span>.<span class="property">post</span>.<span class="title function_">findUnique</span>(&#123;</span><br><span class="line">      <span class="attr">where</span>: &#123; id &#125;,</span><br><span class="line">      <span class="attr">sqlComments</span>: <span class="title function_">getSqlComments</span>(req),</span><br><span class="line">    &#125;);</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="meta">@Get</span>(<span class="string">&quot;posts-with-comments&quot;</span>)</span><br><span class="line">  <span class="title function_">getPostsWithComments</span>(<span class="params"><span class="meta">@Req</span>() <span class="attr">req</span>: <span class="title class_">Request</span></span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="variable language_">this</span>.<span class="property">prisma</span>.<span class="property">post</span>.<span class="title function_">findMany</span>(&#123;</span><br><span class="line">      <span class="attr">take</span>: <span class="number">100</span>,</span><br><span class="line">      <span class="attr">include</span>: &#123;</span><br><span class="line">        <span class="attr">comments</span>: <span class="literal">true</span>,</span><br><span class="line">      &#125;,</span><br><span class="line">      <span class="attr">sqlComments</span>: <span class="title function_">getSqlComments</span>(req),</span><br><span class="line">    &#125;);</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="meta">@Get</span>(<span class="string">&quot;posts-with-comments/:id&quot;</span>)</span><br><span class="line">  <span class="title function_">getPostWithCommentsById</span>(<span class="params"><span class="meta">@Param</span>(<span class="string">&quot;id&quot;</span>) <span 
class="attr">id</span>: <span class="built_in">string</span>, <span class="meta">@Req</span>() <span class="attr">req</span>: <span class="title class_">Request</span></span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="variable language_">this</span>.<span class="property">prisma</span>.<span class="property">post</span>.<span class="title function_">findUnique</span>(&#123;</span><br><span class="line">      <span class="attr">where</span>: &#123; id &#125;,</span><br><span class="line">      <span class="attr">include</span>: &#123;</span><br><span class="line">        <span class="attr">comments</span>: <span class="literal">true</span>,</span><br><span class="line">      &#125;,</span><br><span class="line">      <span class="attr">sqlComments</span>: <span class="title function_">getSqlComments</span>(req),</span><br><span class="line">    &#125;);</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="meta">@Post</span>(<span class="string">&quot;posts&quot;</span>)</span><br><span class="line">  <span class="title function_">createPost</span>(<span class="params"><span class="meta">@Body</span>() <span class="attr">body</span>: &#123; title: <span class="built_in">string</span>; body: <span class="built_in">string</span> &#125;, <span class="meta">@Req</span>() <span class="attr">req</span>: <span class="title class_">Request</span></span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="variable language_">this</span>.<span class="property">prisma</span>.<span class="property">post</span>.<span class="title function_">create</span>(&#123;</span><br><span class="line">      <span class="attr">data</span>: &#123;</span><br><span class="line">        <span class="attr">title</span>: body.<span class="property">title</span>,</span><br><span class="line">        <span class="attr">body</span>: body.<span 
class="property">body</span>,</span><br><span class="line">      &#125;,</span><br><span class="line">      <span class="attr">sqlComments</span>: <span class="title function_">getSqlComments</span>(req),</span><br><span class="line">    &#125;);</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Then we get SQL like this:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">2026-02-22 07:56:38.681 GMT [190] LOG:  execute s412381: SELECT &quot;public&quot;.&quot;Post&quot;.&quot;id&quot;, &quot;public&quot;.&quot;Post&quot;.&quot;title&quot;, &quot;public&quot;.&quot;Post&quot;.&quot;body&quot;, &quot;public&quot;.&quot;Post&quot;.&quot;createdAt&quot;, &quot;public&quot;.&quot;Post&quot;.&quot;modifiedAt&quot; FROM &quot;public&quot;.&quot;Post&quot; WHERE (&quot;public&quot;.&quot;Post&quot;.&quot;id&quot; = $1 AND 1=1) LIMIT $2 OFFSET $3 /* method=&#x27;GET&#x27;,route=&#x27;%2Fposts%2Fff4ccd6d-2c15-4979-a5f3-0c27e4e2f169&#x27;,route_pattern=&#x27;%2Fposts%2F:id&#x27;,traceparent=&#x27;00-e984180b391935fbdf2b1e1f6f3b2b12-1286ff6754fbbc57-01&#x27; */</span><br><span class="line">2026-02-22 07:56:38.681 GMT [190] DETAIL:  parameters: $1 = &#x27;ff4ccd6d-2c15-4979-a5f3-0c27e4e2f169&#x27;, $2 = &#x27;1&#x27;, $3 = &#x27;0&#x27;</span><br><span class="line">2026-02-22 07:56:38.681 GMT [190] LOG:  execute s412382: SELECT &quot;public&quot;.&quot;Post&quot;.&quot;id&quot;, &quot;public&quot;.&quot;Post&quot;.&quot;title&quot;, &quot;public&quot;.&quot;Post&quot;.&quot;body&quot;, &quot;public&quot;.&quot;Post&quot;.&quot;createdAt&quot;, &quot;public&quot;.&quot;Post&quot;.&quot;modifiedAt&quot; FROM &quot;public&quot;.&quot;Post&quot; WHERE (&quot;public&quot;.&quot;Post&quot;.&quot;id&quot; = $1 AND 1=1) LIMIT $2 OFFSET 
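$3 /* method=&#x27;GET&#x27;,route=&#x27;%2Fposts%2Fff4ccd6d-2c15-4979-a5f3-0c27e4e2f169&#x27;,route_pattern=&#x27;%2Fposts%2F:id&#x27;,traceparent=&#x27;00-fa7f8686b25c2efe8942718207c28079-bbeb59140c47895c-01&#x27; */</span><br><span class="line">2026-02-22 07:56:38.681 GMT [190] DETAIL:  parameters</span><br></pre></td></tr></table></figure><p>In most cases, statements annotated like this already go a long way in everyday database debugging: for any slow SQL statement, you can look up where it came from and the context it ran in.</p><p>But is that enough? Can we bring the database itself into the OpenTelemetry tracing system? Can we see the tracing information for the whole call chain at the database level?</p><p>We can. Let me use the PostgreSQL ecosystem, which I'm currently more familiar with, as an example.</p><p>Datadog has built a PostgreSQL extension for this called pg_tracing: <a href="https://github.com/DataDog/pg_tracing">https://github.com/DataDog/pg_tracing</a></p><p>It relies on several PostgreSQL extension points:</p><ol><li>post_parse_analyze_hook</li><li>planner_hook</li><li>ExecutorStart_hook</li><li>ExecutorRun_hook</li><li>ExecutorFinish_hook</li><li>ExecutorEnd_hook</li><li>ProcessUtility_hook</li><li>xact_callback</li></ol><p>Combined with ring buffers and shared memory, these hooks are enough to implement tracing at the database level, which lets us plug PostgreSQL into the OTEL ecosystem. The implementation contains quite a few neat tricks, which may deserve a separate post some day.</p><p>MySQL's internals are a mess, but I suspect implementing something similar on MySQL would not take too much work either.</p><p>The final result looks like this:</p><p><img src="/images/Screenshot_20260222_165110.png" alt="OTEL tracing"></p><p>That's about it.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>While everyone else is writing about AI and agents, I'm still tinkering with old-school stuff like this. For a moment I thought I saw a giant windmill in front of me. But anyway, I like these things, and that is enough.</p><p>Happy reading.</p>]]></content>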
    
    
    <summary type="html">&lt;p&gt;过年是个好时候，可以猛猛干活或者看论文。不过上了几天磨，看了几天论文后觉得还是需要整点活&lt;/p&gt;
&lt;p&gt;所以这篇文章来聊聊一个经典话题，How to trace your SQL？&lt;/p&gt;</summary>
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="Infra" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/Infra/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="Linux" scheme="https://www.manjusaka.blog/tags/Linux/"/>
    
    <category term="Python" scheme="https://www.manjusaka.blog/tags/Python/"/>
    
    <category term="笔记" scheme="https://www.manjusaka.blog/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="水文" scheme="https://www.manjusaka.blog/tags/%E6%B0%B4%E6%96%87/"/>
    
    <category term="Node.js" scheme="https://www.manjusaka.blog/tags/Node-js/"/>
    
    <category term="Infra" scheme="https://www.manjusaka.blog/tags/Infra/"/>
    
  </entry>
  
  <entry>
    <title>笑ってほしくて</title>
    <link href="https://www.manjusaka.blog/posts/2026/01/01/at-the-end-of-2025/"/>
    <id>https://www.manjusaka.blog/posts/2026/01/01/at-the-end-of-2025/</id>
    <published>2026-01-01T15:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
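    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>Every year I pick a line as the title of my year-end review. Last year it was “本当の僕らをありがとう”; this year I chose “笑ってほしくて”.</p><p>It comes from “Anytime anywhere”, the ending theme of Frieren: Beyond Journey's End.</p><blockquote><p>May you smile</p></blockquote><p>Maybe it is what I want to say to the people and the cat I have lost; maybe it is also what they want to say to me.</p><p>It is also what I want to say to everyone reading this post.</p><span id="more"></span><h2 id="开篇"><a href="#开篇" class="headerlink" title="开篇"></a>Opening</h2><p>At one point I actually wanted to give up on writing this post, because I was always afraid. This year I spent a long time fighting on the edge between life and death, and in the end I still could not hold on. Every time the little moments come back to me, I fall apart.</p><p>But as the lyrics say:</p><blockquote><p>こんなに胸が痛いのは / The ache in my chest</p><p>あなたといた証かな / is proof that I was with you</p></blockquote><p>So I should still write something to remember this unusual 2025, which is also saka's 2025.</p><h2 id="生活"><a href="#生活" class="headerlink" title="生活"></a>Life</h2><p>The biggest event in my life this year was the death of our cat. Xiaoxiong (小熊) was a cat we adopted from our residential compound in 2023. He was a hard case from the start: severe stomatitis plus kidney failure. Once his condition stabilized, we took him home for daily treatment.</p><p>After a full year of ordeals in 2024 (two abdominal surgeries, three critical episodes), the vet finally said in early 2025 that a check-up every two months would be enough, and we thought Xiaoxiong would be with us for a long time.</p><p>Things did not go as we hoped. After the anniversary of his homecoming in June, his condition kept going up and down. After he was hospitalized for the last time in August, there were moments of improvement, but overall he declined sharply. In the end we chose to let him go.</p><p>It is hard to say how hard his passing hit me. My colleagues say I seemed to recover rather quickly, but only I know about the tearful moments in between (in fact I am quietly crying as I write this).</p><p>This year is also the fifth since my close friend left me. Before I knew it, I had become older than he ever was.</p><blockquote><p>君埋泉下泥销骨，我寄人间雪满头 / You lie beneath the springs, your bones turned to earth; I linger in this world, my hair white as snow</p></blockquote><p>I have to say, the ancients really knew how to write.</p><p>On to happier things: my everyday life was a lot richer this year than last. For the first time ever, saka got his first game with 100% achievements! So which game is saka's pick for best game of 2025?</p><p>It is Trails in the Sky 1st Chapter, the remake.</p><p>Sorry, P5R, I love you too (</p><p><img src="/images/end-of-2025/photo_2025-11-10_15-20-20.jpg" alt="P5R platinum"></p><p>I also kept taking photos this year and bought three lenses: the Z135 F1.8S Plena, the Z105 F2.8 macro, and the Z35 F1.2S. I keep going further down the camera gear rabbit hole. My total shutter count is close to 100,000, mostly cats and dogs. In a sense, I am glad my camera captured Xiaoxiong's best moments over these years.</p><p>Another big change for me this year: I traveled to the farthest place I have ever been able to go, Japan.</p><p>Several pillars of my spiritual world (Ultraman, Laid-Back Camp) come from Japan, and my first trip abroad happened to be to Japan (a kind of fated coincidence, in a sense).</p><p>During the two weeks of my business trip in Japan, I unlocked a lot of new experiences: splurging in Akihabara, splurging in Nadeshiko's hometown, singing the Ultraman theme song with a taxi driver, shopping and splurging with friends, going on a retreat in Atami with colleagues, seeing beautiful seaside views, visiting Numazu, eating delicious seafood rice bowls, and so on.</p><p>In a sense, this year opened up many experiences I had never imagined before. It also strengthened one belief of mine:</p><blockquote><p>Joy can in fact be shared between people</p></blockquote><p><img src="/images/end-of-2025/IMG_1919.JPG" alt="Railfan"></p><h2 id="感情"><a href="#感情" class="headerlink"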
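title="感情"></a>Relationship</h2><p>Our relationship has entered its seventh year. If I had to distill this year into one theme, it would be: together through life and death.</p><p>In the month before Xiaoxiong passed, we stayed with him at the hospital. The beeping of the monitors, staff rushing by from time to time, lives slipping away from time to time: the atmosphere was truly suffocating.</p><p>At times like that, two people watching over each other may be one of the few miracles that had already happened in those monotonous days spent praying for one.</p><p>After Xiaoxiong left, the two of us took our dog on a long-overdue trip to clear our heads. Together we took many lovely photos of the dog, rescued other kittens, and donated equipment to the hospital.</p><p>Finding a partner in the crowd who watches over you, as you watch over them, is already a miracle in itself.</p><h2 id="工作-amp-技术"><a href="#工作-amp-技术" class="headerlink" title="工作&amp;技术"></a>Work &amp; Technology</h2><p>Early this year, at a friend's invitation, I joined an AIGC startup to work on infra. It is a brand-new turning point in my career.</p><p>My biggest takeaway is something I may have mentioned before: the change in role brings more responsibility and more things to take care of.</p><p>Let me start with the abstract part. I used to only need to focus on getting specific things done, and someone else would probably cover the rest for me. This year I started to become the person who tries to help colleagues get their things done. Honestly, this change in role shifts a lot of your thinking. It is like the question I keep asking in 1:1s with team members: “What else do you think I can do to help you land what you are working on?” Often I cannot think about a task only in itself; I need to think about how it can help the team and my colleagues deliver more value.</p><p>At the same time, my responsibilities and roles have become more varied, which also requires leaving my comfort zone. For example, I did a lot of DBA and search-related work at the company this year. Strictly speaking, that work is far from my strike zone, but if something is important and nobody available is better at it than I am (in other words, having me do it will not be the worst outcome), then I need to take on that responsibility.</p><p>Next, the concrete part. What gave me the greatest sense of accomplishment this year was building up our reliability work from scratch after joining the company, with good results. Whether it was restoring our service in the shortest possible time after several large-scale internet infra outages, or bringing real stability and performance improvements to our product, my work translated directly into user value. In a sense, that is a piece of good fortune in life.</p><p>Those are the good parts. There is a bad part too, namely:</p><blockquote><p>Coach, I want to write code</p></blockquote><p>These days I can basically only keep my hand in by coding outside of work. Sob, I really want to transfer to a role where I write code (</p><p>That is not entirely a bad thing, though. I really enjoy coding in my spare time now, so I contributed a lot more to the community this year than last: I tidied up a lot of the code in my self-hosted image hosting project and contributed a fair amount of code to CPython's JIT. My name also appeared in the official documentation for the first time.</p><p>As for next year, put simply: lead my team well at work and keep pushing beyond my comfort zone. I also hope to contribute more code to the next phase of CPython's JIT (I actually have some ideas, and the New Year holiday is a good time to sort them out).</p><p>Oh, that reminds me: in 2026 I also need to properly learn GPU programming. Honestly, learning new things is a real joy!</p><p><img src="/images/end-of-2025/git-wrapped-Zheaoli.png" alt="saka's 2025"></p><h2 id="关于-AI"><a href="#关于-AI" class="headerlink" title="关于 AI"></a>About AI</h2><p>To ride the hype a little, here are my thoughts on AI.</p><p>Many people fall into an anxiety: “Will AI replace me?”</p><p>My own view is becoming clearer: the AI era will make people more valuable, not erase their value.</p><p>The logic is simple. AI makes everything more efficient: content production is faster, development is faster, and company structures are more flexible. Under these conditions, many traditional things face new challenges in the AI era. A couple of examples:</p><blockquote><p>As content production becomes ever more efficient, AI startups built around UGC and similar models will face growing anti-NSFW compliance challenges.</p><p>And as the hunger for compute grows, managing and onboarding edge GPU capacity is becoming an increasingly important topic.</p></blockquote><p>The productivity gains from AI will actually make human value stand out more in this era, not dissolve it.</p><h2 id="总结"><a 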
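href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>That is about it. The past year had plenty of tears, many sleepless nights, and also a lot of joy.</p><p>Frankly, I made it through this year because so many people were by my side: my girlfriend, a group of good friends, and a group of good colleagues.</p><p>Everyone's company, together with all the sour, bitter, spicy, and sweet of life, came together to make the saka of 2025, or saka's 2025.</p><p>I once wanted to list many hopes for the future, but after going through a round of life and death, the rest no longer seems that important.</p><p>As the title says,</p><blockquote><p>笑ってほしくて / May you smile</p></blockquote><p>May every one of us get through each day of 2026 with a smile.</p><p>Ten thousand years is too long; let us just smile today (</p><p>Happy New Year!</p>]]></content>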
    
    
    <summary type="html">&lt;p&gt;每年都会选一句话作为年终总结的标题，去年是“本当の僕らをありがとう”，今年我选择“笑ってほしくて”。&lt;/p&gt;
&lt;p&gt;这句话出自 《葬送的芙莉莲》的片尾曲《Anytime anywhere》。&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;愿你露出笑容&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;可能是我想对逝去之人，逝去的猫说的，可能也是他们想对我说的&lt;/p&gt;
&lt;p&gt;也是我想对所有看到这篇文章的人说的&lt;/p&gt;</summary>
    
    
    
    <category term="杂记" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/"/>
    
    <category term="总结" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/%E6%80%BB%E7%BB%93/"/>
    
    <category term="秀恩爱" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/%E6%80%BB%E7%BB%93/%E7%A7%80%E6%81%A9%E7%88%B1/"/>
    
    
    <category term="杂记" scheme="https://www.manjusaka.blog/tags/%E6%9D%82%E8%AE%B0/"/>
    
    <category term="总结" scheme="https://www.manjusaka.blog/tags/%E6%80%BB%E7%BB%93/"/>
    
  </entry>
  
  <entry>
    <title>Be a Role Model, Not an Idol</title>
    <link href="https://www.manjusaka.blog/posts/2025/10/02/be-a-good-example-not-be-an-idol/"/>
    <id>https://www.manjusaka.blog/posts/2025/10/02/be-a-good-example-not-be-an-idol/</id>
    <published>2025-10-02T20:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
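    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>I just finished Trails in the Sky 1st. The National Day holiday is almost over, and it is time to get ready for the study and work ahead. I'll take this chance to write something as a belated recap of PyCon China 2025.</p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="正文"></a>Main Content</h2><p>After last year's conference ended, I was already considering whether to step away from PyCon China completely for a while. The reason was simple: I was exhausted. But I have always been a chronic procrastinator, and I happened to be in the middle of a career change at the time, so I decided to wait and see.</p><p>This year I joined an AIGC company to work on infra, and honestly I have been enjoying it. I learn something new every day, and I am in much better shape than last year (apart from having to fight the damned Any every day), so at one point I was planning to take part in this year's PyCon China again.</p><p>Then, during the preparation period, the plan was upended. The cat who had kept us company became critically ill and passed away. The all-nighters at the hospital, the bedside care, and the final parting took a huge toll on me, mentally and physically. Whether to take part in this year's PyCon China became an open question.</p><p>In the end I decided to go. This year the publisher unexpectedly gave me an extra job: a signing session at the venue for a book I translated. That raised a new doubt in my mind: well, would anyone actually come and buy it?</p><p>Well, this is the same self-doubt I have every year at PyCon: do I really deserve to be here? Will anyone really listen to what I have to say?</p><p>But once the decision was made, I just went ahead. So this year I again contributed a silver sponsorship plus a talk.</p><p>Time flew, and suddenly it was the day before the conference. Right before boarding, I was still being tortured by AWS's stupid OpenSearch. After I landed, yihong, piglei, 空想家, and jay had gone off to party. Fine, fine, partying without me, huh. So I went to the hotel all alone. I won't say who had not even created the folder for his slides 12 hours before his talk.</p><p>At the hotel I pushed through a few hours to finish my slides and wrap up some work, barely slept 2 hours, and then headed to the venue.</p><p>The easy part this year was that I did not have to host the morning session; I only had to warm up the crowd. But right before the opening I found that the slides for Track C had not been collected yet, so I called in 晚枫 to help.</p><p>The warm-up was my annual routine, “Saka's three questions”.jpg:</p><ol><li>Raise your hand if you are still writing Python</li><li>Raise your hand if you are not writing Python yet (drag them out</li><li>Raise your hand if you are using 2.7</li></ol><p>It seemed to work well. After the warm-up I slipped out. But wow, why did so many people come looking for me one-on-one this year, <del>my social anxiety was in tears</del>.</p><p>Honestly, though, after a brush with life and death, goofing around with old friends and meeting new ones made me quite happy. Especially when many people came over to say that I had influenced them, that I was their idol, and that they had learned a lot from my blog and talks. My long-standing doubt was resolved. Ideally a person's sense of self-worth should come from within, but everyone's recognition really did make me happy.</p><p>Oh, and many people said they love Laid-Back Camp too!</p><p><img src="/images/luying1.jpeg" alt="The light of Laid-Back Camp guides us"></p><p><img src="/images/photo_2025-09-20_13-09-39.jpg" alt="Behind the scenes"></p><p>Worth mentioning: a treasured colleague of mine came from Japan to meet up in person (yes, we are a remote company) and even brought me a figure.</p><p><img src="/images/G1STXPQbQAQnSDp.jpeg" alt="With my colleague"></p><p>Of course, a certain rascal HR said they would attend and then overslept in the morning. I won't name names.</p><p>The morning went by quickly, and then I headed to the signing area. To my surprise, a lot of people came: old friends showing their support, and new friends whose faces I had only just matched to their online names. We all joked around, and in my ugly handwriting I wrote many good wishes and plenty of random remarks. If I had to pick the most heartfelt one, it would be “Don't use Next.js”.</p><p><img src="/images/photo_2025-09-20_14-13-48.jpg" alt="Don't use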
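Next.js"></p><p>After the signing I wolfed down a boxed lunch. While smoking, I ran into my mentor, <del>and started affectionately stroking his pecs</del>, which is more or less our ritual whenever we meet. I asked him one question: do you still need me to collect your body? This goes back to an agreement we once made that if he took his own life, I would come and handle his affairs. He thought for a moment and said: why don't we think about how to live to 150 instead. Great answer, I love it. I'll take that as a no.</p><p>Later 纯爷 joined the chat, and I took the chance to pour out the pain of using PostgreSQL as if it were MongoDB. All he could do was give me a sympathetic but helpless look.</p><p>In the afternoon, I was actually a few minutes late to Track C because I needed a smoke to stay awake. yihong warmed up the room for me and saved me from embarrassment, so I owe him a deep bow. Also, yihong is very huggable; I recommend trying it if you get the chance.</p><p>When my turn came in the afternoon, scheduling constraints left me with less time than expected, so I adjusted some content on the fly. As I was about to go on stage, I noticed that many people had rushed over from the other tracks, which was very gratifying. During the Q&amp;A, I told everyone that I now work at a company that uses Node.js, writes Any everywhere, uses PostgreSQL as if it were MongoDB, and also uses GraphQL. The room broke into knowing smiles. I guess our rascal HR will not be recruiting anyone at the venue today.</p><p>In the afternoon, the rascal HR and another colleague based in Shanghai also showed up, bringing the <del>knockout black tea</del> Coke Zero I needed. Honestly, at a remote company there are few chances to meet in person. Naturally, I became part of the sticker pack in the company chat. Time to consider charging the rascal HR for image rights.</p><p><img src="/images/20251003-032740.jpg" alt="Stickers made by my rascal colleagues"></p><p>After things wrapped up in the evening, I got together with 沪爷阿蔡 and a few colleagues. Though honestly, does dragging two colleagues who are flying to Japan tomorrow out for Japanese ramen count as a form of workplace bullying? LOL.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>Honestly, the day of PyCon was the happiest moment of my past two months. Years from now I may not remember what I talked about this year, but I will still remember the simple happiness of that day.</p><p>Well, seven years have passed since 2018, and I have gone from a fresh graduate to an old geezer. It seems many things have changed, and yet many have not.</p><p>I am still not very good, but I seem to have influenced and helped more people than before? There is still a lot I do not know, but there also seems to be more that I can learn? I will still go through plenty of pain, confusion, and struggle. But life still seems, as always, to hold endless hope and beauty?</p><p>After returning to Beijing, over a meal, I said to my girlfriend: you know what? Many people say I am their idol. She said: no, you are their role model.</p><p>Yes. Be a role model, not an idol.</p><p>That is about it for this post. The Mid-Autumn Festival is coming. Besides wishing everyone a happy Mid-Autumn Festival with your families, I also hope each of you can stay true to yourself in this fast-changing world. In Piglei's words: “老而不登” (grow old without becoming an old geezer).</p><p>In this world, only love, hope, Ultraman, and Laid-Back Camp must never be let down. 抚门！</p>]]></content>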
    
    
    <summary type="html">&lt;p&gt;最近刚把空轨 1st 打通关。差不多国庆假期也要结束了，要开始准备接下来的学习和工作了。趁着这个机会，写点东西，当个迟到的 PyCon China 2025 的总结吧&lt;/p&gt;</summary>
    
    
    
    <category term="杂记" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/"/>
    
    <category term="总结" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/%E6%80%BB%E7%BB%93/"/>
    
    
    <category term="杂记" scheme="https://www.manjusaka.blog/tags/%E6%9D%82%E8%AE%B0/"/>
    
    <category term="总结" scheme="https://www.manjusaka.blog/tags/%E6%80%BB%E7%BB%93/"/>
    
  </entry>
  
  <entry>
    <title>Further Performance Evolution in Python 3.14: Tail Call Interpreter</title>
    <link href="https://www.manjusaka.blog/posts/2025/07/02/tail-call-in-3-14-interpreter-en/"/>
    <id>https://www.manjusaka.blog/posts/2025/07/02/tail-call-in-3-14-interpreter-en/</id>
    <published>2025-07-02T15:49:00.000Z</published>
    <updated>2026-03-29T17:00:43.284Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>I’ve been overwhelmed by security work lately, so let me switch to something lighter to relax my mind.</p><p>Python 3.14 has officially introduced a new mechanism called Tail Call Interpreter (Made by Ken Jin), which is undoubtedly another major milestone that lays the foundation for the future.</p><span id="more"></span><h2 id="Main-Content"><a href="#Main-Content" class="headerlink" title="Main Content"></a>Main Content</h2><p>Before discussing Python 3.14’s Tail Call Interpreter, we need to talk about the most basic syntax in C - switch-case.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">switch</span> (x) &#123;</span><br><span class="line">    <span class="keyword">case</span> <span class="number">1</span>:</span><br><span class="line">        <span class="comment">// do something</span></span><br><span class="line">        <span class="keyword">break</span>;</span><br><span class="line">    <span class="keyword">case</span> <span class="number">2</span>:</span><br><span class="line">        <span class="comment">// do something else</span></span><br><span class="line">        <span class="keyword">break</span>;</span><br><span class="line">    <span class="keyword">default</span>:</span><br><span class="line">        <span class="comment">// do default thing</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>What would the final assembly look like for such code? 
Most people’s first reaction might be to use <code>cmp</code> followed by <code>je</code>, and if it doesn’t match, continue. Let’s compile a version to see.</p><p>For this code:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">small_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1</span>: <span class="built_in">printf</span>(<span class="string">&quot;One\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">2</span>: <span class="built_in">printf</span>(<span class="string">&quot;Two\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">3</span>: <span class="built_in">printf</span>(<span class="string">&quot;Three\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The final assembly output would be:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">00000000000011f0 &lt;small_switch&gt;:</span><br><span class="line">    11f0:83 ff 02             cmp    $0x2,%edi</span><br><span class="line">    11f3:74 2b                je     1220 &lt;small_switch+0x30&gt;</span><br><span class="line">    11f5:83 ff 03             cmp    $0x3,%edi</span><br><span class="line">    11f8:74 16                je     1210 &lt;small_switch+0x20&gt;</span><br><span class="line">    11fa:83 ff 01             cmp    $0x1,%edi</span><br><span class="line">    11fd:75 31                jne    1230 &lt;small_switch+0x40&gt;</span><br><span class="line">    11ff:48 8d 3d fe 0d 00 00 lea    0xdfe(%rip),%rdi        # 2004 &lt;_IO_stdin_used+0x4&gt;</span><br><span class="line">    1206:e9 25 fe ff ff       jmp    1030 &lt;puts@plt&gt;</span><br><span class="line">    120b:0f 1f 44 00 00       nopl   0x0(%rax,%rax,1)</span><br><span class="line">    1210:48 8d 3d f5 0d 00 00 lea    0xdf5(%rip),%rdi        # 200c &lt;_IO_stdin_used+0xc&gt;</span><br><span class="line">    1217:e9 14 fe ff ff       jmp    1030 &lt;puts@plt&gt;</span><br><span class="line">    121c:0f 1f 40 00          nopl   0x0(%rax)</span><br><span class="line">    1220:48 8d 3d e1 0d 00 00 lea    0xde1(%rip),%rdi        # 2008 &lt;_IO_stdin_used+0x8&gt;</span><br><span class="line">    1227:e9 04 fe ff ff       jmp    1030 &lt;puts@plt&gt;</span><br><span class="line">    122c:0f 1f 40 00          nopl   0x0(%rax)</span><br><span 
class="line">    1230:48 8d 3d db 0d 00 00 lea    0xddb(%rip),%rdi        # 2012 &lt;_IO_stdin_used+0x12&gt;</span><br><span class="line">    1237:e9 f4 fd ff ff       jmp    1030 &lt;puts@plt&gt;</span><br><span class="line">    123c:0f 1f 40 00          nopl   0x0(%rax)</span><br></pre></td></tr></table></figure><p>The result matches our expectation: a chain of <code>cmp</code> instructions, each followed by a conditional jump (<code>je</code> or <code>jne</code>). What is the complexity here? In the worst case every comparison runs before a match is found, so dispatch is O(n) in the number of cases.</p><p>Damn. For Python, with its huge switch-case structure, would every dispatch be O(n)? Wouldn’t that be a performance disaster?</p><p>Actually, no. Compilers usually pick different strategies for a switch-case structure depending on the data type and on the number and distribution of the case values.</p><p>Let’s prepare several examples:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span 
class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">small_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span 
class="number">1</span>: <span class="built_in">printf</span>(<span class="string">&quot;One\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">2</span>: <span class="built_in">printf</span>(<span class="string">&quot;Two\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">3</span>: <span class="built_in">printf</span>(<span class="string">&quot;Three\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">dense_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">10</span>: <span class="built_in">printf</span>(<span class="string">&quot;Ten\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">11</span>: <span class="built_in">printf</span>(<span class="string">&quot;Eleven\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">12</span>: <span class="built_in">printf</span>(<span class="string">&quot;Twelve\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">13</span>: <span class="built_in">printf</span>(<span 
class="string">&quot;Thirteen\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">14</span>: <span class="built_in">printf</span>(<span class="string">&quot;Fourteen\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">15</span>: <span class="built_in">printf</span>(<span class="string">&quot;Fifteen\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">16</span>: <span class="built_in">printf</span>(<span class="string">&quot;Sixteen\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">17</span>: <span class="built_in">printf</span>(<span class="string">&quot;Seventeen\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">sparse_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1</span>: <span class="built_in">printf</span>(<span class="string">&quot;One\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">100</span>: <span class="built_in">printf</span>(<span class="string">&quot;Hundred\n&quot;</span>); <span 
class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1000</span>: <span class="built_in">printf</span>(<span class="string">&quot;Thousand\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">10000</span>: <span class="built_in">printf</span>(<span class="string">&quot;Ten thousand\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">large_dense_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 1\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">2</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 2\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">3</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 3\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">4</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 4\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">   
     <span class="keyword">case</span> <span class="number">5</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 5\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">6</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 6\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">7</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 7\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">8</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 8\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">9</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 9\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">10</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 10\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">11</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 11\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">12</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 12\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">13</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 13\n&quot;</span>); <span 
class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">14</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 14\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">15</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 15\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">16</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 16\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">17</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 17\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">18</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 18\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">19</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 19\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">20</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 20\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span 
class="title function_">mixed_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 1\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">2</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 2\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">3</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 3\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">case</span> <span class="number">50</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 50\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">case</span> <span class="number">100</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 100\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">101</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 101\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">102</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 102\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">default</span>: <span 
class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">char_switch</span><span class="params">(<span class="type">char</span> c)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(c) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="string">&#x27;a&#x27;</span>: <span class="built_in">printf</span>(<span class="string">&quot;Letter a\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="string">&#x27;b&#x27;</span>: <span class="built_in">printf</span>(<span class="string">&quot;Letter b\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="string">&#x27;c&#x27;</span>: <span class="built_in">printf</span>(<span class="string">&quot;Letter c\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="string">&#x27;d&#x27;</span>: <span class="built_in">printf</span>(<span class="string">&quot;Letter d\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="string">&#x27;e&#x27;</span>: <span class="built_in">printf</span>(<span class="string">&quot;Letter e\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other char\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p>Then we disassemble and look at the results (I’ll only paste the key parts here):</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br></pre></td><td class="code"><pre><span 
class="line">00000000000011f0 &lt;small_switch&gt;:</span><br><span class="line">    11f0:83 ff 02             cmp    $0x2,%edi      # Compare if it&#x27;s 2</span><br><span class="line">    11f3:74 2b                je     1220            # Jump to case 2</span><br><span class="line">    11f5:83 ff 03             cmp    $0x3,%edi      # Compare if it&#x27;s 3</span><br><span class="line">    11f8:74 16                je     1210            # Jump to case 3</span><br><span class="line">    11fa:83 ff 01             cmp    $0x1,%edi      # Compare if it&#x27;s 1</span><br><span class="line">    11fd:75 31                jne    1230            # If not, jump to default</span><br><span class="line"></span><br><span class="line">0000000000001240 &lt;dense_switch&gt;:</span><br><span class="line">    1240:83 ef 0a             sub    $0xa,%edi      # Subtract 10 (minimum case value)</span><br><span class="line">    1243:83 ff 07             cmp    $0x7,%edi      # Compare range (17-10=7)</span><br><span class="line">    1246:0f 87 90 00 00 00    ja     12dc           # Out of range, jump to default</span><br><span class="line">    124c:48 8d 15 15 0f 00 00 lea    0xf15(%rip),%rdx # Load jump table address</span><br><span class="line">    1253:48 63 04 ba          movslq (%rdx,%rdi,4),%rax # Get offset</span><br><span class="line">    1257:48 01 d0             add    %rdx,%rax      # Calculate target address</span><br><span class="line">    125a:ff e0                jmp    *%rax          # Indirect jump</span><br><span class="line"></span><br><span class="line">00000000000012f0 &lt;sparse_switch&gt;:</span><br><span class="line">    12f0:81 ff e8 03 00 00    cmp    $0x3e8,%edi    # Compare 1000</span><br><span class="line">    12f6:74 40                je     1338           # If equal, jump</span><br><span class="line">    12f8:7f 16                jg     1310           # If greater than 1000, continue checking</span><br><span class="line">    12fa:83 ff 01             
cmp    $0x1,%edi      # Less than 1000, check 1</span><br><span class="line">    12fd:74 49                je     1348           </span><br><span class="line">    12ff:83 ff 64             cmp    $0x64,%edi     # Check 100</span><br><span class="line">    1302:75 24                jne    1328           # If none match, default</span><br><span class="line">    ...</span><br><span class="line">    1310:81 ff 10 27 00 00    cmp    $0x2710,%edi   # Check 10000</span><br><span class="line"></span><br><span class="line">0000000000001360 &lt;large_dense_switch&gt;:</span><br><span class="line">    1360:83 ff 14             cmp    $0x14,%edi     # Check if ≤20</span><br><span class="line">    1363:0f 87 53 01 00 00    ja     14bc           # Out of range</span><br><span class="line">    1369:48 8d 15 18 0e 00 00 lea    0xe18(%rip),%rdx # Jump table address</span><br><span class="line">    1372:48 63 04 ba          movslq (%rdx,%rdi,4),%rax # Direct indexing</span><br><span class="line">    1376:48 01 d0             add    %rdx,%rax</span><br><span class="line">    1379:ff e0                jmp    *%rax</span><br><span class="line"></span><br><span class="line">00000000000014d0 &lt;mixed_switch&gt;:</span><br><span class="line">    14d0:83 ff 32             cmp    $0x32,%edi     # Compare 50</span><br><span class="line">    14d3:74 7b                je     1550</span><br><span class="line">    14d5:7f 29                jg     1500           # &gt;50 case</span><br><span class="line">    14d7:83 ff 02             cmp    $0x2,%edi      # ≤50, check small values</span><br><span class="line">    14da:74 64                je     1540</span><br><span class="line">    14dc:83 ff 03             cmp    $0x3,%edi</span><br><span class="line">    ...</span><br><span class="line">    1500:83 ff 65             cmp    $0x65,%edi     # &gt;50, check 100,101,102</span><br><span class="line">    1503:74 5b                je     1560</span><br><span class="line"></span><br><span 
class="line">0000000000001580 &lt;char_switch&gt;:</span><br><span class="line">    1580:83 ef 61             sub    $0x61,%edi     # Subtract ASCII value of &#x27;a&#x27;</span><br><span class="line">    1583:40 80 ff 04          cmp    $0x4,%dil      # Check if ≤4 (a-e)</span><br><span class="line">    1587:77 63                ja     15ec</span><br><span class="line">    1589:48 8d 15 4c 0c 00 00 lea    0xc4c(%rip),%rdx</span><br><span class="line">    1590:40 0f b6 ff          movzbl %dil,%edi      # Zero-extend to 32 bit</span><br><span class="line">    1594:48 63 04 ba          movslq (%rdx,%rdi,4),%rax</span><br></pre></td></tr></table></figure><p>Here we can see that the compiler lowers a switch-case differently depending on the number of cases, the distribution of their values, and the operand type. Let me summarize this with a table:</p><div class="table-container"><table><thead><tr><th>Switch Type</th><th>Case Count</th><th>Distribution</th><th>Compiler Strategy</th><th>Time Complexity</th></tr></thead><tbody><tr><td>small_switch</td><td>3</td><td>Consecutive(1,2,3)</td><td>Linear comparison</td><td>O(n)</td></tr><tr><td>dense_switch</td><td>8</td><td>Consecutive(10-17)</td><td>Offset jump table</td><td>O(1)</td></tr><tr><td>sparse_switch</td><td>4</td><td>Sparse(1,100,1000,10000)</td><td>Binary search</td><td>O(log n)</td></tr><tr><td>large_dense_switch</td><td>20</td><td>Consecutive(1-20)</td><td>Standard jump table</td><td>O(1)</td></tr><tr><td>mixed_switch</td><td>7</td><td>Partially consecutive</td><td>Nested comparison</td><td>O(log n)</td></tr><tr><td>char_switch</td><td>5</td><td>Consecutive(‘a’-‘e’)</td><td>Character offset table</td><td>O(1)</td></tr></tbody></table></div><p>OK, so the final implementation of a switch-case depends on how many cases there are and how their values are distributed, which makes the performance of our final code hard to predict. So do we have ways to optimize this problem? 
The answer is yes.</p><p>Let’s look at the following code:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span 
class="line">59</span><br><span class="line">60</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;stdio.h&gt;</span></span></span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">basic_computed_goto</span><span class="params">(<span class="type">int</span> operation)</span> &#123;</span><br><span class="line">    <span class="type">static</span> <span class="type">void</span>* jump_table[] = &#123;</span><br><span class="line">        &amp;&amp;op_add,   </span><br><span class="line">        &amp;&amp;op_sub,   </span><br><span class="line">        &amp;&amp;op_mul,   </span><br><span class="line">        &amp;&amp;op_div,   </span><br><span class="line">        &amp;&amp;op_mod,   </span><br><span class="line">        &amp;&amp;op_default</span><br><span class="line">    &#125;;</span><br><span class="line">    </span><br><span class="line">    <span class="type">int</span> a = <span class="number">10</span>, b = <span class="number">3</span>;</span><br><span class="line">    <span class="type">int</span> result;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">if</span> (operation &lt; <span class="number">0</span> || operation &gt; <span class="number">4</span>) &#123;</span><br><span class="line">        operation = <span class="number">5</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;Operation %d: a=%d, b=%d -&gt; &quot;</span>, operation, a, b);</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">goto</span> *jump_table[operation];</span><br><span class="line">    </span><br><span class="line">op_add:</span><br><span class="line">    result = a + b;</span><br><span class="line">    <span 
class="built_in">printf</span>(<span class="string">&quot;ADD: %d\n&quot;</span>, result);</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">    </span><br><span class="line">op_sub:</span><br><span class="line">    result = a - b;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;SUB: %d\n&quot;</span>, result);</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">    </span><br><span class="line">op_mul:</span><br><span class="line">    result = a * b;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;MUL: %d\n&quot;</span>, result);</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">    </span><br><span class="line">op_div:</span><br><span class="line">    <span class="keyword">if</span> (b != <span class="number">0</span>) &#123;</span><br><span class="line">        result = a / b;</span><br><span class="line">        <span class="built_in">printf</span>(<span class="string">&quot;DIV: %d\n&quot;</span>, result);</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span class="built_in">printf</span>(<span class="string">&quot;DIV: Error (division by zero)\n&quot;</span>);</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">    </span><br><span class="line">op_mod:</span><br><span class="line">    <span class="keyword">if</span> (b != <span class="number">0</span>) &#123;</span><br><span class="line">        result = a % b;</span><br><span class="line">        <span class="built_in">printf</span>(<span class="string">&quot;MOD: %d\n&quot;</span>, result);</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span 
class="built_in">printf</span>(<span class="string">&quot;MOD: Error (division by zero)\n&quot;</span>);</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">    </span><br><span class="line">op_default:</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;Unknown operation\n&quot;</span>);</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The core idea here is to turn each case of our switch-case into a label, and then use jump_table to jump directly to the corresponding label. Let’s look at the assembly of the most critical part:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">11d3:48 8d 05 c6 2b 00 00 lea    0x2bc6(%rip),%rax        # 3da0 &lt;jump_table.0&gt;</span><br><span class="line">11da:ff 24 d8             jmp    *(%rax,%rbx,8)</span><br></pre></td></tr></table></figure><p>To summarize, compared with a traditional switch-case, Computed Goto has the following advantages:</p><ol><li>Reduces the cost of branch mispredictions</li><li>Better instruction cache locality</li><li>Reduces the number and overhead of cmp instructions</li></ol><p>So how much faster can it be? You can refer to the benchmark results from when Computed Goto was introduced in CPython, which showed an overall improvement of about 15%.</p><p>So is the Computed Goto approach perfect? Actually, no. 
CPython’s interpreter, ceval.c, which is also the largest switch-case at present, has several typical problems:</p><ol><li>Computed Goto is a compiler extension specific to clang and gcc, so other platforms see limited benefit</li><li>Computed Goto is not yet mature: different versions of the same compiler may have different issues</li><li>An extremely large switch-case prevents the compiler from optimizing it well enough</li><li>We cannot use perf to precisely quantify per-opcode overhead across the whole process, which will be a big problem in the context of making Python faster</li></ol><p>Points 1, 3, and 4 are easy to understand. Let’s look at an example of point 2 (thanks to Ken Jin for providing the example).</p><p>With GCC 11, certain parts of the switch-case compile to normal code:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">738f: movq    %r13, %r15</span><br><span class="line">7392: movzbl  %ah, %ebx</span><br><span class="line">7395: movzbl  %al, %eax</span><br><span class="line">7398: movq    (,%rax,8), %rax</span><br><span class="line">73a0: movl    %ebx, -0x248(%rbp)</span><br><span class="line">73a6: jmpq    *%rax</span><br></pre></td></tr></table></figure><p>While GCC 13 through the 15 beta would generate code like this:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">747a: movzbl  %ah, %ebx</span><br><span class="line">747d: movzbl  %al, %eax</span><br><span class="line">7480: movl    %ebx, -0x248(%rbp)</span><br><span class="line">7486: movq    (,%rax,8), %rax</span><br><span class="line">748e: jmp     0x72a0 &lt;_PyEval_EvalFrameDefault+0x970&gt;</span><br><span class="line"></span><br><span class="line">72a0: movq    %r15, %xmm0</span><br><span class="line">72a5: movq    %r13, %xmm3</span><br><span class="line">72aa: movq    %r15, %rbx</span><br><span class="line">72ad: punpcklqdq %xmm3, %xmm0</span><br><span class="line">72b1: movhlps %xmm0, %xmm2</span><br><span class="line">72b4: movq    %xmm2, %r10</span><br><span class="line">72b9: movq    %r10, %r11</span><br><span class="line">72bc: jmpq    *%rax</span><br></pre></td></tr></table></figure><p>We can see that additional registers were introduced. Computer Architecture 101: additional registers mean additional overhead. Registers are expensive!</p><p>So is there a way to improve on the extremely large switch-case above? Some students might be thinking: since the switch-case above is extremely large, why don’t we split it into multiple small functions, so that the compiler has enough context to optimize each one and perf can precisely measure the overhead of each function? Wouldn’t that be great?</p><p>But other students object: function calls emit call instructions, which bring the additional overhead of pushing and popping registers. Won’t this make it slower again?</p><p>So can we optimize this? The answer is yes. 
Many students might have thought of it - tail call.</p><p>Suppose we have this code:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">__attribute__((noinline)) <span class="type">void</span> <span class="title function_">g</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;Value: %d\n&quot;</span>, x);</span><br><span class="line">&#125;;</span><br><span class="line"><span class="type">void</span> <span class="title function_">f</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> g(x);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>We can see this assembly:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td 
class="code"><pre><span class="line">0000000000001140 &lt;g&gt;:</span><br><span class="line">    1140:55                   push   %rbp</span><br><span class="line">    1141:48 89 e5             mov    %rsp,%rbp</span><br><span class="line">    1144:48 83 ec 10          sub    $0x10,%rsp</span><br><span class="line">    1148:89 7d fc             mov    %edi,-0x4(%rbp)</span><br><span class="line">    114b:8b 75 fc             mov    -0x4(%rbp),%esi</span><br><span class="line">    114e:48 8d 3d af 0e 00 00 lea    0xeaf(%rip),%rdi        # 2004 &lt;_IO_stdin_used+0x4&gt;</span><br><span class="line">    1155:b0 00                mov    $0x0,%al</span><br><span class="line">    1157:e8 d4 fe ff ff       call   1030 &lt;printf@plt&gt;</span><br><span class="line">    115c:48 83 c4 10          add    $0x10,%rsp</span><br><span class="line">    1160:5d                   pop    %rbp</span><br><span class="line">    1161:c3                   ret</span><br><span class="line">    1162:66 66 66 66 66 2e 0f data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)</span><br><span class="line">    1169:1f 84 00 00 00 00 00 </span><br><span class="line"></span><br><span class="line">0000000000001170 &lt;f&gt;:</span><br><span class="line">    1170:55                   push   %rbp</span><br><span class="line">    1171:48 89 e5             mov    %rsp,%rbp</span><br><span class="line">    1174:48 83 ec 10          sub    $0x10,%rsp</span><br><span class="line">    1178:89 7d fc             mov    %edi,-0x4(%rbp)</span><br><span class="line">    117b:8b 7d fc             mov    -0x4(%rbp),%edi</span><br><span class="line">    117e:e8 bd ff ff ff       call   1140 &lt;g&gt;</span><br><span class="line">    1183:48 83 c4 10          add    $0x10,%rsp</span><br><span class="line">    1187:5d                   pop    %rbp</span><br><span class="line">    1188:c3                   ret</span><br><span class="line">    1189:0f 1f 80 00 00 00 00 nopl   
0x0(%rax)</span><br></pre></td></tr></table></figure><p>The <code>call   1140 &lt;g&gt;</code> instruction is very prominent. This is also an important source of function call overhead.</p><p>Modern compilers have a special optimization called tail call optimization (tail recursion is its best-known special case): when the last step of a function is a call to another function, the compiler can optimize away the overhead of this call.</p><p>Let’s test this:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;stdio.h&gt;</span></span></span><br><span class="line">__attribute__((preserve_none)) <span class="type">void</span> <span class="title function_">g</span><span class="params">(<span class="type">int</span> x)</span>;</span><br><span class="line">__attribute__((noinline, preserve_none)) <span class="type">void</span> <span class="title function_">g</span><span class="params">(<span class="type">int</span> x)</span>&#123;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;Value: %d\n&quot;</span>, x);</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">__attribute__((preserve_none)) <span class="type">void</span> <span class="title function_">f</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    [[clang::musttail]] <span class="keyword">return</span> g(x);</span><br><span 
class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">main</span><span class="params">()</span> &#123;</span><br><span class="line">    f(<span class="number">42</span>);</span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Let’s look at the related assembly:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line">0000000000001140 &lt;g&gt;:</span><br><span class="line">    1140:55                   push   %rbp</span><br><span class="line">    1141:48 89 e5             mov    %rsp,%rbp</span><br><span class="line">    1144:48 83 ec 10          sub    $0x10,%rsp</span><br><span class="line">    1148:44 89 65 fc          mov    %r12d,-0x4(%rbp)</span><br><span class="line">    114c:8b 75 fc             mov    -0x4(%rbp),%esi</span><br><span class="line">    114f:48 8d 3d ae 0e 00 00 lea    0xeae(%rip),%rdi        # 2004 &lt;_IO_stdin_used+0x4&gt;</span><br><span class="line">    1156:31 c0                xor    %eax,%eax</span><br><span 
class="line">    1158:e8 d3 fe ff ff       call   1030 &lt;printf@plt&gt;</span><br><span class="line">    115d:48 83 c4 10          add    $0x10,%rsp</span><br><span class="line">    1161:5d                   pop    %rbp</span><br><span class="line">    1162:c3                   ret</span><br><span class="line">    1163:66 66 66 66 2e 0f 1f data16 data16 data16 cs nopw 0x0(%rax,%rax,1)</span><br><span class="line">    116a:84 00 00 00 00 00 </span><br><span class="line"></span><br><span class="line">0000000000001170 &lt;f&gt;:</span><br><span class="line">    1170:55                   push   %rbp</span><br><span class="line">    1171:48 89 e5             mov    %rsp,%rbp</span><br><span class="line">    1174:44 89 65 fc          mov    %r12d,-0x4(%rbp)</span><br><span class="line">    1178:44 8b 65 fc          mov    -0x4(%rbp),%r12d</span><br><span class="line">    117c:5d                   pop    %rbp</span><br><span class="line">    117d:e9 be ff ff ff       jmp    1140 &lt;g&gt;</span><br><span class="line">    1182:66 66 66 66 66 2e 0f data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)</span><br><span class="line">    1189:1f 84 00 00 00 00 00 </span><br></pre></td></tr></table></figure><p>We can see that the last step of function <code>f</code> is <code>jmp 1140 &lt;g&gt;</code>, not <code>call 1140 &lt;g&gt;</code>. This means that when we call function <code>g</code>, we avoid the extra overhead that a traditional call instruction brings, such as pushing a return address and saving registers.</p><p>Some students might have realized that after the tail call transformation, this can be viewed entirely as a high-performance Goto.</p><p>Bingo, the idea here is similar. 
The 1977 paper “Debunking the ‘Expensive Procedure Call’ Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO” argued that an efficiently implemented procedure call can perform close to a goto while being cleaner to implement.</p><p>In Python 3.14, the Tail Call Interpreter is built on this idea.</p><p>We can see that the macros that dispatch opcodes now use tail calls:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#   <span class="keyword">define</span> Py_MUSTTAIL [[clang::musttail]]</span></span><br><span class="line"><span class="meta">#   <span class="keyword">define</span> Py_PRESERVE_NONE_CC __attribute__((preserve_none))</span></span><br><span class="line">    Py_PRESERVE_NONE_CC <span class="keyword">typedef</span> PyObject* (*py_tail_call_funcptr)(TAIL_CALL_PARAMS);</span><br><span class="line"></span><br><span class="line"><span class="meta">#   <span class="keyword">define</span> TARGET(op) Py_PRESERVE_NONE_CC PyObject *_TAIL_CALL_##op(TAIL_CALL_PARAMS)</span></span><br><span class="line"><span 
class="meta">#   <span class="keyword">define</span> DISPATCH_GOTO() \</span></span><br><span class="line"><span class="meta">        do &#123; \</span></span><br><span class="line"><span class="meta">            Py_MUSTTAIL return (INSTRUCTION_TABLE[opcode])(TAIL_CALL_ARGS); \</span></span><br><span class="line"><span class="meta">        &#125; while (0)</span></span><br><span class="line"><span class="meta">#   <span class="keyword">define</span> JUMP_TO_LABEL(name) \</span></span><br><span class="line"><span class="meta">        do &#123; \</span></span><br><span class="line"><span class="meta">            Py_MUSTTAIL return (_TAIL_CALL_##name)(TAIL_CALL_ARGS); \</span></span><br><span class="line"><span class="meta">        &#125; while (0)</span></span><br><span class="line"><span class="meta">#   <span class="keyword">ifdef</span> Py_STATS</span></span><br><span class="line"><span class="meta">#       <span class="keyword">define</span> JUMP_TO_PREDICTED(name) \</span></span><br><span class="line"><span class="meta">            do &#123; \</span></span><br><span class="line"><span class="meta">                Py_MUSTTAIL return (_TAIL_CALL_##name)(frame, stack_pointer, tstate, this_instr, oparg, lastopcode); \</span></span><br><span class="line"><span class="meta">            &#125; while (0)</span></span><br><span class="line"><span class="meta">#   <span class="keyword">else</span></span></span><br><span class="line"><span class="meta">#       <span class="keyword">define</span> JUMP_TO_PREDICTED(name) \</span></span><br><span class="line"><span class="meta">            do &#123; \</span></span><br><span class="line"><span class="meta">                Py_MUSTTAIL return (_TAIL_CALL_##name)(frame, stack_pointer, tstate, this_instr, oparg); \</span></span><br><span class="line"><span class="meta">            &#125; while (0)</span></span><br><span class="line"><span class="meta">#   <span class="keyword">endif</span></span></span><br><span class="line"><span 
class="meta">#    <span class="keyword">define</span> LABEL(name) TARGET(name)</span></span><br></pre></td></tr></table></figure><p>So, while keeping baseline performance as good as or even better than computed goto, we get the following benefits:</p><ol><li>Broader platform support</li><li>With each case split into its own function, compilers are less likely to make mistakes, and overall performance is more predictable</li><li>Happy perf</li><li>And I can do more cool stuff with tools like eBPF</li></ol><h2 id="Summary"><a href="#Summary" class="headerlink" title="Summary"></a>Summary</h2><p>That’s about it for this article. Although it set out only to introduce Python 3.14’s Tail Call Interpreter, it ended up walking through the whole evolution of the idea.</p><p>This also gave me an insight: in many cases, predictability is a really important property.</p><p>Together with remote debugging, this is one of my two favorite features in 3.14. Long live observability!</p>]]></content>
    
    
    <summary type="html">&lt;p&gt;I’ve been overwhelmed by security work lately, so let me switch to something lighter to relax my mind.&lt;/p&gt;
&lt;p&gt;Python 3.14 has officially introduced a new mechanism called Tail Call Interpreter (Made by Ken Jin), which is undoubtedly another major milestone that lays the foundation for the future.&lt;/p&gt;</summary>
    
    
    
    <category term="Programming" scheme="https://www.manjusaka.blog/categories/Programming/"/>
    
    <category term="Python" scheme="https://www.manjusaka.blog/categories/Programming/Python/"/>
    
    
    <category term="Linux" scheme="https://www.manjusaka.blog/tags/Linux/"/>
    
    <category term="Python" scheme="https://www.manjusaka.blog/tags/Python/"/>
    
    <category term="Notes" scheme="https://www.manjusaka.blog/tags/Notes/"/>
    
    <category term="Programming" scheme="https://www.manjusaka.blog/tags/Programming/"/>
    
    <category term="Random" scheme="https://www.manjusaka.blog/tags/Random/"/>
    
  </entry>
  
  <entry>
    <title>Further Performance Evolution in Python 3.14: Tail Call Interpreter</title>
    <link href="https://www.manjusaka.blog/posts/2025/07/02/tail-call-in-3-14-interpreter/"/>
    <id>https://www.manjusaka.blog/posts/2025/07/02/tail-call-in-3-14-interpreter/</id>
    <published>2025-07-02T15:49:00.000Z</published>
    <updated>2026-03-29T17:00:43.284Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>I’ve been overwhelmed by security work lately, so let me switch to something lighter to relax my mind.</p><p>Python 3.14 has officially introduced a new mechanism called Tail Call Interpreter (Made by Ken Jin), which is undoubtedly another major milestone that lays the foundation for the future.</p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="正文"></a>Main Content</h2><p>Before talking about Python 3.14’s Tail Call Interpreter, we first need to talk about one of the most basic constructs in C: switch-case.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">switch</span> (x) &#123;</span><br><span class="line">    <span class="keyword">case</span> <span class="number">1</span>:</span><br><span class="line">        <span class="comment">// do something</span></span><br><span class="line">        <span class="keyword">break</span>;</span><br><span class="line">    <span class="keyword">case</span> <span class="number">2</span>:</span><br><span class="line">        <span class="comment">// do something else</span></span><br><span class="line">        <span class="keyword">break</span>;</span><br><span class="line">    <span class="keyword">default</span>:</span><br><span class="line">        <span class="comment">// do default thing</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>What does the final assembly for code like this look like? Most people’s first reaction is probably a cmp followed by a je, problem solved. Let’s compile a version and take a look.</p><p>For a piece of code like this:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">small_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1</span>: <span class="built_in">printf</span>(<span class="string">&quot;One\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">2</span>: <span class="built_in">printf</span>(<span class="string">&quot;Two\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">3</span>: <span class="built_in">printf</span>(<span class="string">&quot;Three\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The resulting assembly looks like this:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td 
class="code"><pre><span class="line">00000000000011f0 &lt;small_switch&gt;:</span><br><span class="line">    11f0:83 ff 02             cmp    $0x2,%edi</span><br><span class="line">    11f3:74 2b                je     1220 &lt;small_switch+0x30&gt;</span><br><span class="line">    11f5:83 ff 03             cmp    $0x3,%edi</span><br><span class="line">    11f8:74 16                je     1210 &lt;small_switch+0x20&gt;</span><br><span class="line">    11fa:83 ff 01             cmp    $0x1,%edi</span><br><span class="line">    11fd:75 31                jne    1230 &lt;small_switch+0x40&gt;</span><br><span class="line">    11ff:48 8d 3d fe 0d 00 00 lea    0xdfe(%rip),%rdi        # 2004 &lt;_IO_stdin_used+0x4&gt;</span><br><span class="line">    1206:e9 25 fe ff ff       jmp    1030 &lt;puts@plt&gt;</span><br><span class="line">    120b:0f 1f 44 00 00       nopl   0x0(%rax,%rax,1)</span><br><span class="line">    1210:48 8d 3d f5 0d 00 00 lea    0xdf5(%rip),%rdi        # 200c &lt;_IO_stdin_used+0xc&gt;</span><br><span class="line">    1217:e9 14 fe ff ff       jmp    1030 &lt;puts@plt&gt;</span><br><span class="line">    121c:0f 1f 40 00          nopl   0x0(%rax)</span><br><span class="line">    1220:48 8d 3d e1 0d 00 00 lea    0xde1(%rip),%rdi        # 2008 &lt;_IO_stdin_used+0x8&gt;</span><br><span class="line">    1227:e9 04 fe ff ff       jmp    1030 &lt;puts@plt&gt;</span><br><span class="line">    122c:0f 1f 40 00          nopl   0x0(%rax)</span><br><span class="line">    1230:48 8d 3d db 0d 00 00 lea    0xddb(%rip),%rdi        # 2012 &lt;_IO_stdin_used+0x12&gt;</span><br><span class="line">    1237:e9 f4 fd ff ff       jmp    1030 &lt;puts@plt&gt;</span><br><span class="line">    123c:0f 1f 40 00          nopl   0x0(%rax)</span><br></pre></td></tr></table></figure><p>As we can see, the result matches our expectation overall: a series of cmp instructions, each followed by a conditional jump. So what is the complexity here? Right, the time complexity is O(n).</p><p>Whoa, so for a giant switch-case structure like Python’s, isn’t every dispatch O(n)? Wouldn’t that be a disaster?</p><p>Actually, no. Generally speaking, the compiler handles a switch-case structure with different strategies
depending on the type and scale of the data.</p><p>Let’s prepare a few examples:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span 
class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">small_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1</span>: <span class="built_in">printf</span>(<span class="string">&quot;One\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">2</span>: <span class="built_in">printf</span>(<span class="string">&quot;Two\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">3</span>: <span class="built_in">printf</span>(<span class="string">&quot;Three\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span 
class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">dense_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">10</span>: <span class="built_in">printf</span>(<span class="string">&quot;Ten\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">11</span>: <span class="built_in">printf</span>(<span class="string">&quot;Eleven\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">12</span>: <span class="built_in">printf</span>(<span class="string">&quot;Twelve\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">13</span>: <span class="built_in">printf</span>(<span class="string">&quot;Thirteen\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">14</span>: <span class="built_in">printf</span>(<span class="string">&quot;Fourteen\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">15</span>: <span class="built_in">printf</span>(<span class="string">&quot;Fifteen\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">16</span>: <span class="built_in">printf</span>(<span class="string">&quot;Sixteen\n&quot;</span>); <span 
class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">17</span>: <span class="built_in">printf</span>(<span class="string">&quot;Seventeen\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">sparse_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1</span>: <span class="built_in">printf</span>(<span class="string">&quot;One\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">100</span>: <span class="built_in">printf</span>(<span class="string">&quot;Hundred\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1000</span>: <span class="built_in">printf</span>(<span class="string">&quot;Thousand\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">10000</span>: <span class="built_in">printf</span>(<span class="string">&quot;Ten thousand\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span 
class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">large_dense_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 1\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">2</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 2\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">3</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 3\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">4</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 4\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">5</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 5\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">6</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 6\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">7</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 7\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">8</span>: 
<span class="built_in">printf</span>(<span class="string">&quot;Case 8\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">9</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 9\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">10</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 10\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">11</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 11\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">12</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 12\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">13</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 13\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">14</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 14\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">15</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 15\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">16</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 16\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span 
class="keyword">case</span> <span class="number">17</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 17\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">18</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 18\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">19</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 19\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">20</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 20\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">mixed_switch</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(x) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">1</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 1\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">2</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 2\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">3</span>: <span 
class="built_in">printf</span>(<span class="string">&quot;Case 3\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">case</span> <span class="number">50</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 50\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">case</span> <span class="number">100</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 100\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">101</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 101\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="number">102</span>: <span class="built_in">printf</span>(<span class="string">&quot;Case 102\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">char_switch</span><span class="params">(<span class="type">char</span> c)</span> &#123;</span><br><span class="line">    <span class="keyword">switch</span>(c) &#123;</span><br><span class="line">        <span class="keyword">case</span> <span class="string">&#x27;a&#x27;</span>: <span class="built_in">printf</span>(<span class="string">&quot;Letter a\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span 
class="keyword">case</span> <span class="string">&#x27;b&#x27;</span>: <span class="built_in">printf</span>(<span class="string">&quot;Letter b\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="string">&#x27;c&#x27;</span>: <span class="built_in">printf</span>(<span class="string">&quot;Letter c\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="string">&#x27;d&#x27;</span>: <span class="built_in">printf</span>(<span class="string">&quot;Letter d\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">case</span> <span class="string">&#x27;e&#x27;</span>: <span class="built_in">printf</span>(<span class="string">&quot;Letter e\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">        <span class="keyword">default</span>: <span class="built_in">printf</span>(<span class="string">&quot;Other char\n&quot;</span>); <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>Then let’s disassemble it and look at the result (I’m only pasting the key parts here):</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br></pre></td><td class="code"><pre><span class="line">00000000000011f0 &lt;small_switch&gt;:</span><br><span class="line">    11f0:83 ff 02             cmp    $0x2,%edi      # compare with 2</span><br><span class="line">    11f3:74 2b                je     1220            # jump to case 2</span><br><span class="line">    11f5:83 ff 03             cmp    $0x3,%edi      # compare with 3</span><br><span class="line">    11f8:74 16                je     1210            # jump to case 3</span><br><span class="line">    11fa:83 ff 01             cmp    $0x1,%edi      # compare with 1</span><br><span class="line">    11fd:75 31                jne    1230            # if not, jump to default</span><br><span class="line"></span><br><span class="line">0000000000001240 &lt;dense_switch&gt;:</span><br><span class="line">    1240:83 ef 0a             sub    $0xa,%edi      # subtract 10 (smallest case value)</span><br><span 
class="line">    1243:83 ff 07             cmp    $0x7,%edi      # range check (17-10=7)</span><br><span class="line">    1246:0f 87 90 00 00 00    ja     12dc           # out of range: jump to default</span><br><span class="line">    124c:48 8d 15 15 0f 00 00 lea    0xf15(%rip),%rdx # load jump table address</span><br><span class="line">    1253:48 63 04 ba          movslq (%rdx,%rdi,4),%rax # fetch offset</span><br><span class="line">    1257:48 01 d0             add    %rdx,%rax      # compute target address</span><br><span class="line">    125a:ff e0                jmp    *%rax          # indirect jump</span><br><span class="line"></span><br><span class="line">00000000000012f0 &lt;sparse_switch&gt;:</span><br><span class="line">    12f0:81 ff e8 03 00 00    cmp    $0x3e8,%edi    # compare with 1000</span><br><span class="line">    12f6:74 40                je     1338           # jump if equal</span><br><span class="line">    12f8:7f 16                jg     1310           # if greater than 1000, keep checking</span><br><span class="line">    12fa:83 ff 01             cmp    $0x1,%edi      # less than 1000: check 1</span><br><span class="line">    12fd:74 49                je     1348           </span><br><span class="line">    12ff:83 ff 64             cmp    $0x64,%edi     # check 100</span><br><span class="line">    1302:75 24                jne    1328           # neither: default</span><br><span class="line">    ...</span><br><span class="line">    1310:81 ff 10 27 00 00    cmp    $0x2710,%edi   # check 10000</span><br><span class="line"></span><br><span class="line">0000000000001360 &lt;large_dense_switch&gt;:</span><br><span class="line">    1360:83 ff 14             cmp    $0x14,%edi     # check whether ≤20</span><br><span class="line">    1363:0f 87 53 01 00 00    ja     14bc           # out of range</span><br><span class="line">    1369:48 8d 15 18 0e 00 00 lea    0xe18(%rip),%rdx # jump table address</span><br><span class="line">    1372:48 63 04 ba          movslq (%rdx,%rdi,4),%rax # direct indexing</span><br><span class="line">    1376:48 01 d0             add    %rdx,%rax</span><br><span 
class="line">    1379:ff e0                jmp    *%rax</span><br><span class="line"></span><br><span class="line">00000000000014d0 &lt;mixed_switch&gt;:</span><br><span class="line">    14d0:83 ff 32             cmp    $0x32,%edi     # compare with 50</span><br><span class="line">    14d3:74 7b                je     1550</span><br><span class="line">    14d5:7f 29                jg     1500           # the &gt;50 case</span><br><span class="line">    14d7:83 ff 02             cmp    $0x2,%edi      # &lt;50: check the small values</span><br><span class="line">    14da:74 64                je     1540</span><br><span class="line">    14dc:83 ff 03             cmp    $0x3,%edi</span><br><span class="line">    ...</span><br><span class="line">    1500:83 ff 65             cmp    $0x65,%edi     # &gt;50: check 100, 101, 102</span><br><span class="line">    1503:74 5b                je     1560</span><br><span class="line"></span><br><span class="line">0000000000001580 &lt;char_switch&gt;:</span><br><span class="line">    1580:83 ef 61             sub    $0x61,%edi     # subtract the ASCII value of &#x27;a&#x27;</span><br><span class="line">    1583:40 80 ff 04          cmp    $0x4,%dil      # check whether ≤ 4 (a-e)</span><br><span class="line">    1587:77 63                ja     15ec</span><br><span class="line">    1589:48 8d 15 4c 0c 00 00 lea    0xc4c(%rip),%rdx</span><br><span class="line">    1590:40 0f b6 ff          movzbl %dil,%edi      # zero-extend to 32 bits</span><br><span class="line">    1594:48 63 04 ba          movslq (%rdx,%rdi,4),%rax</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>Here we can see that the compiler lowers each switch-case differently depending on the number and distribution of its case values. The table below summarizes the results:</p><div 
class="table-container"><table><thead><tr><th>Switch type</th><th>Case count</th><th>Distribution</th><th>Compiler strategy</th><th>Time complexity</th></tr></thead><tbody><tr><td>small_switch</td><td>3</td><td>Contiguous (1, 2, 3)</td><td>Linear comparison</td><td>O(n)</td></tr><tr><td>dense_switch</td><td>8</td><td>Contiguous (10-17)</td><td>Jump table with offset</td><td>O(1)</td></tr><tr><td>sparse_switch</td><td>4</td><td>Sparse (1, 100, 1000, 10000)</td><td>Binary search</td><td>O(log n)</td></tr><tr><td>large_dense_switch</td><td>20</td><td>Contiguous (1-20)</td><td>Plain jump table</td><td>O(1)</td></tr><tr><td>mixed_switch</td><td>7</td><td>Partially contiguous</td><td>Nested comparisons</td><td>O(log n)</td></tr><tr><td>char_switch</td><td>5</td><td>Contiguous (‘a’-‘e’)</td><td>Jump table with character offset</td><td>O(1)</td></tr></tbody></table></div><p>OK, so the final implementation of a switch-case varies with the shape of its case values, which makes the generated code unpredictable. Is there a way to improve on this? Yes, there is.</p><p>Let's look at the following code:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span 
class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;stdio.h&gt;</span></span></span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">basic_computed_goto</span><span class="params">(<span class="type">int</span> operation)</span> &#123;</span><br><span class="line">    <span class="type">static</span> <span class="type">void</span>* jump_table[] = &#123;</span><br><span class="line">        &amp;&amp;op_add,   </span><br><span class="line">        &amp;&amp;op_sub,   </span><br><span class="line">        &amp;&amp;op_mul,   </span><br><span class="line">        &amp;&amp;op_div,   </span><br><span class="line">        &amp;&amp;op_mod,   </span><br><span class="line">        &amp;&amp;op_default</span><br><span class="line">    &#125;;</span><br><span class="line">    </span><br><span class="line">    <span class="type">int</span> a = <span class="number">10</span>, b = <span class="number">3</span>;</span><br><span class="line">    <span class="type">int</span> result;</span><br><span class="line">    </span><br><span class="line">    <span 
class="keyword">if</span> (operation &lt; <span class="number">0</span> || operation &gt; <span class="number">4</span>) &#123;</span><br><span class="line">        operation = <span class="number">5</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;Operation %d: a=%d, b=%d -&gt; &quot;</span>, operation, a, b);</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">goto</span> *jump_table[operation];</span><br><span class="line">    </span><br><span class="line">op_add:</span><br><span class="line">    result = a + b;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;ADD: %d\n&quot;</span>, result);</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">    </span><br><span class="line">op_sub:</span><br><span class="line">    result = a - b;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;SUB: %d\n&quot;</span>, result);</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">    </span><br><span class="line">op_mul:</span><br><span class="line">    result = a * b;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;MUL: %d\n&quot;</span>, result);</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">    </span><br><span class="line">op_div:</span><br><span class="line">    <span class="keyword">if</span> (b != <span class="number">0</span>) &#123;</span><br><span class="line">        result = a / b;</span><br><span class="line">        <span class="built_in">printf</span>(<span class="string">&quot;DIV: %d\n&quot;</span>, result);</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span 
class="line">        <span class="built_in">printf</span>(<span class="string">&quot;DIV: Error (division by zero)\n&quot;</span>);</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">    </span><br><span class="line">op_mod:</span><br><span class="line">    <span class="keyword">if</span> (b != <span class="number">0</span>) &#123;</span><br><span class="line">        result = a % b;</span><br><span class="line">        <span class="built_in">printf</span>(<span class="string">&quot;MOD: %d\n&quot;</span>, result);</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span class="built_in">printf</span>(<span class="string">&quot;MOD: Error (division by zero)\n&quot;</span>);</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">    </span><br><span class="line">op_default:</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;Unknown operation\n&quot;</span>);</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The core idea here is to turn every case of the switch-case into a label and then jump straight to the matching label through a jump_table. Let's look at the assembly at the most critical spot:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">11d3:48 8d 05 c6 2b 00 00 lea    0x2bc6(%rip),%rax        # 3da0 &lt;jump_table.0&gt;</span><br><span class="line">11da:ff 24 d8             jmp    *(%rax,%rbx,8)</span><br></pre></td></tr></table></figure><p>From this we can summarize the advantages of Computed Goto over a traditional switch-case:</p><ol><li>Lower cost from branch mispredictions</li><li>Better instruction cache locality</li><li>Fewer cmp instructions and less overhead from them</li></ol><p>So how much faster is it in practice? The benchmark results from when CPython introduced Computed Goto 
show an overall improvement of roughly 15%.</p><p>So is Computed Goto perfect? Not quite. The CPython interpreter in ceval.c, which is essentially one gigantic switch-case, runs into several typical problems:</p><ol><li>Computed Goto is a feature specific to clang and gcc, so other platforms are unlikely to benefit from it</li><li>Computed Goto is actually not that mature; different versions of the same compiler can run into different problems</li><li>A very large switch-case makes it hard for the compiler to optimize it well</li><li>We cannot use perf to accurately quantify the per-opcode cost across the whole run, which is a big problem given the broader push to make Python faster</li></ol><p>Points 1, 3, and 4 are easy to understand, so let's look at an example of point 2 (thanks to Ken Jin for providing it).</p><p>With GCC 11, one part of the switch-case compiles to reasonable code:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">738f: movq %r13, %r15</span><br><span class="line">7392: movzbl %ah, %ebx</span><br><span class="line">7395: movzbl %al, %eax</span><br><span class="line">7398: movq (,%rax,8), %rax</span><br><span class="line">73a0: movl %ebx, -0x248(%rbp)</span><br><span class="line">73a6: jmpq *%rax</span><br></pre></td></tr></table></figure><p>With GCC 13 through the GCC 15 beta, however, it produces code like this:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">747a: movzbl %ah, %ebx</span><br><span class="line">747d: movzbl %al, %eax</span><br><span class="line">7480: movl %ebx, -0x248(%rbp)</span><br><span class="line">7486: movq (,%rax,8), %rax</span><br><span class="line">748e: jmp 0x72a0 &lt;_PyEval_EvalFrameDefault+0x970&gt;</span><br><span 
class="line"></span><br><span class="line">72a0: movq %r15, %xmm0</span><br><span class="line">72a5: movq %r13, %xmm3</span><br><span class="line">72aa: movq %r15, %rbx</span><br><span class="line">72ad: punpcklqdq %xmm3, %xmm0</span><br><span class="line">72b1: movhlps %xmm0, %xmm2</span><br><span class="line">72b4: movq %xmm2, %r10</span><br><span class="line">72b9: movq %r10, %r11</span><br><span class="line">72bc: jmpq *%rax</span><br></pre></td></tr></table></figure><p>We can see that extra registers have been introduced. Computer architecture 101: extra registers mean extra overhead. Registers are expensive!</p><p>So is there a way to improve on the giant switch-case above? Some readers are probably thinking: since the switch-case is so huge, why not split it into many small functions?<br>That way the compiler has enough context to optimize each one, and perf can accurately measure the cost of each function. Wouldn't that be nice?</p><p>Others will object, though: a function call emits a call instruction, which brings the extra overhead of pushing registers onto the stack and popping them off again. Won't that make things slower again?</p><p>Can that be optimized? Yes, and many readers have probably guessed it already: tail calls.</p><p>Suppose we have the following code:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">__attribute__((noinline)) <span class="type">void</span> <span class="title function_">g</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;Value: %d\n&quot;</span>, x);</span><br><span class="line">&#125;</span><br><span class="line"><span class="type">void</span> <span class="title function_">f</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> g(x);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>It compiles to the following assembly:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line">0000000000001140 &lt;g&gt;:</span><br><span class="line">    1140:55                   push   %rbp</span><br><span class="line">    1141:48 89 e5             mov    %rsp,%rbp</span><br><span class="line">    1144:48 83 ec 10          sub    $0x10,%rsp</span><br><span class="line">    1148:89 7d fc             mov    %edi,-0x4(%rbp)</span><br><span class="line">    114b:8b 75 fc             mov    -0x4(%rbp),%esi</span><br><span class="line">    114e:48 8d 3d af 0e 00 00 lea    0xeaf(%rip),%rdi        # 2004 &lt;_IO_stdin_used+0x4&gt;</span><br><span class="line">    1155:b0 00                mov    $0x0,%al</span><br><span class="line">    1157:e8 d4 fe ff ff       call   1030 &lt;printf@plt&gt;</span><br><span class="line">    115c:48 83 c4 10          add    $0x10,%rsp</span><br><span class="line">    1160:5d                   pop    %rbp</span><br><span class="line">    1161:c3                   ret</span><br><span class="line">    1162:66 66 66 66 66 2e 0f data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)</span><br><span class="line">    1169:1f 84 00 00 00 00 00 </span><br><span class="line"></span><br><span class="line">0000000000001170 &lt;f&gt;:</span><br><span class="line">    1170:55                   push   %rbp</span><br><span class="line">    
1171:48 89 e5             mov    %rsp,%rbp</span><br><span class="line">    1174:48 83 ec 10          sub    $0x10,%rsp</span><br><span class="line">    1178:89 7d fc             mov    %edi,-0x4(%rbp)</span><br><span class="line">    117b:8b 7d fc             mov    -0x4(%rbp),%edi</span><br><span class="line">    117e:e8 bd ff ff ff       call   1140 &lt;g&gt;</span><br><span class="line">    1183:48 83 c4 10          add    $0x10,%rsp</span><br><span class="line">    1187:5d                   pop    %rbp</span><br><span class="line">    1188:c3                   ret</span><br><span class="line">    1189:0f 1f 80 00 00 00 00 nopl   0x0(%rax)</span><br></pre></td></tr></table></figure><p>The <code>call   1140 &lt;g&gt;</code> instruction stands out here. It is also a major source of the overhead of a function call itself.</p><p>Modern compilers offer a special optimization called tail call optimization: when the last step of a function is a call to another function, the compiler can optimize away the overhead of that call.</p><p>Let's test it:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;stdio.h&gt;</span></span></span><br><span class="line">__attribute__((preserve_none)) <span class="type">void</span> <span class="title function_">g</span><span class="params">(<span class="type">int</span> x)</span>;</span><br><span class="line">__attribute__((noinline, preserve_none)) <span class="type">void</span> <span class="title function_">g</span><span class="params">(<span class="type">int</span> x)</span>&#123;</span><br><span class="line">    <span class="built_in">printf</span>(<span 
class="string">&quot;Value: %d\n&quot;</span>, x);</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">__attribute__((preserve_none)) <span class="type">void</span> <span class="title function_">f</span><span class="params">(<span class="type">int</span> x)</span> &#123;</span><br><span class="line">    [[clang::musttail]] <span class="keyword">return</span> g(x);</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">main</span><span class="params">()</span> &#123;</span><br><span class="line">    f(<span class="number">42</span>);</span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>我们来看下相关汇编</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line">0000000000001140 &lt;g&gt;:</span><br><span class="line">    1140:55                   push   %rbp</span><br><span class="line">    1141:48 89 e5             mov    %rsp,%rbp</span><br><span class="line">    1144:48 83 ec 10          sub    
$0x10,%rsp</span><br><span class="line">    1148:44 89 65 fc          mov    %r12d,-0x4(%rbp)</span><br><span class="line">    114c:8b 75 fc             mov    -0x4(%rbp),%esi</span><br><span class="line">    114f:48 8d 3d ae 0e 00 00 lea    0xeae(%rip),%rdi        # 2004 &lt;_IO_stdin_used+0x4&gt;</span><br><span class="line">    1156:31 c0                xor    %eax,%eax</span><br><span class="line">    1158:e8 d3 fe ff ff       call   1030 &lt;printf@plt&gt;</span><br><span class="line">    115d:48 83 c4 10          add    $0x10,%rsp</span><br><span class="line">    1161:5d                   pop    %rbp</span><br><span class="line">    1162:c3                   ret</span><br><span class="line">    1163:66 66 66 66 2e 0f 1f data16 data16 data16 cs nopw 0x0(%rax,%rax,1)</span><br><span class="line">    116a:84 00 00 00 00 00 </span><br><span class="line"></span><br><span class="line">0000000000001170 &lt;f&gt;:</span><br><span class="line">    1170:55                   push   %rbp</span><br><span class="line">    1171:48 89 e5             mov    %rsp,%rbp</span><br><span class="line">    1174:44 89 65 fc          mov    %r12d,-0x4(%rbp)</span><br><span class="line">    1178:44 8b 65 fc          mov    -0x4(%rbp),%r12d</span><br><span class="line">    117c:5d                   pop    %rbp</span><br><span class="line">    117d:e9 be ff ff ff       jmp    1140 &lt;g&gt;</span><br><span class="line">    1182:66 66 66 66 66 2e 0f data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)</span><br><span class="line">    1189:1f 84 00 00 00 00 00 </span><br></pre></td></tr></table></figure><p>We can see that the last step of <code>f</code> is <code>jmp 1140 &lt;g&gt;</code> rather than <code>call 1140 &lt;g&gt;</code>. This means that invoking <code>g</code> no longer incurs the overhead a traditional call instruction brings, such as pushing a return address and extra register allocation.</p><p>Some readers may have caught on by now: once the tail call treatment is applied, this can be seen as a kind of high-performance Goto.</p><p>Bingo, that is pretty much the idea. A 1977 paper, “Debunking the ‘Expensive Procedure Call’ Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate 
GOTO”, already pointed out that an efficient procedure call can perform about as well as Goto while being cleaner to implement.</p><p>The Tail Call Interpreter in Python 3.14 is built on exactly this idea.</p><p>We can see that the macros that dispatch opcodes have been rewritten as tail calls:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#   <span class="keyword">define</span> Py_MUSTTAIL [[clang::musttail]]</span></span><br><span class="line"><span class="meta">#   <span class="keyword">define</span> Py_PRESERVE_NONE_CC __attribute__((preserve_none))</span></span><br><span class="line">    Py_PRESERVE_NONE_CC <span class="keyword">typedef</span> PyObject* (*py_tail_call_funcptr)(TAIL_CALL_PARAMS);</span><br><span class="line"></span><br><span class="line"><span class="meta">#   <span class="keyword">define</span> TARGET(op) Py_PRESERVE_NONE_CC PyObject *_TAIL_CALL_##op(TAIL_CALL_PARAMS)</span></span><br><span class="line"><span class="meta">#   <span class="keyword">define</span> DISPATCH_GOTO() \</span></span><br><span class="line"><span class="meta">        do &#123; \</span></span><br><span class="line"><span class="meta">            Py_MUSTTAIL return (INSTRUCTION_TABLE[opcode])(TAIL_CALL_ARGS); \</span></span><br><span 
class="meta">        &#125; while (0)</span></span><br><span class="line"><span class="meta">#   <span class="keyword">define</span> JUMP_TO_LABEL(name) \</span></span><br><span class="line"><span class="meta">        do &#123; \</span></span><br><span class="line"><span class="meta">            Py_MUSTTAIL return (_TAIL_CALL_##name)(TAIL_CALL_ARGS); \</span></span><br><span class="line"><span class="meta">        &#125; while (0)</span></span><br><span class="line"><span class="meta">#   <span class="keyword">ifdef</span> Py_STATS</span></span><br><span class="line"><span class="meta">#       <span class="keyword">define</span> JUMP_TO_PREDICTED(name) \</span></span><br><span class="line"><span class="meta">            do &#123; \</span></span><br><span class="line"><span class="meta">                Py_MUSTTAIL return (_TAIL_CALL_##name)(frame, stack_pointer, tstate, this_instr, oparg, lastopcode); \</span></span><br><span class="line"><span class="meta">            &#125; while (0)</span></span><br><span class="line"><span class="meta">#   <span class="keyword">else</span></span></span><br><span class="line"><span class="meta">#       <span class="keyword">define</span> JUMP_TO_PREDICTED(name) \</span></span><br><span class="line"><span class="meta">            do &#123; \</span></span><br><span class="line"><span class="meta">                Py_MUSTTAIL return (_TAIL_CALL_##name)(frame, stack_pointer, tstate, this_instr, oparg); \</span></span><br><span class="line"><span class="meta">            &#125; while (0)</span></span><br><span class="line"><span class="meta">#   <span class="keyword">endif</span></span></span><br><span class="line"><span class="meta">#    <span class="keyword">define</span> LABEL(name) TARGET(name)</span></span><br></pre></td></tr></table></figure><p>So while keeping baseline performance on par with, or even slightly better than, Computed Goto, we get the following benefits:</p><ol><li>Broader platform support</li><li>With the cases split apart, the compiler is less likely to make mistakes, and overall performance becomes more predictable</li><li>happy perf</li><li>And I can pull off more tricks with tools like eBPF</li></ol><h2 id="总结"><a 
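href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>That's about it for this post. Although it set out to cover only the Tail Call Interpreter in Python 3.14, it ended up walking through the whole line of evolution behind it.</p><p>It also left me with a takeaway: very often, predictability really is a crucial property.</p><p>Together with remote debug, this is one of my two favorite features in 3.14. Long live observability!</p>]]></content>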
    
    
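    <summary type="html">&lt;p&gt;Working on security lately has left my head spinning, so here is something lighter to clear my mind and unwind.&lt;/p&gt;
&lt;p&gt;Python 3.14 officially introduces a new mechanism called the Tail Call Interpreter (made by Ken Jin), which is undoubtedly another major piece of work that lays the foundation for the future.&lt;/p&gt;</summary>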
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="Python" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/Python/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="Linux" scheme="https://www.manjusaka.blog/tags/Linux/"/>
    
    <category term="Python" scheme="https://www.manjusaka.blog/tags/Python/"/>
    
    <category term="笔记" scheme="https://www.manjusaka.blog/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="水文" scheme="https://www.manjusaka.blog/tags/%E6%B0%B4%E6%96%87/"/>
    
  </entry>
  
  <entry>
    <title>Python 3.14: A Giant Step for the Python World</title>
    <link href="https://www.manjusaka.blog/posts/2025/04/26/3-14-is-one-of-the-best-python-version/"/>
    <id>https://www.manjusaka.blog/posts/2025/04/26/3-14-is-one-of-the-best-python-version/</id>
    <published>2025-04-26T14:49:00.000Z</published>
    <updated>2026-03-29T17:00:43.276Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>The main features of Python 3.14 are essentially locked in by now. In my view, Python 3.14 will be a core release for many years to come, because it sets the baseline for the Python debugging ecosystem of a new era.<br>This post talks about this epic improvement in the Python world.</p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="正文"></a>Main Text</h2><p>When debugging Python code day to day, we often run into the same problem: we need to sample the current state of the Python runtime in order to debug our Python process further.</p><p>There are two common approaches:</p><ol><li>Triggering via mechanisms such as eBPF + UProbe</li><li>Reading memory directly in whole blocks via syscalls such as <code>process_vm_readv</code></li></ol><p>Both approaches share one core problem: how do we parse the data in memory?</p><p>Taking <a href="https://github.com/jschwinger233/perf-examples/blob/main/cpython310_backtrace/bpf.c">https://github.com/jschwinger233/perf-examples/blob/main/cpython310_backtrace/bpf.c</a> as an example, here is how we would have done it for many years:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span 
class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span 
class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> PAGE_SIZE (1&lt;&lt;12)</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> KASAN_STACK_ORDER 0</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> THREAD_SIZE_ORDER (2 + KASAN_STACK_ORDER)</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> THREAD_SIZE  ((__u64)(PAGE_SIZE &lt;&lt; THREAD_SIZE_ORDER))</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> TOP_OF_KERNEL_STACK_PADDING ((__u64)0)</span></span><br><span class="line"></span><br><span class="line"><span class="type">const</span> <span class="type">static</span> u32 ZERO = <span class="number">0</span>;</span><br><span 
class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">PyTypeObject</span> &#123;</span></span><br><span class="line">    <span class="type">char</span> _[<span class="number">24</span>];</span><br><span class="line">    <span class="type">char</span> *tp_name;</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">PyObject</span> &#123;</span></span><br><span class="line">    <span class="type">char</span> _[<span class="number">8</span>];</span><br><span class="line">    <span class="class"><span class="keyword">struct</span> <span class="title">PyTypeObject</span> *<span class="title">ob_type</span>;</span></span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">PyVarObject</span> &#123;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> <span class="title">PyObject</span> <span class="title">ob_base</span>;</span></span><br><span class="line">    <span class="type">char</span> _[<span class="number">8</span>];</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">PyASCIIObject</span> &#123;</span></span><br><span class="line">__u8 _[<span class="number">16</span>];</span><br><span class="line">__u64 length;</span><br><span class="line">__u8 __[<span class="number">24</span>];</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> _<span class="title">PyStr</span> &#123;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> <span class="title">PyASCIIObject</span> <span 
class="title">ascii</span>;</span></span><br><span class="line">    <span class="type">char</span> buf[<span class="number">100</span>];</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">PyCodeObject</span> &#123;</span></span><br><span class="line">    <span class="type">char</span> _[<span class="number">104</span>];</span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">PyStr</span> *<span class="title">co_filename</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">PyStr</span> *<span class="title">co_name</span>;</span></span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">PyFrameObject</span> &#123;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> <span class="title">PyVarObject</span> <span class="title">ob_base</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> <span class="title">PyFrameObject</span> *<span class="title">f_back</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> <span class="title">PyCodeObject</span> *<span class="title">f_code</span>;</span></span><br><span class="line">    <span class="type">char</span> _[<span class="number">60</span>];</span><br><span class="line">    <span class="type">int</span> f_lineno;</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">event</span> &#123;</span></span><br><span class="line">__u64 rip;</span><br><span class="line">__u8 user_mode;</span><br><span class="line">__s8 
python_stack_depth;</span><br><span class="line">__u64 filename_len[<span class="number">20</span>];</span><br><span class="line">__u64 funcname_len[<span class="number">20</span>];</span><br><span class="line"><span class="type">unsigned</span> <span class="type">char</span> filename[<span class="number">20</span>][<span class="number">100</span>];</span><br><span class="line"><span class="type">unsigned</span> <span class="type">char</span> funcname[<span class="number">20</span>][<span class="number">100</span>];</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> &#123;</span></span><br><span class="line">__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);</span><br><span class="line">__uint(max_entries, <span class="number">1</span>);</span><br><span class="line">__type(key, u32);</span><br><span class="line">__type(value, <span class="keyword">struct</span> event);</span><br><span class="line">&#125; events <span class="title function_">SEC</span><span class="params">(<span class="string">&quot;.maps&quot;</span>)</span>;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> &#123;</span></span><br><span class="line">__uint(type, BPF_MAP_TYPE_RINGBUF);</span><br><span class="line">__uint(max_entries, <span class="number">1</span>&lt;&lt;<span class="number">29</span>);</span><br><span class="line">&#125; ringbuf <span class="title function_">SEC</span><span class="params">(<span class="string">&quot;.maps&quot;</span>)</span>;</span><br><span class="line"></span><br><span class="line">SEC(<span class="string">&quot;perf_event/cpython310&quot;</span>)</span><br><span class="line"><span class="type">int</span> <span class="title function_">perf_event_cpython310</span><span class="params">(<span class="keyword">struct</span> bpf_perf_event_data *ctx)</span></span><br><span class="line">&#123;</span><br><span 
class="line">__u64 rsp;</span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">event</span> *<span class="title">event</span>;</span></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">PyFrameObject</span> *<span class="title">frame</span>;</span></span><br><span class="line"></span><br><span class="line">event = bpf_map_lookup_elem(&amp;events, &amp;ZERO);</span><br><span class="line"><span class="keyword">if</span> (!event)</span><br><span class="line"><span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line">rsp = ctx-&gt;regs.sp;</span><br><span class="line">event-&gt;rip = ctx-&gt;regs.ip;</span><br><span class="line">event-&gt;user_mode = !!(ctx-&gt;regs.cs &amp; <span class="number">3</span>);</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> (!event-&gt;user_mode) &#123;</span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">task_struct</span> *<span class="title">task</span> =</span> (<span class="keyword">struct</span> task_struct *)bpf_get_current_task();</span><br><span class="line">__u64 __ptr = (__u64)BPF_CORE_READ(task, <span class="built_in">stack</span>);</span><br><span class="line">__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;</span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">pt_regs</span> *<span class="title">pt_regs</span> =</span> ((<span class="keyword">struct</span> pt_regs *)__ptr) - <span class="number">1</span>;</span><br><span class="line"></span><br><span class="line">rsp = BPF_CORE_READ(pt_regs, sp);</span><br><span class="line">event-&gt;rip = BPF_CORE_READ(pt_regs, ip);</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">char</span> name[<span 
class="number">5</span>];</span><br><span class="line"><span class="type">bool</span> found = <span class="literal">false</span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i &lt; <span class="number">200</span>; i++) &#123;</span><br><span class="line">bpf_probe_read_user(&amp;frame, <span class="keyword">sizeof</span>(frame), (<span class="type">void</span> *)rsp + <span class="number">8</span>*i);</span><br><span class="line"><span class="keyword">if</span> (!frame)</span><br><span class="line"><span class="keyword">continue</span>;</span><br><span class="line"></span><br><span class="line"><span class="type">char</span> *tp_name = BPF_PROBE_READ_USER(frame, ob_base.ob_base.ob_type, tp_name);</span><br><span class="line">bpf_probe_read_user(&amp;name, <span class="keyword">sizeof</span>(name), (<span class="type">void</span> *)tp_name);</span><br><span class="line"><span class="keyword">if</span> (bpf_strncmp(name, <span class="number">5</span>, <span class="string">&quot;frame&quot;</span>) == <span class="number">0</span>) &#123;</span><br><span class="line">found = <span class="literal">true</span>;</span><br><span class="line"><span class="keyword">break</span>;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> (!found) &#123;</span><br><span class="line">event-&gt;python_stack_depth = <span class="number">-1</span>;</span><br><span class="line">bpf_ringbuf_output(&amp;ringbuf, event, <span class="keyword">sizeof</span>(*event), <span class="number">0</span>);</span><br><span class="line"><span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> (<span class="type">int</span> i = <span 
class="number">0</span>; i &lt; <span class="number">20</span>; i++) &#123;</span><br><span class="line">event-&gt;python_stack_depth = i;</span><br><span class="line">BPF_PROBE_READ_USER_INTO(&amp;event-&gt;filename_len[i], frame, f_code, co_filename, ascii.length);</span><br><span class="line">BPF_PROBE_READ_USER_INTO(&amp;event-&gt;filename[i], frame, f_code, co_filename, buf);</span><br><span class="line">BPF_PROBE_READ_USER_INTO(&amp;event-&gt;funcname_len[i], frame, f_code, co_name, ascii.length);</span><br><span class="line">BPF_PROBE_READ_USER_INTO(&amp;event-&gt;funcname[i], frame, f_code, co_name, buf);</span><br><span class="line">frame = BPF_PROBE_READ_USER(frame, f_back);</span><br><span class="line"><span class="keyword">if</span> (!frame)</span><br><span class="line"><span class="keyword">break</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">bpf_ringbuf_output(&amp;ringbuf, event, <span class="keyword">sizeof</span>(*event), <span class="number">0</span>);</span><br><span class="line"><span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">char</span> __license[] SEC(<span class="string">&quot;license&quot;</span>) = <span class="string">&quot;Dual MIT/GPL&quot;</span>;</span><br></pre></td></tr></table></figure><p>There is not much core code above. The core logic is still to hand-model the key <code>PyFrameObject</code> struct from Python, then scan memory repeatedly and brute-force match a region whose signature fits.</p><p>Other tools such as PySpy follow a similar approach.</p><p>The central problem with this approach is that Python's ABI may change in every release, so we have to keep adding compatibility for each version (for example, PySpy maintains separate <code>PyFrameObject</code> definitions for 3.7 through 3.12).</p><p>So is there a better way to handle this? Put differently, can we locate the data more reliably?</p><p>Yes. Anyone who works with Python internals probably knows that Python has a global variable <code>_PyRuntime</code> of type <code>pyruntimestate</code>, laid out roughly as follows:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span 
class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">pyruntimestate</span> &#123;</span></span><br><span class="line"></span><br><span class="line">    _Py_DebugOffsets debug_offsets;</span><br><span class="line"></span><br><span class="line">    <span class="type">int</span> _initialized;</span><br><span class="line"></span><br><span class="line">    <span class="type">int</span> preinitializing;</span><br><span class="line"></span><br><span class="line">    <span class="type">int</span> preinitialized;</span><br><span class="line"></span><br><span class="line">    <span class="type">int</span> core_initialized;</span><br><span class="line"></span><br><span class="line">    <span class="type">int</span> initialized;</span><br><span class="line"></span><br><span class="line">    PyThreadState *_finalizing;</span><br><span class="line"></span><br><span class="line">    <span class="type">unsigned</span> <span class="type">long</span> _finalizing_id;</span><br><span class="line"></span><br><span class="line">    <span 
class="class"><span class="keyword">struct</span> <span class="title">pyinterpreters</span> &#123;</span></span><br><span class="line">        PyMutex mutex;</span><br><span class="line">        PyInterpreterState *head;</span><br><span class="line"></span><br><span class="line">        PyInterpreterState *main;</span><br><span class="line"></span><br><span class="line">        <span class="type">int64_t</span> next_id;</span><br><span class="line">    &#125; interpreters;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">    <span class="type">unsigned</span> <span class="type">long</span> main_thread;</span><br><span class="line">    PyThreadState *main_tstate;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">    _PyXI_global_state_t xi;</span><br><span class="line"></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">pymem_allocators</span> <span class="title">allocators</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">obmalloc_global_state</span> <span class="title">obmalloc</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> <span class="title">pyhash_runtime_state</span> <span class="title">pyhash_state</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">pythread_runtime_state</span> <span class="title">threads</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">signals_runtime_state</span> <span class="title">signals</span>;</span></span><br><span class="line"></span><br><span class="line">    Py_tss_t autoTSSkey;</span><br><span class="line"></span><br><span class="line">    Py_tss_t trashTSSkey;</span><br><span class="line"></span><br><span 
class="line">    PyWideStringList orig_argv;</span><br><span class="line"></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">parser_runtime_state</span> <span class="title">parser</span>;</span></span><br><span class="line"></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">atexit_runtime_state</span> <span class="title">atexit</span>;</span></span><br><span class="line"></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">import_runtime_state</span> <span class="title">imports</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">ceval_runtime_state</span> <span class="title">ceval</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">gilstate_runtime_state</span> &#123;</span></span><br><span class="line"></span><br><span class="line">        <span class="type">int</span> check_enabled;</span><br><span class="line"></span><br><span class="line">        PyInterpreterState *autoInterpreterState;</span><br><span class="line">    &#125; gilstate;</span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">getargs_runtime_state</span> &#123;</span></span><br><span class="line">        <span class="class"><span class="keyword">struct</span> _<span class="title">PyArg_Parser</span> *<span class="title">static_parsers</span>;</span></span><br><span class="line">    &#125; getargs;</span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">fileutils_state</span> <span class="title">fileutils</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">faulthandler_runtime_state</span> <span 
class="title">faulthandler</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">tracemalloc_runtime_state</span> <span class="title">tracemalloc</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">reftracer_runtime_state</span> <span class="title">ref_tracer</span>;</span></span><br><span class="line"></span><br><span class="line">    _PyRWMutex stoptheworld_mutex;</span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">stoptheworld_state</span> <span class="title">stoptheworld</span>;</span></span><br><span class="line"></span><br><span class="line">    PyPreConfig preconfig;</span><br><span class="line">    Py_OpenCodeHookFunction open_code_hook;</span><br><span class="line">    <span class="type">void</span> *open_code_userdata;</span><br><span class="line">    <span class="class"><span class="keyword">struct</span> &#123;</span></span><br><span class="line">        PyMutex mutex;</span><br><span class="line">        <span class="class"><span class="keyword">struct</span> _<span class="title">Py_AuditHookEntry</span> *<span class="title">head</span>;</span></span><br><span class="line">    &#125; audit_hooks;</span><br><span class="line"></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">py_object_runtime_state</span> <span class="title">object_state</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">Py_float_runtime_state</span> <span class="title">float_state</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">Py_unicode_runtime_state</span> <span class="title">unicode_state</span>;</span></span><br><span class="line">    <span class="class"><span 
class="keyword">struct</span> _<span class="title">types_runtime_state</span> <span class="title">types</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">Py_time_runtime_state</span> <span class="title">time</span>;</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">if</span> defined(__EMSCRIPTEN__) &amp;&amp; defined(PY_CALL_TRAMPOLINE)</span></span><br><span class="line"></span><br><span class="line">    <span class="type">int</span> (*emscripten_count_args_function)(PyCFunctionWithKeywords func);</span><br><span class="line"><span class="meta">#<span class="keyword">endif</span></span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">Py_cached_objects</span> <span class="title">cached_objects</span>;</span></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">Py_static_objects</span> <span class="title">static_objects</span>;</span></span><br><span class="line"></span><br><span class="line">    PyInterpreterState _main_interpreter;</span><br><span class="line"></span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><p>Sharp-eyed readers will have spotted one key piece of this struct:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">pyinterpreters</span> &#123;</span></span><br><span class="line">    PyMutex mutex;</span><br><span class="line">    PyInterpreterState *head;</span><br><span class="line"></span><br><span class="line">    
PyInterpreterState *main;</span><br><span class="line"></span><br><span class="line">    <span class="type">int64_t</span> next_id;</span><br><span class="line">&#125; interpreters;</span><br></pre></td></tr></table></figure><p>This maintains a linked list of <code>PyInterpreterState</code> objects. From a <code>PyInterpreterState</code> we can reach the current frame: the thread-state list stored inside <code>PyInterpreterState</code> (its <code>threads</code> member) gives us the current <code>PyThreadState</code>.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">pythreads</span> &#123;</span></span><br><span class="line">    <span class="type">uint64_t</span> next_unique_id;</span><br><span class="line">    <span class="comment">/* The linked list of threads, newest first. */</span></span><br><span class="line">    PyThreadState *head;</span><br><span class="line">    _PyThreadStateImpl *preallocated;</span><br><span class="line">    <span class="comment">/* The thread currently executing in the __main__ module, if any. */</span></span><br><span class="line">    PyThreadState *main;</span><br><span class="line">    <span class="comment">/* Used in Modules/_threadmodule.c. 
*/</span></span><br><span class="line">    Py_ssize_t count;</span><br><span class="line">    <span class="comment">/* Support for runtime thread stack size tuning.</span></span><br><span class="line"><span class="comment">       A value of 0 means using the platform&#x27;s default stack size</span></span><br><span class="line"><span class="comment">       or the size specified by the THREAD_STACK_SIZE macro. */</span></span><br><span class="line">    <span class="comment">/* Used in Python/thread.c. */</span></span><br><span class="line">    <span class="type">size_t</span> stacksize;</span><br><span class="line">&#125; threads;</span><br></pre></td></tr></table></figure><p>The key field in <code>PyThreadState</code>, <code>struct _PyInterpreterFrame *current_frame</code>, is exactly the frame state we need. The overall flow looks roughly like this:</p><pre><code class="highlight mermaid">graph TD
    PyRuntime[&quot;_PyRuntime (pyruntimestate)&quot;] --&gt; Interpreters[&quot;interpreters (pyinterpreters)&quot;]
    Interpreters --&gt;|head| InterpreterStateHead[&quot;PyInterpreterState *head&quot;]
    Interpreters --&gt;|main| InterpreterStateMain[&quot;PyInterpreterState *main&quot;]

    %% Define interpreter state structure
    subgraph PyInterpreterState
        InterpreterID[&quot;int64_t id&quot;]
        ThreadsStruct[&quot;struct pythreads threads&quot;]
        NextInterpreter[&quot;PyInterpreterState *next&quot;]
    end

    InterpreterStateHead --- PyInterpreterState
    InterpreterStateMain --- PyInterpreterState

    %% Link to threads structure
    ThreadsStruct --&gt; ThreadHead[&quot;PyThreadState *head&quot;]
    ThreadsStruct --&gt; ThreadMain[&quot;PyThreadState *main&quot;]

    %% Define thread state structure
    subgraph PyThreadState
        ThreadID[&quot;uint64_t thread_id&quot;]
        InterpreterPtr[&quot;PyInterpreterState *interp&quot;]
        CurrentFrame[&quot;_PyInterpreterFrame *current_frame&quot;]
        NextThread[&quot;PyThreadState *next&quot;]
    end

    ThreadHead --- PyThreadState
    ThreadMain --- PyThreadState

    %% Frame structure
    CurrentFrame --&gt; Frame[&quot;_PyInterpreterFrame structure&quot;]

    subgraph _PyInterpreterFrame
        PreviousFrame[&quot;_PyInterpreterFrame *previous&quot;]
        CodeObject[&quot;PyCodeObject *f_code&quot;]
        Locals[&quot;PyObject **localsplus&quot;]
    end

    %% Connected paths in color
    PyRuntime ==&gt;|&quot;Main Path&quot;| Interpreters
    Interpreters ==&gt;|&quot;Main Path&quot;| InterpreterStateMain
    InterpreterStateMain ==&gt;|&quot;Main Path&quot;| ThreadsStruct
    ThreadsStruct ==&gt;|&quot;Main Path&quot;| ThreadMain
    ThreadMain ==&gt;|&quot;Main Path&quot;| CurrentFrame
    CurrentFrame ==&gt;|&quot;Main Path&quot;| Frame

    class PyRuntime,InterpreterStateMain,ThreadMain,CurrentFrame,Frame mainPath;
    classDef mainPath fill:#f96,stroke:#333,stroke-width:2px;
    classDef mainNodes fill:#f9f,stroke:#333,stroke-width:2px;</code></pre><p>So let's tackle the first question: how do we obtain the address of <code>_PyRuntime</code> in memory?</p><p>We can reduce the question to the simplest possible C program:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;stdio.h&gt;</span></span></span><br><span class="line"></span><br><span class="line"><span class="type">int</span> abc=<span class="number">3</span>;</span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">main</span><span class="params">()</span> &#123;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;abc: %p\n&quot;</span>, &amp;abc);</span><br><span class="line">    <span 
class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>How do we get the address of <code>abc</code> from outside the program? Readers who have written C may already see it: we can use the <code>__attribute__((section()))</code> syntax to place the variable in a dedicated section.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;stdio.h&gt;</span></span></span><br><span class="line"></span><br><span class="line"><span class="type">int</span> abc __attribute__((section(<span class="string">&quot;.my_section&quot;</span>))) = <span class="number">3</span>;</span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">main</span><span class="params">()</span> &#123;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;abc: %p\n&quot;</span>, &amp;abc);</span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>We compile it and inspect the binary with <code>readelf</code>:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">╰─ readelf -S ./a.out| grep my_section </span><br><span class="line">  [25] .my_section       PROGBITS         0000000000004018  00003018</span><br></pre></td></tr></table></figure><p>Here we get a relative address: the output shows the section's address in the ELF image (<code>0x4018</code>) followed by its file offset (<code>0x3018</code>). From there we can parse the ELF file, walk its sections, and find the address of the <code>abc</code> variable.</p><p>Python does the same thing. Its source contains the following code:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> GENERATE_DEBUG_SECTION(name, declaration)     \</span></span><br><span class="line"><span class="meta">   _GENERATE_DEBUG_SECTION_WINDOWS(name)            \</span></span><br><span class="line"><span class="meta">   _GENERATE_DEBUG_SECTION_APPLE(name)              \</span></span><br><span class="line"><span class="meta">   declaration                                      \</span></span><br><span class="line"><span class="meta">   _GENERATE_DEBUG_SECTION_LINUX(name)</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Please note that section names are truncated to eight bytes</span></span><br><span class="line"><span class="comment">// on Windows!</span></span><br><span class="line"><span class="meta">#<span class="keyword">if</span> 
defined(MS_WINDOWS)</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> _GENERATE_DEBUG_SECTION_WINDOWS(name)                       \</span></span><br><span class="line"><span class="meta">   <span class="keyword">_Pragma</span>(Py_STRINGIFY(section(Py_STRINGIFY(name), read, write))) \</span></span><br><span class="line"><span class="meta">   __declspec(allocate(Py_STRINGIFY(name)))</span></span><br><span class="line"><span class="meta">#<span class="keyword">else</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> _GENERATE_DEBUG_SECTION_WINDOWS(name)</span></span><br><span class="line"><span class="meta">#<span class="keyword">endif</span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">if</span> defined(__APPLE__)</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> _GENERATE_DEBUG_SECTION_APPLE(name) \</span></span><br><span class="line"><span class="meta">   __attribute__((section(SEG_DATA <span class="string">&quot;,&quot;</span> Py_STRINGIFY(name))))      \</span></span><br><span class="line"><span class="meta">   __attribute__((used))</span></span><br><span class="line"><span class="meta">#<span class="keyword">else</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> _GENERATE_DEBUG_SECTION_APPLE(name)</span></span><br><span class="line"><span class="meta">#<span class="keyword">endif</span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">if</span> defined(__linux__) &amp;&amp; (defined(__GNUC__) || defined(__clang__))</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> _GENERATE_DEBUG_SECTION_LINUX(name) \</span></span><br><span class="line"><span class="meta">   __attribute__((section(<span class="string">&quot;.&quot;</span> 
Py_STRINGIFY(name))))               \</span></span><br><span class="line"><span class="meta">   __attribute__((used))</span></span><br><span class="line"><span class="meta">#<span class="keyword">else</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> _GENERATE_DEBUG_SECTION_LINUX(name)</span></span><br><span class="line"><span class="meta">#<span class="keyword">endif</span></span></span><br><span class="line"></span><br><span class="line">GENERATE_DEBUG_SECTION(PyRuntime, _PyRuntimeState _PyRuntime)</span><br><span class="line">= _PyRuntimeState_INIT(_PyRuntime, _Py_Debug_Cookie);</span><br><span class="line">_Py_COMP_DIAG_POP</span><br></pre></td></tr></table></figure><p>This makes it fairly easy to find the address of <code>_PyRuntime</code> in memory.</p><p>That brings us to the second problem: how do we follow the pointer chain described earlier to reach the addresses we want?</p><p>Your first instinct may still be to maintain struct definitions for every Python version and compute the addresses from them. Is there a simpler way? There is.</p><p>Sharp-eyed readers may have noticed that <code>_PyRuntimeState</code> has a field called <code>debug_offsets</code>. Let's look at how this field is initialized:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span 
class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span 
class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> _Py_DebugOffsets_INIT(debug_cookie) &#123; \</span></span><br><span class="line"><span class="meta">    .cookie = debug_cookie, \</span></span><br><span class="line"><span class="meta">    .version = PY_VERSION_HEX, \</span></span><br><span class="line"><span class="meta">    .free_threaded = _Py_Debug_Free_Threaded, \</span></span><br><span class="line"><span class="meta">    .runtime_state = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(_PyRuntimeState), \</span></span><br><span class="line"><span class="meta">        .finalizing = offsetof(_PyRuntimeState, _finalizing), \</span></span><br><span class="line"><span class="meta">        .interpreters_head = 
offsetof(_PyRuntimeState, interpreters.head), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .interpreter_state = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyInterpreterState), \</span></span><br><span class="line"><span class="meta">        .id = offsetof(PyInterpreterState, id), \</span></span><br><span class="line"><span class="meta">        .next = offsetof(PyInterpreterState, next), \</span></span><br><span class="line"><span class="meta">        .threads_head = offsetof(PyInterpreterState, threads.head), \</span></span><br><span class="line"><span class="meta">        .threads_main = offsetof(PyInterpreterState, threads.main), \</span></span><br><span class="line"><span class="meta">        .gc = offsetof(PyInterpreterState, gc), \</span></span><br><span class="line"><span class="meta">        .imports_modules = offsetof(PyInterpreterState, imports.modules), \</span></span><br><span class="line"><span class="meta">        .sysdict = offsetof(PyInterpreterState, sysdict), \</span></span><br><span class="line"><span class="meta">        .builtins = offsetof(PyInterpreterState, builtins), \</span></span><br><span class="line"><span class="meta">        .ceval_gil = offsetof(PyInterpreterState, ceval.gil), \</span></span><br><span class="line"><span class="meta">        .gil_runtime_state = offsetof(PyInterpreterState, _gil), \</span></span><br><span class="line"><span class="meta">        .gil_runtime_state_enabled = _Py_Debug_gilruntimestate_enabled, \</span></span><br><span class="line"><span class="meta">        .gil_runtime_state_locked = offsetof(PyInterpreterState, _gil.locked), \</span></span><br><span class="line"><span class="meta">        .gil_runtime_state_holder = offsetof(PyInterpreterState, _gil.last_holder), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span 
class="meta">    .thread_state = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyThreadState), \</span></span><br><span class="line"><span class="meta">        .prev = offsetof(PyThreadState, prev), \</span></span><br><span class="line"><span class="meta">        .next = offsetof(PyThreadState, next), \</span></span><br><span class="line"><span class="meta">        .interp = offsetof(PyThreadState, interp), \</span></span><br><span class="line"><span class="meta">        .current_frame = offsetof(PyThreadState, current_frame), \</span></span><br><span class="line"><span class="meta">        .thread_id = offsetof(PyThreadState, thread_id), \</span></span><br><span class="line"><span class="meta">        .native_thread_id = offsetof(PyThreadState, native_thread_id), \</span></span><br><span class="line"><span class="meta">        .datastack_chunk = offsetof(PyThreadState, datastack_chunk), \</span></span><br><span class="line"><span class="meta">        .status = offsetof(PyThreadState, _status), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .interpreter_frame = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(_PyInterpreterFrame), \</span></span><br><span class="line"><span class="meta">        .previous = offsetof(_PyInterpreterFrame, previous), \</span></span><br><span class="line"><span class="meta">        .executable = offsetof(_PyInterpreterFrame, f_executable), \</span></span><br><span class="line"><span class="meta">        .instr_ptr = offsetof(_PyInterpreterFrame, instr_ptr), \</span></span><br><span class="line"><span class="meta">        .localsplus = offsetof(_PyInterpreterFrame, localsplus), \</span></span><br><span class="line"><span class="meta">        .owner = offsetof(_PyInterpreterFrame, owner), \</span></span><br><span class="line"><span class="meta">        .stackpointer = 
offsetof(_PyInterpreterFrame, stackpointer), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .code_object = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyCodeObject), \</span></span><br><span class="line"><span class="meta">        .filename = offsetof(PyCodeObject, co_filename), \</span></span><br><span class="line"><span class="meta">        .name = offsetof(PyCodeObject, co_name), \</span></span><br><span class="line"><span class="meta">        .qualname = offsetof(PyCodeObject, co_qualname), \</span></span><br><span class="line"><span class="meta">        .linetable = offsetof(PyCodeObject, co_linetable), \</span></span><br><span class="line"><span class="meta">        .firstlineno = offsetof(PyCodeObject, co_firstlineno), \</span></span><br><span class="line"><span class="meta">        .argcount = offsetof(PyCodeObject, co_argcount), \</span></span><br><span class="line"><span class="meta">        .localsplusnames = offsetof(PyCodeObject, co_localsplusnames), \</span></span><br><span class="line"><span class="meta">        .localspluskinds = offsetof(PyCodeObject, co_localspluskinds), \</span></span><br><span class="line"><span class="meta">        .co_code_adaptive = offsetof(PyCodeObject, co_code_adaptive), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .pyobject = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyObject), \</span></span><br><span class="line"><span class="meta">        .ob_type = offsetof(PyObject, ob_type), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .type_object = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyTypeObject), \</span></span><br><span class="line"><span 
class="meta">        .tp_name = offsetof(PyTypeObject, tp_name), \</span></span><br><span class="line"><span class="meta">        .tp_repr = offsetof(PyTypeObject, tp_repr), \</span></span><br><span class="line"><span class="meta">        .tp_flags = offsetof(PyTypeObject, tp_flags), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .tuple_object = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyTupleObject), \</span></span><br><span class="line"><span class="meta">        .ob_item = offsetof(PyTupleObject, ob_item), \</span></span><br><span class="line"><span class="meta">        .ob_size = offsetof(PyTupleObject, ob_base.ob_size), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .list_object = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyListObject), \</span></span><br><span class="line"><span class="meta">        .ob_item = offsetof(PyListObject, ob_item), \</span></span><br><span class="line"><span class="meta">        .ob_size = offsetof(PyListObject, ob_base.ob_size), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .set_object = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PySetObject), \</span></span><br><span class="line"><span class="meta">        .used = offsetof(PySetObject, used), \</span></span><br><span class="line"><span class="meta">        .table = offsetof(PySetObject, table), \</span></span><br><span class="line"><span class="meta">        .mask = offsetof(PySetObject, mask), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .dict_object = &#123; \</span></span><br><span class="line"><span class="meta">   
     .size = sizeof(PyDictObject), \</span></span><br><span class="line"><span class="meta">        .ma_keys = offsetof(PyDictObject, ma_keys), \</span></span><br><span class="line"><span class="meta">        .ma_values = offsetof(PyDictObject, ma_values), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .float_object = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyFloatObject), \</span></span><br><span class="line"><span class="meta">        .ob_fval = offsetof(PyFloatObject, ob_fval), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .long_object = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyLongObject), \</span></span><br><span class="line"><span class="meta">        .lv_tag = offsetof(PyLongObject, long_value.lv_tag), \</span></span><br><span class="line"><span class="meta">        .ob_digit = offsetof(PyLongObject, long_value.ob_digit), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .bytes_object = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyBytesObject), \</span></span><br><span class="line"><span class="meta">        .ob_size = offsetof(PyBytesObject, ob_base.ob_size), \</span></span><br><span class="line"><span class="meta">        .ob_sval = offsetof(PyBytesObject, ob_sval), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .unicode_object = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyUnicodeObject), \</span></span><br><span class="line"><span class="meta">        .state = offsetof(PyUnicodeObject, _base._base.state), \</span></span><br><span class="line"><span 
class="meta">        .length = offsetof(PyUnicodeObject, _base._base.length), \</span></span><br><span class="line"><span class="meta">        .asciiobject_size = sizeof(PyASCIIObject), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .gc = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(struct _gc_runtime_state), \</span></span><br><span class="line"><span class="meta">        .collecting = offsetof(struct _gc_runtime_state, collecting), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .gen_object = &#123; \</span></span><br><span class="line"><span class="meta">        .size = sizeof(PyGenObject), \</span></span><br><span class="line"><span class="meta">        .gi_name = offsetof(PyGenObject, gi_name), \</span></span><br><span class="line"><span class="meta">        .gi_iframe = offsetof(PyGenObject, gi_iframe), \</span></span><br><span class="line"><span class="meta">        .gi_frame_state = offsetof(PyGenObject, gi_frame_state), \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">    .debugger_support = &#123; \</span></span><br><span class="line"><span class="meta">        .eval_breaker = offsetof(PyThreadState, eval_breaker), \</span></span><br><span class="line"><span class="meta">        .remote_debugger_support = offsetof(PyThreadState, remote_debugger_support),  \</span></span><br><span class="line"><span class="meta">        .remote_debugging_enabled = offsetof(PyInterpreterState, config.remote_debug),  \</span></span><br><span class="line"><span class="meta">        .debugger_pending_call = offsetof(_PyRemoteDebuggerSupport, debugger_pending_call),  \</span></span><br><span class="line"><span class="meta">        .debugger_script_path = offsetof(_PyRemoteDebuggerSupport, 
debugger_script_path),  \</span></span><br><span class="line"><span class="meta">        .debugger_script_path_size = MAX_SCRIPT_PATH_SIZE, \</span></span><br><span class="line"><span class="meta">    &#125;, \</span></span><br><span class="line"><span class="meta">&#125;</span></span><br></pre></td></tr></table></figure><p>As we can see, the classic <code>offsetof</code> macro is used to record the offsets of commonly used fields, relative to the start of their structs, in <code>debug_offsets</code>. <code>debug_offsets</code> is always the first field of <code>_PyRuntimeState</code>, and its layout changes relatively rarely, so we can read the field offsets from <code>debug_offsets</code> and follow them to the data we ultimately want.</p><p>With this in place, there are a lot of fun things we can do. In fact, CPython itself builds on this mechanism in PEP 768 – Safe external debugger interface for CPython (<a href="https://peps.python.org/pep-0768/">https://peps.python.org/pep-0768/</a>), which lets a user inject a piece of debugging code into a running Python process from outside.</p><p>Let's look at the core implementation of this PEP.</p><p>A new struct is added to the <code>PyThreadState</code> introduced earlier:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> _<span class="title">remote_debugger_support</span> &#123;</span></span><br><span class="line">    <span class="type">int32_t</span> debugger_pending_call;</span><br><span class="line">    <span class="type">char</span> debugger_script_path[MAX_SCRIPT_PATH_SIZE];</span><br><span class="line">&#125; _PyRemoteDebuggerSupport;</span><br></pre></td></tr></table></figure><p>During execution, when <code>debugger_pending_call</code> is 1 and remote debugging is enabled in the config, the interpreter runs the script at <code>debugger_script_path</code>:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span> _PyRunRemoteDebugger(PyThreadState *tstate)</span><br><span class="line">&#123;</span><br><span class="line">    <span class="type">const</span> PyConfig *config = _PyInterpreterState_GetConfig(tstate-&gt;interp);</span><br><span class="line">    <span class="keyword">if</span> (config-&gt;remote_debug == <span class="number">1</span></span><br><span class="line">         &amp;&amp; tstate-&gt;remote_debugger_support.debugger_pending_call == <span class="number">1</span>)</span><br><span class="line">    &#123;</span><br><span class="line">        tstate-&gt;remote_debugger_support.debugger_pending_call = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line">        <span class="comment">// Immediately make a copy in case of a race with another debugger</span></span><br><span class="line">        <span class="comment">// process that&#x27;s trying to write to the buffer. 
At least this way</span></span><br><span class="line">        <span class="comment">// we&#x27;ll be internally consistent: what we audit is what we run.</span></span><br><span class="line">        <span class="type">const</span> <span class="type">size_t</span> pathsz</span><br><span class="line">            = <span class="keyword">sizeof</span>(tstate-&gt;remote_debugger_support.debugger_script_path);</span><br><span class="line"></span><br><span class="line">        <span class="type">char</span> *path = PyMem_Malloc(pathsz);</span><br><span class="line">        <span class="keyword">if</span> (path) &#123;</span><br><span class="line">            <span class="comment">// And don&#x27;t assume the debugger correctly null terminated it.</span></span><br><span class="line">            <span class="built_in">memcpy</span>(</span><br><span class="line">                path,</span><br><span class="line">                tstate-&gt;remote_debugger_support.debugger_script_path,</span><br><span class="line">                pathsz);</span><br><span class="line">            path[pathsz - <span class="number">1</span>] = <span class="string">&#x27;\0&#x27;</span>;</span><br><span class="line">            <span class="keyword">if</span> (*path) &#123;</span><br><span class="line">                run_remote_debugger_script(path);</span><br><span class="line">            &#125;</span><br><span class="line">            PyMem_Free(path);</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>So the question becomes: how do we write these values into the target Python process? Let's look at the implementation in remote_debugging.c.</p><p>The entry point is <code>_PySysRemoteDebug_SendExec</code>:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span></span><br><span class="line">_PySysRemoteDebug_SendExec(<span class="type">int</span> pid, <span class="type">int</span> tid, <span class="type">const</span> <span class="type">char</span> *debugger_script_path)</span><br><span class="line">&#123;</span><br><span class="line"><span class="meta">#<span class="keyword">if</span> !defined(Py_SUPPORTS_REMOTE_DEBUG)</span></span><br><span class="line">    PyErr_SetString(PyExc_RuntimeError, <span class="string">&quot;Remote debugging is not supported on this platform&quot;</span>);</span><br><span class="line">    <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line"><span class="meta">#<span class="keyword">elif</span> !defined(Py_REMOTE_DEBUG)</span></span><br><span class="line">    PyErr_SetString(PyExc_RuntimeError, <span class="string">&quot;Remote debugging support has not been compiled in&quot;</span>);</span><br><span class="line">    <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line"><span class="meta">#<span class="keyword">else</span></span></span><br><span 
class="line"></span><br><span class="line">    PyThreadState *tstate = _PyThreadState_GET();</span><br><span class="line">    <span class="type">const</span> PyConfig *config = _PyInterpreterState_GetConfig(tstate-&gt;interp);</span><br><span class="line">    <span class="keyword">if</span> (config-&gt;remote_debug != <span class="number">1</span>) &#123;</span><br><span class="line">        PyErr_SetString(PyExc_RuntimeError, <span class="string">&quot;Remote debugging is not enabled&quot;</span>);</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="type">proc_handle_t</span> handle;</span><br><span class="line">    <span class="keyword">if</span> (init_proc_handle(&amp;handle, pid) &lt; <span class="number">0</span>) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="type">int</span> rc = send_exec_to_proc_handle(&amp;handle, tid, debugger_script_path);</span><br><span class="line">    cleanup_proc_handle(&amp;handle);</span><br><span class="line">    <span class="keyword">return</span> rc;</span><br><span class="line"><span class="meta">#<span class="keyword">endif</span></span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Everything up to this point is routine checking. Let's look at the <code>send_exec_to_proc_handle</code> function:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span 
class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span 
class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="type">int</span></span><br><span class="line"><span class="title function_">send_exec_to_proc_handle</span><span class="params">(<span class="type">proc_handle_t</span> *handle, <span class="type">int</span> tid, <span class="type">const</span> <span class="type">char</span> *debugger_script_path)</span></span><br><span class="line">&#123;</span><br><span class="line">    <span class="type">uintptr_t</span> runtime_start_address;</span><br><span class="line">    <span class="class"><span class="keyword">struct</span> _<span class="title">Py_DebugOffsets</span> <span class="title">debug_offsets</span>;</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> (read_offsets(handle, &amp;runtime_start_address, &amp;debug_offsets)) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span 
class="line"></span><br><span class="line">    <span class="type">uintptr_t</span> interpreter_state_list_head = (<span class="type">uintptr_t</span>)debug_offsets.runtime_state.interpreters_head;</span><br><span class="line"></span><br><span class="line">    <span class="type">uintptr_t</span> interpreter_state_addr;</span><br><span class="line">    <span class="keyword">if</span> (<span class="number">0</span> != read_memory(</span><br><span class="line">            handle,</span><br><span class="line">            runtime_start_address + interpreter_state_list_head,</span><br><span class="line">            <span class="keyword">sizeof</span>(<span class="type">void</span>*),</span><br><span class="line">            &amp;interpreter_state_addr))</span><br><span class="line">    &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> (interpreter_state_addr == <span class="number">0</span>) &#123;</span><br><span class="line">        PyErr_SetString(PyExc_RuntimeError, <span class="string">&quot;Can&#x27;t find a running interpreter in the remote process&quot;</span>);</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="type">int</span> is_remote_debugging_enabled = <span class="number">0</span>;</span><br><span class="line">    <span class="keyword">if</span> (<span class="number">0</span> != read_memory(</span><br><span class="line">            handle,</span><br><span class="line">            interpreter_state_addr + debug_offsets.debugger_support.remote_debugging_enabled,</span><br><span class="line">            <span class="keyword">sizeof</span>(<span class="type">int</span>),</span><br><span class="line">  
          &amp;is_remote_debugging_enabled))</span><br><span class="line">    &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> (is_remote_debugging_enabled != <span class="number">1</span>) &#123;</span><br><span class="line">        PyErr_SetString(</span><br><span class="line">            PyExc_RuntimeError,</span><br><span class="line">            <span class="string">&quot;Remote debugging is not enabled in the remote process&quot;</span>);</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="type">uintptr_t</span> thread_state_addr;</span><br><span class="line">    <span class="type">unsigned</span> <span class="type">long</span> this_tid = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> (tid != <span class="number">0</span>) &#123;</span><br><span class="line">        <span class="keyword">if</span> (<span class="number">0</span> != read_memory(</span><br><span class="line">                handle,</span><br><span class="line">                interpreter_state_addr + debug_offsets.interpreter_state.threads_head,</span><br><span class="line">                <span class="keyword">sizeof</span>(<span class="type">void</span>*),</span><br><span class="line">                &amp;thread_state_addr))</span><br><span class="line">        &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">while</span> (thread_state_addr != <span class="number">0</span>) 
&#123;</span><br><span class="line">            <span class="keyword">if</span> (<span class="number">0</span> != read_memory(</span><br><span class="line">                    handle,</span><br><span class="line">                    thread_state_addr + debug_offsets.thread_state.native_thread_id,</span><br><span class="line">                    <span class="keyword">sizeof</span>(this_tid),</span><br><span class="line">                    &amp;this_tid))</span><br><span class="line">            &#123;</span><br><span class="line">                <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">            &#125;</span><br><span class="line"></span><br><span class="line">            <span class="keyword">if</span> (this_tid == (<span class="type">unsigned</span> <span class="type">long</span>)tid) &#123;</span><br><span class="line">                <span class="keyword">break</span>;</span><br><span class="line">            &#125;</span><br><span class="line"></span><br><span class="line">            <span class="keyword">if</span> (<span class="number">0</span> != read_memory(</span><br><span class="line">                    handle,</span><br><span class="line">                    thread_state_addr + debug_offsets.thread_state.next,</span><br><span class="line">                    <span class="keyword">sizeof</span>(<span class="type">void</span>*),</span><br><span class="line">                    &amp;thread_state_addr))</span><br><span class="line">            &#123;</span><br><span class="line">                <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> (thread_state_addr == <span class="number">0</span>) &#123;</span><br><span class="line">            PyErr_SetString(</span><br><span class="line">       
         PyExc_RuntimeError,</span><br><span class="line">                <span class="string">&quot;Can&#x27;t find the specified thread in the remote process&quot;</span>);</span><br><span class="line">            <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span class="keyword">if</span> (<span class="number">0</span> != read_memory(</span><br><span class="line">                handle,</span><br><span class="line">                interpreter_state_addr + debug_offsets.interpreter_state.threads_main,</span><br><span class="line">                <span class="keyword">sizeof</span>(<span class="type">void</span>*),</span><br><span class="line">                &amp;thread_state_addr))</span><br><span class="line">        &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> (thread_state_addr == <span class="number">0</span>) &#123;</span><br><span class="line">            PyErr_SetString(</span><br><span class="line">                PyExc_RuntimeError,</span><br><span class="line">                <span class="string">&quot;Can&#x27;t find the main thread in the remote process&quot;</span>);</span><br><span class="line">            <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// Ensure our path is not too long</span></span><br><span class="line">    <span class="keyword">if</span> (debug_offsets.debugger_support.debugger_script_path_size &lt;= <span 
class="built_in">strlen</span>(debugger_script_path)) &#123;</span><br><span class="line">        PyErr_SetString(PyExc_ValueError, <span class="string">&quot;Debugger script path is too long&quot;</span>);</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="type">uintptr_t</span> debugger_script_path_addr = (<span class="type">uintptr_t</span>)(</span><br><span class="line">        thread_state_addr +</span><br><span class="line">        debug_offsets.debugger_support.remote_debugger_support +</span><br><span class="line">        debug_offsets.debugger_support.debugger_script_path);</span><br><span class="line">    <span class="keyword">if</span> (<span class="number">0</span> != write_memory(</span><br><span class="line">            handle,</span><br><span class="line">            debugger_script_path_addr,</span><br><span class="line">            <span class="built_in">strlen</span>(debugger_script_path) + <span class="number">1</span>,</span><br><span class="line">            debugger_script_path))</span><br><span class="line">    &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="type">int</span> pending_call = <span class="number">1</span>;</span><br><span class="line">    <span class="type">uintptr_t</span> debugger_pending_call_addr = (<span class="type">uintptr_t</span>)(</span><br><span class="line">        thread_state_addr +</span><br><span class="line">        debug_offsets.debugger_support.remote_debugger_support +</span><br><span class="line">        debug_offsets.debugger_support.debugger_pending_call);</span><br><span class="line">    <span class="keyword">if</span> (<span class="number">0</span> != 
write_memory(</span><br><span class="line">            handle,</span><br><span class="line">            debugger_pending_call_addr,</span><br><span class="line">            <span class="keyword">sizeof</span>(<span class="type">int</span>),</span><br><span class="line">            &amp;pending_call))</span><br><span class="line"></span><br><span class="line">    &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="type">uintptr_t</span> eval_breaker;</span><br><span class="line">    <span class="keyword">if</span> (<span class="number">0</span> != read_memory(</span><br><span class="line">            handle,</span><br><span class="line">            thread_state_addr + debug_offsets.debugger_support.eval_breaker,</span><br><span class="line">            <span class="keyword">sizeof</span>(<span class="type">uintptr_t</span>),</span><br><span class="line">            &amp;eval_breaker))</span><br><span class="line">    &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    eval_breaker |= _PY_EVAL_PLEASE_STOP_BIT;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> (<span class="number">0</span> != write_memory(</span><br><span class="line">            handle,</span><br><span class="line">            thread_state_addr + (<span class="type">uintptr_t</span>)debug_offsets.debugger_support.eval_breaker,</span><br><span class="line">            <span class="keyword">sizeof</span>(<span class="type">uintptr_t</span>),</span><br><span class="line">            &amp;eval_breaker))</span><br><span class="line"></span><br><span class="line">    &#123;</span><br><span class="line">        <span 
class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Setting the details aside, the logic of this function is quite clear. It obtains the address offsets of the target through <code>read_offsets</code>, reads from various addresses with <code>read_memory</code> to locate the interpreter state and the target thread state, and finally writes into the target process with <code>write_memory</code>: first the debugger script path, then the <code>debugger_pending_call</code> flag, and last the <code>eval_breaker</code> word with <code>_PY_EVAL_PLEASE_STOP_BIT</code> set.</p><p><code>read_offsets</code> is a concrete example of the point made earlier: how to use the debugging information that Python currently exposes. Here is its implementation on Linux.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="type">int</span></span><br><span class="line"><span class="title function_">read_offsets</span><span class="params">(</span></span><br><span class="line"><span class="params">    <span class="type">proc_handle_t</span> *handle,</span></span><br><span class="line"><span class="params">    <span class="type">uintptr_t</span> *runtime_start_address,</span></span><br><span class="line"><span class="params">    _Py_DebugOffsets* debug_offsets</span></span><br><span class="line"><span class="params">)</span> &#123;</span><br><span class="line">    <span class="keyword">if</span> (_Py_RemoteDebug_ReadDebugOffsets(handle, runtime_start_address, debug_offsets)) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span 
class="line">    <span class="keyword">if</span> (ensure_debug_offset_compatibility(debug_offsets)) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The core function here is <code>_Py_RemoteDebug_ReadDebugOffsets</code>, so we look at its implementation next.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="type">int</span></span><br><span class="line">_Py_RemoteDebug_ReadDebugOffsets(</span><br><span class="line">    <span class="type">proc_handle_t</span> *handle,</span><br><span class="line">    <span class="type">uintptr_t</span> *runtime_start_address,</span><br><span class="line">    _Py_DebugOffsets* debug_offsets</span><br><span class="line">) &#123;</span><br><span class="line">    *runtime_start_address = _Py_RemoteDebug_GetPyRuntimeAddress(handle);</span><br><span class="line">    <span class="keyword">if</span> (!*runtime_start_address) &#123;</span><br><span class="line">        <span class="keyword">if</span> (!PyErr_Occurred()) &#123;</span><br><span class="line">            
PyErr_SetString(</span><br><span class="line">                PyExc_RuntimeError, <span class="string">&quot;Failed to get PyRuntime address&quot;</span>);</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="type">size_t</span> size = <span class="keyword">sizeof</span>(<span class="keyword">struct</span> _Py_DebugOffsets);</span><br><span class="line">    <span class="keyword">if</span> (<span class="number">0</span> != _Py_RemoteDebug_ReadRemoteMemory(handle, *runtime_start_address, size, debug_offsets)) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Notice that the key step is still to obtain the address of <code>PyRuntime</code> first: the debug offsets are then read from that address in the target process. So here is the implementation of <code>_Py_RemoteDebug_GetPyRuntimeAddress</code>.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span 
class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span 
class="line">87</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="type">uintptr_t</span></span><br><span class="line">_Py_RemoteDebug_GetPyRuntimeAddress(<span class="type">proc_handle_t</span>* handle)</span><br><span class="line">&#123;</span><br><span class="line">    <span class="type">uintptr_t</span> address;</span><br><span class="line">    address = search_linux_map_for_section(handle, <span class="string">&quot;PyRuntime&quot;</span>, <span class="string">&quot;python&quot;</span>);</span><br><span class="line">    <span class="keyword">if</span> (address == <span class="number">0</span>) &#123;</span><br><span class="line">        <span class="comment">// Error out: &#x27;python&#x27; substring covers both executable and DLL</span></span><br><span class="line">        PyErr_SetString(PyExc_RuntimeError, <span class="string">&quot;Failed to find the PyRuntime section in the process.&quot;</span>);</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> address;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">static</span> <span class="type">uintptr_t</span></span><br><span class="line"><span class="title function_">search_linux_map_for_section</span><span class="params">(<span class="type">proc_handle_t</span> *handle, <span class="type">const</span> <span class="type">char</span>* secname, <span class="type">const</span> <span class="type">char</span>* substr)</span></span><br><span class="line">&#123;</span><br><span class="line">    <span class="type">char</span> maps_file_path[<span class="number">64</span>];</span><br><span class="line">    <span class="built_in">sprintf</span>(maps_file_path, <span class="string">&quot;/proc/%d/maps&quot;</span>, handle-&gt;pid);</span><br><span class="line"></span><br><span class="line">    FILE* maps_file = fopen(maps_file_path, <span 
class="string">&quot;r&quot;</span>);</span><br><span class="line">    <span class="keyword">if</span> (maps_file == <span class="literal">NULL</span>) &#123;</span><br><span class="line">        PyErr_SetFromErrno(PyExc_OSError);</span><br><span class="line">        <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="type">size_t</span> linelen = <span class="number">0</span>;</span><br><span class="line">    <span class="type">size_t</span> linesz = PATH_MAX;</span><br><span class="line">    <span class="type">char</span> *line = PyMem_Malloc(linesz);</span><br><span class="line">    <span class="keyword">if</span> (!line) &#123;</span><br><span class="line">        fclose(maps_file);</span><br><span class="line">        PyErr_NoMemory();</span><br><span class="line">        <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="type">uintptr_t</span> retval = <span class="number">0</span>;</span><br><span class="line">    <span class="keyword">while</span> (fgets(line + linelen, linesz - linelen, maps_file) != <span class="literal">NULL</span>) &#123;</span><br><span class="line">        linelen = <span class="built_in">strlen</span>(line);</span><br><span class="line">        <span class="keyword">if</span> (line[linelen - <span class="number">1</span>] != <span class="string">&#x27;\n&#x27;</span>) &#123;</span><br><span class="line">            <span class="comment">// Read a partial line: realloc and keep reading where we left off.</span></span><br><span class="line">            <span class="comment">// Note that even the last line will be terminated by a newline.</span></span><br><span class="line">            linesz *= <span class="number">2</span>;</span><br><span class="line">            <span 
class="type">char</span> *biggerline = PyMem_Realloc(line, linesz);</span><br><span class="line">            <span class="keyword">if</span> (!biggerline) &#123;</span><br><span class="line">                PyMem_Free(line);</span><br><span class="line">                fclose(maps_file);</span><br><span class="line">                PyErr_NoMemory();</span><br><span class="line">                <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">            &#125;</span><br><span class="line">            line = biggerline;</span><br><span class="line">            <span class="keyword">continue</span>;</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="comment">// Read a full line: strip the newline</span></span><br><span class="line">        line[linelen - <span class="number">1</span>] = <span class="string">&#x27;\0&#x27;</span>;</span><br><span class="line">        <span class="comment">// and prepare to read the next line into the start of the buffer.</span></span><br><span class="line">        linelen = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line">        <span class="type">unsigned</span> <span class="type">long</span> start = <span class="number">0</span>;</span><br><span class="line">        <span class="type">unsigned</span> <span class="type">long</span> path_pos = <span class="number">0</span>;</span><br><span class="line">        <span class="built_in">sscanf</span>(line, <span class="string">&quot;%lx-%*x %*s %*s %*s %*s %ln&quot;</span>, &amp;start, &amp;path_pos);</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> (!path_pos) &#123;</span><br><span class="line">            <span class="comment">// Line didn&#x27;t match our format string.  
This shouldn&#x27;t be</span></span><br><span class="line">            <span class="comment">// possible, but let&#x27;s be defensive and skip the line.</span></span><br><span class="line">            <span class="keyword">continue</span>;</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="type">const</span> <span class="type">char</span> *path = line + path_pos;</span><br><span class="line">        <span class="type">const</span> <span class="type">char</span> *filename = <span class="built_in">strrchr</span>(path, <span class="string">&#x27;/&#x27;</span>);</span><br><span class="line">        <span class="keyword">if</span> (filename) &#123;</span><br><span class="line">            filename++;  <span class="comment">// Move past the &#x27;/&#x27;</span></span><br><span class="line">        &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">            filename = path;  <span class="comment">// No directories, or an empty string</span></span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> (<span class="built_in">strstr</span>(filename, substr)) &#123;</span><br><span class="line">            retval = search_elf_file_for_section(handle, secname, start, path);</span><br><span class="line">            <span class="keyword">if</span> (retval) &#123;</span><br><span class="line">                <span class="keyword">break</span>;</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    PyMem_Free(line);</span><br><span class="line">    fclose(maps_file);</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> retval;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Here we can see that 
<code>_Py_RemoteDebug_GetPyRuntimeAddress</code> calls <code>search_linux_map_for_section</code> to obtain the address of the current <code>PyRuntime</code>. <code>search_linux_map_for_section</code> in turn reads <code>/proc/${pid}/maps</code> and brute-force walks the memory mappings listed there: for every mapping whose file name contains <code>python</code>, it passes the mapping start address and the file path to <code>search_elf_file_for_section</code>, and it stops at the first non-zero result.</p><p>Next comes the implementation of <code>search_elf_file_for_section</code>.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span 
class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br></pre></td><td class="code"><pre><span class="line">search_elf_file_for_section(</span><br><span class="line">        <span class="type">proc_handle_t</span> *handle,</span><br><span class="line">        <span class="type">const</span> <span class="type">char</span>* secname,</span><br><span class="line">        <span class="type">uintptr_t</span> start_address,</span><br><span class="line">        <span class="type">const</span> <span class="type">char</span> *elf_file)</span><br><span class="line">&#123;</span><br><span class="line">    <span class="keyword">if</span> (start_address == <span class="number">0</span>) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="type">uintptr_t</span> result = <span class="number">0</span>;</span><br><span class="line">    <span class="type">void</span>* file_memory = <span class="literal">NULL</span>;</span><br><span class="line"></span><br><span class="line">    <span class="type">int</span> fd = open(elf_file, O_RDONLY);</span><br><span class="line">    <span class="keyword">if</span> (fd &lt; <span class="number">0</span>) 
&#123;</span><br><span class="line">        PyErr_SetFromErrno(PyExc_OSError);</span><br><span class="line">        <span class="keyword">goto</span> <span class="built_in">exit</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="class"><span class="keyword">struct</span> <span class="title">stat</span> <span class="title">file_stats</span>;</span></span><br><span class="line">    <span class="keyword">if</span> (fstat(fd, &amp;file_stats) != <span class="number">0</span>) &#123;</span><br><span class="line">        PyErr_SetFromErrno(PyExc_OSError);</span><br><span class="line">        <span class="keyword">goto</span> <span class="built_in">exit</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    file_memory = mmap(<span class="literal">NULL</span>, file_stats.st_size, PROT_READ, MAP_PRIVATE, fd, <span class="number">0</span>);</span><br><span class="line">    <span class="keyword">if</span> (file_memory == MAP_FAILED) &#123;</span><br><span class="line">        PyErr_SetFromErrno(PyExc_OSError);</span><br><span class="line">        <span class="keyword">goto</span> <span class="built_in">exit</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    Elf_Ehdr* elf_header = (Elf_Ehdr*)file_memory;</span><br><span class="line"></span><br><span class="line">    Elf_Shdr* section_header_table = (Elf_Shdr*)(file_memory + elf_header-&gt;e_shoff);</span><br><span class="line"></span><br><span class="line">    Elf_Shdr* shstrtab_section = &amp;section_header_table[elf_header-&gt;e_shstrndx];</span><br><span class="line">    <span class="type">char</span>* shstrtab = (<span class="type">char</span>*)(file_memory + shstrtab_section-&gt;sh_offset);</span><br><span class="line"></span><br><span class="line">    Elf_Shdr* section = <span class="literal">NULL</span>;</span><br><span class="line"> 
   <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i &lt; elf_header-&gt;e_shnum; i++) &#123;</span><br><span class="line">        <span class="type">char</span>* this_sec_name = shstrtab + section_header_table[i].sh_name;</span><br><span class="line">        <span class="comment">// Move 1 character to account for the leading &quot;.&quot;</span></span><br><span class="line">        this_sec_name += <span class="number">1</span>;</span><br><span class="line">        <span class="keyword">if</span> (<span class="built_in">strcmp</span>(secname, this_sec_name) == <span class="number">0</span>) &#123;</span><br><span class="line">            section = &amp;section_header_table[i];</span><br><span class="line">            <span class="keyword">break</span>;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    Elf_Phdr* program_header_table = (Elf_Phdr*)(file_memory + elf_header-&gt;e_phoff);</span><br><span class="line">    <span class="comment">// Find the first PT_LOAD segment</span></span><br><span class="line">    Elf_Phdr* first_load_segment = <span class="literal">NULL</span>;</span><br><span class="line">    <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i &lt; elf_header-&gt;e_phnum; i++) &#123;</span><br><span class="line">        <span class="keyword">if</span> (program_header_table[i].p_type == PT_LOAD) &#123;</span><br><span class="line">            first_load_segment = &amp;program_header_table[i];</span><br><span class="line">            <span class="keyword">break</span>;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> (section != <span class="literal">NULL</span> &amp;&amp; first_load_segment != <span class="literal">NULL</span>) 
&#123;</span><br><span class="line">        <span class="type">uintptr_t</span> elf_load_addr = first_load_segment-&gt;p_vaddr</span><br><span class="line">            - (first_load_segment-&gt;p_vaddr % first_load_segment-&gt;p_align);</span><br><span class="line">        result = start_address + (<span class="type">uintptr_t</span>)section-&gt;sh_addr - elf_load_addr;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line"><span class="built_in">exit</span>:</span><br><span class="line">    <span class="keyword">if</span> (file_memory != <span class="literal">NULL</span>) &#123;</span><br><span class="line">        munmap(file_memory, file_stats.st_size);</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">if</span> (fd &gt;= <span class="number">0</span> &amp;&amp; close(fd) != <span class="number">0</span>) &#123;</span><br><span class="line">        PyErr_SetFromErrno(PyExc_OSError);</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> result;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>This code is a bit involved, so let's break it down.</p><p>First, the function declaration:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">search_elf_file_for_section(</span><br><span class="line">        <span class="type">proc_handle_t</span> *handle,</span><br><span class="line">        <span class="type">const</span> <span class="type">char</span>* secname,</span><br><span class="line">        <span class="type">uintptr_t</span> start_address,</span><br><span class="line">        <span class="type">const</span> <span class="type">char</span> 
*elf_file)</span><br></pre></td></tr></table></figure><p>It searches an ELF file for a specific section. The parameters are the process handle, the name of the section to look for, the start address (where the file is mapped in the process's address space), and the path to the ELF file.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span> fd = open(elf_file, O_RDONLY);</span><br><span class="line"><span class="keyword">if</span> (fd &lt; <span class="number">0</span>) &#123;</span><br><span class="line">    PyErr_SetFromErrno(PyExc_OSError);</span><br><span class="line">    <span class="keyword">goto</span> <span class="built_in">exit</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Open the ELF file read-only; on failure, set a Python exception and jump to the exit handler.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">file_memory = mmap(<span class="literal">NULL</span>, file_stats.st_size, PROT_READ, MAP_PRIVATE, fd, <span class="number">0</span>);</span><br><span class="line"><span class="keyword">if</span> (file_memory == MAP_FAILED) &#123;</span><br><span class="line">    PyErr_SetFromErrno(PyExc_OSError);</span><br><span class="line">    <span class="keyword">goto</span> <span class="built_in">exit</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Map the file contents into memory as a read-only, private mapping starting at offset 0 of the file; the mapping length <code>file_stats.st_size</code> comes from the <code>fstat</code> call that precedes it in the full listing. On failure, set an exception and exit.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">Elf_Ehdr* elf_header = (Elf_Ehdr*)file_memory;</span><br><span class="line">Elf_Shdr* section_header_table = (Elf_Shdr*)(file_memory + 
elf_header-&gt;e_shoff);</span><br></pre></td></tr></table></figure><p>Cast the start of the file to the ELF header struct and locate the section header table, which sits at file offset <code>e_shoff</code>.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">Elf_Shdr* shstrtab_section = &amp;section_header_table[elf_header-&gt;e_shstrndx];</span><br><span class="line"><span class="type">char</span>* shstrtab = (<span class="type">char</span>*)(file_memory + shstrtab_section-&gt;sh_offset);</span><br><span class="line">Elf_Shdr* section = <span class="literal">NULL</span>;</span><br><span class="line"><span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i &lt; elf_header-&gt;e_shnum; i++) &#123;</span><br><span class="line">    <span class="type">char</span>* this_sec_name = shstrtab + section_header_table[i].sh_name;</span><br><span class="line">    <span class="comment">// Move 1 character to account for the leading &quot;.&quot;</span></span><br><span class="line">    this_sec_name += <span class="number">1</span>;</span><br><span class="line">    <span class="keyword">if</span> (<span class="built_in">strcmp</span>(secname, this_sec_name) == <span class="number">0</span>) &#123;</span><br><span class="line">        section = &amp;section_header_table[i];</span><br><span class="line">        <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Fetch the section header string table (the table holding every section's name), located via the <code>e_shstrndx</code> index. Then iterate over all sections looking for a matching name. Note that the leading "." of each section name is skipped before comparing, so <code>secname</code> must be given without the dot.</p><figure 
class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">Elf_Phdr* program_header_table = (Elf_Phdr*)(file_memory + elf_header-&gt;e_phoff);</span><br><span class="line"><span class="comment">// Find the first PT_LOAD segment</span></span><br><span class="line">Elf_Phdr* first_load_segment = <span class="literal">NULL</span>;</span><br><span class="line"><span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i &lt; elf_header-&gt;e_phnum; i++) &#123;</span><br><span class="line">    <span class="keyword">if</span> (program_header_table[i].p_type == PT_LOAD) &#123;</span><br><span class="line">        first_load_segment = &amp;program_header_table[i];</span><br><span class="line">        <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Locate the program header table, then search for the first segment of type <code>PT_LOAD</code>, which defines the base address the program is loaded at.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> (section != <span class="literal">NULL</span> &amp;&amp; first_load_segment != <span class="literal">NULL</span>) &#123;</span><br><span class="line">    <span class="type">uintptr_t</span> elf_load_addr = first_load_segment-&gt;p_vaddr</span><br><span class="line">        - (first_load_segment-&gt;p_vaddr % first_load_segment-&gt;p_align);</span><br><span class="line">    result = start_address + (<span 
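class="type">uintptr_t</span>)section-&gt;sh_addr - elf_load_addr;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>If both the target section and the first LOAD segment were found, compute the target section's runtime address:</p><ol><li>Compute the ELF file's load base address by rounding the segment's <code>p_vaddr</code> down to a multiple of <code>p_align</code></li><li>Target address = start address of the mapping in the process + the section's virtual address - the ELF load base address</li></ol><p>After going through this flow, we finally have the address of <code>_PyRuntime</code>, and on top of that we can do a lot of interesting work, PEP 768 included.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>With Python 3.14, the core team has effectively turned process information into a relatively stable, semi-official ABI, which lets our debugging tools debug and observe Python processes non-intrusively and in a much better way. PEP 768 is one useful product of that effort, and work built on PEP 768, such as remote PDB debugging, has already been merged as well.</p><p>It's fair to say that from Python 3.14 onward, Python's debugging tools and techniques will become far richer and more powerful. I recommend upgrading as soon as it is released.</p><p>That's about it.</p>]]></content>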
    
    
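    <summary type="html">&lt;p&gt;The major features of Python 3.14 are essentially locked in at this point. In my view, Python 3.14 will be a core release for many years to come, because it sets the baseline for the Python debugging ecosystem of this era. This post talks about this epic improvement in the Python world.&lt;/p&gt;</summary>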
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="Python" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/Python/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="Linux" scheme="https://www.manjusaka.blog/tags/Linux/"/>
    
    <category term="Python" scheme="https://www.manjusaka.blog/tags/Python/"/>
    
    <category term="笔记" scheme="https://www.manjusaka.blog/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="水文" scheme="https://www.manjusaka.blog/tags/%E6%B0%B4%E6%96%87/"/>
    
  </entry>
  
  <entry>
    <title>A Quick Look at Common Load-Balancing Algorithms</title>
    <link href="https://www.manjusaka.blog/posts/2025/03/23/a-simple-introduction-about-load-balance-algorithm/"/>
    <id>https://www.manjusaka.blog/posts/2025/03/23/a-simple-introduction-about-load-balance-algorithm/</id>
    <published>2025-03-23T12:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>I've been putting this post off for a long time, but I finally decided to sit down and finish it properly. It introduces implementations of some common load-balancing algorithms. All the code in this post will end up in the <strong>load-balancer-algorithm</strong><a href="#refer-anchor-1"><sup>1</sup></a> repo.</p><p><del>I have never once enjoyed writing blog posts</del></p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="正文"></a>Main Text</h2><h3 id="先行准备"><a href="#先行准备" class="headerlink" title="先行准备"></a>Preparation</h3><p>Since we're covering the load-balancing algorithms commonly used in a LoadBalancer, let's first go over some groundwork.</p><p>For now we need two basic data structures:</p><ol><li>A structure representing a Backend node</li><li>A structure representing the request context</li></ol><p>That gives us the following basic code:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> dataclasses</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">@dataclasses.dataclass</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Node</span>:</span><br><span class="line">    host: <span class="built_in">str</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">    port: <span class="built_in">int</span> = <span class="number">0</span></span><br><span class="line">    node_available: <span class="built_in">bool</span> = <span class="literal">True</span></span><br><span 
class="line"></span><br><span class="line"><span class="meta">    @property</span></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">available</span>(<span class="params">self</span>) -&gt; <span class="built_in">bool</span>:</span><br><span class="line">        <span class="keyword">return</span> <span class="variable language_">self</span>.node_available</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> dataclasses</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">@dataclasses.dataclass</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">RequestContext</span>:</span><br><span class="line">    <span class="keyword">pass</span></span><br><span class="line"></span><br></pre></td></tr></table></figure><p>We also need to raise an exception when there is no backend node available to choose from:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">NoNodesAvailableError</span>(<span class="title class_ inherited__">ValueError</span>):</span><br><span class="line">    <span class="keyword">pass</span></span><br></pre></td></tr></table></figure><p>Now we can take the abstraction one step further and model each load-balancing algorithm as a strategy (Strategy), which gives us the following code:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> __future__ <span class="keyword">import</span> annotations</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> typing</span><br><span class="line"><span class="keyword">from</span> abc <span class="keyword">import</span> ABC, abstractmethod</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> typing.TYPE_CHECKING:</span><br><span class="line">    <span class="keyword">from</span> load_balancer_algorithm.context <span class="keyword">import</span> RequestContext</span><br><span class="line">    <span class="keyword">from</span> load_balancer_algorithm.node <span class="keyword">import</span> Node</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Strategy</span>(<span class="title class_ inherited__">ABC</span>):</span><br><span class="line">    nodes: <span class="built_in">list</span>[Node] = []</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, nodes:<span class="built_in">list</span>[Node]</span>) -&gt; <span class="literal">None</span>:</span><br><span class="line">        <span class="variable language_">self</span>.nodes = nodes</span><br><span class="line"></span><br><span class="line"><span class="meta">    @abstractmethod</span></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">get_node</span>(<span class="params">self, ctx: 
RequestContext</span>) -&gt; Node:</span><br><span class="line">        <span class="keyword">pass</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">add_node</span>(<span class="params">self, node: Node</span>) -&gt; <span class="literal">None</span>:</span><br><span class="line">        <span class="variable language_">self</span>.nodes.append(node)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">remove_node</span>(<span class="params">self, node: Node</span>) -&gt; <span class="literal">None</span>:</span><br><span class="line">        <span class="variable language_">self</span>.nodes = <span class="built_in">list</span>(<span class="built_in">filter</span>(<span class="keyword">lambda</span> n: n != node, <span class="variable language_">self</span>.nodes))</span><br></pre></td></tr></table></figure><p>With that in place, we can move on to implementing some load-balancing algorithms.</p><h3 id="随机选择"><a href="#随机选择" class="headerlink" title="随机选择"></a>Random Selection</h3><p>The simplest load-balancing algorithm is to pick a node at random. It is trivial to implement; the bare-bones version looks roughly like this:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> random</span><br><span class="line">a = [<span class="string">&quot;A&quot;</span>, <span class="string">&quot;B&quot;</span>]</span><br><span class="line">random.choice(a)</span><br></pre></td></tr></table></figure><p>Let's write the full implementation:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">RandomStrategy</span>(<span class="title class_ inherited__">Strategy</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title 
function_">get_node</span>(<span class="params">self, ctx: RequestContext</span>) -&gt; Node:</span><br><span class="line">        nodes = <span class="built_in">list</span>(<span class="built_in">filter</span>(<span class="keyword">lambda</span> node: node.available, <span class="variable language_">self</span>.nodes))</span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> nodes:</span><br><span class="line">            <span class="keyword">raise</span> NoNodesAvailableError</span><br><span class="line"></span><br><span class="line">        <span class="keyword">return</span> random.choice(nodes)</span><br></pre></td></tr></table></figure><p>OK, now let's add a requirement: every node gets a weight, and the higher a node's weight, the more likely it is to be picked. We can use random.choices for this, but first we need to make a small change to Node:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> dataclasses</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">@dataclasses.dataclass</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Node</span>:</span><br><span class="line">    host: <span class="built_in">str</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">    port: <span class="built_in">int</span> = <span class="number">0</span></span><br><span class="line">    node_available: <span class="built_in">bool</span> = <span class="literal">True</span></span><br><span class="line">    
weight: <span class="built_in">int</span> = <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="meta">    @property</span></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">available</span>(<span class="params">self</span>) -&gt; <span class="built_in">bool</span>:</span><br><span class="line">        <span class="keyword">return</span> <span class="variable language_">self</span>.node_available</span><br></pre></td></tr></table></figure><p>Then let's implement WeightedRandomStrategy:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">WeightedRandomStrategy</span>(<span class="title class_ inherited__">Strategy</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">get_node</span>(<span class="params">self, ctx: RequestContext</span>) -&gt; Node:</span><br><span class="line">        nodes = <span class="built_in">list</span>(<span class="built_in">filter</span>(<span class="keyword">lambda</span> node: node.available, <span class="variable language_">self</span>.nodes))</span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> nodes:</span><br><span class="line">            <span class="keyword">raise</span> NoNodesAvailableError</span><br><span class="line"></span><br><span class="line">        weights = [node.weight <span class="keyword">for</span> node <span class="keyword">in</span> nodes]</span><br><span class="line">        <span 
class="keyword">return</span> random.choices(nodes, weights=weights)[<span class="number">0</span>]</span><br></pre></td></tr></table></figure><p>Random really is a load-balancing algorithm we use all the time, but its drawback is just as obvious: the resulting distribution is somewhat unpredictable, and how well it works depends entirely on the quality of the random function you use. With bad luck, requests can end up heavily clustered on a few nodes. So is there a better load-balancing algorithm available?</p><h3 id="轮询算法"><a href="#轮询算法" class="headerlink" title="轮询算法"></a>Round Robin</h3><p>A common requirement for a load-balancing algorithm is that its behavior be reasonably predictable, and from that angle round robin is a very good choice. We keep an index that records the current node and increment it on every request; once index reaches the number of nodes, it is reset to 0.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">RoundRobinStrategy</span>(<span class="title class_ inherited__">Strategy</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, nodes: <span class="built_in">list</span>[Node]</span>) -&gt; <span class="literal">None</span>:</span><br><span class="line">        <span class="built_in">super</span>().__init__(nodes)</span><br><span class="line">        <span class="variable language_">self</span>.index = <span class="number">0</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">get_node</span>(<span class="params">self, ctx: RequestContext</span>) -&gt; Node:</span><br><span class="line">        nodes = <span class="built_in">list</span>(<span 
class="built_in">filter</span>(<span class="keyword">lambda</span> node: node.available, <span class="variable language_">self</span>.nodes))</span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> nodes:</span><br><span class="line">            <span class="keyword">raise</span> NoNodesAvailableError</span><br><span class="line"></span><br><span class="line">        node = nodes[<span class="variable language_">self</span>.index]</span><br><span class="line">        <span class="variable language_">self</span>.index += <span class="number">1</span></span><br><span class="line">        <span class="keyword">if</span> <span class="variable language_">self</span>.index &gt;= <span class="built_in">len</span>(nodes):</span><br><span class="line">            <span class="variable language_">self</span>.index = <span class="number">0</span></span><br><span class="line"></span><br><span class="line">        <span class="keyword">return</span> node</span><br></pre></td></tr></table></figure><p>Here we've implemented the most basic round-robin algorithm (assuming nodes never become unavailable and are never added, removed, or changed), so index always advances in a regular pattern.</p><p>The result is obvious: given the node list [A, B], we get the sequence [A, B, A, B, A, B].</p><p>Now let's change the requirement: we need a round-robin counterpart to WeightedRandomStrategy, in which nodes with a higher weight are picked more often.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">WeightedRoundRobinStrategy</span>(<span class="title 
class_ inherited__">Strategy</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, nodes: <span class="built_in">list</span>[Node]</span>) -&gt; <span class="literal">None</span>:</span><br><span class="line">        <span class="built_in">super</span>().__init__(nodes)</span><br><span class="line">        <span class="variable language_">self</span>.index = <span class="number">0</span></span><br><span class="line">    </span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">get_node</span>(<span class="params">self, ctx: RequestContext</span>) -&gt; Node:</span><br><span class="line">        nodes = <span class="built_in">list</span>(<span class="built_in">filter</span>(<span class="keyword">lambda</span> node: node.available, <span class="variable language_">self</span>.nodes))</span><br><span class="line">    </span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> nodes:</span><br><span class="line">            <span class="keyword">raise</span> NoNodesAvailableError</span><br><span class="line">        nodes=[node <span class="keyword">for</span> node <span class="keyword">in</span> nodes <span class="keyword">for</span> _ <span class="keyword">in</span> <span class="built_in">range</span>(node.weight)]</span><br><span class="line">        node = nodes[<span class="variable language_">self</span>.index]</span><br><span class="line">        <span class="variable language_">self</span>.index += <span class="number">1</span></span><br><span class="line">        <span class="keyword">if</span> <span class="variable language_">self</span>.index &gt;= <span class="built_in">len</span>(nodes):</span><br><span class="line">            <span class="variable language_">self</span>.index = <span class="number">0</span></span><br><span class="line">        <span class="keyword">return</span> 
node</span><br></pre></td></tr></table></figure><p>The core idea is simple: we expand the node list according to each node's weight, and then run the plain round-robin algorithm over the expanded list.</p><p>The drawback is just as obvious. With a node set such as [A(weight=2),B(weight=1)] the selection order is [A, A, B]: requests to the same node arrive back to back, so the distribution is quite uneven. What can we do about it? We can borrow the smooth weighted round-robin algorithm from Nginx<a href="#refer-anchor-2"><sup>2</sup></a></p><p>First we add a current_weight attribute to the node to record its current weight value.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> dataclasses</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">@dataclasses.dataclass</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Node</span>:</span><br><span class="line">    host: <span class="built_in">str</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">    port: <span class="built_in">int</span> = <span class="number">0</span></span><br><span class="line">    node_available: <span class="built_in">bool</span> = <span class="literal">True</span></span><br><span class="line">    weight: <span class="built_in">int</span> = <span class="number">0</span></span><br><span class="line">    current_weight: <span class="built_in">int</span> = <span class="number">0</span></span><br><span class="line"></span><br><span class="line"><span class="meta">    @property</span></span><br><span class="line">    <span 
class="keyword">def</span> <span class="title function_">available</span>(<span class="params">self</span>) -&gt; <span class="built_in">bool</span>:</span><br><span class="line">        <span class="keyword">return</span> <span class="variable language_">self</span>.node_available</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__post_init__</span>(<span class="params">self</span>):</span><br><span class="line">        <span class="variable language_">self</span>.current_weight = <span class="number">0</span></span><br><span class="line"></span><br></pre></td></tr></table></figure><p>Then we implement WeightedRoundRobinStrategy.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">WeightedRoundRobinStrategy</span>(<span class="title class_ inherited__">RoundRobinStrategy</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">get_node</span>(<span class="params">self, ctx: RequestContext</span>) -&gt; Node:</span><br><span class="line">        nodes = <span class="built_in">list</span>(<span class="built_in">filter</span>(<span class="keyword">lambda</span> node: node.available, <span class="variable language_">self</span>.nodes))</span><br><span class="line">        <span 
class="keyword">if</span> <span class="keyword">not</span> nodes:</span><br><span class="line">            <span class="keyword">raise</span> NoNodesAvailableError</span><br><span class="line">        best_node = <span class="literal">None</span></span><br><span class="line">        total = <span class="number">0</span></span><br><span class="line">        <span class="keyword">for</span> node <span class="keyword">in</span> nodes:</span><br><span class="line">            total += node.weight</span><br><span class="line">            node.current_weight += node.weight</span><br><span class="line">            <span class="keyword">if</span> <span class="keyword">not</span> best_node <span class="keyword">or</span> node.current_weight &gt; best_node.current_weight:</span><br><span class="line">                best_node = node</span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> best_node:</span><br><span class="line">            <span class="keyword">raise</span> NoNodesAvailableError</span><br><span class="line">        best_node.current_weight -= total</span><br><span class="line">        <span class="keyword">return</span> best_node</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>The role of the new current_weight attribute is simple:</p><ul><li>On every selection, iterate over the available nodes and add each node's weight to its current_weight.</li><li>Along the way, accumulate the weights of all nodes into total.</li><li>The node with the largest current_weight is the selected node, and total is subtracted from its current_weight.</li><li>The current_weight of the nodes that were not selected is left untouched.</li></ul><p>This neatly interleaves the nodes, with current_weight taking over the role that index played before. Given a node set [A(weight=3),B(weight=2),C(weight=1)] and every current_weight starting at 0, the selection order is [A, B, A, C, B, A], which is a far more even distribution.</p><p>OK, round-robin is done. Note that Random and round-robin are both load-agnostic: they never look at what a node is currently doing or at the request itself (the only bookkeeping they keep is a cursor such as index or current_weight). In practice, however, we often need to pick a node based on state. Common scenarios are:</p><ol><li>Requests should go to the node with the lowest current load.</li><li>Requests of a certain kind should always go to the same node.</li></ol><p>So the next two algorithms are:</p><ol><li>Least connections / weighted least connections</li><li>Consistent hashing</li></ol><h3 id="最小链接算法"><a 
href="#最小链接算法" class="headerlink" title="最小链接算法"></a>Least connections</h3><p>Least connections is a very simple algorithm: on every request we iterate over all nodes, find the one with the fewest active connections, and forward the request to it. A connections attribute on the node records its current connection count.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">LeastConnectionStrategy</span>(<span class="title class_ inherited__">Strategy</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">get_node</span>(<span class="params">self, ctx: RequestContext</span>) -&gt; Node:</span><br><span class="line">        best = <span class="literal">None</span></span><br><span class="line">        <span class="keyword">for</span> node <span class="keyword">in</span> <span class="variable language_">self</span>.nodes:</span><br><span class="line">            <span class="keyword">if</span> <span class="keyword">not</span> node.available:</span><br><span class="line">                <span class="keyword">continue</span></span><br><span class="line">            <span class="keyword">if</span> <span class="keyword">not</span> best <span class="keyword">or</span> node.connections &lt; best.connections:</span><br><span class="line">                best = node</span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> best:</span><br><span class="line">            <span class="keyword">raise</span> NoNodesAvailableError</span><br><span class="line">        best.connections += <span class="number">1</span></span><br><span class="line">        
<span class="keyword">return</span> best</span><br></pre></td></tr></table></figure><p>OK, as usual we now need the weighted version of LeastConnection, which is slightly more involved.</p><ul><li>Let C denote the connection count, W the weight, S the selected node, and Sn any node that was not selected.</li><li>Then S must satisfy C(S) / W(S) &lt; C(Sn) / W(Sn), which can also be written as C(S) x W(Sn) &lt; C(Sn) x W(S). The second form avoids dividing by a zero weight.</li></ul><p>Let's implement it.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">WeightedLeastConnectionStrategy</span>(<span class="title class_ inherited__">LeastConnectionStrategy</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">get_node</span>(<span class="params">self, ctx: RequestContext</span>) -&gt; Node:</span><br><span class="line">        best = <span class="literal">None</span></span><br><span class="line">        <span class="keyword">for</span> node <span class="keyword">in</span> <span class="variable language_">self</span>.nodes:</span><br><span class="line">            <span class="keyword">if</span> <span class="keyword">not</span> node.available:</span><br><span class="line">                <span class="keyword">continue</span></span><br><span class="line">            <span class="keyword">if</span> <span class="keyword">not</span> best <span class="keyword">or</span> node.connections * best.weight &lt; best.connections * node.weight:</span><br><span class="line">                best = node</span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> best:</span><br><span 
class="line">            <span class="keyword">raise</span> NoNodesAvailableError</span><br><span class="line">        best.connections += <span class="number">1</span></span><br><span class="line">        <span class="keyword">return</span> best</span><br></pre></td></tr></table></figure><p>One caveat: because the weights are uneven, this can pick the same node several times in a row. One option is to collect all nodes that satisfy the condition into a list and then use the RoundRobin/Random strategies from earlier to pick one of them for forwarding.</p><p>I will not implement that here; feel free to try it yourself.</p><h3 id="一致性-Hash-算法"><a href="#一致性-Hash-算法" class="headerlink" title="一致性 Hash 算法"></a>Consistent hashing</h3><p>A common business requirement is that requests of the same kind must be forwarded to the same node. This is where consistent hashing comes in.</p><p>The most basic consistent hashing scheme hashes both the request key and the node key and forwards the request to the node whose hash is closest. All nodes are placed on a ring, and we look up the node nearest to the request on that ring.</p><p>The major problem with this is that whenever nodes are added, removed, or changed, the existing assignment has to be rebalanced, so we want to keep that disruption as small as possible.</p><p>The core idea of several mainstream consistent hashing algorithms is to solve this with virtual nodes. Each real node is mapped to multiple virtual nodes; we find the virtual node closest to the request on the ring and forward the request to the real node behind it.</p><p>This keeps the impact of node changes on requests to a minimum.</p><p>We will base our consistent hashing implementation on Google's Maglev algorithm.</p><p>First we adjust the Node code.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> dataclasses</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">@dataclasses.dataclass</span></span><br><span class="line"><span class="keyword">class</span> <span class="title 
class_">Node</span>:</span><br><span class="line">    host: <span class="built_in">str</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">    port: <span class="built_in">int</span> = <span class="number">0</span></span><br><span class="line">    node_available: <span class="built_in">bool</span> = <span class="literal">True</span></span><br><span class="line">    weight: <span class="built_in">int</span> = <span class="number">0</span></span><br><span class="line">    current_weight: <span class="built_in">int</span> = <span class="number">0</span></span><br><span class="line">    connections: <span class="built_in">int</span> = <span class="number">0</span></span><br><span class="line"></span><br><span class="line"><span class="meta">    @property</span></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">available</span>(<span class="params">self</span>) -&gt; <span class="built_in">bool</span>:</span><br><span class="line">        <span class="keyword">return</span> <span class="variable language_">self</span>.node_available</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__str__</span>(<span class="params">self</span>) -&gt; <span class="built_in">str</span>:</span><br><span class="line">        <span class="keyword">return</span> <span class="string">f&quot;<span class="subst">&#123;self.host&#125;</span>:<span class="subst">&#123;self.port&#125;</span>&quot;</span></span><br></pre></td></tr></table></figure><p>We can now use str(node) as the node key.</p><p>Next, the core idea of the Maglev algorithm (only the most simplified version is covered here; for details see the paper Maglev: A Fast and Reliable Software Network Load Balancer<a href="#refer-anchor-3"><sup>3</sup></a>).</p><p>First we fix M, the length of the <strong>lookup table</strong> that the preprocessing step produces. Every key is hashed into this <strong>lookup table</strong>, and every entry of the <strong>lookup table</strong> maps to a Node.</p><p>Building the <strong>lookup table</strong> 
takes two steps:</p><ul><li>Compute, for every node, a preference value for every <strong>lookup table</strong> slot (the permutation in the paper);</li><li>Based on these values, decide which node each <strong>lookup table</strong> slot maps to (stored in entry, which the paper calls the final lookup table).</li></ul><p>permutation is an N×M matrix whose columns correspond to <strong>lookup table</strong> slots and whose rows correspond to nodes. To compute it, pick two hash functions and use them to derive two values, offset and skip. permutation is then filled in from offset and skip as follows:</p><ol><li>offset = hash1(name[i]) mod M</li><li>skip = hash2(name[i]) mod (M − 1) + 1</li><li>permutation[i][j] = (offset + j × skip) mod M</li></ol><p>Here hash1 and hash2 are two different hash functions; the implementation below uses mmh3 and xxhash. M should be a prime number, so that every skip value is coprime with M and each row is a full permutation of 0..M-1.</p><p>With that, the lookup table can be computed as follows:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">calculate_lookup_table</span>(<span class="params">n: <span class="built_in">int</span>, m: <span class="built_in">int</span>, permutation: <span class="built_in">list</span>[<span class="built_in">list</span>[<span class="built_in">int</span>]]</span>) -&gt; <span class="built_in">list</span>[<span class="built_in">int</span>]:</span><br><span class="line">    <span class="comment"># result 
is the final lookup table: result[x] holds the index of the node that owns slot x</span></span><br><span class="line">    result: <span class="built_in">list</span>[<span class="built_in">int</span>] = [-<span class="number">1</span>] * m</span><br><span class="line">    <span class="comment"># next[i] is the position in node i's permutation to try next. When the slot a node</span></span><br><span class="line">    <span class="comment"># wants is already taken, next[i] is advanced until an empty slot is found.</span></span><br><span class="line">    <span class="comment"># Each row of permutation contains every value in 0..M-1 (for prime M), so a free slot always turns up.</span></span><br><span class="line">    <span class="comment"># Time complexity: O(M log M) on average, O(M^2) in the worst case</span></span><br><span class="line">    <span class="built_in">next</span>: <span class="built_in">list</span>[<span class="built_in">int</span>] = [<span class="number">0</span>] * n</span><br><span class="line">    flag = <span class="number">0</span></span><br><span class="line">    <span class="keyword">while</span> <span class="literal">True</span>:</span><br><span class="line">        <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(n):</span><br><span class="line">            x = permutation[i][<span class="built_in">next</span>[i]]</span><br><span class="line">            <span class="keyword">while</span> <span class="literal">True</span>:</span><br><span class="line">                <span class="comment"># Empty slot found, stop searching</span></span><br><span class="line">                <span class="keyword">if</span> result[x] == -<span class="number">1</span>:</span><br><span class="line">                    <span class="keyword">break</span></span><br><span class="line">                <span class="built_in">next</span>[i] += <span class="number">1</span></span><br><span class="line">                x = permutation[i][<span class="built_in">next</span>[i]]</span><br><span class="line">            result[x] = i</span><br><span class="line">            <span class="built_in">next</span>[i] += <span class="number">1</span></span><br><span class="line">            flag += 
<span class="number">1</span></span><br><span class="line">            <span class="comment"># The table is full, stop</span></span><br><span class="line">            <span class="keyword">if</span> flag == m:</span><br><span class="line">                <span class="keyword">return</span> result</span><br></pre></td></tr></table></figure><p>As we can see, this loop is guaranteed to terminate, but the worst-case complexity is high and can reach O(M^2). The paper recommends choosing M much larger than N (To avoid this happening we always choose M such that M ≫ N.), which keeps the average complexity at O(M log M).</p><p>The figure from the paper shows how much rebalancing happens overall when a node is removed.</p><p><img src="https://user-images.githubusercontent.com/7054676/82696622-f73b2800-9c99-11ea-8d14-08f67487f3b9.png" alt="Maglev"></p><p>Now let's implement Maglev in full. We use the url of the request as the hash key, so RequestContext needs a small change.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> dataclasses</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">@dataclasses.dataclass</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">RequestContext</span>:</span><br><span class="line">    url: <span class="built_in">str</span> = <span class="string">&quot;&quot;</span></span><br></pre></td></tr></table></figure><p>Now for the rest of the implementation.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> mmh3</span><br><span class="line"><span class="keyword">import</span> xxhash</span><br><span class="line"></span><br><span class="line">M = <span class="number">65537</span>  <span class="comment"># prime, and much larger than the number of nodes</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">MaglevStrategy</span>(<span class="title class_ inherited__">Strategy</span>):</span><br><span class="line"><span class="meta">    @staticmethod</span></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">calculate_lookup_table</span>(<span class="params">n: <span class="built_in">int</span>, m: <span class="built_in">int</span>, permutations: <span class="built_in">list</span>[<span class="built_in">list</span>[<span class="built_in">int</span>]]</span>) -&gt; <span class="built_in">list</span>[<span class="built_in">int</span>]:</span><br><span class="line">        <span class="comment"># 
result is the final lookup table: result[x] holds the index of the node that owns slot x</span></span><br><span class="line">        result: <span class="built_in">list</span>[<span class="built_in">int</span>] = [-<span class="number">1</span>] * m</span><br><span class="line">        <span class="comment"># next[i] is the position in node i's permutation to try next. When the slot a node</span></span><br><span class="line">        <span class="comment"># wants is already taken, next[i] is advanced until an empty slot is found.</span></span><br><span class="line">        <span class="comment"># Each row of permutations contains every value in 0..M-1 (for prime M), so a free slot always turns up.</span></span><br><span class="line">        <span class="comment"># Time complexity: O(M log M) on average, O(M^2) in the worst case</span></span><br><span class="line">        <span class="built_in">next</span>: <span class="built_in">list</span>[<span class="built_in">int</span>] = [<span class="number">0</span>] * n</span><br><span class="line">        flag = <span class="number">0</span></span><br><span class="line">        <span class="keyword">while</span> <span class="literal">True</span>:</span><br><span class="line">            <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(n):</span><br><span class="line">                x = permutations[i][<span class="built_in">next</span>[i]]</span><br><span class="line">                <span class="keyword">while</span> <span class="literal">True</span>:</span><br><span class="line">                    <span class="comment"># Empty slot found, stop searching</span></span><br><span class="line">                    <span class="keyword">if</span> result[x] == -<span class="number">1</span>:</span><br><span class="line">                        <span class="keyword">break</span></span><br><span class="line">                    <span class="built_in">next</span>[i] += <span class="number">1</span></span><br><span class="line">                    x = permutations[i][<span class="built_in">next</span>[i]]</span><br><span class="line">                result[x] = i</span><br><span class="line">                <span class="built_in">next</span>[i] += 
<span class="number">1</span></span><br><span class="line">                flag += <span class="number">1</span></span><br><span class="line">                <span class="comment"># The table is full, stop</span></span><br><span class="line">                <span class="keyword">if</span> flag == m:</span><br><span class="line">                    <span class="keyword">return</span> result</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, nodes: <span class="built_in">list</span>[Node]</span>) -&gt; <span class="literal">None</span>:</span><br><span class="line">        <span class="built_in">super</span>().__init__(nodes)</span><br><span class="line">        permutations = []</span><br><span class="line">        <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(<span class="built_in">len</span>(nodes)):</span><br><span class="line">            permutation = [<span class="number">0</span>] * M</span><br><span class="line">            offset = mmh3.<span class="built_in">hash</span>(<span class="built_in">str</span>(nodes[i])) % M</span><br><span class="line">            skip = (xxhash.xxh32(<span class="built_in">str</span>(nodes[i])).intdigest() % (M - <span class="number">1</span>)) + <span class="number">1</span></span><br><span class="line">            <span class="keyword">for</span> j <span class="keyword">in</span> <span class="built_in">range</span>(M):</span><br><span class="line">                permutation[j] = (offset + j * skip) % M</span><br><span class="line">            permutations.append(permutation)</span><br><span class="line">        <span class="variable language_">self</span>.tables = <span class="variable language_">self</span>.calculate_lookup_table(<span class="built_in">len</span>(nodes), M, permutations)</span><br><span class="line"></span><br><span class="line">    <span 
class="keyword">def</span> <span class="title function_">get_node</span>(<span class="params">self, ctx: RequestContext</span>) -&gt; Node:</span><br><span class="line">        hash_value = mmh3.<span class="built_in">hash</span>(ctx.url)</span><br><span class="line">        index = hash_value % M</span><br><span class="line">        node_index = <span class="variable language_">self</span>.tables[index]</span><br><span class="line">        <span class="keyword">return</span> <span class="variable language_">self</span>.nodes[node_index]</span><br></pre></td></tr></table></figure><p>If you are interested in Google's Maglev system as a whole, see an earlier post of mine, 简单聊聊 Maglev ，来自 Google 的软负载均衡实践 (A Brief Look at Maglev, Google's Software Load Balancing Practice)<a href="#refer-anchor-4"><sup>4</sup></a></p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>That wraps up this round of load balancing algorithms. Real-world work involves more composite scenarios, such as sharding combined with round-robin, but the overall approach does not change much. I hope you enjoyed it.</p><h2 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h2><div id="refer-anchor-1"></div><ul><li><a href="https://github.com/Zheaoli/load-balancer-algorithm">https://github.com/Zheaoli/load-balancer-algorithm</a></li></ul><div id="refer-anchor-2"></div><ul><li><a href="https://github.com/nginx/nginx/commit/52327e0627f49dbda1e8db695e63a4b0af4448b1">https://github.com/nginx/nginx/commit/52327e0627f49dbda1e8db695e63a4b0af4448b1</a></li></ul><div id="refer-anchor-3"></div><ul><li><a href="https://research.google/pubs/maglev-a-fast-and-reliable-software-network-load-balancer/">https://research.google/pubs/maglev-a-fast-and-reliable-software-network-load-balancer/</a></li></ul><div id="refer-anchor-4"></div><ul><li><a href="https://www.manjusaka.blog/posts/2020/05/22/a-simple-introduction-about-maglev">https://www.manjusaka.blog/posts/2020/05/22/a-simple-introduction-about-maglev</a></li></ul>]]></content>
    
    
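    <summary type="html">&lt;p&gt;This post sat unfinished for a long time, and I finally decided to sit down and write it properly. It walks through implementations of some common load balancing algorithms. All of the code in this post will eventually live in the &lt;strong&gt;load-balancer-algorithm&lt;/strong&gt;&lt;a href=&quot;#refer-anchor-1&quot;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; repo&lt;/p&gt;
&lt;p&gt;&lt;del&gt;I have never once enjoyed writing blog posts&lt;/del&gt;&lt;/p&gt;</summary>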
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="网络" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/%E7%BD%91%E7%BB%9C/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="Linux" scheme="https://www.manjusaka.blog/tags/Linux/"/>
    
    <category term="笔记" scheme="https://www.manjusaka.blog/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="水文" scheme="https://www.manjusaka.blog/tags/%E6%B0%B4%E6%96%87/"/>
    
  </entry>
  
  <entry>
    <title>A Quick Rant About the Taiwanese Dub of Laid-Back Camp</title>
    <link href="https://www.manjusaka.blog/posts/2025/02/03/a-little-thought-about-translation-about-yuru-camp/"/>
    <id>https://www.manjusaka.blog/posts/2025/02/03/a-little-thought-about-translation-about-yuru-camp/</id>
    <published>2025-02-03T18:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
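    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>I watched the PV that aired for the Taiwanese Mandarin dub of Laid-Back Camp (Yuru Camp). There are a few things I really want to complain about, so here is a short post.</p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="正文"></a>Main Text</h2><p>Here are the two versions for comparison.</p><p>The original first:</p><div id="dplayer0" class="dplayer" style="margin-bottom: 20px;"></div><script>var dplayer0 = new DPlayer({"element":document.getElementById("dplayer0"),"autoplay":false,"theme":"#FADFA3","loop":true,"screenshot":true,"hotkey":true,"preload":"auto","video":{"url":"/videos/yuanban_yuru_S1E1.mp4"}});</script><p>And the Taiwanese dub:</p><div id="dplayer1" class="dplayer" style="margin-bottom: 20px;"></div><script>var dplayer1 = new DPlayer({"element":document.getElementById("dplayer1"),"autoplay":false,"theme":"#FADFA3","loop":true,"screenshot":true,"hotkey":true,"preload":"auto","video":{"url":"/videos/taipei_yuru_S1E1.mp4"}});</script><p>Now let's talk about what I think went wrong with this dub.</p><p>The clip is the very opening of Laid-Back Camp S1E1. It is a flash-forward: the five girls' camping trip at Mount Fuji is shown first and then picked up again at the end of the season.</p><p>The effect and the intent of this are both clear. The core of it is:</p><blockquote><p>Let the voice actors establish <strong>each character's personality</strong> through voice alone</p></blockquote><p>The two characters I find most problematic here are:</p><ol><li>大垣千明 (Chiaki Ogaki)</li><li>齐藤惠那 (Ena Saito)</li></ol><p>犬山葵 (Aoi Inuyama) has plenty of problems too, but a character who speaks in dialect is genuinely hard to get right, so I will not go into that.</p><p>The problem with both characters is the same: <strong>the voice actors misjudged the characters' personalities</strong>. Chiaki is crisp and decisive. Within the group of four she is the leader type, a bit like Taichi in Digimon: she likes to joke around and is something of a tomboy. Ena is similar to Chiaki in temperament but somewhat more girlish, which is why she becomes the glue between the group of four and Rin.</p><p>In the original, Chiaki's crisp delivery holds the character together from start to finish. For Ena's two main lines, “犬山同学，这样可以吗？” ("Inuyama, is this okay?") and “欸？真的吗？” ("Eh? Really?"), the voice actor cleverly switches vocal delivery between them, and the character instantly feels three-dimensional.</p><p>In the Taiwanese dub, both voices come across as very sticky. To put it less politely, none of the five voices manages to establish a character: they are voices that belong in a textbook recording, not in an anime.</p><p>The Taiwanese dub has another fairly obvious problem: the actors misread the emotion. Take Ena's classic “欸？真的吗？” again (Inuyama has just warned her not to hold the marshmallow too close to the fire or it will burn). In the original it carries the surprise of having learned something new; in the Taiwanese dub the voice shows a hint of anxiety instead. I do not think a competent voice actor should make this kind of unforced mistake.</p><p>Another point that many people overlook is that Japanese and Chinese differ in rhythm and speaking habits, so lines may need some adjustment when a work is localized. In the original, for example, Chiaki says “欸！来芝麻凛的可可”(ほい しまりんココア一丁). “一丁” is a very colloquial Japanese expression, and the voice actor stresses it to sound like a waiter calling out an order, which brings out Chiaki's playful side. The Taiwanese dub renders it directly as “来，志摩凛的可可”, which is not well localized. If it were me, I would probably go for something more natural in Chinese, such as the colloquial “来，志摩凛，你可可来咯！”.</p><p>The works once dubbed by the Shanghai Film Dubbing Studio (上海电影译制厂) did this very well. One example: in episode 44 of Ultraman 80, 激ファイト！ 80vsウルトラセブン (Fierce Fight! 80 vs. Ultraseven), a biker delinquent who is being chased relentlessly by a fake Seven says “まだ 
ついてきやがる チクショウ”, literally "Damn it, he's still following me." The veterans at the Shanghai studio rendered it as “赛文还在追我，TMD” ("Seven's still chasing me, damn it") and delivered it in a thoroughly thuggish voice. I think that is an excellent positive example of localization.</p><p>Here is a clip so you can hear it for yourself.</p><div id="dplayer2" class="dplayer" style="margin-bottom: 20px;"></div><script>var dplayer2 = new DPlayer({"element":document.getElementById("dplayer2"),"autoplay":false,"theme":"#FADFA3","loop":true,"screenshot":true,"hotkey":true,"preload":"auto","video":{"url":"/videos/ultra_seven.mp4"}});</script><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>That is about all the complaining I have. Translation is hard, thankless work, and I hope everyone can be forgiving. I also hope translators from different places keep bringing us the surprises that come from different cultures meeting.</p><p>That's it. Happy New Year, everyone.</p>]]></content>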
    
    
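    <summary type="html">&lt;p&gt;I watched the PV that aired for the Taiwanese Mandarin dub of Laid-Back Camp (Yuru Camp). There are a few things I really want to complain about, so here is a short post.&lt;/p&gt;</summary>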
    
    
    
    <category term="杂记" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/"/>
    
    <category term="ACG" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/ACG/"/>
    
    <category term="摇曳露营" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/ACG/%E6%91%87%E6%9B%B3%E9%9C%B2%E8%90%A5/"/>
    
    
    <category term="摇曳露营" scheme="https://www.manjusaka.blog/tags/%E6%91%87%E6%9B%B3%E9%9C%B2%E8%90%A5/"/>
    
    <category term="动画" scheme="https://www.manjusaka.blog/tags/%E5%8A%A8%E7%94%BB/"/>
    
    <category term="随笔" scheme="https://www.manjusaka.blog/tags/%E9%9A%8F%E7%AC%94/"/>
    
  </entry>
  
  <entry>
    <title>Saka 馬鹿</title>
    <link href="https://www.manjusaka.blog/posts/2025/01/04/saka-is-baka/"/>
    <id>https://www.manjusaka.blog/posts/2025/01/04/saka-is-baka/</id>
    <published>2025-01-04T18:30:00.000Z</published>
    <updated>2026-03-29T17:00:43.284Z</updated>
    
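    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>This post is the write-up of my first talk of 2025 in our problem-solving group (刷题群). It is a full postmortem of two fairly typical rookie mistakes I made over the past few years.</p><p>I hope you enjoy it.</p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="正文"></a>Main text</h2><p>First, let's look at our architecture in abstracted form.</p><p><img src="https://i.imgur.com/4nVkyxt.png" alt="架构"></p><p>A very ordinary architecture. My two rookie mistakes happened at the data entry point and at the data landing point, respectively. OK, let's go through them one by one.</p><p>The first is an incident in which I deleted a core database. Before describing the scene, let me first explain how we managed resources at the time.</p><ol><li>We managed resources with Terraform.</li><li>Anyone familiar with Terraform knows that one of its key requirements is a medium to store the current state of the infra, so that subsequent operations can compute a diff against that state.</li><li>At the time our state was stored on the local fs, and the state files changed along with the Git repo.</li><li>We split the Terraform definitions for the AWS infra each business line needed into separate directories.</li><li>Deletion protection was not enabled on critical infrastructure.</li></ol><p><img src="https://i.imgur.com/3WQ1eSg.png" alt="目录结构"></p><p>OK, moving on. Let's relive the scene of the incident.</p><ol><li>On the day of the incident, a new business line needed an AWS Aurora instance.</li><li>I simply copied an existing directory and renamed it after the new business line.</li><li>After deleting some unneeded TF declarations, I went straight to terraform apply.</li><li>Because the old TF state file had been carried over into the new directory while the TF declarations had been changed, Terraform concluded that the existing resources needed to be deleted. I ignored the destroy notice shown during apply.</li><li>And so the database was gone.jpg</li></ol><p>Let's fast-forward to how the incident was handled.</p><ol><li>After the alert came in and we found the anomaly, we immediately interrupted the Terraform run and notified all the colleagues involved.</li><li>We cut off traffic to all related services and notified the customer support team.</li><li>We rebuilt the database from existing snapshots.</li><li>About 1.5 hours after the incident began, business traffic was restored.</li></ol><p>A very thrilling experience. We'll leave the reflection for later and fast-forward to the second incident, a CDN change incident. As before, some background first.</p><ol><li>For cost reasons and for architectural consistency, our CDN was AWS CloudFront.</li><li>A basic WAF layer sat in front of the CDN to handle some malicious traffic.</li><li>Some business scripts called the AWS API to trigger CDN invalidations.</li><li>We were working on anti-scraping at the time and needed to add some extra WAF rules.</li></ol><p>So, round two. Let's relive the scene again.</p><ol><li>I applied the officially recommended Anti Bot Rule directly to the WAF.</li><li>WAF rules did not support gradual rollout at the time, so there was no gradual rollout.</li><li>The AWS Anti Bot rule classified Android/iOS UAs as bots, so client traffic dropped to zero.</li></ol><p>Again, fast-forward to how the incident was handled.</p><ol><li>We immediately entered the circuit-breaking procedure, cut off the affected traffic, and notified the customer support team.</li><li>Because business-side calls to AWS had triggered the rate limit on our AWS account, we could not remove the WAF rule right away.</li><li>We first stopped the business scripts from making those calls.</li><li>About 40 minutes after the incident began, the AWS rate limit was lifted, and we rolled the WAF rules back to the previous version and restored business traffic.</li></ol><p>That's enough painful memories for now. Let's review what these two incidents have in common. On the non-technical side, the core issue was <strong>wishful thinking</strong> about production. So what were the technical problems?</p><ol><li>Inadequate protection settings on core infrastructure</li><li>Lack of review for changes to core infrastructure</li><li>No gradual rollout mechanism for critical changes</li><li>Lack of monitoring and governance over how business teams use infrastructure (in incident 2, without the time lost to the rate limit, the whole outage could have been kept under 10 minutes)</li></ol><p>Around these points, I gradually pushed through some improvements in the period after the incidents.</p><ol><li>We moved all Terraform state out of the local fs + Git combination and into S3, laying the groundwork for the later overhaul of the Terraform workflow.</li><li>We introduced <a href="https://www.runatlantis.io/">Atlantis</a> to manage Terraform PR review. Changes to core infrastructure require double review.</li><li>We audited the rest of our infrastructure, such as Redis/MySQL/Kafka, and enabled deletion protection / secondary confirmation across the board.</li><li>For CDN-type changes we introduced the following process (in practice, split into before and after AWS supported gradual rollout)<ol><li>Before gradual rollout was supported<ol><li>We extracted a portion of mirrored query traffic from our self-built gateway</li><li>We created a new CDN instance</li><li>After fully applying the new rules to the new instance, we replayed the traffic to validate the rules</li></ol></li><li>From around mid-2023, AWS added some gradual rollout support for rules such as WAF rules<ol><li>We create a new rule in AWS WAF whose action is count only</li><li>Once we have confirmed that the rule causes no false positives, we change the action to what we actually need</li></ol></li></ol></li><li>After the incidents we took stock of how business teams were using infrastructure APIs and fixed the related issues in one pass.</li></ol><p>Beyond that, incidents 1 and 2 leave me with a few more suggestions for anyone reading this post.</p><ol><li>After an incident, if recovery is expected to take a while, degrade the service or cut off ingress traffic right away. Otherwise traffic keeps pouring in during recovery and, combined with things like a cache avalanche, the chain reaction can sharply lengthen the recovery time.</li><li>For databases and other critical data landing points, the following actions are a must<ol><li>Backups are mandatory<ol><li>Choose the backup interval based on business importance and backup cost</li><li>Do PITR incremental backups as well as full backups</li></ol></li><li>Regularly test rebuilding from backups. The main goals are<ol><li>Verify that the backups are valid (backups of cloud-vendor databases are relatively reliable; if you take fs snapshots with in-house tools, be especially careful)</li><li>Measure recovery time at different data sizes, so that you have an expectation for the recovery window when an incident happens (in incident 1 we had never run such a drill, so we could not give any ETA at all) (our empirical rule of thumb here came out at 9 minutes of recovery time per GB)</li></ol></li></ol></li></ol><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>That's about it. I hope my talk gives you something to think about. Finally, I wish everyone a smooth new year in which everything goes your way.</p>]]></content>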
    
    
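    <summary type="html">&lt;p&gt;This post is the write-up of my first talk of 2025 in our problem-solving group (刷题群). It is a full postmortem of two fairly typical rookie mistakes I made over the past few years.&lt;/p&gt;
&lt;p&gt;I hope you enjoy it.&lt;/p&gt;</summary>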
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="随笔" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/%E9%9A%8F%E7%AC%94/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="杂记" scheme="https://www.manjusaka.blog/tags/%E6%9D%82%E8%AE%B0/"/>
    
    <category term="人生" scheme="https://www.manjusaka.blog/tags/%E4%BA%BA%E7%94%9F/"/>
    
  </entry>
  
  <entry>
    <title>本当の僕らをありがとう</title>
    <link href="https://www.manjusaka.blog/posts/2024/12/31/at-the-end-of-2024/"/>
    <id>https://www.manjusaka.blog/posts/2024/12/31/at-the-end-of-2024/</id>
    <published>2024-12-31T15:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
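    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>Every year I pick one line to sum up my year. Two years ago it was “但行好事，莫问前程” (“Just do good deeds and don't ask what lies ahead”), and last year it was “Per aspera, Ad astra”. This year I choose “本当の僕らをありがとう”.</p><p>It comes from “My Song”, an ending theme of “Angel Beats!”, and means “a thank-you to our truest selves”.</p><span id="more"></span><h2 id="开篇"><a href="#开篇" class="headerlink" title="开篇"></a>Opening</h2><p>In a sense this year was a continuation of last year. In the middle of last year I tried to jump from the 18th floor. Bipolar disorder, my lack of self-confidence, and my career anxiety kept the somatic symptoms going into this year. In a sense, this was a chaotic year.</p><p>Still, it was also a year worth reflecting on. What I accumulated this year may come back to me in some special form in the future.</p><h2 id="生活"><a href="#生活" class="headerlink" title="生活"></a>Life</h2><p>My bipolar treatment entered deep water. My condition went up and down all year, with drug-induced liver injury and the somatic symptoms from bipolar swings following me like a shadow. I did not try to pull an Assassin's Creed from the 18th floor like last year, but for me this year was even tougher than the last.</p><p>The same nightmares as ever, and the same setbacks in life kept circling around me. After the treatment plan was adjusted, my mood improved somewhat, but the side effects became a new problem.</p><p>But the days still have to be lived, and life has to go on. Love and hope still surround me, too.</p><p>You may remember that last year a cat named 小熊 (Little Bear) joined our family. A former stray, and the one we worry about most. Critically ill three times this year, and pulled through three times, with lab values stabilizing by the end of the year. The attending vet's verdict was “this one is a real survivor”. In a sense, 小熊 became the household's model of vitality this year.</p><p>The other big change for me this year was buying the camera I had long been dreaming of: a Z8 plus Nikon's f/2.8 “holy trinity” zooms, and then a Z9 in November. The shutter count is now close to 40,000. I took the dog out for photo walks, and went out on my own to shoot some beautiful lotus flowers. In the little park where I usually walk the dog, I captured lovely moments for many families. In a sense, pressing the shutter and sharing photos with lots of people were the rare times this year when my mind could relax. Hhhhhh</p><p>The rest is just rambling. I watched a lot of good anime this year: 86 EIGHTY-SIX, Blue Box, Angel Beats!, Dandadan, etc. In a sense these shows became a safe haven for my mind. Maybe that is what anime is for.jpg</p><h2 id="感情"><a href="#感情" class="headerlink" title="感情"></a>Relationship</h2><p>Our relationship entered its sixth year. Being together really is something both happy and testing.</p><p>If last year 荆澈 catching me at the moment I flew off the 18th floor gave this relationship the weight of life and death, then this year was about trying, on top of that weight, to let the scars of life and death fade.</p><p>Because my condition swung so much this year, 荆澈 took on more of the pet care and daily chores than last year, such as taking the cat for regular check-ups.</p><p>Honestly, steadily accompanying someone with a mental illness is very hard work. The first and biggest challenge is bearing a partner's emotions. So in 2025 I hope my emotions will be better under control, so that 荆澈 has more energy for the things she wants to do.</p><p>Also, in 2025 I absolutely must go on a trip with 荆澈!</p><h2 id="技术"><a href="#技术" class="headerlink" title="技术"></a>Tech</h2><p>If last year was “reform, ah no, learning has entered deep water”, then this year I would say “learning has entered the Mariana Trench”.</p><p>The more challenging thing I had to face this year was doubt about myself. “Am I suited to tech? Can I become an excellent engineer? Can I keep walking this road steadily?” For the first time in my eight years in the industry, I doubted myself.</p><p>Under that doubt, I often ended up doing things driven by anxiety, and the results of that are bound to be bad. All year I tried to make peace with myself, and to learn things in a relaxed way, with less concern for payoff and fewer fixed goals. This has pros and cons. On the plus side, driven by some passing interests, I happened to broaden my technical range (doing some AI and frontend work). But it also exposed some of my ingrained weaknesses.</p><p>After eight years of career, I seem to have settled into a comfort zone in how I think. I rely on my relatively strong ability to learn fast and land things to chew through a lot. But that also leaves my thinking noticeably shallow on many problems. Over the past year I have been adjusting these habits in various ways, and I hope to improve on this further in 2025.</p><p>On the bright side, 2024 was a year of accumulation. Besides fixing quite a few bugs for the community, I also improved considerably in fundamentals such as CPU instruction sets, assembly, computer architecture, and compiler theory. These are unlikely to pay off much in the short term, but I feel that one day they will come back to me in some form.</p><p>Finally, one picture to sum up my 2024.</p><p><img src="https://i.imgur.com/RERVPbH.png" alt=""></p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>Although I slacked off somewhat in 2024 compared with before, it was a transitional year I had to go through. Looking ahead to 2025, the core goal is still to make peace with myself. I hope that by the end of the year I can tell myself “saka, you're a pretty good person”.</p><p>As for some concrete small goals, I hope to:</p><ol><li>Take one long-distance trip with 荆澈</li><li>Join the org of one more top project and become a member/maintainer</li><li>Keep up the rhythm of solving problems every day</li><li>Publish a blog post every two weeks as far as possible (tech/life/essays)</li><li>Finish one or two frontend applications (with AI assistance) and present them in the problem-solving group</li><li>Restart the problem-solving group's charity activities (donations for problems solved, donations for running, and public talks (as a start, this Friday I will give a public postmortem in the group of the rookie technical mistakes I made over the past few years and the follow-up actions (flogging my own corpse (</li><li>Build some fun applications myself on AI agent frameworks such as Dify (I have a few ideas already Hhhh</li><li>Keep up one photography outing per week</li></ol><p>That's about it.</p><p>Thank you all for being with me in 2024. saka will always love you (</p>]]></content>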
    
    
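    <summary type="html">&lt;p&gt;Every year I pick one line to sum up my year. Two years ago it was “但行好事，莫问前程” (“Just do good deeds and don't ask what lies ahead”), and last year it was “Per aspera, Ad astra”. This year I choose “本当の僕らをありがとう”.&lt;/p&gt;
&lt;p&gt;It comes from “My Song”, an ending theme of “Angel Beats!”, and means “a thank-you to our truest selves”.&lt;/p&gt;</summary>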
    
    
    
    <category term="杂记" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/"/>
    
    <category term="总结" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/%E6%80%BB%E7%BB%93/"/>
    
    <category term="秀恩爱" scheme="https://www.manjusaka.blog/categories/%E6%9D%82%E8%AE%B0/%E6%80%BB%E7%BB%93/%E7%A7%80%E6%81%A9%E7%88%B1/"/>
    
    
    <category term="杂记" scheme="https://www.manjusaka.blog/tags/%E6%9D%82%E8%AE%B0/"/>
    
    <category term="总结" scheme="https://www.manjusaka.blog/tags/%E6%80%BB%E7%BB%93/"/>
    
  </entry>
  
  <entry>
    <title>The story behind the OpenDAL 0.51 release codename “本当の僕らをありがとう。”</title>
    <link href="https://www.manjusaka.blog/posts/2024/12/14/the-story-behind-the-opendal-0.51/"/>
    <id>https://www.manjusaka.blog/posts/2024/12/14/the-story-behind-the-opendal-0.51/</id>
    <published>2024-12-14T18:30:00.000Z</published>
    <updated>2026-03-29T17:00:43.284Z</updated>
    
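    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>The first RC of Apache OpenDAL v0.51.0 is out. This release is a bit special: it is the third release I have been responsible for, and the first one to carry a subtitle.</p><p>The subtitle of this release is “本当の僕らをありがとう”, meaning “a thank-you to the real us”.</p><p>Let me share some of the story behind this release.</p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="正文"></a>Main text</h2><p>While preparing the v0.51.0 release, I originally wanted to come up with a title to celebrate the confirmation that Laid-Back Camp (摇曳露营) S4 is in production. So, with support and advice from @xuanwo and @frostming, I started a proposal, <strong>[VOTE] Proposal: Adding Cultural References to Release Titles</strong><a href="#refer-anchor-1"><sup>1</sup></a></p><p>After proposal #5303 passed, I had planned to use a line of Shima Rin's that I really like, “ソロキャンは寂しさも 楽しむものなんだって”, to celebrate the S4 announcement. But somehow the song “My Song” suddenly started playing in my head.</p><blockquote><p>苛立ちをどこにぶつけるか, 【While I restlessly search for somewhere to turn】, <br>探してる间に终わる日, 【the day is already over】, <br>空は灰色をして, 【The sky is all gray】, <br>その先は何も见えない, 【Nothing can be seen ahead】, <br>常识ぶってる奴が笑ってる, 【The ones acting like they know it all are laughing】, <br>次はどんな嘘を言う？【What lie will they tell next?】, <br>それで得られたもの, 【What they gain that way】, <br>大事に饰っておけるの？【how could it ever be treasured?】, <br>でも明日へと 进まなきゃならない, 【But we have to move on toward tomorrow】, <br>だからこう歌うよ, 【So I sing out loud】, <br>泣いてる君こそ孤独な君こそ, 【You who are crying, you who are lonely, are the ones】, <br>正しいよ人间らしいよ, 【who are right, who are truly human】, <br>落とした涙がこう言うよ, 【The fallen tears seem to say】, <br>こんなにも美しい嘘じゃない, 【So beautiful, and no lie at all】, <br>本当の僕らをありがとう, 【thank you for the real us】, <br>叶えたい梦や, 【The dreams we hope to fulfill】, <br>届かない梦がある事, 【and the dreams that are out of reach】, <br>それ自体が梦になり希望になり, 【these themselves become a dream and turn into hope】, <br>人は生きてゆけるんだろ, 【and that is how people can go on living】, <br>扉はある そこで待っている, 【There is a door, waiting there all along】, <br>だから手を伸ばすよ, 【So reach out your hands】, <br>挫けた君にはもう一度戦える, 【So that you, who have been beaten down, can fight once more】, <br>强さと自信とこの歌を, 【I offer you this strong, confident song】, <br>落とした涙がこう言うよ, 【The fallen tears seem to say】, <br>こんなにも汚れて丑い世界で, 【In a world this filthy and ugly】, <br>出会えた奇迹にありがとう, 【thank you for the miracle of our meeting】.</p></blockquote><p>Links to the song first:</p><ol><li><strong>Readers in China</strong><a href="#refer-anchor-2"><sup>2</sup></a></li><li><strong>Readers outside China</strong><a href="#refer-anchor-3"><sup>3</sup></a></li></ol><p>Those familiar with anime have probably already recognized it: these are the lyrics of “My Song”, an insert song from episode 3 of “Angel Beats!”. In real life it is sung by the singer 中村真里奈 and included on the album “Crow Song”.</p><p>Angel Beats! is an anime made by Key. It tells the story of a group of people who each carry a particular regret from their lives and who gather in a place called the “afterlife world”. By fighting against “Angel”, they search for their regrets, finally resolve them, and are thereby able to pass on. Iwasawa Masami (岩沢まさみ), the in-story lead vocalist of “My Song”, is one of them.</p><p>Everyone's story in Angel Beats! is distinctive. The male lead, Otonashi Yuzuru (音无结弦), grew up with only his gravely ill younger sister to depend on, and resolved to devote himself to medicine in order to heal more people. But on the way to the medical school entrance exam he was caught in a landslide. In the darkness, Otonashi used his medical knowledge to help the others survive until rescue arrived, and he died of his severe injuries at the very moment the rescue team dug through the fallen rock. Before dying, he chose to donate all of his organs.</p><p>The story of the female lead, Nakamura Yuri (仲村由理), is even more harrowing. In life she lived happily in a wealthy family with three younger siblings. One afternoon, four robbers broke in while her parents were out. She was forced to find the valuables in the house, or else they would kill one of her siblings every 10 minutes. The police arrived 30 minutes later, by which time Yuri had watched all three siblings be killed one after another. “そんな人生なんて、許せないじゃない” (“How could anyone forgive a life like that”) is her cry of rage.</p><p>Masami is a different kind of character from the saint-like Otonashi and from Yuri, who came back from hell. Her life story is also tragic, yet she too never gave up. She grew up in a poor household full of domestic violence and fighting, and was saved by music. With a guitar she picked up from a garbage heap in the rain, she raced down the road of music, set on helping more people through it. But after being hit on the head with a liquor bottle, a brain hemorrhage led to a cerebral infarction, and she spent the rest of her life with aphasia.</p><p>Masami's story may feel closer to real life. Many of the band vocalists we know grew up in similar environments. If the stories of Otonashi and Yuri are too distant from us, Masami's may be one we have actually witnessed. Her persistence in adversity, and the flame of hope she lit for others through music, also move us more easily.</p><p>Someone once asked me, “Do you really think Ultraman exists in this world?” My answer was, “Yes, in my heart.”</p><p>So in this release, I want to use a lyric from a song she sang to help us all remember her. If Masami in some parallel world really came to know about this, I believe she would be happy too.</p><h2 id="最后"><a href="#最后" class="headerlink" title="最后"></a>Finally</h2><p>Let me end this post with another line from the song.</p><blockquote><p>こんなにも汚れて丑い世界で, 出会えた奇迹にありがとう / In a world this filthy and ugly, thank you for the miracle of our meeting</p></blockquote><h2 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h2><div id="refer-anchor-1"></div><ul><li>[1]. <a href="https://github.com/apache/opendal/discussions/5303">https://github.com/apache/opendal/discussions/5303</a></li></ul><div id="refer-anchor-2"></div><ul><li>[2]. <a href="https://www.bilibili.com/video/BV13x411a79f">https://www.bilibili.com/video/BV13x411a79f</a></li></ul><div id="refer-anchor-3"></div><ul><li>[3]. <a href="https://www.youtube.com/watch?v=mlUCxND9EU8">https://www.youtube.com/watch?v=mlUCxND9EU8</a></li></ul>]]></content>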
    
    
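    <summary type="html">&lt;p&gt;The first RC of Apache OpenDAL v0.51.0 is out. This release is a bit special: it is the third release I have been responsible for, and the first one to carry a subtitle.&lt;/p&gt;
&lt;p&gt;The subtitle of this release is “本当の僕らをありがとう”, meaning “a thank-you to the real us”.&lt;/p&gt;
&lt;p&gt;Let me share some of the story behind this release.&lt;/p&gt;</summary>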
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="随笔" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/%E9%9A%8F%E7%AC%94/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="杂记" scheme="https://www.manjusaka.blog/tags/%E6%9D%82%E8%AE%B0/"/>
    
    <category term="人生" scheme="https://www.manjusaka.blog/tags/%E4%BA%BA%E7%94%9F/"/>
    
  </entry>
  
  <entry>
    <title>OK, now your knowledge is mine too.jpg</title>
    <link href="https://www.manjusaka.blog/posts/2024/12/06/ok-I-got-all-you-know/"/>
    <id>https://www.manjusaka.blog/posts/2024/12/06/ok-I-got-all-you-know/</id>
    <published>2024-12-06T18:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.284Z</updated>
    
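    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>I recently explained to a friend how I go about absorbing external material. After thinking it over, I tidied up the chat log and am presenting my whole approach as a blog post. I hope it helps whoever reads it.</p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="正文"></a>Main text</h2><p>A disclaimer first: I have read very little about knowledge-management methodology. What this post describes is the method that, after years of refinement, currently suits me best. It may not suit everyone and is for reference only.</p><p>My guiding principle is: “Sometimes I don't need to be a producer of knowledge; I can be a carrier of knowledge.”</p><p>Or, to borrow the phrasing of duck typing: “If you know how a project works, understand its details, and can walk someone else through those details in full, then it is a project you have done.”</p><p>Being a competent carrier of knowledge means you need to be good, or at least reasonably good, at absorbing external material. I boil the principles down to two points:</p><ol><li>Get your mindset right</li><li>Transfer the knowledge</li></ol><p>I want to stress the importance of mindset first, and then describe how to do knowledge transfer.</p><p>My view on mindset is this: <strong>many people carry a latent sense of shame about internalizing other people's material. Faced with it, they instinctively think “I didn't build this”, “I'm so bad at this”, or “this is so hard to understand”, rather than “so I could do it this way” or “so what I did before was right”. That mindset determines how efficiently you absorb knowledge.</strong></p><p>With mindset covered, I will use an article I read earlier, <a href="https://mp.weixin.qq.com/s/vL582Eulh-s5JKt18HZpbw">Mooncake 分离式推理架构创新与实践</a> (on Mooncake's disaggregated inference architecture), as an example to roughly describe how I digest a piece of material.</p><p>The article gives a fairly clear account of the challenges Mooncake faces in certain scenarios and how they are addressed. Let's break down, passage by passage, the points I found key while digesting it.</p><p>Before going on, I summarize the process as</p><ol><li>Knowledge validation</li><li>Parallel transfer</li><li>Knowledge augmentation</li><li>Inspiration from new domains</li></ol><p>These are the four steps. In addition, when you face material that seems somewhat naive to you, there may be a further part that I call adversarial thinking: you need to ask yourself how, if it were you, you would do it better than the author.</p><p>Let's start with the first step, knowledge validation.</p><p>Take the following passages from the article mentioned above.</p><p>First passage</p><blockquote><p>Beyond the performance challenges, we also need some automated operations for large-scale inference, to reduce manual effort and focus on more important problems. To that end, we took the following measures. First, we implemented fast switchover and fast alignment of inference instances. Since GPUs are failure-prone hardware, we have hardware inspection that can quickly isolate a machine when it has problems, with manual intervention if it cannot recover within a certain time. Second, late at night, when inference load is low, we release some idle resources to run long-running or offline tasks, which are not latency-sensitive and can run asynchronously. Alternatively, we use these machines for lightweight training tasks to avoid leaving resources idle.</p></blockquote><p>Second passage</p><blockquote><p>First, for the Prefill cache-miss problem, the key is that machine B does not have the KV Cache for the hot request. Our solution is Prefill-to-Prefill cache transfer. When machine B finds that it has no KV Cache, we do not recompute, because that would take a lot of time. Instead, machine A transfers the KV Cache directly to machine B. This lets machine B break the vicious cycle, reduce TTFT pressure, and increase parallelism.<br>Second, we need to handle the Prefill-to-Decode transfer, which makes very heavy use of our RDMA network bandwidth. We therefore need a better RDMA transfer scheme. Many open-source implementations can reach the 80GE level, which is still some distance from the theoretical limit. We fine-tuned RDMA transfer so that the rate can reach 180GB per second, very close to the theoretical limit of 200GB.</p></blockquote><p>These two passages touch on two things:</p><ol><li>Automated inspection</li><li>Prefetch warm-up for the cache system</li></ol><p>This part is really a validation of knowledge I already have. I think back over details of the automated inspection systems I have built (including key points such as how to improve accuracy and reduce false alarms), and check them against the cache warm-up techniques I have used myself.</p><p>Of course, reading this part also involves some knowledge augmentation. For example, I look up the bottlenecks of current open-source RDMA solutions and read some tuning articles along the way. That way, even though I have never worked on RDMA, I gain at least a basic understanding of the area.</p><p>OK, on to the second step, parallel transfer of knowledge.</p><blockquote><p>We summarized a few key formulas:<br>Lower inference cost = more economical model architecture + cheaper hardware<br>Cheaper Long Context = faster Attention computation + smaller KVCache<br>Cheaper Generation = larger Batch Size + more Decode-friendly parallelism<br>The more economical model architecture here refers to optimizations in time and GPU memory. If we break inference cost down further, two key points emerge: the Prefill of long contexts, and the cost of Generation. For long-context prefill, we know that Attention has quadratic time complexity. As the context length grows, for example to 64K or 1M, the time required also grows quadratically, and this becomes a very critical part of our system. We therefore need dedicated optimization for this scenario. After optimizing long-context prefill, we found that, overall, generation is the main cost of the inference system, because users need ever longer outputs from the model during a conversation, and generation is a memory-bound process.</p></blockquote><p>This is where the article introduces the background of the Mooncake system near its beginning. To my mind, there is not much to absorb technically from this passage, but it contains some very good presentation techniques.</p><ol><li>Use simple, clear formulas to catch the audience's attention</li><li>Use a concise, eye-catching statement such as “for example to 64K or 1M, the time required also grows quadratically” to highlight a key point</li></ol><p>I absorb this set of techniques directly as material for preparing my own future talks. I also construct a scenario in my head: if I went back to 2021, to my promotion review at Alibaba, and had to summarize the key work I had done on the gateway over the previous year using these techniques, how would I put it?</p><p>OK, now the third step, knowledge augmentation.</p><blockquote><p>Our optimization work brought significant gains, reflected in several key metrics. First, we achieved a 10x improvement in TTFT, mainly thanks to a significant reduction in cache misses. We can currently keep the cache-miss rate below 10%, so a large amount of computation can be reused.<br>Second, we obtained roughly a 5x improvement in TBT. This is mainly because decode nodes can more than double the batch size. With mixed deployment of Prefill and Decode, the TVT (Time to Value) pressure on Decode nodes is relatively high, because Prefill computation has to be inserted between Decode computations. But if we separate Prefill and Decode, Decode nodes do not need to reserve any GPU memory for prefill, and the batch size can be increased. Even so, a larger batch size also causes TBT to drop accordingly, so under the SLO constraint we can ultimately only reach a bit more than double.<br>In overall throughput (RPM), we gained 1.7x on average, and for some simpler workloads the improvement even exceeded 5x. These results were possible because we tapped hardware resources that many current frameworks may not fully use, such as the infrastructure's RDMA communication bandwidth, memory capacity and bandwidth, and multi-level cache tools such as OSS or SSD.</p></blockquote><p>I am an SRE with an infra background, so a craving for observability is in my bones. Reading this description of Mooncake's key results, I ask myself questions such as:</p><ol><li>How large is their monitoring scale? How large is the time-series data? What solution do they use? Can open-source stacks such as Prometheus meet these needs?</li><li>Is this cache access pattern latency-sensitive? If so, how large is the overhead of monitoring metrics such as cache miss?</li></ol><p>Based on the information I already have, I run one or more rounds of mental simulation on the questions I have posed to myself (when I am tired from work, I often do this kind of simulation or thinking as a change of pace). When I get the chance, I also ask the authors about questions that came up along the way (for example, when I have thoughts about a paper or kernel patch I have read, I email the author directly). This continual simulation and exchange with peers is really a positive feedback loop for knowledge, and it helps you understand your own past work more deeply.</p><p>OK, now for inspiration from new domains.</p><p>Taking the same example, the article repeatedly mentions what they do with RDMA, which happens to be a blind spot for me. So I do some exploration along these lines:</p><ol><li>Get a clear picture of where RDMA currently stands</li><li>Find one or two papers on RDMA and skim them</li><li>Find one or two public talks to learn how RDMA is being adopted in industry</li></ol><p>My day-to-day work may be quite far from RDMA. But exploring new domains like this not only gives me a basic understanding of RDMA, it also gives me a grounded view of where the leading edge of the industry is heading.</p><p>As I see it, digesting knowledge is simply repeating these four steps over and over on material you find useful. You need to reason and think on top of the knowledge framework you already have. In the end, other people's knowledge can become your knowledge too.jpg</p><p>“We are not producers of knowledge, we are carriers of knowledge.jpg”</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>That's about it. I hope this post helps.</p>]]></content>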
    
    
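    <summary type="html">&lt;p&gt;I recently explained to a friend how I go about absorbing external material. After thinking it over, I tidied up the chat log and am presenting my whole approach as a blog post. I hope it helps whoever reads it.&lt;/p&gt;</summary>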
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="随笔" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/%E9%9A%8F%E7%AC%94/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="杂记" scheme="https://www.manjusaka.blog/tags/%E6%9D%82%E8%AE%B0/"/>
    
    <category term="人生" scheme="https://www.manjusaka.blog/tags/%E4%BA%BA%E7%94%9F/"/>
    
  </entry>
  
  <entry>
    <title>How to run CPython for WASI with WASMTIME, and then extend it with a HostFunction implemented in Python?</title>
    <link href="https://www.manjusaka.blog/posts/2024/10/02/how-to-extend-the-wasi-python-by-using-host-function-cn/"/>
    <id>https://www.manjusaka.blog/posts/2024/10/02/how-to-extend-the-wasi-python-by-using-host-function-cn/</id>
    <published>2024-10-02T13:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
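    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>Over the National Day holiday I did a little hack: using wasmtime to run the CPython virtual machine compiled to WASM/WASI bytecode, and extending it on the host side with a Host Function implemented in Python.</p><p>To be clear, this is just a personal hack. It has not been validated in any production environment. Just for fun (XDDD</p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="正文"></a>Main text</h2><p>First, a brief introduction to WASM/WASI. Here I will simply quote an AI-generated brief summary.</p><blockquote><p>WebAssembly (WASM) is a low-level programming language that runs in modern web browsers and offers near-native performance.<br>WebAssembly System Interface (WASI) is a standard extension to WASM that allows WASM programs to run outside the browser and access system resources.<br>Together, these two technologies aim to improve web application performance and make WASM usable in more environments.</p></blockquote><p>The core advantages of the WASM/WASI approach are:</p><ol><li>Cross-platform compatibility</li><li>Multi-language support via static compilation</li><li>Security from the native sandbox</li></ol><p>So WASM/WASI is not only widely used in browsers; its use is now gradually extending to the server side too. Scenarios such as Serverless Compute, Database UDFs, and Gateway Plugins are gradually taking off.</p><p>While going through the CPython code recently, I had an idea: what if I ran CPython on a WASM/WASI runtime and then extended it on the host side with a Host Function implemented in Python? That seems like it could help in scenarios such as a data PaaS that lets users upload custom code. Of course, the main reason is that the idea just seemed fun.</p><p>Before going further, let's thank one person, Brett Cannon, who almost single-handedly brought WASM/WASI support to CPython. Say it with me: thank you, Brett Cannon!</p><p>The overall evolution of WASM/WASI support in CPython is as follows:</p><ol><li>WASM was first supported via emscripten in November 2021; see BPO-40280<a href="#refer-anchor-1"><sup>1</sup></a></li><li>It became an officially supported Tier 3 platform in June 2023 (or earlier?)</li><li>In March 2024 it became an officially supported Tier 2 platform; see GH-116314<a href="#refer-anchor-2"><sup>2</sup></a></li><li>Starting with Python 3.13, the legacy emscripten-based WASM/WASI support is to be dropped</li></ol><p>OK, let's first compile CPython to WASM/WASI bytecode. You need to set up your environment beforehand and make sure WASI-SDK is installed. To save effort, I do everything in the officially provided devcontainer.</p><p>Once the devcontainer is set up in VS Code, running <code>python3 Tools/wasm/wasi.py build -- --config-cache --with-pydebug</code> does the build. To save effort, I removed the part of wasi.py that first builds a native CPython ahead of time.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 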
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">build_all</span>(<span class="params">context</span>):</span><br><span class="line">    <span class="string">&quot;&quot;&quot;Build everything.&quot;&quot;&quot;</span></span><br><span class="line">    steps = [</span><br><span class="line">            <span class="comment">#configure_build_python,</span></span><br><span class="line">            <span class="comment">#make_build_python,</span></span><br><span class="line">            configure_wasi_python,</span><br><span class="line">            make_wasi_python</span><br><span class="line">        ]</span><br></pre></td></tr></table></figure><p>After the build completes, we can run our CPython with <code>cross-build/wasm32-wasi/python.sh</code>, which is really just a wrapper around the wasmtime command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#!/bin/sh</span></span><br><span class="line"><span class="built_in">exec</span> /usr/local/bin/wasmtime run --wasm max-wasm-stack=16777216 --wasi preview2 --<span class="built_in">dir</span> /workspaces/cpython-wasi::/ --<span class="built_in">env</span> PYTHONPATH=/cross-build/wasm32-wasi/build/lib.wasi-wasm32-3.14-pydebug /workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm <span class="string">&quot;<span class="variable">$@</span>&quot;</span></span><br></pre></td></tr></table></figure><p>As we can see here, the officially recommended WASM/WASI runtime is wasmtime, so we will use wasmtime for the rest of the work.</p><p>Since we later want to extend this flow with a Host Function, let's rewrite the bash part. At first I used wasmtime's Python binding, and the code looked roughly like this:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> wasmtime <span class="keyword">import</span> Linker, Engine, Store, WasiConfig, Module, FuncType, ValType, _bindings, Config</span><br><span class="line"><span class="keyword">import</span> sys</span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">test_wasi</span>():</span><br><span class="line">    linker = Linker(Engine())</span><br><span class="line">    linker.define_wasi()</span><br><span class="line">    <span class="keyword">with</span> <span class="built_in">open</span>(<span class="string">&quot;/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm&quot;</span>, <span class="string">&quot;rb&quot;</span>) <span class="keyword">as</span> file:</span><br><span class="line">        module = Module(linker.engine, file.read())</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">foor_bar</span>(<span class="params">a, b</span>):</span><br><span class="line">        <span class="keyword">return</span> a + b</span><br><span class="line">    
linker.define_func(<span class="string">&quot;demo&quot;</span>, <span class="string">&quot;demo&quot;</span>, FuncType([ValType.i32(),ValType.i32()],[ValType.i32()]), foor_bar)</span><br><span class="line">    store = Store(linker.engine)</span><br><span class="line">    config = Config()</span><br><span class="line">    _bindings.wasmtime_config_max_wasm_stack_set(config.ptr(), <span class="number">16777216</span>)</span><br><span class="line">    wasi_config = WasiConfig()</span><br><span class="line">    <span class="comment"># wasi_config.stdin_file = sys.stdin.fileno()</span></span><br><span class="line">    <span class="comment"># wasi_config.stdout_file = sys.stdout.fileno()</span></span><br><span class="line">    <span class="comment"># wasi_config.stderr_file = sys.stderr.fileno()</span></span><br><span class="line">    wasi_config.env = [[<span class="string">&quot;PYTHONPATH&quot;</span>, <span class="string">&quot;/cross-build/wasm32-wasi/build/lib.wasi-wasm32-3.14-pydebug&quot;</span>]]</span><br><span class="line">    wasi_config.inherit_stdout()</span><br><span class="line">    wasi_config.inherit_stderr()</span><br><span class="line">    wasi_config.inherit_stdin()</span><br><span class="line">    wasi_config.preopen_dir(<span class="string">&quot;/workspaces/cpython-wasi&quot;</span>,<span class="string">&quot;/&quot;</span>)</span><br><span class="line">    store.set_wasi(wasi_config)</span><br><span class="line"></span><br><span class="line">    instance=linker.instantiate(store, module)</span><br><span class="line">    instance.exports(store)[<span class="string">&quot;_start&quot;</span>](store)</span><br><span class="line"></span><br><span class="line">test_wasi()</span><br></pre></td></tr></table></figure><p>Because wasmtime's Python binding is a direct wrapper built on ctypes, many config options are not in the publicly exposed API (for example, the code uses wasmtime_config_max_wasm_stack_set to set the WASM stack size), so many operations have to go through unexposed private APIs. That is too tricky, so I chose to reimplement this in Rust.</p><figure class="highlight 
rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">use</span> wasmtime::*;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::preview1::&#123;<span class="keyword">self</span>&#125;;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::WasiCtxBuilder;</span><br><span class="line"><span class="keyword">fn</span> <span class="title function_">main</span>() &#123;</span><br><span class="line">    <span 
class="keyword">let</span> <span class="keyword">mut </span><span class="variable">config</span> = Config::<span class="title function_ invoke__">new</span>();</span><br><span class="line">    config.<span class="title function_ invoke__">max_wasm_stack</span>(<span class="number">16777216</span>);</span><br><span class="line">    <span class="keyword">match</span> Engine::<span class="title function_ invoke__">new</span>(&amp;config) &#123;</span><br><span class="line">        <span class="title function_ invoke__">Ok</span>(engine) =&gt; &#123;</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">linker</span> = Linker::<span class="title function_ invoke__">new</span>(&amp;engine);</span><br><span class="line">            preview1::<span class="title function_ invoke__">add_to_linker_sync</span>(&amp;<span class="keyword">mut</span> linker, |t| t).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            linker.<span class="title function_ invoke__">allow_unknown_exports</span>(<span class="literal">true</span>);</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">builder</span> = WasiCtxBuilder::<span class="title function_ invoke__">new</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">inherit_stdio</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">env</span>(</span><br><span class="line">                <span class="string">&quot;PYTHONPATH&quot;</span>,</span><br><span class="line">                <span class="string">&quot;/cross-build/wasm32-wasi/build/lib.wasi-wasm32-3.14-pydebug&quot;</span>,</span><br><span class="line">            );</span><br><span class="line">            builder</span><br><span class="line">                .<span class="title function_ 
invoke__">preopened_dir</span>(</span><br><span class="line">                    <span class="string">&quot;/workspaces/cpython-wasi&quot;</span>,</span><br><span class="line">                    <span class="string">&quot;/&quot;</span>,</span><br><span class="line">                    wasmtime_wasi::DirPerms::<span class="title function_ invoke__">all</span>(),</span><br><span class="line">                    wasmtime_wasi::FilePerms::<span class="title function_ invoke__">all</span>(),</span><br><span class="line">                )</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">args</span>(&amp;[<span class="string">&quot;--&quot;</span>, <span class="string">&quot;--version&quot;</span>]);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">wasi_ctx</span> = builder.<span class="title function_ invoke__">build_p1</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">store</span> = Store::<span class="title function_ invoke__">new</span>(&amp;engine, wasi_ctx);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">module</span> = Module::<span class="title function_ invoke__">from_file</span>(</span><br><span class="line">                &amp;engine,</span><br><span class="line">                <span class="string">&quot;/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm&quot;</span>,</span><br><span class="line">            )</span><br><span class="line">            .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">instance</span> = linker.<span class="title function_ invoke__">instantiate</span>(&amp;<span class="keyword">mut</span> 
store, &amp;module).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">run</span> = instance</span><br><span class="line">                .get_typed_func::&lt;(), ()&gt;(&amp;<span class="keyword">mut</span> store, <span class="string">&quot;_start&quot;</span>)</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            run.<span class="title function_ invoke__">call</span>(&amp;<span class="keyword">mut</span> store, ()).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="title function_ invoke__">Err</span>(e) =&gt; &#123;</span><br><span class="line">            <span class="built_in">println!</span>(<span class="string">&quot;Error creating engine: &#123;:?&#125;&quot;</span>, e);</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Then we run the code, and it works!</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[root@267e91be24fd wasmtime-demo]# cargo run --release</span><br><span class="line">   Compiling wasmtime-demo v0.1.0 (/workspaces/wasmtime-demo)</span><br><span class="line">    Finished `release` profile [optimized] target(s) <span class="keyword">in</span> 1.81s</span><br><span class="line">     Running `target/release/wasmtime-demo`</span><br><span class="line">Python 
3.14.0a0</span><br></pre></td></tr></table></figure><p>Now let's extend our CPython. One caveat first: because dlopen is not supported in CPython on WASM/WASI, we have to modify CPython itself.</p><p>First, we add a new file named <code>demo.c</code> under CPython's Modules directory, with the following content:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;Python.h&gt;</span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">extern</span> <span class="type">int</span> <span class="title function_">demo</span><span class="params">(<span class="type">int</span> a, <span class="type">int</span> b)</span> &#123;</span><br><span class="line"><span class="keyword">return</span> a + b;</span><br><span class="line">&#125;</span><br><span class="line"><span class="type">static</span> PyObject *</span><br><span class="line"><span class="title function_">foo_bar</span><span class="params">(PyObject *self, PyObject *args)</span></span><br><span 
class="line">&#123;</span><br><span class="line">Py_INCREF(PyExc_TypeError);</span><br><span class="line"><span class="keyword">return</span> PyLong_FromLong((<span class="type">long</span>) demo(<span class="number">1</span>, <span class="number">2</span>));</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">static</span> PyMethodDef foomethods[] = &#123;</span><br><span class="line">&#123;<span class="string">&quot;bar&quot;</span>, foo_bar, METH_VARARGS, <span class="string">&quot;&quot;</span>&#125;,</span><br><span class="line">&#123;<span class="literal">NULL</span>, <span class="literal">NULL</span>, <span class="number">0</span>, <span class="literal">NULL</span>&#125;,</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="type">static</span> PyModuleDef foomodule = &#123;</span><br><span class="line">PyModuleDef_HEAD_INIT,</span><br><span class="line">.m_name = <span class="string">&quot;demo&quot;</span>,</span><br><span class="line">.m_doc = <span class="string">&quot;foo test module&quot;</span>,</span><br><span class="line">.m_size = <span class="number">-1</span>,</span><br><span class="line">.m_methods = foomethods,</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line">PyMODINIT_FUNC</span><br><span class="line"><span class="title function_">PyInit_demo</span><span class="params">(<span class="type">void</span>)</span></span><br><span class="line">&#123;</span><br><span class="line"><span class="keyword">return</span> PyModule_Create(&amp;foomodule);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>然后我们在 <code>Modules/Setup.bootstrap.in</code> 中加入一行</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">demo 
demo.c</span><br></pre></td></tr></table></figure><p>Next, we rerun <code>python3 Tools/wasm/wasi.py build -- --config-cache --with-pydebug</code> to generate the new WASM/WASI bytecode. Then, in the Rust code above, we change the args to <code>[&quot;--&quot;, &quot;-c&quot;, &quot;import demo; print(demo.bar())&quot;]</code> and run it again. It works!</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[root@267e91be24fd wasmtime-demo]# cargo run --release</span><br><span class="line">   Compiling wasmtime-demo v0.1.0 (/workspaces/wasmtime-demo)</span><br><span class="line">    Finished `release` profile [optimized] target(s) <span class="keyword">in</span> 1.73s</span><br><span class="line">     Running `target/release/wasmtime-demo`</span><br><span class="line">3</span><br></pre></td></tr></table></figure><p>We now have an extension module, demo.c. The problem is that its core <code>demo</code> function is hard-coded in the C source, so that is what we need to deal with next.</p><p>Conventionally, we would separate a function's declaration from its implementation so that it can be linked dynamically. WASM/WASI works in a similar way, but needs a little extra handling:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">extern</span> <span class="type">int</span> <span class="title function_">demo</span><span class="params">(<span class="type">int</span> a, <span class="type">int</span> b)</span> __<span class="title function_">attribute__</span><span class="params">((</span></span><br><span class="line"><span class="params">    __import_module__(<span class="string">&quot;demo&quot;</span>),</span></span><br><span class="line"><span class="params">    __import_name__(<span class="string">&quot;demo&quot;</span>),</span></span><br><span 
class="params">))</span>;</span><br></pre></td></tr></table></figure><p>Here we use compiler attributes (<code>__import_module__</code> and <code>__import_name__</code>) to tell the compiler at build time that the demo function is imported from the demo module. With that in place, the host can later supply it as a Host Function that follows this contract.</p><p>Then we need to tweak CPython's build script and add <code>-Wextra -Wl,--allow-undefined</code> to the compiler flags, so that the linker accepts the unresolved <code>demo</code> symbol and leaves it as an import.</p><p>Next, rerun <code>python3 Tools/wasm/wasi.py build -- --config-cache --with-pydebug</code> to generate the new WASM/WASI bytecode. If we run <code>python.sh</code> at this point, we get an error:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">Error: failed to run main module `/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm`</span><br><span class="line"></span><br><span class="line">Caused by:</span><br><span class="line">    0: failed to instantiate <span class="string">&quot;/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm&quot;</span></span><br><span class="line">    1: unknown import: `demo::demo` has not been defined</span><br></pre></td></tr></table></figure><p>As expected.</p><p>Now let's rework our Rust code:</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">use</span> wasmtime::*;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::preview1::&#123;<span class="keyword">self</span>&#125;;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::WasiCtxBuilder;</span><br><span class="line"><span class="keyword">fn</span> <span class="title function_">main</span>() &#123;</span><br><span class="line">    <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">config</span> = Config::<span class="title function_ invoke__">new</span>();</span><br><span class="line">    config.<span class="title function_ invoke__">max_wasm_stack</span>(<span class="number">16777216</span>);</span><br><span class="line">    <span class="keyword">match</span> Engine::<span class="title function_ invoke__">new</span>(&amp;config) &#123;</span><br><span class="line">        <span class="title function_ invoke__">Ok</span>(engine) =&gt; &#123;</span><br><span class="line">            <span class="keyword">let</span> <span 
class="keyword">mut </span><span class="variable">linker</span> = Linker::<span class="title function_ invoke__">new</span>(&amp;engine);</span><br><span class="line">            preview1::<span class="title function_ invoke__">add_to_linker_sync</span>(&amp;<span class="keyword">mut</span> linker, |t| t).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            linker</span><br><span class="line">                .<span class="title function_ invoke__">func_wrap</span>(<span class="string">&quot;demo&quot;</span>, <span class="string">&quot;demo&quot;</span>, |a: <span class="type">i32</span>, b: <span class="type">i32</span>| &#123;</span><br><span class="line">                    (a+b)*<span class="number">10</span></span><br><span class="line">                &#125;)</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            linker.<span class="title function_ invoke__">allow_unknown_exports</span>(<span class="literal">true</span>);</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">builder</span> = WasiCtxBuilder::<span class="title function_ invoke__">new</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">inherit_stdio</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">env</span>(</span><br><span class="line">                <span class="string">&quot;PYTHONPATH&quot;</span>,</span><br><span class="line">                <span class="string">&quot;/cross-build/wasm32-wasi/build/lib.wasi-wasm32-3.14-pydebug&quot;</span>,</span><br><span class="line">            );</span><br><span class="line">            builder</span><br><span class="line">                .<span class="title function_ invoke__">preopened_dir</span>(</span><br><span class="line">                    <span 
class="string">&quot;/workspaces/cpython-wasi&quot;</span>,</span><br><span class="line">                    <span class="string">&quot;/&quot;</span>,</span><br><span class="line">                    wasmtime_wasi::DirPerms::<span class="title function_ invoke__">all</span>(),</span><br><span class="line">                    wasmtime_wasi::FilePerms::<span class="title function_ invoke__">all</span>(),</span><br><span class="line">                )</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">args</span>(&amp;[<span class="string">&quot;--&quot;</span>, <span class="string">&quot;-c&quot;</span>, <span class="string">&quot;import demo; print(demo.bar())&quot;</span>]);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">wasi_ctx</span> = builder.<span class="title function_ invoke__">build_p1</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">store</span> = Store::<span class="title function_ invoke__">new</span>(&amp;engine, wasi_ctx);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">module</span> = Module::<span class="title function_ invoke__">from_file</span>(</span><br><span class="line">                &amp;engine,</span><br><span class="line">                <span class="string">&quot;/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm&quot;</span>,</span><br><span class="line">            )</span><br><span class="line">            .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">instance</span> = linker.<span class="title function_ invoke__">instantiate</span>(&amp;<span class="keyword">mut</span> store, &amp;module).<span 
class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">run</span> = instance</span><br><span class="line">                .get_typed_func::&lt;(), ()&gt;(&amp;<span class="keyword">mut</span> store, <span class="string">&quot;_start&quot;</span>)</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            run.<span class="title function_ invoke__">call</span>(&amp;<span class="keyword">mut</span> store, ()).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="title function_ invoke__">Err</span>(e) =&gt; &#123;</span><br><span class="line">            <span class="built_in">println!</span>(<span class="string">&quot;Error creating engine: &#123;:?&#125;&quot;</span>, e);</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Run it, and we get:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[root@267e91be24fd wasmtime-demo]# cargo run --release</span><br><span class="line">   Compiling wasmtime-demo v0.1.0 (/workspaces/wasmtime-demo)</span><br><span class="line">    Finished `release` profile [optimized] target(s) <span class="keyword">in</span> 1.79s</span><br><span class="line">     Running `target/release/wasmtime-demo`</span><br><span class="line">30</span><br></pre></td></tr></table></figure><p>As expected: the host-side closure computes (1 + 2) * 10 = 30, replacing the 3 that the hard-coded C version returned.</p><p>Good, we now support Host Functions, and we can change the logic however we like as long as we keep the function signature. But do you remember the title of this post? We want to run a Host Function implemented in Python. Emmmm, it is a bit roundabout, but not impossible. Let's bring out PyO3 and change the Rust code as follows:</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span 
class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">use</span> pyo3::prelude::*;</span><br><span class="line"><span class="keyword">use</span> pyo3::types::PyTuple;</span><br><span class="line"><span class="keyword">use</span> wasmtime::*;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::preview1::&#123;<span class="keyword">self</span>&#125;;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::WasiCtxBuilder;</span><br><span class="line"><span class="keyword">fn</span> <span class="title function_">main</span>() &#123;</span><br><span class="line">    <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">config</span> = Config::<span class="title function_ invoke__">new</span>();</span><br><span class="line">    config.<span class="title function_ invoke__">max_wasm_stack</span>(<span class="number">16777216</span>);</span><br><span class="line">    <span class="keyword">match</span> Engine::<span class="title function_ invoke__">new</span>(&amp;config) &#123;</span><br><span class="line">        <span class="title function_ invoke__">Ok</span>(engine) =&gt; &#123;</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">linker</span> = Linker::<span class="title function_ invoke__">new</span>(&amp;engine);</span><br><span class="line">            preview1::<span class="title function_ invoke__">add_to_linker_sync</span>(&amp;<span class="keyword">mut</span> linker, |t| t).<span 
class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            linker</span><br><span class="line">                .<span class="title function_ invoke__">func_wrap</span>(<span class="string">&quot;demo&quot;</span>, <span class="string">&quot;demo&quot;</span>, |a: <span class="type">i32</span>, b: <span class="type">i32</span>| &#123;</span><br><span class="line">                    Python::<span class="title function_ invoke__">with_gil</span>(|py| &#123;</span><br><span class="line">                        <span class="keyword">let</span> <span class="variable">fun</span>: Py&lt;PyAny&gt; = PyModule::<span class="title function_ invoke__">from_code_bound</span>(</span><br><span class="line">                            py,</span><br><span class="line">                            <span class="string">&quot;def example(*args, **kwargs):</span></span><br><span class="line"><span class="string">                                return (args[0] + args[1])*11&quot;</span>,</span><br><span class="line">                            <span class="string">&quot;&quot;</span>,</span><br><span class="line">                            <span class="string">&quot;&quot;</span>,</span><br><span class="line">                        )</span><br><span class="line">                        .<span class="title function_ invoke__">unwrap</span>()</span><br><span class="line">                        .<span class="title function_ invoke__">getattr</span>(<span class="string">&quot;example&quot;</span>)</span><br><span class="line">                        .<span class="title function_ invoke__">unwrap</span>()</span><br><span class="line">                        .<span class="title function_ invoke__">into</span>();</span><br><span class="line">                        <span class="keyword">let</span> <span class="variable">args</span> = PyTuple::<span class="title function_ invoke__">new_bound</span>(py, &amp;[a, b]);</span><br><span class="line">                 
       <span class="comment">// cast following to int</span></span><br><span class="line"></span><br><span class="line">                        fun.<span class="title function_ invoke__">call1</span>(py, args).<span class="title function_ invoke__">unwrap</span>().extract::&lt;<span class="type">i32</span>&gt;(py).<span class="title function_ invoke__">unwrap</span>()</span><br><span class="line">                    &#125;)</span><br><span class="line">                &#125;)</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            linker.<span class="title function_ invoke__">allow_unknown_exports</span>(<span class="literal">true</span>);</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">builder</span> = WasiCtxBuilder::<span class="title function_ invoke__">new</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">inherit_stdio</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">env</span>(</span><br><span class="line">                <span class="string">&quot;PYTHONPATH&quot;</span>,</span><br><span class="line">                <span class="string">&quot;/cross-build/wasm32-wasi/build/lib.wasi-wasm32-3.14-pydebug&quot;</span>,</span><br><span class="line">            );</span><br><span class="line">            builder</span><br><span class="line">                .<span class="title function_ invoke__">preopened_dir</span>(</span><br><span class="line">                    <span class="string">&quot;/workspaces/cpython-wasi&quot;</span>,</span><br><span class="line">                    <span class="string">&quot;/&quot;</span>,</span><br><span class="line">                    wasmtime_wasi::DirPerms::<span class="title function_ invoke__">all</span>(),</span><br><span class="line">                    
wasmtime_wasi::FilePerms::<span class="title function_ invoke__">all</span>(),</span><br><span class="line">                )</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">args</span>(&amp;[<span class="string">&quot;--&quot;</span>, <span class="string">&quot;-c&quot;</span>, <span class="string">&quot;import demo; print(demo.bar())&quot;</span>]);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">wasi_ctx</span> = builder.<span class="title function_ invoke__">build_p1</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">store</span> = Store::<span class="title function_ invoke__">new</span>(&amp;engine, wasi_ctx);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">module</span> = Module::<span class="title function_ invoke__">from_file</span>(</span><br><span class="line">                &amp;engine,</span><br><span class="line">                <span class="string">&quot;/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm&quot;</span>,</span><br><span class="line">            )</span><br><span class="line">            .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">instance</span> = linker.<span class="title function_ invoke__">instantiate</span>(&amp;<span class="keyword">mut</span> store, &amp;module).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">run</span> = instance</span><br><span class="line">                .get_typed_func::&lt;(), ()&gt;(&amp;<span class="keyword">mut</span> store, <span 
class="string">&quot;_start&quot;</span>)</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            run.<span class="title function_ invoke__">call</span>(&amp;<span class="keyword">mut</span> store, ()).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="title function_ invoke__">Err</span>(e) =&gt; &#123;</span><br><span class="line">            <span class="built_in">println!</span>(<span class="string">&quot;Error creating engine: &#123;:?&#125;&quot;</span>, e);</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>然后执行一下，得到结果</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[root@267e91be24fd wasmtime-demo]# cargo run --release</span><br><span class="line">   Compiling wasmtime-demo v0.1.0 (/workspaces/wasmtime-demo)</span><br><span class="line">    Finished `release` profile [optimized] target(s) <span class="keyword">in</span> 1.75s</span><br><span class="line">     Running `target/release/wasmtime-demo`</span><br><span class="line">33</span><br></pre></td></tr></table></figure><p>OK，我们成功了！</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><p>本文实际上是一个技术路线的 PoC，验证了特定情况下，将 Python 和 WASI 结合的可能性，但是目前也暴露出一些问题</p><ol><li>dlopen 支持的缺乏导致需要魔改 CPython runtime 本身的代码，不过根据 Brett Cannon 博客中提供的信息，有人 hack 了这一块代码提供了支持。感觉后续可以 follow up 一下</li><li>wasmtime Python binding 实在是太难用了，其实可以考虑直接基于 
PyO3 进行一次封装</li><li>利用 Rust 来处理 wasmtime ，PyO3 调用 Python 代码目前存在的问题是 Python VM 对象没法跨线程共享，可能需要自己基于 Rust 封装一套类似 Golang 这样的 channel 的思路来复用虚拟机和传递数据</li></ol><p>不过总体来说，这个 PoC 还是很有意思的，希朋友们也能玩的开心</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>参考</h2><div id="refer-anchor-1"></div><ul><li>[1]. <a href="https://bugs.python.org/issue40280">https://bugs.python.org/issue40280</a></li></ul><div id="refer-anchor-2"></div><ul><li>[2]. <a href="https://github.com/python/cpython/issues/116314">https://github.com/python/cpython/issues/116314</a></li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;国庆节搞了一个活，利用 wasmtime 来执行编译成 WASM/WASI 字节码的 CPython 虚拟机，并在宿主机一侧利用 Python 实现的 Host Function 来扩展它。&lt;/p&gt;
&lt;p&gt;再次声明一下，这个只是我个人想搞的活，没有再任何生产环境中得到验证，just for fun（XDDD&lt;/p&gt;</summary>
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/CPython/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="笔记" scheme="https://www.manjusaka.blog/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="水文" scheme="https://www.manjusaka.blog/tags/%E6%B0%B4%E6%96%87/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/tags/CPython/"/>
    
    <category term="WASI" scheme="https://www.manjusaka.blog/tags/WASI/"/>
    
    <category term="WASM" scheme="https://www.manjusaka.blog/tags/WASM/"/>
    
  </entry>
  
  <entry>
    <title>How to Run CPython for WASI Using WASMTIME and Extend It with Python-Implemented Host Functions?</title>
    <link href="https://www.manjusaka.blog/posts/2024/10/02/how-to-extend-the-wasi-python-by-using-host-function-en/"/>
    <id>https://www.manjusaka.blog/posts/2024/10/02/how-to-extend-the-wasi-python-by-using-host-function-en/</id>
    <published>2024-10-02T13:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>During the National Day holiday, I worked on a project that uses wasmtime to run a CPython virtual machine compiled to WASM/WASI bytecode, and extends it with Host Functions implemented in Python on the host side.</p><p>To be clear, this is just a personal project I wanted to work on. It has not been validated in any production environment; it is just for fun (XDDD</p><span id="more"></span><h2 id="Main-Content"><a href="#Main-Content" class="headerlink" title="Main Content"></a>Main Content</h2><p>First, let’s briefly introduce WASM/WASI. Here, I’ll directly quote a brief AI-generated summary:</p><blockquote><p>WebAssembly (WASM) is a low-level programming language that can run in modern web browsers. It provides near-native performance.<br>WebAssembly System Interface (WASI) is a standard extension of WASM that allows WASM programs to run outside the browser and access system resources.<br>These two technologies aim to improve Web application performance and make WASM available in more environments.</p></blockquote><p>The core advantages of the WASM/WASI approach are:</p><ol><li>Cross-platform compatibility</li><li>Multi-language support through static compilation</li><li>Security provided by a native sandbox</li></ol><p>Therefore, WASM/WASI is not only widely used in browsers but is also gradually expanding to the server side. Scenarios such as Serverless Compute, Database UDF, and Gateway Plugin are gradually being rolled out.</p><p>While reading CPython code recently, I suddenly had an idea: what if I use a WASM/WASI runtime to run CPython, and then extend it with Host Functions implemented in Python on the host side? This seems helpful for scenarios such as a data PaaS that lets users upload custom code. 
Of course, the main reason is that the idea just seemed fun.</p><p>Before we continue, let’s thank one person: Brett Cannon, who almost single-handedly completed CPython’s WASM/WASI support. Say thank you to Brett Cannon with me!</p><p>CPython’s WASM/WASI support evolved roughly as follows:</p><ol><li>As early as November 2021, WASM was supported through Emscripten, see BPO-40280<a href="#refer-anchor-1"><sup>1</sup></a></li><li>It became an officially supported Tier 3 platform in June 2023 (or earlier?)</li><li>It became an officially supported Tier 2 platform in March 2024, see GH-116314<a href="#refer-anchor-2"><sup>2</sup></a></li><li>Starting from Python 3.13, the older Emscripten-based approach to WASM/WASI support is abandoned</li></ol><p>OK, let’s start by compiling CPython into WASM/WASI bytecode. You need to set up your environment in advance and make sure WASI-SDK is installed. To save time, I did everything inside the official devcontainer.</p><p>After opening the devcontainer in VS Code, we can compile by executing <code>python3 Tools/wasm/wasi.py build -- --config-cache --with-pydebug</code>. 
To save time, I commented out the steps in wasi.py that pre-build the host CPython:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">build_all</span>(<span class="params">context</span>):</span><br><span class="line">    <span class="string">&quot;&quot;&quot;Build everything.&quot;&quot;&quot;</span></span><br><span class="line">    steps = [</span><br><span class="line">            <span class="comment">#configure_build_python,</span></span><br><span class="line">            <span class="comment">#make_build_python,</span></span><br><span class="line">            configure_wasi_python,</span><br><span class="line">            make_wasi_python</span><br><span class="line">        ]</span><br></pre></td></tr></table></figure><p>After compilation, we can run our CPython using <code>cross-build/wasm32-wasi/python.sh</code>. 
This is actually a wrapper for the WASMTIME command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#!/bin/sh</span></span><br><span class="line"><span class="built_in">exec</span> /usr/local/bin/wasmtime run --wasm max-wasm-stack=16777216 --wasi preview2 --<span class="built_in">dir</span> /workspaces/cpython-wasi::/ --<span class="built_in">env</span> PYTHONPATH=/cross-build/wasm32-wasi/build/lib.wasi-wasm32-3.14-pydebug /workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm <span class="string">&quot;<span class="variable">$@</span>&quot;</span></span><br></pre></td></tr></table></figure><p>We can see that the officially recommended WASM/WASI Runtime is wasmtime, so we’ll use wasmtime for our next steps.</p><p>Since we want to use Host Functions to extend this process later, we’ll rewrite the bash part. Initially, I used wasmtime’s Python binding, and the code looked roughly like this:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span 
class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> wasmtime <span class="keyword">import</span> Linker, Engine, Store, WasiConfig, Module, FuncType, ValType, _bindings, Config</span><br><span class="line"><span class="keyword">import</span> sys</span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">test_wasi</span>():</span><br><span class="line">    linker = Linker(Engine())</span><br><span class="line">    linker.define_wasi()</span><br><span class="line">    <span class="keyword">with</span> <span class="built_in">open</span>(<span class="string">&quot;/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm&quot;</span>, <span class="string">&quot;rb&quot;</span>) <span class="keyword">as</span> file:</span><br><span class="line">        module = Module(linker.engine, file.read())</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">foor_bar</span>(<span class="params">a, b</span>):</span><br><span class="line">        <span class="keyword">return</span> a + b</span><br><span class="line">    linker.define_func(<span class="string">&quot;demo&quot;</span>, <span class="string">&quot;demo&quot;</span>, FuncType([ValType.i32(),ValType.i32()],[ValType.i32()]), foor_bar)</span><br><span class="line">    store = Store(linker.engine)</span><br><span class="line">    config = Config()</span><br><span class="line">    _bindings.wasmtime_config_max_wasm_stack_set(config.ptr(), <span class="number">16777216</span>)</span><br><span class="line">    wasi_config = WasiConfig()</span><br><span class="line">    <span class="comment"># wasi_config.stdin_file = sys.stdin.fileno()</span></span><br><span class="line">    <span class="comment"># wasi_config.stdout_file = sys.stdout.fileno()</span></span><br><span class="line">    <span class="comment"># wasi_config.stderr_file 
= sys.stderr.fileno()</span></span><br><span class="line">    wasi_config.env = [[<span class="string">&quot;PYTHONPATH&quot;</span>, <span class="string">&quot;/cross-build/wasm32-wasi/build/lib.wasi-wasm32-3.14-pydebug&quot;</span>]]</span><br><span class="line">    wasi_config.inherit_stdout()</span><br><span class="line">    wasi_config.inherit_stderr()</span><br><span class="line">    wasi_config.inherit_stdin()</span><br><span class="line">    wasi_config.preopen_dir(<span class="string">&quot;/workspaces/cpython-wasi&quot;</span>,<span class="string">&quot;/&quot;</span>)</span><br><span class="line">    store.set_wasi(wasi_config)</span><br><span class="line"></span><br><span class="line">    instance=linker.instantiate(store, module)</span><br><span class="line">    instance.exports(store)[<span class="string">&quot;_start&quot;</span>](store)</span><br><span class="line"></span><br><span class="line">test_wasi()</span><br></pre></td></tr></table></figure><p>Because wasmtime’s Python binding is a thin ctypes wrapper, many config options are not exposed in its public API. Setting the maximum WASM stack size, for example, means calling wasmtime_config_max_wasm_stack_set directly, so many operations end up depending on unexposed private APIs. 
This is too tricky, so I chose to reimplement this set of operations using Rust:</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">use</span> wasmtime::*;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::preview1::&#123;<span class="keyword">self</span>&#125;;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::WasiCtxBuilder;</span><br><span class="line"><span class="keyword">fn</span> 
<span class="title function_">main</span>() &#123;</span><br><span class="line">    <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">config</span> = Config::<span class="title function_ invoke__">new</span>();</span><br><span class="line">    config.<span class="title function_ invoke__">max_wasm_stack</span>(<span class="number">16777216</span>);</span><br><span class="line">    <span class="keyword">match</span> Engine::<span class="title function_ invoke__">new</span>(&amp;config) &#123;</span><br><span class="line">        <span class="title function_ invoke__">Ok</span>(engine) =&gt; &#123;</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">linker</span> = Linker::<span class="title function_ invoke__">new</span>(&amp;engine);</span><br><span class="line">            preview1::<span class="title function_ invoke__">add_to_linker_sync</span>(&amp;<span class="keyword">mut</span> linker, |t| t).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            linker.<span class="title function_ invoke__">allow_unknown_exports</span>(<span class="literal">true</span>);</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">builder</span> = WasiCtxBuilder::<span class="title function_ invoke__">new</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">inherit_stdio</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">env</span>(</span><br><span class="line">                <span class="string">&quot;PYTHONPATH&quot;</span>,</span><br><span class="line">                <span class="string">&quot;/cross-build/wasm32-wasi/build/lib.wasi-wasm32-3.14-pydebug&quot;</span>,</span><br><span class="line">            );</span><br><span class="line">            
builder</span><br><span class="line">                .<span class="title function_ invoke__">preopened_dir</span>(</span><br><span class="line">                    <span class="string">&quot;/workspaces/cpython-wasi&quot;</span>,</span><br><span class="line">                    <span class="string">&quot;/&quot;</span>,</span><br><span class="line">                    wasmtime_wasi::DirPerms::<span class="title function_ invoke__">all</span>(),</span><br><span class="line">                    wasmtime_wasi::FilePerms::<span class="title function_ invoke__">all</span>(),</span><br><span class="line">                )</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">args</span>(&amp;[<span class="string">&quot;--&quot;</span>, <span class="string">&quot;--version&quot;</span>]);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">wasi_ctx</span> = builder.<span class="title function_ invoke__">build_p1</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">store</span> = Store::<span class="title function_ invoke__">new</span>(&amp;engine, wasi_ctx);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">module</span> = Module::<span class="title function_ invoke__">from_file</span>(</span><br><span class="line">                &amp;engine,</span><br><span class="line">                <span class="string">&quot;/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm&quot;</span>,</span><br><span class="line">            )</span><br><span class="line">            .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">instance</span> = linker.<span 
class="title function_ invoke__">instantiate</span>(&amp;<span class="keyword">mut</span> store, &amp;module).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">run</span> = instance</span><br><span class="line">                .get_typed_func::&lt;(), ()&gt;(&amp;<span class="keyword">mut</span> store, <span class="string">&quot;_start&quot;</span>)</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            run.<span class="title function_ invoke__">call</span>(&amp;<span class="keyword">mut</span> store, ()).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="title function_ invoke__">Err</span>(e) =&gt; &#123;</span><br><span class="line">            <span class="built_in">println!</span>(<span class="string">&quot;Error creating engine: &#123;:?&#125;&quot;</span>, e);</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Then we execute the code, success!</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[root@267e91be24fd wasmtime-demo]# cargo run --release</span><br><span class="line">   Compiling wasmtime-demo v0.1.0 (/workspaces/wasmtime-demo)</span><br><span class="line">    Finished `release` profile [optimized] target(s) <span class="keyword">in</span> 1.81s</span><br><span class="line">    
 Running `target/release/wasmtime-demo`</span><br><span class="line">Python 3.14.0a0</span><br></pre></td></tr></table></figure><p>Now let’s extend our CPython. Note that CPython on WASM/WASI does not support dlopen, so extension modules cannot be loaded dynamically; we have to build our module into the Python core itself.</p><p>First, we add a new file named <code>demo.c</code> to CPython’s Modules directory, with the following content:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;Python.h&gt;</span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">extern</span> <span class="type">int</span> <span class="title function_">demo</span><span class="params">(<span class="type">int</span> a, <span class="type">int</span> b)</span> &#123;</span><br><span class="line"><span class="keyword">return</span> a + b;</span><br><span class="line">&#125;</span><br><span class="line"><span 
class="type">static</span> PyObject *</span><br><span class="line"><span class="title function_">foo_bar</span><span class="params">(PyObject *self, PyObject *args)</span></span><br><span class="line">&#123;</span><br><span class="line">Py_INCREF(PyExc_TypeError);</span><br><span class="line"><span class="keyword">return</span> PyLong_FromLong((<span class="type">long</span>) demo(<span class="number">1</span>, <span class="number">2</span>));</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">static</span> PyMethodDef foomethods[] = &#123;</span><br><span class="line">&#123;<span class="string">&quot;bar&quot;</span>, foo_bar, METH_VARARGS, <span class="string">&quot;&quot;</span>&#125;,</span><br><span class="line">&#123;<span class="literal">NULL</span>, <span class="literal">NULL</span>, <span class="number">0</span>, <span class="literal">NULL</span>&#125;,</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="type">static</span> PyModuleDef foomodule = &#123;</span><br><span class="line">PyModuleDef_HEAD_INIT,</span><br><span class="line">.m_name = <span class="string">&quot;demo&quot;</span>,</span><br><span class="line">.m_doc = <span class="string">&quot;foo test module&quot;</span>,</span><br><span class="line">.m_size = <span class="number">-1</span>,</span><br><span class="line">.m_methods = foomethods,</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line">PyMODINIT_FUNC</span><br><span class="line"><span class="title function_">PyInit_demo</span><span class="params">(<span class="type">void</span>)</span></span><br><span class="line">&#123;</span><br><span class="line"><span class="keyword">return</span> PyModule_Create(&amp;foomodule);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Then we add a line in <code>Modules/Setup.bootstrap.in</code>:</p><figure 
class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">demo demo.c</span><br></pre></td></tr></table></figure><p>Next, we rerun <code>python3 Tools/wasm/wasi.py build -- --config-cache --with-pydebug</code> to generate new WASM/WASI bytecode. Then we change the args in our previous Rust code to <code>[&quot;--&quot;, &quot;-c&quot;, &quot;import demo; print(demo.bar())&quot;]</code> and run it again. Success!</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[root@267e91be24fd wasmtime-demo]# cargo run --release</span><br><span class="line">   Compiling wasmtime-demo v0.1.0 (/workspaces/wasmtime-demo)</span><br><span class="line">    Finished `release` profile [optimized] target(s) <span class="keyword">in</span> 1.73s</span><br><span class="line">     Running `target/release/wasmtime-demo`</span><br><span class="line">3</span><br></pre></td></tr></table></figure><p>Now we have an extension module, demo.c, but its core <code>demo</code> function is hardcoded in C, so we need to make it replaceable.</p><p>In ordinary native development, we would keep only the declaration of a function in our code and leave its implementation to be resolved at link time, which is what makes dynamic linking possible. 
WASM/WASI is similar, but requires additional handling:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">extern</span> <span class="type">int</span> <span class="title function_">demo</span><span class="params">(<span class="type">int</span> a, <span class="type">int</span> b)</span> __<span class="title function_">attribute__</span><span class="params">((</span></span><br><span class="line"><span class="params">    __import_module__(<span class="string">&quot;demo&quot;</span>),</span></span><br><span class="line"><span class="params">    __import_name__(<span class="string">&quot;demo&quot;</span>),</span></span><br><span class="line"><span class="params">))</span>;</span><br></pre></td></tr></table></figure><p>Here, we use the compiler’s <code>import_module</code> and <code>import_name</code> function attributes to declare, at compile time, that the demo function is imported from the demo module rather than defined locally. The host can then supply it as a Host Function, as long as it follows the same signature.</p><p>Then we need to modify CPython’s build script, adding <code>-Wextra -Wl,--allow-undefined</code> to the compilation flags so that the linker accepts the unresolved symbol.</p><p>Next, rerun <code>python3 Tools/wasm/wasi.py build -- --config-cache --with-pydebug</code> to generate new WASM/WASI bytecode. 
If we run <code>python.sh</code> at this point, we get an error:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">Error: failed to run main module `/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm`</span><br><span class="line"></span><br><span class="line">Caused by:</span><br><span class="line">    0: failed to instantiate <span class="string">&quot;/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm&quot;</span></span><br><span class="line">    1: unknown import: `demo::demo` has not been defined</span><br></pre></td></tr></table></figure><p>This is expected: the module now imports <code>demo::demo</code>, and the plain wasmtime command does not provide it.</p><p>So now let’s update our Rust code to define it:</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span 
class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">use</span> wasmtime::*;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::preview1::&#123;<span class="keyword">self</span>&#125;;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::WasiCtxBuilder;</span><br><span class="line"><span class="keyword">fn</span> <span class="title function_">main</span>() &#123;</span><br><span class="line">    <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">config</span> = Config::<span class="title function_ invoke__">new</span>();</span><br><span class="line">    config.<span class="title function_ invoke__">max_wasm_stack</span>(<span class="number">16777216</span>);</span><br><span class="line">    <span class="keyword">match</span> Engine::<span class="title function_ invoke__">new</span>(&amp;config) &#123;</span><br><span class="line">        <span class="title function_ invoke__">Ok</span>(engine) =&gt; &#123;</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">linker</span> = Linker::<span class="title function_ invoke__">new</span>(&amp;engine);</span><br><span class="line">            preview1::<span class="title function_ invoke__">add_to_linker_sync</span>(&amp;<span 
class="keyword">mut</span> linker, |t| t).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            linker</span><br><span class="line">                .<span class="title function_ invoke__">func_wrap</span>(<span class="string">&quot;demo&quot;</span>, <span class="string">&quot;demo&quot;</span>, |a: <span class="type">i32</span>, b: <span class="type">i32</span>| &#123;</span><br><span class="line">                    (a+b)*<span class="number">10</span></span><br><span class="line">                &#125;)</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            linker.<span class="title function_ invoke__">allow_unknown_exports</span>(<span class="literal">true</span>);</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">builder</span> = WasiCtxBuilder::<span class="title function_ invoke__">new</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">inherit_stdio</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">env</span>(</span><br><span class="line">                <span class="string">&quot;PYTHONPATH&quot;</span>,</span><br><span class="line">                <span class="string">&quot;/cross-build/wasm32-wasi/build/lib.wasi-wasm32-3.14-pydebug&quot;</span>,</span><br><span class="line">            );</span><br><span class="line">            builder</span><br><span class="line">                .<span class="title function_ invoke__">preopened_dir</span>(</span><br><span class="line">                    <span class="string">&quot;/workspaces/cpython-wasi&quot;</span>,</span><br><span class="line">                    <span class="string">&quot;/&quot;</span>,</span><br><span class="line">                    wasmtime_wasi::DirPerms::<span class="title function_ 
invoke__">all</span>(),</span><br><span class="line">                    wasmtime_wasi::FilePerms::<span class="title function_ invoke__">all</span>(),</span><br><span class="line">                )</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">args</span>(&amp;[<span class="string">&quot;--&quot;</span>, <span class="string">&quot;-c&quot;</span>, <span class="string">&quot;import demo; print(demo.bar())&quot;</span>]);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">wasi_ctx</span> = builder.<span class="title function_ invoke__">build_p1</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">store</span> = Store::<span class="title function_ invoke__">new</span>(&amp;engine, wasi_ctx);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">module</span> = Module::<span class="title function_ invoke__">from_file</span>(</span><br><span class="line">                &amp;engine,</span><br><span class="line">                <span class="string">&quot;/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm&quot;</span>,</span><br><span class="line">            )</span><br><span class="line">            .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">instance</span> = linker.<span class="title function_ invoke__">instantiate</span>(&amp;<span class="keyword">mut</span> store, &amp;module).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">run</span> = instance</span><br><span class="line">                .get_typed_func::&lt;(), 
()&gt;(&amp;<span class="keyword">mut</span> store, <span class="string">&quot;_start&quot;</span>)</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            run.<span class="title function_ invoke__">call</span>(&amp;<span class="keyword">mut</span> store, ()).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="title function_ invoke__">Err</span>(e) =&gt; &#123;</span><br><span class="line">            <span class="built_in">println!</span>(<span class="string">&quot;Error creating engine: &#123;:?&#125;&quot;</span>, e);</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Run it, and we get:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[root@267e91be24fd wasmtime-demo]# cargo run --release</span><br><span class="line">   Compiling wasmtime-demo v0.1.0 (/workspaces/wasmtime-demo)</span><br><span class="line">    Finished `release` profile [optimized] target(s) <span class="keyword">in</span> 1.79s</span><br><span class="line">     Running `target/release/wasmtime-demo`</span><br><span class="line">30</span><br></pre></td></tr></table></figure><p>As expected: demo(1, 2) now runs the host closure, and (1 + 2) * 10 = 30.</p><p>Alright, now we support Host Functions, and we can change the logic freely as long as we keep the function signature. But do you remember the title of this article? 
We want to execute Python-implemented Host Functions. Hmm, although it’s a bit roundabout, it’s not impossible. Let’s directly bring out PyO3 and modify our Rust code as follows:</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span 
class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">use</span> pyo3::prelude::*;</span><br><span class="line"><span class="keyword">use</span> pyo3::types::PyTuple;</span><br><span class="line"><span class="keyword">use</span> wasmtime::*;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::preview1::&#123;<span class="keyword">self</span>&#125;;</span><br><span class="line"><span class="keyword">use</span> wasmtime_wasi::WasiCtxBuilder;</span><br><span class="line"><span class="keyword">fn</span> <span class="title function_">main</span>() &#123;</span><br><span class="line">    <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">config</span> = Config::<span class="title function_ invoke__">new</span>();</span><br><span class="line">    config.<span class="title function_ invoke__">max_wasm_stack</span>(<span class="number">16777216</span>);</span><br><span class="line">    <span class="keyword">match</span> Engine::<span class="title function_ invoke__">new</span>(&amp;config) &#123;</span><br><span class="line">        <span class="title function_ invoke__">Ok</span>(engine) =&gt; &#123;</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">linker</span> = Linker::<span class="title function_ invoke__">new</span>(&amp;engine);</span><br><span class="line">            preview1::<span class="title function_ 
invoke__">add_to_linker_sync</span>(&amp;<span class="keyword">mut</span> linker, |t| t).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            linker</span><br><span class="line">                .<span class="title function_ invoke__">func_wrap</span>(<span class="string">&quot;demo&quot;</span>, <span class="string">&quot;demo&quot;</span>, |a: <span class="type">i32</span>, b: <span class="type">i32</span>| &#123;</span><br><span class="line">                    Python::<span class="title function_ invoke__">with_gil</span>(|py| &#123;</span><br><span class="line">                        <span class="keyword">let</span> <span class="variable">fun</span>: Py&lt;PyAny&gt; = PyModule::<span class="title function_ invoke__">from_code_bound</span>(</span><br><span class="line">                            py,</span><br><span class="line">                            <span class="string">&quot;def example(*args, **kwargs):</span></span><br><span class="line"><span class="string">                                return (args[0] + args[1])*11&quot;</span>,</span><br><span class="line">                            <span class="string">&quot;&quot;</span>,</span><br><span class="line">                            <span class="string">&quot;&quot;</span>,</span><br><span class="line">                        )</span><br><span class="line">                        .<span class="title function_ invoke__">unwrap</span>()</span><br><span class="line">                        .<span class="title function_ invoke__">getattr</span>(<span class="string">&quot;example&quot;</span>)</span><br><span class="line">                        .<span class="title function_ invoke__">unwrap</span>()</span><br><span class="line">                        .<span class="title function_ invoke__">into</span>();</span><br><span class="line">                        <span class="keyword">let</span> <span class="variable">args</span> = PyTuple::<span class="title 
function_ invoke__">new_bound</span>(py, &amp;[a, b]);</span><br><span class="line">                        <span class="comment">// cast following to int</span></span><br><span class="line"></span><br><span class="line">                        fun.<span class="title function_ invoke__">call1</span>(py, args).<span class="title function_ invoke__">unwrap</span>().extract::&lt;<span class="type">i32</span>&gt;(py).<span class="title function_ invoke__">unwrap</span>()</span><br><span class="line">                    &#125;)</span><br><span class="line">                &#125;)</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            linker.<span class="title function_ invoke__">allow_unknown_exports</span>(<span class="literal">true</span>);</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">builder</span> = WasiCtxBuilder::<span class="title function_ invoke__">new</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">inherit_stdio</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">env</span>(</span><br><span class="line">                <span class="string">&quot;PYTHONPATH&quot;</span>,</span><br><span class="line">                <span class="string">&quot;/cross-build/wasm32-wasi/build/lib.wasi-wasm32-3.14-pydebug&quot;</span>,</span><br><span class="line">            );</span><br><span class="line">            builder</span><br><span class="line">                .<span class="title function_ invoke__">preopened_dir</span>(</span><br><span class="line">                    <span class="string">&quot;/workspaces/cpython-wasi&quot;</span>,</span><br><span class="line">                    <span class="string">&quot;/&quot;</span>,</span><br><span class="line">                    wasmtime_wasi::DirPerms::<span 
class="title function_ invoke__">all</span>(),</span><br><span class="line">                    wasmtime_wasi::FilePerms::<span class="title function_ invoke__">all</span>(),</span><br><span class="line">                )</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            builder.<span class="title function_ invoke__">args</span>(&amp;[<span class="string">&quot;--&quot;</span>, <span class="string">&quot;-c&quot;</span>, <span class="string">&quot;import demo; print(demo.bar())&quot;</span>]);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">wasi_ctx</span> = builder.<span class="title function_ invoke__">build_p1</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="keyword">mut </span><span class="variable">store</span> = Store::<span class="title function_ invoke__">new</span>(&amp;engine, wasi_ctx);</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">module</span> = Module::<span class="title function_ invoke__">from_file</span>(</span><br><span class="line">                &amp;engine,</span><br><span class="line">                <span class="string">&quot;/workspaces/cpython-wasi/cross-build/wasm32-wasi/python.wasm&quot;</span>,</span><br><span class="line">            )</span><br><span class="line">            .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">instance</span> = linker.<span class="title function_ invoke__">instantiate</span>(&amp;<span class="keyword">mut</span> store, &amp;module).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">let</span> <span class="variable">run</span> = instance</span><br><span class="line">                
.get_typed_func::&lt;(), ()&gt;(&amp;<span class="keyword">mut</span> store, <span class="string">&quot;_start&quot;</span>)</span><br><span class="line">                .<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            run.<span class="title function_ invoke__">call</span>(&amp;<span class="keyword">mut</span> store, ()).<span class="title function_ invoke__">unwrap</span>();</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="title function_ invoke__">Err</span>(e) =&gt; &#123;</span><br><span class="line">            <span class="built_in">println!</span>(<span class="string">&quot;Error creating engine: &#123;:?&#125;&quot;</span>, e);</span><br><span class="line">            <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Then execute it, and we get the result:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[root@267e91be24fd wasmtime-demo]# cargo run --release</span><br><span class="line">   Compiling wasmtime-demo v0.1.0 (/workspaces/wasmtime-demo)</span><br><span class="line">    Finished `release` profile [optimized] target(s) <span class="keyword">in</span> 1.75s</span><br><span class="line">     Running `target/release/wasmtime-demo`</span><br><span class="line">33</span><br></pre></td></tr></table></figure><p>OK, we succeeded!</p><h2 id="Summary"><a href="#Summary" class="headerlink" title="Summary"></a>Summary</h2><p>This article is actually a Proof of Concept (PoC) for a technical route, verifying the 
possibility of combining Python and WASI in specific situations. However, it also exposes some problems:</p><ol><li>The lack of dlopen support requires modifying the CPython runtime code itself. However, according to information provided in Brett Cannon’s blog, someone has hacked this part of the code to provide support. It feels like we can follow up on this later.</li><li>The wasmtime Python binding is really difficult to use. We could consider wrapping it once based on PyO3.</li><li>Using Rust to handle wasmtime and PyO3 to call Python code currently has the problem that Python VM objects cannot be shared across threads. We might need to encapsulate a set of channels similar to Golang based on Rust to reuse the virtual machine and pass data.</li></ol><p>However, overall, this PoC is still very interesting. I hope friends can also have fun playing with it.</p><h2 id="References"><a href="#References" class="headerlink" title="References"></a>References</h2><div id="refer-anchor-1"></div><ul><li>[1]. <a href="https://bugs.python.org/issue40280">https://bugs.python.org/issue40280</a></li></ul><div id="refer-anchor-2"></div><ul><li>[2]. <a href="https://github.com/python/cpython/issues/116314">https://github.com/python/cpython/issues/116314</a></li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;During the National Day holiday, I worked on a project to use wasmtime to execute CPython virtual machine compiled into WASM/WASI bytecode, and extend it with Host Functions implemented in Python on the host side.&lt;/p&gt;
&lt;p&gt;I’d like to clarify again that this is just a personal project I wanted to work on, without any validation in production environments, just for fun (XDDD&lt;/p&gt;</summary>
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/CPython/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/tags/CPython/"/>
    
    <category term="Notes" scheme="https://www.manjusaka.blog/tags/Notes/"/>
    
    <category term="WASI" scheme="https://www.manjusaka.blog/tags/WASI/"/>
    
    <category term="WASM" scheme="https://www.manjusaka.blog/tags/WASM/"/>
    
    <category term="Blog" scheme="https://www.manjusaka.blog/tags/Blog/"/>
    
  </entry>
  
  <entry>
    <title>Debug Log: eCapture GH-604</title>
    <link href="https://www.manjusaka.blog/posts/2024/09/18/a-live-debug-ecapture-gh-604/"/>
    <id>https://www.manjusaka.blog/posts/2024/09/18/a-live-debug-ecapture-gh-604/</id>
    <published>2024-09-18T14:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>The second installment of the Debug Log series: eCapture GH-604, an issue involving Go, glibc, and static compilation.</p><p>TL;DR: in eCapture, a glibc version mismatch under static linking causes binaries built on Ubuntu to segfault on certain distributions.</p><span id="more"></span><h2 id="开篇"><a href="#开篇" class="headerlink" title="开篇"></a>Background</h2><p>First, a quick introduction to eCapture: it is an eBPF-based security tool whose core capability is decrypting TLS traffic out-of-band.</p><p>On August 25, the community reported a bug, tracked as GH-604. Its core behavior is as follows:</p><p>The binary published in the GitHub Release segfaults on Arch Linux, with an error roughly like this:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="number">2024</span><span class="number">-09</span><span class="number">-18</span>T21:<span class="number">10</span>:<span class="number">47</span>+<span class="number">08</span>:<span class="number">00</span> INF BTF bytecode mode: CORE. 
btfMode=<span class="number">0</span></span><br><span class="line"><span class="number">2024</span><span class="number">-09</span><span class="number">-18</span>T21:<span class="number">10</span>:<span class="number">47</span>+<span class="number">08</span>:<span class="number">00</span> INF module initialization. isReload=<span class="literal">false</span> moduleName=EBPFProbeOPENSSL</span><br><span class="line"><span class="number">2024</span><span class="number">-09</span><span class="number">-18</span>T21:<span class="number">10</span>:<span class="number">47</span>+<span class="number">08</span>:<span class="number">00</span> INF Module.Run()</span><br><span class="line">SIGSEGV: segmentation violation</span><br><span class="line">PC=<span class="number">0x7f29ee844696</span> m=<span class="number">5</span> sigcode=<span class="number">1</span> addr=<span class="number">0x1e83c0</span></span><br><span class="line">signal arrived during cgo execution</span><br><span class="line"></span><br><span class="line">goroutine <span class="number">19</span> gp=<span class="number">0xc0005b81c0</span> m=<span class="number">5</span> mp=<span class="number">0xc000100008</span> [syscall]:</span><br><span class="line">runtime.cgocall(<span class="number">0x10990e0</span>, <span class="number">0xc0000bca90</span>)</span><br><span class="line">        /root/.<span class="keyword">go</span>/src/runtime/cgocall.<span class="keyword">go</span>:<span class="number">167</span> +<span class="number">0x4b</span> fp=<span class="number">0xc0000bca58</span> sp=<span class="number">0xc0000bca20</span> pc=<span class="number">0x4739ab</span></span><br><span class="line">net._C2func_getaddrinfo(<span class="number">0xc00058e3c0</span>, <span class="number">0x0</span>, <span class="number">0xc0005886f0</span>, <span class="number">0xc00058a0a0</span>)</span><br><span class="line">        _cgo_gotypes.<span class="keyword">go</span>:<span class="number">108</span> +<span 
class="number">0x55</span> fp=<span class="number">0xc0000bca90</span> sp=<span class="number">0xc0000bca58</span> pc=<span class="number">0x84a7f5</span></span><br><span class="line">net._C_getaddrinfo.func1(<span class="number">0xc00058e3c0</span>, <span class="number">0x0</span>, <span class="number">0xc0005886f0</span>, <span class="number">0xc00058a0a0</span>)</span><br><span class="line">        /root/.<span class="keyword">go</span>/src/net/cgo_unix_cgo.<span class="keyword">go</span>:<span class="number">78</span> +<span class="number">0xeb</span> fp=<span class="number">0xc0000bcb48</span> sp=<span class="number">0xc0000bca90</span> pc=<span class="number">0x84af4b</span></span><br><span class="line">net._C_getaddrinfo(<span class="number">0xc00058e3c0</span>, <span class="number">0x0</span>, <span class="number">0xc0005886f0</span>, <span class="number">0xc00058a0a0</span>)</span><br><span class="line">        /root/.<span class="keyword">go</span>/src/net/cgo_unix_cgo.<span class="keyword">go</span>:<span class="number">78</span> +<span class="number">0x6c</span> fp=<span class="number">0xc0000bcbd0</span> sp=<span class="number">0xc0000bcb48</span> pc=<span class="number">0x84adac</span></span><br><span class="line">net.cgoLookupHostIP(&#123;<span class="number">0x1351556</span>, <span class="number">0x3</span>&#125;, &#123;<span class="number">0x13727d2</span>, <span class="number">0x9</span>&#125;)</span><br><span class="line">        /root/.<span class="keyword">go</span>/src/net/cgo_unix.<span class="keyword">go</span>:<span class="number">181</span> +<span class="number">0x3f9</span> fp=<span class="number">0xc0000bce38</span> sp=<span class="number">0xc0000bcbd0</span> pc=<span class="number">0x7f65b9</span></span><br><span class="line">net.cgoLookupIP.func1()</span><br><span class="line">        /root/.<span class="keyword">go</span>/src/net/cgo_unix.<span class="keyword">go</span>:<span class="number">226</span> +<span class="number">0x85</span> 
fp=<span class="number">0xc0000bcf00</span> sp=<span class="number">0xc0000bce38</span> pc=<span class="number">0x7f7145</span></span><br><span class="line">net.doBlockingWithCtx[...].func1()</span><br><span class="line">        /root/.<span class="keyword">go</span>/src/net/cgo_unix.<span class="keyword">go</span>:<span class="number">70</span> +<span class="number">0x8f</span> fp=<span class="number">0xc0000bcfe0</span> sp=<span class="number">0xc0000bcf00</span> pc=<span class="number">0x84de4f</span></span><br><span class="line">runtime.goexit(&#123;&#125;)</span><br><span class="line">        /root/.<span class="keyword">go</span>/src/runtime/asm_amd64.s:<span class="number">1700</span> +<span class="number">0x1</span> fp=<span class="number">0xc0000bcfe8</span> sp=<span class="number">0xc0000bcfe0</span> pc=<span class="number">0x482301</span></span><br><span class="line">created by net.doBlockingWithCtx[...] in goroutine <span class="number">18</span></span><br><span class="line">        /root/.<span class="keyword">go</span>/src/net/cgo_unix.<span class="keyword">go</span>:<span class="number">67</span> +<span class="number">0x3c5</span></span><br></pre></td></tr></table></figure><p>I could reproduce the same problem on Garuda. Since the maintainer has no Arch Linux environment, I took this one over.</p><p>My first step was to run the binary inside a container, where it worked fine. That suggests the problem comes from a version difference in some binary dependency. But eCapture depends on quite a few binaries, so where to start?</p><p>At this point the issue reporter supplied a key detail: the problem only appeared after v0.8.1. That makes things easy: time for <code>git bisect</code>.</p><p>The bisect pointed to commit 938fcffb95e23015af8643ae046c0e912de0a438. Looking at the code, its core changes are:</p><ol><li>Refactoring part of the Module registration logic</li><li>Introducing the Gin framework to handle configuration changes over HTTP</li></ol><p>Let's debug it. The released binary has its symbols stripped, so we first rebuild with stripping disabled, then run it under gdb and grab the stack at the moment of the crash:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line">[Switching to LWP 1772723]</span><br><span class="line">0x00007fffabe44696 in __ctype_init () from /usr/lib/libc.so.6</span><br><span class="line">(gdb) bt</span><br><span class="line">#0  0x00007fffabe44696 in __ctype_init () from /usr/lib/libc.so.6</span><br><span class="line">#1  0x00007fffabf785d1 in __libc_early_init () from /usr/lib/libc.so.6</span><br><span class="line">#2  0x000000000118729f in dl_open_worker_begin ()</span><br><span class="line">#3  0x000000000113a7b8 in _dl_catch_exception ()</span><br><span class="line">#4  0x0000000001186469 in dl_open_worker ()</span><br><span class="line">#5  0x000000000113a7b8 in _dl_catch_exception ()</span><br><span class="line">#6  0x000000000118681b in _dl_open ()</span><br><span class="line">#7  0x000000000113a8f6 in do_dlopen ()</span><br><span class="line">#8  0x000000000113a7b8 in _dl_catch_exception ()</span><br><span class="line">#9  0x000000000113a883 in _dl_catch_error ()</span><br><span class="line">#10 0x000000000113aa74 in __libc_dlopen_mode ()</span><br><span class="line">#11 0x0000000001128eb5 in module_load ()</span><br><span class="line">#12 0x0000000001129315 in __nss_module_get_function ()</span><br><span class="line">#13 0x0000000001118fec in getaddrinfo ()</span><br><span class="line">#14 0x0000000001099119 in _cgo_04fbb8f65a5f_C2func_getaddrinfo (v=0xc00013ca90) at 
cgo-gcc-prolog:60</span><br><span class="line">#15 0x0000000000481f84 in runtime.asmcgocall () at /root/.go/src/runtime/asm_amd64.s:923</span><br><span class="line">#16 0x000000c0001048c0 in ?? ()</span><br><span class="line">#17 0x000000000048045a in runtime.morestack () at /root/.go/src/runtime/asm_amd64.s:621</span><br><span class="line">#18 0x47681163f543b200 in ?? ()</span><br><span class="line">#19 0x0100000000000016 in ?? ()</span><br><span class="line">#20 0x0000000000800000 in net.(*sysDialer).dialSerial (sd=0x0, ctx=..., ras=..., ~r0=..., ~r1=...) at /root/.go/src/net/dial.go:630</span><br><span class="line">#21 0x0000000000000000 in ?? ()</span><br></pre></td></tr></table></figure><p><code>net.(*sysDialer).dialSerial</code> stands out in the trace. This function lives in the <code>net</code> package's connection setup code around <code>net.Dialer</code>, so the crash is somewhere in TCP setup. Comparing this against the code diff, we can pin it on the TCP listening path that came in with the Gin framework.</p><p>Elsewhere in the stack we also see <code>getaddrinfo</code>, the telltale sign of a DNS lookup. After changing <code>localhost:xx</code> in the code to a literal IP address, the problem disappeared, just as expected.</p><p>So we can conclude that the crash is triggered when Go calls <code>getaddrinfo</code> through CGO.</p><p>There is an earlier report in the upstream issue tracker, see <a href="https://github.com/golang/go/issues/30310">https://github.com/golang/go/issues/30310</a>. The workaround is to avoid the DNS lookup provided by glibc and use Go's built-in DNS resolver instead.</p><p>After adding <code>-tags &#39;netgo&#39;</code> to the project's build flags, the problem was resolved.</p><p>Is that the end of the story? Not quite. The underlying question remains: why does going through glibc cause a segfault at all?</p><p>Let's first reduce the reproduction to a minimal program:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> main</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> (</span><br><span class="line">    <span class="string">&quot;fmt&quot;</span></span><br><span class="line">    <span class="string">&quot;net&quot;</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line">    address := <span class="string">&quot;localhost:8080&quot;</span></span><br><span class="line"></span><br><span class="line">    listener, err := net.Listen(<span class="string">&quot;tcp&quot;</span>, address)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span 
class="line">        fmt.Println(<span class="string">&quot;Error creating listener:&quot;</span>, err)</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">defer</span> listener.Close()</span><br><span class="line"></span><br><span class="line">    fmt.Printf(<span class="string">&quot;Listening on %s\n&quot;</span>, address)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> &#123;</span><br><span class="line">        conn, err := listener.Accept()</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            fmt.Println(<span class="string">&quot;Error accepting connection:&quot;</span>, err)</span><br><span class="line">            <span class="keyword">continue</span></span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">go</span> handleConnection(conn)</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">handleConnection</span><span class="params">(conn net.Conn)</span></span> &#123;</span><br><span class="line">    <span class="keyword">defer</span> conn.Close()</span><br><span class="line"></span><br><span class="line">    buffer := <span class="built_in">make</span>([]<span class="type">byte</span>, <span class="number">1024</span>)</span><br><span class="line">    n, err := conn.Read(buffer)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        fmt.Println(<span class="string">&quot;Error reading from connection:&quot;</span>, err)</span><br><span class="line">        <span 
class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    fmt.Printf(<span class="string">&quot;Received: %s\n&quot;</span>, <span class="type">string</span>(buffer[:n]))</span><br><span class="line"></span><br><span class="line">    response := <span class="string">&quot;Hello, client!&quot;</span></span><br><span class="line">    _, err = conn.Write([]<span class="type">byte</span>(response))</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        fmt.Println(<span class="string">&quot;Error writing to connection:&quot;</span>, err)</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>First we build the reproducer with <code>CGO_ENABLED=1 go build</code>, and the result runs fine across environments. The binary built with <code>CGO_ENABLED=1 go build -ldflags &quot;-linkmode=external -extldflags -static&quot;</code>, however, does not. Why? Let's compare the disassembly.</p><p>In the binary built the first way, the <code>getaddrinfo</code> part looks like this:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">00000000004022a0 &lt;getaddrinfo@plt&gt;:</span><br><span class="line">  4022a0:ff 25 aa 3e 1e 00    jmp    *0x1e3eaa(%rip)        # 5e6150 &lt;getaddrinfo@GLIBC_2.2.5&gt;</span><br><span class="line">  4022a6:68 27 00 00 00       push   $0x27</span><br><span class="line">  4022ab:e9 70 fd ff ff       jmp    402020 &lt;_init+0x20&gt;</span><br></pre></td></tr></table></figure><p>Ah, the familiar PLT stub. This is plain dynamic linking: the symbol is resolved by the dynamic linker when the program is loaded. The binary built the second way looks different:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line">0000000000528fd0 &lt;getaddrinfo&gt;:</span><br><span class="line">  528fd0:f3 0f 1e fa          endbr64</span><br><span class="line">  528fd4:55                   push   %rbp</span><br><span class="line">  528fd5:48 89 e5             mov    %rsp,%rbp</span><br><span class="line">  528fd8:41 57                push   %r15</span><br><span class="line">  528fda:49 89 d7             mov    %rdx,%r15</span><br><span class="line">  528fdd:41 56                push   %r14</span><br><span class="line">  528fdf:41 55                push   %r13</span><br><span class="line">  528fe1:41 54                push   %r12</span><br><span class="line">  528fe3:49 89 f4             mov    %rsi,%r12</span><br><span class="line">  528fe6:53                   push   %rbx</span><br><span class="line">  528fe7:48 81 ec 38 07 00 00 sub    $0x738,%rsp</span><br><span class="line">  528fee:48 89 bd 18 f9 ff ff mov    %rdi,-0x6e8(%rbp)</span><br><span 
class="line">  528ff5:48 89 8d b0 f8 ff ff mov    %rcx,-0x750(%rbp)</span><br><span class="line">  528ffc:64 48 8b 04 25 28 00 mov    %fs:0x28,%rax</span><br><span class="line">  529003:00 00 </span><br><span class="line">  529005:48 89 45 c8          mov    %rax,-0x38(%rbp)</span><br><span class="line">  529009:31 c0                xor    %eax,%eax</span><br><span class="line">  52900b:48 c7 85 30 f9 ff ff movq   $0x0,-0x6d0(%rbp)</span><br><span class="line">  529012:00 00 00 00 </span><br><span class="line">  529016:48 85 ff             test   %rdi,%rdi</span><br><span class="line">  529019:0f 84 3a 08 00 00    je     529859 &lt;getaddrinfo+0x889&gt;</span><br><span class="line">  52901f:80 3f 2a             cmpb   $0x2a,(%rdi)</span><br><span class="line">  529022:0f 84 27 08 00 00    je     52984f &lt;getaddrinfo+0x87f&gt;</span><br><span class="line">  529028:4d 85 e4             test   %r12,%r12</span><br><span class="line">  52902b:74 0b                je     529038 &lt;getaddrinfo+0x68&gt;</span><br><span class="line">  52902d:41 80 3c 24 2a       cmpb   $0x2a,(%r12)</span><br><span class="line">  529032:0f 84 7c 0b 00 00    je     529bb4 &lt;getaddrinfo+0xbe4&gt;</span><br><span class="line">  529038:4d 85 ff             test   %r15,%r15</span><br><span class="line">  52903b:0f 84 4f 08 00 00    je     529890 &lt;getaddrinfo+0x8c0&gt;</span><br><span class="line">  529041:41 8b 07             mov    (%r15),%eax</span><br><span class="line">  529044:a9 00 f8 ff ff       test   $0xfffff800,%eax</span><br><span class="line">  529049:0f 85 6d 19 00 00    jne    52a9bc &lt;getaddrinfo+0x19ec&gt;</span><br><span class="line">  52904f:48 83 bd 18 f9 ff ff cmpq   $0x0,-0x6e8(%rbp)</span><br></pre></td></tr></table></figure><p>Most of the disassembly is omitted here. Let's use GDB to look at the key information:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">#0  0x00007fffb0044696 in __GI___ctype_init () at ctype-info.c:31</span><br><span class="line">#1  0x00007fffb01785d1 in __libc_early_init (initial=false) at libc_early_init.c:35</span><br><span class="line">#2  0x000000000059549f in dl_open_worker_begin ()</span><br><span class="line">#3  0x000000000054a5e8 in _dl_catch_exception ()</span><br><span class="line">#4  0x0000000000594669 in dl_open_worker ()</span><br><span class="line">#5  0x000000000054a5e8 in _dl_catch_exception ()</span><br><span class="line">#6  0x0000000000594a1b in _dl_open ()</span><br><span class="line">#7  0x000000000054a726 in do_dlopen ()</span><br><span class="line">#8  0x000000000054a5e8 in _dl_catch_exception ()</span><br><span class="line">#9  0x000000000054a6b3 in _dl_catch_error ()</span><br><span class="line">#10 0x000000000054a8a4 in __libc_dlopen_mode ()</span><br><span class="line">#11 0x0000000000538ce5 in module_load ()</span><br><span class="line">#12 0x0000000000539145 in __nss_module_get_function ()</span><br><span class="line">#13 0x000000000052aa3c in getaddrinfo ()</span><br><span class="line">#14 0x00000000004da549 in _cgo_04fbb8f65a5f_C2func_getaddrinfo (v=0xc0001acdd0) at /tmp/go-build/cgo-gcc-prolog:60</span><br><span class="line">#15 0x0000000000471204 in runtime.asmcgocall () at /root/.go/src/runtime/asm_amd64.s:923</span><br><span class="line">#16 0x000000c0001868c0 in ?? 
()</span><br></pre></td></tr></table></figure><p>The backtrace shows that with the second approach (external linker, static linking), the statically linked <code>getaddrinfo</code> still goes through <code>dl_open</code> at run time to load glibc's NSS modules, which pulls in the glibc of the host system.</p><p>Let's jump straight to <code>__ctype_init</code> and look at its source and disassembly. The first disassembly listing was compiled against Glibc 2.35, and the second against Glibc 2.40 on Arch Linux.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span></span><br><span class="line">__ctype_init (<span class="type">void</span>)</span><br><span class="line">&#123;</span><br><span class="line">  <span class="type">const</span> <span class="type">uint16_t</span> **bp = __libc_tsd_address (<span class="type">const</span> <span class="type">uint16_t</span> *, CTYPE_B);</span><br><span class="line">  *bp = (<span class="type">const</span> <span class="type">uint16_t</span> *) _NL_CURRENT (LC_CTYPE, _NL_CTYPE_CLASS) + <span class="number">128</span>;</span><br><span class="line">  <span class="type">const</span> <span class="type">int32_t</span> **up = __libc_tsd_address (<span class="type">const</span> <span class="type">int32_t</span> *, CTYPE_TOUPPER);</span><br><span class="line">  *up = ((<span class="type">int32_t</span> *) _NL_CURRENT (LC_CTYPE, _NL_CTYPE_TOUPPER) + <span class="number">128</span>);</span><br><span class="line">  <span class="type">const</span> <span class="type">int32_t</span> **lp = __libc_tsd_address (<span class="type">const</span> <span class="type">int32_t</span> *, CTYPE_TOLOWER);</span><br><span class="line">  *lp = ((<span class="type">int32_t</span> *) _NL_CURRENT (LC_CTYPE, _NL_CTYPE_TOLOWER) + <span class="number">128</span>);</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">000000000055aee0 &lt;__ctype_init&gt;:</span><br><span class="line">  55aee0:f3 0f 1e fa          endbr64</span><br><span class="line">  55aee4:48 c7 c0 80 ff ff ff mov    $0xffffffffffffff80,%rax</span><br><span class="line">  55aeeb:48 c7 c1 f0 ff ff ff mov    $0xfffffffffffffff0,%rcx</span><br><span class="line">  55aef2:64 48 8b 00          mov    %fs:(%rax),%rax</span><br><span class="line">  55aef6:48 8b 00             mov    (%rax),%rax</span><br><span class="line">  55aef9:48 8b 70 40          mov    0x40(%rax),%rsi</span><br><span class="line">  55aefd:48 8d 96 00 01 00 00 lea    0x100(%rsi),%rdx</span><br><span class="line">  55af04:64 48 89 11          mov    %rdx,%fs:(%rcx)</span><br><span class="line">  55af08:48 8b 78 48          mov    0x48(%rax),%rdi</span><br><span class="line">  55af0c:48 c7 c1 e8 ff ff ff mov    $0xffffffffffffffe8,%rcx</span><br><span class="line">  55af13:48 8d 97 00 02 00 00 lea    0x200(%rdi),%rdx</span><br><span class="line">  55af1a:64 48 89 11          mov    %rdx,%fs:(%rcx)</span><br><span class="line">  55af1e:48 8b 40 58          mov    0x58(%rax),%rax</span><br><span class="line">  55af22:48 
c7 c2 e0 ff ff ff mov    $0xffffffffffffffe0,%rdx</span><br><span class="line">  55af29:48 05 00 02 00 00    add    $0x200,%rax</span><br><span class="line">  55af2f:64 48 89 02          mov    %rax,%fs:(%rdx)</span><br><span class="line">  55af33:c3                   ret</span><br><span class="line">  55af34:66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)</span><br><span class="line">  55af3b:00 00 00 </span><br><span class="line">  55af3e:66 90                xchg   %ax,%ax</span><br></pre></td></tr></table></figure><p>The second listing:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">000000000111c270 &lt;__ctype_init&gt;:</span><br><span class="line"> 111c270:f3 0f 1e fa          endbr64</span><br><span class="line"> 111c274:48 c7 c0 90 ff ff ff mov    $0xffffffffffffff90,%rax</span><br><span class="line"> 111c27b:48 c7 c1 e8 ff ff ff mov    $0xffffffffffffffe8,%rcx</span><br><span class="line"> 111c282:64 48 8b 00          mov    %fs:(%rax),%rax</span><br><span class="line"> 111c286:48 8b 00             mov    (%rax),%rax</span><br><span class="line"> 111c289:48 8b 70 38          mov    0x38(%rax),%rsi</span><br><span class="line"> 111c28d:48 8d 96 00 01 00 00 lea    0x100(%rsi),%rdx</span><br><span class="line"> 111c294:64 48 89 11          mov    
%rdx,%fs:(%rcx)</span><br><span class="line"> 111c298:48 8b 78 40          mov    0x40(%rax),%rdi</span><br><span class="line"> 111c29c:48 c7 c1 e0 ff ff ff mov    $0xffffffffffffffe0,%rcx</span><br><span class="line"> 111c2a3:48 8d 97 00 02 00 00 lea    0x200(%rdi),%rdx</span><br><span class="line"> 111c2aa:64 48 89 11          mov    %rdx,%fs:(%rcx)</span><br><span class="line"> 111c2ae:48 8b 40 50          mov    0x50(%rax),%rax</span><br><span class="line"> 111c2b2:48 c7 c2 d8 ff ff ff mov    $0xffffffffffffffd8,%rdx</span><br><span class="line"> 111c2b9:48 05 00 02 00 00    add    $0x200,%rax</span><br><span class="line"> 111c2bf:64 48 89 02          mov    %rax,%fs:(%rdx)</span><br><span class="line"> 111c2c3:c3                   ret</span><br><span class="line"> 111c2c4:66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)</span><br><span class="line"> 111c2cb:00 00 00 </span><br><span class="line"> 111c2ce:66 90                xchg   %ax,%ax</span><br></pre></td></tr></table></figure><p>The two listings behave essentially the same, but the offsets clearly differ. So let's compare the source code of the two Glibc versions.</p><p>It turns out that the layout of <code>__locale_data</code> changed, so the offset of <code>_NL_CTYPE_CLASS</code> differs between versions. In 2.35 the <code>private</code> member is a struct holding a function pointer plus a union (16 bytes on x86-64), while in 2.40 it is a single <code>void *</code> (8 bytes), so every entry of <code>values</code> moves 8 bytes earlier. That matches the change from <code>0x40</code> to <code>0x38</code> in the two disassembly listings.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span 
class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">//v2.35</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> __<span 
class="title">locale_data</span></span></span><br><span class="line"><span class="class">&#123;</span></span><br><span class="line">  <span class="type">const</span> <span class="type">char</span> *name;</span><br><span class="line">  <span class="type">const</span> <span class="type">char</span> *filedata;<span class="comment">/* Region mapping the file data.  */</span></span><br><span class="line">  <span class="type">off_t</span> filesize;<span class="comment">/* Size of the file (and the region).  */</span></span><br><span class="line">  <span class="class"><span class="keyword">enum</span>/* <span class="title">Flavor</span> <span class="title">of</span> <span class="title">storage</span> <span class="title">used</span> <span class="title">for</span> <span class="title">those</span>.  */</span></span><br><span class="line"><span class="class">  &#123;</span></span><br><span class="line">    ld_malloced,<span class="comment">/* Both are malloc&#x27;d.  */</span></span><br><span class="line">    ld_mapped,<span class="comment">/* name is malloc&#x27;d, filedata mmap&#x27;d */</span></span><br><span class="line">    ld_archive<span class="comment">/* Both point into mmap&#x27;d archive regions.  */</span></span><br><span class="line">  &#125; alloc;</span><br><span class="line"></span><br><span class="line">  <span class="comment">/* This provides a slot for category-specific code to cache data computed</span></span><br><span class="line"><span class="comment">     about this locale.  That code can set a cleanup function to deallocate</span></span><br><span class="line"><span class="comment">     the data.  
*/</span></span><br><span class="line">  <span class="class"><span class="keyword">struct</span></span></span><br><span class="line"><span class="class">  &#123;</span></span><br><span class="line">    <span class="type">void</span> (*cleanup) (<span class="keyword">struct</span> __locale_data *);</span><br><span class="line">    <span class="class"><span class="keyword">union</span></span></span><br><span class="line"><span class="class">    &#123;</span></span><br><span class="line">      <span class="type">void</span> *data;</span><br><span class="line">      <span class="class"><span class="keyword">struct</span> <span class="title">lc_time_data</span> *<span class="title">time</span>;</span></span><br><span class="line">      <span class="type">const</span> <span class="class"><span class="keyword">struct</span> <span class="title">gconv_fcts</span> *<span class="title">ctype</span>;</span></span><br><span class="line">    &#125;;</span><br><span class="line">  &#125; private;</span><br><span class="line"></span><br><span class="line">  <span class="type">unsigned</span> <span class="type">int</span> usage_count;<span class="comment">/* Counter for users.  */</span></span><br><span class="line"></span><br><span class="line">  <span class="type">int</span> use_translit;<span class="comment">/* Nonzero if the mb*towv*() and wc*tomb()</span></span><br><span class="line"><span class="comment">   functions should use transliteration.  */</span></span><br><span class="line"></span><br><span class="line">  <span class="type">unsigned</span> <span class="type">int</span> nstrings;<span class="comment">/* Number of strings below.  
*/</span></span><br><span class="line">  <span class="class"><span class="keyword">union</span> <span class="title">locale_data_value</span></span></span><br><span class="line"><span class="class">  &#123;</span></span><br><span class="line">    <span class="type">const</span> <span class="type">uint32_t</span> *wstr;</span><br><span class="line">    <span class="type">const</span> <span class="type">char</span> *<span class="built_in">string</span>;</span><br><span class="line">    <span class="type">unsigned</span> <span class="type">int</span> word;<span class="comment">/* Note endian issues vs 64-bit pointers.  */</span></span><br><span class="line">  &#125;</span><br><span class="line">  values __flexarr;<span class="comment">/* Items, usually pointers into `filedata&#x27;.  */</span></span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="comment">//v2.40</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> __<span class="title">locale_data</span></span></span><br><span class="line"><span class="class">&#123;</span></span><br><span class="line">  <span class="type">const</span> <span class="type">char</span> *name;</span><br><span class="line">  <span class="type">const</span> <span class="type">char</span> *filedata;<span class="comment">/* Region mapping the file data.  */</span></span><br><span class="line">  <span class="type">off_t</span> filesize;<span class="comment">/* Size of the file (and the region).  */</span></span><br><span class="line">  <span class="class"><span class="keyword">enum</span>/* <span class="title">Flavor</span> <span class="title">of</span> <span class="title">storage</span> <span class="title">used</span> <span class="title">for</span> <span class="title">those</span>.  
*/</span></span><br><span class="line"><span class="class">  &#123;</span></span><br><span class="line">    ld_malloced,<span class="comment">/* Both are malloc&#x27;d.  */</span></span><br><span class="line">    ld_mapped,<span class="comment">/* name is malloc&#x27;d, filedata mmap&#x27;d */</span></span><br><span class="line">    ld_archive<span class="comment">/* Both point into mmap&#x27;d archive regions.  */</span></span><br><span class="line">  &#125; alloc;</span><br><span class="line"></span><br><span class="line">  <span class="comment">/* This provides a slot for category-specific code to cache data</span></span><br><span class="line"><span class="comment">     computed about this locale.  Type of the data pointed to:</span></span><br><span class="line"><span class="comment"></span></span><br><span class="line"><span class="comment">     LC_CTYPE   struct lc_ctype_data (_nl_intern_locale_data)</span></span><br><span class="line"><span class="comment">     LC_TIME    struct lc_time_data (_nl_init_alt_digit, _nl_init_era_entries)</span></span><br><span class="line"><span class="comment"></span></span><br><span class="line"><span class="comment">     This data deallocated at the start of _nl_unload_locale.  */</span></span><br><span class="line">  <span class="type">void</span> *private;</span><br><span class="line"></span><br><span class="line">  <span class="type">unsigned</span> <span class="type">int</span> usage_count;<span class="comment">/* Counter for users.  */</span></span><br><span class="line"></span><br><span class="line">  <span class="type">int</span> use_translit;<span class="comment">/* Nonzero if the mb*towv*() and wc*tomb()</span></span><br><span class="line"><span class="comment">   functions should use transliteration.  */</span></span><br><span class="line"></span><br><span class="line">  <span class="type">unsigned</span> <span class="type">int</span> nstrings;<span class="comment">/* Number of strings below.  
*/</span></span><br><span class="line">  <span class="class"><span class="keyword">union</span> <span class="title">locale_data_value</span></span></span><br><span class="line"><span class="class">  &#123;</span></span><br><span class="line">    <span class="type">const</span> <span class="type">uint32_t</span> *wstr;</span><br><span class="line">    <span class="type">const</span> <span class="type">char</span> *<span class="built_in">string</span>;</span><br><span class="line">    <span class="type">unsigned</span> <span class="type">int</span> word;<span class="comment">/* Note endian issues vs 64-bit pointers.  */</span></span><br><span class="line">  &#125;</span><br><span class="line">  values __flexarr;<span class="comment">/* Items, usually pointers into `filedata&#x27;.  */</span></span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><p>So the root cause is now pinned down. The full causal chain is:</p><ol><li>Our project uses Gin as its HTTP server</li><li>We use localhost as the default listen address</li><li>When the server starts listening, localhost triggers a DNS lookup</li><li>With CGO_ENABLED=1, Go performs that DNS lookup through glibc's <code>getaddrinfo</code></li><li>Our project builds with <code>-ldflags &quot;-linkmode=external -extldflags -static&quot;</code> (external linker, static linking). The binary still uses <code>dl_open</code> to load glibc at run time, while functions such as <code>__ctype_init</code> are statically compiled into the binary</li><li>The offsets of certain fields in Glibc differ between versions</li><li>Combining 4, 5 and 6: a binary statically built against Glibc 2.35 (the default build machine in this post) hits a segmentation fault when run on Glibc 2.40 (Arch Linux), because the offsets do not match</li></ol><p>Q.E.D.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>The fix for this issue was a one-line change, but tracking it down took me a long time, jumping back and forth between the Go and Glibc source code. Along the way I also brushed up on a lot of linker knowledge.</p><p>In a sense this is why I love this industry: the scenery behind every problem we run into is well worth seeing.</p>]]></content>
    
    
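    <summary type="html">&lt;p&gt;Second post in the debug log series: eCapture's GH-604, an issue involving Go, Glibc, and static compilation&lt;/p&gt;
&lt;p&gt;TL;DR: in eCapture, because of glibc version differences under static linking, a binary built on Ubuntu hits a segmentation fault on certain distributions&lt;/p&gt;</summary>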
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="Linux" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/Linux/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="Linux" scheme="https://www.manjusaka.blog/tags/Linux/"/>
    
    <category term="笔记" scheme="https://www.manjusaka.blog/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="水文" scheme="https://www.manjusaka.blog/tags/%E6%B0%B4%E6%96%87/"/>
    
  </entry>
  
  <entry>
    <title>It's 2024: Gevent or asyncio? Part 1</title>
    <link href="https://www.manjusaka.blog/posts/2024/08/19/benchmark-for-python-web-framework-2024-part1-cn/"/>
    <id>https://www.manjusaka.blog/posts/2024/08/19/benchmark-for-python-web-framework-2024-part1-cn/</id>
    <published>2024-08-19T17:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>Gevent or asyncio has always been a classic question. Here we let the data help you decide.</p><span id="more"></span><h2 id="开篇"><a href="#开篇" class="headerlink" title="开篇"></a>Introduction</h2><p>Lin Wei has already done excellent work on this:</p><p><img src="https://i.imgur.com/Jk2ubDY.png" alt="HiRedis"></p><p>It shows the peak performance of asyncio and Gevent. Here asyncio with uvloop delivers roughly double the performance of Gevent.</p><p>So does the same hold under web frameworks?</p><p>Let's run an experiment.</p><p>First, the machines that host the workloads. I chose Azure D8as_v5 instances, configured as follows:</p><ol><li>8 cores and 32 GB of memory</li><li>Underlying hardware based on EPYC 7763 series processors</li><li>4 nodes in total, allocated to the four frameworks Django/Flask/FastAPI/Starlette</li></ol><p>For load testing we use locust, also running on a Kubernetes cluster. My account's quota for D8as_v5 machines was not sufficient, so the load generators run on a mix of machine types:</p><ol><li>4 D8as_v5 machines, 32 cores in total</li><li>4 D8as_v3 machines, 32 cores in total</li><li>4 D4as_v2 machines, 16 cores in total</li></ol><p>The main goal of the test is to simulate throughput in a production environment, so I set it up as follows:</p><ol><li>Provision a MySQL instance with 16 cores and 64 GB of memory to store the data</li><li>Create a table and fill it with 1 million random rows</li><li>Run a SQL query in the framework code and return the result</li></ol><p>The MySQL table schema is:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">create</span> <span class="keyword">table</span>  if <span class="keyword">not</span> <span class="keyword">exists</span>  `demo_data`</span><br><span class="line">(</span><br><span class="line">    `id`          <span class="type">bigint</span>(<span class="number">20</span>)   <span class="keyword">not</span> <span class="keyword">null</span> auto_increment,</span><br><span class="line">    `name`        <span class="type">varchar</span>(<span class="number">255</span>) <span class="keyword">not</span> <span class="keyword">null</span>,</span><br><span 
class="line">    `create_time` <span class="type">timestamp</span> <span class="keyword">default</span> <span class="built_in">CURRENT_TIMESTAMP</span>,</span><br><span class="line">    `update_time` <span class="type">timestamp</span> <span class="keyword">default</span> <span class="built_in">CURRENT_TIMESTAMP</span>,</span><br><span class="line">    <span class="keyword">primary</span> key (`id`),</span><br><span class="line">    index (`name`)</span><br><span class="line">) charset <span class="operator">=</span> utf8mb4</span><br><span class="line">  engine <span class="operator">=</span> innodb;</span><br></pre></td></tr></table></figure><p>The Django code:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> random</span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> django.core <span class="keyword">import</span> serializers</span><br><span class="line"><span class="keyword">from</span> django.shortcuts <span class="keyword">import</span> HttpResponse</span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> .models <span class="keyword">import</span> DemoData</span><br><span class="line"></span><br><span class="line">TEMP = <span 
class="string">&quot;1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&amp;*()_+=-&quot;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment"># Create your views here.</span></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">demo_views</span>(<span class="params">request</span>):</span><br><span class="line">    result = DemoData.objects.<span class="built_in">filter</span>(</span><br><span class="line">        name=<span class="string">&quot;&quot;</span>.join(random.choices(TEMP, k=random.randrange(<span class="number">1</span>, <span class="number">254</span>)))</span><br><span class="line">    )</span><br><span class="line">    <span class="comment"># x = json.dumps(request.body)</span></span><br><span class="line">    <span class="keyword">return</span> HttpResponse(</span><br><span class="line">        serializers.serialize(<span class="string">&quot;json&quot;</span>, result <span class="keyword">if</span> result <span class="keyword">else</span> []),</span><br><span class="line">        content_type=<span class="string">&quot;application/json&quot;</span>,</span><br><span class="line">    )</span><br></pre></td></tr></table></figure><p>The Flask code:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">import</span> random</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">import</span> dataset</span><br><span class="line"><span class="keyword">from</span> flask <span class="keyword">import</span> Flask, Response</span><br><span class="line"></span><br><span class="line">app = Flask(__name__)</span><br><span class="line"></span><br><span class="line">DATABASE_URL = <span class="string">f&quot;mysql://<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_USER&#x27;</span>)&#125;</span>:<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_PASSWORD&#x27;</span>)&#125;</span>@<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_HOST&#x27;</span>)&#125;</span>:3306/demo&quot;</span></span><br><span class="line">db = dataset.connect(DATABASE_URL, engine_kwargs=&#123;<span class="string">&quot;pool_size&quot;</span>: <span class="number">10000</span>&#125;)</span><br><span class="line"></span><br><span class="line">TEMP = <span class="string">&quot;1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&amp;*()_+=-&quot;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">@app.route(<span class="params"><span class="string">&quot;/demo&quot;</span>, methods=[<span 
class="string">&quot;GET&quot;</span>]</span>)</span></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">demo_code</span>():</span><br><span class="line">    <span class="keyword">return</span> Response(</span><br><span class="line">        response=json.dumps(</span><br><span class="line">            <span class="built_in">list</span>(</span><br><span class="line">                db.query(</span><br><span class="line">                    <span class="string">f&quot;select * from demo_data where name=&#x27;<span class="subst">&#123;<span class="string">&#x27;&#x27;</span>.join(random.choices(TEMP, k=random.randrange(<span class="number">1</span>, <span class="number">254</span>)))&#125;</span>&#x27;&quot;</span></span><br><span class="line">                )</span><br><span class="line">            ),</span><br><span class="line">            default=<span class="built_in">str</span></span><br><span class="line">        ),</span><br><span class="line">        status=<span class="number">200</span>,</span><br><span class="line">        content_type=<span class="string">&quot;application/json&quot;</span>,</span><br><span class="line">    )</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">&quot;__main__&quot;</span>:</span><br><span class="line">    app.run(debug=<span class="literal">True</span>)</span><br></pre></td></tr></table></figure><p>The FastAPI code is as follows:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> random</span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">from</span> typing <span class="keyword">import</span> <span class="type">List</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> databases</span><br><span class="line"><span class="keyword">import</span> pymysql</span><br><span class="line"><span class="keyword">import</span> 
sqlalchemy</span><br><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">from</span> fastapi <span class="keyword">import</span> FastAPI</span><br><span class="line"><span class="keyword">from</span> fastapi.responses <span class="keyword">import</span> Response</span><br><span class="line"><span class="keyword">from</span> pydantic <span class="keyword">import</span> BaseModel</span><br><span class="line"></span><br><span class="line">pymysql.install_as_MySQLdb()</span><br><span class="line"></span><br><span class="line">AYSNC_DATABASE_URL = <span class="string">f&quot;mysql+aiomysql://<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_USER&#x27;</span>)&#125;</span>:<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_PASSWORD&#x27;</span>)&#125;</span>@<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_HOST&#x27;</span>)&#125;</span>:3306/demo&quot;</span></span><br><span class="line">SYNC_DATABASE_URL = <span class="string">f&quot;mysql+mysqldb://<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_USER&#x27;</span>)&#125;</span>:<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_PASSWORD&#x27;</span>)&#125;</span>@<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_HOST&#x27;</span>)&#125;</span>:3306/demo&quot;</span></span><br><span class="line"></span><br><span class="line">database = databases.Database(AYSNC_DATABASE_URL, max_size=<span class="number">10000</span>)</span><br><span class="line"></span><br><span class="line">metadata = sqlalchemy.MetaData()</span><br><span class="line"></span><br><span class="line">demo_data = sqlalchemy.Table(</span><br><span class="line">    <span class="string">&quot;demo_data&quot;</span>,</span><br><span class="line">    metadata,</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;id&quot;</span>, sqlalchemy.Integer, 
primary_key=<span class="literal">True</span>),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;name&quot;</span>, sqlalchemy.String),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;create_time&quot;</span>, sqlalchemy.DATETIME),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;update_time&quot;</span>, sqlalchemy.DATETIME),</span><br><span class="line">)</span><br><span class="line">engine = sqlalchemy.create_engine(SYNC_DATABASE_URL)</span><br><span class="line">metadata.create_all(engine)</span><br><span class="line">TEMP = <span class="string">&quot;1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&amp;*()_+=-&quot;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">DemoData</span>(<span class="title class_ inherited__">BaseModel</span>):</span><br><span class="line">    <span class="built_in">id</span>: <span class="built_in">int</span></span><br><span class="line">    name: <span class="built_in">str</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">app = FastAPI()</span><br><span class="line"></span><br><span class="line">init = <span class="literal">False</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">@app.get(<span class="params"><span class="string">&quot;/demo&quot;</span>, response_model=<span class="type">List</span>[DemoData]</span>)</span></span><br><span class="line"><span class="keyword">async</span> <span class="keyword">def</span> <span class="title function_">demo_code</span>():</span><br><span class="line">    <span class="keyword">global</span> init</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> init:</span><br><span class="line">        <span 
class="keyword">await</span> database.connect()</span><br><span class="line">        init = <span class="literal">True</span></span><br><span class="line"></span><br><span class="line">    query = demo_data.select().where(</span><br><span class="line">        demo_data.c.name == <span class="string">&quot;&quot;</span>.join(random.choices(TEMP, k=random.randrange(<span class="number">1</span>, <span class="number">254</span>)))</span><br><span class="line">    )</span><br><span class="line">    data = <span class="keyword">await</span> database.fetch_all(query)</span><br><span class="line">    response = json.dumps(data, default=<span class="built_in">str</span>)</span><br><span class="line">    <span class="keyword">return</span> Response(content=response, status_code=<span class="number">200</span>, media_type=<span class="string">&quot;application/json&quot;</span>)</span><br></pre></td></tr></table></figure><p>The Starlette code is as follows:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span 
class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> random</span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">from</span> typing <span class="keyword">import</span> <span class="type">List</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> databases</span><br><span class="line"><span class="keyword">import</span> pymysql</span><br><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">import</span> sqlalchemy</span><br><span class="line"><span class="keyword">from</span> starlette.applications <span class="keyword">import</span> Starlette</span><br><span class="line"><span class="keyword">from</span> starlette.responses <span class="keyword">import</span> Response</span><br><span class="line"><span class="keyword">from</span> starlette.routing <span class="keyword">import</span> Route</span><br><span class="line"><span 
class="keyword">from</span> pydantic <span class="keyword">import</span> BaseModel</span><br><span class="line"></span><br><span class="line">pymysql.install_as_MySQLdb()</span><br><span class="line"></span><br><span class="line">AYSNC_DATABASE_URL = <span class="string">f&quot;mysql+aiomysql://<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_USER&#x27;</span>)&#125;</span>:<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_PASSWORD&#x27;</span>)&#125;</span>@<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_HOST&#x27;</span>)&#125;</span>:3306/demo&quot;</span></span><br><span class="line">SYNC_DATABASE_URL = <span class="string">f&quot;mysql+mysqldb://<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_USER&#x27;</span>)&#125;</span>:<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_PASSWORD&#x27;</span>)&#125;</span>@<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_HOST&#x27;</span>)&#125;</span>:3306/demo&quot;</span></span><br><span class="line"></span><br><span class="line">database = databases.Database(AYSNC_DATABASE_URL, max_size=<span class="number">10000</span>)</span><br><span class="line"></span><br><span class="line">metadata = sqlalchemy.MetaData()</span><br><span class="line"></span><br><span class="line">demo_data = sqlalchemy.Table(</span><br><span class="line">    <span class="string">&quot;demo_data&quot;</span>,</span><br><span class="line">    metadata,</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;id&quot;</span>, sqlalchemy.Integer, primary_key=<span class="literal">True</span>),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;name&quot;</span>, sqlalchemy.String),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;create_time&quot;</span>, sqlalchemy.DATETIME),</span><br><span class="line">    sqlalchemy.Column(<span 
class="string">&quot;update_time&quot;</span>, sqlalchemy.DATETIME),</span><br><span class="line">)</span><br><span class="line">engine = sqlalchemy.create_engine(SYNC_DATABASE_URL)</span><br><span class="line">metadata.create_all(engine)</span><br><span class="line">TEMP = <span class="string">&quot;1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&amp;*()_+=-&quot;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">DemoData</span>(<span class="title class_ inherited__">BaseModel</span>):</span><br><span class="line">    <span class="built_in">id</span>: <span class="built_in">int</span></span><br><span class="line">    name: <span class="built_in">str</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">init = <span class="literal">False</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">async</span> <span class="keyword">def</span> <span class="title function_">demo_code</span>(<span class="params">request</span>):</span><br><span class="line">    <span class="keyword">global</span> init</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> init:</span><br><span class="line">        <span class="keyword">await</span> database.connect()</span><br><span class="line">        init = <span class="literal">True</span></span><br><span class="line"></span><br><span class="line">    query = demo_data.select().where(</span><br><span class="line">        demo_data.c.name == <span class="string">&quot;&quot;</span>.join(random.choices(TEMP, k=random.randrange(<span class="number">1</span>, <span class="number">254</span>)))</span><br><span class="line">    )</span><br><span class="line">    data = <span class="keyword">await</span> database.fetch_all(query)</span><br><span class="line">    <span 
class="keyword">return</span> Response(content=json.dumps(data, default=<span class="built_in">str</span>), status_code=<span class="number">200</span>, media_type=<span class="string">&quot;application/json&quot;</span>)</span><br><span class="line"></span><br><span class="line">routes = [</span><br><span class="line">    Route(<span class="string">&quot;/demo&quot;</span>, demo_code, methods=[<span class="string">&quot;GET&quot;</span>]),</span><br><span class="line">]</span><br><span class="line"></span><br><span class="line">app = Starlette(debug=<span class="literal">False</span>, routes=routes)</span><br></pre></td></tr></table></figure><p>The deployment setup is as follows:</p><ol><li>All services are deployed on Kubernetes, with Pods in the Guaranteed QoS class</li><li>All images are built on Python 3.12</li><li>Each service is limited to 6 CPU cores</li><li>Django and Flask are deployed with Gevent + Gunicorn, using Greenify to patch the binaries</li><li>FastAPI and Starlette are deployed with uvicorn, using uvloop as the event loop</li></ol><p>OK, now for the test results.</p><h2 id="标准操作下的测试结果"><a href="#标准操作下的测试结果" class="headerlink" title="标准操作下的测试结果"></a>Test results under the standard workload</h2><p>Django:</p><p><img src="https://i.imgur.com/28P4bcT.png" alt="django"></p><p>FastAPI</p><p><img src="https://i.imgur.com/T1xiYZe.png" alt="FastAPI"></p><p>Flask</p><p><img src="https://i.imgur.com/mUkzLNf.png" alt="Flask"></p><p>Starlette </p><p><img src="https://i.imgur.com/8Fu8vST.png" alt="Starlette"></p><p>Django is, unsurprisingly, last. The other three rank Flask + Gevent &gt; Starlette &gt; FastAPI, and all three of them ran at over 90% CPU utilization.</p><h2 id="空转测试"><a href="#空转测试" class="headerlink" title="空转测试"></a>Idle (no-op) test</h2><p>To be safe, we also ran an idle test on the latter three frameworks.</p><p>Flask</p><p><img src="https://i.imgur.com/9DjHr00.png" alt="Flask"></p><p>FastAPI</p><p><img src="https://i.imgur.com/4hq7gqo.png" alt="FastAPI"></p><p>Starlette</p><p><img src="https://i.imgur.com/Pugbi7M.png" alt="Starlette"></p><p>Starlette &gt; FastAPI &gt; Flask + Gevent</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>For now, the overall conclusions are:</p><ol><li>In the idle test, asyncio performs considerably better than Gevent; even with framework overhead included, it is still 10-20% faster</li><li>With an ORM + MySQL driver, the Gevent ecosystem is better than the asyncio ecosystem</li></ol><p>Would the results look better with an ORM + PostgreSQL stack? I'm looking forward to the next round of tests.</p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The choice between Gevent and asyncio has always been a classic question. Here, we use data to help you make a decision.&lt;/p&gt;</summary>
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/CPython/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="笔记" scheme="https://www.manjusaka.blog/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="水文" scheme="https://www.manjusaka.blog/tags/%E6%B0%B4%E6%96%87/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/tags/CPython/"/>
    
  </entry>
  
  <entry>
    <title>In 2024, Gevent or asyncio? Part 1</title>
    <link href="https://www.manjusaka.blog/posts/2024/08/19/benchmark-for-python-web-framework-2024-part1-en/"/>
    <id>https://www.manjusaka.blog/posts/2024/08/19/benchmark-for-python-web-framework-2024-part1-en/</id>
    <published>2024-08-19T17:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>The choice between Gevent and asyncio has always been a classic question. Here, we’ll use data to help you make a decision.</p><span id="more"></span><h2 id="Introduction"><a href="#Introduction" class="headerlink" title="Introduction"></a>Introduction</h2><p>Professor Lin Wei has set a high standard:</p><p><img src="https://i.imgur.com/Jk2ubDY.png" alt="HiRedis"></p><p>This graph shows the peak performance of asyncio and Gevent. asyncio with uvloop delivers roughly double the performance of Gevent.</p><p>But does this still hold once a web framework is involved?</p><p>Let’s run an experiment.</p><p>First, the configuration of the machines under load. I chose D8as_v5 machines on Azure with the following configuration:</p><ol><li>8 cores and 32 GB of memory</li><li>The underlying hardware is based on the EPYC 7763 series processor</li><li>A total of 4 nodes, allocated to the four frameworks: Django, Flask, FastAPI, and Starlette</li></ol><p>We chose Locust as our load testing framework, also running on a Kubernetes cluster. 
Because my account’s quota for D8as_v5 machines wasn’t sufficient, the load testing framework runs on a mix of machine types:</p><ol><li>4 D8as_v5, 32 cores in total</li><li>4 D8as_v3, 32 cores in total</li><li>4 D4as_v2, 16 cores in total</li></ol><p>The main goal of the test is to simulate throughput in a production environment, so I chose the following test method:</p><ol><li>Prepare a 16-core, 64 GB MySQL instance for data storage</li><li>Create a table and insert 1 million rows of random data</li><li>Run a SQL query in each framework’s code and return the query results</li></ol><p>The MySQL table structure is as follows:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">create</span> <span class="keyword">table</span> if <span class="keyword">not</span> <span class="keyword">exists</span> `demo_data`</span><br><span class="line">(</span><br><span class="line">    `id`          <span class="type">bigint</span>(<span class="number">20</span>)   <span class="keyword">not</span> <span class="keyword">null</span> auto_increment,</span><br><span class="line">    `name`        <span class="type">varchar</span>(<span class="number">255</span>) <span class="keyword">not</span> <span class="keyword">null</span>,</span><br><span class="line">    `create_time` <span class="type">timestamp</span> <span class="keyword">default</span> <span class="built_in">CURRENT_TIMESTAMP</span>,</span><br><span class="line">    `update_time` <span class="type">timestamp</span> <span class="keyword">default</span> 
<span class="built_in">CURRENT_TIMESTAMP</span>,</span><br><span class="line">    <span class="keyword">primary</span> key (`id`),</span><br><span class="line">    index (`name`)</span><br><span class="line">) charset <span class="operator">=</span> utf8mb4</span><br><span class="line">  engine <span class="operator">=</span> innodb;</span><br></pre></td></tr></table></figure><p>Django code is as follows:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> random</span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> django.core <span class="keyword">import</span> serializers</span><br><span class="line"><span class="keyword">from</span> django.shortcuts <span class="keyword">import</span> HttpResponse</span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> .models <span class="keyword">import</span> DemoData</span><br><span class="line"></span><br><span class="line">TEMP = <span class="string">&quot;1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&amp;*()_+=-&quot;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment"># Create your views here.</span></span><br><span class="line"><span 
class="keyword">def</span> <span class="title function_">demo_views</span>(<span class="params">request</span>):</span><br><span class="line">    result = DemoData.objects.<span class="built_in">filter</span>(</span><br><span class="line">        name=<span class="string">&quot;&quot;</span>.join(random.choices(TEMP, k=random.randrange(<span class="number">1</span>, <span class="number">254</span>)))</span><br><span class="line">    )</span><br><span class="line">    <span class="comment"># x = json.dumps(request.body)</span></span><br><span class="line">    <span class="keyword">return</span> HttpResponse(</span><br><span class="line">        serializers.serialize(<span class="string">&quot;json&quot;</span>, result <span class="keyword">if</span> result <span class="keyword">else</span> []),</span><br><span class="line">        content_type=<span class="string">&quot;application/json&quot;</span>,</span><br><span class="line">    )</span><br></pre></td></tr></table></figure><p>Flask code is as follows:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span 
class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">import</span> random</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">import</span> dataset</span><br><span class="line"><span class="keyword">from</span> flask <span class="keyword">import</span> Flask, Response</span><br><span class="line"></span><br><span class="line">app = Flask(__name__)</span><br><span class="line"></span><br><span class="line">DATABASE_URL = <span class="string">f&quot;mysql://<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_USER&#x27;</span>)&#125;</span>:<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_PASSWORD&#x27;</span>)&#125;</span>@<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_HOST&#x27;</span>)&#125;</span>:3306/demo&quot;</span></span><br><span class="line">db = dataset.connect(DATABASE_URL, engine_kwargs=&#123;<span class="string">&quot;pool_size&quot;</span>: <span class="number">10000</span>&#125;)</span><br><span class="line"></span><br><span class="line">TEMP = <span class="string">&quot;1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&amp;*()_+=-&quot;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">@app.route(<span class="params"><span class="string">&quot;/demo&quot;</span>, methods=[<span class="string">&quot;GET&quot;</span>]</span>)</span></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">demo_code</span>():</span><br><span class="line">    <span class="keyword">return</span> Response(</span><br><span class="line">        
response=json.dumps(</span><br><span class="line">            <span class="built_in">list</span>(</span><br><span class="line">                db.query(</span><br><span class="line">                    <span class="string">f&quot;select * from demo_data where name=&#x27;<span class="subst">&#123;<span class="string">&#x27;&#x27;</span>.join(random.choices(TEMP, k=random.randrange(<span class="number">1</span>, <span class="number">254</span>)))&#125;</span>&#x27;&quot;</span></span><br><span class="line">                )</span><br><span class="line">            ),</span><br><span class="line">            default=<span class="built_in">str</span></span><br><span class="line">        ),</span><br><span class="line">        status=<span class="number">200</span>,</span><br><span class="line">        content_type=<span class="string">&quot;application/json&quot;</span>,</span><br><span class="line">    )</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">&quot;__main__&quot;</span>:</span><br><span class="line">    app.run(debug=<span class="literal">True</span>)</span><br></pre></td></tr></table></figure><p>FastAPI code is as follows:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> random</span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">from</span> typing <span class="keyword">import</span> <span class="type">List</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> databases</span><br><span class="line"><span class="keyword">import</span> pymysql</span><br><span class="line"><span class="keyword">import</span> sqlalchemy</span><br><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">from</span> fastapi <span class="keyword">import</span> FastAPI</span><br><span class="line"><span class="keyword">from</span> fastapi.responses <span 
class="keyword">import</span> Response</span><br><span class="line"><span class="keyword">from</span> pydantic <span class="keyword">import</span> BaseModel</span><br><span class="line"></span><br><span class="line">pymysql.install_as_MySQLdb()</span><br><span class="line"></span><br><span class="line">AYSNC_DATABASE_URL = <span class="string">f&quot;mysql+aiomysql://<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_USER&#x27;</span>)&#125;</span>:<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_PASSWORD&#x27;</span>)&#125;</span>@<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_HOST&#x27;</span>)&#125;</span>:3306/demo&quot;</span></span><br><span class="line">SYNC_DATABASE_URL = <span class="string">f&quot;mysql+mysqldb://<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_USER&#x27;</span>)&#125;</span>:<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_PASSWORD&#x27;</span>)&#125;</span>@<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_HOST&#x27;</span>)&#125;</span>:3306/demo&quot;</span></span><br><span class="line"></span><br><span class="line">database = databases.Database(AYSNC_DATABASE_URL, max_size=<span class="number">10000</span>)</span><br><span class="line"></span><br><span class="line">metadata = sqlalchemy.MetaData()</span><br><span class="line"></span><br><span class="line">demo_data = sqlalchemy.Table(</span><br><span class="line">    <span class="string">&quot;demo_data&quot;</span>,</span><br><span class="line">    metadata,</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;id&quot;</span>, sqlalchemy.Integer, primary_key=<span class="literal">True</span>),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;name&quot;</span>, sqlalchemy.String),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;create_time&quot;</span>, 
sqlalchemy.DATETIME),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;update_time&quot;</span>, sqlalchemy.DATETIME),</span><br><span class="line">)</span><br><span class="line">engine = sqlalchemy.create_engine(SYNC_DATABASE_URL)</span><br><span class="line">metadata.create_all(engine)</span><br><span class="line">TEMP = <span class="string">&quot;1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&amp;*()_+=-&quot;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">DemoData</span>(<span class="title class_ inherited__">BaseModel</span>):</span><br><span class="line">    <span class="built_in">id</span>: <span class="built_in">int</span></span><br><span class="line">    name: <span class="built_in">str</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">app = FastAPI()</span><br><span class="line"></span><br><span class="line">init = <span class="literal">False</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">@app.get(<span class="params"><span class="string">&quot;/demo&quot;</span>, response_model=<span class="type">List</span>[DemoData]</span>)</span></span><br><span class="line"><span class="keyword">async</span> <span class="keyword">def</span> <span class="title function_">demo_code</span>():</span><br><span class="line">    <span class="keyword">global</span> init</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> init:</span><br><span class="line">        <span class="keyword">await</span> database.connect()</span><br><span class="line">        init = <span class="literal">True</span></span><br><span class="line"></span><br><span class="line">    query = demo_data.select().where(</span><br><span class="line">        demo_data.c.name == <span 
class="string">&quot;&quot;</span>.join(random.choices(TEMP, k=random.randrange(<span class="number">1</span>, <span class="number">254</span>)))</span><br><span class="line">    )</span><br><span class="line">    data = <span class="keyword">await</span> database.fetch_all(query)</span><br><span class="line">    response = json.dumps(data, default=<span class="built_in">str</span>)</span><br><span class="line">    <span class="keyword">return</span> Response(content=response, status_code=<span class="number">200</span>, media_type=<span class="string">&quot;application/json&quot;</span>)</span><br></pre></td></tr></table></figure><p>Starlette code is as follows:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span 
class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> random</span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">from</span> typing <span class="keyword">import</span> <span class="type">List</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> databases</span><br><span class="line"><span class="keyword">import</span> pymysql</span><br><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">import</span> sqlalchemy</span><br><span class="line"><span class="keyword">from</span> starlette.applications <span class="keyword">import</span> Starlette</span><br><span class="line"><span class="keyword">from</span> starlette.responses <span class="keyword">import</span> Response</span><br><span class="line"><span class="keyword">from</span> starlette.routing <span class="keyword">import</span> Route</span><br><span class="line"><span class="keyword">from</span> pydantic <span class="keyword">import</span> BaseModel</span><br><span class="line"></span><br><span class="line">pymysql.install_as_MySQLdb()</span><br><span class="line"></span><br><span class="line">AYSNC_DATABASE_URL = <span 
class="string">f&quot;mysql+aiomysql://<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_USER&#x27;</span>)&#125;</span>:<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_PASSWORD&#x27;</span>)&#125;</span>@<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_HOST&#x27;</span>)&#125;</span>:3306/demo&quot;</span></span><br><span class="line">SYNC_DATABASE_URL = <span class="string">f&quot;mysql+mysqldb://<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_USER&#x27;</span>)&#125;</span>:<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_PASSWORD&#x27;</span>)&#125;</span>@<span class="subst">&#123;os.getenv(<span class="string">&#x27;DATABASE_HOST&#x27;</span>)&#125;</span>:3306/demo&quot;</span></span><br><span class="line"></span><br><span class="line">database = databases.Database(AYSNC_DATABASE_URL, max_size=<span class="number">10000</span>)</span><br><span class="line"></span><br><span class="line">metadata = sqlalchemy.MetaData()</span><br><span class="line"></span><br><span class="line">demo_data = sqlalchemy.Table(</span><br><span class="line">    <span class="string">&quot;demo_data&quot;</span>,</span><br><span class="line">    metadata,</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;id&quot;</span>, sqlalchemy.Integer, primary_key=<span class="literal">True</span>),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;name&quot;</span>, sqlalchemy.String),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;create_time&quot;</span>, sqlalchemy.DATETIME),</span><br><span class="line">    sqlalchemy.Column(<span class="string">&quot;update_time&quot;</span>, sqlalchemy.DATETIME),</span><br><span class="line">)</span><br><span class="line">engine = sqlalchemy.create_engine(SYNC_DATABASE_URL)</span><br><span class="line">metadata.create_all(engine)</span><br><span 
class="line">TEMP = <span class="string">&quot;1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&amp;*()_+=-&quot;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">DemoData</span>(<span class="title class_ inherited__">BaseModel</span>):</span><br><span class="line">    <span class="built_in">id</span>: <span class="built_in">int</span></span><br><span class="line">    name: <span class="built_in">str</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">init = <span class="literal">False</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">async</span> <span class="keyword">def</span> <span class="title function_">demo_code</span>(<span class="params">request</span>):</span><br><span class="line">    <span class="keyword">global</span> init</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> init:</span><br><span class="line">        <span class="keyword">await</span> database.connect()</span><br><span class="line">        init = <span class="literal">True</span></span><br><span class="line"></span><br><span class="line">    query = demo_data.select().where(</span><br><span class="line">        demo_data.c.name == <span class="string">&quot;&quot;</span>.join(random.choices(TEMP, k=random.randrange(<span class="number">1</span>, <span class="number">254</span>)))</span><br><span class="line">    )</span><br><span class="line">    data = <span class="keyword">await</span> database.fetch_all(query)</span><br><span class="line">    <span class="keyword">return</span> Response(content=json.dumps(data, default=<span class="built_in">str</span>), status_code=<span class="number">200</span>, media_type=<span class="string">&quot;application/json&quot;</span>)</span><br><span 
class="line"></span><br><span class="line">routes = [</span><br><span class="line">    Route(<span class="string">&quot;/demo&quot;</span>, demo_code, methods=[<span class="string">&quot;GET&quot;</span>]),</span><br><span class="line">]</span><br><span class="line"></span><br><span class="line">app = Starlette(debug=<span class="literal">False</span>, routes=routes)</span><br></pre></td></tr></table></figure><p>The deployment setup is as follows:</p><ol><li>All services are deployed on K8S, with the Pod QoS class set to Guaranteed</li><li>All images are built on Python 3.12</li><li>Each service is limited to 6 CPU cores</li><li>Django and Flask are deployed with Gevent + Gunicorn, using Greenify to patch the binary</li><li>FastAPI and Starlette are deployed with uvicorn, using uvloop as the event loop</li></ol><p>OK, now let’s reveal the test results.</p><h2 id="Test-Results-Under-Standard-Operations"><a href="#Test-Results-Under-Standard-Operations" class="headerlink" title="Test Results Under Standard Operations"></a>Test Results Under Standard Operations</h2><p>Django:</p><p><img src="https://i.imgur.com/28P4bcT.png" alt="django"></p><p>FastAPI:</p><p><img src="https://i.imgur.com/T1xiYZe.png" alt="FastAPI"></p><p>Flask:</p><p><img src="https://i.imgur.com/mUkzLNf.png" alt="Flask"></p><p>Starlette:</p><p><img src="https://i.imgur.com/8Fu8vST.png" alt="Starlette"></p><p>Django is clearly in last place, while the other three rank Flask + Gevent &gt; Starlette &gt; FastAPI. 
The CPU usage of the latter three frameworks is above 90% in every case.</p><h2 id="Idle-Test"><a href="#Idle-Test" class="headerlink" title="Idle Test"></a>Idle Test</h2><p>To be on the safe side, we also ran an idle test on the latter three frameworks.</p><p>Flask:</p><p><img src="https://i.imgur.com/9DjHr00.png" alt="Flask"></p><p>FastAPI:</p><p><img src="https://i.imgur.com/4hq7gqo.png" alt="FastAPI"></p><p>Starlette:</p><p><img src="https://i.imgur.com/Pugbi7M.png" alt="Starlette"></p><p>Starlette &gt; FastAPI &gt; Flask + Gevent</p><h2 id="Conclusion"><a href="#Conclusion" class="headerlink" title="Conclusion"></a>Conclusion</h2><p>So far, the overall conclusions are as follows:</p><ol><li>In the idle test, asyncio performs significantly better than Gevent. Even allowing for the framework factor, there is still a 10-20% improvement.</li><li>In the ORM + MySQL driver case, Gevent’s ecosystem performs better than asyncio’s.</li></ol><p>If we switch to an ORM + PGSQL stack, will the results be any better? Looking forward to the next round of tests.</p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The choice between Gevent and asyncio has always been a classic question. Here, we’ll use data to help you make a decision.&lt;/p&gt;</summary>
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/CPython/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/tags/CPython/"/>
    
    <category term="Notes" scheme="https://www.manjusaka.blog/tags/Notes/"/>
    
    <category term="Casual Writing" scheme="https://www.manjusaka.blog/tags/Casual-Writing/"/>
    
  </entry>
  
  <entry>
    <title>Debug Log: CPython GH-121528</title>
    <link href="https://www.manjusaka.blog/posts/2024/07/16/a-live-debug-gh121528/"/>
    <id>https://www.manjusaka.blog/posts/2024/07/16/a-live-debug-gh121528/</id>
    <published>2024-07-16T18:20:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>This is the second post in the Debug Log series, covering CPython’s GH-121528. The debugging and the discussion around it were quite interesting, so I am writing them up in the hope that they help others.</p><p>TL;DR: in the Python 3.13 beta, the PEP 683 implementation plus some surrounding changes mean that some extensions compiled against older versions cannot run on Python 3.13.</p><span id="more"></span><h2 id="开篇"><a href="#开篇" class="headerlink" title="开篇"></a>Introduction</h2><p>On July 9, the PyO3 community reported a bug, numbered GH-121528<a href="#reference1"><sup>1</sup></a>. The bug can be reproduced as follows.</p><p>Suppose we have a C extension source file:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;Python.h&gt;</span></span></span><br><span class="line"></span><br><span class="line"><span class="type">static</span> PyObject *</span><br><span class="line"><span class="title function_">foo_bar</span><span class="params">(PyObject *self, PyObject *args)</span></span><br><span class="line">&#123;</span><br><span class="line">Py_INCREF(PyExc_TypeError);</span><br><span class="line">PyErr_SetString(PyExc_TypeError, <span 
class="string">&quot;foo&quot;</span>);</span><br><span class="line"><span class="keyword">return</span> <span class="literal">NULL</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">static</span> PyMethodDef foomethods[] = &#123;</span><br><span class="line">&#123;<span class="string">&quot;bar&quot;</span>, foo_bar, METH_VARARGS, <span class="string">&quot;&quot;</span>&#125;,</span><br><span class="line">&#123;<span class="literal">NULL</span>, <span class="literal">NULL</span>, <span class="number">0</span>, <span class="literal">NULL</span>&#125;,</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="type">static</span> PyModuleDef foomodule = &#123;</span><br><span class="line">PyModuleDef_HEAD_INIT,</span><br><span class="line">.m_name = <span class="string">&quot;foo&quot;</span>,</span><br><span class="line">.m_doc = <span class="string">&quot;foo test module&quot;</span>,</span><br><span class="line">.m_size = <span class="number">-1</span>,</span><br><span class="line">.m_methods = foomethods,</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line">PyMODINIT_FUNC</span><br><span class="line"><span class="title function_">PyInit_foo</span><span class="params">(<span class="type">void</span>)</span></span><br><span class="line">&#123;</span><br><span class="line"><span class="keyword">return</span> PyModule_Create(&amp;foomodule);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>And suppose we have the following <code>setup.py</code> file:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span 
class="keyword">from</span> setuptools <span class="keyword">import</span> setup, Extension</span><br><span class="line"></span><br><span class="line">setup(name=<span class="string">&#x27;foo&#x27;</span>,</span><br><span class="line">      version=<span class="string">&#x27;0&#x27;</span>,</span><br><span class="line">      ext_modules=[</span><br><span class="line">          Extension(<span class="string">&#x27;foo&#x27;</span>, [<span class="string">&#x27;foo.c&#x27;</span>], py_limited_api=<span class="string">&#x27;cp38&#x27;</span>),</span><br><span class="line">      ])</span><br></pre></td></tr></table></figure><p>OK. When this is built against the Limited API (aka Stable ABI), the community found that an extension compiled on Python &lt;= 3.11 fails when it is loaded on Python 3.13 or the latest main branch.</p><p>Let’s look at the stack trace:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line">Process 10157 stopped</span><br><span class="line">* thread #1, queue = &#x27;com.apple.main-thread&#x27;, stop reason = hit program 
assert</span><br><span class="line">    frame #4: 0x000000010034043c python.exe`_PyType_AllocNoTrack.cold.2 [inlined] _PyObject_Init(op=&lt;unavailable&gt;, typeobj=&lt;unavailable&gt;) at pycore_object.h:269:5 [opt]</span><br><span class="line">   266  &#123;</span><br><span class="line">   267      assert(op != NULL);</span><br><span class="line">   268      Py_SET_TYPE(op, typeobj);</span><br><span class="line">-&gt; 269      assert(_PyType_HasFeature(typeobj, Py_TPFLAGS_HEAPTYPE) || _Py_IsImmortal(typeobj));</span><br><span class="line">   270      Py_INCREF(typeobj);</span><br><span class="line">   271      _Py_NewReference(op);</span><br><span class="line">   272  &#125;</span><br><span class="line">Target 0: (python.exe) stopped.</span><br><span class="line">warning: python.exe was compiled with optimization - stepping may behave oddly; variables may not be available.</span><br><span class="line">(lldb) bt</span><br><span class="line">* thread #1, queue = &#x27;com.apple.main-thread&#x27;, stop reason = hit program assert</span><br><span class="line">    frame #0: 0x0000000190ec75e0 libsystem_kernel.dylib`__pthread_kill + 8</span><br><span class="line">    frame #1: 0x0000000190efff70 libsystem_pthread.dylib`pthread_kill + 288</span><br><span class="line">    frame #2: 0x0000000190e0c908 libsystem_c.dylib`abort + 128</span><br><span class="line">    frame #3: 0x0000000190e0bc1c libsystem_c.dylib`__assert_rtn + 284</span><br><span class="line">  * frame #4: 0x000000010034043c python.exe`_PyType_AllocNoTrack.cold.2 [inlined] _PyObject_Init(op=&lt;unavailable&gt;, typeobj=&lt;unavailable&gt;) at pycore_object.h:269:5 [opt]</span><br><span class="line">    frame #5: 0x000000010034041c python.exe`_PyType_AllocNoTrack.cold.2 at typeobject.c:2224:9 [opt]</span><br><span class="line">    frame #6: 0x00000001001299a8 python.exe`_PyType_AllocNoTrack [inlined] _PyObject_Init(op=0x0000000100b0eba0, typeobj=0x000000010054db80) at pycore_object.h:269:5 
[opt]</span><br><span class="line">    frame #7: 0x00000001001299a4 python.exe`_PyType_AllocNoTrack(type=0x000000010054db80, nitems=0) at typeobject.c:2224:9 [opt]</span><br><span class="line">    frame #8: 0x00000001001297bc python.exe`PyType_GenericAlloc(type=0x000000010054db80, nitems=&lt;unavailable&gt;) at typeobject.c:2238:21 [opt]</span><br><span class="line">    frame #9: 0x00000001000a7638 python.exe`BaseException_vectorcall(type_obj=0x000000010054db80, args=0x000000016fdfd500, nargsf=9223372036854775809, kwnames=&lt;unavailable&gt;) at exceptions.c:92:37 [opt]</span><br><span class="line">    frame #10: 0x0000000100093220 python.exe`_PyObject_VectorcallTstate(tstate=0x00000001005e6370, callable=0x000000010054db80, args=0x000000016fdfd500, nargsf=9223372036854775809, kwnames=0x0000000000000000) at pycore_call.h:167:11 [opt]</span><br><span class="line">    frame #11: 0x00000001000942bc python.exe`PyObject_CallOneArg(func=&lt;unavailable&gt;, arg=&lt;unavailable&gt;) at call.c:395:12 [opt]</span><br><span class="line">    frame #12: 0x0000000100214d2c python.exe`_PyErr_CreateException(exception_type=0x000000010054db80, value=&lt;unavailable&gt;) at errors.c:44:15 [opt]</span><br><span class="line">    frame #13: 0x0000000100215160 python.exe`_PyErr_SetObject(tstate=0x00000001005e6370, exception=0x000000010054db80, value=0x0000000100c41530) at errors.c:184:33 [opt]</span><br><span class="line">    frame #14: 0x0000000100214ed0 python.exe`PyErr_SetString [inlined] _PyErr_SetString(tstate=0x00000001005e6370, exception=&lt;unavailable&gt;, string=&lt;unavailable&gt;) at errors.c:291:9 [opt]</span><br><span class="line">    frame #15: 0x0000000100214eb0 python.exe`PyErr_SetString(exception=0x000000010054db80, string=&lt;unavailable&gt;) at errors.c:300:5 [opt]</span><br><span class="line">    frame #16: 0x000000010099bf30 foo.abi3.so`foo_bar(self=&lt;unavailable&gt;, args=&lt;unavailable&gt;) at foo.c:7:2 [opt]</span><br></pre></td></tr></table></figure><p>OK 
so the code where the assertion fails looks like this:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="keyword">inline</span> <span class="type">void</span></span><br><span class="line">_PyObject_Init(PyObject *op, PyTypeObject *typeobj)</span><br><span class="line">&#123;</span><br><span class="line">    assert(op != <span class="literal">NULL</span>);</span><br><span class="line">    Py_SET_TYPE(op, typeobj);</span><br><span class="line">    assert(_PyType_HasFeature(typeobj, Py_TPFLAGS_HEAPTYPE) || _Py_IsImmortal(typeobj));</span><br><span class="line">    Py_INCREF(typeobj);</span><br><span class="line">    _Py_NewReference(op);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>We can see that while handling the <code>PyExc_TypeError</code> object we end up in <code>_PyObject_Init</code>, which asserts that the type object is either a heap type or an immortal object.</p><p>A bisect confirmed that this change was introduced in GH-116115<a href="#reference2"><sup>2</sup></a>. The original logic looked like this:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="keyword">inline</span> <span class="type">void</span></span><br><span class="line">_PyObject_Init(PyObject *op, PyTypeObject *typeobj)</span><br><span class="line">&#123;</span><br><span class="line">    
assert(op != <span class="literal">NULL</span>);</span><br><span class="line">    Py_SET_TYPE(op, typeobj);</span><br><span class="line">    <span class="keyword">if</span> (_PyType_HasFeature(typeobj, Py_TPFLAGS_HEAPTYPE)) &#123;</span><br><span class="line">        Py_INCREF(typeobj);</span><br><span class="line">    &#125;</span><br><span class="line">    Py_INCREF(typeobj);</span><br><span class="line">    _Py_NewReference(op);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>At this point we first need to look at how <code>PyExc_TypeError</code> is defined:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> PyObject_HEAD_INIT(type)    \</span></span><br><span class="line"><span class="meta">    &#123;                               \</span></span><br><span class="line"><span class="meta">        &#123; _Py_IMMORTAL_REFCNT &#125;,    \</span></span><br><span class="line"><span class="meta">        (type)             
         \</span></span><br><span class="line"><span class="meta">    &#125;,</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> PyVarObject_HEAD_INIT(type, size) \</span></span><br><span class="line"><span class="meta">    &#123;                                     \</span></span><br><span class="line"><span class="meta">        PyObject_HEAD_INIT(type)          \</span></span><br><span class="line"><span class="meta">        (size)                            \</span></span><br><span class="line"><span class="meta">    &#125;,</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="type">static</span> PyTypeObject _PyExc_ ## EXCNAME = &#123; \</span><br><span class="line">    PyVarObject_HEAD_INIT(<span class="literal">NULL</span>, <span class="number">0</span>) \</span><br><span class="line">    # EXCNAME, \</span><br><span class="line">    <span class="keyword">sizeof</span>(Py ## EXCSTORE ## Object), <span class="number">0</span>, \</span><br><span class="line">    (destructor)EXCSTORE ## _dealloc, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, \</span><br><span class="line">    (reprfunc)EXCSTR, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, \</span><br><span class="line">    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC, \</span><br><span class="line">    PyDoc_STR(EXCDOC), (traverseproc)EXCSTORE ## _traverse, \</span><br><span class="line">    (inquiry)EXCSTORE ## _clear, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, EXCMETHODS, 
\</span><br><span class="line">    EXCMEMBERS, EXCGETSET, &amp;_ ## EXCBASE, \</span><br><span class="line">    <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, offsetof(Py ## EXCSTORE ## Object, dict), \</span><br><span class="line">    (initproc)EXCSTORE ## _init, <span class="number">0</span>, EXCNEW,\</span><br><span class="line">&#125;; \</span><br><span class="line">PyObject *PyExc_ ## EXCNAME = (PyObject *)&amp;_PyExc_ ## EXCNAME</span><br><span class="line"></span><br><span class="line">SimpleExtendsException(PyExc_Exception, TypeError,</span><br><span class="line">                       <span class="string">&quot;Inappropriate argument type.&quot;</span>);</span><br></pre></td></tr></table></figure><p>Here we can see (note <code>_Py_IMMORTAL_REFCNT</code> and <code>Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC</code>) that <code>PyExc_TypeError</code> is a non-heap immortal object. Before GH-116115<a href="#reference2"><sup>2</sup></a> we would take the false branch; after it, <code>_PyType_HasFeature(typeobj, Py_TPFLAGS_HEAPTYPE) || _Py_IsImmortal(typeobj)</code> should in theory evaluate to true, so the assertion should not fail. So why does it?</p><p>We set a breakpoint here to inspect the value of the expression and were surprised to find that <code>_Py_IsImmortal(typeobj)</code> is also false. Why?</p><p>Let’s first look at the implementation of <code>_Py_IsImmortal(typeobj)</code>:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="keyword">inline</span> Py_ALWAYS_INLINE <span class="type">int</span> _Py_IsImmortal(PyObject *op)</span><br><span class="line">&#123;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> (op-&gt;ob_refcnt == _Py_IMMORTAL_REFCNT);</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p>Here we can see that <code>_Py_IsImmortal</code> simply checks whether the object's reference count equals <code>_Py_IMMORTAL_REFCNT</code>. Odd: in the definition of <code>PyExc_TypeError</code> we saw earlier, its reference count is <code>_Py_IMMORTAL_REFCNT</code>. Did the reference count change somehow? At this point we should notice that we call <code>Py_INCREF</code> before PyErr_SetString. Let's verify that</p><p>We set a breakpoint in the foo_bar function and find that after <code>Py_INCREF</code> executes, the reference count is incremented by 1, which makes <code>_Py_IsImmortal</code> return false</p><p>That raises a new question: why does an extension compiled on &gt;= 3.12 run fine afterwards? For odd problems like this, let's start with the assembly</p><p>Build output under 3.11</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">0000000000001120 &lt;foo_bar&gt;:</span><br><span class="line">    1120:48 83 ec 08          sub    $0x8,%rsp</span><br><span class="line">    1124:48 8b 05 9d 2e 00 00 mov    0x2e9d(%rip),%rax        # 3fc8 &lt;PyExc_TypeError@Base&gt;</span><br><span class="line">    112b:48 8d 35 ce 0e 00 00 lea    0xece(%rip),%rsi        # 2000 &lt;_fini+0xe9c&gt;</span><br><span class="line">    1132:48 8b 38             mov    (%rax),%rdi</span><br><span class="line">    1135:48 83 07 01          addq   $0x1,(%rdi)</span><br><span class="line">    1139:e8 f2 fe ff ff       call   1030 &lt;PyErr_SetString@plt&gt;</span><br><span class="line">    113e:31 c0                xor    %eax,%eax</span><br><span class="line">    1140:48 83 c4 08          add    $0x8,%rsp</span><br><span class="line">    1144:c3                   ret</span><br><span class="line">    1145:66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)</span><br><span 
class="line">    114c:00 00 00 00</span><br></pre></td></tr></table></figure><p>Build output under 3.13</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">0000000000001120 &lt;foo_bar&gt;:</span><br><span class="line">    1120:48 83 ec 08          sub    $0x8,%rsp</span><br><span class="line">    1124:48 8b 05 9d 2e 00 00 mov    0x2e9d(%rip),%rax        # 3fc8 &lt;PyExc_TypeError@Base&gt;</span><br><span class="line">    112b:48 8b 38             mov    (%rax),%rdi</span><br><span class="line">    112e:8b 07                mov    (%rdi),%eax</span><br><span class="line">    1130:83 c0 01             add    $0x1,%eax</span><br><span class="line">    1133:74 02                je     1137 &lt;foo_bar+0x17&gt;</span><br><span class="line">    1135:89 07                mov    %eax,(%rdi)</span><br><span class="line">    1137:48 8d 35 c2 0e 00 00 lea    0xec2(%rip),%rsi        # 2000 &lt;_fini+0xe9c&gt;</span><br><span class="line">    113e:e8 ed fe ff ff       call   1030 &lt;PyErr_SetString@plt&gt;</span><br><span class="line">    1143:31 c0                xor    %eax,%eax</span><br><span class="line">    1145:48 83 c4 08          add    $0x8,%rsp</span><br><span class="line">    1149:c3                   ret</span><br><span class="line">    114a:66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)</span><br></pre></td></tr></table></figure><p>We can see that the assembly before the <code>call   1030 &lt;PyErr_SetString@plt&gt;</code> instruction is completely different between the two builds. Two observations follow</p><ol><li>The address of the PyErr_SetString call is resolved dynamically at runtime</li><li>whereas 
<code>Py_INCREF</code> is compiled into assembly with different logic</li></ol><p>There are only two possible explanations for this</p><ol><li><code>Py_INCREF</code> is a macro</li><li><code>Py_INCREF</code> has been inlined</li></ol><p>Let's look at the definition of <code>Py_INCREF</code></p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="keyword">inline</span> Py_ALWAYS_INLINE <span class="type">void</span> <span class="title function_">Py_INCREF</span><span class="params">(PyObject *op)</span>;</span><br></pre></td></tr></table></figure><p>It is indeed the second case, which means the implementation of <code>Py_INCREF</code> differs between 3.13 and 3.11. Let's look at the code</p><p>3.13 </p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="keyword">inline</span> Py_ALWAYS_INLINE <span class="type">void</span> <span class="title function_">Py_INCREF</span><span class="params">(PyObject *op)</span></span><br><span class="line">&#123;</span><br><span class="line">    <span class="keyword">if</span> (_Py_IsImmortal(op)) &#123;</span><br><span class="line">        <span class="keyword">return</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    op-&gt;ob_refcnt++;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>3.11</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="keyword">inline</span> <span class="type">void</span> <span class="title 
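function_">Py_INCREF</span><span class="params">(PyObject *op)</span></span><br><span class="line">&#123;</span><br><span class="line"></span><br><span class="line">    op-&gt;ob_refcnt++;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Sure enough: in 3.13, <code>Py_INCREF</code> no longer increments the reference count of an immortal object, while 3.11 increments it unconditionally without any check. An extension compiled against 3.11 therefore moves the immortal object's reference count away from <code>_Py_IMMORTAL_REFCNT</code>, which causes our problem</p><p>Put plainly, the issue can be summarized like this: the PEP 683 implementation of immortal objects mixes the immortal state up with the reference count, so parts of the ABI that were inlined under an older version behave incorrectly under a newer one. On top of that, GH-116115<a href="#reference2"><sup>2</sup></a> tightened the object check, which surfaced the compatibility problem</p><p>The fix itself is easy. Another Python core developer and I each took a different approach</p><ol><li>I chose to revert the assert to the previous if-condition check. This preserves compatibility and keeps the change small. The drawback is that it is a case-by-case fix</li><li>The other core developer widened the immortal check (a reference count that falls within a certain range is treated as immortal). The advantage is that it generalizes; the drawback is that it may further increase the complexity of the immortal object implementation</li></ol><p>At the end of the day, though, the root cause is that the PEP 683 implementation mixes state together, and I expect more problems like this down the road</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>This case is neither hard to investigate nor hard to fix, but it touches a lot of things behind the scenes. There is plenty of interesting discussion in the issue if you want to click through</p><h2 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h2><div id="reference1"></div><ol><li><a href="https://github.com/python/cpython/issues/121528">https://github.com/python/cpython/issues/121528</a></li></ol><div id="reference2"></div><ol start="2"><li><a href="https://github.com/python/cpython/pull/116115">https://github.com/python/cpython/pull/116115</a></li></ol>]]></content>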
    
    
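    <summary type="html">&lt;p&gt;The second post in the Debug Log series: CPython's GH-121528, another interesting round of debugging and discussion, written up in the hope that it helps&lt;/p&gt;
&lt;p&gt;TL;DR: in the Python 3.13 beta, the PEP 683 implementation plus related changes prevent some extensions compiled against older versions from running on Python 3.13&lt;/p&gt;</summary>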
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/CPython/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="笔记" scheme="https://www.manjusaka.blog/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="水文" scheme="https://www.manjusaka.blog/tags/%E6%B0%B4%E6%96%87/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/tags/CPython/"/>
    
  </entry>
  
  <entry>
    <title>Debug Log: CPython GH-120437</title>
    <link href="https://www.manjusaka.blog/posts/2024/06/19/a-live-debug-gh120437/"/>
    <id>https://www.manjusaka.blog/posts/2024/06/19/a-live-debug-gh120437/</id>
    <published>2024-06-19T19:40:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>Like the SRE Log series, the Debug Log series is where I review debugging experiences that I can share publicly, in the hope that they help others.</p><p>This one is about a bug in the JIT/Tier 2 optimizer in the Python 3.13 beta. It took five days from start to finish, the final change was tiny, and it was great fun</p><span id="more"></span><h2 id="开篇"><a href="#开篇" class="headerlink" title="开篇"></a>Introduction</h2><p>On the 13th, a user reported a bug, GH-120437<a href="#refer-anchor-1"><sup>1</sup></a>. The behavior was as follows</p><p>Python 3.13 introduces an experimental JIT optimizer (for details, see my earlier post, A Brief Look at the Python 3.13 JIT<a href="#refer-anchor-2"><sup>2</sup></a>), which users can optionally enable at build time</p><blockquote><p>./configure --enable-experimental-jit --with-pydebug &amp;&amp; make -j</p></blockquote><p>With the JIT enabled, the user hit a very strange problem. Running</p><blockquote><p>./python -m ensurepip</p></blockquote><p>raises an exception</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">subprocess.CalledProcessError: Command &#x27;[&#x27;/home/jglass/Documents/cpython/python&#x27;, &#x27;-W&#x27;, &#x27;ignore::DeprecationWarning&#x27;, &#x27;-c&#x27;, &#x27;\nimport runpy\nimport sys\nsys.path = [\&#x27;/tmp/tmpsu81mj6o/pip-24.0-py3-none-any.whl\&#x27;] + sys.path\nsys.argv[1:] = [\&#x27;install\&#x27;, \&#x27;--no-cache-dir\&#x27;, \&#x27;--no-index\&#x27;, \&#x27;--find-links\&#x27;, \&#x27;/tmp/tmpsu81mj6o\&#x27;, \&#x27;pip\&#x27;]\nrunpy.run_module(&quot;pip&quot;, run_name=&quot;__main__&quot;, alter_sys=True)\n&#x27;]&#x27; died with &lt;Signals.SIGABRT: 6&gt;.</span><br></pre></td></tr></table></figure><p>I could not reproduce the problem on the latest branch, but it reproduced reliably on the 3.13 branch.</p><p>A reliable reproduction makes things much easier. First, to keep debugging, we need a smaller test case that still reproduces the problem. I read through the ensurepip code, and the relevant part looks roughly like this</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">_run_pip</span>(<span class="params">args, additional_paths=<span class="literal">None</span></span>):</span><br><span class="line">    <span class="comment"># Run the bootstrapping in a subprocess to avoid leaking any state that happens</span></span><br><span class="line">    <span class="comment"># after pip has executed. Particularly, this avoids the case when pip holds onto</span></span><br><span class="line">    <span class="comment"># the files in *additional_paths*, preventing us to remove them at the end of the</span></span><br><span class="line">    <span class="comment"># invocation.</span></span><br><span class="line">    code = <span class="string">f&quot;&quot;&quot;</span></span><br><span class="line"><span class="string">import runpy</span></span><br><span class="line"><span class="string">import sys</span></span><br><span class="line"><span class="string">sys.path = <span class="subst">&#123;additional_paths <span class="keyword">or</span> []&#125;</span> + sys.path</span></span><br><span class="line"><span class="string">sys.argv[1:] = <span class="subst">&#123;args&#125;</span></span></span><br><span class="line"><span class="string">runpy.run_module(&quot;pip&quot;, run_name=&quot;__main__&quot;, alter_sys=True)</span></span><br><span class="line"><span class="string">&quot;&quot;&quot;</span></span><br><span class="line"></span><br><span class="line">    cmd = 
[</span><br><span class="line">        sys.executable,</span><br><span class="line">        <span class="string">&#x27;-W&#x27;</span>,</span><br><span class="line">        <span class="string">&#x27;ignore::DeprecationWarning&#x27;</span>,</span><br><span class="line">        <span class="string">&#x27;-c&#x27;</span>,</span><br><span class="line">        code,</span><br><span class="line">    ]</span><br><span class="line">    <span class="keyword">if</span> sys.flags.isolated:</span><br><span class="line">        <span class="comment"># run code in isolated mode if currently running isolated</span></span><br><span class="line">        cmd.insert(<span class="number">1</span>, <span class="string">&#x27;-I&#x27;</span>)</span><br><span class="line">    <span class="keyword">return</span> subprocess.run(cmd, check=<span class="literal">True</span>).returncode</span><br></pre></td></tr></table></figure><p>So I construct an equivalent Python script and run it directly with Python. In theory, that should work just as well for reproducing the problem</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> runpy</span><br><span class="line"><span class="keyword">import</span> sys</span><br><span class="line">sys.path = [<span class="string">&#x27;/tmp/tmp04bw2hi9/pip-23.3.2-py3-none-any.whl&#x27;</span>] + sys.path</span><br><span class="line">sys.argv[<span class="number">1</span>:] = [<span class="string">&#x27;install&#x27;</span>, <span class="string">&#x27;--no-cache-dir&#x27;</span>, <span class="string">&#x27;--no-index&#x27;</span>, <span class="string">&#x27;--find-links&#x27;</span>, <span class="string">&#x27;/tmp/tmp04bw2hi9&#x27;</span>, <span class="string">&#x27;pip&#x27;</span>]</span><br><span class="line">runpy.run_module(<span class="string">&quot;pip&quot;</span>, run_name=<span 
class="string">&quot;__main__&quot;</span>, alter_sys=<span class="literal">True</span>)</span><br></pre></td></tr></table></figure><p>Bingo, this script reproduces the problem reliably, so we can start analyzing it further</p><p>A key step now is to pin down when the bug was introduced and how far it reaches. In theory, the problem was introduced by the JIT optimizer, and the first commit that introduced the JIT is f6d9e5926b6138994eaa60d1c36462e36105733d<a href="#refer-anchor-3"><sup>3</sup></a>, so we can use git bisect to find the point where the problem was introduced (as an extra check, I confirmed that the commit just before that one does not have the problem)</p><p>After bisecting, we find that the problem was introduced in 1ab6356ebec25f216a0eddbd81225abcb93f2d55<a href="#refer-anchor-4"><sup>4</sup></a>, so we can dig deeper from there</p><p>First, fire up gdb and look at the stack</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">__pthread_kill_implementation (threadid=&lt;optimized out&gt;, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44</span><br><span class="line">44            return INTERNAL_SYSCALL_ERROR_P (ret) ? 
INTERNAL_SYSCALL_ERRNO (ret) : 0;                                                                                                                          </span><br><span class="line">(gdb) bt</span><br><span class="line">#0  __pthread_kill_implementation (threadid=&lt;optimized out&gt;, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44</span><br><span class="line">#1  0x00007ffff7d3eeb3 in __pthread_kill_internal (threadid=&lt;optimized out&gt;, signo=6) at pthread_kill.c:78</span><br><span class="line">#2  0x00007ffff7ce6a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26</span><br><span class="line">#3  0x00007ffff7cce4c3 in __GI_abort () at abort.c:79</span><br><span class="line">#4  0x00007ffff7cce3df in __assert_fail_base (fmt=0x7ffff7e59b68 &quot;%s%s%s:%u: %s%sAssertion `%s&#x27; failed.\n%n&quot;, assertion=assertion@entry=0x7ffff69bb47c &quot;tstate-&gt;datastack_top &lt; tstate-&gt;datastack_limit&quot;, </span><br><span class="line">    file=file@entry=0x7ffff69bb431 &quot;/home/manjusaka/Documents/projects/cpython/Include/internal/pycore_frame.h&quot;, line=line@entry=284, </span><br><span class="line">    function=function@entry=0x7ffff69bb4ac &quot;_PyInterpreterFrame *_PyFrame_PushUnchecked(PyThreadState *, PyFunctionObject *, int)&quot;) at assert.c:94</span><br><span class="line">#5  0x00007ffff7cdec67 in __assert_fail (assertion=0x7ffff69bb47c &quot;tstate-&gt;datastack_top &lt; tstate-&gt;datastack_limit&quot;, </span><br><span class="line">    file=0x7ffff69bb431 &quot;/home/manjusaka/Documents/projects/cpython/Include/internal/pycore_frame.h&quot;, line=284, </span><br><span class="line">    function=0x7ffff69bb4ac &quot;_PyInterpreterFrame *_PyFrame_PushUnchecked(PyThreadState *, PyFunctionObject *, int)&quot;) at assert.c:103</span><br><span class="line">#6  0x00007ffff69b07e8 in ?? ()</span><br><span class="line">#7  0x416b4a710a2907e9 in ?? 
()</span><br><span class="line">#8  0x00005555556c9023 in _Py_INCREF_IncRefTotal () at Objects/object.c:230</span><br><span class="line">Backtrace stopped: previous frame inner to this frame (corrupt stack?)</span><br></pre></td></tr></table></figure><p>What the fuck, what kind of stack is this? The only useful information we get is that the crash happens here</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">static</span> <span class="keyword">inline</span> _PyInterpreterFrame *</span><br><span class="line">_PyFrame_PushUnchecked(PyThreadState *tstate, PyFunctionObject *func, <span class="type">int</span> null_locals_from)</span><br><span class="line">&#123;</span><br><span class="line">    CALL_STAT_INC(frames_pushed);</span><br><span class="line">    PyCodeObject *code = (PyCodeObject *)func-&gt;func_code;</span><br><span class="line">    _PyInterpreterFrame *new_frame = (_PyInterpreterFrame *)tstate-&gt;datastack_top;</span><br><span class="line">    tstate-&gt;datastack_top += code-&gt;co_framesize;</span><br><span class="line">    assert(tstate-&gt;datastack_top &lt; tstate-&gt;datastack_limit);</span><br><span class="line">    _PyFrame_Initialize(new_frame, func, <span class="literal">NULL</span>, code, null_locals_from);</span><br><span class="line">    <span class="keyword">return</span> new_frame;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Beyond that, nothing... This is one of the pitfalls of the JIT: because the machine code is loaded dynamically, debugging the process takes a lot of extra work. In theory I could hook the frame, grab the executor information, and then step through the JIT's assembly, but I really did not want to go down that road</p><p>I was stuck. Just as I was about to brute-force it for lack of a better idea, it occurred to me while walking the dog that Python's JIT is built on Copy and Patch, which generates the JIT
binary from the existing executor cases (again, see my earlier post for details). So I should be able to turn off the JIT part entirely and test with only the Tier 2 optimizer's opcodes, and the behavior should be the same</p><p>I rebuilt with <code>./configure --with-pydebug --enable-pystats --enable-profiling --with-dtrace --enable-experimental-jit=interpreter</code> and tested under gdb. Sure enough, the stack looks much nicer this time</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line">#1  0x00007ffff7d3eeb3 in __pthread_kill_internal (threadid=&lt;optimized out&gt;, signo=6) at pthread_kill.c:78</span><br><span class="line">#2  0x00007ffff7ce6a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26</span><br><span class="line">#3  0x00007ffff7cce4c3 in __GI_abort () at abort.c:79</span><br><span class="line">#4  0x00007ffff7cce3df in __assert_fail_base (fmt=0x7ffff7e59b68 &quot;%s%s%s:%u: %s%sAssertion `%s&#x27; failed.\n%n&quot;, assertion=assertion@entry=0x55555591a150 &quot;tstate-&gt;datastack_top &lt; tstate-&gt;datastack_limit&quot;, </span><br><span class="line">    file=file@entry=0x555555901138 &quot;./Include/internal/pycore_frame.h&quot;, line=line@entry=284, function=function@entry=0x555555977030 &lt;__PRETTY_FUNCTION__.30&gt; &quot;_PyFrame_PushUnchecked&quot;) at 
assert.c:94</span><br><span class="line">#5  0x00007ffff7cdec67 in __assert_fail (assertion=assertion@entry=0x55555591a150 &quot;tstate-&gt;datastack_top &lt; tstate-&gt;datastack_limit&quot;, file=file@entry=0x555555901138 &quot;./Include/internal/pycore_frame.h&quot;, </span><br><span class="line">    line=line@entry=284, function=function@entry=0x555555977030 &lt;__PRETTY_FUNCTION__.30&gt; &quot;_PyFrame_PushUnchecked&quot;) at assert.c:103</span><br><span class="line">#6  0x000055555578ec88 in _PyFrame_PushUnchecked (tstate=tstate@entry=0x555555d9e0c0 &lt;_PyRuntime+293952&gt;, func=&lt;optimized out&gt;, null_locals_from=null_locals_from@entry=3)</span><br><span class="line">    at ./Include/internal/pycore_frame.h:284</span><br><span class="line">#7  0x00005555557b8c51 in _PyEval_EvalFrameDefault (tstate=0x555555d9e0c0 &lt;_PyRuntime+293952&gt;, frame=0x7ffff7f98e58, throwflag=0) at Python/executor_cases.c.h:3326</span><br><span class="line">#8  0x00005555557bc37e in _PyEval_EvalFrame (tstate=tstate@entry=0x555555d9e0c0 &lt;_PyRuntime+293952&gt;, frame=&lt;optimized out&gt;, throwflag=throwflag@entry=0) at ./Include/internal/pycore_ceval.h:118</span><br><span class="line">#9  0x00005555557bc4a4 in _PyEval_Vector (tstate=0x555555d9e0c0 &lt;_PyRuntime+293952&gt;, func=0x7ffff6fe10d0, locals=locals@entry=0x0, args=0x7fffffff15e0, argcount=2, kwnames=0x0) at Python/ceval.c:1818</span><br><span class="line">#10 0x00005555556728e4 in _PyFunction_Vectorcall (func=&lt;optimized out&gt;, stack=&lt;optimized out&gt;, nargsf=&lt;optimized out&gt;, kwnames=&lt;optimized out&gt;) at Objects/call.c:413</span><br><span class="line">#11 0x0000555555672c54 in _PyObject_VectorcallTstate (tstate=tstate@entry=0x555555d9e0c0 &lt;_PyRuntime+293952&gt;, callable=callable@entry=&lt;function at remote 0x7ffff6fe10d0&gt;, args=args@entry=0x7fffffff15e0, </span><br><span class="line">    nargsf=nargsf@entry=2, kwnames=kwnames@entry=0x0) at 
./Include/internal/pycore_call.h:168</span><br><span class="line">#12 0x0000555555673b8c in object_vacall (tstate=tstate@entry=0x555555d9e0c0 &lt;_PyRuntime+293952&gt;, base=base@entry=0x0, callable=&lt;function at remote 0x7ffff6fe10d0&gt;, vargs=vargs@entry=0x7fffffff1660)</span><br><span class="line">    at Objects/call.c:819</span><br><span class="line">#13 0x0000555555673cea in PyObject_CallMethodObjArgs (obj=0x0, name=&lt;optimized out&gt;) at Objects/call.c:880</span><br><span class="line">#14 0x00005555557fb230 in import_find_and_load (tstate=tstate@entry=0x555555d9e0c0 &lt;_PyRuntime+293952&gt;, abs_name=abs_name@entry=&#x27;_winapi&#x27;) at Python/import.c:3080</span><br><span class="line">#15 0x00005555557feb3a in PyImport_ImportModuleLevelObject (name=name@entry=&#x27;_winapi&#x27;, globals=&lt;optimized out&gt;, </span><br><span class="line">    locals=locals@entry=&#123;&#x27;__name__&#x27;: &#x27;mimetypes&#x27;, &#x27;__doc__&#x27;: &#x27;Guess the MIME type of a file.\n\nThis module defines two useful functions:\n\nguess_type(url, strict=True) -- guess the MIME type and encoding of a URL.\n\nguess_extension(type, strict=True) -- guess the extension for a given MIME type.\n\nIt also contains the following, for tuning the behavior:\n\nData:\n\nknownfiles -- list of files to parse\ninited -- flag set when init() has been called\nsuffix_map -- dictionary mapping suffixes to suffixes\nencodings_map -- dictionary mapping suffixes to encodings\ntypes_map -- dictionary mapping suffixes to types\n\nFunctions:\n\ninit([files]) -- parse a list of files, default knownfiles (on Windows, the\n  default values are taken from the registry)\nread_mime_types(file) -- parse one file, return a dictionary or None\n&#x27;, &#x27;__package__&#x27;: &#x27;&#x27;, &#x27;__loader__&#x27;: &lt;SourceFileLoader(name=&#x27;mimetypes&#x27;, path=&#x27;/home/manjusaka/Documents/projects/cpython/Lib/mimetypes.py&#x27;) at remote 0x7ffff5395100&gt;, &#x27;__spec__&#x27;: 
&lt;ModuleSpec(name=&#x27;mimetypes&#x27;, loader...(truncated), fromlist=fromlist@entry=(&#x27;_mimetypes_read_windows_registry&#x27;,), level=level@entry=0) at Python/import.c:3160</span><br><span class="line">#16 0x000055555578f3fa in import_name (tstate=tstate@entry=0x555555d9e0c0 &lt;_PyRuntime+293952&gt;, frame=frame@entry=0x7ffff7f98b18, name=&#x27;_winapi&#x27;, fromlist=fromlist@entry=(&#x27;_mimetypes_read_windows_registry&#x27;,), </span><br><span class="line">    level=level@entry=0) at Python/ceval.c:2629</span><br><span class="line">#17 0x00005555557a244b in _PyEval_EvalFrameDefault (tstate=0x555555d9e0c0 &lt;_PyRuntime+293952&gt;, frame=0x7ffff7f98b18, throwflag=0) at Python/generated_cases.c.h:3196</span><br><span class="line">#18 0x00005555557bc37e in _PyEval_EvalFrame (tstate=tstate@entry=0x555555d9e0c0 &lt;_PyRuntime+293952&gt;, frame=&lt;optimized out&gt;, throwflag=throwflag@entry=0) at ./Include/internal/pycore_ceval.h:118</span><br></pre></td></tr></table></figure><p>This stack is much easier to read. We quickly get to frame #7 and identify the current opcode, <code>_INIT_CALL_PY_EXACT_ARGS_x</code>, a specialized Tier 2 instruction. Roughly speaking, it means we have enough context about the call: for example, the function is set up with two arguments (corresponding to _INIT_CALL_PY_EXACT_ARGS_2 here), which enables a fast path. On this fast path, _PyFrame_PushUnchecked is called directly (skipping the extra frame size check). My first idea was to add an extra check to this instruction's specialization logic: if the stack space recorded in the current thread state is smaller than what we need, exit the specialization and fall back to the regular call path. The change is also fairly simple, because <code>_INIT_CALL_PY_EXACT_ARGS_x</code> is preceded by the instruction <code>_CHECK_FUNCTION_EXACT_ARGS</code></p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">op(_CHECK_FUNCTION_EXACT_ARGS, (func_version/<span class="number">2</span>, callable, self_or_null, unused[oparg] -- callable, self_or_null, unused[oparg])) &#123;</span><br><span class="line">    
EXIT_IF(!PyFunction_Check(callable));</span><br><span class="line">    PyFunctionObject *func = (PyFunctionObject *)callable;</span><br><span class="line">    EXIT_IF(func-&gt;func_version != func_version);</span><br><span class="line">    PyCodeObject *code = (PyCodeObject *)func-&gt;func_code;</span><br><span class="line">    EXIT_IF(code-&gt;co_argcount != oparg + (self_or_null != <span class="literal">NULL</span>));</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>So we can add one more check here: if the stack space recorded in the current thread state is smaller than what we need, exit the specialization and fall back to the regular call path</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">op(_CHECK_FUNCTION_EXACT_ARGS, (func_version/<span class="number">2</span>, callable, self_or_null, unused[oparg] -- callable, self_or_null, unused[oparg])) &#123;</span><br><span class="line">    EXIT_IF(!PyFunction_Check(callable));</span><br><span class="line">    PyFunctionObject *func = (PyFunctionObject *)callable;</span><br><span class="line">    EXIT_IF(func-&gt;func_version != func_version);</span><br><span class="line">    PyCodeObject *code = (PyCodeObject *)func-&gt;func_code;</span><br><span class="line">    EXIT_IF(code-&gt;co_argcount != oparg + (self_or_null != <span class="literal">NULL</span>));</span><br><span class="line">    EXIT_IF(!_PyThreadState_HasStackSpace(tstate, code-&gt;co_framesize));</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>It compiled, the tests passed, the problem went away, and I opened a PR. Think that's the end of it? No. I made a very typical mistake here: my reasoning did not close the loop. I never explained why 1ab6356ebec25f216a0eddbd81225abcb93f2d55<a href="#refer-anchor-4"><sup>4</sup></a> introduced this bug. Closing the loop is extremely important when investigating a problem</p><p>After I opened the PR, core developer Ken Jin (who is also my mentor now) pointed out that the problem may actually have nothing to do with
<code>_INIT_CALL_PY_EXACT_ARGS_x</code>, and is instead a problem with the <code>_CHECK_STACK_SPACE</code> specialization</p><p>He could be sure of this because, when he looked at the problem, he commented out the <code>_CHECK_STACK_SPACE</code> part and found that everything then ran normally. Generally a bug has only one root cause, so now I need to find out why <code>_CHECK_STACK_SPACE</code> leads to this problem</p><p>A quick introduction to the <code>_CHECK_STACK_SPACE</code> specialization: it was introduced in GH-116168<a href="#refer-anchor-5"><sup>5</sup></a>, and its purpose is to merge certain stack checks in specific situations. The logic works like this</p><p>Suppose we have sequential calls like the following, with this bytecode</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">_CHECK_STACK_SPACE A</span><br><span class="line">_PUSH_FRAME</span><br><span class="line">_POP_FRAME</span><br><span class="line">_CHECK_STACK_SPACE B</span><br><span class="line">_PUSH_FRAME</span><br><span class="line">_POP_FRAME</span><br></pre></td></tr></table></figure><p>Then we know the space this function needs is max(A, B), and the instructions after specialization are</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">_CHECK_STACK_SPACE max(A, B)</span><br><span class="line">_PUSH_FRAME</span><br><span class="line">_POP_FRAME</span><br><span class="line">_PUSH_FRAME</span><br><span class="line">_POP_FRAME</span><br></pre></td></tr></table></figure><p>For nested calls</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">_CHECK_STACK_SPACE A</span><br><span class="line">_PUSH_FRAME</span><br><span class="line">_CHECK_STACK_SPACE 
B</span><br><span class="line">_PUSH_FRAME</span><br><span class="line">_POP_FRAME</span><br><span class="line">_POP_FRAME</span><br></pre></td></tr></table></figure><p>Then we know the space this function needs is A + B, and the instructions after specialization are</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">_CHECK_STACK_SPACE A + B</span><br><span class="line">_PUSH_FRAME</span><br><span class="line">_PUSH_FRAME</span><br><span class="line">_POP_FRAME</span><br><span class="line">_POP_FRAME</span><br></pre></td></tr></table></figure><p>In terms of implementation, when a <code>_CHECK_STACK_SPACE</code> is first encountered, the logic looks like this</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">case</span> _CHECK_STACK_SPACE: &#123;</span><br><span class="line">    assert(corresponding_check_stack == <span class="literal">NULL</span>);</span><br><span class="line">    corresponding_check_stack = &amp;buffer[pc];</span><br><span class="line">    <span class="keyword">break</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>We store the current instruction in corresponding_check_stack. Then, when a <code>_PUSH_FRAME</code> is first encountered, the logic looks like this</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">max_space = curr_space &gt; max_space ? 
curr_space : max_space;</span><br><span class="line"><span class="keyword">if</span> (first_valid_check_stack == <span class="literal">NULL</span>) &#123;</span><br><span class="line">    first_valid_check_stack = corresponding_check_stack;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">else</span> &#123;</span><br><span class="line">    <span class="comment">// delete all but the first valid _CHECK_STACK_SPACE</span></span><br><span class="line">    corresponding_check_stack-&gt;opcode = _NOP;</span><br><span class="line">&#125;</span><br><span class="line">corresponding_check_stack = <span class="literal">NULL</span>;</span><br><span class="line"><span class="keyword">break</span>;</span><br></pre></td></tr></table></figure><p>Finally, once the first pass over the instructions completes, this logic runs:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">finish:</span><br><span class="line">    <span class="keyword">if</span> (first_valid_check_stack != <span class="literal">NULL</span>) &#123;</span><br><span class="line">        assert(first_valid_check_stack-&gt;opcode == _CHECK_STACK_SPACE);</span><br><span class="line">        assert(max_space &gt; <span class="number">0</span>);</span><br><span class="line">        assert(max_space &lt;= INT_MAX);</span><br><span class="line">        assert(max_space &lt;= INT32_MAX);</span><br><span class="line">        first_valid_check_stack-&gt;opcode = _CHECK_STACK_SPACE_OPERAND;</span><br><span class="line">        first_valid_check_stack-&gt;operand = max_space;</span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><p>What happens here is that the logic of <code>_CHECK_STACK_SPACE</code> 
is merged into <code>_CHECK_STACK_SPACE_OPERAND</code>, and the operand of the new instruction is the maximum frame space determined during the pass. This logic looks sound, so where does the problem come from?</p><p>In 1ab6356ebec25f216a0eddbd81225abcb93f2d55<a href="#refer-anchor-4"><sup>4</sup></a>, the author sets <code>first_valid_check_stack</code> to NULL in the handling of the newly introduced instruction <code>_PY_FRAME_GENERAL</code>. As a result, the final instruction-replacement logic can no longer run. At the same time, the <code>_PUSH_FRAME</code> handling replaces the subsequent <code>_CHECK_STACK_SPACE</code> instructions with <code>_NOP</code>. The stack check is therefore effectively disabled, which eventually crashes the process.</p><p>Once the root cause was confirmed, the fix was straightforward (a single line of effective change).</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>This is a classic case of a bug that is painful to track down and trivial to fix. I think the investigation itself was valuable, so I wrote it up on its own. The design of Python's Tier 2 optimizer is also genuinely interesting, and I hope to find more fun details later (I am currently trying to optimize guards on constant types, and I hope it goes well).</p><p>That's about it.</p><h2 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h2><div id="refer-anchor-1"></div><ul><li>[1]. <a href="https://github.com/python/cpython/issues/120437">https://github.com/python/cpython/issues/120437</a></li></ul><div id="refer-anchor-2"></div><ul><li>[2]. <a href="https://www.manjusaka.blog/posts/2024/01/03/a-simple-introduction-about-python-jit/">https://www.manjusaka.blog/posts/2024/01/03/a-simple-introduction-about-python-jit/</a></li></ul><div id="refer-anchor-3"></div><ul><li>[3]. <a href="https://github.com/python/cpython/commit/f6d9e5926b6138994eaa60d1c36462e36105733d">https://github.com/python/cpython/commit/f6d9e5926b6138994eaa60d1c36462e36105733d</a></li></ul><div id="refer-anchor-4"></div><ul><li>[4]. <a href="https://github.com/python/cpython/commit/1ab6356ebec25f216a0eddbd81225abcb93f2d55">https://github.com/python/cpython/commit/1ab6356ebec25f216a0eddbd81225abcb93f2d55</a></li></ul><div id="refer-anchor-5"></div><ul><li>[5]. <a href="https://github.com/python/cpython/issues/116168">https://github.com/python/cpython/issues/116168</a></li></ul>]]></content>
    
    
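    <summary type="html">&lt;p&gt;Like the SRE Log series, the Debug Log series reviews debugging experiences of mine that I can make public, in the hope that they help others.&lt;/p&gt;
&lt;p&gt;This one covers a bug in the JIT/Tier 2 optimizer under Python 3.13 Beta. It took five days from start to finish, the final change was tiny, and the whole thing was a lot of fun.&lt;/p&gt;</summary>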
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/CPython/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="笔记" scheme="https://www.manjusaka.blog/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="水文" scheme="https://www.manjusaka.blog/tags/%E6%B0%B4%E6%96%87/"/>
    
    <category term="CPython" scheme="https://www.manjusaka.blog/tags/CPython/"/>
    
  </entry>
  
  <entry>
    <title>Notes from implementing an NES emulator: computing nametable mirroring</title>
    <link href="https://www.manjusaka.blog/posts/2024/05/24/a-tour-to-make-a-nes-simulator-nametable-caculate/"/>
    <id>https://www.manjusaka.blog/posts/2024/05/24/a-tour-to-make-a-nes-simulator-nametable-caculate/</id>
    <published>2024-05-24T17:00:00.000Z</published>
    <updated>2026-03-29T17:00:43.280Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/DPlayer.min.js"> </script><p>Some casual notes from writing an NES emulator. This time: how to compute nametable mirroring.</p><span id="more"></span><h2 id="正文"><a href="#正文" class="headerlink" title="正文"></a>Main text</h2><p>Rendering on the NES (Famicom) is relatively complicated, so before getting to the mirror computation, here is some background.</p><ol><li>The screen resolution is 256*240, and the basic rendering unit is the tile. A tile is 8*8 pixels, so one screen holds 32*30 tiles.</li><li>The background patterns shown on screen are stored in the Pattern Table, which maps to CHR memory. That may be RAM or ROM, depending on the Mapper.</li><li>To draw the right pattern at each position, we need an index that locates each tile's pattern in the Pattern Table. With 32*30 tiles we need 32*30 8-bit indexes, which is 960 bytes. The remaining 64 bytes of each 1 KB nametable hold the Attribute Table, which selects the palette (that is, the colors) used by each group of background tiles.</li></ol><p>The NES defines four nametables, which in theory take 4 KB of address space. In practice the PPU's built-in VRAM is only 2 KB (unless a particular Mapper supports mapping 4 KB or more). As some readers may have guessed, most game backgrounds repeat, so nametables can be reused, and that is why we need the mirror computation.</p><p>The four nametables are laid out like this:</p><div class="table-container"><table><thead><tr><th></th><th></th></tr></thead><tbody><tr><td>A</td><td>B</td></tr><tr><td>C</td><td>D</td></tr></tbody></table></div><p>To keep the description simple, we take the start address to be 0x00 (on real hardware it is 0x2000).</p><ol><li>A: 0x00 ~ 0x3FF</li><li>B: 0x400 ~ 0x7FF</li><li>C: 0x800 ~ 0xBFF</li><li>D: 0xC00 ~ 0xFFF</li></ol><p>There are two common mirroring modes:</p><ol><li>Vertical mirroring: C maps to A, and D maps to B</li><li>Horizontal mirroring: B maps to A, and D maps to C</li></ol><p>So how do we write the address translation?</p><p>A first observation: in either mode, two tables map to the range 0x00 to 0x3FF, and the other two map to the range 0x400 to 0x7FF.</p><p>That makes things simple. The most brute-force approach is a direct table lookup:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="keyword">import</span> enum</span><br><span class="line"></span><br><span class="line">INDEX = [[<span class="number">0</span>, <span class="number">0</span>, <span class="number">1</span>, <span class="number">1</span>], [<span class="number">0</span>, <span class="number">1</span>, <span class="number">0</span>, <span class="number">1</span>]]</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Direction</span>(enum.IntEnum):</span><br><span class="line">    Horizontal = <span class="number">0</span></span><br><span class="line">    Vertical = <span class="number">1</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">mirror_lookup</span>(<span class="params">direction: Direction, address: <span class="built_in">int</span></span>) -&gt; <span class="built_in">int</span>:</span><br><span class="line">    page = address // <span class="number">0x400</span></span><br><span class="line">    offset = address % <span class="number">0x400</span></span><br><span class="line">    <span class="keyword">return</span> INDEX[direction][page] * <span class="number">0x400</span> + offset</span><br></pre></td></tr></table></figure><p>This is straightforward: dividing the incoming address by 0x400 tells us which page it falls in, direction plus page then tells us which range it maps to, and we return the new address.</p><p>Is that good enough?</p><p>The code above needs extra space to store the mapping, plus two extra memory lookups. On hardware of that era, where every byte and every cycle counted, that would clearly have been unacceptable.</p><p>Is there a better way?</p><p>Yes!</p><p>Start with vertical mirroring. A and C share the same addresses, and so do B and D, so the translation reduces to a simple modulo by 0x800.</p><p>What about horizontal mirroring? Think of it this way.</p><p>Picture the current layout as a 0x800 * 0x800 matrix and shrink it to 0x400 * 0x400, which is what shifting the address right by one bit does. The range covered by A and B then shrinks to 0x00 to 0x3FF, and the range covered by C and D shrinks to 0x400 to 0x7FF. At that point a bitwise <code>&amp;</code> with 0x400 yields the base address 0x00 for A and B, and the base address 0x400 for C and D. Adding the result of the modulo by 0x400 (the offset within the page) gives the new address.</p><figure class="highlight 
python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">mirror_lookup_new</span>(<span class="params">direction: Direction, address: <span class="built_in">int</span></span>) -&gt; <span class="built_in">int</span>:</span><br><span class="line">    <span class="keyword">if</span> direction == Direction.Vertical:</span><br><span class="line">        <span class="keyword">return</span> address % (<span class="number">2</span> * <span class="number">0x400</span>)</span><br><span class="line">    <span class="keyword">return</span> ((address&gt;&gt;<span class="number">1</span>) &amp; <span class="number">0x400</span>) + (address % <span class="number">0x400</span>)</span><br></pre></td></tr></table></figure><p>Finally, let's run a benchmark:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> timeit</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(</span><br><span class="line">    timeit.repeat(<span class="keyword">lambda</span>: mirror_lookup(Direction.Horizontal, <span class="number">0x401</span>), number=<span class="number">10000000</span>)</span><br><span class="line">)</span><br><span class="line"><span class="built_in">print</span>(</span><br><span class="line">    timeit.repeat(</span><br><span class="line">        <span class="keyword">lambda</span>: mirror_lookup_new(Direction.Horizontal, <span class="number">0x401</span>),</span><br><span class="line">        
number=<span class="number">10000000</span>,</span><br><span class="line">    )</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>The result:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">print(</span><br><span class="line">    timeit.repeat(lambda: mirror_lookup(Direction.Horizontal, 0x401), number=10000000)</span><br><span class="line">)</span><br><span class="line">print(</span><br><span class="line">    timeit.repeat(</span><br><span class="line">        lambda: mirror_lookup_new(Direction.Horizontal, 0x401),</span><br><span class="line">        number=10000000,</span><br><span class="line">    )</span><br><span class="line">)</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>What??? Oh, right: bitwise operations are not necessarily faster in Python. So I quickly wrote a C version to test with:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span 
class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;stdio.h&gt;</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;stdint.h&gt;</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">&lt;sys/time.h&gt;</span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> PAGE_SIZE 0x400</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">enum</span> &#123;</span></span><br><span class="line">    Horizontal = <span class="number">0</span>,</span><br><span class="line">    Vertical = <span class="number">1</span></span><br><span class="line">&#125; Direction;</span><br><span class="line"></span><br><span class="line"><span class="type">int</span> INDEX[<span class="number">2</span>][<span class="number">4</span>] = 
&#123;&#123;<span class="number">0</span>, <span class="number">0</span>, <span class="number">1</span>, <span class="number">1</span>&#125;, &#123;<span class="number">0</span>, <span class="number">1</span>, <span class="number">0</span>, <span class="number">1</span>&#125;&#125;; <span class="comment">// Declare INDEX globally</span></span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">mirror_lookup</span><span class="params">(Direction direction, <span class="type">int</span> address)</span> &#123;</span><br><span class="line">    <span class="type">int</span> page = address / PAGE_SIZE;</span><br><span class="line">    <span class="type">int</span> offset = address % PAGE_SIZE;</span><br><span class="line">    <span class="keyword">return</span> INDEX[direction][page] * PAGE_SIZE + offset;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">mirror_lookup_new</span><span class="params">(Direction direction, <span class="type">int</span> address)</span> &#123;</span><br><span class="line">    <span class="keyword">if</span> (direction == Vertical) &#123;</span><br><span class="line">        <span class="keyword">return</span> address % (<span class="number">2</span> * PAGE_SIZE);</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> ((address &gt;&gt; <span class="number">1</span>) &amp; PAGE_SIZE) + (address % PAGE_SIZE);</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">long</span> <span class="type">long</span> <span class="title function_">current_time</span><span class="params">()</span> &#123;</span><br><span class="line">    <span class="class"><span class="keyword">struct</span> <span class="title">timeval</span> <span class="title">tv</span>;</span></span><br><span 
class="line">    gettimeofday(&amp;tv, <span class="literal">NULL</span>);</span><br><span class="line">    <span class="keyword">return</span> (<span class="type">long</span> <span class="type">long</span>)(tv.tv_sec) * <span class="number">1000000</span> + tv.tv_usec;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">main</span><span class="params">()</span> &#123;</span><br><span class="line">    <span class="comment">// Timing the original function</span></span><br><span class="line">    <span class="type">long</span> <span class="type">long</span> start1 = current_time();</span><br><span class="line">    <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i &lt; <span class="number">100000000</span>; i++) &#123;</span><br><span class="line">        mirror_lookup(Horizontal, <span class="number">0x401</span>);</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="type">long</span> <span class="type">long</span> end1 = current_time();</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;Time taken for original function: %lld microseconds\n&quot;</span>, end1 - start1);</span><br><span class="line"></span><br><span class="line">    <span class="comment">// Timing the new function</span></span><br><span class="line">    <span class="type">long</span> <span class="type">long</span> start2 = current_time();</span><br><span class="line">    <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i &lt; <span class="number">100000000</span>; i++) &#123;</span><br><span class="line">        mirror_lookup_new(Horizontal, <span class="number">0x401</span>);</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="type">long</span> <span class="type">long</span> end2 = 
current_time();</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">&quot;Time taken for modified function: %lld microseconds\n&quot;</span>, end2 - start2);</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>The result:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">Time taken for original function: 355402 microseconds</span><br><span class="line">Time taken for modified function: 251868 microseconds</span><br></pre></td></tr></table></figure><p>Roughly 30% faster, as expected.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>Old systems are full of tricks like this, done for the sake of performance, and they are a lot of fun to discover.</p><p>Here is an exercise for the reader:</p><blockquote><p>Assume the NES CPU is a Ricoh 6502 running at 1.79 MHz. Can we quantify how many clock cycles each of the two mirror implementations takes?</p></blockquote>]]></content>
    
    
    <summary type="html">&lt;p&gt;Some casual notes from writing an NES emulator. This time: how to compute nametable mirroring.&lt;/p&gt;</summary>
    
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="计算机体系" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/%E8%AE%A1%E7%AE%97%E6%9C%BA%E4%BD%93%E7%B3%BB/"/>
    
    <category term="汇编" scheme="https://www.manjusaka.blog/categories/%E7%BC%96%E7%A8%8B/%E8%AE%A1%E7%AE%97%E6%9C%BA%E4%BD%93%E7%B3%BB/%E6%B1%87%E7%BC%96/"/>
    
    
    <category term="编程" scheme="https://www.manjusaka.blog/tags/%E7%BC%96%E7%A8%8B/"/>
    
    <category term="Linux" scheme="https://www.manjusaka.blog/tags/Linux/"/>
    
    <category term="笔记" scheme="https://www.manjusaka.blog/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="水文" scheme="https://www.manjusaka.blog/tags/%E6%B0%B4%E6%96%87/"/>
    
  </entry>
  
</feed>
