{"id":2099,"date":"2025-02-27T07:02:53","date_gmt":"2025-02-27T07:02:53","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/02\/27\/nine-rules-for-simd-acceleration-of-your-rust-code-part-1-c16fe639ce21\/"},"modified":"2025-02-27T07:02:53","modified_gmt":"2025-02-27T07:02:53","slug":"nine-rules-for-simd-acceleration-of-your-rust-code-part-1-c16fe639ce21","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/02\/27\/nine-rules-for-simd-acceleration-of-your-rust-code-part-1-c16fe639ce21\/","title":{"rendered":"Nine Rules for SIMD Acceleration of Your Rust Code (Part 1)"},"content":{"rendered":"<div>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"bb2b\"><em>Thanks to Ben Lichtman (B3NNY) at the Seattle Rust Meetup for pointing me in the right direction on SIMD.<\/em><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\" id=\"6068\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/Single_instruction,_multiple_data\" rel=\"noreferrer noopener\" target=\"_blank\">SIMD<\/a>\u00a0(Single Instruction, Multiple Data) operations have been a feature of Intel\/AMD and ARM CPUs since the early 2000s. These operations enable you to, for example, add an array of eight\u00a0<code>i32<\/code>\u00a0to another array of eight\u00a0<code>i32<\/code>\u00a0with just one CPU operation\u00a0<strong>on a single core<\/strong>. Using SIMD operations greatly speeds up certain tasks. If you\u2019re not using SIMD, you may not be fully using your CPU\u2019s capabilities.<\/p>\n<p class=\"wp-block-paragraph\" id=\"14c1\">Is this \u201cYet Another <a href=\"https:\/\/towardsdatascience.com\/tag\/rust\/\" title=\"Rust\">Rust<\/a> and SIMD\u201d article? Yes and no. 
Yes, I did apply SIMD to a programming problem and then feel compelled to write an article about it. No, I hope that this article also goes into enough depth that it can guide you through\u00a0<em>your<\/em>\u00a0project. It explains the newly available SIMD capabilities and settings in Rust nightly. It includes a Rust SIMD cheatsheet. It shows how to make your SIMD code generic without leaving safe Rust. It gets you started with tools such as Godbolt and Criterion. Finally, it introduces new cargo commands that make the process easier.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<p class=\"wp-block-paragraph\" id=\"410a\">The\u00a0<a href=\"https:\/\/crates.io\/crates\/range-set-blaze\" target=\"_blank\" rel=\"noreferrer noopener\"><code>range-set-blaze<\/code><\/a>\u00a0crate uses its\u00a0<code>RangeSetBlaze::from_iter<\/code>\u00a0method to ingest potentially long sequences of integers. When the integers are \u201cclumpy\u201d, it can do this\u00a0<a href=\"https:\/\/github.com\/CarlKCarlK\/range-set-blaze\/blob\/main\/docs\/bench.md\" target=\"_blank\" rel=\"noreferrer noopener\">30 times faster<\/a>\u00a0than Rust\u2019s standard\u00a0<code>HashSet::from_iter<\/code>. Can we do even better if we use <a href=\"https:\/\/towardsdatascience.com\/tag\/simd\/\" title=\"Simd\">SIMD<\/a> operations? Yes!<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"3748\"><em>See\u00a0<a href=\"https:\/\/docs.rs\/range-set-blaze\/latest\/range_set_blaze\/struct.RangeSetBlaze.html#constructor-performance\" target=\"_blank\" rel=\"noreferrer noopener\">this documentation<\/a>\u00a0for the definition of \u201cclumpy\u201d. 
Also, what happens if the integers are not clumpy?\u00a0<code>RangeSetBlaze<\/code>\u00a0is\u00a0<a href=\"https:\/\/github.com\/CarlKCarlK\/range-set-blaze\/blob\/main\/docs\/bench.md\" target=\"_blank\" rel=\"noreferrer noopener\">2 to 3 times\u00a0<\/a><\/em><a href=\"https:\/\/github.com\/CarlKCarlK\/range-set-blaze\/blob\/main\/docs\/bench.md\" target=\"_blank\" rel=\"noreferrer noopener\">slower<\/a><em>\u00a0than\u00a0<code>HashSet<\/code>.<\/em><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\" id=\"ece5\">On clumpy integers,\u00a0<code>RangeSetBlaze::from_slice\u00a0<\/code>\u2014 a new method based on SIMD operations \u2014 is 7 times faster than\u00a0<code>RangeSetBlaze::from_iter.\u00a0<\/code>That makes it more than 200 times faster than\u00a0<code>HashSet::from_iter<\/code>. (When the integers are not clumpy, it is still 2 to 3 times slower than\u00a0<code>HashSet<\/code>.)<\/p>\n<p class=\"wp-block-paragraph\" id=\"9fa1\">Over the course of implementing this speed up, I learned nine rules that can help you accelerate your projects with SIMD operations.<\/p>\n<p class=\"wp-block-paragraph\" id=\"fbcc\">The rules are:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Use nightly Rust and\u00a0<code>core::simd<\/code>, Rust\u2019s experimental standard SIMD module.<\/li>\n<li class=\"wp-block-list-item\">CCC: Check, Control, and Choose your computer\u2019s SIMD capabilities.<\/li>\n<li class=\"wp-block-list-item\">Learn\u00a0<code>core::simd<\/code>, but selectively.<\/li>\n<li class=\"wp-block-list-item\">Brainstorm candidate algorithms.<\/li>\n<li class=\"wp-block-list-item\">Use Godbolt and AI to understand your code\u2019s assembly, even if you don\u2019t know assembly language.<\/li>\n<li class=\"wp-block-list-item\">Generalize to all types and LANES with in-lined generics, (and when that doesn\u2019t work) macros, and (when that doesn\u2019t work) traits.<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\" id=\"8cbb\">See\u00a0<a 
href=\"https:\/\/towardsdatascience.com\/nine-rules-for-simd-acceleration-of-your-rust-code-part-2-6a104b3be6f3\" rel=\"noreferrer noopener\" target=\"_blank\">Part 2<\/a>\u00a0for these rules:<\/p>\n<p class=\"wp-block-paragraph\" id=\"7c52\"><em>7. Use Criterion benchmarking to pick an algorithm and to discover that LANES should (almost) always be 32 or 64.<\/em><\/p>\n<p class=\"wp-block-paragraph\" id=\"52a9\"><em>8. Integrate your best SIMD algorithm into your project with\u00a0<code>as_simd<\/code>, special code for\u00a0<code>i128\/u128<\/code>, and additional in-context benchmarking.<\/em><\/p>\n<p class=\"wp-block-paragraph\" id=\"83e7\"><em>9. Extricate your best SIMD algorithm from your project (for now) with an optional cargo feature.<\/em><\/p>\n<p class=\"wp-block-paragraph\" id=\"66da\"><em>Aside: To avoid wishy-washiness, I call these \u201crules\u201d, but they are, of course, just suggestions.<\/em><\/p>\n<h2 class=\"wp-block-heading\">Rule 1: Use nightly Rust and\u00a0<code>core::simd<\/code>, Rust\u2019s experimental standard SIMD module.<\/h2>\n<p class=\"wp-block-paragraph\" id=\"bede\">Rust can access SIMD operations either via the stable\u00a0<a href=\"https:\/\/doc.rust-lang.org\/core\/arch\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>core::arch<\/code><\/a>\u00a0module or via nightly\u2019s\u00a0<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>core::simd<\/code><\/a>\u00a0module. 
Let\u2019s compare them:<\/p>\n<p class=\"wp-block-paragraph\" id=\"5235\"><strong><code>core::arch<\/code><\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Stable<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/core\/arch\/index.html#ergonomics\" target=\"_blank\" rel=\"noreferrer noopener\">\u201c[N]ot the easiest thing in the world<\/a>\u201d<\/li>\n<li class=\"wp-block-list-item\">Offers high-performance to downstream users of your crate. For example, because\u00a0<a href=\"https:\/\/github.com\/BurntSushi\/regex\" target=\"_blank\" rel=\"noreferrer noopener\">regex<\/a>\u00a0and\u00a0<a href=\"https:\/\/github.com\/BurntSushi\/memchr\" target=\"_blank\" rel=\"noreferrer noopener\"><code>memchr<\/code><\/a>\u00a0went this route, over 100,000 other crates got stable SIMD acceleration for free. [<a href=\"https:\/\/www.reddit.com\/r\/rust\/comments\/18hj1m6\/comment\/kdbfktb\/?utm_source=share&amp;utm_medium=web2x&amp;context=3\" target=\"_blank\" rel=\"noreferrer noopener\">Reddit discussion<\/a>,\u00a0<a href=\"https:\/\/github.com\/BurntSushi\/memchr\/blob\/master\/src\/arch\/x86_64\/memchr.rs\" target=\"_blank\" rel=\"noreferrer noopener\">some relevant\u00a0<code>memchr<\/code>\u00a0code<\/a>]<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\" id=\"c62f\"><strong><code>core::simd<\/code><\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Nightly<\/li>\n<li class=\"wp-block-list-item\">Delightfully easy and portable.<\/li>\n<li class=\"wp-block-list-item\">Limits downstream users to nightly.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\" id=\"a4df\">I decided to go with \u201ceasy\u201d. 
If you decide to take the harder road, starting first with the easier path may still be worthwhile.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<p class=\"wp-block-paragraph\" id=\"fab1\">In either case, before we try to use SIMD operations in a larger project, let\u2019s make sure we can get them working at all. Here are the steps:<\/p>\n<p class=\"wp-block-paragraph\" id=\"57d9\">First, create a project called\u00a0<code>simd_hello<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">cargo new simd_hello\ncd simd_hello<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"1b6c\">Edit\u00a0<code>src\/main.rs<\/code>\u00a0to contain (<a href=\"https:\/\/play.rust-lang.org\/?version=nightly&amp;mode=debug&amp;edition=2021&amp;gist=e39aa876c0abed9915d389fe73687839\" target=\"_blank\" rel=\"noreferrer noopener\">Rust playground<\/a>):<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">\/\/ Tell nightly Rust to enable 'portable_simd'\n#![feature(portable_simd)]\nuse core::simd::prelude::*;\n\n\/\/ constant Simd structs\nconst LANES: usize = 32;\nconst THIRTEENS: Simd&lt;u8, LANES&gt; = Simd::&lt;u8, LANES&gt;::from_array([13; LANES]);\nconst TWENTYSIXS: Simd&lt;u8, LANES&gt; = Simd::&lt;u8, LANES&gt;::from_array([26; LANES]);\nconst ZEES: Simd&lt;u8, LANES&gt; = Simd::&lt;u8, LANES&gt;::from_array([b'Z'; LANES]);\n\nfn main() {\n    \/\/ create a Simd struct from a slice of LANES bytes\n    let mut data = Simd::&lt;u8, LANES&gt;::from_slice(b\"URYYBJBEYQVQBUBCRVGFNYYTBVATJRYY\");\n\n    data += THIRTEENS; \/\/ add 13 to each byte\n\n    \/\/ compare each byte to 'Z', where the byte is greater than 'Z', subtract 26\n    let mask = data.simd_gt(ZEES); \/\/ compare each byte to 'Z'\n    data = mask.select(data - TWENTYSIXS, data);\n\n    let output = String::from_utf8_lossy(data.as_array());\n    assert_eq!(output, \"HELLOWORLDIDOHOPEITSALLGOINGWELL\");\n    println!(\"{}\", 
output);\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"ed23\">Next \u2014 full SIMD capabilities require the nightly version of Rust. Assuming you have Rust installed, install nightly (<code>rustup install nightly<\/code>). Make sure you have the latest nightly version (<code>rustup update nightly<\/code>). Finally, set this project to use nightly (<code>rustup override set nightly<\/code>).<\/p>\n<p class=\"wp-block-paragraph\" id=\"1d68\">You can now run the program with\u00a0<code>cargo run<\/code>. The program applies\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/ROT13\" rel=\"noreferrer noopener\" target=\"_blank\">ROT13 decryption<\/a>\u00a0to 32 bytes of upper-case letters. With SIMD, the program can decrypt all 32 bytes simultaneously.<\/p>\n<p class=\"wp-block-paragraph\" id=\"040e\">Let\u2019s look at each section of the program to see how it works. It starts with:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">#![feature(portable_simd)]\nuse core::simd::prelude::*;<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"b859\">Rust nightly offers its extra capabilities (or \u201cfeatures\u201d) only on request. The\u00a0<code>#![feature(portable_simd)]<\/code>\u00a0statement requests that Rust nightly make available the new experimental\u00a0<code>core::simd<\/code>\u00a0module. 
The\u00a0<code>use<\/code>\u00a0statement then imports the module\u2019s most important types and traits.<\/p>\n<p class=\"wp-block-paragraph\" id=\"d51c\">In the code\u2019s next section, we define useful constants:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">const LANES: usize = 32;\nconst THIRTEENS: Simd&lt;u8, LANES&gt; = Simd::&lt;u8, LANES&gt;::from_array([13; LANES]);\nconst TWENTYSIXS: Simd&lt;u8, LANES&gt; = Simd::&lt;u8, LANES&gt;::from_array([26; LANES]);\nconst ZEES: Simd&lt;u8, LANES&gt; = Simd::&lt;u8, LANES&gt;::from_array([b'Z'; LANES]);<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"bba5\">The\u00a0<code>Simd<\/code>\u00a0struct is a special kind of Rust array. (It is, for example, always memory aligned.) The constant\u00a0<code>LANES<\/code>\u00a0gives the length of the\u00a0<code>Simd<\/code>\u00a0array. The\u00a0<code>from_array<\/code>\u00a0constructor copies a regular Rust array to create a\u00a0<code>Simd<\/code>. In this case, because we want\u00a0<code>const<\/code>\u00a0<code>Simd<\/code>s, the arrays we construct from must also be\u00a0<code>const<\/code>.<\/p>\n<p class=\"wp-block-paragraph\" id=\"3d4f\">The next two lines copy our encrypted text into\u00a0<code>data<\/code>\u00a0and then add 13 to each letter.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">let mut data = Simd::&lt;u8, LANES&gt;::from_slice(b\"URYYBJBEYQVQBUBCRVGFNYYTBVATJRYY\");\ndata += THIRTEENS;<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"939e\">What if you make an error and your encrypted text is shorter than\u00a0<code>LANES<\/code>\u00a0(32) bytes? Sadly, the compiler won\u2019t tell you. Instead, when you run the program,\u00a0<code>from_slice<\/code>\u00a0will panic. (If the text is longer,\u00a0<code>from_slice<\/code>\u00a0silently uses only the first\u00a0<code>LANES<\/code>\u00a0bytes.) What if the encrypted text contains non-upper-case letters? 
In this example program, we\u2019ll ignore that possibility.<\/p>\n<p class=\"wp-block-paragraph\" id=\"08e0\">The\u00a0<code>+=<\/code>\u00a0operator does element-wise addition between the\u00a0<code>Simd<\/code>\u00a0<code>data<\/code>\u00a0and\u00a0<code>Simd<\/code>\u00a0<code>THIRTEENS<\/code>. It puts the result in\u00a0<code>data<\/code>. Recall that debug builds of regular Rust addition check for overflows. Not so with SIMD. Rust defines SIMD arithmetic operators to always wrap. Values of type\u00a0<code>u8<\/code>\u00a0wrap after 255.<\/p>\n<p class=\"wp-block-paragraph\" id=\"17b5\">Coincidentally, Rot13 decryption also requires wrapping, but after \u2018Z\u2019 rather than after 255. Here is one approach to coding the needed Rot13 wrapping. It subtracts 26 from any values\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/On_Beyond_Zebra!\" target=\"_blank\" rel=\"noreferrer noopener\">on beyond \u2018Z<\/a>\u2019.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">let mask = data.simd_gt(ZEES);\ndata = mask.select(data - TWENTYSIXS, data);<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"80c8\">This says to find the element-wise places beyond \u2018Z\u2019. Then, subtract 26 from all values. At the places of interest, use the subtracted values. At the other places, use the original values. Does subtracting from all values and then using only some seem wasteful? With SIMD, this takes no extra computer time and avoids jumps. This strategy is, thus, efficient and common.<\/p>\n<p class=\"wp-block-paragraph\" id=\"3e13\">The program ends like so:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">let output = String::from_utf8_lossy(data.as_array());\nassert_eq!(output, \"HELLOWORLDIDOHOPEITSALLGOINGWELL\");\nprintln!(\"{}\", output);<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"2db7\">Notice the\u00a0<code>.as_array()<\/code>\u00a0method. 
It safely transmutes a\u00a0<code>Simd<\/code>\u00a0struct into a regular Rust array without copying.<\/p>\n<p class=\"wp-block-paragraph\" id=\"65bb\">Surprisingly to me, this program runs fine on computers without SIMD extensions. Rust nightly compiles the code to regular (non-SIMD) instructions. But we don\u2019t just want to run \u201cfine\u201d, we want to run\u00a0<em>faster<\/em>. That requires us to turn on our computer\u2019s SIMD power.<\/p>\n<h2 class=\"wp-block-heading\" id=\"f7c4\">Rule 2: CCC: Check, Control, and Choose your computer\u2019s SIMD capabilities.<\/h2>\n<p class=\"wp-block-paragraph\" id=\"57c7\">To make SIMD programs run faster on your machine, you must first discover which SIMD extensions your machine supports. If you have an Intel\/AMD machine, you can use my\u00a0<a href=\"https:\/\/github.com\/CarlKCarlK\/cargo-simd-detect\" target=\"_blank\" rel=\"noreferrer noopener\"><code>simd-detect<\/code><\/a>\u00a0cargo command.<\/p>\n<p class=\"wp-block-paragraph\" id=\"dda6\">Run with:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">rustup override set nightly\ncargo install cargo-simd-detect --force\ncargo simd-detect<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"3b24\">On my machine, it outputs:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\">extension       width                   available       enabled\nsse2            128-bit\/16-bytes        true            true\navx2            256-bit\/32-bytes        true            false\navx512f         512-bit\/64-bytes        true            false<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"5fe3\">This says that my machine supports the\u00a0<code>sse2<\/code>,\u00a0<code>avx2<\/code>, and\u00a0<code>avx512f<\/code>\u00a0SIMD extensions. 
Of those, by default, Rust enables the ubiquitous twenty-year-old\u00a0<code>sse2<\/code>\u00a0extension.<\/p>\n<p class=\"wp-block-paragraph\" id=\"1910\">The SIMD extensions form a hierarchy with\u00a0<code>avx512f<\/code>\u00a0above\u00a0<code>avx2<\/code>\u00a0above\u00a0<code>sse2<\/code>. Enabling a higher-level extension also enables the lower-level extensions.<\/p>\n<p class=\"wp-block-paragraph\" id=\"c659\">Most Intel\/AMD computers also support the ten-year-old\u00a0<code>avx2<\/code>\u00a0extension. You enable it by setting an environment variable:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\"># For Windows Command Prompt\nset RUSTFLAGS=-C target-feature=+avx2\n\n# For Unix-like shells (like Bash)\nexport RUSTFLAGS=\"-C target-feature=+avx2\"<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"fdf6\">\u201cForce install\u201d and run\u00a0<code>simd-detect<\/code>\u00a0again and you should see that\u00a0<code>avx2<\/code>\u00a0is enabled.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\"># Force install every time to see changes to 'enabled'\ncargo install cargo-simd-detect --force\ncargo simd-detect<\/code><\/pre>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\">extension         width                   available       enabled\nsse2            128-bit\/16-bytes        true            true\navx2            256-bit\/32-bytes        true            true\navx512f         512-bit\/64-bytes        true            false<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"b688\">Alternatively, you can turn on every SIMD extension that your machine supports:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\"># For Windows Command Prompt\nset RUSTFLAGS=-C target-cpu=native\n\n# For Unix-like shells (like Bash)\nexport RUSTFLAGS=\"-C target-cpu=native\"<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"2c0d\">On my machine this 
enables\u00a0<code>avx512f<\/code>, a newer SIMD extension supported by some Intel computers and a few AMD computers.<\/p>\n<p class=\"wp-block-paragraph\" id=\"ebcd\">You can set SIMD extensions back to their default (<code>sse2<\/code>\u00a0on Intel\/AMD) with:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\"># For Windows Command Prompt\nset RUSTFLAGS=\n\n# For Unix-like shells (like Bash)\nunset RUSTFLAGS<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"0dd4\">You may wonder why\u00a0<code>target-cpu=native<\/code>\u00a0isn\u2019t Rust\u2019s default. The problem is that binaries created using\u00a0<code>avx2<\/code>\u00a0or\u00a0<code>avx512f<\/code>\u00a0won\u2019t run on computers missing those SIMD extensions. So, if you are compiling only for your own use, use\u00a0<code>target-cpu=native<\/code>. If, however, you are compiling for others, choose your SIMD extensions thoughtfully and let people know which SIMD extension level you are assuming.<\/p>\n<p class=\"wp-block-paragraph\" id=\"fb2a\">Happily, whatever level of SIMD extension you pick, Rust\u2019s SIMD support is so flexible you can easily change your decision later. Let\u2019s next learn details of programming with SIMD in Rust.<\/p>\n<h2 class=\"wp-block-heading\">Rule 3: Learn\u00a0<code>core::simd<\/code>, but selectively.<\/h2>\n<p class=\"wp-block-paragraph\">To build with Rust\u2019s new\u00a0<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>core::simd<\/code><\/a>\u00a0module you should learn selected building blocks. Here is a\u00a0<a href=\"https:\/\/github.com\/CarlKCarlK\/range-set-blaze\/blob\/nov23\/examples\/simd\/rust_simd_cheatsheet.md\" target=\"_blank\" rel=\"noreferrer noopener\">cheatsheet<\/a>\u00a0with the structs, methods, etc., that I\u2019ve found most useful. 
Each item includes a link to its\u00a0<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">documentation<\/a>.<\/p>\n<h3 class=\"wp-block-heading\" id=\"f3de\">Structs<\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Simd<\/code><\/a>\u00a0\u2013 a special, aligned, fixed-length array of\u00a0<a href=\"https:\/\/doc.rust-lang.org\/std\/simd\/trait.SimdElement.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>SimdElement<\/code><\/a>. We refer to a position in the array and the element stored at that position as a \u201clane\u201d. By default, we copy\u00a0<code>Simd<\/code>\u00a0structs rather than reference them.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Mask.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Mask<\/code><\/a>\u00a0\u2013 a special Boolean array showing inclusion\/exclusion on a per-lane basis.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\" id=\"e0ec\">SimdElements<\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Floating-Point Types:\u00a0<code>f32<\/code>,\u00a0<code>f64<\/code>\n<\/li>\n<li class=\"wp-block-list-item\">Integer Types:\u00a0<code>i8<\/code>,\u00a0<code>u8<\/code>,\u00a0<code>i16<\/code>,\u00a0<code>u16<\/code>,\u00a0<code>i32<\/code>,\u00a0<code>u32<\/code>,\u00a0<code>i64<\/code>,\u00a0<code>u64<\/code>,\u00a0<code>isize<\/code>,\u00a0<code>usize<\/code>\n<\/li>\n<li class=\"wp-block-list-item\">\u2014\u00a0<a href=\"https:\/\/github.com\/rust-lang\/portable-simd\/issues\/108\" target=\"_blank\" rel=\"noreferrer noopener\"><em>but not\u00a0<code>i128<\/code>,\u00a0<code>u128<\/code><\/em><\/a>\n<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\" 
id=\"ee60\"><strong><code>Simd<\/code>\u00a0constructors<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.from_array\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Simd::from_array<\/code><\/a>\u00a0\u2013 creates a\u00a0<code>Simd<\/code>\u00a0struct by copying a fixed-length array.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.from_slice\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Simd::from_slice<\/code><\/a>\u00a0\u2013 creates a\u00a0<code>Simd&lt;T, LANES&gt;<\/code>\u00a0struct by copying the first\u00a0<code>LANES<\/code>\u00a0elements of a slice.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.splat\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Simd::splat<\/code><\/a>\u00a0\u2013 replicates a single value across all lanes of a\u00a0<code>Simd<\/code>\u00a0struct.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.to_simd\" target=\"_blank\" rel=\"noreferrer noopener\"><code>slice::as_simd<\/code><\/a>\u00a0\u2013 without copying, safely transmutes a regular slice into an aligned slice of\u00a0<code>Simd<\/code>\u00a0(plus unaligned leftovers).<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\" id=\"82aa\"><strong><code>Simd<\/code>\u00a0conversion<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.as_array\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Simd::as_array<\/code><\/a>\u00a0\u2013 without copying, safely transmutes a\u00a0<code>Simd<\/code>\u00a0struct into a regular array reference.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\" 
id=\"ef71\"><strong><code>Simd<\/code>\u00a0methods and operators<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.index\" target=\"_blank\" rel=\"noreferrer noopener\"><code>simd[i]<\/code><\/a>\u00a0\u2013 extracts a value from a lane of a\u00a0<code>Simd<\/code>.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/core\/simd\/struct.Simd.html#impl-Add%3C%26'rhs+Simd%3CT,+LANES%3E%3E-for-%26'lhs+Simd%3CT,+LANES%3E\" target=\"_blank\" rel=\"noreferrer noopener\"><code>simd + simd<\/code><\/a>\u00a0\u2013 performs element-wise addition of two\u00a0<code>Simd<\/code>\u00a0structs. Also supported:\u00a0<code>-<\/code>,\u00a0<code>*<\/code>,\u00a0<code>\/<\/code>,\u00a0<code>%<\/code>\u00a0(remainder), and bitwise and, or, xor, not, and shifts.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/core\/simd\/struct.Simd.html#impl-AddAssign%3CU%3E-for-Simd%3CT,+LANES%3E\" target=\"_blank\" rel=\"noreferrer noopener\"><code>simd += simd<\/code><\/a>\u00a0\u2013 adds another\u00a0<code>Simd<\/code>\u00a0struct to the current one, in place. The other operators are supported, too.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.simd_gt\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Simd::simd_gt<\/code><\/a>\u00a0\u2013 compares two\u00a0<code>Simd<\/code>\u00a0structs, returning a\u00a0<code>Mask<\/code>\u00a0indicating which elements of the first are greater than those of the second. 
Also supported:\u00a0<code>simd_lt<\/code>,\u00a0<code>simd_le<\/code>,\u00a0<code>simd_ge<\/code>,\u00a0<code>simd_eq<\/code>,\u00a0<code>simd_ne<\/code>.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.rotate_elements_left\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Simd::rotate_elements_left<\/code><\/a>\u00a0\u2013 rotates the elements of a\u00a0<code>Simd<\/code>\u00a0struct to the left by a specified amount. Also,\u00a0<code>rotate_elements_right<\/code>.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/std\/simd\/prelude\/macro.simd_swizzle.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>simd_swizzle!(simd, indexes)<\/code><\/a>\u00a0\u2013 rearranges the elements of a\u00a0<code>Simd<\/code>\u00a0struct based on the specified const indexes.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#impl-Eq-for-Simd%3CT,+N%3E\" target=\"_blank\" rel=\"noreferrer noopener\"><code>simd == simd<\/code><\/a>\u00a0\u2013 checks for equality between two\u00a0<code>Simd<\/code>\u00a0structs, returning a regular\u00a0<code>bool<\/code>\u00a0result.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.reduce_and\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Simd::reduce_and<\/code><\/a>\u00a0\u2013 performs a bitwise AND reduction across all lanes of a\u00a0<code>Simd<\/code>\u00a0struct. 
Also supported:\u00a0<code>reduce_or<\/code>,\u00a0<code>reduce_xor<\/code>,\u00a0<code>reduce_max<\/code>,\u00a0<code>reduce_min<\/code>,\u00a0<code>reduce_sum<\/code>\u00a0(but no\u00a0<code>reduce_eq<\/code>).<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\" id=\"a407\"><strong><code>Mask<\/code>\u00a0methods and operators<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Mask.html#method.select\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Mask::select<\/code><\/a>\u00a0\u2013 selects elements from two\u00a0<code>Simd<\/code>\u00a0structs based on a mask.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Mask.html#method.all\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Mask::all<\/code><\/a>\u00a0\u2013 tells if the mask is all\u00a0<code>true<\/code>.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Mask.html#method.any\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Mask::any<\/code><\/a>\u00a0\u2013 tells if the mask contains any\u00a0<code>true<\/code>.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\" id=\"d274\"><strong>All about lanes<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#associatedconstant.LANES\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Simd::LANES<\/code><\/a>\u00a0\u2013 a constant indicating the number of elements (lanes) in a\u00a0<code>Simd<\/code>\u00a0struct.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/trait.SupportedLaneCount.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>SupportedLaneCount<\/code><\/a>\u00a0\u2013 tells the allowed values of\u00a0<code>LANES<\/code>. 
Used by generics.<\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/core\/simd\/struct.Simd.html#method.lanes\" target=\"_blank\" rel=\"noreferrer noopener\"><code>simd.lanes<\/code><\/a>\u00a0\u2013 const method that tells a\u00a0<code>Simd<\/code>\u00a0struct\u2019s number of lanes.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\" id=\"5ae8\"><strong>Low-level alignment, offsets, etc.<\/strong><\/h3>\n<p class=\"wp-block-paragraph\" id=\"eb23\"><em>When possible, use\u00a0<\/em><a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.to_simd\" target=\"_blank\" rel=\"noreferrer noopener\"><em><code>as_simd<\/code><\/em><\/a><em>\u00a0instead.<\/em><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/std\/mem\/fn.size_of.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>mem::size_of<\/code><\/a>,\u00a0<a href=\"https:\/\/doc.rust-lang.org\/std\/mem\/fn.align_of.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>mem::align_of<\/code><\/a>,\u00a0<a href=\"https:\/\/doc.rust-lang.org\/std\/mem\/fn.align_to.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>mem::align_to<\/code><\/a>,\u00a0<a href=\"https:\/\/doc.rust-lang.org\/std\/intrinsics\/fn.offset.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>intrinsics::offset<\/code><\/a>,\u00a0<a href=\"https:\/\/doc.rust-lang.org\/std\/primitive.pointer.html#method.read_unaligned\" target=\"_blank\" rel=\"noreferrer noopener\"><code>pointer::read_unaligned<\/code><\/a>\u00a0(unsafe),\u00a0<a href=\"https:\/\/doc.rust-lang.org\/std\/primitive.pointer.html#method.write_unaligned\" target=\"_blank\" rel=\"noreferrer noopener\"><code>pointer::write_unaligned<\/code><\/a>\u00a0(unsafe),\u00a0<a href=\"https:\/\/doc.rust-lang.org\/std\/mem\/fn.transmute.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>mem::transmute<\/code><\/a>\u00a0(unsafe, const)<\/li>\n<\/ul>\n<h3 
class=\"wp-block-heading\" id=\"0023\"><strong>More, perhaps of interest<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.deinterleave\" target=\"_blank\" rel=\"noreferrer noopener\"><code>deinterleave<\/code><\/a>,\u00a0<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.gather_or\" target=\"_blank\" rel=\"noreferrer noopener\"><code>gather_or<\/code><\/a>,\u00a0<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.reverse\" target=\"_blank\" rel=\"noreferrer noopener\"><code>reverse<\/code><\/a>,\u00a0<a href=\"https:\/\/doc.rust-lang.org\/nightly\/core\/simd\/struct.Simd.html#method.scatter\" target=\"_blank\" rel=\"noreferrer noopener\"><code>scatter<\/code><\/a>\n<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">With these building blocks at hand, it\u2019s time to build something.<\/p>\n<h2 class=\"wp-block-heading\">Rule 4: Brainstorm candidate algorithms.<\/h2>\n<p class=\"wp-block-paragraph\" id=\"51f9\">What do\u00a0<em>you<\/em>\u00a0want to speed up? You won\u2019t know ahead of time which SIMD approach (if any) will work best. You should, therefore, create several candidate algorithms that you can then analyze (Rule 5) and benchmark (Rule 7).<\/p>\n<p class=\"wp-block-paragraph\" id=\"d3ec\">I wanted to speed up\u00a0<a href=\"https:\/\/crates.io\/crates\/range-set-blaze\" target=\"_blank\" rel=\"noreferrer noopener\"><code>range-set-blaze<\/code><\/a>, a crate for manipulating sets of \u201cclumpy\u201d integers. I hoped that creating\u00a0<code>is_consecutive<\/code>, a function to detect blocks of consecutive integers, would be useful.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"c359\"><strong>Background:<\/strong>\u00a0Crate\u00a0<em><code>range-set-blaze<\/code>\u00a0works on \u201cclumpy\u201d integers. 
\u201cC<\/em>lumpy\u201d, here, means that the number of ranges needed to represent the data is small compared to the number of input integers. For example, these 1002 input integers<\/p>\n<p class=\"wp-block-paragraph\" id=\"817b\"><code>100, 101,<\/code>\u00a0\u2026,\u00a0<code>498, 499, 501, 502,\u00a0<\/code>\u2026,\u00a0<code>998, 999, 999, 100, 0<\/code><\/p>\n<p class=\"wp-block-paragraph\" id=\"c4b2\">ultimately become three Rust ranges:<\/p>\n<p class=\"wp-block-paragraph\" id=\"7d5f\"><code>0..=0, 100..=499, 501..=999<\/code>.<\/p>\n<p class=\"wp-block-paragraph\" id=\"e999\">(Internally, the\u00a0<a href=\"https:\/\/docs.rs\/range-set-blaze\/latest\/range_set_blaze\/struct.RangeSetBlaze.html#\" target=\"_blank\" rel=\"noreferrer noopener\"><em><code>RangeSetBlaze<\/code><\/em><\/a>\u00a0struct represents a set of integers as a sorted list of disjoint ranges stored in a cache-efficient\u00a0<a href=\"https:\/\/doc.rust-lang.org\/std\/collections\/struct.BTreeMap.html\" target=\"_blank\" rel=\"noreferrer noopener\">BTreeMap<\/a>.)<\/p>\n<p class=\"wp-block-paragraph\" id=\"1adb\">Although the input integers are allowed to be unsorted and redundant, we expect them to often be \u201cnice\u201d. RangeSetBlaze\u2019s\u00a0<code>from_iter<\/code>\u00a0constructor already exploits this expectation by grouping up adjacent integers. For example,\u00a0<code>from_iter<\/code>\u00a0first turns the 1002 input integers into four ranges<\/p>\n<p class=\"wp-block-paragraph\" id=\"ebc6\"><em><code>100..=499, 501..=999, 100..=100, 0..=0.<\/code><\/em><\/p>\n<p class=\"wp-block-paragraph\" id=\"f527\">with minimal, constant memory usage, independent of input size. It then sorts and merges these reduced ranges.<\/p>\n<p class=\"wp-block-paragraph\" id=\"f731\">I wondered if a new\u00a0<code>from_slice<\/code>\u00a0method could speed up construction from array-like inputs by quickly finding (some) consecutive integers. 
For example, could it (with minimal, constant memory) turn the 1002 input integers\u00a0<em>into five Rust ranges:<\/em><\/p>\n<p class=\"wp-block-paragraph\" id=\"66a6\"><em><code>100..=499, 501..=999, 999..=999, 100..=100, 0..=0.<\/code><\/em><\/p>\n<p class=\"wp-block-paragraph\" id=\"168f\"><em>If so,\u00a0<code>from_iter<\/code>\u00a0could then quickly finish the processing.<\/em><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\" id=\"1793\">Let\u2019s start by writing\u00a0<code>is_consecutive<\/code>\u00a0with regular Rust:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">pub const LANES: usize = 16;\npub fn is_consecutive_regular(chunk: &amp;[u32; LANES]) -&gt; bool {\n    for i in 1..LANES {\n        if chunk[i - 1].checked_add(1) != Some(chunk[i]) {\n            return false;\n        }\n    }\n    true\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"b3bd\">The algorithm just loops through the array sequentially, checking that each value is one more than its predecessor. It uses\u00a0<code>checked_add<\/code>\u00a0so that a value of\u00a0<code>u32::MAX<\/code>\u00a0cannot overflow.<\/p>\n<p class=\"wp-block-paragraph\" id=\"618a\">Looping over the items seemed so easy, I wasn\u2019t sure if SIMD could do any better. 
Here was my first attempt:<\/p>\n<h3 class=\"wp-block-heading\" id=\"b10a\">Splat0<\/h3>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">use std::simd::prelude::*;\n\nconst COMPARISON_VALUE_SPLAT0: Simd&lt;u32, LANES&gt; =\n    Simd::from_array([15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]);\n\npub fn is_consecutive_splat0(chunk: Simd&lt;u32, LANES&gt;) -&gt; bool {\n    if chunk[0].overflowing_add(LANES as u32 - 1) != (chunk[LANES - 1], false) {\n        return false;\n    }\n    let added = chunk + COMPARISON_VALUE_SPLAT0;\n    Simd::splat(added[0]) == added\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Here is an outline of its calculations:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"dadbd9\" data-has-transparency=\"false\" style=\"--dominant-color: #dadbd9;\" loading=\"lazy\" decoding=\"async\" width=\"648\" height=\"113\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_dqlqYlVVmxMHsrAdOA63Wg.webp?resize=648%2C113&#038;ssl=1\" alt=\"\" class=\"wp-image-598489 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_dqlqYlVVmxMHsrAdOA63Wg.webp 648w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_dqlqYlVVmxMHsrAdOA63Wg-300x52.webp 300w\" sizes=\"auto, (max-width: 648px) 100vw, 648px\"><figcaption class=\"wp-element-caption\">Source: This and all following images by author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"9db4\">It first (needlessly) checks that the first and last items are 15 apart. It then creates\u00a0<code>added<\/code>\u00a0by adding 15 to the 0th item, 14 to the next, etc. Finally, to see if all items in\u00a0<code>added<\/code>\u00a0are the same, it creates a new\u00a0<code>Simd<\/code>\u00a0based on\u00a0<code>added<\/code>\u2019s 0th item and then compares. 
Recall that\u00a0<code>splat<\/code>\u00a0creates a\u00a0<code>Simd<\/code>\u00a0struct from one value.<\/p>\n<h3 class=\"wp-block-heading\" id=\"2dce\">Splat1 &amp; Splat2<\/h3>\n<p class=\"wp-block-paragraph\" id=\"f51a\">When I mentioned the\u00a0<code>is_consecutive<\/code>\u00a0problem to Ben Lichtman, he independently came up with this, Splat1:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">const COMPARISON_VALUE_SPLAT1: Simd&lt;u32, LANES&gt; =\n    Simd::from_array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]);\n\npub fn is_consecutive_splat1(chunk: Simd&lt;u32, LANES&gt;) -&gt; bool {\n    let subtracted = chunk - COMPARISON_VALUE_SPLAT1;\n    Simd::splat(chunk[0]) == subtracted\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Splat1 subtracts the comparison value from\u00a0<code>chunk<\/code>\u00a0and checks if the result is the same as the first element of\u00a0<code>chunk<\/code>, splatted.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"dadad9\" data-has-transparency=\"false\" style=\"--dominant-color: #dadad9;\" loading=\"lazy\" decoding=\"async\" width=\"649\" height=\"117\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lv_bxIY2mfm6ZGk4Apv0ng.webp?resize=649%2C117&#038;ssl=1\" alt=\"\" class=\"wp-image-598490 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lv_bxIY2mfm6ZGk4Apv0ng.webp 649w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lv_bxIY2mfm6ZGk4Apv0ng-300x54.webp 300w\" sizes=\"auto, (max-width: 649px) 100vw, 649px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"d96f\">He also came up with a variation called Splat2 that splats the first element of\u00a0<code>subtracted<\/code>\u00a0rather than\u00a0<code>chunk<\/code>. 
That would seemingly avoid one memory access.<\/p>\n<p class=\"wp-block-paragraph\" id=\"7b0e\">I\u2019m sure you are wondering which of these is best, but before we discuss that, let\u2019s look at two more candidates.<\/p>\n<h3 class=\"wp-block-heading\" id=\"a56b\">Swizzle<\/h3>\n<p class=\"wp-block-paragraph\" id=\"7ffd\">Swizzle is like Splat2 but uses\u00a0<code>simd_swizzle!<\/code>\u00a0instead of\u00a0<code>splat<\/code>. Macro\u00a0<code>simd_swizzle!<\/code>\u00a0creates a new\u00a0<code>Simd<\/code>\u00a0by rearranging the lanes of an old\u00a0<code>Simd<\/code>\u00a0according to an array of indexes.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">pub fn is_consecutive_sizzle(chunk: Simd&lt;u32, LANES&gt;) -&gt; bool {\n    let subtracted = chunk - COMPARISON_VALUE_SPLAT1;\n    simd_swizzle!(subtracted, [0; LANES]) == subtracted\n}<\/code><\/pre>\n<h3 class=\"wp-block-heading\" id=\"43e3\">Rotate<\/h3>\n<p class=\"wp-block-paragraph\" id=\"216d\">This one is different. I had high hopes for it.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">const COMPARISON_VALUE_ROTATE: Simd&lt;u32, LANES&gt; =\n    Simd::from_array([4294967281, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]);\n\npub fn is_consecutive_rotate(chunk: Simd&lt;u32, LANES&gt;) -&gt; bool {\n    let rotated = chunk.rotate_elements_right::&lt;1&gt;();\n    chunk - rotated == COMPARISON_VALUE_ROTATE\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The idea is to rotate all the elements one to the right. We then subtract\u00a0<code>rotated<\/code>\u00a0from the original\u00a0<code>chunk<\/code>. If the input is consecutive, the result should be \u201c-15\u201d followed by all 1\u2019s. 
(Using wrapping subtraction, -15 is\u00a0<code>4294967281u32<\/code>.)<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"dededd\" data-has-transparency=\"false\" style=\"--dominant-color: #dededd;\" loading=\"lazy\" decoding=\"async\" width=\"650\" height=\"114\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_B7wuja8a0SUgFfaUia5AsA.webp?resize=650%2C114&#038;ssl=1\" alt=\"\" class=\"wp-image-598491 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_B7wuja8a0SUgFfaUia5AsA.webp 650w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_B7wuja8a0SUgFfaUia5AsA-300x53.webp 300w\" sizes=\"auto, (max-width: 650px) 100vw, 650px\"><\/figure>\n<p class=\"wp-block-paragraph\">Now that we have candidates, let\u2019s start to evaluate them.<\/p>\n<h2 class=\"wp-block-heading\">Rule 5: Use Godbolt and AI to understand your code\u2019s assembly, even if you don\u2019t know assembly language.<\/h2>\n<p class=\"wp-block-paragraph\">We\u2019ll evaluate the candidates in two ways. First, in this rule, we\u2019ll look at the assembly language generated from our code. Second, in Rule 7, we\u2019ll benchmark the code\u2019s speed.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"b61b\"><em>Don\u2019t worry if you don\u2019t know assembly language; you can still get something out of looking at it.<\/em><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">The easiest way to see the generated assembly language is with the\u00a0<a href=\"https:\/\/godbolt.org\/z\/j5GdGah89\" target=\"_blank\" rel=\"noreferrer noopener\">Compiler Explorer, AKA Godbolt<\/a>. It works best on short bits of code that don\u2019t use outside crates. 
It looks like this:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"edeeee\" data-has-transparency=\"false\" style=\"--dominant-color: #edeeee;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"507\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_zwolAcPYwUlIls2KCNjc5w-1024x507.webp?resize=1024%2C507&#038;ssl=1\" alt=\"\" class=\"wp-image-598492 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_zwolAcPYwUlIls2KCNjc5w-1024x507.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_zwolAcPYwUlIls2KCNjc5w-300x149.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_zwolAcPYwUlIls2KCNjc5w-768x380.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_zwolAcPYwUlIls2KCNjc5w.webp 1308w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"5959\">Referring to the numbers in the figure above, follow these steps to use Godbolt:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Open\u00a0<a href=\"https:\/\/godbolt.org\/z\/odrPv5WcG\" target=\"_blank\" rel=\"noreferrer noopener\">godbolt.org<\/a>\u00a0with your web browser.<\/li>\n<li class=\"wp-block-list-item\">Add a new source editor.<\/li>\n<li class=\"wp-block-list-item\">Select Rust as your language.<\/li>\n<li class=\"wp-block-list-item\">Paste in the code of interest. Make the functions of interest public (<code>pub fn<\/code>). Do not include a main or unneeded functions. 
The tool doesn\u2019t support external crates.<\/li>\n<li class=\"wp-block-list-item\">Add a new compiler.<\/li>\n<li class=\"wp-block-list-item\">Set the compiler version to nightly.<\/li>\n<li class=\"wp-block-list-item\">Set options (for now) to\u00a0<code>-C opt-level=3 -C target-feature=+avx512f<\/code>.\n<\/li>\n<li class=\"wp-block-list-item\">If there are errors, look at the output.<\/li>\n<li class=\"wp-block-list-item\">If you want to share or save the state of the tool, click \u201cShare\u201d.<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\" id=\"ebc0\">From the image above, you can see that Splat2 and Swizzle compile to exactly the same assembly, so we can remove Swizzle from consideration. If you\u00a0<a href=\"https:\/\/godbolt.org\/z\/j5GdGah89\" rel=\"noreferrer noopener\" target=\"_blank\">open up a copy of my Godbolt session<\/a>, you\u2019ll also see that most of the functions compile to about the same number of assembly operations. The exceptions are Regular (which is much longer) and Splat0 (which includes the early check).<\/p>\n<p class=\"wp-block-paragraph\" id=\"44dd\">In the assembly, 512-bit registers start with ZMM. 256-bit registers start with YMM. 128-bit registers start with XMM. If you want to better understand the generated assembly, use AI tools to generate annotations. 
For example, here I ask\u00a0<a href=\"https:\/\/www.bing.com\/search?q=Bing+AI&amp;showconv=1&amp;FORM=hpcodx\" rel=\"noreferrer noopener\" target=\"_blank\">Bing Chat<\/a>\u00a0about Splat2:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"dfd3e0\" data-has-transparency=\"false\" style=\"--dominant-color: #dfd3e0;\" loading=\"lazy\" decoding=\"async\" width=\"984\" height=\"833\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_GXuhtmhX7wZLKtVUw7OD6w.webp?resize=984%2C833&#038;ssl=1\" alt=\"\" class=\"wp-image-598493 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_GXuhtmhX7wZLKtVUw7OD6w.webp 984w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_GXuhtmhX7wZLKtVUw7OD6w-300x254.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_GXuhtmhX7wZLKtVUw7OD6w-768x650.webp 768w\" sizes=\"auto, (max-width: 984px) 100vw, 984px\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"5a34\">Try different compiler settings, including\u00a0<code>-C target-feature=+avx2<\/code>\u00a0and then leaving\u00a0<code>target-feature<\/code>\u00a0completely off.<\/p>\n<p class=\"wp-block-paragraph\" id=\"df53\">Fewer assembly operations don\u2019t necessarily mean faster speed. Looking at the assembly does, however, give us a sanity check that the compiler is at least trying to use SIMD operations, inlining const references, etc. Also, as with Splat2 and Swizzle, it can sometimes let us know when two candidates are the same.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"d616\"><em>You may need disassembly features beyond what Godbolt offers, for example, the ability to work with code that uses external crates. 
B3NNY recommended the cargo tool\u00a0<a href=\"https:\/\/github.com\/pacak\/cargo-show-asm\" target=\"_blank\" rel=\"noreferrer noopener\"><code>cargo-show-asm<\/code><\/a>\u00a0to me. I tried it and found it reasonably easy to use.<\/em><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\" id=\"9c1c\">The\u00a0<code>range-set-blaze<\/code>\u00a0crate must handle integer types beyond\u00a0<code>u32<\/code>. Moreover, we must pick a number of LANES, but we have no reason to think that 16 LANES is always best. To address these needs, in the next rule we\u2019ll generalize the code.<\/p>\n<h2 class=\"wp-block-heading\">Rule 6: Generalize to all types and LANES with in-lined generics, (and when that doesn\u2019t work) macros, and (when that doesn\u2019t work) traits.<\/h2>\n<p class=\"wp-block-paragraph\">Let\u2019s first generalize Splat1 with generics.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">#[inline]\npub fn is_consecutive_splat1_gen&lt;T, const N: usize&gt;(\n    chunk: Simd&lt;T, N&gt;,\n    comparison_value: Simd&lt;T, N&gt;,\n) -&gt; bool\nwhere\n    T: SimdElement + PartialEq,\n    Simd&lt;T, N&gt;: Sub&lt;Simd&lt;T, N&gt;, Output = Simd&lt;T, N&gt;&gt;,\n    LaneCount&lt;N&gt;: SupportedLaneCount,\n{\n    let subtracted = chunk - comparison_value;\n    Simd::splat(chunk[0]) == subtracted\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"84f1\">First, note the\u00a0<code>#[inline]<\/code>\u00a0attribute. 
It\u2019s important for efficiency and we\u2019ll use it on pretty much every one of these small functions.<\/p>\n<p class=\"wp-block-paragraph\" id=\"9eee\">The function defined above,\u00a0<code>is_consecutive_splat1_gen<\/code>, looks great except that it needs a second input, called\u00a0<code>comparison_value<\/code>, that we have yet to define.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"1291\"><em>If you don\u2019t need a generic const\u00a0<code>comparison_value<\/code>, I envy you. You can skip to the next rule if you like. Likewise, if you are reading this in the future and creating a generic const\u00a0<code>comparison_value<\/code>\u00a0is as effortless as having your personal robot do your household chores, then I doubly envy you.<\/em><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\" id=\"6a23\">We can try to create a\u00a0<code>comparison_value_splat_gen<\/code>\u00a0that is generic and const. Sadly, neither\u00a0<code>From&lt;usize&gt;<\/code>\u00a0nor alternative\u00a0<code>T::One<\/code>\u00a0are const, so this doesn\u2019t work:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">\/\/ DOESN'T WORK BECAUSE From&lt;usize&gt; is not const\npub const fn comparison_value_splat_gen&lt;T, const N: usize&gt;() -&gt; Simd&lt;T, N&gt;\nwhere\n    T: SimdElement + Default + From&lt;usize&gt; + AddAssign,\n    LaneCount&lt;N&gt;: SupportedLaneCount,\n{\n    let mut arr: [T; N] = [T::from(0usize); N];\n    let mut i_usize = 0;\n    while i_usize &lt; N {\n        arr[i_usize] = T::from(i_usize);\n        i_usize += 1;\n    }\n    Simd::from_array(arr)\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"23e0\"><strong>Macros are the last refuge of scoundrels.<\/strong>\u00a0So, let\u2019s use macros:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">#[macro_export]\nmacro_rules! 
define_is_consecutive_splat1 {\n    ($function:ident, $type:ty) =&gt; {\n        #[inline]\n        pub fn $function&lt;const N: usize&gt;(chunk: Simd&lt;$type, N&gt;) -&gt; bool\n        where\n            LaneCount&lt;N&gt;: SupportedLaneCount,\n        {\n            define_comparison_value_splat!(comparison_value_splat, $type);\n\n            let subtracted = chunk - comparison_value_splat();\n            Simd::splat(chunk[0]) == subtracted\n        }\n    };\n}\n#[macro_export]\nmacro_rules! define_comparison_value_splat {\n    ($function:ident, $type:ty) =&gt; {\n        pub const fn $function&lt;const N: usize&gt;() -&gt; Simd&lt;$type, N&gt;\n        where\n            LaneCount&lt;N&gt;: SupportedLaneCount,\n        {\n            let mut arr: [$type; N] = [0; N];\n            let mut i = 0;\n            while i &lt; N {\n                arr[i] = i as $type;\n                i += 1;\n            }\n            Simd::from_array(arr)\n        }\n    };\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"3c8d\">This lets us run on any one element type and any supported number of LANES (<a href=\"https:\/\/play.rust-lang.org\/?version=nightly&amp;mode=debug&amp;edition=2021&amp;gist=f5a6fbac31d64f3ae79440d5613e44ec\" target=\"_blank\" rel=\"noreferrer noopener\">Rust Playground<\/a>):<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">define_is_consecutive_splat1!(is_consecutive_splat1_i32, i32);\n\nlet a: Simd&lt;i32, 16&gt; = black_box(Simd::from_array(array::from_fn(|i| 100 + i as i32)));\nlet ninety_nines: Simd&lt;i32, 16&gt; = black_box(Simd::from_array([99; 16]));\nassert!(is_consecutive_splat1_i32(a));\nassert!(!is_consecutive_splat1_i32(ninety_nines));<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"76bf\">Sadly, this still isn\u2019t enough for\u00a0<code>range-set-blaze<\/code>. 
It needs to run on\u00a0<em>all\u00a0<\/em>element types (not just one) and (ideally) all LANES (not just one).<\/p>\n<p class=\"wp-block-paragraph\" id=\"3c1f\">Happily, there\u2019s a workaround that again depends on macros. It also exploits the fact that we only need to support a finite list of types, namely:\u00a0<code>i8<\/code>,\u00a0<code>i16<\/code>,\u00a0<code>i32<\/code>,\u00a0<code>i64<\/code>,\u00a0<code>isize<\/code>,\u00a0<code>u8<\/code>,\u00a0<code>u16<\/code>,\u00a0<code>u32<\/code>,\u00a0<code>u64<\/code>, and\u00a0<code>usize<\/code>. If you need to also (or instead) support\u00a0<code>f32<\/code>\u00a0and\u00a0<code>f64<\/code>, that\u2019s fine.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"35eb\"><em>If, on the other hand, you need to support\u00a0<code>i128<\/code>\u00a0and\u00a0<code>u128<\/code>, you may be out of luck. The\u00a0<code>core::simd<\/code>\u00a0module doesn\u2019t support them. We\u2019ll see in Rule 8 how\u00a0<code>range-set-blaze<\/code>\u00a0gets around that at a performance cost.<\/em><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\" id=\"060c\">The workaround defines a new trait, here called\u00a0<code>IsConsecutive<\/code>. We then use a macro (that calls a macro, that calls a macro) to implement the trait on the 10 types of interest.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">pub trait IsConsecutive {\n    fn is_consecutive&lt;const N: usize&gt;(chunk: Simd&lt;Self, N&gt;) -&gt; bool\n    where\n        Self: SimdElement,\n        Simd&lt;Self, N&gt;: Sub&lt;Simd&lt;Self, N&gt;, Output = Simd&lt;Self, N&gt;&gt;,\n        LaneCount&lt;N&gt;: SupportedLaneCount;\n}\n\nmacro_rules! 
impl_is_consecutive {\n    ($type:ty) =&gt; {\n        impl IsConsecutive for $type {\n            #[inline] \/\/ very important\n            fn is_consecutive&lt;const N: usize&gt;(chunk: Simd&lt;Self, N&gt;) -&gt; bool\n            where\n                Self: SimdElement,\n                Simd&lt;Self, N&gt;: Sub&lt;Simd&lt;Self, N&gt;, Output = Simd&lt;Self, N&gt;&gt;,\n                LaneCount&lt;N&gt;: SupportedLaneCount,\n            {\n                define_is_consecutive_splat1!(is_consecutive_splat1, $type);\n                is_consecutive_splat1(chunk)\n            }\n        }\n    };\n}\n\nimpl_is_consecutive!(i8);\nimpl_is_consecutive!(i16);\nimpl_is_consecutive!(i32);\nimpl_is_consecutive!(i64);\nimpl_is_consecutive!(isize);\nimpl_is_consecutive!(u8);\nimpl_is_consecutive!(u16);\nimpl_is_consecutive!(u32);\nimpl_is_consecutive!(u64);\nimpl_is_consecutive!(usize);<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"0e47\">We can now call fully generic code (Rust Playground):<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-rust\">\/\/ Works on i32 and 16 lanes\nlet a: Simd&lt;i32, 16&gt; = black_box(Simd::from_array(array::from_fn(|i| 100 + i as i32)));\nlet ninety_nines: Simd&lt;i32, 16&gt; = black_box(Simd::from_array([99; 16]));\n\nassert!(IsConsecutive::is_consecutive(a));\nassert!(!IsConsecutive::is_consecutive(ninety_nines));\n\n\/\/ Works on i8 and 64 lanes\nlet a: Simd&lt;i8, 64&gt; = black_box(Simd::from_array(array::from_fn(|i| 10 + i as i8)));\nlet ninety_nines: Simd&lt;i8, 64&gt; = black_box(Simd::from_array([99; 64]));\n\nassert!(IsConsecutive::is_consecutive(a));\nassert!(!IsConsecutive::is_consecutive(ninety_nines));<\/code><\/pre>\n<p class=\"wp-block-paragraph\">With this technique, we can create multiple candidate algorithms that are fully generic over type and LANES. 
Next, it is time to benchmark and see which algorithms are fastest.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<p class=\"wp-block-paragraph\">Those are the first six rules for adding SIMD code to Rust. In\u00a0<a href=\"https:\/\/towardsdatascience.com\/nine-rules-for-simd-acceleration-of-your-rust-code-part-2-6a104b3be6f3\" target=\"_blank\" rel=\"noreferrer noopener\">Part 2<\/a>, we look at rules 7 to 9. These rules cover how to pick an algorithm and set LANES, how to integrate SIMD operations into your existing code, and (importantly) how to make them optional. Part 2 concludes with a discussion of when\/if you should use SIMD and ideas for improving Rust\u2019s SIMD experience. I hope to see you\u00a0<a href=\"https:\/\/towardsdatascience.com\/nine-rules-for-simd-acceleration-of-your-rust-code-part-2-6a104b3be6f3\" target=\"_blank\" rel=\"noreferrer noopener\">there<\/a>.<\/p>\n<p class=\"wp-block-paragraph\"><em>Please\u00a0<\/em><a href=\"https:\/\/medium.com\/@carlmkadie\"><em>follow Carl on Medium<\/em><\/a><em>. I write on scientific programming in Rust and Python, machine learning, and statistics. I tend to write about one article per month.<\/em><\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/nine-rules-for-simd-acceleration-of-your-rust-code-part-1-c16fe639ce21\/\">Nine Rules for SIMD Acceleration of Your Rust Code (Part 1)<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Carl M. 
Kadie<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/nine-rules-for-simd-acceleration-of-your-rust-code-part-1-c16fe639ce21\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Nine Rules for SIMD Acceleration of Your Rust Code (Part 1) Thanks to Ben Lichtman (B3NNY) at the Seattle Rust Meetup for pointing me in the right direction on SIMD. SIMD\u00a0(Single Instruction, Multiple Data) operations have been a feature of Intel\/AMD and ARM CPUs since the early 2000s. These operations enable you to, for example, [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,1869,160,1870,1871,699,158],"tags":[1873,1872,163],"class_list":["post-2099","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-data-ingestion","category-programming","category-rust","category-simd","category-software-development","category-tips-and-tricks","tag-rust","tag-simd","tag-your"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2099"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=2099"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2099\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=2099"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=2099"}
,{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=2099"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}