Backend Misc

Last updated on February 25, 2025

WIP FOREVER!

problems I encountered (or wondered about) + solutions I found => knowledge I learned.

Database

Index types

Basically there’re 3 core index types used by databases:

  • B / B+ Tree
  • LSM Tree
  • Hash

Comparison:

  • Hash
    Benefits: super fast.
    Downsides: no range queries; keys need to fit in memory.

  • B / B+ Tree
    Benefits: fast range queries; the number of keys is not limited by memory.
    Downsides: slow writes to the on-disk tree.

  • LSM Tree
    Benefits: the number of keys is not limited by memory; writes are faster than a B tree and slower than Hash; range queries are supported but slower than a B tree.
    Downsides: extra CPU cost for Compaction.

Check this video for more details: LSM Tree + SSTable


LSM-Tree

Core primitives:


                                          ┌───────┐ ┌───────┐
                                      ┌──►│SSTable├►│SSTable│ ... L0
┌───────────┐   ┌───────────────────┐ │   └───┬───┘ └───────┘
│ MemTable  ├──►│ ImmutableMemTable ├─┘   ┌───┴───┐ ┌───────┐
└───────────┘   └───────────────────┘     │SSTable├►│SSTable│ ... L1+
                                          └───┬───┘ └───────┘
                                          ┌───┴───┐ ┌───────┐
                                          │SSTable├►│SSTable│ ... L1+
                                          └───────┘ └───────┘

Writing:


 disk
┌───┐     ┌────────┐
│WAL├────►│MemTable│
└───┘     └──┬─────┘
             │
             │ if the MemTable grows too large
      ┌──────┴─────────────────┐
 ┌────┴────────────┐    ┌──────┴─────┐
 │ImmutableMemTable│    │New MemTable│
 └───────┬─────────┘    └────────────┘
         │
         │ in-order traversal of the tree
         │
     ┌───┴───┐
     │SSTable│ disk L0
     └───┬───┘
         │ compaction
     ┌───┴───┐
     │SSTable│ disk L1
     └───────┘
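
To make the flow concrete, here’s a toy sketch of the write path (my own simplification in Rust; the struct, field names and the size threshold are all made up, not any real engine’s API):

use std::collections::BTreeMap;

// Toy LSM write path: WAL append -> MemTable insert -> rotate when too large.
struct Lsm {
    wal: Vec<(String, String)>,                 // stand-in for the on-disk write-ahead log
    memtable: BTreeMap<String, String>,         // sorted in memory, cheap in-order traversal
    immutables: Vec<BTreeMap<String, String>>,  // waiting to be flushed as L0 SSTables
    memtable_limit: usize,
}

impl Lsm {
    fn put(&mut self, key: String, value: String) {
        self.wal.push((key.clone(), value.clone())); // 1. durability first
        self.memtable.insert(key, value);            // 2. then the MemTable
        if self.memtable.len() >= self.memtable_limit {
            // 3. rotate: the full MemTable becomes immutable, a new one takes writes;
            //    a background job would flush `immutables` to L0 SSTables via in-order traversal
            let full = std::mem::take(&mut self.memtable);
            self.immutables.push(full);
        }
    }
}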

Querying:


     ┌────────┐
     │MemTable│ mem query
     └────┬───┘
          │
  ┌───────┴─────────┐
  │ImmutableMemTable│  mem query
  └───┬─────────────┘
      │
      ├─────────────────┐
┌─────┴──────┐    ┌─────┴──────┐
│Bloom Filter│    │Bloom Filter│  use bloom filter to check if the target exists in the file
│------------│    │------------│
│  SSTable   │    │  SSTable   │  L0: the SSTables’ key ranges may overlap, so we have to check each of 'em
└─────┬──────┘    └────────────┘
      │
┌─────┴──────┐    ┌────────────┐
│  SSTable   │    │  SSTable   │  L1+: the SSTables’ key ranges don’t overlap, so we can use a binary search to get the target file
└────────────┘    └────────────┘
     ...
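
For the L1+ case, “binary search to get the target file” can be as simple as the sketch below (a toy illustration; SsTableMeta and its fields are my own made-up names):

// On L1+ levels the SSTables are sorted by key range and don't overlap,
// so a binary search over (min_key, max_key) finds the only possible file.
struct SsTableMeta {
    min_key: String,
    max_key: String,
    path: String, // hypothetical file path of the SSTable
}

fn find_sstable<'a>(level: &'a [SsTableMeta], key: &str) -> Option<&'a SsTableMeta> {
    // first table whose max_key is >= key
    let idx = level.partition_point(|t| t.max_key.as_str() < key);
    // it's the answer only if the key is also >= its min_key
    level.get(idx).filter(|t| t.min_key.as_str() <= key)
}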

Other types (except indexes)

I didn’t intend to go into detail on each of them—just a brief introduction and the core ideas.

  1. Group by storage

    • Row-oriented: store the data in rows, and it’s good for OLTP.
    • Column-oriented: store the data in columns, and it’s good for OLAP.
    • Column-family: store the data in column families.
    • Doc: store the data as a document.
    • KV: store the data as a HASH (key-value pairs).

Row-oriented is simple and common, let’s skip it here.

Column-oriented stores data in columns, which speeds up queries like select col from table where ... (that’s why it’s good for OLAP). But what if you perform a query like select * from table where ...?

Usually there’re 3 steps (sketched in code after the list):

  • Find the rows according to the where clause.
  • Use multiple threads to fetch the needed columns for those rows.
  • Aggregate the results.
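
Here’s a rough sketch of those steps (a toy model of my own, not any specific engine): filter on one column, then fetch the other columns by position and stitch the rows back together.

// Toy column-oriented table: each field is stored as its own vector ("column").
struct ColumnTable {
    ids: Vec<u64>,
    names: Vec<String>,
    ages: Vec<u32>,
}

impl ColumnTable {
    // SELECT * FROM t WHERE age > min_age
    fn select_star_where_age_gt(&self, min_age: u32) -> Vec<(u64, String, u32)> {
        // 1. scan only the `age` column to find the matching row positions
        let positions: Vec<usize> = self
            .ages
            .iter()
            .enumerate()
            .filter(|(_, &age)| age > min_age)
            .map(|(i, _)| i)
            .collect();

        // 2. fetch the other columns for those positions (a real engine could do
        //    this per column, possibly with multiple threads)
        // 3. aggregate / stitch the values back into rows
        positions
            .into_iter()
            .map(|i| (self.ids[i], self.names[i].clone(), self.ages[i]))
            .collect()
    }
}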

Column-family could be a little bit more complex; let’s break it down from 2 perspectives:

  • What does it look like?
  • What problems does it solve?

Assume we have a table like this (Row-oriented):

| user_id | name  | age | order1  | order2  | addr1 | addr2 |
|    1    | Alice | 30  | iPhone  | MacBook |  NY   |  LA   |
|    2    | Bob   | 28  | Android | Laptop  |  SF   |  TX   |

If we convert it into Column-oriented, there’re 7 columns.

Column-family storage is like splitting that single wide table into several related tables, where each table holds a group of columns. For example, we could split the table above into 3 column families:

Column-family1 stands for user basic info:
Row1: (1, Alice, 30)
Row2: (2, Bob, 28)

Column-family2 stands for user orders:
Row1: (iPhone, MacBook)
Row2: (Android, Laptop)

Column-family3 stands for user addresses:
Row1: (NY, LA)
Row2: (SF, TX)

This design benefits us in several ways:

  • Compared to Row-oriented databases, it avoids loading useless data.

    In a Row-oriented db, a query has to load all the columns of the row and extract the needed ones.

  • Compared to Column-oriented databases, it avoids extra IOs.

    In a Column-oriented db, like we mentioned before, it needs to fetch the columns of the rows from different locations on the disk (with or without multiple threads).

    In a Column-family db, we can query by rows within a specific family just like a Row-oriented db, which reduces the number of IOs.

Now, we can see that Column-family follows a design pattern that sits between Row-oriented and Column-oriented storage models.

Let’s consider another question: what if I decide to use a Row-oriented db and manually split a wide table into several related tables instead of using a Column-family db?

This question cannot be answered easily, because a Column-family database usually works with LSM-Tree, and a classic RDBMS usually works with B+ Tree. The difference between them is not just about the storage model, but also about the query model, the transaction model, the consistency model, and so on.

E.g. when we perform DDL on an LSM-Tree based Column-family database to modify a ‘family’ of a table, things’re totally different:

  • Adding a column doesn’t require heavy operations for Column-family db, cuz LSM-Tree is append only, which is a bit like the INSTANT mode of MySQL.
  • Removing a column is almost the same, we only need to modify the metadata of the family without deleting it from the disk.

  2. Different Aims

    • GraphDB: solves the n-degree relationship problem.
    • TSDB: solves the time series data problem.
    • VectorDB: solves the vector data problem.

GraphDB: what does n-degree mean?

The example below is a classic 2-degree relationship:

user_friends_ref:
| user_id | friend_user_id |
|    1    |      2         |
|    1    |      31        |
|    1    |      283       |

If we want to recommend the friends of the friends of user 1 to him,
how do we achieve this in an RDBMS?

E.g Alice knows Bob, and Bob knows Charlie, and we want to recommend Charlie to Alice.

We have to do something like this:
1. select friend_user_id from user_friends_ref where user_id = alice.user_id;
2. select distinct(friend_user_id) from user_friends_ref where user_id in $(the_result_above);
3. recommend the result in the 2nd step;

And if we want to recommend my friends’ friends’ friends to me, it’ll be a 3-degree problem and a disaster:

┌──┐  ┌────┐  ┌───┐  ┌─────┐
│Me├─►│Jack├─►│Amy├─►│Steve│
└──┘  └────┘  └───┘  └──┬──┘
 ▲       recommend      │
 └──────────────────────┘
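
For intuition, here’s a toy breadth-first sketch of what an n-degree expansion looks like over an adjacency list (my own illustration, not how any particular GraphDB is implemented); in an RDBMS every extra hop becomes another self-join or another round-trip query.

use std::collections::{HashMap, HashSet};

// friends: adjacency list user_id -> direct friends.
// Returns everyone first reached after exactly `degree` hops, excluding the start user.
fn n_degree(friends: &HashMap<u64, Vec<u64>>, start: u64, degree: u32) -> HashSet<u64> {
    let mut frontier: HashSet<u64> = HashSet::from([start]);
    let mut seen: HashSet<u64> = HashSet::from([start]);
    for _ in 0..degree {
        let mut next = HashSet::new();
        for user in &frontier {
            for friend in friends.get(user).into_iter().flatten() {
                if seen.insert(*friend) {
                    next.insert(*friend);
                }
            }
        }
        frontier = next; // each loop iteration == one more JOIN / query in SQL
    }
    frontier
}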

Time Series Databases (TSDBs), such as InfluxDB, are built on LSM-Tree architecture, with extensive optimizations tailored specifically for time series data.

As for VectorDB, it’s built for vector data; you can use it to perform calculations like L2 distance and Cosine Similarity easily:

L2(A, B) = \sqrt{(A_1 - B_1)^2 + (A_2 - B_2)^2 + \dots + (A_n - B_n)^2}

\cos(\theta) = \frac{A \cdot B}{\|A\| \times \|B\|}
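
As a quick illustration, plain Rust versions of these two metrics (no VectorDB API implied, just the math):

// L2 (Euclidean) distance between two equal-length vectors
fn l2_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

// Cosine similarity: dot product divided by the product of the norms
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (norm_a * norm_b)
}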


Transactions

Optimistic and pessimistic Txs(like TiDB)

  • Optimistic: just execute and commit; if there’re any conflicts, roll back.
  • Pessimistic: lock the rows first to make sure there’re no conflicts, then commit.

The lock is a little bit different, cuz we usually use locks to guarantee the resources are modified correctly.

E.g. if we try to deduct the inventory of an item:
using an optimistic lock is like: update item set amount = amount - 1 where amount > 0 and id = 1, and then checking whether the number of affected rows is 1;
while using a pessimistic lock is like: BEGIN; select amount from item where id = 1 for update; check the amount and do sth; COMMIT;
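
A minimal sketch of the optimistic pattern above, assuming a hypothetical exec helper that runs a statement and returns the number of affected rows (no particular driver implied):

// Optimistic inventory deduction: let the UPDATE itself carry the check,
// then treat "0 rows affected" as a conflict / sold-out signal.
fn deduct_inventory<F: Fn(&str) -> u64>(exec: F) -> Result<(), &'static str> {
    let affected = exec("UPDATE item SET amount = amount - 1 WHERE amount > 0 AND id = 1");
    if affected == 1 {
        Ok(())
    } else {
        Err("conflict or out of stock; retry or give up")
    }
}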


MVCC

MVCC (Multi-Version Concurrency Control) is a solution designed to address transaction management challenges in concurrent environments, allowing for consistent reads and writes without locking.

Therefore, MVCC is tightly coupled with transactions and is typically a requirement in relational databases to ensure data consistency and concurrency control.

Different dbs have different approaches, and I don’t intend to go into detail on each of them. Instead, let’s focus on the core ideas:

Since there’re multiple txs trying to perform r/w ops on the same row at the same time (before they ‘COMMIT’), a single row might have several different versions. And when a tx tries to read the row, it’ll read the latest version that is visible to it.

To achieve the target, there’re 2 things to be done:

  • Determine which version is the ‘latest’ visible one - so we read exactly that one (neither outdated data nor uncommitted data).
  • And to determine it, we need to know the order of the txs.

Keeping the global tx order is easy: all we need to do is use a monotonically increasing counter to generate the tx ids.

With the tx id, the row might look like this:

              ┌────────────────────┐
          ┌──►│ id=1 | data | tx_0 │
          │   └────────────────────┘
          └─────────────────────┐
         ┌────────────────────┐ │
       ┌►│ id=1 | data | tx_1 ├─┘
       │ └────────────────────┘
       └───────────────────┐
    ┌────────────────────┐ │
    │ id=1 | data | tx_2 ├─┘
    └────────────────────┘

All of these versions of data can be read

Don’t worry about the disk usage: those copies will be cleaned up asynchronously by a daemon thread / task after the txs are committed.

Now, we can easily get the below information when we’re creating a READ VIEW:

  • current active txs
    • the minimal tx_id indicates the oldest active tx
    • the maximum tx_id indicates the newest active tx

Finally, we can determine whether a specific version of row can be read with the following steps:


// cur_v is the version of the row we're trying to access
// min_id / max_id are the oldest / newest active tx ids in the read view; my_id is the current tx's id
fn can_read(cur_v: i32, active_tx_ids: &[i32], min_id: i32, max_id: i32, my_id: i32) -> bool {
    // current version is old enough
    if cur_v < min_id {
        return true;
    }

    // current version is created by me
    if cur_v == my_id {
        return true;
    }

    // current version is newer than the newest active tx
    if cur_v > max_id {
        return false;
    }

    // if the active set doesn't contain cur_v, it was committed before the read view was created (readable);
    // otherwise it's still active and uncommitted (not readable)
    !active_tx_ids.contains(&cur_v)
}
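
And a quick, made-up usage example of the rules above:

fn main() {
    // read view created by tx 6: txs 3, 5 and 6 were still active at that moment
    let active = [3, 5, 6];
    assert!(can_read(2, &active, 3, 6, 6));  // committed long before the view
    assert!(can_read(4, &active, 3, 6, 6));  // not in the active set => already committed
    assert!(can_read(6, &active, 3, 6, 6));  // my own uncommitted change
    assert!(!can_read(5, &active, 3, 6, 6)); // still active and uncommitted elsewhere
    assert!(!can_read(7, &active, 3, 6, 6)); // started after the view was created
}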

Moreover, we can achieve different isolation levels by adjusting the timing of READ VIEW creation:

  • If we create a READ VIEW before starting a tx and keep it until the transaction completes, we achieve Repeatable Read (RR) isolation, as the view remains unchanged throughout the transaction’s lifetime.
  • If we create a READ VIEW before each query within a transaction, we achieve Read Committed (RC) isolation, as each view may differ, but the data read is always committed.

Join

Nested Loop Join

  • SQL:
    select * from TableA join TableB on TableA.user_id = TableB.user_id

  • Pseudo code:

    for rowA in TableA {
        for rowB in TableB {
            if rowA.key == rowB.key {
                output(rowA, rowB);
            }
        }
    }

In the above example, TableA is the Outer table and TableB is the Inner table. By default the inner table is TableB, but the actual choice is made by the SQL Execution Optimizer – it prefers to choose the smaller table as the inner table.

If we have an index on TableB.user_id, the pseudo code should be like:

for rowA in TableA {
    rowB = index_lookup(TableB, rowA.key); // use index
    if rowB.exists() {
        output(rowA, rowB);
    }
}

The applicable cases are:

  • rows are not too many
  • the inner table has proper indexes

Time complexity: O(n * m), where n is the number of rows in TableA and m is the number of rows in TableB.
With index: O(n * log(m)), if the index provides the O(log(m)) time complexity – like B+ Tree.


Hash Join

  • Pseudo code:

    // Build: walk the inner table to build a hash table
    hash_table = HashMap::new();
    for rowB in TableB {
        hash_table.insert(rowB.key, rowB);
    }
    
    // Probe: walk the outer table to find the corresponding rows
    for rowA in TableA {
        if hash_table.contains_key(rowA.key) {
            output(rowA, hash_table[rowA.key]);
        }
    }

The applicable cases are:

  • no indexes
  • many rows but fit in the mem
  • no sorting requirements

Time complexity: O(m) for building the hash table, O(n) for probing, thus the total time complexity is O(n + m), which is faster than a Nested Loop Join.

Merge Join

  • Pseudo code:

    iterA = TableA.iterator();
    iterB = TableB.iterator();
    rowA = iterA.next();
    rowB = iterB.next();
    
    while rowA and rowB {
        if rowA.key == rowB.key {
            output(rowA, rowB);
            rowA = iterA.next();
            rowB = iterB.next();
        } else if rowA.key < rowB.key {
            rowA = iterA.next();
        } else {
            rowB = iterB.next();
        }
    }

The applicable cases are:

  • both tables are (or can cheaply be) sorted on the join key
  • sorted output is needed
  • range queries

Time complexity: O(n + m).


Another example

SQL:

SELECT * FROM user
JOIN avatar ON user.user_id = avatar.user_id
WHERE user.user_id = 1;

If we have the user_id_idx on both tables, the exec process should be like:

for rowA in (SELECT * FROM user WHERE user_id = 1) {
    for rowB in (SELECT * FROM avatar WHERE user_id = rowA.user_id) {
        output(rowA, rowB);
    }
}

If not, the process is like:

for rowA in (SELECT * FROM user WHERE user_id = 1) {
    for rowB in (SELECT * FROM avatar) { // PERF: scan all the table
        if rowA.user_id == rowB.user_id {
            output(rowA, rowB);
        }
    }
}

How much data is appropriate for a table? (B+ Tree type index)

Actually this is a math problem. Before we dive into it, we need to figure out the page size of the database, like:

postgres@localhost:my_db> show block_size
+------------+
| block_size |
|------------|
| 8192       |
+------------+
SHOW
Time: 0.262s

For PostgreSQL it’s 8KB, and for MySQL it’s usually 16KB.

Assume we’re using MySQL, and the data type of id is BIGINT, which takes 8B (64 bits), and the child pointer plus any other necessary data of a tree node takes another 8B, thus each page can hold 16KB / 16B ≈ 1000 entries.

And we all know that for a clustered index, the leaf nodes hold the actual rows, the non-leaf nodes hold the index entries, and of course there’s a root node. In that case, if the B+ Tree has 3 levels, then the count of leaf nodes is:

\text{Leaf nodes} = 1000^2 = 1{,}000{,}000

Now assume each row takes 1KB, then a single leaf page could hold 16KB / 1KB = 16 rows, thus a 3-level B+ Tree could hold:

\text{Rows} = \text{Leaf nodes} \times 16 = 1{,}000{,}000 \times 16 = 16{,}000{,}000

Each query takes 3 IOs (cuz there’re 3 levels). Continuing this idea, when the database has more rows, it will take 4-5 or even more IOs per query in the worst case (there’re buffer pools, so typically we don’t have to perform that many IOs).


But things’re different in PostgreSQL - it doesn’t store the row data in the leaf nodes!

Instead, it stores the disk pointer in the leaf nodes, and the actual data is stored in the heap. So the leaf nodes of a B+ Tree in PostgreSQL are much smaller than in MySQL, and the number of rows that a single page could hold is much larger.

Typically a disk pointer takes 6B of space, which means a single block can hold 7KB (8KB minus some reserved space) / 6B ≈ 1200 pointers, much more than MySQL. But keep in mind that since the leaf stores a pointer instead of the row data, an additional I/O operation is required to fetch the row.

Let’s do the math again:

\text{Nodes of each level} = 8\text{KB} / 16\text{B} = 500

\text{Leaf nodes} = 500^2 = 250{,}000

\text{Rows} = \text{Leaf nodes} \times 1200 = 250{,}000 \times 1200 = 300{,}000{,}000

LOL, 300 million! (With an extra IO though)
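
A tiny back-of-the-envelope script for the two estimates above (all sizes are the rough assumptions from this section):

fn main() {
    // MySQL-ish: 16KB pages, ~16B per internal index entry, ~1KB rows in leaf pages
    let mysql_fanout = 16 * 1024 / 16;                 // 1024; the text rounds this to ~1000
    let mysql_rows = mysql_fanout * mysql_fanout * 16; // 3 levels, 16 rows per leaf page
    println!("MySQL    ~ {} rows", mysql_rows);        // ~16.8M, i.e. the ~16,000,000 above

    // PostgreSQL-ish: 8KB pages, ~16B per internal entry, ~6B pointers in leaf pages
    let pg_fanout = 8 * 1024 / 16;                     // 512; the text rounds this to ~500
    let pg_rows = pg_fanout * pg_fanout * 1200;        // ~1200 pointers per leaf page
    println!("Postgres ~ {} rows", pg_rows);           // ~314M, i.e. the ~300,000,000 above
}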

Why don’t you use a larger page size? (to reduce the number of tree levels)

Cuz databases are read and written on a page-by-page basis, not key-by-key. Therefore:

A large page size will cause a Write Amplification problem: even if we only modify a single row, the whole page will be written to disk.

Meanwhile, when we’re reading a single row, we have to read the whole page, which will slow down the query speed.

What’s more, the Buffer Pool of databases is also based on the page size, and a larger page size will cause a waste of memory.

Last but not least, a larger page size will waste disk space, cuz pages are often not full, and the file block size of the operating system is usually 4KB.


PostgreSQL

Conditional Index

Assume we have a state machine table with stages ‘1’, ‘2’ and ‘3’, and there’s a ‘user_id’ column in the table.

| id | user_id | stage |
|  1 |  38413  |   1   |
|  2 |  38418  |   2   |
|  3 |  38413  |   3   |
|  4 |  38413  |   3   |
|  5 |  38413  |   3   |

If we want to make sure that the ‘user_id’ is unique within the stage ‘1’ and ‘2’, but it’s allowed to be duplicated in the stage ‘3’, we can create a conditional index like this:

CREATE UNIQUE INDEX index_name ON table_name (user_id) WHERE stage IN (1, 2);

If the scope is defined by a varchar column, the DDL is a little bit more complex:

create unique index index_name
    on table_name (user_id, to_user_id)
    where ((status)::text = ANY
           ((ARRAY ['INIT'::character varying, 'PENDING'::character varying, 'ACCEPTED'::character varying])::text[]));

Concurrent Queries on One Connection

This is not allowed due to the limitation of the PostgreSQL protocol. If you want to run multiple queries in parallel, you need to use multiple connections.

However, you can use pipelining to send multiple queries to the server in a batch.

Check this out: Concurrent queries on single connection #738


Connection Pool Size and QPS

Assume:

  • Average query time: t seconds
  • Connection pool size: n
  • QPS: q

We have the following formula:

q \leq \frac{n}{t} = \max(\text{QPS})

What will happen if the QPS exceeds the limit? - The connection pool will be exhausted.

Let’s say that q > n/t, then we have:

\text{Extra Reqs} = m = q - \frac{n}{t}

These requests will be queued and wait for the connection to be available.

\text{delay rounds} = \begin{cases} 0, & \text{if } n \geq m \text{ (but the next round's } n \text{ gets smaller: } n - m\text{)} \\ m / n, & \text{if } n < m \text{ (and the last round's } n \text{ gets smaller: } n - m \bmod n\text{)} \end{cases}

So if the QPS is too high, it’ll cause a delay in the response time. What’s worse, if the QPS consistently exceeds the limit, the delay will keep accumulating.

However, things go well in the real world in most cases.

Because the average query time should be quite small (like 10ms), while the connection pool size is usually large enough (like 50).

Then the limit of QPS is quite high (5000 in this case).
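
The same numbers as a quick sanity check (the values are just the assumptions above):

fn main() {
    let t = 0.010_f64;   // average query time: 10 ms
    let n = 50.0_f64;    // connection pool size
    let max_qps = n / t; // q <= n / t
    println!("max sustainable QPS ≈ {max_qps}"); // 5000

    let q = 6000.0_f64;  // hypothetical offered load above the limit
    let backlog_per_sec = q - max_qps;
    println!("requests queued per second at q = {q}: {backlog_per_sec}"); // 1000, and it accumulates
}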


MongoDB vs MySQL

Comparison

Performance is rarely the primary factor when choosing between MongoDB and MySQL.

I’m not trying to convince developers to use MongoDB instead of MySQL, or vice versa, or to brag about the benefits of MongoDB. Even if it looks like it…

Read this: MongoDB vs. MySQL Differences

The “real” differences / considerations are:

  • user-friendliness
    No schema means less work, but it also means less control. Do you really need it? – Think about this question.
    What’s more, MongoDB also provides a collection of friendly APIs, which makes it easier to use.

  • scalability
    MongoDB provides both sharded clusters and replica sets to scale horizontally and vertically, while MySQL doesn’t offer these out of the box.

  • performance
    Like we’ve mentioned above, this is usually not the primary factor cuz it’s really difficult to assess two completely different database systems.
    For example, MySQL is optimized for joining tables that have been appropriately indexed, while MongoDB uses $lookup to achieve this function - which is slower.
    However, joins are not how Mongo docs tend to be used - why not merge those collections into a single one to keep it simple and compact?

  • dev experience
    You do not need to exec any DDL with MongoDB.
    You do not need ORM with MongoDB.

And here’re some frequently asked questions within the community:

  1. Is MongoDB faster than MySQL?
    Hard to explain and assess, cuz you can always choose to denormalize the tables when using MySQL, and another consideration is how you query the data.

    • Generally speaking, querying a single doc from MongoDB is faster than using join with MySQL.

      It’s easy to understand: a join crosses a lot of tables and is slow, while MongoDB stores the doc in a single collection.

    • Querying a single document from MongoDB is theoretically faster than querying a single row from MySQL when using secondary indexes and the queried columns are not covered by the indexes.

      Be aware that’s only ‘in theory’; the reason is MongoDB stores the disk pointer at the leaf nodes, while MySQL stores the primary key at the leaf nodes, thus MySQL has to do an extra lookup on the clustered index.

    • Range-querying docs from MongoDB is theoretically slower than range-querying rows from MySQL.

      The reason is the same as above, MySQL can use the clustered index to get the rows directly, while MongoDB has to do an extra lookup on the disk.

    However, the real world is more complex, and the performance of the database is not just about the query speed. For example, the join in MySQL is slow, but it’s not a big deal if you have a small dataset.

  2. Does MongoDB support ACID Tx?
    Yes, and it makes MongoDB even more user-friendly.

  3. Does MySQL support JSON docs?
    Yes, but still, it’s not as good as MongoDB’s BSON. Cuz MySQL’s approach can detract from developer productivity, consider the following:

    • You’ll have to spend time on learning the MySQL-specific SQL functions to access documents - not user-friendly enough.
    • You’re tied to the low-level drivers like JDBC - these layers impose high learning overhead.
    • MySQL drivers do not have the capability to properly and precisely convert JSON into a useful native data type used by the application.
    • MySQL cannot validate the schema of docs, while MongoDB can.

Anyway, the choice between MongoDB and MySQL is not a simple one, and it’s not just about performance. And in my daily work, I use PostgreSQL to store documents, too - I really spent some time on learning it(the SQLs, the functions and the column types), but for me it’s not a big deal :)


What happens when a doc grows larger?

Since MongoDB is schema-less, we have the flexibility to append any data to a document as needed. This allows for dynamic growth, but it also raises important considerations regarding performance and storage efficiency.

The first question to consider is: how does MongoDB store docs on disk? We’re not going to explain it in detail, instead, let’s simplify it into this question: are those docs stored in a compact or sparse format?

Compact:
┌──────┬──────┬──────┬──────┐
│ doc1 │ doc2 │ doc3 │ doc4 │
└──────┴──────┴──────┴──────┘

Sparse:
┌──────┬──────┬──────┬──────┐
│ doc1 │ free │ doc2 │ free │
└──────┴──────┴──────┴──────┘

From a common-sense perspective, a sparse format seems more reasonable. It allows data to be appended to a document without requiring the entire document to be relocated, though this comes at the cost of increased disk space usage.

But the truth is, WiredTiger uses the compact format to store documents on disk.

Before we dive in, it’s important to clarify one key point: “compact” does not imply absolute compaction. There is still some gap space between slots, but compared to a sparse format, these gaps are significantly smaller.

Each slot is allocated in sizes of 2^n, and since most documents are smaller than the allocated slot, there is usually some leftover space available for appending additional data.

So how does WiredTiger deal with the situation when a document gets larger than the slot?

It moves the doc elsewhere — often to the tail end of the available slots, and leaves a pointer in the original location to reference its new position:

┌──────┬──────┬──────┬──────┬──────────┐
│ doc1 │ ptr_ │ doc3 │ doc4 │ new_doc2 │
└──────┴──┬───┴──────┴──────┴──────────┘
          │                      ▲
          └──────────────────────┘

And it’ll compact those fragmented slots in the background.


Things are completely different in MySQL. Since MySQL enforces a schema, you must modify the table structure using DDL (Data Definition Language) before appending extra data to a row.

Basically there’re 2 ways to run DDL:

  1. INSTANT, only modifies the metadata of the table, and the actual data is not moved.
  2. COPY TABLE, create a new temporary table, copy the data to the new table, and then swap the tables and delete the original one.

The COPY TABLE approach is straightforward but comes with a major drawback—it’s slow.

On the other hand, the INSTANT method is extremely fast, but it may lead to increased page splits during future write operations.

E.g. if we add a column with the INSTANT method and then immediately run an UPDATE that touches every row, the UPDATE will still be super slow:

alter table userinfo add new_col varchar(255) default null, ALGORITHM=INSTANT;

update userinfo set new_col = 'xx'; -- this sql leads to a lot of page splits

Rust

Cross Compile

See this blog: Rust Cross Compile on OSX

Neovim

Check my NeoVim configuration here.

Lua snippet

-- print debug info to another buf
vim.cmd("new")
local buf = vim.api.nvim_get_current_buf()
vim.api.nvim_buf_set_lines(buf, 0, -1, false, vim.split(vim.inspect(your_own_target), "\n"))

local buf = vim.api.nvim_create_buf(false, true)
vim.api.nvim_buf_set_lines(buf, 0, -1, false, result)
vim.api.nvim_buf_set_option(buf, "modifiable", ext_opt.display_with_buf.modifiable) -- deprecated
vim.api.nvim_set_option_value("modifiable", ext_opt.display_with_buf.modifiable, { buf = buf })
vim.cmd('botright ' .. ext_opt.display_with_buf.height .. ' split | ' .. buf .. 'buffer')

Common

Common Sense

  • x86-64:
    The 64-bit version of the x86 architecture implemented by Intel and AMD processors used in the majority of laptops, desktops, servers, and some game consoles. While the originally 16-bit x86 architecture and its very popular 32-bit extension were developed by Intel, the 64-bit version that we now call x86-64 was initially an extension developed by AMD, often referred to as AMD64. Intel also developed its own 64-bit architecture, IA-64, but ended up adopting AMD’s more popular x86 extension instead (under the names IA-32e, EM64T, and later Intel 64).

  • ARM64:
    The 64-bit version of the ARM architecture used by nearly all modern mobile devices, high performance embedded systems, and also increasingly in recent laptops and desktops. It is also known as AArch64 and was introduced as part of ARMv8. Earlier (32-bit) versions of ARM, which are similar in many ways, are used in an even wider variety of applications. Many popular microcontrollers in every kind of embedded system imaginable, from cars to electronic COVID tests, are based on ARMv6 and ARMv7.


Max open files limit

One day I found that the max open files limit of my server was set to 1024:

ulimit -n
1024

ulimit -Hn
1048576

And then I checked the java service process:

cat /proc/2562403/limits

Limit                     Soft Limit           Hard Limit           Units
Max open files            1048576              1048576              files

I checked another rust process:

cat /proc/2562403/limits

Limit                     Soft Limit           Hard Limit           Units
Max open files            1024                 1048576              files

As we can see, the Max open files soft limit of the java process was raised to 1048576 (both its soft and hard limits were 1048576, matching the system hard limit), while the rust process kept the default soft limit of 1024.

So I did a little bit of research and figured out that this is the JVM’s default behavior.

Check this out:


Don’t forget ssh-agent

One day a colleague asked me why he couldn’t clone the git repo from github. I checked his github ssh key settings, and it’s all good.

Then I tried to regenerate a brand new ssh key pair and configured it in the github settings. But it still didn’t work.

I solved the problem by doing these steps:

# Check the ssh connection to github
ssh -T git@github.com
# The output was like this:
# Hi ${another_guys_username}! You've successfully authenticated, but GitHub does not provide shell access.

# Bingo! The reason must be that he forgot to clean the `ssh-agent`.
# So I checked with this command and of course, there was an old key in the agent:
ssh-add -l

# Finally, I removed the old key from the agent:
ssh-add -D

Tools

# compress image
magick before.png -resize 35% after.png

# transcode audio
ffmpeg -i path/to/input_audio.flac -ar 44100 -sample_fmt s16 path/to/output_audio.wav

# convert flac to alac
xld -o ~/Downloads song.flac -f alac

# generate gpg keys
gpg --full-generate-key
gpg --list-secret-keys --keyid-format=long
gpg --armor --export $(KEY_ID_OR_SUB_KEY_ID)

Mac

To change the root password: use Directory Utility.app.

Simple Math

I’ve been doing some practice (kata) on Codewars, and I’ve noticed that there’re some frequently used math tricks / formulas.

GCD / LCM

// gcd(a, b) = gcd(b, a % b)
func gcd(a, b *big.Int) *big.Int {
	for b.Cmp(big.NewInt(0)) != 0 {
		a, b = b, new(big.Int).Mod(a, b)
	}
	return a
}

// (a * b) / gcd(a, b)
func lcm(a, b *big.Int) *big.Int {
	return new(big.Int).Div(new(big.Int).Mul(a, b), gcd(a, b))
}

Prime

// gen prime of [0, m]
fn gen_prime(m: u64) -> Vec<u64> {
    if m < 2 {
        return vec![]; // no primes below 2
    }
    let mut primes = vec![2];
    for i in (3..=m).step_by(2) {
        let mut is_prime = true;
        for &p in &primes {
            // no prime factor found up to sqrt(i), so i is prime
            if p * p > i {
                break;
            }
            if i % p == 0 {
                is_prime = false;
                break;
            }
        }
        if is_prime {
            primes.push(i);
        }
    }
    primes
}

// find the prime divisors of n
fn prime_factors(mut n: u64) -> HashMap<u64, u64> {
    let mut factors = HashMap::new();
    let mut divisor = 2;

    while n >= divisor * divisor {
        while n % divisor == 0 {
            *factors.entry(divisor).or_insert(0) += 1;
            n /= divisor;
        }
        divisor += 1;
    }

    if n > 1 {
        *factors.entry(n).or_insert(0) += 1;
    }

    factors
}

Bit Op

  • xor
fn xor_encrypt_decrypt(data: &str, key: u8) -> String {
    data.chars().map(|c| (c as u8 ^ key) as char).collect()
}

fn xor_swap(a: &mut u8, b: &mut u8) {
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

fn xor_find_once_in_twice(arr: &[u8]) -> u8 {
    arr.iter().fold(0, |acc, &x| acc ^ x)
}

fn xor_bit_flip(bit: u8) -> u8 {
    static MASK: u8 = 0b11111111;
    bit ^ MASK
}

fn xor_checksum(data: &[u8]) -> u8 {
    data.iter().fold(0, |acc, &x| acc ^ x)
}

Which x for that sum?

This is a Codewars kata, you can find it here.

Actually this isn’t that ‘simple’… Cuz I almost forgot those math fundamentals. And thanks to ChatGPT, which helped me solve this kata!!! (•`_´•)!!!

Consider the sum of the number sequence below:

\text{Sum} = x + 2x^2 + 3x^3 + \dots + nx^n

We can convert it into:

\sum_{k=1}^{n} k x^k

And when x is in its domain of convergence, the sum goes to a finite limit depending on x (as n goes to infinity). Now, given the Sum, how do we calculate the value of x?

  1. Before we start, we can easily tell that abs(x) < 1, otherwise the sum would be infinite.

  2. First let’s consider the sum of this sequence:

    \sum_{k=0}^{n} x^k = 1 + x + x^2 + \dots + x^n

    With this formula (where a is the first term, r is the common ratio, and n is the number of terms):

    S_n = \frac{a(1-r^n)}{1-r}

    We got:

    \sum_{k=0}^{n} x^k = \frac{1 \cdot (1-x^{n+1})}{1-x} = \frac{1-x^{n+1}}{1-x}

    And when n goes to infinity (remember abs(x) < 1, so x^{n+1} goes to 0):

    \sum_{k=0}^{\infty} x^k = \frac{1}{1-x}

  3. Now let’s calculate the derivative of this sum:

    First let’s calculate the left side:

    \frac{d}{dx}\sum_{k=0}^{\infty} x^k = \sum_{k=1}^{\infty} k x^{k-1}

    Now the right side:

    \frac{d}{dx}\frac{1}{1-x} = \frac{1}{(1-x)^2}

    So we have:

    \sum_{k=1}^{\infty} k x^{k-1} = \frac{1}{(1-x)^2}

    We’re almost there! Now we only need to multiply both sides by x to get the original formula:

    \sum_{k=1}^{\infty} k x^{k} = \frac{x}{(1-x)^2}

  4. Finally, all we need to do is solve the equation:

    \frac{x}{(1-x)^2} = S \quad (\text{given already})

    we get:

    x = \frac{(2S + 1) \pm \sqrt{4S + 1}}{2S}

    for abs(x) < 1, we should choose this answer:

    x = \frac{(2S + 1) - \sqrt{4S + 1}}{2S}

    So the final code is:

    fn solve(S: f64) -> f64 {
        ((2.0 * S + 1.0) - (4.0 * S + 1.0).sqrt()) / (2.0 * S)
    }

Power Tower

I ain’t gonna lie, it’s out of my league :-(

See this kata.

There are two important corollaries:

  1. (HARD) When we consider a number as a base, its behavior under the modulo 10 system depends only on its value under the modulo 20 system.
    For example:

    21^1 % 10 = 1^1 % 10
    21^2 % 10 = 1^2 % 10
    21^3 % 10 = 1^3 % 10
  2. (EASY) The last digit of a power is periodic, and the period can be up to 4.
    For example:

    0 = [0]
    1 = [1]
    2 = [2, 4, 8, 6, ...]
    3 = [3, 9, 7, 1, ...]
    4 = [4, 6, ...]
    5 = [5]
    6 = [6]
    7 = [7, 9, 3, 1, ...]
    8 = [8, 4, 2, 6, ...]
    9 = [9, 1, ...]

With these two corollaries, we can easily solve this kata:

fn last_digit(lst: &[u64]) -> u64 {
    lst.into_iter()
       .rev()
       .fold(1u64, |p, &v| {
           let base = if v >= 20 { v % 20 + 20 } else { v };
           let exp = if p >= 4 { p % 4 + 4 } else { p };

           base.pow(exp as u32)
       }) % 10
}

Magic Square

See the wiki here.


Pascal’s Triangle (杨辉三角)

See the wiki here

An important corollary is that the i-th number at the n-th level (n starts from 0) is:

\text{Num}_{n,i} = \frac{n!}{i!\,(n-i)!}
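
A small helper that computes that entry directly, multiplying and dividing step by step instead of building the full factorials (my own sketch):

// The i-th entry at the n-th level of Pascal's Triangle, i.e. C(n, i) = n! / (i! * (n - i)!)
fn binomial(n: u64, i: u64) -> u64 {
    assert!(i <= n);
    let i = i.min(n - i); // symmetry: C(n, i) == C(n, n - i)
    let mut result: u64 = 1;
    for k in 0..i {
        // multiply first, then divide; the intermediate value is always an integer
        result = result * (n - k) / (k + 1);
    }
    result
}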

Network

Hub, switch, router

  • Hub: all the devices are connected to the hub (with cables), and the hub broadcasts the packets to all the devices.

  • Switch: basically the same as a hub, but it has a MAC addr table, so it can forward packets to the specific device.

  • Router: it’s used to connect different networks, and it has an IP addr table, so it can forward the packets to the specific network.

To be short, assume the topology is like this:

A - Switch1 - B
      |
    Router
      |
C - Switch2 - D

If a packet is sent to a device in the same subnet, it will be delivered to the target device directly (or through a switch), like A to B.

To judge whether the target device is in the same subnet, we calculate IP & Mask for both addresses and compare the results.
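
For example, a minimal “same subnet?” check using the standard library’s Ipv4Addr (the addresses and mask below are made up):

use std::net::Ipv4Addr;

// Two IPv4 addresses are in the same subnet iff (ip & mask) matches on both sides.
fn same_subnet(a: Ipv4Addr, b: Ipv4Addr, mask: Ipv4Addr) -> bool {
    let (a, b, m) = (u32::from(a), u32::from(b), u32::from(mask));
    (a & m) == (b & m)
}

fn main() {
    let mask = Ipv4Addr::new(255, 255, 255, 0);
    // same /24 => delivered directly (via the switch), like A to B
    assert!(same_subnet(Ipv4Addr::new(192, 168, 1, 10), Ipv4Addr::new(192, 168, 1, 20), mask));
    // different subnets => the packet goes to the default gateway (the router), like A to C
    assert!(!same_subnet(Ipv4Addr::new(192, 168, 1, 10), Ipv4Addr::new(192, 168, 2, 10), mask));
}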

Otherwise if the target device is not in the same subnet, the packet will be sent to the Router, and the Router will forward the packet to the target device.

And the router is normally known as the default gateway. Actually, in the real world, the default gateway is just where you send a packet when you don’t know where else to send it.

A typical route table can be like this:

| Network   | Gateway     | Interface |
| 0.0.0.0/0 | ISP Gateway | ppp0      |

All packets that don’t match a more specific route fall under 0.0.0.0/0 and are sent to the ISP gateway.


NAT

To be short, all the devices behind a router share the same public IP address, and the router will translate the private IP address to the public IP address when the packet is sent to the internet.

The NAT table is like this:

| Source IP      | Source Port | Target IP    | Target Port | Status      |
| 192.168.31.100 | 54321       | 140.82.113.3 | 443         | ESTABLISHED |

If there’re 2 devices connecting to the same target with the same source port, we need another mapping layer (the external port) to avoid conflicts:

| Internal IP    | Internal Port | External IP | External Port | Target IP    | Target Port | Status      |
| 192.168.31.100 | 54321         | 203.0.113.2 | 50001         | 140.82.113.3 | 443         | ESTABLISHED |
| 192.168.33.100 | 54321         | 203.0.113.2 | 50002         | 140.82.113.3 | 443         | ESTABLISHED |

