With auto-incrementing IDs, if there is a connection issue or other failure, it's not possible to cleanly figure out whether a resource was already created, since only the database knows the ID of that resource until the ID is sent back and reaches the client across the network. Just because the client did not receive a successful response from the server, it does not mean that the resource was not created; and if it was, the client has no reliable way of referencing it. It seems completely wrong that the client (which created the resource) cannot reference it until the database has inserted it. Keeping request UUIDs in memory on the server is a hack to pretend that resource creation starts in the database when, in fact, it starts on the front end. It's also not elegant because it doesn't work in a multi-process or multi-server back end: if a request fails, you need to make sure that the retry hits the same server process. Even in a single-host setup it can be a problem: what happens if your process crashes and restarts just after a resource was created in the database but before the ID was sent to the client with a success message? You would end up with a duplicate record in your DB after the retry, since the newly restarted server would not have the UUID in its memory (even though the resource was in fact created a few milliseconds before the crash). You could use an in-memory data store like Redis to share the request UUIDs across multiple processes/hosts, but that adds a lot more complexity, latency and risk, and is just not worth it IMO.
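A minimal sketch of this pattern, using SQLite and a hypothetical `create_resource` helper (the table name and function are mine, for illustration): because the client generates the UUID before the first request is ever sent, resending the same payload after a lost response cannot create a duplicate row.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id TEXT PRIMARY KEY, payload TEXT)")

def create_resource(resource_id: str, payload: str) -> bool:
    """Insert a resource; return True if created, False if it already existed.

    INSERT OR IGNORE makes the call idempotent: a retry with the same
    client-generated UUID hits the primary-key constraint and is a no-op.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO resources (id, payload) VALUES (?, ?)",
        (resource_id, payload),
    )
    return cur.rowcount == 1

# The client generates the ID up front, before any network round trip.
rid = str(uuid.uuid4())
first = create_resource(rid, "hello")   # creates the row
retry = create_resource(rid, "hello")   # simulated retry: duplicate suppressed
print(first, retry)
```

The same idea applies with Postgres via `INSERT ... ON CONFLICT DO NOTHING`; the point is that idempotency lives in the schema, not in server memory.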
One of the best things about UUIDs is that if your front end tries to create a resource and fails due to a bad connection, you can simply resend that exact same resource for creation (with the same UUID) and not worry about duplicate records being inserted into the database, since the duplicate would cause an ID collision. It results in much simpler/cleaner code that is idempotent and deterministic by default. Recently, I wrote a client-side wallet app for a cryptocurrency use case that required transactions to be created and signed entirely on the client side, so the transaction UUID had to be generated on the client (before signing), and I was surprised at how elegant and simple the application's logic turned out to be (on both the front end and the back end).

UUIDs actually perform much better than I thought, based on the author's example with a COUNT query. COUNT queries aren't very efficient because typically all records are traversed, and here we're talking about a 50% slowdown on 10 million records. How often do you need to access 10 million contiguous records? Most of the time, for user-facing apps, you'll be accessing 100 contiguous records at most and, in fact, most queries will access a single record. I suspect that the average per-query performance loss for a typical app is probably less than 5%. Also, you could simply index a separate date/timestamp field if you need records to be ordered by time, and you probably won't incur any performance cost. IMO, unless you're building an app for high-frequency trading, auto-incrementing IDs aren't worth the pain and lack of flexibility.

This is a bit confusing, as it mixes two things: a BRIN index and an index on a timestamp. The main source of write amplification comes from updating random pages of the btree index. Imagine inserting 10 random UUID values into a large index: it's pretty likely those will go into 10 different leaf pages. And on the first update of a page after a checkpoint (which typically happens every 30 minutes or so), we have to write an FPI (full-page image). So because btrees are based on ordering, random values end up on random leaf pages, causing write amplification. If you have a BRIN index on a UUID column, this does not happen, because the index is based not on ordering but on location in the table. If the 10 rows get appended to the same table page, that'll be just 1 write, with one FPI. This is why BRIN does not have the write amplification issue. But it's also a bit pointless, because BRIN on random data is pretty useless for querying (well, at least the minmax indexes are). If you create a BRIN index on a timestamp, that's not going to have the write amplification problem, and it'll be good for querying. The thing is, a btree would not have the write amplification problem either, because the timestamps are going to be sequential (hence no updates to random leaf pages).
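The leaf-page argument can be sketched numerically with a toy model (an illustration, not Postgres internals: `PAGE_KEYS` and `pages_touched` are my own assumptions). Pretend each btree leaf page covers a fixed number of adjacent key positions, and count how many distinct pages a batch of 10 inserts dirties for sequential keys versus random UUIDs.

```python
import uuid

PAGE_KEYS = 100  # toy model: each leaf page covers 100 adjacent key slots

def pages_touched(new_keys, key_space):
    """Map each key to a leaf page by its rank in the sorted key space
    and count how many distinct pages the batch of inserts dirties."""
    rank = {k: i for i, k in enumerate(sorted(key_space))}
    return len({rank[k] // PAGE_KEYS for k in new_keys})

# A large existing index of ~10,000 keys, plus 10 new inserts.
sequential_space = list(range(10_010))
sequential_new = sequential_space[-10:]  # appended at the right edge

random_space = [uuid.uuid4().hex for _ in range(10_010)]
random_new = random_space[-10:]          # 10 random UUIDs, scattered by sort order

print(pages_touched(sequential_new, sequential_space))  # sequential: 1 page
print(pages_touched(random_new, random_space))          # random: likely close to 10
```

Each dirtied page is a candidate for a full-page image after a checkpoint, which is where the amplification shows up.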