Read committed re-evaluation
Read committed. Default Isolation level for postgres based transactions. As Isolation level it prevents transactions to see uncommitted changes of other transactions(transaction still sees own uncommitted changes). If you are not familiar with this isolation level I highly encourage you to take a look at the Official documentation. However, some sentences from documentation were unclear fo me so I assume that I could clarify them for those who are as bad in Database theory as I am.
Re-evaluated
The most confusing sentence for me was the one that describes re-evaluation of WHERE
clause during two concurrent updated,
namely:
The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row. In the case of SELECT FOR UPDATE and SELECT FOR SHARE, this means it is the updated version of the row that is locked and returned to the client.
Will the second updater see all the changes made by the first one? Let's see with our best friends
Alice and Bob and the best designed table called
account
CREATE TABLE account(id int primary key,name
text,amount int);
Now let's insert two values into the table
INSERT INTO account(id,name,amount) VALUES (1,'Bob',200),(2,'Bob',800);
.
Then open two parallel transactions using psql
(BTW, do you know about pgcli? I use it daily instead of
heavyweight pgAdmin).
The first transaction will execute (and not commit) the following query
test> update account set amount = 300 where amount = 200; --1
.
And the second one will execute this query
update account set amount = 400 where amount =
200;--2
.
So as the second transaction wants to update a row which has already updated and hasn't been
committed yet
it will be locked. The question is , will it update amount to 400 when the first transaction will
commit it's changes.
Let's see , commit first transaction and see if the second transaction will update amount from 300
to
400.
test> select amount from account order by name desc
800
300
So far so good, Postgres follows the documented behavior. The second transaction re-evaluated WHERE
clause and as the first transaction committed changes
then the second one didn't update any row cause the amount was changed from 200 to 300 and search
condition in the second transaction skipped this row.
What about Subqueries?
And here is the most interesting part. Let's return to our previous statement when Bob had two
accounts with 800 and 200 respectively.Open two parallel transactions.First one will execute the
following query (remember transaction didn't commit this update)
UPDATE account set amount = 100 where amount = 200; -- 1
.
And after that the second transaction will execute this query.
UPDATE account set amount = amount+50 --2
where name in
(select DISTINCT name from account group by
name having sum(amount)>=1000); -- 2
So the second transaction wants to add 50 to all accounts whose total amount is bigger or equals to
1000. As it wants to update the same row as the first transaction it will be locked.
Bob's total amount is 1000. However the first transaction changed the amount of the first account
from
200 to 100
so now Bob should have 900 as soon as the first transaction will be committed. Will the second
transaction re-evaluate subquery
and skip Bob's accounts? Let's see. Commit first transaction and then the second one.
Execute select and account table will look like this.
SELECT * from account
-[ RECORD 1 ]
id | 1
name | Bob
amount | 850
-[ RECORD 2 ]
id | 2
name | Bob
amount | 150
Which means that the second transaction didn't re-evaluate subquery and updated Bob's accounts even
if their total was 900 before changes from the second transaction. Why ? As far as I understand ,
the WHERE
clause in the second query where name in...
was reevaluated and
it checked if name was changed.
However the subquery has already been finished the execution and it saw the state when Bob had total amount equals to 1000.
How to fix it? Well the easiest way would by to execute
two transactions in REPEATABLE READ
isolation level.
In this case the second transaction will fail with could not serialize access due to
concurrent update
exception and your application code
will have to add retry logic for this transaction.
Conclusion
For me it was unobvious that READ COMMITTED
doesn't re-evaluate subqueries inside WHERE
clause but this is how it works. Here is my
StackOverflow
question related to it.