
Square Root Decomposition

Authors: Benjamin Qi, Neo Wang, Mihnea Brebenel

Splitting up data into smaller chunks to speed up processing.

Focus Problem – try your best to solve this problem before continuing!

You should already have done this problem in Point Update Range Sum, but here we'll present two more approaches. Both run in $\mathcal{O}(Q\sqrt N)$ time.

Blocking

We partition the array into blocks of size $\texttt{block\_size}=\lceil \sqrt{N} \rceil$. Each block stores the sum of the elements within it, which is enough to support the update and query operations described below.

Update Queries: $\mathcal{O}(1)$

To update an element at location $x$, first find the corresponding block using the formula $\left\lfloor \frac{x}{\texttt{block\_size}} \right\rfloor$.

Then, add to that block's sum the difference between the value we want to change the element to and the value currently stored at $x$.

Sum Queries: $\mathcal{O}(\sqrt{N})$

To perform a sum query over $[0\ldots r]$, calculate

$$\sum_{i = 0}^{R-1} \texttt{blocks}[i] + \sum_{i = R \cdot \texttt{block\_size}}^{r} \texttt{nums}[i]$$

where $\texttt{blocks}[i]$ is the total sum of the $i$-th block, the $i$-th block covers the indices in the range $[i\cdot \texttt{block\_size},(i + 1)\cdot \texttt{block\_size})$, and $R=\left\lfloor \frac{r}{\texttt{block\_size}} \right\rfloor$ is the index of the block containing $r$.

Finally, $\sum_{i=l}^{r} \texttt{nums}[i]$ is the difference between the two prefix sums $\sum_{i=0}^{r}\texttt{nums}[i]$ and $\sum_{i=0}^{l-1}\texttt{nums}[i]$, each of which is calculated in $\mathcal{O}(\sqrt N)$.
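The update and query steps described above could be put together roughly as follows. This is a minimal sketch rather than the authors' implementation: the names `BlockSum`, `update`, `prefix`, and `query` are hypothetical, indices are assumed to be 0-based, and ranges are assumed to be inclusive.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>
using namespace std;

// Hypothetical helper illustrating blocking; names are not from the original.
struct BlockSum {
    int block_size;
    vector<long long> nums;
    vector<long long> blocks;  // blocks[b] = sum of nums inside block b

    explicit BlockSum(const vector<long long> &arr) : nums(arr) {
        int n = (int)arr.size();
        block_size = max(1, (int)ceil(sqrt((double)n)));
        blocks.assign((n + block_size - 1) / block_size, 0);
        for (int i = 0; i < n; i++) blocks[i / block_size] += nums[i];
    }

    // O(1): set nums[x] to v by applying the difference to x's block.
    void update(int x, long long v) {
        assert(0 <= x && x < (int)nums.size());
        blocks[x / block_size] += v - nums[x];
        nums[x] = v;
    }

    // O(sqrt N): sum of nums[0..r] inclusive; returns 0 when r < 0.
    long long prefix(int r) const {
        if (r < 0) return 0;
        int R = r / block_size;  // index of the block containing r
        long long res = 0;
        for (int b = 0; b < R; b++) res += blocks[b];               // whole blocks
        for (int i = R * block_size; i <= r; i++) res += nums[i];   // partial block
        return res;
    }

    // O(sqrt N): sum of nums[l..r] inclusive, as a difference of prefixes.
    long long query(int l, int r) const { return prefix(r) - prefix(l - 1); }
};
```

Any positive block size gives correct answers here; choosing it near $\sqrt N$ only balances the two loops in `prefix`.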

C++

#include <bits/stdc++.h>
using namespace std;

struct Sqrt {
    int block_size;
    vector<int> nums;
    vector<long long> blocks;
    Sqrt(int sqrtn, vector<int> &arr) : block_size(sqrtn), blocks(sqrtn, 0) {
        nums = arr;
        for (int i = 0; i < nums.size(); i++) { blocks[i / block_size] += nums[i]; }

Java

import java.io.*;
import java.util.*;

public class DRSQ {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        PrintWriter pw = new PrintWriter(System.out);
        StringTokenizer st = new StringTokenizer(br.readLine());
        int n = Integer.parseInt(st.nextToken());

Combining Algorithms

Focus Problem – try your best to solve this problem before continuing!

Doing this problem with DP has a time complexity of $\mathcal{O}(N^2)$.

C++

vector<int> dp(n, 1);
for (int i = n - 1; i >= 0; i--) {
    // dp has indices 0..n-1, so every jump target x must stay below n
    for (int x = i + a[i]; x < n; x += a[i]) { dp[i] = (dp[i] + dp[x]) % MOD; }
}

If we try prefix sums, the complexity is still $\mathcal{O}(N^2)$.

C++

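// s[j][r] holds the sum of dp[k] over already processed indices k with k % j == r
for (int i = n - 1; i >= 0; i--) {
    dp[i] += s[a[i]][i % a[i]];
    dp[i] %= MOD;
    // maintaining every stride j up to n is what keeps this at O(N^2)
    for (int j = 1; j <= n; j++) {
        s[j][i % j] += dp[i];
        s[j][i % j] %= MOD;
    }
}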

We can apply the DP algorithm to the positions where $A_i \ge \sqrt N$: because each jump is large, the inner loop runs at most $\sqrt N$ times. We can apply prefix sums for the remaining positions, where $A_i < \sqrt N$, so the sums only need to be maintained for fewer than $\sqrt N$ distinct strides.

This trick allows us to combine two $\mathcal{O}(N^2)$ algorithms into one $\mathcal{O}(N\sqrt N)$ algorithm.
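As a rough illustration of how the two approaches can be combined, the sketch below runs the direct loop for large strides and the per-residue sums for small ones. It assumes the recurrence used in the snippets above (each `dp[i]` starts at 1 and adds `dp[i + k * a[i]]` for every `k >= 1` that stays in range) and that every `a[i] >= 1`. The function names `slow_dp` and `hybrid_dp` and the threshold variable `lim` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>
using namespace std;

const int MOD = 998244353;

// Reference O(N^2) DP, kept only to cross-check the hybrid version.
vector<int> slow_dp(const vector<int> &a) {
    int n = (int)a.size();
    vector<int> dp(n, 1);
    for (int i = n - 1; i >= 0; i--)
        for (int x = i + a[i]; x < n; x += a[i]) dp[i] = (dp[i] + dp[x]) % MOD;
    return dp;
}

// Hybrid: direct loop when a[i] > lim, per-residue sums when a[i] <= lim.
vector<int> hybrid_dp(const vector<int> &a) {
    int n = (int)a.size();
    int lim = max(1, (int)sqrt((double)n));  // boundary between small and large strides
    // s[j][r] = sum of dp[k] over processed k with k % j == r, for strides j <= lim
    vector<vector<int>> s(lim + 1);
    for (int j = 1; j <= lim; j++) s[j].assign(j, 0);
    vector<int> dp(n, 1);
    for (int i = n - 1; i >= 0; i--) {
        assert(a[i] >= 1);
        if (a[i] > lim) {
            // large stride: fewer than n / lim jumps fit inside the array
            for (int x = i + a[i]; x < n; x += a[i]) dp[i] = (dp[i] + dp[x]) % MOD;
        } else {
            // small stride: all reachable indices share i's residue mod a[i]
            dp[i] = (dp[i] + s[a[i]][i % a[i]]) % MOD;
        }
        // register dp[i] for every small stride: O(lim) work per index
        for (int j = 1; j <= lim; j++) s[j][i % j] = (s[j][i % j] + dp[i]) % MOD;
    }
    return dp;
}
```

Both branches cost about $\sqrt N$ per index, and the table `s` holds $1 + 2 + \dots + \texttt{lim}$ entries, which is linear in $N$.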

C++

#include <bits/stdc++.h>
using namespace std;

const int MOD = 998244353;

int main() {
    int n;
    cin >> n;
    vector<int> a(n);
    for (int i = 0; i < n; i++) { cin >> a[i]; }

Batching

See the CPH section on batch processing.

Maintain a "buffer" of the latest updates (up to $\sqrt N$ of them). The answer for each sum query can be calculated from the prefix sums together with a scan over each update in the buffer. When the buffer gets too large ($\ge \sqrt N$ updates), clear it and recalculate the prefix sums.
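The buffering idea above might look like the following. This is a sketch under a few assumptions, not the authors' code: the names `BatchedSum`, `set_value`, `query`, and `rebuild` are hypothetical, updates are point assignments, and queries are inclusive 0-based ranges. It keeps the current values in `cur` so that each assignment can be stored in the buffer as a delta.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>
using namespace std;

// Hypothetical helper illustrating batching; names are not from the original.
struct BatchedSum {
    vector<long long> cur;                // current values, always up to date
    vector<long long> prefix;             // prefix sums as of the last rebuild
    vector<pair<int, long long>> buffer;  // (index, delta) since the last rebuild
    size_t limit;                         // rebuild once the buffer reaches this size

    explicit BatchedSum(const vector<long long> &a)
        : cur(a), prefix(a.size() + 1, 0) {
        limit = max<size_t>(1, (size_t)sqrt((double)a.size()));
        rebuild();
    }

    // O(N): recompute prefix sums from the current values and empty the buffer.
    void rebuild() {
        for (size_t i = 0; i < cur.size(); i++) prefix[i + 1] = prefix[i] + cur[i];
        buffer.clear();
    }

    // Amortized O(sqrt N): record the change, rebuilding when the buffer is full.
    void set_value(int x, long long v) {
        assert(0 <= x && x < (int)cur.size());
        buffer.push_back({x, v - cur[x]});
        cur[x] = v;
        if (buffer.size() >= limit) rebuild();
    }

    // O(sqrt N): stale prefix-sum answer plus every buffered delta inside [l, r].
    long long query(int l, int r) const {
        long long res = prefix[r + 1] - prefix[l];
        for (const auto &u : buffer)
            if (l <= u.first && u.first <= r) res += u.second;
        return res;
    }
};
```

A rebuild costs $\mathcal{O}(N)$ but happens only once per $\sqrt N$ updates, so the amortized cost per operation stays at $\mathcal{O}(\sqrt N)$.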

C++

#include <bits/stdc++.h>
using namespace std;

int n, q;
vector<int> arr;
vector<long long> prefix;

/** Build the prefix array for arr */
void build() {
    prefix[0] = 0;

Java

Warning: TLE

Due to the tight time limit on CSES, the Java implementation might exceed it and receive a TLE verdict.

import java.io.*;
import java.util.*;

public class DRSQ {
    static int[] arr;
    static List<Long> prefix;

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        PrintWriter pw = new PrintWriter(System.out);

Mo's Algorithm

Focus Problem – try your best to solve this problem before continuing!

Resources

  • CF: very brief description

  • HE: elaborate description with proof

  • CPH

C++

#include <bits/stdc++.h>
using namespace std;

struct Query {
    int l, r, idx;
};

int main() {
    int n;
    cin >> n;

Additional Notes

  • Low constraints (ex. $n=5\cdot 10^4$) and/or high time limits (greater than 2s) can be signs that square root decomposition is intended.

  • In practice, it is not necessary to use the exact value of $\sqrt n$ as a parameter, and instead we may use parameters $k$ and $n/k$ where $k$ is different from $\sqrt n$. The optimal parameter depends on the problem and input. For example, if an algorithm often goes through the blocks but rarely inspects single elements inside the blocks, it may be a good idea to divide the array into $k<\sqrt n$ blocks, each of which contains $n/k > \sqrt n$ elements.

  • If an update takes time proportional to the size of one block ($\mathcal{O}(n/k)$) while a query takes time proportional to the number of blocks times $\log n$ ($\mathcal{O}(k\log n)$), then we can set $k\approx \sqrt{\frac{n}{\log n}}$ to make both updates and queries take time $\mathcal{O}(\sqrt{n\log n})$.

  • Solutions with worse complexities are not necessarily slower (at least for problems with reasonable input sizes, ex. $n\le 5\cdot 10^5$). I recall an instance where a fast $\mathcal{O}(n\sqrt n\log n)$ solution passed (where $\log n$ came from a BIT) while an $\mathcal{O}(n\sqrt n)$ solution did not. Constant factors are important!
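For the note above about balancing updates that cost $\mathcal{O}(n/k)$ against queries that cost $\mathcal{O}(k\log n)$, the choice of $k$ comes from setting the two costs equal. A short worked derivation, ignoring constant factors and the base of the logarithm:

```latex
\[
\frac{n}{k} = k\log n
\;\Longrightarrow\;
k^2 = \frac{n}{\log n}
\;\Longrightarrow\;
k = \sqrt{\frac{n}{\log n}},
\qquad\text{so}\qquad
\frac{n}{k} = k\log n = \sqrt{n\log n}.
\]
```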

On Trees

The techniques mentioned in the blogs below are extremely rare but worth a mention.

Some more discussion about how square root decomposition can be used:

Resources

  • CF: format isn't great but tree example is ok

Problems

Set A

Problems where the best solution involves square root decomposition.

| Source | Difficulty | Tags |
| --- | --- | --- |
| CF | Easy | Sqrt |
| POI | Easy | Sqrt |
| CF | Easy | Sqrt |
| CF | Easy | Sqrt |
| JOI | Normal | DP, Sqrt |
| YS | Normal | Sqrt |
| CF | Normal | Mo's Algorithm |
| APIO | Hard | Sqrt |
| JOI | Hard | SOS DP |
| Platinum | Very Hard | Sqrt |
| DMOPC | Very Hard | Sqrt |
| Wesley's Anger Contest | Very Hard | Sqrt |

Set B

Problems that can be solved without it. But you might as well try to use it!

| Source | Difficulty | Tags |
| --- | --- | --- |
| JOI | Normal | 2DRQ, Mo's Algorithm |
| IOI | Normal | Sqrt |
| Platinum | Normal | Sqrt |
| Platinum | Hard | Sqrt |
| CF | Hard | Sqrt |
| CF | Hard | HLD |
| TLX | Hard | Sqrt |
| CSA | Hard | Sqrt |
| Old Gold | Hard | Convex |
| CF | Very Hard | Convex |
| IOI | Very Hard | Sqrt |
| Platinum | Very Hard | Sqrt |
| IOI | Very Hard | 2DRQ |
