Editorial for IOI '09 P7 - Regions
Submitting an official solution before solving the problem yourself is a bannable offence.
Although the employees are already assigned numbers in the input, the numbers can be reassigned in a way that makes them more useful. The supervisor relationships clearly organize the employees into a tree. Assign the new employee numbers in a pre-order walk of the tree3. Figure 1 shows an example of such a numbering.
3 A pre-order walk of a tree first processes the root of that tree, then recursively processes each sub-tree in turn.

Figure 1: An example of numbering employees by a pre-order walk. The bottom numbers indicate the range of employee numbers in each sub-tree.
A useful property of this numbering is that all the employees in a sub-tree have sequential numbers. For a given employee , let
be the range of employee numbers managed by
. Notice that for a region, we can construct an ordered array of all the interval end-points for that region, and a list of all employees in that region. This can be done during the assignment of numbers in linear time.
Now let us consider how to answer queries . Let the sizes of the regions be
and
respectively. Given this data structure, a natural solution is to consider every pair of employees
from these regions and check whether
lies in the interval
. However, this will take
time per query, which we can improve upon.
The interval end-points for region divide the integers into contiguous blocks. All employees in the same block have the same managers from
, and we can precompute the number of such managers for each such block. This gives us a faster way to answer queries. Rather than comparing every employee in
with every block for
, we can observe that both are ordered by employee ID. Thus, one can maintain an index into each list, and in each step advance whichever index is lagging behind the other. Since each index traverses a list once, this takes
time.
Using just this query mechanism can still take time, because all the queries might involve large regions. However, it is sufficient to earn the points for the tests where no region has more than
employees.
Precomputing queries
In the query algorithm above, it is also possible to replace the list of employees in with the entire list of employees, and thus compute the answer to all queries for a particular
. This still requires only a single pass over the blocks for
, so it takes
time to produce all the answers for a particular
. Similarly, one can iterate over all interval end-points while fixing
, giving all answers for a particular
.
This allows all possible queries to be pre-computed in time and
memory. This is sufficient to earn the points for the tests where
.
This algorithm is too slow and uses too much memory to solve all the tests. However, it is not necessary to precompute all answers, just the most expensive ones. We will precompute the answers involving regions with size at least . There are obviously at most
such regions, so this will take
time and
memory. The remaining queries involve only small regions, so they can be answered in
time. Choosing
gives
time and
memory, which is sufficient for a full score.
Caching queries
As an alternative to precomputation, one can cache the results of all queries, and take the answer from the cache if the same query is made again. Let be the number of unique queries. The cost of maintaining the query cache depends on the data structure used; a balanced binary tree gives
overhead for this.
Combining the cache with the algorithm is sufficient to achieve the points for tests that have either no more than
employees per region (because this is the case even without the cache), as well as the cases with no more than
regions (since the total cost of all distinct queries together is
).
To achieve a full score with a cache rather than precomputation, one must use a better method for answering queries. Suppose we have a block in , and wish to find all matching employees from
. While we have previously relied on a linear walk over the employees from
, we can instead use a binary search to find the start and end of the range in
time. This allows the entire query to be answered in
time. A similar transformation (binary searching the blocks for each employee in
) gives
time for each query.
Now when answering each query, choose the best out of the ,
and
query mechanisms. To establish an upper bound on run-time, we will make assumptions about which method is chosen to answer particular types of queries.
Again, divide the problem into large regions with at least employees and the rest. For queries involving one of the large regions, use the
algorithm (where
and
are respectively the smaller and larger of
and
). The caching of queries ensures that this contributes no more than
time. For the remaining queries, use an
algorithm. The smaller regions have at most
employees, so this contributes
time.
The optimal value of occurs when the two parts account for equal time. Solving for this optimal
gives a bound of
for answering non-duplicate queries; combined with the cost for the query cache, this gives an algorithm with time complexity
and memory complexity
.
The time bound is marginally worse than for the first solution, but in practical terms this solution runs at about the same speed and uses significantly less memory.
Comments