Editorial for COCI '12 Contest 1 #6 Mars
Submitting an official solution before solving the problem yourself is a bannable offence.
We will describe two solutions based on dynamic programming.
For the purposes of both solutions, we will call sets of positions in the sequence the
-structures, sets
the
-structures, sets
the
-structures and so on. Two bacteria in the same
-structure will be called siblings; if they are not siblings, but are in the same
-structure, we'll call them first cousins; if they are neither siblings nor first cousins, we'll call them second cousins etc.
The first solution
(devised by Luka Kalinovčić while solving COCI)
Imagine that we are adding bacteria into the sequence from left to right; then the state is described by the position in the sequence and the number of the bacterium placed in that position.
At first glance, these two parameters appear insufficient, but it can be shown that they are in fact sufficient. Let and
be two adjacent positions in the sequence, with a bacterium marked prev placed in position
. Let us show that we can, without knowledge about the rest of the sequence constructed before position
, unambiguously determine which bacteria are candidates for placement in position
(thus giving us the set of possible following states, making the state transition in dynamic programming possible). With that goal, let us consider the following cases:
- 1)
and
are in the same
-structure. It is clear that the only candidate for
is the sibling of prev.
- 2) Case 1 doesn't apply, but
and
are in the same
-structure. Here candidates for position
are the two first cousins of prev.
- 3) None of the previous cases apply, but
and
are in the same
-structure. Here candidates for position
are the four second cousins of prev.
- …
) None of the previous cases apply, but
and
are in the same
-structure (spanning all positions). Here candidates for position
are the
most distant cousins of prev.
Notice that this reasoning leads to the exact set of candidates for the next position, since none of the candidates could have been used before (take a moment to think and convince yourself).
The complexity of the algorithm is the number of states, , multiplied by the transition complexity (number of candidates), which, at first glance, totals
. Since the number of candidates is
in
of cases,
in
of cases,
in
of cases and so on, the actual complexity is even lower:
.
The second solution
Let be the smallest possible length of a subsequence starting with bacterium
, ending with bacterium
, and including precisely the bacteria which are descendants of the first (lowest) common ancestor of
and
(denoted by
). The agenda is as follows:
- 1) Compute
for all
that are siblings.
- 2) Compute
for all
that are first cousins.
- 3) Compute
for all
that are second cousins.
- …
) Compute
for all
that are most distant cousins.
The final result will, of course, be the smallest of the numbers obtained in the last, step, since they cover all sequences containing all
bacteria.
Let us denote with the repulsion between bacteria
and
.
values in the first step are trivial to obtain, since
. Other steps are a bit more complex, so bear with us.
For given and
, we can try out all selections of bacteria
and
that "divide structures
and
", i.e. are exactly in the middle of the subsequence we're building between
and
, where
is the cousin closer to
than
, while
is the cousin closer to
than
. They cannot be too close cousins, since we have to fit half of all descendants of
between
and
, and analogously for
– in other words,
and
must be the two children of
.
For fixed , and a selection of bacteria
and
, the smallest possible length of the segment between
and
is obviously:
Therefore, is equal to the minimum of this expression over all valid choices of bacteria
and
.
We need to find a fast method of computing this transition. If, for given and
, we select some bacterium
, we need to minimize:
for all legal choices of . Notice that this minimum doesn't depend on
. Let us denote it by
.
We can thus, before computing values of for the current step, precompute values of
needed for that step (where
is obtained by trying out all valid values of
) and then use the
values to obtain
values for that step more efficiently.
The complexity is obtained by summing the complexity for computing values and for computing
values. Both are at most
since they compute
values, each of which requires one for loop. Since this loop is often short because of relatively few valid choices, mathematically inclined readers are encouraged to determine whether this complexity is really
or somewhat smaller, as in the previous solution.
Comments