This blog was last modified 522 days before.
Question Description
In this question, we need to consider n
pieces of different file fragments and the equal number of channels. Each file fragment has a possibility ${P_i}$ to be visited, and it's promised that ${\sum_{i} P_i} = 1$.
We need to properly store them in the different places on the disk, which could make the Average Random Search Time (We will use ${T}$ to represent this in below) to be the minimum.
Channels and Channel Distance
Consider that we have $n$ channels, represented by ${\left { C_1, C_2,\dots,C_n \right } }$. They are physically next to each other in order.
Now, we define the distance of two different channel ${C_i}$ and ${C_j}$ as belows:
$${ \begin{aligned} D_{ij} &= \left | D_{C_i} - D_{C_j} \right | \\ &= \left | i-j \right | \end{aligned} }$$That is, the distance between two different channel can be represented by the absolute of the difference of their subscript.
Distribute Weight Sequence
Notice that we could determine the place and way how we put these file fragments to the channel to control how $P_i$ distributes.
Distribution Weight Sequence has the pattern like ${\left { W_1, W_2, \cdots, W_n \right }}$. And in this question, what we can do is actually determine the order of these different weight $W_i$.
For example, we have 3 files, and there weight is 5, 10 and 85, then one possible way to place them is like this weight sequence:
$S_{example} = {\left \langle 10, 85, 5 \right \rangle}$
Then $P_1=0.10$, ${P_2 = 0.85}$, and ${P_3 = 0.05}$.
And if you change $S$ (distribution sequence), ${P_i}$ will changes in consequence.
Also, we need to define some other attributes & functions of $S$ for later use.
- $|S|$ : The number of the elements in the sequence. Use the example above, $|S_{example}| = 3$
- $sum(S)$ : The sum of all the weight in the sequence $S$. Use the example above, ${sum(S_{example}) = 100}$
- $[W_i:W_j]$ : Sum of all the weight in this interval $W_i + W_{i+1} + W_{i+2} + \cdots + W_{j}$
- ${\left \langle \dots S \right \rangle}$ : Spread operation of Distribution Weight Sequence. Means that spread all the elements in $S$ to the current position in the same order as they in $S$. For example: ${S = \left \langle 1, 2 \right \rangle}$, then: ${\left \langle 3, \dots S, 4 \right \rangle = \left \langle 3, 1, 2, 4 \right \rangle}$
By this definition, we know now that: ${W_i = P_i \cdot sum(S)}$
Average Random Search Time
The question says that we should try to achieve a minimum Average Random Search Time.
For a certain distribution sequence $S$, which could be calculated by the formula:
$$ \begin{aligned} T(S) &= \sum_{1 \le i < j \le n} {P_i \cdot P_j \cdot D_{ij}} \\ &= \sum_{1 \le i < j \le n} {P_i \cdot P_j \cdot (j-i)} \end{aligned} $$Directly Use Weight In Calculation
When input, we could not directly get the possiblility ${P_i}$ of each file fragment, instead, we get a weight number ${W_i}$, and the possibility could be calculated by ${P_a = \frac{W_a}{\sum_{i} W_i}}$. But to help simplify our solution, we now try to use $W_i$ directly in our process.
If we substitute $P_i$ by $W_i$ in formula below and get a new search time criteria ${T_W}$:
$$ \begin{aligned} T_{W}(S) &= \sum_{1 \le i < j \le n} {W_i \cdot W_j \cdot D_{ij}} \\ &= \sum_{1 \le i < j \le n} {W_i \cdot W_j \cdot (j-i)} \\ T_W(S)&= \sum_{1 \le i < j \le n} {sum(S)^{2} \cdot P_i \cdot P_j \cdot (j-i)} \\ &= sum(S)^{2} \sum_{1 \le i < j \le n} {P_i \cdot P_j \cdot (j-i)} \\ &= sum(S)^{2} \cdot T(S) \end{aligned} $$Then, when $sum(S_1) = sum(S_2) = M \ge 0$, it could be proved that:
$$ \begin{aligned} T_{W}(S_1) > T_{W}(S_2) &\Leftrightarrow sum(S_1)^2 \cdot T(S_1) > sum(S_2)^2 \cdot T(S_2) \\ &\Leftrightarrow M^2 \cdot T(S_1) > M^2 \cdot T(S_2) \\ &\Leftrightarrow T(S_1) > T(S_2) \end{aligned} $$Code Analysis
Now we can consider that quesion as:
We get a batch of weight ${\left { W_1, W_2, \cdots, W_n \right }}$, how to arrange them into $S$ to make $T(S)$ achieves its minimum.
Basic Senario
When ${|S| = 1}$ or ${|S| = 2}$, we could always get the optimal solution.
For example, for two weight: 2 and 8, no matter how to place these two, the result ${T(S)}$ is the same:
$T(\left \langle 2, 8 \right \rangle) = 0.2 \cdot 0.8 \cdot (2-1) = 0.16$
$T(\left \langle 8, 2 \right \rangle) = 0.8 \cdot 0.2 \cdot (2-1) = 0.16$
So they are both minimum and optimal.
We use ${S_{min}}$ to represent a $S$ if this S is the optimal distribution.
Solution For Larger Scale
Consider we have already have a optimal distribution weight sequence $S_{min}$, and for ${S_{min}}$, we know:
- ${|S| = n}$
- The minimum weight inside this sequence: $W_{min} := {min(\dots S_{min})}$
Also, we have a new weight number $W_{new}$, which is less or equal than $W_{min}$ (that means ${W_{new} \le W_{min}}$). Let new sequence:
${S_f = \left \langle W_{new}, \dots S_{min} \right \rangle}$, ${S_l = \left \langle \dots S_{min}, W_{new} \right \rangle}$
Now, based on the condition above, it can be proved that:
after adding ${W_{new}}$ to previous sequence ${S_{min}}$, the new optimal sequence:
$$ \begin{aligned} S_{min}' = \end{aligned} \left \{ \begin{aligned} S_f &, T(S_f) \le T(S_l)\\ S_l \ &, T(S_f) > T(S_l) \end{aligned} \right. $$Which, in words, means that the optimal place for this new element is either the start of the sequence or the end of the sequence.
Prove
Now let first assume we randomly insert this new element into the middle of the previous sequence (neither at the first nor at the end), then we get and new sequence like:
$$\left \langle W_1 \dots, W_{k-1}, W_{new}, W_{k+1}, \dots W_n\right \rangle , (k \ge 2)$$We know let's assume that $[W_1:W_{k-2}] < [W_{k+1}:W_n]$,Notice that if $k=2$ then we consider that $[W_1:W_{k-2}] = 0$.
(Notice: if $[W_1:W_{k-2}] = [W_{k+1}:W_n]$, then change the selection, you could select $[W_1:W_{k-1}] < [W_{k+2}:W_n]$, this will not affact the later process, and we could esaily prove that there always at least one choose method that make this two part not equal)
Now if we exchange the place of $W_{new}$ and ${W_{k-1}}$ and get a new sequence $S_{ex}$, then we have:
$$ \begin{aligned} \Delta T_W =& T_W(S_{ex}) - T_W(S_{min})\\ =& [W_1:W_{k-2}] \cdot ( W_{k-1} - W_{new} ) + [W_{k+1}:W_n] \cdot ( W_{new} - W_{k-1} ) \\ =& [W_1:W_{k-2}] \cdot ( W_{k-1} - W_{new} ) - [W_{k+1}:W_n] \cdot ( W_{k-1} - W_{new} ) \\ =& ( W_{k-1} - W_{new} ) \cdot ([W_1:W_{k-2}] - [W_{k+1}:W_n]) \\ \le & 0 \end{aligned} $$Since $W_{new} \le W_{i}$ for any $i$, the first part of the result $( W_{k-1} - W_{new} ) \ge 0$.
Also, since $[W_1:W_{k-2}] < [W_{k+1}:W_n]$, so we know that for the second part: $([W_1:W_{k-2}] - [W_{k+1}:W_n]) < 0$.
Since in this two sequence, $sum(S)$ is always the same, so $\Delta T_W \le 0 \Leftrightarrow \Delta T \le 0$.
Based on these info, we now know that after changing the place, $\Delta T \le 0$, which means in this case, changing the place is always better than not. And since $W_{new}$ is less than any $W_i$, we can imply that we can continuously changing the place of ${S_new}$ and its adjacent element to make $T(S)$ smaller, until $W_{new}$ arrive at the first or the last of the sequence.
So above all, we now know if the element we are adding are the minimum, than the optimal place will either at the first or at the end of the previous sequence.
Dive Deeper
Now we want to prove that, the optimal solution actually has a specific pattern.
For $n$ weight $\left { W_1, W_2, \dots, W_n \right }$, if these number are in descending order (which means $W_i > W_{i+1}$), then we can say the optimal sequence $S_{min}$ would have the pattern like:
When $|S|$ is odd
$$ \left \langle W_{2k} \cdots,W_4, W_2 , W_1, W_3, W_5, \cdots, W_{2k+1} \right \rangle $$When $|S|$ is even
$$ \left \langle W_{2k} \cdots,W_4, W_2 , W_1, W_3, W_5, \cdots, W_{2k-1} \right \rangle $$If a sequence is in this pattern, we say this sequence is beautiful.
Prove
We can prove this conclusion by mathematical induction.
Base Scale
For any case that $|S|=1$, $|S| = 2$ and $|S| = 3$, it's easy to prove that the optimal sequence is always beautiful.
For Larger Scale
Now assume we have a beautiful sequence $S$. If we add a new $W_{new}$ that $W_{new} \le min(\dots S)$ (which means this new weight is less than or equal to any element in $S$), and get the new optimal sequence $S'$.
In this case, based on the previous conclusion, this new $S_{new}$ could only be placed either at the first or at the end of the previous sequence. So there are two possibilities of new sequence $S'$:
$${ \begin{aligned} {S_f = \left \langle W_{new}, \dots S_{min} \right \rangle} \\ {S_l = \left \langle \dots S_{min}, W_{new} \right \rangle} \\ \end{aligned} }$$- When $|S|$ is odd:
Then if we choose to put the new element at the first:
$$ \begin{aligned} \Delta T_{W_{f}} =& T_W(S_{f}) - T_W(S)\\ =& \sum_{i=1}^{k}W_{2i}W_{new}(k-i) + \sum_{i=0}^{k}W_{2i+1}W_{new}(k+1+i)\\ \end{aligned} $$If we choose to put the new element at the end:
$$ \begin{aligned} \Delta T_{W_{l}} =& T_W(S_{l}) - T_W(S) \\ =& \sum_{i=1}^{k}W_{2i}W_{new}(k+i+1) + \sum_{i=0}^{k}W_{2i+1}W_{new}(k+1-i)\\ \end{aligned} $$Now we are going to check whether is better, put at first or at the end?
$$ \begin{aligned} \Delta T_{W_{f}} - \Delta T_{W_{l}} =& T_W(S_{f}) - T_W(S_l) \\ = & \sum_{i=1}^{k}W_{2i}W_{new}[(k-i)-(k+1+i)] \\ & + \sum_{i=0}^{k}W_{2i+1}W_{new}[(k+1+i)-(k+1-i)] \\ = & \sum_{i=1}^{k}W_{2i}W_{new}(-1-2i) \\ & + \sum_{i=0}^{k}W_{2i+1}W_{new}(2i)\\ = & \sum_{i=1}^{k}(W_{2i+1} - W_{2i})W_{new}(2i) - \sum_{i=1}^{k}W_{2i}W_{new} \\ \le & 0 \\ \end{aligned} $$Based on the result above, $T_W(S_f) \le T_W(S_l)$, and we know $sum(S_f) = sum(S_l)$, so we have $T(S_f) \le T(S_l)$.
This means when $S$ has pattern:
$${S = \left \langle W_{2k} \cdots,W_4, W_2 , W_1, W_3, W_5, \cdots, W_{2k+1} \right \rangle}$$Then put the new minimum element at the first place is always better then put at the end, so we get the pattern of the new sequence:
$${S' =\left \langle W_{new}, W_{2k} \cdots,W_4, W_2 , W_1, W_3, W_5, \cdots, W_{2k+1} \right \rangle }$$Since we know W_{new} is the smallest, which also can be written as:
$${S' = \left \langle W_{2k+3}, W_{2k} \cdots,W_4, W_2 , W_1, W_3, W_5, \cdots, W_{2k+1} \right \rangle }$$We can see that $S'$ is still beautiful.
- When ${|S|}$ is even:
This means that:
$${S = \left \langle W_{2k} \cdots,W_4, W_2 , W_1, W_3, W_5, \cdots, W_{2k-1} \right \rangle}$$Also, we can use the similar method above to prove that, adding new elements $S_{new}$ at the end is better, which we can get the pattern of the new sequence:
$${S' = \left \langle W_{2k} \cdots,W_4, W_2 , W_1, W_3, W_5, \cdots, W_{2k+1} \right \rangle}$$Also, this new sequence ${S'}$ is still beautiful.
General Conclusion
Based on above we know:
- When $|S| \le 2$: The optimal sequence $S$ is always beautiful.
- When $|S| > 2$: If $S$ is beautiful, and we add a $S_{new}$ which that $W_{new} \le min(\dots S)$ (which means this new weight is less than or equal to any element in $S$), and get the new optimal sequence $S'$. Then $S'$ is still beautiful.
So we can prove that for any $S$, if it's optimal sequence, than it's beautiful.
No comment