Abstract
Caching (also known as paging) is a classical problem concerning page replacement policies in two-level memory systems. General caching is the variant with pages of different sizes and fault costs. The strong NP-hardness of its two important cases, the fault model (each page has unit fault cost) and the bit model (each page has fault cost equal to its size), has been established, but only under the assumption that there are pages as large as half of the cache size. We prove that strong NP-hardness holds already when page sizes are bounded by a small constant: the bit and fault models are strongly NP-complete even when page sizes are limited to \(\{1, 2, 3\}\). Considering only the decision versions of the problems, general caching is equivalent to the unsplittable flow on a path problem, and therefore our results also improve the known hardness results for that problem.





Partially supported by the Center of Excellence—ITI, Project P202/12/G061 of GA ČR (J. Sgall) and by the Project 14-10003S of GA ČR (L. Folwarczný).
Appendix: The Simple Proof
In this appendix, we present a simple variant of the proof for the almost-fault model with two distinct costs. This completes the sketch of the proof presented at the end of Sect. 2. We present it with a complete description of the simplified reduction, so that it can be read independently of the rest of the paper. This appendix can therefore serve as a short proof of the hardness of general caching.
Theorem 6.1
General caching is strongly NP-hard, even in the case when page sizes are limited to 1, 2, 3 and there are only two distinct fault costs.
We prove the theorem for the optional policy; the theorem for the forced policy follows in the same way as in the proof of Theorem 5.1.
1.1 The Reduction
The reduction described here will be equivalent to Reduction 2.1 with \(H = 1\) and the fault cost of each vertex-page set to \(1/(n+1)\).
Suppose we have a graph \(G=(V,E)\) with n nodes and m edges. We construct an instance of general caching whose optimal solution encodes a maximum independent set in G. Fix an arbitrary numbering of edges \(e_1, \ldots , e_m\).
The cache size is \(C=2m+1\). For each vertex v, we have a vertex-page \(p_v\) with size one and cost \(1/(n+1)\). For each edge e, we have six associated edge-pages \(a^e, \bar{a}^e, \alpha ^e, b^e, \bar{b}^e, \beta ^e\); all have cost one, pages \(\alpha ^e,\beta ^e\) have size three and the remaining pages have size two.
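The parameters of the instance above can be sketched in code. The following Python fragment is an illustration only; the function name `build_pages` and the page labels `a`, `abar`, `alpha`, `b`, `bbar`, `beta` are our own naming for the pages \(a^e, \bar{a}^e, \alpha^e, b^e, \bar{b}^e, \beta^e\):

```python
from fractions import Fraction

def build_pages(n_vertices, edges):
    """Pages of the caching instance built from a graph (sketch).

    One vertex-page per vertex (size 1, cost 1/(n+1)) and six
    edge-pages per edge; alpha and beta have size 3, the remaining
    four have size 2, and every edge-page has cost 1."""
    n, m = n_vertices, len(edges)
    cache_size = 2 * m + 1
    pages = {}                                   # page -> (size, cost)
    for v in range(n):
        pages[("p", v)] = (1, Fraction(1, n + 1))
    for e in edges:
        for name in ("a", "abar", "b", "bbar"):
            pages[(name, e)] = (2, Fraction(1))
        for name in ("alpha", "beta"):
            pages[(name, e)] = (3, Fraction(1))
    return cache_size, pages
```

For a triangle (3 vertices, 3 edges) this yields cache size 7 and 21 pages in total.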
The request sequence is organized in phases and blocks. There is one phase for each vertex. In the phase of a vertex \(v\), there are two adjacent blocks associated with every edge e incident with v; the incident edges are processed in an arbitrary order. In addition, there is one initial block I before all phases and one final block F after all phases. Altogether, there are \(d=4m+2\) blocks. There are four blocks associated with each edge e; denote them \(B^e_1\), \(B^e_2\), \(B^e_3\), \(B^e_4\), in the order in which they appear in the request sequence.
For each \(v\in V\), the associated page \(p_v\) is requested exactly twice, right before the beginning of the v-phase and right after the end of the v-phase; these requests do not belong to any phase. An example of the structure of phases and blocks is given in Fig. 6.
Even though each block is associated with some fixed edge, it contains one or more requests to the associated pages for every edge e. In each block, we process the edges \(e_1,\ldots ,e_m\) in this order. For each edge e, we make one or more requests to the associated pages according to Table 3.
Figure 7 shows an example of the requests on edge-pages associated with one particular edge.
Fig. 7: Requests on all pages associated with the edge e. Each column represents some block(s). The four labeled columns represent the blocks in the heading; the first column represents every block before \(B^e_1\), the middle column represents every block between \(B^e_3\) and \(B^e_4\), and the last column represents every block after \(B^e_4\). The requests in one column are ordered from top to bottom
1.2 Proof of Correctness
Instead of minimizing the service cost, we maximize the saving compared to the service which does not use the cache at all. This is clearly equivalent when considering the decision version of the problem.
Without loss of generality, we assume that any page is brought into the cache only immediately before a request to that page and removed from the cache only immediately after a request to that page; furthermore, at the beginning and at the end the cache is empty. That is, a page may be in the cache only between two consecutive requests to this page, and it is either in the cache for the whole interval between them or not at all.
Each page of size three is requested only twice, in two consecutive blocks, and these two blocks are distinct for all pages of size three; hence at most one page of size three is cached at any time. Any m edge-pages then have total size at most \(2(m-1)+3 = 2m+1 = C\), so a service of edge-pages is valid if and only if at each time at most m edge-pages are in the cache. It is thus convenient to think of the cache as m slots for edge-pages.
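The size arithmetic behind this slot view can be checked directly; a minimal sketch (the helper name is our own):

```python
def edge_pages_fit(m, cached_size3):
    """Total size of m cached edge-pages, cached_size3 of which have
    size 3 and the rest size 2; they fit iff the total is at most the
    cache size C = 2m + 1."""
    total = 3 * cached_size3 + 2 * (m - cached_size3)
    return total <= 2 * m + 1
```

With one size-3 page cached, m edge-pages occupy exactly \(2m+1 = C\), leaving no room for a vertex-page; with two size-3 pages they would need \(2m+2 > C\).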
Each vertex-page is requested twice. Thus, the saving on the n vertex-pages is at most \(n/(n+1)<1\). Since all edge-pages have cost one and the total possible saving on vertex-pages is less than one, the optimal service must serve the edge-pages optimally. Furthermore, a vertex-page can be cached if and only if at no time during its phase are all m slots for edge-pages full while a page of size three is cached.
Let \(S_B\) denote the set of all edge-pages cached at the beginning of the block B and let \(s_B = |S_B|\). Now observe that each edge-page is requested only in a contiguous segment of blocks, once in each block. It follows that the total saving on edge-pages is equal to \(\sum _B s_B\), where the sum is over all blocks. In particular, the maximum possible saving on the edge-pages is \((d-1)m\), using the fact that \(S_I\) is empty.
We prove that there is a service with the total saving at least \((d-1)m+K/(n+1)\) if and only if there is an independent set of size K in G. First the easy direction.
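The threshold saving in this equivalence is straightforward to compute; a small helper (the function name is our own), using \(d = 4m+2\) as derived above:

```python
from fractions import Fraction

def target_saving(n, m, K):
    """Saving threshold (d-1)*m + K/(n+1) for a graph with n vertices
    and m edges, and a candidate independent set of size K."""
    d = 4 * m + 2          # number of blocks in the request sequence
    return (d - 1) * m + Fraction(K, n + 1)
```

For example, a triangle (n = m = 3) with an independent set of size 1 gives a threshold of \(39 + 1/4\).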
Lemma 6.2
Suppose that G has an independent set W of size K. Then there exists a service with the total saving \((d-1)m+K/(n+1)\).
Proof
For any e, denote \(e=\{u, v\}\) so that u precedes v in the ordering of phases. If \(u\in W\), we keep \(\bar{a}^e, b^e, \bar{b}^e\) and \(\beta ^e\) in the cache from the first to the last request on each page, and we do not cache \(a^e\) and \(\alpha ^e\) at any time. Otherwise we cache \(\bar{b}^e, a^e, \bar{a}^e\) and \(\alpha ^e\), and do not cache \(b^e\) and \(\beta ^e\) at any time. In both cases, at each time at most one page associated with e is in the cache and the saving on those pages is \((d-1)m\). See Fig. 8 for an illustration.
Fig. 8: The two ways of caching in Lemma 6.2
For any \(v\in W\), we cache \(p_v\) between its two requests. To check that this is a valid service, observe that if \(v\in W\), then during the corresponding phase no page of size three is cached. Thus the page \(p_v\) always fits in the cache together with at most m pages of size two. \(\square \)
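The case split in the proof above can be encoded directly; a sketch of the selection rule of Lemma 6.2 (the string names for the six edge-pages are our own):

```python
def pages_kept(edge, W):
    """Pages associated with edge e = (u, v) that are kept in the cache
    between their first and last requests, per Lemma 6.2; u precedes v
    in the phase order and W is the independent set."""
    u, v = edge
    if u in W:
        return {"abar", "b", "bbar", "beta"}    # beta is the size-3 page
    return {"bbar", "a", "abar", "alpha"}       # alpha is the size-3 page
```

In both cases exactly four of the six pages associated with e are cached, exactly one of which has size three, matching Fig. 8.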
Now we prove the converse in a sequence of claims. Fix a valid service with saving at least \((d-1)m\). For a block B, let \(B^{\prime }\) denote the following block.
Claim 6.3
For any block B, with the exception of \(B=I\), we have \(s_B=m\).
Proof
For each \(B\ne I\) we have \(s_B\le m\). Because \(s_I = 0\), the total saving on edge-pages is \(\sum _B s_B\le (d-1)m\). Since the total saving is at least \((d-1)m\), equality must hold, i.e., \(s_B = m\) for every \(B \ne I\). \(\square \)
We now prove that each edge occupies exactly one slot during the service.
Claim 6.4
For any block \(B\ne I\) and for any e, \(S_B\) contains exactly one page associated with e.
Proof
Let \(S_B^e\) denote the set of pages associated with the edge e that are contained in \(S_B\), and let \(S_B^{\le k} = S_B^{e_1} \cup \cdots \cup S_B^{e_k}\) and \(s_B^{\le k} = \left| S_B^{\le k}\right| \). First, we shall prove that for each \(k \le m\),
\(s_B^{\le k} = k\) for every block \(B \ne I\).  (8)
This is true for \(B=F\), as only the m edge-pages \(\bar{b}^e\) can be cached there, and by the previous claim all of them are indeed cached. Similarly for \(B=I^{\prime }\) (the block immediately following the initial block).
If (8) is not true, then, since it holds for \(B=I^{\prime }\) and \(B=F\), the value \(s_B^{\le k}\) must increase somewhere between consecutive blocks: for some k and \(B\not \in \{I,F\}\) we have \(s_B^{\le k}<s_{B^{\prime }}^{\le k}\). Then after processing the edge \(e_k\) in the block B we have in the cache all the pages in \((S_B{\setminus }S_B^{\le k})\cup S_{B^{\prime }}^{\le k}\). Their number is \((m-s_B^{\le k})+s_{B^{\prime }}^{\le k}>m\), a contradiction.
The statement of the claim is an immediate consequence of (8). \(\square \)
Claim 6.5
For any edge e, at least one of the pages \(\alpha ^e\) and \(\beta ^e\) is cached between its two requests.
Proof
Assume that neither of the two pages is cached. It follows from the previous claim that \(b^e\in S_{B^e_2}\), as at this point \(\alpha ^e\) and \(b^e\) are the only pages associated with e that can be cached. Similarly, \(a^e\in S_{B^e_4}\).
It follows that there exists a block B between \(B^e_1\) and \(B^e_4\) such that \(S_B\) contains the page \(b^e\) and \(S_{B^{\prime }}\) contains the page \(a^e\). However, in B, the page \(a^e\) is requested before the page \(b^e\). Thus at the point between the two requests, the cache contains two pages associated with e, plus one page associated with every other edge, the total of \(m+1\) pages, a contradiction. \(\square \)
Now we are ready to complete this direction.
Lemma 6.6
Suppose that there exists a valid service with the total saving \((d-1)m+K/(n+1)\). Then G has an independent set W of size K.
Proof
Let W be the set of all v such that \(p_v\) is cached between its two requests. The total saving implies that \(|W|=K\).
Now we claim that W is independent. Suppose not, and let \(e=\{u,v\}\) be an edge with \(u,v\in W\). Then \(p_u\) and \(p_v\) are cached in the corresponding phases. Thus neither \(\alpha ^e\) nor \(\beta ^e\) can be cached: together with one page of size 2 for each of the remaining \(m-1\) edges and the cached vertex-page, the cache size needed would be \(3 + 2(m-1) + 1 = 2m+2 > C\). However, this contradicts Claim 6.5. \(\square \)
Lemmas 6.2 and 6.6 together show that we constructed a valid polynomial-time reduction from the problem of independent set to general caching. Therefore, Theorem 6.1 is proven.
Folwarczný, L., Sgall, J. General Caching Is Hard: Even with Small Pages. Algorithmica 79, 319–339 (2017). https://doi.org/10.1007/s00453-016-0185-0