Fix: Complete frequently used cache eviction algos #13


Merged
merged 1 commit into from
Jul 30, 2025

Conversation

haochengxia
Collaborator

No description provided.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @haochengxia, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've focused on significantly enhancing the libcachesim library by integrating a comprehensive suite of frequently used cache eviction algorithms, including probabilistic, size-based, and hyperbolic strategies, along with specialized multi-level caches. A key part of this effort involved improving the Python bindings to expose more configuration parameters for existing algorithms. To support these additions and ensure a flexible development environment, I've also revamped the build system to dynamically detect and enable machine learning-related dependencies, leading to a more robust and adaptable compilation process.
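
To make "size-based" eviction concrete, here is a minimal standalone sketch of the Greedy-Dual-Size-Frequency (GDSF) idea: each object gets priority `L + frequency * cost / size`, the lowest-priority object is evicted, and `L` is raised to the evicted priority. The class name `GDSFSketch` and its methods are assumptions made for illustration; this is not libcachesim's implementation or API, and it uses a linear scan where a real cache would use a heap.

```python
class GDSFSketch:
    """Illustrative GDSF cache: priority = L + freq * cost / size."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0   # the "L" value, raised on every eviction
        self.entries = {}      # key -> (size, freq, priority)

    def _priority(self, size, freq, cost=1.0):
        return self.inflation + freq * cost / size

    def access(self, key, size):
        """Return True on a hit; on a miss, admit the object if it can fit."""
        if key in self.entries:
            stored_size, freq, _ = self.entries[key]
            freq += 1
            self.entries[key] = (stored_size, freq, self._priority(stored_size, freq))
            return True
        if size > self.capacity:
            return False  # object can never fit, so it is not admitted
        while self.used + size > self.capacity:
            # Linear scan for the lowest priority; fine for a sketch only.
            victim = min(self.entries, key=lambda k: self.entries[k][2])
            victim_size, _, victim_priority = self.entries.pop(victim)
            self.used -= victim_size
            self.inflation = victim_priority
        self.entries[key] = (size, 1, self._priority(size, 1))
        self.used += size
        return False
```

With a 100-byte capacity, a large object accessed once is evicted before a small object accessed twice, because its `freq / size` ratio is lower.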

Highlights

  • Expanded Cache Algorithm Support: I've significantly expanded the libcachesim library by integrating a comprehensive suite of new cache eviction algorithms. This includes probabilistic algorithms like LRUProb and FlashProb, size-based algorithms such as Size and GDSF, hyperbolic algorithms like Hyperbolic, and advanced multi-level caches including ThreeLCache, GLCache, and LRB.
  • Enhanced Algorithm Parameterization: Many existing cache algorithms, such as Clock, S3FIFO, TwoQ, WTinyLFU, LeCaR, ClockPro, and BeladySize, have been enhanced to expose and utilize specific configuration parameters. This provides users with greater flexibility and fine-grained control over cache behavior and simulation scenarios.
  • Dynamic and Intelligent Build System: I've introduced a new smart_build.py script that intelligently detects the presence of optional machine learning dependencies (like XGBoost and LightGBM) on the system. This script automatically configures the underlying C++ build process to enable corresponding cache algorithms (e.g., GLCache, LRB, ThreeLCache) that rely on these libraries, streamlining the compilation of advanced features.
  • Improved Dependency Management: The project's dependency management has been improved by updating pyproject.toml and uv.lock. pybind11 and pytest are now explicitly declared as core dependencies, while xgboost and lightgbm are listed as optional dependencies, ensuring proper installation and functionality for users who wish to leverage the advanced cache features.
  • Codebase Refinements and Cleanup: Various Python files across the codebase have undergone minor formatting and cleanup. These refinements improve code consistency, readability, and maintainability, contributing to a cleaner and more professional codebase.
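
The dependency detection described in the build-system highlight can be sketched roughly as follows. This is not the contents of smart_build.py; the CMake option names `ENABLE_XGBOOST_ALGOS` and `ENABLE_LIGHTGBM_ALGOS` and the helper names are hypothetical, chosen only to show the shape of the technique (probe for a package, then translate the result into a build flag).

```python
import importlib.util

# Hypothetical mapping from Python package name to a CMake option name.
# The option names the project actually uses may differ.
OPTIONAL_FEATURES = {
    "xgboost": "ENABLE_XGBOOST_ALGOS",
    "lightgbm": "ENABLE_LIGHTGBM_ALGOS",
}


def is_installed(package):
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(package) is not None


def cmake_flags(features=OPTIONAL_FEATURES, probe=is_installed):
    """Build one -D<OPTION>=ON/OFF flag per optional dependency."""
    return [
        f"-D{option}={'ON' if probe(package) else 'OFF'}"
        for package, option in features.items()
    ]
```

Passing the probe as a parameter keeps the flag-building logic testable on machines where neither library is installed.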
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature | Command | Description
--- | --- | ---
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@haochengxia haochengxia merged commit d3c3ad7 into main Jul 30, 2025
32 checks passed
@haochengxia haochengxia deleted the hxia/pack branch July 30, 2025 08:51
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly expands the library by adding a comprehensive set of frequently used cache eviction algorithms to the Python bindings. It also introduces a new 'smart build' system to automatically handle optional dependencies like xgboost and lightgbm, which is a great improvement for usability. The code is generally well-written, but I've identified a critical issue in the new build script that would cause it to fail, along with a couple of medium-severity issues related to code style and maintainability. My review includes suggestions to address these points.

Comment on lines +39 to +42
def fix_pybind11():
    """Fix pybind11 installation"""
    print("Checking pybind11...")
    subprocess.run([sys.executable, "scripts/fix_pybind11.py"], check=True)

critical

This function attempts to run a script scripts/fix_pybind11.py which is not present in the pull request. This will cause the build to fail. The logic for fixing pybind11 is available as a function in scripts/detect_deps.py.

Instead of using a subprocess to call a non-existent script, you should import and call the function directly.

Suggested change
- def fix_pybind11():
-     """Fix pybind11 installation"""
-     print("Checking pybind11...")
-     subprocess.run([sys.executable, "scripts/fix_pybind11.py"], check=True)
+ def fix_pybind11():
+     """Fix pybind11 installation"""
+     from detect_deps import fix_pybind11 as fix_pybind11_impl
+     fix_pybind11_impl()
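
A plain `from detect_deps import ...` only works when the directory containing detect_deps.py is already on `sys.path`. As a hedged alternative, the sketch below loads a named function from a script by file path and fails early with a clear error if the file is missing. The helper name `load_callable` is an assumption for illustration and is not part of the pull request.

```python
import importlib.util
from pathlib import Path


def load_callable(script_path, name):
    """Load attribute `name` from the Python file at `script_path`.

    Avoids both a subprocess call and any dependence on sys.path.
    """
    path = Path(script_path)
    if not path.is_file():
        raise FileNotFoundError(f"helper script not found: {path}")
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code once
    return getattr(module, name)
```

Usage would look like `load_callable("scripts/detect_deps.py", "fix_pybind11")()`, assuming that file and function exist as the review comment describes.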

    consider_obj_metadata: bool = False,
    segment_size: int = 100,
    n_merge: int = 2,
    type: str = "learned",

medium

The parameter name type shadows the built-in function type(). According to PEP 8, this should be avoided to prevent potential confusion and errors. A common convention is to use a trailing underscore.

After renaming this parameter, remember to update its usage on line 589 as well.

Suggested change
-     type: str = "learned",
+     type_: str = "learned",
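
The practical benefit of the rename is that the built-in `type()` stays usable inside the function body. The sketch below uses a hypothetical stand-in function, `make_cache_config`, with the parameter names from the snippet above; it is not the constructor from the pull request.

```python
def make_cache_config(segment_size=100, n_merge=2, type_="learned"):
    """Hypothetical stand-in showing the trailing-underscore convention."""
    if not isinstance(type_, str):
        # The built-in type() is still reachable because nothing shadows it.
        raise TypeError(f"type_ must be str, got {type(type_).__name__}")
    return {"segment_size": segment_size, "n_merge": n_merge, "type": type_}
```

Had the parameter been named `type`, the call `type(type_)` in the error message would instead try to call the string argument and raise a confusing `TypeError: 'str' object is not callable` (or similar) at runtime.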

Comment on lines +39 to +162
                    found = True
                    break
            if not found:
                print("✗ xgboost CMake directory not found (not required for Python usage, only for C++ linkage)")
        return True
    except ImportError:
        print("✗ xgboost is not installed")
        print("Reinstalling xgboost...")
        try:
            subprocess.run([sys.executable, "-m", "pip", "install", "--force-reinstall", "xgboost"], check=True)
            print("✓ xgboost reinstalled successfully")
            import xgboost
            print("✓ xgboost is installed after reinstall")
            # Repeat CMake dir check after reinstall
            cmake_dir = getattr(xgboost, 'cmake_dir', None)
            if cmake_dir:
                print(f"✓ xgboost CMake directory: {cmake_dir}")
            else:
                import os
                possible_dirs = [
                    os.path.join(xgboost.__path__[0], 'cmake'),
                    os.path.join(xgboost.__path__[0], '..', 'cmake'),
                    '/usr/local/lib/cmake/xgboost',
                    '/usr/local/share/cmake/xgboost',
                    '/opt/homebrew/lib/cmake/xgboost',
                ]
                found = False
                for d in possible_dirs:
                    if os.path.isdir(d):
                        print(f"✓ xgboost CMake directory: {os.path.abspath(d)}")
                        found = True
                        break
                if not found:
                    print("✗ xgboost CMake directory not found (not required for Python usage, only for C++ linkage)")
            return True
        except Exception as e:
            print(f"✗ xgboost installation failed: {e}")
            return False

def fix_lightgbm():
    """Fix lightgbm installation"""
    print("Checking lightgbm installation...")
    try:
        import lightgbm
        print("✓ lightgbm is installed")
        # Try to find CMake directory (if available)
        cmake_dir = getattr(lightgbm, 'cmake_dir', None)
        if cmake_dir:
            print(f"✓ lightgbm CMake directory: {cmake_dir}")
        else:
            import os
            possible_dirs = [
                os.path.join(lightgbm.__path__[0], 'cmake'),
                os.path.join(lightgbm.__path__[0], '..', 'cmake'),
                '/usr/local/lib/cmake/LightGBM',
                '/usr/local/share/cmake/LightGBM',
                '/opt/homebrew/lib/cmake/LightGBM',
            ]
            found = False
            for d in possible_dirs:
                if os.path.isdir(d):
                    print(f"✓ lightgbm CMake directory: {os.path.abspath(d)}")
                    found = True
                    break
            if not found:
                print("✗ lightgbm CMake directory not found (not required for Python usage, only for C++ linkage)")
        return True
    except ImportError:
        print("✗ lightgbm is not installed")
        print("Reinstalling lightgbm...")
        try:
            subprocess.run([sys.executable, "-m", "pip", "install", "--force-reinstall", "lightgbm"], check=True)
            print("✓ lightgbm reinstalled successfully")
            import lightgbm
            print("✓ lightgbm is installed after reinstall")
            # Repeat CMake dir check after reinstall
            cmake_dir = getattr(lightgbm, 'cmake_dir', None)
            if cmake_dir:
                print(f"✓ lightgbm CMake directory: {cmake_dir}")
            else:
                import os
                possible_dirs = [
                    os.path.join(lightgbm.__path__[0], 'cmake'),
                    os.path.join(lightgbm.__path__[0], '..', 'cmake'),
                    '/usr/local/lib/cmake/LightGBM',
                    '/usr/local/share/cmake/LightGBM',
                    '/opt/homebrew/lib/cmake/LightGBM',
                ]
                found = False
                for d in possible_dirs:
                    if os.path.isdir(d):
                        print(f"✓ lightgbm CMake directory: {os.path.abspath(d)}")
                        found = True
                        break
                if not found:
                    print("✗ lightgbm CMake directory not found (not required for Python usage, only for C++ linkage)")
            return True
        except Exception as e:
            print(f"✗ lightgbm installation failed: {e}")
            return False

medium

The functions fix_xgboost and fix_lightgbm contain a lot of duplicated code. You can refactor this by creating a helper function to handle the common logic of checking, reinstalling, and finding the CMake directory for a dependency. This will make the code more maintainable and easier to read.

def _check_and_fix_dependency(module_name, cmake_paths):
    """Helper to check and fix a dependency."""
    print(f"Checking {module_name} installation...")
    module = None
    try:
        module = __import__(module_name)
        print(f"✓ {module_name} is installed")
    except ImportError:
        print(f"✗ {module_name} is not installed")
        print(f"Reinstalling {module_name}...")
        try:
            subprocess.run([sys.executable, "-m", "pip", "install", "--force-reinstall", module_name], check=True)
            print(f"✓ {module_name} reinstalled successfully")
            module = __import__(module_name)
            print(f"✓ {module_name} is installed after reinstall")
        except Exception as e:
            print(f"✗ {module_name} installation failed: {e}")
            return False

    # Check for CMake directory
    cmake_dir = getattr(module, 'cmake_dir', None)
    if cmake_dir:
        print(f"✓ {module_name} CMake directory: {cmake_dir}")
    else:
        import os
        possible_dirs = [os.path.join(module.__path__[0], p) for p in ['cmake', '../cmake']] + cmake_paths
        found = False
        for d in possible_dirs:
            if os.path.isdir(d):
                print(f"✓ {module_name} CMake directory: {os.path.abspath(d)}")
                found = True
                break
        if not found:
            print(f"✗ {module_name} CMake directory not found (not required for Python usage, only for C++ linkage)")
    return True

def fix_xgboost():
    """Fix xgboost installation"""
    return _check_and_fix_dependency(
        "xgboost",
        [
            '/usr/local/lib/cmake/xgboost',
            '/usr/local/share/cmake/xgboost',
            '/opt/homebrew/lib/cmake/xgboost',
        ]
    )

def fix_lightgbm():
    """Fix lightgbm installation"""
    return _check_and_fix_dependency(
        "lightgbm",
        [
            '/usr/local/lib/cmake/LightGBM',
            '/usr/local/share/cmake/LightGBM',
            '/opt/homebrew/lib/cmake/LightGBM',
        ]
    )
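
One further refinement, offered as a sketch rather than as part of the review: the CMake-directory lookup inside the helper above can be pulled out into a pure function that returns the path (or `None`) instead of printing, which makes it easy to unit test. The function name `find_cmake_dir` and its parameters are assumptions for illustration.

```python
import os


def find_cmake_dir(package_dir, extra_paths=()):
    """Return the first existing CMake directory as an absolute path, else None.

    Checks <package_dir>/cmake and <package_dir>/../cmake first, then any
    caller-supplied system locations, in the order given.
    """
    candidates = [
        os.path.join(package_dir, "cmake"),
        os.path.join(package_dir, "..", "cmake"),
        *extra_paths,
    ]
    for candidate in candidates:
        if os.path.isdir(candidate):
            return os.path.abspath(candidate)
    return None
```

The caller decides how to report the result, for example printing a warning when `None` is returned, since the directory is only needed for C++ linkage.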
