Fix: Complete frequently used cache eviction algos #13
Summary of Changes
Hello @haochengxia, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've focused on significantly enhancing the `libcachesim` library by integrating a comprehensive suite of frequently used cache eviction algorithms, including probabilistic, size-based, and hyperbolic strategies, along with specialized multi-level caches. A key part of this effort involved improving the Python bindings to expose more configuration parameters for existing algorithms. To support these additions and ensure a flexible development environment, I've also revamped the build system to dynamically detect and enable machine learning-related dependencies, leading to a more robust and adaptable compilation process.
Highlights
- Expanded Cache Algorithm Support: I've significantly expanded the `libcachesim` library by integrating a comprehensive suite of new cache eviction algorithms. This includes probabilistic algorithms like LRUProb and FlashProb, size-based algorithms such as Size and GDSF, hyperbolic algorithms like Hyperbolic, and advanced multi-level caches including ThreeLCache, GLCache, and LRB.
- Enhanced Algorithm Parameterization: Many existing cache algorithms, such as Clock, S3FIFO, TwoQ, WTinyLFU, LeCaR, ClockPro, and BeladySize, have been enhanced to expose and utilize specific configuration parameters. This provides users with greater flexibility and fine-grained control over cache behavior and simulation scenarios.
- Dynamic and Intelligent Build System: I've introduced a new `smart_build.py` script that intelligently detects the presence of optional machine learning dependencies (like XGBoost and LightGBM) on the system. This script automatically configures the underlying C++ build process to enable corresponding cache algorithms (e.g., GLCache, LRB, ThreeLCache) that rely on these libraries, streamlining the compilation of advanced features.
- Improved Dependency Management: The project's dependency management has been improved by updating `pyproject.toml` and `uv.lock`. `pybind11` and `pytest` are now explicitly declared as core dependencies, while `xgboost` and `lightgbm` are listed as optional dependencies, ensuring proper installation and functionality for users who wish to leverage the advanced cache features.
- Codebase Refinements and Cleanup: Various Python files across the codebase have undergone minor formatting and cleanup. These refinements improve code consistency, readability, and maintainability, contributing to a cleaner and more professional codebase.
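The dependency detection described in the highlights might look roughly like the sketch below. This is not the code in `smart_build.py`: the function names, the package-to-algorithm mapping, and the CMake option names (`ENABLE_GLCACHE` and so on) are assumptions made up for illustration.

```python
import importlib.util

# Hypothetical mapping from optional Python packages to the CMake options
# they would unlock. The option names are illustrative, not taken from the PR.
OPTIONAL_FEATURES = {
    "xgboost": ["ENABLE_GLCACHE"],
    "lightgbm": ["ENABLE_LRB", "ENABLE_3L_CACHE"],
}


def is_installed(package):
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package) is not None


def cmake_flags(features=OPTIONAL_FEATURES, probe=is_installed):
    """Build -D<OPTION>=ON/OFF flags depending on which packages are present."""
    flags = []
    for package, options in features.items():
        state = "ON" if probe(package) else "OFF"
        flags.extend(f"-D{option}={state}" for option in options)
    return flags
```

Passing `probe` as a parameter keeps the flag-building logic testable without installing or uninstalling anything.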
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request significantly expands the library by adding a comprehensive set of frequently used cache eviction algorithms to the Python bindings. It also introduces a new 'smart build' system to automatically handle optional dependencies like `xgboost` and `lightgbm`, which is a great improvement for usability. The code is generally well-written, but I've identified a critical issue in the new build script that would cause it to fail, along with a couple of medium-severity issues related to code style and maintainability. My review includes suggestions to address these points.
    def fix_pybind11():
        """Fix pybind11 installation"""
        print("Checking pybind11...")
        subprocess.run([sys.executable, "scripts/fix_pybind11.py"], check=True)
This function attempts to run a script, `scripts/fix_pybind11.py`, which is not present in the pull request. This will cause the build to fail. The logic for fixing pybind11 is available as a function in `scripts/detect_deps.py`.
Instead of using a subprocess to call a non-existent script, you should import and call the function directly.
Suggested change:

    -def fix_pybind11():
    -    """Fix pybind11 installation"""
    -    print("Checking pybind11...")
    -    subprocess.run([sys.executable, "scripts/fix_pybind11.py"], check=True)
    +def fix_pybind11():
    +    """Fix pybind11 installation"""
    +    from detect_deps import fix_pybind11 as fix_pybind11_impl
    +    fix_pybind11_impl()
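Note that `from detect_deps import ...` only resolves if the `scripts/` directory is on `sys.path` when the build script runs, and the review does not say where the build script lives. If that turns out not to be the case, one option is to load the module by file path instead. The sketch below is an assumption-laden illustration: `load_function` is a hypothetical helper, not something in the PR.

```python
import importlib.util
from pathlib import Path


def load_function(script_path, func_name):
    """Load `func_name` from the Python file at `script_path` (hypothetical helper)."""
    path = Path(script_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes the file's top-level code, then exposes its names on `module`.
    spec.loader.exec_module(module)
    return getattr(module, func_name)
```

Under these assumptions, the build script could call `load_function("scripts/detect_deps.py", "fix_pybind11")()` regardless of the current `sys.path`.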
        consider_obj_metadata: bool = False,
        segment_size: int = 100,
        n_merge: int = 2,
        type: str = "learned",
The parameter name `type` shadows the built-in function `type()`. According to PEP 8, this should be avoided to prevent potential confusion and errors. A common convention is to use a trailing underscore.
After renaming this parameter, remember to update its usage on line 589 as well.
Suggested change:

    -    type: str = "learned",
    +    type_: str = "learned",
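To illustrate why the trailing underscore matters, here is a minimal sketch. `make_config` and the parameter-string format it returns are hypothetical and only mirror the parameter names shown in the snippet above.

```python
def make_config(segment_size=100, n_merge=2, type_="learned"):
    """Hypothetical helper that formats the parameters as a string."""
    # Because the parameter is named `type_`, the built-in `type()` is still
    # reachable inside the function body.
    if type(segment_size) is not int:
        raise TypeError("segment_size must be an int")
    return f"segment-size={segment_size}, n-merge={n_merge}, type={type_}"
```

Had the parameter been named `type`, the call `type(segment_size)` would try to call the string `"learned"` and raise `TypeError: 'str' object is not callable`.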
                        found = True
                        break
                if not found:
                    print("✗ xgboost CMake directory not found (not required for Python usage, only for C++ linkage)")
            return True
        except ImportError:
            print("✗ xgboost is not installed")
            print("Reinstalling xgboost...")
            try:
                subprocess.run([sys.executable, "-m", "pip", "install", "--force-reinstall", "xgboost"], check=True)
                print("✓ xgboost reinstalled successfully")
                import xgboost
                print("✓ xgboost is installed after reinstall")
                # Repeat CMake dir check after reinstall
                cmake_dir = getattr(xgboost, 'cmake_dir', None)
                if cmake_dir:
                    print(f"✓ xgboost CMake directory: {cmake_dir}")
                else:
                    import os
                    possible_dirs = [
                        os.path.join(xgboost.__path__[0], 'cmake'),
                        os.path.join(xgboost.__path__[0], '..', 'cmake'),
                        '/usr/local/lib/cmake/xgboost',
                        '/usr/local/share/cmake/xgboost',
                        '/opt/homebrew/lib/cmake/xgboost',
                    ]
                    found = False
                    for d in possible_dirs:
                        if os.path.isdir(d):
                            print(f"✓ xgboost CMake directory: {os.path.abspath(d)}")
                            found = True
                            break
                    if not found:
                        print("✗ xgboost CMake directory not found (not required for Python usage, only for C++ linkage)")
                return True
            except Exception as e:
                print(f"✗ xgboost installation failed: {e}")
                return False


    def fix_lightgbm():
        """Fix lightgbm installation"""
        print("Checking lightgbm installation...")
        try:
            import lightgbm
            print("✓ lightgbm is installed")
            # Try to find CMake directory (if available)
            cmake_dir = getattr(lightgbm, 'cmake_dir', None)
            if cmake_dir:
                print(f"✓ lightgbm CMake directory: {cmake_dir}")
            else:
                import os
                possible_dirs = [
                    os.path.join(lightgbm.__path__[0], 'cmake'),
                    os.path.join(lightgbm.__path__[0], '..', 'cmake'),
                    '/usr/local/lib/cmake/LightGBM',
                    '/usr/local/share/cmake/LightGBM',
                    '/opt/homebrew/lib/cmake/LightGBM',
                ]
                found = False
                for d in possible_dirs:
                    if os.path.isdir(d):
                        print(f"✓ lightgbm CMake directory: {os.path.abspath(d)}")
                        found = True
                        break
                if not found:
                    print("✗ lightgbm CMake directory not found (not required for Python usage, only for C++ linkage)")
            return True
        except ImportError:
            print("✗ lightgbm is not installed")
            print("Reinstalling lightgbm...")
            try:
                subprocess.run([sys.executable, "-m", "pip", "install", "--force-reinstall", "lightgbm"], check=True)
                print("✓ lightgbm reinstalled successfully")
                import lightgbm
                print("✓ lightgbm is installed after reinstall")
                # Repeat CMake dir check after reinstall
                cmake_dir = getattr(lightgbm, 'cmake_dir', None)
                if cmake_dir:
                    print(f"✓ lightgbm CMake directory: {cmake_dir}")
                else:
                    import os
                    possible_dirs = [
                        os.path.join(lightgbm.__path__[0], 'cmake'),
                        os.path.join(lightgbm.__path__[0], '..', 'cmake'),
                        '/usr/local/lib/cmake/LightGBM',
                        '/usr/local/share/cmake/LightGBM',
                        '/opt/homebrew/lib/cmake/LightGBM',
                    ]
                    found = False
                    for d in possible_dirs:
                        if os.path.isdir(d):
                            print(f"✓ lightgbm CMake directory: {os.path.abspath(d)}")
                            found = True
                            break
                    if not found:
                        print("✗ lightgbm CMake directory not found (not required for Python usage, only for C++ linkage)")
                return True
            except Exception as e:
                print(f"✗ lightgbm installation failed: {e}")
                return False
The functions `fix_xgboost` and `fix_lightgbm` contain a lot of duplicated code. You can refactor this by creating a helper function to handle the common logic of checking, reinstalling, and finding the CMake directory for a dependency. This will make the code more maintainable and easier to read.
    def _check_and_fix_dependency(module_name, cmake_paths):
        """Helper to check and fix a dependency."""
        print(f"Checking {module_name} installation...")
        module = None
        try:
            module = __import__(module_name)
            print(f"✓ {module_name} is installed")
        except ImportError:
            print(f"✗ {module_name} is not installed")
            print(f"Reinstalling {module_name}...")
            try:
                subprocess.run([sys.executable, "-m", "pip", "install", "--force-reinstall", module_name], check=True)
                print(f"✓ {module_name} reinstalled successfully")
                module = __import__(module_name)
                print(f"✓ {module_name} is installed after reinstall")
            except Exception as e:
                print(f"✗ {module_name} installation failed: {e}")
                return False
        # Check for CMake directory
        cmake_dir = getattr(module, 'cmake_dir', None)
        if cmake_dir:
            print(f"✓ {module_name} CMake directory: {cmake_dir}")
        else:
            import os
            possible_dirs = [os.path.join(module.__path__[0], p) for p in ['cmake', '../cmake']] + cmake_paths
            found = False
            for d in possible_dirs:
                if os.path.isdir(d):
                    print(f"✓ {module_name} CMake directory: {os.path.abspath(d)}")
                    found = True
                    break
            if not found:
                print(f"✗ {module_name} CMake directory not found (not required for Python usage, only for C++ linkage)")
        return True

    def fix_xgboost():
        """Fix xgboost installation"""
        return _check_and_fix_dependency(
            "xgboost",
            [
                '/usr/local/lib/cmake/xgboost',
                '/usr/local/share/cmake/xgboost',
                '/opt/homebrew/lib/cmake/xgboost',
            ]
        )

    def fix_lightgbm():
        """Fix lightgbm installation"""
        return _check_and_fix_dependency(
            "lightgbm",
            [
                '/usr/local/lib/cmake/LightGBM',
                '/usr/local/share/cmake/LightGBM',
                '/opt/homebrew/lib/cmake/LightGBM',
            ]
        )
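As a possible further step (not part of the suggestion above), the CMake directory search could be pulled out into a small function that returns a path instead of printing, which makes it easy to test in isolation. `find_cmake_dir` is a hypothetical name, and the candidate order simply mirrors the helper above.

```python
import os


def find_cmake_dir(package_root, extra_paths=()):
    """Return the first existing candidate CMake directory, or None."""
    candidates = [
        os.path.join(package_root, "cmake"),
        os.path.join(package_root, "..", "cmake"),
        *extra_paths,
    ]
    for candidate in candidates:
        if os.path.isdir(candidate):
            return os.path.abspath(candidate)
    return None
```

The caller would then decide what to print or whether a missing directory is fatal, keeping the search itself free of side effects.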
No description provided.