
Clarify licensing, add CONTRIBUTING.md, and update README.md #242

Merged
stevenbird merged 4 commits into nltk:gh-pages from hotfix-241
Jun 19, 2025

Conversation

ekaf
Member

@ekaf ekaf commented Jun 17, 2025

This PR resolves #241 by adding three key files to clarify the licensing structure of the nltk_data repository:

  • LICENSE — The full text of the Apache License 2.0, which applies to the repository as a whole.
  • LICENSE-OVERVIEW.md — A concise, human-readable summary of the repository’s licensing, emphasizing the diversity of data package licenses and the need for users to review individual dataset terms.
  • DATASET-LICENSES.md — A comprehensive, grouped list of all individual data packages and their licenses, including explicit notes and warnings regarding packages with ambiguous or unclarified licensing.

These additions are intended to increase transparency, support responsible use, and improve compliance for all users of nltk_data.

Closes #241.
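
As a rough illustration of the "grouped list" idea behind DATASET-LICENSES.md, the sketch below groups packages by their declared license and collects entries with no license string under an `UNSPECIFIED` bucket. It is not the script used for this PR. It assumes package metadata is available as XML `<package>` elements carrying `id` and `license` attributes, in the style of an index file. The sample ids, the license strings, and the `group_by_license` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical excerpt in the style of a package index; the ids and license
# strings are made up for illustration and do not describe real packages.
SAMPLE_INDEX = """
<nltk_data>
  <packages>
    <package id="corpus_a" name="Corpus A" license="Apache License 2.0"/>
    <package id="corpus_b" name="Corpus B" license="Creative Commons Attribution 4.0"/>
    <package id="corpus_c" name="Corpus C" license=""/>
    <package id="corpus_d" name="Corpus D"/>
  </packages>
</nltk_data>
"""


def group_by_license(index_xml: str) -> dict:
    """Map each license string to the list of package ids that declare it.

    Packages with a missing or empty license attribute are collected under
    the key "UNSPECIFIED" so they can be reviewed by hand.
    """
    root = ET.fromstring(index_xml.strip())
    groups = defaultdict(list)
    for pkg in root.iter("package"):
        license_text = (pkg.get("license") or "").strip() or "UNSPECIFIED"
        groups[license_text].append(pkg.get("id"))
    return dict(groups)


if __name__ == "__main__":
    for license_text, ids in sorted(group_by_license(SAMPLE_INDEX).items()):
        print(f"{license_text}: {', '.join(ids)}")
```

The `UNSPECIFIED` bucket corresponds to the packages that would need an explicit note or warning in the license list.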

@ekaf ekaf requested a review from Copilot June 17, 2025 09:25
Copilot

This comment was marked as outdated.

@ekaf ekaf requested a review from Copilot June 17, 2025 10:06
Copilot

This comment was marked as outdated.

@stevenbird stevenbird self-assigned this Jun 17, 2025
@ekaf
Member Author

ekaf commented Jun 18, 2025

Adding CONTRIBUTING.md and a modernized README.md to this PR to further improve transparency and make it easier for contributors to get started.

  • CONTRIBUTING.md: Provides detailed, step-by-step contribution guidelines, including how to add new data packages and clarify licensing.
  • README.md: Replaces the old README.txt with a Markdown file that highlights installation instructions, recent enhancements (including the new licensing overview), and directs users to the new contribution and licensing documentation.

Since these files are closely related to the goals of this PR—clarifying licensing and improving repository documentation—adding them here should keep all documentation improvements together for easier review and a more cohesive update.

Closes #240 as well.

@ekaf ekaf changed the title from "Add top-level licensing documentation files" to "Clarify licensing, add CONTRIBUTING.md, and update README.md" Jun 18, 2025
@ekaf ekaf requested a review from Copilot June 18, 2025 05:12
@Copilot Copilot AI left a comment

Pull Request Overview

This PR clarifies the repository’s licensing structure, introduces contribution guidelines, and updates the primary README.

  • Adds a top-level Apache 2.0 LICENSE and a human-readable LICENSE-OVERVIEW.md with a licensing summary
  • Introduces DATASET-LICENSES.md, which lists individual package licenses and flags unclear cases
  • Adds CONTRIBUTING.md and replaces the old README.txt with an enhanced README.md

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
|---|---|
| README.txt | Removed legacy placeholder file |
| README.md | New project overview, installation, and guide links |
| LICENSE-OVERVIEW.md | Summary of repository vs. per-package license rules |
| LICENSE | Full Apache License 2.0 text |
| DATASET-LICENSES.md | Detailed list of all data package licenses |
| CONTRIBUTING.md | Step-by-step instructions for adding datasets |

Comments suppressed due to low confidence (2)

README.md:1

  • [nitpick] Consider adding a license badge (e.g., Apache 2.0) and a link to the LICENSE file at the top of the README to make the repository license more visible to users.
# Data Distribution for NLTK

CONTRIBUTING.md:28

  • It may help to remind contributors here to update DATASET-LICENSES.md and LICENSE-OVERVIEW.md whenever they add a new package, ensuring licensing documentation stays in sync.
### 3. Add Your Data Package
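
One way to act on the sync reminder above would be a small check that reports package ids not mentioned in the license list. The following is only a sketch: it assumes each package id appears verbatim somewhere in the text of DATASET-LICENSES.md, which this PR does not specify, and the helper name `missing_from_license_list` and the sample ids are hypothetical.

```python
import re


def missing_from_license_list(package_ids, license_doc_text):
    """Return the ids that never occur as a standalone token in the text.

    An id counts as present only when it is not glued to further word
    characters or hyphens, so "omw" is not satisfied by "omw-1.4".
    """
    missing = []
    for package_id in package_ids:
        pattern = r"(?<![\w-])" + re.escape(package_id) + r"(?![\w-])"
        if re.search(pattern, license_doc_text) is None:
            missing.append(package_id)
    return missing


if __name__ == "__main__":
    # Hypothetical document text and package ids, for illustration only.
    doc = "- corpus_a: Apache 2.0\n- omw-1.4: see package README."
    print(missing_from_license_list(["corpus_a", "omw", "omw-1.4", "corpus_b"], doc))
```

A check like this could run before a pull request is opened; whether it fits the repository's workflow is a judgment for the maintainers.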

@stevenbird stevenbird merged commit b2f5e5f into nltk:gh-pages Jun 19, 2025
@stevenbird
Member

Wonderful contribution, thank you @ekaf

@ekaf ekaf mentioned this pull request Jun 26, 2025
@ekaf ekaf deleted the hotfix-241 branch June 26, 2025 16:41
Development

Successfully merging this pull request may close these issues.

Clarify repository and data package licensing structure