
Add multinomial Naive Bayes text classification example #14665

Open
Raghav8690 wants to merge 4 commits into TheAlgorithms:master from Raghav8690:naive-bayes-text-classification


@Raghav8690

Describe your change:

Add a new machine learning algorithm implementation for text classification using Multinomial Naive Bayes.

This PR adds

  • A new file: naive_bayes_text_classification.py
  • A class-based implementation with:
    • fit
    • predict_proba
    • predict
  • Input validation with proper exceptions
  • Comprehensive docstrings and doctests for:
    • Valid cases
    • Invalid cases
  • A small toy text dataset example for demonstration
  • Reference links to algorithm explanations and learning resources
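
For readers without the diff at hand, the three methods listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in naive_bayes_text_classification.py: the class name, the whitespace tokenizer, the `alpha` smoothing parameter, and the exception messages are all hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


class NaiveBayesTextClassifier:
    """Hypothetical sketch of a multinomial Naive Bayes text classifier."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # add-alpha (Laplace) smoothing strength
        self.class_doc_counts: Counter[str] = Counter()
        self.word_counts: dict[str, Counter[str]] = {}
        self.vocabulary: set[str] = set()

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Assumed preprocessing: lowercase, then split on whitespace.
        return text.lower().split()

    def fit(self, texts: list[str], labels: list[str]) -> None:
        if not texts:
            raise ValueError("texts must not be empty")
        if len(texts) != len(labels):
            raise ValueError("texts and labels must have the same length")
        # Reset state so that repeated calls do not accumulate counts.
        self.class_doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_doc_counts}
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = self.tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)

    def predict_proba(self, text: str) -> dict[str, float]:
        if not self.class_doc_counts:
            raise ValueError("classifier has not been fitted")
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen during training are skipped.
        tokens = [t for t in self.tokenize(text) if t in self.vocabulary]
        log_scores: dict[str, float] = {}
        for label, doc_count in self.class_doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + self.alpha * vocab_size
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) / denominator)
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long texts.
        top = max(log_scores.values())
        unnormalised = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(unnormalised.values())
        return {k: v / norm for k, v in unnormalised.items()}

    def predict(self, text: str) -> str:
        probabilities = self.predict_proba(text)
        return max(probabilities, key=lambda label: probabilities[label])
```

With a toy dataset such as `fit(["free money now", "meeting at noon"], ["spam", "ham"])`, `predict_proba` returns one probability per label, and the values sum to 1.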

Additional Notes

  • The implementation is designed to be simple, readable, and beginner-friendly.
  • The classifier supports basic text preprocessing and probabilistic prediction.
  • Error handling has been added to improve robustness and usability.

Fixes #14664

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #14664".

@algorithms-keeper (bot) added the "awaiting reviews" label (This PR is ready to be reviewed) on May 11, 2026
@AG141293

@Raghav8690,
Hi Raghav, great implementation overall! I noticed a few things while reviewing:

  1. Type hint issue in predict(): probabilities.get has return type float | None, which may cause mypy to fail. Consider using key=lambda label: probabilities[label] instead.
  2. Empty string input: predict_proba("") silently returns the class priors with no warning. A doctest covering this edge case would help learners understand the behaviour.
  3. Validation order in fit(): the empty-list check comes after the length check. Two empty lists trivially satisfy len(texts) == len(labels), so the code works correctly, but checking for empty input first would be more readable.
  4. No printed demo in main: most files in this repo print example output so learners can run python naive_bayes_text_classification.py and see results. Worth adding a few print() calls after doctest.testmod().

Overall the math and structure are solid. Nice work!
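
Points 1 and 3 could look roughly like the sketch below. The helper names `most_probable_label` and `validate_training_data` and the error messages are hypothetical, chosen only for illustration; they are not taken from the PR's code.

```python
from __future__ import annotations


def most_probable_label(probabilities: dict[str, float]) -> str:
    """Return the label with the highest probability.

    Indexing inside a lambda yields a plain float, whereas dict.get is
    typed as returning float | None, which mypy may reject as a key.
    """
    return max(probabilities, key=lambda label: probabilities[label])


def validate_training_data(texts: list[str], labels: list[str]) -> None:
    """Check for empty input first, then for mismatched lengths."""
    if not texts:
        raise ValueError("texts must not be empty")
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
```

Checking for empty input first means the reader does not have to notice that two empty lists pass the length comparison.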

