NLP Word Frequency Algorithms #2142

Merged
merged 21 commits into from
Jun 25, 2020

Conversation

danmurphy1217
Contributor

Describe your change:

  • Add an algorithm? Added common NLP algorithms: term frequency, document frequency, and inverse document
    frequency
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?
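
For orientation, the first two measures named above can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: the function names, the list-of-strings corpus, and the `_tokenize` helper are all assumptions. Inverse document frequency is conventionally log10(N / df), where N is the number of documents and df is the number of documents containing the term.

```python
import string
from typing import List, Tuple


def _tokenize(text: str) -> List[str]:
    # Hypothetical helper: strip ASCII punctuation, lowercase, split on whitespace.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table).lower().split()


def term_frequency(term: str, document: str) -> int:
    """Number of times `term` occurs in `document` (case-insensitive)."""
    return _tokenize(document).count(term.lower())


def document_frequency(term: str, corpus: List[str]) -> Tuple[int, int]:
    """Return (documents containing `term`, total number of documents)."""
    term = term.lower()
    containing = sum(1 for doc in corpus if term in _tokenize(doc))
    return containing, len(corpus)


# Assumed toy corpus: one string per document.
corpus = ["The cat sat.", "A dog barked at the cat!", "Birds sing."]
print(term_frequency("cat", corpus[1]))   # 1
print(document_frequency("cat", corpus))  # (2, 3)
```

Returning the document count alongside df keeps both inputs to the IDF formula together; a different design could return df alone.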

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms have a URL in their comments that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.

Member

@l3str4nge l3str4nge left a comment

https://travis-ci.com/github/TheAlgorithms/Python/builds/172385776#L804

Please fix your code so that the unit tests pass before further review.

danmurphy1217 and others added 5 commits June 22, 2020 17:23
Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: Christian Clauss <cclauss@me.com>
@TravisBuddy

Hey @danmurphy1217,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 49fd52f0-b4cf-11ea-bd89-f57d85bb28d1

@TravisBuddy

Hey @danmurphy1217,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 25c1c850-b4d2-11ea-bd89-f57d85bb28d1

@TravisBuddy

Travis tests have failed

Hey @danmurphy1217,
Please read the following log in order to understand the failure reason.
It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: 017458d0-b4d4-11ea-bd89-f57d85bb28d1

@TravisBuddy

Travis tests have failed

Hey @danmurphy1217,
Please read the following log in order to understand the failure reason.
It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: 13ea6f70-b4db-11ea-bd89-f57d85bb28d1

@TravisBuddy

Hey @danmurphy1217,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: fa991780-b4ec-11ea-bd89-f57d85bb28d1

@TravisBuddy

Travis tests have failed

Hey @danmurphy1217,
Please read the following log in order to understand the failure reason.
It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: 75e8dcc0-b4ef-11ea-bd89-f57d85bb28d1

@cclauss
Member

cclauss commented Jun 23, 2020

On your local machine, try: python3 -m doctest -v word_frequency_functions.py
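
As a rough illustration of what that command checks, here is a minimal, self-contained module with doctests. The `count_words` function is a hypothetical example and is not taken from word_frequency_functions.py; the `doctest.testmod()` call at the bottom is one way to run the same checks from inside the file.

```python
import doctest


def count_words(text: str) -> int:
    """
    Count whitespace-separated words in `text`.

    >>> count_words("to be or not to be")
    6
    >>> count_words("")
    0
    """
    return len(text.split())


if __name__ == "__main__":
    # Runs every docstring example in this module, much like
    # `python3 -m doctest -v <file>.py` does from the command line.
    results = doctest.testmod(verbose=True)
    print(results)
```

Each `>>>` line is executed and its printed result is compared with the line that follows, so a mismatch in value or formatting is reported as a failure.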

@TravisBuddy

Hey @danmurphy1217,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: a49a18d0-b568-11ea-aae7-0b05fcc524af

@danmurphy1217
Contributor Author

> On your local machine, try: python3 -m doctest -v word_frequency_functions.py

Are there any other updates I should make?

    return idf
except ZeroDivisionError:
    print("The term you searched for is not in the corpus.")

Member

If the exception is raised, this function does not return a float.

Contributor Author

@danmurphy1217 danmurphy1217 Jun 23, 2020

What do you mean? Should I raise an error instead of printing?

Member

If the error is raised, None will be returned. Is that a good approach? Wouldn't it be better to check the df value before the calculation and raise an error if df == 0? Also, please add a doctest covering this situation.

Member

Yes. Algorithm functions should not print, as discussed in CONTRIBUTING.md. If our type hints promise that we will return a float, then we should do so or raise an exception explaining why we cannot.
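
One way the suggestion above could look is sketched below. This is an assumption-laden illustration, not the code that was eventually merged: the signature `inverse_document_frequency(df, n)`, the choice of `ValueError`, the error messages, and the rounding to three decimals are all hypothetical.

```python
import math


def inverse_document_frequency(df: int, n: int) -> float:
    """
    Return log10(n / df), rounded to 3 decimal places.

    `df` is the number of documents containing the term and `n` is the
    total number of documents (both names are assumptions for this sketch).

    >>> inverse_document_frequency(1, 10)
    1.0
    >>> inverse_document_frequency(0, 10)
    Traceback (most recent call last):
        ...
    ValueError: df must be > 0 (term not found in corpus)
    """
    # Validate up front instead of catching ZeroDivisionError and printing,
    # so the function either returns a float or raises.
    if n <= 0:
        raise ValueError("n must be > 0 (corpus is empty)")
    if df <= 0:
        raise ValueError("df must be > 0 (term not found in corpus)")
    return round(math.log10(n / df), 3)
```

With the check placed before the division, the return type hint holds on every path that returns, and the second doctest documents the failure case the reviewers asked about.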

@l3str4nge l3str4nge merged commit b368b1e into TheAlgorithms:master Jun 25, 2020
@danmurphy1217 danmurphy1217 deleted the dan branch June 25, 2020 11:02
stokhos pushed a commit to stokhos/Python that referenced this pull request Jan 3, 2021
* NLP Word Frequency Algorithms

* Added type hints and Wikipedia link to tf-idf

* Update machine_learning/word_frequency_functions.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* Update machine_learning/word_frequency_functions.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* Update machine_learning/word_frequency_functions.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* Update machine_learning/word_frequency_functions.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* Fix line length for flake8

* Fix line length for flake8 V2

* Add line escapes and change int to float

* Corrected doctests

* Fix for TravisCI

* Fix for TravisCI V2

* Tests passing locally

* Tests passing locally

* Update machine_learning/word_frequency_functions.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* Update machine_learning/word_frequency_functions.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* Update machine_learning/word_frequency_functions.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* Update machine_learning/word_frequency_functions.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* Add doctest examples and clean up docstrings

Co-authored-by: Christian Clauss <cclauss@me.com>

4 participants