For natural language processing, you will usually want to clean common “stop words” out of the text, since they rarely contribute to a topical analysis. No single version of this list is standard, because requirements change from project to project.
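As a rough illustration of what stop word removal does, here is a minimal Python sketch. It assumes simple lowercase regex tokenization and uses a tiny hypothetical stop word set; real lists are much longer and vary by project.

```python
import re

# Tiny illustrative stop word set (hypothetical; real lists are far longer).
STOP_WORDS = {"a", "and", "the", "of", "in", "to", "i"}

def remove_stop_words(text, stop_words):
    """Lowercase the text, split it into word tokens, and drop any stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stop_words]

print(remove_stop_words("All the world's a stage", STOP_WORDS))
# → ['all', "world's", 'stage']
```

Only the content-bearing words survive, which is what a topical analysis cares about.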
If you find yourself processing Elizabethan or older English texts, most modern stop word lists will fail to catch words like “thee,” “thy,” or “thine.”
These older words are listed in alphabetical order at the end of the standard stop word list on GitHub, and below is a copy-paste version so you can easily add them to your own stop word file.
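One way to fold the archaic words into an existing list is sketched below in Python. It assumes your stop word file has one word per line (blank lines and `#` comments skipped), and it includes only the three archaic forms named above; the full set is in the GitHub list. The helper name `load_stop_words` is hypothetical.

```python
import re

# Only the archaic forms named in this post; the full list is on GitHub.
ARCHAIC_STOP_WORDS = ["thee", "thy", "thine"]

def load_stop_words(lines):
    """Build a stop word set from one-word-per-line text, skipping blanks and '#' comments."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words

# Stand-in for a modern stop word file's contents.
modern = load_stop_words(["i", "to", "a", "the", "and", "# comment", ""])
combined = modern | set(ARCHAIC_STOP_WORDS)

tokens = re.findall(r"[a-z']+", "Shall I compare thee to a summer's day".lower())
without_archaic = [t for t in tokens if t not in modern]
with_archaic = [t for t in tokens if t not in combined]

print(without_archaic)  # → ['shall', 'compare', 'thee', "summer's", 'day']
print(with_archaic)     # → ['shall', 'compare', "summer's", 'day']
```

With a real file, you could pass an open file handle instead, e.g. `with open("stopwords.txt", encoding="utf-8") as f: modern = load_stop_words(f)`, where `stopwords.txt` is a placeholder name.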
If I’ve forgotten any, let me know and I’ll add them.
All the world’s a stage,
And all the men and women merely players.
They have their exits and their entrances,
And one man in his time plays many parts,
His acts being seven ages.