I started working a bit on learning.py, and I have noticed a few issues I could use some direction on.
a) All the learners require a DataSet object as input. Unfortunately, while for some learners (like the kNN learner) this works intuitively, others (like the NeuralNetLearner) require some actual pre-processing of the dataset first. If I'm given the green light, I will implement these additional functions so that all the learners are run the same way. Alternatively, we could write documentation on how to pre-process the datasets for each algorithm, since right now it is difficult to gauge what an implementation requires.
Here is an example of what I'm talking about:
from learning import DataSet, NaiveBayesLearner, NeuralNetLearner

zoo = DataSet(name="zoo")
NaiveBayesLearner(zoo)  # This works
NeuralNetLearner(zoo)   # This doesn't work
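If I understand the code correctly, the missing step is something like the following. This is only a sketch, assuming that classes_to_numbers is the right helper to turn the string class labels into the numeric targets the network expects:

from learning import DataSet, NeuralNetLearner

zoo = DataSet(name="zoo")
zoo.classes_to_numbers()    # convert class labels like 'mammal' to integers
nn = NeuralNetLearner(zoo)  # now succeeds, since the targets are numeric

It is exactly this kind of extra step that I'd like to either fold into the learners themselves or document clearly.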
b) The Jupyter Notebook for the file is mainly focused on a single dataset (MNIST), with some side-tracking into libraries like NumPy and Scikit-learn. Not much is written about the actual implementations of the algorithms from AIMA. I would like to help with this Notebook, but I am not sure what direction we want to take it. I have the following suggestions:
- Don't work with the MNIST dataset (or at least don't spend as much space on it as we currently do). I believe a simpler dataset, like Fisher's Iris, would work better for showcasing the algorithms (see the sketch after this list).
- Remove the information about external libraries.
- Write some information about the algorithms, showcase the code, and write a small tutorial on their usage.
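To make the last point concrete, here is the kind of short, end-to-end snippet I have in mind for each algorithm. A sketch only; the exact prediction depends on the iris data shipped in aima-data:

from learning import DataSet, NearestNeighborLearner

iris = DataSet(name="iris")              # 4 numeric attributes, 3 classes
knn = NearestNeighborLearner(iris, k=3)  # majority vote among 3 nearest neighbors
print(knn([5.1, 3.0, 1.1, 0.1]))         # expect something like 'setosa'

Each algorithm's section would pair a snippet like this with a brief explanation of the idea and of how the implementation follows the pseudocode in the book.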
I have made a small sample notebook in #318. I plan on writing more in the same style, if approved.
What do you guys think about that? I'm happy to discuss this and change things up. I just want to make sure I know where we want to take this before I jump in.
PS: Sorry for the long post.