A while back I was interviewing for a job and was asked about recommender systems. The extent of my knowledge at the time was the week on recommenders in Andrew Ng’s course on machine learning, so after the interview I decided to work on a side project to learn more. I had the final product running at http://www.kivaloans4.me, but I recently spun down the EC2 instance. However, you can download all of the code and run it locally. The rest of this blog post explains some of the steps that went into building and tuning it.
I’ve given many microloans through Kiva over the years, but I’ve always found their interface a little cumbersome, so I decided to use the data publicly available through their API to automatically generate recommendations.
I started out by looking at a really great blog post on basic recommender systems and decided to build a content-based filtering system. This is one in which loans are recommended based on their similarity to a user’s previous loans.
The fun part came in trying to figure out how to measure which loans were similar. Data on each loan is stored in a JSON object, which contains information on the borrower(s) (number of borrowers, gender, country of origin) as well as information on what the loan would be used for (agriculture, education, or retail, for example) and tags and themes (eco-friendly, woman-owned business, vulnerable groups, etc.).
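To make that concrete, here is a minimal sketch of flattening one loan's JSON into a set of comparable elements. The field names (`country`, `continent`, `sector`, `tags`, `themes`) and the `loan_elements` helper are made up for illustration; they are not necessarily the Kiva API's actual schema or what the project's code does.

```python
import json

# Hypothetical loan record; the field names are assumptions for illustration,
# not the real Kiva API schema.
EXAMPLE_LOAN_JSON = """
{
  "country": "Uganda",
  "continent": "Africa",
  "sector": "Agriculture",
  "tags": ["woman-owned business", "eco-friendly"],
  "themes": ["rural"]
}
"""

def loan_elements(loan):
    """Flatten a loan dict into a set of 'field:value' strings."""
    elements = set()
    # Single-valued fields
    for key in ("country", "continent", "sector"):
        value = loan.get(key)
        if value:
            elements.add(f"{key}:{value}")
    # List-valued fields
    for key in ("tags", "themes"):
        for value in loan.get(key, []):
            elements.add(f"{key}:{value}")
    return elements

example_elements = loan_elements(json.loads(EXAMPLE_LOAN_JSON))
```

Prefixing each value with its field name keeps, say, a country and a tag with the same text from colliding once everything lives in one set.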
For example, here is a breakdown of the 18 loans I’ve given in the past:
The majority of loans that I’ve contributed to have been to rural and woman-owned businesses in East Africa. (Mostly Uganda in particular, since I lived there and think it would be so cool to run into one of these business owners one day on a visit back.)
After trying out a few metrics, I decided to create a hybrid of Jaccard and cosine similarity. This allowed me to use the details of a user’s loans (country and continent of origin, loan sector, and associated tags and themes) while also weighting towards the elements of those loans that the user appeared to favor.
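As a rough illustration of what such a hybrid could look like (not necessarily the exact formula in the project), the sketch below scores a candidate loan by blending two numbers: the Jaccard overlap between the candidate's elements and everything the user has ever funded, and the cosine similarity between the candidate and a count-weighted profile of the user's past loans. The function names and the `alpha` blending parameter are my own assumptions.

```python
from collections import Counter
from math import sqrt

def user_profile(past_loans):
    """Count how many past loans contain each element."""
    counts = Counter()
    for loan in past_loans:
        counts.update(set(loan))
    return counts

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(profile, loan):
    """Cosine similarity between a count-weighted profile and a binary loan vector."""
    loan = set(loan)
    dot = sum(profile[e] for e in loan)  # Counter gives 0 for unseen elements
    norm_profile = sqrt(sum(c * c for c in profile.values()))
    norm_loan = sqrt(len(loan))
    if norm_profile == 0 or norm_loan == 0:
        return 0.0
    return dot / (norm_profile * norm_loan)

def hybrid_score(past_loans, candidate, alpha=0.5):
    """Blend Jaccard (plain overlap) with cosine (favors frequent elements)."""
    profile = user_profile(past_loans)
    return (alpha * jaccard(profile.keys(), candidate)
            + (1 - alpha) * cosine(profile, candidate))

past = [
    {"Uganda", "Africa", "Agriculture", "rural"},
    {"Uganda", "Africa", "Retail", "woman-owned"},
]
similar = {"Uganda", "Africa", "Agriculture"}
different = {"Philippines", "Asia", "Agriculture"}
score_similar = hybrid_score(past, similar)
score_different = hybrid_score(past, different)
```

The Jaccard term only asks whether an element has ever appeared in the user's history, while the cosine term rewards elements that appear repeatedly, which is where the "weighting towards what the user favors" comes from in this sketch.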
Here’s the top five “best” loans output from v1 of the recommender:
In the first iteration, the recommender was a little too precise: in my case it returned almost solely rural and woman-owned businesses in Uganda. (In fact, there were only a small number of loans from Uganda available from Kiva at the time I ran this. With more available loans, the recommendations would certainly have been even more homogeneous.)
This was a little repetitive and unexciting, so I changed the weighting function a bit to allow for a little more randomness and variety. (More details on how I decided on that method are here.)
The resulting recommendations were still mostly rural and woman-owned businesses in Africa, but there was more variety in country of origin, introducing me to loans I wouldn’t have otherwise seen.
To check whether the recommender worked well for users with different types of loan profiles, I looked at a few other publicly available users. The user below has given 20 loans, focused on animals and agriculture and all to people in the Philippines:
The top 5 recommended loans for her are here:
In this case, the recommended loans are still all very similar, despite the scaling factor I added to the loan elements to increase diversity. One way to address this would be to add random noise to the weight for each loan element, perhaps determining the amount of noise to add based on the homogeneity of previous loans. If I were to extend this project, I would love to validate these recommenders by trying out a number of different similarity metrics and testing their effect on loan contributions for different users.
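I haven't implemented the noise idea, but a minimal sketch might look like the following. It assumes each past loan is a set of elements, measures homogeneity as the mean pairwise Jaccard similarity between past loans, and adds Gaussian noise whose standard deviation grows with that homogeneity. The function names, the `max_noise` parameter, and the choice of Gaussian noise are all hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def homogeneity(past_loans):
    """Mean pairwise Jaccard similarity: 0 = nothing shared, 1 = identical loans."""
    pairs = list(combinations(past_loans, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        if union:
            total += len(a & b) / len(union)
    return total / len(pairs)

def noisy_weights(past_loans, max_noise=0.5, rng=None):
    """Element weights (share of past loans containing the element) plus noise.

    The noise standard deviation is max_noise * homogeneity, so users with
    very uniform histories get the most shuffling.
    """
    rng = rng or random.Random()
    counts = Counter(e for loan in past_loans for e in loan)
    n_loans = len(past_loans)
    scale = max_noise * homogeneity(past_loans)
    weights = {}
    for element, count in counts.items():
        base = count / n_loans
        # Clamp at zero so noise never produces a negative weight
        weights[element] = max(0.0, base + rng.gauss(0.0, scale))
    return weights

uniform_history = [{"Philippines", "Agriculture"}, {"Philippines", "Agriculture"}]
varied_history = [{"Uganda", "Retail"}, {"Peru", "Education"}]
uniform_weights = noisy_weights(uniform_history, rng=random.Random(0))
varied_weights = noisy_weights(varied_history, rng=random.Random(0))
```

With this setup, a user whose loans share nothing gets no noise at all, while a user like the one above, with loans all in one country and sector, gets the largest perturbation.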
If you’ve ever given loans through Kiva and would like to try out the recommender, it used to be hosted at http://www.kivaloans4.me; since that instance is no longer running, you can run it locally instead. The whole project (including Jupyter notebooks for data analysis, as well as the web interface built with Python, Flask, and JavaScript) is at https://github.com/briannaschuyler/loan_project.