2014-03-11 11:57:07.0 | Category: mahout
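I wrote an article analyzing the principle behind the Pearson product-moment correlation coefficient. It has too many images to upload here conveniently, so I put it on GitHub: https://github.com/tianbaoxing/hmahout/blob/master/doc/recommder/pearsonCorrelation-%E5%8E%9F%E7%90%86%E5%88%86%E6%9E%90.doc

Flow chart of the GenericUserBasedRecommender recommendation source code: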
Source analysis of the recommend method in GenericUserBasedRecommender

    @Override
    public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer)
        throws TasteException {
      Preconditions.checkArgument(howMany >= 1, "howMany must be at least 1");
      log.debug("Recommending items for user ID '{}'", userID);

      // Find the users most similar to this user
      long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);
      if (theNeighborhood.length == 0) {
        return Collections.emptyList();
      }

      // Candidate items: liked by the neighbors, not yet rated by this user
      FastIDSet allItemIDs = getAllOtherItems(theNeighborhood, userID);
      TopItems.Estimator<Long> estimator = new Estimator(userID, theNeighborhood);

      // Keep the howMany candidates with the highest estimated preference
      List<RecommendedItem> topItems = TopItems
          .getTopItems(howMany, allItemIDs.iterator(), rescorer, estimator);
      log.debug("Recommendations are: {}", topItems);
      return topItems;
    }

(1) long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);
Gets the user's neighbors. If there are none, an empty list is returned.
(2) FastIDSet allItemIDs = getAllOtherItems(theNeighborhood, userID);
Gets the items (topics) that the user's neighbors like, excluding the ones the user already likes.
(3) List<RecommendedItem> topItems = TopItems.getTopItems(howMany, allItemIDs.iterator(), rescorer, estimator);
Gets the highest-scoring items and recommends them to the user.

Now let's look at the source of TopItems.getTopItems(howMany, allItemIDs.iterator(), rescorer, estimator):

    public static List<RecommendedItem> getTopItems(int howMany,
                                                    LongPrimitiveIterator possibleItemIDs,
                                                    IDRescorer rescorer,
                                                    Estimator<Long> estimator) throws TasteException {
      Preconditions.checkArgument(possibleItemIDs != null, "argument is null");
      Preconditions.checkArgument(estimator != null, "argument is null");

      // Reversed comparator: the head of the queue is the lowest-scoring item kept so far
      Queue<RecommendedItem> topItems = new PriorityQueue<RecommendedItem>(howMany + 1,
          Collections.reverseOrder(ByValueRecommendedItemComparator.getInstance()));
      boolean full = false;
      double lowestTopValue = Double.NEGATIVE_INFINITY;
      while (possibleItemIDs.hasNext()) {
        long itemID = possibleItemIDs.next();
        if (rescorer == null || !rescorer.isFiltered(itemID)) {
          double preference;
          try {
            preference = estimator.estimate(itemID);
          } catch (NoSuchItemException nsie) {
            continue;
          }
          double rescoredPref = rescorer == null ? preference : rescorer.rescore(itemID, preference);
          if (!Double.isNaN(rescoredPref) && (!full || rescoredPref > lowestTopValue)) {
            topItems.add(new GenericRecommendedItem(itemID, (float) rescoredPref));
            if (full) {
              topItems.poll();
            } else if (topItems.size() > howMany) {
              full = true;
              topItems.poll();
            }
            lowestTopValue = topItems.peek().getValue();
          }
        }
      }
      int size = topItems.size();
      if (size == 0) {
        return Collections.emptyList();
      }
      List<RecommendedItem> result = Lists.newArrayListWithCapacity(size);
      result.addAll(topItems);
      // Sort the survivors so the best recommendation comes first
      Collections.sort(result, ByValueRecommendedItemComparator.getInstance());
      return result;
    }

Note: this is where the IDRescorer is invoked.
(1) if (rescorer == null || !rescorer.isFiltered(itemID))
Checks whether the item should be filtered out. Filtered items are skipped without being scored.
(2) preference = estimator.estimate(itemID)
Computes the estimated preference score of this user for the item. If a rescorer is present, rescorer.rescore(itemID, preference) then adjusts that score.
(3) topItems.add(new GenericRecommendedItem(itemID, (float) rescoredPref));
Adds the item to the priority queue. Once the queue holds more than howMany items, poll() drops the lowest-scoring one, so only the top howMany remain.
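
The selection logic in getTopItems is easier to follow in isolation. Below is a minimal stand-alone sketch of the same idea (a bounded min-heap that keeps only the best howMany candidates), not Mahout's actual code. The names TopNSketch, Scored, and topN are hypothetical, the two functional parameters only stand in for IDRescorer.isFiltered and Estimator.estimate, and the scores are made up. It leaves out rescoring and the lowestTopValue shortcut that Mahout uses to avoid unnecessary queue insertions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.LongPredicate;
import java.util.function.LongToDoubleFunction;

// Hypothetical stand-alone sketch of bounded top-N selection; not Mahout code.
class TopNSketch {

    // Hypothetical stand-in for Mahout's RecommendedItem.
    static final class Scored {
        final long itemID;
        final double value;

        Scored(long itemID, double value) {
            this.itemID = itemID;
            this.value = value;
        }

        @Override
        public String toString() {
            return itemID + "=" + value;
        }
    }

    // Returns the howMany highest-scoring candidates, best first.
    static List<Scored> topN(int howMany, long[] candidates,
                             LongPredicate isFiltered,        // role of IDRescorer.isFiltered
                             LongToDoubleFunction estimate) { // role of Estimator.estimate
        if (howMany < 1) {
            throw new IllegalArgumentException("howMany must be at least 1");
        }
        // Min-heap on score: the head is always the weakest item kept so far.
        PriorityQueue<Scored> heap = new PriorityQueue<>(
            howMany + 1, Comparator.comparingDouble((Scored s) -> s.value));
        for (long id : candidates) {
            if (isFiltered.test(id)) {
                continue; // filtered items are never scored
            }
            double score = estimate.applyAsDouble(id);
            if (Double.isNaN(score)) {
                continue; // no usable estimate for this item
            }
            heap.add(new Scored(id, score));
            if (heap.size() > howMany) {
                heap.poll(); // evict the current weakest item
            }
        }
        List<Scored> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Scored s) -> s.value).reversed());
        return result;
    }

    public static void main(String[] args) {
        long[] candidates = {1, 2, 3, 4, 5};
        // Made-up scores: item i scores i * 1.5; item 5 is filtered out.
        List<Scored> top = topN(2, candidates, id -> id == 5, id -> id * 1.5);
        System.out.println(top); // prints [4=6.0, 3=4.5]
    }
}
```

Because the heap never grows beyond howMany + 1 entries, memory stays proportional to howMany rather than to the number of candidate items, which is presumably why Mahout takes this approach instead of sorting every candidate.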