## Building a custom search for support tickets, a.k.a. a bug tracker

When submitting a bug report, you want to make sure that no one has already submitted the same report, both to avoid duplicates and to avoid mean comments from the community. The problem is that the default search engine for GitHub tickets does not handle synonyms or even plurals, so it is not very effective. So we decided to build another search engine.

To do this, we used the Algolia API, as well as two plugins, in Dataiku DSS of course, because we wanted something quick to set up.

This is actually a project that we use internally at Dataiku to search our bug reports and support tickets.

# BUSINESS GOALS

1. Save time by finding the relevant GitHub ticket right away:
  - find answers to questions that were already asked
  - avoid duplicates
  - comment on issues encountered by another user
  - etc.

2. Visualise stats on those tickets, for fun


# HOW DID WE DO THIS?

The idea is simple:
- We download issues from GitHub,
- We format them so they can be used by the Algolia API,
- We sync them to Algolia.

To save time, we make sure that every day, only the tickets updated the day before are pushed to Algolia.
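As an illustration of this daily incremental update, here is a minimal Python sketch that keeps only the issues updated on the previous day. It is not the project's actual mechanism (the project relies on DSS partitioning, described further down); the function name `updated_on_previous_day` and the sample data are hypothetical, and the timestamp format is assumed to be the ISO 8601 UTC format returned by the GitHub API.

```python
from datetime import date, datetime, timedelta


def updated_on_previous_day(issues, today):
    """Return the issues whose `updated_at` falls on the day before `today`."""
    previous_day = today - timedelta(days=1)
    selected = []
    for issue in issues:
        # Assumption: timestamps look like "2016-03-01T09:30:00Z" (UTC).
        updated = datetime.strptime(issue["updated_at"], "%Y-%m-%dT%H:%M:%SZ")
        if updated.date() == previous_day:
            selected.append(issue)
    return selected


# Hypothetical sample data, for illustration only.
sample_issues = [
    {"number": 1, "updated_at": "2016-03-01T09:30:00Z"},
    {"number": 2, "updated_at": "2016-02-28T23:59:59Z"},
    {"number": 3, "updated_at": "2016-03-01T00:00:00Z"},
]

to_push = updated_on_previous_day(sample_issues, today=date(2016, 3, 2))
print([issue["number"] for issue in to_push])  # prints [1, 3]
```

Only the selected issues would then need to be sent to Algolia, which keeps the daily run short.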


# EXPLORE THE SAMPLE PROJECT

1. So you can understand what we're trying to build, let's start by exploring the finished project. It's a search page. You can type in a few words to look for issues, and refine your search with the facets on the left.
<p class="text-center">
<a href="/projects/DKU_CUSTOM_SEARCH/dashboards/insights/ysEWf6b_searchtickets/view"  class="btn btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project" class="btn-cta-big-mod-icon" />Search Page</a><br/>
</p>
2. Next, let's explore the flow:

    <p class="text-center"><a href="/projects/DKU_CUSTOM_SEARCH/flow/"  class="btn btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project" class="btn-cta-big-mod-icon" />Flow</a><br/><br/></p>

The flow is linear. 

We started by downloading all the issues from the Scikit repository on GitHub. You can of course choose another repository, even a private one that you have access to. We do this with Dataiku DSS's GitHub plugin, which provides a custom dataset. This is the input dataset at the left of the flow.

  - This dataset does not store the data (just a sample for preview); it merely provides the connection. So <a href="/projects/DKU_CUSTOM_SEARCH/recipes/sync_issues_raw/">a sync recipe</a> copies this data into a <a href="/projects/DKU_CUSTOM_SEARCH/datasets/issues_cache/explore/">managed local dataset</a>.
  - Then a <a href="/projects/DKU_CUSTOM_SEARCH/recipes/compute_issues_with_comments_bodies/#">Python recipe</a> formats the comments.
  - The resulting dataset goes through <a href="/projects/DKU_CUSTOM_SEARCH/recipes/prepare_issues_prepared/">a preparation script</a> to rename some columns and format dates according to Algolia's needs. The result is partitioned thanks to the “Redispatch partitioning according to input columns” option (we partition according to the “updated_at_partition_id” column).
  - This <a href="/DKU_CUSTOM_SEARCH/datasets/issues_prepared/explore/">final dataset</a> is uploaded to Algolia by <a href="/projects/DKU_CUSTOM_SEARCH/recipes/sync_issues_prepared/">a sync recipe</a>, which every day uploads just the tickets updated the day before. Here as well, the output dataset does not store the data, it only provides the connection.
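To make the formatting steps above more concrete, here is a minimal Python sketch of how one issue could be turned into an Algolia-ready record. It is an assumption-laden illustration, not the project's actual recipe code: the function name `to_algolia_record`, the simplified input shape (labels as plain strings, comments as a list of strings), and the choice of the issue number as `objectID` are all hypothetical. The output field names (`objectID`, `texts`, `_tags`, `updated_at_ts`, `updated_at_partition_id`) are the ones used elsewhere in this project.

```python
from datetime import datetime, timezone


def to_algolia_record(issue):
    """Build an Algolia-style record from a simplified issue dictionary."""
    # Assumption: `updated_at` is an ISO 8601 UTC string such as "2016-03-01T09:30:00Z".
    updated = datetime.strptime(
        issue["updated_at"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        # Assumption: the issue number is used as the unique Algolia identifier.
        "objectID": str(issue["number"]),
        "title": issue["title"],
        # Issue body and comment bodies are gathered into one searchable list.
        "texts": [issue["body"]] + list(issue["comments"]),
        "state": issue["state"],
        "_tags": list(issue["labels"]),
        # Numeric timestamp, usable for ranking and faceting.
        "updated_at_ts": int(updated.timestamp()),
        # Day-level identifier, usable as a partition key.
        "updated_at_partition_id": updated.strftime("%Y-%m-%d"),
    }


# Hypothetical sample issue, for illustration only.
sample_issue = {
    "number": 42,
    "title": "Crash when fitting an empty dataset",
    "body": "Calling fit on an empty array raises an error.",
    "comments": ["I can reproduce this.", "Fixed in the latest release."],
    "state": "closed",
    "labels": ["bug"],
    "updated_at": "2016-03-01T09:30:00Z",
}

record = to_algolia_record(sample_issue)
print(record["objectID"], record["updated_at_partition_id"])
```

Keeping a day-level identifier next to the numeric timestamp is what makes it possible to rebuild and upload a single day of tickets at a time.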

3<span></span>. Finally, here are some graphs. We could run many more stats on this dataset; this is just an illustration.
<p class="text-center">
<a href="/projects/DKU_CUSTOM_SEARCH/datasets/issues_cache/visualize/"  class="btn btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project" class="btn-cta-big-mod-icon" />Graphs</a><br/><br/>
</p>

# RELATED CONTENT

If you want to look into some other projects or articles related to this sample project, have a go at these:

   - back to basics: <a href="http://www.dataiku.com/learn/guide/getting-started/dss-concepts/universes-and-concepts.html">the main concept of DSS</a>
   - <a href="https://www.dataiku.com/learn/guide/code/plugins/writing-your-first-dss-plugin.html">how to write your first plugin</a>
   - <a href="http://www.dataiku.com/learn/guide/code/webapps/your-first-webapp.html">how to write your first webapp</a>
   - all you need in our <a href="http://doc.dataiku.com/dss/latest/">reference documentation</a>
 
In general, you'll find lots of material in <a href="http://www.dataiku.com/learn/">the Learn section of Dataiku.com</a>.




# INSTALLATION

Should you want to reuse and adapt this project to your own needs, here are the steps. This project requires:

- the Python package `markdown`, which you can install by typing `DATA_DIR/bin/pip install markdown`
- the plugins github and algolia, which you can install on <a href="/admin/plugins/">the plugins management page</a>.
- a patch to PyGithub to avoid exceeding the rate limit: modify `DATA_DIR/pyenv/lib/python2.7/site-packages/github/Requester.py` according to https://github.com/PyGithub/PyGithub/pull/378/files
- a few images for the webapp. Please unzip <a href="/local/static/algolia.zip" target="_blank">this archive</a> in `DATA_DIR/local/static` (create the latter directory if needed; see <a href="http://www.dataiku.com/learn/guide/code/webapps/static-files-in-webapp.html">here</a> for more info).
- an Algolia setup:

## Algolia setup
1. Create an account on <a href="https://www.algolia.com/">algolia.com</a> and choose the free plan. Skip the tutorial that offers to import data: this project will import data.
1. Browse to “API keys” and copy-paste the application ID and Admin API key into <a href="/projects/DKU_CUSTOM_SEARCH/datasets/algolia/settings/">the algolia dataset settings</a>. While you are at it, copy-paste the Search-Only API Key into the JS tab of <a href="/projects/DKU_CUSTOM_SEARCH/insights/ysEWf6b/HTML_APP/edit/">the webapp</a>.
1. On algolia.com, create a new index “issues”.
1. Push data to Algolia: build the dataset “algolia_unpartitioned_for_initial_import”, build mode “recursive”. Note that downloading issues from GitHub is slow, roughly 5K issues per hour.
1. Optionally, schedule a daily data update: in <a href="/projects/DKU_CUSTOM_SEARCH/scenarios/">a scenario</a>, create a new job schedule on the dataset “algolia” to build the partition “PREVIOUS_DAY”, daily at 3:10, build mode “force-rebuild”.
1. Reload the algolia.com page to see the fresh data, then configure the index: click “ranking” and add the following in “attribute to index”: title, texts, objectID, state, _tags, user, milestone, assignee. In “Custom ranking”, add updated_at_ts.
1. Finally, configure the Display tab of Algolia: in “Attributes for faceting”, enter _tags, assignee, created_at_ts, milestone, state, updated_at_ts, user.
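For reference, the index configuration from the last two steps can also be written down as an Algolia settings payload. The sketch below only builds and prints that payload in Python; it does not send it anywhere. The setting names `searchableAttributes`, `customRanking` and `attributesForFaceting` are Algolia index settings, but the descending order on `updated_at_ts` is an assumption (the steps above do not specify a direction), and the variable name `index_settings` is hypothetical.

```python
import json

# Index settings mirroring the dashboard configuration described above.
index_settings = {
    # Attributes searched by Algolia, in the order listed in the steps.
    "searchableAttributes": [
        "title", "texts", "objectID", "state",
        "_tags", "user", "milestone", "assignee",
    ],
    # Assumption: most recently updated tickets rank first.
    "customRanking": ["desc(updated_at_ts)"],
    # Attributes offered as facets to refine a search.
    "attributesForFaceting": [
        "_tags", "assignee", "created_at_ts", "milestone",
        "state", "updated_at_ts", "user",
    ],
}

print(json.dumps(index_settings, indent=2))
```

Keeping these settings in a file next to the project makes it easier to recreate the index later without clicking through the dashboard again.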