# Hands-On Tutorial: R Markdown Reports in Dataiku

## Overview

R Markdown is an R package for creating fully reproducible, print-quality documents that combine narrative text and code. The output can be shared on dashboards or delivered in a variety of static formats for offline reading.

It is an example of literate programming as it weaves together natural language with source code.

In this brief tutorial, we’ll create a simple R Markdown report in Dataiku. To view the final output, visit Haiku T-Shirt Analytics.

### Technical Requirements

* A proper installation of R on the server running Dataiku.

    + Visit R integration for installation instructions.

* An existing R code environment including the `ggplot2` and `magrittr` packages, in addition to the required `dplyr` and `dataiku` packages.

    + Visit Operations (R) for instructions on creating an R code environment.

* To download reports as PDFs: an installation of pandoc, along with the `adjustbox`, `collectbox`, `ucs`, `collection-fontsrecommended`, and `titling` LaTeX packages.

### Supporting Data

* The *Orders\_by\_customer* dataset.

    + The *Orders\_by\_customer* dataset can be found in the project **DSS Tutorials > Automation > Deployment**.

    + Alternatively, download *Orders\_by\_customer* and upload the dataset to a new blank project.

## Creating A New R Markdown Report

From the Deployment tutorial project (or from the new blank project, if you downloaded the data directly), create a new empty R Markdown report:

* In the Code menu (</>) of the top navigation bar, select **RMarkdown Reports**.

* Click “+ New Report” or “+ Create Your First Report”.

* Choose “Empty document” and type a name for the report, in this case `Haiku T-Shirt Analytics`.

You will be redirected to the R Markdown editor.

## The R Markdown Editor

The R Markdown editor is divided into two panes.

The left pane allows you to see and edit the markdown (including code) underlying the report.

The right pane gives you several views on the report.

* The **Preview** tab allows you to write and test your markdown in the left pane while having immediate visual feedback in the right pane. At any time you can save or reload your current markdown by clicking on the Save button.

* The **Log** is useful for troubleshooting problems.

* **Settings** allows you to set the output format of the preview. You can also set the code environment, if you want it to be different from the project default.

## Writing An R Markdown Report

Let’s build the markdown and code behind the report. In this section, we’ll add three types of content:

* Metadata inside a YAML header, wrapped by `---`

* R code chunks, wrapped by three backticks, `` ``` ``

* Narrative text with simple markdown formatting

### Defining the Document Metadata

Start with the YAML header, demarcated by three dashes, `---`, to define document metadata.

In the left pane, insert the following code to define document properties, including the title, author name, date, and how to handle certain types of output.

* The report date specification uses R code to insert the current system date.

* When generating PDF output for this report, it should include a table of contents.

```yaml
---
title: "Haiku T-Shirt Analytics"
author: "Dataiku Learn"
date: "`r format(Sys.Date())`"
output:
  pdf_document:
    toc: true
---
```

This YAML header defines only a few properties, but it can control many options such as the formatting of sections, figures and tables.
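The inline expression in the `date` field is ordinary R, so its behavior can be checked in any R console. The sketch below is an illustration rather than part of the report; the fixed date and the alternative day/month/year pattern are assumptions chosen for the example.

```r
# Illustration only: how the inline `date` expression formats a Date.
# A fixed date (an assumption, for reproducibility) stands in for Sys.Date().
example_date <- as.Date("2024-03-15")

# Default formatting, as used in the YAML header: YYYY-MM-DD
iso_date <- format(example_date)

# A numeric day/month/year pattern, if a different layout is preferred
dmy_date <- format(example_date, "%d/%m/%Y")

print(iso_date)
print(dmy_date)
```

To use a different layout in the report, the same format string could be passed inside the inline expression in the header.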

### Importing the Necessary Packages

In an R Markdown document, three backticks demarcate the beginning and end of a code chunk.

In the left pane, insert the following code chunk to import the R packages that will be used to generate the report output.

````markdown
```{r echo=FALSE, warning=FALSE, message=FALSE}
# Pull the necessary libraries
library(dataiku)
library(magrittr)
library(ggplot2)
library(dplyr)
```
````

Each code chunk specifies the language (in this case, R) and any additional parameters that apply to that chunk. Parameters such as `echo=FALSE`, `warning=FALSE`, and `message=FALSE` exclude the code itself from the final output and suppress any warnings or messages.
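When every chunk in a report shares the same parameters, they can also be set once as defaults. The sketch below relies on standard `knitr` behavior rather than anything specific to this tutorial, and it assumes the `knitr` package (which R Markdown uses for rendering) is available in the code environment.

```r
# Sketch: set default chunk options once instead of repeating them per chunk.
# Assumes the knitr package is installed in the code environment.
library(knitr)

opts_chunk$set(
  echo = FALSE,     # do not show the R source in the output
  warning = FALSE,  # do not print warnings
  message = FALSE   # do not print messages (e.g. package start-up notes)
)

# Inspect the defaults that subsequent chunks will inherit
str(opts_chunk$get(c("echo", "warning", "message")))
```

Placed in the first chunk of a report, this leaves later chunk headers free to list only the options that differ from the defaults.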

### Report Introduction and Data Import

The third type of content in an R Markdown report is narrative text (markdown).

In the left pane, insert the following code chunk and line of text.

````markdown
```{r echo=FALSE, warning=FALSE, message=FALSE}
# Read the Dataiku dataset we want to use
df <- dkuReadDataset("Orders_by_customer", samplingMethod="head", nbRows=1000000)
```

This report is prepared for the executives of the Haiku T-Shirt company to apprise them of the current state of customer analytics.
````

The code chunk uses the `dkuReadDataset()` function to read the *Orders\_by\_customer* dataset in the same way an R code recipe would. Outside of the code chunk, text forms the body of the report.
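Because the later chunks depend on specific columns, it can help to confirm they are present before plotting. The following sketch is not part of the tutorial: a small made-up data frame stands in for the result of `dkuReadDataset()` so that the check can run anywhere, and `required_cols` (a name chosen here) lists the columns referenced by the code later in this tutorial.

```r
# Hypothetical stand-in for the data frame returned by dkuReadDataset();
# the values are invented, only the column names mirror the tutorial's code.
df <- data.frame(
  ip_address_country = c("United States", "China", NA),
  campaign = c(TRUE, FALSE, TRUE),
  gender = c("F", "M", "F"),
  total_sum = c(120.5, 40, 75.25),
  stringsAsFactors = FALSE
)

# Columns used by the chunks later in the report
required_cols <- c("ip_address_country", "campaign", "gender", "total_sum")

# Stop early with a readable message if any expected column is missing
missing_cols <- setdiff(required_cols, names(df))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Quick overview of size and column types
str(df)
```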

### Basic Reporting on Customer Location

Now let’s build the first main section of the report.

In the left pane, insert the following block of code and text:

````markdown
# Customers by Country

The following bar chart shows that:

- the United States is our largest market
- the agglomeration of all other countries where we have fewer than 100 customers accounts for more business than any other single market
- China is the next largest market

```{r echo=FALSE, warning=FALSE, message=FALSE}
df %>%
  count(ip_address_country) %>%
  filter(n>=100) -> country_count

df %>%
  count(ip_address_country) %>%
  filter(n<100) %>%
  summarize(ip_address_country="Others",n=sum(n))%>%
  bind_rows(country_count) -> country_count

country_count$ip_address_country[is.na(country_count$ip_address_country)] <- "Unknown"
country_count$ip_address_country <- factor(country_count$ip_address_country,
                                           levels=country_count$ip_address_country[order(country_count$n)])

country_count %>%
  ggplot(aes(ip_address_country,n,fill=n)) +
  geom_bar(stat="identity") +
  coord_flip()
```
````

Now let’s analyze this block in detail, piece by piece:

* Outside of a code chunk, the hash symbol, `#`, marks a new heading in markdown.

* The text that explains the chart uses the `-` marker to create a bulleted list.

The R code produces the bar chart in several steps:

* Process the raw data frame to count the number of customers in each country, filtering out all countries with fewer than 100 customers, and saving to a *country\_count* data frame.

```r
df %>%
  count(ip_address_country) %>%
  filter(n>=100) -> country_count
```

* Count the total number of customers across countries with fewer than 100 customers each, and add them as an extra row in the *country\_count* data frame.

```r
df %>%
  count(ip_address_country) %>%
  filter(n<100) %>%
  summarize(ip_address_country="Others",n=sum(n))%>%
  bind_rows(country_count) -> country_count
```

* Recode the NA values for customers whose country is unknown to the string “Unknown”; then reorder the factor levels of the column *ip\_address\_country* by number of customers. The levels are sorted in ascending order of `n`, so once the axes are flipped, the country with the most customers appears at the top of the chart and the one with the fewest at the bottom.

```r
country_count$ip_address_country[is.na(country_count$ip_address_country)] <- "Unknown"
country_count$ip_address_country <- factor(country_count$ip_address_country,
                                           levels=country_count$ip_address_country[order(country_count$n)])
```

* Finally, create the bar chart of number of customers per country, with the coordinate axis flipped so that the bars are horizontal rather than vertical.

```r
country_count %>%
  ggplot(aes(ip_address_country,n,fill=n)) +
  geom_bar(stat="identity") +
  coord_flip()
```
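To see what the factor reordering step does, it helps to run it on a tiny example. The sketch below uses invented counts (not the tutorial's data) and only base R; it shows that `order()` arranges the levels from the smallest count to the largest, which is the order ggplot2 then follows along the axis.

```r
# Invented counts, for illustration only
country_count <- data.frame(
  ip_address_country = c("Others", "United States", NA, "China"),
  n = c(900, 1500, 120, 400),
  stringsAsFactors = FALSE
)

# Recode the missing country to an explicit label
country_count$ip_address_country[is.na(country_count$ip_address_country)] <- "Unknown"

# Reorder the factor levels by ascending count
country_count$ip_address_country <- factor(
  country_count$ip_address_country,
  levels = country_count$ip_address_country[order(country_count$n)]
)

# Levels now run from the smallest count to the largest
print(levels(country_count$ip_address_country))
```

With `coord_flip()`, the first level is drawn at the bottom of the chart, so the largest market ends up at the top.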

### Reporting on Customer Lifetime Spending

Now build the second graphic. In the left pane, insert the following markdown and code.

The R code in this section produces another bar chart, showing the average lifetime amount spent by customers, broken down by gender and by whether they are part of the company’s marketing campaign.

````markdown
# Customer Lifetime Spending

A quick look at the amount spent by customers shows that those targeted by the company's marketing campaign tend to spend much more than those who aren't. There does not appear to be a significant difference between genders.

```{r echo=FALSE, warning=FALSE, message=FALSE}
df %>%
  ggplot(aes(campaign, total_sum,fill=gender)) +
  # fun.y is deprecated as of ggplot2 3.3.0; on newer versions use fun="mean"
  geom_bar(stat="summary",fun.y="mean",position="dodge") +
  scale_y_continuous(name="Customer lifetime spending")
```
````
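The `stat="summary"` setting makes ggplot2 compute the mean of `total_sum` for each combination of `campaign` and `gender` before drawing the bars. As a rough cross-check, the same numbers can be computed directly. The sketch below uses base R's `aggregate()` on invented data, so the values are illustrative only; `toy` and `mean_spend` are names chosen for this example.

```r
# Invented data for illustration; the column names follow the tutorial's code
toy <- data.frame(
  campaign = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  gender = c("F", "F", "M", "F", "M", "M"),
  total_sum = c(100, 200, 180, 40, 60, 20),
  stringsAsFactors = FALSE
)

# Mean lifetime spending per campaign/gender group: these are the bar heights
mean_spend <- aggregate(total_sum ~ campaign + gender, data = toy, FUN = mean)
print(mean_spend)
```

Running the same `aggregate()` call on the real `df` would give the values plotted in the report's chart.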

## Publishing An R Markdown Report

When you are done editing, there are a number of options for distributing your report.

* Publish on a dashboard from the **Actions** dropdown at the top-right corner of the screen.

* Download to your local filesystem in one of a variety of formats, again from the **Actions** dropdown.

* Email as part of an automation scenario.

## What’s next

Congratulations! Using Dataiku, you have created an R Markdown report.

To view a completed version of this report, visit Haiku T-Shirt Analytics.

For further inspiration on what is possible in R Markdown reports, you can visit the R Markdown gallery.
