# Hands-On Tutorial: Geographic Processing with Dataiku

This tutorial demonstrates many of the visual geographic processors available in Dataiku.

## Workflow Overview

Using the data on French post offices from the Creating maps without code tutorial and the familiar Haiku T-Shirt customer data used in many examples, this tutorial reviews processors that:

* Create GeoPoints from lat/lon coordinates

* Extract lat/lon coordinates from GeoPoints

* Resolve IP addresses to geographic information like country and coordinates

* Calculate distance between two geographic points

* Perform a geographic nearest-neighbor join between two datasets with geographic coordinates

By the end of this brief walkthrough, your workflow in Dataiku should mirror the one below. Moreover, the completed project can be found in the Dataiku gallery.

## Supporting Data

The data in this tutorial come from two sources:

* The first dataset is *post\_offices\_prepared*, produced by following the data preparation steps in the Creating maps without code tutorial.

* The second dataset, *Orders\_enriched*, comes from the fictional retailer Haiku T-Shirts. You can download *Orders\_enriched* as a CSV file or export it from the Automation tutorial.

## Creating GeoPoints

Resume the project created in the Creating maps without code lesson.

Recall that this lesson used the **Create GeoPoint from lat/lon** processor in the Prepare recipe, *compute\_post\_offices\_prepared* .

This visual processor takes two columns of latitude and longitude coordinates as input and produces a GeoPoint ready for mapping and other spatial analysis.
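To make the idea concrete, here is a minimal Python sketch of what building a GeoPoint from two coordinate columns amounts to. It assumes the GeoPoint is stored as a WKT-style `POINT(longitude latitude)` string. The function name `make_geopoint` is hypothetical and is not part of Dataiku.

```python
def make_geopoint(lat: float, lon: float) -> str:
    """Build a WKT-style point string from latitude and longitude.

    Assumption: the target format is POINT(lon lat), where longitude (x)
    comes before latitude (y), as in the WKT convention.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: lat={lat}, lon={lon}")
    return f"POINT({lon} {lat})"


# The coordinates used later in this tutorial for the fixed office location.
office_geopoint = make_geopoint(48.8443079, 2.3685028)
print(office_geopoint)
```

Note the argument order: people usually say "lat/lon", while the point string lists longitude first. Swapping the two is a common source of points that land in the wrong place on a map.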

If you have not already done so, deploy the visual analysis script to the Flow, creating the output dataset *post\_offices\_prepared*.

## Resolving IP Addresses

In the previous tutorial, we successfully mapped the location of post offices in France. Now we want to compare those locations to the locations of our French customers.

In the same project, upload the *Orders\_enriched* dataset from the Haiku T-Shirt retailer.

This data includes information on orders made by customers, including the IP address of those customers. From this IP address, we can use the **Resolve GeoIP** visual processor to extract a geographic location for each customer.

* After uploading the dataset, create a new visual analysis in the **Lab**.

* To simplify the data wrangling, remove five columns we won’t need: *order\_date*, *pages\_visited*, *birthdate*, *user\_agent*, and *campaign*.

* Using the **Formula** processor, create a new column `total` using the expression `tshirt_price * tshirt_quantity`.

* Use the **Resolve GeoIP** processor on the *ip\_address* column, extracting the country and GeoPoint as new columns. Use `ip_address_` as the prefix for generated output columns.

* Using one of these new columns, it’s now easy to keep only rows where *ip\_address\_country* is France.

Deploy this script to the Flow, producing the output dataset *Orders\_enriched\_prepared*.
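For readers who think in code, the preparation steps above can be sketched in plain Python. This is only an illustration of the logic, not how Dataiku implements it. The `FAKE_GEOIP` table is a hypothetical stand-in for a real GeoIP database, and the sample rows, IP addresses (taken from reserved documentation ranges), and coordinates are invented for the example.

```python
# Hypothetical stand-in for a GeoIP database. A real lookup requires an
# IP-to-location database, which this sketch does not include.
FAKE_GEOIP = {
    "192.0.2.10": {"country": "France", "geopoint": "POINT(2.35 48.85)"},
    "198.51.100.7": {"country": "Germany", "geopoint": "POINT(13.4 52.52)"},
}


def prepare_orders(rows):
    """Add total and GeoIP columns, then keep only rows resolved to France."""
    prepared = []
    for row in rows:
        geo = FAKE_GEOIP.get(row["ip_address"], {})
        enriched = dict(row)
        # Formula step: total = tshirt_price * tshirt_quantity
        enriched["total"] = row["tshirt_price"] * row["tshirt_quantity"]
        # Resolve GeoIP step, using the ip_address_ prefix for new columns
        enriched["ip_address_country"] = geo.get("country")
        enriched["ip_address_geopoint"] = geo.get("geopoint")
        # Filter step: keep French customers only
        if enriched["ip_address_country"] == "France":
            prepared.append(enriched)
    return prepared


# Invented sample rows using column names from the tutorial.
orders = [
    {"customer_id": "c1", "ip_address": "192.0.2.10",
     "tshirt_price": 20.0, "tshirt_quantity": 2},
    {"customer_id": "c2", "ip_address": "198.51.100.7",
     "tshirt_price": 15.0, "tshirt_quantity": 1},
]
french_orders = prepare_orders(orders)
print(french_orders)
```

Rows whose IP address cannot be resolved get `None` for the country and are dropped by the filter.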

## Mapping Unique Customers

The **Resolve GeoIP** processor produced a location for each customer. The dataset, however, could use some cleaning. Before mapping, let’s perform a simple **Group By** recipe to get a dataset of unique customers.

* From the *Orders\_enriched\_prepared* dataset, initiate a **Group By** recipe.

* Choose to group by *customer\_id* and name the output dataset `unique_customers`.

* In the Group step, for Per Field Aggregations, choose the Sum of *total*, the First of *gender*, and the First of *ip\_address\_geopoint*.

* In the Output step, remove the “\_first” suffix from the *gender* and *ip\_address\_geopoint* columns for clarity.

* Run the recipe.
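The aggregation configured above corresponds roughly to the following Python sketch: one output row per *customer\_id*, with the sum of *total* and the first observed *gender* and *ip\_address\_geopoint*. The function name and the sample rows are hypothetical and serve only to illustrate the logic.

```python
def group_unique_customers(rows):
    """Group order rows by customer_id: sum of total, first of the rest."""
    grouped = {}
    for row in rows:
        key = row["customer_id"]
        if key not in grouped:
            # "First" aggregation: keep values from the first row seen.
            grouped[key] = {
                "customer_id": key,
                "total_sum": 0.0,
                "gender": row["gender"],
                "ip_address_geopoint": row["ip_address_geopoint"],
            }
        # "Sum" aggregation on the total column.
        grouped[key]["total_sum"] += row["total"]
    # Dicts preserve insertion order, so customers appear in first-seen order.
    return list(grouped.values())


# Invented sample rows for illustration.
orders = [
    {"customer_id": "c1", "total": 40.0, "gender": "F",
     "ip_address_geopoint": "POINT(2.35 48.85)"},
    {"customer_id": "c2", "total": 15.0, "gender": "M",
     "ip_address_geopoint": "POINT(4.84 45.76)"},
    {"customer_id": "c1", "total": 10.0, "gender": "F",
     "ip_address_geopoint": "POINT(2.35 48.85)"},
]
unique_customers = group_unique_customers(orders)
print(unique_customers)
```

Taking the first GeoPoint per customer assumes each customer orders from a single location. If a customer's IP address changes between orders, only the first resolved location is kept.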

We can visualize our progress with a quick map of the results.

* On the Charts tab of the *unique\_customers* dataset, create a **Scatter Map**.

* Drag *ip\_address\_geopoint* to the Geo field, *gender* to the color droplet, and *total\_sum* to the base radius field.

Now we have an interactive map of all customers in France colored by gender and scaled according to the total sum of all their purchases.

## Calculating Distance

To begin analyzing our potential shipping costs, let’s simply calculate the distance from the office to each customer.

* From the *unique\_customers* dataset, create a **Prepare** recipe.

* Use the **Compute distance between geopoints** processor on the *ip\_address\_geopoint* column.

    + This processor computes the distance from the selected column to either a fixed geopoint or another geopoint column. Choose **a fixed geopoint** with lat/lon coordinates of `48.8443079` and `2.3685028`, respectively.

    + Select kilometers as the output unit and name the column `km_to_office`.

Using the **Analyze** tool, we can see that the *km\_to\_office* column has an extremely right-skewed distribution, with the vast majority of customers less than 20 kilometers away and small numbers of customers hundreds of kilometers away.
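As a rough illustration of the kind of calculation involved, the sketch below computes great-circle distance with the haversine formula. This is an assumption made for the example: the tutorial does not state which formula the processor uses, so results may differ slightly from Dataiku's output. The names `haversine_km` and `km_to_office` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an approximation
OFFICE = (48.8443079, 2.3685028)  # fixed lat/lon used in this tutorial


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def km_to_office(lat, lon):
    """Distance from a customer location to the fixed office point."""
    return haversine_km(lat, lon, *OFFICE)


# Example: a customer located in Lyon (approximate coordinates).
print(round(km_to_office(45.764, 4.8357), 1))
```

The haversine formula treats the Earth as a sphere, which is usually accurate enough for estimating shipping distances within one country.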

## Geo-Joining Spatial Datasets

In order to ease our shipping costs, perhaps we could explore collaboration with the network of post offices. We can use the **Geo-join** processor to find the nearest post office to every customer.

* In the same Prepare recipe (*compute\_unique\_customers\_prepared*), use the **Extract lat/lon from GeoPoint** processor on the *ip\_address\_geopoint* column to produce two output columns: `customer_lat` and `customer_lon`.

* Use the **Geo-join** processor to match the nearest post office to each customer.

    + *customer\_lat* and *customer\_lon* are the columns from “this” dataset. They need to be joined with the *Latitude* and *Longitude* columns from the *post\_offices\_prepared* dataset.

    + Additionally, copy the columns *Libellé\_du\_site* and *GeoPoint*.

* In another step, rename the new columns for clarity:

    + *Libellé\_du\_site* to `nearest_post_office`

    + *GeoPoint* to `post_office_GeoPoint`

    + *join\_distance* to `km_to_post_office`.

* Run the recipe.
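The two steps above, extracting coordinates and matching the nearest post office, can be sketched in Python as a brute-force nearest-neighbor search. This is a simplified illustration under stated assumptions: GeoPoints are taken to be `POINT(lon lat)` strings, distance is computed with the haversine formula, and the post office names and coordinates are invented. Dataiku's own implementation may differ.

```python
import math
import re

# Assumed GeoPoint format: "POINT(lon lat)" with plain decimal numbers.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def extract_lat_lon(geopoint):
    """Return (lat, lon) from a POINT(lon lat) string."""
    match = _POINT_RE.match(geopoint)
    if match is None:
        raise ValueError(f"not a recognized GeoPoint: {geopoint!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (spherical Earth approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(a))


def nearest_post_office(customer_geopoint, post_offices):
    """Compare the customer against every office and keep the closest one."""
    lat, lon = extract_lat_lon(customer_geopoint)

    def distance(office):
        return haversine_km(lat, lon, office["Latitude"], office["Longitude"])

    best = min(post_offices, key=distance)  # raises ValueError if list is empty
    return {
        "nearest_post_office": best["Libellé_du_site"],
        "km_to_post_office": distance(best),
    }


# Invented post offices for illustration; column names follow the tutorial.
post_offices = [
    {"Libellé_du_site": "Hypothetical Office A", "Latitude": 48.85, "Longitude": 2.35},
    {"Libellé_du_site": "Hypothetical Office B", "Latitude": 45.76, "Longitude": 4.84},
]
result = nearest_post_office("POINT(2.36 48.86)", post_offices)
print(result)
```

Comparing every customer against every post office is fine for small datasets. For large ones, a spatial index would avoid the full scan, which is beyond the scope of this sketch.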

In the output dataset, *unique\_customers\_prepared*, we can use the **Analyze** tool to examine the most common post offices and the distribution of the distance to the nearest post office.

* It seems nearly 40% of customers share the same nearest post office.

* Nearly all customers have a post office within 1 kilometer (according to their IP address).

Note

**Geo Join Recipe**

In addition to the Geo-join processor within the Prepare recipe, a more powerful alternative is the Geo Join recipe. For example, using the Geo Join recipe, you could configure a join condition that finds all post offices that are within (or beyond) a 1 km distance from the customer. For hands-on practice with the Geo Join recipe, visit Hands-On Tutorial: Geo Join.

## What’s Next?

Congratulations! You used a range of different visual geographic processors to determine the distance between customers and their nearest post office.

Review a read-only version of this project in the Dataiku gallery.

More information about geographic processing in Dataiku can be found in the product documentation on Geographic processors.
