Pulp Data


Here at qa2l.com, we love stories. History, sci-fi, fantasy—these are some of our favorite genres. But every now and then we go for a good data story as well!

Today's post is a set of three vignettes told by prominent data practitioners: Gary Angel, Stéphane Hamel, and David Bressler. Each story is a window into the trials and tribulations analysts go through when implementing or managing data solutions. If you like what you read, we'd love to hear your own war story and maybe feature it in our next edition.


Déjà V...isitorization All Over Again

When Gary Angel started Digital Mortar in 2016, he had no illusions about the many challenges that lay ahead on his Quest for Measuring the In-Store Customer Journey.

The nascent industry of capturing and analyzing the behavior of customers as they peruse store departments reminded Gary, in many ways, of the grizzled early days of web analytics. "Half of the time, most of what you'd find were problems in the data, not real stuff," he says.

We sat down with Gary to chat about the challenges that wait in store for this new industry, the parallels he draws to the data quality aspects of the digital analytics realm, and, of course, the gory details.

Correct identification of visitors/customers seems to be as big a challenge in retail analytics as it is in digital analytics. 

One of the ways in which stores identify customers is through Wi-Fi access points. Smartphones send out passive pings at regular intervals. If a store has two or more Wi-Fi access points, it can triangulate the signal from a visitor's phone and map the customer's location as they move through the store. One advantage of this approach is that no new hardware is needed if the store already has Wi-Fi access points. But in 2016, Apple updated iOS to issue a different MAC address with each passive ping, making it impossible to stitch together pings from the same phone/customer.
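As a rough illustration of the triangulation step (a minimal sketch, not Digital Mortar's actual method), the snippet below converts signal strength (RSSI) to a distance estimate with a log-distance path-loss model, then solves a linearized least-squares system for the phone's position. All parameter values and coordinates are illustrative assumptions.

```python
import numpy as np

def rssi_to_distance(rssi, tx_power=-40.0, path_loss_exp=2.0):
    """Estimate distance (meters) from RSSI via the log-distance
    path-loss model. tx_power is the assumed RSSI at 1 m."""
    return 10 ** ((tx_power - rssi) / (10 * path_loss_exp))

def trilaterate(aps, distances):
    """Least-squares (x, y) position from access-point coordinates
    and estimated distances.

    Each AP gives a circle (x - x_i)^2 + (y - y_i)^2 = d_i^2.
    Subtracting the first circle from the others linearizes the
    system:  2(x_i - x_0)x + 2(y_i - y_0)y
             = d_0^2 - d_i^2 + x_i^2 - x_0^2 + y_i^2 - y_0^2
    """
    aps = np.asarray(aps, dtype=float)
    d = np.asarray(distances, dtype=float)
    A = 2 * (aps[1:] - aps[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + (aps[1:] ** 2).sum(axis=1) - (aps[0] ** 2).sum())
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos
```

With noisy real-world RSSI readings the circles rarely intersect cleanly, which is why a least-squares solve (rather than exact intersection) is the usual choice.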

Another collection method employed in retail analytics is cameras. They do a good job of keeping counts ("How many customers entered the store?") but have difficulty answering questions that span more than one camera zone ("How many customers passed through the shoe department in the last hour?").

These approaches, and their inherent limitations, are all eerily reminiscent of the early days of web analytics. For starters, there is a dependency on systems that were not built with measurement in mind: web server logs in the case of web analytics, Wi-Fi access point logs and cameras in the case of retail analytics. The similarities carry over into visitorization methods: digital analytics used to key off the IP address; retail analytics struggles with its dependency on the MAC address. It is not surprising that even the cop-outs sound similar: "Focus on the trends, not the absolute numbers," or "Extrapolate all customer counts based on Android counts."

An important lesson Gary shares from the evolution of the analytics industry is that in order to bypass such obstacles, one must first understand the limitations of the existing systems. Knowing how and why the data is bad is instrumental in deciding whether the data can or should be extrapolated, trended, or otherwise adjusted. Long term, collection technologies designed with measurement in mind need to evolve from that understanding.

The digital analytics field has come a long way since relying on IP addresses in web server log files. Specialized tracking cookies, visitor co-ops, and enhanced processing capabilities have made it possible to stitch together visitor activity even across different devices. Similarly, in-store analytics is beginning to adopt hybrid methods of data collection, featuring passive sniffers as well as a new generation of cameras and image-processing software.


Order Confirmation Page Lost Its Tracking? Keep Calm and Estimate


It was a Friday night when one of David Bressler's clients rolled out a major site redesign. The redesign spanned several properties: a Ticketing system, a Vacation Package system, and a Merchandise Store. All of them used Adobe Analytics but with varying implementation standards, some employing DTM and others a hard-coded implementation of the s_code/AppMeasurement JavaScript includes.

The following Monday, it was discovered that the Merchandise Store was no longer reporting any revenue or orders. David's team at Net Conversion was called in and quickly diagnosed the issue: most of the Merchandise Store pages were using DTM, but the confirmation page was using a hard-coded s_code file hosted on a server that had been decommissioned as part of the upgrade.

In trying to bridge the data gap, David's team devised a creative approach. They analyzed sales data for the three months prior to the blackout and discovered a strong correlation between visitors to the billing page and visitors to the order confirmation page (r = 0.98). From this they built a linear equation to estimate both Orders and Revenue directly in the Adobe interface, using a combination of Adobe's Calculated Metrics and Custom Segments.
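The idea behind that estimation can be sketched as follows. The daily counts below are invented for illustration, and the `numpy` fit stands in for the Calculated Metric David's team actually built inside Adobe:

```python
import numpy as np

# Hypothetical daily counts from the three months before the outage
# (invented numbers, for illustration only).
billing_visits = np.array([120, 135, 98, 150, 110, 142, 127])
orders         = np.array([ 96, 109, 80, 121,  88, 115, 101])

# Fit orders ~ a * billing_visits + b, and measure the correlation
# that justifies using billing-page traffic as a proxy.
a, b = np.polyfit(billing_visits, orders, 1)
r = np.corrcoef(billing_visits, orders)[0, 1]

def estimate_orders(visits):
    """Estimate order counts during the tracking gap from
    billing-page visits, which were still being tracked."""
    return a * visits + b
```

The same fit, applied to average order value, yields a revenue estimate; the strength of the correlation (here near 1) is what makes the extrapolation defensible.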

David's takeaways from this anecdote: there are always creative ways to approach data loss, and setting up automated alerts to detect data gaps early goes a long way toward containing such issues.
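A gap alert of the kind David recommends can be as simple as comparing each day's order count against a trailing baseline. This is a crude sketch under assumed thresholds, not a production monitoring system:

```python
def detect_data_gap(daily_orders, window=7, threshold=0.5):
    """Return indices of days whose order count falls below
    `threshold` times the trailing `window`-day average --
    a crude stand-in for an automated data-gap alert."""
    alerts = []
    for i in range(window, len(daily_orders)):
        baseline = sum(daily_orders[i - window:i]) / window
        if baseline > 0 and daily_orders[i] < threshold * baseline:
            alerts.append(i)
    return alerts
```

Had such a check been running, the Merchandise Store's zero-revenue weekend would have fired an alert within a day instead of being discovered on Monday.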


"Business Without Ethics is Not Worth Doing"


As the original author of WASP back in 2006, and more recently of Da Vinci Tools, Stéphane Hamel has probably seen more data gore than a room full of analysts combined.

 
Stéphane tells us about a training/coaching workshop he conducted for an agency. For such sessions, he always recommends using real client data in order to provide real-life examples and to bring the workshop up to a proper digital analytics maturity level.


As he was introducing the Google Analytics reports, he was also showing where the data originated, using a browser and WASP. Then he noticed something odd: some pages didn't have any tags at all, while others had the older GA tracking code plus the newer library. In other words, some of the data had black holes and some of it was bloated by double-counting, resulting in unreliable insights!
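A tag audit of the kind WASP performs can be approximated with a crude classifier over a page's HTML. The string signatures below are simplified assumptions, not WASP's actual detection logic:

```python
import re

# Simplified signatures for the legacy (ga.js) and newer
# (analytics.js) Google Analytics libraries -- illustrative
# markers, not a full tag parser.
LEGACY = re.compile(r"ga\.js|_gaq\.push")
MODERN = re.compile(r"analytics\.js|ga\('create'")

def audit_page(html):
    """Classify a page's GA instrumentation:
    'missing' (black hole), 'double' (bloated), or 'ok'."""
    has_legacy = bool(LEGACY.search(html))
    has_modern = bool(MODERN.search(html))
    if not (has_legacy or has_modern):
        return "missing"   # no tags: no data collected at all
    if has_legacy and has_modern:
        return "double"    # both libraries: hits sent twice
    return "ok"
```

Running such a check across a site's pages would have surfaced both failure modes Stéphane spotted: pages with no tags and pages firing two generations of tracking code.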



Stéphane pointed this out and insisted on the importance of telling the client. The audience was strictly internal, composed of account managers, web developers, and a few analysts. The initial reaction was “Oh no! We can’t tell the client! We would look unprofessional!” Stéphane's reaction was instantaneous and visceral: he was ready to pack his things and leave if they chose not to inform their client. By way of conclusion, he quotes a wise man he once met: “Business without ethics is not worth doing.”

QA2L is a data governance platform specializing in the automated validation of tracking tags/pixels. We make it easy to automate even the most complicated user journeys (flows) and to QA all your KPIs in a robust set of tests that is a breeze to maintain.
