Merging Data from Different Sources

May 3, 2010

In my Year in Review post I mentioned that I was going to focus less on specific tools and more on experiences that can be applied across different tools. In this post, I am going to cover a little bit of each: I will look at one specific web analytics tool and at how to merge data from different sources to get a more holistic view of the visitor. My experience of merging data from different sources comes mainly from a media site perspective, so that is what I am going to focus on. As some of you may know, Omniture allows you to use Data Sources to import information from another source into Omniture. That is one way to get an overall view of the customer. However, I am going to focus on an option from Omniture that is not as widely known, called the ‘Data Feed.’ The data feed is a great option if you want to export your Omniture data into another tool for further analysis.

Data Feed Information
The short description of the data feed is that it is the raw data for one report suite. Another way to think about the data feed is as the data that Omniture sees before any scrubbing has occurred, which means before any IP address blocking has taken place and before any VISTA (Visitor Identification, Segmentation and Transformation Architecture) rules have run. Each compressed data feed file (one per report suite) contains several tab-delimited text files. One file contains the traffic data and the other files contain reference data. Some columns in the traffic data file contain a numeric ID that references a value in one of the reference data files. The data feed can be sent to an FTP account as frequently (hourly) or as infrequently (yearly) as you want, though in my experience a daily feed is the most effective. The FTP account that the data feed file is sent to can be either one that Omniture sets up or one that your company sets up. As visitors will be viewing your site at all hours of the day, some visits might not have finished when the data feed starts to run. Visits that were not completed when the data feed started running will be included in the next data feed file. The size of the file can vary greatly depending on how much traffic the report suite is collecting. On average, though, the data feed runs about 500 bytes per page view, or roughly 500 MB for 1M page views per day. To get more information about setting up a data feed for your company, I would contact your account manager.
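
To make the ID lookups and the size estimate a little more concrete, here is a minimal sketch in Python. It assumes a traffic data file with one numeric ID column and a two-column reference file. The sample contents, the column names (page_url, browser_id) and the helper functions are made up for illustration and are not the actual data feed layout.

```python
import csv
import io

# Hypothetical sample data; real feed files and column names will differ.
TRAFFIC_TSV = "page_url\tbrowser_id\n/home\t1\n/news/story\t2\n/home\t1\n"
BROWSER_LOOKUP_TSV = "1\tFirefox\n2\tInternet Explorer\n"


def load_lookup(text):
    """Read a two-column, tab-delimited reference file into an ID -> value dict."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def resolve_ids(traffic_text, lookup, id_column, new_column):
    """Replace a numeric ID column in the traffic data with its readable value."""
    reader = csv.DictReader(io.StringIO(traffic_text), delimiter="\t")
    rows = []
    for row in reader:
        row = dict(row)
        raw_id = row.pop(id_column)
        # Keep unknown IDs visible rather than silently dropping the hit.
        row[new_column] = lookup.get(raw_id, "unknown (%s)" % raw_id)
        rows.append(row)
    return rows


def estimated_feed_bytes(page_views, bytes_per_page_view=500):
    """Rough size estimate based on the ~500 bytes per page view average."""
    return page_views * bytes_per_page_view


browsers = load_lookup(BROWSER_LOOKUP_TSV)
hits = resolve_ids(TRAFFIC_TSV, browsers, "browser_id", "browser")
for hit in hits:
    print(hit["page_url"], hit["browser"])
print(estimated_feed_bytes(1000000) / 1e6, "MB")
```

With real files you would read from disk instead of from strings, but the idea is the same: load each reference file once, then swap the IDs in the traffic data for their readable values.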

Business Case for Using Data Feed
Working for a media company presents some challenges that companies in other verticals do not necessarily face. A little while ago, the core site team and the online marketing team came to me wanting to know how much revenue we were making per page view and per visit. For a lot of companies this is a pretty easy calculation, but for a media site it can be a little challenging, especially with the systems that we are using. The other reason it can be challenging is that the revenue can change on a page by page basis, depending on the rate for the ad impression. The first hurdle that I had to overcome was the system that we use to display ads on our site, DART for Enterprise (DE). DE is exactly the same thing as DART for Publishers (DFP) except that DE is housed on our servers and we do not get updates to the software as often. We still have the same flexibility to target the ads to meet the client’s needs. Since the DART data sits on our servers, it is a little more difficult to export the data out of DART and into Omniture. After doing a little digging and making a few phone calls, I learned about the Data Feed that Omniture offers. After thinking about how best to use the data feed, I talked to our Business Intelligence team to see what we could work out. After running through some of the different options, we ultimately decided to send the data feed to our FTP account and to upload the data into Business Objects. Once the data was loaded into Business Objects, we could link the Omniture data to the DART data, since one column is the same in both files. Remember that, as I mentioned above, the data feed is the raw data before any scrubbing has been done.
One of the first things that we had to do was apply the same VISTA and/or IP address exclude rules to the data feed that we apply to the Omniture data, to get the data feed numbers as close as possible to the numbers in SiteCatalyst. Once both sets of data were as close as possible, we could merge the DART data and the Omniture data in Business Objects. Once the data is merged together in one table, we can calculate the revenue per page view and per visit. Also, since we are using the data feed, which has all of the data, we can calculate revenue per page view per traffic source. This information allows us to see which pages generate the most revenue and which traffic source is driving the most revenue. A particular traffic source may be driving a lot of page views, but are those high value pages or lower value pages? This information then lets us know which traffic sources we need to focus on to drive more traffic to higher value pages.
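
The steps above can be sketched roughly in Python. This is not what we run in Business Objects, just an illustration of the logic under some assumptions: the field names (ip, visit_id, page_id, source), the sample rows and the excluded IP address are all made up, page_id stands in for the shared column, and each page's ad revenue is spread evenly across its page views, which ignores the fact that rates can differ per impression.

```python
from collections import Counter, defaultdict

# Hypothetical rows standing in for the data feed and the ad server export.
EXCLUDED_IPS = {"10.0.0.5"}  # assumed internal address to filter out
hits = [
    {"ip": "1.1.1.1", "visit_id": "v1", "page_id": "home", "source": "search"},
    {"ip": "1.1.1.1", "visit_id": "v1", "page_id": "story", "source": "search"},
    {"ip": "2.2.2.2", "visit_id": "v2", "page_id": "home", "source": "email"},
    {"ip": "10.0.0.5", "visit_id": "v3", "page_id": "home", "source": "direct"},
]
ad_revenue_by_page = {"home": 4.0, "story": 6.0}  # assumed revenue per page


def revenue_report(hits, ad_revenue_by_page, excluded_ips):
    # Step 1: mirror the IP exclusion rules so the counts line up with reporting.
    kept = [h for h in hits if h["ip"] not in excluded_ips]

    # Step 2: join on the shared column (page_id here) and spread each page's
    # revenue evenly across its page views (a simplifying assumption).
    views_per_page = Counter(h["page_id"] for h in kept)
    revenue_by_source = defaultdict(float)
    views_by_source = Counter()
    total_revenue = 0.0
    for h in kept:
        page = h["page_id"]
        per_view = ad_revenue_by_page.get(page, 0.0) / views_per_page[page]
        revenue_by_source[h["source"]] += per_view
        views_by_source[h["source"]] += 1
        total_revenue += per_view

    # Step 3: divide by page views, visits, and page views per traffic source.
    visits = len({h["visit_id"] for h in kept})
    return {
        "revenue_per_page_view": total_revenue / len(kept) if kept else 0.0,
        "revenue_per_visit": total_revenue / visits if visits else 0.0,
        "revenue_per_page_view_by_source": {
            s: revenue_by_source[s] / views_by_source[s] for s in views_by_source
        },
    }


report = revenue_report(hits, ad_revenue_by_page, EXCLUDED_IPS)
print(report)
```

In this made-up data the search traffic lands on the higher value story page, so it earns more per page view than the email traffic even though both sources reach the home page.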

