🥳 You have completed all topics in the Handbook. Click here to claim your certificate!

3. Structured Query Language (SQL)

Structured Query Language (SQL) is a language designed for accessing and manipulating structured data, particularly in a relational context. Understanding the basics of SQL is important for anyone who wants to take control of data analysis.

When you buy goods from a grocery store and the cashier registers them, the store’s application interfaces use SQL to update the transaction into a relational database and to update the inventory stock count.

When you log into your favorite app on your phone, that app uses SQL to fetch your user data records from the user database.

When you start building your own app, one of the first things you’ll do is create a database schema with some tables and entities – with SQL, of course.

And when you use a data analytics system like Google Analytics or Snowplow, you are using SQL either indirectly through an analysis interface or directly by typing the queries yourself.

SQL is ubiquitous in the digital world. It’s a language that’s available in a million different contexts – from application code to spreadsheet scripts, from analytics interfaces to smart TVs, and from AI assistants to smart kitchen appliances.

For technical marketers, it’s one of the key skills to learn. For a data-savvy marketer, SQL offers a way to communicate directly with the data structures of an organization. Without direct SQL access, the marketer is forced to rely on the user interfaces and the abstractions of the tools and services that the organization uses. While this is not an issue for many use cases, it does become limiting in the long run, especially if the marketer wants to have direct data access to verify or validate the results these tools offer.

In this Topic, you’ll learn the key concepts of SQL as well as some of the more ontological insights that structured data can introduce to your “data-informed” world view. As with all languages, SQL is more than just a communication tool that lets you query data. It’s also a way to understand and to structure the world.

Don’t miss this fact!

Analytics systems often have reporting interfaces that let you pull in data from the data warehouse without having to write any SQL. This is very useful. However, remember that these systems build their own abstractions against their own schemas. By learning SQL, you can verify and validate how these systems work so that you can understand them better and so that you can structure the source data in optimal ways.

Validate and verify with SQL

When you use an analytics system for your reporting needs, you are relying on abstractions and visualizations determined by the system itself. Some analytics tools let you edit these definitions to some extent, but ultimately these tools draw up their own queries against their own data using their own preconceptions about what those queries should look like.

In many cases, this is just fine. After all, as long as the analytics system produces consistent results it doesn’t necessarily matter how the sausage gets made, right?

However, these systems are often designed for a global population of users with vastly different business needs and market fits. Vendored analytics systems can’t tailor to unique business cases of each individual customer – instead, they choose a more homogenous approach with dimensions, metrics, and queries that are generic enough to suit the majority of their customers without singling out any one of their unique approaches.

But what if you do want to be singled out? What if you want your analytics system to relate details about your data in a way that goes against the grain of what is generally accepted?

What if you don’t agree with the analytics system’s definitions? What if you believe your business questions to be unique enough to require flexibility from the analytics system that the user interface doesn’t offer?

This should always be the case! You should always be critical of the systems that you use and the abstractions they rely on.

Unfortunately, many analytics systems simply don’t give you the freedom of interacting with data in any way you like. Google Analytics, for example, places restrictions on what dimensions and metrics you can query together. Similarly, its key aggregation units of “Users” and “Sessions” are based on algorithms that you might not agree with.

This is where direct data access becomes your friend and ally. By using SQL on the data produced by these analytics systems, you can break the definition lock and choose your own approach without being hamstrung by the user interface of the analytics system.

-- Count users as the number of different "customer_id" values
      key = 'customer_id')) AS users

In the query above, the aggregation unit of users is redefined by counting the number of unique customer_id values in the database. The analytics system itself doesn’t let you change the key metric “Users” to this in the user interface, so you need to resort to SQL to get the job done.

SQL lets you complement the analytics interfaces of the tools you use with your own approaches to querying the underlying data. SQL is like your second opinion generator – you can use it to validate and verify the aggregations and visualizations available in the analytics user interface.

Deep Dive

Fixed schema of the GA4 session

A solid example of an analytics system forcing you to adopt its schema is the “Session” definition of Google Analytics.

Sessions play a key part in Google Analytics. They bucket multiple events from the same user together, as long as they happened within a certain timeframe. Sessions are often called “visits”, too. Thus, they can be used as an analogy to a single “user journey” on the site or the app.

In Google Analytics 4, the definition of a session is very simple: a session expires after 30 minutes of no events collected from the user. In other words, when GA4 collects events from a user, they are always grouped into the same session until 30 minutes pass with no events collected from the user. At that point, the session ends. The next time the user sends an event to GA4, it will be part of a new session.

While this 30 minute timeout can be changed, it will always be the same for all your users.

The problem with this is how incredibly abstract and generic this definition is.

The custom approach groups the purchase journey into the first session and the video interactions into the second session.

It doesn’t take into account the interactions that happen within the session, nor does it consider visits from different sources (e.g. first visiting the site from a Google search and shortly after from a Facebook ad), and it doesn’t stop the session when a clear milestone is reached (such as a purchase conversion).

When you use GA4, you will always be subject to this definition of a session. If you want to break free of this abstraction, you need to directly access the underlying data to reconstruct this aggregation.

SQL syntax

For data queries, SQL syntax is quite readable. It relies on English language patterns rather than obscure programming shorthand.

In SQL, you SELECT data FROM a table.

Sometimes you can further qualify which source data is accessed by instructing the query to pull in data WHERE the data matches certain criteria.

transaction_table source
-- Count the number of purchases for each user
  COUNT(*) AS purchases
query result

Often, the data you query has relations to other data – either in the same table or in other tables. For example, you might have a table of purchases where each transaction is associated with a user_id. This user_id can also be found in a table of users with additional details about the user, such as their country or the number of times they’ve already made a purchase.

In these cases, you might end up building a query that SELECTs data FROM a table that is JOINed with another table for additional information.

user_table source
-- Count the number of purchases by country
  COUNT(*) AS purchases
  user_table JOIN transaction_table USING user_id
query result

In the query above, each purchase is associated with the country of the user who made the purchase by fetching the country for each user_id from the related user table.

SQL can get very messy very fast. Often, the readability, quality, and performance of any given query is directly related to the quality of the data being queried. If the data is messy, lacking structure, and spread across a number of tables, your queries will become structurally unviable, too.

In this way, your understanding of SQL can directly impact how you organize the data engineering efforts in your organization.

If you find it difficult to write efficient queries against your data, it might be a symptom of an unnecessarily complex data architecture in your organization. If you run into roadblocks consistently, you might need to consider altering the architecture to make querying the data less of a chore. After all, in many organizations it is precisely the querying of the data that should dictate its organization rather than cost or complexity of storage.

Don’t miss this fact!

If you find yourself struggling with creating a clean and readable SQL query, it might not be because of your lack of skills with the language. A potential culprit is messy source data that requires a lot of wrangling before it produces a clean result.

Digital marketing use cases

Where there is an analytics question, there is a SQL query.

In digital marketing, having access to the raw data generated by your marketing tools is invaluable. Couple this data access with decent SQL skills, and you can vastly improve your understanding of the efficiency of your marketing channels and how they interact with each other.

Here is a non-exhaustive list of things you can do with SQL when querying relevant data sources.

Customer Behavior Analysis

Segment audiences based on their actions on your sites and in your apps. Combine this with their purchase history and demographic information. Feed these segments into your campaign management tools for more control over audience building and targeting.

Query user data across online and offline data tables to build a more comprehensive look into their lifetime value to your organization.

Generalize observations across larger datasets that might suffer from gaps or patches of bad data.

Campaign Analysis and Optimization

Pull data from analytics, advertising, and social media data sources, and join it together around users and traffic sources. Analyze the performance of your marketing campaigns across different channels.

Calculate the return on investment (ROI) of your campaigns by aligning marketing spend with outcomes like sales and conversions across different channels.

Trends and Consumer Insights

Use machine learning to analyze patterns and changes in consumer behavior over time. Detect anomalies before they become disruptive problems for your organization.

Feed data from social media and customer reviews into dedicated systems that can gauge public sentiment towards your brand.

Personalization and Content Strategy

Analyze which type of content performs best with different audience segments. Design your content strategy and content calendars around these insights.

Create personalized and near-real-time product recommendations based on the user’s interactions with your brand both in the past and in the current session.

Fuel Marketing Integrations

Use SQL for automated data extraction, transformation, and reporting in supported marketing tools and platforms. Build scheduled pipelines that deliver the right type of the data at the right time to the right person.

Build useful dashboards using SQL on the source data in a data visualization tool that supports this approach.

As a technical marketer, you are most likely working with SQL on a daily basis. You might not type the queries yourself, but the tools and interfaces you interact with will most certainly query the underlying data using SQL.

SQL can be revelatory not only in how it validates and verifies the underlying data but also how it helps you restructure the data to be more performant when accessed through a query interface.

SQL is thus more than just a language – it’s an approach to organizing the data within your organization. The act of querying is the end result of a long chain of decisions that begins with data architecture design. At its best, SQL can be a powerful feedback loop into improving the data structures that your organization relies on.

Key takeaway #1: Use SQL to skip the analytics UI

Analytics tools build their user interfaces as abstraction layers on top of the raw data. Often, there’s a lot of processing, thresholding, sampling, and normalizing going on that turns the thing you see in the UI quite different from what is actually stored in the data tables. With SQL and raw data access, you can build your understanding of the underlying data. You can generate your own abstractions that work better with your business questions than those used by the analytics tool.

Key takeaway #2: SQL makes you a better measurement planner

While it’s tempting to think of SQL as just a query language, it’s also a way to structure the world, too. The more you understand about the declarative logic of SQL, the better you can organize your data to support the query use cases. Your knowledge of SQL can feed directly into how you build your measurement plans. Instead of looking at data as arbitrary rows and entities that hopefully make sense once collected, you can approach data collection from the POV of how you’ll eventually want to query the data. And here, a solid understanding of SQL will be instrumental in making your measurement plans more comprehensive.

Key takeaway #3: Digital marketing use cases

SQL is everywhere in digital marketing. From data analysis with data from tools like Google Analytics and Piwik Pro to running dashboard reports in Looker Studio or Tableau, from building machine learning models on complex datasets to starting complex data flows in a cloud data pipeline. There’s not a single data-related discipline in digital marketing that wouldn’t benefit from SQL knowledge.

Quiz: Structured Query Language (SQL)

Ready to test what you've learned? Dive into the quiz below!

1. What does the "JOIN" clause do in SQL?

2. Why would you use SQL if you already have an analytics tool that gives you all the answers?

3. What type of data is SQL best suited for?

Your score is


What did you think about this topic?

Thanks for your feedback!

Unlock Premium Content

Simmer specializes in self-paced online courses for technical marketers. Take a look at our offering and enroll in one or more of our courses!

Online course

Query GA4 Data In Google BigQuery

Learn SQL and how to put it to use with one of the most popular data warehouse systems around: Google BigQuery.
0/27 Topics completed

On the right track!

Complete all topics to claim your certificate.

Next Chapter