Skip to main content

Unlock the hidden potential of zipped csv files with Google Sheets

In digital marketing product data is often essential to make certain tasks or types of promotion possible. Publishers, advertisers and agencies often rely on simple text files containing product information or keyword lists to power their activity. That is sometimes harder than it should be when the file is ‘zipped’ or the file in in the ‘wrong’ format.

This guide will show you how to take back control of your data by using Google Sheets to import a product feed (zipped or otherwise), create a feed that is ‘perfect’ for your requirements and make it available for use elsewhere.

The example uses an affiliate product feed of refurbished phones, but it will work for pretty much anything.

The aim is to allow anyone with basic Excel or Google Sheets skills to get extra value from product data without relying on time consuming or expensive developers. If you can write simple Excel formulas and make basic pivot tables that is more than enough to create your own custom data source.

Note: If you just want the script, you can find it here: https://github.com/multiplicit-com/Sheets-Zipped-csv/.

What this guide will help you do:

  1. Load a file from an online location
  2. Unzip it (if required)
  3. Filter and alter the contents using a pivot table
  4. Create a live URL that will export the latest data as a csv file
  5. Make it update automatically

Requirements

You just need a few things to make this possible:

  1. A Google Sheets account
  2. An URL zip file which can be accessed from an online location
  3. Only one csv in the file
  4. The csv should be correctly formatted. It doesn’t matter how many rows or columns, as long as the layout is uniform

What is an AppScript?

AppScript is the scripting language used to unlock advanced features in all Google Workspace apps including Sheets, Docs, and Slides. It allows users to automate tasks, integrate with external data, and customize app behaviour. AppScript is largely based on JavaScript language runs on Google’s own servers so it is very easy for it to interact with Google services like Sheets and you won’t need an expensive server to use it.

Why are Zipped files so important? …Awin

Why is it so important to open zipped files? For affiliate marketers like myself the answer is just one word; Awin.

Awin (Affiliate Window for the old school) is by no means the only affiliate network on the market, but it dominates the UK and European markets and their feed system can only generate zipped files. There isn’t much point showing you how to do something like this if it doesn’t work for the biggest affiliate network in Europe.

If you don’t need to open a zipped file don’t worry; this method will work for normal uncompressed csv files as well.

Not just for affiliates

I approached this script from the point of view of a digital marketer, but it will work for almost anything.

If you are in the affiliate industry like me, it’s not just affiliates who might use this. In fact I originally wrote this script to improve the product feed for a merchant I was managing. You can use it to accept a raw merchant feed, process it, then supply it to the network for distribution.


The Apps Script

Google AppScript is scripting language that can manipulate your sheet in all sorts of ways. It is similar to VB scripts and macros in Excel, but because it is based on the JavaScript language it is even easier to understand. Arguably is is more accessible and because it runs in the cloud it could be considered more powerful too.

This very simple AppScript will load a zipped csv file into Google Sheets, with a few added features like sorting and adding extra columns containing formulas.

You don’t need to know how to code – just paste the script into Google Sheets and change one or two values, which are clearly labelled.

I originally included the script on this page, but as I improved it (and fixed the occasional bug) it needed a more consistent home. So now you can find the script on github: https://github.com/multiplicit-com/Sheets-Zipped-csv/.

Just copy the entire script from the Github page.

  • Lines that start with // have been ‘commented out’, and can be uncommented to add extra functionality (sorting and formulas).
  • Lines that start with //* are just notes and can be ignored. if you remove the comment tag for these lines the script will break.

Loading the script into Google Sheets

Step 1: Open the App Script Editor

Extensions > Apps Script


Step 2: Rename the App Script Project

Click “Untitled Project” at the top of the window to rename the project. This name will be used to manage access permissions, so make it descriptive enough that you will recognise it later.


Step 3: Copy the import function

You can find it on github.


Step 4: Paste the script into the script editor

Replace everything else in the window.


Updating and running the first import

To get the script running you simply need to review and change four lines of code. We’ll ignore the optional settings for now

The most important thing is the address of the product feed:

//* SPECIFY FILE URL TO DOWNLOAD
const TargetFile = 'https://site.com.co.uk/zip_file.zip';

The second line tells the script what type of import – whether it is csv, tab delimited, zipped etc.
Simply set it to ‘1’ if it is zipped, or ‘0’ if it is a natural csv file. If you are following the example in the video using an Awin feed you won’t need to change anything.

//* CHOOSE IMPORT TYPE 
//* 0=csv, 1=zipped csv, 2=tab delimted txt
const ImportType =1;

By the way, the code here was designed to be extensible – it is easy to add support for new formats if you need to.

Next, does the file you are importing include column labels on the first row? If not change this value to 0:

//* DOES FILE HAVE A HEADER ROW? 
//* 0=No, 1=Yes
const HasHeader =1;

The last thing to confirm is the destination for the imported data – which tab it should be inserted in.

//* Enter the name of the sheet/tab to import into
//* 'Sheet1 is the default for a new Google Sheet
const DataSheetName = "Sheet1";

If you didn’t change the name of the first tab in the spreadsheet you shouldn’t need to do anything. If you did change the default name of ‘Sheet1’ you need to update this row in the script to match. I don’t see any reason to change it, but if you want to you can.


Granting permissions

Now we need to give the script permission to run. The simplest way to do this is to attempt to run the script, and grant the necessary permissions when Google throws up an error.

Click ‘save’, then ‘run’. to execute the file for the first time.


Choose the Google account to provide access to.


You may receive a warning that this script has not been checked by Google and is not secure. Don’t worry, that is normal. This script really is so simple you shouldn’t be concerned – it doesn’t connect to any outside services and it uses the simplest of code.

Simply click “advanced” and click to continue.


Now review the permissions that the script is requesting and choose “Allow”.

You’ll know if it worked because the empty ‘sheet1’ tab in your sheet should now contain lots of data!


Processing

Now our data is available in Google Sheets. The next step is to do something with it. The obvious choice is to use a pivot tables to create a new, customised file.

What can you solve in the processing stage?

With a pivot table you can sort, filter and organise the data, and create new fields and columns using formulas (calculated fields). That means you can:

  1. Discard rows
  2. Combine several columns together
  3. Split or separate a column to extract one piece of information
  4. Match a field with a lookup table (vlookups)

The aim is to make a pivot table that looks exactly like the feed you want to create. In the next step we will simply export the tab or sheet containing the pivot and that will form your new csv file.


Calculated fields

It might be enough to use a pivot table to sort and filter the raw data, but sometimes you need to change the contents of a field. That’s where calculated fields come in.

Calculated fields in Google Sheets are actually simpler and easier to use than in MS Excel, especially when you are working with text content. You can even do Vlookups. The only real problem is that you have to squeeze the formula into a tiny text box, but hohum, can’t have everything.

In this method calculated fields perform two tasks:

  1. Showing a single value when there are several available
  2. Alter or combine fields using formulas

Adding a calculated value

  1. Find the ‘values’ section in the pivot table editor. Click ‘add’ and choose ‘calculated field’ (it’s at the top)
  2. Enter the formula
  3. Change the ‘summarise by’ method from SUM to ‘Custom’.

Your new formula should now show in the pivot table. You can rename it and change its position like any other value field.

Showing a single value

Like Excel, Google Sheets lets you put data in rows or columns of your pivot. However, that isn’t always a good idea. For example, if a product has 6 available colours, using values that are ‘too unique’ will create 6 rows in your pivot. It is common to just require one row and one link for all 6 colour options – the customer can always change the colour or size on the landing page.

In this instance you can use a calculated field. Is is super easy – simply choose a calculated field as described above and enter the field name you want to display after the = sign, like this:

=FieldName

The handy thing about calculated fields is there can only be one per row. Sheets will use the first value it can find, but there could be several possible candidates so make sure you read and understand the notes on sorting – this is the sort of situation where you can accidentally return the wrong results.

Also, don’t forget to change the ‘summarise’ method to ‘custom’ so Sheets doesn’t treat the value as a number.

Complex formulas

As a rule you can use any formula you would use in a normal Google Sheet or even Excel. There are a few limitations – ‘IF’ statements in particular seem to struggle when the output is a text string, but most formulas work just fine. If anything they are a bit more forgiving than in Excel.

I’m not going to spell out every possible formula – there is plenty of documentation online, but here are a few examples. If you have any special requirements please post them in the comments.

Perform a Vlookup against another sheet, default to the lookup value if there is no match:

=iferror(vlookup(field_nameA,Lookups!A:B,2,false),field_nameA)

Perform a Vlookup, but show a different field if there is no match:

=iferror(vlookup(field_nameA,Lookups!A:B,2,false),field_nameX)

Split a field at the first instance of a certain character (#):

=left(field_nameA,find("#",field_nameA)-1)

Combine number summary functions and text strings together:

="Colours: "&countunique(unit_colour)&" | Storage variations: "&countunique(storage_size)

//Output:
Colours: 5 | Storage variations: 3

Sorting

When you are using pivot tables it is very easy to manipulate the results. I regularly use the MIN() command to find the lowest price for each product. That is a really useful ability but it also gives your pivot table permission to use information from multiple rows.

When you use commands like this you have to be very careful not to accidentally combine several rows of data together.

For example, when using MIN() to find the lowest price the pivot table could actually be looking at the following cells:

The output might look correct at first but if you visited the link you would realise the price was wrong – it’s for a different product variation.

You can avoid this with careful pivot design and sorting. If your data uses commands like MIN() to find the lowest price, everything else should also be sorted by price (ascending) too. By sorting the whole data by price (asc) we can see everything now lines up:

All of this can be solved and implemented directly in the pivot table with some some careful filters and sorting, but the github script also makes it easy to sort the data at the point it is imported.

It is a bit of a blunt instrument but it can make make life a lot easier if the raw data is already in the correct order.

To sort at the import stage look for these lines in the script:

//* UNCOMMENT THESE ROWS TO SORT THE DATA
//* Just change the column numbers to what you want to sort by. A=1, B=2 etc.
//* this works best when the cell value is a number

//  var SortOrder = [
//     {column: 1, ascending: true}
//    ,{column: 2, ascending: true}
//    ,{column: 3, ascending: false}
//  ];

Simply uncomment the rows, delete any unused criteria and update the column number.

In this case we just want to sort by ‘price’, which is in my example data is column H, or the 8th column along.

var SortOrder = [
   {column: 8, ascending: true}
];

Next time you run the function, the imported data will be sorted. In this example the data will be sorted by price, ascending. Click Save and Run, and review the changed order in Sheet1.

Note that unlike a lot of Excel queries or standard JavaScript statements, the column numbers start at ‘1’, not ‘0’. So A=1, B=2, C=3 etc.

Even with pre-sorted data it is still easy to mismatch rows in a pivot table, just like in any spreadsheet. Make sure you test the output thoroughly before you use it in the wild.


Scheduling updates

Now your Google Sheet can load a product feed, unzip it and create a new feed customised to your requirements. It’s a really useful tool already, but it’s not much use if you have to manually update it to get the latest data.

Not to worry, the AppScript interface comes with an easy to use and very powerful scheduling tool that can automate the entire process.

There is also a much needed safety feature – the ability to receive instant notifications if the import process ever fails. You can choose to be notified instantly, or receive a combined summary of all errors on a daily or weekly basis. I usually choose to receive instant notifications, with a daily or weekly summary as well.

Step 1: Open the triggers screen

From the AppScript window choose “triggers” on the left hand side.


Step 2: Create a new trigger

Choose ‘Add Trigger’ to create a new schedule for your script.


Step 3: Customise the settings

You can choose whether to update the script when the file is opened, controlled it using a Google Calendar, or set a schedule.

The function should be called ‘LoadFeedZIP’ but if you have been tinkering that might have changed.

The scheduling options are very generous. You can even schedule updates every hour if you feel the need.


Exporting the feed

The final piece in the jigsaw is the ability to export our freshly polished data as a new csv file. This will usually be the tab containing the pivot table we created in the ‘processing’ section.

Step 1: Open the ‘Publish to web’window.

You can find it in: File > Share > Publish to web


Step 2: Specify what you want to share.

Change the options from “entire sheet” to just the tab containing your newly formatted data. In this example the tab is called “Output”.

The most useful export format for tabular data is ‘csv’, but other formats are available.

Make sure “automatically republish when changes are made” is ticked – that will keep your feed up to date when the data changes.


Step 3: Copy the live URL

Click ‘Publish‘ finalise your selections.

The URL in the text box is the final URL for your new product export.

This URL can be loaded into any platform and used as a standard csv file.


Conclusion

Congratulations! You now have a live URL for a dynamically updated csv file, tailored to your requirements. You can use Sheets to manipulate and explore the data or you could take that URL and use it in other platforms. That pretty much wraps up this short guide.

Anyone who is reasonably comfortable using Google Sheets or MS Excel you should be able to use this method to create custom data sources that are tailored to your own requirements.

If people find it useful I might follow up with more examples and ways to use it.

The scheduling and notification features in the Apps Script screen are really powerful and they are what make this method possible. They may appear quite simple but they are actually more flexible and useful than a lot of professional data processing tools.

It is really amazing that Google makes it available for free. The ability to send error notification emails doesn’t appear very exciting but it is especially important – without an effective way to spot issues processes like this invariably break and get forgotten over time.


Tech stuff

This script could be made much more advanced, but I have kept it simple to make sure as many people as possible can use and understand it. If you would like to see a more advanced version or have any suggestions to improve it please let me know in the comments.

My old computer science lecturer would call this a ‘mundane’ process. That’s not as bad as it sounds – it’s just a computing term that means it is ‘same generation’ solution that uses existing technology and skills without reinventing the wheel. Critically, you should have all the tools you need straight away – Sheets is Free to use and product feeds are easily available. Why make things more complicated than they need to be?

How it works

The script is basically made possible by applying the ‘getrange’ parameter in various ways to control what to change, what to copy and what to paste. You’ll note that first min/max size of the entire sheet for operations like clearing old data before switching to the range size of the imported csv data for sorting and adding extra columns.

If you’d like to update it or improve the script feel free – if you make something exciting please consider sharing it in the comments.


Why use pivot tables?

They are non destructive

A pivot table simply looks at a data source and presents it in different ways. They don’t actually change anything so the accuracy of your source data is preserved.

They work with unlimited rows

You could simply create a new column next to your imported data and manually create some formulas. However, as soon as the file changes and rows are added or deleted there will be missing or extra values in your output file. You would have to manually drag the formulas every time the data changed. Using a pivot table to perform sorts, grouping and calculated fields means you don’t lose anything if the size of the table changes.

They are an introduction to actual programming

A pivot table is simply an interface to some quite advanced grouping, sorting and filtering methods, and a really good way to learn what SQL can achieve. Eventually you will outgrow Google Sheets and want to migrate to a tool like Data Studio, PowerBI or even Access, and a solid understanding of pivots will make that so much easier.

If you don’t want to use pivots…

The query command is very powerful and arguably more flexible than pivots. There is a bit of a learning curve, but it lets you do things like merge several data sources and perform advanced grouping. Don’t discount pivot tables though – if nothing else they are good for prototyping your query command.


Limitations

Size limit

Google Sheets has a limit of 10 million cells in a sheet. Bear in mind that is cells, not rows – it’s a big number but it is possible for a very large file to hit that limit.
If you hit that limit you will see an error like this. If you activated error notifications in the ‘triggers’ screen you will also receive this information in an email.

If you do hit that limit consider removing unnecessary columns from the input file. Sometimes that is easy – I built this system to use Awin product feeds which let you specify which columns to use, but other times it may not be possible. It really depends where you get the data file from.

Badly formed data

There is very little error checking in this script so it does rely on being provided with clean, well formatted csv data. If you are using a csv file that was machine generated (like most affiliate product feeds), then you shouldn’t have anything to worry about. Likewise, if you use a pivot table to create the output the result should be equally well formatted.

Update frequency

I have never come up against any usage limits, but there isn’t much use setting the script to update every hour if it doesn’t need to. If it is a particularly large file it could uses a lot of your networks bandwidth for no reason.

The product data I used in the example video is updated once per day, so to be thorough without going over the top I set the Apps Script trigger to fetch every 12 hours.