Last year (sounds strange writing that already) on the 20th December we blogged about a potential Google Penguin roll back. We had noticed some chatter on Twitter and the webmaster forums which Barry Schwartz highlighted from around the 15th December in the US.
Some site owners and SEOs were reporting their organic rankings rolling back to where they were before the Penguin 4.0 rollout.
Whenever we see this type of chatter we always keep an eye out for what it may bring in Google.co.uk results, as the US usually sees this stuff just before we do.
Anyway, in summary we suggested that this was probably a dataset issue and would resolve itself. The last time this happened, around the 10th October, we saw very similar fluctuations.
Was it a Penguin Roll Back?
As we suspected, no, it doesn’t look that way. Last week we started to notice results settling down: no additional link building, no on-page changes, they just fixed themselves.
So what is a dataset error and why does it cause my rankings to fluctuate so wildly?
Okay, so the first thing I’m going to say is that these are purely my own personal thoughts and Google, as far as I am aware, has never said anything about dataset issues. But coming from a career originally in IT, I am familiar with how databases and systems work, especially when making updates to software.
So let me try and give you my understanding of Google dataset errors:
You have a server which has to communicate with many different workstations and devices on its network.
Quickly your server becomes overloaded so you need more servers, and you upgrade from a single server to a cluster of servers, also known as a datacenter. Your servers are running great and your devices are able to GET all the data and information they need held on your servers (datacenter).
But one day you decide that your email software needs updating and there are some security patches that need to be installed. Before you can do this, server #1, let’s say, fails, the rest of the network is hit with a DDoS attack and your entire network goes down. All hard drives fail except one redundant NAS, and data is corrupted from that day. Fortunately you’ve preempted this and have a cloud server backing everything up at 11:59pm the previous day.
Now you have to rebuild your network and get it back online so your business can operate.
The servers (datacenter) are back up and running, but Billy in human resources can’t find his emails from 7pm onwards. What happened to the data from after the last backup?
Well, fortunately the cloud has everything backed up and it’s just a case of downloading the backup and migrating it throughout your server network. The servers feed the desktop and mobile devices, but there’s a lot of them so it takes a day or two.
This is a dataset error: data missing since the last known update/backup.
Another simplified way to think of it is that your PC dies and you have to do a system restore, and the last data you had was from 2 weeks ago. You have to use the cloud or external drive backups to roll back to the last known data, but again it takes time to update (due to the size of the data payload).
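If it helps to see the idea rather than read it, here is a minimal Python sketch of the analogy above. It is purely illustrative of my own theory and is not how Google's systems are known to work. The `Server` class, the `sync_next` helper, the site name and the ranking numbers are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """One machine in the hypothetical cluster, holding a copy of the data."""
    name: str
    data: dict = field(default_factory=dict)

    def restore(self, snapshot: dict) -> None:
        # Replace whatever this server holds with a copy of the snapshot.
        self.data = dict(snapshot)


def sync_next(servers: list, latest: dict):
    """Bring one lagging server up to date; return its name, or None if all are current."""
    for server in servers:
        if server.data != latest:
            server.restore(latest)
            return server.name
    return None


def observed(servers: list, key: str) -> list:
    """What each server would answer if asked about `key` right now."""
    return [server.data[key] for server in servers]


# Hypothetical numbers: the site ranked 12th in the last backup, 4th in the latest data.
backup = {"example.com": 12}
latest = {"example.com": 4}

servers = [Server(f"server-{i}") for i in range(1, 4)]
for server in servers:
    server.restore(backup)                 # everything rebuilt from the old backup

print(observed(servers, "example.com"))    # [12, 12, 12]  everyone sees old data
sync_next(servers, latest)
print(observed(servers, "example.com"))    # [4, 12, 12]   mixed results while syncing
while sync_next(servers, latest):
    pass
print(observed(servers, "example.com"))    # [4, 4, 4]     settled again
```

The middle state is the interesting one: depending on which server answers, you see either the old position or the new one, which is what a ranking fluctuation would look like from the outside.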
Okay, making sense so far?
Now let’s scale that first example up from 1 datacentre and 7 servers to thousands and thousands of servers globally. This takes it from a few terabytes of data on a small network to figures like the estimated 320PB needed just to store YouTube’s videos on Google’s network.
What is a PB or Petabyte?
86MB * 4 (for profiles) * 1,000,000,000 = 320PB
To store all this data YouTube needs to have as of today at least 320PB of storage. From that we can estimate that they have roughly around 400PB in storage currently allocated for storing YouTube videos.
A petabyte (PB) is 10^15 bytes of data, 1,000 terabytes (TB) or 1,000,000 gigabytes (GB).
DAMN! That blows your mind doesn’t it?
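As a quick sanity check on the YouTube figures quoted above, here is a small Python sketch of the arithmetic. The function name and parameter names are my own. The interesting part is that the quoted 320PB only falls out if you count in binary units (1,024-based); with the 1,000-based definition of a petabyte the same inputs give 344PB.

```python
def storage_estimate(size_per_video=86, profiles=4, videos=1_000_000_000):
    """Return the estimate as (decimal petabytes, binary pebibytes)."""
    total_megabytes = size_per_video * profiles * videos
    # Decimal units: 1 MB = 10^6 bytes, 1 PB = 10^15 bytes.
    decimal_pb = total_megabytes * 10**6 / 10**15
    # Binary units: 1 MiB = 2^20 bytes, 1 PiB = 2^50 bytes.
    binary_pib = total_megabytes * 2**20 / 2**50
    return decimal_pb, binary_pib


decimal_pb, binary_pib = storage_estimate()
print(f"{decimal_pb:.1f} PB using 1,000-based units")    # 344.0 PB
print(f"{binary_pib:.1f} PiB using 1,024-based units")   # 320.4 PiB
```

Either way it is an estimate built on an assumed average video size, so treat the headline number as a ballpark rather than a measurement.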
So next time we see dataset errors in Google’s data, it could be down to something as simple as Google changing a font or colour on its search facility, which means the last known backed-up data is shown until the latest search algorithm data is transferred across.
This also explains why most of the time these dataset glitches only last for 2-3 weeks.
Google Algo updates we know happened around 15th December 2016
For anyone who missed this, it was reported that there was a problem with the results being pulled for anyone searching ‘did the holocaust happen?’. The answer, apparently, was no, and in fact Hitler didn’t do any of these things according to the organic search results.
Thankfully common sense at Google HQ prevailed and this was updated in the search data to show the truth and history we all know.
Google mobile first algo update being tested