Blogs

Big Data

Big Data - the next Big thing..?

  

 

So what is Big Data?

  

 

Every day an increasing amount of information is gathered and stored by all manner of websites, companies, organisations and products. We record data on everything from weather stations to shopping habits, and from social media to traffic trends. The volume of this data is growing at an alarming rate as we develop more complex programs for recording data down to the smallest detail for analysis, reporting and business intelligence, and the types of data that we can record are also evolving.

 

 

Every day, we create 2.5 quintillion bytes of data

 

 

The growing need for "trending" means that it is no longer enough to store an individual customer's details. Data for all customers must now be accessed quickly and analysed as a whole, to find trends and patterns across the entire collection of data, whether that is thousands, millions or trillions of values.

 

 

90% of the data in the world today has been created in the last two years alone

 

 

Structured/Unstructured Data
Data no longer means only the values and figures that can be represented in spreadsheets and graphs, as previously favoured. "Data" can now be stored as images, documents, emails, text messages, audio, video, etc. These various data types are commonly known as “non-structured” or “unstructured”, i.e. unable to be stored and analysed in a typical relational database. This is the real issue faced by today's companies: the data they have stored cannot be analysed in the traditional way, but simply ignoring this data would be foolhardy.

 

 

Many organizations are becoming overwhelmed with the volumes of unstructured information — audio, video, graphics, social media messages — that falls outside the purview of their “traditional” databases. Organizations that do get their arms around this data will gain significant competitive edge

 

 

For example, in the past medical records would be held in vast paper files containing X-rays, reports, doctors' notes, etc. Now those X-rays are recorded electronically and can be accessed from a doctor's PC without the need for any physical document store. One hospital will store and record many thousands of X-rays; one hospital group will have millions. Big Data is the means of retrieving this information easily, and of comparing and analysing it in a way that has never been possible previously.

 

 

“We are only looking at what we have in our data warehouses, it’s not going to be enough for us to get the insights that we need. If you’re a retailer and you were not using all the information you could to judge your customers’ buying patterns, then the retailer across the street probably will, and they’ll steal your customers. That’s the realization, I think, that drove a lot of people to think that they should be capturing much, much more” 

 

 

Data Timeline
The other demand for data is timeliness. The demand for “real-time” data is increasing: data that is 24 hours or even 1 hour out of date could be the difference between a successful business decision and a costly one. The accuracy of the information received often depends on the age of the data analysed. For example, a hotel needs up-to-the-minute room allocation data in order to receive new customers. If this data were 10 minutes old, a room could be allocated twice, or customers turned away when a room is actually available. For credit card companies, the ability to recognise fraudulent activity quickly and stop the stolen card saves them many thousands of pounds.
Real-time analytics is the big demand. The Holy Grail is “getting and making effective use of information as it happens.” Whoever can crack this will be the forerunner.
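To make the hotel example concrete, here is a minimal Python sketch of how a stale copy of the availability data can lead to a room being allocated twice. The room numbers, the function names and the idea of a "snapshot" are assumptions made purely for illustration; a real booking system would be considerably more involved.

```python
def allocate(room, availability_view, bookings):
    """Book `room` if the availability data we consulted says it is free."""
    if room in availability_view:
        bookings.append(room)
        return True
    return False


def simulate(use_stale_view):
    """Two guests ask for room 101; return the rooms that were booked twice."""
    live_free = {101, 102}        # the true, up-to-the-minute availability
    stale_view = set(live_free)   # hypothetical copy taken 10 minutes earlier
    bookings = []

    # Guest A is served from live data, so room 101 is removed once booked.
    if allocate(101, live_free, bookings):
        live_free.discard(101)

    # Guest B is served from either the stale copy or the live data.
    view = stale_view if use_stale_view else live_free
    if allocate(101, view, bookings):
        live_free.discard(101)

    return sorted(r for r in set(bookings) if bookings.count(r) > 1)


print(simulate(use_stale_view=True))   # room 101 is double-booked
print(simulate(use_stale_view=False))  # no double booking
```

With the stale copy the second guest is given a room that has already gone; with live data the second request is simply refused.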

 

 

'Data delayed' is 'Data Denied' 

 

 

Velocity – Often time-sensitive, big data must be used as it is streaming in to the enterprise in order to maximize its value to the business 

 

 

So now we understand that “Big Data” is the means of accessing and analysing large quantities of data regardless of size, format or timestamp

 

 

The next question is: how do you compare millions of images to find trends and patterns? In a traditional "structured" database this would not be practical, as a relational database is designed to analyse values in rows and columns, not images. For images and similar content we need a platform that can query "non-structured" or “semi-structured” data. New technologies are now emerging for storing and analysing large volumes of structured and non-structured data, but it is still early days.

 

 

Platform:
NoSQL (short for Not Only SQL) is the term used to describe non-relational databases that can handle both structured and unstructured data. There are several contenders for big data databases; some use no SQL at all, while others use certain parts of it but avoid joins, etc.

 

 

Contenders:
Apache Cassandra, a widely used NoSQL database originally developed at Facebook
Apache Hadoop (to be integrated with SQL Server 2012 – more on this in the next blog)
Amazon SimpleDB
Google Bigtable
MapReduce (a processing model rather than a database, and the one used by Hadoop)
MemcacheDB
Voldemort
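
MapReduce, listed above, is a way of processing data rather than a place to store it: each record is "mapped" to key/value pairs, the pairs are grouped by key, and each group is "reduced" to a single result. The following is a toy, single-machine Python sketch of that idea using a word count. The function names are invented for illustration and this is not how Hadoop itself is programmed; real frameworks spread these phases across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def map_reduce(records):
    """Run the three phases in sequence over a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(map_reduce(["big data", "Big thing"]))
```

Because each record is mapped independently and each key is reduced independently, the work can in principle be split across as many machines as are available, which is what makes the approach attractive for very large data sets.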

 

 

It has been predicted that “Organisations that will leverage the new data types will outperform their peers by 20% by 2015” - If you ignore big data, your competition will not

 

 

Once Big Data takes off, the possibilities it opens up will bring a lot of changes. Data recorded by “smart boxes” fitted to new cars could be accessed by insurance companies, with lower premiums offered to those drivers who can prove they are a safer risk. Medical data could be compared across millions of patients to identify high-risk groups for certain illnesses or diseases, and tests developed to pick up the warning signs earlier for those most at risk. GPS data could be analysed to predict traffic volume and flow at peak times, with event data built into our satellite navigation to warn of possible delays and live streaming from traffic cameras to our iPhones. Facebook photos could be used to identify benefit fraudsters.

 

 

The realization that time to information is critical to extract value from data sources that include mobile devices, RFID, the web and a growing list of automated sensory technologies

Recruitment Activity By Consultant Report

Spring has sprung the grass has riz.. I wonder where my data is....

 

Have you ever wondered how the data on the Recruitment Activity By Consultant is populated? I have.. so I had a look!

 

When you first run the report you are asked for the Date Range.

This Date Range is then used to display the information in the Headers below:

 

New Candidate/CV's: This currently shows all Contacts with a Date First Registered between the From and To Dates selected where the CV field is not empty. The Consultant they are marked against is the Owner of the Contact (shown under the Codes tab of the Contact record).

Updated Candidate/CV's: This currently shows all Contacts with a Date Last Updated between the From and To Dates selected where the CV field is not empty. The Consultant they are marked against is the Owner of the Contact (shown under the Codes tab of the Contact record).

Tel Calls Made: This is updated when a “Telephone Call” or “Client Call” task is recorded by a Consultant (i.e. when running a “Call” action). If the call is a Scheduled Task it will not count towards this figure.

New Requirements: This shows all Requirements where the Date Initiated is between the From and To Dates. The Consultant they are marked against is the Owner of the Requirement.

CV's Sent: This is updated when a “CV Send” task is recorded by a Consultant (i.e. when running a “CV Send” action). If the task is a Scheduled Task it will not count towards this figure.

Interviews: This is updated when an “Interview Booked” task is recorded by a Consultant (i.e. when running an “Interview” action). If the task is a Scheduled Task it will not count towards this figure.

Offers Made: This shows all “Offers Made” between the From and To Dates. The Consultant they are marked against is the Owner of the Requirement.

Offers Accepted: This shows all “Offers Accepted” between the From and To Dates. The Consultant they are marked against is the Owner of the Requirement.

Offers Rejected: This shows all “Offers Rejected” between the From and To Dates. The Consultant they are marked against is the Owner of the Requirement.

Withdrawn: This shows all “Withdrawn” between the From and To Dates. The Consultant they are marked against is the Owner of the Requirement.

Placements: This shows all Placements where the Date Initiated is between the From and To Dates. The Consultant they are marked against is the Owner of the Placement.
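
For anyone who finds rules easier to read as code, here is a rough Python sketch of how three of the figures above could be counted. It is only an illustration of the rules as described: the class names, field names and the inclusive treatment of the From and To Dates are my own assumptions, not the actual report query or database schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Contact:            # hypothetical shape of a Contact record
    owner: str            # the Consultant who owns the Contact
    date_first_registered: date
    date_last_updated: date
    cv: str               # empty string when no CV is held


@dataclass
class Task:               # hypothetical shape of a recorded task
    consultant: str
    task_type: str        # e.g. "Telephone Call", "Client Call", "CV Send"
    task_date: date
    scheduled: bool       # Scheduled Tasks are excluded from the counts


def in_range(d, date_from, date_to):
    """Assumes both ends of the Date Range are inclusive."""
    return date_from <= d <= date_to


def new_candidate_cvs(contacts, date_from, date_to):
    """New Candidate/CV's: registered in range, CV not empty, counted per Owner."""
    return Counter(
        c.owner for c in contacts
        if c.cv and in_range(c.date_first_registered, date_from, date_to)
    )


def updated_candidate_cvs(contacts, date_from, date_to):
    """Updated Candidate/CV's: last updated in range, CV not empty, per Owner."""
    return Counter(
        c.owner for c in contacts
        if c.cv and in_range(c.date_last_updated, date_from, date_to)
    )


def tel_calls_made(tasks, date_from, date_to):
    """Tel Calls Made: call tasks in range that are not Scheduled Tasks."""
    call_types = {"Telephone Call", "Client Call"}
    return Counter(
        t.consultant for t in tasks
        if t.task_type in call_types
        and not t.scheduled
        and in_range(t.task_date, date_from, date_to)
    )
```

The remaining headers follow the same pattern: filter by the Date Range, apply the rule for that header, then count per Consultant.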

 

I hope you are now as enlightened as I am

SQL 2008 R2 Performance Issues

One of our largest customers has attempted to move to SQL Server 2008 R2 as part of a hardware and software upgrade project. However, we have since discovered certain issues with the query performance of SQL Server 2008 R2 compared with older versions of Microsoft SQL Server.

Thankfully the customer in question employs a very helpful and willing IT team, who have assisted by re-installing the new SQL Server machine with the older SQL Server 2005 SP4, using an identical database, in order to prove for certain that the performance issues are solely down to the Microsoft SQL Server version.

Hotfixes have been suggested online:

 

Original post suggesting Hotfix for SQL Server on Windows Server 2008 R2

unexplained slowness

Hotfix from Microsoft

http://support.microsoft.com/default.aspx?scid=kb;EN-US;976700

 

However, we did not experience any improvement in performance after installing these. We will continue to research and investigate these issues with our customers, but in the meantime, based on this experience, we would hesitate to recommend SQL Server 2008 R2 for running TriSys software until this issue has been acknowledged and resolved by Microsoft.

