GTconsult Craig banner

Extracting Unstructured data that should be structured

You know those days when you open up SharePoint and suddenly realise that there is a whole lot of content that should be in Content Types but isn’t. So lets say you have a number of custom Contact Lists all over your site (or in a number of subsites that have been saved as a template). And NOW you want to get all of this data into Excel….

No – I hear you say, we always devise a solid information architecture and governance plan before we store any data… well that is nice for you, but for most mortals who have strung together their intranets on a shoestring budget and learnt as they have built (which is most organic intranets I have seen) you will encounter this issue at some point. This is where we learn to start Extracting unstructured data that should be structured.

So what are our options:

  1. We can go site to site and extract the data – this is the long and laborious approach and if like the example I saw where there were over 140 sites… this is the really long approach
  2. We can write a powershell script to navigate through the structure and extract the data – this is a great approach if you are:
    • well versed in powershell
    • using an on-premise solution
  3. Write a piece of custom code that iterates through all the sites using jQuery and the Rest API – this is a solution you can use on any frontend, either on-premise or O365 and can be adjusted to different means, in my case I wanted the results in a table that I could export to excel, but I might want to report on the data or mash it in some other solution

So lets have a look at option 3 in detail
Option 3 could essentially be managed in two ways using a rest call, the first is using the REST API SEARCH functionality to build a search query and return the results from your search, this approach requires less lines of code and less iterations and will generally be better from a performance point of view. We use this approach a lot in other scenarios, HOWEVER, if your information is not being crawled this is not going to work. The second approach is a little more invasive and requires a number of REST calls to multiple lists. This gets a bit technical, but basically my steps are:

  1. Identify all my sites (in my case I luckily had a list of all the sites, but this could essentially be pulled from your subsites using REST as well
  2. Iterated through each site and extract the data from the Contacts list
  3. Write the information to a table (or do whatever you want to with the data

Okay so lets look at the code:
We start off creating a div in our page called ContactContent:
<div id=”ContactContent”></div>
We then include any required JS Libraries: Jquery, DataTables, etc
<script src="js/jquery-2.2.3.min.js"></script>
<script src="js/jquery.dataTables.min.js"></script>

We then initialise the jQuery code in our page

//globals variables
//an array to store all urls for later calls
var AllURLS = [];
//An variable to store our HTML
var shtml = '';
$(function () {
//Calls the function Initialise()

The GetList function is used to extract data from a list using the REST API, this is a multipurpose list that can we used all over your code (we suggest building up an assets library with these types of function for ease of use all over the site). Our function uses an AJAX call to request data from a list using the parameters of the List Name, the Items we which to select (i.e the column names), what we need expanded in the REST call and what filter criteria we are applying, finally you can pass a Site name for where the list resides, or leave it blank for the current site
function GetList(list,items,expand,filter,app){
//set some variables for your expand and filter statements
expand == "" ? expand="" : expand='&$expand='+expand
filter == "" ? filter="" : filter='&$filter=('+filter+')'
//we use the TOP parameter to overwrite SPs default of 100 items in a REST API call
return $.ajax({
url: app + "/_api/web/lists/getbytitle('"+list+"')/items?$Select="+items+'&$Top=50000'+expand+filter,
method: "GET",
async: true,
headers: { "Accept": "application/json; odata=verbose" }

The initialise function basically generates our table and set up the calls for generating the content
function Initialise(){
//build our datatable structure with all the fields we need as headers, but leave body blank
shtml = '<table id="example"><thead><tr>'+

//Add the table to our ContactCentent div above

//call function to get ALL client site URLs from our site
var clients = GetList("Client%20List","SiteURL","","","");

//iterate through all clients, check is a URL exists and if so push the URL to our array set up above
$.when(clients).done(function(cData) {
$.each(cData.d.results, function(i,result) {
result.SiteURL != undefined && result.SiteURL != null ? AllURLS.push(result.SiteURL.Url) : console.log('Skipped client')

// Get Contacts for each of the sites added to the array

Get Contacts allows us to use the array we generated and perform multiple REST API calls to each of the contact lists we are wanting to interrogate:

function getContacts(AllURLS){
$.each(AllURLS, function(i,result) {
var contact = GetList("Contacts","Title,FirstName,Company,WorkPhone,HomePhone,Email","","",result);
$.when(contact).done(function(coData) {

Finally for each Contact list we call we are going to add a line to our table

function addCont(coData){
$.each(coData.d.results, function(j,result) {
shtml = '<tr>'+
//Add an item to the body of our table
$("#ContactContent tbody").append(shtml);

Our table is now built. If we want to add functionality to it, like generate a datatable to display the data, apply filters, allow sorting or even adding HTML 5 buttons to export to excel or generate a PDF you simply run the following code:
table = $('#example').DataTable({});
Within the Datatables code you can provide multiple functions – these are all available from the API at – DataTables – Jquery Plugin

Hope this comes in handy.

Craig Tarr
Craig Tarr
COO, lover of jQuery, JavaScript and all manners of application generation, Nintex vTE and advocate of everything out of the box.

Leave a Reply

Your email address will not be published. Required fields are marked *