Football: A Data-Driven Evolution

Brought to you by Cambridge Spark, Data Science Specialists. Written by guest blogger, Manja Bogicevic.

In just 10 minutes, 16 players with 6 balls can produce almost 13 million data points!

Some would say the origins of football go back to the beginning of time. These days communities, families and friends around the world are coming together to watch the 2018 World Cup, hosted in Russia. The fascinating fact is that each game will also generate millions of data points and events. Sports such as basketball and tennis have a long history of using data. Now football is the latest sport to become data-driven.

Data-driven scouting is on the rise

One way in which football is strategically adopting data is by collecting various data points to scout for potential new signings. Arsenal paid over £2 million for the US company StatDNA [1], whose data has since been used to advise their signings.

The data collected on players is used to build a database within which the club can search for players with the best potential to join their team. The big profits made on these players show that using data can also have financial benefits.

Photo credit: Fauzan Saari

Betting using data

Data scientists have applied an analytical approach to betting on football matches [3] and bring the same mindset to advising clubs on which players would be the best fit to invest in.

Data scientists estimated players’ salaries from 55 metrics (from goals scored to aggression and ball control) and compared the estimates with the actual salaries from the previous year to reveal overpaid and underpaid players [4]. This method could be used to determine fairer wages in any industry where there are identifiable attributes. It could also help address the pay gap between men and women.
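To make the idea concrete, here is a minimal sketch of that kind of comparison. It is not the method used in [4]: the data is synthetic, only 5 made-up metrics stand in for the 55, and a plain least-squares fit stands in for whatever model the study used. The function names and the deliberately overpaid "player 7" are assumptions for illustration.

```python
import numpy as np

def fit_salary_model(metrics, salaries):
    """Least-squares fit of salary = metrics @ w + intercept."""
    X = np.column_stack([metrics, np.ones(len(metrics))])
    coef, *_ = np.linalg.lstsq(X, salaries, rcond=None)
    return coef

def pay_gap(metrics, salaries, coef):
    """Actual minus predicted salary; positive means paid above the model."""
    X = np.column_stack([metrics, np.ones(len(metrics))])
    return salaries - X @ coef

# Synthetic data: 200 hypothetical players, 5 hypothetical metrics.
rng = np.random.default_rng(0)
n_players, n_metrics = 200, 5
metrics = rng.normal(size=(n_players, n_metrics))
true_weights = np.array([3.0, 2.0, 1.0, 0.5, 0.2])
salaries = 50 + metrics @ true_weights + rng.normal(scale=0.5, size=n_players)
salaries[7] += 20.0  # deliberately overpay one player so there is something to find

coef = fit_salary_model(metrics, salaries)
gap = pay_gap(metrics, salaries, coef)
most_overpaid = int(np.argmax(gap))
print("most overpaid player index:", most_overpaid)
```

The sign of the gap is the whole result: players far above zero look overpaid relative to their metrics, players far below look underpaid.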

Training with wearables

Players nowadays use wearable technologies [5], and balls are fitted with sensors [6], providing real-time performance statistics to clubs. Information collected can include distance covered on the pitch and passes completed. These insights can help managers decide who has performed well enough to earn a place in the starting eleven for the next match.

Collecting data during training can also help prevent injuries, increasing the likelihood of the team having a successful season. Hoffenheim have been playing in the German Bundesliga since 2008, and data science from their partner SAP forms a vital component of their training sessions [7].

Real-time tactics at the World Cup

Data collated from training can be used alongside statistics from previous games to aid tactical decisions before and during the match [8]. Many clubs work with data not just from their own past matches but also from their opponents’ matches, allowing them to be strategic when they face those opponents.

Data science orientated solutions are also extremely powerful during the match itself. Coaches can receive a half-time report, thanks to Data Scientists. Another benefit of a club becoming more data-driven is that algorithms can reveal insights that human statisticians would most likely miss.

How can a football club not rely on data?

Did you know that athletes are not only monitored by cameras in stadiums, but also by many smart devices such as heart rate sensors and even local GPS-like systems? Given the success of the data-driven clubs mentioned above, it’s natural that an increasing number of clubs will follow this pattern. The physiological monitoring service collects and transmits information directly from the athletes’ bodies, including heart rate, distance, speed, acceleration and power, and then displays those metrics live on an iPad. All this information is available live to coaches and trainers on the sideline during training sessions, as well as post-session for in-depth analysis.

Interestingly enough, analysis of the data can help distinguish the physically fit players from those who could use a rest. Many football leagues and clubs also collaborate with Opta, a leading provider of football data. Opta can determine every single action of a player in a specific zone on the field [9], regardless of whether he has the ball or not. It can also measure the distance the player runs during the course of a game. There are more than 100 match-specific statistic categories [2], for instance shots, goals, assists, yellow and red cards, won and lost duels, and also some lesser-known categories.

It’s not uncommon to read things from other Data Scientists like “I’ve included all players who started more than 10 games, have more than 4 years in the league, didn’t miss time due to injury, and stayed with one team the entire time.” This often amounts to selecting on the dependent variable, which biases your results. A lot of people don’t realise that many of the problems in sports analytics are just specific examples of commonly occurring modelling problems, ones that Data Scientists encounter all the time. I’m hoping to change this. Beyond producing summaries and estimates, we also want to express how confident we are in them.
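A small simulation shows why selecting on the dependent variable is a problem. This is a sketch on synthetic data, not real player data: "skill" and "performance" are hypothetical variables, and the cut-off of 1.0 stands in for a filter such as "started enough games".

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
skill = rng.normal(size=n)                # hypothetical underlying ability
performance = skill + rng.normal(size=n)  # true slope of performance on skill is 1

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return float(np.polyfit(x, y, 1)[0])

slope_all = slope(skill, performance)

# Keep only players whose *outcome* cleared a bar, i.e. select on the dependent variable.
selected = performance > 1.0
slope_selected = slope(skill[selected], performance[selected])

print(f"all players: {slope_all:.2f}, selected only: {slope_selected:.2f}")
```

On the full sample the estimated slope sits close to the true value of 1; on the filtered sample it is clearly smaller, even though nothing about the underlying relationship changed.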

If you want to work in data analytics in sports, go for it. It will be the next big thing. I’d also say don’t get discouraged. This stuff is hard, and it takes a lot of practice and a willingness to make mistakes and be wrong before you get it right. And if I had one single piece of advice: practice matrix algebra. I’m still learning and making mistakes, but I am learning every day. I now know more than I did yesterday. If you practice Python for an hour every day, you can build real expertise within two years.

Until next time,


Follow me on LinkedIn, Instagram and Medium

The future of Law firms and the Legal sector: 4 AI trends in the Law profession

According to Deloitte, 100,000 legal roles will be automated by 2036. They report that by 2020 law firms will be faced with a “tipping point” for a new talent strategy. Now is the time for all law firms to commit to becoming AI-ready by embracing a growth mindset, setting aside the fear of failure and beginning to develop internal AI practices. There are many who believe innovation is the key to transforming the legal profession. That’s precisely what we at PyperAI, “the first legal technology venture created by a law firm,” plan to do. When a professional sector faces new technology, questions arise about how that technology will disrupt daily operations and careers. Lawyers and the legal profession are no exception.

“Can machines think?” Let’s expand on this question, asked by Alan Turing in the 50s. Countless disaster scenarios in which artificial intelligence (AI) takes over the world and destroys humanity have already been invented and are still being told in Hollywood.

AI has not yet taken control of humanity, but it has indeed taken control of many aspects of our lives even if we do not perceive it as such. We accept AI as a part of our lives. The simplest example is our smartphones! Let’s dig deeper.

The role of Deep Learning

Over the past 7 years, the most prominent sub-area of AI has been deep learning. On some tasks deep learning is more successful than humans, especially in processing visual data: working out which objects or living things appear in an image, their relationships with each other, event estimation, object/person tracking, etc.

Deep learning covers the AI models that have produced the most successful results across application areas in recent years. These models are based on artificial neural networks and require a lot of processing power.

How do NL Systems Learn Language?

Models used for natural language processing are also within the scope of deep learning. Using natural language processing models, we can sort millions of documents loaded into the computer into classes. In this process, the system learns the relationships between words from all the documents and is able to predict that the word ‘carrot’ comes after the word ‘rabbit’ with higher probability than the word ‘sun’. AI can estimate this because it infers meaning from the statistical distribution of words in sentences. It is also possible to summarize or classify a long paragraph, or to extract time and place information from single sentences.
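The ‘rabbit’ and ‘carrot’ intuition can be shown with plain word counts. The sketch below is a simple count-based stand-in, not a deep learning model: it counts which words appear within a few tokens after a target word in a tiny made-up corpus. The corpus, the window size of 3 and the function names are all assumptions for illustration.

```python
from collections import Counter

def following_counts(sentences, target, window=3):
    """Count the words seen within `window` tokens after each occurrence of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == target:
                counts.update(tokens[i + 1 : i + 1 + window])
    return counts

def follow_probability(counts, word):
    """Share of the counted follow-up words that are `word`."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

corpus = [
    "the rabbit eats a carrot",
    "a rabbit loves carrot cake",
    "the rabbit sat in the sun",
    "the sun is hot",
]

counts = following_counts(corpus, "rabbit")
p_carrot = follow_probability(counts, "carrot")
p_sun = follow_probability(counts, "sun")
print(p_carrot > p_sun)
```

Neural language models learn far richer statistics than this, but the principle is the same: the estimate comes from how words are distributed in the training text.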

Leibniz: The First Lawyer to Predict the Use of Machines in Law

Leibniz, who is one of the grandfathers of AI, was a lawyer and said: ‘It is unworthy of excellent men to lose hours like slaves in the labor of calculation which could safely be relegated to anyone else if machines were used.’

In 1673, he presented a machine for the four arithmetic operations to the Royal Society in London. Leibniz says: ‘The only way to correct our reasoning is to make them as tangible as the mathematicians’ so that we can find our error at a glance, and when there are disagreements between people, let’s calculate and see who is right!’ So, let’s think: why shouldn’t it be possible for machines to complete all the steps of the chain of reasoning that runs through a lawyer’s mind while they are deciding?

Why couldn’t the machine do it? Why can it not calculate who is right in a dispute between people, or how to find the middle way? Isn’t that a ‘robot mediator’? I would like to point out that these questions belong to the 17th century, and we are now in the middle of 2019!

AI vs Lawyers

In June 2018, the AI Now Institute, a research institute examining the social implications of AI, convened a workshop with the goal of bringing together legal, scientific, and technical advocates who focus on litigating algorithmic decision-making across various areas of the law (e.g., employment, public benefits, criminal justice).

They structured the day with the practical aim of discussing strategy and best practices while also exchanging ideas and experiences in litigation and other advocacy in this space. The gathering included several of the lawyers who brought the cases alongside advocates, researchers, technical experts, social scientists, and other leading thinkers in the area of algorithmic accountability.

How will AI impact the legal profession?

Manja says look at these 4 AI trends for the legal profession:

1. Review documents and legal research

AI-powered software improves the efficiency of document analysis for legal use: machines can review documents and flag them as relevant to a particular case. Once a certain type of document is marked as relevant, machine learning algorithms can get to work finding other documents that are similarly relevant. Machines are much faster at sorting through documents than humans and can produce output and results that can be statistically validated. They can help reduce the load on the human workforce by forwarding only the documents that are questionable rather than requiring humans to review all documents. It’s important that legal research is done in a timely and comprehensive manner, even though it’s monotonous. AI systems such as PyperAI, the one we are developing, leverage natural language processing to help analyze documents.
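One simple way to "find other documents that are similarly relevant" is to rank documents by their textual similarity to one that a lawyer has already flagged. The sketch below uses TF-IDF weights and cosine similarity in plain Python. It is an illustration only, not how PyperAI or any particular product works, and the four short documents are made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = [
    "breach of contract claim for late delivery of goods",      # flagged as relevant
    "supplier contract breach and damages for late delivery",
    "office party catering menu and parking",
    "notice of claim for damages under the supply contract",
]

vectors = tfidf_vectors(docs)
seed = 0  # index of the document a reviewer has already marked relevant
scores = {i: cosine(vectors[seed], vectors[i]) for i in range(len(docs)) if i != seed}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Documents with a score near zero can be set aside, and only the borderline ones need to go to a human reviewer, which is the workload reduction described above.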

2. Better perform due diligence

In law offices around the world, legal support professionals are kept busy conducting due diligence to uncover background information on behalf of their clients. This work includes confirming facts and figures and thoroughly evaluating the decisions in prior cases in order to provide effective counsel to their clients. Artificial intelligence tools can help these legal support professionals conduct their due diligence more efficiently and with more accuracy.

3. Contract review

A big portion of the work law firms do on behalf of clients is reviewing contracts to identify risks and issues in how the contracts are written that could have negative impacts on their clients. They redline items, edit contracts, and counsel clients on whether to sign, or help them negotiate better terms. AI can help analyze contracts in bulk as well as individual contracts.

4. Predict legal outcomes

AI is capable of analyzing data to make predictions about the outcomes of legal proceedings, in some cases better than humans can. Clients often ask their legal counsel to predict the future with questions such as “If we go to trial, how likely will it be that I win?” or “Should I settle?” With the use of AI, lawyers are able to answer such questions better.

Until next time,



If you are interested in what we are developing for Law Firms and Legal profession contact me on Linkedin or Instagram or schedule a call with me here


1. My mission is to become #NextForbesUnder30

2. I am one of the first Women Machine Learning Entrepreneurs in Serbia

3. I have run 4 half-marathons in Belgrade

4. We are developing PyperAI to help lawyers reduce time and risk and focus on making more deals

5. If you need help on your ML or AI project, contact me or my team

What is Deep Learning?

Just over 20 years ago people didn’t even know what the internet was. Today we can’t even imagine our lives without it. Today I am going to give you a quick overview of what deep learning is and why it’s picking up right now.

The reason we are going back to the past is that neural networks, along with deep learning, have been around for quite some time, yet they have only started picking up and impacting the world right now. If you look back at the 80s, you’ll see that even though they were invented in the 60s and 70s, they really caught on and got a second wind in the 80s, so people were talking about them a lot. There was a lot of research in that area, and everybody thought that deep learning or neural networks were the new thing that was going to impact the world, change everything and solve all the world’s problems.

What happened? Why did neural networks not survive and change the world back then? One explanation is that they were just not good enough: they were not that good at predicting things and not that good at modeling.


Or is there another reason?

Well, actually there is another reason, and it is right in front of us. Technology back then was not up to the standard needed for neural networks and deep learning to work properly. You need two things:

  1. data
  2. strong computers to process that data

So let’s have a look at how data or storage of data has evolved over the years and then we’ll look at how technology has evolved.

Here we have three years: 1956, 1980 and 2017.

What did storage look like back in 1956? Well, there’s a hard drive, and that hard drive holds only 5 MB. That’s five megabytes right there in the first picture, and it is the size of a small room. In the picture, that hard drive is being transported to another location on a plane. That is what storage looked like in 1956. You had to pay two and a half thousand dollars, in the dollars of the day, to rent that hard drive (to rent it, not buy it) for one month.

In 1980 the situation improved a little bit. Here we’ve got a 10-megabyte hard drive for three and a half thousand dollars. It is still very expensive, and only 10 megabytes; that’s like one photo these days. And today, in 2018, we’ve got a 256-gigabyte SD card for $150 that fits on your finger. If you’re reading this blog a year later, or in 2020 or 2025, you’re probably laughing at us, because by then you’ll have even more storage capacity.

But nevertheless, the point stands. If we compare these across the board, without even taking price and size into consideration, capacity only about doubled from 1956 to 1980. From 1980 to today there has been a huge jump in technological progress: capacity grew by a factor of tens of thousands. That goes to show that this is not a linear trend. This is exponential growth in technology, and if we take price and size into account, the increase is in the millions.
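The comparison can be checked with a few lines of arithmetic, using only the three data points quoted above. Two caveats are assumptions of this sketch: the 1956 figure is a monthly rent rather than a purchase price, prices are not adjusted for inflation, and a gigabyte is taken as 1024 MB.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes

# Figures quoted in the text (1956 price is a one-month rental).
drives = {
    1956: {"capacity_mb": 5, "price_usd": 2500},
    1980: {"capacity_mb": 10, "price_usd": 3500},
    2018: {"capacity_mb": 256 * MB_PER_GB, "price_usd": 150},
}

def cost_per_mb(year):
    """Dollars per megabyte for the drive quoted for that year."""
    drive = drives[year]
    return drive["price_usd"] / drive["capacity_mb"]

capacity_growth_56_80 = drives[1980]["capacity_mb"] / drives[1956]["capacity_mb"]
capacity_growth_80_18 = drives[2018]["capacity_mb"] / drives[1980]["capacity_mb"]
cost_drop_80_18 = cost_per_mb(1980) / cost_per_mb(2018)

print(f"1956 to 1980 capacity: x{capacity_growth_56_80:.0f}")
print(f"1980 to 2018 capacity: x{capacity_growth_80_18:,.0f}")
print(f"1980 to 2018 cost per MB: down x{cost_drop_80_18:,.0f}")
```

Capacity alone grows by a factor in the tens of thousands between 1980 and 2018, and cost per megabyte falls by a factor of several hundred thousand, before physical size is even considered.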

And here we actually have a chart on a logarithmic scale.

If we plot hard drive cost per gigabyte, it looks something like this. We’re very quickly approaching zero. Right now you can get storage on Dropbox and Google Drive that doesn’t cost you anything. Over the years this is going to go even further. Scientists are now looking into using DNA for storage. Right now it’s quite expensive: it costs $7000 to synthesize two megabytes of data. But that reminds you of the hard drive on the plane, and you know that this cost is going to fall very, very quickly. Twenty years from now everybody could be using DNA storage if we go down this direction. Here are some stats on that so you can explore it further: you could store all of the world’s data in just one kilogram of DNA, or about 1 billion terabytes of data in one gram of DNA.

That’s just something to show how quickly we’re progressing, and this is why deep learning is picking up now. We are finally at the stage where we have enough data to train super cool and super sophisticated models. Back in the 80s, when neural networks first caught on, that just wasn’t the case. The second thing we talked about is processing capacity.

Here we’ve got an exponential curve again on a log scale. This is how computers have been evolving. This is called Moore’s Law, you’ve probably heard of it. You can see how quickly the processing capacity of computers has been evolving.

Right now we’re somewhere over here, where an average computer bought for a thousand bucks runs at the speed of the brain of a rat. Within two to five years, around 2023, it will reach the speed of a human brain, and then by 2045 or 2050 it will surpass all humans combined. Basically, we’re entering the era of computers that are extremely powerful and can process things way faster than we can imagine. All of this brings us to the question: what is deep learning? What is this whole neural network situation? What are we even talking about here?

Geoffrey Hinton

This gentleman, Geoffrey Hinton, is known as the godfather of deep learning. He did research on deep learning in the 80s and has published lots and lots of research papers. He works at Google. A lot of the things that we’re going to be talking about actually come from him. He’s got quite a few YouTube videos and explains things really well, so I highly recommend checking them out.

The idea behind deep learning is to look at the human brain, so there is going to be quite a bit of neuroscience coming up. In this blog and the ones coming up, what we’re trying to do is see how the human brain operates.

We don’t know everything about the human brain, but the little that we do know, we want to mimic and recreate. Why is that? Because the human brain seems to be one of the most powerful tools on this planet for learning, adapting skills and then applying them. If computers could copy that, then we could just leverage what natural selection has already decided for us: all of the algorithms that it has decided are the best. Why reinvent the bicycle? So let’s see how this works.

Here we’ve got some neurons that have been smeared onto glass, stained with some coloring and then looked at under a microscope.

You can see what they look like. They have a body, branches and something like tails, and you can see that they have a nucleus in the middle. That’s basically what a neuron looks like in the human brain.

There are approximately 100 billion neurons in the human brain altogether. The ones in the picture are individual neurons; they are actually motor neurons, which are bigger and easier to see. Each neuron is connected to as many as about a thousand of its neighbors. To give you a picture, this is what it looks like: an actual dissection of the human brain.


This is the cerebellum, the part of your brain at the back. It is responsible for keeping balance, some language capabilities and things like that. This is just to show how many neurons there are: billions and billions of neurons, all connecting. We’re not talking about five, five hundred or a thousand; we’re talking about billions of neurons in there. And that’s what we’re going to be trying to recreate. How do we recreate this on a computer? Well, we create an artificial structure called an artificial neural net, made of nodes, or neurons. Some neurons hold the input values, the values that you know about a certain situation.

For instance, when you’re modeling something and want to predict something, you always have some input, something to start your predictions off. That’s called the input layer. Then you have the output: the value that you want to predict, whether it’s a price, whether somebody is going to leave the bank or stay, whether a transaction is fraudulent or real, and so on. That’s going to be the output layer. And in between, we’re going to have a hidden layer. As you could see, your brain has so many neurons. Some information comes in through your eyes, ears and nose, basically your senses.

And then it doesn’t go right away to the output where you have the result. It goes through all of these billions and billions of neurons before it gets to the output. This is the whole concept behind how we’re going to model the brain. We need these hidden layers that sit before the output: the input layer neurons connect to the hidden layer neurons, and they connect to the output layer. This is pretty cool.

But what is this all about? Where is the deep learning here? Why is it called deep when there is nothing deep in here? Well, this is the kind of option one might call shallow learning, where there isn’t much depth going on.

So why is it called deep learning? Because we then take this to the next level. We separate it even further, so that we have not just one hidden layer but lots and lots of hidden layers, and then we connect everything, just like in the human brain. That’s how the input values are processed through all these hidden layers.

Then we have an output value and now we’re talking deep learning.
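The structure described above (an input layer, several hidden layers, an output layer) can be sketched as a forward pass in a few lines of NumPy. This is a minimal illustration, not a trained model: the weights are random, and the layer sizes, the ReLU and sigmoid activations and the function names are all assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, seed=0):
    """Random (weights, bias) pairs connecting each layer to the next."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def forward(x, layers):
    """Pass the input values through every hidden layer, then the output layer."""
    activation = x
    for weights, bias in layers[:-1]:
        activation = relu(activation @ weights + bias)   # hidden layers
    weights, bias = layers[-1]
    return sigmoid(activation @ weights + bias)          # output in (0, 1)

# 4 input values, two hidden layers of 8 neurons each, 1 output value.
layers = init_layers([4, 8, 8, 1])
customers = np.array([
    [0.2, 1.5, -0.3, 0.8],   # hypothetical input values for one customer
    [1.1, -0.7, 0.4, 0.0],   # and for another
])
predictions = forward(customers, layers)
print(predictions.shape)
```

With one hidden layer this would be the "shallow" network; adding entries to the size list is what makes it deep. The output can be read as something like "probability that this customer leaves the bank", although with random weights the numbers mean nothing until the network is trained.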

So that’s what deep learning is all about on a very abstract level. In the further blogs I am going to write, we will dive deep into deep learning, and by the end you will know what deep learning is all about and how to apply it in your projects.

I’m super excited about deep learning and can’t wait to get started. I look forward to seeing you in the next blog or vlog.

Until then enjoy deep learning,


For more follow me on Linkedin, Instagram or Quora.

How to implement machine learning in your business?


Machine learning is an extremely useful tool. Some ML tools are public, but most are built behind the scenes. Machine learning is used to solve hard problems for different companies. How can I use ML to build a smart and profitable tool from our data? If you are asking yourself this every day, you must read this blog. We’ve helped many businesses answer this question.

I will give you an outline of our ML process:

1. Set up the screening meeting with the decision-makers in the company and introduce everyone to machine learning.

It is crucial to explain what machine learning is, what it can do and what it can’t do. Remember that you don’t have a magic wand to make money from anything. You should first focus on the most significant problem the company has and resolve that. Once you’ve done that, you can test other crazy stuff, like scanning surfaces with drones and getting data insights from it. Any use case you find needs to be rooted in company goals and data. You can’t expect either management or developers to give you all the answers, so you need a cross-disciplinary team and meeting. Meet with the vision team (CEO, VP Product) and someone who knows the data (CTO, Head of Data Engineering), and schedule at least half a day for this. Machine learning is a tool: the more people understand it, the better they can put it to use. If people think it’s a magic wand like in Harry Potter, they won’t be able to help you with your search.

You have to make it practical, leave out the math, and ask yourself these three questions about machine learning:

What is machine learning?

When can you use it?

What are the common misconceptions?

When everyone has an understanding of what machine learning is, it’s time for you to learn from them. Every company is unique. The overlap in what two different companies need is small. Don’t try to fit your company into a box.

2. Get inputs from all departments and forget about assumptions.

You should look at the facts. Leave assumptions for later, when you have enough data, testing and predictions.

3. Make a list of machine learning processes that they can use in their company

4. First mover advantage

Check if an MVP (minimum viable product) can be made in less than 4 months. It is crucial to deliver a solution fast so the company can incorporate it into the business.

5. You can’t do it all at once

Dismissing ideas early in the process isn’t the right approach. Instead, refocus the discussion: “This is more of a nice-to-have, so let’s leave it for now.” Ask critical questions: “If we did manage to automate and improve the accuracy or speed of this process by 20%, what would that mean in revenue per year?”

To compare the cases you have left, make an Excel spreadsheet with the following columns:

Data availability: How easy is it to access the correct data for this tool? If you don’t have the data yet, estimate how much time you need to get it.

Potential growth: If things go very well, how significant is the potential impact on a business priority?

Risk: Do you have unknown factors that could derail the project? What does your experience tell you?

Time to implement: Prioritise quick wins. With one substantial success behind you, you can move on to the more complex projects.

You should use the 80/20 rule to prioritize. As I mentioned above, first resolve the most significant pain the company has. The more machine learning projects you’ve already delivered, the better you can ask the right questions. You can ask yourself: What are the common failures in projects like this one? Which datasets are essential to have, and which ones are not so important? What is a reasonable level of improvement? If you haven’t delivered a similar use case, talk to a team that has.
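The spreadsheet can just as easily be a few lines of Python. The sketch below is one possible way to turn the four columns into a single ranking. The 1 to 5 scales, the weights and the three example use cases are all hypothetical; in practice you would agree them with the team in the meeting.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    data_availability: int   # 1 (hard to get) .. 5 (ready today)
    potential_growth: int    # 1 (minor impact) .. 5 (major impact)
    risk: int                # 1 (low) .. 5 (high)
    time_to_implement: int   # 1 (weeks) .. 5 (a year or more)

# Hypothetical weights; they should sum to 1.
WEIGHTS = {"data": 0.3, "growth": 0.3, "risk": 0.2, "time": 0.2}

def score(use_case, weights=WEIGHTS):
    """Higher is better. Risk and time are inverted so that low values score high."""
    return (
        weights["data"] * use_case.data_availability
        + weights["growth"] * use_case.potential_growth
        + weights["risk"] * (6 - use_case.risk)
        + weights["time"] * (6 - use_case.time_to_implement)
    )

candidates = [
    UseCase("Churn prediction", 5, 4, 2, 2),
    UseCase("Drone surface scanning", 1, 5, 5, 5),
    UseCase("Invoice fraud detection", 3, 4, 3, 3),
]

ranked = sorted(candidates, key=score, reverse=True)
for use_case in ranked:
    print(f"{use_case.name}: {score(use_case):.1f}")
```

The exact weights matter less than making them explicit: once the scoring is written down, the team can argue about the numbers instead of about gut feeling.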

6. Research, analyze and research again

When you’ve identified your top 3 cases, Google them: Who has implemented similar systems before? What approaches did they try? What were the findings and the final results? In machine learning, you can put together much of that information from published academic research. You shouldn’t copy an approach outright. Get inspired and steal the best ideas, then use them to guide your further investigations and make your solution different. Making it different helps you resolve real business problems and get your hands dirty on a machine learning project.

7. Make a report and visualize your insights (use Tableau for example)

You should explain the key points in 15 minutes. I know it is hard to summarize an 8-hour meeting in 15 minutes, but that is actually why Data Scientists are paid so much.

Great ML tools:


What goals are driving your company now?

History

What projects did you implement in the past? What were the results, the challenges?


What data do you have? Where is it made, and where is it saved? How much consistent history is there in each database?


What is the preferred infrastructure? Are there relevant restrictions on which provider to use (on-premises, AWS, or Google Cloud)?

Data Science Strategy

What is your data science vision? Do you want to build up your expert team, or do you want to find an experienced team to build you a solution? Combination of both? After you know what’s driving your business now, you can get more precise.

It’s time to find all the potential use cases.

Outline the list of processes for machine learning

How can machine learning be used to automate decision making in your company? Machine learning is a tool to automate pattern discovery. It’s about improving an existing process by making it a bit better.

Great examples of machine learning are usually:

Data based: The process is already entirely based on data.

Large scale: happens over and over again

Automated: The process already uses technology to some degree.

Good ML examples:

Product recommendations

Credit scoring

Personalized marketing

Fraud detection

Image recognition

So think about where data is being used to automate decision making and whether there is room for improvement. The best place to use machine learning is to support a process that’s currently done by people. Law firms are a good example. If a process is highly repetitive, tedious, and therefore slow, it might be perfect. Can you make it faster? Can you train a machine-learning algorithm to take on some of the decisions? Finally, make a decision! Update your use case ranking with the new information you picked up. Based on your findings, research, and experience, make a rough project plan. Present your findings to your team, together with your prioritization goals and the project plans. Once you’ve done this, always keep the big picture in mind. You should now get your hands dirty. Have fun!

You need help making an ML plan? More than 10 companies have developed their ML plan with us in the last 12 months. I am super excited to get in touch with you and find out what you want to achieve and how ML can help your business to grow and gain more profit.



Until next time

For more follow me on LINKEDIN or INSTAGRAM
