[[Image:Analyse-data.jpg|thumb|right|200px|Photo: [http://www.koiduholdings.com koiduholdings.com]]]

Every data collection project should include data analysis in order to facilitate learning. In essence, data analysis aims to inspect, clean, transform, and visualise data with the goal of reaching conclusions about the problem at hand. Through data analysis, you describe and summarise the data, identify relationships, and find similarities and differences between the variables. You create insights that you can use and share with others in a visual and accessible format. This will allow you to make better decisions based on an understanding of the data.


{| style="background-color: #efefef;" cellpadding="10px"
|<font size="3" color="#000">'''Analysing and visualising data in six steps'''</font><br>
<font color="#696969" size="3">'''Preparation'''</font><br>
'''Step one''': What do you want to know?<br>
'''Step two''': Which data sources are you planning to use/combine? <br>
'''Step three''': Is your data clean and ready to be used?
|-
|<font color="#696969" size="3">'''Execution'''</font><br>
'''Step four''': Understand your data. <br>
'''Step five''': Explore your data and find relationships. <br>
'''Step six''': What story do you need to communicate? To whom? Why?
|}


===Preparation===

====Step one: What do you want to know?====
Before you start solving any problem, you should know and understand the problem. Get involved with the problem, not the solution. Asking the right question(s) is crucial to creating good insights from data. Based on the previous phases, especially the design of your project, this step should be straightforward and indicate clear steps to be taken in the data analysis process.

====Step two: Which data sources are you planning to use/combine?====
You now know about the problem, but to arrive at a solution you need data. After a successful data collection phase, consider supplementing the information you gathered with external sources. There is a wealth of open data resources available (you can read more about different ways of sharing data in [[Handbook on Data Collection / Phase Eight: Share Data and Communicate Insights | phase eight]]), but linking different datasets in the right format can be complicated and requires careful consideration. It needs to be the right data. Say you want to connect household data on water and sanitation facilities with data about public spending on water, sanitation and hygiene (WASH). The two datasets are collected at different levels: one at household level, the other at district, region, or country level. Can you connect the two, and if so, how do you do it? What are the challenges? You can find more information about the data collection process in phases [[Handbook on Data Collection / Phase Five: Prepare Data Collection|five]] and [[Handbook on Data Collection / Phase Six: Collect Data|six]].

Also consider the ''type'' of data you need for your analysis. Sometimes, data may be available, but in a format that makes it hard to perform this analysis. Can you transform the data into a different format or change the type of analysis to match the data format?
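
One possible way to connect the household and district datasets from the example above is to aggregate the household records to district level first, and then join the two tables on the district name. The sketch below is only an illustration: all records, field names and spending figures are made up, and a real project would also have to deal with differing district spellings and boundaries.

```python
from collections import defaultdict

# Hypothetical household survey records (one row per household).
households = [
    {"household_id": "H1", "district": "North", "has_latrine": True},
    {"household_id": "H2", "district": "North", "has_latrine": False},
    {"household_id": "H3", "district": "South", "has_latrine": True},
    {"household_id": "H4", "district": "East", "has_latrine": True},
]

# Hypothetical public WASH spending per district (one value per district).
district_spending = {"North": 120000, "South": 80000}


def latrine_coverage_by_district(rows):
    """Aggregate household rows to one coverage figure per district."""
    totals = defaultdict(int)
    with_latrine = defaultdict(int)
    for row in rows:
        totals[row["district"]] += 1
        if row["has_latrine"]:
            with_latrine[row["district"]] += 1
    return {d: with_latrine[d] / totals[d] for d in totals}


def link_to_spending(coverage, spending):
    """Join both district-level tables; also list districts without a match."""
    linked = {
        d: {"latrine_coverage": coverage[d], "wash_spending": spending[d]}
        for d in coverage
        if d in spending
    }
    unmatched = sorted(set(coverage) - set(spending))
    return linked, unmatched


coverage = latrine_coverage_by_district(households)
linked, unmatched = link_to_spending(coverage, district_spending)
print(linked)
print("No spending data for:", unmatched)
```

Listing the unmatched districts makes one of the challenges visible: a household dataset may cover areas for which no spending figures are available.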

====Step three: Is your data clean and ready to be used?====
After you collect all the information, you can't start with the fun stuff (visualisations and regressions) right away. You will always need to clean the data first, and data cleaning very often takes up a big chunk of a data analysis project. When you clean data, you try to understand every single column in your dataset, identify missing or corrupt values and outliers, and make sure the data is consistent, for example by eliminating different spellings of the same thing.
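
The three checks mentioned above (inconsistent spellings, missing values and outliers) could look roughly like the following sketch. The records and field names are hypothetical, and the outlier rule (distance from the median compared with the median absolute deviation, with a cut-off of <code>k=5</code>) is just one simple choice among many, not a recommendation from this handbook.

```python
import statistics

# Hypothetical raw records as they might come out of a survey export.
raw_rows = [
    {"district": "North ", "depth_m": "35"},
    {"district": "north", "depth_m": "40"},
    {"district": "NORTH", "depth_m": ""},
    {"district": "South", "depth_m": "38"},
    {"district": "South", "depth_m": "42"},
    {"district": "south", "depth_m": "3800"},
]


def clean_rows(rows):
    """Normalise district spellings and convert depth to a number (None if missing)."""
    cleaned = []
    for row in rows:
        district = row["district"].strip().title()
        text = row["depth_m"].strip()
        depth = float(text) if text else None
        cleaned.append({"district": district, "depth_m": depth})
    return cleaned


def flag_outliers(rows, field, k=5.0):
    """Return rows whose value lies more than k median absolute deviations from the median."""
    values = [r[field] for r in rows if r[field] is not None]
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to compare against
    return [
        r for r in rows
        if r[field] is not None and abs(r[field] - median) > k * mad
    ]


rows = clean_rows(raw_rows)
missing = [r for r in rows if r["depth_m"] is None]
outliers = flag_outliers(rows, "depth_m")
print("Districts:", sorted({r["district"] for r in rows}))
print("Rows with missing depth:", len(missing))
print("Possible outliers:", outliers)
```

A flagged value is only a candidate for review. Whether 3800 is a typing error for 38 or a genuine measurement is something you can only decide with knowledge of the data collection itself.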

===Execution===
====Step four: Understand your data====
Now that your data is clean, it’s time to look at it and understand the characteristics of the data. These characteristics can include size or amount of data, completeness of the data, and correctness of the data. When you first look at the clean dataset you want to understand all your different variables. It is easier to understand the information in your data when you visualise and summarise it in tables.
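
As a minimal sketch of such a summary, the example below reports the size of a dataset, the completeness of one variable, and a few basic statistics. The water point records and field names are invented for illustration only.

```python
import statistics
from collections import Counter

# Hypothetical cleaned water point records; None marks a missing value.
water_points = [
    {"type": "borehole", "age_years": 4, "functional": True},
    {"type": "borehole", "age_years": 12, "functional": False},
    {"type": "hand-dug well", "age_years": None, "functional": True},
    {"type": "borehole", "age_years": 7, "functional": True},
    {"type": "spring", "age_years": 20, "functional": False},
]


def summarise(rows):
    """Describe size, completeness and basic statistics of the dataset."""
    n = len(rows)
    ages = [r["age_years"] for r in rows if r["age_years"] is not None]
    return {
        "rows": n,
        "age_completeness": len(ages) / n,  # share of rows with a known age
        "age_median": statistics.median(ages),
        "age_min": min(ages),
        "age_max": max(ages),
        "type_counts": Counter(r["type"] for r in rows),
        "share_functional": sum(r["functional"] for r in rows) / n,
    }


summary = summarise(water_points)
for name, value in summary.items():
    print(f"{name}: {value}")
```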

====Step five: Explore your data and find the relationships====
You can spend a lot of time looking at the data, but try to quickly find the relevant patterns that will lead you to the answer you want to find. Most of the time when you have to report on the project, you will look at different indicators that were chosen during the design phase and report on the amounts they represent. That’s definitely needed, but data analysis should go beyond counting. When you explore data you should try to compare and contrast different variables. This will identify relationships that might not have been obvious. This process of checking different characteristics of datasets will get easier and more intuitive the more you do it.
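
One simple way to compare and contrast two variables is to group the records by one variable and summarise the other within each group. The sketch below does this for the age and functionality of water points. The records are hypothetical, and the age bands are an arbitrary choice made for this example.

```python
from collections import defaultdict

# Hypothetical water point records.
water_points = [
    {"age_years": 2, "functional": True},
    {"age_years": 4, "functional": True},
    {"age_years": 8, "functional": True},
    {"age_years": 9, "functional": False},
    {"age_years": 15, "functional": False},
    {"age_years": 18, "functional": True},
    {"age_years": 22, "functional": False},
]


def age_band(age):
    """Assign an age in years to a band (bands chosen for illustration)."""
    if age < 5:
        return "0-4 years"
    if age < 10:
        return "5-9 years"
    return "10+ years"


def functionality_by_band(rows):
    """Return the share of functional water points per age band."""
    counts = defaultdict(lambda: [0, 0])  # band -> [functional, total]
    for row in rows:
        band = age_band(row["age_years"])
        counts[band][0] += row["functional"]
        counts[band][1] += 1
    return {band: functional / total for band, (functional, total) in counts.items()}


rates = functionality_by_band(water_points)
for band, rate in rates.items():
    print(f"{band}: {rate:.0%} functional")
```

With so few records, a difference between the bands is only a hint of where to look further. It does not show that age causes malfunction.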

In some projects, you want to go even further and consider the wealth of information that can be obtained by applying more complicated data science techniques. You can look for hidden patterns, find explanations and try to predict future outcomes of the specific indicators.

====Step six: What story do you need to communicate? To whom? Why?====
To make data analysis worthwhile, communicate the insights you gained in a compelling and digestible way. Producing information that no one will understand is a waste of resources. Create a story that combines the information obtained with the domain knowledge of your organisation to answer the question you asked. Communicating the results of data analysis is the crucial point. Ensure the message you are communicating is clear and targeted to a specific audience. If you make complicated graphics that require a lot of explanation, you risk losing your audience's attention and, with it, your message.

===Understand the message you are trying to communicate===
When it comes to data visualisation, you are trying to show one of four things with your data: relationships between data points, a comparison of data points, a composition of data, or a distribution of data.

* To show relation, show the connection or correlation between two or more variables. For example, how the functionality of a water point relates to its age.

* To compare data points, show what distinguishes one set of values from another. For example, display the number of water points in each of the five biggest districts in the country.

* To show the composition of data, collect different types of information that make up a whole and display them together. For example, the different founders of water points in the whole country.

* To show distribution, lay out a collection of related or unrelated information simply to see how it correlates. For instance, display the number of reported water point malfunctions over a long time period and see if you can find seasonal patterns in the functionality.
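
To look for seasonal patterns such as the one in the last example, you could count malfunction reports per calendar month across all years. The sketch below uses made-up report dates and prints a plain-text bar per month. It is only meant to illustrate the idea, not to replace a proper chart.

```python
from collections import Counter
from datetime import date

# Hypothetical dates on which water point malfunctions were reported.
reports = [
    date(2019, 1, 14), date(2019, 7, 3), date(2019, 8, 21),
    date(2020, 7, 19), date(2020, 8, 2), date(2020, 8, 30),
    date(2021, 2, 11), date(2021, 7, 25),
]


def reports_per_month(dates):
    """Count reports per calendar month (January first), across all years."""
    counts = Counter(d.month for d in dates)
    return [counts.get(month, 0) for month in range(1, 13)]


monthly = reports_per_month(reports)
for month, count in enumerate(monthly, start=1):
    print(f"{month:2d} {'#' * count}")
```

Pooling several years into one count per month makes a recurring peak easier to spot, but it hides differences between the years, so it is worth looking at each year separately as well.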

===Conclusion===
After you have collected the data, it’s crucial to remove outliers and unreliable data points in order to arrive at useful insights. To transform the data into insights, take time to understand the dataset, which will help to uncover patterns in, and relationships between, the data. However, communicating the data effectively to your target audience is as important as the data analysis itself. Read more about that in the next phase: [[Handbook on Data Collection / Phase Eight: Share Data and Communicate Insights|share data and communicate insights]].

===Suggested reading===
* A guide to basic data analysis: [https://www.measureevaluation.org/resources/training/capacity-building-resources/basic-data-analysis-for-health-programs Introduction to Basic Data Analysis and Interpretation for Health Programs: A Training Tool Kit]

* Picking the right type of chart: [http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html The Extreme Presentation(tm) Method, Choosing a Good Chart]

* How to document your data: [http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/CodebookCookbook.html How to enter and document your data]

* Data science and its potential in international development (blog): [https://blogs.worldbank.org/opendata/will-your-project-fail-without-data-scientist Will your project fail without a data scientist?]

* Data cleaning checklist: [https://www.thedatabank.com/wp-content/uploads/2010/02/Data-Cleanup-Checklist.pdf Data Clean-Up Checklist]

===Acknowledgements===
Authors: Annabelle Poelert (Akvo.org), Karolina Sarna (Akvo.org)<br>
Contributors: Bettina Genthe (Council for Scientific and Industrial Research, CSIR), Camille Clerx (Akvo.org), Lars Heemskerk (Akvo.org), Marten Schoonman (Akvo.org), Nikki Sloan (Akvo.org), Stefan Kraus (Akvo.org)

'''AfriAlliance'''
{|
|The Africa-EU Innovation Alliance for Water and Climate (AfriAlliance) is a 5-year project funded by the European Union’s H2020 Research and Innovation Programme. It aims to improve African preparedness for climate change challenges by stimulating knowledge sharing and collaboration between African and European stakeholders. Rather than creating new networks, the 16 EU and African partners in this project will consolidate existing ones, consisting of scientists, decision makers, practitioners, citizens and other key stakeholders, into an effective, problem-focused knowledge sharing mechanism.
|[[Image:afrialliance-logo-RGB.png|right|110px|link=https://afrialliance.org]]
|}

{|
|AfriAlliance is led by the IHE Delft Institute for Water Education (Project Director: Dr. Uta Wehn) and runs from 2016 to 2021. The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 689162.
|[[Image:EU_flag_RGB.jpg|right|90px|link=https://ec.europa.eu/programmes/horizon2020/en/]]
|}