After defining clear Goals and collecting as much information as possible (clarifying key questions), you will have a good idea of the data that you need.
Here you want to be methodical. This is the technical part of the process: extracting data, accessing data sets and databases, building data models and cleansing data. So even if you or your team are not in charge of this part, you need to prepare a full brief for whoever is.

What data do you need?
In the previous step, Information, you should have collected the key questions to answer. They are the starting point for data collection.
If you haven’t already, write down the questions and work out which data you need to answer each one.
Don’t be shy: treat this initial list as a wish list and include everything you think could help answer your key questions.
How to collect and store the data
In a spreadsheet, list each key question you gathered, together with the data sources and fields you need to answer it.
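As a minimal sketch, assuming the spreadsheet is kept as a CSV file, each row can pair one key question with one data source and one field. The questions, the source names (“Sales database”, “CRM”) and the `DataRequirement` and `requirements_to_csv` names are hypothetical placeholders built around the “Jumpy Shoes” example, not a prescribed template.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DataRequirement:
    question: str    # key question collected in the Information step
    source: str      # where the data should come from (hypothetical source names)
    field_name: str  # field or column needed to answer the question


# Hypothetical wish-list rows for the "Jumpy Shoes" example.
requirements = [
    DataRequirement("Which models sell best?", "Sales database", "Model"),
    DataRequirement("Which models sell best?", "Sales database", "Price"),
    DataRequirement("Who buys Jumpy Shoes?", "CRM", "Age"),
    DataRequirement("Who buys Jumpy Shoes?", "CRM", "Gender"),
]


def requirements_to_csv(rows):
    """Return CSV text with one row per (question, source, field) combination."""
    buffer = io.StringIO()
    header = [f.name for f in fields(DataRequirement)]
    writer = csv.DictWriter(buffer, fieldnames=header)
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


sheet = requirements_to_csv(requirements)
print(sheet)  # paste or save this text as a spreadsheet
```

Keeping one field per row makes it easy to add the gap analysis columns (availability, accessibility, DQI) next to each field later.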
The next step is the data gap analysis. You can use any validated gap analysis technique here; if you want to keep it simple, start with three checks: availability, accessibility and DQI (Data Quality Index).
- Availability refers to whether the data exists at all: for example, do we have competitors’ data? It is usually a binary result: yes or no.
- Accessibility is the next check once availability is confirmed, and it answers the question: can we access the data? Here the answer might be yes, no, or “yes, but”; for example, yes, but we cannot use specific fields due to privacy.
- DQI (Data Quality Index) is a measure of the quality of the data we will use, and it answers the question: can we rely on this data? DQI can become complicated, but at a minimum I recommend making sure the information is accurate, complete and unique. If you want to know what these mean, have a look at the Introduction to Business Analytics course, and if you don’t have much time, look at the data quality charts presented here.
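A minimal sketch of these checks, assuming each data item is recorded with a yes/no availability flag and a “yes”, “no” or “yes, but” accessibility answer. As a stand-in for a full DQI it only measures completeness (share of filled values) and uniqueness (share of distinct IDs); accuracy is left out because it needs a trusted reference to compare against. The function names and the sample records are hypothetical.

```python
from typing import Optional


def gap_status(available: bool, accessible: Optional[str]) -> str:
    """Classify one data item: availability first, then accessibility."""
    if not available:
        return "gap: data does not exist"
    if accessible == "no":
        return "gap: data exists but cannot be accessed"
    if accessible == "yes, but":
        return "partial: accessible with restrictions"
    if accessible == "yes":
        return "ok: available and accessible"
    raise ValueError(f"unexpected accessibility answer: {accessible!r}")


def is_filled(value) -> bool:
    """Treat None and empty strings as missing values."""
    return value is not None and value != ""


def completeness(records, field: str) -> float:
    """Share of records where the field has a value (1.0 = fully complete)."""
    if not records:
        return 0.0
    return sum(is_filled(r.get(field)) for r in records) / len(records)


def uniqueness(records, key: str) -> float:
    """Share of filled key values that are distinct (1.0 = no duplicates)."""
    values = [r.get(key) for r in records if is_filled(r.get(key))]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical customer records with one missing age, one empty gender
# and a duplicated ID, to show what the checks pick up.
customers = [
    {"customer_id": "C1", "age": 24, "gender": "Female"},
    {"customer_id": "C2", "age": None, "gender": "Male"},
    {"customer_id": "C2", "age": 44, "gender": "Male"},
    {"customer_id": "C3", "age": 31, "gender": ""},
]

print(gap_status(False, None))       # e.g. competitors' data we do not have
print(gap_status(True, "yes, but"))  # e.g. fields restricted for privacy reasons
print(completeness(customers, "age"), uniqueness(customers, "customer_id"))
```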
Examples of data points
Examples of data for “Jumpy Shoes”:
- Total sales of “Jumpy Shoes” by hour of the day, day, week, month and quarter of the year
- Total sales of “Jumpy Shoes” by model
- Share of total “Jumpy Shoes” sales by model
- Customers who bought “Jumpy Shoes”:
  - Demographics
  - Psychographics
- Price history of “Jumpy Shoes”
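The first three data points are simple aggregations over individual orders. As a minimal sketch, using the two sample orders from the table below and assuming that the Price column is the revenue of one order (one pair per order) and that dates are written day/month/year, they could be computed like this. The helper names `total_sales_by` and `quarter_of` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# The two sample orders from the table below; Price is assumed to be the
# revenue of the order (one pair of shoes per order).
orders = [
    {"model": "Shoe Gx", "date": "20/03/2020", "price": 76.5},
    {"model": "Shoe Gy", "date": "24/03/2020", "price": 49.0},
]


def total_sales_by(orders, key_fn):
    """Sum order revenue for each group returned by key_fn."""
    totals = defaultdict(float)
    for order in orders:
        totals[key_fn(order)] += order["price"]
    return dict(totals)


def quarter_of(order):
    """Turn a day/month/year date into a label such as '2020-Q1'."""
    d = datetime.strptime(order["date"], "%d/%m/%Y")
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


sales_by_model = total_sales_by(orders, lambda o: o["model"])
grand_total = sum(sales_by_model.values())
share_by_model = {model: total / grand_total for model, total in sales_by_model.items()}
sales_by_quarter = total_sales_by(orders, quarter_of)

print(sales_by_model)    # {'Shoe Gx': 76.5, 'Shoe Gy': 49.0}
print(sales_by_quarter)  # {'2020-Q1': 125.5}
print({m: f"{s:.1%}" for m, s in share_by_model.items()})  # {'Shoe Gx': '61.0%', 'Shoe Gy': '39.0%'}
```

Grouping by day, week or month works the same way; only the key function passed to `total_sales_by` changes.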
In a more data-friendly format, this could become an orders table and a customer table:
Model | Date | Time | Price | Size | Channel | Order ID | Customer ID |
Shoe Gx | 20/03/2020 | 1:23pm | 76.5 | 38 | Web | 1214914 | Customer1 |
Shoe Gy | 24/03/2020 | 9:34m | 49 | 42 | In Store | 85895798 | Customer2 |
Customer ID | Age | Gender | Post Code | Number of Purchases | Last Purchase | Customer Lifetime Value |
Customer1 | 24 | Female | 4057 | 3 | 19/01/2020 | 146.74 |
Customer2 | 44 | Male | 4914 | 1 | 20/05/2019 | 56 |
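The Customer ID column links the two tables, which is what lets you answer questions about who buys “Jumpy Shoes”. A minimal sketch of that join, assuming the tables are held as plain Python lists and dictionaries (in practice they might sit in a database or spreadsheet), totals sales by gender. The function name `sales_by_customer_attribute` is hypothetical, and orders without a matching customer are grouped under “Unknown” rather than dropped.

```python
from collections import defaultdict

# Rows copied from the two sample tables above (only the columns used here).
orders = [
    {"order_id": "1214914", "customer_id": "Customer1", "price": 76.5},
    {"order_id": "85895798", "customer_id": "Customer2", "price": 49.0},
]
customers = {
    "Customer1": {"age": 24, "gender": "Female", "post_code": "4057"},
    "Customer2": {"age": 44, "gender": "Male", "post_code": "4914"},
}


def sales_by_customer_attribute(orders, customers, attribute):
    """Join each order to its customer via Customer ID and total revenue per attribute value."""
    totals = defaultdict(float)
    for order in orders:
        customer = customers.get(order["customer_id"])
        # Orders with no matching customer are kept under "Unknown"
        # so the gap stays visible instead of being silently dropped.
        value = customer[attribute] if customer else "Unknown"
        totals[value] += order["price"]
    return dict(totals)


print(sales_by_customer_attribute(orders, customers, "gender"))
# {'Female': 76.5, 'Male': 49.0}
```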
Final Remark
By the end of this step, you should have collected and prepared the data you will use for the analysis.