I recently wrote a blog post on how we should handle data and why quality matters. My context is academic management and if you would like read more about my data problem, and download the Data cleaner tool I built, click here: Data cleaner tool
This blog post is not too dissimilar from the latter and focuses specifically on dates and how we should also tread carefully.
I was recently working with a large database containing student information (students enrolled in our language school) and wanted to use the sort function in Excel to calculate how many student arrivals and departures we had for the following weekend. Students typically arrive and depart at the weekend and planning in advance is essential for what can be a very busy couple of days.
Under normal circumstances, I would simply trust Excel’s sort function to do its job, ordering students by arrival and departure dates. However, I noticed that what would normally be a routine task, was not working properly. In this instance, if I sorted my database by arrival date, it did not put all the dates in order, instead putting some of them out of order at the end of the data range. This was strange given that on the surface, the dates looked like… well, dates.
A little digging uncovered the fact that Excel classifies the values in each cell by category. What this means is that a cell containing this: 22/02/2021, might have been formatted as text and not a date. In this instance, the program was reading some dates as text.
It should be said that when you open a new Excel spreadsheet, the values for every cell are set to General. If you enter what obviously appears to be a date, it will automatically format the cell to a date value. Essentially, it requires some sort of intervention on the part of the user to physically change it to text or any other category such as numbers or percentages. Therefore, if you are receiving data from multiple sources, as I was, and feeding it into a database, you cannot trust the settings are uniform.
In the example below, I have used the function =ISTEXT(), which tells me whether the information is FALSE, i.e. not text, or TRUE, i.e. text. As you can see, the bottom four dates, marked TRUE, are categorised as text and here lies my sorting problem:
As I said, it requires intervention on the part of the user to undo any date formatting, even if it is not a deliberate act. Even my own attempts to format new data saw me encounter similar issues. Here I used this nested function, =PROPER(LOWER(TRIM(CLEAN(D1)))) (range D1:E2) to capitalise each word while keeping other characters lower case, remove any non-printing characters and trim any leading or trailing spaces:
Another problem I encountered is when a date is formatted as a date but the date order has not been standardised. I am used to working with, DD/MM/YYYY date order, but we should be mindful that this is not standard throughout the world. If you, as I did, receive data from multiple sources, you should check that the date order is standardised, although in some circumstances, having dates with slight variations will not cause too many issues:
However date order can be a problem. If the DD and MM numbers are under 12, there is the potential to misinterpret the date if DD/MM/YYYY and MM/DD/YYYY formats are in your data range:
Depending on what date format you are used to, you might, as I did, read 07/08/2021 as 7th August 2021 and not 8th July 2021, which, if it is formatted as MM/DD/YYYY, is what it really is. It should be noted that if it is formatted as MM/DD/YYYY, the spreadsheet program will read it as 8th July 2021, however, at a glance, you might be forgiven for reading it differently.
This date is significant because on 8th July, this particular cohort of students finish their course, and in advance of that date, they need to receive reports and certificates of course completion, their names needs to be removed from class, activity and excursion lists, meal provisions withdrawn, and occupancy records updated… This list is not exhaustive. Essentially, if the date is read as 7th August, all of these things are not going to happen as they will not be marked as leaving. We might also say the same for the initial problem I faced when sorting students (or at least attempting to) by arrival and departure dates, they too are not going to get their report and certificate.
Formatting dates in a way which is appropriate to your needs is not especially complicated. I recommend using the Text to Columns function in Excel before clearing all formats and then choosing your preferred date order in the Number settings. This of course takes time which is why I have built a tool which does everything for you at the touch of a button.
You can download the Date formatter tool by clicking here: Date formatter tool
Download and open the Date formatter tool and type/paste your dates into column A.
This tool was built using Excel and is macro-enabled. If you are using any other spreadsheet program, you may find it won’t work. It has also been made by me, an English teacher, and not a programmer. However, if you do notice any glitches, please let me know.
Interested in other office tools? You might be interested in this report checker: Student report checker