By Peter Carleton
Did you know there’s a useful piece of software for linting CSV files, for anyone using Windows?
What is linting? It’s typically defined as analysing code (or data) statically, i.e. without executing it.
What is a CSV file? It’s a text file used to handle Excel-like tables of data. This How To Geek article will tell you more. Many of us know it as “the format where the leading zeroes from the phone numbers go missing”!
What problem does linting solve? CSV (and other character-delimited text files) often need to be inspected visually, e.g. to verify data.
Often, software that is commonly used to open CSV files will interpret the data that they contain in some way. In doing so, they may materially change the values. E.g. In some modes of operation, MS Excel will convert dates into serial numbers (and maybe 2 different numbers… because leap years and early Macintosh… dates are annoying).
• To get around this, one can use a “text editor” application which will display text file contents exactly as they are.
• On Windows, MS Notepad is commonly used, chiefly because it comes pre-installed.
• Working with text files in Notepad can be difficult/fatiguing due to its minimal options for formatting data.
• This post introduces CSV Lint, a plugin for an alternative text editor, Notepad++ as a possible solution to this.
An aside on Notepad++: Because MS Notepad only offers a minimal feature set, an open-source application called Notepad++ is popularly used as an alternative.
Notepad++ provides additional functionality for working with text files, as well as an extensive plugin library. It lets you look at XML files easily, features tabbed browsing to flick between files and it allows you to look at CSV files without doing naughty things like removing leading zeroes! Note: It’s no longer available in the Microsoft Store, so please check with your IT before installing.
Where can I find tools to help?
• CSV Lint is available through the Notepad++ plugin manager -Extend functionality with plugins | Notepad++ User Manual (npp-user-manual.org)
• Source code repository - GitHub - BdR76/CSVLint: CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
• Notepad++ website - Notepad++ (notepad-plus-plus.org)
What CSV Lint does…
CSV Lint primarily provides tools to: visually inspect data; summarise a CSV file; and convert data into different formats (e.g. into a SQL INSERT INTO statement). I love that it allows you to colour columns and to set-up rules which CSV Lint runs against your entire dataset to flag anomalies.
This GitHub library contains the installer, and detailed instructions on how to install this as an add-on to Notepad++.
How do I use it?
For installation: Once downloaded, in Notepad++ - Plugins menu -> Plugins Admin -> Available tab -> tick CSV Lint -> click Install.
To use: Open a CSV file and Commands/settings are available in Plugins menu -> CSV Lint.
For more info, you can access the developer’s repository of background info:
Watch this YouTube video to find out how to view and edit data more easily. CSV Lint plug-in video
This YouTube video walks you through how to View, Add Column Colors and Sort Data using Notepad++
Related stuff…
There are many other linters for text files. Try searching Reddit or Hackernews as this comes up pretty regularly.
Fun fact: RFC 4180 is often cited as the “definition” for CSV files. However, it is not an approved international standard (“RFC” stands for “Request For Comments” in this context). So while it is the closest thing as exists to an international standard, many software applications do not adhere strictly to it. This includes very popular applications such as MS Excel/Access/MS SQL Server, etc. In turn, RFC 4180 allows for some variation in formats too. This makes life interesting sometimes…
Editor’s Note: If installing software on your machines, please ensure you’ve completed due diligence and that your IT team are happy.