The Lexomics Workflow
Keeping track of your experimental methods is important, but it is also challenging: experimental steps and decisions must be documented well enough that future readers and experimenters can reproduce published results. Our workflow is an attempt to share effective practices that make one's choices intentional. We submit that openly sharing our methods, data, and workflow facilitates the replication of results (LeBlanc, 2017).
We call Lexos "An Integrated Lexomics Workflow" because it brings together many of the processing steps we in the Lexomics project regularly perform in our research. A little history of the Lexomics project may give a useful perspective on what we mean by a workflow. When the project began, it consisted of three simple Perl scripts: one to clean up texts, one to cut them, and one to perform cluster analysis on them. The scripts had to be run in sequence, so after a while it made sense to create a single tool that would guide the user from one to the next. It then became clear that the tool's interface could allow the user to go back to earlier steps, tweak the settings, and repeat their experiments. There were in fact many ways in which a user could design experiments using a single tool, and the tool could help the user manage their activities and, perhaps more importantly, think critically about their process. Thus was Lexos born.
While the strictly linear steps of its origins are no longer the only possible approach you can adopt when using Lexos, they provided an important insight into how computational text analysis workflows are constructed. Such workflows essentially have three basic steps: pre-processing (scrubbing, segmenting), analysis, and visualization. It is not always possible to separate these activities clearly. Even in our earliest scripts, the first two were pre-processing steps and the last, which plotted a tree diagram of the cluster analysis, combined analysis and visualization. But, as Lexos has developed, we have tried to make this three-step sequence its organizing principle, encouraging the user to proceed from text preparation to simple visualization of their data to more complex analysis. This is particularly useful for entry-level users and those whose training has not explored the issues raised by computational methods. (In the Margins is our attempt to position the process of computational text analysis side by side with its product.) Lexos is thus designed to enable newcomers to the field to adopt the Lexomics workflow, empowering them to do sophisticated work in relatively little time.
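To make the three steps concrete, here is a minimal sketch of a scrub, cut, analyze, and display sequence in plain Python. It is not Lexos code and does not reproduce the original scripts: the function names, the sample text, and the segment size are all assumptions chosen for illustration, and a pairwise distance table (the kind of data a cluster analysis starts from) stands in for the cluster analysis and tree diagram described above.

```python
import math
import re
from collections import Counter


def scrub(text):
    """Pre-processing, part 1 (hypothetical helper): lowercase the text and
    replace everything except the letters a-z and whitespace with a space."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())


def cut(text, size):
    """Pre-processing, part 2 (hypothetical helper): split a scrubbed text
    into consecutive segments of `size` words each."""
    words = text.split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def relative_frequencies(words):
    """Turn a list of words into a word -> proportion mapping."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def euclidean(p, q):
    """Euclidean distance between two word-frequency profiles."""
    vocabulary = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in vocabulary))


def distance_matrix(segments):
    """Analysis: pairwise distances between the segments' profiles."""
    profiles = [relative_frequencies(segment) for segment in segments]
    return [[euclidean(a, b) for b in profiles] for a in profiles]


def show(matrix):
    """Visualization (very simple): render the distances as a text table."""
    rows = []
    for index, row in enumerate(matrix):
        rows.append(f"seg{index}  " + "  ".join(f"{d:.2f}" for d in row))
    return "\n".join(rows)


# Made-up sample text, used only to exercise the three steps in order.
SAMPLE = "The cat sat. The cat sat! A dog ran; a dog ran."

segments = cut(scrub(SAMPLE), 3)      # pre-processing
matrix = distance_matrix(segments)    # analysis
print(show(matrix))                   # visualization
```

Because each step is a separate function, you can go back, change a setting such as the segment size, and rerun the later steps, which is the kind of repeatable experiment the workflow is meant to encourage.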
Following an effective workflow helps you document your computational methods so others can replicate your results.