
Using NVivo: An Unofficial and Unauthorized Primer

Shalin Hai-Jew, Author


Future Look: Data Repositories for NVivo-based Data Sets?

Researchers who use quantitative methods in the social sciences often publish their datasets to online repositories at the time they publish the work based on those data.  These datasets are original data with a clear provenance, and they are placed in formats that are easily ingestible into any number of quantitative data analysis software tools.  (One assumption is that a researcher's work may be checked for reproducibility and repeatability, an assumption that does not hold in any precise sense for qualitative, mixed methods, or multi-method research.  Another is that the datasets may be used for other research, given the different backgrounds of the respective researchers and their access to other datasets that may have some cross-referencing value.)  Some universities host dataverses or data repositories based on the work of their faculty.  

Datasets are often scrubbed of human identifiers ("personally identifiable information") and “noise” before their release into public spaces.  They are generally shared for two reasons:  (1) to enable other researchers to test the findings of the prior researcher (who originated the dataset), and (2) to enable other researchers to surface new findings from the released data (possibly using new methods, new technologies, or new combinations of the two).  
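
As a rough illustration of the scrubbing step, the sketch below removes direct identifiers from a tabular dataset before release, using Python and pandas.  The file and column names ("interview_attributes.csv", "name", "email", "participant_id") are hypothetical, and a real release would follow a formal de-identification protocol rather than this minimal pass.

    # Minimal de-identification sketch (hypothetical file and column names).
    import hashlib
    import pandas as pd

    df = pd.read_csv("interview_attributes.csv")

    # Drop direct identifiers outright.
    df = df.drop(columns=["name", "email"], errors="ignore")

    # Replace the participant ID with a one-way pseudonym so that records
    # remain linkable across tables without exposing the original value.
    df["participant_id"] = df["participant_id"].apply(
        lambda value: hashlib.sha256(str(value).encode()).hexdigest()[:10]
    )

    df.to_csv("interview_attributes_scrubbed.csv", index=False)

Even after such a pass, quasi-identifiers (age, location, occupation, and the like) can still allow re-identification when cross-referenced with other data, which is part of why qualitative data are harder to release safely.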




Open-source data repositories tend to share mostly quantitative data, such as repositories specializing in government data and in map data.  

Currently, there is no online space per se for the release and distribution of NVivo datasets, which are based on qualitative and mixed methods research.  Part of the concern is that there are privacy issues with using raw qualitative data, which may be used to re-identify participants in the research.  Another issue is the difficulty of trying to “replicate” findings from qualitative or mixed methods data, given the differences in methodologies and their underlying theories.  NVivo does not generally enable scrubbing of ingested data, so if identifiers were included initially, those would likely remain throughout the use of the software.  Also, the integrated secondary sources (already-published articles) would potentially be “re-published” and "distributed" with the release of NVivo projects, which may raise intellectual property and copyright concerns.  

Proprietary (.nvp and .nvpx) Project Files

Also, NVivo datasets would require NVivo to open and access, given that these are proprietary .nvp (Windows) or .nvpx (Mac) project files.  (By contrast, most quantitative data are stored in non-proprietary file formats, or may easily be converted to them, which makes the data much easier to query and manipulate using a variety of software tools.)  Researchers may use NVivo to export some table data in .csv, .xls, or .xlsx formats that are more easily manipulable in other software programs, but entire projects remain in the proprietary project format.  
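
Once table data have been exported to a non-proprietary format, they can be handled with ordinary tools.  The sketch below assumes a hypothetical coding-summary export saved as CSV, with illustrative column names ("Node", "References") that may not match NVivo's actual report headers.

    # Reading a hypothetical NVivo table export with pandas.
    import pandas as pd

    summary = pd.read_csv("coding_summary_export.csv")

    # Tally coding references per node and list the ten most heavily coded.
    top_nodes = (summary.groupby("Node")["References"]
                        .sum()
                        .sort_values(ascending=False)
                        .head(10))
    print(top_nodes)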




Some Thoughts about Preparing an NVivo Project for Purposeful Sharing

As a thought experiment, it may be helpful to consider how to prepare an NVivo project for eventual public sharing upon publication.  The basic idea is to know what explicit and implicit data are in the project and what may be learned directly and inferred indirectly from it.  Then, control what may be seen by others without compromising your data or your codebook.  Generally, it is important to exploit your data as fully as possible before sharing it publicly (IMHO).  

I am actually not confident that a person can set up a full-blown research project and check all the boxes for safe information sharing through the release of an NVivo project...but I am open to being proven wrong if anyone wants to have a go at it, in a friendly way.  

Project setup: 
    • It makes the most sense to clean data of any identifiers before anything is ingested into an NVivo project.
Project event logs:  
    • There may be data leakages from the event log.  
Sensitive codebooks:
    • Be careful about any sharing of codebooks with identifiers included.  The coding you want has to do with themes, not identities per se.  
Classification sheets (applied to case nodes):  
    • Be careful about re-identification of participants from classification sheets applied to case nodes.  
    • Be careful about the potential for cross-referencing contents to re-identify individuals.  
Ingested content and metadata:  
    • There may be data leakages from metadata (such as EXIF data in digital imagery) and other embedded information; a sketch of stripping such metadata appears after this list.  
Physical maps (from social media platforms):  
    • Some excerpted information from social media platforms will include physical maps of locations of accounts, along with identifiers (or at least name handles).  
    • There are also social networks extracted, with name handles of the respective accounts.  
    • These can be cross-referenced with in-world data to possibly re-identify participants.  
    • Depending on the social media platform, there may be additional data that may ride with the downloaded files from NCapture.  (I have not explored these sufficiently yet to see what rides.)
Imagery / photos:  
    • Photos of people can be re-identified to a person because of the prevalence of facial recognition software, online reverse image searches, and the broad mapping of the WWW and Internet.  This applies to videos as well: videos and photos that show people's faces may be as good as identifying those people by name.  
There could well be other data and other paths to possible compromise.  
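
On the metadata point above, the sketch below shows one way EXIF data (camera model, timestamp, GPS coordinates) might be stripped from a photo before ingestion, using the Pillow library; the file names are placeholders.  Note that this removes embedded metadata only; faces in the image remain as identifiable as ever.

    # Strip EXIF metadata from an image by copying only its pixel data
    # into a fresh image (Pillow).  File names are placeholders.
    from PIL import Image

    original = Image.open("participant_photo.jpg")

    clean = Image.new(original.mode, original.size)
    clean.putdata(list(original.getdata()))
    clean.save("participant_photo_no_exif.jpg")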





A Literate Programming Approach 

A current movement in quantitative research is to enable authors to present research online as a stream of human-readable text and machine-readable code, woven together so that readers may access the analytical data (at minimum) as well as the modeling code, and so that the data analysis may be verified.  It is possible that such approaches may flow over to qualitative and mixed methods research.  One popular engine for such dynamic report creation is knitr (pronounced "knitter").  This approach mixes a documentation language for human readability with a programming language for machine readability, and as such, it bridges some of the ambitions of the Semantic Web (which envisions a Web that is both human- and machine-readable).  
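
To make the "weaving" idea concrete, here is a small Python sketch of the core step such engines perform: executing code chunks embedded in a human-readable report and inserting their output back into the text.  This is an illustrative analogue under simplified assumptions, not how knitr itself is implemented.

    # Toy "weave" step: run embedded python chunks and append their output.
    import contextlib
    import io
    import re

    def weave(source: str) -> str:
        """Execute each fenced python chunk and insert its printed output."""
        pattern = re.compile(r"```python\n(.*?)```", re.DOTALL)
        shared_env = {}                      # chunks share one namespace
        parts, last_end = [], 0
        for match in pattern.finditer(source):
            parts.append(source[last_end:match.end()])
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec(match.group(1), shared_env)   # run the chunk
            parts.append("\nOutput:\n" + buffer.getvalue())
            last_end = match.end()
        parts.append(source[last_end:])
        return "".join(parts)

    report = """# Coding reference counts

    ```python
    counts = {"theme_a": 12, "theme_b": 7}
    print("total references:", sum(counts.values()))
    ```
    """
    print(weave(report))

Running the script prints the report with each chunk's output woven in after the code, which is the basic pattern a reader-verifiable dynamic report follows.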


Additional Online Dataset Exploration

One of the first articles to address the archiving and sharing of qualitative data was Dr. Lisa Cliggett's "Qualitative Data Archiving in the Digital Age: Strategies for Data Preservation and Sharing," published in The Qualitative Report (TQR, Vol. 18, pp. 1 - 11) in 2013. 

Cornell University links to a number of Internet data sources for social scientists through the Cornell Institute for Social and Economic Research.  The Inter-university Consortium for Political and Social Research (ICPSR) is another source.  The Pew Research Center offers datasets on social and demographic trends.

Various commercial companies will release scrubbed datasets of internally collected data for researcher exploration.  Others have created web-based application programming interfaces (APIs) to enable access to extracted "shadow datasets" of limited data; still others enable access to datasets with suppressed data values.  These datasets may often be used only under certain legal constraints.  
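
As an illustration of the API route, the sketch below requests records from a hypothetical endpoint with a placeholder credential; every real provider defines its own endpoints, authentication scheme, response format, and terms of use.

    # Fetching records from a hypothetical data provider's web API.
    import requests

    API_URL = "https://api.example.org/v1/records"         # placeholder endpoint
    response = requests.get(
        API_URL,
        params={"query": "social media use", "limit": 100},
        headers={"Authorization": "Bearer YOUR_API_KEY"},   # placeholder token
        timeout=30,
    )
    response.raise_for_status()
    records = response.json().get("results", [])            # assumed response shape
    print(f"Retrieved {len(records)} records")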


