C2C Digital Magazine (Spring/Summer 2023)

At the intersection of information science, learning technologies, and data science: A reflection of a three-article dissertation journey

By Javier Leung, University of Missouri


Researchers in information science regularly analyze the discourse in large-scale online communities of practice (CoPs) to better understand the evolution of topic patterns and the exchange of valuable work-related practical knowledge. This article summarizes the three articles that make up a three-article dissertation defended at the University of Missouri in February 2023. It presents the key findings from three online CoPs in instructional design and technology (IDT) and teacher professional development (PD) built on a content management system (CMS), the Facebook social media platform, and custom development with responsive layouts. It also encapsulates the design recommendations that enhance the current state of online CoPs and inform the future development of web technologies that enable online CoPs to become conducive information spaces for practitioners. Finally, the article documents the doctoral journey, provides tips for future doctoral students pursuing a master's degree during doctoral studies, and closes with the valuable lessons learned during the doctoral program.


The academic journey of doctoral graduates is unique and characterized by diverse personal experiences and professional milestones. Completing a Ph.D. in Information Science and Learning Technologies in February 2023, along with a Master's in Data Science Analytics in May 2019, took nine years. The purpose of this article is two-fold. First, it reflects on the doctoral journey, with the goal of passing on some nuggets of wisdom to future graduate students. Second, it summarizes the key findings from the three-article dissertation, which used web scraping of public information, natural language processing (NLP), and usability testing.

Three concepts are necessary to understand the three-article dissertation: online communities of practice (CoPs), practical knowledge, and NLP. Wenger and Snyder (2000) described CoPs as groups of people informally gathered to share expertise in a specific domain or field as they interact regularly. Zappavigna (2006) stated that CoPs hold collective meanings and processes of acculturation where members internalize the group's norms and implicit knowledge structures. Practical knowledge refers to work-related knowledge valuable to practitioners in a particular profession. NLP algorithms are widely used in an unsupervised manner to extract practical knowledge from information and communication technologies (ICTs) and to identify patterns in that knowledge from codified language. The three-article dissertation used Latent Dirichlet Allocation (LDA) and the Natural Language Toolkit (NLTK) to extract syntactic and semantic characteristics for quantifying practical knowledge. LDA and NLTK are also commonly used to develop application programming interfaces (APIs) that process textual data to support the organization of practical knowledge.

This article is organized into three sections. Section I summarizes the doctoral journey and research background. Section II outlines the three studies conducted as part of the three-article dissertation and design recommendations for supporting online CoPs. Section III concludes the article with intangible lessons learned from the doctoral journey.

Section I: The Doctoral Journey

I vividly remember reading, in 2012, a few research articles on the Waikato Environment for Knowledge Analysis (WEKA), which allowed academic researchers to perform data mining and machine learning tasks on large-scale datasets. As an open-source data mining tool, the WEKA project was created in 1992 and funded by the New Zealand government in 1993 (Hall et al., 2009). Since the tool's inception, many academic research studies have investigated learners' behaviors from big data generated in learning management systems (LMSs). Back then, I knew that actionable insights from big data in educational systems and from learners' digital traces could be explored to improve online courses and the learning experience at the course and program levels.

Although the inception of WEKA dates back to 1992, most instructional design and technology (IDT) programs in 2006 did not cover data mining and machine learning topics. After 15 years of working in higher education, a doctoral degree in IDT seemed a natural progression. Around 2013, several online and on-campus IDT doctoral programs from Boise State University, Nova Southeastern University, the University of Kansas, and the University of Missouri were good choices. After a year of researching IDT doctoral programs and talking to program advisors, I found that the University of Missouri (MU) was the only choice that offered data science and analytics courses as part of a new graduate program. Traditionally, doctoral students take one or two semesters of quantitative and qualitative courses as part of the program of study (PoS). However, my PoS required six semesters of data science and analytics because I had already taken graduate-level courses in statistics during my IDT training.

In early 2014, I took a job opportunity at MU, where I had to manage an online library for K-12 educators called the EdHub Library. Fast forward to 2019, data science and analytics had presented a steep learning curve, particularly in coding with Python and R. Despite these challenges, the experiences during the program provided new perspectives on applying algorithmic concepts to educational research using a variety of data sources, especially in the areas of educational data mining (EDM) and learning analytics (LA).

As of 2019, my research agenda relies heavily on instructional design, e-learning, and data science and analytics skills to measure how 38,000 educators have utilized professional development (PD) materials across Missouri, Kansas, and Nebraska school districts. My publications aim at developing platforms that lead to the digital transformation of user experiences by analyzing big data from analytics, servers, and artifacts. More specifically, the two lines of research are related to (1) understanding instructional design practice from users' digital traces and (2) program evaluation of systems and user behaviors using machine learning for prediction, classification, and pattern recognition. Figure 1 summarizes my professional and doctoral background.

Figure 1.  Dr. Javier Leung's Visual CV

Since passing the dissertation defense in February 2023, I have collected a few pieces of advice for future doctoral students as follows:

Section II: A Summary of the Three-Article Dissertation

This section of the article contains the individual studies, problem statement, purpose, contribution to the literature, research questions, methodology, key findings, and design recommendations from the three-article dissertation defense in February 2023.

The sequence of studies presented in the dissertation represents a critical step for improving online CoPs in IDT and teacher PD by analyzing the accumulated practical knowledge through NLP to explore how community members use respective online CoPs with a content management system (CMS), the Facebook social media platform, and custom development of responsive layouts for practical knowledge organization, sharing, and collaboration purposes. I am the sole author of the articles in the dissertation that were accepted in the following scholarly journals:


The first study analyzed 9,033 online news articles in IDT from the E-Learning Industry across seven news categories. The second study examined 6,066 user posts from four public groups on Facebook (Instructional Designers, Designers for Learning, Adobe Captivate Users, and Articulate Storyline). The third study offered a design case for a standards-based online teacher PD platform called the EdHub Library, which served around 38,000 educators across Missouri, Kansas, and Nebraska as of 2022.

Problem Statement

ICTs are currently built on Web 2.0 technologies emphasizing the creation and distribution of content (Önday, 2019). ICTs have also allowed the accumulation of practical knowledge in online CoPs as the byproduct of members' interactions. Although ICTs play a critical role in supporting the creation and distribution of practical knowledge, online CoPs have inefficient organizational schemes that mismanage the accumulated practical knowledge with little or no alignment to professional competencies. The lack of organizational schemes limits community members' ability to search for solutions independently and to advance professionally, from both content-based and context-based organizational perspectives. First, the literature has noted usability issues stemming from online CoPs' lack of mechanisms for browsing the accumulated practical knowledge from a content-based organizational perspective. Second, online CoPs are not designed as information spaces that enrich the professional advancement of community members by contextualizing practical knowledge with professional competencies.


Although ICTs have enabled practitioners to develop online CoPs out of shared practice and necessity, future technologies should support community members' information tasks through organizational schemes that allow categorizing topics and evaluating practical knowledge based on professional competencies. The dissertation explores the mechanisms to support the navigation and evaluation of the accumulated practical knowledge ingrained in online CoPs as mental models and shared beliefs, identities, and meanings.


The studies provide an exploration of shared practical knowledge and practices in online CoPs and fill a gap in the literature in the following manner:


Dissertation's Research Question

The following research question encapsulates the purpose and sequence of the three-article dissertation: What design features in online information spaces enhance the exchange of skills and knowledge among members of a community of practice?

The sequence of the papers describes the processes for taking inventory of shared practical knowledge from online CoPs that allow for the creation of mechanisms to represent and visualize practical knowledge:



Studies #1 and #2 described NLP tasks for extracting topic structures from codified knowledge found in platforms that use different IAs to organize their practical knowledge. Study #3 used usability testing sessions to investigate a user interface that supported educators' alignment of PD materials with Missouri teacher standards.

Natural Language Processing

In the first two studies, NLP was employed as an exploratory means to examine the practical knowledge behind codified language in prominent online CoPs in IDT. Syntactic and semantic NLP tasks were implemented to extract the syntax-based characteristics and meaning behind text sources.

Syntactic NLP tasks refer to the characteristics of the syntax of words concerned with the position of words in a sentence without understanding the context around them. In these studies, the syntactic NLP tasks included average word and sentence lengths, n-grams, and word frequencies that allowed for exploring the codified knowledge from text sources. Lambda functions were used to calculate the average word and sentence lengths, and the visualizations were generated using the Profile Report package (Brugman, 2021). The n-grams language model in NLTK was implemented to understand the probabilities of contiguous words in trigrams and 4-grams (Natural Language Toolkit — NLTK 3.6.2 Documentation, n.d.). After implementing a stop words dictionary to remove uninformative words, word frequencies were obtained to understand the importance of words based on frequencies.
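The syntactic measures described above can be illustrated with a minimal, dependency-free sketch. This is not the studies' actual pipeline: the toy texts, the tiny stop-word list, and the helper names (`tokenize`, `split_sentences`, `avg_word_len`, `avg_sent_len`, `ngrams`, `word_frequencies`) are all assumptions made for illustration, and the hand-rolled `ngrams` helper merely stands in for the n-gram utilities that NLTK provides.

```python
import re
from collections import Counter

# Toy texts standing in for scraped articles or posts (hypothetical data).
docs = [
    "The LMS supports online courses. Designers build online courses in the LMS.",
    "Instructional designers share practical knowledge in online communities.",
]

# Tiny illustrative stop-word list; a real analysis would use a fuller dictionary.
STOP_WORDS = {"the", "in", "a", "an", "of", "and", "to"}

# Lowercase word tokens and naive sentence splitting on ., !, or ?
tokenize = lambda text: re.findall(r"[a-z0-9']+", text.lower())
split_sentences = lambda text: [s for s in re.split(r"[.!?]+\s*", text) if s]

# Lambda functions for the two averages (characters per word, words per sentence).
avg_word_len = lambda text: sum(map(len, tokenize(text))) / len(tokenize(text))
avg_sent_len = lambda text: len(tokenize(text)) / len(split_sentences(text))


def ngrams(tokens, n):
    """Return every run of n contiguous tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))


def word_frequencies(texts):
    """Count words across all texts after dropping stop words."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    print(avg_sent_len(docs[0]))                  # words per sentence
    print(ngrams(tokenize(docs[0]), 3)[:2])       # first two trigrams
    print(word_frequencies(docs).most_common(3))  # most frequent content words
```

Counting how often each trigram or 4-gram occurs (for example, with `Counter(ngrams(tokens, 3))`) gives the raw material from which the probabilities of contiguous words can be estimated.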

Semantic NLP tasks are concerned with extracting meaning out of the context of words. The semantic NLP tasks included sentiment analysis, named entity recognition and entity relationships, and topic modeling. The TextBlob package was implemented for sentiment analysis to identify positive, neutral, and negative attitudes in the texts (Lorian, n.d.). Text sources were processed with the spaCy package to extract entities, including names of people, places, organizations, and geographic locations (spaCy · Industrial-Strength Natural Language Processing in Python, n.d.). Once entities were extracted, subject-object relationships were formed as entity pairs to describe the source and target entities linked by edge entities.
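As a rough sketch of two of these semantic steps, the code below buckets polarity scores into positive, neutral, and negative labels and stores entity pairs as source, edge, and target triples. It assumes polarity scores in the range -1.0 to 1.0, such as those TextBlob reports; the `neutral_band` cutoff, the sample scores, the sample triples, and the function names are hypothetical and are not taken from the studies.

```python
from collections import Counter, defaultdict


def label_polarity(polarity, neutral_band=0.05):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    neutral_band is an assumed cutoff: scores within +/- neutral_band
    of zero are treated as neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


def sentiment_distribution(polarities):
    """Return the share of each sentiment label among the scored texts."""
    labels = Counter(label_polarity(p) for p in polarities)
    total = len(polarities)
    return {name: labels[name] / total
            for name in ("positive", "neutral", "negative")}


def build_entity_graph(triples):
    """Group (source, edge, target) triples by their source entity."""
    graph = defaultdict(list)
    for source, edge, target in triples:
        graph[source].append((edge, target))
    return dict(graph)


# Hypothetical polarity scores, one per text.
scores = [0.6, 0.2, 0.0, -0.4, 0.35]

# Hypothetical entity pairs linked by an edge describing their relationship.
triples = [
    ("Storyline", "publishes to", "LMS"),
    ("Storyline", "supports", "JavaScript"),
    ("Captivate", "publishes to", "LMS"),
]

if __name__ == "__main__":
    print(sentiment_distribution(scores))
    print(build_entity_graph(triples))
```

Keeping the labeling rule separate from the scoring library makes it easy to compare sentiment distributions across groups with the same cutoff.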

The LDA and BERTopic topic modeling algorithms were implemented to identify the latent topic structures in online CoPs. In the first topic modeling technique, LDA generated topic models based on the word representations and probabilities from the bag-of-words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) (Řehůřek, 2009). In the second topic modeling technique, BERTopic was implemented to generate topic representations against a pre-trained model (Grootendorst, 2020; Reimers, 2021). Although there is no standard evaluation procedure for topic modeling, semantically coherent topics were examined through human judgment and quantitative approaches. Chang et al. (2009) proposed evaluating topic model outputs using two methods, including topic and word intrusion methods. In terms of topic intrusion, discovered topics can be evaluated to determine whether the topic model's decomposition of documents agrees with human judgments. A topic model can be examined regarding word intrusion by observing the words inserted in a topic model that do not provide semantic coherence or coherent meaning. Regarding the quantitative approach, coherence values were obtained by running the topic modeling algorithms until the highest coherence values were achieved.
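To make the two word representations concrete, here is a small sketch of bag-of-words counts and TF-IDF weights, followed by a helper that picks the candidate number of topics with the highest coherence value. The weighting `tf * log(N / df)` is one common textbook variant and is an assumption here; topic modeling libraries may apply different logarithm bases or normalization. The toy corpus, the coherence scores, and the function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Bag-of-words: how many times each term occurs in one document."""
    return Counter(tokens)


def tf_idf(corpus):
    """Weight each term in each document by tf * log(N / df).

    corpus is a list of token lists; N is the number of documents and
    df is the number of documents that contain the term.
    """
    n_docs = len(corpus)
    bows = [bag_of_words(doc) for doc in corpus]
    doc_freq = Counter(term for bow in bows for term in bow)
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in bow.items()}
        for bow in bows
    ]


def pick_num_topics(candidates, coherence_of):
    """Return the candidate topic count whose coherence value is highest."""
    return max(candidates, key=coherence_of)


# Hypothetical tokenized documents.
corpus = [
    ["lms", "course", "lms"],
    ["course", "design"],
    ["design", "portfolio", "course"],
]

# Hypothetical coherence values, one per candidate number of topics.
coherence_scores = {5: 0.41, 10: 0.52, 15: 0.47}

if __name__ == "__main__":
    weights = tf_idf(corpus)
    print(weights[0])  # "course" appears in every document, so its weight is 0
    print(pick_num_topics(coherence_scores, coherence_scores.get))
```

In this toy corpus, a term found in every document receives zero weight, which is how TF-IDF downplays words that do not help distinguish one document from another.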

Usability Testing

In study #3, redesigning the online teacher PD platform, the EdHub Library, involved two usability testing sessions with voluntary participation from Network for Educator Effectiveness (NEE) and Assessment Resource Center (ARC) staff. This study was conducted as part of my duties as an instructional designer for improving a university-related service to school districts. Participant information was not collected, and no identifiable information can be traced back to participants.

The primary objective of the usability testing sessions was to obtain feedback from participants to refine the three-level hierarchical navigation structure of the prototype. The prototype contained three main sections (i.e., getting started, search engine feature, and topic categories). In the first usability testing session of the prototype, five NEE trainers were tasked to look for specific materials, write the location of the materials, and provide feedback related to their user experience.

After implementing feedback from the first usability testing session, the second prototype incorporated a fourth section for accessing dedicated NEE teacher indicator sitemaps directly from the homepage. The same five NEE trainers and three ARC participants were asked to locate materials, write down their location, indicate their search preference, and provide feedback on their user experience. In the two usability testing sessions, the correctness of the location of the materials was verified.
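The location-verification step can be pictured with a short sketch that models a three-level hierarchy as nested dictionaries and checks a participant's written location against it. The section, category, and material names below are invented for illustration and do not reflect the actual EdHub Library content; the function names are likewise hypothetical.

```python
# Hypothetical three-level hierarchy: section -> category -> materials.
NAVIGATION = {
    "Getting Started": {
        "Orientation": ["Welcome Guide"],
    },
    "Topic Categories": {
        "Classroom Management": ["Routines Module", "Behavior Checklist"],
        "Assessment": ["Formative Assessment Module"],
    },
}


def locate(material, navigation=NAVIGATION):
    """Return the (section, category, material) path, or None if not found."""
    for section, categories in navigation.items():
        for category, materials in categories.items():
            if material in materials:
                return (section, category, material)
    return None


def is_location_correct(material, reported_path, navigation=NAVIGATION):
    """Check a participant's reported path against the actual location."""
    return locate(material, navigation) == tuple(reported_path)


if __name__ == "__main__":
    print(locate("Behavior Checklist"))
    print(is_location_correct(
        "Welcome Guide", ["Getting Started", "Orientation", "Welcome Guide"]))
```

Because every material sits exactly three levels deep, a written location can be compared to the true path level by level, which also shows at which level a participant went astray.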

Key Findings

Overall, this line of research is critical in understanding online CoPs that support the information needs of IDT practitioners, software companies, and educators. The series of studies has demonstrated that:

Study #1

There was an overemphasis on the technical aspects of IDT, and the LMS was heavily emphasized for various functions. Out of the seven news categories, the articles in the LMS category were the longest, while those in the remaining categories were shorter. Across the news categories, most articles were written positively and referenced other articles within the online CoP. A few recommendations from study #1 include: 

Study #2

The four Facebook groups in IDT actively exchanged educational technology and pedagogical advice. User posts across the four Facebook groups had similar sentiment distributions: the majority were positive, followed by neutral and negative sentiments. Consistent with the first study's findings, educational technology was overemphasized over pedagogical concepts. More importantly, the hashtags used in the online CoPs did not reflect the accumulated practical knowledge and provided little value to members when searching for information independently. Also, online CoPs needed more protocols for onboarding new members, addressing misinformation, and aligning topic structures to professional standards.

The topic models showed unique and shared characteristics among these online CoPs. The unique characteristics describe the purpose and particular functions of the communities. For example, the Instructional Designers group is a place for reviewing IDT portfolios and resources for educational animation. In contrast, members of the Designers for Learning group discussed educational technology research and game development. In the e-learning development communities, the unique characteristics were related to specific integrations with the LMS and JavaScript. Regarding shared characteristics, the first two groups shared IDT graduate programs, job postings, and PD events. Moreover, in the last two groups, not surprisingly, members troubleshot issues with their respective e-learning development tools.

The recommendations for the second study were related to missed opportunities that Facebook groups could have taken during the pandemic. These communities had the potential to become information hubs to assist practitioners with their transition from in-person to online learning. Facebook groups also needed to become more adequate information spaces for community members, with topic structures that organize their practical knowledge from the content and context perspectives. Finally, platforms should implement mechanisms to process user posts to support all aspects of the knowledge-creation process.

Study #3

This design case showcased implementing a three-level hierarchical structure over a sequential structure for the teacher PD CoP. A similar arrangement can be implemented in other online CoPs where the interface supports content and contextual organization on the homepage and search engine. More importantly, study #3 is an example of an online CoP where knowledge is curated and organized consistently regardless of how educators search for PD materials.


Each study had limitations. In the first study, external resources from online articles and user comments were not processed because they lacked consistency. In the second study, word frequencies and topic models can change over time once these online CoPs establish protocols for quality control. The third study did not have a limitation in the strict sense, but it could have used additional participants, especially novice users, to assess the homepage and search engine.

Design Features

Regardless of web technology platforms or the adoption of future development of web technologies, these recommendations are designed to enhance online CoPs as conducive information spaces for professional advancement:

Section III: Conclusion

In summary, ICTs are built on Web 2.0 technologies that allow for creating, storing, and sharing practical knowledge. This knowledge accumulates through members' interactions in online CoPs. Without topic structures in place, members cannot seek solutions independently. Online CoPs, in their current form, are ineffective at curating their knowledge, and the assessment of practical knowledge is not always transparent. Online CoPs must establish protocols for producing, organizing, and evaluating practical knowledge.

Online CoPs must standardize how they produce knowledge, especially on social media platforms. The challenge in online CoPs on Facebook is that members may request advice on educational technology but lack certain contextual information. An excellent example of standardizing such requests is the SOAP protocol, which stands for Subjective Objective Assessment Plan from the healthcare community. With the SOAP protocol, the background information is provided along with the summary of the objectives, intervention plan, and assessment options.
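One way to picture a SOAP-style request is as a small structured record that flags missing context before a member posts. The sketch below is an assumption about how such a template might look in an online CoP, not an existing platform feature; the class name, field descriptions, and sample request are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class SoapRequest:
    """A hypothetical SOAP-style template for an advice request."""
    subjective: str  # background and context as the member describes it
    objective: str   # observable facts and the objectives to be met
    assessment: str  # the member's reading of the problem and its options
    plan: str        # the intended intervention or next steps

    def missing_parts(self):
        """Names of the SOAP parts that were left blank."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]

    def is_complete(self):
        """True when every SOAP part contains some text."""
        return not self.missing_parts()


# Hypothetical request that omits the plan.
request = SoapRequest(
    subjective="New instructional designer supporting a compliance course.",
    objective="Learners must complete a scored quiz tracked in the LMS.",
    assessment="Unsure which authoring tool reports quiz scores reliably.",
    plan="",
)

if __name__ == "__main__":
    print(request.is_complete())    # the plan is blank
    print(request.missing_parts())  # which parts still need context
```

A form built on such a template could prompt members for the missing parts, so that the responses they receive rest on the same contextual information.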

In addition, online CoPs need evaluation criteria for determining the quality of practical knowledge, and they need organizational structures to classify practical knowledge. In the first study, the seven news categories are insufficient to organize online articles, and this online CoP must create additional topic structures to support IDT's pedagogical and professional aspects. In the second study of Facebook groups, the current hashtag structures do not help members when looking for practical knowledge independently.

Although the doctoral journey has ended, I will carry the lessons learned and experiences forward in my professional life. More importantly, the doctoral journey has taught me the intangible aspects of attaining a doctoral degree while working full-time: gracefully receiving feedback from colleagues and advisors, cultivating the joy of writing, and understanding the ebbs and flows of becoming a researcher, a process that has been nine years in the making.



Brugman, S. (2021). Introduction — pandas-profiling 3.0.0 documentation. Pandas Profiling. Retrieved August 6, 2021, from https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/

Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. Advances in neural information processing systems (pp. 288-296).

Grootendorst, M. (2020). GitHub - MaartenGr/BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics. BERTopic. Retrieved August 6, 2021, from https://github.com/MaartenGr/BERTopic

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD explorations newsletter, 11(1), 10-18. https://doi.org/10.1145/1656274.1656278

Leung, J. (2021). Design features of online teacher professional development: A design case for re-developing the EdHub Library to improve usability and alignment of content with teacher standards. International Journal of Designs for Learning, 12(2), 79-92. https://doi.org/10.14434/ijdl.v12i2.29578

Leung, J. (2022). Examining the Characteristics of Practical Knowledge from Four Public Facebook Communities of Practice in Instructional Design and Technology. IEEE Access, 10, 90669-90689. https://doi.org/10.1109/access.2022.3201893

Leung, J. (2022). An NLP approach for extracting practical knowledge from a CMS-based community of practice in E-Learning. Knowledge, 2(2), 310-336. https://doi.org/10.3390/knowledge2020018

Lorian, S. (n.d.). TextBlob: Simplified Text Processing — TextBlob 0.16.0 documentation. TextBlob: Simplified Text Processing. Retrieved August 6, 2021, from https://textblob.readthedocs.io/en/dev/

Natural Language Toolkit — NLTK 3.6.2 documentation. (n.d.). Natural Language Processing Toolkit - NLTK. Retrieved August 6, 2021, from https://www.nltk.org/

Önday, Ö. (2019). Web 6.0: Journey From Web 1.0 To Web 6.0. Journal of Media & Management, SRC/JMM-102. https://doi.org/10.47363/JMM/2019(1)102

Řehůřek, R. (2009). Gensim: topic modelling for humans. Topic Modelling for Humans. Retrieved August 6, 2021, from https://radimrehurek.com/gensim/

Reimers, N. (2021). Pretrained Models — Sentence-Transformers documentation. Pre-Trained Models. https://www.sbert.net/docs/pretrained_models.html

spaCy · Industrial-strength Natural Language Processing in Python. (n.d.). spaCy - Industrial-Strength Natural Language Processing. Retrieved August 6, 2021, from https://spacy.io/

Wenger, E. C., & Snyder, W. M. (2000). Communities of practice: The organizational frontier. Harvard Business Review, 78(1), 139-146.

Zappavigna, M. S. (2006). Tacit knowledge in communities of practice. In Encyclopedia of communities of practice in information and knowledge management (pp. 508-513). IGI Global. https://doi.org/10.4018/978-1-59140-556-6.ch08


About the Author

Javier Leung is an instructional designer responsible for measuring the impact of online learning environments and 500+ self-paced materials for around 38,000 educators across Missouri, Nebraska, and Kansas. With over 15 years of experience, Dr. Leung is a seasoned instructional designer, e-learning developer, and front-end developer in higher education and talent development. His research aims to impact how technology and learning engineering can be applied to sustain better learning experiences and interfaces in educational systems that involve program evaluation and analyzing big data to understand user behavior using machine learning for prediction, classification, and pattern recognition. Dr. Leung also leverages educational data mining and learning analytics approaches for investigating knowledge structures and shared professional practices from unstructured data in instructional design communities of practice through natural language processing and user experience design. More about his professional background and publications can be accessed at vita.javierleung.com or www.javierleung.com

His email is leungj@missouri.edu. 
