Researchers in information science regularly need to analyze the discourse in large-scale online communities of practice (CoPs) to better understand the evolution of topic patterns and the exchange of valuable work-related practical knowledge. This article draws on the three articles that make up a three-article dissertation completed at the University of Missouri in February 2023. It summarizes the key findings from three online CoPs in instructional design and technology (IDT) and teacher professional development (PD) that use a content management system (CMS), the Facebook social media platform, and custom development with responsive layouts. It also presents design recommendations for enhancing the current state of online CoPs and for guiding the future development of web technologies so that online CoPs become conducive information spaces for practitioners. Finally, the article documents the doctoral journey, offers tips for future doctoral students, including those pursuing a master's degree during doctoral studies, and closes with the valuable lessons learned during the doctoral program.
The academic journey of each doctoral graduate is unique and characterized by diverse personal experiences and professional milestones. My own journey took nine years: I completed a Master's in Data Science Analytics in May 2019 and a Ph.D. in Information Science and Learning Technologies in February 2023. The purpose of this article is two-fold. First, it reflects on the doctoral journey, with the goal of passing on some nuggets of wisdom to future graduate students. Second, it summarizes the key findings from the three-article dissertation, which used web scraping of public information, natural language processing (NLP), and usability testing.
Three concepts are necessary to understand the three-article dissertation: online communities of practice (CoPs), practical knowledge, and NLP. Wenger and Snyder (2000) described CoPs as groups of people informally gathered to share expertise in a specific domain or field as they interact regularly. Zappavigna (2006) stated that CoPs hold collective meanings and processes of acculturation where members internalize the group's norms and implicit knowledge structures. Practical knowledge refers to work-related knowledge valuable to practitioners in a particular profession. NLP algorithms are widely used, often in an unsupervised manner, to extract practical knowledge from information and communication technologies (ICTs) and to identify its patterns in codified language. The three-article dissertation used Latent Dirichlet Allocation (LDA) and the Natural Language Toolkit (NLTK) to extract syntactic and semantic characteristics for quantifying practical knowledge. LDA and NLTK are also commonly used to develop application programming interfaces (APIs) that process textual data to support the organization of practical knowledge.
This article is organized into three sections. Section I summarizes the doctoral journey and research background. Section II outlines the three studies conducted as part of the three-article dissertation and design recommendations for supporting online CoPs. Section III concludes the article with intangible lessons learned from the doctoral journey.
Section I: The Doctoral Journey
I vividly remember reading, in 2012, a few research articles on the Waikato Environment for Knowledge Analysis (WEKA), which allowed academic researchers to perform data mining and machine learning tasks on large-scale datasets. As an open-source data mining tool, the WEKA project was created in 1992 and funded by the New Zealand government in 1993 (Hall et al., 2009). Since the tool's inception, many academic research studies have investigated learners' behaviors from big data generated in learning management systems (LMSs). Back then, I knew that actionable insights from big data in educational systems and from learners' digital traces could be explored to improve online courses and the learning experience at the course and program levels.
Although the inception of WEKA dates back to 1992, most instructional design and technology (IDT) programs in 2006 did not cover data mining and machine learning topics. After I had worked in higher education for 15 years, a doctoral degree in IDT seemed a natural progression. Around 2013, several online and on-campus IDT doctoral programs, including those at Boise State University, Nova Southeastern University, the University of Kansas, and the University of Missouri, were good choices. After a year of researching IDT doctoral programs and talking to program advisors, I found that the University of Missouri (MU) was the only choice that offered data science and analytics courses as part of a new graduate program. Traditionally, doctoral students take one or two semesters of quantitative and qualitative courses as part of the program of study (PoS). However, my PoS required six semesters of data science and analytics because I had already taken graduate-level courses in statistics during my IDT training.
In early 2014, I accepted a position with MU, where I managed an online library for K-12 educators called the EdHub Library. Fast forward to 2019: data science and analytics, along with coding in Python and R, involved a steep learning curve. Despite the challenging aspects of data science and analytics, such experiences during the program provided new perspectives on applying algorithmic concepts to educational research using a variety of data sources, especially in the areas of educational data mining (EDM) and learning analytics (LA).
Since 2019, my research agenda has relied heavily on instructional design, e-learning, and data science and analytics skills to measure how 38,000 educators have utilized professional development (PD) materials across Missouri, Kansas, and Nebraska school districts. My publications aim at developing platforms that lead to the digital transformation of user experiences by analyzing big data from analytics, servers, and artifacts. More specifically, the two lines of research are related to (1) understanding instructional design practice from users' digital traces and (2) program evaluation of systems and user behaviors using machine learning for prediction, classification, and pattern recognition. Figure 1 summarizes my professional and doctoral background.
Figure 1. Dr. Javier Leung's Visual CV
Since passing the dissertation defense in February 2023, I have collected a few pieces of advice for future doctoral students as follows:
- Getting lost in rabbit holes is expected: This is one of the critical lessons I learned from my doctoral advisor. Although data science and analytics offer many ways to analyze data, it is critical to establish the scope of research projects. While large-scale or big data afford several investigations, limit the research findings to a few outcomes (i.e., 2-3 research questions) for a single publication. Do not be tempted to pursue more than three things at a time. I first experienced this in my first two publications, where I applied unsupervised clustering to analyze resource usage in the EdHub Library. I was tempted to perform a prediction task using the same dataset, which would have added another layer of findings. From a publication standpoint, adding another finding was unrealistic. Instead, consider additional analyses as follow-up studies that build a research agenda.
- Avoid comparison: In academia, comparing the number of publications among colleagues is unhealthy. It is best to avoid the comparison game at all costs and enjoy the process of building a research agenda based on professional accomplishments. After all, a doctoral PoS is tailored to the student's professional interests and outcomes. In the author biography section, a link to the doctoral portfolio contains the graduate coursework that outlines the courses taken during the doctoral studies and transferred credit hours from prior institutions.
- Well-being is always a priority: I passed the comprehensive examinations in the fall of 2019. Then, the world went into crisis during the COVID-19 pandemic in March 2020. During the first few weeks of the pandemic, isolation was not a problem. Still, the pandemic severely reduced opportunities on campus, especially after I defended the dissertation proposal in August 2021. Although mental health had not been a priority before then, mindfulness practices took precedence as the pandemic and stay-at-home mandates unfolded. I learned that my mental health is foundational to my well-being and productivity.
- Best writing moments: The three-article dissertation took about three years to complete since defending the dissertation proposal in 2021. Although self-discipline was not an issue, I realized that the writing moments I enjoyed most were during breaks, between meetings, and during morning writing sessions. Although writing felt like a chore throughout my doctoral studies, I now consider it a moment for creativity.
- Dissertation as part of work: This advice applies only to some. It is essential to understand that a doctoral program takes several years to complete, and the work environment may change when using work as part of a dissertation. As long as you are the project's primary stakeholder, you will have more assurance about completing the dissertation. However, if you are not the primary stakeholder of the work project you are using, build reliable professional relationships that can withstand workplace changes. Also make sure that direct reports and higher-ups are in the know. Such understandings between parties prove helpful during organizational transitions or reorganizations. In my specific experience, I used my work because it was part of my duties as an instructional designer.
Section II: A Summary of the Three-Article Dissertation
This section of the article contains the individual studies, problem statement, purpose, contribution to the literature, research questions, methodology, key findings, and design recommendations from the three-article dissertation defended in February 2023.
The sequence of studies presented in the dissertation represents a critical step for improving online CoPs in IDT and teacher PD. The studies analyze the accumulated practical knowledge through NLP and explore how community members use their respective online CoPs, built on a content management system (CMS), the Facebook social media platform, and custom development of responsive layouts, for practical knowledge organization, sharing, and collaboration purposes. I am the sole author of the articles in the dissertation, which were accepted by the following scholarly journals:
- Study #1: Leung, J. (2022). An NLP Approach for Extracting Practical Knowledge from a CMS-Based Community of Practice in E-Learning. Knowledge, 2(2), 310-336. https://doi.org/10.3390/knowledge2020018
- Study #2: Leung, J. (2022). Examining the Characteristics of Practical Knowledge from Four Public Facebook Communities of Practice in Instructional Design and Technology. IEEE Access, 10, 90669-90689. https://doi.org/10.1109/access.2022.3201893
- Study #3: Leung, J. (2021). Design Features of Online Teacher Professional Development: A Design Case for Re-Developing the EdHub Library to Improve Usability and Alignment of Content with Teacher Standards. International Journal of Designs for Learning, 12(2), 79-92. https://doi.org/10.14434/ijdl.v12i2.29578
The first study analyzed 9,033 online news articles in IDT from the E-Learning Industry across seven news categories. The second study examined 6,066 user posts from four public groups on Facebook (Instructional Designers, Designers for Learning, Adobe Captivate Users, and Articulate Storyline). The third study offered a design case for a standards-based online teacher PD platform called the EdHub Library, which served around 38,000 educators across Missouri, Kansas, and Nebraska as of 2022.
ICTs are currently built on Web 2.0 technologies emphasizing the creation and distribution of content (Önday, 2019). ICTs have also allowed the accumulation of practical knowledge in online CoPs as the byproduct of members' interactions. Although ICTs play a critical role in supporting the creation and distribution of practical knowledge, online CoPs have inefficient organizational schemes that mismanage the accumulated practical knowledge with little or no alignment to professional competencies. The lack of organizational schemes hinders community members' ability to search for solutions independently and to advance professionally, from both content-based and context-based organizational perspectives. First, the literature has noted usability issues stemming from online CoPs' lack of mechanisms for browsing the accumulated practical knowledge from a content-based organizational perspective. Second, online CoPs are not designed as information spaces that enrich the professional advancement of community members by contextualizing practical knowledge with professional competencies.
Although ICTs have enabled practitioners to develop online CoPs out of shared practice and necessity, future technologies should support community members' information tasks through organizational schemes that allow categorizing topics and evaluating practical knowledge based on professional competencies. The dissertation explores the mechanisms to support the navigation and evaluation of the accumulated practical knowledge ingrained in online CoPs as mental models and shared beliefs, identities, and meanings.
The studies provide an exploration of shared practical knowledge and practices in online CoPs and fill a gap in the literature in the following manner:
- Study #1: E-Learning News Outlet CoP: This site organizes its information architecture (IA) into seven news categories. Though practitioners are invited to write for the online CoP, how community administrators evaluate quality is unclear. While the site's structure helps organize its practical knowledge, additional structures within each news category are needed to facilitate practical knowledge and skills development.
- Study #2: IDT CoPs on Facebook: Members are not obligated to participate, and the problem is that discovering existing practical knowledge within the online CoP is challenging. Designing online CoPs as online information spaces and implementing curation practices should enhance community members' ability to find the answers independently and take a better inventory of their social and intellectual capital.
- Study #3: EdHub Redesign: This paper is an example of designing an online information space for a specific online CoP in K-12 within the Network for Educator Effectiveness (NEE) that aligns with the Missouri teacher standards. This study contributes to the teacher education literature by implementing several mechanisms to support educators' information needs in standards-based teacher PD.
Dissertation's Research Question
The following research question encapsulates the purpose and sequence of the three-article dissertation: What design features in online information spaces enhance the exchange of skills and knowledge among members of a community of practice?
The sequence of the papers describes the processes for taking inventory of shared practical knowledge from online CoPs that allow for the creation of mechanisms to represent and visualize practical knowledge:
- In study #1, the e-learning news outlet is structured into seven major categories. The topic representations of each category allow for further exploration within their existing structure.
- In study #2, CoPs using social media platforms (i.e., Facebook) do not have an IA that allows members to browse the available practical knowledge. The topic representations of these public groups allow for a mechanism to locate specific practical knowledge rather than rely on members' responses in an asynchronous online environment.
- Study #3 aims to improve the usability of an online teacher PD CoP that allows educators to align PD materials with Missouri teacher standards regardless of experience in teacher PD.
Studies #1 and #2 described NLP tasks for extracting topic structures from codified knowledge found in platforms that use different IAs to organize their practical knowledge. Study #3 used usability testing sessions to investigate a user interface that supported educators' alignment of PD materials with Missouri teacher standards.
Natural Language Processing
In the first two studies, NLP was employed as an exploratory means to examine the practical knowledge behind codified language in prominent online CoPs in IDT. Syntactic and semantic NLP tasks were implemented to extract the syntax-based characteristics and meaning behind text sources.
Syntactic NLP tasks examine the syntax-based characteristics of words, such as their position in a sentence, without considering the context around them. In these studies, the syntactic NLP tasks included average word and sentence lengths, n-grams, and word frequencies that allowed for exploring the codified knowledge from text sources. Lambda functions were used to calculate the average word and sentence lengths, and the visualizations were generated using the Profile Report from the pandas-profiling package (Brugman, 2021). The n-grams language model in NLTK was implemented to understand the probabilities of contiguous words in trigrams and 4-grams (Natural Language Toolkit — NLTK 3.6.2 Documentation, n.d.). After a stop words dictionary was applied to remove uninformative words, word frequencies were obtained to gauge the importance of words.
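The syntactic tasks described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the studies: it relies only on the standard library instead of NLTK, and the sample text, the stop-word list, and the helper names (sentences, tokens, ngrams) are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample text and stop-word list, for illustration only.
TEXT = ("Instructional designers share practical knowledge online. "
        "Designers share practical knowledge in online communities.")
STOP_WORDS = {"in", "the", "a", "of", "and"}

def sentences(text):
    # Naive sentence split on ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokens(text):
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"[a-z']+", text.lower())

def ngrams(words, n):
    # Contiguous runs of n words, returned as tuples.
    return list(zip(*(words[i:] for i in range(n))))

# Lambda functions for the two averages, mirroring the description above.
avg_word_len = lambda words: sum(map(len, words)) / len(words)
avg_sent_len = lambda sents: sum(len(tokens(s)) for s in sents) / len(sents)

words = tokens(TEXT)
sents = sentences(TEXT)
trigram_counts = Counter(ngrams(words, 3))
four_gram_counts = Counter(ngrams(words, 4))
# Word frequencies after removing stop words.
word_freq = Counter(w for w in words if w not in STOP_WORDS)

print(avg_word_len(words), avg_sent_len(sents))
print(trigram_counts.most_common(2))
print(word_freq.most_common(3))
```

A tokenizer and stop-word list from an NLP library would normally replace the regular expressions and the small hand-written set used here.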
Semantic NLP tasks are concerned with extracting meaning from the context of words. The semantic NLP tasks included sentiment analysis, named entity recognition and entity relationships, and topic modeling. The TextBlob package was implemented for sentiment analysis to identify positive, neutral, and negative attitudes in the texts (Loria, n.d.). Text sources were processed with the spaCy package to extract entities, including names of people, places, organizations, and geographic locations (spaCy · Industrial-Strength Natural Language Processing in Python, n.d.). Once entities were extracted, subject-object relationships were formed as entity pairs to describe the source and target entities linked by edge entities.
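As a rough sketch of how sentiment labels and entity pairs can be represented, the example below maps polarity scores to the three sentiment categories and stores entity relationships as source, edge, and target triples. The scores, the neutral_band parameter, the triples, and the function names are hypothetical. In practice, a polarity score in the range [-1.0, 1.0] could come from TextBlob(text).sentiment.polarity, and the entities from a spaCy pipeline.

```python
from collections import Counter

def label_sentiment(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    neutral_band is an assumed tuning parameter: scores whose absolute
    value does not exceed it are treated as neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"

def sentiment_distribution(polarities):
    # Share of texts falling into each sentiment label.
    counts = Counter(label_sentiment(p) for p in polarities)
    total = len(polarities)
    return {label: counts[label] / total
            for label in ("positive", "neutral", "negative")}

def build_edges(triples):
    # Group (source, edge, target) triples by their source entity.
    graph = {}
    for source, edge, target in triples:
        graph.setdefault(source, []).append((edge, target))
    return graph

# Hypothetical polarity scores for four texts.
scores = [0.5, 0.0, -0.25, 0.8]
dist = sentiment_distribution(scores)

# Hypothetical entity pairs: (source entity, edge, target entity).
triples = [
    ("designer", "uses", "LMS"),
    ("designer", "builds", "course"),
    ("LMS", "hosts", "course"),
]
graph = build_edges(triples)

print(dist)
print(graph["designer"])
```

Grouping triples by source entity makes it straightforward to hand the pairs to a network visualization tool later on.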
The LDA and BERTopic topic modeling algorithms were implemented to identify the latent topic structures in online CoPs. In the first topic modeling technique, LDA generated topic models based on the word representations and probabilities from the bag-of-words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) representations (Řehůřek, 2009). In the second topic modeling technique, BERTopic was implemented to generate topic representations using a pre-trained model (Grootendorst, 2020; Reimers, 2021). Although there is no standard evaluation procedure for topic modeling, semantically coherent topics were examined through human judgment and quantitative approaches. Chang et al. (2009) proposed evaluating topic model outputs with two methods: topic intrusion and word intrusion. With topic intrusion, discovered topics are evaluated to determine whether the topic model's decomposition of documents agrees with human judgments. With word intrusion, a topic is examined by checking whether a word inserted into the topic that does not belong can be readily identified, which indicates that the remaining words form a semantically coherent topic. Regarding the quantitative approach, coherence values were obtained by running the topic modeling algorithms until the highest coherence values were achieved.
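To make the two input representations concrete, the sketch below builds bag-of-words counts and TF-IDF weights for a tiny corpus using only the Python standard library. The documents are hypothetical, and the weighting shown (raw term count multiplied by the natural logarithm of the number of documents over the document frequency) is one common formulation chosen for illustration. Libraries such as Gensim apply their own variants, which may differ in logarithm base and normalization.

```python
import math
from collections import Counter

# Hypothetical tokenized documents, for illustration only.
docs = [
    ["lms", "course", "design"],
    ["lms", "training", "lms"],
    ["course", "assessment", "design"],
]

def bag_of_words(doc):
    # Term -> raw count within one document.
    return Counter(doc)

def document_frequency(docs):
    # Term -> number of documents containing the term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df

def tf_idf(docs):
    # Weight = raw term count * ln(N / document frequency).
    n_docs = len(docs)
    df = document_frequency(docs)
    weights = []
    for doc in docs:
        bow = bag_of_words(doc)
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in bow.items()})
    return weights

weights = tf_idf(docs)
for doc_weights in weights:
    print(doc_weights)
```

Terms that appear in fewer documents receive larger weights, which is why TF-IDF can surface more distinctive vocabulary than raw counts before the representations are passed to a topic model.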
In study #3, redesigning the online teacher PD platform, the EdHub Library, required two usability testing sessions through voluntary participation from NEE and Assessment Resource Center (ARC) staff. This study was conducted as part of my duties as an instructional designer for improving a university-related service to school districts. Participant information was not collected, and no identifiable information can be traced back to participants.
The primary objective of the usability testing sessions was to obtain feedback from participants to refine the three-level hierarchical navigation structure of the prototype. The prototype contained three main sections (i.e., getting started, search engine feature, and topic categories). In the first usability testing session of the prototype, five NEE trainers were asked to look for specific materials, write down the location of the materials, and provide feedback on their user experience.
After implementing feedback from the first usability testing session, the second prototype incorporated a fourth section for accessing dedicated NEE teacher indicator sitemaps directly from the homepage. The same five NEE trainers and three ARC participants were asked to locate materials, write down their location, indicate their search preference, and provide feedback on their user experience. In the two usability testing sessions, the correctness of the location of the materials was verified.
Overall, this line of research is critical in understanding online CoPs that support the information needs of IDT practitioners, software companies, and educators. The series of studies has demonstrated that:
In study #1, there was an overemphasis on the technical aspects of IDT, and the LMS was heavily emphasized for various functions. Out of the seven news categories, the articles in the LMS category were the longest, and the articles in the remaining categories were shorter. Across the news categories, most articles were written positively and referenced other articles within the online CoP. A few recommendations from study #1 include:
- Focusing on the pedagogical aspects of educational technology implementation and selection.
- Increasing the transparency of evaluation of articles.
- Adding additional structures within news categories and linking articles to professional standards.
In study #2, the four Facebook groups in IDT actively exchanged educational technology and pedagogical advice. User posts across the four Facebook groups had similar sentiment distributions: the majority were positive, followed by neutral and negative sentiments. Consistent with the first study's findings, educational technology was emphasized over pedagogical concepts. More importantly, the hashtags used in the online CoPs did not reflect the accumulated practical knowledge and provided little value to members when searching for information independently. Also, the online CoPs needed more protocols for onboarding new members, addressing misinformation, and aligning topic structures to professional standards.
The recommendations for the second study were related to missed opportunities that Facebook groups could have taken during the pandemic. These communities had the potential to become information hubs to assist practitioners with their transition from in-person to online learning. Facebook groups needed to become more adequate information spaces for community members, and they required topic structures that organized their practical knowledge from the content and context perspectives. In addition, platforms should implement mechanisms to process user posts to support all aspects of the knowledge-creation process.
In study #3, the design case showcased implementing a three-level hierarchical structure over a sequential structure for the teacher PD CoP. A similar arrangement can be implemented in other online CoPs where the interface supports content and contextual organization on the homepage and search engine. More importantly, study #3 is an example of an online CoP where knowledge is curated and organized consistently regardless of how educators search for PD materials.
The studies also had limitations. In the first study, external resources from online articles and user comments were not processed because they lacked consistency. In the second study, word frequencies and topic models can change over time once these online CoPs establish protocols for quality control. The third study did not have a limitation as such, but it could have used additional participants, especially novice users, to assess the homepage and search engine.
Regardless of web technology platforms or the adoption of future development of web technologies, these recommendations are designed to enhance online CoPs as conducive information spaces for professional advancement:
- Design Feature #1: Provide better organizational schemes for categorizing and curating practical knowledge while aligning to professional standards: Online CoPs need to use topic structures that represent the accumulated practical knowledge. Also, additional topic and professional standards structures are required to organize and inventory produced knowledge.
- Design Feature #2: Establish community protocols for addressing misinformation and onboarding new members: The second design recommendation involves implementing protocols to address misinformation and onboarding new members to support the initial socialization stage. That way, community expectations are articulated when individuals join a community.
- Design Feature #3: Increase transparency in online CoPs by explicitly stating the purpose, functions, and protocols for producing, eliciting, and evaluating practical knowledge: Evaluation and elicitation procedures of knowledge need to be included. Online CoPs need to establish policies that enable members to fully articulate issues related to practice. This design recommendation supports better socialization experiences.
- Design Feature #4: Improve the search engine functions while aligning with topic structures and competencies: Currently, search engines are rudimentary and do not mirror community topic structures. The first online CoP can tailor the search engine to mirror the news categories. On the other hand, Facebook groups need to improve their hashtag structures before enhancing the search engine function to mirror a content-based structure.
- Design Feature #5: Leverage NLP pipelines to process and organize practical knowledge to promote member engagement and knowledge sharing: This design recommendation will depend on the technological evolution of platforms. Two types of dashboards, leadership-level and member-level, can process text sources to monitor the production and organization of practical knowledge in online CoPs.
Section III: Conclusion
In summary, ICTs are built on Web 2.0 technologies that allow for creating, storing, and sharing practical knowledge. This knowledge accumulates through members' interactions in online CoPs. Without topic structures in place, members cannot seek solutions independently. Online CoPs, in their current form, are ineffective at curating their knowledge, and the assessment of practical knowledge is not always transparent. Online CoPs must establish protocols when producing, organizing, and evaluating practical knowledge.
Online CoPs must standardize how they produce knowledge, especially on social media platforms. The challenge in online CoPs on Facebook is that members may request advice on educational technology without providing certain contextual information. An excellent example of standardizing such requests is the SOAP protocol from the healthcare community, which stands for Subjective, Objective, Assessment, and Plan. With the SOAP protocol, the background information is provided along with the summary of the objectives, intervention plan, and assessment options.
In addition, online CoPs need evaluation criteria for determining the quality of practical knowledge, and they need organizational structures to classify practical knowledge. In the first study, the seven news categories are insufficient to organize online articles, and this online CoP must create additional topic structures to support IDT's pedagogical and professional aspects. In the second study of Facebook groups, the current hashtag structures do not help members when looking for practical knowledge independently.
Although the doctoral journey has ended, I can carry the lessons learned and experiences forward into my professional life. More importantly, the doctoral journey has taught me the intangible aspects of attaining a doctoral degree while working full-time: gracefully receiving feedback from colleagues and advisors, cultivating the joy of writing, and understanding the ebbs and flows of becoming a researcher, a process nine years in the making.
Brugman, S. (2021). Introduction — pandas-profiling 3.0.0 documentation. Pandas Profiling. Retrieved August 6, 2021, from https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. Advances in neural information processing systems (pp. 288-296).
Grootendorst, M. (2020). GitHub - MaartenGr/BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics. BERTopic. Retrieved August 6, 2021, from https://github.com/MaartenGr/BERTopic
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD explorations newsletter, 11(1), 10-18. https://doi.org/10.1145/1656274.1656278
Leung, J. (2021). Design features of online teacher professional development: A design case for re-developing the EdHub Library to improve usability and alignment of content with teacher standards. International Journal of Designs for Learning, 12(2), 79-92. https://doi.org/10.14434/ijdl.v12i2.29578
Leung, J. (2022). Examining the Characteristics of Practical Knowledge from Four Public Facebook Communities of Practice in Instructional Design and Technology. IEEE Access, 10, 90669-90689. https://doi.org/10.1109/access.2022.3201893
Leung, J. (2022). An NLP approach for extracting practical knowledge from a CMS-based community of practice in E-Learning. Knowledge, 2(2), 310-336. https://doi.org/10.3390/knowledge2020018
Loria, S. (n.d.). TextBlob: Simplified Text Processing — TextBlob 0.16.0 documentation. TextBlob: Simplified Text Processing. Retrieved August 6, 2021, from https://textblob.readthedocs.io/en/dev/
Natural Language Toolkit — NLTK 3.6.2 documentation. (n.d.). Natural Language Processing Toolkit - NLTK. Retrieved August 6, 2021, from https://www.nltk.org/
Önday, Ö. (2019). Web 6.0: Journey From Web 1.0 To Web 6.0. Journal of Media & Management, SRC/JMM-102. https://doi.org/10.47363/JMM/2019(1)102
Řehůřek, R. (2009). Gensim: topic modelling for humans. Topic Modelling for Humans. Retrieved August 6, 2021, from https://radimrehurek.com/gensim/
Reimers, N. (2021). Pretrained Models — Sentence-Transformers documentation. Pre-Trained Models. https://www.sbert.net/docs/pretrained_models.html
spaCy · Industrial-strength Natural Language Processing in Python. (n.d.). spaCy - Industrial-Strength Natural Language Processing. Retrieved August 6, 2021, from https://spacy.io/
Wenger, E. C., & Snyder, W. M. (2000). Communities of practice: The organizational frontier. Harvard Business Review, 78(1), 139-146.
Zappavigna, M. S. (2006). Tacit knowledge in communities of practice. In Encyclopedia of communities of practice in information and knowledge management (pp. 508-513). IGI Global. https://doi.org/10.4018/978-1-59140-556-6.ch08
About the Author
Javier Leung is an instructional designer responsible for measuring the impact of online learning environments and 500+ self-paced materials for around 38,000 educators across Missouri, Nebraska, and Kansas. With over 15 years of experience, Dr. Leung is a seasoned instructional designer, e-learning developer, and front-end developer in higher education and talent development. His research aims to impact how technology and learning engineering can be applied to sustain better learning experiences and interfaces in educational systems that involve program evaluation and analyzing big data to understand user behavior using machine learning for prediction, classification, and pattern recognition. Dr. Leung also leverages educational data mining and learning analytics approaches for investigating knowledge structures and shared professional practices from unstructured data in instructional design communities of practice through natural language processing and user experience design. More about his professional background and publications can be accessed at vita.javierleung.com or www.javierleung.com.
His email is email@example.com.