## C2C Digital Magazine (Fall 2017 / Winter 2018)



#### An Exploratory Factor Analysis of IT Satisfaction Survey Data (from 2014)

By Shalin Hai-Jew, Kansas State University

To see how an exploratory factor analysis might work with real-world data, anonymous IT satisfaction survey data from 2014 was used for this light walk-through.  The survey instrument itself was an inherited open-source survey that had been revised.

First, before the data could be used, it had to be downloaded accurately.  One step was to recode the Likert-like scores with descending point values instead of ascending ones (Figure 1).

#### Downloading Data in the Proper Format

Figure 1:  Recoding Values from the Survey
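The recoding in Figure 1 amounts to reverse-coding: for a bounded Likert scale, the descending score is the scale maximum plus the scale minimum, minus the original score.  A minimal Python sketch (the function name and the 1–5 scale bounds are illustrative, not taken from the survey):

```python
def reverse_code(values, scale_min=1, scale_max=5):
    """Flip ascending Likert scores to descending (and vice versa):
    new = (scale_max + scale_min) - old."""
    return [scale_min + scale_max - v for v in values]

# Applying the function twice returns the original coding.
descending = reverse_code([1, 2, 3, 4, 5])  # [5, 4, 3, 2, 1]
```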

When the data was downloaded, the numerical values were selected so that the variables (items) could be compared, to see if there were co-associations (Figure 2).

Figure 2:   Downloading the Data Table with Numeric Values

In the basic data structure of the extracted information, the variables were in the columns, and the cases were in the rows.

The survey itself was exported, so that when the factor analysis was done, the questions and prompts related to the respective variables could be read directly (Figure 3), and text labels could be applied to the extracted factors (components or dimensions).  (What should also have been done was a renumbering of each of the questions, so that everything would be in order.  Surveys do not come together in perfect order in chronological time, so the renumbering would have brought increased clarity.)

Figure 3:  Exporting the Survey to Word

#### Ensuring Data Meets Requirements for Factor Analysis

There are some data requirements before factor analyses may be run on the data effectively.

Sufficiency of dataset and “some multicollinearity”.   Because exploratory factor analysis is a multivariate statistical analysis approach, there have to be a sufficient number of variables that have some similarities among themselves to enable correlations (co-relating).  In this survey, there were some 50 usable items for the analysis and 413 responses (but fewer fully completed cases).

The dataset has to have numeric data for the variables.  This means that any string (text) data has to be recoded to numeric data or omitted.  (In this case, some open-ended questions were omitted as were some complex multi-response questions.)

There have to be a sufficient number of cases (with complete response data per case or per respondent or per row) to enable the identification of factors, in a statistically significant way.

Prior to the factor analysis, some tests were run on the data for “multi-collinearity.”  Some level of multicollinearity is desirable because if variables are unrelated (totally orthogonal), they are not likely to cluster around some related factors.  If they are too collinear, then the results might be muddy, with extracted factors being hard to define.

A simple linear-regression test for multicollinearity is sometimes run (albeit with different variables switched into the dependent-variable position).  For this process:

Open IBM's SPSS.

Open a new dataset (wait while it generates a preview).

Work through the Text Import Wizard (for the .csv file).  Clarify how the variables are arranged, where the variable names are located (usually in Row 1), the symbol used for decimals, how the cases are arrayed (by rows), the delimiter identification (comma), how dates are represented in the dataset, and so on.  Sometimes, it takes several go-arounds to get the underlying data in shape (in Excel first) before it may be opened into SPSS (Statistical Package for Social Sciences).
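Outside of SPSS, the same import choices (variable names in Row 1, comma delimiter, period as the decimal symbol, cases in rows) can be mirrored in a short pandas sketch; the column names and values here are illustrative, not from the actual dataset:

```python
import io

import pandas as pd

# Variable names in row 1; each subsequent row is one case (respondent).
csv_text = "Q4,Q6,Q8_1\n4,3,5\n2,5,1\n"

# sep and decimal mirror the Text Import Wizard's delimiter and
# decimal-symbol settings described above.
df = pd.read_csv(io.StringIO(csv_text), sep=",", decimal=".")
```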

Go to the Analyze tab in the menu.  Select Regression -> Linear (Figure 4).

Figure 4:  Analyze -> Regression -> Linear

Pick one variable to be the Dependent Variable (DV), and then load the other variables in the Independent Variables (IV) block (Figure 5).

Figure 5:  Variables in the Linear Regression

In the Statistics area, select the “Collinearity diagnostics” box (Figure 6).

Figure 6:  Selection for Collinearity Diagnostics

In the Coefficients table, check the columns for collinearity statistics.  “Tolerance” refers to the proportion of variance in one independent variable that is not explained by the other independent variables.  The desirable minimum cutoff for tolerance is said to be between 0.10 and 0.25 ("Tolerance," 2017).  The “VIF” or Variance Inflation Factor “measures how much the variance (the square of the estimate’s standard deviation) of an estimated regression coefficient is increased because of collinearity” or correlation with other predictor variables ("Variance Inflation Factor," 2017).   The highest level of VIF that is acceptable is 10 (“Variance Inflation Factor,” 2017).  A few variables go over that threshold in this dataset, based on the one linear regression run (Table 1).  (As noted earlier, this should be run multiple times with different variables in the DV position.)

Table 1:  Coefficients Table

On the whole, though, this data works.   There is some degree of multicollinearity but not excessive amounts.  There are sufficient cases, and there are sufficient variables (items).
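The tolerance and VIF figures in Table 1 can also be reproduced outside SPSS.  Here is a minimal numpy sketch, assuming a complete numeric cases-by-variables matrix (the function name is illustrative):

```python
import numpy as np

def tolerance_and_vif(X):
    """For each column of X (cases x variables), regress it on the remaining
    columns; report tolerance (1 - R^2) and VIF (1 / tolerance)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    stats = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Add an intercept column for the ordinary least-squares fit.
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        tol = 1.0 - r2
        stats.append((tol, 1.0 / tol))
    return stats  # list of (tolerance, VIF) per variable
```

A variable that is nearly a copy of another will show tolerance near 0 and a VIF well above the cutoff of 10 discussed above.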

#### Running the Exploratory Factor Analysis in SPSS

Again, the assumption of an “exploratory” factor analysis is that the researcher does not have initial hypotheses about factors in the data. There are no applied theories or frameworks or other a priori assumptions. There are no hypotheses being tested with the factor analysis.  (If there were a priori assumptions, then the type of factor analysis being conducted would be a “confirmatory” factor analysis.  The process would be the same, but the analytical approaches would be different.)  Figure 7 shows how to start a factor analysis.

Figure 7:  Analyze -> Dimension Reduction -> Factor

In the related windows, it is important to include all variables in the factor analysis.  In the section for the Correlation Matrix, the Kaiser-Meyer-Olkin (KMO) test should be run in order to see how well the dataset is set up for the extraction of factors, along with the related Bartlett’s test of sphericity (which tests whether the correlation matrix is an identity matrix, i.e., whether the variables are sufficiently interrelated for factoring).  Then:

1. Extract a correlation matrix.
2. Extract a scree plot.
3. Set the factor extraction based on factors with eigenvalues greater than 1 (instead of selecting a desired number of factors, at least not initially).
4. Go with the default maximum iterations for convergence (25).
5. Set the factor analysis rotation to Promax with a Kappa of 4.
6. Select any other desirable matrices.
7. Exclude cases listwise (if there is incomplete information in a case, it is left out of the factor analysis altogether).
8. Set the coefficient display format to be sorted by size (from the biggest factors to the smallest), and suppress small coefficients at 0.30 to start.  (The default in SPSS is set to 0.10.)

#### Resulting Factor Analysis

As noted earlier, the KMO indicates how likely it is that some factors (sub-groups) can be extracted from this dataset.  A KMO score would ideally be 0.80 or higher, and in this case, rounded up, this dataset just makes it.  The statistical significance for Bartlett’s Test is sufficient (Table 2).

Table 2:  KMO and Bartlett’s Test of Sphericity
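For reference, both diagnostics can be computed directly from the correlation matrix: KMO compares zero-order correlations with partial correlations, and Bartlett’s chi-square statistic is based on the determinant of the correlation matrix.  A numpy sketch of the standard formulas (illustrative, not SPSS’s internal implementation):

```python
import numpy as np

def kmo_and_bartlett(X):
    """Overall KMO measure of sampling adequacy and Bartlett's test of
    sphericity for a cases-by-variables data matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # KMO: partial correlations come from the inverse correlation matrix.
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    partial = -Rinv / np.outer(d, d)
    off = ~np.eye(p, dtype=bool)  # off-diagonal mask
    kmo = (R[off] ** 2).sum() / ((R[off] ** 2).sum() + (partial[off] ** 2).sum())
    # Bartlett: chi-square statistic and degrees of freedom.
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    dof = p * (p - 1) // 2
    return kmo, chi2, dof
```

When the variables share common variance (as a factorable dataset should), the partial correlations are small relative to the zero-order correlations, and KMO approaches 1.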

A study of “communalities” shows how much an item correlates with all other items, and Table 3 shows some fairly high correlations, which suggests some communal coherence among the variables.

| Item | Initial | Extraction |
|---|---|---|
| Q4 | 1.000 | 0.743 |
| Q6 | 1.000 | 0.533 |
| Q8_1 | 1.000 | 0.704 |
| Q8_2 | 1.000 | 0.772 |
| Q8_3 | 1.000 | 0.780 |
| Q8_4 | 1.000 | 0.684 |
| Q36_2 | 1.000 | 0.701 |
| Q36_25 | 1.000 | 0.670 |
| Q36_3 | 1.000 | 0.663 |
| Q36_29 | 1.000 | 0.749 |
| Q36_6 | 1.000 | 0.696 |
| Q36_4 | 1.000 | 0.766 |
| Q36_18 | 1.000 | 0.713 |
| Q36_26 | 1.000 | 0.671 |
| Q36_15 | 1.000 | 0.706 |
| Q36_9 | 1.000 | 0.688 |
| Q36_30 | 1.000 | 0.750 |
| Q36_7 | 1.000 | 0.716 |
| Q36_5 | 1.000 | 0.734 |
| Q36_8 | 1.000 | 0.608 |
| Q36_10 | 1.000 | 0.566 |
| Q36_16 | 1.000 | 0.690 |
| Q36_12 | 1.000 | 0.668 |
| Q36_19 | 1.000 | 0.725 |
| Q36_27 | 1.000 | 0.784 |
| Q36_11 | 1.000 | 0.614 |
| Q36_31 | 1.000 | 0.711 |
| Q10_1 | 1.000 | 0.918 |
| Q10_2 | 1.000 | 0.903 |
| Q10_3 | 1.000 | 0.825 |
| Q12_1 | 1.000 | 0.732 |
| Q12_2 | 1.000 | 0.639 |
| Q12_3 | 1.000 | 0.860 |
| Q12_4 | 1.000 | 0.817 |
| Q14_1 | 1.000 | 0.824 |
| Q14_2 | 1.000 | 0.779 |
| Q14_3 | 1.000 | 0.576 |
| Q16_1 | 1.000 | 0.895 |
| Q16_2 | 1.000 | 0.861 |
| Q16_3 | 1.000 | 0.819 |
| Q16_4 | 1.000 | 0.895 |
| Q17_1 | 1.000 | 0.708 |
| Q17_2 | 1.000 | 0.744 |
| Q18 | 1.000 | 0.746 |
| Q19_1 | 1.000 | 0.724 |
| Q19_2 | 1.000 | 0.914 |
| Q19_3 | 1.000 | 0.668 |
| Q19_4 | 1.000 | 0.757 |
| Q19_5 | 1.000 | 0.812 |
| Q19_6 | 1.000 | 0.785 |
| Q19_7 | 1.000 | 0.724 |
| Q19_8 | 1.000 | 0.802 |
| Q19_9 | 1.000 | 0.649 |
| Q19_10 | 1.000 | 0.715 |
| Q22_1 | 1.000 | 0.679 |
| Q22_2 | 1.000 | 0.793 |

Extraction Method: Principal Component Analysis.

Table 3: Communalities

Figure 8 shows a scree plot. This line graph shows the eigenvalues of the extracted components, with the cutoff at an eigenvalue of 1 (where the red line is).  This visualization shows the 14 factors with fairly significant loading by variables in this particular dataset.

Figure 8:  Scree Plot

Table 4, the “Total Variance Explained,” shows that this factor analysis can explain a cumulative 73.872% of the variance in this dataset.

Table 4:  Total Variance Explained
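The quantities behind the scree plot and Table 4 come from the eigenvalues of the correlation matrix: each eigenvalue is the variance a component explains, the Kaiser criterion retains components with eigenvalues greater than 1, and the cumulative percent of variance explained is their sum divided by the number of variables.  A sketch (function name illustrative):

```python
import numpy as np

def kaiser_retained(X):
    """Eigenvalues of the correlation matrix, the count retained under the
    Kaiser criterion (eigenvalue > 1), and the cumulative percent of
    variance those retained components explain."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    # eigvalsh returns ascending eigenvalues; reverse to descending.
    eigvals = np.linalg.eigvalsh(R)[::-1]
    k = int((eigvals > 1.0).sum())
    # The eigenvalues of a correlation matrix sum to p (its trace).
    cum_pct = 100.0 * eigvals[:k].sum() / eigvals.sum()
    return eigvals, k, cum_pct
```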

In Table 5, the Pattern Matrix, the fourteen “common” factors may be seen in the column headers in descending order.  The coefficients in the pattern matrix show how the variables in the left column load to the respective factors.  Cross-loadings, when a variable loads to two factors (with less than a .2 difference between the loadings), are usually addressed by eliminating the variables sequentially and seeing whether that may be done without ending up with a failure to converge in 25 rotations.   The variables with problematic cross-loadings were the following:  Q36_30, Q36_25, Q36_4, Q14_3, Q36_18, Q22_1, Q36_16, Q6, and Q36_10.

If the variables cannot be removed without causing problems in the rotation, then another approach is to raise the threshold for inclusion of loading.  In this case, the loading coefficient was set to 0.4, which resulted in a much cleaner pattern matrix, with only three variables cross-loading.  Increased cumulative variance was explained (74.244%), with 14 factors.

Table 5:  Pattern Matrix
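The cross-loading rule described above (two loadings within .2 of each other, both above the display threshold) can be expressed as a small screening function over a loadings matrix; a sketch with illustrative names, where `threshold` matches the pattern-matrix suppression value and `gap` matches the .2 rule:

```python
import numpy as np

def cross_loadings(loadings, names, threshold=0.3, gap=0.2):
    """Flag variables whose two largest absolute loadings are both at or
    above `threshold` and differ by less than `gap`."""
    flagged = []
    for name, row in zip(names, np.abs(np.asarray(loadings, dtype=float))):
        top2 = np.sort(row)[::-1][:2]  # two largest absolute loadings
        if top2[1] >= threshold and (top2[0] - top2[1]) < gap:
            flagged.append(name)
    return flagged
```

Raising `threshold` to 0.4, as described below, is the other lever for cleaning up a pattern matrix.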

The next step, once a pattern matrix has been cleaned up, is to label the factors with human-readable text for sense-making.  In this case, one has to return to the survey to read the related questions and prompts for the loading items in order to label them.  (Also, a component correlation matrix can be run to see how the factors correlate with each other.)

So the variables that loaded to the respective factors were as follows:

Factor 1:  Q19_2, Q19_10, Q19_5, Q19_4, Q19_8, Q19_3, Q19_1, and Q19_7
Factor 2:  Q36_29, Q36_27, Q36_31, Q36_19, Q36_11, Q36_7, Q36_6, Q36_12, Q36_8, and Q36_30
Factor 3:  Q16_4, Q16_1, Q16_2, Q16_3, and Q36_3
Factor 4:  Q8_2, Q8_3, Q8_4, Q8_1, and Q36_25 (albeit with co-loading)
Factor 5:  Q10_1, Q10_2, Q10_3, and Q36_5
Factor 6:  Q36_15, Q36_26, Q36_9, and Q36_4
Factor 7:  Q14_1, Q14_2, and Q36_2
Factor 8:  Q12_3 and Q12_4
Factor 9:  Q4 and Q36_18
Factor 10:  Q22_1 and Q19_6
Factor 11:  Q22_2 and Q36_16
Factor 12:  Q12_1 and Q12_2
Factor 13:  Q18 and Q6
Factor 14:  Q17_2 and Q17_1

The applied labels to the respective factors were as follows:

1. Network Function and Reliability
2. IT Physical Spaces and Direct Services
3. Telephone Services
4. Tech Services / Follow-through
5. Help Desk Support and Availability
6. User Accounts, CONNECT Services, Survey Tool, Web Services
7. Computer Technologies in Classrooms
8. Mobile Friendly Technologies
9. Gender and ?
10. Availability of Network and Mobile
11. File Storage Location and Network Speed
12. Mobile Friendly Technologies
13. Wifi Availability on Campus
14. Strength of Signal, Availability of Connectivity on Campus

Table 6:  Named Factors

Ideally, names for these factors (components or dimensions) should be much more parsimonious.

#### Assessing the Reliability of the Factors

Finally, some researchers test the reliability of the respective factors by bundling the respective variables underlying each factor and running a reliability analysis on them (to see how coherently the respective factors are describing a particular construct).  For this, Cronbach’s alpha is captured as a representation of reliability (Figure 9).

Figure 9:  Analyze -> Scale -> Reliability Analysis

Figure 10 shows the selection of the variables that loaded to Factor 1.

Figure 10:  Variables for Factor 1 Selected

The reliability analyses result in a Cronbach’s alpha score.  The general cutoffs suggest that .6 and over is acceptable, .7 to .9 is good, and .9 and above is excellent, in terms of how well the variables represent a coherent construct (or factor).  If the numbers are lower than .6, the elements representing the factor construct are unreliable and may not be indicating a coherent or valid component.

| Factor (in descending order) | Variables / Items | Topic / Applied Name | Cronbach’s Alpha (for reliability) | Valid / Excluded Cases (listwise deletion) |
|---|---|---|---|---|
| 1 | Q19_2, Q19_10, Q19_5, Q19_4, Q19_8, Q19_3, Q19_1, and Q19_7 (8 items) | Network Function and Reliability | 0.93 | 182 (44.1%) valid, 231 (55.9%) excluded |
| 2 | Q36_29, Q36_27, Q36_31, Q36_19, Q36_11, Q36_7, Q36_6, Q36_12, Q36_8, and Q36_30 (10 items) | IT Physical Spaces and Direct Services | 0.905 | 312 (75.5%) valid, 101 (24.5%) excluded |
| 3 | Q16_4, Q16_1, Q16_2, Q16_3, and Q36_3 (5 items) | Telephone Services | .927 | 304 (73.6%) valid, 109 (26.4%) excluded |
| 4 | Q8_2, Q8_3, Q8_4, Q8_1, and Q36_25 (albeit with co-loading) (5 items) | Tech Services / Follow-through | .819 | 326 (78.9%) valid, 87 (21.1%) excluded |
| 5 | Q10_1, Q10_2, Q10_3, and Q36_5 (4 items) | Help Desk Support and Availability | .914 | 314 (76%) valid, 99 (24%) excluded |
| 6 | Q36_15, Q36_26, Q36_9, and Q36_4 (4 items) | User Accounts, CONNECT Services, Survey Tool, Web Services | .720 | 325 (78.7%) valid, 88 (21.3%) excluded |
| 7 | Q14_1, Q14_2, and Q36_2 (3 items) | Computer Technologies in Classrooms | .779 | 310 (75.1%) valid, 103 (24.9%) excluded |
| 8 | Q12_3 and Q12_4 (2 items) | Mobile Friendly Technologies | .826 | 225 (54.5%) valid, 188 (45.5%) excluded |
| 9 | Q4 and Q36_18 (2 items) | Gender and ? | .113 | 319 (77.2%) valid, 94 (22.8%) excluded |
| 10 | Q22_1 and Q19_6 (2 items) | Availability of Network and Mobile | .669 | 298 (72.2%) valid, 115 (27.8%) excluded |
| 11 | Q22_2 and Q36_16 (2 items) | File Storage Location and Network Speed | .713 | 305 (73.8%) valid, 108 (26.2%) excluded |
| 12 | Q12_1 and Q12_2 (2 items) | Mobile Friendly Technologies | -.254 (“The value is negative due to a negative average covariance among items. This violates reliability model assumptions. You may want to check item codings.”) | 279 (67.6%) valid, 134 (32.4%) excluded |
| 13 | Q18 and Q6 (2 items) | Wifi Availability on Campus | .035 | 295 (71.4%) valid, 118 (28.6%) excluded |
| 14 | Q17_2 and Q17_1 (2 items) | Strength of Signal, Availability of Connectivity on Campus | .619 | 305 (73.8%) valid, 108 (26.2%) excluded |

Table 7:  Measuring the Reliability of the Respective Factors by Variables with Cronbach’s Alpha

Table 7 summarizes the reliability of the respective factors.  Factors 9, 12, and 13 were not considered reliable.
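Cronbach’s alpha itself is computable directly from the item data: alpha = k/(k−1) × (1 − sum of item variances / variance of the summed scale), for k items.  A numpy sketch over a listwise-complete cases-by-items matrix (function name illustrative):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a cases-by-items matrix with complete data:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```

Perfectly redundant items yield alpha = 1; uncorrelated items yield alpha near 0; and, as SPSS warns for Factor 12, a negative average covariance among items produces a negative alpha, which violates the reliability model's assumptions.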

#### Conclusion

Ideally, the exploratory factor analysis would be explicated in depth in a research context.  The underlying survey instrument would be available for analysis.  The data would be fully explored in other ways beyond the EFA.  How respondents interacted with the survey could be mentioned, and a light item analysis could be done.

In this case, the data was fairly dated.  It looks like the original online survey had been revised after the initial offering, so the downloaded data referred to responses to questions no longer available on the Qualtrics site.  (The factor labeled "Gender and ?" points to the missing block of variables.)  This data was used as a walk-through, though, and the outputs still make sense even with the known limits to the data.

#### About the Author

Shalin Hai-Jew is an instructional designer at Kansas State University.  Her email is shalin@k-state.edu.

