Book Review: Contemporary Research into Advanced Applied Data Pattern Recognition
By Shalin Hai-Jew, Kansas State University
Advances in Pattern Recognition Research
Thomas Lu and Tien-Hsin Chao
New York: Nova Science Publishers
2018 272 pp.
“Artificial Intelligence (AI) has become a popular research topic recently. Pattern recognition (PR) is an important part of an AI system. If the AI is considered as the digital ‘brain’, then the PR is the visual and auditory ‘cortex’ that converts the optimal signals from the eyes and the acoustic signals from the ears to meaningful symbolic texts that the brain can digest.”
-- Thomas Lu and Tien-Hsin Chao, Preface, p. vii
What is it in that digital or digitized image? Does the CCTV image show a person of interest moving through the city and approaching a high security zone? Do the biometrics systems show a match to Person A or a whole other person altogether? Do the live images on a fighter jet show an enemy aircraft, and what sort of missile is it firing? What about the images of people at the unmarked border crossing? What does the sonar imagery show about the undersea structures, and is that region of interest showing anything human-made? What do the x-rays or radiographs show about the person’s body, and what are the health implications?
Advances in Pattern Recognition Research (2018), edited by Thomas Lu and Tien-Hsin Chao, both of NASA’s Jet Propulsion Laboratory of the California Institute of Technology, provides a sense of some of the algorithms applied to the analysis of digital and digitized images, the identification of target objects, the informatization of imagery for high-risk contexts, and the extraction of visual data patterns, among others. Even though the included chapters are highly technical (a few including long sequences of equations), Advances… is somewhat readable to a broader audience, including non-specialists, because of fairly tight writing organization, explanatory diagrams, visual examples, general research principles, and other aspects.
A Light Introduction into the Space
A light introduction into the space may be helpful. Seeking to build human technologies in biomimetic ways (copying biological capabilities observed in nature), computer scientists have been researching how to create computers that could “learn by example,” such as through computational “neurons” (modeled after animal and human neural networks and the brain’s processing of sensory information in certain brain locations and along certain paths). Their efforts, starting in the 1950s, are coming to fruition, with these connectionist models (in hardware and software) applied to a wide range of machine learning tasks. In the same way that sensory signals are input into biological neurons, with information processing occurring in the cell body and the output communicated to other neurons through axons, artificial neural networks (ANNs) take in real-world (and / or synthetic) data, process it, and provide powerful insights (such as classification, prediction, and others). Over the years, the types of NNs (as connectionist models, with linked neurons of different functionalities) have evolved. These models and their machine learning capabilities have been tested on various datasets for their ability to accurately identify target objects (with the results expressed in confusion matrices and other tools common to “signal detection theory”).
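To make the neuron analogy concrete, the following minimal Python sketch models a single artificial neuron: inputs are weighted, summed with a bias (the “cell body” step), and passed through an activation function to produce an output. The weights and bias are hand-picked for illustration; they are not learned and are not drawn from the book.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of the inputs plus a bias,
    squashed into (0, 1) by a logistic (sigmoid) activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked (illustrative) weights: the output is high only when both
# inputs are active, so this neuron behaves like a soft AND gate.
weights, bias = [6.0, 6.0], -9.0
for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, round(neuron(pair, weights, bias), 3))
```

In a trained network, the weights would be adjusted from data rather than set by hand, and many such neurons would be linked in layers.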
In the cases described in this work, the inputs are images, and the outputs may be data, classification labels, image sets, and others. In between, images are read in “windows” or “grids” and processed using wavelets to capture changes in the images over time. (A common approach for this is the use of convolutional neural networks.) More applied designs have been created, tested, and refined for particular “use cases.”
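The “window” idea can be sketched in a few lines of Python. This is an assumed, minimal illustration of the valid 2D sliding-window operation a convolutional layer performs (strictly a cross-correlation, which is what most CNN libraries compute). The toy image and the edge-detecting kernel are invented; a real CNN learns its kernels from data.

```python
def convolve2d_valid(image, kernel):
    """Slide a small kernel over every fully contained window of the image
    and record the weighted sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

# A toy 4 x 5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1]] * 4
# A hand-chosen vertical-edge kernel (illustrative, not learned).
kernel = [[-1, 1], [-1, 1]]
print(convolve2d_valid(image, kernel))
```

Each output value responds to one window, so the strong response appears only at the window straddling the edge.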
It turns out that images in the world are prone to all kinds of noisy data, with complex light reflectances and diffractions (and resulting illusions), ambiguous depth of field, visual complexities of moving objects (with shape profiles changing dynamically over time and space), air particles, different light exposures in imagery, camera and sensor artifacts, technological challenges, and other difficulties in acquiring signal-rich imagery. Some interventions involve more effectively separating foreground and background, normalizing lighting in imagery, enhancing the focus on particular regions of interest (labeled as “ROI”), and making other adjustments in fully automated or partially manual ways. Whatever the image-capture context, the ways humans use these images carry real consequences for people, so hyper-precision is critical.
Target Acquisition through Machine Vision and Image De-Noising
The two co-editors, Tien-Hsin Chao and Thomas Lu, open this book with their research on “Automatic Target Recognition Processor Using Integrated Grayscale Optical Correlator and Neural Network” (Ch. 1). Their targets include missiles shot from fighter aircraft, fighter aircraft themselves, aircraft in an aircraft boneyard (from a satellite or aerial image), and potential mines in underwater sonar images. The implied applications may be for fighter pilots needing confirmation of which planes they are engaging and which missiles are in play. There may also be an application in analyzing dynamic underwater sonar imagery to understand what is there: mines (which appear as shadows) and other underwater features and objects. Those examples give a sense of the potential challenges of object identification: the speed of movement in the skies, the low contrast of a fighter jet against a sky, the pressured context, the sonar (sound-wave) imagery from under water, the various depths and elements, and so on.
One target-identification approach uses an optical Fourier-domain correlation architecture, in which images are analyzed by mapping the visual signals between the spatial and frequency domains, and areas that may be of interest trigger peaks in a grayscale optical correlator. (Because the optics process the whole image in parallel, this processing happens at the speed of light.) While “bulky and fragile” in earlier iterations, these correlators have come a long way (Chao & Lu, 2018, p. 1). The initial image reads (with ROIs identified at low thresholds, which produces a number of false positives) are then corrected by neural networks acting as post-processors after the optical correlation.
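The optical correlator performs its Fourier transforms with lenses and light; the same correlation-peak idea can be sketched digitally. The minimal Python sketch below (a 1D simplification using a naive discrete Fourier transform and made-up signals) multiplies the scene spectrum by the conjugate of the reference spectrum and inverse-transforms the product; the tallest peak marks where the reference best matches. It illustrates the general principle of Fourier-domain correlation, not the chapter’s grayscale optical correlator or its composite filters.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; fine for short demo signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def fourier_correlate(scene, reference):
    """Circular cross-correlation computed in the frequency domain:
    scene spectrum times the conjugate reference spectrum, inverse-transformed.
    Output index k holds the match score with the reference starting at k."""
    n = len(scene)
    padded = list(reference) + [0.0] * (n - len(reference))
    s, r = dft(scene), dft(padded)
    product = [a * b.conjugate() for a, b in zip(s, r)]
    return [v.real for v in dft(product, inverse=True)]

scene = [0.0] * 16
scene[2:4] = [1.0, 1.0]               # some clutter
scene[9:13] = [1.0, 2.0, 3.0, 1.0]    # the target, starting at index 9
reference = [1.0, 2.0, 3.0, 1.0]
corr = fourier_correlate(scene, reference)
peak = max(range(len(corr)), key=corr.__getitem__)
print(peak, round(corr[peak], 6))
```

The peak lands at index 9, where the target sits; the clutter produces only smaller side responses, which is why a low detection threshold lets some of them through as false positives.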
The advancement of the laser and electro-optic spatial light modulators (EO-SLMs) made it possible to construct miniaturized optical correlator with low-power consumption and high-speed information processing. Multiple composite filters can be synthesized to recognize broad variations of object classes, viewing angles, scale changes, and background clutters. Digital neural networks are used as a post-processor to assist the optical correlator to identify the objects and to reject false alarms. A multi-stage automated targeted recognition (ATR) system has been designed and implemented to perform computer vision tasks with adequate proficiency in mimicking human vision. The system can detect, identify, and track multiple targets of interest in high-resolution images. The approach is capable of handling large number (sic) of objects variations and sets. (Chao & Lu, 2018, pp. 1 – 2)
To elaborate, the authors write: “The verification stage then transforms the regions of interest into feature space and eliminates false positives using artificial neural network classifiers” (Chao & Lu, 2018, p. 2). The co-authors have developed what they termed “an advanced multi-stage Automatic Target Recognition (ATR) Processor” that is ruggedized, physically light, and built “to achieve an optimal balance between accuracy and computational efficiency by incorporating both fast coarse search/detection and accurate identification/verification” (Chao & Lu, 2018, p. 2). The authors describe the usage of a “feed-forward back-propagation network” for supervised training (with labeled imagery as input data) and a self-organizing neural network for unsupervised training (with unlabeled imagery as input data) (pp. 2 - 3). The authors note that an operator may override the results of the ATR system if they find it is giving wrong output and “force the system to perform supervised learning” (p. 3). The authors explain some of their engineering innovations:
A Grayscale Optical Correlator (GOC) has been developed to replace the Binary-phase only optical Correlator (BPOC). In this correlator, a grayscale input SLM was used to replace the previously used binary SLM. This has eliminated the need for a binarization preprocessing step. A Ferroelectric Liquid Crystal (FLC) SLM (e.g., made by Boulder Nonlinear Systems) capable of encoding real-valued data is also used for the correlation filter implementation. This grayscale optical correlator has enabled a direct implementation of a gray-scale-based correlator composite filter algorithm. (Chao & Lu, 2018, p. 6)
While much of the above will be opaque to those outside the field, the quote gives a sense of the constant push for efficiency and functionality and the need to cobble together relevant technologies in the right ways and in the right sequences to solve real-world challenges. [The co-authors share photos of the respective elements of the miniaturized Grayscale Optical Correlator (Chao & Lu, 2018, pp. 11 – 14).]
The system was tested using video data from test flights and from underwater sonar arrays. Some images were used to emulate noise and clutter in environments that could confuse the image detector, enabling the co-authors to mitigate those challenges through innovations and designs. There were about 100 images in the sonar training and test dataset, with one to two mines (shadows) in each image, along with other shadows. Their final training set resulted in “1175 true positive and 1227 false positive regions” for training the neural network. Of these, 70% were used for training with back-propagation, 15% for validation, and 15% for “independent verification of the networks (sic) ability to generalize” (Chao & Lu, 2018, p. 38). This image set was then separated into “easy, medium, hard and all images” based on how difficult the respective underwater mines were to discriminate through visual information (p. 39).
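The arithmetic of the 70/15/15 split is easy to check. The sketch below assumes the percentages apply to all 2,402 labeled regions (1,175 + 1,227) and that any rounding remainder goes to the test partition; the review does not say how the authors rounded or shuffled, so both choices are assumptions.

```python
def split_counts(total, train_pct=70, val_pct=15):
    """Split a labeled set into train/validation/test counts using integer
    percentages; whatever rounding leaves over goes to the test set."""
    train = total * train_pct // 100
    val = total * val_pct // 100
    return train, val, total - train - val

regions = 1175 + 1227  # true-positive + false-positive regions reported in the chapter
print(regions, split_counts(regions))  # prints 2402 (1681, 360, 361)
```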
They applied different filters to achieve higher True Positive Rates (TPRs) and better performance. They applied wavelet transform methods to further refine the process and capture information missed by the OT-MACH filter (Chao & Lu, 2018, p. 34). A feature extraction step was added to capture unique feature vectors for the target object (p. 34). Their invention works both to identify target objects through visual data and to track objects moving in physical space. The co-authors explain their work clearly, using diagrams, photographs, text descriptions, screenshots of user interfaces, equations, data tables, 3D plots, and other data.
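The review does not say which wavelet family the authors used, so the Python sketch below assumes the simplest one, the Haar wavelet, to show what a single level of a wavelet transform does to one row of pixels: it splits the row into a smoothed approximation and a detail component that responds only where intensity changes (here, at the edges of a bright blob).

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar wavelet transform on an even-length
    signal: pairwise sums (approximation) and pairwise differences (detail),
    each scaled by 1/sqrt(2)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

row = [4, 4, 4, 9, 9, 4, 4, 4]  # one image row with a bright blob
approx, detail = haar_step(row)
print([round(v, 3) for v in approx])
print([round(v, 3) for v in detail])
```

The detail coefficients are zero over flat regions and nonzero only at the blob’s edges. Because this Haar step is orthonormal, the row’s energy (sum of squares) is preserved across the two halves, so edge information is separated out rather than lost.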
Using Deep Neural Networks (DNN) to Anticipate Flashovers in Fire Scenes for First Responders
In the context of a fire, a “flashover” is a phase of rapid fire spread due to a buildup of high heat. It can arrive within minutes of the start of a fire, depending on conditions and contexts. For firefighters, knowing when that temperature threshold is imminent is critical to their survival. Kyongsik Yun, Alexander Huyen, and Thomas Lu’s “Deep Neural Networks for Pattern Recognition” (Ch. 2) aims to create a system that uses regular camera images of a fire to give firefighters and other first responders early warning (perhaps through a wearable) that a flashover is imminent, at a lower cost than the thermal cameras used by some fire stations today.
The co-authors describe a core technology harnessed for their approach—the artificial neural network.
Human visual perception has hierarchical structure as described above and massively parallel processing capability based on thousands of synaptic connections in each neuron. Furthermore, human neural networks feature a winner-take-all framework which selects the most relevant neurons along the spatial dimensions in each layer. These three features (hierarchical structure. Parallel processing, winner-take-all framework) inspired to build artificial neural networks, recently further evolved into deep convolutional neural networks. Winner-take-all framework especially adapted to build essential components of deep convolutional networks as max pooling and rectified linear unit (ReLU). (Yun, Huyen, & Lu, 2018, pp. 50 - 51)
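The two winner-take-all components named in the quote can be shown in a short Python sketch. ReLU keeps only positive responses, and max pooling lets only the strongest response in each small window pass forward. The 4 x 4 feature map is invented for illustration; in a real network it would be the output of a convolutional layer.

```python
def relu(feature_map):
    """Rectified linear unit: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: within each size x size window only the
    strongest response survives (a local winner-take-all)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, cols - size + 1, size)]
            for r in range(0, rows - size + 1, size)]

fmap = [[-1.0, 3.0, 0.5, -2.0],
        [2.0, -4.0, 1.0, 0.0],
        [0.0, 0.5, -3.0, -1.0],
        [1.5, -0.5, -2.0, -0.5]]
print(max_pool(relu(fmap)))
```

Each 2 x 2 window contributes a single “winner,” so the 4 x 4 map shrinks to 2 x 2 while keeping the strongest activations.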
Winner-take-all means that, within a given spatial region of a layer, the most strongly activated neuron is the one whose signal passes forward (and the other nodes are effectively left out of that path). Conditional generative adversarial networks, a type of deep convolutional neural network, are thought to emulate “a human brain process that combines a bottom-up recognition (discriminator network) and a top-down imagination (generator network)” (Yun, Huyen, & Lu, 2018, p. 51). As an input, these require “a dataset of paired images, a real image and its ground truth, which is the precise segmentation boundary of the object. The model is conditioned to take an infrared image of an object as an input and product a binary mask of the object as an output” (p. 54). Further, these generative adversarial networks…
…learn the loss of classifying whether the generated output is real or not, while at the same time the networks learn the generative model to minimize this loss. The conditional adversarial networks learn the mapping function from the input to the output as well as learning the loss function to train the mapping. (Yun, Huyen, & Lu, 2018, p. 51)
This process takes in real imagery and outputs fairly accurate “ground truth” imagery with defined edges. These network types also calculate the differences between the estimated and true values (the loss) to measure the gaps in performance.
As with other deep neural networks, these require large amounts of input data to train, and they may involve many hidden layers, each encoding different distinctive features of the images (which are high-dimensional data). Data samples may also require effortful pre-processing to ensure that the image sizes are comparable to each other. There is also work to retrain deep NNs to enhance particular features that may have gone missing in earlier iterations. Some of this work may be done manually (with powerful human vision), and some may rely on algorithmic processes to identify actual edges between objects (Yun, Huyen, & Lu, 2018, p. 56).
The ability to determine an edge may help in the design of “situational awareness” systems, such as those enabling firefighters to see through objects with augmented reality glasses that mitigate visual blockages:
If firefighters can see through the fire, it will help them to navigate the incident more safely and efficiently. Moreover, firefighters can quickly find victims partially occluded by objects. Estimating the size and shape of a fire in a partially occluded situation is particularly important in that it can predict a sudden fire explosion (flashover phenomenon) (Yun, Huyen, & Lu, 2018, p. 59).
To operationalize their research, the authors gathered images of fires from Google Images and placed artificial occlusions (white rectangles) on them. They set up their system to generate (using a U-Net architecture) what it predicted might be behind the occlusions and then, as a test, checked the output against the original, unoccluded images (Yun, Huyen, & Lu, 2018, p. 60).
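The occlude-then-compare test can be sketched as follows. The Python below pastes a white rectangle onto a small grayscale image and scores a reconstruction only inside the hidden region. The U-Net itself is omitted, and mean squared error as the score is an assumption for illustration; the review does not name the metric the authors used.

```python
def occlude(image, top, left, height, width, fill=1.0):
    """Return a copy of a grayscale image (values in [0, 1]) with a solid
    white rectangle pasted over it, mimicking an artificial occlusion."""
    out = [row[:] for row in image]
    for r in range(top, top + height):
        for c in range(left, left + width):
            out[r][c] = fill
    return out

def masked_mse(original, reconstruction, top, left, height, width):
    """Mean squared error computed only inside the occluded rectangle,
    i.e., how well the hidden content was filled back in."""
    errs = [(original[r][c] - reconstruction[r][c]) ** 2
            for r in range(top, top + height)
            for c in range(left, left + width)]
    return sum(errs) / len(errs)

original = [[0.2] * 4 for _ in range(4)]
occluded = occlude(original, top=1, left=1, height=2, width=2)
# Leaving the white box in place scores poorly; a perfect fill scores 0.
print(masked_mse(original, occluded, 1, 1, 2, 2))
print(masked_mse(original, original, 1, 1, 2, 2))
```

In the authors’ setup, the second argument would be the network’s reconstruction of the occluded image.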
In their work, they reasoned that some test images did not result in proper reconstructions “due to a lack of variability in the training data set” (Yun, Huyen, & Lu, 2018, p. 63). They observe that the availability of “contextual information, including building structure, fire cause, and gas/chemical sensor information” may improve “semantic reconstruction” (p. 63). Their model was able to predict “a flashover as early as 55 seconds before it occurred” (Yun, Huyen, & Lu, 2018, p. 65). Regular body cameras, rather than costlier thermal cameras, may then be used to provide first responders with critical information about the heat of a fire and its environs.
The authors’ usage of synthetic test data for machine learning is important. They also make the point that augmented reality may be “a useful method for creating data for DNN (deep neural network) training” to “automatically guide data augmentation by tracing the real-world dynamics of an object in a reference video” (Yun, Huyen, & Lu, 2018, pp. 70 - 71). The co-authors work at the Jet Propulsion Laboratory, California Institute of Technology, in Pasadena, California.
Advances in Joint Transform Correlation in Imagery for Pattern Recognition
In “Robust Pattern Recognition via Joint Transform Correlation” (Ch. 3), Paheding Sidike, Mohammad S. Alam, and Vasit Sagan explore various types of joint transform correlation (JTC) methods that enhance pattern identification in visual imagery (in both facial recognition and texture identification databases). Generally, JTC methods were designed to “tackle in-plane and out-of-plane object distortions” (p. 81) and to enable shift-invariant pattern recognition (meaning that a pattern is recognized no matter where it falls on the 2D plane). In practice, faces can be recognized even if they are positioned or angled differently in relation to the camera, for example, and likewise with textures. The advances serve to counter or minimize the effects of noise on the capture of an accurate signal, even with unknown input scenes and in the face of geometrical distortions [“changes in size, orientation, and rotation” (p. 82)].
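To show how a JTC finds a pattern regardless of its position, here is a minimal 1D digital sketch in Python of the classical JTC (an optical system would perform the transforms with lenses and record the joint power spectrum on a square-law detector). The reference and the scene sit side by side in one input plane; the inverse transform of the joint power spectrum contains a large zero-order term near the origin plus cross-correlation terms offset by the separation, and the position of the strongest cross term gives the pattern’s shift within the scene. The signals, separation, and plane size are assumptions chosen so the terms do not overlap; none of the chapter’s enhanced variants (such as FJTC) are implemented.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; fine for short demo signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def jtc_match(reference, scene, separation, size):
    """Classical 1D joint transform correlation. Returns (shift, peak), where
    shift is the offset of the best reference match inside the scene."""
    m, length = len(reference), len(scene)
    # Spacing rules that keep the cross terms clear of the zero-order term.
    if length < m or separation < m + length - 1 or size < 2 * separation + 2 * length - 1:
        raise ValueError("increase separation/size so correlation terms do not overlap")
    plane = [0.0] * size
    for i, v in enumerate(reference):
        plane[i] = float(v)
    for i, v in enumerate(scene):
        plane[separation + i] = float(v)
    joint_power = [abs(c) ** 2 for c in dft(plane)]        # joint power spectrum
    output = [c.real for c in dft(joint_power, inverse=True)]
    # Search only the cross-correlation window around lag = separation.
    lags = range(separation - m + 1, separation + length)
    best = max(lags, key=lambda k: output[k])
    return best - separation, output[best]

reference = [1, 2, 3, 1]
print(jtc_match(reference, [0, 0, 0, 1, 2, 3, 1, 0], separation=12, size=48))
print(jtc_match(reference, [0, 1, 2, 3, 1, 0, 0, 0], separation=12, size=48))
```

The same reference is found at shift 3 in the first scene and shift 1 in the second, with the same peak height, which is the shift invariance the chapter relies on.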
Pattern recognition “based on coherent correlator is one of the most essential paradigms for optimal information processing” (Sidike, Alam, & Sagan, 2018, p. 82); examples include the Joint Transform Correlator and the VanderLugt Correlator. An image processed through the correlator results in a 3D plot with “peaks” in particular locales that indicate the presence of visuals of interest. The classical JTC filter is prone to particular types of visual distortions and errors:
However, the high correlation peaks may not correspond to the targets due to the possibility of false alarms generated by complex background and sensor noise. On the other hand, the unknown input images may involve various illumination changes of the target with that of the reference, which can degrade the performance of the classical JTC. These destructive factors cause the classic JTC suffers from low diffraction efficiency, broad correlation width, and discrimination sensitivity. (Sidike, Alam, & Sagan, 2018, pp. 82 - 83)
The authors describe a range of potential solutions, including running target images in different distorted versions through a learner (Synthetic Discriminant Function), eschewing a modular rectangular grid for Gaussian ringlets [by applying a Vectorized Gaussian Ringlet Intensity Distribution (VGRID) with Spectral FJTC] to enable form invariance, “morphological correlation, spatial frequency dependent threshold function and Local Phase (LP) features…fused with JTC or FJTC” to mitigate illumination variances, and others (Sidike, Alam, & Sagan, 2018, p. 84).
The first and third authors are from the Department of Earth and Atmospheric Sciences at Saint Louis University in St. Louis, Missouri, and the second author is from the College of Engineering, Texas A&M University-Kingsville, in Kingsville, Texas.
Tradeoffs between Visual Signal and Background Noise
Several of the works use the Optimal Trade-off Maximum Average Correlation Height (OT-MACH) filter, which creates a 3D correlation plot from a 2D image input. The more closely a region matches the target object, the higher the correlation peak on the 3D plot. In “The Spatial Domain Optimal Trade-off Maximum Average Correlation Height Filter and its Performance Assessment” (Ch. 4), by Akber Gardezi, Ahmed Alkandri, Rupert Young, Philip Birch, and Chris Chatwin (from the Department of Engineering and Design, University of Sussex, Brighton, United Kingdom), the authors compare the performances of the OT-MACH filter, its spatial domain version (SPOT-MACH), and the Scale Invariant Feature Transform (SIFT) as applied to infra-red imagery of desert scenes with low foreground-background contrast. The images used for the research include Forward Looking Infrared Sensor (FLIR) imagery from the Kuwait Ministry of Defense:
The detection and recognition of targets in FLIR images has always been a challenging problem due to the varying heat signature of the object and background clutter. The FLIR imagery used in this section has been acquired from a moving platform and contains multiple objects at different orientations and background variations. The movement of the sensor and the object induce coupled motions into the FLIR images which make the detection and tracking of the target object difficult. (Gardezi, Alkandri, Young, Birch, & Chatwin, 2018, p. 126)
As an example, the authors describe one image in the dataset showing a car with a hot engine but a “cool passenger compartment”; the authors concluded that such mixed-heat images from the complex world would be a suitably demanding test for the SPOT-MACH filter (Gardezi, Alkandri, Young, Birch, & Chatwin, 2018, p. 126). In addition to these real-world challenges, the authors also used synthetic images to deceive the respective systems (in order to experiment with ways to work around such deceptions). One was a studio-generated image of a toy car with an intense light source in one corner, which confused the filter into identifying the light source as a region of interest (p. 122). Non-uniform lighting is a real risk, and it is not unusual in real-world imagery.
The various approaches involve tradeoffs. Some “can be made locally adaptive to spatial variations in the input image background clutter,” normalizing for “local intensity changes” (Gardezi, Alkandri, Young, Birch, & Chatwin, 2018, p. 101). Pre-processing the data, then, can affect outcomes. [The authoring team described the use of Oriented Difference-of-Gaussian filters to “equalize the amount of energy at each orientation across the entire scene” (p. 108).]
Changing the filters’ parameters may also make the difference. Correlation filters, optimally, would be “invariant to all distortions of the target object whilst still being able to maintain a good discrimination between similar objects. In other words, the correlation filter must strike a compromise between the requirements of in-class distortion tolerance and out-of-class discrimination” (Gardezi, Alkandri, Young, Birch, & Chatwin, 2018, p. 107).
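That compromise is visible in the filter’s formula. As the OT-MACH filter is commonly written in the correlation-filter literature (a general statement for orientation, not transcribed from this chapter), the frequency-domain filter H is built from N training images with Fourier transforms X_i:

```latex
H(u,v) = \frac{m_x^{*}(u,v)}{\alpha\, C(u,v) + \beta\, D_x(u,v) + \gamma\, S_x(u,v)},
\qquad
m_x = \frac{1}{N}\sum_{i=1}^{N} X_i,\quad
D_x = \frac{1}{N}\sum_{i=1}^{N} \lvert X_i \rvert^{2},\quad
S_x = \frac{1}{N}\sum_{i=1}^{N} \lvert X_i - m_x \rvert^{2}
```

Here C is the power spectral density of the noise, D_x the average power spectral density of the training images, and S_x their average similarity measure. The non-negative weights α, β, and γ trade off noise tolerance, peak sharpness (discrimination), and distortion tolerance, so raising one weight relative to the others moves the filter along exactly the compromise the authors describe.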
They describe changing the input data by representing shapes of interest from all angles (a full 360-degree view, as from all sides in a rotational scan), given that a shape may appear in a 2D plane at any number of possible angles (Gardezi, Alkandri, Young, Birch, & Chatwin, 2018, p. 117). In one tool, a car reference may be “multiplexed for orientations between 0° and 40°” (p. 119).
Thresholds have to be set with finesse: if they are too low, false positives and noise will be plentiful, and if they are too high, many true positives will be missed. The authors describe well the various challenges of these respective visual filters. Their side-by-side images of each photo and its corresponding 3D auto-correlation plane are evocative. The computational costs of running the respective programs are also a consideration. They summarize:
The SPOT-MACH filter is shown to provide more robust recognition performance than both the OT-MACH filter and the SIFT technique for demanding images in which there is poor contrast or large illumination gradients such as those derived from thermal infra-red cameras operating in desert environments…The disadvantage of the SPOT-MACH filter is its numerically intensive nature since it is template based and is implemented in the spatial domain. (Gardezi, Alkandri, Young, Birch, & Chatwin, 2018, pp. 101 - 102)
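The threshold tradeoff described above can be illustrated with a short Python sketch. The peak heights and labels are invented; the code simply counts hits and false alarms at three thresholds to show that lowering the threshold raises the true-positive rate at the cost of more false positives.

```python
def rates_at_threshold(scores, labels, threshold):
    """Treat every score >= threshold as a detection and return the
    true-positive rate (hits / targets) and the false-positive rate
    (false alarms / non-targets)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives

# Made-up correlation-peak heights: True marks a real target, False clutter.
scores = [0.95, 0.90, 0.72, 0.65, 0.60, 0.55, 0.40, 0.30]
labels = [True, True, True, False, True, False, False, False]
for t in (0.35, 0.58, 0.80):
    tpr, fpr = rates_at_threshold(scores, labels, t)
    print(f"threshold={t:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

At the lowest threshold every target is caught but three of four clutter regions are flagged; at the highest, there are no false alarms but half the targets are missed.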
Deep Learning Applied to Information Security
Safia Rahmat, Quamar Niyaz, Ahmad Y. Javaid, and Weiqing Sun’s “Application of Deep Learning as a Pattern Recognition Technique in Information Security” (Ch. 5) posits the importance of maintaining the “CIA triad” in data management. The “CIA” refers to “confidentiality, integrity, and availability” (p. 143). In terms of identifying malware, deep learning may be deployed to engage in pattern recognition or “feature learning,” on known pre-labeled data as well as unknown unlabeled data. Early recognition of a cyber attack may enable fast responses in defensive and offensive measures. This co-authorship team, from the University of Toledo, in Toledo, Ohio, identifies the “advantages of deep learning over popular supervised machine learning techniques being used in the real-time intrusion detection systems (IDS)” (p. 142), for computer security, network security, and malware detection. Theirs is a methodical chapter with clear definitions of various machine learning algorithms and approaches to set the context.
In traditional machine learning, humans decide which features are of interest, tune the algorithms to focus on those features, run the ML algorithm, and obtain outputs. In deep learning, the neural network takes in the inputs and itself learns the relevant features, optimizing the weights of each of its connections; however, these networks require a large amount of data and “high-end machines” for the processing (Rahmat, Niyaz, Javaid, & Sun, 2018, p. 152). Deep Belief Networks and Deep Neural Networks build on unsupervised machine learning (pp. 154 - 155). For their research, they tested with a dataset of “431,926 binaries of which 81,910 were benignware, and 350,016 were labeled as malware” (p. 159). The prediction time “was around 0.1 second for each unknown sample” (p. 160). The co-authors also describe various setups for the detection of malicious apps for smartphones. [One small complaint: Some of the sources in the References list at the end were URLs with partial names only. Closer adherence to the formal source citation method would improve this chapter.]
Improving the MNIST Benchmark Dataset for Training and Testing
In “A Statistical Review of the MNIST Benchmark Data Problem” (Ch. 6), Jiří Grim and Petr Somol focus on the Modified National Institute of Standards and Technology (MNIST) database, which contains a set of handwritten numerical digits used to train computational image processing systems.
This dataset was drawn from the NIST Special Databases SD3 and SD1. The respective images may be coded in a binary way, correct or incorrect based on the number label applied to the visual. The images are unformatted. As a “benchmark database,” this dataset is used to test the efficacy of various machine learning algorithms. One challenge is that when this set is used in probabilistic neural networks, researchers have found “the training and test sets have slightly different statistical properties with negative consequences for classifier performance” (Grim & Somol, 2018, pp. 173 - 174). The authors propose a different visual set created by agents moving through a 16x16 grid using moves available to rooks (castles) and knights (horses). The paths are set up as lines on a grid, and a set that is computer generated may enable more consistent training-and-test results. Further, such an imageset will disallow the usage of external knowledge of the imagery, which may affect the designs and deployments of various machine learning systems (p. 176).
The co-authors describe some methods to extend the MNIST imageset:
The most successful methods apply some data enlargement techniques to generate differently modified versions of the original MNIST images. In particular, the best classifiers make use of different affine or elastic distortions, shifts, skewing, scaling, compression or even very general nonlinear random transforms in order to extend the training set. (Grim & Somol, 2018, p. 176)
Such deformations of the training set may prevent the overfitting of models to the data and “improve generalization” (p. 176); the authors also point toward more general, “problem-independent methods” (p. 186). The authors explain:
With the aim of a more general applicability we suggest a model of a perfectly balanced statistical benchmark based on artificial binary patterns generated on a chess-board by random moves of the chess-pieces rook and knight. By using uniquely initialized pseudorandom sequences the data sets are exactly reproducible in arbitrary length and therefore any data enlargement is unnecessary. There is no external knowledge about the binary patterns except for generating rules and there is no simple way to extract informative features from the data. (Grim & Somol, 2018, p. 189)
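A rough Python sketch of such a generator follows. It is a hypothetical reconstruction from the description above, not the authors’ code: the 16 x 16 board matches the review, but the rules for marking squares (rook moves mark every square slid across, knight moves mark landing squares), the number of moves, and the use of Python’s seeded random.Random are assumptions. What it does share with the description is exact reproducibility: the same seed always regenerates the same pattern.

```python
import random

SIZE = 16
KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def random_pattern(piece, n_moves, seed):
    """Hypothetical generator: a 16 x 16 binary pattern drawn by random moves
    of a rook or knight. A given seed always reproduces the same pattern."""
    rng = random.Random(seed)
    grid = [[0] * SIZE for _ in range(SIZE)]
    r, c = rng.randrange(SIZE), rng.randrange(SIZE)
    grid[r][c] = 1
    for _ in range(n_moves):
        if piece == "rook":
            if rng.random() < 0.5:  # slide along the current row
                nc = rng.choice([x for x in range(SIZE) if x != c])
                step = 1 if nc > c else -1
                for x in range(c, nc + step, step):
                    grid[r][x] = 1
                c = nc
            else:                   # slide along the current column
                nr = rng.choice([y for y in range(SIZE) if y != r])
                step = 1 if nr > r else -1
                for y in range(r, nr + step, step):
                    grid[y][c] = 1
                r = nr
        elif piece == "knight":
            legal = [(r + dr, c + dc) for dr, dc in KNIGHT_STEPS
                     if 0 <= r + dr < SIZE and 0 <= c + dc < SIZE]
            r, c = rng.choice(legal)
            grid[r][c] = 1
        else:
            raise ValueError("piece must be 'rook' or 'knight'")
    return grid

pattern = random_pattern("knight", n_moves=12, seed=7)
for row in pattern:
    print("".join("#" if cell else "." for cell in row))
```

Because the data come only from the generating rules, a classifier trained on such patterns cannot lean on outside knowledge of the imagery, which is the property the authors emphasize.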
The included images show some of their computer-generated visual samples. Grim and Somol are from the Institute of Information Theory and Automation of the Czech Academy of Sciences, in Prague, Czech Republic.
Enhancing Data Pre-Processing for ANNs
Adel Belayadi’s “Computing with an Artificial Neural Network to Enhance Information Processing: Using a New Method of Feeding the Training Input-Output Mapping” (Ch. 7) proposes a new method for increasing the learning rate of an ANN.
The rationale for the research? He explains:
In the network learning processes, error back propagation (EBP) is considered as one of the most used training algorithm for feedforward artificial neural networks. However, this algorithm is very slow if the size of the network is too large. Additionally, the main problem with the EBP algorithm is that (sic) has a constant learning rate coefficient and different regions of the error surface may have different characteristic gradients that may require a dynamic change of learning rate coefficient. Second-order algorithms help to converge much faster than first order algorithms. Furthermore, by combining the training speed of second-order algorithms and the stability of EBP algorithm, an investigation of different training algorithms will be used to train the neural network connection links such as Levenberg Marquardt and Conjugate Gradient algorithms. (Belayadi, 2018, p. 199)
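The point about a constant learning rate can be illustrated on a toy problem. The Python sketch below is not the chapter’s method or a neural network: it minimizes a two-variable quadratic with one steep and one shallow direction. Plain gradient descent with a single fixed rate must keep that rate small enough for the steep direction, so it crawls along the shallow one, while a second-order (Newton) step, which rescales each direction by its curvature, lands on the minimum at once. Levenberg-Marquardt and conjugate gradient are more elaborate methods that likewise exploit curvature information.

```python
def gradient(w, curv):
    """Gradient of f(w) = 0.5 * sum(k * x**2) for per-axis curvatures k."""
    return [k * x for k, x in zip(curv, w)]

def gradient_descent(w, curv, lr, steps):
    """First-order updates with one constant learning rate for every axis."""
    for _ in range(steps):
        w = [x - lr * g for x, g in zip(w, gradient(w, curv))]
    return w

def newton_step(w, curv):
    """Second-order update: divide each gradient component by its curvature
    (the diagonal of the Hessian), which rescales the step per axis."""
    return [x - g / k for x, g, k in zip(w, gradient(w, curv), curv)]

curv = [10.0, 0.1]   # one steep and one shallow direction (illustrative)
start = [1.0, 1.0]
print(gradient_descent(start, curv, lr=0.15, steps=100))
print(newton_step(start, curv))
```

After 100 gradient-descent steps the steep coordinate is essentially zero while the shallow one is still around 0.22; the single Newton step reaches the minimum at (0, 0).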
The author proposes a mathematical approach to enhancing multilayer artificial network (MLAN) algorithms in a way that adapts them to the input data, and he works proofs into the chapter. (The complexities of this are beyond the review and the reviewer.) Belayadi works at the Laboratory of Coating Materials and Environment in Boumerdes, Algeria, and the Engineering Department, Bab-Ezzouar University, in Algiers, Algeria.
Batch-Based Feature Extraction through Connectionist Models of a Wavelet Neural Network (WNN)
Adel Belayadi, Boualem Bourahla, and Fawzia Mekideche-Chafa’s “Batches based Feature Extraction for a Pattern Recognition System using the Connectionist Models of a Wavelet Neural Network” (Ch. 8) focuses on the identification of printed letters using wavelet neural networks. NNs are considered connectionist models because of the interconnections between their neurons. Here, the authors describe a practical setup for using a NN to detect letters:
The data used to train the neural model are 120 x 110 binary images of bitmap type. In this approach, the criteria chosen to fill the input data is a proposed one, and it consists in dividing the image’s lines, which generally provide a big number of lines, into batches of a small number of lines and then extracting the input features of the images. The method allows us to easily feed the neural network and reach the required mean square error without being stuck in local optima in a very quick time and with a high recognition rate. (p. 236)
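A minimal sketch of the batching idea follows. The review does not specify which features the authors extract from each batch, so this Python example assumes a simple one (the fraction of “on” pixels in each batch of rows) and treats the 120 x 110 bitmap as 120 rows by 110 columns; both are assumptions. With 10 lines per batch, each image becomes a 12-value input vector for the network.

```python
def batch_line_features(image, lines_per_batch):
    """Group an image's rows into consecutive batches and summarize each
    batch by the fraction of 'on' pixels, giving one input feature per batch."""
    if len(image) % lines_per_batch:
        raise ValueError("row count must divide evenly into batches")
    width = len(image[0])
    features = []
    for start in range(0, len(image), lines_per_batch):
        batch = image[start:start + lines_per_batch]
        on = sum(sum(row) for row in batch)
        features.append(on / (lines_per_batch * width))
    return features

# Hypothetical 120 x 110 binary bitmap: a thick horizontal bar in rows 50-69.
image = [[1 if 50 <= r < 70 else 0 for _ in range(110)] for r in range(120)]
feats = batch_line_features(image, lines_per_batch=10)
print(len(feats), feats)
```

Compressing 13,200 pixels into a dozen batch summaries is what makes the network easy to feed, at the cost of discarding detail within each batch.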
The authors provide a solid review of the literature. Their model, however, seems somewhat rigidly tied to its context, and training requires about 25,000 epochs and large numbers of training examples before recognition is achieved (Belayadi, Bourahla, & Mekideche-Chafa, 2018, p. 253).
The co-authors hail from the following institutions: Bab-Ezzouar University in Algeria (first and third authors); the Laboratory of Coating Materials and Environment in Algeria (first author); and the Laboratory of Physics and Quantum Chemistry at M. Mammeri University in Tizi Ouzou, Algeria (second author).
A Short Note about Academic Research
Academic research generally has to be both cutting-edge and novel in its application of technologies. It has to have practical applications, such as solving a real problem, in a demonstrable and provable way, and it has to be reproducible by others. Toward these ends, measures have to be precise and verifiable, and the reasoning has to be tight. Academic research also has to showcase the expertise of the author-researchers (or author-research teams), and the work and writing are presented in ways that earn the respect of peers in the field. By the time academic research makes it into print, particularly in widely available books, the field is generally assumed to have advanced beyond what is described.
For sensitive research (particularly in technologies), the general public is said to be two to three decades behind intelligence agencies, in part because of embargoes and in part because of the decade or more that most technologies require to reach commercialization, public markets, and user uptake. Commercializing technologies can be challenging, too, with plenty of legal oversight, costs to market, required engineering and design, and marketing and sales. A book is thus a lagging indicator of the state of a field. An academic reviewer might ask questions about the technologies, the math, the statistics, and the strategic cobbling together of the elements in a solution.
A non-expert reviewer might ask questions about a public-facing state of the field, like the following:
- What are visual data patterns? How can computer vision and image recognition be harnessed for practical purposes? (Where are these capabilities currently functioning stealthily, out of the public eye?)
- How recognizable are people in terms of facial recognition? Is it like in the movies?
- What is the general state-of-the-art of image recognition for security and law enforcement? For healthcare? For other applications?
And so, a non-expert reading an academic work may have to draw on some areas of shared knowledge and then constantly reach for a dictionary and other reference resources for the rest. This is a weakness of this review. Users of such systems have to understand how to adjust the parameters to tune the computations (without introducing errors into the functions, many of which run in the background), and they have to know what the data readouts mean and what to do with them. Even where the machines are doing the “heavy lifting,” humans still need to understand what is going on in order to know how the outputs inform their awareness and decision making.
One quibble: the diagrams do not seem to follow a consistent standard. Systematized visualizations would enhance this work and could clarify the work sequences. Some visuals tend toward sparsity, and others toward ornateness.
Thomas Lu and Tien-Hsin Chao’s edited collection Advances in Pattern Recognition Research (2018) provides a peek at some of the most popular processes applied to image processing and to solving practical needs. As with most complex technologies, there is a sense that most people will benefit from the capabilities (in terms of increased security in the hands of ethical law enforcement, computationally enhanced healthcare, optical character recognition technologies, warfare, and others) but that the processing under the covers may remain hidden. The general public currently has access to various machine processing capabilities through tools from IBM, Microsoft, and Google (and perhaps others), so such technologies and their related functionalities have been democratized to a degree. Among researchers, though, these capabilities do not yet seem to have spread to those outside computer science. (Give it another few years?)
Real-World Case of Algorithms and Images in Action:
How to Take a Picture of a Black Hole (Katie Bouman)
Finally, it helps to end with a real-world example, given the first-ever photo of a black hole, released in April 2019 by the international Event Horizon Telescope project. One of the main contributors, Katie Bouman, spoke about her work on April 28, 2017. This achievement was many years in the making and required the professional efforts and input of many!
About the Author
Shalin Hai-Jew works as an instructional designer at Kansas State University. Her email is firstname.lastname@example.org.