Policy-Based Data Management
Dr. Reagan Moore, from UNC's RENCI, presents “Policy-Based Data Management,” which will cover the following ideas: A network can be viewed as the ...
Networks, Abstraction, and Artificially Intelligent Network(ed) Systems
A conversation with UNC RENCI's Dr. Reagan Moore and Dr. Arcot Rajasekar
After an exciting conversation between Network Ecologies scholars Dr. Reagan Moore, Dr. Arcot Rajasekar, Amanda Starling Gould, Florian Wiencek, and Michael Tauschinger-Dempsey, the following set of notes emerged. We provide them here as content for comment, critique and question so that we may collectively arrive at more comprehensive conceptions of the network, network ecologies, and intelligent network(ed) systems.
Why publish notes? “Taken, made, jotted, foot, or head: Notes are necessary interventions between the things we read and the things we write.” (via Geoffrey Nunberg, Berkeley). Take Note, a recent Harvard conference on the past and future roles of note-taking across the university, highlighted how Pascal, Montesquieu, Leibniz, and Walter Benjamin raised academic note-taking from merely scribbled to artful, organized, and publishable. Digital collaborative note-taking processes extend the intervention from solitary to social, permitting a network of thought to emerge around a shared concept.
1. What is the network?
→ See Dr. Moore expand on these notes in his Network_Ecologies Symposium talk “Policy-based consensus building.”
- The network is no longer simply the hardware infrastructure (of edges and nodes) but can be characterized by the actions applied by the set of users that access the network. The users represent a community with a common interest that relies upon the network to reach a common goal. The community creates a consensus on the purpose of the network and on the types of data, information, and knowledge that its members will share within it. The purpose can be characterized by a consensus on the policies that govern their interactions. For each policy, a procedure is typically defined to enforce the policy. The result is the ability to make assertions about the properties of the network, based on the policies and procedures that are enforced.
- A network can then be viewed as the data infrastructure that enables formation of a collective purpose.
- A network can be viewed as the mechanism that translates between assertions made by providers of information and assertions desired by users of the information. Each group providing resources to the network should be able to quantify the properties that are associated with their digital objects. A user will have a set of properties they require in order to make effective use of a digital object, or trust whether the digital object comes from an authoritative resource. A mapping is needed between these two sets of assertions.
- A network can be viewed as a community consensus for reaching a common goal.
- The characterization of networks is still evolving through the embedding of policies and procedures within network routers. This will provide a new foundation for future network structures and behaviors.
- The Future Internet Architecture uses software defined network overlays to embed policies in the network.
- Data management systems define policies that control properties of a shared collection.
- Mappings can be established between the policies used for a virtual collection and the policies used by a virtual network. This makes it possible to embed advanced data manipulation operations within the network, such as addressing by file name, enforcement of access controls, data caching within the network, and optimization of data transport.
- The network will be represented by the operations that can be applied on each data or information exchange.
- An implication is that each network will have policies that manage interactions between users.
- The network will then require publication of the controlling policies as well as the network topology for how nodes are connected. The controlling policies manage the paths that information exchange might take between users.
- Within iRODS (the integrated Rule-Oriented Data System), policies can be set at the level of the collection, at the level of the community, and at the level of federation for multiple-community use.
- Within 5+ years, “We’ll have a rule engine at each node of a network that enforces the network policies” (Moore).
- We expect knowledge to be encoded with data, making it possible for a data object to inspect its environment and apply policies that control what can be done with its content. Policies will be “moved” with the data: they are attached at the level of metadata and become “meta-metadata” sets.
- Policies are “the true expression of what the collection is about” (paraphrasing Moore). Policies constitute assertions about how a network will behave. Currently, users make assumptions that networks will have well defined behaviors, such as the reliable forwarding of data from one place to another. Explicit policies can control how the network behaves, and also control permissible behavior by the users of the network.
- A network is both the communication mechanism and the users of the communication mechanism (community).
- Similarly, an archive is both the infrastructure that manages preservation of records, and the set of records that are being preserved.
- By considering the user community requirements, a network can minimize the effort needed to deal with the data and information that are being exchanged.
- A network can facilitate transformations of the data (say, mapping to the resolution of the access device), provide provenance information (stating where the information came from), provide assurances about authoritativeness, limit behavior that is not acceptable to the community, etc.
- The architecture of the network (e.g. in iRODS) becomes less important as behavior is ruled by dynamically evolving, community-sourced policies. Unlike Facebook, for instance, the behavior within iRODS is not based solely on the network’s architecture but on the policies and relationships emerging therein. And because the policies and relationships emerge, grow, and shift, they allow for nearly endless variation of architecture and network behavior.
- The worth of a network can be characterized through:
- The size of the community that uses the network. Larger collaborations typically build a stronger consensus by considering a wider set of points of view.
- The types of community interaction that are supported, typically driven by a community consensus on desired policies and procedures.
- The level of sophistication of the policies and procedures (are desired transformations automated? can I audit interactions with the network?)
- The amount of usage.
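The mapping between provider assertions and user requirements described in this section can be made concrete with a small sketch. This is only an illustration under stated assumptions, not how iRODS or any network implements it: the `DigitalObject` class, the `unmet_requirements` function, and every property name (`checksum_verified`, `integrity_checked`, `replicas`) are hypothetical and do not come from the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A digital object plus the properties its provider asserts (hypothetical)."""
    name: str
    asserted: dict = field(default_factory=dict)


def unmet_requirements(obj, required, mapping):
    """Return the user requirements that the provider's assertions do not satisfy.

    `mapping` translates a user-side property name into the provider-side
    name; properties absent from `mapping` are assumed to share one name.
    """
    unmet = []
    for user_key, wanted in required.items():
        provider_key = mapping.get(user_key, user_key)
        if obj.asserted.get(provider_key) != wanted:
            unmet.append(user_key)
    return unmet


# Assertions made by the provider of the object (illustrative values).
obj = DigitalObject(
    name="survey.csv",
    asserted={"checksum_verified": True, "source": "authoritative"},
)

# Properties a user needs before trusting or using the object.
required = {"integrity_checked": True, "source": "authoritative", "replicas": 2}

# The mapping between the two vocabularies of assertions.
mapping = {"integrity_checked": "checksum_verified"}

print(unmet_requirements(obj, required, mapping))
```

In this sketch the network's role is the `mapping` table plus the check itself: the user never needs to know the provider's vocabulary, only which requirements remain unmet.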
2. Levels of Abstraction
- Asked to investigate ‘Data Life Cycle Stages,’ Moore and Rajasekar decided the more appropriate question was to address the “community-based collection life cycle,” as this better describes how our present and future networks work. The ‘data life cycle’ is now based not so much on data as on policy evolution.
- Digital objects are provided a context through their membership in a collection. The context includes relationships to other objects in the collection (arrangement), descriptive metadata (supporting discovery), and standard services (ways to manipulate the data).
- Each collection represents a consensus by a community that governs the formation of the collection.
- The governing community establishes the policies and procedures that control the contents of the collection, the management of the collection, and the validation of assertions about the collection.
- As the community grows, the set of governing policies and procedures will evolve to meet the requirements of the broader set of users. Each new community requires that the tacit knowledge known by the previous community is made explicit through new policies and procedures.
- The set of governing policies and procedures will evolve as the impact of the collection broadens through use by a broader community.
- Typical stages for a community-based collection can be characterized by the amount of tacit knowledge that is made explicit:
- Local project collection (with local knowledge of semantics, formats, procedures)
- Shared collection (explicit policies on data distribution, access controls)
- Digital library (explicit policies on descriptive metadata, arrangement)
- Processing pipeline (explicit policies on manipulation services)
- Reference collection (explicit policies on representation information needed by someone in the future to use the data).
- iRODS is not just focused on data but also on creating a “knowledge-based environment”. The evolution of data management systems can be characterized by:
- File systems—management of bytes, with support for reading and writing bytes
- Digital libraries—management of information, including search and discovery based on metadata attributes
- Data grid—management of knowledge, including the ability to apply procedures and workflows to manipulate the data. The procedures capture the knowledge relationships that are evaluated to generate information. The information is saved as persistent state information that can be queried. Through the ability to add new procedures, and add new metadata attributes, a data grid can capture and apply knowledge.
- A generalization of the concepts of data, information, knowledge, and wisdom is:
- Data consist of bytes and are manipulated through POSIX I/O.
- Information consists of names that are applied to data objects. The names are stored as metadata attributes in a database.
- Knowledge consists of relationships between names. The relationships can be captured in procedures that are dynamically applied. When the procedure is executed, the result is saved as state information. Hence information is the reification of knowledge relationships.
- Wisdom consists of relationships between relationships. Typically, wisdom is knowing when and where to apply knowledge relationships. Thus much of what is considered wisdom is the application of temporal and procedural relationships to decide when to apply knowledge.
- iRODS explicitly encodes wisdom through policy-enforcement points that trap actions by clients, and then check whether a policy should be executed. The application of the procedure controlled by the policy then constitutes the application of knowledge. The results from running the procedure are stored as information. Thus iRODS constitutes a step toward creating systems that ‘dynamically manage information and knowledge’.
- Indeed, the goal will be to move toward creating wisdom and wisdom-creating systems.
- There are some five types of relationships that could be applied for codifying wisdom:
- semantic—logical. This has been the realm of AI reasoning.
- temporal—procedural. These are the wisdom relationships applied in iRODS
- structural—spatial (automated mapping)
- functional—algorithmic (automated transformations)
- systemic-epistemological (these are properties of a system as a whole, and properties of systems as a class). The application of these types of relationships will constitute “true” wisdom. An example is deriving gravitational force as the unifying concept behind Kepler’s three laws of planetary motion. Another example is the unification of string theories by showing that the multiple approaches are projections from a higher-dimensional space.
- Wisdom = relationships between relationships.
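The data, information, knowledge, and wisdom layers described in this section, together with the idea of a policy-enforcement point that traps client actions, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the iRODS rule language or its API: the event names (`post_put`, `post_get`), the procedures, and the in-memory `store` and `metadata` dictionaries are all hypothetical.

```python
import hashlib

# Information: names (metadata attributes) applied to data objects.
metadata = {}


def checksum_procedure(name, data):
    """Knowledge: a procedure relating an object's name to a property of its bytes."""
    metadata[(name, "sha256")] = hashlib.sha256(data).hexdigest()


def size_procedure(name, data):
    """Knowledge: another procedure whose result is saved as state information."""
    metadata[(name, "size")] = len(data)


# Wisdom: which procedures to apply, and when (at which enforcement point).
# The event names are hypothetical, chosen only for this sketch.
policies = {
    "post_put": [checksum_procedure, size_procedure],
    "post_get": [],
}


def enforcement_point(event, name, data):
    """Trap a client action and run whatever the policy for that event prescribes."""
    for procedure in policies.get(event, []):
        procedure(name, data)


def put(store, name, data):
    """Client action: write bytes, then let the enforcement point react."""
    store[name] = data  # Data: plain bytes
    enforcement_point("post_put", name, data)


store = {}
put(store, "notes.txt", b"policy-based data management")
print(metadata[("notes.txt", "size")])
```

Changing behavior here means editing the `policies` table rather than the `put` function, which mirrors the point made above that wisdom lies in deciding when and where knowledge is applied.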
3. AI and Autonomous Objects
- In DataBridge and other iRODS applications, “data become like humans” with their own particular attributes and properties.
- iRODS manages seven logical name spaces (users, files, collections, metadata, storage, policies, procedures). For each name space, a set of operations is defined, and a virtualization mechanism is created for applying the operations across multiple types of storage and software systems. Relationships between the name spaces can be established by tracking the operations performed by users. Examples include:
- Which data sets are used together
- Which procedures are used with which data sets
- Which procedures are used together
- Which user communities use the same data sets or procedures
- iRODS progressed based on AI: on creating wise systems, that is, knowledge-based (and not data-based) environments.
- Actually, wisdom within iRODS is currently hard coded in the policy enforcement points.
- A next generation of software is needed to automate application of wisdom through definition and application of wisdom relationships to control the procedures.
- Raja’s question: When policies are encoded, can we make autonomous objects? If data and policy are attached, encoded, can we create an active object that knows when its policy is being abused, that can act and think on its own?
- With iRODS, we give the object its own power by way of its policy encoding.
- In response to Florian’s example of other programs with encoded settings, such as secure PDFs, Raja says that those work at the level of hardware or software, whereas iRODS works at the level of the object.
- Wise knowledge-based environments = AI
- An interesting question is whether application of wisdom requires a framework that encompasses all of the data management repositories.
- One possibility is that the network is the logical place to embed wisdom relationships.
- A network may evolve into the infrastructure that enables the extraction of systemic properties and epistemological properties from a system. Examples include:
- Rules for hyphens and use of apostrophes by aggregating all examples across a network
- Rules for language translation by aggregating all examples across a network
- Rules for formation of community consensus by aggregating all examples across a network.
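The relationships between name spaces mentioned earlier in this section (which data sets are used together, which procedures are used together, which users share data sets) could be derived from a record of user operations roughly as follows. This is a minimal sketch under stated assumptions: the log format, the user, procedure, and data set names, and the `co_usage` function are hypothetical and do not reflect how iRODS or DataBridge records or analyzes operations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical operation log: (user, procedure, data set).
log = [
    ("alice", "regrid", "ocean_temp"),
    ("alice", "regrid", "salinity"),
    ("bob", "regrid", "ocean_temp"),
    ("bob", "plot", "ocean_temp"),
    ("bob", "plot", "salinity"),
]


def co_usage(log, key_index, group_index):
    """Count how often two values at key_index share a value at group_index."""
    groups = {}
    for record in log:
        groups.setdefault(record[group_index], set()).add(record[key_index])
    pairs = Counter()
    for members in groups.values():
        # Sort so that each unordered pair is always counted under one key.
        for pair in combinations(sorted(members), 2):
            pairs[pair] += 1
    return pairs


# Which data sets are used together (by the same user).
datasets_used_together = co_usage(log, key_index=2, group_index=0)

# Which procedures are used together (by the same user).
procedures_used_together = co_usage(log, key_index=1, group_index=0)

# Which users work with the same data sets.
users_sharing_datasets = co_usage(log, key_index=0, group_index=2)

print(datasets_used_together.most_common(1))
```

The same counting function serves all three questions because each one asks which names in one name space co-occur under a shared name in another.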
*Note: This conversation took place on December 19, 2012 at the University of North Carolina at Chapel Hill.