{"id":83,"date":"2015-06-15T18:37:00","date_gmt":"2015-06-15T23:37:00","guid":{"rendered":"https:\/\/coopercain.com\/?p=83"},"modified":"2022-01-14T13:30:18","modified_gmt":"2022-01-14T18:30:18","slug":"marking-data-for-forwarding-and-re-sharing","status":"publish","type":"post","link":"https:\/\/coopercain.com\/?p=83","title":{"rendered":"Marking Data for Forwarding and Re-Sharing"},"content":{"rendered":"\n<p> Patrick Cain<br> Resident Research Fellow, APWG<br> President, The Cooper-Cain Group, Inc.<br> pcain@apwg.org<br><br>Version  1.6<br> June, 2015<\/p>\n\n\n\n<p>A PDF version of this document &#8211; with nicer pictures &#8211; is available at: <a href=\"https:\/\/coopercain.com\/?post_type=document&amp;p=69\">https:\/\/coopercain.com\/?post_type=document&amp;p=69<\/a><\/p>\n\n\n\n<h1 class=\"wp-block-heading\"> 1 Introduction<\/h1>\n\n\n\n<p>Many parties collect Internet event data such as IP addresses, originator identification, or communications content to track network congestion, comply with regulatory regimes, or detect malicious activity. Often the collected data is not truly \u2018public\u2019 data but has handling and distribution restrictions or caveats on it. The APWG shares some data that carries further sharing restrictions and is currently exploring ways to mark this data.<br> Most data or event sharing schemes include the ability to add a document sensitivity or classification marking to alert the recipient to the sensitivity of the data or its handling restrictions. For example, the IETF\u2019s IODEF XML format has an attribute at the top level to choose one of four sensitivity markings \u2013 \u2018default\u2019, \u2018public\u2019, \u2018private\u2019, and \u2018need-to-know\u2019. Those four choices are also available for marking specific sections of event logs or data, so a report can be marked with an overall sensitivity but have portions marked differently. 
Other data sharing formats (e.g., STIX, REN-ISAC) offer equivalent functionality with the same number of markings or more \u2013 perhaps six. Other schemes have only a few levels and invite creative combinations of those values (e.g., TLP).<br> As data exchange becomes more automated, the challenge is to devise a marking scheme that can be unambiguously interpreted by a machine \u2013 without the need for human assistance. As an example, one may receive 10,000 or so reports of malicious web sites every day. Human review to determine the sensitivity of the reports\u2019 data items will significantly slow down the processing rate of the reports and possibly doom the data exchange. This paper presents a means to mark data to share within known groups that would support automation mechanisms.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">2 The Problem<\/h2>\n\n\n\n<p>\u201cThe Problem\u201d is really two distinct problems. First, a scheme is needed to properly mark data as it is received by the recipient to note its sensitivity. This (sensitivity) marking needs to be flexible enough to support a wide community of users, not be overly complicated to understand \u2013 particularly by automation systems \u2013 and be easily expandable as marks change and evolve over time. The sensitivity marks tell the recipient how to locally protect, and possibly re-share, the data. The second part of the problem is to devise a way to convey additional restrictions on the recipient. Both markings should unambiguously tell the recipient what they can do with the data after they receive it: for example, can they share it with others in their team or disclose details to other parties (who may be a victim of the event)?<br>Neither problem can be solved with a relatively small set of identifiers \u2013 four, six, or eight. And there is an even slimmer chance that multiple data sharing communities could agree on the definitions of those identifiers. 
The next sections introduce a way to deal with both of the identified problems.<br>Note that our problem definition does not use these data sharing markings as a means to convey content sensitivity. Other marks are expected to be used for this purpose. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3 Our Data Sharing Model <\/h2>\n\n\n\n<p>Understanding our problem and possible solutions requires some understanding of how the APWG receives and distributes data. In short, the APWG is a data clearinghouse: very little processing of the received data is performed before the data is forwarded to others. Our goal is to be a common point of data collection to make it easier to collect data.<br> The APWG forwards data to a set of recipients who are allowed to use the data for various purposes or to share the data further as explained in a contractual agreement.<br> The purposes allowed to receivers of APWG data are roughly as follows. The data:<br> \u2022 is only for the recipient\u2019s use and should not be shared further<br> \u2022 may be shared with the recipient\u2019s security team<br> \u2022 may be shared with other members of the recipient\u2019s organization<br> \u2022 may be used in products<br> \u2022 may be shared with other security groups<br> \u2022 may be shared with the public<br>Pictorially, the purposes can be shown as a set of concentric circles (as shown in Figure 1), where each purpose is assigned a numerical value, such as:<br> \u2022 1 &#8211; \u2018recipient only\u2019 or \u2018no further sharing\u2019<br> \u2022 2 &#8211; Coworkers in the security group<br> \u2022 3 &#8211; Data incorporated into products<br> \u2022 4 &#8211; Shared with affected users<br> \u2022 5 &#8211; Shared within the company<br> \u2022 6 &#8211; Forwarded to other security groups<br> \u2022 7 &#8211; Shared with the public<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"614\" height=\"556\" 
src=\"https:\/\/coopercain.com\/wp-content\/uploads\/2020\/01\/Data-Sharing-and-Forwarding-Markings-1.jpg\" alt=\"\" class=\"wp-image-86\"\/><figcaption>Figure 1.<\/figcaption><\/figure>\n\n\n\n<p>Each circle includes the lower numbered circles.<br>There are more complex diagrams to show other relationships. For example, circle 2 could be split into two parts, one for friends of Pat (#2a) and one for enemies of Pat (#2b). Data would be shared with the friends of Pat (#2a) but not his enemies (#2b). But the data could not be shared further, because some enemies of Pat (in #2b) would get the data as part of circle #3, since the larger circles include the inner sets. Support for this more complex usage has been deferred until the concentric circle approach has been thoroughly tested. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">4 The Requirements <\/h2>\n\n\n\n<p>The need to express both recipient and re-sharing constraints leads to a small set of requirements.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>The solution should inform the recipient of the data what they can do with it. For example, can they share it with others in their company, disclose it publicly, etc. This is called the \u201csharing tag\u201d.<\/li><li>The solution should allow the sharer to add extra guidance, as in \u201cDo not touch this system as it\u2019s under surveillance\u201d, or \u201cDo not share it with Bob as we think he\u2019s a bad guy\u201d or even \u201cPublic disclosure is embargoed until Tuesday at dawn\u201d. Recently the \u201cshare this data but don\u2019t include attribution\u201d caveat has become fashionable as more sensitive data flows among parties. This extra guidance or cautionary detail to be considered when evaluating, interpreting, or doing something is called a \u201ccaveat\u201d.<\/li><li>The APWG shares data between individuals, within groups, with other groups, and with the public. 
The solution needs to support all four without burdening the APWG operations staff.<\/li><li>The tags should be usable in multiple languages.<\/li><li>The tag should be easy to use in XML, CSV, or any other format-of-the-day.<\/li><\/ol>\n\n\n\n<p> The tags do not have to include all the policy implications of the data, as sharing groups should have guidelines, maybe even contracts, to convey what the tags would imply. The sharing markings also do not have to convey data sensitivity marks. In many cases the \u201cwho can see it\u201d implies certain sensitivities, and should be covered in the sharing group agreements. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">5 Shoehorning Markings into Existing Structures<\/h2>\n\n\n\n<p> Our problem became visible when we started to share IODEF XML formatted data, which has four predefined tags. One solution was to redefine the restriction class in the IODEF schema to include enumerations other than the four defined in the standard. This has been tried with varying success. Many XML validation tools will mark the XML document as invalid since the IODEF schema doesn\u2019t accept the non-standard enumerations. In some cases the standard IODEF schema can be modified to get around this problem, but that requires all tools used by data sharers to use the new schema and a new version of the standard to be produced.<br> A second idea tried to redefine what the four classes meant, e.g., \u2018public\u2019 meant share with anyone, \u2018restricted\u2019 meant the recipient could share it with trusted parties, etc. But it soon became evident that redefining the four markers would only add confusion, as not everyone knew or agreed with the new interpretations.<br> Ignoring the IODEF constraint issues and looking at other commonly-used schemes was not fruitful either. A current favourite marking scheme is based on the Traffic Light Protocol (TLP), which defines four levels of sharing and sensitivity. 
Although the levels are \u2018red\u2019 (no sharing), \u2018amber\u2019 (some sharing), \u2018green\u2019 (more sharing), and \u2018white\u2019 (no restrictions), there have also been \u2018black\u2019 markings (which I interpret as a burnt-out traffic light), and confusion abounds as to what the actual colours mean for further re-sharing of the data. There isn\u2019t enough information in four levels to support our sharing model, either, and although we could probably shoehorn our groups into four levels there is still no way to add the localized caveats.<br> A real concern is having data marked as \u2018private\u2019 or \u2018amber\u2019 by two different communities with different numbers of tags, unequal definitions of \u2018private\u2019, and conflicting handling caveats, with no means, contractual or programmatic, to equate them. More operational experience and study will be necessary to alleviate this concern.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">6 A DataMarkings Structure <\/h2>\n\n\n\n<p>As existing marking schemes seem inappropriate to our needs, a totally new structure was designed to hold all the data marking information. The marking scheme is structured as an XML blob since that allows for some easy testing and validation, but the structure should work in other formats. The structure, labeled \u2018dataMarkings\u2019, contains a sequence of markings for a particular community. Each \u2018community\u2019 element includes sensitivity and sharing tag identifiers as defined by and for that community. 
Different communities could define their own equivalency rules to deal with data crossing group boundaries.<br> For example, a dataMarkings structure that looks like:<\/p>\n\n\n\n<pre>&lt;dataMarkings&gt;\n  &lt;community name=\"apwg\" version=\"1.0\"&gt;\n    &lt;tag&gt;3 \u2013 Friends&lt;\/tag&gt;\n    &lt;tag&gt;2 \u2013 Enemies of Pat&lt;\/tag&gt;\n  &lt;\/community&gt;\n&lt;\/dataMarkings&gt;<\/pre>\n\n\n\n<p> would convey to a recipient that the data should be controlled and further shared as a level \u201c3 \u2013 Friends\u201d and a level \u201c2 \u2013 Enemies of Pat\u201d in the \u201capwg\u201d community. Now, although the \u20182\u2019 and the \u20183\u2019 are the authoritative markers and are intended to help the automation systems, they may not have apparent meaning to a human, so the tag text could also be a defined data marking label like \u201cno sharing outside group\u201d or \u201csharing with public allowed\u201d. The dataMarkings structure doesn\u2019t need to know this detail. Additionally, there are some paranoid communities where the community name may be sensitive, so the structure also allows any text to be used &#8211; e.g., community names generated by a hash or encryption or even random values. Communities are expected to provide guidance to their users on the use of the markings, caveats, and policy implications. The community element also carries a version identifier so communities can change, add, or remove markings without having to pick a different community name. The hope is that the version attribute will reduce the number of \u2018apwg\u2019, \u2018apwg-1\u2019, \u2018apwg-2\u2019 \u2026 \u2018apwg-1367\u2019 distinct community identifiers necessary in the future as the markings evolve. Some thought has been given to defining two other attributes \u2013 \u2018until\u2019 and \u2018after\u2019 \u2013 to deal with embargoed data. 
For example, data may be \u2018no sharing allowed\u2019 until an investigation is completed, at which point that data set becomes \u2018share with trusted groups\u2019. Although the XML additions are straightforward, they have not been made part of the dataMarkings class until development of an acceptable CONOPS and use case is complete. In real operations it may be easier to re-share the embargoed data with a new mark at the embargo expiration than to have to support complex caveat logic.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6.1 Hierarchical versus distinct markings <\/h3>\n\n\n\n<p>  The dataMarkings structure supports hierarchical and distinct marking schemes, although the first pilots use hierarchical marks. A community could design their marks to be very specific, e.g., 0 \u2013 recipients, 1 \u2013 friends of Pat, and 2 \u2013 friends of Bob. If we wanted to share with friends of Pat and friends of Bob, the mark would need both an entry for \u20181\u2019 and for \u20182\u2019. There is no means to generate an \u201conly trusted insiders\u201d mark, as it seems illogical: how would one know? The only case where this seems to make sense is to mark data as \u201conly the infected system owner\u201d if you are sharing the data with someone who has contact information for the infectee. The dataMarkings structure may be simplified if such a tag is really implemented as a caveat, which is our current plan. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">7 Carrying Complex Markings into XML Documents <\/h2>\n\n\n\n<p> Another attribute of the community element is the \u2018alias\u2019 attribute. In IODEF and other XML formats, the generator of a report may mark specific parts of the report with more restrictive markings. For example, a spam report may mark the whole report with a \u2018public\u2019 mark but mark the History element with a \u2018good guys only\u2019 mark, as the history may include active investigative data.  
The alias attribute allows the report originator to designate a short-hand marking for use later in the document. A more complex example is: <\/p>\n\n\n\n<pre>&lt;dataMarkings&gt;\n  &lt;community name=\"apwg\" version=\"1.2\" alias=\"private\"&gt;\n    &lt;tag&gt;3&lt;\/tag&gt;&lt;tag&gt;restrictive&lt;\/tag&gt;\n  &lt;\/community&gt;\n&lt;\/dataMarkings&gt;<\/pre>\n\n\n\n<p> Note that the dataMarkings class performs the same functions as the \u2018shoehorning\u2019 mentioned above, except that by reusing existing enumerations there is no need to modify the existing IODEF or STIX schemas. The bad news is that there are still only four choices to \u2018alias\u2019, and the access control routines that process the report need to be aware of the equivalent markings. So although the structure supports it, there are not many actual uses expected.<br> Although proposed as more of a test feature, it has many advantages over adding additional structures and reissuing all the format standards.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">8 New XML Data Classes <\/h2>\n\n\n\n<p>This section defines the dataMarkings structure as an XML document. Although it can be used in other formats, XML allows for some testing and guided implementations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"> 8.1 The structure <\/h3>\n\n\n\n<p>The overall structure is two lists of values:<br> BEGIN<br> List of sharing tags (identifier, sharing-value)<br> List of caveats (identifier, value)<br> END<br> The initial sharing tags in the APWG community, apwg-1, would be:<br> 99 &#8211; Recipient only<br> 83 &#8211; Community<br> 73 &#8211; Internal Details<br> 71 &#8211; Internal Summary<br> 53 &#8211; Impacted Party Details<br> 51 &#8211; Impacted Party Summary<br> 43 &#8211; Used in Products<br> 33 &#8211; Trusted Details<br> 31 &#8211; Trusted Summary<br> 11 &#8211; Public Summary<br>  0 &#8211; No Restrictions<br> This list supports our requirement to support the APWG sharing model in a hierarchical way. 
The numerical values were picked to allow easy (and fast) comparison in software. The ordering runs from the least restrictive mark as the lowest value to the most restrictive mark as the highest value, to be consistent with some other known marking systems. A numerically lower value tag implies the higher values, so a tag value of 31 \u2013 Trusted Summary implies that the data can be shared with the community (83) and internal groups (73) and every other group numerically larger than 31.<br> Trying to define an initial set of caveats was more challenging. Although there are a number of sharing constraints, it is unclear which of those constraints are valid in the APWG sharing model. An initial set of caveats is below, but generating an acceptable caveat list will probably take quite some time. The use of non-numerical values should reduce confusion with tag values.<br> NA &#8211; No originator attribution<br> HI \u2013 Historical Data<br> AI \u2013 Active Investigation, do not disturb or contact<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8.2 A More International-Friendly Syntax <\/h3>\n\n\n\n<p>One concern is that non-English speakers may not adequately comprehend the descriptive portions of the sharing tags. A slight modification to the syntax could help by separating the tag value from the descriptive portion of the tag, as in:<\/p>\n\n\n\n&lt;tag&gt;71 \u2013 Internal Summary&lt;\/tag&gt;\n\n\n\n<p> would change into &lt;tag value=\"71\" lang=\"en\"&gt;Internal Summary&lt;\/tag&gt;<br> or for a Spanish version: &lt;tag value=\"71\" lang=\"es\"&gt;Resumen interno&lt;\/tag&gt;<\/p>\n\n\n\n<p>  This new encoding would allow the descriptive field to be translated into local languages while the actual tag value stays the same to optimize processing. 
Note that this modification is most useful for XML-encoded data markings, where the extra bytes needed to encode the language attribute do not significantly add to the length of the tag; that is not true for non-XML encodings. Nevertheless, this is incorporated into the current dataMarkings structure definition.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">9 XML Schema Definition <\/h2>\n\n\n\n<p>To support the tag definitions, an XML schema is being developed. It is not final but is referenced here for information.<br> The current XML schema is available at: <\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/patcain\/data_marking\/blob\/master\/apwg-markings-1.6.xsd\">https:\/\/github.com\/patcain\/data_marking\/blob\/master\/apwg-markings-1.6.xsd<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"> 10 A Staged STIX Example <\/h2>\n\n\n\n<p>  The following STIX document shows placement and an example use of the markings. Some fields have been compacted for display.<br><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&lt;STIX_Header&gt;\n&lt;Title&gt;Example Report for Scanning for open ssh servers&lt;\/Title&gt;\n&lt;Package_Intent xsi:type=\"stixVocabs:PackageIntentVocab-1.0\"&gt;Indicators - Network Activity&lt;\/Package_Intent&gt;\n&lt;Profiles&gt;\n&lt;stixCommon:Profile&gt;apwg.org:scan-general-1&lt;\/stixCommon:Profile&gt;\n&lt;\/Profiles&gt;\n&lt;Handling&gt;\n&lt;marking:Marking&gt;\n&lt;marking:Marking_Structure marking_model_ref=\"apwg1\"\nxsi:type=\"apwgMarkings:apwgMarkingStructureType\"&gt;\n&lt;apwgMarkings:tag value=\"00\"&gt;No Restrictions&lt;\/apwgMarkings:tag&gt;\n&lt;\/marking:Marking_Structure&gt;\n&lt;\/marking:Marking&gt;\n&lt;\/Handling&gt;\n&lt;Information_Source&gt;\n\u2026<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">11 Use in CSV formats <\/h2>\n\n\n\n<p> Although we specified the tags and caveats in XML, they should work in CSV sharing communities. 
The community, tag, and caveats could be encoded as a single field of the form community\/tag\/caveats, delimited by commas, as in<br> ,apwg\/71 \u2013 Internal Summary\/NA &#8211; no attribution,<br> <br>  Some sharing communities may be able to specify shortcuts. If the community uses the apwg tags, and really wants to save space, the data marking could be<br> ,71\/NA,<br> Other formats should be able to support our markings in a similar manner.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">12 APWG Pilot Use  <\/h2>\n\n\n\n<p>APWG researchers have proposed multiple communities for the collection and sharing of data and incorporated the marks into a test data repository. Some of the actual policy guidance for marking data is still under development and is repository and community dependent, and the definitions are quite fluid; do not rely on them for operational use.<br> The current XML schema and CSV guidance are available at github.com\/patCain\/ecrisp. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\"> 13 Further Considerations <\/h2>\n\n\n\n<p> The use of these markings is still in development and the operational situations are still evolving. Although a draft CONOPS is in the works, comments, suggestions for improvement, and operations models that break the concept are always appreciated, particularly if you share data using a data model compatible with the APWG\u2019s. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">14 References<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li> Danyliw, R., Meijer, J., &amp; Demchenko, Y. (2007, December). The Incident Object Description Exchange Format (RFC 5070). 
Retrieved January 2012, from  Internet Engineering Task Force: ftp:\/\/ftp.isi.edu\/in-notes\/rfc5070.txt<\/li><li> Traffic Light Protocol, http:\/\/en.wikipedia.org\/wiki\/Traffic_Light_Protocol<\/li><li> Structured Threat Information eXchange, http:\/\/stix.mitre.org <\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Patrick Cain Resident Research Fellow, APWG President, The Cooper-Cain Group, Inc.pcain@apwg.org Version 1.6 June, 2015 A pdf version of this document &#8211; with nicer pictures &#8211; is available at: https:\/\/coopercain.com\/?post_type=document&amp;p=69 1 Introduction Many parties collect Internet event data such as data such as IP Addresses, originator identification, or communications content to track network congestion, comply &hellip; <a href=\"https:\/\/coopercain.com\/?p=83\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Marking Data for Forwarding and Re-Sharing&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,6],"tags":[],"class_list":["post-83","post","type-post","status-publish","format-standard","hentry","category-apwg","category-data-sharing"],"_links":{"self":[{"href":"https:\/\/coopercain.com\/index.php?rest_route=\/wp\/v2\/posts\/83","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/coopercain.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/coopercain.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/coopercain.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/coopercain.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=83"}],"version-history":[{"count":7,"href":"https:\/\/coopercain.com\/index.php?rest_route=\/wp\/v2\/posts\/83\/revisions"}],"predecessor-version":[{"id":118,"href":"https:\
/\/coopercain.com\/index.php?rest_route=\/wp\/v2\/posts\/83\/revisions\/118"}],"wp:attachment":[{"href":"https:\/\/coopercain.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=83"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/coopercain.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=83"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/coopercain.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=83"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}