XCON Working Group                                         S. Srinivasan
Internet-Draft                                                  T. Moore
Intended status: Informational                     Microsoft Corporation
Expires: September 5, 2007                                 March 4, 2007


            Media usages and SDP in the XCON data model
           draft-srinivasan-xcon-usecases-mediausage-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 5, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document describes the association of media streams with the
   XCON data model for the media usages captured in the XCON
   conferencing scenarios [11].

Srinivasan & Moore      Expires September 5, 2007               [Page 1]

Internet-Draft                 mediausage                     March 2007

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Media stream definitions
     3.1.  Available media clarification
     3.2.  Per-user or per-endpoint media definitions
   4.  SDP negotiation and conferencing media usage
     4.1.  Criteria for including media label attribute in SDP
     4.2.  Mapping of media label (in SDP) to media id
   5.  Media controls definitions
   6.  Media scenarios
     6.1.  An example mixer model
       6.1.1.  Conference notification example
     6.2.  Common audio/video scenarios
       6.2.1.  Muting an audio stream
       6.2.2.  Pausing a video stream
     6.3.  Changing media streams
     6.4.  Changing media sources
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   This document clarifies the usage of SDP-level attributes in
   negotiating media with a conferencing server.  RFC 4574 [3]
   describes a mechanism to label media streams in order to identify
   them, but leaves the offer/answer model as an implementation detail.
   RFC 4575 [2] and the XCON extensions to the conference event package
   [10] describe mechanisms for notifying conference state and define a
   data model for centralized conferencing.  They do not, however,
   specify the semantics of those attributes or their use by the
   conferencing server in the signaling and media negotiations
   performed with the conferencing client.
   This document attempts to close the gap by suggesting a means for
   establishing the relationship between media and the conferencing
   state information maintained by the XCON conferencing server (for
   which the data model is described in [7]).

2.  Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
   RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as
   described in BCP 14, RFC 2119, and indicate requirement levels for
   compliant implementations.

3.  Media stream definitions

   The XCON framework [1] describes a framework for establishing and
   participating in a centralized conference.  The sections that follow
   discuss the media aspect of conferencing in more detail.

3.1.  Available media clarification

   The available-media XML element in RFC 4575 [2] allows a
   conferencing system to provide a list of media stream inputs or
   outputs.  A conferencing client joining the conference via the
   conferencing server (e.g., the focus) typically subscribes to the
   conference event package as seen in [11].  The conference event
   package provides a mechanism for conferencing clients to be notified
   of conference state as described in the conference framework [1].
   The XCON event package [10] further extends RFC 4575 [2] to specify
   controls (based on [7]) on media streams negotiated by conferencing
   clients.  The XCON data model [7] is derived from the data model
   described in RFC 4575 [2], so for all practical purposes any
   reference to [2] also applies to the XCON data model [7] and the
   XCON extensions to the conference event package [10], with a few
   exceptions (refer to [10] for the key differences from the
   conference event package in [2]).

   As more conferencing systems begin to offer multiple streams of the
   same type (such as video), conferencing clients ought to be capable
   of rendering more than one stream, as offered by conferencing
   systems, in an interoperable manner.  Mechanisms such as grouping of
   SDP media lines [8] and SDP media content [9] further help in
   achieving this goal.  It is important to note, however, that unless
   a conferencing client understands the context in which these streams
   ought to be rendered, it may not be able to render streams that it
   is not aware of.  This document only addresses the problem of
   associating media SDP information (in signaling protocols such as
   SIP) with the media information supplied in the conferencing
   document (refer to [7]).

   Media streams offered in a conference in RFC 4575 [2] are each
   identified via a label.  This XML element is defined as optional.  A
   media label, however, SHOULD be assigned if more than one stream of
   the same media type is offered by the conference.  The label is also
   used to associate the media 'id' attribute (as described in the
   subsequent sections) with the corresponding stream (m-line) in the
   SDP.  This enables conference-aware clients to negotiate SDP media
   in relation to the conference event package data received from the
   conference server without having to deal with the specifics of
   ordering SDP media lines as required or specified by the
   conferencing server.  The label is also unique within the
   conference-info context as defined in RFC 4575 [2].

   A label is typically created when a conference is scheduled, either
   via conference blueprints [1] or through some other means.  A new
   label MAY, however, appear in the available-media element after the
   conference is active, and conferencing clients may decide to render
   these new streams as required (based on local policy).
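
   The association between labels and m-lines described above can be
   sketched in Python.  This is a minimal illustration under stated
   assumptions, not a definitive implementation: the names MediaLine,
   parse_media_lines, and index_by_label, as well as the sample SDP, are
   hypothetical and appear in no specification; only the "a=label"
   attribute syntax is taken from RFC 4574 [3].  The sketch shows how a
   conference-aware client could find the m-line for a given label
   without depending on the order of the m-lines.

```python
from typing import Dict, List, NamedTuple, Optional


class MediaLine(NamedTuple):
    """One m= section of an SDP body (hypothetical helper type)."""
    media_type: str        # e.g. "audio" or "video", taken from the m= line
    port: int              # transport port from the m= line
    label: Optional[str]   # value of a=label (RFC 4574), or None if absent


def parse_media_lines(sdp: str) -> List[MediaLine]:
    """Collect every m= section together with the a=label found inside it."""
    media: List[MediaLine] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            # The port field may carry a "/<count>" suffix; keep the port only.
            port = int(fields[1].split("/")[0])
            media.append(MediaLine(fields[0], port, None))
        elif line.startswith("a=label:") and media:
            # Attributes after an m= line belong to that media section.
            media[-1] = media[-1]._replace(label=line[len("a=label:"):])
    return media


def index_by_label(media: List[MediaLine]) -> Dict[str, int]:
    """Map each label to its m-line index, regardless of m-line order."""
    return {m.label: i for i, m in enumerate(media) if m.label is not None}


# Illustrative SDP only; the address, ports, and labels are sample values.
EXAMPLE_SDP = """v=0
c=IN IP4 131.164.74.2
t=0 0
m=audio 30000 RTP/AVP 0
a=label:10
m=video 30002 RTP/AVP 31
a=label:11
"""

media_lines = parse_media_lines(EXAMPLE_SDP)
labels = index_by_label(media_lines)
```

   With the sample SDP, a client that learns about label '11' from the
   conference state can look up labels["11"] to find the matching
   m-line, whatever position the server chose for it.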

   When a conference is activated and a conferencing client receives a
   notification with the conference state, the conferencing server
   SHOULD label the media streams.  The conferencing client may then
   use this information, or may discover available media via signaling
   (for example, using SIP OPTIONS), to join the conference and start
   receiving media, provided it understands the context in which the
   specific media needs to be rendered.  Mechanisms such as SDP media
   content [9] further aid in providing conferencing clients with this
   context.

3.2.  Per-user or per-endpoint media definitions

   Streams sent from a specific user's endpoint device are usually
   negotiated via some form of signaling session.  The conference event
   package schema contains media XML elements within the
   users/user/endpoint elements.  The media XML element in RFC 4575 [2]
   refers to a media stream, of which there may be more than one.  Each
   media stream being sent from the conferencing client to the
   conference server is identified, within the conference event
   package, via an 'id' (refer to [2] for more information).

4.  SDP negotiation and conferencing media usage

   This section explains the semantics of the media label and its usage
   within the XCON framework.  The media label in the conferencing data
   model or the conference event package maps to the media label
   defined within SDP in RFC 4574 [3].  The media label is used to
   identify and associate streams in the SDP offer/answer model with
   the specific streams within conferencing.  This section explains how
   this is done.

4.1.  Criteria for including media label attribute in SDP

   RFC 4574 [3] suggests that the label may appear either in the offer
   or the answer and is used to identify the local stream either in the
   offer or the answer.
   This section describes how conferencing servers should integrate the
   label into the offer/answer model and associate it with the data
   model (and thus [10]).  All conferencing clients and servers MUST
   follow the offer/answer model as described in [6].  The following
   sections only describe the usage of the media label in the context
   of conferencing within the offer/answer model.

   The conferencing server SHOULD include an SDP 'label' attribute (as
   defined in RFC 4574 [3]) for each stream in SDP sent from the
   conferencing server to the conferencing client (in either the offer
   or the answer).  If there are two or more streams of the same media
   type (as defined in RFC 4575 [2], Section 5.3.4, with type being the
   values registered for "media" of SDP [5]), the conferencing server
   SHOULD include the label for each stream in the SDP sent from the
   server.  The media label MUST follow the normative text in RFC 4575
   [2] and RFC 4574 [3].

4.2.  Mapping of media label (in SDP) to media id

   As the 'id' XML attribute (in Section 5.8 of [2]) is not directly
   carried in the SDP (or in any signaling, for that matter), the label
   attribute also serves the purpose of mapping the media 'id' defined
   in the data model to the media label defined in the data model.
   This means that a conferencing client will not be able to negotiate
   different m-lines with the same label within the same conference via
   separate signaling sessions.

   [[ Note: Fixing this will require a new SDP attribute for conveying
   the media id in SDP ]]

5.  Media controls definitions

   Media controls may be defined at the global conference-info level
   (under available-media, as specified in Section 4.1.5.1 of [7]) or
   may be defined for a specific user and endpoint's media stream (as
   specified in Section 4.5.2.1 of [7]).  The former definition should
   typically override the latter.
   For example, if the global audio is muted, then none of the
   participants' audio should be unmuted.  The control values for the
   endpoint's media stream may, however, have mute set to false; this
   value is ignored because the global control is set to true.  Note
   that the global controls refer only to controls for the streams
   coming from the conference mixer and do not refer to controls for
   media streams being sent from the user's endpoint to the mixer.

6.  Media scenarios

6.1.  An example mixer model

      [to-mixer streams]                  [from-mixer streams]

                          |----mixer----|
      userid=23 , id=34 ----|             |--- label='10'
      userid=23 , id=35 ----|             |--- label='11'
      userid=23 , id=36 ----|             |--- label='12'
      userid=24 , id=24 ----|             |
      userid=24 , id=35 ----|             |
                          |-------------|

   The mixer shown above takes in a set of input streams and mixes them
   in some manner to produce output streams.  This document does not
   cover how the streams are grouped or mixed; it only shows, with an
   example, how the media inputs and outputs tie into the conferencing
   data model and signaling.  For further information, refer to RFC
   4575 [2].  The examples shown below are for information purposes
   only and are offered to aid in understanding the solution presented
   in this document.

   The 'label' parameter above identifies the media stream from and to
   the mixer.  The streams to the mixer, from a specific user and
   endpoint, are identified by an 'id' in the conferencing data model.
   The label is unique throughout the conferencing data model.  The id
   is unique within the endpoint media element in the data model and is
   generated by the conferencing server.  Furthermore, each user is
   identified by a user identifier; see [4].

   Consider that label '10' is the stream containing the audio mix from
   all audio input streams, offered to every participant.
   Label '11' is a video mix that contains one of the layouts described
   in the scenarios section, and label '12' is an alternate,
   voice-activated mix of the video streams.  Ids 34, 35, and 36 for
   userid 23 are that user's main audio, main video, and secondary
   video streams respectively.  Ids 24 and 35 for userid 24 are that
   user's main audio and main video streams respectively.

   Let us also consider that the mixer mixes the incoming video streams
   from the participants (going to the mixer) into both the label '11'
   and label '12' streams.  Also, the mixer accepts at most a single
   input stream from the client (in any sendrecv media stream) and
   rejects the rest.  This is the specific mixer model described here;
   other mixer models may interpret the input streams differently.  The
   next section covers how this specific mix appears in the
   offer/answer model in SDP.

   Note that the floor control aspects of the streams above are omitted
   here for brevity, as floor control is defined as being optional.

6.1.1.  Conference notification example

   The notification example given below corresponds to the mixer model
   defined above.  The available-media element lists the media labels
   as defined.  Note that the media labels '11' and '12' are defined
   with a status element of sendrecv.  Using the offer/answer model
   described earlier, users Bob (userid=23) and Carl (userid=24) have
   joined the focus and negotiated media streams as shown in the
   notification below.  It is useful to note that Bob has chosen to
   receive all video streams, while Carl has opted for the secondary,
   voice-activated video stream.

   A conferencing system may expose Bob's input stream directly
   (without mixing) to the participants of the conference if it deems
   this necessary, since Bob has the role of presenter.  It may do so,
   for example, by creating a new label on the fly to expose this
   stream to the conferencing client.

   The notification below is what a presenter (Bob) may receive.

   [Conference notification XML document.  The element markup is
   missing from this copy; the surviving element values are listed
   below in document order.]

      main audio  audio  sendrecv  true
      main video  video  sendrecv  true
      secondary video  video  sendrecv  true

      Bob Hoskins  presenter
      Bob's Laptop  connected  dialed-out
      main audio  audio  432424  sendrecv  true  true
      main video  video  324255  sendrecv  true  true
      secondary video  video  1324255  recvonly  true
      full info  hsjh8980vhsb78  vav738dvbs  8954jgjg8432

      Carl  participant
      Carl's video phone  connected  dialed-in
      main audio  audio  242443  sendrecv  true  true
      secondary video  video  632425  sendrecv  true  true
      full info  aachsjh8980vhsb78  ffvav738dvbs  a8954jgjg8432

6.2.  Common audio/video scenarios

   The following sections are examples of how conference controls and
   the SDP may be interpreted.  This section does not cover the usage
   of all the controls defined in the XCON data model [7].  [[TBD]]

6.2.1.  Muting an audio stream

6.2.1.1.  Mute all participants

   Muting all participants (in other words, activating the control or
   setting its value to 'true') in the conference typically means that,
   for the entire duration where mute is applicable, all current and
   future participants of the conference are muted and will not receive
   any audio.  Typically this control is available to presenter or
   moderator roles in a conferencing system.  Setting this control
   overrides any user-specific control settings (see the next few
   sections).  Since no audio flows to any participant, activating this
   control may in turn cause the conferencing focus to re-negotiate SDP
   with the various participants to stop media flowing as and when
   necessary.  This is entirely up to local policy.
   Note that doing so may cause changes in conference state (with
   per-endpoint media elements and controls, their respective ids, and
   their default states changing).

   In the example mixer, the control appears under the available-media
   element as shown below.

   [XML fragment; markup missing from this copy.  Surviving control
   value: true]

6.2.1.2.  Muting to-mixer stream from a specific participant

   A stream being sent from a participant to the mixer may be mixed in
   any form or manner.  For example, it may appear in multiple media
   outputs from the mixer (though that is not the case in this specific
   example).  Activating this control therefore causes this input to
   not appear in any of the outputs from the mixer.  As in the previous
   scenario, activating this control may result in SDP re-negotiation.

   In the example mixer, the control appears under the media element
   for each user and endpoint.  Bob's controls are shown below.

   [XML fragment; markup missing from this copy.  Surviving control
   value: true]

   SDP from the conferencing server may look like (some elements
   omitted)

      v=0
      c=IN IP4 131.164.74.2
      t=0 0
      m=audio 30000 RTP/AVP 0
      a=label:10

   Note that even though the above SDP does not contain any information
   about the media id, the label provides a mapping of the specific
   m-line to the media section in the data model.

6.2.1.3.  Muting from-mixer stream to a specific participant

   This is a control on a specific stream, negotiated via SDP, that is
   sent from the mixer to the participant.  It is mostly optional, and
   many conferencing systems may opt not to implement such a control.
   A client may instead stop sending the media to its output device
   rather than activating this control to mute the stream.  In that
   case the mixer still sends media packets toward the participant,
   consuming bandwidth on the network and CPU on the mixer.  Activating
   this control would stop media being sent back from the mixer to the
   participant.
   As in the previous scenarios, activating this control may result in
   SDP re-negotiation.

   In the example mixer, the control appears under the media element
   for each user and endpoint.  Bob's controls are shown below.

   [XML fragment; markup missing from this copy.  Surviving control
   value: true]

   SDP from the conferencing server may look like (some elements
   omitted)

      v=0
      c=IN IP4 131.164.74.2
      t=0 0
      m=audio 30000 RTP/AVP 0
      a=label:10

   As before, note that even though the above SDP does not contain any
   information about the media id, the label provides a mapping of the
   specific m-line to the media section in the data model.

6.2.2.  Pausing a video stream

6.2.2.1.  Pausing video to all participants

   Pausing the video being sent to all participants (in other words,
   activating the control or setting its value to 'true') in the
   conference typically means that, for the entire duration where pause
   is applicable, all current and future participants of the conference
   will not receive video.  Typically this control is available to
   presenter or moderator roles in a conferencing system.  Setting this
   control overrides any user-specific control settings (see the next
   few sections).  Since no media flows to any participant, activating
   this control may in turn cause the conferencing focus to
   re-negotiate SDP with the various participants to stop media flowing
   as and when necessary.  This is entirely up to local policy.  Note
   that doing so may cause changes in conference state (with
   per-endpoint media elements and controls, their respective ids, and
   their default states changing).

   In the example mixer, the control appears under the available-media
   element as shown below.

   [XML fragment; markup missing from this copy.  Surviving control
   value: true]

6.2.2.2.  Pausing to-mixer stream from a specific participant

   A stream being sent from a participant to the mixer may be mixed in
   any form or manner.  For example, it may appear in multiple media
   outputs from the mixer (as is the case in the example).
   Activating this control therefore causes this input to not appear in
   any of the outputs from the mixer.  As in the previous scenario,
   activating this control may result in SDP re-negotiation.

   In the example mixer, the control appears under the media element
   for each user and endpoint.  Bob's controls are shown below.
   Activating this control results in Bob not being shown in any of the
   output streams.

   [XML fragment; markup missing from this copy.  Surviving control
   value: true]

   SDP from the conferencing server may look like (some elements
   omitted)

      v=0
      c=IN IP4 131.164.74.2
      t=0 0
      m=video 30002 RTP/AVP 31
      a=label:11

   As before, note that even though the above SDP does not contain any
   information about the media id, the label provides a mapping of the
   specific m-line to the media section in the data model.

6.2.2.3.  Pausing video from-mixer stream to a specific participant

   This is a control on a specific stream, negotiated via SDP, that is
   sent from the mixer to the participant.  It is mostly optional, and
   many conferencing systems may opt not to implement such a control.
   A client may instead stop sending the media to its display device
   rather than activating this control to pause the stream.  In that
   case the mixer still sends media packets toward the participant,
   consuming bandwidth on the network and CPU on the mixer.  Activating
   this control would stop media being sent back from the mixer to the
   participant.  As in the previous scenarios, activating this control
   may result in SDP re-negotiation.

   In the example mixer, the control appears under the media element
   for each user and endpoint.  Bob's controls are shown below.

   [XML fragment; markup missing from this copy.  Surviving control
   value: true]

   SDP from the conferencing server may look like (some elements
   omitted)

      v=0
      c=IN IP4 131.164.74.2
      t=0 0
      m=video 30002 RTP/AVP 31
      a=label:11

   As before, note that even though the above SDP does not contain any
   information about the media id, the label provides a mapping of the
   specific m-line to the media section in the data model.

6.3.  Changing media streams

   TBD

6.4.  Changing media sources

   TBD

7.  Security Considerations

   TBD

8.  Acknowledgements

   Thanks to Gonzalo Camarillo and Roni Even for useful comments.

9.  References

9.1.  Normative References

   [1]   Barnes, M., "A Framework and Data Model for Centralized
         Conferencing", draft-ietf-xcon-framework-05 (work in
         progress), September 2006.

   [2]   Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
         Initiation Protocol (SIP) Event Package for Conference State",
         RFC 4575, August 2006.

   [3]   Levin, O. and G. Camarillo, "The Session Description Protocol
         (SDP) Label Attribute", RFC 4574, August 2006.

   [4]   Boulton, C. and M. Barnes, "A User Identifier for Centralized
         Conferencing (XCON)", draft-boulton-xcon-userid-00 (work in
         progress), October 2006.

   [5]   Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
         Description Protocol", RFC 4566, July 2006.

   [6]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
         Session Description Protocol (SDP)", RFC 3264, June 2002.

   [7]   Novo, O., Camarillo, G., and D. Morgan, "A Common Conference
         Information Data Model for Centralized Conferencing (XCON)",
         draft-ietf-xcon-common-data-model-04 (work in progress),
         March 2007.

   [8]   Camarillo, G., Holler, J., and H. Schulzrinne, "Grouping of
         Media Lines in the Session Description Protocol (SDP)",
         RFC 3388, December 2002.

   [9]   Hautakorpi, J. and G. Camarillo, "The Session Description
         Protocol (SDP) Content Attribute", RFC 4796, February 2007.

   [10]  Srinivasan, S. and R. Even, "Conference event package
         extensions for the XCON framework",
         draft-srinivasan-xcon-eventpkg-extensions-00 (work in
         progress), February 2007.

   [11]  Even, R. and N. Ismail, "Conferencing Scenarios", RFC 4597,
         July 2006.

Authors' Addresses

   Srivatsa Srinivasan
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052, USA

   Email: srivats@microsoft.com


   Tim Moore
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052, USA

   Email: timmoore@microsoft.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).