1.) Lock/Unlock vs. SetData/GetData and 2.) predefined vertex formats

Two points spring currently to my mind.

Is there any reason besides simplicity not to use the common Lock/Unlock mechanics I would prefer the addition of this to the API. GetData/SetData canthe simply be mapped onto this. I have a large terrain, that I tesselate into the locked buffer directly... With this new scheme I have to build a temporary array and then copy over via SetData, same with textures. On the other hand, if we have to live with this, why do we have the managed pool anymore, kind of pointless!

The other thing are the predefined vertex formats... Any reason to not expose the members directly and use setters and getters instead I don't see the point here too, this just forces us to create one intermediate result and copy operator code after another. Pointless. And I gibe a bet, the jit can't optimize this out.

Imagine a tight inner loop that deforms a mesh...
1. read vertex from source, multiply by perlin, store in destination. Repeat for all vertices.
2. Renormalize the destination mesh.
3. Upload
4. Render
My C++ version don't use any intermediate for this... works like a charm here: http://www.pouet.net/prod.php which=9447 (old 64K of mine)




Answer this question

1.) Lock/Unlock vs. SetData/GetData and 2.) predefined vertex formats

  • Mark A. Richman

    Shawn,

    I think for me at least, I will just have to rewrite some stuff and not play around with the whole vertex buffer all the time. In many scenarios you don't really need to change every vertex, just a small portion of them, hence the get / set might be efficent enough.

    Thanks,

    Roger Larsen


  • jhusain

    Shawn Hargreaves wrote:
    I can't go into too much detail about the security architecture
    Why not Non-disclosure

    I'm not sure what market segment you guys expect to be hitting with XNA, but it seems to me that a very serious information problem is looming. Previously, it was simple. If you were doing X360 dev, you had an XDK, and the XDK included docs. If you didn't have the XDK, you weren't doing X360 development, and you didn't need the docs as a result. Things are not so simple now. I mean sure, there's going to be a lot of noobs who are doing simplistic work, but there are going to be people interested in pushing the machine harder, and testing much more of what's possible. Somebody is going to have to open up at least some of the architectural documentation, I think. You can restrict that information to the creator's club, perhaps, but a total "no docs" approach to the Xbox scares me. (I'm also dreading finding out what kind of adventure it is to use the Xbox extensions.)

    Back on topic, I think most of what there was to say, has been said. I don't see any reason to remove SetData/GetData, but I really think that if locking semantics are at all possible to provide, they should be. Just S+ out a 4 hour chunk one of these days for the entire dev team, sit down, and solve it


  • wanderingmystic

    Is your concern mostly the performance of copying data, or the need to keep around extra buffers

    In both cases something that occurs to me might be a good option for dynamic geometry would be using a small (say 4k) temporary array, repeatedly filling that and then using multiple SetData calls to update sections of the buffer.
    • You don't need huge temporaries, just a small buffer that you can keep around all the time
    • This will be very very fast as the buffer will stay in cache


  • Syed Imam

    I looked through the XNA framework code for SetData and GetData, and it does indeed do a copy of the elements every time you call it.

     

    However, it seems fairly efficient since it uses unsafe code and memcpy to fill the vertex data.

     

    A solution is to use IDirect3DVertextBuffer directly, since the XNA framework internally calls that. Then you can just change this portion of the code when the XNA framework becomes more optimized.

     

    I also notice that the CopyData appears just to pin the pointer for the vertex buffer before a memcpy is called.  I think this actually might be fast J

     

    I think however that you should prevent having to tessellate the whole terrain on each call, figure out what parts need to be tessellated, and thus work with smaller segments of data which you need to copy / manipulate.

     

    A proposed change would be to have Lock and Unlock methods on the VertexBuffer class to optimize for these scenarios. Simply changing Set/GetData to a property will not help, since you would have to lock / pin the vertex buffer and do a copy in the property for each call.

     

    Proposed solution;

    (unsafe)

    buffer.Lock();

    VertexBuffer* bptr = buffer.GetUnsafeVertexBufferPointer();

    // .. manipulate your vertex buffer

    Tessellate(bptr);

    buffer.UnLock();

     

    Instead of using IDirect3DResource9 the VertexBuffer is a new type of pointer which is not tied to a particular version of the 3D sub system.

     

    I know this solution will lead to people forgetting to unlock and it can also lead to issues where they change the pointer to something invalid.

     

    Since the XNA framework is using unsafe methods, the 360 must allow for unsafe dlls to be loaded also, so I guess this could work.

     

    Thanks,

    Roger Larsen


  • deepchasm

    I have a couple questions.

    What exactly are the technical challenges associated with implementing a GraphicsBuffer<T> equivalent on Xbox, with regards to security I can only think of one potential danger -- the pointer to the actual locked region is easy to retrieve via reflection. One could then use unsafe code to access that directly. Even so, I don't see how that's a problem. I don't have any information about what surrounds the locked region. I could just as easily start assigning random addresses to pointers and just scanning across memory. The other alternative would be to attempt to use the locked region to store malicious code. That doesn't make sense either, though, since I could just as easily allocate an array myself and pin it. Indeed, all the potential dangers here appear to be completely innocuous, and the posts in this thread appear to support that conclusion.

    On a somewhat unrelated note, will the Xbox implementations of Array.Copy and Buffer.BlockCopy use the PPC instructions you mentioned Seems like a high performance copy function could come in handy.


  • Kamii47

    I want to help the OP's case here for the locking strategy. I've posted the code in the inner loop of my terrain (MDX2 version): http://hstuart.dk/paste/view.aspx id=ef7f051f-a022-4d52-a1a9-a59588792541

    It's probably a little difficult to understand that snippet, since most of the values are explained elsewhere in the code. Basically what that code is doing is taking a list of visible terrain patches (which are geomipmapped and stitched) and clumping as many as possible together into each draw call, taking into account how many fit in one IB (limited by 16 bit indices) and whether patches are in the same VB.

    In general, this code locks an IB about once every four patches, since that's how many fit into one IB group the way I have it configured. As a result, XNA's SetData methodology roughly quadruples the number of lock calls per frame.

    Consider the case where you're doing skinning on the CPU. (This is highly dubious, but Doom 3 does it and it's a pain to skin Doom MD5 on the GPU.) What you'd normally do in unmanaged code is to lock the buffer, cast the pointer to a vertex structure pointer, and write out each of the skinned vertices based on your bone transforms. In XNA, you'd have a temporary array which you'd write all the skinned vertices to, and then you'd use SetData to copy it over. That's kind of inefficient, seeing as how you ended up with a system memory copy of the data for no reason. (We're up to three copies now. One in my memory, one in DMA-able system memory, and one in video memory. Four if D3D didn't return a pointer to DMA-able memory, for whatever reason.)

    The damning part of that last example, though, is what happens when you move to the X360 and its unified memory. Without the distinction of "video" and "system" memory, we've literally created a copy of our skinned vertices for no reason. Had I simply been handed the vertex buffer memory, shielded by a GraphicsBuffer<T> like in MDX 2.0, I could've written directly to it, it would be fast, and the GPU would read right off of it. Instead I've allocated a temp array the GPU can't see, introduced a copy step, and considerably reduced the total amount of memory available to me. 512 MB of memory for both CPU and GPU That isn't that much (though it's a god send compared to the previous gen). It means that roughly speaking we're going to be fitting into a footprint of 256 MB of system memory, with the rest given to textures and the like. (Spacewar dances uncomfortably close to this footprint, by the way, unoptimized though it may be.) We can't afford to be wasting memory on idiotic extra copies just because you guys are scared of the potential for an exploit.

    (Sorry if that's overly harsh, but this is something that really bothers me.)


  • A.F.B

    I'm getting a bit pedantic when it comes to speed, pointless allocations, etc. ... sorry


  • Scott Chang

    That were just two usage scenarios...

    The first is a transfer of large "on-the-fly" generated data.... the SetDate/GetData is not sufficient here. Imagine you have stored your landscape as a large array of ushorts... 2048 * 2048... nowe tesselate. For a patch size of 33 * 33 (one row/column overlap) the vertex buffer is going to have exactly 68MB... With lock/unlock it is no problem to tesselate this directly into the vertex-buffer... Wit GetData/SetData we have to create a array in system memory, do the tesselation there and the copy. Something like:

    // Scenario 1 - Intermediate buffer.
    // Problem: Pointless allocation of large buffer in user space
    VertexBuffer buffer = new VertexBuffer(...);
    VertexPositionColor[] vertices = new VertexPositionColor[68 Megabytes];
    TesselateFullTerrain(vertices);
    buffer.SetData(vertices);
    vertices = null;

    // Scenario 2 - Multiple set calls.
    // Problem: Many calls to SetData - which implies excessive calls of Lock/Unlock on native resource.
    VertexBuffer buffer = new VertexBuffer(...);
    VertexPositionColor[] vertices = new VertexPositionColor[33 * 33]; // One patch only.
    for(int i =0; i < patches; i++)
    {
    TesselateSinglePatch(i, vertices);
    buffer.SetData(vertices, i * ....);
    }
    vertices = null;

    // Scenario 3. - Ideal scenario - C++ here as no equivalent in XNA.
    // No allocation of help buffers, no excessive locks.
    IDirect3DVertexBuffer* pBuffer = pD3DDevice->CreateVertexBuffer(... D3DPOOL_DEFAULT ...);
    volatile TerrainVertex* pVertices = 0;
    pBuffer->Lock(0, 0, (void**)&pVertices, D3DLOCK_NOSYSLOCK);
    TesselateTerrain(pVertices);
    pBuffer->Unlock();


    The second example is about creating the "day-in-day-out" work to do fancy effects with heavy math... has nothing to do with buffers. It's just, that the predefined vertex formats should expose the data members directly.

    // Current code.
    // Problem: For each an every access we have to use the property
    // which implies calling a copy constructor or copy assignment operator.
    public struct VertexPositionColor
    {
    private Vector3 position;

    // Setter and getter... if we want to set the position components
    // we always have to create a new Vector3 and assign it.
    // Cost 1 - Construction - Vector3 v = new Vector3(1, 2, 3);
    // Cost 2 - Copy assignment - vertex.Position = v;
    public Vector3 Position
    {
    get { return position; }
    set { position = value; }
    }
    }

    // Ideal code.
    public struct VertexPositionColor
    {
    // Just access it :-D
    // vertex.Position.X = 1;
    // vertex.Position.Y = 2;
    // vertex.Position.Z = 3;
    public Vector3 Position;
    }






  • LiamD

    This is good feedback - please keep it coming!

    We're aware that the current design can require extra data copies, but this is one of those places where we had to trade off performance against API simplicity, portability, .NETiness, etc.

    It's great to get this detailed feedback about scenarios where SetData may not be the best approach.

    Two things I'd like to clarify about the implications on Xbox:
    • "Because you guys are scared of the potential for an exploit" is actually a pretty powerful reason for me at least! I'd hate to go down in history as the guy responsible for shipping the API that broke Xbox system security, and that probably wouldn't be too great for my career prospects either:-)
    • Although the Xbox is a UMA architecture, games actually exploit this less than you might expect. The GPU can't just access random CPU pages, because it isn't cache coherent and doesn't go through the page virtualization tables. If you want pages to be visible to the GPU, they have to be allocated in a special way: physically linear, and with CPU caching turned off. This makes GPU pages much slower for the CPU to access. You have to be extremely careful writing code that fills GPU buffers, or performance can just be abysmal. Writing native games in the past, I've actually sometimes found it easier just to generate my data into a cached temporary buffer, then copy that when I'm done. The copy can be surprisingly fast if you use a well written copy function, exploiting the appropriate PPC instructions to avoid cache pollution, etc.


  • Kimme

     Shawn Hargreaves wrote:
    This is good feedback - please keep it coming!

    • "Because you guys are scared of the potential for an exploit" is actually a pretty powerful reason for me at least! I'd hate to go down in history as the guy responsible for shipping the API that broke Xbox system security, and that probably wouldn't be too great for my career prospects either:-)

    Shawn, it's not that we want to compromise the security of the Xbox, we just want to avoid the problems those extra copies cause*. The use of SetData requires us to either a) Keep around temporary buffers to be used for filling VBs, thus wasting memory** or b) Create new buffers on each fill, potentially triggering unwanted collections. I think many of us would be willing to accept a reasonable performace trade off in return for a Lock/Unlock mechanism. Failing that, there's always my suggestion to use a delegate, which while far, far slower than a memory copy, would provide an option in cases where memory has become a severe limitation.

    * As a side note, these concerns are mostly related to the 360, as the generational collector on Windows would mitigate the problem with creating new buffers.

    **  How much depends on the size of the buffers and the number of formats used, so failing any other option, we could always limit the number of formats and use a smaller buffer with multple SetData's. Still, this is far from ideal and complicates the system.


  • buladbanaw

    I think the reasoning behind hiding the pointers is to ensure people don't messup and write unstable code. I don't believe this is security related at all. Or it could be that the implementation on the xbox is so different that they decided to abstract it away.

    One simple way could be to change (VertexBuffer class)

    internal unsafe IDirect3DVertexBuffer9* pComPtr

    to

    protected unsafe IDirect3DVertexBuffer9* pComPtr.

     

    Then we can just inherit from VertexBuffer and go to town.

    Thanks,

    Roger Larsen


  • Tim Hunter

    The Xbox driver model is very different to Windows, so the conventional Windows D3D lock semantics don't really apply. Where the security issues come into play is with the UMA architecture. You might think it would be as simple as "everything is UMA, so no need to lock, you can just get pointers to resources and party away". But it isn't, because of our CLR sandboxing. Yes, we let you run unsafe managed code, but only within a virtualized address space. I can't go into too much detail about the security architecture, but rest assured it is there, it's pretty complicated, and it has big implications for how GPU resources are accessed!


  • sohaibi

    So you are reading the vertex data out of the buffer, updating, and writing it back What pool is the vertex buffer in It would be interesting to see the relevant XNA code that performs this operation so we get a good feel for the usage pattern you are describing.

  • Dotnetter03

    What I do now is to keep a buffer for one terrain patch 32*32 in memory and do a SetData for every single patch.

    Give 4096 calls to SetData (= 4096 times Lock/Unlock) for 2km2 with a density of 1 m2... Hmmm... hopefully its locked with D3DLOCK_NOSYSLOCK...

    Other solution is to SetData the whole array from an intermediate buffer of 68MB... ouch. Hurts even more than the locks.



  • 1.) Lock/Unlock vs. SetData/GetData and 2.) predefined vertex formats