Tuesday, 23 August 2011

Framework 4.0 - XmlSerializer vs DataContractSerializer. Prepare class for serialization. Performance.

Serialization is the process of persisting a data structure or object state. You can persist object state to a file, a database, memory and so on. XML is a common format for transferring data between systems.
You can find many articles on the internet about the different serializers. I will review two common ones: XmlSerializer and DataContractSerializer. DataContractSerializer was introduced by Microsoft as a newer, faster way to persist objects, and it is the serializer used in WCF. Depending on the task, you have to be ready to use one or the other, or both. Before you use these serializers you should understand how a class will be persisted.
For security reasons, you will probably need to hide some class properties, such as financial data, before sending an object to a client. In this article I describe how to do that, and I review persistence to a file only.

Prepare class for serialization.

If you use Framework 4.0, you don't have to make any special preparations for serialization, but if you want control over serialization, you should mark the class and its properties with special attributes.
The samples below show that you don't have to do anything, but also that the result differs from one serializer to the other.
Let's say you want to serialize the following class:

    public class Order
    {
        public string sId = "test";
        public string CustomerId { get; set; }
        public string OrderId { get; set; }
        public int TrackingId { get; set; }
        public System.DateTime DateCreated { get; set; }
        public Enums.OrderStatus Status { get; set; }
    }

Class initialization:

            // create the object that will be serialized
            Order order = new Order();
            order.CustomerId = "1902";
            order.Status = Enums.OrderStatus.Completed;
            order.DateCreated = System.DateTime.UtcNow;
            order.TrackingId = 100;

1) Use XmlSerializer

You can use the following code to serialize the class to an external file:

System.Xml.Serialization.XmlSerializer sr0 = new System.Xml.Serialization.XmlSerializer(order.GetType());
using (System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(@"c:\order_100.xml"))
{
    sr0.Serialize(writer, order);
}

a) The class has no serialization attributes and is saved to a file. All public properties and public fields are serialized by default.
The code above generates the following XML:

<?xml version="1.0" encoding="utf-8"?>
<Order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <sId>test</sId>
  <CustomerId>1902</CustomerId>
  <TrackingId>100</TrackingId>
  <DateCreated>2011-08-23T01:44:29.6794597Z</DateCreated>
  <Status>Completed</Status>
</Order>


b) If you use XmlSerializer and want control over serialization, use the attributes from the System.Xml.Serialization namespace. XmlSerializer itself does not need [Serializable]; the class below carries it only because it matters to DataContractSerializer, as shown in 2b).
If you want to exclude some properties from serialization, mark them with [System.Xml.Serialization.XmlIgnore].
If you want a property written as an XML attribute instead of an XML element, use [System.Xml.Serialization.XmlAttribute(....)].

    [Serializable]
    public class Order
    {
        public string sId = "test";
        public string CustomerId { get; set; }
        [System.Xml.Serialization.XmlElement(IsNullable = true)]
        public string OrderId { get; set; }
        [System.Xml.Serialization.XmlIgnore]
        public int TrackingId { get; set; }
        [System.Xml.Serialization.XmlAttribute("OrderDate")]
        public System.DateTime DateCreated { get; set; }
        [System.Xml.Serialization.XmlElement("OrderStatus")]
        public Enums.OrderStatus Status { get; set; }
    }

As a result you will get the following XML:

<?xml version="1.0" encoding="utf-8"?>
<Order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" OrderDate="2011-08-23T16:10:14.3406181Z">
  <sId>test</sId>
  <CustomerId>1902</CustomerId>
  <OrderId xsi:nil="true" />
  <OrderStatus>Completed</OrderStatus>
</Order>

As you can see, TrackingId was excluded from the XML, DateCreated became an attribute named OrderDate, and Status was renamed to OrderStatus. OrderId has no assigned value, but because of IsNullable = true its node still appears in the XML, as an empty element marked xsi:nil="true".

2) Use DataContractSerializer

You can use the following code to serialize the class:

System.Runtime.Serialization.DataContractSerializer sr1 = new System.Runtime.Serialization.DataContractSerializer(order.GetType());
using (System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(@"c:\order_100_dc.xml"))
{
    sr1.WriteObject(writer, order);
}

a) The class is not marked with any serialization attributes.
The result will be:

<?xml version="1.0" encoding="utf-8"?>
<Order xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/PerformanceTest.Serialization.BLL">
  <CustomerId>1902</CustomerId>
  <DateCreated>2011-08-23T04:16:08.4371221Z</DateCreated>
  <OrderId i:nil="true" />
  <Status>Completed</Status>
  <TrackingId>100</TrackingId>
  <sId>test</sId>
</Order>

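As you can see, the result is much the same, but the xmlns now contains the CLR namespace of the class (here the project's default namespace), and the null OrderId is written as an empty element with i:nil="true". There is one more difference: the elements are written in alphabetical order, with uppercase letters sorting first, so the public field sId ends up at the end of the XML.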

b) Use the same class as in 1b), marked with [Serializable] and the XmlSerializer attributes:

<?xml version="1.0" encoding="utf-8"?>
<Order xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/PerformanceTest.Serialization.BLL">
  <_x003C_CustomerId_x003E_k__BackingField>1902</_x003C_CustomerId_x003E_k__BackingField>
  <_x003C_DateCreated_x003E_k__BackingField>2011-08-23T04:21:49.8596503Z</_x003C_DateCreated_x003E_k__BackingField>
  <_x003C_OrderId_x003E_k__BackingField i:nil="true" />
  <_x003C_Status_x003E_k__BackingField>Completed</_x003C_Status_x003E_k__BackingField>
  <_x003C_TrackingId_x003E_k__BackingField>100</_x003C_TrackingId_x003E_k__BackingField>
  <sId>test</sId>
</Order>

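Instead of clean XML element names you see _x003C_, _x003E_ and so on. Because the class is marked [Serializable] but not [DataContract], DataContractSerializer serializes its fields rather than its properties, including the compiler-generated backing fields of the automatic properties; the < and > characters in those field names are escaped as _x003C_ and _x003E_. The XmlSerializer attributes are ignored, so TrackingId is still written. The public field sId appears as it did before.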

c) To have full control over class serialization with DataContractSerializer, mark the class with [DataContract] and every member that should be serialized with [DataMember]. If you mark the class with [DataContract] only, none of its members will be serialized.
For example, if you want to serialize sId, CustomerId and OrderId, mark the class as follows:

    [DataContract]
    public class Order
    {
        [DataMember]
        public string sId = "test";
        [DataMember]
        public string CustomerId { get; set; }
        [DataMember]
        public string OrderId { get; set; }
        public int TrackingId { get; set; }
        public System.DateTime DateCreated { get; set; }
        public Enums.OrderStatus Status { get; set; }
    }

As a result you will get the following XML:

<?xml version="1.0" encoding="utf-8"?>
<Order xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/PerformanceTest.Serialization.BLL">
  <CustomerId>1902</CustomerId>
  <OrderId i:nil="true" />
  <sId>test</sId>
</Order>
The DataContract attribute has properties, such as Namespace, that let you specify the XML namespace.

d) Can a class be marked with both [Serializable] and [DataContract]? The answer is yes.
This is useful when you have heterogeneous systems with different clients: some clients can use XmlSerializer and some DataContractSerializer.

    [DataContract(Namespace = "http://Supplier1.org/OMS/Order")]
    [Serializable]
    [System.Xml.Serialization.XmlRoot(Namespace = "http://Supplier1.org/OMS/Order")]
    public class Order
    {
        [DataMember]
        public string sId = "test";
        [DataMember]
        public string CustomerId { get; set; }
        [DataMember]
        [System.Xml.Serialization.XmlElement(IsNullable = true)]
        public string OrderId { get; set; }
        [System.Xml.Serialization.XmlIgnore]
        public int TrackingId { get; set; }
        [System.Xml.Serialization.XmlAttribute("OrderDate")]
        public System.DateTime DateCreated { get; set; }
        [System.Xml.Serialization.XmlElement("OrderStatus")]
        public Enums.OrderStatus Status { get; set; }
    }



Which attributes apply depends on the serializer you use. DataContractSerializer first looks for the [DataContract] attribute and, if it is present, serializes only the members marked with [DataMember]. If there is no [DataContract] but the class is marked [Serializable], it serializes the fields of the class, as in 2b). If there are no attributes at all, the default behaviour from 2a) applies. XmlSerializer reads only the [System.Xml.Serialization.....] attributes and ignores [DataContract], [DataMember] and [Serializable].
Namespaces play a very important role when you have many back-end systems with the same class. For example, you have different product suppliers, and every supplier has its own order management system. The problem is that every system uses an object with the same name, Order. To tell them apart, you should use different namespaces.

Here is a sample of how you can do it for DataContractSerializer (OMS stands for order management system). Each attribute goes on a different supplier's Order class:

// on supplier 1's Order class
[DataContract(Namespace = "http://Supplier1.org/OMS/Order")]
// on supplier 2's Order class
[DataContract(Namespace = "http://Supplier2.org/OMS/Order")]

You can achieve the same result for XmlSerializer by using the XmlRoot attribute, again one per class:

// on supplier 1's Order class
[System.Xml.Serialization.XmlRoot(Namespace = "http://Supplier1.org/OMS/Order")]
// on supplier 2's Order class
[System.Xml.Serialization.XmlRoot(Namespace = "http://Supplier2.org/OMS/Order")]
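
Once every supplier has its own namespace, the receiving side can tell the documents apart before deserializing them. The following is only a sketch under assumptions: Supplier1Order, Supplier2Order and OrderRouter are hypothetical names (in a real solution both classes would be called Order and live in different CLR namespaces), and it relies on XmlSerializer.CanDeserialize, which checks the root element name and namespace.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-ins for the two suppliers' Order classes.
[XmlRoot("Order", Namespace = "http://Supplier1.org/OMS/Order")]
public class Supplier1Order
{
    public string OrderId { get; set; }
}

[XmlRoot("Order", Namespace = "http://Supplier2.org/OMS/Order")]
public class Supplier2Order
{
    public string OrderId { get; set; }
}

public static class OrderRouter
{
    // Returns the name of the first type whose root element and namespace
    // match the document, or null when no candidate matches.
    public static string DetectSupplier(string xml)
    {
        Type[] candidates = { typeof(Supplier1Order), typeof(Supplier2Order) };
        foreach (Type type in candidates)
        {
            var serializer = new XmlSerializer(type);
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                if (serializer.CanDeserialize(reader))
                {
                    return type.Name;
                }
            }
        }
        return null;
    }

    public static void Main()
    {
        string xml = "<Order xmlns=\"http://Supplier2.org/OMS/Order\"><OrderId>7</OrderId></Order>";
        Console.WriteLine(DetectSupplier(xml));
    }
}
```

Both classes produce a root element called Order, so the namespace is the only thing that distinguishes them.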


Deserialization

There is nothing special in this section:
1) Deserializing an object from a file with XmlSerializer

Order order = null;
System.Xml.Serialization.XmlSerializer sr2 = new System.Xml.Serialization.XmlSerializer(typeof(Order));
using (System.Xml.XmlReader reader = System.Xml.XmlReader.Create(@"c:\order_100.xml"))
{
    order = (Order)sr2.Deserialize(reader);
}

2) Deserializing an object from a file with DataContractSerializer

Order order = null;
System.Runtime.Serialization.DataContractSerializer sr3 = new System.Runtime.Serialization.DataContractSerializer(typeof(Order));
using (System.Xml.XmlReader reader = System.Xml.XmlReader.Create(@"c:\order_100_dc.xml"))
{
    order = (Order)sr3.ReadObject(reader);
}

3) Actually, there is one special thing:
If your object was serialized by an old framework and contained special characters, your XML could look like this:

<?xml version="1.0" encoding="utf-8"?>
<Order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <CustomerId>&#xB;</CustomerId>
  <OrderId>0</OrderId>
  <DateCreated>2011-08-11T16:48:44.9546066Z</DateCreated>
  <Status>Completed</Status>
</Order>

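The XML contains a character reference to 0x0B (vertical tab), which the XML 1.0 standard does not allow. In an ideal world it would have been written with the ampersand escaped, as "&amp;#xB;", so that it is read as plain text.

No matter which serializer you use, you will get an error like this: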
"There was an error deserializing the object of type ........ ' ', hexadecimal value 0x0B, is an invalid character. Line 1, position ...."



There is a solution from Microsoft: deserialize through an XmlReader created with XmlReaderSettings whose CheckCharacters option is set to false. Some people instead write their own XML parsers that clean the document before passing it to the deserialization function.
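
A minimal sketch of the reader-settings approach, under assumptions: LegacyOrder and LenientXml are hypothetical names, and the class is reduced to the one property that carries the offending character. Setting CheckCharacters to false on XmlReaderSettings turns off validation of character references, so the reference to 0x0B is passed through as a character instead of raising an error.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical, reduced version of the Order class.
[XmlRoot("Order")]
public class LegacyOrder
{
    public string CustomerId { get; set; }
}

public static class LenientXml
{
    // Deserializes XML text with character-reference checking switched off.
    public static LegacyOrder Read(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        var serializer = new XmlSerializer(typeof(LegacyOrder));
        using (var text = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(text, settings))
        {
            return (LegacyOrder)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        LegacyOrder order = Read("<Order><CustomerId>&#xB;</CustomerId></Order>");
        // Print the numeric code of the first character that was read.
        Console.WriteLine((int)order.CustomerId[0]);
    }
}
```

The same settings object can be passed to XmlReader.Create when reading from a file, and the resulting reader can be handed to DataContractSerializer.ReadObject as well.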

Performance

I created a simple application that performs 5000 serializations to a local file with each serializer.
Here is the result:



Not bad for DataContractSerializer!
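
The test application itself is not listed here, so the following is only a sketch of how such a measurement could look, not the code behind the numbers above. BenchOrder and SerializerBenchmark are hypothetical names, and the sketch writes to a MemoryStream rather than a local file to keep it self-contained, so its timings are not comparable with the file-based test.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization;
using System.Xml.Serialization;

// Hypothetical class used only for the measurement.
public class BenchOrder
{
    public string CustomerId { get; set; }
    public int TrackingId { get; set; }
    public DateTime DateCreated { get; set; }
}

public static class SerializerBenchmark
{
    // Runs the given serialization action against a fresh stream
    // 'iterations' times and returns the total elapsed time.
    public static TimeSpan Time(int iterations, Action<Stream> serialize)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            using (var stream = new MemoryStream())
            {
                serialize(stream);
            }
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        var order = new BenchOrder { CustomerId = "1902", TrackingId = 100, DateCreated = DateTime.UtcNow };
        var xmlSerializer = new XmlSerializer(typeof(BenchOrder));
        var dataContractSerializer = new DataContractSerializer(typeof(BenchOrder));

        // One warm-up call each, so one-time initialization is not measured.
        Time(1, s => xmlSerializer.Serialize(s, order));
        Time(1, s => dataContractSerializer.WriteObject(s, order));

        Console.WriteLine("XmlSerializer:          " + Time(5000, s => xmlSerializer.Serialize(s, order)));
        Console.WriteLine("DataContractSerializer: " + Time(5000, s => dataContractSerializer.WriteObject(s, order)));
    }
}
```

Creating the serializers once, outside the timed loop, keeps construction cost out of the measurement; whether to include it depends on how the application uses them.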

How to improve XmlSerializer performance?

There is one method to do it: pre-generate the serialization DLLs with the sgen tool.
Otherwise the runtime generates the serialization assembly itself, using the temporary folder to build it. It does so every time the process starts, and, if you create serializers with constructors other than XmlSerializer(Type) and XmlSerializer(Type, String), on every construction, because only those two constructors cache the generated assembly. All of this involves disk I/O. I prefer to update the VS project that contains the classes to be serialized, so that every build generates these DLLs. To do that, right-click the project and select "Unload project" in the popup menu.



Right-click the project again and select "Edit <project name>".



I simply add this text to the project definition at the end of the file, right before </Project>:

  <Target Name="GenerateSerializationAssembliesForAllTypes" DependsOnTargets="AssignTargetPaths;Compile;ResolveKeySource" Inputs="$(MSBuildAllProjects);@(IntermediateAssembly)" Outputs="$(OutputPath)$(_SGenDllName)">
    <SGen BuildAssemblyName="$(TargetFileName)" BuildAssemblyPath="$(OutDir)" References="@(ReferencePath)" ShouldGenerateSerializer="true" UseProxyTypes="false" KeyContainer="$(KeyContainerName)" KeyFile="$(KeyOriginatorFile)" DelaySign="$(DelaySign)" ToolPath="$(SGenToolPath)">
      <Output TaskParameter="SerializationAssembly" ItemName="SerializationAssembly" />
    </SGen>
  </Target>
  <Target Name="BeforeBuild">
  </Target>
  <Target Name="AfterBuild" DependsOnTargets="GenerateSerializationAssembliesForAllTypes">
  </Target>

After you have saved the file, right-click the project and select "Reload project".



Rebuild the project and open the output folder.
If your class library project is named "PerformanceTest.Serialization.BLL", you will see
PerformanceTest.Serialization.BLL.dll and one extra DLL generated by sgen:
PerformanceTest.Serialization.BLL.XmlSerializers.dll (XmlSerializer will use this DLL to serialize and deserialize objects). Generating that DLL in advance spares the process from building it in the temp folder at runtime.
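
One way to check at runtime whether the pre-generated DLL was picked up is to look at the assemblies loaded into the current AppDomain after an XmlSerializer for one of the library's types has been created. This is a sketch under assumptions: LoadedAssemblies is a hypothetical helper, and it presumes that the XmlSerializers DLL has been copied next to the application, since the serializer assembly is loaded only on demand and only if it can be found.

```csharp
using System;
using System.Linq;

public static class LoadedAssemblies
{
    // Returns true when an assembly with the given simple name
    // is currently loaded into this AppDomain.
    public static bool IsLoaded(string simpleName)
    {
        return AppDomain.CurrentDomain.GetAssemblies()
            .Any(a => string.Equals(a.GetName().Name, simpleName, StringComparison.OrdinalIgnoreCase));
    }

    public static void Main()
    {
        // In a real application, construct an XmlSerializer for a type from
        // the class library first, then run this check.
        Console.WriteLine(IsLoaded("PerformanceTest.Serialization.BLL.XmlSerializers"));
    }
}
```

If the check prints False after a serializer has been created, the runtime most likely fell back to generating the assembly itself.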


1 comment:

  1. How do I know that an app is really using XmlSerializers.dll because when I check in Debug->Windows->Modules I don't see that dll? And in the app only references class library dll inside its bin directory; it doesn't bring the Serializers.dll with it.
