How statistics are calculated
We count how many offers each candidate received and at what salary. For example, if a Data QA developer with XML skills and a salary expectation of $4,500 received 10 offers, we count them 10 times. Candidates who received no offers are not included in the statistics.
Each column in the graph shows the total number of offers. This is not the number of vacancies but an indicator of demand: the more offers there are, the more companies are trying to hire such a specialist. The 5k+ column includes candidates with salaries >= $5,000 and < $5,500.
Median Salary Expectation – the weighted average of market offers in the selected specialization, that is, the most frequent job offers received by candidates in that specialization. We do not count accepted or rejected offers.
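As a rough illustration of the counting rules above, the Java sketch below buckets offers into $500-wide salary columns and computes an offer-weighted average. It is a sketch under assumptions, not the site's actual pipeline: the Candidate record, the bucket-width constant and both method names are hypothetical, and each candidate is reduced to a salary expectation plus an offer count.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OfferStats {
    // Hypothetical input: one row per candidate (salary expectation in USD, offers received).
    record Candidate(int salary, int offers) {}

    // Assumed column width: the "5k+" column covers $5,000 to $5,499.
    static final int BUCKET_WIDTH = 500;

    // Graph columns: total offers per salary bucket. A candidate with 10 offers adds 10;
    // a candidate with no offers adds nothing and never appears in the statistics.
    static Map<Integer, Integer> offersPerBucket(List<Candidate> candidates) {
        Map<Integer, Integer> columns = new TreeMap<>();
        for (Candidate c : candidates) {
            if (c.offers() == 0) continue;
            int bucket = (c.salary() / BUCKET_WIDTH) * BUCKET_WIDTH;
            columns.merge(bucket, c.offers(), Integer::sum);
        }
        return columns;
    }

    // Offer-weighted average salary: each candidate's salary counts once per offer received.
    static double weightedAverageSalary(List<Candidate> candidates) {
        long weightedSum = 0;
        long totalOffers = 0;
        for (Candidate c : candidates) {
            weightedSum += (long) c.salary() * c.offers();
            totalOffers += c.offers();
        }
        return totalOffers == 0 ? 0.0 : (double) weightedSum / totalOffers;
    }

    public static void main(String[] args) {
        List<Candidate> sample = List.of(
                new Candidate(4500, 10), new Candidate(5200, 3), new Candidate(6000, 0));
        System.out.println(offersPerBucket(sample));
        System.out.println(weightedAverageSalary(sample));
    }
}
```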
Trending Data QA tech & tools in 2024
Data QA
What is Data Quality?
A data quality analyst maintains an organisation's data so that the organisation can have confidence in the accuracy, completeness, consistency, trustworthiness, and availability of its data. DQA teams are in charge of conducting audits, defining data quality standards, spotting outliers, and fixing flaws, and they play a key role at every stage of the data lifecycle. Without DQA work, poor-quality data can cause strategic plans to fail and operations to go awry, drive customers away, and expose organisations to substantial financial losses, lost customer trust, and potential legal repercussions.
This is a job that has changed as much as the hidden infrastructure that transforms data into insight and then powers the apps we all use. In other words, it has changed a lot.
Data Correctness/Validation
This is the largest stream of tasks. When we talk about data correctness, we should be asking: what does correctness mean to you, for this dataset? The answer will be different for every dataset and every organisation. The commonsense interpretation is that the data must be what your end user (or business) wants from the dataset, or what they would expect the dataset to return.
We can find this out simply by asking questions or by reading through the list of requirements. Here are some of the tests we might run in this stream:
Finding Duplicates – nobody wants these in their data.
– Check whether a column/field that should contain only unique/distinct values actually does.
– Return any value that appears more than once, so it can be investigated.
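A duplicate check of this kind could be sketched in Java as follows. It assumes a column has already been read into a List of non-null strings (in practice the values would come from a query or a file), and DuplicateCheck.findDuplicates is a hypothetical name. It returns every value that occurs more than once, so an empty result means the column is unique.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DuplicateCheck {
    // Returns each value that appears more than once in the column, sorted.
    // Assumes non-null values (TreeSet rejects null).
    static Set<String> findDuplicates(List<String> column) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : column) {
            counts.merge(value, 1, Integer::sum);
        }
        Set<String> duplicates = new TreeSet<>();
        counts.forEach((value, count) -> {
            if (count > 1) duplicates.add(value);
        });
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> customerIds = List.of("C001", "C002", "C001", "C003", "C002");
        Set<String> dupes = findDuplicates(customerIds);
        // An empty result means the column really is unique.
        System.out.println(dupes.isEmpty() ? "column is unique" : "duplicates: " + dupes);
    }
}
```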
Data with KPIs – Any column we can sum, min or max on is called a key performance indicator (KPI), so this covers basically any model made up mostly of numeric/integer columns, e.g. Budget, Revenue, Sales. When comparing two datasets, the following tests apply:
– Compare the counts of the two datasets and get the difference.
– Compare the unique/distinct values and their counts for each column, and find out which values are missing from either dataset.
– Compare the KPIs between two datasets and get the percentage difference between them.
– Identify missing values: records present in one dataset but missing from the other, matched on a primary or composite primary key. This can also be done for a data source that has no primary key.
– Break the metrics down by segment, i.e. by individual column value. This can help you determine what is going wrong if the count of values on the Zoopla side doesn't match the count on the Rightmove side, or if some of the values are missing.
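The comparison tests above might be sketched in Java like this, assuming each dataset has already been loaded into a map from primary key to one numeric KPI (say, Revenue). The DatasetDiff class, its methods and the sample keys are hypothetical; a composite key could be handled by joining the key columns into one string. Expressing the KPI difference relative to the first (left) dataset is a choice, not a rule.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DatasetDiff {
    // Difference in row counts (left minus right).
    static int countDifference(Map<String, Double> left, Map<String, Double> right) {
        return left.size() - right.size();
    }

    // Keys present in the source dataset but missing from the other one.
    static Set<String> missingFrom(Map<String, Double> source, Map<String, Double> other) {
        Set<String> missing = new TreeSet<>(source.keySet());
        missing.removeAll(other.keySet());
        return missing;
    }

    // Percentage difference of the summed KPI, relative to the left dataset.
    static double kpiPercentDifference(Map<String, Double> left, Map<String, Double> right) {
        double leftTotal = left.values().stream().mapToDouble(Double::doubleValue).sum();
        double rightTotal = right.values().stream().mapToDouble(Double::doubleValue).sum();
        if (leftTotal == 0) return rightTotal == 0 ? 0.0 : Double.POSITIVE_INFINITY;
        return (rightTotal - leftTotal) / leftTotal * 100.0;
    }

    // Row counts per segment value; compare the result for each side to locate a mismatch.
    static Map<String, Integer> countBySegment(Map<String, String> segmentByKey) {
        Map<String, Integer> counts = new HashMap<>();
        segmentByKey.values().forEach(segment -> counts.merge(segment, 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Double> zoopla = Map.of("P1", 100.0, "P2", 250.0, "P3", 50.0);
        Map<String, Double> rightmove = Map.of("P1", 100.0, "P2", 340.0);
        System.out.println("count difference: " + countDifference(zoopla, rightmove));
        System.out.println("missing from Rightmove: " + missingFrom(zoopla, rightmove));
        System.out.println("KPI % difference: " + kpiPercentDifference(zoopla, rightmove));
    }
}
```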
Data Freshness
This is an easy set. How do we know if the data is fresh?
An obvious indicator is whether your dataset has a date column; if it does, you just check the max date. Another is when the data was loaded into a particular table. All of this can be converted into very simple automated checks, which we might talk about in a later blog entry.
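A max-date freshness check can be automated along these lines. The sketch assumes the date column has been read into a collection of LocalDate values; the maximum allowed age and the FreshnessCheck name are assumptions. Passing today in explicitly keeps the check deterministic and easy to test, and the same method works on a load-date column.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class FreshnessCheck {
    // True if the newest date in the column is at most maxAgeDays before today.
    static boolean isFresh(Collection<LocalDate> dateColumn, LocalDate today, long maxAgeDays) {
        if (dateColumn.isEmpty()) return false; // no data at all is never fresh
        LocalDate maxDate = Collections.max(dateColumn);
        long ageInDays = ChronoUnit.DAYS.between(maxDate, today);
        return ageInDays <= maxAgeDays;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = List.of(LocalDate.of(2024, 3, 1), LocalDate.of(2024, 3, 4));
        // Assumed threshold: data older than one day counts as stale.
        System.out.println(isFresh(dates, LocalDate.of(2024, 3, 5), 1));
    }
}
```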
Data Completeness
This could be an intermediate step alongside data correctness: how do we know whether the space of answers is complete?
One test is to check whether any column has all of its values null. Perhaps that's okay, but most of the time it's bad news.
Another test is one-valuedness: whether all values in the column are the same. In some cases that would be a fine result, but in others it would be something we'd rather look into.
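Both completeness checks reduce to small column-level functions. In this hypothetical sketch a column is a List of strings where null marks a missing value; ignoring nulls in the one-valuedness test is an assumption, and deciding whether a one-valued column is acceptable is left to the caller, as the text suggests.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class CompletenessCheck {
    // True if every value in the column is null (an empty column also counts as all-null).
    static boolean isAllNull(List<String> column) {
        return column.stream().allMatch(Objects::isNull);
    }

    // True if the column holds exactly one distinct non-null value; nulls are ignored.
    static boolean isOneValued(List<String> column) {
        Set<String> distinct = new HashSet<>();
        for (String value : column) {
            if (value != null) distinct.add(value);
        }
        return distinct.size() == 1;
    }

    public static void main(String[] args) {
        List<String> country = List.of("UK", "UK", "UK");
        List<String> notes = Arrays.asList(null, null, null);
        System.out.println("country is one-valued: " + isOneValued(country));
        System.out.println("notes is all null: " + isAllNull(notes));
    }
}
```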
What are Data Quality Tools and How are They Used?
Data quality tools are used to improve, or sometimes automate, many of the processes required to keep data fit for analytics, data science, and machine learning. For example, such tools enable teams to evaluate their existing data pipelines, identify quality bottlenecks, and even automate many remediation steps. Activities involved in guaranteeing data quality include data profiling, data lineage, and data cleansing. Teams can use data cleansing, data profiling, measurement, and visualization tools to 'understand the shape and values of the data assets that have been acquired – and how they are being collected'. These tools flag outliers and mixed formats. In the data analytics pipeline, data profiling acts as a quality control gate. Each of these activities is a data management chore.
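To make flagging mixed formats concrete, here is a sketch of one common profiling trick: reduce each value to a shape mask (digits become 9, letters become A, everything else is kept) and count the shapes per column. The masking rule and the FormatProfiler name are assumptions for illustration, not how any particular data quality tool works; more than one shape in a column suggests mixed formats.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FormatProfiler {
    // Maps a value to its shape, e.g. "2024-03-05" -> "9999-99-99" and "05/03/24" -> "99/99/99".
    static String shape(String value) {
        StringBuilder mask = new StringBuilder();
        for (char ch : value.toCharArray()) {
            if (Character.isDigit(ch)) mask.append('9');
            else if (Character.isLetter(ch)) mask.append('A');
            else mask.append(ch);
        }
        return mask.toString();
    }

    // Counts how many non-null values in the column have each shape.
    static Map<String, Integer> shapeCounts(List<String> column) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : column) {
            if (value != null) counts.merge(shape(value), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> orderDates = List.of("2024-03-05", "2024-03-06", "05/03/2024");
        Map<String, Integer> profile = shapeCounts(orderDates);
        System.out.println(profile);
        if (profile.size() > 1) System.out.println("mixed formats detected");
    }
}
```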
Where is XML used?
Config Files: The Tidy Cupboard!
- Software settings love to chill in XML's neat drawers, like a well-organized sock drawer for developers.
Soap Opera of Services
- SOAP protocols gossip in XML messages like a postal service for chatty web services.
Ancient Scrolls of Data Exchange
- Before JSON muscled in, XML was the elder statesman for data sharing, trading bits like vintage baseball cards.
The X Marks the Spot
- In the treasure hunt of office templates, XML maps the way to riches in Microsoft Office file formats.
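For the config-file case, the Java standard library itself can read and write settings as XML through java.util.Properties. The sketch below round-trips a couple of made-up settings (the keys and the XmlConfigDemo name are hypothetical) through storeToXML and loadFromXML, using in-memory streams rather than a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class XmlConfigDemo {
    // Writes the settings as an XML properties document (UTF-8) and returns it as a string.
    static String toXml(Properties settings) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            settings.storeToXML(out, "app settings");
            return out.toString(StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses an XML properties document back into a Properties object.
    static Properties fromXml(String xml) {
        try {
            Properties settings = new Properties();
            settings.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return settings;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        settings.setProperty("db.timeout", "30");
        settings.setProperty("ui.theme", "dark");
        String xml = toXml(settings);
        System.out.println(xml);
        System.out.println(fromXml(xml).getProperty("db.timeout"));
    }
}
```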
XML Alternatives
JSON (JavaScript Object Notation)
JSON is a lightweight data-interchange format. It's easy for humans to read and write, and easy for machines to parse and generate. It is used primarily to transmit data between a server and a web application.
{
"name": "John",
"age": 30,
"isStudent": false
}
Pros:
- Human-readable and writable
- Lightweight, leading to faster processing
- Widely supported across programming languages
Cons:
- Lacks support for comments
- No support for namespaces
- Can be verbose for complex structures
YAML (YAML Ain't Markup Language)
YAML is a human-friendly data serialization standard for all programming languages. It's often used for configuration files and in data exchange where human readability is important.
name: John
age: 30
isStudent: false
Pros:
- Highly readable syntax
- Supports complex data structures
- Uses indentation for scope
Cons:
- Prone to errors due to indentation
- Can be slow to parse in large quantities
- Lacks security features by default: some parsers can construct arbitrary objects unless a safe loader is used
Protocol Buffers (Protobuf)
Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data, similar to XML but smaller, faster, and simpler. It is used for storing and exchanging structured information.
message Person {
required string name = 1;
required int32 age = 2;
optional bool is_student = 3;
}
Pros:
- Compact and efficient serialization
- Schema-based with clear contracts
- Backward and forward compatibility
Cons:
- Requires a pre-defined schema
- Binary format that is not human-readable
- Smaller ecosystem compared to JSON and XML
Quick Facts about XML
XML: The Hierarchical Heavyweight That Outgrew Its SGML Sibling
Picture it: 1996, the year the Spice Girls were telling us what they "really, really want" and the tech world got what it really, really needed – XML! Conceived as a simplified subset of SGML, XML was designed by the XML Working Group, a small gang helmed by its captain, Jon Bosak. They set out to make this data-structuring and transportation champ readable by both humans and machines, which is kind of like making broccoli taste like chocolate – ambitious but oh, so beneficial.
<note>
<to>Developer</to>
<from>XML</from>
<heading>Hello, World!</heading>
<body>Don't forget to validate me!</body>
</note>
XML Speaks in Tongues: Namespaces and X-Words
Fast forward to 1999, a world nervously peeking at Y2K, and XML 1.0 (a W3C Recommendation since 1998) was spreading like the latest cat meme. But one set of tags wasn't enough to hold the convos across different XML vocabularies. Enter Namespaces in XML – not about outer space, but just as cool. Namespaces let XML mix vocabularies, such as XHTML and your own tags, without name clashes. And with XSLT and XPath joining the party that same year, and XQuery following later, XML had more X's than a pirate's treasure map!
<html:div xmlns:html="http://www.w3.org/1999/xhtml">
<music:song xmlns:music="http://www.music.org">
<music:title>Code Me Maybe</music:title>
<music:artist>Carly Debug Jepsen</music:artist>
</music:song>
</html:div>
XML 1.1: When It Decided to Get a Makeover
Let's zoom to 2004. Usher's 'Yeah!' was topping the charts, and XML was getting an upgrade to 1.1. And just like low-rise jeans, not everyone was thrilled about the change. This version relaxed the rules on which Unicode characters may appear in element and attribute names, allowed most control characters as character references, and recognized extra line-ending characters such as NEL. Basically, XML went from the tech equivalent of a flip phone to a smartphone – more features, but not everyone wanted to relearn how to text!
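<?xml version="1.1" encoding="UTF-8"?>
<message>
<text>Hello, 😎 World!</text>
<!-- a control-character reference such as &#x1; is legal in XML 1.1 but rejected by XML 1.0 parsers -->
<control>&#x1;</control>
</message>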
What is the difference between Junior, Middle, Senior and Expert XML developers?
Seniority Level | Experience | Average Salary (USD/year) | Responsibilities & Activities |
---|---|---|---|
Junior XML Developer | 0-2 years | $50,000 - $70,000 | |
Middle XML Developer | 2-5 years | $70,000 - $90,000 | |
Senior XML Developer | 5-10 years | $90,000 - $110,000 | |
Expert/Team Lead XML Developer | 10+ years | $110,000+ | |
Top 10 XML-Related Tech
Java/C#
Like peanut butter and jelly, Java and C# are the classic sandwich spread for XML manipulation – they're the bread-and-butter languages that play nice with XML. Java has libraries like JAXB (bundled with the JDK up to Java 10, a separate dependency since Java 11) that can marshal and unmarshal XML faster than a cowboy at a rodeo, and C# has LINQ to XML, which lets you query your XML documents as if they were SQL-database socialites at a high-tea event.
// Java example: JAXB (YourClass must be annotated with @XmlRootElement)
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import java.io.File;
Unmarshaller unmarshaller = JAXBContext.newInstance(YourClass.class).createUnmarshaller();
YourClass yourClassInstance = (YourClass) unmarshaller.unmarshal(new File("path/to/your/xmlfile.xml"));
// C# example: LINQ to XML
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
XDocument xmlDoc = XDocument.Load("path/to/your/xmlfile.xml");
IEnumerable<XElement> rows = from row in xmlDoc.Descendants("row") select row;
XML Schema
The blueprint of the XML world – if your XML were a LEGO structure, XML Schema would be the instruction booklet, ensuring that each piece clicks exactly where it should. It defines the structure and types of data allowed, making it the strict librarian of the XML data files, always shushing incorrect formats.
<!-- XML Schema example -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contact">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="phone" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
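The contact schema above can be enforced from Java with the standard javax.xml.validation API. In this sketch the schema is inlined as a string and wrapped in the xs:schema root element that a standalone .xsd file needs; the ContactValidator name and the sample documents are hypothetical. Any SAXException, including one caused by a malformed schema, is reported as invalid.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ContactValidator {
    // The contact schema, wrapped in an xs:schema root element.
    static final String XSD = """
            <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
              <xs:element name="contact">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="name" type="xs:string"/>
                    <xs:element name="phone" type="xs:string"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:schema>
            """;

    // Returns true if the document matches the schema, false on any validation error.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<contact><name>Ada</name><phone>555-0100</phone></contact>"));
        System.out.println(isValid("<contact><phone>555-0100</phone></contact>"));
    }
}
```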
XSLT
XSLT is like the DJ of XML files, remixing and splicing together XML documents to produce a fresh new track... or in this case, a brand spanking new HTML, text, or XML document. With XSLT, transform your data like a magical origami master folding paper swans.
<!-- XSLT example -->
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="artist"/></td>
<td><xsl:value-of select="title"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
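One way to run the stylesheet above is the JDK's built-in javax.xml.transform API. The sketch inlines the same stylesheet as a string and applies it to a small, made-up catalog document; the CdCatalogTransform name is hypothetical. Because the result's root element is html, an XSLT 1.0 processor defaults to the html output method, so the exact whitespace in the output may vary.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CdCatalogTransform {
    // The CD-collection stylesheet, inlined as a string.
    static final String XSLT = """
            <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
              <xsl:template match="/">
                <html><body>
                  <h2>My CD Collection</h2>
                  <table border="1">
                    <xsl:for-each select="catalog/cd">
                      <tr>
                        <td><xsl:value-of select="artist"/></td>
                        <td><xsl:value-of select="title"/></td>
                      </tr>
                    </xsl:for-each>
                  </table>
                </body></html>
              </xsl:template>
            </xsl:stylesheet>
            """;

    // Applies the stylesheet to a catalog document and returns the generated HTML.
    static String toHtml(String catalogXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(catalogXml)), new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String catalog = "<catalog><cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd></catalog>";
        System.out.println(toHtml(catalog));
    }
}
```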
XPath
XPath is the treasure map to your XML's gold – with it, point out the exact location of data in an XML document with the precision of a GPS system on a secret agent's car. It's like Where's Waldo, but your Waldo sticks out like a sore thumb.
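// XPath example (Java, javax.xml.xpath); xmlDocument is an already parsed org.w3c.dom.Document
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/class/student[@rollno='493']";
Node studentNode = (Node) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODE);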
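A complete, runnable version of the rollno lookup might look like this. It parses a hypothetical sample document and selects the name of the matching student; the StudentLookup name and the sample data are assumptions. Instead of concatenating the roll number into the expression, it binds an XPath variable ($roll) through setXPathVariableResolver, which keeps quotes in the input from altering the query.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StudentLookup {
    // Hypothetical sample document with the /class/student structure.
    static final String SAMPLE = """
            <class>
              <student rollno="393"><name>Ana</name></student>
              <student rollno="493"><name>Ben</name></student>
            </class>
            """;

    // Returns the <name> text of the student with the given roll number, or "" if none matches.
    static String nameByRollNo(String xml, String rollNo) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            // Only one variable ($roll) is used, so the resolver always returns rollNo.
            xPath.setXPathVariableResolver(variable -> rollNo);
            String expression = "/class/student[@rollno=$roll]/name";
            return (String) xPath.compile(expression).evaluate(doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nameByRollNo(SAMPLE, "493"));
    }
}
```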
XQuery
Picture a private investigator rifling through a drawer – that's XQuery in the XML database. It's almost synonymous with "Where's the beef?" but for data in XML documents, allowing you to extract the meaty information bits you actually care about.
(: XQuery example :)
for $x in doc("yourdata.xml")//yourElement
where $x/yourSubElement = "value"
return $x
XML DOM
The XML DOM is your XML document's family tree, but instead of Uncle Bob and Aunt Sue, you have nodes and elements as relatives. It lets you navigate and manipulate these family gatherings using JavaScript (or any language with a DOM API), turning you into the ultimate family planner.
// XML DOM example in JavaScript (browser DOMParser)
var text = "<catalog><book><title>Old Title</title></book></catalog>";
var parser = new DOMParser();
var xmlDoc = parser.parseFromString(text, "text/xml");
xmlDoc.getElementsByTagName('title')[0].childNodes[0].nodeValue = "New Title";
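The same edit can be made with Java's standard DOM API (org.w3c.dom), which exposes the same tree model as the JavaScript snippet. This sketch parses a string, changes the text of the first title element, and serializes the tree back with an identity Transformer; the DomRetitle name is hypothetical, and the sketch assumes at least one title element exists.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomRetitle {
    // Parses the XML into a DOM tree, replaces the first <title>'s text, and serializes it back.
    static String retitle(String xml, String newTitle) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            doc.getElementsByTagName("title").item(0).setTextContent(newTitle);
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(retitle("<book><title>Old Title</title></book>", "New Title"));
    }
}
```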
SAX (Simple API for XML)
SAX is like speed dating for XML parsers – instead of getting cozy with the whole document, it darts through elements firing events, making it a memory-efficient choice for playing the XML field if you're on a tight memory budget.
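// SAX example (Java): collects the isbn attribute of every <book> element
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
class UserHandler extends DefaultHandler {
    final List<String> isbns = new ArrayList<>();
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equalsIgnoreCase("book")) {
            isbns.add(attributes.getValue("isbn")); // null if the attribute is absent
        }
    }
}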
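To see the event-driven style end to end, the sketch below wires a DefaultHandler into the JDK's SAXParser and collects the text of every title element as the parser streams past it. The SaxBookTitles name and the sample XML are hypothetical. Note that characters() may deliver text in several chunks, which is why it appends to a buffer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxBookTitles {
    // Streams through the document once and collects the text of every <title> element.
    static List<String> titles(String xml) {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals("title")) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text can arrive in several chunks, so append instead of assigning.
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("title") && current != null) {
                    titles.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<books><book isbn=\"1\"><title>XML Basics</title></book>"
                + "<book isbn=\"2\"><title>SAX in Practice</title></book></books>";
        System.out.println(titles(xml));
    }
}
```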
Apache Camel
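Apache Camel is like the Switzerland of application integrations, a peacekeeper that lets different systems talk without throwing punches. It's an integration framework that routes messages, XML included, between systems and APIs like a postal service on steroids.
// Apache Camel routing example (Java DSL); MyTransformer stands for your own Processor
import org.apache.camel.builder.RouteBuilder;
class OrderRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:data/inbox").process(new MyTransformer()).to("jms:queue:order");
    }
}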
SOAP Web Services
The postage stamp on your envelope of data, SOAP wraps XML web service requests in a standard envelope so that every service knows how to open them. It's a protocol more formal than a penguin at a gala event, ensuring messages are formatted and transmitted with the decorum of a butler carrying a silver tray.
// SOAP 1.1 request example ("Content-Length: length" is a placeholder)
POST /InStock HTTP/1.1
Host: www.example.org
Content-Type: text/xml; charset=utf-8
Content-Length: length
SOAPAction: "http://www.example.org/stock/GetStockPrice"

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<soap:Body xmlns:m="http://www.example.org/stock">
<m:GetStockPrice>
<m:StockName>AAPL</m:StockName>
</m:GetStockPrice>
</soap:Body>
</soap:Envelope>
XML-RPC
Think of XML-RPC as your grandma's telegraph system, albeit less old-timey and more internet-friendly. This protocol sends remote procedure calls encoded in XML over HTTP, letting software on different operating systems talk as easily as pen pals from the '90s.
// XML-RPC request example
<?xml version="1.0"?>
<methodCall>
<methodName>methodNameHere</methodName>
<params>
<param><value><int>42</int></value></param>
</params>
</methodCall>
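An XML-RPC request body like the one above can be built with the standard DOM API instead of string concatenation, which also takes care of escaping special characters. This sketch handles only int parameters; the XmlRpcRequest name and the examples.add method name are hypothetical. Actually sending the call would be an HTTP POST of this body to the server's endpoint.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlRpcRequest {
    // Builds a <methodCall> document with one <int> parameter per argument.
    static String methodCall(String methodName, int... args) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element call = doc.createElement("methodCall");
            doc.appendChild(call);
            Element name = doc.createElement("methodName");
            name.setTextContent(methodName);
            call.appendChild(name);
            Element params = doc.createElement("params");
            call.appendChild(params);
            for (int arg : args) {
                // Each argument becomes <param><value><int>N</int></value></param>.
                Element param = doc.createElement("param");
                Element value = doc.createElement("value");
                Element intValue = doc.createElement("int");
                intValue.setTextContent(Integer.toString(arg));
                value.appendChild(intValue);
                param.appendChild(value);
                params.appendChild(param);
            }
            Transformer out = TransformerFactory.newInstance().newTransformer();
            StringWriter writer = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the XML-RPC request", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(methodCall("examples.add", 40, 2));
    }
}
```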