How statistics are calculated
We count how many offers each candidate received and at what salary. For example, if a Data QA engineer working with JSON and a salary expectation of $4,500 received 10 offers, we count them 10 times. Candidates who received no offers do not appear in the statistics at all.
Each column in the graph shows the total number of offers. This is not the number of vacancies but an indicator of demand: the more offers there are, the more companies are trying to hire such a specialist. The 5k+ bucket includes candidates with salaries >= $5,000 and < $5,500.
Median Salary Expectation – the median of market offers in the selected specialization, i.e. the most typical job offer received by candidates in that specialization. We do not count accepted or rejected offers.
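As a sketch of the counting described above, here is a small Python example; the salary figures and the $500 bucket width are made-up illustrations, not real data:

```python
from collections import Counter
from statistics import median

# Hypothetical offers: one entry per offer a candidate received.
# A candidate with 10 offers appears 10 times; candidates with
# no offers are simply absent from the list.
offers = [4500, 4500, 5000, 5200, 4800, 5400, 4500]

# Bucket into $500-wide bands: "5k+" covers >= $5,000 and < $5,500.
def bucket(salary):
    lo = (salary // 500) * 500
    return f"{lo / 1000:g}k+"

print("total offers:", len(offers))              # 7
print(Counter(bucket(s) for s in offers))
print("median expectation:", median(offers))     # 4800
```

The bucket counts form the graph columns, and the median of the flat list of offers is the Median Salary Expectation.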
Trending Data QA tech & tools in 2024
Data QA
What is Data Quality
A data quality analyst maintains an organisation’s data so that the organisation can have confidence in its accuracy, completeness, consistency, trustworthiness, and availability. DQA teams are in charge of conducting audits, defining data quality standards, spotting outliers, and fixing flaws, and they play a key role at every stage of the data lifecycle. Without DQA work, strategic plans fail, operations go awry, customers leave, and organisations face substantial financial losses, eroded customer trust, and potential legal repercussions due to poor-quality data.
This job has changed as much as the hidden infrastructure that transforms data into insight and powers the apps we all use. Which is to say, it has changed a lot.
Data Correctness/Validation
This is the largest stream of tasks. When we talk about data correctness, we should start by asking: what does correctness mean for this dataset? The answer will differ for every dataset and every organisation. The common-sense interpretation is that correct data is whatever your end user (or the business) expects from the dataset, in other words, its expected output.
We can find this out simply by asking questions, or by reading through the list of requirements. Here are some of the tests we might run in this stream:
Finding Duplicates — nobody wants this in their data.
– Check that a column expected to hold unique/distinct values actually does: is every returned value unique within that column/field?
– Check that any value that can be found in your data is actually returned.
Data with KPIs – if the data has columns we can aggregate (sum, min, max), those columns are key performance indicators: mostly numeric/integer columns such as Budget, Revenue, or Sales. When comparing two datasets, the tests below apply:
– Compare row counts between the two datasets and get the difference in count.
– Compare the unique/distinct values and their counts per column, and find out which values are missing from either dataset.
– Compare the KPIs between two datasets and get the percentage difference between them.
– Find values missing from either dataset by matching on a primary or composite primary key; with some care this can also be done for a data source that has no primary key.
– Compute the metrics segment by segment for individual column values. This helps you pinpoint what is going wrong when, say, the count of values on the Zoopla side doesn’t match the count on the Rightmove side, or some values are missing.
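The comparison tests above can be sketched in plain Python; the feed names, ids, and prices below are hypothetical:

```python
from collections import Counter

# Two hypothetical feeds of the same listings (made-up data).
zoopla    = [{"id": 1, "price": 300}, {"id": 2, "price": 250}, {"id": 2, "price": 250}]
rightmove = [{"id": 1, "price": 300}, {"id": 3, "price": 400}]

# Duplicate check: the same id should not appear twice.
dupes = [i for i, n in Counter(r["id"] for r in zoopla).items() if n > 1]
print("duplicate ids:", dupes)                       # [2]

# Count comparison between the two datasets.
print("count diff:", len(zoopla) - len(rightmove))   # 1

# Distinct values present in one feed but not the other.
z_ids = {r["id"] for r in zoopla}
r_ids = {r["id"] for r in rightmove}
print("only in zoopla:", z_ids - r_ids)              # {2}
print("only in rightmove:", r_ids - z_ids)           # {3}

# KPI comparison: percentage difference of the summed prices.
z_sum = sum(r["price"] for r in zoopla)
r_sum = sum(r["price"] for r in rightmove)
print("pct diff:", round(100 * (z_sum - r_sum) / r_sum, 1))  # 14.3
```

In practice the same checks would run as SQL or a testing framework against the warehouse, but the logic is exactly this.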
Data Freshness
This is a simpler set of checks. How do we know the data is fresh?
An obvious check: if your dataset has a date column, look at the max date. Another is to check when the data was last pulled into the table. All of this can be turned into very simple automated checks, which we might cover in a later blog entry.
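A minimal sketch of the max-date check, with hypothetical rows and an assumed two-day freshness window:

```python
from datetime import date, timedelta

# Hypothetical rows with a date column.
rows = [{"listed_on": date(2024, 5, 1)}, {"listed_on": date(2024, 5, 3)}]

# Freshness check: the newest record should be recent enough.
max_date = max(r["listed_on"] for r in rows)
is_fresh = date.today() - max_date <= timedelta(days=2)
print("newest record:", max_date, "| fresh:", is_fresh)
```

The acceptable staleness (here, two days) depends entirely on how often the pipeline is supposed to run.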
Data Completeness
This can be an intermediate step on the way to data correctness, but how do we know whether the dataset we have is actually complete?
One test: check whether any column contains only null values. Occasionally that is fine, but most of the time it is bad news.
Another test is one-valuedness: whether every value in a column is the same. In some cases that would be a fine result; in others it is something we would rather look into.
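The two completeness checks above, sketched over a hypothetical table:

```python
# Hypothetical table as a list of column -> value dicts.
rows = [
    {"id": 1, "city": None, "country": "UK"},
    {"id": 2, "city": None, "country": "UK"},
]
columns = rows[0].keys()

# All-null check: a column where every value is None is usually bad news.
all_null = [c for c in columns if all(r[c] is None for r in rows)]
print("all-null columns:", all_null)             # ['city']

# One-valuedness check: a single distinct value may deserve a closer look.
single_valued = [c for c in columns if len({r[c] for r in rows}) == 1]
print("single-valued columns:", single_valued)   # ['city', 'country']
```

Note that an all-null column is also single-valued, so the second check flags it too; a constant like `country` might be expected, which is why these checks raise questions rather than hard failures.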
What are Data Quality Tools and How are They Used?
Data quality tools improve, and often automate, the processes required to keep data fit for analytics, data science, and machine learning. They let teams evaluate existing data pipelines, identify quality bottlenecks, and automate many remediation steps. Typical activities include data profiling, data lineage, and data cleansing. Cleansing, profiling, measurement, and visualization tools help teams understand the shape and values of the data assets they have acquired, and how those assets are being collected. These tools will call out outliers and mixed formats; in the data analytics pipeline, data profiling acts as a quality-control gate. All of these are data management chores.
Where is JSON used?
APIs: The Chatty Kathy of Software
- JSON turns APIs into gabby gossips, letting them blab data between services like they're swapping juicy stories over a digital fence.
Config Files: The Lifestyle Coaches for Apps
- In the realm of config files, JSON is the life coach, whispering sweet nothings to apps about how they should behave and strut their code.
Web Storage: The Secret Diary Keeper
- Like a digital diary, JSON helps web storage keep secrets in neat, tidy JSON objects, ready to spill the beans whenever the browser reminisces.
Server Logs: The Overly Detailed Memoir Authors
- Server logs pen their tedious life stories in JSON format, ensuring every "Dear Diary" moment is captured for the nerdy sysadmins to later enjoy.
JSON Alternatives
XML (Extensible Markup Language)
Structured data format widely used for web services, configurations, and data interchange. Verbose compared to JSON. Example: configuration files, SOAP.
<person>
  <name>John</name>
  <age>30</age>
  <city>New York</city>
</person>
- Human-readable and self-descriptive.
- Supports namespaces and complex structures.
- Verbose, leading to larger file sizes.
- Parsing can be slow.
- Widely supported in legacy systems.
YAML (YAML Ain't Markup Language)
Human-friendly data serialization standard, often used for config files and applications that require data to be stored or transmitted. Example: CI/CD configurations, Docker compose.
person:
  name: John
  age: 30
  city: New York
- Very human-readable.
- Allows comments, enhancing document context.
- Indentation-based, which may lead to errors.
- Can be ambiguous in complex structures.
- Supports complex data types natively.
Protocol Buffers
Language-neutral, platform-neutral serialization technique developed by Google. Used for storing and interchanging structured data. Example: gRPC, efficient data storage.
message Person {
  string name = 1;
  int32 age = 2;
  string city = 3;
}
- Very efficient data encoding.
- Strict schema enforced, minimizing ambiguity.
- Requires pre-defined schema and generated code.
- Binary format, not human-readable.
- Excellent for large-scale systems.
Quick Facts about JSON
A Little Bit of JSON's Genesis
Once upon a byte, when the world was in deep need of a lightweight data-interchange format, JSON came to the rescue. Birthed in the cyber womb of Douglas Crockford's brain in the early 2000s, JSON, or JavaScript Object Notation, came into existence. Unlike its bulky cousin XML, JSON donned a svelte figure which made it a darling of web developers for data trafficking. Clever and unassuming, JSON wasn't officially a "thing" until 2001, but its seeds were planted in JavaScript's ECMA-262 standard—because, you know, parentage matters in the tech lineage!
JSON's Coming of Age Saga
Now, let's fast-forward to 2013. By then, JSON's been strutting around the internet for over a decade. Its gawky years are behind it and JSON finally gets the recognition it deserves! Enter the JSON standard: ECMA-404. It's like getting a certification from the big league that says, "Yeah, kid, you made it!". Now officially anointed, JSON wasn't just a convenient format, it became a textbook example—literally—in the developer’s arsenal. Major plot twist, eh?
JSON's Syntax Shenanigans
Delving into the nitty-gritty, JSON's syntax is a smorgasbord of arrays and objects, sizzling like bacon with key-value pairs. It's as easy as declaring a JavaScript object, but don't be fooled: although JSON wears JavaScript's jersey, it is language-independent, mingling happily with Python, Ruby, and even PHP! Behold JSON's magic spell:
{
  "wizard": "Harry Potter",
  "muggle-born": true,
  "wand": {
    "core": "Phoenix feather",
    "material": "Holly",
    "length": 11
  }
}
Voilà! A simple JSON object that could stir up a potion for easy data exchange. Just remember, a missing comma or an extra curly brace and your JSON turns into a Gremlin after midnight—utterly chaotic.
What is the difference between Junior, Middle, Senior and Expert JSON developer?
Seniority Name | Years of Experience | Average Salary (USD/year) | Responsibilities & Activities
---|---|---|---
Junior Developer | 0-2 | 50,000 - 70,000 | 
Middle Developer | 2-5 | 70,000 - 95,000 | 
Senior Developer | 5-10 | 95,000 - 140,000 | 
Expert/Team Lead | 10+ | 140,000+ | 
Top 10 JSON Related Tech
JavaScript
Ah, the bread and butter of web development, JavaScript. It's like the swiss army knife for anyone dabbling in JSON's world. You see, JSON is JavaScript Object Notation, so it's no surprise that they go together like peanut butter and jelly. If JSON were a celebrity, JavaScript would be the paparazzi - always around and deeply interested. Handling JSON in JavaScript is a no-brainer:
var jsonData = '{"name": "Monty", "isPython": false}';
var obj = JSON.parse(jsonData);
console.log(obj.name); // Outputs: Monty
Python
Python slithers its way into JSON handling with such elegance, it makes you want to whisper sweet nothings into its interpreter. It's the gentle giant of programming languages; powerful yet so readable that it feels like pseudo-code. Python treats JSON like it's one of its own dictionaries, which is a type so fitting you'd think JSON was Python's long-lost sibling:
import json
jsonData = '{"name": "Monty", "isPython": true}'
obj = json.loads(jsonData)
print(obj['name']) # Outputs: Monty
Node.js
If JavaScript is the cool kid at school, Node.js is its older, hipster sibling who wears flannel and listens to vinyl records. Node.js takes JavaScript outside the confines of a browser and lets it roam free in the server-side world! JSON is treated with first-class citizenship in Node.js, allowing for seamless parsing and stringifying:
const jsonData = '{"name": "Monty", "isPython": false}';
const obj = JSON.parse(jsonData);
console.log(obj.name); // Outputs: Monty
Express.js
Imagine you're building a clubhouse. Now, Node.js provides the land (server-side environment), but Express.js is the one that actually gives you the tools to build it (framework for web apps). It's like the helpful hardware store guy but for HTTP servers. Handling JSON in Express is smoother than a fresh jar of Skippy, made even easier with built-in middleware:
const express = require('express');
const app = express();
app.use(express.json());
app.post('/api/data', (req, res) => {
  console.log(req.body); // req.body is already a parsed JSON object
  res.send('JSON received!');
});
jQuery
jQuery may seem like the grandpa of the JS library family, but it's the type of grandpa who's got secret ninja skills. It made working with JS so easy back in the day, it was like throwing a Hadouken in Street Fighter – everyone did it. Even though it's not the coolest kid on the block anymore, it still offers handy AJAX methods for JSON:
$.getJSON('/api/data', function(data) {
  console.log(data); // Boom! JSON magic.
});
REST APIs
RESTful services are the gossip queens of the internet; they love to talk and share data. REST APIs and JSON are BFFs because they communicate data in a stateless, cacheable, and platform-independent manner. Curl up with a good URL, send an HTTP request, and get back that juicy JSON goodness:
fetch('https://api.someservice.com/data')
.then(response => response.json())
.then(data => console.log(data));
AJAX
AJAX is like the postal service of the web—it lets you send and receive packages (data) without refreshing the page, which is quite the party trick. Need to send some JSON back home to the server? AJAX will sneak it past the refresh police without causing a scene:
const xhr = new XMLHttpRequest();
xhr.open('POST', '/api/data', true);
xhr.setRequestHeader('Content-Type', 'application/json');
xhr.send(JSON.stringify({ name: "Monty", isPython: false }));
XMLHttpRequest
Before all these newfangled frameworks and libraries, there was XMLHttpRequest (XHR), a valiant knight in the realm of AJAX. It may be a bit long in the tooth now and look like alphabet soup, but XHR was the one you called when you needed to make raw HTTP requests to exchange data, and yes, it can handle JSON too:
var xhr = new XMLHttpRequest();
xhr.onreadystatechange = function() {
  if (xhr.readyState === XMLHttpRequest.DONE) {
    if (xhr.status === 200) {
      console.log(JSON.parse(xhr.responseText));
    }
  }
};
xhr.open('GET', '/api/data', true);
xhr.send();
Fetch API
The Fetch API is the cool new kid that moved into XMLHttpRequest's neighborhood. It's like the SpaceX to NASA, sending requests with promises that make dealing with responses easier than fitting in on your first day at school. Wanna be a part of the in-crowd? Use Fetch to work with JSON:
fetch('/api/data')
.then(response => response.json())
.then(data => console.log(data));
JSON Schema
Last but not least, think of JSON Schema as the bouncer at a club—it decides what JSON gets to pass through the velvet ropes based on its shape and size. JSON Schema validates the structure of your JSON data, so you don't end up with a string when you're expecting an integer. Keeping your JSON in check never looked so professional:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "isPython": { "type": "boolean" }
  },
  "required": ["name", "isPython"]
}
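To see the bouncer in action, here is a stdlib-only Python sketch of what a validator does with a schema like this one; it handles only `type`, `properties`, and `required`, and a real project would reach for a proper library such as jsonschema rather than hand-rolling this:

```python
import json

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "isPython": {"type": "boolean"},
    },
    "required": ["name", "isPython"],
}

# Map JSON Schema type names to Python types (a deliberately tiny subset).
TYPES = {"object": dict, "string": str, "boolean": bool, "integer": int}

def validate(data, schema):
    """Return a list of error messages; empty list means the data is valid."""
    if not isinstance(data, TYPES[schema["type"]]):
        return [f"expected {schema['type']}"]
    errors = []
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required property: {key}")
    for key, sub in schema.get("properties", {}).items():
        if key in data and not isinstance(data[key], TYPES[sub["type"]]):
            errors.append(f"{key}: expected {sub['type']}")
    return errors

doc = json.loads('{"name": "Monty", "isPython": "yes"}')
print(validate(doc, schema))  # ['isPython: expected boolean']
```

The string `"yes"` fails where a boolean was expected, which is exactly the "string when you're expecting an integer" class of bug the schema exists to catch.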