Consuming JSON Strings in SQL Server.

CREATE FUNCTION dbo.parseJSON (@JSON NVARCHAR(MAX))
RETURNS @hierarchy TABLE
 (
  element_id INT IDENTITY(1, 1) NOT NULL, /* internal surrogate primary key gives the order of parsing and the list order */
  sequenceNo [int] NULL, /* the place in the sequence for the element */
  parent_ID INT, /* if the element has a parent then it is in this column. The document is the ultimate parent, so you can get the structure from recursing from the document */
  Object_ID INT, /* each list or object has an object id. This ties all elements to a parent. Lists are treated as objects here */
  NAME NVARCHAR(2000), /* the name of the element */
  StringValue NVARCHAR(MAX) NOT NULL, /* the string representation of the value of the element */
  ValueType VARCHAR(10) NOT NULL /* the declared type of the value represented as a string in StringValue */
 )
AS
BEGIN
  DECLARE
    @FirstObject INT, --the index of the first open bracket found in the JSON string
    @OpenDelimiter INT, --the index of the next open bracket found in the JSON string
    @NextOpenDelimiter INT, --the index of subsequent open bracket found in the JSON string
    @NextCloseDelimiter INT, --the index of subsequent close bracket found in the JSON string
    @Type NVARCHAR(10), --whether it denotes an object or an array
    @NextCloseDelimiterChar CHAR(1), --either a '}' or a ']'
    @Contents NVARCHAR(MAX), --the unparsed contents of the bracketed expression
    @Start INT, --index of the start of the token that you are parsing
    @end INT, --index of the end of the token that you are parsing
    @param INT, --the parameter at the end of the next Object/Array token
    @EndOfName INT, --the index of the start of the parameter at end of Object/Array token
    @token NVARCHAR(200), --either a string or object
    @value NVARCHAR(MAX), --the value as a string
    @SequenceNo INT, --the sequence number within a list
    @name NVARCHAR(200), --the name as a string
    @parent_ID INT, --the next parent ID to allocate
    @lenJSON INT, --the current length of the JSON string
    @characters NCHAR(36), --used to convert hex to decimal
    @result BIGINT, --the value of the hex symbol being parsed
    @index SMALLINT, --used for parsing the hex value
    @Escape INT --the index of the next escape character

  DECLARE @Strings TABLE /* in this temporary table we keep all strings, even the names of the elements, since they are 'escaped' in a different way, and may contain, unescaped, brackets denoting objects or lists. These are replaced in the JSON string by tokens representing the string */
    (
     String_ID INT IDENTITY(1, 1),
     StringValue NVARCHAR(MAX)
    )
  SELECT --initialise the characters to convert hex to ascii
    @characters = '0123456789abcdefghijklmnopqrstuvwxyz',
    @SequenceNo = 0, --set the sequence no.
    @parent_ID = 0;
  /* firstly we process all strings. This is done because [, {, } and ] aren't escaped in strings, which complicates an iterative parse. */
  WHILE 1 = 1 --forever until there is nothing more to do
    BEGIN
      SELECT @start = PATINDEX('%[^a-zA-Z]["]%', @json COLLATE SQL_Latin1_General_CP850_Bin); --next delimited string
      IF @start = 0 BREAK --no more so drop through the WHILE loop
      IF SUBSTRING(@json, @start + 1, 1) = '"'
        BEGIN --delimited name
          SET @start = @start + 1;
          SET @end = PATINDEX('%[^\]["]%', RIGHT(@json, LEN(@json + '|') - @start) COLLATE SQL_Latin1_General_CP850_Bin);
        END
      IF @end = 0 --no end delimiter to last string
        BREAK --no more
      SELECT @token = SUBSTRING(@json, @start + 1, @end - 1)
      --now put in the escaped control characters
      SELECT @token = REPLACE(@token, FromString, ToString)
      FROM (SELECT '\"' AS FromString, '"' AS ToString
            UNION ALL SELECT '\\', '\'
            UNION ALL SELECT '\/', '/'
            UNION ALL SELECT '\b', CHAR(08)
            UNION ALL SELECT '\f', CHAR(12)
            UNION ALL SELECT '\n', CHAR(10)
            UNION ALL SELECT '\r', CHAR(13)
            UNION ALL SELECT '\t', CHAR(09)) substitutions
      SELECT @result = 0, @escape = 1
      --begin to take out any hex escape codes
      WHILE @escape > 0
        BEGIN
          SELECT @index = 0,
            --find the next hex escape sequence
            @escape = PATINDEX('%\x[0-9a-f][0-9a-f][0-9a-f][0-9a-f]%', @token COLLATE SQL_Latin1_General_CP850_Bin)
          IF @escape > 0 --if there is one
            BEGIN
              WHILE @index < 4 --there are always four digits to a \x sequence
                BEGIN
                  SELECT --determine its value
                    @result = @result + POWER(16, @index)
                      * (CHARINDEX(SUBSTRING(@token, @escape + 2 + 3 - @index, 1), @characters) - 1),
                    @index = @index + 1;
                END
              --and replace the hex sequence by its unicode value
              SELECT @token = STUFF(@token, @escape, 6, NCHAR(@result))
            END
        END
      --now store the string away
      INSERT INTO @Strings (StringValue) SELECT @token
      --and replace the string with a token
      SELECT @JSON = STUFF(@json, @start, @end + 1, '@string' + CONVERT(NVARCHAR(5), @@identity))
    END
  --all strings are now removed. Now we find the first leaf.
  WHILE 1 = 1 --forever until there is nothing more to do
    BEGIN
      SELECT @parent_ID = @parent_ID + 1
      --find the first object or list by looking for the open bracket
      SELECT @FirstObject = PATINDEX('%[{[[]%', @json COLLATE SQL_Latin1_General_CP850_Bin) --object or array
      IF @FirstObject = 0 BREAK
      IF (SUBSTRING(@json, @FirstObject, 1) = '{')
        SELECT @NextCloseDelimiterChar = '}', @type = 'object'
      ELSE
        SELECT @NextCloseDelimiterChar = ']', @type = 'array'
      SELECT @OpenDelimiter = @FirstObject
      WHILE 1 = 1 --find the innermost object or list...
        BEGIN
          SELECT @lenJSON = LEN(@JSON + '|') - 1
          --find the matching close-delimiter proceeding after the open-delimiter
          SELECT @NextCloseDelimiter = CHARINDEX(@NextCloseDelimiterChar, @json, @OpenDelimiter + 1)
          --is there an intervening open-delimiter of either type
          SELECT @NextOpenDelimiter = PATINDEX('%[{[[]%', RIGHT(@json, @lenJSON - @OpenDelimiter) COLLATE SQL_Latin1_General_CP850_Bin)
          IF @NextOpenDelimiter = 0 BREAK
          SELECT @NextOpenDelimiter = @NextOpenDelimiter + @OpenDelimiter
          IF @NextCloseDelimiter < @NextOpenDelimiter BREAK
          IF SUBSTRING(@json, @NextOpenDelimiter, 1) = '{'
            SELECT @NextCloseDelimiterChar = '}', @type = 'object'
          ELSE
            SELECT @NextCloseDelimiterChar = ']', @type = 'array'
          SELECT @OpenDelimiter = @NextOpenDelimiter
        END
      --and parse out the list or name/value pairs
      SELECT @contents = SUBSTRING(@json, @OpenDelimiter + 1, @NextCloseDelimiter - @OpenDelimiter - 1)
      SELECT @JSON = STUFF(@json, @OpenDelimiter, @NextCloseDelimiter - @OpenDelimiter + 1, '@' + @type + CONVERT(NVARCHAR(5), @parent_ID))
      WHILE (PATINDEX('%[A-Za-z0-9@+.e]%', @contents COLLATE SQL_Latin1_General_CP850_Bin)) <> 0
        BEGIN
          IF @Type = 'object' --it will be a 0-n list containing a string followed by a string, number, boolean, or null
            BEGIN
              SELECT @SequenceNo = 0, @end = CHARINDEX(':', ' ' + @contents) --if there is anything, it will be a string-based name
              SELECT @start = PATINDEX('%[^A-Za-z@][@]%', ' ' + @contents COLLATE SQL_Latin1_General_CP850_Bin)
              SELECT @token = SUBSTRING(' ' + @contents, @start + 1, @end - @start - 1),
                @EndOfName = PATINDEX('%[0-9]%', @token COLLATE SQL_Latin1_General_CP850_Bin),
                @param = RIGHT(@token, LEN(@token) - @EndOfName + 1)
              SELECT @token = LEFT(@token, @EndOfName - 1),
                @Contents = RIGHT(' ' + @contents, LEN(' ' + @contents + '|') - @end - 1)
              SELECT @name = StringValue FROM @Strings WHERE String_ID = @param --fetch the name
            END
          ELSE
            SELECT @name = NULL, @SequenceNo = @SequenceNo + 1
          SELECT @end = CHARINDEX(',', @contents) --a string-token, object-token, list-token, number, boolean, or null
          IF @end = 0
            --HR engineering-notation bugfix start
            IF ISNUMERIC(@contents) = 1
              SELECT @end = LEN(@contents) + 1
            ELSE
            --HR engineering-notation bugfix end
              SELECT @end = PATINDEX('%[A-Za-z0-9@+.e][^A-Za-z0-9@+.e]%', @contents + ' ' COLLATE SQL_Latin1_General_CP850_Bin) + 1
          SELECT @start = PATINDEX('%[^A-Za-z0-9@+.e][A-Za-z0-9@+.e]%', ' ' + @contents COLLATE SQL_Latin1_General_CP850_Bin)
          SELECT @value = RTRIM(SUBSTRING(@contents, @start, @end - @start)),
            @Contents = RIGHT(@contents + ' ', LEN(@contents + '|') - @end)
          IF SUBSTRING(@value, 1, 7) = '@object'
            INSERT INTO @hierarchy (NAME, SequenceNo, parent_ID, StringValue, Object_ID, ValueType)
              SELECT @name, @SequenceNo, @parent_ID, SUBSTRING(@value, 8, 5), SUBSTRING(@value, 8, 5), 'object'
          --the function continues by handling array-, string-, number-,
          --boolean- and null-valued tokens in the same way, then inserts
          --the document root and returns the @hierarchy table
        END
    END
  RETURN
END

Using Fuzzy Lookup Transformations in SQL Server Integration Services.

This article comes to us from Michael K. Campbell. Michael writes: "Humans can instantly spot the difference between '411 Madison Avenue' and '411 Madisan Av'."

Left unchecked, poor data integrity costs businesses billions of dollars each year by negatively impacting business intelligence and decision support systems. Poor data integrity can also wreak havoc with inventory and contract management applications - to say nothing of all of the improperly addressed junk-mail that must pile up at the post office. To help businesses overcome the problems associated with poor data integrity, specialized vendors offer a variety of solutions to help detect and correct subtle differences in semantically identical data through a process known as data cleansing. For organizations with SQL Server 2005, the Fuzzy Lookup Transformation from SQL Server Integration Services (SSIS) can be leveraged to create data cleansing solutions by detecting semantically equivalent matches which can then be cleansed as needed. Behind the scenes the Fuzzy Lookup operation builds token-based indexes (in the form of tables) against approved values in a reference table.
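The token-based index idea can be illustrated with a minimal sketch. SSIS keeps its actual index format and tokenizer internal to the transformation, so the trigram tokenization, function names, and sample part names below are illustrative assumptions only, not Fuzzy Lookup's real implementation.

```python
# Minimal sketch of a token-based index over reference values:
# tokenize each approved value into overlapping trigrams and build
# an inverted index from token to the reference rows containing it.
from collections import defaultdict

def trigrams(text):
    """Split a normalized string into overlapping 3-character tokens."""
    padded = "  " + text.lower() + "  "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def build_token_index(reference_values):
    """Map each token to the set of reference row ids containing it."""
    index = defaultdict(set)
    for row_id, value in enumerate(reference_values):
        for token in trigrams(value):
            index[token].add(row_id)
    return index

# Hypothetical reference values standing in for a parts table:
parts = ["HL Mountain Frame", "HL Road Frame", "Chain Stays"]
index = build_token_index(parts)
# Rows sharing the token "fra" are the two frame parts:
print(sorted(index["fra"]))  # prints [0, 1]
```

An incoming dirty value only needs to be compared against reference rows that share at least one token with it, which is what makes the lookup tractable on large reference tables.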
As each piece of non-cleansed data is processed, SSIS compares it against the tokenized index and generates a Similarity percentage along with an accompanying Confidence factor (also expressed as a percentage). Both of these values can then be logically evaluated to help strike any desired balance between possibility and certainty. The process works incredibly well and provides an excellent balance between functionality and manageability. I also found the entire process to be very approachable in terms of learning curve.

Testing SSIS' Fuzzy Lookup Functionality.

To test SSIS data cleansing, I created a list of parts pulled from the AdventureWorks database and stored it in a flat file using the code in Listing 1.

Listing 1: Creating Bogus Data.

SET NOCOUNT ON;
WITH Source (PartName, NumberOfPartsTaken) AS
(
    SELECT TOP 100
        p.Name [PartName],
        CHARINDEX('A', CAST(NEWID() AS varchar(36))) [NumberOfPartsTaken]
    FROM Production.Product p
        INNER JOIN Production.ProductInventory i
            ON p.ProductID = i.ProductID
    WHERE i.LocationID = 6 -- miscellaneous storage
    ORDER BY NEWID()
)
SELECT PartName, NumberOfPartsTaken
FROM Source
WHERE NumberOfPartsTaken > 0;

Once the data was exported, I opened it up in Notepad and made a number of formatting and spelling changes to simulate the type of semantic problems typically encountered in Extract, Transform, and Load (ETL) operations where data has been hand-entered. Once the 'sample' data was created, I opened up SQL Server 2005 Business Intelligence Development Studio and created a new Integration Services Project. Using a Flat File Source linked to my sample file, I routed my 'sloppy' input into a Fuzzy Lookup Transformation to create 'fuzzy matches' against incoming part names in the AdventureWorks Production.Product table.
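The Similarity percentage that drives these fuzzy matches comes from an algorithm SSIS does not publish, but its effect can be approximated with an edit-based ratio. The stand-in below uses Python's difflib purely as an illustration; it is not the transformation's real scoring, and the sample strings are the ones quoted in the introduction plus a hypothetical part name.

```python
# A rough stand-in for the Similarity score: the ratio of matching
# characters between two normalized strings, scaled to 0.0-1.0.
# Fuzzy Lookup's real algorithm also weighs token frequency and
# transpositions; this only demonstrates the shape of the idea.
from difflib import SequenceMatcher

def similarity(a, b):
    """Edit-based similarity between 0.0 (disjoint) and 1.0 (equal)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# The near-duplicate addresses score far higher than an unrelated pair:
close = similarity("411 Madison Avenue", "411 Madisan Av")
far = similarity("411 Madison Avenue", "HL Mountain Frame")
print(round(close, 2), round(far, 2))
```

A row whose best match scores like `close` is a cleansing candidate; one that scores like `far` is effectively a non-match, which is exactly the distinction the threshold criteria below exploit.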
While you can set threshold criteria directly within the Fuzzy Lookup Transformation itself, you can also opt to have _Similarity and _Confidence columns appended to the default output stream from the transform, which you can then use to evaluate the possible matches with a Conditional Split operation. This is the route I took, and, as you can see in Figure 1, leveraging a Conditional Split Transformation, I was able to easily set up my own criteria defining Solid and Likely matches, with anything not meeting those criteria being output as Non Matches. Errors were routed out in their own error stream as normal.

Figure 1: Assigning output paths based on confidence levels.

With rules in place to parcel results into designated output streams, I routed Solid Matches, which were the bulk of the output, directly to a SQL Server destination to simulate a normal ETL endpoint (see Figure 2). Each of the other three output streams from the Conditional Split was then 'coupled' with a Derived Column Transformation that added a new _Match column (i.e., a column I created) to the output for each path, and assigned a literal value of "LIKELY", "NON-MATCH", or "ERROR" for each output type as seen in Figure 2.

Figure 2: The entire package (and direction of output paths based on matching criteria).

I then used a Union All Transformation (see Figure 2) to combine the output from each of the derived column paths into a single result set to make it easy for humans to decipher what would need to be done with results that weren't cleansed programmatically. In this way, they can look at all of the raw data as well as the value provided in the derived column to help guide their decisions. Figure 3 shows some sample output routed into the Human Intervention endpoint as displayed by a Data Viewer.
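The Conditional Split logic described above amounts to plain threshold tests over the _Similarity and _Confidence pair. The sketch below shows the shape of that routing; the 0.85 and 0.60 cut-offs are illustrative assumptions, not the values used in the article's package.

```python
# Route a row to an output path based on its fuzzy-match scores,
# mirroring a Conditional Split over _Similarity and _Confidence.
# Thresholds are hypothetical examples.
def route(similarity, confidence):
    """Pick an output path name from the _Similarity/_Confidence pair."""
    if similarity >= 0.85 and confidence >= 0.85:
        return "SOLID"       # send straight to the ETL destination
    if similarity >= 0.60:
        return "LIKELY"      # flag for human review
    return "NON-MATCH"       # no usable reference match

print(route(0.95, 0.90))  # -> SOLID
print(route(0.70, 0.50))  # -> LIKELY
print(route(0.20, 0.10))  # -> NON-MATCH
```

Tuning is then just a matter of moving these cut-offs and re-running the package against sample data, which is the iterative approach the next paragraph describes.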
Looking at the results, it's pretty easy to spot that I've got my tolerances set quite high, but the nice thing about using SSIS' Fuzzy Lookup for data cleansing is that you can easily tune the entire process in iterative fashion using sample data to help get it exactly as you'd like it to be.

Figure 3: Examining the output of the "Human Intervention" endpoint.

If you'd like to get a better idea of how the whole process works, download the accompanying sample application, which includes the entire project displayed in Figure 2. You may determine that my approach was overkill (I'm leaning that way myself, due to the number of output paths), but hopefully it will provide you with a good overview of the ways you can handle the output from a Fuzzy Lookup Transformation, and save you a bit of time on the fairly small learning curve needed to use SSIS for data cleansing.

Conclusion.

What I liked best about the Fuzzy Lookup Transformation is that it's part of a very flexible and very powerful ETL framework that makes loading, cleansing, and outputting massaged data easy - even to and from heterogeneous locations. There is a learning curve involved (even if you've had plenty of DTS experience), and the designer and tools definitely have a strong 'Microsoft Version 1' feel to them, but other data cleansing applications are likely to have their own learning curves and warts as well. Overall, data cleansing with SQL Server Integration Services offers highly flexible yet easy-to-manage functionality that creates consistent, reproducible results that organizations can easily tailor to their own needs. I highly recommend it because of its excellent blend of functionality as well as its attractive price tag. Michael K. Campbell is a professional SQL Server consultant with years of experience as a DBA and database developer.