ANSI to Unicode mapping issues (resend)

David Brown
(Pardon me if this is a duplicate; I tried sending it a few days ago from a
different address, but it didn't appear to go through.)

We have been building and shipping an older ANSI version of our ODBC driver
(StarSQL) in Unix/Linux environments. We recently ported our current Unicode
ODBC driver (which has been running on Windows for several years) to Linux,
and ran into some issues that appear to be related to the unixODBC Driver
Manager's mapping from ANSI entry points to the driver's Unicode entry
points when an ANSI application makes ODBC calls to a Unicode driver.

Has anyone else encountered any of these issues?  Thoughts on a solution?

We are using the 2.3.2 release.

Here is a list of the issues encountered by the developer of our driver:

1) The Driver Manager does not map an ANSI application's calls to
SQLGet/SetStmtOption onto a Unicode driver's SQLGet/SetStmtAttrW entry
points. It only does the mapping to SQLGet/SetStmtAttr for ANSI drivers. We
were able to work around this by adding SQLGet/SetStmtOption function entry
points to our driver, but we shouldn't have to do that.

2) SQLSetDescField does not adjust the length supplied by the application
("buffer_length") when the field supplied is a string whose value is
converted to Unicode before being passed to the Unicode driver. In this
particular Unicode ODBC API, buffer_length should be a byte count, not a
character count. The unixODBC driver manager's implementation of
SQLGetDescField does deal with this: it divides string_length by
sizeof(SQLWCHAR) before returning to the application. That works better,
but is too simplistic for multi-byte ANSI data (e.g. UTF-8); see #3.

3) Conversions between Unicode and ANSI almost universally assume that one
byte of ANSI data will produce two bytes of Unicode data (when
sizeof(SQLWCHAR) is 2). The code needs to check the length of the resulting
string (ANSI or Unicode) whenever such a conversion occurs, and then use
that resulting length when passing the string on to the driver or calling
application. Functions like the ANSI versions of SQLPrepare and
SQLExecDirect can't just perform an ANSI-to-Unicode translation and then
pass the application-supplied length to the Unicode driver.
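
A minimal C sketch of the length problem in #3, assuming UTF-8 as the application's ANSI encoding and a 2-byte SQLWCHAR holding UTF-16. The helper name utf8_to_utf16_units is hypothetical (not a unixODBC function) and it assumes well-formed UTF-8. Since SQLPrepareW and SQLExecDirectW take their text length in characters, the value to pass on is the number of UTF-16 code units produced, not the application's byte count:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of UTF-16 code units (SQLWCHARs, assumed
 * 2 bytes) produced by converting `len` bytes of UTF-8. Assumes the input
 * is well-formed UTF-8; no validation is done. */
static size_t utf8_to_utf16_units(const unsigned char *s, size_t len)
{
    size_t units = 0, i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char b = s[i];
        if (b < 0x80)      { i += 1; units += 1; } /* ASCII: 1 byte -> 1 unit */
        else if (b < 0xE0) { i += 2; units += 1; } /* 2-byte sequence */
        else if (b < 0xF0) { i += 3; units += 1; } /* 3-byte sequence, rest of BMP */
        else               { i += 4; units += 2; } /* 4-byte sequence -> surrogate pair */
    }
    return units;
}
```

For "café" encoded in UTF-8 the application supplies 5 bytes, but the converted statement is only 4 SQLWCHARs long, so passing 5 through unchanged would have the driver read one SQLWCHAR beyond the converted text.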


Looking at the unixODBC code, it seems clear that we were exposed to similar
issues with our old ANSI driver when called from a Unicode application.
Applications using parameter markers rather than string literals would be
less sensitive to the limitations of the current unixODBC driver manager
implementation, since keywords and identifiers are less likely to contain
"problematic" characters, but it would seem important to address this
nonetheless.
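
The parameter-marker point can be illustrated with a small, deliberately conservative check, assuming an ASCII-compatible ANSI encoding such as UTF-8 (the function name is hypothetical). When every byte of the statement text is 7-bit ASCII, each byte maps to exactly one SQLWCHAR, so passing the application's length through unchanged happens to be correct; a single non-ASCII literal is enough to make that unsafe:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: true if `text` is pure 7-bit ASCII, in which case an
 * ASCII-compatible ANSI encoding maps each byte to exactly one SQLWCHAR and
 * the application's length can be passed through unchanged. */
static bool length_passthrough_is_safe(const char *text, size_t len)
{
    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        /* Non-ASCII byte: in a multi-byte encoding such as UTF-8 the
         * byte count and the SQLWCHAR count may now differ. */
        if ((unsigned char)text[i] >= 0x80)
            return false;
    }
    return true;
}
```

A statement such as SELECT * FROM t WHERE name = ? passes this check, while the same query with the literal 'café' inlined does not.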

Any suggestions would be appreciated.

Thanks
David Brown
StarQuest




_______________________________________________
unixODBC-dev mailing list
[hidden email]
http://mailman.unixodbc.org/mailman/listinfo/unixodbc-dev

Re: ANSI to Unicode mapping issues (resend)

Nick Gorham-2
On 02/05/14 02:31, David Brown wrote:

> (Pardon me if this is a duplicate; I tried sending it a few days ago
> from a different address, but it didn't appear to go through.)
>
> We have been building and shipping an older ANSI version of our ODBC
> driver (StarSQL) in Unix/Linux environments. We recently ported our
> current Unicode ODBC driver (which has been running on Windows for
> several years) to Linux, and ran into some issues that appear to be
> related to the unixODBC Driver Manager's mapping from ANSI entry
> points to the driver's Unicode entry points when an ANSI application
> makes ODBC calls to a Unicode driver.
>
> Has anyone else encountered any of these issues?  Thoughts on a solution?
>
> We are using the 2.3.2 release.
>
> Here is a list of the issues encountered by the developer of our driver:
>
> 1) The Driver Manager does not map an ANSI application's calls to
> SQLGet/SetStmtOption onto a Unicode driver's SQLGet/SetStmtAttrW entry
> points. It only does the mapping to SQLGet/SetStmtAttr for ANSI
> drivers. We were able to work around this by adding
> SQLGet/SetStmtOption function entry points to our driver, but we
> shouldn't have to do that.

Will look at it when I get a chance.

>
> 2) SQLSetDescField does not adjust the length supplied by the
> application ("buffer_length") when the field supplied is a string
> whose value is converted to Unicode before being passed to the
> Unicode driver. In this particular Unicode ODBC API, buffer_length
> should be a byte count, not a character count. The unixODBC driver
> manager's implementation of SQLGetDescField does deal with this: it
> divides string_length by sizeof(SQLWCHAR) before returning to the
> application. That works better, but is too simplistic for multi-byte
> ANSI data (e.g. UTF-8); see #3.

Try 2.3.3pre; that may have been done.
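
To make the SQLGetDescField side of point 2 concrete, here is a sketch assuming a 2-byte SQLWCHAR holding UTF-16 and UTF-8 as the application's encoding; both function names are hypothetical and only illustrate the arithmetic. Dividing the driver's byte count by sizeof(SQLWCHAR) yields a character count, but what the ANSI application needs is the UTF-8 byte length of the converted string:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: UTF-8 byte length of `units` UTF-16 code units.
 * Assumes well-formed UTF-16 (every high surrogate followed by a low one). */
static size_t utf16_to_utf8_len(const uint16_t *w, size_t units)
{
    size_t bytes = 0, i = 0;
    assert(w != NULL || units == 0);
    while (i < units) {
        uint16_t c = w[i];
        if (c < 0x80)                        { bytes += 1; i += 1; } /* ASCII */
        else if (c < 0x800)                  { bytes += 2; i += 1; }
        else if (c >= 0xD800 && c <= 0xDBFF) { bytes += 4; i += 2; } /* surrogate pair */
        else                                 { bytes += 3; i += 1; } /* rest of the BMP */
    }
    return bytes;
}

/* The simplistic mapping: the driver's string_length in bytes divided by
 * sizeof(SQLWCHAR), taken here to be 2. */
static size_t naive_ansi_len(size_t driver_bytes)
{
    return driver_bytes / 2;
}
```

For "café" the driver reports 8 bytes; dividing by 2 gives 4, while the UTF-8 form the application actually receives is 5 bytes long.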

>
> 3) Conversions between Unicode and ANSI almost universally assume that
> one byte of ANSI data will produce two bytes of Unicode data (when
> sizeof(SQLWCHAR) is 2). The code needs to check the length of the
> resulting string (ANSI or Unicode) whenever such a conversion occurs,
> and then use that resulting length when passing the string on to the
> driver or calling application. Functions like the ANSI versions of
> SQLPrepare and SQLExecDirect can't just perform an ANSI-to-Unicode
> translation and then pass the application-supplied length to the
> Unicode driver.

The A-W and W-A conversions in the driver were meant to be simplistic, on
the assumption that Unicode drivers would also export the ANSI entry points
and handle their own conversions. The iconv option should allow this; I will
check, see (1). Remember that when this was initially written, there was no
case where one byte of ANSI would not produce two bytes of Unicode. In
general I think that's still true, but the use of UTF starts to break this.
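
A minimal sketch of taking the length from the conversion itself, using the POSIX iconv(3) API directly rather than unixODBC's own iconv wrapper (whose internals are not shown in this thread). The function name and the choice of UTF-8 and UTF-16LE as the two encodings are assumptions for illustration. The converted length is the output space iconv consumed, not the caller's input length:

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert `in_len` bytes of UTF-8 into UTF-16LE in `out` (capacity
 * `out_cap` bytes). Returns the number of bytes written, or (size_t)-1
 * if the conversion cannot be opened or fails (invalid input, or an
 * output buffer that is too small). */
static size_t utf8_to_utf16le(const char *in, size_t in_len,
                              char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-16LE", "UTF-8"); /* tocode, fromcode */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;   /* iconv() takes a non-const input pointer */
    char *outp = out;
    size_t in_left = in_len;
    size_t out_left = out_cap;

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* The real converted length is the output space consumed,
     * not the length the caller passed in. */
    return out_cap - out_left;
}
```

Converting "café" (5 UTF-8 bytes) this way produces 8 bytes of UTF-16LE, and an output buffer that is too small makes the call fail instead of silently truncating.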

>
> Looking at the unixODBC code, it seems clear that we were exposed to
> similar issues with our old ANSI driver when called from a Unicode
> application. Applications using parameter markers rather than string
> literals would be less sensitive to the limitations of the current
> unixODBC driver manager implementation, since keywords and identifiers
> are less likely to contain "problematic" characters, but it would seem
> important to address this nonetheless.
>
> Any suggestions would be appreciated.
>
See (1).

--
Nick

Re: ANSI to Unicode mapping issues (resend)

Nick Gorham-2
In reply to this post by David Brown
On 02/05/14 02:31, David Brown wrote:

> (Pardon me if this is a duplicate; I tried sending it a few days ago
> from a different address, but it didn't appear to go through.)
>
> We have been building and shipping an older ANSI version of our ODBC
> driver (StarSQL) in Unix/Linux environments. We recently ported our
> current Unicode ODBC driver (which has been running on Windows for
> several years) to Linux, and ran into some issues that appear to be
> related to the unixODBC Driver Manager's mapping from ANSI entry
> points to the driver's Unicode entry points when an ANSI application
> makes ODBC calls to a Unicode driver.
>
> Has anyone else encountered any of these issues?  Thoughts on a solution?

I had a look at this over lunch.

>
> We are using the 2.3.2 release.
>
> Here is a list of the issues encountered by the developer of our driver:
>
> 1) The Driver Manager does not map an ANSI application's calls to
> SQLGet/SetStmtOption onto a Unicode driver's SQLGet/SetStmtAttrW entry
> points. It only does the mapping to SQLGet/SetStmtAttr for ANSI
> drivers. We were able to work around this by adding
> SQLGet/SetStmtOption function entry points to our driver, but we
> shouldn't have to do that.

I can't tell whether by "Driver Manager" here you mean unixODBC, but if
you do, I'm not sure that's the case. The DM code for SQLSetStmtOption
does have a

else if ( CHECK_SQLSETSTMTATTRW( statement -> connection ))

which should (or at least that's the intent) map that call to
SQLSetStmtAttrW.

SQLGetStmtOption is missing the mapping; I will add that.
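
The fallback described above for SQLSetStmtOption, and planned for SQLGetStmtOption, can be sketched as a chain of presence checks. Everything here is a hypothetical simplification: the struct, the enum and stub_entry stand in for unixODBC's real connection structure and CHECK_* macros, and the order of preference (2.x option function, then the ANSI attribute function, then the W one) is an assumption, not taken from the unixODBC source:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: only whether an entry point exists matters here,
 * so every entry point is modelled as a plain function pointer. */
typedef int (*entry_fn)(void);

struct driver_entry_points {
    entry_fn get_stmt_option;   /* SQLGetStmtOption (ODBC 2.x)     */
    entry_fn get_stmt_attr;     /* SQLGetStmtAttr   (ANSI, 3.x)    */
    entry_fn get_stmt_attr_w;   /* SQLGetStmtAttrW  (Unicode, 3.x) */
};

enum dispatch_path { VIA_OPTION, VIA_ATTR, VIA_ATTR_W, NO_ENTRY_POINT };

/* Placeholder used to mark an entry point as exported. */
static int stub_entry(void) { return 0; }

/* Choose which driver function an application's SQLGetStmtOption should
 * reach, falling back to the 3.x attribute functions when the driver does
 * not export the 2.x one. The order of preference is an assumption. */
static enum dispatch_path choose_get_stmt_option(const struct driver_entry_points *d)
{
    assert(d != NULL);
    if (d->get_stmt_option != NULL) return VIA_OPTION;
    if (d->get_stmt_attr != NULL)   return VIA_ATTR;
    if (d->get_stmt_attr_w != NULL) return VIA_ATTR_W; /* the missing mapping */
    return NO_ENTRY_POINT;
}
```

A Unicode-only driver that exports just SQLGetStmtAttrW then resolves to VIA_ATTR_W instead of failing. Only presence is checked here; a real mapping would also have to translate between the two functions' argument lists.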


>
> 2) SQLSetDescField does not adjust the length supplied by the
> application ("buffer_length") when the field supplied is a string
> whose value is converted to Unicode before being passed to the
> Unicode driver. In this particular Unicode ODBC API, buffer_length
> should be a byte count, not a character count. The unixODBC driver
> manager's implementation of SQLGetDescField does deal with this: it
> divides string_length by sizeof(SQLWCHAR) before returning to the
> application. That works better, but is too simplistic for multi-byte
> ANSI data (e.g. UTF-8); see #3.
>
> 3) Conversions between Unicode and ANSI almost universally assume that
> one byte of ANSI data will produce two bytes of Unicode data (when
> sizeof(SQLWCHAR) is 2). The code needs to check the length of the
> resulting string (ANSI or Unicode) whenever such a conversion occurs,
> and then use that resulting length when passing the string on to the
> driver or calling application. Functions like the ANSI versions of
> SQLPrepare and SQLExecDirect can't just perform an ANSI-to-Unicode
> translation and then pass the application-supplied length to the
> Unicode driver.
>
> Looking at the unixODBC code, it seems clear that we were exposed to
> similar issues with our old ANSI driver when called from a Unicode
> application. Applications using parameter markers rather than string
> literals would be less sensitive to the limitations of the current
> unixODBC driver manager implementation, since keywords and identifiers
> are less likely to contain "problematic" characters, but it would seem
> important to address this nonetheless.
>
> Any suggestions would be appreciated.

Looking at the code, I can see it's possible that I could make the calls
via iconv modify the buffer length before it's passed to the driver, for
the SQLSet* and SQLPrepare-type cases. It becomes a bit (a lot) more of a
problem if we want the driver manager to do the same thing going back. In
the simple case it's fine: I can convert from W to A, get the new length,
and pass that length to the app. But what about the cases where the target
buffer is not large enough? I can't convert data I don't have, so the
length in that case will be a guess at best.
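
The truncated-buffer case can be made concrete with bounds rather than an exact value. This is a hypothetical heuristic, not something unixODBC is known to do, assuming a 2-byte SQLWCHAR holding UTF-16 and UTF-8 on the application side: the DM can measure the UTF-8 length of the code units it actually received, while for the units it never saw it only knows that each contributes between 1 and 3 UTF-8 bytes (a surrogate pair is 2 units that become 4 bytes). A high surrogate cut off at the end of the buffer is treated as missing data:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-8 byte length of well-formed UTF-16 data (surrogates paired). */
static size_t utf16_to_utf8_len(const uint16_t *w, size_t units)
{
    size_t bytes = 0, i = 0;
    while (i < units) {
        uint16_t c = w[i];
        if (c < 0x80)                        { bytes += 1; i += 1; }
        else if (c < 0x800)                  { bytes += 2; i += 1; }
        else if (c >= 0xD800 && c <= 0xDBFF) { bytes += 4; i += 2; } /* surrogate pair */
        else                                 { bytes += 3; i += 1; }
    }
    return bytes;
}

struct len_range { size_t min, max; };

/* Hypothetical: bound the UTF-8 length of a string of which the driver
 * returned only `have_units` code units while reporting `total_bytes`
 * bytes overall (2-byte SQLWCHAR assumed). */
static struct len_range estimate_utf8_len(const uint16_t *have,
                                          size_t have_units,
                                          size_t total_bytes)
{
    assert(have != NULL || have_units == 0);

    /* Do not decode half of a surrogate pair we never fully received. */
    if (have_units > 0 && have[have_units - 1] >= 0xD800 &&
        have[have_units - 1] <= 0xDBFF)
        have_units--;

    size_t known = utf16_to_utf8_len(have, have_units);
    size_t total_units = total_bytes / 2;
    size_t missing = total_units > have_units ? total_units - have_units : 0;

    /* Every unseen code unit contributes between 1 and 3 UTF-8 bytes. */
    struct len_range r = { known + missing, known + 3 * missing };
    return r;
}
```

For "café" truncated to "caf", with the driver reporting 8 bytes in total, this gives a range of 4 to 6 bytes; the true UTF-8 length, 5, lies inside it, but no exact figure is possible without the missing data.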

Generally, this is another case where ODBC and UTF are a problem.

What have you found the Microsoft DM does under these conditions? Does
it handle (for example) converting UTF-8 to UCS-2 or UTF-16 with the
change in length, or does it avoid the problem?

--
Nick