Ankan Priya - 1 month ago
Java Question

Input String Is Encoded, Need The Original String - Java Code

I have a REST web service that takes input in the form of JSON (as multipart form data).

@POST
@Consumes ({"application/ds-json",MediaType.APPLICATION_FORM_URLENCODED,MediaType.APPLICATION_JSON,MediaType.APPLICATION_XML,"text/html",MediaType.MULTIPART_FORM_DATA})
@Produces({ text_html, "application/ds-json" })
@Path("/abc")
public Response abc(@Context HttpServletRequest req, @Context HttpServletResponse response){
.
.
.
.
String strInput = inputJSON.getString("data");
.
.
.
}


The input JSON that I send is
{"data":"Sécurité"}
while the value of the string
strInput
that I get is
SÃ©curitÃ©


I tried
java.net.URLDecoder.decode(strInput, "iso-8859-1")
to decode it back to the original characters, but that did not work.

I also tried
String strInput = new String((inputJSON.getString("data")).getBytes(), "iso-8859-1");
expecting the incoming characters to be stored correctly in the variable
strInput
but that failed too.

I feel totally lost here. Can someone help?




EDIT:

To be more clear, below is exactly how I'm sending the JSON to this service (for testing purposes only):


  1. I have created an HTML page that can send POST requests to the web service





<!DOCTYPE html>
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Insert title here</title>
</head>

<body>

<form action="http://localhost:8080/xxxx/abc" method="POST" enctype="multipart/form-data">

JSON:
<input type="text" name="data">
<input type="submit" name="submit">
</form>
</body>

</html>






  2. In the page, I enter the text as
    Sécurité


Answer

Thank you everyone. I managed to resolve this issue, and @Kayaman's comment helped me get there:

What you're seeing is UTF-8 data decoded as ISO-8859-1. – Kayaman

I converted the input string strInput to bytes using ISO-8859-1 and then created a new string from those bytes using UTF-8. This did the job for me.

byte[] inputBytes = strInput.getBytes("iso-8859-1");
strInput = new String(inputBytes, "UTF-8"); 
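
For reference, the same repair can be written as a small self-contained program. This is only a sketch: the class name MojibakeRepair and the method repair are made up for illustration, and it assumes the text was UTF-8 on the wire and was wrongly decoded exactly once as ISO-8859-1. It uses the StandardCharsets constants instead of charset names, which avoids the checked UnsupportedEncodingException. The test strings are written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {

    // Hypothetical helper: undoes one round of "UTF-8 bytes decoded as ISO-8859-1".
    static String repair(String garbled) {
        // ISO-8859-1 maps the first 256 code points one-to-one onto byte values,
        // so encoding with it recovers the bytes that originally arrived.
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes with the charset they were actually written in.
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The garbled value: 'S', A-tilde, copyright sign, "curit", A-tilde, copyright sign.
        String garbled = "S\u00C3\u00A9curit\u00C3\u00A9";
        // The expected value: 'S', e-acute, "curit", e-acute.
        String expected = "S\u00E9curit\u00E9";

        String repaired = repair(garbled);
        System.out.println(repaired.equals(expected)); // prints "true"
    }
}
```

Note that this only masks the problem after the fact. If the string is ever decoded correctly upstream, applying the repair again would corrupt it, so it is worth checking where the wrong charset is being applied in the first place.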

Earlier I was fetching the bytes of the input string with byte[] inputBytes = strInput.getBytes(); which uses the platform default charset (UTF-8 in my case) and so returned UTF-8 encoded bytes, with more bytes than I had expected (I mentioned that to @Kayaman):

@Kayaman Yes, you are very much correct, and I tested that in a separate test class. But in my current case (the web service), the data that I'm getting as input seems to be corrupted somehow. I tried printing the bytes of both the input and the expected string:
byte[] s = strInput.getBytes("UTF-8");
byte[] s1 = "Sécurité".getBytes("UTF-8");
Their result:
s = [83, -61, -125, -62, -87, 99, 117, 114, 105, 116, -61, -125, -62, -87]
s1 = [83, -61, -87, 99, 117, 114, 105, 116, -61, -87]
Both of these should have been the same, but I'm getting the extra bytes {-125, -62}. – Ankan Priya

However, since the string had been decoded as ISO-8859-1, I needed to get the bytes back using that same charset, and it worked (see the code snippet above).
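
The byte dumps from the comment can be reproduced in isolation, which makes the extra bytes easier to see. A minimal sketch, assuming the request body was UTF-8 and was decoded as ISO-8859-1 somewhere before reaching the service code; the class name EncodingMismatchDemo and the method misdecode are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingMismatchDemo {

    // Simulates the assumed fault: UTF-8 bytes read as if they were ISO-8859-1.
    static String misdecode(String original) {
        byte[] wireBytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 'S', e-acute, "curit", e-acute, written with escapes to stay encoding-neutral.
        String expected = "S\u00E9curit\u00E9";
        // The garbled value, standing in for what arrived in strInput.
        String received = misdecode(expected);

        // Encoding the garbled string as UTF-8 turns each of its stray
        // characters into two bytes, hence the longer array.
        System.out.println(Arrays.toString(received.getBytes(StandardCharsets.UTF_8)));
        // Encoding the expected string as UTF-8 gives the bytes that were sent.
        System.out.println(Arrays.toString(expected.getBytes(StandardCharsets.UTF_8)));
        // Encoding the garbled string as ISO-8859-1 gives those same sent bytes back.
        System.out.println(Arrays.toString(received.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The first two printed arrays should match s and s1 from the comment, and the third should equal the second, which is why decoding it as UTF-8 restores the original text.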