Skip to content

feat: generic parser#696

Open
Phillip9587 wants to merge 1 commit into
expressjs:masterfrom
Phillip9587:generic-again
Open

feat: generic parser#696
Phillip9587 wants to merge 1 commit into
expressjs:masterfrom
Phillip9587:generic-again

Conversation

@Phillip9587

Copy link
Copy Markdown
Member

Based on the work by @jonchurch in #606 and previous PRs listed in #22 (comment). Dicussion on this topic happened in #22.


There is one remaining point we need to figure out:

parse function

This PR in its current state passes the body as Buffer and the charset to the user provided parse function:

function customParse(buffer, charset) {
  // will only work if node.js supports this encoding else the user needs to use iconv-lite or similar libraries 
  const stringContent = buffer.toString(charset) 
  // custom parse method
}

With this approach we loose the iconv-lite decoding and the users would need to do that themselves . Should we instead already decode the body and pass it as string without the charset? Like:

function customParse(decodedBodyAsString) {
  // custom parse method
}

I think the most common use-case for the generic parser will be a custom string parser like parsing csv or yaml. Not some custom binary format. As alternative we could expose an option which switches between passing a Buffer or a already decoded string to the parse function.

cc @ctcpip @jonchurch @UlisesGascon @bjohansebas

Co-authored-by: Jon Church <me@jonchurch.com>
@Phillip9587

Copy link
Copy Markdown
Member Author

Hey @UlisesGascon @jonchurch, a review would be really appreciated!

1 similar comment
@Phillip9587

Copy link
Copy Markdown
Member Author

Hey @UlisesGascon @jonchurch, a review would be really appreciated!

@Phillip9587

Copy link
Copy Markdown
Member Author

Hey @UlisesGascon @jonchurch i would really appreciate it if you check this PR out and give me some feedback on how you would like to proceed.

@wesleytodd wesleytodd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I didn't have a chance to go read the prior conversations, so cant say if all the previous discussions are resolved by this but I am 👍 on what is here. Also with the proposed change to parse as long as you can make it in a backcompat way (since the rest of this is attempting to be backwards compatible AFAICT). Otherwise I am not opposed to a new major for something like this if we deem it worth it.

Comment thread test/generic.js
}
wrapped.called = () => called
return wrapped
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the core test runner provides this functionality now, might be nice to use depending on node version support (which I did not look up).

@ctcpip ctcpip left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually requesting changes here, I just want to make sure the repo captain(s) approve this. In particular, Jon had thoughts about generic parser impl.

@jonchurch jonchurch left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the string or buffer question for the parse function, we could add a decode: Boolean to the generic.options object which is required to be set to configure whether or not the read function will decode the body before passing it to parse function. Thereby controlling if parse gets a Buffer or String.

Forcing users to make this decision ahead of time is IMO appropriate, as the parser will need to know if it is operating on a string or a buffer, and what you require differs depending on if it's a "simple" textual representation or if the parser requires the raw bytes. (meaning, every parser should already know what they want before it is even written)

For users who want us to handle decoding via the robust setup we already have, they can opt in, for those who dont they can opt out. Im wary of picking a default, bc I am not aware that there is a safe default here.

Later changes are less breaking if we don't set a default now, bc everyone had to make a choice, so adding a new choice (or a new default if we learn better!) will not shift the behavior under anyone who already wrote a parser.

I went down an xml rabbit hole which is a bit long to put here, but xml is largely textual but can also carry its own encoding information (BOM / encoding declaration), which can disagree with the transport charset. So folks may want the raw bytes and to resolve the encoding themselves.

Point being with the xml example: even a text format can sometimes need the raw bytes, so there's no safe default to pick based just on content type, anyone who is going far enough off the beaten path to write their own parser should have to decide.

An aside on hard things, naming: decode as an option name feels right to me but also feels weird? Weird in that decoding is pretty overloaded as a term in HTTP stuff. Will it confuse people who are also setting inflate? It's not clear by the name alone that this controls if their parser gets string or buffer, parseAs: 'string' | 'buffer'? idk, im sure Im overthinking it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants